candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-19 11:56:45 +00:00

Author	SHA1	Message	Date
zachcp	386fd8abb4	Module Docs (#2624) * update whisper * update llama2c * update t5 * update phi and t5 * add a blip model * qlamma doc * add two new docs * add docs and emoji * additional models * openclip * pixtral * edits on the model docs * update yu * update a fe wmore models * add persimmon * add model-level doc * names * update module doc * links in heira * remove empty URL * update more hyperlinks * updated hyperlinks * more links * Update mod.rs --------- Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>	2024-11-18 14:19:23 +01:00
zachcp	f689ce5d39	Documentation Pass for Models (#2617) * links in chinese_clip * links for clip model * add mod docs for flux and llava * module doc for MMDIT and MIMI * add docs for a few more modesl * mod docs for bert naser and beit * add module docs for convmixer colpali codegeex and chatglm * add another series of moddocs * add fastvit-llama2_c * module docs mamba -> mobileone * module docs from moondream-phi3 * mod docs for quantized and qwen * update to yi * fix long names * Update llama2_c.rs * Update llama2_c_weights.rs * Fix the link for mimi + tweaks --------- Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>	2024-11-15 08:30:15 +01:00
ilookee	fdc2622686	fix: qwen2 lm_head loading #2443 (#2445) Co-authored-by: Yi Xu <xuyi@me.com>	2024-08-23 16:50:02 +02:00
Laurent Mazare	54ff971e35	Support for the new Qwen2 models. (#2257) * Support for the new Qwen2 models. * Add more models.	2024-06-07 10:51:50 +01:00
Yin Guobing	349c3e806a	Support embedding model gte-Qwen1.5-7B-instruct (#2190) * Support embedding model gte-Qwen1.5-7B-instruct This is a text embedding model based on Qwen2. They share same model architecture except the last MLP module. This commit brings in minimal modification of the old Qwen2 implementation to support both models. An example is provided, and had been verified according to the official PyTorch implementation. * Avoid doing the 'last-token filtering' based on the absence of attention mask. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2024-05-16 21:34:10 +02:00
Laurent Mazare	96a48e5cc4	Add argsort. (#2132) * Add the argsort cuda kernels. * CPU version of arg-sort. * Hook the cuda kernel + rework the cpu bits. * Add some dedicated test. * Working cuda kernel. * Metal kernel. * Metal adjustments. * Bugfix. * Use the fast rope in qwen. * Rework the expert selection in qwen.	2024-04-27 20:17:35 +02:00
Laurent Mazare	3ad4770eb6	Use cat for faster MQA computation. (#2043) * Use cat for faster MQA computation. * Move the function to utils + use it in mistral. * Use the shared repeat-kv in a few more models. * Fix.	2024-04-12 09:15:10 +02:00
Jani Monoses	90fc82211f	Use a common with_tracing::RmsNorm in a few models. (#1871) * Add RmsNorm with tracing. * Use with_tracing::RmsNorm in some models.	2024-03-18 21:40:06 +01:00
Laurent Mazare	5657e596cd	Add the Qwen2 model (#1684) * Initial check-in for the qwen2 model. * More qwen2 inference. * Polish the qwen example. * Fix the rope basis. * Get the inference to work. * Support different model sizes.	2024-02-09 15:02:49 +01:00

9 Commits