candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-14 09:57:10 +00:00

Author	SHA1	Message	Date
Kyle Birnbaum	0224a749f0	Add Qwen3 MoE (#2934 ) * qwen-moe rebase * lint * fixed rebase error * swapped normal MoE model with CausalMoE Model in example, and swapped the tie word embeddings if statement * updated readme	2025-05-31 15:33:28 +02:00
Kyle Birnbaum	1fdfb58de5	Updating `Add qwen3` (PR 2903) to use HF weights (#2930 ) * add Qwen3.rs * fixed compile error * attempting to gett pr 2903 working with qwen weights * different qwen variants working * added moe model * clippy * added additional eos token * translated Korean comments to English as well as I can * removed specialized Qwen3RmsNorm and replaced with generic Candle RmsNorm * replaced custom repeat_kv implementation with candle's repeat_kv implementation * replace linear with linear_b in attention initalization * replaced custom custom kv_cache implementation with candle kv_cache * style * replaced explicit broadcast add with normal add in decoder layer * removed keeping the Rotary embedding layer in the model struct * used tie_word_embeddings bool from config instead of relying on existence of weights for lm head in CasualLM * removed duplicate code from qwen3_moe * removed sliding window from qwen3 attention * removed MoE code * removed unused option * Fixed Typo Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com> * fixed tie word embeddings to use the correct embedding weights instead of the opposite --------- Co-authored-by: Max <naturale@hufs.ac.kr> Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>	2025-05-02 06:05:53 +02:00
Laurent Mazare	54ff971e35	Support for the new Qwen2 models. (#2257 ) * Support for the new Qwen2 models. * Add more models.	2024-06-07 10:51:50 +01:00
Yin Guobing	349c3e806a	Support embedding model gte-Qwen1.5-7B-instruct (#2190 ) * Support embedding model gte-Qwen1.5-7B-instruct This is a text embedding model based on Qwen2. They share same model architecture except the last MLP module. This commit brings in minimal modification of the old Qwen2 implementation to support both models. An example is provided, and had been verified according to the official PyTorch implementation. * Avoid doing the 'last-token filtering' based on the absence of attention mask. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2024-05-16 21:34:10 +02:00
Laurent Mazare	eb1b27abcd	Readme fix. (#1961 )	2024-03-28 23:24:46 +01:00
Laurent Mazare	708e422456	Qwen MoE model. (#1960 ) * Qwen MoE model. * Add the MoE model to the example. * Fix the scaling. * Readme updates. * Readme tweaks.	2024-03-28 23:10:57 +01:00
Nicolas Patry	74497e6bf7	Fixing the qwen tokenizer location. (#1693 ) Using the chatglm one causes a bug where the "<\|endoftext\|>" is not found.	2024-02-11 08:52:36 +01:00
Laurent Mazare	1c8d61f051	ChatGLM custom tokenizer. (#1687 )	2024-02-10 10:47:04 +01:00
Laurent Mazare	40ce16001b	Use the proper endoftext token for gwen. (#1685 )	2024-02-09 17:02:03 +01:00
Laurent Mazare	5657e596cd	Add the Qwen2 model (#1684 ) * Initial check-in for the qwen2 model. * More qwen2 inference. * Polish the qwen example. * Fix the rope basis. * Get the inference to work. * Support different model sizes.	2024-02-09 15:02:49 +01:00

10 Commits