candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-15 02:16:37 +00:00

Author	SHA1	Message	Date
Kyle Birnbaum	6ff0a6999c	Fixed Gemma3 model and example (#2917) * gemma3: changed RotaryEmbedding base freq based on layer and sliding window * Changed attention mask per layer, either normal or sliding * made attention mask creation slightly more efficient by only creating them once per model iteration * changed is_sliding to an Option * clippy * changed to stop on both <eos> and <end_of_turn> instead of either or	2025-04-25 05:35:08 +02:00
André Cipriani Bandarra	cbf5fc80c2	Add Gemma 3 1b IT to Gemma examples (#2809) - Updates the Gemma example to include Gemma 3 1b instruction tuned.	2025-03-16 17:00:48 +01:00
Laurent Mazare	111edbc4ea	Gemma 3 initial setup (text only). (#2802) * Gemma 3 initial setup (text only). * Use the rotating kv cache for the sliding window.	2025-03-14 07:56:02 +01:00
Laurent Mazare	c1b9e07e35	Add support for gemma-2. (#2425) * Add gemma-2. * Support a couple more models. * Sliding window support. * Example + readme updates. * Update the main readme.	2024-08-17 20:31:23 +02:00
Laurent Mazare	7ebc3548e1	Use flash-attn in gemma. (#2195) * Use flash-attn in gemma. * Fix flash-attn for head dim 256.	2024-05-18 19:18:59 +02:00
Laurent Mazare	a0460cd2b1	Add the code-gemma models. (#2038) * Add the code-gemma models. * Tweak to the gemma config.	2024-04-10 21:19:21 +02:00
Laurent Mazare	33c9b66554	Add the new gemma models. (#2023) * Add the new gemma models. * Revert the lightning changes. * Support for the 1.1 models.	2024-04-06 21:25:38 +02:00
Laurent Mazare	21f1d04976	Add the instruction finetuned gemma variants. (#1790)	2024-03-02 18:56:59 +01:00
Laurent Mazare	8d04f70f4d	Fix the eos token for gemma. (#1753)	2024-02-24 11:07:02 +01:00
Laurent Mazare	45d5322d62	Add the Gemma models. (#1741) * Add the Gemma models. * Add the gemma example. * Adapt the RmsNorm. * Get the 2b model to work. * 7b support. * Use the config head dim. * Yet another fix. * Make the matrixes contiguous. * Also get the 7b model to work. * And add to the readme.	2024-02-21 22:02:50 +01:00

10 Commits