Add a quantized version of recurrent-gemma. (#2054)

* Add a quantized version of recurrent-gemma.
* Share the rglru part.
* Get the quantized gemma model to work.
2025-06-16 02:38:10 +00:00 · 2024-04-13 20:07:01 +02:00
parent 4c88c3ce06
commit 50e49ecc5f
6 changed files with 521 additions and 67 deletions
--- a/README.md
+++ b/README.md
@@ -63,8 +63,9 @@ We also provide a some command line based examples using state of the art models
 - [LLaMA and LLaMA-v2](./candle-examples/examples/llama/): general LLM, includes
  the SOLAR-10.7B variant.
 - [Falcon](./candle-examples/examples/falcon/): general LLM.
- [Gemma](./candle-examples/examples/gemma/): 2b and 7b general LLMs from Google
-  Deepmind.
+- [Gemma](./candle-examples/examples/gemma/): 2b and 7b general LLMs from Google Deepmind.
+- [RecurrentGemma](./candle-examples/examples/recurrent-gemma/): 2b and 7b
+  Griffin based models from Google that mix attention with a RNN like state.
 - [Phi-1, Phi-1.5, and Phi-2](./candle-examples/examples/phi/): 1.3b and 2.7b general LLMs with performance on par with LLaMA-v2 7b.
 - [StableLM-3B-4E1T](./candle-examples/examples/stable-lm/): a 3b general LLM
  pre-trained on 1T tokens of English and code datasets. Also supports