Support for MQA for llama v2. (#205)

* Support for MQA for llama v2.

* More llama-v2.

* Move the rotary embedding precomputation into the cache.

* Add a v2 flag.

* Use the hf model.
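
A note on the first bullet above: Llama-v2's larger checkpoints use grouped-query attention, of which multi-query attention (MQA) is the extreme case: a small number of key/value heads is shared across all query heads. The sketch below shows the core shape trick in plain Rust, with nested `Vec`s standing in for tensors; the `repeat_kv` name and the representation are illustrative, not candle's actual API.

```rust
/// Illustrative sketch only, not candle's implementation: each key/value
/// head is shared by `n_heads / kv_heads.len()` query heads, so it gets
/// repeated that many times before the usual per-head attention step.
fn repeat_kv(kv_heads: Vec<Vec<f32>>, n_heads: usize) -> Vec<Vec<f32>> {
    assert_eq!(n_heads % kv_heads.len(), 0);
    let n_rep = n_heads / kv_heads.len();
    kv_heads
        .into_iter()
        .flat_map(|head| std::iter::repeat(head).take(n_rep))
        .collect()
}

fn main() {
    // Two KV heads shared across eight query heads (n_rep = 4).
    let kv = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let expanded = repeat_kv(kv, 8);
    assert_eq!(expanded.len(), 8);
    assert_eq!(expanded[0], expanded[3]); // first four entries are copies of head 0
}
```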
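
The rotary-embedding bullet moves the cos/sin table computation out of the per-layer forward pass: the tables depend only on the head dimension and the position, so they can be built once and stored next to the KV cache. Below is a minimal self-contained sketch of that precomputation, with plain `f32` vectors and a hypothetical `RotaryCache` type standing in for the real cache struct.

```rust
/// Sketch of a cache that precomputes the rotary-embedding tables once
/// (illustrative type and field names, not candle's actual `Cache`).
struct RotaryCache {
    cos: Vec<Vec<f32>>, // indexed as cos[position][i], i < head_dim / 2
    sin: Vec<Vec<f32>>,
}

impl RotaryCache {
    fn new(head_dim: usize, max_seq_len: usize, base: f32) -> Self {
        // Inverse frequencies 1 / base^(2i / head_dim), the standard RoPE schedule.
        let inv_freq: Vec<f32> = (0..head_dim / 2)
            .map(|i| 1.0 / base.powf(2.0 * i as f32 / head_dim as f32))
            .collect();
        let mut cos = Vec::with_capacity(max_seq_len);
        let mut sin = Vec::with_capacity(max_seq_len);
        for pos in 0..max_seq_len {
            let angles: Vec<f32> = inv_freq.iter().map(|f| pos as f32 * f).collect();
            cos.push(angles.iter().map(|a| a.cos()).collect());
            sin.push(angles.iter().map(|a| a.sin()).collect());
        }
        Self { cos, sin }
    }
}
```

With the tables cached, applying RoPE at position `t` reduces to indexing `cos[t]` and `sin[t]`, shared by every layer.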
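
As for the v2 flag: this commit view doesn't show the CLI change itself, but assuming the llama example gained a boolean `--v2` switch as the bullet suggests, selecting the Llama-2 weights pulled from the Hugging Face Hub would look something like `cargo run --example llama --release -- --v2` (the flag name is inferred from the commit message, not verified against the source).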
Author: Laurent Mazare (committed by GitHub)
Date: 2023-07-20 07:39:04 +02:00
Parent: c34f932319
Commit: 12d6dc018d
3 changed files with 123 additions and 110 deletions

README.md

@@ -13,7 +13,7 @@ let c = a.matmul(&b)?;
 Check out our [examples](./candle-examples/examples/):
 - [Whisper](./candle-examples/examples/whisper/)
-- [Llama](./candle-examples/examples/llama/)
+- [Llama and Llama-v2](./candle-examples/examples/llama/)
 - [Bert](./candle-examples/examples/bert/) (Useful for sentence embeddings)
 - [Falcon](./candle-examples/examples/falcon/)