Support for MQA for llama v2. (#205)

* Support for MQA for llama v2.

* More llama-v2.

* Move the rotary embedding precomputation into the cache.

* Add a v2 flag.

* Use the hf model.
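
A note on the first bullet above: Llama-v2's larger checkpoints use grouped-query attention, of which multi-query attention (MQA) is the extreme case: a small number of key/value heads is shared across all query heads. The sketch below shows the core shape trick in plain Rust, with nested `Vec`s standing in for tensors; the `repeat_kv` name and the representation are illustrative, not candle's actual API.

```rust
/// Illustrative sketch only, not candle's implementation: each key/value
/// head is shared by `n_heads / kv_heads.len()` query heads, so it gets
/// repeated that many times before the usual per-head attention step.
fn repeat_kv(kv_heads: Vec<Vec<f32>>, n_heads: usize) -> Vec<Vec<f32>> {
    assert_eq!(n_heads % kv_heads.len(), 0);
    let n_rep = n_heads / kv_heads.len();
    kv_heads
        .into_iter()
        .flat_map(|head| std::iter::repeat(head).take(n_rep))
        .collect()
}

fn main() {
    // Two KV heads shared across eight query heads (n_rep = 4).
    let kv = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let expanded = repeat_kv(kv, 8);
    assert_eq!(expanded.len(), 8);
    assert_eq!(expanded[0], expanded[3]); // first four entries are copies of head 0
}
```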
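
The rotary-embedding bullet moves the cos/sin table computation out of the per-layer forward pass: the tables depend only on the head dimension and the position, so they can be built once and stored next to the KV cache. Below is a minimal self-contained sketch of that precomputation, with plain `f32` vectors and a hypothetical `RotaryCache` type standing in for the real cache struct.

```rust
/// Sketch of a cache that precomputes the rotary-embedding tables once
/// (illustrative type and field names, not candle's actual `Cache`).
struct RotaryCache {
    cos: Vec<Vec<f32>>, // indexed as cos[position][i], i < head_dim / 2
    sin: Vec<Vec<f32>>,
}

impl RotaryCache {
    fn new(head_dim: usize, max_seq_len: usize, base: f32) -> Self {
        // Inverse frequencies 1 / base^(2i / head_dim), the standard RoPE schedule.
        let inv_freq: Vec<f32> = (0..head_dim / 2)
            .map(|i| 1.0 / base.powf(2.0 * i as f32 / head_dim as f32))
            .collect();
        let mut cos = Vec::with_capacity(max_seq_len);
        let mut sin = Vec::with_capacity(max_seq_len);
        for pos in 0..max_seq_len {
            let angles: Vec<f32> = inv_freq.iter().map(|f| pos as f32 * f).collect();
            cos.push(angles.iter().map(|a| a.cos()).collect());
            sin.push(angles.iter().map(|a| a.sin()).collect());
        }
        Self { cos, sin }
    }
}
```

With the tables cached, applying RoPE at position `t` reduces to indexing `cos[t]` and `sin[t]`, shared by every layer.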
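
As for the v2 flag: this commit view doesn't show the CLI change itself, but assuming the llama example gained a boolean `--v2` switch as the bullet suggests, selecting the Llama-2 weights pulled from the Hugging Face Hub would look something like `cargo run --example llama --release -- --v2` (the flag name is inferred from the commit message, not verified against the source).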
Author: Laurent Mazare (committed by GitHub)
Date: 2023-07-20 07:39:04 +02:00
Parent: c34f932319
Commit: 12d6dc018d
3 changed files with 123 additions and 110 deletions

README.md

@@ -13,7 +13,7 @@ let c = a.matmul(&b)?;
 Check out our [examples](./candle-examples/examples/):
 - [Whisper](./candle-examples/examples/whisper/)
-- [Llama](./candle-examples/examples/llama/)
+- [Llama and Llama-v2](./candle-examples/examples/llama/)
 - [Bert](./candle-examples/examples/bert/) (Useful for sentence embeddings)
 - [Falcon](./candle-examples/examples/falcon/)