mirror of
https://github.com/huggingface/candle.git
synced 2025-06-16 18:48:51 +00:00

* Support for MQA for llama v2. * More llama-v2. * Move the rotary embedding precomputation in the cache. * Add a v2 flag. * Use the hf model.