mirror of
https://github.com/huggingface/candle.git
synced 2025-06-15 18:28:24 +00:00

* GQA support in the quantized model.
* Fix the reshaping.
* Fix the main llama model.
* Infer the proper gqa from the model kind.
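To illustrate what grouped-query attention (GQA) support and inferring the gqa value from the model kind can look like, here is a minimal, dependency-free sketch. It is not the code from this commit: `ModelKind`, `default_gqa`, and `repeat_kv` are hypothetical names, and plain nested `Vec`s stand in for tensors. The only model-specific fact assumed is that Llama 2 70B uses 64 query heads with 8 key/value heads (group size 8), while the 7B and 13B variants use one key/value head per query head.

```rust
/// Hypothetical model kinds, used for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ModelKind {
    L7b,
    L13b,
    L70b,
}

/// Group size: how many query heads share one key/value head.
/// Assumption: 70B uses 64 query heads and 8 KV heads, so gqa = 8;
/// the smaller models use regular multi-head attention, so gqa = 1.
fn default_gqa(kind: ModelKind) -> usize {
    match kind {
        ModelKind::L7b | ModelKind::L13b => 1,
        ModelKind::L70b => 8,
    }
}

/// Expand `n_kv_head` key/value heads to `n_kv_head * gqa` heads by
/// repeating each head `gqa` times in a row, so that query head `i`
/// lines up with the copy of KV head `i / gqa`.
fn repeat_kv(kv: &[Vec<f32>], gqa: usize) -> Vec<Vec<f32>> {
    kv.iter()
        .flat_map(|head| std::iter::repeat(head.clone()).take(gqa))
        .collect()
}

fn main() {
    // Two toy KV heads, each holding two values.
    let kv = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    for kind in [ModelKind::L7b, ModelKind::L13b, ModelKind::L70b] {
        let gqa = default_gqa(kind);
        let expanded = repeat_kv(&kv, gqa);
        println!("{kind:?}: gqa = {gqa}, heads after expansion = {}", expanded.len());
    }
}
```

Repeating each head consecutively (rather than tiling the whole list of heads) matters: with tiling, query heads would be paired with the wrong key/value heads, which is the kind of mistake a reshaping fix would address.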