mirror of
https://github.com/huggingface/candle.git
synced 2025-06-17 11:08:52 +00:00

* GQA support in the quantized model.
* Fix the reshaping.
* Fix the main llama model.
* Infer the proper gqa from the model kind.
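
For readers unfamiliar with grouped-query attention (GQA), the idea behind these changes can be sketched in plain Rust. This is a minimal illustration under stated assumptions, not the code from this commit: `ModelKind`, `gqa_for`, and `repeat_kv` are hypothetical names, tensors are modelled as flat row-major slices rather than candle tensors, and the group sizes follow the published Llama 2 configurations (7B and 13B use one key/value head per query head, 70B shares each key/value head across 8 query heads).

```rust
/// Hypothetical model kinds, used only for this illustration.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ModelKind {
    L7b,
    L13b,
    L70b,
}

/// Number of query heads that share one key/value head.
/// Assumption: values taken from the published Llama 2 configurations.
pub fn gqa_for(kind: ModelKind) -> usize {
    match kind {
        ModelKind::L7b | ModelKind::L13b => 1,
        ModelKind::L70b => 8,
    }
}

/// Repeat every key/value head `n_rep` times so that the result has
/// `n_kv_head * n_rep` heads and lines up with the query heads.
/// Input layout (row-major): (n_kv_head, seq_len, head_dim).
pub fn repeat_kv(
    x: &[f32],
    n_kv_head: usize,
    seq_len: usize,
    head_dim: usize,
    n_rep: usize,
) -> Vec<f32> {
    assert_eq!(x.len(), n_kv_head * seq_len * head_dim);
    let head_size = seq_len * head_dim;
    if head_size == 0 {
        // Nothing to repeat for an empty sequence or zero-width head.
        return Vec::new();
    }
    let mut out = Vec::with_capacity(x.len() * n_rep);
    // Each chunk is one full key/value head; copies are laid out back to back.
    for head in x.chunks(head_size) {
        for _ in 0..n_rep {
            out.extend_from_slice(head);
        }
    }
    out
}
```

With 2 key/value heads and `n_rep = 2`, `repeat_kv` returns 4 heads in the order head 0, head 0, head 1, head 1. Getting this ordering and the surrounding reshape right is the kind of detail a "fix the reshaping" change typically concerns, though the actual fix in this commit may differ.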