Quantized version of mistral. (#1009)

* Quantized version of mistral.
* Integrate the quantized mistral variant.
* Use the quantized weight files.
* Tweak the quantization command.
* Fix the dtype when computing the rotary embeddings.
* Update the readme with the quantized version.
* Fix the decoding of the remaining tokens.
2025-06-17 11:08:52 +00:00 · 2023-09-30 19:25:47 +02:00
parent 06207332bc
commit deee7612da
7 changed files with 507 additions and 37 deletions
--- a/candle-transformers/src/models/mod.rs
+++ b/candle-transformers/src/models/mod.rs
@@ -7,6 +7,7 @@ pub mod llama;
 pub mod mistral;
 pub mod mixformer;
 pub mod quantized_llama;
+pub mod quantized_mistral;
 pub mod quantized_mixformer;
 pub mod quantized_t5;
 pub mod segment_anything;