Quantized version of mistral. (#1009)

* Quantized version of mistral.

* Integrate the quantized mistral variant.

* Use the quantized weight files.

* Tweak the quantization command.

* Fix the dtype when computing the rotary embeddings.

* Update the readme with the quantized version.

* Fix the decoding of the remaining tokens.
Laurent Mazare
2023-09-30 19:25:47 +02:00
committed by GitHub
parent 06207332bc
commit deee7612da
7 changed files with 507 additions and 37 deletions

@@ -7,6 +7,7 @@ pub mod llama;
 pub mod mistral;
 pub mod mixformer;
 pub mod quantized_llama;
+pub mod quantized_mistral;
 pub mod quantized_mixformer;
 pub mod quantized_t5;
 pub mod segment_anything;