6 Commits

Author SHA1 Message Date
deee7612da Quantized version of mistral. (#1009)
* Quantized version of mistral.

* Integrate the quantized mistral variant.

* Use the quantized weight files.

* Tweak the quantization command.

* Fix the dtype when computing the rotary embeddings.

* Update the readme with the quantized version.

* Fix the decoding of the remaining tokens.
2023-09-30 18:25:47 +01:00
06207332bc Streaming mode for reporting the generated tokens (#1007)
* Token streaming.

* Use the token output stream.

* Flush the output.

* Ensure that the last characters get reported.
2023-09-30 15:04:11 +01:00
4021272875 Use flash-attn for mistral. (#1004) 2023-09-30 12:15:10 +01:00
87e3a4e175 Mistral: exit on eos token. (#1001)
* Mistral: exit on eos token.

* Print the proper stats.

* Also add a short flag.
2023-09-30 07:07:06 +01:00
6f17ef82be Mistral: print the generated text. (#992) 2023-09-29 10:56:11 +01:00
ada8851a23 Add the mistral example. (#984)
* Add the mistral example.

* Use the two model files.

* Adjust the dtype.

* Tweak the weight paths.

* Remove the end of text token.

* Get the mistral model to generate some text.
2023-09-28 16:19:18 +01:00