Mirror of https://github.com/huggingface/candle.git (synced 2025-06-19 19:58:35 +00:00)
Quantized version of mistral. (#1009)
* Quantized version of mistral.
* Integrate the quantized mistral variant.
* Use the quantized weight files.
* Tweak the quantization command.
* Fix the dtype when computing the rotary embeddings.
* Update the readme with the quantized version.
* Fix the decoding of the remaining tokens.
@@ -6,6 +6,9 @@ as of 2023-09-28. Weights (and the original Python model code) are released unde
- [Blog post](https://mistral.ai/news/announcing-mistral-7b/) from Mistral announcing the model release.
- [Model card](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the
HuggingFace Hub.
This example supports the initial model as well as a quantized variant.

## Running the example

```bash
$ cargo run --example mistral --release --features cuda -- --prompt 'Write helloworld code in Rust' --sample-len 150
@@ -38,3 +41,50 @@ fn main() {

This example is released under the terms
```

## Running the quantized version of the model

```bash
$ cargo run --example mistral --features accelerate --release -- \
$ --prompt "Here is a sample quick sort implementation in rust " --quantized -n 400
avx: false, neon: true, simd128: false, f16c: false
temp: 0.00 repeat-penalty: 1.10 repeat-last-n: 64
retrieved the files in 562.292µs
loaded the model in 1.100323667s
Here is a sample quick sort implementation in rust

``rust
fn quick_sort(arr: &mut [i32]) {
    if arr.len() <= 1 {
        return;
    }

    let pivot = arr[0];
    let mut left = vec![];
    let mut right = vec![];

    for i in 1..arr.len() {
        if arr[i] < pivot {
            left.push(arr[i]);
        } else {
            right.push(arr[i]);
        }
    }

    quick_sort(&mut left);
    quick_sort(&mut right);

    let mut i = 0;
    for _ in &left {
        arr[i] = left.pop().unwrap();
        i += 1;
    }

    for _ in &right {
        arr[i] = right.pop().unwrap();
        i += 1;
    }
}
``
226 tokens generated (10.91 token/s)
```
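
The quick sort in the sample output above is raw model output and would not compile as written: the write-back loops iterate over `&left` and `&right` while calling `pop()` on the same vectors, which the borrow checker rejects, and the pivot is never copied back into `arr`. For comparison, here is a minimal corrected sketch that keeps the same partition-into-two-vectors approach. It is not part of this commit or of candle; the write-back via `copy_from_slice` and the `main` driver are illustrative choices.

```rust
/// Sorts `arr` in place using a simple (non-in-place-partition) quick sort.
/// Illustrative sketch only: it allocates two temporary vectors per call.
fn quick_sort(arr: &mut [i32]) {
    if arr.len() <= 1 {
        return;
    }

    // Use the first element as the pivot, as in the generated sample.
    let pivot = arr[0];
    let mut left = Vec::new();
    let mut right = Vec::new();

    // Partition the remaining elements around the pivot.
    for &x in &arr[1..] {
        if x < pivot {
            left.push(x);
        } else {
            right.push(x);
        }
    }

    quick_sort(&mut left);
    quick_sort(&mut right);

    // Write back in order: sorted left part, pivot, sorted right part.
    let n_left = left.len();
    arr[..n_left].copy_from_slice(&left);
    arr[n_left] = pivot;
    arr[n_left + 1..].copy_from_slice(&right);
}

fn main() {
    let mut values = [5, 2, 9, 1, 5, 6];
    quick_sort(&mut values);
    println!("{:?}", values);
}
```

Copying the sorted halves back with slices avoids both the borrow conflict and the order reversal that `pop()` would introduce.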