
candle-quantized-llama: Fast Inference of quantized LLaMA models
This example provides a quantized LLaMA model similar to llama.cpp, based on candle's built-in quantization methods. Supported features include:
- 2-bit, 3-bit, 4-bit, 5-bit, 6-bit and 8-bit integer quantization support.
- SIMD optimizations on Apple Silicon and x86.
- Support for the gguf and ggml file formats.
The weights are automatically downloaded for you from the HuggingFace Hub on the first run. There are various command-line flags to use local files instead; run with --help to learn about them.
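For example, a locally downloaded gguf or ggml file can be passed in with the --model flag described below; the filename here is only a placeholder:

cargo run --example quantized --release -- --model ./llama-2-7b.Q4_K_M.gguf --prompt "The best thing about coding in rust is "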
Running an example
cargo run --example quantized --release -- --prompt "The best thing about coding in rust is "
> avx: true, neon: false, simd128: false, f16c: true
> temp: 0.80 repeat-penalty: 1.10 repeat-last-n: 64
> loaded 291 tensors (3.79GB) in 2.17s
> params: HParams { n_vocab: 32000, n_embd: 4096, n_mult: 256, n_head: 32, n_layer: 32, n_rot: 128, ftype: 2 }
> The best thing about coding in rust is 1.) that I don’t need to worry about memory leaks, 2.) speed and 3.) my program will compile even on old machines.
Command-line flags
Run with --help to see all options.
- --which: specify the model to use, e.g. 7b, 13-chat, 7b-code.
- --prompt interactive: interactive mode where multiple prompts can be entered.
- --model mymodelfile.gguf: use a local model file rather than downloading one from the hub.
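These flags can be combined. As an illustration, the following invocation runs the code-tuned 7b model in interactive mode, using values from the list above:

cargo run --example quantized --release -- --which 7b-code --prompt interactive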