Qwen3 quantized implementation (#2939)

* fixed quantized_phi3 implementation
* quantized_qwen3 implementation
* Update quantized_phi3.rs
* Update quantized_phi3.rs
* add quantized_qwen3 example
* Clippy fixes.
* Cleanup.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2025-06-20 20:09:50 +00:00 · 2025-05-08 08:06:10 -05:00
parent 637473cb5e
commit 3d05f5cf3d
5 changed files with 755 additions and 1 deletions
--- a/candle-examples/examples/quantized-qwen3/README.md
+++ b/candle-examples/examples/quantized-qwen3/README.md
@@ -0,0 +1,11 @@
+# candle-quantized-qwen3
+
+[Qwen3](https://qwenlm.github.io/blog/qwen3/) is an upgraded version of Qwen2.5, released by Alibaba Cloud.
+
+## Running the example
+
+```bash
+cargo run --example quantized-qwen3 --release -- --prompt "Write a function to count prime numbers up to N."
+```
+
+The 0.6b model is used by default; the 1.7b, 4b, 8b, 14b, and 32b models are available via the `--model` argument.