candle

huggingface/candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-15 18:28:24 +00:00

Commit Graph

Author	SHA1	Message	Date
Snake	485ddf2996	Fixed Quantized Qwen3 Model (#2951) * optimize KV cache to reduce GPU memory usage * revert to using candle_nn::kv_cache::KvCache with initial capacity of 512	2025-05-13 05:53:42 +02:00
Lucien Thomas	3d05f5cf3d	Qwen3 quantized implementation (#2939) * fixed quantized_phi3 implementation * quantized_qwen3 implementation * Update quantized_phi3.rs * Update quantized_phi3.rs * add quantized_qwen3 example * Clippy fixes. * Cleanup. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2025-05-08 15:06:10 +02:00

2 Commits