2385 Commits

Author SHA1 Message Date
17313a4226 Fix cuda memory error for Qwen3 non-quantized (#2987)
* Update KvCache initialization in Qwen3 model to use a fixed max position embedding value of 512

* add doc
2025-06-07 16:02:58 +02:00
0224a749f0 Add Qwen3 MoE (#2934)
* qwen-moe rebase

* lint

* fixed rebase error

* swapped normal MoE model with CausalMoE Model in example, and swapped the tie word embeddings if statement

* updated readme
2025-05-31 15:33:28 +02:00
cd7b877d6b candle-onnx: Implement Trilu and ScatterND ops (#2952)
* onnx attention

* setup an example, adding and fixing onnx ops bit by bit

* model working, output is garbage data

* trilu working

* close but not quite, issues still with ScatterND

* closer but the outputs are still slightly wrong

* added tests for trilu and scatterND

* lint

* readme

* clippy

* removed unnecessary comments

* changed device selection, took hyperparameters from model config
2025-05-30 07:36:09 +02:00
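For readers unfamiliar with the Trilu op named in the commit above, here is a minimal sketch of its semantics on a plain 2-D matrix, following the ONNX operator definition (keep the upper or lower triangle relative to diagonal offset `k`, zero the rest). The function name `trilu` and the `Vec<Vec<f32>>` representation are illustrative assumptions; this is not the candle-onnx implementation.

```rust
/// Illustrative Trilu on a dense 2-D matrix stored as rows (hypothetical helper,
/// not the candle-onnx code). With `upper == true`, elements on or above the
/// k-th diagonal are kept; otherwise elements on or below it are kept.
fn trilu(m: &[Vec<f32>], k: i64, upper: bool) -> Vec<Vec<f32>> {
    m.iter()
        .enumerate()
        .map(|(i, row)| {
            row.iter()
                .enumerate()
                .map(|(j, &x)| {
                    // Signed distance of this element from the main diagonal.
                    let offset = j as i64 - i as i64;
                    let keep = if upper { offset >= k } else { offset <= k };
                    if keep { x } else { 0.0 }
                })
                .collect()
        })
        .collect()
}
```

With `k = 0` and `upper = true` this keeps the main diagonal and everything above it; a positive `k` shifts the retained region away from the main diagonal.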
5aed817f1b feat: enhance linear algebra operations (#2972)
- Add `dot()` for vector/matrix products
- Implement the `Frobenius` norm
- Add `mv()` for matrix-vector multiply
2025-05-29 09:41:01 +02:00
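As a rough illustration of the three operations listed in the commit above, the following sketch computes a dot product, a matrix-vector product, and a Frobenius norm on plain slices. The free-function signatures and the `f64` row-major representation are assumptions made for this example; they do not reproduce the candle `Tensor` methods added in #2972.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have the same length");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Matrix-vector product; the matrix is given as a slice of rows.
fn mv(m: &[Vec<f64>], v: &[f64]) -> Vec<f64> {
    m.iter().map(|row| dot(row, v)).collect()
}

/// Frobenius norm: square root of the sum of squared entries.
fn frobenius_norm(m: &[Vec<f64>]) -> f64 {
    m.iter().flatten().map(|x| x * x).sum::<f64>().sqrt()
}
```

For instance, `mv` applied to the rows `[1, 2]` and `[3, 4]` with the vector `[1, 1]` sums each row, and the Frobenius norm of the single row `[3, 4]` is the usual Euclidean length of that row.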
1a183c988a Add fine-tuned text classifier to xlm roberta example (#2969) 2025-05-28 06:17:07 +02:00
cac51fe16a (hotfix) fix the doc test for indexer (#2970) 2025-05-28 06:13:26 +02:00
61ddb9535e Use a tanh activation in the xlm-roberta classification head. (#2968) 2025-05-26 08:54:31 +02:00
9a62c91643 Proper support for phi-4 (#2960)
* Add phi-4 support.

* Long-rope support.

* Get clippy to be happy.
2025-05-21 10:18:33 +02:00
92106c8762 Fixes for clippy 1.87. (#2956) 2025-05-15 21:50:27 +02:00
9ce4fe6194 Fix docs quantized qwen3 (#2955)
* fixed docs quantized-qwen3 README

* fixed docs quantized-qwen2-instruct README
2025-05-15 07:58:03 +02:00
450a49ed1a Olmo 2 model (#2954)
* OLMo 2 model

* Update olmo-2 to example

* Clippy fix.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2025-05-14 19:18:02 +02:00
6bd61727bc Make tensor contiguous before the repeat_kv calls to avoid strided copies (#2953) 2025-05-14 10:47:28 +02:00
485ddf2996 Fixed Quantized Qwen3 Model (#2951)
* optimize KV cache to reduce GPU memory usage

* revert to using candle_nn::kv_cache::KvCache with initial capacity of 512
2025-05-13 05:53:42 +02:00
36508a2c93 Add Resize to onnx ops (#2946)
* added resize to candle-onnx, not currently working

* changed unreachable to bail, and bailed when both scales and sizes are set

* cleanup and added other unused options for this op

* cleanup

* fixed image loading to make output work

* cleanup and removed unused variables

* removed path creation code, and changed unwrap to ?
2025-05-10 07:05:03 +02:00
3d05f5cf3d Qwen3 quantized implementation (#2939)
* fixed quantized_phi3 implementation

* quantized_qwen3 implementation

* Update quantized_phi3.rs

* Update quantized_phi3.rs

* add quantized_qwen3 example

* Clippy fixes.

* Cleanup.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2025-05-08 15:06:10 +02:00
637473cb5e Bump cudarc to 0.16.3. (#2942) 2025-05-04 09:14:28 +02:00
e27b4700ad Indexing with max-value results in zero/no-op. (#2940)
* Indexing with max-value results in zero/no-op.

* Add some testing.

* Also adapt the metal kernels.

* Another test.

* Fix.
2025-05-03 11:36:31 +02:00
1fdfb58de5 Updating Add qwen3 (PR 2903) to use HF weights (#2930)
* add Qwen3.rs

* fixed compile error

* attempting to get PR 2903 working with qwen weights

* different qwen variants working

* added moe model

* clippy

* added additional eos token

* translated Korean comments to English as well as I can

* removed specialized Qwen3RmsNorm and replaced with generic Candle RmsNorm

* replaced custom repeat_kv implementation with candle's repeat_kv implementation

* replace linear with linear_b in attention initialization

* replaced custom kv_cache implementation with candle kv_cache

* style

* replaced explicit broadcast add with normal add in decoder layer

* removed keeping the Rotary embedding layer in the model struct

* used tie_word_embeddings bool from config instead of relying on existence of weights for lm head in CausalLM

* removed duplicate code from qwen3_moe

* removed sliding window from qwen3 attention

* removed MoE code

* removed unused option

* Fixed Typo

Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>

* fixed tie word embeddings to use the correct embedding weights instead of the opposite

---------

Co-authored-by: Max <naturale@hufs.ac.kr>
Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
2025-05-02 06:05:53 +02:00
cd96fa80da Add a scattered kv cache. (#2936)
* Add a scattered kv cache.

* Update some comments.
0.9.1
2025-05-01 10:20:48 +02:00
8a19bb7df2 Bump the candle version to 0.9.1. (#2935) 2025-05-01 10:08:16 +02:00
38fc86621c Add support for Helium-v1. (#2932) 2025-04-30 19:38:44 +02:00
5029ac52bb Added tracing page to the candle book. (#2922)
* tracing page

* warned about asynchronous execution

* cleanup

* added Nsight Systems recommendation
2025-04-29 21:35:36 +02:00
de23d34a28 Switch Tensor::full to return a contiguous tensor. (#2929) 2025-04-28 21:36:39 +02:00
d4bac37a61 Fix the gumbel softmax by casting to f32. (#2928) 2025-04-28 19:48:51 +02:00
e98754fc5a Optimize Tensor::new when called on nested Vec<..>. (#2927)
* Optimize Tensor::new when called on nested Vec<..>.

* Improve performance.

* Similar flattening for the 4d case.

* More tweaks.

* Add some dummy test.
2025-04-28 09:19:45 +02:00
e3db30021f Support for "unbatched" rope. (#2926)
* Support for (un)-batched rope.

* Use 3d rope in the rope/ropei/rope_thd functions.

* Get the CPU versions to work.

* Fix the cuda version.

* Adapt the metal side.

* Fix the metal tests.
2025-04-27 15:12:02 +02:00
6e0646c208 Remove redundant mlx gemm dtype check (#2925) 2025-04-27 06:14:57 +02:00
fbaf0b0e32 Bump the crate version to 0.9.0. (#2924) 0.9.0 2025-04-26 11:01:21 +02:00
a2e925462c Add the scatter in place ops. (#2923)
* Add the scatter_set op.

* Metal op.

* Cuda version.

* Merge the checks.

* Add the actual ops.
2025-04-26 07:36:49 +02:00
3827685524 Add the scatter op. (#2921)
* Add the scatter op.

* Backprop support.

* Cuda support.
2025-04-25 21:46:58 +02:00
3aeb9575c7 Fixed Quantized Gemma3 Model and example (#2918)
* removed scale factor from computation and made quantized gemma3 work similarly to non-quantized gemma3

* created default consts, replaced is_sliding with Option holding a window_size
2025-04-25 05:47:48 +02:00
6ff0a6999c Fixed Gemma3 model and example (#2917)
* gemma3: changed RotaryEmbedding base freq based on layer and sliding window

* Changed attention mask per layer, either normal or sliding

* made attention mask creation slightly more efficient by only creating them once per model iteration

* changed is_sliding to an Option

* clippy

* changed to stop on both <eos> and <end_of_turn> instead of either or
2025-04-25 05:35:08 +02:00
82def7ae38 Cudarc update. (#2915) 2025-04-23 07:03:26 +02:00
99bd69f383 fixed quantized-gemma example (#2914)
* fixed quantized-gemma example

* lint
2025-04-23 05:39:03 +02:00
a4c56a958e Add the const-set op. (#2910)
* Add the const-set op.

* Cuda implementation.

* Bugfix.

* Metal cleanup.

* Add the metal kernels.

* Add some testing.

* Finish the metal implementation.

* Bump the version.
2025-04-19 10:07:02 +02:00
b2904a830b implemented quantized-gemma3 (#2902)
* implemented quantized-gemma, inference not working

* Fixed a few modeling bugs: outputting the correct tokens for a few iterations then garbage

* lint

* clippy

* quantized-gemma3 example working

* added readme

* clippy
2025-04-19 07:46:41 +02:00
21055b5697 Add PRelu operation (#2904)
* Add PRelu operation

* Apply rustfmt.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2025-04-19 07:24:10 +02:00
9dbaf958dc Add an enum for scalar values. (#2909)
* Add a scalar enum type.

* Add a bit more to the scalar type.

* Small tweak.

* More scalar usage.
2025-04-18 22:13:38 +02:00
ce5f8dd129 Check the bounds in the cuda indexing kernels. (#2908)
* Check the bounds in the cuda indexing kernels.

* Another check.
2025-04-18 20:08:17 +02:00
9954981327 Allow from_vec/from_slice to use a ShapeWithOneHole as shape. (#2905) 2025-04-17 08:59:18 +02:00
7f0f83a7c1 Rotating kv cache positions (#2901)
* Retrieve the current positions for rotating KV caches.

* Add the function to the kv cache too.

* More testing.
0.9.0-alpha.4
2025-04-15 23:09:26 +02:00
76e565c4ab Updated candle-book: Introduction, Installation, MNIST guide, and added CONTRIBUTING.md (#2897)
* added CONTRIBUTING.md to candle-book

* added description to candle-book introduction

* Updated formatting and added different features to candle-book installation

* mnist guide first draft candle-book

* updated mnist guide syntax and grammar for candle-book

* changed HelloWorld - Mnist to Tutorial - Mnist in SUMMARY.md

* updated intro to mnist guide in candle-book
2025-04-15 21:41:10 +02:00
e4e7b0b2da Use cudarc 0.16. (#2900)
* Use cudarc 0.16.

* Allow for disabling event tracking.

* Tweaks.

* Bump the ug version.

* And bump the candle version too.
2025-04-15 21:40:18 +02:00
b01ebbad8a Use cudarc 0.15.2. (#2896) 2025-04-14 20:47:52 +02:00
1d1d6d4fe6 Bump the crate version. (#2895) 0.9.0-alpha.3 2025-04-14 15:52:11 +02:00
2653002f29 Gumbel-Softmax sampling. (#2894)
* Gumbel-Softmax sampling.

* Add a sampling test.

* Share the gumbel-softmax bits.
2025-04-14 15:42:42 +02:00
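The idea behind the sampling commit above can be sketched with the closely related Gumbel-max trick: add Gumbel noise to temperature-scaled logits and take the argmax, which draws an index from the corresponding softmax distribution. The function name, the `f64` slices, and the choice to pass the uniform samples in explicitly (so the example needs no random-number crate) are all assumptions for illustration, not the candle implementation.

```rust
/// Gumbel-max sampling sketch (hypothetical helper, not the candle API).
/// `uniforms` must hold one sample per logit, each strictly inside (0, 1).
/// Returns the index maximising `logit / temperature + gumbel_noise`.
fn gumbel_max_sample(logits: &[f64], uniforms: &[f64], temperature: f64) -> usize {
    assert!(!logits.is_empty(), "need at least one logit");
    assert_eq!(logits.len(), uniforms.len(), "one uniform sample per logit");
    let mut best = 0;
    let mut best_score = f64::NEG_INFINITY;
    for (i, (&logit, &u)) in logits.iter().zip(uniforms).enumerate() {
        // Transform a uniform sample into standard Gumbel noise.
        let gumbel = -(-u.ln()).ln();
        let score = logit / temperature + gumbel;
        if score > best_score {
            best_score = score;
            best = i;
        }
    }
    best
}
```

When every uniform sample is identical the noise cancels out and the result is simply the argmax of the logits; in practice the uniform samples would come from a random-number generator.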
a52b76ae82 Expose the cudnn algo in the conv ops. (#2892)
* Set the algo.

* Expose the cudnn preferred algo for conv ops.
2025-04-14 08:25:32 +02:00
fb660b8d43 Add a cudnn feature to candle-nn/candle-transformers. (#2890) 0.9.0-alpha.2 2025-04-13 17:43:41 +02:00
2f9606b187 Exclude candle-book to avoid some CI failures. (#2889)
* Exclude candle-book to avoid some CI failures.

* Remove the book CIs.
2025-04-13 17:11:41 +02:00
f3a73f80d1 Support for cudnn conv1d. (#2888)
* Support for cudnn conv1d.

* More conv1d work.

* Get the conv1d to work with cudnn.

* Cleanup.
2025-04-13 16:47:37 +02:00