* qwen-moe rebase
* lint
* fixed rebase error
* swapped the normal MoE model for the CausalMoE model in the example, and flipped the tie-word-embeddings if statement
* updated readme
* onnx attention
* setup an example, adding and fixing onnx ops bit by bit
* model working, output is garbage data
* trilu working
* close but not quite; issues remain with ScatterND
* closer but the outputs are still slightly wrong
* added tests for trilu and scatterND
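As a rough illustration of what the Trilu op from the commits above computes, here is a hypothetical stand-alone sketch on a row-major 2D matrix; candle-onnx's real implementation operates on candle tensors, and the function name and signature here are illustrative only:

```rust
// Illustrative sketch of the ONNX Trilu op: keep the upper or lower
// triangular part of a matrix relative to the k-shifted diagonal,
// zeroing everything else. Not candle-onnx's actual code.
fn trilu(data: &[i64], rows: usize, cols: usize, k: i64, upper: bool) -> Vec<i64> {
    let mut out = vec![0; data.len()];
    for r in 0..rows {
        for c in 0..cols {
            // Keep the element when it lies on the requested side of the
            // diagonal shifted by k; zero it otherwise.
            let keep = if upper {
                c as i64 >= r as i64 + k
            } else {
                c as i64 <= r as i64 + k
            };
            if keep {
                out[r * cols + c] = data[r * cols + c];
            }
        }
    }
    out
}

fn main() {
    let m = [1, 2, 3, 4, 5, 6, 7, 8, 9];
    // Lower-triangular part including the main diagonal (k = 0).
    assert_eq!(trilu(&m, 3, 3, 0, false), vec![1, 0, 0, 4, 5, 0, 7, 8, 9]);
    // Strictly upper-triangular part (k = 1).
    assert_eq!(trilu(&m, 3, 3, 1, true), vec![0, 2, 3, 0, 0, 6, 0, 0, 0]);
}
```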
* lint
* readme
* clippy
* removed unnecessary comments
* changed device selection, took hyperparameters from model config
* added resize to candle-onnx, not currently working
* changed unreachable to bail, and bailed when both scales and sizes are set
* cleanup and added other unused options for this op
* cleanup
* fixed image loading to make output work
* cleanup and removed unused variables
* removed path creation code, and changed unwrap to ?
* add Qwen3.rs
* fixed compile error
* attempting to get PR 2903 working with Qwen weights
* different qwen variants working
* added moe model
* clippy
* added additional eos token
* translated Korean comments to English as best I could
* removed specialized Qwen3RmsNorm and replaced with generic Candle RmsNorm
* replaced custom repeat_kv implementation with candle's repeat_kv implementation
* replaced linear with linear_b in attention initialization
* replaced custom kv_cache implementation with candle's kv_cache
* style
* replaced explicit broadcast add with normal add in decoder layer
* removed keeping the Rotary embedding layer in the model struct
* used the tie_word_embeddings bool from the config instead of relying on the existence of lm_head weights in CausalLM
* removed duplicate code from qwen3_moe
* removed sliding window from qwen3 attention
* removed MoE code
* removed unused option
* Fixed Typo
Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
* fixed tie word embeddings to use the correct embedding weights instead of the opposite
---------
Co-authored-by: Max <naturale@hufs.ac.kr>
Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
* Optimize Tensor::new when called on nested Vec<..>.
* Improve performance.
* Similar flattening for the 4d case.
* More tweaks.
* Add some dummy test.
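The `Tensor::new` optimization above amounts to flattening the nested `Vec` into one contiguous buffer before building the tensor, rather than allocating per sub-vector. A minimal stand-alone sketch of the idea for the 2D case; the function name and error handling here are hypothetical, not candle's API:

```rust
// Illustrative sketch: flatten a nested Vec into one contiguous buffer
// plus a shape, so the tensor can be built from a single copy instead of
// one allocation per sub-vector. Names are hypothetical, not candle's API.
fn flatten_2d<T: Copy>(data: &[Vec<T>]) -> Result<(Vec<T>, (usize, usize)), String> {
    let rows = data.len();
    let cols = data.first().map_or(0, |r| r.len());
    let mut flat = Vec::with_capacity(rows * cols);
    for row in data {
        // All rows must share the same length for a rectangular shape.
        if row.len() != cols {
            return Err("ragged nested Vec".to_string());
        }
        flat.extend_from_slice(row);
    }
    Ok((flat, (rows, cols)))
}

fn main() {
    let nested = vec![vec![1u32, 2, 3], vec![4, 5, 6]];
    let (flat, shape) = flatten_2d(&nested).unwrap();
    assert_eq!(flat, vec![1, 2, 3, 4, 5, 6]);
    assert_eq!(shape, (2, 3));
    // Ragged input is rejected rather than silently mis-shaped.
    assert!(flatten_2d(&[vec![1u32], vec![2, 3]]).is_err());
}
```

The 3D and 4D cases mentioned in the commits follow the same pattern with one more level of nesting per dimension.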
* Support for (un)-batched rope.
* Use 3d rope in the rope/ropei/rope_thd functions.
* Get the CPU versions to work.
* Fix the cuda version.
* Adapt the metal side.
* Fix the metal tests.
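For context on what the rope/ropei kernels above compute, here is a hypothetical scalar sketch of interleaved RoPE on a single head vector; the real CPU/CUDA/Metal kernels batch this over heads and sequence positions, and the function name and signature here are illustrative only:

```rust
// Illustrative sketch of interleaved RoPE (ropei): each consecutive pair
// (x[2i], x[2i+1]) is rotated by an angle that depends on the sequence
// position and the pair index. Not candle's actual kernel code.
fn ropei(x: &[f32], pos: usize, base: f32) -> Vec<f32> {
    let half = x.len() / 2;
    let mut out = vec![0.0; x.len()];
    for i in 0..half {
        // Rotation angle: pos * base^(-2i / dim).
        let theta = pos as f32 * base.powf(-(2.0 * i as f32) / x.len() as f32);
        let (sin, cos) = theta.sin_cos();
        out[2 * i] = x[2 * i] * cos - x[2 * i + 1] * sin;
        out[2 * i + 1] = x[2 * i] * sin + x[2 * i + 1] * cos;
    }
    out
}

fn main() {
    let x = [1.0f32, 2.0, 3.0, 4.0];
    // At position 0 every rotation angle is zero, so the input is unchanged.
    assert_eq!(ropei(&x, 0, 10000.0), vec![1.0, 2.0, 3.0, 4.0]);
    // Rotations preserve the squared norm of the vector.
    let y = ropei(&x, 7, 10000.0);
    let sq = |v: &[f32]| v.iter().map(|a| a * a).sum::<f32>();
    assert!((sq(&x) - sq(&y)).abs() < 1e-3);
}
```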
* removed scale factor from computation and made quantized gemma3 work similarly to non-quantized gemma3
* created default consts, replaced is_sliding with Option holding a window_size
* gemma3: changed RotaryEmbedding base freq based on layer and sliding window
* Changed attention mask per layer, either normal or sliding
* made attention mask creation slightly more efficient by only creating them once per model iteration
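The per-layer mask change above can be sketched as building the full causal mask and the sliding-window mask once per forward pass and sharing them across layers. A stand-alone illustration under assumed conventions (`f32::NEG_INFINITY` marks disallowed positions; the function name is hypothetical):

```rust
// Illustrative sketch: one function builds either a plain causal mask or a
// sliding-window causal mask, so both can be created once per model
// iteration and reused by every layer. Not gemma3's actual code.
fn causal_mask(len: usize, window: Option<usize>) -> Vec<f32> {
    let mut mask = vec![0.0f32; len * len];
    for q in 0..len {
        for k in 0..len {
            let future = k > q;
            // With a sliding window, keys further back than `window`
            // positions are also masked out.
            let too_old = window.map_or(false, |w| q >= k + w);
            if future || too_old {
                mask[q * len + k] = f32::NEG_INFINITY;
            }
        }
    }
    mask
}

fn main() {
    // Built once per iteration, then shared by all layers of each kind.
    let full = causal_mask(3, None);
    let sliding = causal_mask(3, Some(2));
    let inf = f32::NEG_INFINITY;
    assert_eq!(full, vec![0.0, inf, inf, 0.0, 0.0, inf, 0.0, 0.0, 0.0]);
    assert_eq!(sliding, vec![0.0, inf, inf, 0.0, 0.0, inf, inf, 0.0, 0.0]);
}
```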
* changed is_sliding to an Option
* clippy
* changed generation to stop on either <eos> or <end_of_turn>, treating both as stop tokens instead of only one of them
* Add the const-set op.
* Cuda implementation.
* Bugfix.
* Metal cleanup.
* Add the metal kernels.
* Add some testing.
* Finish the metal implementation.
* Bump the version.
* implemented quantized-gemma, inference not working
* Fixed a few modeling bugs: the model was outputting the correct tokens for a few iterations, then garbage
* lint
* clippy
* quantized-gemma3 example working
* added readme
* clippy
* added CONTRIBUTING.md to candle-book
* added description to candle-book introduction
* Updated formatting and added different features to candle-book installation
* mnist guide first draft candle-book
* updated mnist guide syntax and grammar for candle-book
* changed HelloWorld - Mnist to Tutorial - Mnist in SUMMARY.md
* updated intro to mnist guide in candle-book