* qwen-moe rebase
* lint
* fixed rebase error
* swapped normal MoE model with CausalMoE Model in example, and swapped the tie word embeddings if statement
* updated readme
* onnx attention
* setup an example, adding and fixing onnx ops bit by bit
* model working, output is garbage data
* trilu working
* close but not quite, still issues with scatterND
* closer but the outputs are still slightly wrong
* added tests for trilu and scatterND
* lint
* readme
* clippy
* removed unnecessary comments
* changed device selection, took hyperparameters from model config
* added resize to candle-onnx, not currently working
* changed unreachable to bail, and bailed when both scales and sizes are set
* cleanup and added other unused options for this op
* cleanup
* fixed image loading to make output work
* cleanup and removed unused variables
* removed path creation code, and changed unwrap to ?
* add Qwen3.rs
* fixed compile error
* attempting to get pr 2903 working with qwen weights
* different qwen variants working
* added moe model
* clippy
* added additional eos token
* translated Korean comments to English as well as I can
* removed specialized Qwen3RmsNorm and replaced with generic Candle RmsNorm
* replaced custom repeat_kv implementation with candle's repeat_kv implementation
* replaced linear with linear_b in attention initialization
* replaced custom kv_cache implementation with candle kv_cache
* style
* replaced explicit broadcast add with normal add in decoder layer
* removed keeping the Rotary embedding layer in the model struct
* used tie_word_embeddings bool from config instead of relying on existence of weights for lm head in CausalLM
* removed duplicate code from qwen3_moe
* removed sliding window from qwen3 attention
* removed MoE code
* removed unused option
* Fixed Typo
Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
* fixed tie word embeddings to use the correct embedding weights instead of the opposite
---------
Co-authored-by: Max <naturale@hufs.ac.kr>
Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
* gemma3: changed RotaryEmbedding base freq based on layer and sliding window
* Changed attention mask per layer, either normal or sliding
* made attention mask creation slightly more efficient by only creating them once per model iteration
* changed is_sliding to an Option
* clippy
* changed generation to stop on <eos> as well as <end_of_turn> instead of only one of them
* implemented quantized-gemma, inference not working
* Fixed a few modeling bugs: outputting the correct tokens for a few iterations then garbage
* lint
* clippy
* quantized-gemma3 example working
* added readme
* clippy
* Initial commit: model weights working, prediction incorrect
* moved distilbertformaskedlm into distilbert modeling file
* made maskedLM like bert example, still incorrect predictions
* finally not getting NaNs, fixed attention mask
* getting correct output sentences
* get top k predictions
* fixed output formatting slightly
* added default arg for model_id
* lint
* moved masked token example code from distilbertformaskedlm example to distilbert example
* lint
* removed distilbertformaskedlm example
* cleanup
* clippy
* removed embedding normalization from example
* made output and model dependent on args instead of prompt
* lint
* replaced or_ok anyhow error with anyhow context
* changed error message for mask token not found
* Add the SNAC audio tokenizer.
* More snac.
* Again more snac.
* Add some example code for snac.
* Get the weights to load.
* Add to the snac model.
* Fixes.
* Get round-tripping to work.
* Save/load code files.
* Clippy fix.
* Fmt fix.
* Add the CSM model.
* Add some code to load the model.
* Load the text tokenizer.
* Add frame generation.
* Get the sampling to work.
* Rope fix.
* Autoregressive generation.
* Generate some audio file.
* Use the actual prompt.
* Support multiple turns.
* Add a very barebone readme.
* Move some of the shared bits to the model.
* added chatGLM readme
* changed wording in readme
* added readme for chinese-clip
* added readme for convmixer
* added readme for custom ops
* added readme for efficientnet
* added readme for llama
* added readme to mnist-training
* added readme to musicgen
* added readme to quantized-phi
* added readme to starcoder2
* added readme to whisper-microphone
* added readme to yi
* added readme to yolo-v3
* added readme to whisper-microphone
* added space to example in glm4 readme
* fixed mamba example readme to run mamba instead of mamba-minimal
* removed slash escape character
* changed moondream image to yolo-v8 example image
* added procedure for making the reinforcement-learning example work with a virtual environment on my machine
* added simple one line summaries to the example readmes without
* changed non-existent image to yolo example's bike.jpg
* added backslash to sam command
* removed trailing - from siglip
* added SoX to silero-vad example readme
* replaced procedure for uv on mac with warning that uv isn't currently compatible with pyo3
* added example to falcon readme
* added --which arg to stella-en-v5 readme
* fixed image path in vgg readme
* fixed the image path in the vit readme
* Update README.md
* Update README.md
* Update README.md
---------
Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
* Start updating to cudarc 0.14.
* Adapt a couple more things.
* And a couple more fixes.
* More tweaks.
* And a couple more fixes.
* Bump the major version number.
* Proper module system for the cuda kernels.
* Proper ptx loading.
* Launch the sort kernel.
* Custom op.
* Start using the builder pattern.
* More builder.
* More builder.
* Get candle-core to compile.
* Get the tests to pass.
* Get candle-nn to work too.
* Support for custom cuda functions.
* cudnn fixes.
* Get flash attn to run.
* Switch the crate versions to be alpha.
* Bump the ug dependency.
* added new language pairs to marian-mt
* lint
* separated python code for converting tokenizers into its own file and added a requirements.txt for dependencies, updated instructions in readme and included python version
* Cleanup.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
* Update main.rs
* Update codegeex4_9b.rs
* Get things to compile.
* Add some default for when rope_ratio is missing.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
* Update the stable diffusion example with inpainting support for 1.5, 2 and XL.
* Apply cargo fmt.
* Clippy fixes.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>