* add Qwen3.rs
* fixed compile error
* attempting to get PR 2903 working with Qwen weights
* different Qwen variants working
* added MoE model
* clippy
* added additional EOS token
* translated Korean comments to English as best I can
* removed the specialized Qwen3RmsNorm and replaced it with candle's generic RmsNorm
* replaced custom repeat_kv implementation with candle's repeat_kv implementation
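The repeat_kv replacement above concerns grouped-query attention, where a small number of KV heads is expanded to match the number of query heads. A minimal sketch of the idea on plain vectors (not candle's tensor-based `repeat_kv`; names here are illustrative):

```rust
// Sketch of the repeat_kv idea: duplicate each KV head n_rep times so that
// num_kv_heads * n_rep == num_attention_heads. candle performs the same
// expansion on a Tensor; this version uses Vec<Vec<f32>> for illustration.
fn repeat_kv(heads: Vec<Vec<f32>>, n_rep: usize) -> Vec<Vec<f32>> {
    heads
        .into_iter()
        .flat_map(|h| std::iter::repeat(h).take(n_rep))
        .collect()
}

fn main() {
    // 2 KV heads expanded 2x to serve 4 query heads.
    let kv = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let expanded = repeat_kv(kv, 2);
    assert_eq!(expanded.len(), 4);
    assert_eq!(expanded[0], expanded[1]); // each head appears n_rep times in a row
    println!("ok");
}
```

Reusing the library's tensor-level implementation avoids maintaining this expansion by hand per model.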
* replaced linear with linear_b in attention initialization
* replaced custom kv_cache implementation with candle's kv_cache
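For context on the kv_cache swap: a KV cache appends each decoding step's keys and values so attention runs over the full history without recomputing it. A hedged sketch of the mechanism on flat Vecs (candle's kv_cache module provides this for real tensors; this type is illustrative only):

```rust
// Illustrative KV cache: accumulate per-step keys/values and hand back the
// whole history. Real implementations store tensors and track sequence dims.
struct KvCache {
    k: Vec<f32>,
    v: Vec<f32>,
}

impl KvCache {
    fn new() -> Self {
        Self { k: Vec::new(), v: Vec::new() }
    }

    // Append this step's keys/values and return the accumulated history.
    fn append(&mut self, k: &[f32], v: &[f32]) -> (&[f32], &[f32]) {
        self.k.extend_from_slice(k);
        self.v.extend_from_slice(v);
        (&self.k, &self.v)
    }
}

fn main() {
    let mut cache = KvCache::new();
    cache.append(&[1.0], &[10.0]);
    let (k, v) = cache.append(&[2.0], &[20.0]);
    assert_eq!(k, &[1.0, 2.0]);
    assert_eq!(v, &[10.0, 20.0]);
    println!("ok");
}
```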
* style
* replaced explicit broadcast add with normal add in decoder layer
* stopped storing the rotary embedding layer in the model struct
* used the tie_word_embeddings bool from the config instead of relying on the existence of lm_head weights in CausalLM
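The tie_word_embeddings change above concerns weight tying: when the flag is set, the output projection reuses the input embedding matrix rather than loading a separate lm_head weight. A sketch under assumed, illustrative names (not candle's actual types):

```rust
// Illustrative weight-tying selection. Config and the Vec-based "weights"
// are stand-ins; the real code works with a config struct and tensors.
struct Config {
    tie_word_embeddings: bool,
}

fn lm_head_weight(
    cfg: &Config,
    embed_w: &Vec<Vec<f32>>,
    head_w: Option<Vec<Vec<f32>>>,
) -> Vec<Vec<f32>> {
    if cfg.tie_word_embeddings {
        // Tied: share the embedding weights with the output projection.
        embed_w.clone()
    } else {
        // Untied: the checkpoint must ship its own lm_head weights.
        head_w.expect("untied model must provide lm_head weights")
    }
}

fn main() {
    let cfg = Config { tie_word_embeddings: true };
    let embed = vec![vec![0.1, 0.2]];
    let w = lm_head_weight(&cfg, &embed, None);
    assert_eq!(w, embed);
    println!("ok");
}
```

Keying this off the config flag is more robust than probing for the presence of lm_head weights in the checkpoint.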
* removed duplicate code from qwen3_moe
* removed sliding window from qwen3 attention
* removed MoE code
* removed unused option
* fixed typo
Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
* fixed tie_word_embeddings to use the correct embedding weights (the selection was inverted)
---------
Co-authored-by: Max <naturale@hufs.ac.kr>
Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>