* add Qwen3.rs
* fixed compile error
* attempting to get PR 2903 working with Qwen weights
* different Qwen variants working
* added MoE model
* clippy
* added additional EOS token
* translated Korean comments to English as best I can
* removed the specialized Qwen3RmsNorm and replaced it with candle's generic RmsNorm
* replaced custom repeat_kv implementation with candle's repeat_kv implementation
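The repeat_kv replacement above concerns grouped-query attention, where a small number of KV heads is expanded to match the number of query heads. A minimal sketch of the idea on plain vectors (not candle's tensor-based `repeat_kv`; names here are illustrative):

```rust
// Sketch of the repeat_kv idea: duplicate each KV head n_rep times so that
// num_kv_heads * n_rep == num_attention_heads. candle performs the same
// expansion on a Tensor; this version uses Vec<Vec<f32>> for illustration.
fn repeat_kv(heads: Vec<Vec<f32>>, n_rep: usize) -> Vec<Vec<f32>> {
    heads
        .into_iter()
        .flat_map(|h| std::iter::repeat(h).take(n_rep))
        .collect()
}

fn main() {
    // 2 KV heads expanded 2x to serve 4 query heads.
    let kv = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let expanded = repeat_kv(kv, 2);
    assert_eq!(expanded.len(), 4);
    assert_eq!(expanded[0], expanded[1]); // each head appears n_rep times in a row
    println!("ok");
}
```

Reusing the library's tensor-level implementation avoids maintaining this expansion by hand per model.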
* replaced linear with linear_b in attention initialization
* replaced custom kv_cache implementation with candle's kv_cache
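For context on the kv_cache swap: a KV cache appends each decoding step's keys and values so attention runs over the full history without recomputing it. A hedged sketch of the mechanism on flat Vecs (candle's kv_cache module provides this for real tensors; this type is illustrative only):

```rust
// Illustrative KV cache: accumulate per-step keys/values and hand back the
// whole history. Real implementations store tensors and track sequence dims.
struct KvCache {
    k: Vec<f32>,
    v: Vec<f32>,
}

impl KvCache {
    fn new() -> Self {
        Self { k: Vec::new(), v: Vec::new() }
    }

    // Append this step's keys/values and return the accumulated history.
    fn append(&mut self, k: &[f32], v: &[f32]) -> (&[f32], &[f32]) {
        self.k.extend_from_slice(k);
        self.v.extend_from_slice(v);
        (&self.k, &self.v)
    }
}

fn main() {
    let mut cache = KvCache::new();
    cache.append(&[1.0], &[10.0]);
    let (k, v) = cache.append(&[2.0], &[20.0]);
    assert_eq!(k, &[1.0, 2.0]);
    assert_eq!(v, &[10.0, 20.0]);
    println!("ok");
}
```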
* style
* replaced explicit broadcast add with normal add in decoder layer
* stopped storing the rotary embedding layer in the model struct
* used the tie_word_embeddings bool from the config instead of relying on the existence of lm_head weights in CausalLM
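The tie_word_embeddings change above concerns weight tying: when the flag is set, the output projection reuses the input embedding matrix rather than loading a separate lm_head weight. A sketch under assumed, illustrative names (not candle's actual types):

```rust
// Illustrative weight-tying selection. Config and the Vec-based "weights"
// are stand-ins; the real code works with a config struct and tensors.
struct Config {
    tie_word_embeddings: bool,
}

fn lm_head_weight(
    cfg: &Config,
    embed_w: &Vec<Vec<f32>>,
    head_w: Option<Vec<Vec<f32>>>,
) -> Vec<Vec<f32>> {
    if cfg.tie_word_embeddings {
        // Tied: share the embedding weights with the output projection.
        embed_w.clone()
    } else {
        // Untied: the checkpoint must ship its own lm_head weights.
        head_w.expect("untied model must provide lm_head weights")
    }
}

fn main() {
    let cfg = Config { tie_word_embeddings: true };
    let embed = vec![vec![0.1, 0.2]];
    let w = lm_head_weight(&cfg, &embed, None);
    assert_eq!(w, embed);
    println!("ok");
}
```

Keying this off the config flag is more robust than probing for the presence of lm_head weights in the checkpoint.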
* removed duplicate code from qwen3_moe
* removed sliding window from qwen3 attention
* removed MoE code
* removed unused option
* fixed typo
Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
* fixed tie_word_embeddings to use the correct embedding weights (the selection was inverted)
---------
Co-authored-by: Max <naturale@hufs.ac.kr>
Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>