Commit Graph

680 Commits

SHA1 Message Date
3c815b1dca Pin the revision used by moondream. (#2340) 2024-07-18 10:49:46 +02:00
42891cc613 Add mathstral in the examples. (#2339) 2024-07-18 08:24:49 +02:00
c63048d374 add quantized qwen2 (#2329)
* add quantized version of qwen2 and corresponding example for qwen2-instruct

* fix quantized qwen2 clippy error
2024-07-12 10:00:03 +02:00
a226a9736b Add Mobilenet v4 (#2325)
* Support different resolutions in load_image()

* Added MobilenetV4 model.

* Add MobileNetv4 to README
2024-07-09 13:52:20 +02:00
9cd54aa5d4 Add EVA-02 model ( https://arxiv.org/abs/2303.11331 ) (#2311)
* Add EVA-02 model ( https://arxiv.org/abs/2303.11331 )

* Clippy fix.

* And apply fmt.

---------

Co-authored-by: v-espitalier <>
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-07-07 20:09:31 +02:00
ecff05d72b Beit: Add the gen_relative_position_index() function (#2306)
Co-authored-by: v-espitalier <>
2024-07-04 09:45:26 +02:00
7f1ba8038c Add Beit model ( https://arxiv.org/abs/2106.08254 ) (#2305)
Co-authored-by: v-espitalier <>
2024-07-01 22:11:48 +02:00
74e9e41911 Make up for the missing last-token output in the phi2 example (#2299) 2024-06-29 21:34:42 +02:00
e27aac0a06 Add DINOv2Reg4 + PlantCLEF2024 (#2293)
* Add: DINOv2Reg4 with PlantCLEF2024 weights and example ( See https://arxiv.org/abs/2309.16588 and https://zenodo.org/records/10848263 )

* Remove extra files + update README to download them + remove extra lines

* minor fix (README remove extra spaces)

* minor fix (README: Fix image url)

* Change: Add back interpolate_pos_encoding() + fix the no-interpolation case + remove extra comments + update README (the source image changed, and so did the predictions)

* Fix: Improve code readability with '$ cargo clippy' and '$ cargo fmt'

* Another clippy fix.

---------

Co-authored-by: x-VEspit <vincent.espitalier@cirad.fr>
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-06-29 11:49:15 +02:00
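
The interpolate_pos_encoding() mentioned above resamples the patch position embeddings so the model accepts input resolutions other than the pretraining one. A minimal candle sketch, assuming patch-only embeddings of shape (1, grid*grid, dim) and a square grid; DINOv2 itself interpolates bicubically, nearest-neighbor here just keeps the sketch short:

```rust
use candle_core::{Result, Tensor};

/// Resample patch position embeddings from the pretraining grid to the
/// current grid so the model accepts other input resolutions.
fn interpolate_pos_encoding(pos: &Tensor, old: usize, new: usize) -> Result<Tensor> {
    if old == new {
        return Ok(pos.clone());
    }
    let (_b, n, dim) = pos.dims3()?;
    assert_eq!(n, old * old);
    pos.reshape((1, old, old, dim))?
        .permute((0, 3, 1, 2))? // NHWC -> NCHW for the upsampling op
        .upsample_nearest2d(new, new)?
        .permute((0, 2, 3, 1))?
        .reshape((1, new * new, dim))
}
```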
242e006bbb Depth Anything v2 (#2279)
* define structs

* construct ResidualConvUnit

* forward() for ResidualConvUnit

* implement FeatureFusionBlock

* implement Scratch

* implement DPTHead

* add identity module

* implement forward for DPTHead

* add get_intermediate_layers to DinoVisionTransformer

* implement DepthAnythingV2

* some minor tweaks

* fix compile errors

* fix var builder prefixes

* setup initial example

* use a fixed 37-patch grid (518 / 14)

* debugged until output

* print min and max values

* add some dynamism to the output location

* scale input image

* extract prep function

* extract output path function

* normalize image with magic mean and std

* add spectral coloring

* squeeze in the right place

* make interpolation optional

* use bail instead of panic

* omit unnecessary Shape call

* remove empty curly braces

* use bail instead of assert

* use vb and pp

* remove closures

* extract config object

* Apply rustfmt.

* Fix some clippy lints.

* More lints.

* Use the array methods.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-06-24 19:12:52 +02:00
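
Two steps above can be made concrete: the patch-grid arithmetic (518 / 14 = 37 patches per side) and the image normalization, where the "magic mean and std" are assumed here to be the usual ImageNet statistics. A minimal candle sketch:

```rust
use candle_core::{Device, Result, Tensor};

const IMG_SIZE: usize = 518;
const PATCH_SIZE: usize = 14;

/// Normalize with the ImageNet mean/std, the assumed "magic" statistics.
fn normalize(img: &Tensor) -> Result<Tensor> {
    let dev = img.device();
    let mean = Tensor::new(&[0.485f32, 0.456, 0.406], dev)?.reshape((3, 1, 1))?;
    let std = Tensor::new(&[0.229f32, 0.224, 0.225], dev)?.reshape((3, 1, 1))?;
    img.broadcast_sub(&mean)?.broadcast_div(&std)
}

fn main() -> Result<()> {
    // The fixed grid from the commit: 518 / 14 = 37 patches per side.
    println!("patch grid: {0}x{0}", IMG_SIZE / PATCH_SIZE);
    let img = Tensor::rand(0f32, 1f32, (3, IMG_SIZE, IMG_SIZE), &Device::Cpu)?;
    let _normalized = normalize(&img)?;
    Ok(())
}
```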
54ff971e35 Support for the new Qwen2 models. (#2257)
* Support for the new Qwen2 models.

* Add more models.
2024-06-07 10:51:50 +01:00
cd4d941ed1 Add LLaVA support (#2234)
* first commit

* llava

* clippy and fmt

* some fixes

* minor fixes

* remove useless file

* refactor: Remove llava/constants.rs and update llava/mod.rs

* modify variable name

* modify code after clippy

* Minor tweaks.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-06-03 11:54:09 +02:00
45e235a747 Simplify the KvCache api. (#2207) 2024-05-23 17:07:21 +02:00
77ea479a18 Add Phi-3 Medium (#2205) 2024-05-23 13:33:17 +02:00
7ebc3548e1 Use flash-attn in gemma. (#2195)
* Use flash-attn in gemma.

* Fix flash-attn for head dim 256.
2024-05-18 19:18:59 +02:00
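
The flash-attn path in candle models typically follows a feature-gated pattern along these lines; a sketch, since the fallback and exact call sites vary per model:

```rust
use candle_core::{Result, Tensor};

// With the flash-attn feature, delegate to the fused kernel; candle-flash-attn
// expects (batch, seq_len, num_heads, head_dim) inputs.
#[cfg(feature = "flash-attn")]
fn attn(q: &Tensor, k: &Tensor, v: &Tensor, scale: f32, causal: bool) -> Result<Tensor> {
    candle_flash_attn::flash_attn(q, k, v, scale, causal)
}

// Without the feature, bail rather than silently falling back in this sketch.
#[cfg(not(feature = "flash-attn"))]
fn attn(_: &Tensor, _: &Tensor, _: &Tensor, _: f32, _: bool) -> Result<Tensor> {
    candle_core::bail!("compile with '--features flash-attn'")
}
```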
eefc1c77ef Support flash-attn in quantized phi3. (#2194) 2024-05-18 17:12:56 +02:00
01545f7303 Add a slice_set op. (#2193)
* Add a slice_set op.

* Add some testing.

* Add the dedicated kv-cache module.

* Derive debug and clone.

* Expose more kv-cache functions.

* Return the current data when appending.

* Use the new cache in the quantized phi3 model.
2024-05-18 15:58:18 +02:00
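
A minimal sketch of the resulting kv-cache usage, assuming the module landed as candle_nn::kv_cache with this shape of API; per the commit, appending returns the data accumulated so far:

```rust
use candle_core::{DType, Device, Result, Tensor};
use candle_nn::kv_cache::KvCache;

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // A cache concatenating along dim 2, the sequence axis of a
    // (batch, heads, seq, head_dim) tensor, pre-allocated for 512 positions.
    // Internally it writes into that buffer with the new slice_set op.
    let mut cache = KvCache::new(2, 512);
    let k = Tensor::zeros((1, 8, 4, 64), DType::F32, &dev)?;
    let v = Tensor::zeros((1, 8, 4, 64), DType::F32, &dev)?;
    // Appending returns the keys/values accumulated so far, ready for attention.
    let (k_all, v_all) = cache.append(&k, &v)?;
    assert_eq!((k_all.dims()[2], v_all.dims()[2]), (4, 4));
    Ok(())
}
```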
349c3e806a Support embedding model gte-Qwen1.5-7B-instruct (#2190)
* Support embedding model gte-Qwen1.5-7B-instruct

This is a text embedding model based on Qwen2. The two share the same
model architecture except for the last MLP module. This commit brings in
a minimal modification of the old Qwen2 implementation to support both
models.

An example is provided and has been verified against the official
PyTorch implementation.

* Avoid doing the 'last-token filtering' based on the absence of an attention mask.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-05-16 21:34:10 +02:00
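
For context on the last-token handling above: embedding models of this kind typically pool by taking the hidden state at each sequence's final non-padded position and L2-normalizing it. A hypothetical sketch (the function and its inputs are illustrative, not the PR's code):

```rust
use candle_core::{Result, Tensor};

/// Last-token pooling: take the hidden state at each sequence's final
/// non-padded position, then L2-normalize to obtain the embedding.
/// `hidden` is (batch, seq, dim); `last_idx` holds each row's last position.
fn last_token_pool(hidden: &Tensor, last_idx: &[usize]) -> Result<Tensor> {
    let mut rows = Vec::with_capacity(last_idx.len());
    for (b, &i) in last_idx.iter().enumerate() {
        rows.push(hidden.get(b)?.get(i)?); // (dim,)
    }
    let pooled = Tensor::stack(&rows, 0)?; // (batch, dim)
    let norm = pooled.sqr()?.sum_keepdim(1)?.sqrt()?;
    pooled.broadcast_div(&norm)
}
```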
cc80e065e5 Allow the threshold argument to be negative in the segment-anything example (#2187)
The threshold is 0.0 by default; negative values include more points,
expanding the mask, while positive values are more selective, making the
mask smaller.

Negative numbers start with a minus sign, which normally makes clap
treat them as flags.
2024-05-15 13:17:20 +02:00
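
One common way to opt a clap argument into negative numbers is the allow_negative_numbers setting; a sketch with illustrative field names:

```rust
use clap::Parser;

#[derive(Parser)]
struct Args {
    /// Mask threshold: 0.0 by default; negative values expand the mask,
    /// positive values shrink it. allow_negative_numbers keeps clap from
    /// reading "-1.5" as an unknown flag.
    #[arg(long, default_value_t = 0.0, allow_negative_numbers = true)]
    threshold: f32,
}

fn main() {
    let args = Args::parse();
    println!("threshold: {}", args.threshold);
}
```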
a75cd8164f Force the revision for the phi3-llama quantized models. (#2159) 2024-05-04 10:41:18 +02:00
b13a82a438 Separate quantized phi-3 implementation. (#2157)
* Separate quantized phi-3 implementation.

* Integrate the quantized phi3 model.

* Small fixes, get the generation to work properly.

* Keep the old llama implementation around.

* Change the default.
2024-05-04 10:14:57 +02:00
59b18d974e Pin the version used for the quantized phi 3 gguf file. (#2156) 2024-05-03 15:03:22 +02:00
89f53b9d7b Bump the version number to 0.5.1. (#2155)
* Bump the version number to 0.5.1.

* Fix clippy lints for 1.78.

* More clippy fixes.
2024-05-03 11:17:05 +02:00
a09d451d11 Support top-k in the llama example. (#2150) 2024-05-01 22:25:47 +02:00
ed7b99f525 Add a toggle for F16/BF16 accumulation in gemm. (#2141)
* Add a toggle to control f16/bf16 gemm precision.

* Use the faster variant in the quantized example.

* Bugfix.
2024-04-29 09:21:07 +02:00
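
A sketch of flipping the toggle, assuming the setters live under candle_core::cuda and only take effect when built with the cuda feature:

```rust
// Assumed setter paths; the exact module location is an assumption.
#[cfg(feature = "cuda")]
fn enable_reduced_precision_gemm() {
    // Let cublas accumulate f16/bf16 matmuls in reduced precision,
    // trading a little accuracy for speed.
    candle_core::cuda::set_gemm_reduced_precision_f16(true);
    candle_core::cuda::set_gemm_reduced_precision_bf16(true);
}
```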
c68ed8963f chore: fix some typos in comments (#2121)
Signed-off-by: hardlydearly <799511800@qq.com>
2024-04-28 08:34:32 +02:00
3b429f3023 Make the dtype configurable for phi. (#2133) 2024-04-27 21:32:49 +02:00
6cf82fd7a3 Add Olmo models (#2127)
* add olmo support

* add olmo readme

* Fix fmt.

* Fix clippy.

* Get olmo to work on cuda.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-26 11:02:51 +02:00
cfab6e7616 Mention phi-v3 in the readmes. (#2122) 2024-04-24 20:54:24 +02:00
11d4a3c588 Add the phi-3 model. (#2120)
* Add the phi-3 model.

* Faster rope.

* Bugfix.

* Fix the detokenization.
2024-04-24 09:48:13 +02:00
9d3f1c8af5 Add the phi-v3 quantized model. (#2118)
* Add the phi-v3 quantized model.

* Also include phi-3 in the main phi example.
2024-04-24 08:22:23 +02:00
618ecf5e23 Better time measurement for the llama example. (#2106) 2024-04-22 17:54:27 +02:00
c388be93e7 Updated quantized phi model (#2099)
* Quantized phi in a separate file.

* Add the quantized phi example + rework the model code.

* Improve the phi model.

* Get some generation out.

* Use the appropriate rope shape.

* Tweak the default prompt.

---------

Co-authored-by: Jane Doe <jane.doe@example.org>
2024-04-21 07:37:07 +02:00
587ee3bb6f Small cleanups to the llama multi-process example. (#2098) 2024-04-20 22:19:46 +02:00
52ae332910 Use llama v3 by default + add to readme. (#2094) 2024-04-20 16:11:24 +02:00
8b390ddd29 Only download the weights in the main process (and not in the child processes). (#2093) 2024-04-20 13:01:23 +02:00
c97d639fa0 Multiprocess/multi-GPU support for llama 3. (#2092)
* Multiprocess/multi-GPU support for llama 3.

* Modernize the mp example a bit.
2024-04-20 12:49:21 +02:00
9c532aef47 Also enable llama-v3 8b instruct. (#2088) 2024-04-19 08:50:06 +02:00
f7a6468238 Add support for llama3 on the quantized example (#2086)
* add support for l3b, new tokenizer

* add todo

* Add todo and use k_s model

* Use the official tokenizers.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-18 22:52:00 +02:00
e6ee7ba4d4 Llama v3. (#2085)
* Llama v3.

* Tweak the default params + handle special tokens.

* Small tweak.
2024-04-18 22:19:54 +02:00
4d14777673 Utilize batches in Stable Diffusion (#2071)
* Utilize the batch support in Stable Diffusion that was already there but went unused.

Also refactor out the `save_image` function.

* Clippy + cosmetic fixes.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-16 06:49:04 +02:00
c119600d6e Move image tensor to device in trocr example (#2063)
Signed-off-by: Harry Stern <harry@harrystern.net>
2024-04-15 06:50:32 +02:00
50e49ecc5f Add a quantized version of recurrent-gemma. (#2054)
* Add a quantized version of recurrent-gemma.

* Share the rglru part.

* Get the quantized gemma model to work.
2024-04-13 20:07:01 +02:00
26cbbf8d84 Mandatory top-k sampling for recurrent-gemma. (#2051) 2024-04-13 10:31:39 +02:00
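
Top-k sampling keeps the k most likely logits, renormalizes them, and samples from the result. A self-contained sketch, independent of candle's own sampling utilities:

```rust
/// Sample a token id from `logits` using top-k sampling.
/// `rng_uniform` is a uniform draw in [0, 1); thread it in from any RNG.
fn sample_top_k(logits: &[f32], k: usize, temperature: f32, rng_uniform: f32) -> usize {
    // Sort candidate indices by logit, descending, and keep the top k.
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    idx.sort_unstable_by(|&a, &b| logits[b].total_cmp(&logits[a]));
    idx.truncate(k.min(idx.len()));
    // Softmax over the survivors (max-subtracted for numerical stability).
    let scaled: Vec<f32> = idx.iter().map(|&i| logits[i] / temperature).collect();
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|&x| (x - max).exp()).collect();
    let total: f32 = exps.iter().sum();
    // Invert the CDF at rng_uniform to pick a token.
    let mut acc = 0.0f32;
    for (j, &e) in exps.iter().enumerate() {
        acc += e / total;
        if rng_uniform < acc {
            return idx[j];
        }
    }
    *idx.last().unwrap()
}
```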
2bf413caa3 Add the recurrent-gemma model. (#2039)
* Start adding the recurrent-gemma model.

* More griffin.

* Add the example + get the weights to load from the HF version.

* More inference code.

* Rope + kv-cache on the attention side.

* Add to the inference code.

* Add more to the recurrent gemma inference.

* Get some first inference to run.

* Add the softcap on logits.

* Fixes.

* Use partial rotary embeddings.

* Get inference to work.

* Add a comment.

* And add a readme.
2024-04-13 00:05:21 +02:00
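
The logit softcap mentioned above squashes logits smoothly into (-cap, cap) via cap * tanh(logits / cap); in candle this is a one-liner:

```rust
use candle_core::{Result, Tensor};

/// Soft-capping: squash logits smoothly into (-cap, cap) via
/// cap * tanh(logits / cap), as used by the Gemma family.
fn softcap(logits: &Tensor, cap: f64) -> Result<Tensor> {
    (logits / cap)?.tanh()? * cap
}
```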
a0460cd2b1 Add the code-gemma models. (#2038)
* Add the code-gemma models.

* Tweak to the gemma config.
2024-04-10 21:19:21 +02:00
b81ecf712d Support alternative dtypes for mamba (#2036)
* Allow different dtypes in mamba.

* Add a dtype flag.
2024-04-10 18:10:01 +02:00
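
Such a dtype flag usually just maps a string onto candle's DType; a hypothetical sketch (the flag name and error handling are illustrative):

```rust
use candle_core::DType;

/// Map a hypothetical --dtype string flag onto candle's DType.
fn parse_dtype(s: &str) -> anyhow::Result<DType> {
    Ok(match s {
        "f16" => DType::F16,
        "bf16" => DType::BF16,
        "f32" => DType::F32,
        other => anyhow::bail!("unsupported dtype {other}"),
    })
}
```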
7f354473cf Optimize copy-2d for metal. (#2024)
* Optimize copy-2d for metal.

* Add a hacky stopping rule for moondream.
2024-04-07 12:34:16 +02:00
33c9b66554 Add the new gemma models. (#2023)
* Add the new gemma models.

* Revert the lightning changes.

* Support for the 1.1 models.
2024-04-06 21:25:38 +02:00
ace282e5c2 Add flag to run Moondream in f16 precision (#2015)
* moondream implementation

* add moondream example

* change config default activation

* Add assets and integrate phi mixformer with example

* Make use of kv cache and fix seq_len bug; Clean up example code

* Add README link to example

* Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig

* Delete image

* Use apply instead of forward

* Use latest release special token; Fix token/s accuracy; Use GeluPytorchTanh in VisionConfig v2

* Add flag to use f16

* Avoid breaking the quantized version on cuda.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-05 07:03:33 +02:00