candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-15 18:28:24 +00:00

Author	SHA1	Message	Date
Laurent Mazare	cf9d7bf24c	Add the CSM model. (#2862 ) * Add the CSM model. * Add some code to load the model. * Load the text tokenizer. * Add frame generation. * Get the sampling to work. * Rope fix. * Autoregressive generation. * Generate some audio file. * Use the actual prompt. * Support multiple turns. * Add a very barebone readme. * Move some of the shared bits to the model. 0.9.0-alpha.1	2025-04-04 06:48:03 +02:00
Laurent Mazare	9d31361c4f	Fix for clippy 1.86. (#2864 ) * Fix for clippy 1.86. * More clippy fixes. * More fixes.	2025-04-03 19:38:27 +02:00
Kyle Birnbaum	648596c073	Added readmes to examples (#2835 ) * added chatGLM readme * changed wording in readme * added readme for chinese-clip * added readme for convmixer * added readme for custom ops * added readme for efficientnet * added readme for llama * added readme to mnist-training * added readme to musicgen * added readme to quantized-phi * added readme to starcoder2 * added readme to whisper-microphone * added readme to yi * added readme to yolo-v3 * added readme to whisper-microphone * added space to example in glm4 readme * fixed mamba example readme to run mamba instead of mamba-minimal * removed slash escape character * changed moondream image to yolo-v8 example image * added procedure for making the reinforcement-learning example work with a virtual environment on my machine * added simple one line summaries to the example readmes without * changed non-existant image to yolo example's bike.jpg * added backslash to sam command * removed trailing - from siglip * added SoX to silero-vad example readme * replaced procedure for uv on mac with warning that uv isn't currently compatible with pyo3 * added example to falcon readme * added --which arg to stella-en-v5 readme * fixed image path in vgg readme * fixed the image path in the vit readme * Update README.md * Update README.md * Update README.md --------- Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>	2025-04-03 09:18:29 +02:00
Laurent Mazare	d9904a3baf	Update to cudarc 0.14 (breaking change). (#2858 ) * Start updating to cudarc 0.14. * Adapt a couple more things. * And a couple more fixes. * More tweaks. * And a couple more fixes. * Bump the major version number. * Proper module system for the cuda kernels. * Proper ptx loading. * Launch the sort kernel. * Custom op. * Start using the builder pattern. * More builder. * More builder. * Get candle-core to compile. * Get the tests to pass. * Get candle-nn to work too. * Support for custom cuda functions. * cudnn fixes. * Get flash attn to run. * Switch the crate versions to be alpha. * Bump the ug dependency.	2025-04-03 09:12:19 +02:00
Kyle Birnbaum	d6db305829	Added new language pairs to marian-mt example. (#2860 ) * added new language pairs to marian-mt * lint * seperated python code for converting tokenizers into its own file and and added a reqirements.txt for dependencies, updated instructions in readme and included python version * Cleanup. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2025-04-02 23:50:14 +02:00
Zack Angelo	b4daa03e59	add as_cuda_slice_mut to CudaStorage and CudaDType (#2859 )	2025-04-01 19:34:52 +02:00
Bryan Lee	9541467d6b	Add `flip` to `tensor` (#2855 ) * Add `flip` to `tensor` * Move the tests to the proper places. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2025-04-01 09:07:16 +02:00
Kyle Birnbaum	6429609090	Added Deepseekr1 Llama8b variant to quantized example (#2842 ) * added deepseekr1 llama8b variant to quantized example * lint	2025-03-30 10:55:21 +02:00
Kyle Birnbaum	ba473290da	Added DeepseekR1 Qwen7B variant to quantized-qwen2-instruct example (#2843 ) * quantized deepseek qwen generating tokens * removed is_deepseek from Args and replaced prompt if statement with pattern matching	2025-03-30 10:54:22 +02:00
Bryan Lee	59c26195db	Fix CIFAR10 dataset types and dimension ordering (#2845 )	2025-03-30 10:53:25 +02:00
LongYinan	cb02b389d5	Fix reinforcement learning example (#2837 )	2025-03-26 16:27:45 +01:00
Kyle Birnbaum	0d4097031c	fixed rand import for mnist-training (#2833 )	2025-03-26 08:10:03 +01:00
Kyle Birnbaum	10853b803c	fixed rand imports for whisper-microphone example (#2834 )	2025-03-26 08:09:27 +01:00
xkeyC	f3d472952f	fix: `candle-flash-attn` linux and `msvc` build (#2829 ) * fix: candle-flash-attn linux and msvc build * Missing newline at eof. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2025-03-25 08:45:12 +01:00
Christian Balcom	67b85f79f1	Pickle decoder fix and Long1 opcode addition. (#2824 ) * Pickle decoder changes: added Long1 opcode, fixed tensor offset calculation * Apply rustfmt. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2025-03-23 08:10:08 +01:00
Benjamin Beurdouche	0b24f7f0a4	Fix for whisper example. rand::distribution is now rand::distr (#2811 )	2025-03-16 19:14:55 +01:00
Laurent Mazare	3afb04925a	Allow for growing the default KV cache when needed. (#2810 )	2025-03-16 17:30:25 +01:00
André Cipriani Bandarra	cbf5fc80c2	Add Gemma 3 1b IT toe Gemma examples (#2809 ) - Updates the Gemma example to include Gemma 3 1b instruction tuned.	2025-03-16 17:00:48 +01:00
Laurent Mazare	468d1d525f	Bump the crate version to 0.8.4. (#2808 ) 0.8.4	2025-03-15 07:42:24 +01:00
Mike Seddon	c930ab7e1a	upgrade half library to fix rand (#2806 ) fix lints	2025-03-14 09:01:54 +01:00
Laurent Mazare	111edbc4ea	Gemma 3 initial setup (text only). (#2802 ) * Gemma 3 initial setup (text only). * Use the rotating kv cache for the sliding window.	2025-03-14 07:56:02 +01:00
Laurent Mazare	e286cf7cc9	Parse the json config for siglip models. (#2800 ) * Parse the json config for siglip models. * Bump the tokenizers dependency. * Add a v2 model. * Support more v2 model.s	2025-03-09 14:01:09 +01:00
Mikhail Panfilov	e4ffb85228	Add ModernBert sentency classifier (#2796 )	2025-03-08 14:48:22 +01:00
Andrew Wason	37db86ff79	Allow ModernBert to be used to generate embeddings. (#2791 )	2025-03-03 12:39:04 +01:00
Jani Monoses	add3a714aa	phi-4-mini (#2790 )	2025-03-01 10:07:29 +01:00
Liang-Chi Hsieh	26c16923b9	Make sorted_nodes pub function (#2780 )	2025-02-22 10:23:45 +01:00
Laurent Mazare	9e8bf70333	Avoid some clippy lints on 1.85. (#2778 ) * Avoid some clippy lints on 1.85. * Upload artifacts v4.	2025-02-22 10:23:22 +01:00
Philip Fabianek	ac9cdbd448	Refactor From<Tuple> implementations by using macros, add tests (#2762 )	2025-02-19 10:58:29 +01:00
Eric Buehler	e6cc76fc37	Implement DeepSeek V2 (#2744 ) * Add deepseek v2 * Fix * Remove unused * Add kv cache * Remove from cargo.toml * Fix dtype selection logic * Fix unnecessary u32->f32->gather->u32 * Remove fromstr impl * Use local scopes for some clarity * Typo * Repeat k_pe * Chain calls to remove mut * Actually, remove all muts * Update readme	2025-02-19 10:51:01 +01:00
Laurent Mazare	fd7f7242a1	Bump the crate version to 0.8.3 (#2772 ) * update to cudarc to v0.13.5 to support cuda 12.8 * Bump the crate version. --------- Co-authored-by: Michael McCulloch <michael.james.mcculloch@fastmail.com> 0.8.3	2025-02-15 15:54:48 +01:00
Michael McCulloch	3ddd20a5aa	update to cudarc to v0.13.5 to support cuda 12.8 (#2771 ) Co-authored-by: Michael McCulloch <michael.james.mcculloch@fastmail.com>	2025-02-15 15:47:23 +01:00
Amélie Royer	2423d633fc	add dynamic position encoding to Siglip (#2770 ) * add dynamic position encoding * remove debug messages	2025-02-14 13:50:50 +01:00
ivarflakstad	7c2449f623	Metal: Improved reduce and softmax (#1819 ) * Improve reduce perf and add contiguous impl * Improve arg reduce and add contiguous impl * Improve softmax kernel. 33%-39% higher thrpt * fmt * Fixed all bugs. Improved code quality. Added tests. * Stash for debugging * Stash for debugging 2 * Fixing argmax bug and improve performance Co-authored-by: Christopher Fleetwood <45471420+FL33TW00D@users.noreply.github.com> * Fix test and add is_valid_simgroup_reduce_type trait * Online softmax. Improved threadgroup reduce. Tidying up a bit. * Remove redundant threadgroup_barrier from arg reduce * Mostly tidying up. Some improvements * Simplify indexed struct * tidying * Reuse operation operator instead of passing it in as a parameter * Fix how operators are applied to indexed<vec<T,N>> * Vectorized load. Scalar block reduce. Hitting max throughput for f32 reduce. * Vectorized load for online softmax. Involves a reinterpret_cast of src which may be suboptimal. * Metal as_type casting vec<bfloat, N> -> vec<float, N/2> for simd and fast math * Use constant for input instead of const device. Fix strided reduce. * Use contiguous reduce in tests * Rename finalize -> to_scalar * Support integer types max/min (switch with trait-inferred impl later) * Was worried I was skipping work -> shuffling the 1D test cases * Add build.rs to avoid metal kernel jit compile overhead * Improve build. Extract utils * Compile metal kernels for both macos and ios * Fixed over xmas and then forgot about it * Add calculate_reduce_threads util * Remove old reduce.metal * Improve f16/bf16 softmax precision by accumulating in f32 * Remove build.rs (for now) * Move softmax bench to candle-nn * Remove redundant thread calc util fn * Use uint over ushort for indices etc * Use fast exp in MDReduceOp * Remove nested metal define for softmax * Fix some clippy lint. --------- Co-authored-by: Christopher Fleetwood <45471420+FL33TW00D@users.noreply.github.com> Co-authored-by: Laurent <laurent.mazare@gmail.com>	2025-02-08 07:27:01 +01:00
Doug A	0af3e428ec	fix: place `ug` dep behind `not wasm32` flag (#2760 ) * place `ug` behind not wasm32 attr so that wasm32 can compile * mv `ug` to conditional target dep assuming every non-wasm32 user wants this	2025-02-01 23:05:52 +01:00
Brady Bonnette	43017539ab	Adds DebertaV2/V3 (#2743 ) * Adds DebertaV2/V3 * Fixes all clippy warnings * Typos. * Addresses PR review findings. Some refactorings * Avoid some unwrap/unwrap_or. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2025-01-29 08:59:28 +01:00
A.V.	e142bf9530	use moondream1 model/revision for moondream example (#2748 )	2025-01-28 22:19:54 +01:00
Laurent Mazare	d2c53f4f2f	Remove the MFA gemm library. (#2755 )	2025-01-28 21:48:17 +01:00
Laurent Mazare	2a2852d1c1	Fix flash-attn build. (#2754 )	2025-01-28 18:49:46 +01:00
Laurent Mazare	8f20f2a722	Add the MLX merge sort kernels (#2751 ) * Add some metal sort kernels imported from MLX. * Add another test. * Start adding the multiblock version. * Proper kernel names. * Split out the main metal file. * Multi-block sort. * More sorting. * DType parametrization. * Add a larger test.	2025-01-28 14:09:43 +01:00
Laurent Mazare	ab9019425a	Make the metal sdpa tests deterministic. (#2750 )	2025-01-28 09:05:24 +01:00
Laurent Mazare	da02b59516	Allow using composed strings as metal kernel names. (#2747 )	2025-01-27 22:40:12 +01:00
Laurent Mazare	27996a1a9e	Remove the old MFA gemm kernels. (#2742 ) * Remove the old MFA gemm kernels. * Use bf16 in helium on metal.	2025-01-26 20:36:31 +01:00
Laurent Mazare	1a32107fab	Add a few metal gather ops. (#2740 ) * Add a few metal gather ops. * Fix some compilation issues. * Adjust the tolerance.	2025-01-25 23:31:03 +01:00
唐璜	333d94a19a	fix: fix the codegeex4 model examples and transformers model (#2738 ) * Update main.rs * Update codegeex4_9b.rs * Get things to compile. * Add some default for when rope_ratio is missing. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2025-01-25 17:41:12 +01:00
mneilly	3164a19a5d	Add inpainting to the stable diffusion example (#2735 ) * Update the stable diffusion example with inpainting support for 1.5, 2 and XL. * Apply cargo fmt. * Clippy fixes. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2025-01-23 10:08:38 +01:00
Sergei Grebnov	e6cd499e98	Fix candle-flash-attn build on Windows (msvc) (#2734 )	2025-01-22 22:19:48 +01:00
Laurent Mazare	77db8396d0	Explicit error when slice-set is called with the same src and dst. (#2733 )	2025-01-22 21:31:49 +01:00
Laurent Mazare	85f0aaefe5	Add serde::serialize to activations. (#2732 )	2025-01-22 10:23:34 +01:00
Guoqing Bao	e4c3a71f11	Fix GLM4 alignment issue (#2723 ) * Fix GLM4 alignment issue * Cleanups. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2025-01-20 22:51:46 +01:00
Eric Buehler	17cbbe4286	Sync upstream MLX sdpa vector kernels with mask (#2718 ) * Sync upstream mlx sdpa vector kernels with mask * Dispatch to the 2pass kernel * Format	2025-01-16 11:30:10 +01:00

1 2 3 4 5 ...

2322 Commits