candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-15 10:26:33 +00:00

Author	SHA1	Message	Date
Laurent Mazare	3afb04925a	Allow for growing the default KV cache when needed. (#2810 )	2025-03-16 17:30:25 +01:00
André Cipriani Bandarra	cbf5fc80c2	Add Gemma 3 1b IT toe Gemma examples (#2809 ) - Updates the Gemma example to include Gemma 3 1b instruction tuned.	2025-03-16 17:00:48 +01:00
Laurent Mazare	468d1d525f	Bump the crate version to 0.8.4. (#2808 ) 0.8.4	2025-03-15 07:42:24 +01:00
Mike Seddon	c930ab7e1a	upgrade half library to fix rand (#2806 ) fix lints	2025-03-14 09:01:54 +01:00
Laurent Mazare	111edbc4ea	Gemma 3 initial setup (text only). (#2802 ) * Gemma 3 initial setup (text only). * Use the rotating kv cache for the sliding window.	2025-03-14 07:56:02 +01:00
Laurent Mazare	e286cf7cc9	Parse the json config for siglip models. (#2800 ) * Parse the json config for siglip models. * Bump the tokenizers dependency. * Add a v2 model. * Support more v2 model.s	2025-03-09 14:01:09 +01:00
Mikhail Panfilov	e4ffb85228	Add ModernBert sentency classifier (#2796 )	2025-03-08 14:48:22 +01:00
Andrew Wason	37db86ff79	Allow ModernBert to be used to generate embeddings. (#2791 )	2025-03-03 12:39:04 +01:00
Jani Monoses	add3a714aa	phi-4-mini (#2790 )	2025-03-01 10:07:29 +01:00
Liang-Chi Hsieh	26c16923b9	Make sorted_nodes pub function (#2780 )	2025-02-22 10:23:45 +01:00
Laurent Mazare	9e8bf70333	Avoid some clippy lints on 1.85. (#2778 ) * Avoid some clippy lints on 1.85. * Upload artifacts v4.	2025-02-22 10:23:22 +01:00
Philip Fabianek	ac9cdbd448	Refactor From<Tuple> implementations by using macros, add tests (#2762 )	2025-02-19 10:58:29 +01:00
Eric Buehler	e6cc76fc37	Implement DeepSeek V2 (#2744 ) * Add deepseek v2 * Fix * Remove unused * Add kv cache * Remove from cargo.toml * Fix dtype selection logic * Fix unnecessary u32->f32->gather->u32 * Remove fromstr impl * Use local scopes for some clarity * Typo * Repeat k_pe * Chain calls to remove mut * Actually, remove all muts * Update readme	2025-02-19 10:51:01 +01:00
Laurent Mazare	fd7f7242a1	Bump the crate version to 0.8.3 (#2772 ) * update to cudarc to v0.13.5 to support cuda 12.8 * Bump the crate version. --------- Co-authored-by: Michael McCulloch <michael.james.mcculloch@fastmail.com> 0.8.3	2025-02-15 15:54:48 +01:00
Michael McCulloch	3ddd20a5aa	update to cudarc to v0.13.5 to support cuda 12.8 (#2771 ) Co-authored-by: Michael McCulloch <michael.james.mcculloch@fastmail.com>	2025-02-15 15:47:23 +01:00
Amélie Royer	2423d633fc	add dynamic position encoding to Siglip (#2770 ) * add dynamic position encoding * remove debug messages	2025-02-14 13:50:50 +01:00
ivarflakstad	7c2449f623	Metal: Improved reduce and softmax (#1819 ) * Improve reduce perf and add contiguous impl * Improve arg reduce and add contiguous impl * Improve softmax kernel. 33%-39% higher thrpt * fmt * Fixed all bugs. Improved code quality. Added tests. * Stash for debugging * Stash for debugging 2 * Fixing argmax bug and improve performance Co-authored-by: Christopher Fleetwood <45471420+FL33TW00D@users.noreply.github.com> * Fix test and add is_valid_simgroup_reduce_type trait * Online softmax. Improved threadgroup reduce. Tidying up a bit. * Remove redundant threadgroup_barrier from arg reduce * Mostly tidying up. Some improvements * Simplify indexed struct * tidying * Reuse operation operator instead of passing it in as a parameter * Fix how operators are applied to indexed<vec<T,N>> * Vectorized load. Scalar block reduce. Hitting max throughput for f32 reduce. * Vectorized load for online softmax. Involves a reinterpret_cast of src which may be suboptimal. * Metal as_type casting vec<bfloat, N> -> vec<float, N/2> for simd and fast math * Use constant for input instead of const device. Fix strided reduce. * Use contiguous reduce in tests * Rename finalize -> to_scalar * Support integer types max/min (switch with trait-inferred impl later) * Was worried I was skipping work -> shuffling the 1D test cases * Add build.rs to avoid metal kernel jit compile overhead * Improve build. Extract utils * Compile metal kernels for both macos and ios * Fixed over xmas and then forgot about it * Add calculate_reduce_threads util * Remove old reduce.metal * Improve f16/bf16 softmax precision by accumulating in f32 * Remove build.rs (for now) * Move softmax bench to candle-nn * Remove redundant thread calc util fn * Use uint over ushort for indices etc * Use fast exp in MDReduceOp * Remove nested metal define for softmax * Fix some clippy lint. --------- Co-authored-by: Christopher Fleetwood <45471420+FL33TW00D@users.noreply.github.com> Co-authored-by: Laurent <laurent.mazare@gmail.com>	2025-02-08 07:27:01 +01:00
Doug A	0af3e428ec	fix: place `ug` dep behind `not wasm32` flag (#2760 ) * place `ug` behind not wasm32 attr so that wasm32 can compile * mv `ug` to conditional target dep assuming every non-wasm32 user wants this	2025-02-01 23:05:52 +01:00
Brady Bonnette	43017539ab	Adds DebertaV2/V3 (#2743 ) * Adds DebertaV2/V3 * Fixes all clippy warnings * Typos. * Addresses PR review findings. Some refactorings * Avoid some unwrap/unwrap_or. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2025-01-29 08:59:28 +01:00
A.V.	e142bf9530	use moondream1 model/revision for moondream example (#2748 )	2025-01-28 22:19:54 +01:00
Laurent Mazare	d2c53f4f2f	Remove the MFA gemm library. (#2755 )	2025-01-28 21:48:17 +01:00
Laurent Mazare	2a2852d1c1	Fix flash-attn build. (#2754 )	2025-01-28 18:49:46 +01:00
Laurent Mazare	8f20f2a722	Add the MLX merge sort kernels (#2751 ) * Add some metal sort kernels imported from MLX. * Add another test. * Start adding the multiblock version. * Proper kernel names. * Split out the main metal file. * Multi-block sort. * More sorting. * DType parametrization. * Add a larger test.	2025-01-28 14:09:43 +01:00
Laurent Mazare	ab9019425a	Make the metal sdpa tests deterministic. (#2750 )	2025-01-28 09:05:24 +01:00
Laurent Mazare	da02b59516	Allow using composed strings as metal kernel names. (#2747 )	2025-01-27 22:40:12 +01:00
Laurent Mazare	27996a1a9e	Remove the old MFA gemm kernels. (#2742 ) * Remove the old MFA gemm kernels. * Use bf16 in helium on metal.	2025-01-26 20:36:31 +01:00
Laurent Mazare	1a32107fab	Add a few metal gather ops. (#2740 ) * Add a few metal gather ops. * Fix some compilation issues. * Adjust the tolerance.	2025-01-25 23:31:03 +01:00
唐璜	333d94a19a	fix: fix the codegeex4 model examples and transformers model (#2738 ) * Update main.rs * Update codegeex4_9b.rs * Get things to compile. * Add some default for when rope_ratio is missing. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2025-01-25 17:41:12 +01:00
mneilly	3164a19a5d	Add inpainting to the stable diffusion example (#2735 ) * Update the stable diffusion example with inpainting support for 1.5, 2 and XL. * Apply cargo fmt. * Clippy fixes. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2025-01-23 10:08:38 +01:00
Sergei Grebnov	e6cd499e98	Fix candle-flash-attn build on Windows (msvc) (#2734 )	2025-01-22 22:19:48 +01:00
Laurent Mazare	77db8396d0	Explicit error when slice-set is called with the same src and dst. (#2733 )	2025-01-22 21:31:49 +01:00
Laurent Mazare	85f0aaefe5	Add serde::serialize to activations. (#2732 )	2025-01-22 10:23:34 +01:00
Guoqing Bao	e4c3a71f11	Fix GLM4 alignment issue (#2723 ) * Fix GLM4 alignment issue * Cleanups. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2025-01-20 22:51:46 +01:00
Eric Buehler	17cbbe4286	Sync upstream MLX sdpa vector kernels with mask (#2718 ) * Sync upstream mlx sdpa vector kernels with mask * Dispatch to the 2pass kernel * Format	2025-01-16 11:30:10 +01:00
Laurent Mazare	6fd2f63a15	Bump the ug dependency. (#2720 ) * Bump the ug dependency. * Fix some test. * Fix the ug test.	2025-01-16 09:39:16 +01:00
Laurent Mazare	efd0e6822f	Fix the helium weights download. (#2717 )	2025-01-13 18:21:37 +01:00
Laurent Mazare	158817f230	Helium repo update. (#2716 )	2025-01-13 18:04:14 +01:00
Laurent Mazare	309cd0f7c7	Add the helium model. (#2715 )	2025-01-13 17:39:49 +01:00
Jani Monoses	ab7ff7081e	Fixes for running Phi-4 quantized. (#2714 )	2025-01-13 14:35:33 +01:00
Jani Monoses	461e8c1685	ModernBERT model (#2713 ) * layer_norm_no_bias * Modernbert model. * Format + cleanup error. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2025-01-13 08:39:27 +01:00
Laurent Mazare	2344c4e4b8	Clippy fixes for 1.84. (#2710 )	2025-01-10 10:15:15 +01:00
Laurent Mazare	32defdb7d5	Update cudarc. (#2708 )	2025-01-08 15:10:23 +01:00
Laurent Mazare	236c35e578	Bump the caret version to 0.8.2. (#2703 ) 0.8.2	2025-01-07 15:50:16 +01:00
Andrei Fajardo	6f8351dfda	add link to README (#2701 )	2025-01-04 23:07:30 +01:00
Luka Zakrajšek	57f41da13b	Fix mistral attention on Metal (#2699 ) Co-authored-by: Luka Zakrajsek <luka.zakrajsek@soniox.com>	2025-01-04 16:11:20 +01:00
Nick Senger	cbaa0ad46f	UniPC for diffusion sampling (#2684 ) * feat: Add unipc multistep scheduler * chore: Clippy and formatting * chore: Update comments * chore: Avoid unsafety in float ordering * refactor: Update Scheduler::step mutability requirements * fix: Corrector img2img * chore: Update unipc ref link to latest diffusers release * chore: Deduplicate float ordering * fix: Panic when running with dev profile	2025-01-01 21:34:17 +01:00
Laurent Mazare	b12c7c2888	Update the hf-hub dependency to 0.4.0. (#2691 ) * Update the hf-hub dependency to 0.4.0. * Fix the book. * Use 0.4.1.	2024-12-31 19:07:47 +01:00
Laurent Mazare	94ffc2ec6f	Actually remove the default hf-hub cache path for glm. (#2696 )	2024-12-31 11:00:44 +01:00
Laurent Mazare	7354afc673	Use the default hf-hub cache for glm. (#2695 )	2024-12-31 10:55:45 +01:00
Michael Feil	2a705e6f37	Flash-Attn upgrade / SoftCap Candle-FlashAttn [3/n] (#2690 ) * update flash-attn v1 * restore: hdim224 * add 224 flash_fwd_template * remove whitespace * softcap is working, including test and api. * make softcap test case better * unpadded lse added	2024-12-31 10:04:47 +01:00

1 2 3 4 5 ...

2306 Commits