candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-19 19:58:35 +00:00

Author	SHA1	Message	Date
zachcp	3769206583	Update docs (#2553) * add module docs for candle-core * doc each of the candle-nn modules and add the links to the doc page	2024-11-11 22:13:52 +01:00
Eric Buehler	e2b6b367fa	Add some fast Metal MLX SDPA kernels (#2584) * Add some fast Metal MLX SDPA kernels (#32) * Sketch the sdpa kernel * Add full sdpa kernel, * Add test * Add vectorized kernel for decoding * Update tests * Add some docs * Fix sdpa_vector names * Add softcapping for vectorized sdpa * Add softcapping for full sdpa * Add support for head dim 32, 96, 256 * Add support for head dim 32, 96, 256 * Update docs * Add update notice * Clippy and format * Conditional compilation for bf16 * Use it in quantized llama * Some review comments * Use set_params! * Remove unused * Remove feature * Fix metal sdpa for v stride * Remove comma * Add the dim method to layout and shape. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2024-11-05 09:28:00 +01:00
Laurent Mazare	6454597943	Improved launch config for layer-norm/rms-norm. (#2591) * Improved launch config for layer-norm/rms-norm. * Add more testing for the fused layer/rms norm kernels.	2024-11-04 10:42:18 +01:00
Jeroen Vlek	242e006bbb	Depth Anything v2 (#2279) * define structs * construct ResidualConvUnit * forward() for ResidualConvUnit * implement FeatureFusionBlock * implement Scratch * implement DPTHead * add identity module * implement forward for DTPHead * add get_intermediate_layers to DinoVisionTransformer * implement DepthAnythingV2 * some minor tweaks * fix compile errors * fix var builder prefixes * setup initial example * use fixed patch size of 37 (518 / 14) * debugged until output * print min and max values * add some dynamism to the output location * scale input image * extract prep function * extract output path function * normalize image with magic mean and std * add spectral coloring * squeeze in the right place * make enterpolation optional * use bail instead of panic * omit unnecessary Shape call * remove empty curly braces * use bail instead of assert * use vb and pp * remove closures * extract config object * Apply rustfmt. * Fix some clippy lints. * More lints. * Use the array methods. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2024-06-24 19:12:52 +02:00
Laurent Mazare	1df2bddccf	Add the layernorm specialized op. (#2212) * Add the layernorm cuda kernels. * Dedicated layer norm op. * Add the slower variant. * Plug the cuda implementation. * Add the metal variant. * Add a dedicated test. * Bugfix.	2024-05-24 15:58:01 +02:00
MilkFather	3bbb88fcb4	Fix sigmoid gradient calculation and move sigmoid into a specialized op (#2114) * add sigmoid op * small fix * add as a method on `Tensor` * implement gradient calculation for sigmoid * add sigmoid tests * we should have a specialized op for this * fix clippy * fix clippy 2 * Revert all previous commits in favor of a `CustomOp` based solution * use `CustomOp1` implementation * fix rustfmt * experimental add metal impl * add cuda kernel impl * fix fmt * Add a test + reduce some cuda duplication. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2024-04-29 11:04:43 +02:00
Laurent Mazare	e5c8b88f90	Apply the cast before the scaling. (#2135)	2024-04-28 08:30:35 +02:00
Laurent Mazare	0fddec762e	RmsNorm kernel for metal. (#1895) * RmsNorm kernel for metal. * Wrapper for the metal kernel. * Get the ops to actually work. * Fix, get the tests to pass.	2024-03-21 09:48:56 +01:00
Laurent Mazare	af7f8b87d3	Custom op for RmsNorm (#1890) * Trying out a custom RmsNorm cuda kernel. * CPU implementation for rms-norm. * Cuda wrappers. * Add some validation. * Add some testing. * More testing.	2024-03-21 06:36:28 +01:00
Kirpal Grewal	758366160e	add clone to candle dropout (#1814)	2024-03-08 08:18:01 +01:00
ivarflakstad	0c09d10f32	Improve metal buffer usage (#1807) * Improve metal buffer usage * Clone cpu storage when loading to reduce wait_until_complete calls * Use powers of two for buffer sizes so reuse is more likely. * Select best available buffer by size. * Add count to MetalStorage -> can use buffer with different size Co-authored-by: Chris Fleetwood <christopher.fleetwood@huggingface.co> * Simplify new buffer creation without blit copy. Revert &[] -> Vec * Add documentation on newBufferWithBytes safety / synchronization * Drop unused buffers after command buffer is done syncing. --------- Co-authored-by: Chris Fleetwood <christopher.fleetwood@huggingface.co>	2024-03-07 09:42:34 +01:00
OlivierDehaene	b60064780d	feat: add silu activation function (#1706) * feat: add silu activation function * use silu/arg in grad * update candle-nn * use node	2024-02-14 10:27:22 +01:00
Nicolas Patry	03641293ee	Clippy pass.	2023-12-18 15:22:43 +01:00
Nicolas Patry	6bc92e63cb	Addressing a lot of comments.	2023-12-15 13:06:04 +01:00
Nicolas Patry	aa04015098	Remove `unwrap()`.	2023-12-15 12:23:28 +01:00
Nicolas Patry	26540641c1	Renamed all kernel names.	2023-12-15 11:24:47 +01:00
Nicolas Patry	ece4c69a68	Fixing softmax.	2023-12-15 01:35:08 +01:00
Nicolas Patry	361f2ad2af	Working with merging encoders and using fences.	2023-12-14 16:05:33 +01:00
nicolas	87dc559817	Lots of updates including some stack of command buffers.	2023-12-12 17:41:56 +01:00
Nicolas Patry	4349ff1fc2	Starting to fix some tests. Few fixes. Going back on remote metal-rs. Reusing a single buffer (for now) to speed things up. Adding some half kernels. All tests are panicking instead of random failure. Putting back f16 index select. Add erf. Working version for llama2-c. Fixes + cache compute_pipeline_state. BF16 metal fix. Remove some prints. new_owned -> new()..to_owned(). Better batched matmul. Metal operational. Reuse buffers on our own reference counts. Tmp gemm. Revert "Tmp gemm." This reverts commit `c65f68e988`. Interleave committing. Speeding up copies using blit. Fmt. Fmt. Remove the assert! Fmt all. Fixes after big rebase. Add softmax for half and bfloat + tests Fixing Llama example + accumulate softmax in float.	2023-11-30 11:30:31 +01:00
Laurent Mazare	a2a20aeecc	Add the swiglu activation from the chatglm PR. (#1246)	2023-11-02 20:01:34 +01:00
jamjamjon	d39d0c40fd	Add hard-sigmoid and hard-swish activations (#1244) * Add hard-sigmoid and hard-swish activations * Update ops.rs * Use / rather than div. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2023-11-02 18:20:27 +01:00
Laurent Mazare	55bc3382cf	Allow for different behavior between training and eval (#1213) * Forward with training. * Do not use dropout on vgg evaluation.	2023-10-29 07:53:09 +01:00
Laurent Mazare	34f2ecbc3b	Fix the leaky relu. (#898)	2023-09-19 18:17:17 +01:00
Laurent Mazare	30be5b6660	Replication pad (#861) * Add the embed mapper convolutions. * Add the replication pad layer. * Use the replication-pad op. * Tweak a todo.	2023-09-15 14:06:21 +01:00
Laurent Mazare	2746f2c4be	DiffNeXt/unet (#859) * DiffNeXt/unet * Start adding the vae. * VAE residual block. * VAE forward pass. * Add pixel shuffling. * Actually use pixel shuffling.	2023-09-15 10:14:02 +01:00
Laurent Mazare	130fe5a087	Add the upblocks. (#853)	2023-09-14 22:24:56 +01:00
Laurent Mazare	a0d65585db	Softmax implementation for cuda. (#747)	2023-09-05 18:38:03 +01:00
Laurent Mazare	6615daf242	Tweaks to softmax. (#745)	2023-09-05 15:22:27 +01:00
Laurent Mazare	1c9e5394a5	Add a custom softmax implementation. (#744) * Add a custom softmax implementation. * Add softmaxlastdim to the benchmarks. * And add a test. * Support more dtypes. * Polish the code. * Use the slow implementation on cuda. * Add a todo for the cuda kernel.	2023-09-05 14:20:23 +01:00
Laurent Mazare	2047d34b7c	More robust tests (so that they pass on accelerate). (#679)	2023-08-30 18:10:10 +01:00
Laurent Mazare	3159982a89	Add a Dropout layer (#676) * Add a dropout layer. * Add an actual layer.	2023-08-30 16:19:28 +01:00
Laurent Mazare	5bb2fce998	Implement group-norm. (#334) * Implement group-norm. * Add some testing for group-norm.	2023-08-07 06:53:05 +01:00
Laurent Mazare	d34039e352	Add a stable diffusion example (#328) * Start adding a stable-diffusion example. * Proper computation of the causal mask. * Add the chunk operation. * Work in progress: port the attention module. * Add some dummy modules for conv2d and group-norm, get the attention module to compile. * Re-enable the 2d convolution. * Add the embeddings module. * Add the resnet module. * Add the unet blocks. * Add the unet. * And add the variational auto-encoder. * Use the pad function from utils.	2023-08-06 17:49:43 +01:00
Laurent Mazare	3eb2bc6d07	Softmax numerical stability. (#267) * Softmax numerical stability. * Fix the flash-attn test.	2023-07-28 13:13:01 +01:00
Laurent Mazare	1f26042693	Move some shared functions to the nn module. (#221)	2023-07-22 13:25:11 +01:00

36 Commits