candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-18 19:47:12 +00:00

Author	SHA1	Message	Date
Laurent Mazare	a2e925462c	Add the scatter in place ops. (#2923 ) * Add the scatter_set op. * Metal op. * Cuda version. * Merge the checks. * Add the actual ops.	2025-04-26 07:36:49 +02:00
Laurent Mazare	3827685524	Add the scatter op. (#2921 ) * Add the scatter op. * Backprop support. * Cuda support.	2025-04-25 21:46:58 +02:00
Laurent Mazare	a4c56a958e	Add the const-set op. (#2910 ) * Add the const-set op. * Cuda implementation. * Bugfix. * Metal cleanup. * Add the metal kernels. * Add some testing. * Finish the metal implementation. * Bump the version.	2025-04-19 10:07:02 +02:00
zachcp	3159f91b90	20241118 docs (#2629 ) * module docs * varbuilder gguf docs * add a link to gguf files * small additional mod doc titles * safetensor docs * more core docs * more module docs in candle_core * 2 more link fixes	2024-11-19 04:07:07 +01:00
Laurent Mazare	7b60bda4ed	Add support for cuda streams. (#2532 )	2024-10-02 21:30:58 +02:00
Laurent Mazare	9cff7bc3f4	Make it possible to use TF32 accumulation in F32 matmuls. (#2178 ) * Allow the use of tf32 accumulation in matmul. * Better timings. * Dummy versions for use when cuda is not enabled.	2024-05-11 12:28:39 +02:00
Laurent Mazare	ed7b99f525	Add a toggle for F16/BF16 accumulation in gemm. (#2141 ) * Add a toggle to control f16/bf16 gemm precision. * Use the faster variant in the quantized example. * Bugfix.	2024-04-29 09:21:07 +02:00
Laurent Mazare	8a05743a21	Add StorageRef. (#2113 ) * Add the storage-ref bits. * Add the metal implementation.	2024-04-23 13:23:27 +02:00
Laurent Mazare	53e5380bf6	Add a synchronize method to devices. (#2055 ) * Add a synchronize method to devices. * Metal version.	2024-04-14 16:32:55 +02:00
Laurent Mazare	6708870e63	Add the alloc_uninit function. (#1901 ) * Add the alloc_uninit function. * Dummy metal fix. * Lazy initialization.	2024-03-22 07:25:23 +01:00
Laurent Mazare	ec97c98e81	Async tensor copying. (#1900 )	2024-03-21 13:09:42 +01:00
Laurent Mazare	ce9fbc3682	Optimize the cat operation on contiguous tensors (#1855 ) * Add a specialized kernel for copy2d. * Move the cat operations. * Avoid transpositions in cat. * Bugfix. * Bugfix for the cuda kernel. * Add a benchmark. * Add more testing. * Test fix. * Faster kernel. * Add the missing kernel. * Tweak the test. * Add a metal kernel. * Fix for the metal kernel. * Get the tests to pass on metal. * Also use this opportunity to fix the metal kernel for ELU. * Add some bf16 kernels. * Clippy fixes.	2024-03-17 10:49:13 +01:00
Laurent Mazare	be4555c5a5	Add the conv-transpose1d op. (#1251 ) * Skeleton structure for conv-transpose1d. * CPU implementation for conv-transpose1d.	2023-11-03 09:44:46 +01:00
Laurent Mazare	9abeddd750	Make the cuda rng seedable. (#1056 )	2023-10-08 09:32:36 +01:00
Laurent Mazare	9a465e1b26	Add 1d upsampling. (#839 ) * Add 1d upsampling. * Add the interpolate functions.	2023-09-13 16:50:39 +01:00
Laurent Mazare	59b731de99	Add the powf op. (#664 ) * Add the powf op. * Cuda kernels and backprop. * Add a test.	2023-08-29 20:48:18 +01:00
Laurent Mazare	3cca89cc70	Add conv-transpose. (#635 ) * Add conv-transpose. * Return zeros for now. * Naive CPU implementation. * Add a conv-transpose test + fix the cpu implementation. * Add a second test.	2023-08-28 10:10:12 +01:00
LeeeSe	a5c5a893aa	add max_pool2d (#371 ) Co-authored-by: 赵理山 <ls@zhaolishandeMacBook-Air.local>	2023-08-09 18:05:26 +01:00
Laurent Mazare	b5bb5e056d	Add more conv2d support. (#340 ) * Add more conv2d support. * Conv2d cpu work. * Conv2d output shape.	2023-08-08 06:04:32 +01:00
Laurent Mazare	d0d7010682	CPU implementation for upsample-nearest2d. (#339 )	2023-08-07 20:07:10 +01:00
Laurent Mazare	fc265d9dcf	Some CLIP fixes for stable diffusion. (#338 ) * Some CLIP fixes for stable diffusion. * Add the avg-pool2d operation on cpu.	2023-08-07 18:31:45 +01:00
Laurent Mazare	4b3bd79fbd	Remove the embedding ops in favor of index-select. (#299 ) * Remove the embedding ops in favor of index-select. * Also remove the cuda kernels.	2023-08-02 05:42:11 +01:00
Laurent Mazare	3eb2bc6d07	Softmax numerical stability. (#267 ) * Softmax numerical stability. * Fix the flash-attn test.	2023-07-28 13:13:01 +01:00
Laurent Mazare	52c5d8c087	Add the gather op. (#219 ) * Start adding gather. * Gather cpu implementation + use in simple training. * Add scatter_add for the gradient of gather. * Simple cpu implementation of scatter_add. * Use gather in the simple-training backprop.	2023-07-22 07:21:28 +01:00
laurent	27174a82aa	Start adding index-add.	2023-07-21 20:12:48 +01:00
Laurent Mazare	fa08fb3126	Add the index-select op. (#209 ) * Add the index-select op. * Cpu implementation of index-select. * Add the cpu implementation for index-select.	2023-07-20 14:01:03 +01:00
Laurent Mazare	2a8f28d687	Op refactor (#208 ) * Add the binary and unary op enums to factorize some code. * Bugfix.	2023-07-20 12:28:45 +01:00
Laurent Mazare	e9c052bf94	Add the comparison operations. (#207 ) * Add the comparison operations. * Add the helper functions on the tensor side. * More cmp operations. * Cpu implementation for the comparison operations.	2023-07-20 09:40:31 +01:00
Laurent Mazare	cb687b4897	Add some more developed training examples. (#199 ) * Use contiguous tensors for variables. * Sketch the mnist example. * Start adding the reduce ops. * Renaming. * Refactor the reduce operations. * Bugfix for the broadcasting vectorization.	2023-07-19 15:37:52 +01:00
Nicolas Patry	dcb4a9291e	Expliciting how to enable cuda.	2023-07-14 17:08:05 +02:00
Laurent Mazare	64264d97c1	Modular backends (#138 ) * Add some trait to formalize backends. * Use the generic backend trait.	2023-07-11 11:17:02 +01:00
Laurent Mazare	ae79c00e48	Allow for uniform initialization in a single step. (#136 )	2023-07-11 08:52:29 +01:00
Laurent Mazare	f29b77ec19	Random initializers. (#128 ) * Random initialization. * CPU rng generation.	2023-07-10 18:26:21 +01:00
Laurent Mazare	270997a055	Add the elu op. (#113 )	2023-07-09 21:56:31 +01:00
laurent	a424d95473	Add more of the conv1d op.	2023-07-04 11:15:45 +01:00
laurent	3aac1047fe	Sketch the conv1d op.	2023-07-04 10:52:34 +01:00
laurent	122e334d0c	Simplify the pattern matching logic in the cuda backend.	2023-06-29 09:21:11 +01:00
laurent	3f0d9fbb25	Adapt the cuda bits.	2023-06-28 15:43:03 +01:00
laurent	14449ff80c	Get the cpu backend to compile.	2023-06-28 14:12:38 +01:00
Nicolas Patry	d7f729fb8f	Refactor the hierarchy.	2023-06-27 11:57:27 +02:00

40 Commits