candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-15 10:26:33 +00:00

Author	SHA1	Message	Date
Laurent Mazare	77db8396d0	Explicit error when slice-set is called with the same src and dst. (#2733 )	2025-01-22 21:31:49 +01:00
Laurent Mazare	6fd2f63a15	Bump the ug dependency. (#2720 ) * Bump the ug dependency. * Fix some test. * Fix the ug test.	2025-01-16 09:39:16 +01:00
Laurent Mazare	2344c4e4b8	Clippy fixes for 1.84. (#2710 )	2025-01-10 10:15:15 +01:00
Laurent Mazare	e38e2a85dd	Fix a cuda warning. (#2693 )	2024-12-31 09:06:10 +01:00
Laurent Mazare	62ced44ea9	Add a Context trait similar to anyhow::Context. (#2676 ) * Add a Context trait similar to anyhow::Context. * Switch two unwrap to context.	2024-12-22 09:18:13 +01:00
zachcp	6f715f9256	add scatter add (#2656 )	2024-12-01 18:39:38 +01:00
zachcp	dba7a9c93e	add u32 - U32 gather (#2653 )	2024-11-30 23:18:07 +01:00
Laurent Mazare	b52c2c6050	Clippy fixes for the cuda feature. (#2650 )	2024-11-29 09:01:34 +01:00
Anubhab Bandyopadhyay	54e7fc3c97	Lint fixes introduced with Rust 1.83 (#2646 ) * Fixes for lint errors introduced with Rust 1.83 * rustfmt * Fix more lints. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2024-11-28 23:00:21 +01:00
Andrei Fajardo	c12db594e3	fix typo (#2606 )	2024-11-23 08:40:00 +01:00
zachcp	3159f91b90	20241118 docs (#2629 ) * module docs * varbuilder gguf docs * add a link to gguf files * small additional mod doc titles * safetensor docs * more core docs * more module docs in candle_core * 2 more link fixes	2024-11-19 04:07:07 +01:00
Laurent Mazare	0ed24b9852	Add max-all/min-all. (#2616 )	2024-11-14 21:08:04 +01:00
Laurent Mazare	06350c31c7	Add some missing index-select metal kernels. (#2613 ) * Add some missing index-select metal kernels. * Make some matrix contiguous pre-matmul.	2024-11-12 17:10:12 +01:00
zachcp	3769206583	Update docs (#2553 ) * add module docs for candle-core * doc each of the candle-nn modules and add the links to the doc page	2024-11-11 22:13:52 +01:00
Eric Buehler	e2b6b367fa	Add some fast Metal MLX SDPA kernels (#2584 ) * Add some fast Metal MLX SDPA kernels (#32) * Sketch the sdpa kernel * Add full sdpa kernel, * Add test * Add vectorized kernel for decoding * Update tests * Add some docs * Fix sdpa_vector names * Add softcapping for vectorized sdpa * Add softcapping for full sdpa * Add support for head dim 32, 96, 256 * Add support for head dim 32, 96, 256 * Update docs * Add update notice * Clippy and format * Conditional compilation for bf16 * Use it in quantized llama * Some review comments * Use set_params! * Remove unused * Remove feature * Fix metal sdpa for v stride * Remove comma * Add the dim method to layout and shape. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2024-11-05 09:28:00 +01:00
Laurent Mazare	0e2c8c17fb	UG metal integration. (#2580 )	2024-10-27 15:20:37 +01:00
Laurent Mazare	594d984f9c	Support for UG kernels. (#2579 ) * Support for UG kernels. * Add a dedicated test.	2024-10-27 13:37:19 +01:00
Anubhab Bandyopadhyay	dcd83336b6	Testcases (#2567 )	2024-10-17 13:00:45 +02:00
Laurent Mazare	e4a96f9e7c	Switch to using the MLX matmul by default. (#2547 )	2024-10-06 23:24:55 +02:00
Laurent Mazare	6faecaa616	Fix for cudnn bf16 conv2d. (#2535 )	2024-10-02 23:18:55 +02:00
Laurent Mazare	7b60bda4ed	Add support for cuda streams. (#2532 )	2024-10-02 21:30:58 +02:00
Anubhab Bandyopadhyay	a2bcc227df	Efficient implementation of `Tensor::ones()` for `metal` (#2512 ) * WIP: hopefully better const impl * with GPU * More tests on * Reverting primitive for * Incorporating review changes - added check elem count check in kernel, using for call strategy * rustfmt ran	2024-10-01 19:11:59 +02:00
Laurent Mazare	def4c6cdee	Cuda quantized mmv bugfix. (#2526 )	2024-10-01 12:57:55 +02:00
Laurent Mazare	724650446c	Yet another cuda qmm padding fix. (#2509 )	2024-09-30 21:53:30 +02:00
Laurent Mazare	844d45cde4	Bugfix for the metal elu kernel. (#2490 ) * Bugfix for the metal elu kernel. * Add a test.	2024-09-21 15:03:19 +02:00
Laurent Mazare	af2104078f	Metal commands refactoring (#2489 ) * Split out the commands part of the metal device. * Make most fields private. * Move the allocator back. * Rework the encoder provider type.	2024-09-21 13:18:42 +02:00
ivnsch	382c6b51af	Improve error message (#2485 )	2024-09-20 07:11:41 -06:00
Laurent Mazare	6eea45a761	Add a couple cast metal kernels. (#2479 )	2024-09-15 22:27:46 +02:00
Shengtuo Hu	ebf722b446	Export TensorIndexer public to candle users (#2477 )	2024-09-13 22:21:57 +02:00
Laurent Mazare	b60faebea4	Missing metal kernels. (#2474 )	2024-09-12 13:58:50 +02:00
Laurent Mazare	72d649058b	Hook the MLX matmul kernels in candle-core. (#2473 )	2024-09-12 13:52:59 +02:00
Laurent Mazare	afb6575835	Use the new MLX kernels to handle the BF16 matmul. (#2470 )	2024-09-11 17:34:05 +02:00
hongmengning	13b2a8a4a0	Complete the missing backticks in the comments (#2469 )	2024-09-11 16:37:05 +02:00
Laurent Mazare	aafa24ed93	Update cudarc to 0.12. (#2451 ) * Update cudarc to 0.12. * Some cudnn tweaks.	2024-08-27 10:10:30 +02:00
Laurent Mazare	736d8eb752	Stream tensor (#2429 ) * Support Minus(u) for arbitrary values of u, e.g. Minus(3). * Forces u to be strictly positive. * Add StreamTensor.	2024-08-17 21:54:28 +02:00
Laurent Mazare	7cff5898ec	Support Minus(u) for arbitrary values of u, e.g. Minus(3). (#2428 ) * Support Minus(u) for arbitrary values of u, e.g. Minus(3). * Forces u to be strictly positive.	2024-08-17 21:29:01 +02:00
Carsten Csiky	d3fe989d08	Add documentation examples for `Tensor::i` and `Tensor::narrow` methods (#2308 ) * Add documentation examples for `Tensor` methods * Apply fmt. * Cosmetic tweaks. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2024-08-10 08:11:09 +02:00
MilkFather	c0a559d427	optimize gradient for silu a bit (#2393 )	2024-08-04 11:24:17 +02:00
Laurent Mazare	0fcb40b229	Revert the bf16 gemm metal changes for now. (#2386 )	2024-08-01 23:08:47 +02:00
Laurent Mazare	d4b6f6eef6	Add a minimal test for the metal bf16 matmul. (#2381 )	2024-08-01 11:22:46 +02:00
Laurent Mazare	957d604a78	Enable BF16 on metal. (#2380 )	2024-08-01 11:05:07 +02:00
Takanori MAEHARA	ce90287f45	Add get_ids to GradStore (#2379 )	2024-08-01 10:56:13 +02:00
Laurent Mazare	1ba87a9450	Use BF16 on metal when possible. (#2378 )	2024-08-01 10:48:58 +02:00
Yun-Jhong Wu	bd80078acf	Fix log_sum_exp to handle large positive/negative inputs (#2367 )	2024-08-01 10:37:02 +02:00
Laurent Mazare	8696cf6494	Enable the affine kernel for u8/u32. (#2376 )	2024-08-01 10:03:11 +02:00
Eric Buehler	0f5cbb08b3	Add support for Llama 3.1 (#2359 ) * Add Llama 3.1 rope * Clippy * Format * Clippy * Add support for multiple eos tokens: * Untagged either * Remove either dep and fix settings.json * Make the max positional embeddings configurable	2024-07-26 21:32:26 +02:00
Ivor Wanders	f25173d68b	Fix for backprop in ConvTranspose2D with stride of 2 (#2337 ) * Add gradient test for conv_transpose2d with stride of 2. * Swap dilation and stride in ConvTranspose2D backpropagation. Without this, a shape mismatch occurs with a stride of 2 and dilation of 1. * Add further tests of the ConvTranspose2D gradient. Values calculated with torch, minor numerical errors adjusted and commented.	2024-07-17 19:22:23 +02:00
Alexey Gerasev	6a4741bbf9	Fix Elu gradient NaN on large input (#2328 ) * Fix Elu gradient NaN on large input * Reuse previously computed exp in Elu	2024-07-16 14:41:16 +02:00
Laurent Mazare	25960676ca	Add a basic metal example with capture (#2324 ) * Add some tracing. * Get the trace to work.	2024-07-09 12:38:11 +02:00
Laurent Mazare	6baa1d486b	Fix a bug in the metal implementation of col2im1d. (#2284 )	2024-06-22 23:21:20 +02:00

761 Commits