candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-16 02:38:10 +00:00

Author	SHA1	Message	Date
Laurent Mazare	34505fdf3a	Avoid using batched-matmul in nn::Linear. (#2883 ) * Avoid using batched-matmul in nn::Linear. * Also avoid batched matmul in conv1d. * Also tweak the conv2d. * Batched tests. * Also cover conv2d.	2025-04-12 19:53:58 +02:00
Bryan Lee	9541467d6b	Add `flip` to `tensor` (#2855 ) * Add `flip` to `tensor` * Move the tests to the proper places. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2025-04-01 09:07:16 +02:00
Mike Seddon	c930ab7e1a	upgrade half library to fix rand (#2806 ) fix lints	2025-03-14 09:01:54 +01:00
Laurent Mazare	77db8396d0	Explicit error when slice-set is called with the same src and dst. (#2733 )	2025-01-22 21:31:49 +01:00
Laurent Mazare	6fd2f63a15	Bump the ug dependency. (#2720 ) * Bump the ug dependency. * Fix some test. * Fix the ug test.	2025-01-16 09:39:16 +01:00
Laurent Mazare	0e2c8c17fb	UG metal integration. (#2580 )	2024-10-27 15:20:37 +01:00
Laurent Mazare	594d984f9c	Support for UG kernels. (#2579 ) * Support for UG kernels. * Add a dedicated test.	2024-10-27 13:37:19 +01:00
Anubhab Bandyopadhyay	dcd83336b6	Testcases (#2567 )	2024-10-17 13:00:45 +02:00
Anubhab Bandyopadhyay	a2bcc227df	Efficient implementation of `Tensor::ones()` for `metal` (#2512 ) * WIP: hopefully better const impl * with GPU * More tests on * Reverting primitive for * Incorporating review changes - added check elem count check in kerner, using for call strategy * rustfmt ran	2024-10-01 19:11:59 +02:00
Laurent Mazare	844d45cde4	Bugfix for the metal elu kernel. (#2490 ) * Bugfix for the metal elu kernel. * Add a test.	2024-09-21 15:03:19 +02:00
Laurent Mazare	d4b6f6eef6	Add a minimal test for the metal bf16 matmul. (#2381 )	2024-08-01 11:22:46 +02:00
Yun-Jhong Wu	bd80078acf	Fix log_sum_exp to handle large positive/negative inputs (#2367 )	2024-08-01 10:37:02 +02:00
Ivor Wanders	f25173d68b	Fix for backprop in ConvTranspose2D with stride of 2 (#2337 ) * Add gradient test for conv_transpose2d with stride of 2. * Swap dilation and stride in ConvTranspose2D backpropagation. Without this, a shape mismatch occurs with a stride of 2 and dilation of 1. * Add further tests of the ConvTranspose2D gradient. Values calculated with torch, minor numerical errors adjusted and commented.	2024-07-17 19:22:23 +02:00
Laurent Mazare	01545f7303	Add a slice_set op. (#2193 ) * Add a slice_set op. * Add some testing. * Add the dedicated kv-cache module. * Derive debug and clone. * Expose more kv-cache functions. * Return the current data when appending. * Use the new cache in the quantized phi3 model.	2024-05-18 15:58:18 +02:00
Laurent Mazare	21f82a5155	Add SliceSafetensors. (#2179 ) * Add SlicedSafetensors. * And add some testing.	2024-05-11 13:15:42 +02:00
Laurent Mazare	eb26e2467e	Add the cuda dequantize f16 kernels. (#2137 ) * Add the cuda dequantize f16 kernels. * Expose the cuda kernels. * Add some testing + fix. * Test the other cases too. * A few more tests. * Add an environment variable to enable the dequantize f16 + matmul behavior.	2024-04-28 20:05:05 +02:00
Laurent Mazare	805f3be8e1	Add a sort function. (#2134 )	2024-04-28 08:18:04 +02:00
Laurent Mazare	96a48e5cc4	Add argsort. (#2132 ) * Add the argsort cuda kernels. * CPU version of arg-sort. * Hook the cuda kernel + rework the cpu bits. * Add some dedicated test. * Working cuda kernel. * Metal kernel. * Metal adjustments. * Bugfix. * Use the fast rope in qwen. * Rework the expert selection in qwen.	2024-04-27 20:17:35 +02:00
Laurent Mazare	8de0ce6cba	Add more QMMV cuda kernels. (#2077 ) * Add more QMMV cuda kernels. * Enable the new kernels. * Adapt the testing.	2024-04-18 08:36:43 +02:00
Laurent Mazare	2817643db9	Add the mmv kernels for small batch sizes. (#2075 ) * Add the mmv kernels for smaller sizes. * Support more mmv kernels. * Use the new kernels. * Fix the call. * Silly fix. * Improve the testing. * Fix for dmmv. * Add another dedicated test for the batching mmv.	2024-04-16 21:30:51 +02:00
Laurent Mazare	f135b7963d	Fix for the batch dim in the quantized matmul example. (#2073 ) * Fix for the batch dim in the quantized matmul example. * Enable more tests on cuda. * Add a test for qmm with a batch. * Fix the zeros-dim test on metal.	2024-04-15 20:00:28 +02:00
Laurent Mazare	e198bb0816	Handle zero dims in some simple operations. (#2064 ) * Handle zero dims in some simple operations. * Handle zero-dims in matmul. * More testing.	2024-04-15 09:18:54 +02:00
Thomas Santerre	c5626b8271	Add support for "sign" on tensors (#2012 ) * add the sign unary operator * remove uneeded import * remove uneeded import * undo formatting * undo formatting * remove unnecessary redefintion * allow gradient to flow through for sign and round * fix cpu ops to ensure that negzero and positive zero are handled properly * clippy fixes * Properly avoid gradient tracking. * Use a branchless version. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2024-04-04 22:32:47 +02:00
Laurent Mazare	e6a5b82ba6	Fix the matmul layout for accelerate & mkl. (#2011 ) * Fix the matmul layout for accelerate & mkl. * Reduce the required precision for pow (because of accelerate). * And a fix the gelu f16 test.	2024-04-04 19:18:03 +02:00
Laurent Mazare	30b145150f	Optimize the gelu f16 opt. (#2008 ) * Optimize the gelu f16 opt. * And add a test.	2024-04-04 16:28:23 +02:00
Laurent Mazare	08c049def3	Improve the handling of matmul with squeezed layouts. (#1998 ) * Improve the handling of matmul with squeezed layouts. * Fix for the cuda backend. * Revert the temporary fix.	2024-04-02 23:17:05 +02:00
Laurent Mazare	ab86cd37c8	Support i64 in index-select on metal. (#1951 ) * Support i64 in index-select on metal. * Add some testing of index-select for all dtypes.	2024-03-27 16:30:07 +01:00
Laurent Mazare	a9abde5f93	More flexible matmul contiguity checks. (#1949 ) * More flexible matmul contiguity checks. * Also relax the checks on the metal side.	2024-03-27 10:59:05 +01:00
Laurent Mazare	fdfe8fd129	Preliminary support for inplace ops. (#1921 ) * Preliminary support for inplace ops. * Add a test.	2024-03-23 14:16:19 +01:00
Kirpal Grewal	cc856db9ce	Backwards for ConvTranspose2D (#1910 ) * add documentation for nackprop * add backwards for ConvTranspose2D * add test python code to test	2024-03-23 07:05:55 +01:00
Thomas Santerre	9563a5fee4	Add support for conv_transpose2d on Metal backend (#1903 ) * add support for conv transpose 2d and add bench mark for float types * update bench calculation * enable testing all conv operations on metal	2024-03-21 18:08:45 +01:00
Laurent Mazare	455c42aa72	Avoid copying the data on squeeze and unsqueeze. (#1884 ) * Avoid copying the data on squeeze and unsqueeze. * Fix the quantized llama example. * Unrelated fix for the quantized stable-lm example on cuda. * Fix for mamba on cuda (unrelated to the PR).	2024-03-20 13:04:36 +01:00
Thomas Santerre	2a8679509e	Add support for conv_transpose1d for metal backend (#1874 ) * first attempt * progress * integrate into metal backend * finish and get test passing * add other dtype support * update transpose1d dtypes supported	2024-03-19 08:46:58 +01:00
Thomas Santerre	04a61a9c72	Add avg_pool2d metal implementation for the metal backend (#1869 ) * implement metal avg pool 2d * fixX * add suggested precision workaround for the accumulator	2024-03-18 18:50:14 +01:00
Thomas Santerre	754fa1e813	Add support for max_pool2d for Metal backend (#1863 ) * first pass at implementation of maxpool2d * Add definitions for other dtypes * add tests for other dtypes * Cosmetic tweaks + re-enable maxpool2d tests for metal. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2024-03-18 08:33:30 +01:00
Laurent Mazare	ce9fbc3682	Optimize the cat operation on contiguous tensors (#1855 ) * Add a specialized kernel for copy2d. * Move the cat operations. * Avoid transpositions in cat. * Bugfix. * Bugfix for the cuda kernel. * Add a benchmark. * Add more testing. * Test fix. * Faster kernel. * Add the missing kernel. * Tweak the test. * Add a metal kernel. * Fix for the metal kernel. * Get the tests to pass on metal. * Also use this opportunity to fix the metal kernel for ELU. * Add some bf16 kernels. * Clippy fixes.	2024-03-17 10:49:13 +01:00
Laurent Mazare	3440cec3a0	Fast CPU kernel for transposed 1d convolutions. (#1822 ) * Fast CPU kernel for transposed 1d convolutions. * Bugfix.	2024-03-08 22:43:07 +01:00
Niklas Hallqvist	be5b68cd0b	Metal random-generation bug fixes (#1811 ) * use_resource API misunderstood. It is not additive. Several usages must be bit-ORed together. * The seeding was incorrect and used the address instead of the value of the passed in seed. * Add a check that likely exhibits failure to update the seed between generation of random tensors. * Buffer overrun, the length given to the std::ptr::copy call was in bytes, and not 32-bit units. * By default seed the RNG with a time-based value, so that different runs may produce different output, just like the CPU engine. Use device.set_seed if determinism is warranted. * Revert "By default seed the RNG with a time-based value, so that different runs may produce different output, just like the CPU engine. Use device.set_seed if determinism is warranted." This reverts commit `d7302de9` Discussion in https://github.com/huggingface/candle/pull/1811#issuecomment-1983079119 * The Metal random kernel failed to set element N/2 of tensors with N elements, N being even. The reason was that all threads but thread 0 all created 2 random samples, but thread 0 only one, i.e. an odd number. In order to produce an even number of samples, the early termination of thread 0 should only everr occur for odd sized tensors. * Add a test catching any deterministic tensor element in rand and randn output. --------- Co-authored-by: niklas <niklas@appli.se> Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>	2024-03-08 16:11:50 +01:00
Laurent Mazare	bd9ab9bc04	Add a cuda kernel for dequantizing q8_0. (#1804 )	2024-03-05 09:50:37 +01:00
laurent	2c95b7394a	Handle Q5_0 and Q5_1 quants in cuda.	2024-02-29 10:54:01 +01:00
Laurent Mazare	6400e1b0a0	Fix the block size for some cuda kernels. (#1767 )	2024-02-27 14:08:33 +01:00
Laurent Mazare	c753f72c85	Support for attention bias in gemma + refactor things a bit. (#1744 ) * Support for attention bias in gemma + refactor things a bit. * Fix the cuda tests.	2024-02-22 09:35:28 +01:00
Kirpal Grewal	8013b50829	Add grads for interpolate1d (#1742 ) * add backprop for interpolate1d * fix clippy lint * correct fix clippy lint	2024-02-22 08:44:01 +01:00
Laurent Mazare	fc67d878bb	Bugfix for conv-transpose1d (#1734 ) * Add a currently broken test. * Bugfix + fix test.	2024-02-19 09:04:49 +01:00
Laurent Mazare	1fb728772d	Support for groups in conv-transpose1d. (#1731 ) * Groups support in conv-transpose-1d. * Remove dangling file.	2024-02-18 21:28:07 +01:00
OlivierDehaene	b60064780d	feat: add silu activation function (#1706 ) * feat: add silu activation function * use silu/arg in grad * update candle-nn * use node	2024-02-14 10:27:22 +01:00
Laurent Mazare	d0aa197b07	ConvTranspose1d cuda support. (#1697 ) * ConvTranspose1d cuda support. * Add the conv-transpose1d kernel. * Remove some unused variables.	2024-02-12 15:03:18 +01:00
Dilshod Tadjibaev	e5eb9602d0	Add support for loading Fortran contiguous tensors (#1672 ) * Add support for loading Fortran contiguous tensors This commit introduces the ability to handle Fortran contiguous tensors in the tensor loading process. Previously, the code only supported loading tensors that were contiguous in memory, failing with an error for non-contiguous tensors. With this update, tensors identified as Fortran contiguous (column-major order) are now correctly handled by reversing their dimensions after loading. This enhancement ensures broader compatibility with different tensor layouts, improving the robustness of tensor loading operations. - Check if a tensor is Fortran contiguous using the `is_fortran_contiguous` flag. - For Fortran contiguous tensors, reverse the dimensions after loading to correctly represent their layout in memory. - Continue to bail out with an error for tensors that are neither C contiguous nor Fortran contiguous, maintaining the previous behavior for non-contiguous tensors without explicit support. This change addresses the issue of loading Fortran contiguous tensors, which was previously unsupported, thereby extending the functionality of the tensor loading mechanism to accommodate a wider variety of tensor layouts. * Add reshape step to handle fortran contiguous case * Skip fortran contiguous fix if rank is < 2 * Fail on rank 0, 1 if contiguous	2024-02-07 21:49:59 +01:00
Dilshod Tadjibaev	b75e8945bc	Enhance pickle to retrieve state_dict with a given key (#1671 )	2024-02-06 21:17:33 +01:00
Laurent Mazare	adfae2460a	Fix rustfmt. (#1669 )	2024-02-06 12:06:06 +01:00

1 2 3 4

177 Commits