Commit Graph

134 Commits

SHA1 Message Date
fc67d878bb Bugfix for conv-transpose1d (#1734)
* Add a currently broken test.

* Bugfix + fix test.
2024-02-19 09:04:49 +01:00
1fb728772d Support for groups in conv-transpose1d. (#1731)
* Groups support in conv-transpose-1d.

* Remove dangling file.
2024-02-18 21:28:07 +01:00
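
For reference, the usual 1-D transposed-convolution output-length formula, and how groups partition channels; a hedged sketch in plain Rust (this is the standard formula, not code from the PR):

```rust
// With groups = g, the c_in input channels are split into g groups, each
// convolved against c_out / g filters. The output length follows the
// standard transposed-convolution formula.
fn conv_transpose1d_out_len(
    l_in: usize,
    kernel: usize,
    stride: usize,
    padding: usize,
    output_padding: usize,
    dilation: usize,
) -> usize {
    (l_in - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1
}

fn main() {
    // l_in = 10, kernel = 3, stride = 2, no padding/output-padding/dilation:
    assert_eq!(conv_transpose1d_out_len(10, 3, 2, 0, 0, 1), 21);
}
```
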
b60064780d feat: add silu activation function (#1706)
* feat: add silu activation function

* use silu/arg in grad

* update candle-nn

* use node
2024-02-14 10:27:22 +01:00
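
For context, silu(x) = x * sigmoid(x), and the "use silu/arg in grad" note refers to writing the derivative in terms of the node's output and its argument. A minimal sketch in plain Rust (not candle's actual API):

```rust
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

fn silu(x: f32) -> f32 {
    x * sigmoid(x)
}

// d/dx silu(x) = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
//             = silu(x) + sigmoid(x) * (1 - silu(x))
fn silu_grad(x: f32) -> f32 {
    let s = sigmoid(x);
    silu(x) + s * (1.0 - silu(x))
}

fn main() {
    for x in [-2.0f32, 0.0, 2.0] {
        println!("x={x:5.1} silu={:.4} grad={:.4}", silu(x), silu_grad(x));
    }
}
```
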
d0aa197b07 ConvTranspose1d cuda support. (#1697)
* ConvTranspose1d cuda support.

* Add the conv-transpose1d kernel.

* Remove some unused variables.
2024-02-12 15:03:18 +01:00
e5eb9602d0 Add support for loading Fortran contiguous tensors (#1672)
* Add support for loading Fortran contiguous tensors

This commit adds handling for Fortran contiguous tensors in the tensor loading process. Previously, only tensors contiguous in memory (C contiguous, row-major) could be loaded; anything else failed with an error. Tensors identified as Fortran contiguous (column-major order) are now handled by reversing their dimensions after loading, so their layout in memory is represented correctly.

- Check if a tensor is Fortran contiguous using the `is_fortran_contiguous` flag.
- For Fortran contiguous tensors, reverse the dimensions after loading to correctly represent their layout in memory.
- Continue to bail out with an error for tensors that are neither C contiguous nor Fortran contiguous, maintaining the previous behavior for non-contiguous tensors without explicit support.

This extends the tensor loading mechanism to a wider variety of tensor layouts.

* Add reshape step to handle fortran contiguous case

* Skip fortran contiguous fix if rank is < 2

* Fail on rank 0, 1 if contiguous
2024-02-07 21:49:59 +01:00
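
A hedged illustration of the trick described above, for the 2-D case: a column-major m x n buffer has the same flat layout as a row-major n x m tensor, so loading with reversed dims and then transposing recovers the logical shape (plain Rust, not the actual loader code):

```rust
fn from_fortran_2d(data: &[f32], m: usize, n: usize) -> Vec<Vec<f32>> {
    let mut out = vec![vec![0.0f32; n]; m];
    for i in 0..m {
        for j in 0..n {
            // Column-major: element (i, j) lives at flat index j * m + i.
            out[i][j] = data[j * m + i];
        }
    }
    out
}

fn main() {
    // The 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored column-major:
    let data = [1.0f32, 4.0, 2.0, 5.0, 3.0, 6.0];
    println!("{:?}", from_fortran_2d(&data, 2, 3)); // [[1, 2, 3], [4, 5, 6]]
}
```
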
b75e8945bc Enhance pickle to retrieve state_dict with a given key (#1671) 2024-02-06 21:17:33 +01:00
adfae2460a Fix rustfmt. (#1669) 2024-02-06 12:06:06 +01:00
1ba11f22d6 Fix: pth files don't load on Windows (#1661)
* Don't treat zip path as OS path

* Add a test case

* Add code to generate test pth data
2024-02-06 08:50:55 +01:00
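
The "don't treat zip path as OS path" point in a nutshell: entry names inside a zip archive are '/'-separated by spec, so building them with std::path::PathBuf breaks on Windows, where the separator is '\'. A minimal sketch with a hypothetical helper (not the PR's code):

```rust
// Correct on every OS: archive entry names are '/'-separated by spec,
// so join them as strings rather than via PathBuf.
fn zip_entry_name(dir: &str, file: &str) -> String {
    format!("{dir}/{file}")
}

fn main() {
    // A PathBuf join would yield "archive\\data.pkl" on Windows, which
    // matches no entry in the archive.
    assert_eq!(zip_entry_name("archive", "data.pkl"), "archive/data.pkl");
}
```
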
403680f17d Quantized GGUF style (#1523)
* Metal quantized modifications proposal.

- Add a device param, wherever needed.
- Create new QMetal storage thing that implements QuantizedType.
- Update everywhere needed.

Fix Python.

Fixing examples.

Fix: fmt + clippy + stub.

Moving everything around.

Only missing the actual implems.

Fixing everything + adding dequantized kernels.

More work.

Fixing matmul.

Fmt + Clippy

Some clippy fixes.

Working state.

Q2K Metal -> Bugged (also present in GGML).
Q4K CPU -> Bugged (present previously; the new test catches it).
Q5K CPU -> Bugged (present previously).
Q8_1 Both -> Seemingly never really implemented.
Q8K Metal -> Never implemented in Metal.

Fixing Q2K bug (present in ggml).

* Cleanup.

* Fix the rebase.

* Removing the fences speeds everything up and *is* correct this time...

* Cleanup the fence.

* After rebase.

* Bad code removal.

* Rebase after phi2 merge + fix replit default to CPU.

* Making the CI happy.

* More happy tests.

---------

Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>
2024-01-17 10:27:58 +01:00
e6d86b0819 Add the pow operator. (#1583)
* Add the pow operator.

* Support the pow operation in onnx.
2024-01-13 20:24:06 +01:00
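
For reference, the elementwise pow gradients (standard calculus, not the PR's code):

```rust
// out = a^b, with:
//   d/da a^b = b * a^(b - 1)
//   d/db a^b = a^b * ln(a)   (only defined for a > 0)
fn pow_with_grads(a: f64, b: f64) -> (f64, f64, f64) {
    let out = a.powf(b);
    let grad_a = b * a.powf(b - 1.0);
    let grad_b = out * a.ln();
    (out, grad_a, grad_b)
}

fn main() {
    let (out, ga, gb) = pow_with_grads(2.0, 3.0);
    println!("2^3 = {out}, d/da = {ga}, d/db = {gb:.4}");
}
```
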
41915184bb Bugfix for dequantizing q5k layers. (#1569) 2024-01-11 23:15:11 +01:00
0eb90ed783 Simpler repro for the neon optimization issue + bugfix (#1544)
* Simpler repro for the neon optimization issue.

* Bugfix for q4k.

* Improve the fix, share the dot-prod bit.

* Clippy fixes.

* Fix for q6k.

* Also fix for q2k.

* Use the new shared dotprod.

* Add more testing.
2024-01-07 20:21:49 +01:00
96f1a28e39 Add a simple full method. (#1455)
* Add a simple implementation of the full method.

* Add the docstring.
2023-12-17 20:15:57 -05:00
4cb443d00a Fix the logsumexp test. (#1426) 2023-12-12 10:56:11 -06:00
77252ffb82 Add logsumexp function (#1424) 2023-12-12 10:32:17 -06:00
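
A minimal sketch of the numerically stable logsumexp form (the standard max-subtraction trick; not necessarily how candle implements it):

```rust
// logsumexp(x) = m + ln(sum(exp(x_i - m))) with m = max(x), which avoids
// overflow in the exponentials.
fn logsumexp(xs: &[f64]) -> f64 {
    let m = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    if m.is_infinite() && m < 0.0 {
        return f64::NEG_INFINITY; // empty input or all -inf
    }
    m + xs.iter().map(|x| (x - m).exp()).sum::<f64>().ln()
}

fn main() {
    // Naive exp(1000) would overflow; the stable form is fine.
    println!("{}", logsumexp(&[1000.0, 1000.0])); // 1000.6931...
}
```
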
18eb87f25f Upsample grad (#1420)
* encode size of upsample in enum

* working convolution method for limited 2d kernels

* add test for sf 3 interpolation

* add higher dimensional tests, fix to work with multichannel input

* Remove commented out line.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2023-12-10 08:43:24 +01:00
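
A hedged sketch of the idea: the backward pass of nearest upsampling with scale factor s sums each s x s block of the output gradient into one input cell, which is exactly what a stride-s convolution with an all-ones s x s kernel computes (plain Rust, single channel, dims assumed divisible by s):

```rust
fn upsample_nearest_backward(grad_out: &[Vec<f64>], s: usize) -> Vec<Vec<f64>> {
    let (h, w) = (grad_out.len() / s, grad_out[0].len() / s);
    let mut grad_in = vec![vec![0.0; w]; h];
    for i in 0..h * s {
        for j in 0..w * s {
            // Each input cell accumulates its s x s block of output grads.
            grad_in[i / s][j / s] += grad_out[i][j];
        }
    }
    grad_in
}

fn main() {
    let g = vec![vec![1.0; 6]; 6];
    // With s = 3, each input cell accumulates a 3 x 3 block: 9.0.
    println!("{:?}", upsample_nearest_backward(&g, 3));
}
```
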
e2eb6590ed Merge pull request #1323 from huggingface/metal3
Adding the test scaffolding.
2023-11-27 13:06:01 +01:00
481c45d78d Add a basic implementation for slice-assign. (#1377) 2023-11-26 17:31:22 +00:00
bfa7c8fc01 Implement the module trait directly for QMatMul. (#1372) 2023-11-25 10:09:45 +00:00
8d6c6de8e0 Missing new test. 2023-11-20 14:38:35 +01:00
7ec345c2eb Adding the test scaffolding. 2023-11-20 14:38:35 +01:00
c6763e3b41 Add a simple implementation of cumsum. (#1334)
* Add a simple implementation of cumsum.

* Add another test.
2023-11-15 21:11:15 +00:00
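
Cumsum semantics along a 1-D slice, as a tiny illustration (not candle's implementation):

```rust
// Running sum: out[i] = xs[0] + ... + xs[i].
fn cumsum(xs: &[f64]) -> Vec<f64> {
    xs.iter()
        .scan(0.0, |acc, x| {
            *acc += x;
            Some(*acc)
        })
        .collect()
}

fn main() {
    assert_eq!(cumsum(&[1.0, 2.0, 3.0]), vec![1.0, 3.0, 6.0]);
}
```
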
347e31c9ff Add the tril/triu/eye ops. (#1333)
* Add tril/triu/eye.

* Revert the metal crate tweak.
2023-11-15 20:34:37 +00:00
9e666d4229 Add the var method. (#1315)
* Add the var method.

* Add a test.
2023-11-10 22:47:57 +01:00
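
A sketch of what a var method computes; whether the divisor is n or n - 1 is an implementation choice, and this sketch uses n:

```rust
// Variance as the mean of squared deviations from the mean.
fn var(xs: &[f64]) -> f64 {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n
}

fn main() {
    println!("{}", var(&[1.0, 2.0, 3.0, 4.0])); // 1.25
}
```
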
7051fb8098 feat: add backprop for elu (#1269)
* feat: add backprop for elu

* Cosmetic tweaks.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2023-11-04 21:26:41 +01:00
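
For reference, elu and the derivative a backprop pass needs (standard formulas, plain Rust sketch):

```rust
// elu(x) = x                   for x > 0
//        = alpha * (e^x - 1)   otherwise
// so d/dx elu(x) = 1 for x > 0, and alpha * e^x = elu(x) + alpha otherwise,
// which lets the backward pass reuse the forward output.
fn elu(x: f64, alpha: f64) -> f64 {
    if x > 0.0 { x } else { alpha * (x.exp() - 1.0) }
}

fn elu_grad(x: f64, alpha: f64) -> f64 {
    if x > 0.0 { 1.0 } else { elu(x, alpha) + alpha }
}

fn main() {
    println!("{} {}", elu(-1.0, 1.0), elu_grad(-1.0, 1.0));
}
```
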
3173b1ce3b feat: impl backprop for erf and gelu-erf (#1258)
* impl backprop for erf and gelu-erf

* feat: unary tests added for erf and gelu-erf

* fix: (clippy) remove immediately dereferenced ref

* fix: improve comments with pytorch code snippet

* fix: adjust comment typo in backprop impl
2023-11-03 21:32:30 +01:00
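
For reference, gelu-erf and its derivative. Rust's std has no erf, so this sketch uses the Abramowitz-Stegun 7.1.26 approximation; the actual kernels presumably differ:

```rust
//   gelu(x)  = 0.5 * x * (1 + erf(x / sqrt(2)))
//   gelu'(x) = 0.5 * (1 + erf(x / sqrt(2))) + x * pdf(x)
// where pdf is the standard normal density.
fn erf(x: f64) -> f64 {
    // Abramowitz & Stegun 7.1.26 polynomial approximation.
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    let poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
        - 0.284496736)
        * t
        + 0.254829592)
        * t;
    sign * (1.0 - poly * (-x * x).exp())
}

fn gelu_erf(x: f64) -> f64 {
    0.5 * x * (1.0 + erf(x / std::f64::consts::SQRT_2))
}

fn gelu_erf_grad(x: f64) -> f64 {
    let pdf = (-0.5 * x * x).exp() / (2.0 * std::f64::consts::PI).sqrt();
    0.5 * (1.0 + erf(x / std::f64::consts::SQRT_2)) + x * pdf
}

fn main() {
    println!("{:.4} {:.4}", gelu_erf(1.0), gelu_erf_grad(1.0));
}
```
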
b07b2350b6 Test for the transposed conv1d. (#1254) 2023-11-03 13:10:28 +01:00
5fc66bd4ba Support negative steps in arange. (#1218) 2023-10-30 07:40:54 +00:00
154c674a79 Add i64-abs. (#1216) 2023-10-29 15:28:53 +00:00
46d6566c99 Fix the conv2d gradient computation. (#1214) 2023-10-29 09:50:04 +00:00
807e3f9f52 derivative for GELU (#1160)
* derivative for GELU

* add tests
2023-10-23 20:23:45 +01:00
87eb1658e1 Add pad_with_same. (#1127)
* More model cloning.

* More cloning on quantized models.

* Add pad-with-same.

* Add some tests.
2023-10-18 23:13:37 +01:00
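
"Same" padding here means replicate padding: edge values are repeated rather than zeros inserted. A 1-D sketch, assuming a non-empty input (not the actual op):

```rust
fn pad_with_same(xs: &[f64], left: usize, right: usize) -> Vec<f64> {
    let mut out = Vec::with_capacity(xs.len() + left + right);
    out.extend(std::iter::repeat(xs[0]).take(left)); // repeat left edge
    out.extend_from_slice(xs);
    out.extend(std::iter::repeat(*xs.last().unwrap()).take(right)); // right edge
    out
}

fn main() {
    assert_eq!(
        pad_with_same(&[1.0, 2.0, 3.0], 2, 1),
        vec![1.0, 1.0, 1.0, 2.0, 3.0, 3.0]
    );
}
```
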
7473c4ceca Fix the npy read function and add some testing. (#1080) 2023-10-12 15:25:05 +02:00
7f7d95e2c3 Add the round-to function. (#1039) 2023-10-05 20:28:09 +01:00
8f7973958c fix: index_select cuda kernel when the src dim differs from the ids dim and the selected dim is > 0 (#1037)
* fix: index_select cuda kernel when the src dim differs from the ids dim and the selected dim is > 0

* cargo fmt
2023-10-05 18:46:13 +01:00
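
The reference semantics the fixed kernel must match, in the 2-D, dim = 1 case: the output's selected dim has ids.len() entries, which can differ from the src size along that dim; that is the case the kernel previously got wrong (illustrative plain Rust):

```rust
// out[i][k] = src[i][ids[k]]
fn index_select_dim1(src: &[Vec<f64>], ids: &[usize]) -> Vec<Vec<f64>> {
    src.iter()
        .map(|row| ids.iter().map(|&j| row[j]).collect())
        .collect()
}

fn main() {
    let src = vec![vec![10.0, 11.0, 12.0], vec![20.0, 21.0, 22.0]];
    println!("{:?}", index_select_dim1(&src, &[2, 0])); // [[12, 10], [22, 20]]
}
```
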
c18a856e76 Add the rounding operators. (#1030)
* Add the rounding operators.

* Avoid tracking gradients for the rounding operations.

* Add some rounding tests.
2023-10-04 17:58:44 +01:00
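
Two notes on the rounding commits, sketched below: round_to(x, n) presumably rounds to n decimal places, and rounding is piecewise constant, so its derivative is zero almost everywhere, which would explain skipping gradient tracking for these ops:

```rust
// Hypothetical round-to-n-decimals helper, not the actual candle method.
fn round_to(x: f64, digits: i32) -> f64 {
    let scale = 10f64.powi(digits);
    (x * scale).round() / scale
}

fn main() {
    println!("{}", round_to(3.14159, 2)); // 3.14
}
```
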
043cc25766 Fix for the index-select cuda setup. (#1022)
* Fix for index-select.

* Better fix + add some testing.
2023-10-03 10:21:46 +01:00
cddfc3944c Add the q8k vec-dot multiplication. (#1019) 2023-10-02 21:53:34 +01:00
089fc3b584 Improve the quantized whisper setup. (#1018)
* Improve the quantized whisper setup.

* Fix the config file paths.

* Use the standard matmul where possible.
2023-10-02 17:17:46 +01:00
263a172202 Improve the testing of the optimized quantized vec-dot ops (#1016)
* Expose the unopt functions for testing.

* Better testing of the optimized quantized computations.
2023-10-02 09:50:43 +01:00
4e55aaa51f Simd128 version of the q2k-q8k vecdot product. (#1011)
* Sketch the simd128 version of q2k vecdot.

* Use a single accumulator.

* Simdify the q2k-q8k vecdot product.

* Cosmetic change.
2023-09-30 20:12:41 +01:00
fc59bc31bf fix: add missing gpu fill_* (#996) 2023-09-29 15:49:30 +01:00
8601537e31 Add slice-scatter. (#927)
* Add slice-scatter.

* Add the op.

* Make transpose be a no-op when the dimensions are identical.

* Add the backprop.

* And add some gradient test.
2023-09-22 12:18:16 +01:00
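
Slice-scatter semantics in the 1-D case, as a hedged sketch (not the op's actual signature):

```rust
// Copy `src` into a copy of `dst` starting at `start`, leaving the rest
// of `dst` untouched.
fn slice_scatter(dst: &[f64], src: &[f64], start: usize) -> Vec<f64> {
    let mut out = dst.to_vec();
    out[start..start + src.len()].copy_from_slice(src);
    out
}

fn main() {
    let dst = vec![0.0; 5];
    println!("{:?}", slice_scatter(&dst, &[1.0, 2.0], 2)); // [0, 0, 1, 2, 0]
}
```
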
7b26e513f1 Add the erf function. (#917) 2023-09-21 06:19:10 +01:00
d7e48234d4 Add an erf based gelu op (#900)
* Erf based gelu.

* Add the erf backed gelu.

* Test the new gelu op (which is not gelu_new).
2023-09-19 19:54:28 +01:00
18d3c803a8 Scalar support in minimum/maximum. (#832)
* Scalar support in minimum/maximum.

* Add a clamp method to tensors.
2023-09-13 08:24:58 +01:00
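
With scalar minimum/maximum available, clamp follows directly: clamp(x, lo, hi) = minimum(maximum(x, lo), hi). A scalar sketch:

```rust
fn clamp(x: f64, lo: f64, hi: f64) -> f64 {
    x.max(lo).min(hi)
}

fn main() {
    assert_eq!(clamp(5.0, 0.0, 1.0), 1.0);
    assert_eq!(clamp(-3.0, 0.0, 1.0), 0.0);
    assert_eq!(clamp(0.5, 0.0, 1.0), 0.5);
}
```
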
258ac32c38 Fix cuda randn when generating an odd number of values. (#793) 2023-09-09 18:44:21 +01:00
ad8a62dbf5 Add tanh. (#675)
* Add tanh.

* Use tanh in the lstm block.

* Add a test for tanh forward and backward passes.
2023-08-30 13:54:50 +01:00
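
For reference, the tanh derivative can be expressed through the forward output, d/dx tanh(x) = 1 - tanh(x)^2, which is convenient for the backward pass (plain Rust sketch):

```rust
// Returns (tanh(x), d/dx tanh(x)); the gradient reuses the forward output.
fn tanh_fwd_bwd(x: f64) -> (f64, f64) {
    let y = x.tanh();
    (y, 1.0 - y * y)
}

fn main() {
    let (y, dy) = tanh_fwd_bwd(0.5);
    println!("tanh(0.5) = {y:.4}, grad = {dy:.4}");
}
```
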
393690387f Support dilation in conv-transpose2d. (#671) 2023-08-30 09:22:00 +01:00
59b731de99 Add the powf op. (#664)
* Add the powf op.

* Cuda kernels and backprop.

* Add a test.
2023-08-29 20:48:18 +01:00