candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-19 11:56:45 +00:00

Author	SHA1	Message	Date
Nicolas Patry	b2db5adf82	Bad code removal.	2024-01-15 17:43:00 +01:00
Nicolas Patry	9c4b4f0da0	Metal quantized modifications proposal. - Add a device param, wherever needed. - Create new QMetal storage thing that implements QuantizedType. - Update everywhere needed. Fix Python. Fixing examples. Fix: fmt + clippy + stub. Moving everything around. Only missing the actual implems. Fixing everything + adding dequantized kernels. More work. Fixing matmul. Fmt + Clippy Some clippy fixes. Working state. Q2K Metal -> Bugged (also present in GGML). Q4K CPU -> Bugged (present previously, new test catch it). Q5K CPU -> Bugged (present previously). Q8_1 Both -> Never really implemented it seems Q8K metal -> Never implemented in metal Fixing Q2K bug (present in ggml).	2024-01-15 17:42:58 +01:00
Nicolas Patry	30313c3081	Moving to a proper build crate `bindgen_cuda`. (#1531 ) * Moving to a proper build crate `bindgen_cuda`. * Fmt.	2024-01-07 12:29:24 +01:00
Laurent Mazare	d35f0a1376	Bump the crate version to 0.3.3. (#1490 )	2023-12-28 13:38:30 +01:00
Laurent Mazare	94817dac56	Bump the crate version to 0.3.2. (#1452 )	2023-12-17 05:34:53 -06:00
Laurent Mazare	a209ce8ceb	Update for 0.3.1. (#1324 )	2023-11-11 18:48:52 +00:00
Laurent Mazare	2fe24ac5b1	Rework the cuda casting bits. (#1112 )	2023-10-17 09:44:51 +01:00
OlivierDehaene	75629981bc	feat: parse Cuda compute cap from env (#1066 ) * feat: add support for multiple compute caps * Revert to one compute cap * fmt * fix	2023-10-16 15:37:38 +01:00
Gonzalo	8f7973958c	fix: fix index_select cuda kernel for src target dim different than ids dim when selecting dim > 0 (#1037 ) * fix: fix index_select cuda kernel for src target dim different than ids dim when selecting dim > 0 * cargo fmt	2023-10-05 18:46:13 +01:00
Laurent Mazare	c18a856e76	Add the rounding operators. (#1030 ) * Add the rounding operators. * Avoid tracking gradients for the rounding operations. * Add some rounding tests.	2023-10-04 17:58:44 +01:00
Laurent Mazare	096dee7073	Bump the version to 0.3.0. (#1014 ) * Bump the version to 0.3.0. * Changelog update.	2023-10-01 13:51:57 +01:00
Gonzalo	fc59bc31bf	fix: add missing gpu fill_* (#996 )	2023-09-29 15:49:30 +01:00
Laurent Mazare	5e1c595e00	Optimize the index-select cuda kernel. (#976 )	2023-09-28 09:05:29 +01:00
Laurent Mazare	402ddcfcb4	Add the missing kernel. (#955 )	2023-09-24 17:21:37 +01:00
Gonzalo	a96878f235	cuda cast i64 (#925 )	2023-09-21 19:52:39 +01:00
Laurent Mazare	d7e48234d4	Add an erf based gelu op (#900 ) * Erf based gelu. * Add the erf backed gelu. * Test the new gelu op (which is not gelu_new).	2023-09-19 19:54:28 +01:00
Laurent Mazare	7dd8e12472	Bump the crate versions to v0.2.3. (#886 ) * Bump the crate version. * Also update the python bindings.	2023-09-18 12:14:03 +01:00
Charles Lew	1c09164021	Add `CANDLE_NVCC_CCBIN` support for `candle-kernels`, and eliminate warning. (#836 )	2023-09-13 11:39:22 +01:00
Laurent Mazare	2257f4d475	Bump the crate version + update the changelog. (#822 )	2023-09-12 06:39:24 +01:00
Laurent Mazare	dbd4561416	im2col version of the conv1d kernel. (#815 ) * im2col version of the cuda conv1d kernel. * im2col version of the conv1d cpu kernel.	2023-09-11 14:40:09 +01:00
Laurent Mazare	98d1242b8f	im2col based conv2d (#802 ) * im2col implementation for conv2d. * Fix for the im2col implementation to match the current conv2d. * Small optimization. * Add a cuda kernel. * Handle arbitrary layouts. * Im2Col cuda code.	2023-09-10 21:02:42 +01:00
Laurent Mazare	94c6a8d3d3	Add a dedicated cuda kernel for softmax. (#746 )	2023-09-05 17:53:20 +02:00
Laurent Mazare	ad8a62dbf5	Add tanh. (#675 ) * Add tanh. * Use tanh in the lstm block. * Add a test for tanh forward and backward passes.	2023-08-30 13:54:50 +01:00
Laurent Mazare	618f4e4c78	Add some documentation. (#673 ) * Add some documentation. * Bump the crate version.	2023-08-30 11:54:00 +01:00
Laurent Mazare	393690387f	Support dilation in conv-transpose2d. (#671 )	2023-08-30 09:22:00 +01:00
Laurent Mazare	59b731de99	Add the powf op. (#664 ) * Add the powf op. * Cuda kernels and backprop. * Add a test.	2023-08-29 20:48:18 +01:00
Laurent Mazare	71221559d3	Fix the dilated convolutions. (#659 )	2023-08-29 16:37:42 +01:00
Laurent Mazare	a044907ffc	Dilated convolutions (#657 ) * Add the dilation parameter. * Restore the basic optimizer example. * Dilation support in cudnn. * Use the dilation parameter in the cpu backend. * More dilation support. * No support for dilation in transposed convolutions. * Add dilation to a test. * Remove a print. * Helper function.	2023-08-29 16:12:11 +01:00
Laurent Mazare	037b41c9dc	Cuda conv transpose (#645 ) * Cuda kernel for conv-transpose. * Fix the cuda kernel. * Fix the tests.	2023-08-28 20:58:49 +01:00
Laurent Mazare	a3f97c143d	Bump the crate version + update CHANGELOG. (#628 )	2023-08-27 18:17:11 +01:00
Nicolas Patry	d4e75d5825	Let's keep the dirty code on its own.	2023-08-25 12:01:58 +00:00
Nicolas Patry	be371e827c	Intermediary float cast is necessary for cuda 11.8	2023-08-25 11:54:30 +00:00
Nicolas Patry	1c1e34735e	`static_cast` ?	2023-08-25 11:40:36 +00:00
Nicolas Patry	db8bab8b7a	Different casting ?	2023-08-25 10:49:22 +00:00
Nicolas Patry	bc131b402b	Repairing cast bf16/f16	2023-08-25 10:38:19 +00:00
Laurent Mazare	ca318a6ec7	Add to the cuda example a reproduction of the issue. (#579 ) * Add to the cuda example a reproduction of the issue. * Tweak. * Add a test using non-square matrixes. * Fix the conv2d kernel. * Display the error. * And tweak the comment.	2023-08-24 12:07:31 +01:00
Laurent Mazare	aba1e90797	Add some group parameter to convolutions. (#566 ) * Add some group parameter to convolutions. * Avoid some unnecessary groups checks. * Move the tensor convolution bits. * Properh handling of groups. * Bump the crate version. * And add a changelog.	2023-08-23 12:58:55 +01:00
Laurent Mazare	9a5c7db91a	Add support for i64 (#563 ) * Add the i64 dtype. * Adapt the cuda kernels.	2023-08-23 10:42:19 +01:00
Laurent Mazare	a1812f934f	Add a yolo-v3 example. (#528 ) * Add a couple functions required for yolo. * Add the yolo-v3 example. * Add minimum and maximum. * Use the newly introduced maximum. * Cuda support for min/max + add some testing. * Allow for more tests to work with accelerate. * Fix a typo.	2023-08-20 18:19:37 +01:00
Laurent Mazare	a8f61e66cc	Bump the crates version to 0.1.2. (#522 )	2023-08-20 08:07:07 +01:00
Laurent Mazare	531f23b4d0	Rename vec-dot to vec-ops. (#449 ) * Rename vec-dot to vec-ops. * Also bump the crate version. * Add a currently empty readme.	2023-08-15 10:48:57 +01:00
Laurent Mazare	c84883ecf2	Add a cuda kernel for upsampling. (#441 ) * Add a cuda kernel for upsampling. * Update for the latest tokenizers version.	2023-08-14 13:12:17 +01:00
Laurent Mazare	a094dc503d	Add a cuda kernel for avg-pool2d. (#440 ) * Add a cuda kernel for avg-pool2d. * Avoid running out of bounds. * Finish wiring the avg pool kernel + add some testing. * Support for max-pool + testing.	2023-08-14 12:32:05 +01:00
Laurent Mazare	34f4b3187e	Add a naive conv2d cuda kernel. (#438 ) * Add a naive conv2d cuda kernel. * Proper conv2d support on the rust side. * Conv1d testing on gpu. * Also use the test on gpus. * Fix the clean-ptx target.	2023-08-14 10:34:42 +01:00
Nicolas Patry	4a95d34c83	Compat windows.	2023-08-10 17:46:47 +02:00
Nicolas Patry	66d1c093e0	This is duplicated code on Cuda 12.2. Without it we can compile for 52 (but I get Operation Not supported when actually trying to use those kernels).	2023-08-10 09:20:18 +02:00
Laurent Mazare	e72ba0b9e7	Add the license files. (#335 )	2023-08-07 14:11:27 +01:00
Laurent Mazare	166bfd5847	Add the recip op + use it in stable-diffusion. (#331 ) * Add the recip unary op. * Fix the cuda kernel. * Use the recip op in sigmoid.	2023-08-06 21:14:52 +01:00
Laurent Mazare	4fe8a02f88	Update the repo location. (#305 )	2023-08-02 11:12:18 +01:00
Laurent Mazare	4b3bd79fbd	Remove the embedding ops in favor of index-select. (#299 ) * Remove the embedding ops in favor of index-select. * Also remove the cuda kernels.	2023-08-02 05:42:11 +01:00

78 Commits