candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-16 10:38:54 +00:00

Author	SHA1	Message	Date
Laurent Mazare	acc5bd335f	Cuda cleanup. (#2880 ) * Cuda cleanup. * More fixes.	2025-04-11 21:43:35 +02:00
Laurent Mazare	338f6a102e	Clippy 1.86 fixes for cuda. (#2868 )	2025-04-05 15:45:35 +02:00
Laurent Mazare	d9904a3baf	Update to cudarc 0.14 (breaking change). (#2858 ) * Start updating to cudarc 0.14. * Adapt a couple more things. * And a couple more fixes. * More tweaks. * And a couple more fixes. * Bump the major version number. * Proper module system for the cuda kernels. * Proper ptx loading. * Launch the sort kernel. * Custom op. * Start using the builder pattern. * More builder. * More builder. * Get candle-core to compile. * Get the tests to pass. * Get candle-nn to work too. * Support for custom cuda functions. * cudnn fixes. * Get flash attn to run. * Switch the crate versions to be alpha. * Bump the ug dependency.	2025-04-03 09:12:19 +02:00
Laurent Mazare	b52c2c6050	Clippy fixes for the cuda feature. (#2650 )	2024-11-29 09:01:34 +01:00
Laurent Mazare	def4c6cdee	Cuda quantized mmv bugfix. (#2526 )	2024-10-01 12:57:55 +02:00
Laurent Mazare	724650446c	Yet another cuda qmm padding fix. (#2509 )	2024-09-30 21:53:30 +02:00
Laurent Mazare	eb26e2467e	Add the cuda dequantize f16 kernels. (#2137 ) * Add the cuda dequantize f16 kernels. * Expose the cuda kernels. * Add some testing + fix. * Test the other cases too. * A few more tests. * Add an environment variable to enable the dequantize f16 + matmul behavior.	2024-04-28 20:05:05 +02:00
Laurent Mazare	8de0ce6cba	Add more QMMV cuda kernels. (#2077 ) * Add more QMMV cuda kernels. * Enable the new kernels. * Adapt the testing.	2024-04-18 08:36:43 +02:00
Laurent Mazare	2817643db9	Add the mmv kernels for small batch sizes. (#2075 ) * Add the mmv kernels for smaller sizes. * Support more mmv kernels. * Use the new kernels. * Fix the call. * Silly fix. * Improve the testing. * Fix for dmmv. * Add another dedicated test for the batching mmv.	2024-04-16 21:30:51 +02:00
Laurent Mazare	f135b7963d	Fix for the batch dim in the quantized matmul example. (#2073 ) * Fix for the batch dim in the quantized matmul example. * Enable more tests on cuda. * Add a test for qmm with a batch. * Fix the zeros-dim test on metal.	2024-04-15 20:00:28 +02:00
Laurent Mazare	8ad822a983	Add a function to clear the KV cache in falcon. (#2066 ) * Add a function to clear the KV cache in falcon. * Clippy.	2024-04-15 09:29:25 +02:00
Laurent Mazare	f7d5bf5b97	Faster kernels for quantized matmul on cuda (#2060 ) * Hook the quantized matmul cuda kernels. * Add a (currently broken) test. * Kernel fixes. * Fix by transposing the rhs matrix. * Add the q4-1 kernels. * Proper block sizes. * More details in the tests.	2024-04-15 08:32:47 +02:00
Laurent Mazare	318cb82f16	Quantized cuda tweaks. (#1981 ) * Quantized cuda tweaks. * Add some safety checks. * Factorize the dequantization bits.	2024-04-01 11:06:42 +02:00
Laurent Mazare	c7557b65dc	Switch the default to using the faster kernels. (#1978 ) * Switch the default to using the faster kernels. * Add the force-dmmv flag.	2024-04-01 10:00:11 +02:00
Laurent Mazare	cd29c7ccd4	More ggml cuda kernels (#1977 ) * Add more cuda kernels for quantized matmul. * Add the vec-dot bits. * Expose the quantized matmul-vec kernels. * Also include the quantize-q8-1 kernel. * Glue code for the q8-1 quantization. * mm-vec product via q8-1 quantization. * Add a test. * Add a mm test. * Get the test to return some sensible results. * Also test dmmv. * Fix the launch params. * Allow for tweaking the force_dmmv parameter while it's experimental.	2024-04-01 00:15:48 +02:00
Laurent Mazare	df5f69444e	Properly handle the batch dimension in cuda quantized matmul. (#1832 )	2024-03-10 20:23:43 +01:00
laurent	2c95b7394a	Handle Q5_0 and Q5_1 quants in cuda.	2024-02-29 10:54:01 +01:00
Laurent Mazare	6400e1b0a0	Fix the block size for some cuda kernels. (#1767 )	2024-02-27 14:08:33 +01:00
Laurent Mazare	badf886583	Cuda kernel for dequantizing q8k. (#1760 ) * Cuda kernel for dequantizing q8k. * Clippy lints.	2024-02-26 08:42:44 +01:00
Laurent Mazare	2f22afd80e	Cuda acceleration for quantized model. (#1754 ) * Boilerplate for the quantized cuda support. * More basic cuda support. * More cuda quantization (quantize on cpu for now). * Add the dequantization bit. * Start adding some dedicated cuda kernels from llama.cpp. * Move the kernel code. * Start interfacing with the kernel. * Tweak the kernel launch params. * Bugfix for quantized metal. * Fix some clippy lints. * Tweak the launch parameters. * Tweak cuda basics to perform a quantized matmul. * Perform the dequantization on the cpu + use cublas for matmul. * Add the dequantization kernel. * Test the qmatmul. * More kernels. * Matmul-vec kernel. * Add a couple kernels. * More dequantization kernels.	2024-02-25 18:11:47 +01:00

20 Commits