candle/candle-examples/examples
Laurent Mazare 2f22afd80e Cuda acceleration for quantized model. (#1754)
* Boilerplate for the quantized cuda support.

* More basic cuda support.

* More cuda quantization (quantize on cpu for now).

* Add the dequantization bit.

* Start adding some dedicated cuda kernels from llama.cpp.

* Move the kernel code.

* Start interfacing with the kernel.

* Tweak the kernel launch params.

* Bugfix for quantized metal.

* Fix some clippy lints.

* Tweak the launch parameters.

* Tweak cuda basics to perform a quantized matmul.

* Perform the dequantization on the cpu + use cublas for matmul.

* Add the dequantization kernel.

* Test the qmatmul.

* More kernels.

* Matmul-vec kernel.

* Add a couple kernels.

* More dequantization kernels.
2024-02-25 18:11:47 +01:00