candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-15 18:28:24 +00:00


Laurent Mazare 2f22afd80e Cuda acceleration for quantized model. (#1754)

* Boilerplate for the quantized cuda support.

* More basic cuda support.

* More cuda quantization (quantize on cpu for now).

* Add the dequantization bit.

* Start adding some dedicated cuda kernels from llama.cpp.

* Move the kernel code.

* Start interfacing with the kernel.

* Tweak the kernel launch params.

* Bugfix for quantized metal.

* Fix some clippy lints.

* Tweak the launch parameters.

* Tweak cuda basics to perform a quantized matmul.

* Perform the dequantization on the cpu + use cublas for matmul.

* Add the dequantization kernel.

* Test the qmatmul.

* More kernels.

* Matmul-vec kernel.

* Add a couple kernels.

* More dequantization kernels.

2024-02-25 18:11:47 +01:00

kernels

Wrapping code to call the custom op. (#225)

2023-07-23 11:31:17 +01:00

cuda_kernels.rs

Cuda acceleration for quantized model. (#1754)

2024-02-25 18:11:47 +01:00

main.rs

Use bindgen-cuda for the custom-kernel example. (#1536)

2024-01-07 17:18:46 +01:00