candle/cuda_kernels.rs at 36508a2c935cbe40584b85cc95b68532dba43b2d

huggingface/candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-16 10:38:54 +00:00

Laurent Mazare 2f22afd80e Cuda acceleration for quantized model. (#1754)

* Boilerplate for the quantized cuda support.

* More basic cuda support.

* More cuda quantization (quantize on cpu for now).

* Add the dequantization bit.

* Start adding some dedicated cuda kernels from llama.cpp.

* Move the kernel code.

* Start interfacing with the kernel.

* Tweak the kernel launch params.

* Bugfix for quantized metal.

* Fix some clippy lints.

* Tweak the launch parameters.

* Tweak cuda basics to perform a quantized matmul.

* Perform the dequantization on the cpu + use cublas for matmul.

* Add the dequantization kernel.

* Test the qmatmul.

* More kernels.

* Matmul-vec kernel.

* Add a couple kernels.

* More dequantization kernels.

2024-02-25 18:11:47 +01:00

2 lines

102 B

Rust

pub const LAYERNORM_KERNELS: &str = include_str!(concat!(env!("OUT_DIR"), "/layernorm_kernels.ptx"));
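
The constant above only compiles if a build script has already placed `layernorm_kernels.ptx` in Cargo's `OUT_DIR`, because `env!("OUT_DIR")` is resolved at compile time and `include_str!` embeds the file's text into the binary. The following is a minimal sketch of the producing side, not the repository's actual build script: `write_ptx` is a hypothetical helper, and the PTX text is assumed to come from some earlier compilation step (for example an `nvcc` invocation) that is not shown here.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of candle): writes already-compiled PTX text
/// to `<out_dir>/layernorm_kernels.ptx` and returns the resulting path.
///
/// In a real `build.rs`, `out_dir` would be the directory named by the
/// `OUT_DIR` environment variable that Cargo sets for build scripts.
pub fn write_ptx(out_dir: &Path, ptx: &str) -> io::Result<PathBuf> {
    // The file name must match the one used in the `include_str!` call.
    let path = out_dir.join("layernorm_kernels.ptx");
    fs::write(&path, ptx)?;
    Ok(path)
}
```

From a build script's `main`, this could be called as `write_ptx(Path::new(&std::env::var("OUT_DIR").unwrap()), &ptx)`, after which the `include_str!` line resolves to the written file.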