More ggml cuda kernels (#1977)

* Add more cuda kernels for quantized matmul.

* Add the vec-dot bits.

* Expose the quantized matmul-vec kernels.

* Also include the quantize-q8-1 kernel.

* Glue code for the q8-1 quantization.

* Matmul-vec product via the q8-1 quantization.
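The bullets above revolve around ggml's Q8_1 activation format: a block of 32 floats is quantized to int8 with a per-block scale `d`, and the block also carries `s = d * sum(q)` so that vec-dot kernels against asymmetric weight formats can fold a per-block minimum into the accumulation cheaply. The sketch below is an illustrative Python model of that scheme, not the CUDA kernel from this commit; the function names are made up for the example.

```python
QK8_1 = 32  # block size of ggml's Q8_1 format

def quantize_q8_1(xs):
    """Quantize one 32-float block to (d, s, qs), Q8_1-style (illustrative)."""
    assert len(xs) == QK8_1
    amax = max(abs(x) for x in xs)      # absolute max sets the scale
    d = amax / 127.0                    # per-block scale
    inv = 1.0 / d if d > 0 else 0.0
    qs = [round(x * inv) for x in xs]   # int8 quants in [-127, 127]
    s = d * sum(qs)                     # precomputed sum term for vec-dot
    return d, s, qs

def dequantize_q8_1(d, qs):
    """Recover approximate floats from the scale and int8 quants."""
    return [d * q for q in qs]
```

With this layout, a quantized dot product between a Q8_1 activation block and an int8 weight block reduces to an integer multiply-accumulate rescaled by the two block scales, which is what the vec-dot kernels exploit.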

* Add a test.

* Add a mm test.

* Get the test to return some sensible results.

* Also test dmmv.

* Fix the launch params.

* Allow for tweaking the force_dmmv parameter while it's experimental.
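Since the quantized mat-vec path is experimental, the commit keeps the older dequantize-mul-mat-vec (dmmv) kernels reachable behind a `force_dmmv` switch. The snippet below is a hypothetical model of that dispatch, with simple CPU stand-ins for the two kernel families; none of these names are candle's actual API.

```python
FORCE_DMMV = False  # hypothetical global mirroring the tweakable parameter

def set_force_dmmv(flag):
    """Toggle the fallback to the older dmmv kernels (illustrative)."""
    global FORCE_DMMV
    FORCE_DMMV = flag

def _dmmv(rows, v):
    # Stand-in for dequantize-mul-mat-vec: dot in f32 after dequantization.
    return [sum(a * b for a, b in zip(row, v)) for row in rows]

def _mmvq(rows, v):
    # Stand-in for the new path: quantize the activation vector to int8
    # with one scale, accumulate in the quantized domain, then rescale.
    amax = max(abs(x) for x in v) or 1.0
    d = amax / 127.0
    qv = [round(x / d) for x in v]
    return [d * sum(a * q for a, q in zip(row, qv)) for row in rows]

def quantized_matmul_vec(rows, v):
    return _dmmv(rows, v) if FORCE_DMMV else _mmvq(rows, v)
```

The two paths should agree up to quantization error on the activations, which is the kind of property the mm and dmmv tests in this commit can check.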
Author: Laurent Mazare
Date: 2024-04-01 00:15:48 +02:00
Committed by: GitHub
Parent: f9954b73ba
Commit: cd29c7ccd4
3 changed files with 1169 additions and 82 deletions
