Add quantization support for q2k, q3k, q4k and q5k (#524)

* first q2 implementation

* First Q4K and Q5K implementations

* fix `q2k` and `q5k`

* Some first cleanups

* run `clippy` on tests

* finally implement `q3k`

* deactivate `q3k` test on macos

* also disable the test on linux

* Fix floating bits in `q3k` dequantization

* Refactoring pass + reorder quants in file

* `fmt`

* Re-add `src` asserts and redefine `dst`
Lukas Kreussel
2023-08-22 16:04:55 +02:00
committed by GitHub
parent 9bc811a247
commit 352383cbc3
4 changed files with 1111 additions and 456 deletions


@@ -6,6 +6,7 @@ pub mod ggml_file;
pub mod k_quants;
#[cfg(target_feature = "neon")]
pub mod neon;
pub mod utils;
pub use k_quants::GgmlType;