Commit Graph

46 Commits

SHA1 Message Date
41915184bb Bugfix for dequantizing q5k layers. (#1569) 2024-01-11 23:15:11 +01:00
ef33df7ae2 No need for the even constraint on vecdot-q40-q80. (#1202) 2023-10-28 07:23:59 +01:00
11d3687cc6 Simd128 optimized q8k vecdot. (#1026) 2023-10-03 15:29:48 +01:00
dac73edb34 AVX optimized q8k vecdot. (#1024) 2023-10-03 12:10:58 +01:00
7670fe7d1f neon optimized q8k multiplication. (#1021)
* neon optimized q8k multiplication.

* Bugfixes.

* simdification.
2023-10-02 23:26:34 +01:00
cddfc3944c Add the q8k vec-dot multiplication. (#1019) 2023-10-02 21:53:34 +01:00
263a172202 Improve the testing of the optimized quantized vec-dot ops (#1016)
* Expose the unopt functions for testing.

* Better testing of the optimized quantized computations.
2023-10-02 09:50:43 +01:00
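
The pattern behind this commit is differential testing: the scalar reference stays exposed so every SIMD path can be checked against it. A minimal sketch of that pattern, with `vec_dot_opt` / `vec_dot_unopt` as illustrative stand-ins rather than the real candle function names:

```rust
// Hedged sketch: run the optimized kernel and the exposed unoptimized
// reference on the same inputs and bound the drift between them.
fn check_against_reference(
    vec_dot_opt: impl Fn(&[f32], &[f32]) -> f32,
    vec_dot_unopt: impl Fn(&[f32], &[f32]) -> f32,
) {
    // Deterministic pseudo-random inputs in [-0.5, 0.5).
    let xs: Vec<f32> = (0..256).map(|i| (i * 37 % 97) as f32 / 97.0 - 0.5).collect();
    let ys: Vec<f32> = (0..256).map(|i| (i * 53 % 89) as f32 / 89.0 - 0.5).collect();
    let fast = vec_dot_opt(&xs, &ys);
    let slow = vec_dot_unopt(&xs, &ys);
    // SIMD kernels reassociate the sum, so exact equality is too strict.
    assert!((fast - slow).abs() < 1e-4, "optimized {fast} vs reference {slow}");
}
```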
5130a7da32 Simd128 version of q6k vec-dot. (#1015)
* Add a specific function for the simd128 q6k vec-dot.

* Simdification.

* More simdification.
2023-10-01 19:44:12 +01:00
4e55aaa51f Simd128 version of the q2k-q8k vecdot product. (#1011)
* Sketch the simd128 version of q2k vecdot.

* Use a single accumulator.

* Simdify the q2k-q8k vecdot product.

* Cosmetic change.
2023-09-30 20:12:41 +01:00
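
The "single accumulator" step refers to keeping one running SIMD register through the loop rather than several partial sums merged at the end. A minimal simd128 sketch of the pattern (not the actual q2k kernel; assumes a build with `-C target-feature=+simd128`):

```rust
// Minimal sketch of the single-accumulator pattern on wasm simd128.
#[cfg(target_arch = "wasm32")]
unsafe fn sum_simd128(xs: &[f32]) -> f32 {
    use core::arch::wasm32::*;
    let mut acc = f32x4_splat(0.0);
    for chunk in xs.chunks_exact(4) {
        acc = f32x4_add(acc, v128_load(chunk.as_ptr() as *const v128));
    }
    // Horizontal reduction of the one accumulator; remainder elements,
    // if any, would be handled scalar-side.
    f32x4_extract_lane::<0>(acc)
        + f32x4_extract_lane::<1>(acc)
        + f32x4_extract_lane::<2>(acc)
        + f32x4_extract_lane::<3>(acc)
}
```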
25657804ef Simd128 q2k vecdot (#982)
* Sketch the simd128 version of q2k vecdot.

* Use a single accumulator.
2023-09-28 12:16:35 +01:00
9cb110c44c Sketch a simd128 optimized q4k vecdot. (#977)
* Sketch a simd128 optimized q4k vecdot.

* Simdify.

* More quantization optimizations.

* Again more simdification.

* Simdify the splitting loop.
2023-09-27 20:19:38 +01:00
667f01c173 Simd128 vec-dot for q4_0. (#974)
* Simd128 vec-dot for q4_0.

* Bugfix.

* Add wasm tests.

* Bugfix for the q40 vecdot.

* More quantization tests.
2023-09-27 14:15:30 +01:00
e59784e353 simd128 optimized q8_0 vecdot (#972)
* wasm/simd128 version of the quantized q8_0 vecdot.

* Add the missing conversion.
2023-09-27 11:03:20 +01:00
98172d46fa Fix some errors about BlockQ8_1 (#776)
* use int8 type instead of uint8 for BlockQ8_1.qs

The uint8 type of BlockQ8_1.qs causes large errors for negative weights
Ref: ebc96086af/ggml.c (L904)

Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>

* fix sum error in vec_dot of BlockQ4_1

Ref: ebc96086af/ggml.c (L2840)

Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>

* fix sum error in vec_dot of BlockQ5_1

Ref: ebc96086af/ggml.c (L3490)

Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>

---------

Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
2023-09-08 13:29:40 +01:00
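
The signedness fix matters because q8_1 quants are signed; read back through `u8`, a weight quantized to -1 becomes 255 and corrupts every dot product it enters. A sketch of the block layout, assuming the ggml-style q8_1 block the commit links to (field names illustrative):

```rust
use half::f16;

const QK8_1: usize = 32;

// Illustrative q8_1 block following the ggml layout referenced above.
#[repr(C)]
struct BlockQ8_1 {
    d: f16,          // per-block scale
    s: f16,          // d * sum(qs), cached for the q4_1/q5_1 dot products
    qs: [i8; QK8_1], // quants must be signed: -1 stored through u8
                     // round-trips as 255
}
```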
a1a5ab8b0a Neon optimized vecdot (#666)
* Q5k vecdot.

* Add the q3k vecdot.

* Q2k vecdot.

* Move the quantized model to its own file.
2023-08-29 22:28:46 +01:00
ee8bb1bde1 Add avx implementations of q2k, q3k and q5k vec-dot functions (#654)
* `q2k` avx implementation

* `q3k` avx implementation

* `q5k` avx implementation

* `avx` make masks constant

* clippy stuff
2023-08-29 13:35:56 +01:00
4b8d57ba15 AVX version of the q4k vecdot. (#651) 2023-08-29 09:41:17 +01:00
1da71a5da1 Neon optimized version of the q4k vecdot product. (#632) 2023-08-27 21:30:47 +01:00
a8b39dd7b7 Fix for q5_1 quantization. (#617)
* Fix for q5_1 quantization.

* Fix some typos.
2023-08-27 08:31:18 +01:00
fa0d75b18d Quantization tests + fix some issues. (#616) 2023-08-27 08:17:38 +01:00
28658054ff More missing quantized bits. (#615)
* Q4_1 support.

* Add Q5_1 quantization.

* Tweak.
2023-08-27 07:52:26 +01:00
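
For context on what q4_1 adds over q4_0: a per-block minimum alongside the scale, so dequantization is `x = d * q + m` instead of `x = d * q`. A hedged sketch of a ggml-style q4_1 block and its dequantization (layout assumed from ggml, not copied from candle):

```rust
use half::f16;

const QK4_1: usize = 32;

// Illustrative ggml-style q4_1 block: 32 values packed as 4-bit nibbles.
#[repr(C)]
struct BlockQ4_1 {
    d: f16,              // scale
    m: f16,              // minimum
    qs: [u8; QK4_1 / 2], // two 4-bit quants per byte
}

fn dequantize_q4_1(block: &BlockQ4_1, out: &mut [f32; QK4_1]) {
    let (d, m) = (block.d.to_f32(), block.m.to_f32());
    for (i, &byte) in block.qs.iter().enumerate() {
        out[i] = d * (byte & 0x0f) as f32 + m;           // low nibble: first half
        out[i + QK4_1 / 2] = d * (byte >> 4) as f32 + m; // high nibble: second half
    }
}
```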
f704e39761 Missing quants ops (#611)
* Another transmute tweak.

* Changelog tweak.

* Add some missing quantized ops.
2023-08-26 20:09:04 +01:00
fdf15f0e05 Another transmute tweak. (#610)
* Another transmute tweak.

* Changelog tweak.
2023-08-26 13:00:24 +01:00
06b37ea7ad Avoid using tmp values. (#609) 2023-08-26 12:28:28 +01:00
c72eb3d75b Add reference implementations for q4k and q5k (#586)
* add `q2k` vec-dot

* `q3k` vec-dot + quantization bugfix

* `q4k` vec-dot

* `q5k` vec-dot

* Validate against GGML unit test results.

* Remove some more `transmutes`
2023-08-26 12:07:54 +01:00
6559eae72c Avoid some transmutes. (#607) 2023-08-25 18:21:37 +01:00
9c8d6dbc2a Neon intrinsics for the q8_0 vecdot. (#604)
* Neon intrinsics for the q8_0 vecdot.

* Get the tests to run with accelerate (with some numerical error failures).
2023-08-25 14:42:18 +01:00
afc10a3232 AVX version for the q8-0 multiplications. (#598) 2023-08-25 10:14:49 +01:00
c093b03d51 Generic implementation of vecdot for q80. (#596)
* Generic implementation of vecdot for q80.

* Add support for code-llama 7b.

* Support more code-llama.
2023-08-25 09:04:05 +01:00
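
A "generic" vec-dot here is the plain scalar fallback used when no SIMD path is compiled in. A minimal sketch for ggml-style q8_0 blocks (an f16 scale plus 32 signed byte quants; layout assumed, not candle's exact code):

```rust
use half::f16;

const QK8_0: usize = 32;

#[repr(C)]
struct BlockQ8_0 {
    d: f16,          // per-block scale
    qs: [i8; QK8_0], // 32 signed quants
}

// Scalar reference dot product over matching q8_0 blocks:
// sum over blocks of d_x * d_y * <qs_x, qs_y>.
fn vec_dot_q8_0(xs: &[BlockQ8_0], ys: &[BlockQ8_0]) -> f32 {
    xs.iter()
        .zip(ys.iter())
        .map(|(x, y)| {
            let isum: i32 = x
                .qs
                .iter()
                .zip(y.qs.iter())
                .map(|(&a, &b)| a as i32 * b as i32)
                .sum();
            x.d.to_f32() * y.d.to_f32() * isum as f32
        })
        .sum()
}
```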
d2f42ab086 Reference implementations of q2k and q3k vec-dot functions (#580)
* add `q2k` vec-dot

* `q3k` vec-dot + quantization bugfix
2023-08-24 12:35:54 +01:00
07067b01dc Avoid some mutable variables (take 2). (#554)
* Avoid some mutable variables (take 2).

* Fix.
2023-08-22 18:51:20 +01:00
ec665acad7 Revert "Avoid some mut in quantized functions. (#550)" (#552)
This reverts commit cf27b9b636.
2023-08-22 15:57:46 +01:00
cf27b9b636 Avoid some mut in quantized functions. (#550)
* Avoid a couple more 'let mut'.

* Tweaks.
2023-08-22 15:44:26 +01:00
352383cbc3 Add quantization support for q2k, q3k, q4k and q5k (#524)
* first q2 implementation

* First Q4K and Q5K implementations

* fix `q2k` and `q5k`

* Some first cleanups

* run `clippy` on tests

* finally implement `q3k`

* deactivate `q3k` test on macos

* also disable the test on linux

* Fix floating bits in `q3k` dequantization

* Refactoring pass + reorder quants in file

* `fmt`

* Re-add `src` asserts and redefine `dst`
2023-08-22 15:04:55 +01:00
82410995a2 Neon support for quantization. (#519)
* Skeleton files for neon support of quantization.

* SIMD version for q4 vecdot.

* Also simdify the q6k multiplication.
2023-08-19 22:07:29 +01:00
109e95b189 Basic qmatmul parallelization (#492)
* Basic `par_iter` parallelization

* Pass errors up

* Disable `avx` for x86 macs
2023-08-18 09:45:37 +01:00
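
The parallelization works because each output row of the quantized matmul is an independent vec-dot, so rayon's `par_iter` applies with no shared mutable state. A hedged sketch of the row-parallel pattern (illustrative, not the candle implementation):

```rust
use rayon::prelude::*;

// Row-parallel matrix-vector product: dst[r] = <lhs_row_r, rhs>.
// Each output element is an independent dot product, so rows can be
// dispatched across threads without locks.
fn qmatvec(lhs_rows: &[Vec<f32>], rhs: &[f32]) -> Vec<f32> {
    lhs_rows
        .par_iter()
        .map(|row| row.iter().zip(rhs).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}
```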
557b2c28dd Q6K quantization (#495)
* Print the detected arch options.

* Add the q6k quantization.

* Add a currently broken test.

* Bugfix.

* Bugfix.

* Another bugfix.

* Another bugfix + get the test to work.
2023-08-17 22:22:57 +01:00
fc81af1712 AVX version of the q6k vec-dot. (#493)
* AVX version of the q6k vec-dot.

* Use the avx sum.
2023-08-17 20:13:18 +01:00
d99cac3ec3 Move the avx specific bits to a separate file. (#481) 2023-08-17 09:01:06 +01:00
306c8eee7a AVX version of the vecdot for q4_0. (#474)
* AVX version of the vecdot for q4_0.

* Tweak the avx bits.

* Add a qmatmul benchmark.

* Fix the quantized test.
2023-08-17 07:03:32 +01:00
098909de40 Add vecdot for q6k-q8k. (#476)
* Add vecdot for q6k-q8k.

* Add some testing for q8k.

* Use QMatMul for the output layer.
2023-08-16 20:59:40 +01:00
3bedba1fce Use a zipped iterator. (#475)
* Use a zipped iterator.

* Add to/from float for q8k.
2023-08-16 20:15:11 +01:00
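
Taken together, to/from float plus vec-dot is the whole surface a quantized type exposes. Roughly this shape of trait, sketched here for orientation rather than as candle's exact definition:

```rust
// Rough shape of the interface these commits build up: each quantized
// block type needs dequantize, quantize, and a dot product against its
// companion activation format (q8k for the k-quants, which is why
// to/from float for q8k lands here).
trait QuantBlock: Sized {
    const BLOCK_SIZE: usize;

    /// Dequantize: ys.len() == xs.len() * BLOCK_SIZE.
    fn to_float(xs: &[Self], ys: &mut [f32]);

    /// Quantize xs into pre-allocated blocks.
    fn from_float(xs: &[f32], ys: &mut [Self]);

    /// Dot product over n values; in the real code ys would be the
    /// q8k-quantized activations (simplified here to the same type).
    fn vec_dot(n: usize, xs: &[Self], ys: &[Self]) -> f32;
}
```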
a9101700b6 Add a kv-cache to the quantized llama example. (#466)
* Add a kv-cache to the quantized llama example.

* Also print the prompt.

* Bugfix in q6k dequantizing.

* Another bugfix.
2023-08-16 14:28:42 +01:00
3071134788 Get the ggml based llama to generate some text. (#464)
* Add more stats to the ggml example.

* Build a quantized model from the file content.

* Move the tensor retrieval in the main crate.

* Start adding the forward pass.

* Add more to the forward pass of the quantized llama.

* Apply the attention layers.

* Add the sampling loop.

* Get the sampling loop to work.

* Minor tweak.

* Add a quantize/dequantize test.

* Bugfix.

* Add a comment + swap the order.

* Bugfixes.
2023-08-16 12:41:07 +01:00
b8263aa15c Quantized support for f16 and f32 (#457)
* Add f32 as a quantized type.

* Add f16 as a quantized type too.
2023-08-15 21:09:37 +01:00
e68b2accb4 Split out the quantized file. (#456) 2023-08-15 20:26:27 +01:00