41915184bb
Bugfix for dequantizing q5k layers. ( #1569 )
2024-01-11 23:15:11 +01:00
ef33df7ae2
No need for the even constraint on vecdot-q40-q80. ( #1202 )
2023-10-28 07:23:59 +01:00
11d3687cc6
Simd128 optimized q8k vecdot. ( #1026 )
2023-10-03 15:29:48 +01:00
dac73edb34
AVX optimized q8k vecdot. ( #1024 )
2023-10-03 12:10:58 +01:00
7670fe7d1f
neon optimized q8k multiplication. ( #1021 )
...
* neon optimized q8k multiplication.
* Bugfixes.
* simdification.
2023-10-02 23:26:34 +01:00
cddfc3944c
Add the q8k vec-dot multiplication. ( #1019 )
2023-10-02 21:53:34 +01:00
263a172202
Improve the testing of the optimized quantized vec-dot ops ( #1016 )
...
* Expose the unopt functions for testing.
* Better testing of the optimized quantized computations.
2023-10-02 09:50:43 +01:00
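The pattern behind #1016 is to expose the scalar (unoptimized) vec-dot alongside the SIMD one and assert they agree within a tolerance on the same input. A minimal sketch of such a check; the scaled tolerance is an illustrative assumption, not candle's actual test harness:

```rust
// Hedged sketch: compare an optimized vec-dot result against the scalar
// reference. Tolerance choice is an assumption for illustration.
fn assert_vec_dots_agree(reference: f32, optimized: f32) {
    // Scale the tolerance with the magnitude so large dot products are not
    // held to an absolute epsilon.
    let tol = 1e-3 * reference.abs().max(1.0);
    assert!(
        (reference - optimized).abs() <= tol,
        "vec-dot mismatch: reference={reference}, optimized={optimized}"
    );
}
```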
5130a7da32
Simd128 version of q6k vec-dot. ( #1015 )
...
* Add a specific function for the simd128 q6k vec-dot.
* Simdification.
* More simdification.
2023-10-01 19:44:12 +01:00
4e55aaa51f
Simd128 version of the q2k-q8k vecdot product. ( #1011 )
...
* Sketch the simd128 version of q2k vecdot.
* Use a single accumulator.
* Simdify the q2k-q8k vecdot product.
* Cosmetic change.
2023-09-30 20:12:41 +01:00
25657804ef
Simd128 q2k vecdot ( #982 )
...
* Sketch the simd128 version of q2k vecdot.
* Use a single accumulator.
2023-09-28 12:16:35 +01:00
9cb110c44c
Sketch a simd128 optimized q4k vecdot. ( #977 )
...
* Sketch a simd128 optimized q4k vecdot.
* Simdify.
* More quantization optimizations.
* Again more simdification.
* Simdify the splitting loop.
2023-09-27 20:19:38 +01:00
667f01c173
Simd128 vec-dot for q4_0. ( #974 )
...
* Simd128 vec-dot for q4_0.
* Bugfix.
* Add wasm tests.
* Bugfix for the q40 vecdot.
* More quantization tests.
2023-09-27 14:15:30 +01:00
e59784e353
simd128 optimized q8_0 vecdot ( #972 )
...
* wasm/simd128 version of the quantized q8_0 vecdot.
* Add the missing conversion.
2023-09-27 11:03:20 +01:00
98172d46fa
Fix some errors about BlockQ8_1 ( #776 )
...
* use int8 type instead of uint8 for BlockQ8_1.qs
The uint8 type of BlockQ8_1.qs causes a large accuracy loss for negative weights
Ref: ebc96086af/ggml.c (L904)
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
* fix sum error in vec_dot of BlockQ4_1
Ref: ebc96086af/ggml.c (L2840)
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
* fix sum error in vec_dot of BlockQ5_1
Ref: ebc96086af/ggml.c (L3490)
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
---------
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
2023-09-08 13:29:40 +01:00
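For context on why the signedness in #776 matters: a q8_1 block stores signed 8-bit quants plus per-block scale data, so a u8 field silently misreads every negative weight. A rough sketch of the layout; field names follow GGML's block_q8_1 (which stores d and s as f16), not necessarily candle's exact definition:

```rust
// Sketch of the corrected layout: q8_1 quants are signed, so a u8 field
// misreads every negative weight. f32 is used here instead of GGML's f16
// to keep the sketch dependency-free.
const QK8_1: usize = 32;

struct BlockQ8_1 {
    d: f32,          // scale
    s: f32,          // d * sum(qs), precomputed for the vec-dot
    qs: [i8; QK8_1], // was u8 before the fix; i8 preserves negative weights
}
```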
a1a5ab8b0a
Neon optimized vecdot ( #666 )
...
* Q5k vecdot.
* Add the q3k vecdot.
* Q2k vecdot.
* Move the quantized model to its own file.
2023-08-29 22:28:46 +01:00
ee8bb1bde1
Add avx implementations of q2k, q3k and q5k vec-dot functions ( #654 )
...
* `q2k` avx implementation
* `q3k` avx implementation
* `q5k` avx implementation
* `avx` make masks constant
* clippy stuff
2023-08-29 13:35:56 +01:00
4b8d57ba15
AVX version of the q4k vecdot. ( #651 )
2023-08-29 09:41:17 +01:00
1da71a5da1
Neon optimized version of the q4k vecdot product. ( #632 )
2023-08-27 21:30:47 +01:00
a8b39dd7b7
Fix for q5_1 quantization. ( #617 )
...
* Fix for q5_1 quantization.
* Fix some typos.
2023-08-27 08:31:18 +01:00
fa0d75b18d
Quantization tests + fix some issues. ( #616 )
2023-08-27 08:17:38 +01:00
28658054ff
More missing quantized bits. ( #615 )
...
* Q4_1 support.
* Add Q5_1 quantization.
* Tweak.
2023-08-27 07:52:26 +01:00
f704e39761
Missing quants ops ( #611 )
...
* Another transmute tweak.
* Changelog tweak.
* Add some missing quantized ops.
2023-08-26 20:09:04 +01:00
fdf15f0e05
Another transmute tweak. ( #610 )
...
* Another transmute tweak.
* Changelog tweak.
2023-08-26 13:00:24 +01:00
06b37ea7ad
Avoid using tmp values. ( #609 )
2023-08-26 12:28:28 +01:00
c72eb3d75b
Add reference implementation for q4k and q5k ( #586 )
...
* add `q2k` vec-dot
* `q3k` vec-dot + quantization bugfix
* `q4k` vec-dot
* `q5k` vec-dot
* Validate against GGML unit test results.
* Remove some more `transmutes`
2023-08-26 12:07:54 +01:00
6559eae72c
Avoid some transmutes. ( #607 )
2023-08-25 18:21:37 +01:00
9c8d6dbc2a
Neon intrinsics for the q8_0 vecdot. ( #604 )
...
* Neon intrinsics for the q8_0 vecdot.
* Get the tests to run with accelerate (with some numerical error failures).
2023-08-25 14:42:18 +01:00
afc10a3232
AVX version for the q8-0 multiplications. ( #598 )
2023-08-25 10:14:49 +01:00
c093b03d51
Generic implementation of vecdot for q80. ( #596 )
...
* Generic implementation of vecdot for q80.
* Add support for code-llama 7b.
* Support more code-llama.
2023-08-25 09:04:05 +01:00
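For reference, the generic (non-SIMD) q8_0 vec-dot in #596 amounts to an integer dot product per 32-element block, rescaled by the two block scales. A self-contained sketch under assumed types; GGML stores the scale as f16, and f32 is used here for brevity:

```rust
// Self-contained sketch of a generic (scalar) q8_0 vec-dot in the GGML style.
// Layout assumption: 32 signed 8-bit quants per block plus one scale.
const QK8_0: usize = 32;

struct BlockQ8_0 {
    d: f32,          // per-block scale
    qs: [i8; QK8_0], // quantized values
}

fn vec_dot_q8_0(lhs: &[BlockQ8_0], rhs: &[BlockQ8_0]) -> f32 {
    assert_eq!(lhs.len(), rhs.len());
    lhs.iter()
        .zip(rhs.iter())
        .map(|(a, b)| {
            // Integer dot product within the block, rescaled to f32 once.
            let isum: i32 = a
                .qs
                .iter()
                .zip(b.qs.iter())
                .map(|(&x, &y)| x as i32 * y as i32)
                .sum();
            a.d * b.d * isum as f32
        })
        .sum()
}
```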
d2f42ab086
Referenze implementations of q2k
and q3k
vec-dot functions ( #580 )
...
* add `q2k` vec-dot
* `q3k` vec-dot + quantization bugfix
2023-08-24 12:35:54 +01:00
07067b01dc
Avoid some mutable variables (take 2). ( #554 )
...
* Avoid some mutable variables (take 2).
* Fix.
2023-08-22 18:51:20 +01:00
ec665acad7
Revert "Avoid some mut in quantized functions. ( #550 )" ( #552 )
...
This reverts commit cf27b9b636.
2023-08-22 15:57:46 +01:00
cf27b9b636
Avoid some mut in quantized functions. ( #550 )
...
* Avoid a couple more 'let mut'.
* Tweaks.
2023-08-22 15:44:26 +01:00
352383cbc3
Add quantization support for q2k
, q3k
, q4k
and q5k
( #524 )
...
* first q2 implementation
* First Q4K and Q5K implementations
* fix `q2k` and `q5k`
* Some first cleanups
* run `clippy` on tests
* finally implement `q3k`
* deactivate `q3k` test on macos
* also disable the test on linux
* Fix floating bits in `q3k` dequantization
* Refactoring pass + reorder quants in file
* `fmt`
* Re-add `src` asserts and redefine `dst`
2023-08-22 15:04:55 +01:00
82410995a2
Neon support for quantization. ( #519 )
...
* Skeleton files for neon support of quantization.
* SIMD version for q4 vecdot.
* Also simdify the q6k multiplication.
2023-08-19 22:07:29 +01:00
109e95b189
Basic qmatmul
parallelization ( #492 )
...
* Basic `par_iter` parallelization
* Pass errors up
* Disable `avx` for x86 macs
2023-08-18 09:45:37 +01:00
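The parallelization in #492 follows a standard shape: each output row of the quantized matmul is independent, so the outer loop maps onto rayon's parallel iterators. A hedged sketch, where the names and row-major layout are assumptions rather than candle's API:

```rust
use rayon::prelude::*;

/// Fill `out` (m rows of `n` columns, row-major) in parallel, one row per
/// task. `row_dot(i, j)` stands in for the quantized row/column dot product.
fn par_qmatmul(
    out: &mut [f32],
    n: usize,
    row_dot: impl Fn(usize, usize) -> f32 + Sync,
) {
    out.par_chunks_mut(n).enumerate().for_each(|(i, row)| {
        for (j, v) in row.iter_mut().enumerate() {
            *v = row_dot(i, j);
        }
    });
}
```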
557b2c28dd
Q6K quantization ( #495 )
...
* Print the detected arch options.
* Add the q6k quantization.
* Add a currently broken test.
* Bugfix.
* Bugfix.
* Another bugfix.
* Another bugfix + get the test to work.
2023-08-17 22:22:57 +01:00
fc81af1712
AVX version of the q6k vec-dot. ( #493 )
...
* AVX version of the q6k vec-dot.
* Use the avx sum.
2023-08-17 20:13:18 +01:00
d99cac3ec3
Move the avx specific bits to a separate file. ( #481 )
2023-08-17 09:01:06 +01:00
306c8eee7a
AVX version of the vecdot for q4_0. ( #474 )
...
* AVX version of the vecdot for q4_0.
* Tweak the avx bits.
* Add a qmatmul benchmark.
* Fix the quantized test.
2023-08-17 07:03:32 +01:00
098909de40
Add vecdot for q6k-q8k. ( #476 )
...
* Add vecdot for q6k-q8k.
* Add some testing for q8k.
* Use QMatMul for the output layer.
2023-08-16 20:59:40 +01:00
3bedba1fce
Use a zipped iterator. ( #475 )
...
* Use a zipped iterator.
* Add to/from float for q8k.
2023-08-16 20:15:11 +01:00
a9101700b6
Add a kv-cache to the quantized llama example. ( #466 )
...
* Add a kv-cache to the quantized llama example.
* Also print the prompt.
* Bugfix in q6k dequantizing.
* Another bugfix.
2023-08-16 14:28:42 +01:00
3071134788
Get the ggml based llama to generate some text. ( #464 )
...
* Add more stats to the ggml example.
* Build a quantized model from the file content.
* Move the tensor retrieval in the main crate.
* Start adding the forward pass.
* Add more to the forward pass of the quantized llama.
* Apply the attention layers.
* Add the sampling loop.
* Get the sampling loop to work.
* Minor tweak.
* Add a quantize/dequantize test.
* Bugfix.
* Add a comment + swap the order.
* Bugfixes.
2023-08-16 12:41:07 +01:00
b8263aa15c
Quantized support for f16 and f32 ( #457 )
...
* Add f32 as a quantized type.
* Add f16 as a quantized type too.
2023-08-15 21:09:37 +01:00
e68b2accb4
Split out the quantized file. ( #456 )
2023-08-15 20:26:27 +01:00