0eb90ed783
Simpler repro for the neon optimization issue + bugfix ( #1544 )
...
* Simpler repro for the neon optimization issue.
* Bugfix for q4k.
* Improve the fix, share the dot-prod bit.
* Clippy fixes.
* Fix for q6k.
* Also fix for q2k.
* Use the new shared dotprod.
* Add more testing.
2024-01-07 20:21:49 +01:00
ef33df7ae2
No need for the even constraint on vecdot-q40-q80. ( #1202 )
2023-10-28 07:23:59 +01:00
e2826e70b3
Add a quantized variant of llama2.c ( #1197 )
...
* Add a quantized variant of llama2.c
* Clippy fixes.
2023-10-27 15:34:06 +01:00
7670fe7d1f
neon optimized q8k multiplication. ( #1021 )
...
* neon optimized q8k multiplication.
* Bugfixes.
* simdification.
2023-10-02 23:26:34 +01:00
a1a5ab8b0a
Neon optimized vecdot ( #666 )
...
* Q5k vecdot.
* Add the q3k vecdot.
* Q2k vecdot.
* Move the quantized model to its own file.
2023-08-29 22:28:46 +01:00
1da71a5da1
Neon optimized version of the q4k vecdot product. ( #632 )
2023-08-27 21:30:47 +01:00
9c8d6dbc2a
Neon intrinsics for the q8_0 vecdot. ( #604 )
...
* Neon intrinsics for the q8_0 vecdot.
* Get the tests to run with accelerate (with some numerical error failures).
2023-08-25 14:42:18 +01:00
82410995a2
Neon support for quantization. ( #519 )
...
* Skeleton files for neon support of quantization.
* SIMD version for q4 vecdot.
* Also simdify the q6k multiplication.
2023-08-19 22:07:29 +01:00