badf886583
Cuda kernel for dequantizing q8k. ( #1760 )
...
* Cuda kernel for dequantizing q8k.
* Clippy lints.
2024-02-26 08:42:44 +01:00
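Q8K stores 256 signed 8-bit quants behind a single f32 scale, so dequantization is a pure scale-and-widen pass, which is what makes it a good fit for a simple CUDA kernel. A scalar Rust sketch of the per-block work; the `BlockQ8K` layout below mirrors ggml's `block_q8_k` and is illustrative, not candle's exact type:

```rust
// Illustrative mirror of ggml's block_q8_k (QK_K = 256), not candle's
// exact type: one f32 scale, 256 signed quants, plus per-16 partial sums.
const QK_K: usize = 256;

struct BlockQ8K {
    d: f32,
    qs: [i8; QK_K],
    bsums: [i16; QK_K / 16],
}

// Scalar reference for the per-block work the CUDA kernel parallelizes:
// every output element is just scale * quant.
fn dequantize_q8k(blocks: &[BlockQ8K], out: &mut Vec<f32>) {
    for b in blocks {
        for &q in b.qs.iter() {
            out.push(b.d * f32::from(q));
        }
    }
}
```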
2f22afd80e
Cuda acceleration for quantized model. ( #1754 )
...
* Boilerplate for the quantized cuda support.
* More basic cuda support.
* More cuda quantization (quantize on cpu for now).
* Add the dequantization bit.
* Start adding some dedicated cuda kernels from llama.cpp.
* Move the kernel code.
* Start interfacing with the kernel.
* Tweak the kernel launch params.
* Bugfix for quantized metal.
* Fix some clippy lints.
* Tweak the launch parameters.
* Tweak cuda basics to perform a quantized matmul.
* Perform the dequantization on the cpu + use cublas for matmul.
* Add the dequantization kernel.
* Test the qmatmul.
* More kernels.
* Matmul-vec kernel.
* Add a couple kernels.
* More dequantization kernels.
2024-02-25 18:11:47 +01:00
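The bullet trail above describes two coexisting strategies: dequantize and hand the matmul to cuBLAS, or run a dedicated quantized matmul-vec kernel ported from llama.cpp. A hedged sketch of that dispatch decision; the names and the batch-size heuristic are assumptions for illustration, not candle's actual internals:

```rust
// Hypothetical dispatch between the two paths named above; the enum,
// function and batch-size heuristic are illustrative, not candle's code.
enum QMatMulStrategy {
    // Dequantize the weight, then run a regular cuBLAS matmul.
    DequantizeThenMatmul,
    // Call a dedicated quantized mul-mat-vec kernel ported from llama.cpp.
    FusedMatVec,
}

fn pick_strategy(batch_size: usize) -> QMatMulStrategy {
    // Mat-vec kernels shine when the activation is a single row; larger
    // batches amortize the dequantize + cublas path better.
    if batch_size == 1 {
        QMatMulStrategy::FusedMatVec
    } else {
        QMatMulStrategy::DequantizeThenMatmul
    }
}
```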
0de0795220
Qmetal tweaks ( #1704 )
...
* Add the dummy qmetal backend.
* Fix the metal compilation.
2024-02-13 18:11:17 +01:00
c1b418586c
Fixing quantized llama demo on metal. ( #1703 )
2024-02-13 16:28:56 +01:00
403680f17d
Quantized GGUF style ( #1523 )
...
* Metal quantized modifications proposal.
- Add a device param, wherever needed.
- Create a new QMetal storage type that implements QuantizedType.
- Update everywhere needed.
Fix Python.
Fixing examples.
Fix: fmt + clippy + stub.
Moving everything around.
Only missing the actual implementations.
Fixing everything + adding dequantized kernels.
More work.
Fixing matmul.
Fmt + Clippy
Some clippy fixes.
Working state.
Q2K Metal -> Bugged (also present in GGML).
Q4K CPU -> Bugged (present previously, new test catches it).
Q5K CPU -> Bugged (present previously).
Q8_1 Both -> Never really implemented, it seems.
Q8K Metal -> Never implemented in Metal.
Fixing Q2K bug (present in ggml).
* Cleanup.
* Fix the rebase.
* Removing the fences speeds everything up and *is* correct this time...
* Cleanup the fence.
* After rebase.
* Bad code removal.
* Rebase after phi2 merge + fix replit default to CPU.
* Making the CI happy.
* More happy tests.
---------
Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>
2024-01-17 10:27:58 +01:00
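The core refactor here threads a device parameter through the quantized code and hides the per-backend storage behind a common interface. A minimal sketch of that shape; `QStorage` and the trait name follow the commit's wording but are simplified stand-ins for the real types:

```rust
// Simplified stand-ins for the real types: a per-backend quantized storage
// that exposes one common QuantizedType interface.
trait QuantizedType {
    fn dequantize(&self) -> Vec<f32>;
}

enum QStorage {
    Cpu(Box<dyn QuantizedType>),
    Metal(Box<dyn QuantizedType>),
}

impl QStorage {
    fn dequantize(&self) -> Vec<f32> {
        // Each backend supplies its own implementation behind the trait.
        match self {
            QStorage::Cpu(s) | QStorage::Metal(s) => s.dequantize(),
        }
    }
}
```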
41915184bb
Bugfix for dequantizing q5k layers. ( #1569 )
2024-01-11 23:15:11 +01:00
0eb90ed783
Simpler repro for the neon optimization issue + bugfix ( #1544 )
...
* Simpler repro for the neon optimization issue.
* Bugfix for q4k.
* Improve the fix, share the dot-prod bit.
* Clippy fixes.
* Fix for q6k.
* Also fix for q2k.
* Use the new shared dotprod.
* Add more testing.
2024-01-07 20:21:49 +01:00
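The fix works by factoring the signed-i8 dot product that q4k, q6k and q2k all need into one shared helper, so a bug fixed once stays fixed everywhere. A scalar stand-in for that shared helper (the real one uses neon intrinsics but computes exactly this):

```rust
// Scalar stand-in for the shared signed-i8 dot product; the neon version
// vectorizes this same computation.
fn dot_i8(xs: &[i8], ys: &[i8]) -> i32 {
    xs.iter()
        .zip(ys)
        .map(|(&x, &y)| i32::from(x) * i32::from(y))
        .sum()
}
```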
7135791dd5
Fix the quantized mistral example. ( #1478 )
2023-12-25 09:31:24 +01:00
1e86717bf2
Fix a couple typos ( #1451 )
...
* Mixtral quantized instruct.
* Fix a couple typos.
2023-12-17 05:20:05 -06:00
bfa7c8fc01
Implement the module trait directly for QMatMul. ( #1372 )
2023-11-25 10:09:45 +00:00
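With Module implemented directly on QMatMul, a quantized projection slots into a forward pass like any other layer. A usage sketch against candle's public API; exact paths and constructor signatures may differ across versions:

```rust
use candle_core::quantized::{QMatMul, QTensor};
use candle_core::{Module, Result, Tensor};

// A quantized linear projection used through the Module trait, exactly
// like a float layer would be.
fn project(weight: QTensor, xs: &Tensor) -> Result<Tensor> {
    let mm = QMatMul::from_qtensor(weight)?;
    mm.forward(xs)
}
```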
6fa3151820
Allow using gguf-v3 files. ( #1262 )
2023-11-03 23:07:53 +01:00
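A gguf reader gates on a 4-byte magic followed by a little-endian u32 version; accepting 3 alongside 1 and 2 is the essence of this change. A hedged, simplified sketch of that header check (the real reader goes on to parse the rest of the header):

```rust
use std::io::Read;

// Simplified header check: 4-byte magic, then a little-endian u32 version.
fn read_gguf_version<R: Read>(r: &mut R) -> std::io::Result<u32> {
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    assert_eq!(&magic, b"GGUF", "not a gguf file");
    let mut v = [0u8; 4];
    r.read_exact(&mut v)?;
    let version = u32::from_le_bytes(v);
    assert!((1..=3).contains(&version), "unsupported gguf version {version}");
    Ok(version)
}
```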
ef33df7ae2
No need for the even constraint on vecdot-q40-q80. ( #1202 )
2023-10-28 07:23:59 +01:00
e2826e70b3
Add a quantized variant of llama2.c ( #1197 )
...
* Add a quantized variant of llama2.c
* Clippy fixes.
2023-10-27 15:34:06 +01:00
aa53368aeb
Better control on the optional dequantization in QMatMul ( #1049 )
...
* Cosmetic change to the quantized whisper model.
* Fix the dequantization.
* Add the dequantize all variable.
2023-10-07 10:16:18 +01:00
11d3687cc6
Simd128 optimized q8k vecdot. ( #1026 )
2023-10-03 15:29:48 +01:00
dac73edb34
AVX optimized q8k vecdot. ( #1024 )
2023-10-03 12:10:58 +01:00
7670fe7d1f
neon optimized q8k multiplication. ( #1021 )
...
* neon optimized q8k multiplication.
* Bugfixes.
* simdification.
2023-10-02 23:26:34 +01:00
cddfc3944c
Add the q8k vec-dot multiplication. ( #1019 )
2023-10-02 21:53:34 +01:00
089fc3b584
Improve the quantized whisper setup. ( #1018 )
...
* Improve the quantized whisper setup.
* Fix the config file paths.
* Use the standard matmul where possible.
2023-10-02 17:17:46 +01:00
263a172202
Improve the testing of the optimized quantized vec-dot ops ( #1016 )
...
* Expose the unopt functions for testing.
* Better testing of the optimized quantized computations.
2023-10-02 09:50:43 +01:00
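Exposing the unoptimized functions makes the test strategy straightforward: feed identical inputs to the reference and the simd path and demand near-equality within a small tolerance, since quantized kernels are not bit-exact. A sketch with placeholder signatures (the real vec-dots operate on quantized blocks, but the comparison logic is the same):

```rust
// Compare a reference vec-dot against an optimized one on shared inputs.
fn check_vec_dot(
    reference: impl Fn(&[f32], &[f32]) -> f32,
    optimized: impl Fn(&[f32], &[f32]) -> f32,
) {
    let xs: Vec<f32> = (0..256).map(|i| (i as f32 * 0.37).sin()).collect();
    let ys: Vec<f32> = (0..256).map(|i| (i as f32 * 0.11).cos()).collect();
    let r = reference(&xs, &ys);
    let o = optimized(&xs, &ys);
    // Quantized kernels are not bit-exact; allow a small relative error.
    assert!((r - o).abs() / r.abs().max(1e-6) < 1e-2);
}
```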
5130a7da32
Simd128 version of q6k vec-dot. ( #1015 )
...
* Add a specific function for the simd128 q6k vec-dot.
* Simdification.
* More simdification.
2023-10-01 19:44:12 +01:00
4e55aaa51f
Simd128 version of the q2k-q8k vecdot product. ( #1011 )
...
* Sketch the simd128 version of q2k vecdot.
* Use a single accumulator.
* Simdify the q2k-q8k vecdot product.
* Cosmetic change.
2023-09-30 20:12:41 +01:00
25657804ef
Simd128 q2k vecdot ( #982 )
...
* Sketch the simd128 version of q2k vecdot.
* Use a single accumulator.
2023-09-28 12:16:35 +01:00
9cb110c44c
Sketch a simd128 optimized q4k vecdot. ( #977 )
...
* Sketch a simd128 optimized q4k vecdot.
* Simdify.
* More quantization optimizations.
* Again more simdification.
* Simdify the splitting loop.
2023-09-27 20:19:38 +01:00
667f01c173
Simd128 vec-dot for q4_0. ( #974 )
...
* Simd128 vec-dot for q4_0.
* Bugfix.
* Add wasm tests.
* Bugfix for the q40 vecdot.
* More quantization tests.
2023-09-27 14:15:30 +01:00
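For reference, q4_0 packs 32 weights per block as two 4-bit quants per byte, each nibble offset by 8 and scaled by the block scale. A scalar sketch of the layout the simd128 kernel vectorizes (scale shown as f32 to keep the sketch dependency-free; the on-disk format stores f16):

```rust
// Illustrative q4_0 block: 32 weights, two 4-bit quants per byte.
struct BlockQ4_0 {
    d: f32,       // block scale (f16 on disk)
    qs: [u8; 16], // low nibbles = first 16 weights, high nibbles = next 16
}

fn dequantize_q4_0(b: &BlockQ4_0, out: &mut [f32; 32]) {
    for i in 0..16 {
        out[i] = ((b.qs[i] & 0x0f) as i32 - 8) as f32 * b.d;
        out[i + 16] = ((b.qs[i] >> 4) as i32 - 8) as f32 * b.d;
    }
}
```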
e59784e353
simd128 optimized q8_0 vecdot ( #972 )
...
* wasm/simd128 version of the quantized q8_0 vecdot.
* Add the missing conversion.
2023-09-27 11:03:20 +01:00
ce0a4e3a85
Use the gelu-erf activation. ( #969 )
2023-09-26 22:30:21 +01:00
4abc1ea34d
Avoid some overflows on wasm32. ( #968 )
2023-09-26 11:15:38 +01:00
2619c4307f
Add a quantized version of the t5 model. ( #921 )
2023-09-21 11:13:39 +01:00
98172d46fa
Fix some errors about BlockQ8_1 ( #776 )
...
* use int8 type instead of uint8 for BlockQ8_1.qs
The uint8 type of BlockQ8_1.qs causes large errors for negative weights.
Ref: ebc96086af/ggml.c (L904)
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
* fix sum error in vec_dot of BlockQ4_1
Ref: ebc96086af/ggml.c (L2840)
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
* fix sum error in vec_dot of BlockQ5_1
Ref: ebc96086af/ggml.c (L3490)
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
---------
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
2023-09-08 13:29:40 +01:00
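The gist of the first fix: q8_1 quants are signed, so storing them as u8 silently corrupts every negative weight (e.g. -3 reinterpreted as 253 inflates the dot product). The corrected block layout, sketched with f32 scales for simplicity where ggml stores f16:

```rust
// Corrected layout, sketched with f32 where ggml stores f16.
struct BlockQ8_1 {
    d: f32,       // scale
    s: f32,       // d * sum(qs), precomputed for the q4_1/q5_1 vec-dots
    qs: [i8; 32], // was [u8; 32]: unsigned storage wrecks negative quants
}
```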
f7980e07e0
Add ggufv2 support ( #725 )
2023-09-03 14:41:57 +01:00
2ed78ab336
Support for quantized tensors in the python api. ( #706 )
...
* Add more pyo3 support.
* Add some support for quantized tensors in pyo3.
* Add an arc layer on qmatmul.
* Add the quantized matmul.
* Quantization support.
* More quantization support.
* Test the python quantization.
2023-09-01 15:53:42 +01:00
9b25113393
Small cleanups (avoid some possible mutations) ( #670 )
...
* More mut cleanup.
* Factor out some common bits.
2023-08-30 08:54:00 +01:00
a1a5ab8b0a
Neon optimized vecdot ( #666 )
...
* Q5k vecdot.
* Add the q3k vecdot.
* Q2k vecdot.
* Move the quantized model to its own file.
2023-08-29 22:28:46 +01:00
ee8bb1bde1
Add avx implementations of q2k, q3k and q5k vec-dot functions ( #654 )
...
* `q2k` avx implementation
* `q3k` avx implementation
* `q5k` avx implementation
* `avx` make masks constant
* clippy stuff
2023-08-29 13:35:56 +01:00
4b8d57ba15
AVX version of the q4k vecdot. ( #651 )
2023-08-29 09:41:17 +01:00
1da71a5da1
Neon optimized version of the q4k vecdot product. ( #632 )
2023-08-27 21:30:47 +01:00
be471d50ab
Llama quantization. ( #625 )
2023-08-27 14:08:15 +01:00
7151f2cf63
Add the quantize command. ( #624 )
...
* Add the quantize command.
* Bugfix for writing gguf files.
* And add a comment.
2023-08-27 11:35:19 +01:00
a8b39dd7b7
Fix for q5_1 quantization. ( #617 )
...
* Fix for q5_1 quantization.
* Fix some typos.
2023-08-27 08:31:18 +01:00
fa0d75b18d
Quantization tests + fix some issues. ( #616 )
2023-08-27 08:17:38 +01:00
28658054ff
More missing quantized bits. ( #615 )
...
* Q4_1 support.
* Add Q5_1 quantization.
* Tweak.
2023-08-27 07:52:26 +01:00
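q4_1 differs from q4_0 by carrying an explicit minimum instead of a fixed -8 offset, i.e. x = d * q + m; q5_1 extends the same idea with a fifth quant bit. A scalar sketch of q4_1 dequantization (f32 scales for simplicity; the format stores f16):

```rust
// Illustrative q4_1 block and its scalar dequantization.
struct BlockQ4_1 {
    d: f32,       // scale (f16 on disk)
    m: f32,       // minimum (f16 on disk)
    qs: [u8; 16], // 32 packed 4-bit quants
}

fn dequantize_q4_1(b: &BlockQ4_1, out: &mut [f32; 32]) {
    for i in 0..16 {
        out[i] = (b.qs[i] & 0x0f) as f32 * b.d + b.m;
        out[i + 16] = (b.qs[i] >> 4) as f32 * b.d + b.m;
    }
}
```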
f704e39761
Missing quants ops ( #611 )
...
* Another transmute tweak.
* Changelog tweak.
* Add some missing quantized ops.
2023-08-26 20:09:04 +01:00
fdf15f0e05
Another transmute tweak. ( #610 )
...
* Another transmute tweak.
* Changelog tweak.
2023-08-26 13:00:24 +01:00
06b37ea7ad
Avoid using tmp values. ( #609 )
2023-08-26 12:28:28 +01:00
c72eb3d75b
Add reference implementation for q4k and q5k ( #586 )
...
* add `q2k` vec-dot
* `q3k` vec-dot + quantization bugfix
* `q4k` vec-dot
* `q5k` vec-dot
* Validate against GGML unit test results.
* Remove some more `transmutes`
2023-08-26 12:07:54 +01:00
6559eae72c
Avoid some transmutes. ( #607 )
2023-08-25 18:21:37 +01:00
9c8d6dbc2a
Neon intrinsics for the q8_0 vecdot. ( #604 )
...
* Neon intrinsics for the q8_0 vecdot.
* Get the tests to run with accelerate (with some numerical error failures).
2023-08-25 14:42:18 +01:00
afc10a3232
AVX version for the q8-0 multiplications. ( #598 )
2023-08-25 10:14:49 +01:00
c093b03d51
Generic implementation of vecdot for q80. ( #596 )
...
* Generic implementation of vecdot for q80.
* Add support for code-llama 7b.
* Support more code-llama.
2023-08-25 09:04:05 +01:00
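A generic (non-simd) q8_0 vec-dot is just an integer dot product of the i8 quants per 32-element block, scaled by the two block scales. A plain-Rust sketch of that reference path (f32 scales here; the format stores f16):

```rust
// Illustrative q8_0 block (f32 scale here; f16 on disk).
struct BlockQ8_0 {
    d: f32,
    qs: [i8; 32],
}

fn vec_dot_q8_0(xs: &[BlockQ8_0], ys: &[BlockQ8_0]) -> f32 {
    xs.iter()
        .zip(ys)
        .map(|(x, y)| {
            // Integer dot product of the quants, then scale once per block.
            let isum: i32 = x
                .qs
                .iter()
                .zip(y.qs.iter())
                .map(|(&a, &b)| i32::from(a) * i32::from(b))
                .sum();
            isum as f32 * x.d * y.d
        })
        .sum()
}
```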