Commit Graph

46 Commits

SHA1 Message Date
41915184bb Bugfix for dequantizing q5k layers. (#1569) 2024-01-11 23:15:11 +01:00
ef33df7ae2 No need for the even constraint on vecdot-q40-q80. (#1202) 2023-10-28 07:23:59 +01:00
11d3687cc6 Simd128 optimized q8k vecdot. (#1026) 2023-10-03 15:29:48 +01:00
dac73edb34 AVX optimized q8k vecdot. (#1024) 2023-10-03 12:10:58 +01:00
7670fe7d1f neon optimized q8k multiplication. (#1021)
* neon optimized q8k multiplication.

* Bugfixes.

* simdification.
2023-10-02 23:26:34 +01:00
cddfc3944c Add the q8k vec-dot multiplication. (#1019) 2023-10-02 21:53:34 +01:00
263a172202 Improve the testing of the optimized quantized vec-dot ops (#1016)
* Expose the unopt functions for testing.

* Better testing of the optimized quantized computations.
2023-10-02 09:50:43 +01:00
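
The pattern behind this commit is differential testing: the scalar reference stays exposed so every SIMD path can be checked against it. A minimal sketch of that pattern, with `vec_dot_opt` / `vec_dot_unopt` as illustrative stand-ins rather than the real candle function names:

```rust
// Hedged sketch: run the optimized kernel and the exposed unoptimized
// reference on the same inputs and bound the drift between them.
fn check_against_reference(
    vec_dot_opt: impl Fn(&[f32], &[f32]) -> f32,
    vec_dot_unopt: impl Fn(&[f32], &[f32]) -> f32,
) {
    // Deterministic pseudo-random inputs in [-0.5, 0.5).
    let xs: Vec<f32> = (0..256).map(|i| (i * 37 % 97) as f32 / 97.0 - 0.5).collect();
    let ys: Vec<f32> = (0..256).map(|i| (i * 53 % 89) as f32 / 89.0 - 0.5).collect();
    let fast = vec_dot_opt(&xs, &ys);
    let slow = vec_dot_unopt(&xs, &ys);
    // SIMD kernels reassociate the sum, so exact equality is too strict.
    assert!((fast - slow).abs() < 1e-4, "optimized {fast} vs reference {slow}");
}
```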
5130a7da32 Simd128 version of q6k vec-dot. (#1015)
* Add a specific function for the simd128 q6k vec-dot.

* Simdification.

* More simdification.
2023-10-01 19:44:12 +01:00
4e55aaa51f Simd128 version of the q2k-q8k vecdot product. (#1011)
* Sketch the simd128 version of q2k vecdot.

* Use a single accumulator.

* Simdify the q2k-q8k vecdot product.

* Cosmetic change.
2023-09-30 20:12:41 +01:00
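
The "single accumulator" step refers to keeping one running SIMD register through the loop rather than several partial sums merged at the end. A minimal simd128 sketch of the pattern (not the actual q2k kernel; assumes a build with `-C target-feature=+simd128`):

```rust
// Minimal sketch of the single-accumulator pattern on wasm simd128.
#[cfg(target_arch = "wasm32")]
unsafe fn sum_simd128(xs: &[f32]) -> f32 {
    use core::arch::wasm32::*;
    let mut acc = f32x4_splat(0.0);
    for chunk in xs.chunks_exact(4) {
        acc = f32x4_add(acc, v128_load(chunk.as_ptr() as *const v128));
    }
    // Horizontal reduction of the one accumulator; remainder elements,
    // if any, would be handled scalar-side.
    f32x4_extract_lane::<0>(acc)
        + f32x4_extract_lane::<1>(acc)
        + f32x4_extract_lane::<2>(acc)
        + f32x4_extract_lane::<3>(acc)
}
```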
25657804ef Simd128 q2k vecdot (#982)
* Sketch the simd128 version of q2k vecdot.

* Use a single accumulator.
2023-09-28 12:16:35 +01:00
9cb110c44c Sketch a simd128 optimized q4k vecdot. (#977)
* Sketch a simd128 optimized q4k vecdot.

* Simdify.

* More quantization optimizations.

* Again more simdification.

* Simdify the splitting loop.
2023-09-27 20:19:38 +01:00
667f01c173 Simd128 vec-dot for q4_0. (#974)
* Simd128 vec-dot for q4_0.

* Bugfix.

* Add wasm tests.

* Bugfix for the q40 vecdot.

* More quantization tests.
2023-09-27 14:15:30 +01:00
e59784e353 simd128 optimized q8_0 vecdot (#972)
* wasm/simd128 version of the quantized q8_0 vecdot.

* Add the missing conversion.
2023-09-27 11:03:20 +01:00
98172d46fa Fix some errors about BlockQ8_1 (#776)
* use int8 type instead of uint8 for BlockQ8_1.qs

The uint8 type of BlockQ8_1.qs causes large errors for negative weights
Ref: ebc96086af/ggml.c (L904)

Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>

* fix sum error in vec_dot of BlockQ4_1

Ref: ebc96086af/ggml.c (L2840)

Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>

* fix sum error in vec_dot of BlockQ5_1

Ref: ebc96086af/ggml.c (L3490)

Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>

---------

Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>
2023-09-08 13:29:40 +01:00
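
The signedness fix matters because q8_1 quants are signed; read back through `u8`, a weight quantized to -1 becomes 255 and corrupts every dot product it enters. A sketch of the block layout, assuming the ggml-style q8_1 block the commit links to (field names illustrative):

```rust
use half::f16;

const QK8_1: usize = 32;

// Illustrative q8_1 block following the ggml layout referenced above.
#[repr(C)]
struct BlockQ8_1 {
    d: f16,          // per-block scale
    s: f16,          // d * sum(qs), cached for the q4_1/q5_1 dot products
    qs: [i8; QK8_1], // quants must be signed: -1 stored through u8
                     // round-trips as 255
}
```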
a1a5ab8b0a Neon optimized vecdot (#666)
* Q5k vecdot.

* Add the q3k vecdot.

* Q2k vecdot.

* Move the quantized model to its own file.
2023-08-29 22:28:46 +01:00
ee8bb1bde1 Add avx implementations of q2k, q3k and q5k vec-dot functions (#654)
* `q2k` avx implementation

* `q3k` avx implementation

* `q5k` avx implementation

* `avx` make masks constant

* clippy stuff
2023-08-29 13:35:56 +01:00
4b8d57ba15 AVX version of the q4k vecdot. (#651) 2023-08-29 09:41:17 +01:00
1da71a5da1 Neon optimized version of the q4k vecdot product. (#632) 2023-08-27 21:30:47 +01:00
a8b39dd7b7 Fix for q5_1 quantization. (#617)
* Fix for q5_1 quantization.

* Fix some typos.
2023-08-27 08:31:18 +01:00
fa0d75b18d Quantization tests + fix some issues. (#616) 2023-08-27 08:17:38 +01:00
28658054ff More missing quantized bits. (#615)
* Q4_1 support.

* Add Q5_1 quantization.

* Tweak.
2023-08-27 07:52:26 +01:00
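
For context on what q4_1 adds over q4_0: a per-block minimum alongside the scale, so dequantization is `x = d * q + m` instead of `x = d * q`. A hedged sketch of a ggml-style q4_1 block and its dequantization (layout assumed from ggml, not copied from candle):

```rust
use half::f16;

const QK4_1: usize = 32;

// Illustrative ggml-style q4_1 block: 32 values packed as 4-bit nibbles.
#[repr(C)]
struct BlockQ4_1 {
    d: f16,              // scale
    m: f16,              // minimum
    qs: [u8; QK4_1 / 2], // two 4-bit quants per byte
}

fn dequantize_q4_1(block: &BlockQ4_1, out: &mut [f32; QK4_1]) {
    let (d, m) = (block.d.to_f32(), block.m.to_f32());
    for (i, &byte) in block.qs.iter().enumerate() {
        out[i] = d * (byte & 0x0f) as f32 + m;           // low nibble: first half
        out[i + QK4_1 / 2] = d * (byte >> 4) as f32 + m; // high nibble: second half
    }
}
```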
f704e39761 Missing quants ops (#611)
* Another transmute tweak.

* Changelog tweak.

* Add some missing quantized ops.
2023-08-26 20:09:04 +01:00
fdf15f0e05 Another transmute tweak. (#610)
* Another transmute tweak.

* Changelog tweak.
2023-08-26 13:00:24 +01:00
06b37ea7ad Avoid using tmp values. (#609) 2023-08-26 12:28:28 +01:00
c72eb3d75b Add reference implementations for q4k and q5k (#586)
* add `q2k` vec-dot

* `q3k` vec-dot + quantization bugfix

* `q4k` vec-dot

* `q5k` vec-dot

* Validate against GGML unit test results.

* Remove some more `transmutes`
2023-08-26 12:07:54 +01:00
6559eae72c Avoid some transmutes. (#607) 2023-08-25 18:21:37 +01:00
9c8d6dbc2a Neon intrinsics for the q8_0 vecdot. (#604)
* Neon intrinsics for the q8_0 vecdot.

* Get the tests to run with accelerate (with some numerical error failures).
2023-08-25 14:42:18 +01:00
afc10a3232 AVX version for the q8-0 multiplications. (#598) 2023-08-25 10:14:49 +01:00
c093b03d51 Generic implementation of vecdot for q80. (#596)
* Generic implementation of vecdot for q80.

* Add support for code-llama 7b.

* Support more code-llama.
2023-08-25 09:04:05 +01:00
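
A "generic" vec-dot here is the plain scalar fallback used when no SIMD path is compiled in. A minimal sketch for ggml-style q8_0 blocks (an f16 scale plus 32 signed byte quants; layout assumed, not candle's exact code):

```rust
use half::f16;

const QK8_0: usize = 32;

#[repr(C)]
struct BlockQ8_0 {
    d: f16,          // per-block scale
    qs: [i8; QK8_0], // 32 signed quants
}

// Scalar reference dot product over matching q8_0 blocks:
// sum over blocks of d_x * d_y * <qs_x, qs_y>.
fn vec_dot_q8_0(xs: &[BlockQ8_0], ys: &[BlockQ8_0]) -> f32 {
    xs.iter()
        .zip(ys.iter())
        .map(|(x, y)| {
            let isum: i32 = x
                .qs
                .iter()
                .zip(y.qs.iter())
                .map(|(&a, &b)| a as i32 * b as i32)
                .sum();
            x.d.to_f32() * y.d.to_f32() * isum as f32
        })
        .sum()
}
```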
d2f42ab086 Reference implementations of q2k and q3k vec-dot functions (#580)
* add `q2k` vec-dot

* `q3k` vec-dot + quantization bugfix
2023-08-24 12:35:54 +01:00
07067b01dc Avoid some mutable variables (take 2). (#554)
* Avoid some mutable variables (take 2).

* Fix.
2023-08-22 18:51:20 +01:00
ec665acad7 Revert "Avoid some mut in quantized functions. (#550)" (#552)
This reverts commit cf27b9b636.
2023-08-22 15:57:46 +01:00
cf27b9b636 Avoid some mut in quantized functions. (#550)
* Avoid a couple more 'let mut'.

* Tweaks.
2023-08-22 15:44:26 +01:00
352383cbc3 Add quantization support for q2k, q3k, q4k and q5k (#524)
* first q2 implementation

* First Q4K and Q5K implementations

* fix `q2k` and `q5k`

* Some first cleanups

* run `clippy` on tests

* finally implement `q3k`

* deactivate `q3k` test on macos

* also disable the test on linux

* Fix floating bits in `q3k` dequantization

* Refactoring pass + reorder quants in file

* `fmt`

* Re-add `src` asserts and redefine `dst`
2023-08-22 15:04:55 +01:00
82410995a2 Neon support for quantization. (#519)
* Skeleton files for neon support of quantization.

* SIMD version for q4 vecdot.

* Also simdify the q6k multiplication.
2023-08-19 22:07:29 +01:00
109e95b189 Basic qmatmul parallelization (#492)
* Basic `par_iter` parallelization

* Pass errors up

* Disable `avx` for x86 macs
2023-08-18 09:45:37 +01:00
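
The parallelization works because each output row of the quantized matmul is an independent vec-dot, so rayon's `par_iter` applies with no shared mutable state. A hedged sketch of the row-parallel pattern (illustrative, not the candle implementation):

```rust
use rayon::prelude::*;

// Row-parallel matrix-vector product: dst[r] = <lhs_row_r, rhs>.
// Each output element is an independent dot product, so rows can be
// dispatched across threads without locks.
fn qmatvec(lhs_rows: &[Vec<f32>], rhs: &[f32]) -> Vec<f32> {
    lhs_rows
        .par_iter()
        .map(|row| row.iter().zip(rhs).map(|(a, b)| a * b).sum::<f32>())
        .collect()
}
```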
557b2c28dd Q6K quantization (#495)
* Print the detected arch options.

* Add the q6k quantization.

* Add a currently broken test.

* Bugfix.

* Bugfix.

* Another bugfix.

* Another bugfix + get the test to work.
2023-08-17 22:22:57 +01:00
fc81af1712 AVX version of the q6k vec-dot. (#493)
* AVX version of the q6k vec-dot.

* Use the avx sum.
2023-08-17 20:13:18 +01:00
d99cac3ec3 Move the avx specific bits to a separate file. (#481) 2023-08-17 09:01:06 +01:00
306c8eee7a AVX version of the vecdot for q4_0. (#474)
* AVX version of the vecdot for q4_0.

* Tweak the avx bits.

* Add a qmatmul benchmark.

* Fix the quantized test.
2023-08-17 07:03:32 +01:00
098909de40 Add vecdot for q6k-q8k. (#476)
* Add vecdot for q6k-q8k.

* Add some testing for q8k.

* Use QMatMul for the output layer.
2023-08-16 20:59:40 +01:00
3bedba1fce Use a zipped iterator. (#475)
* Use a zipped iterator.

* Add to/from float for q8k.
2023-08-16 20:15:11 +01:00
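
Taken together, to/from float plus vec-dot is the whole surface a quantized type exposes. Roughly this shape of trait, sketched here for orientation rather than as candle's exact definition:

```rust
// Rough shape of the interface these commits build up: each quantized
// block type needs dequantize, quantize, and a dot product against its
// companion activation format (q8k for the k-quants, which is why
// to/from float for q8k lands here).
trait QuantBlock: Sized {
    const BLOCK_SIZE: usize;

    /// Dequantize: ys.len() == xs.len() * BLOCK_SIZE.
    fn to_float(xs: &[Self], ys: &mut [f32]);

    /// Quantize xs into pre-allocated blocks.
    fn from_float(xs: &[f32], ys: &mut [Self]);

    /// Dot product over n values; in the real code ys would be the
    /// q8k-quantized activations (simplified here to the same type).
    fn vec_dot(n: usize, xs: &[Self], ys: &[Self]) -> f32;
}
```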
a9101700b6 Add a kv-cache to the quantized llama example. (#466)
* Add a kv-cache to the quantized llama example.

* Also print the prompt.

* Bugfix in q6k dequantizing.

* Another bugfix.
2023-08-16 14:28:42 +01:00
3071134788 Get the ggml based llama to generate some text. (#464)
* Add more stats to the ggml example.

* Build a quantized model from the file content.

* Move the tensor retrieval in the main crate.

* Start adding the forward pass.

* Add more to the forward pass of the quantized llama.

* Apply the attention layers.

* Add the sampling loop.

* Get the sampling loop to work.

* Minor tweak.

* Add a quantize/dequantize test.

* Bugfix.

* Add a comment + swap the order.

* Bugfixes.
2023-08-16 12:41:07 +01:00
b8263aa15c Quantized support for f16 and f32 (#457)
* Add f32 as a quantized type.

* Add f16 as a quantized type too.
2023-08-15 21:09:37 +01:00
e68b2accb4 Split out the quantized file. (#456) 2023-08-15 20:26:27 +01:00