6fa3151820
Allow using gguf-v3 files. ( #1262 )
2023-11-03 23:07:53 +01:00
ef33df7ae2
No need for the even constraint on vecdot-q40-q80. ( #1202 )
2023-10-28 07:23:59 +01:00
e2826e70b3
Add a quantized variant of llama2.c ( #1197 )
...
* Add a quantized variant of llama2.c
* Clippy fixes.
2023-10-27 15:34:06 +01:00
aa53368aeb
Better control on the optional dequantization in QMatMul ( #1049 )
...
* Cosmetic change to the quantized whisper model.
* Fix the dequantization.
* Add the dequantize all variable.
2023-10-07 10:16:18 +01:00
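Note: the trade-off this commit exposes is that `QMatMul` can either run the quantized kernels directly or dequantize the whole weight matrix once and fall back to the regular matmul. A minimal usage sketch, assuming candle's `QMatMul::from_qtensor` / `forward` API of this era (the exact name of the dequantize-all toggle is in the candle source, not shown here):

```rust
use candle::quantized::{QMatMul, QTensor};
use candle::{Result, Tensor};

// Build a quantized matmul layer from a loaded quantized tensor.
// Whether `forward` uses the quantized kernels or a one-off full
// dequantization is the knob this commit adds.
fn qlinear(weight: QTensor, xs: &Tensor) -> Result<Tensor> {
    let mm = QMatMul::from_qtensor(weight)?;
    mm.forward(xs)
}
```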
11d3687cc6
Simd128 optimized q8k vecdot. ( #1026 )
2023-10-03 15:29:48 +01:00
dac73edb34
AVX optimized q8k vecdot. ( #1024 )
2023-10-03 12:10:58 +01:00
7670fe7d1f
neon optimized q8k multiplication. ( #1021 )
...
* neon optimized q8k multiplication.
* Bugfixes.
* simdification.
2023-10-02 23:26:34 +01:00
cddfc3944c
Add the q8k vec-dot multiplication. ( #1019 )
2023-10-02 21:53:34 +01:00
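Note: the `q8k` vec-dot that the SIMD commits above optimize reduces to a scale-times-integer-dot per super-block. A scalar sketch, assuming ggml's `Q8_K` layout (one `f32` scale and 256 signed bytes per super-block); field names follow ggml, not necessarily the candle source:

```rust
const QK_K: usize = 256;

/// Q8_K super-block: one f32 scale `d` and 256 signed quants.
/// (ggml also stores per-16-element `bsums`; omitted here since the
/// plain q8k x q8k dot does not need them.)
pub struct BlockQ8K {
    pub d: f32,
    pub qs: [i8; QK_K],
}

/// Scalar reference vec-dot: sum over super-blocks of
/// d_x * d_y * <qs_x, qs_y>, with the integer dot done in i32.
pub fn vec_dot_q8k_q8k(xs: &[BlockQ8K], ys: &[BlockQ8K]) -> f32 {
    xs.iter()
        .zip(ys.iter())
        .map(|(x, y)| {
            let dot: i32 = x
                .qs
                .iter()
                .zip(y.qs.iter())
                .map(|(&a, &b)| a as i32 * b as i32)
                .sum();
            x.d * y.d * dot as f32
        })
        .sum()
}
```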
089fc3b584
Improve the quantized whisper setup. ( #1018 )
...
* Improve the quantized whisper setup.
* Fix the config file paths.
* Use the standard matmul where possible.
2023-10-02 17:17:46 +01:00
263a172202
Improve the testing of the optimized quantized vec-dot ops ( #1016 )
...
* Expose the unopt functions for testing.
* Better testing of the optimized quantized computations.
2023-10-02 09:50:43 +01:00
5130a7da32
Simd128 version of q6k vec-dot. ( #1015 )
...
* Add a specific function for the simd128 q6k vec-dot.
* Simdification.
* More simdification.
2023-10-01 19:44:12 +01:00
4e55aaa51f
Simd128 version of the q2k-q8k vecdot product. ( #1011 )
...
* Sketch the simd128 version of q2k vecdot.
* Use a single accumulator.
* Simdify the q2k-q8k vecdot product.
* Cosmetic change.
2023-09-30 20:12:41 +01:00
25657804ef
Simd128 q2k vecdot ( #982 )
...
* Sketch the simd128 version of q2k vecdot.
* Use a single accumulator.
2023-09-28 12:16:35 +01:00
9cb110c44c
Sketch a simd128 optimized q4k vecdot. ( #977 )
...
* Sketch a simd128 optimized q4k vecdot.
* Simdify.
* More quantization optimizations.
* Again more simdification.
* Simdify the splitting loop.
2023-09-27 20:19:38 +01:00
667f01c173
Simd128 vec-dot for q4_0. ( #974 )
...
* Simd128 vec-dot for q4_0.
* Bugfix.
* Add wasm tests.
* Bugfix for the q40 vecdot.
* More quantization tests.
2023-09-27 14:15:30 +01:00
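Note: the scalar computation these simd128 kernels vectorize: each `Q4_0` block packs 32 weights as nibbles with an implicit offset of 8, and is dotted against a `Q8_0` block. A sketch of the unoptimized form, assuming ggml's block layouts (scales are `f16` in the real layout, shown as `f32` for brevity):

```rust
/// Q4_0: 32 weights per block, packed two per byte, offset by 8.
pub struct BlockQ4_0 {
    pub d: f32,       // scale (f16 in the real layout)
    pub qs: [u8; 16], // low nibbles = elems 0..16, high nibbles = 16..32
}

/// Q8_0: 32 weights per block as signed bytes.
pub struct BlockQ8_0 {
    pub d: f32,
    pub qs: [i8; 32],
}

pub fn vec_dot_q4_0_q8_0(xs: &[BlockQ4_0], ys: &[BlockQ8_0]) -> f32 {
    xs.iter()
        .zip(ys.iter())
        .map(|(x, y)| {
            let mut dot = 0i32;
            for j in 0..16 {
                let lo = (x.qs[j] & 0x0F) as i32 - 8;
                let hi = (x.qs[j] >> 4) as i32 - 8;
                dot += lo * y.qs[j] as i32 + hi * y.qs[j + 16] as i32;
            }
            x.d * y.d * dot as f32
        })
        .sum()
}
```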
e59784e353
simd128 optimized q8_0 vecdot ( #972 )
...
* wasm/simd128 version of the quantized q8_0 vecdot.
* Add the missing conversion.
2023-09-27 11:03:20 +01:00
ce0a4e3a85
Use the gelu-erf activation. ( #969 )
2023-09-26 22:30:21 +01:00
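Note: gelu-erf is the exact GELU, `0.5 * x * (1 + erf(x / sqrt(2)))`, as opposed to the common tanh approximation. A self-contained sketch; the erf below is the Abramowitz-Stegun polynomial approximation, chosen only to keep the example dependency-free, and is not necessarily what the kernel uses:

```rust
/// Abramowitz-Stegun 7.1.26 approximation of erf (max abs error ~1.5e-7).
fn erf(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    // Horner evaluation of the degree-5 polynomial in t.
    let poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
        - 0.284496736)
        * t
        + 0.254829592)
        * t;
    sign * (1.0 - poly * (-x * x).exp())
}

/// Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2))).
fn gelu_erf(x: f64) -> f64 {
    0.5 * x * (1.0 + erf(x / std::f64::consts::SQRT_2))
}
```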
4abc1ea34d
Avoid some overflows on wasm32. ( #968 )
2023-09-26 11:15:38 +01:00
2619c4307f
Add a quantized version of the t5 model. ( #921 )
2023-09-21 11:13:39 +01:00
98172d46fa
Fix some errors about BlockQ8_1 ( #776 )
...
* use int8 type instead of uint8 for BlockQ8_1.qs
The uint8 type of BlockQ8_1.qs causes large errors for negative weights
Ref: ebc96086af/ggml.c (L904)
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com >
* fix sum error in vec_dot of BlockQ4_1
Ref: ebc96086af/ggml.c (L2840)
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com >
* fix sum error in vec_dot of BlockQ5_1
Ref: ebc96086af/ggml.c (L3490)
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com >
---------
Signed-off-by: Zhang Miaolei <zmlcc@outlook.com >
2023-09-08 13:29:40 +01:00
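Note: the bug fixed here is worth spelling out: `Q8_1` quants are symmetric around zero, so storing them as `u8` silently mangles every negative weight. A minimal sketch of the corrected layout, with the scale fields shown as `f32` for brevity (they are half-precision in ggml):

```rust
pub struct BlockQ8_1 {
    pub d: f32,       // scale
    pub s: f32,       // d * sum(qs), pre-computed for the vec-dot
    pub qs: [i8; 32], // was [u8; 32]: reinterpreting a negative quant as
                      // u8 corrupts it, e.g. -5i8 reads back as 251
}
```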
f7980e07e0
Add ggufv2 support ( #725 )
2023-09-03 14:41:57 +01:00
2ed78ab336
Support for quantized tensors in the python api. ( #706 )
...
* Add more pyo3 support.
* Add some support for quantized tensors in pyo3.
* Add an arc layer on qmatmul.
* Add the quantized matmul.
* Quantization support.
* More quantization support.
* Test the python quantization.
2023-09-01 15:53:42 +01:00
9b25113393
Small cleanups (avoid some possible mutations) ( #670 )
...
* More mut cleanup.
* Factor out some common bits.
2023-08-30 08:54:00 +01:00
a1a5ab8b0a
Neon optimized vecdot ( #666 )
...
* Q5k vecdot.
* Add the q3k vecdot.
* Q2k vecdot.
* Move the quantized model to its own file.
2023-08-29 22:28:46 +01:00
ee8bb1bde1
Add avx implementations of q2k, q3k and q5k vec-dot functions ( #654 )
...
* `q2k` avx implementation
* `q3k` avx implementation
* `q5k` avx implementation
* `avx` make masks constant
* clippy stuff
2023-08-29 13:35:56 +01:00
4b8d57ba15
AVX version of the q4k vecdot. ( #651 )
2023-08-29 09:41:17 +01:00
1da71a5da1
Neon optimized version of the q4k vecdot product. ( #632 )
2023-08-27 21:30:47 +01:00
be471d50ab
Llama quantization. ( #625 )
2023-08-27 14:08:15 +01:00
7151f2cf63
Add the quantize command. ( #624 )
...
* Add the quantize command.
* Bugfix for writing gguf files.
* And add a comment.
2023-08-27 11:35:19 +01:00
a8b39dd7b7
Fix for q5_1 quantization. ( #617 )
...
* Fix for q5_1 quantization.
* Fix some typos.
2023-08-27 08:31:18 +01:00
fa0d75b18d
Quantization tests + fix some issues. ( #616 )
2023-08-27 08:17:38 +01:00
28658054ff
More missing quantized bits. ( #615 )
...
* Q4_1 support.
* Add Q5_1 quantization.
* Tweak.
2023-08-27 07:52:26 +01:00
f704e39761
Missing quants ops ( #611 )
...
* Another transmute tweak.
* Changelog tweak.
* Add some missing quantized ops.
2023-08-26 20:09:04 +01:00
fdf15f0e05
Another transmute tweak. ( #610 )
...
* Another transmute tweak.
* Changelog tweak.
2023-08-26 13:00:24 +01:00
06b37ea7ad
Avoid using tmp values. ( #609 )
2023-08-26 12:28:28 +01:00
c72eb3d75b
Add reference implementation for q4k and q5k ( #586 )
...
* add `q2k` vec-dot
* `q3k` vec-dot + quantization bugfix
* `q4k` vec-dot
* `q5k` vec-dot
* Validate against GGML unit test results.
* Remove some more `transmutes`
2023-08-26 12:07:54 +01:00
6559eae72c
Avoid some transmutes. ( #607 )
2023-08-25 18:21:37 +01:00
9c8d6dbc2a
Neon intrinsics for the q8_0 vecdot. ( #604 )
...
* Neon intrinsics for the q8_0 vecdot.
* Get the tests to run with accelerate (with some numerical error failures).
2023-08-25 14:42:18 +01:00
afc10a3232
AVX version for the q8-0 multiplications. ( #598 )
2023-08-25 10:14:49 +01:00
c093b03d51
Generic implementation of vecdot for q80. ( #596 )
...
* Generic implementation of vecdot for q80.
* Add support for code-llama 7b.
* Support more code-llama.
2023-08-25 09:04:05 +01:00
c265ac50fa
Add a function to write gguf files. ( #585 )
...
* Add a function to write gguf files.
* More GGUF file writing.
* Write the tensor data in GGUF files.
2023-08-24 17:03:06 +01:00
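Note: for orientation, a GGUF file starts with a fixed little-endian header: the magic bytes `GGUF`, a `u32` version, then the tensor count and the metadata key-value count. A minimal sketch of writing that header; counts are `u64` as of GGUF v2, while the v1 format current when this commit landed used `u32` counts:

```rust
use std::io::{self, Write};

/// Write a GGUF header: magic, version, tensor count, metadata kv count.
/// Counts are u64 per GGUF v2+ (v1 used u32 counts instead).
fn write_gguf_header<W: Write>(
    w: &mut W,
    version: u32,
    tensor_count: u64,
    metadata_kv_count: u64,
) -> io::Result<()> {
    w.write_all(b"GGUF")?;
    w.write_all(&version.to_le_bytes())?;
    w.write_all(&tensor_count.to_le_bytes())?;
    w.write_all(&metadata_kv_count.to_le_bytes())?;
    Ok(())
}
```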
d2f42ab086
Reference implementations of q2k and q3k vec-dot functions ( #580 )
...
* add `q2k` vec-dot
* `q3k` vec-dot + quantization bugfix
2023-08-24 12:35:54 +01:00
508d34daf2
GGUF support in the quantized model. ( #559 )
...
* GGUF support in the quantized model.
* Get the GGUF support to work on llama.
2023-08-23 09:20:57 +01:00
0764741cc4
Handle GGUF files in tensor-tools. ( #558 )
2023-08-23 06:32:07 +01:00
6a30ecefad
Preliminary GGUF support. ( #557 )
...
* Preliminary GGUF support.
* Tensor reading.
2023-08-23 00:14:10 +01:00
07067b01dc
Avoid some mutable variables (take 2). ( #554 )
...
* Avoid some mutable variables (take 2).
* Fix.
2023-08-22 18:51:20 +01:00
ec665acad7
Revert "Avoid some mut in quantized functions. ( #550 )" ( #552 )
...
This reverts commit cf27b9b636.
2023-08-22 15:57:46 +01:00
cf27b9b636
Avoid some mut in quantized functions. ( #550 )
...
* Avoid a couple more 'let mut'.
* Tweaks.
2023-08-22 15:44:26 +01:00
352383cbc3
Add quantization support for q2k, q3k, q4k and q5k ( #524 )
...
* first q2 implementation
* First Q4K and Q5K implementations
* fix `q2k` and `q5k`
* Some first cleanups
* run `clippy` on tests
* finally implement `q3k`
* deactivate `q3k` test on macos
* also disable the test on linux
* Fix floating bits in `q3k` dequantization
* Refactoring pass + reorder quants in file
* `fmt`
* Re-add `src` asserts and redefine `dst`
2023-08-22 15:04:55 +01:00
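Note: the k-quants added here share the same super-block shape: 256 weights per block, with packed sub-scales next to the quants. A sketch of the `q2k` layout and its dequantization rule, following ggml's `block_q2_K` (half-precision scales shown as `f32`; the exact traversal order of `qs` follows ggml and is omitted):

```rust
const QK_K: usize = 256;

/// ggml block_q2_K: 256 weights, 2 bits each, with 4-bit sub-scales
/// and 4-bit sub-mins packed per 16-element group.
pub struct BlockQ2K {
    pub scales: [u8; QK_K / 16], // low nibble: scale, high nibble: min
    pub qs: [u8; QK_K / 4],      // four 2-bit quants per byte
    pub d: f32,                  // super-block scale (f16 in ggml)
    pub dmin: f32,               // super-block min scale (f16 in ggml)
}

/// Dequantize one weight with 2-bit quant `q` in 16-element group `g`:
///   x = d * scale(g) * q - dmin * min(g)
fn dequant_one(b: &BlockQ2K, g: usize, q: u8) -> f32 {
    b.d * (b.scales[g] & 0x0F) as f32 * q as f32
        - b.dmin * (b.scales[g] >> 4) as f32
}
```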
82410995a2
Neon support for quantization. ( #519 )
...
* Skeleton files for neon support of quantization.
* SIMD version for q4 vecdot.
* Also simdify the q6k multiplication.
2023-08-19 22:07:29 +01:00
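Note: the per-architecture kernels added throughout this history are typically selected at compile time against a shared scalar reference. A sketch of that dispatch pattern; the function names are illustrative, not candle's:

```rust
/// Scalar reference: the SIMD paths must produce the same result,
/// which is what the vec-dot tests above check.
fn vec_dot_generic(xs: &[i8], ys: &[i8]) -> i32 {
    xs.iter().zip(ys).map(|(&a, &b)| a as i32 * b as i32).sum()
}

/// Compile-time dispatch skeleton between SIMD kernels and the fallback.
pub fn vec_dot(xs: &[i8], ys: &[i8]) -> i32 {
    #[cfg(all(target_arch = "aarch64", target_feature = "neon"))]
    {
        // A NEON kernel would be called here (omitted in this sketch);
        // simd128 and AVX variants hang off the same kind of cfg gate.
    }
    vec_dot_generic(xs, ys)
}
```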