Commit Graph

9 Commits

SHA1 Message Date
3144150b8d Move the tensor-tools binary into a separate crate. (#1969) 2024-03-30 15:49:37 +01:00
403680f17d Quantized GGUF style (#1523)
* Metal quantized modifications proposal.

- Add a device param, wherever needed.
- Create new QMetal storage thing that implements QuantizedType.
- Update everywhere needed.

Fix Python.

Fixing examples.

Fix: fmt + clippy + stub.

Moving everything around.

Only missing the actual implems.

Fixing everything + adding dequantized kernels.

More work.

Fixing matmul.

Fmt + Clippy

Some clippy fixes.

Working state.

Q2K Metal -> Bugged (also present in GGML).
Q4K CPU -> Bugged (present previously, new test catches it).
Q5K CPU -> Bugged (present previously).
Q8_1 Both -> Never really implemented, it seems.
Q8K Metal -> Never implemented in Metal.

Fixing Q2K bug (present in ggml).

* Cleanup.

* Fix the rebase.

* Removing the fences speeds everything up and *is* correct this time...

* Cleanup the fence.

* After rebase.

* Bad code removal.

* Rebase after phi2 merge + fix replit default to CPU.

* Making the CI happy.

* More happy tests.

---------

Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>
2024-01-17 10:27:58 +01:00
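Since #1523 routes quantized storage through an explicit device, here is a minimal sketch of what the device-aware quantized API in candle_core looks like from the caller's side. The exact signatures of QTensor::quantize, QMatMul::from_qtensor, and Device::new_metal are my assumptions about the crate, not taken from the diff above.

```rust
use candle_core::quantized::{GgmlDType, QMatMul, QTensor};
use candle_core::{Device, Module, Result, Tensor};

fn main() -> Result<()> {
    // The point of the change above: quantized tensors now live on a concrete
    // device (Metal here when available) instead of implicitly on the CPU.
    let device = Device::new_metal(0).unwrap_or(Device::Cpu);

    // A toy weight matrix; Q4K uses 256-element blocks, so the last dimension
    // must be a multiple of 256 for the quantization to succeed.
    let weight = Tensor::randn(0f32, 1f32, (128, 256), &device)?;
    let qweight = QTensor::quantize(&weight, GgmlDType::Q4K)?;

    // QMatMul runs the quantized matmul (dedicated kernels or dequantize +
    // matmul, depending on what the backend provides).
    let qmatmul = QMatMul::from_qtensor(qweight)?;
    let xs = Tensor::randn(0f32, 1f32, (1, 256), &device)?;
    let ys = qmatmul.forward(&xs)?;
    println!("{:?}", ys.shape()); // expected: (1, 128)
    Ok(())
}
```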
d5c2a7b64b Add info about MADLAD-400 in readme files (#1287) 2023-11-07 15:21:59 +01:00
508f811b93 Add support for MADLAD400 (#1285)
* Add support for madlad

* Add support for quantized MADLAD
2023-11-07 05:35:37 +01:00
2e5fb0b251 Do not use the kv-cache on external key-value states. (#1054) 2023-10-07 22:37:19 +01:00
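The kv-cache fix above concerns T5 cross-attention: its keys and values are projections of the encoder output ("external key-value states"), so they are identical at every decoding step and must not be appended to a growing cache. A hypothetical helper, not the actual candle code, sketching that distinction:

```rust
use candle_core::{Result, Tensor, D};

/// Hypothetical sketch of the rule behind the fix: only self-attention, which
/// produces a fresh key at every decoding step, should append to the kv-cache.
/// Cross-attention keys come from the encoder output and never change, so
/// concatenating them step after step would silently corrupt the attention.
fn keys_for_step(
    cache: &mut Option<Tensor>,
    new_k: &Tensor,
    uses_external_kv: bool,
) -> Result<Tensor> {
    if uses_external_kv {
        // External key-value states: use them as-is, never cache/concatenate.
        return Ok(new_k.clone());
    }
    let k = match cache.as_ref() {
        Some(prev) => Tensor::cat(&[prev, new_k], D::Minus2)?,
        None => new_k.clone(),
    };
    *cache = Some(k.clone());
    Ok(k)
}
```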
b86ac0c507 Quant t5: Add coedit model to wasm demo and readme (#1031) 2023-10-04 20:57:33 +01:00
3349c89252 Add quantized t5 args for weight and config (#1029) 2023-10-04 17:02:49 +01:00
b43ca493f6 Add more quantized flan t5 variants (#923)
* Add the quantized flan-t5-large variant.

* Add more sizes.
2023-09-21 13:23:30 +01:00
3b557765e8 T5 quantized example (#922)
* Load gguf files for the quantized t5.

* Add the quantized t5 example.

* Allow for loading local files.

* Add some support for quantizing safetensor files.

* Transpose before quantizing.

* Quantized t5.

* Retrieve the weights from the hub.
2023-09-21 12:33:15 +01:00
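Since the example above builds its GGUF weights from safetensors checkpoints, here is a minimal sketch of the "transpose before quantizing" step using candle_core. The tensor name, the Q6K dtype choice, and whether a given checkpoint weight actually needs the transpose are assumptions for illustration only.

```rust
use candle_core::quantized::{GgmlDType, QTensor};
use candle_core::{Device, Result};

fn quantize_weight(safetensors_path: &str, name: &str) -> Result<QTensor> {
    let device = Device::Cpu;
    // Load the original f32/f16 tensors from the safetensors checkpoint.
    let tensors = candle_core::safetensors::load(safetensors_path, &device)?;
    let weight = tensors
        .get(name)
        .ok_or_else(|| candle_core::Error::Msg(format!("missing tensor {name}")))?;
    // "Transpose before quantizing": the commit above transposes weights so
    // they land in the layout the quantized matmul expects; whether a given
    // tensor needs this depends on how the checkpoint stores it (assumed here).
    let weight = weight.t()?.contiguous()?;
    QTensor::quantize(&weight, GgmlDType::Q6K)
}
```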