candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-16 18:48:51 +00:00

Author	SHA1	Message	Date
Laurent Mazare	098909de40	Add vecdot for q6k-q8k. (#476) * Add vecdot for q6k-q8k. * Add some testing for q8k. * Use QMatMul for the output layer.	2023-08-16 20:59:40 +01:00
Laurent Mazare	c5f45887dc	Add some tracing to the quantized example. (#473)	2023-08-16 18:49:08 +01:00
Laurent Mazare	2e206e269d	Add the model argument. (#471)	2023-08-16 16:41:06 +01:00
Laurent Mazare	575e88a999	Add a quantized test that use negative values. (#470) * Add a quantized test that use negative values. * Add a default tokenizer.	2023-08-16 16:32:58 +01:00
Laurent Mazare	a9101700b6	Add a kv-cache to the quantized llama example. (#466) * Add a kv-cache to the quantized llama example. * Also print the prompt. * Bugfix in q6k dequantizing. * Another bugfix.	2023-08-16 14:28:42 +01:00
Laurent Mazare	3071134788	Get the ggml based llama to generate some text. (#464) * Add more stats to the ggml example. * Build a quantized model from the file content. * Move the tensor retrieval in the main crate. * Start adding the forward pass. * Add more to the forward pass of the quantized llama. * Apply the attention layers. * Add the sampling loop. * Get the sampling loop to work. * Minor tweak. * Add a quantize/dequantize test. * Bugfix. * Add a comment + swap the order. * Bugfixes.	2023-08-16 12:41:07 +01:00
Laurent Mazare	ca449f9ee1	Add quantized tensors. (#458) * Add quantized tensors. * Implement the debug trait for QTensor. * Add the QMatMul custom op.	2023-08-15 22:45:53 +01:00
Laurent Mazare	b8263aa15c	Quantized support for f16 and f32 (#457) * Add f32 as a quantized type. * Add f16 as a quantized type too.	2023-08-15 21:09:37 +01:00
Laurent Mazare	e68b2accb4	Split out the quantized file. (#456)	2023-08-15 20:26:27 +01:00
Lukas Kreussel	9e7e6e0288	Add dequantization for ggmls `q4_0`, `q4_1`, `q5_0`, `q5_1` and `q8_0` (#407) * Added dequantization for `q4_0`, `q4_1`, `q5_0`, `q5_1` and `q8_0` * expose `tensor_from_ggml` for external usage * bugfixes & example	2023-08-13 23:22:57 +01:00

10 Commits