candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-18 19:47:12 +00:00

Author	SHA1	Message	Date
Nicolas Patry	c8c603ce96	Removing the fences speeds everything up and is correct this time...	2024-01-15 17:43:00 +01:00
Nicolas Patry	9c4b4f0da0	Metal quantized modifications proposal. - Add a device param, wherever needed. - Create new QMetal storage thing that implements QuantizedType. - Update everywhere needed. Fix Python. Fixing examples. Fix: fmt + clippy + stub. Moving everything around. Only missing the actual implems. Fixing everything + adding dequantized kernels. More work. Fixing matmul. Fmt + Clippy. Some clippy fixes. Working state. Q2K Metal -> Bugged (also present in GGML). Q4K CPU -> Bugged (present previously, new test catch it). Q5K CPU -> Bugged (present previously). Q8_1 Both -> Never really implemented it seems. Q8K metal -> Never implemented in metal. Fixing Q2K bug (present in ggml).	2024-01-15 17:42:58 +01:00
Laurent Mazare	37c539f2b7	Helper function to load sharded safetensors files (#1481) * Fix the quantized mistral example. * Add a helper function to load sharded safetensors weights. * Use the sharded loader.	2023-12-25 21:49:21 +01:00
Laurent Mazare	7135791dd5	Fix the quantized mistral example. (#1478)	2023-12-25 09:31:24 +01:00
Laurent Mazare	88589d8815	Support mistral instruct v0.2. (#1475) * Support mistral instruct v0.2. * Use the safetensors model now that they are available.	2023-12-23 16:18:49 +01:00
Laurent Mazare	deee7612da	Quantized version of mistral. (#1009) * Quantized version of mistral. * Integrate the quantized mistral variant. * Use the quantized weight files. * Tweak the quantization command. * Fix the dtype when computing the rotary embeddings. * Update the readme with the quantized version. * Fix the decoding of the remaining tokens.	2023-09-30 18:25:47 +01:00
Laurent Mazare	06207332bc	Streaming mode for reporting the generated tokens (#1007) * Token streaming. * Use the token output stream. * Flush the output. * Ensure that the last characters get reported.	2023-09-30 15:04:11 +01:00
Laurent Mazare	4021272875	Use flash-attn for mistral. (#1004)	2023-09-30 12:15:10 +01:00
Laurent Mazare	87e3a4e175	Mistral: exit on eos token. (#1001) * Mistral: exit on eos token. * Print the proper stats. * Also add a short flag.	2023-09-30 07:07:06 +01:00
Laurent Mazare	6f17ef82be	Mistral: print the generated text. (#992)	2023-09-29 10:56:11 +01:00
Laurent Mazare	ada8851a23	Add the mistral example. (#984) * Add the mistral example. * Use the two model files. * Adjust the dtype. * Tweak the weight paths. * Remove the end of text token. * Get the mistral model to generate some text.	2023-09-28 16:19:18 +01:00

11 Commits