candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-18 11:37:11 +00:00

Author	SHA1	Message	Date
Laurent Mazare	3ad4770eb6	Use cat for faster MQA computation. (#2043) * Use cat for faster MQA computation. * Move the function to utils + use it in mistral. * Use the shared repeat-kv in a few more models. * Fix.	2024-04-12 09:15:10 +02:00
Laurent Mazare	196765e995	Use the new rope kernel in mistral. (#1937) * Use the new rope kernel in mistral. * Compute the cos and sin with full precision. * Bugfix.	2024-03-25 23:26:05 +01:00
Laurent Mazare	e2b4829531	Support more mistral models. (#1927) * Support more mistral models. * Use the appropriate rope parameter.	2024-03-24 08:04:04 +01:00
Laurent Mazare	6f877592a7	Avoid broadcasting on the batch dimension for the attention mask. (#1920)	2024-03-23 13:08:53 +01:00
Laurent Mazare	c0bdd9c7a6	Use the fast RmsNorm in the quantized model. (#1904)	2024-03-21 18:49:35 +01:00
drbh	f6408a3779	feat: add clear_kv_cache to mistral and qmistral models (#1464)	2023-12-21 21:19:19 +01:00
Laurent Mazare	902d0b9166	More model cloning. (#1126) * More model cloning. * More cloning on quantized models.	2023-10-18 21:55:46 +01:00
Laurent Mazare	392fe02fba	Move the common quantized-nn code to a shared module. (#1063)	2023-10-09 06:22:22 +01:00
Laurent Mazare	deee7612da	Quantized version of mistral. (#1009) * Quantized version of mistral. * Integrate the quantized mistral variant. * Use the quantized weight files. * Tweak the quantization command. * Fix the dtype when computing the rotary embeddings. * Update the readme with the quantized version. * Fix the decoding of the remaining tokens.	2023-09-30 18:25:47 +01:00

9 Commits