candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-16 10:38:54 +00:00

Author	SHA1	Message	Date
Laurent Mazare	3eb2bc6d07	Softmax numerical stability. (#267 ) * Softmax numerical stability. * Fix the flash-attn test.	2023-07-28 13:13:01 +01:00
Laurent Mazare	4f92420132	Add some flash attn test (#253 ) * Add some flash-attn test. * Add the cpu test. * Fail when the head is not a multiple of 8. * Polish the flash attention test.	2023-07-26 20:56:00 +01:00
Laurent Mazare	2ce5f12513	Again set a few extra params in flash-attn. (#245 ) * Again set a few extra params. * Use the appropriate kernel sizes. * Add all the kernel sizes. * Parallel compiling. * Reduce the amount of parallelism. * Add the missing kernel. * Fix a typo. * Remove bf16 support for now.	2023-07-26 14:16:37 +01:00
Laurent Mazare	d9f9c859af	Add flash attention (#241 ) * Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab. * More flash attn. * Set up the flash attn parameters. * Get things to compile locally. * Move the flash attention files in a different directory. * Build the static C library with nvcc. * Add more flash attention. * Update the build part. * Better caching. * Exclude flash attention from the default workspace. * Put flash-attn behind a feature gate. * Get the flash attn kernel to run. * Move the flags to a more appropriate place. * Enable flash attention in llama. * Use flash attention in llama.	2023-07-26 07:48:10 +01:00