candle/flash_fwd_hdim256_fp16_sm80.cu at 109e95b189fc6a587ef7d2f901d194646f59b0c4

huggingface/candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-15 18:28:24 +00:00


Laurent Mazare 2ce5f12513 Again set a few extra params in flash-attn. (#245)

* Again set a few extra params.

* Use the appropriate kernel sizes.

* Add all the kernel sizes.

* Parallel compiling.

* Reduce the amount of parallelism.

* Add the missing kernel.

* Fix a typo.

* Remove bf16 support for now.

2023-07-26 14:16:37 +01:00



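// Copyright (c) 2023, Tri Dao.
// Splitting the different head dimensions to different files to speed up compilation.
#include "flash_fwd_launch_template.h"

// Explicit specialization of run_mha_fwd_ for fp16 (cutlass::half_t) with head dimension 256.
// It only forwards to the hdim256 launcher, so this file compiles that one configuration.
template<> void run_mha_fwd_<cutlass::half_t, 256>(Flash_fwd_params &params, cudaStream_t stream) {
    run_mha_fwd_hdim256<cutlass::half_t>(params, stream);
}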
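
The "one specialization per file" layout mentioned in the header comment can be sketched in plain C++ as follows. This is a minimal, host-only illustration and not the flash-attn code: `Params`, `run_fwd`, `dispatch`, and the file names in the comments are hypothetical stand-ins, and the three "files" are collapsed into one listing so the sketch stays self-contained. The idea is that a primary template is declared but never defined, and each (type, head dimension) pair gets its own explicit specialization that could live in its own translation unit and be compiled in parallel.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a kernel parameter struct (not Flash_fwd_params).
struct Params {
    int head_dim;          // head dimension requested by the caller
    std::string launched;  // records which specialization ran, for illustration
};

// --- would live in a shared header, e.g. launch.h (assumed name) ---
// The primary template is declared but never defined, so every supported
// (T, HeadDim) pair must be provided by an explicit specialization.
// In a real multi-file build, the header should also declare each
// specialization so that callers see it before first use.
template <typename T, int HeadDim>
void run_fwd(Params &params);

// --- would live in its own file, e.g. fwd_hdim128.cpp (assumed name) ---
template <>
void run_fwd<float, 128>(Params &params) {
    params.launched = "hdim128";
}

// --- would live in its own file, e.g. fwd_hdim256.cpp (assumed name) ---
template <>
void run_fwd<float, 256>(Params &params) {
    params.launched = "hdim256";
}

// --- would live in the dispatcher's translation unit ---
// Maps a runtime head dimension onto the matching compile-time specialization.
void dispatch(Params &params) {
    switch (params.head_dim) {
        case 128: run_fwd<float, 128>(params); break;
        case 256: run_fwd<float, 256>(params); break;
        default:  assert(false && "unsupported head dimension");
    }
}
```

Calling `dispatch` with `head_dim` set to 256 reaches the `run_fwd<float, 256>` specialization. Because each specialization is a separate definition, a build system can compile the heavy template instantiations as independent jobs instead of one large translation unit.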