Again set a few extra params in flash-attn. (#245)

* Again set a few extra params.

* Use the appropriate kernel sizes.

* Add all the kernel sizes.

* Parallel compiling.

* Reduce the amount of parallelism.

* Add the missing kernel.

* Fix a typo.

* Remove bf16 support for now.
This commit is contained in:
Laurent Mazare
2023-07-26 14:16:37 +01:00
committed by GitHub
parent fa2b64d678
commit 2ce5f12513
21 changed files with 476 additions and 116 deletions

View File

@ -16,3 +16,5 @@ half = { version = "2.3.1", features = ["num-traits"] }
[build-dependencies]
anyhow = { version = "1", features = ["backtrace"] }
num_cpus = "1.15.0"
rayon = "1.7.0"