Commit Graph

3 Commits

Author SHA1 Message Date
e676f85f00 Sketch a fast cuda kernel for reduce-sum. (#109)
* Sketch a fast cuda kernel for reduce-sum.

* Sketch the rust support code for the fast sum kernel.

* More work on the fast kernel.

* Add some testing ground.

* A couple fixes for the fast sum kernel.
2023-07-08 12:43:56 +01:00
ec79fc43f2 Add the bf16 cuda kernels. 2023-06-29 23:12:02 +01:00
d7f729fb8f Refactor the hierarchy. 2023-06-27 11:57:27 +02:00