Commit Graph

8 Commits

Author SHA1 Message Date
94c6a8d3d3 Add a dedicated cuda kernel for softmax. (#746) 2023-09-05 17:53:20 +02:00
9a5c7db91a Add support for i64 (#563)
* Add the i64 dtype.

* Adapt the cuda kernels.
2023-08-23 10:42:19 +01:00
c950a5c6b1 Cuda support for the mnist training. (#277)
* Cuda support for the mnist training.

* min/max fix + testing.

* Add the argmin/argmax tests.

* More cuda support for argmin/argmax.

* Cuda kernels for argmin and argmax.
2023-07-29 19:48:04 +01:00
23827c49cd Cleanup some todos. (#226)
* Cleanup some todos.

* Fix more todo.

* Optimize for the contiguous case.

* Add the IntDType trait.

* Handle the intdtype trait for more ops.

* Remove a todo.

* Remove a todo.
2023-07-23 16:00:00 +01:00
536c5e702e Cuda kernels for fast min/max reductions (#203)
* Add the min/max cuda kernels.

* Better integration of the cuda kernels.
2023-07-19 18:12:27 +01:00
e676f85f00 Sketch a fast cuda kernel for reduce-sum. (#109)
* Sketch a fast cuda kernel for reduce-sum.

* Sketch the rust support code for the fast sum kernel.

* More work on the fast kernel.

* Add some testing ground.

* A couple fixes for the fast sum kernel.
2023-07-08 12:43:56 +01:00
ec79fc43f2 Add the bf16 cuda kernels. 2023-06-29 23:12:02 +01:00
d7f729fb8f Refactor the hierarchy. 2023-06-27 11:57:27 +02:00