* Add the i64 dtype.
* Adapt the cuda kernels.
* Cuda support for the mnist training.
* min/max fix + testing.
* Add the argmin/argmax tests.
* More cuda support for argmin/argmax.
* Cuda kernels for argmin and argmax.
* Add the min/max cuda kernels.
* Better integration of the cuda kernels.