Commit Graph

29 Commits

SHA1 Message Date
4c8931d2e4 More u32 support. 2023-06-23 14:54:03 +01:00
92da45879c Dummy broadcast placeholder functions. 2023-06-23 14:07:05 +01:00
8add5a5f49 Backport. 2023-06-23 14:17:39 +02:00
5e54f37fe1 Adding embedding op (not generic gather, no select). 2023-06-23 13:13:26 +02:00
4ffdeb4e23 Optimize for the contiguous case. 2023-06-23 11:23:49 +01:00
1a90f9d3a6 Cuda implementation for copying data around. 2023-06-23 11:18:29 +01:00
3b550a56dc Transfer tensors between devices. 2023-06-23 08:35:22 +01:00
2231c717d5 Fix the matmul example. 2023-06-22 21:11:41 +01:00
6463d661d8 Tweaks. 2023-06-22 20:25:14 +01:00
aebffcfc13 Add a matmul cuda example. 2023-06-22 19:44:26 +01:00
0671b8c369 Improve the gemm config. 2023-06-22 19:24:02 +01:00
cc78900922 Start adding the cublas based matmul. 2023-06-22 18:45:10 +01:00
683730c21d Add the cublas handle to the cuda device. 2023-06-22 18:03:53 +01:00
7d9a8ff3f9 Do not ignore errors when cloning the storage. 2023-06-22 16:29:18 +01:00
065b7a19c7 Stride support for unary ops. 2023-06-22 15:46:34 +01:00
5b1ab5b687 Support strides in affine. 2023-06-22 15:38:42 +01:00
836ad5f76c Remove one level of indirection for the binary and unary ops. 2023-06-22 15:20:51 +01:00
5276755fb3 Add cuda support for unary ops. 2023-06-22 15:12:59 +01:00
b8f514d9c6 Add more binary kernels. 2023-06-22 14:07:02 +01:00
e1eb86db61 Add some first binary op (add). 2023-06-22 13:52:02 +01:00
083ced4428 Integrate the kernels bits. 2023-06-22 09:59:00 +01:00
1309932933 Polish a bit the kernel loading. 2023-06-22 09:16:43 +01:00
0a758ffa05 Add the fill kernel and use it for 'ones'. 2023-06-22 08:33:32 +01:00
fc26bab3ed Add some specific errors rather than panicking. 2023-06-22 07:51:53 +01:00
7c46de9584 Check that the tensor is contiguous before applying the kernel. 2023-06-21 21:28:59 +01:00
304a557d84 Add a dummy module. 2023-06-21 21:16:00 +01:00
97d9142dee Add a first kernel. 2023-06-21 20:48:22 +01:00
deb6091099 Use a type alias for cuda errors. 2023-06-21 19:50:00 +01:00
71735c7a02 Move the data between the host and the device. 2023-06-21 19:43:25 +01:00