Commit Graph

39 Commits

SHA1 Message Date
117f014b55 Add where_cond and properly apply the causal mask. 2023-06-25 21:08:03 +01:00
817e4b5005 Rework the embeddings so that it works on non-contiguous weights + factor out some code. 2023-06-25 17:37:47 +01:00
3852a85af3 Boilerplate code for the sum operator. 2023-06-25 09:35:17 +01:00
d6cb4f1c53 Add the source offset when copying the data around. 2023-06-24 08:35:49 +01:00
d0a91db8fd Softmax cpu implementation. 2023-06-23 22:26:53 +01:00
8443963d4f Skeleton implementation for softmax. 2023-06-23 22:00:13 +01:00
5d44e76e3f Add the casting operation. 2023-06-23 21:22:07 +01:00
92da45879c Dummy broadcast placeholder functions. 2023-06-23 14:07:05 +01:00
2fb87edda5 Address comments. 2023-06-23 13:43:18 +02:00
5e54f37fe1 Adding embedding op (not generic gather, no select). 2023-06-23 13:13:26 +02:00
1a90f9d3a6 Cuda implementation for copying data around. 2023-06-23 11:18:29 +01:00
4712dcc2f6 Actually copy the data around in cat (cpu only). 2023-06-23 10:24:02 +01:00
6110db31c9 Add the cat operator (without the storage implementation for now). 2023-06-23 10:13:37 +01:00
cc78900922 Start adding the cublas based matmul. 2023-06-22 18:45:10 +01:00
683730c21d Add the cublas handle to the cuda device. 2023-06-22 18:03:53 +01:00
7d9a8ff3f9 Do not ignore errors when cloning the storage. 2023-06-22 16:29:18 +01:00
836ad5f76c Remove one level of indirection for the binary and unary ops. 2023-06-22 15:20:51 +01:00
5276755fb3 Add cuda support for unary ops. 2023-06-22 15:12:59 +01:00
b8f514d9c6 Add more binary kernels. 2023-06-22 14:07:02 +01:00
e1eb86db61 Add some first binary op (add). 2023-06-22 13:52:02 +01:00
ce977b489e Adding matmul? 2023-06-22 12:25:58 +02:00
97d9142dee Add a first kernel. 2023-06-21 20:48:22 +01:00
71735c7a02 Move the data between the host and the device. 2023-06-21 19:43:25 +01:00
2bfe8f18ab Start adding support for cuda. 2023-06-21 18:11:56 +01:00
eb52b9b343 Move the cpu backend specific bits apart. 2023-06-21 10:25:56 +01:00
8cde0c5478 Add some skeleton code for GPU support. 2023-06-21 09:13:57 +01:00
3a5405ca6d Move the StridedIndex in its own module. 2023-06-21 07:44:36 +01:00
78bac0ed32 Add a couple operators. 2023-06-20 22:32:11 +01:00
f1f372b13e Add the affine transformation. 2023-06-20 21:51:35 +01:00
98b423145a Bugfix for the contiguous strides. 2023-06-20 13:35:07 +01:00
d9cb1917ce Add some unary ops. 2023-06-20 12:04:01 +01:00
f5b0aa815a Get the addition/multiplication to work. 2023-06-20 11:07:59 +01:00
6c5fc767a8 Add the slice indexing. 2023-06-20 10:50:58 +01:00
786544292d Add more to the binary operators. 2023-06-20 09:49:40 +01:00
bcae61b7f2 Cosmetic changes. 2023-06-19 21:30:03 +01:00
26d6288eb6 Add an easy way to create tensor objects. 2023-06-19 20:59:26 +01:00
8e2c534d1f Flesh out some ops bits. 2023-06-19 19:28:33 +01:00
ce718bb807 Add the op. 2023-06-19 18:34:54 +01:00
844704de5c Split the tensor file. 2023-06-19 17:34:13 +01:00