| Commit | Message | Date |
|---|---|---|
| 117f014b55 | Add where_cond and properly apply the causal mask. | 2023-06-25 21:08:03 +01:00 |
| 817e4b5005 | Rework the embeddings so that it works on non-contiguous weights + factor out some code. | 2023-06-25 17:37:47 +01:00 |
| 3852a85af3 | Boilerplate code for the sum operator. | 2023-06-25 09:35:17 +01:00 |
| d6cb4f1c53 | Add the source offset when copying the data around. | 2023-06-24 08:35:49 +01:00 |
| d0a91db8fd | Softmax cpu implementation. | 2023-06-23 22:26:53 +01:00 |
| 8443963d4f | Skeleton implementation for softmax. | 2023-06-23 22:00:13 +01:00 |
| 5d44e76e3f | Add the casting operation. | 2023-06-23 21:22:07 +01:00 |
| 92da45879c | Dummy broadcast placeholder functions. | 2023-06-23 14:07:05 +01:00 |
| 2fb87edda5 | Address comments. | 2023-06-23 13:43:18 +02:00 |
| 5e54f37fe1 | Adding embedding op (not generic gather, no select). | 2023-06-23 13:13:26 +02:00 |
| 1a90f9d3a6 | Cuda implementation for copying data around. | 2023-06-23 11:18:29 +01:00 |
| 4712dcc2f6 | Actually copy the data around in cat (cpu only). | 2023-06-23 10:24:02 +01:00 |
| 6110db31c9 | Add the cat operator (without the storage implementation for now). | 2023-06-23 10:13:37 +01:00 |
| cc78900922 | Start adding the cublas based matmul. | 2023-06-22 18:45:10 +01:00 |
| 683730c21d | Add the cublas handle to the cuda device. | 2023-06-22 18:03:53 +01:00 |
| 7d9a8ff3f9 | Do not ignore errors when cloning the storage. | 2023-06-22 16:29:18 +01:00 |
| 836ad5f76c | Remove one level of indirection for the binary and unary ops. | 2023-06-22 15:20:51 +01:00 |
| 5276755fb3 | Add cuda support for unary ops. | 2023-06-22 15:12:59 +01:00 |
| b8f514d9c6 | Add more binary kernels. | 2023-06-22 14:07:02 +01:00 |
| e1eb86db61 | Add some first binary op (add). | 2023-06-22 13:52:02 +01:00 |
| ce977b489e | Adding matmul? | 2023-06-22 12:25:58 +02:00 |
| 97d9142dee | Add a first kernel. | 2023-06-21 20:48:22 +01:00 |
| 71735c7a02 | Move the data between the host and the device. | 2023-06-21 19:43:25 +01:00 |
| 2bfe8f18ab | Start adding support for cuda. | 2023-06-21 18:11:56 +01:00 |
| eb52b9b343 | Move the cpu backend specific bits apart. | 2023-06-21 10:25:56 +01:00 |
| 8cde0c5478 | Add some skeleton code for GPU support. | 2023-06-21 09:13:57 +01:00 |
| 3a5405ca6d | Move the StridedIndex in its own module. | 2023-06-21 07:44:36 +01:00 |
| 78bac0ed32 | Add a couple operators. | 2023-06-20 22:32:11 +01:00 |
| f1f372b13e | Add the affine transformation. | 2023-06-20 21:51:35 +01:00 |
| 98b423145a | Bugfix for the contiguous strides. | 2023-06-20 13:35:07 +01:00 |
| d9cb1917ce | Add some unary ops. | 2023-06-20 12:04:01 +01:00 |
| f5b0aa815a | Get the addition/multiplication to work. | 2023-06-20 11:07:59 +01:00 |
| 6c5fc767a8 | Add the slice indexing. | 2023-06-20 10:50:58 +01:00 |
| 786544292d | Add more to the binary operators. | 2023-06-20 09:49:40 +01:00 |
| bcae61b7f2 | Cosmetic changes. | 2023-06-19 21:30:03 +01:00 |
| 26d6288eb6 | Add an easy way to create tensor objects. | 2023-06-19 20:59:26 +01:00 |
| 8e2c534d1f | Flesh out some ops bits. | 2023-06-19 19:28:33 +01:00 |
| ce718bb807 | Add the op. | 2023-06-19 18:34:54 +01:00 |
| 844704de5c | Split the tensor file. | 2023-06-19 17:34:13 +01:00 |