|
1b5f892d73
|
Add a currently wrong test for narrow.
|
2023-06-24 08:50:37 +01:00 |
|
|
d6cb4f1c53
|
Add the source offset when copying the data around.
|
2023-06-24 08:35:49 +01:00 |
|
|
4db972781f
|
Handle copying for the u32 type.
|
2023-06-24 08:24:06 +01:00 |
|
|
dd657397b2
|
Skeleton implementation for the narrow method and op.
|
2023-06-24 08:17:35 +01:00 |
|
|
3deacba5f9
|
Reshape can now return a view.
|
2023-06-24 07:14:09 +01:00 |
|
|
47f9c48e7c
|
Avoid duplicating the storage by refcounting it.
|
2023-06-24 07:03:21 +01:00 |
|
|
b4653e41be
|
Helper function to build 3d arrays.
|
2023-06-24 06:29:06 +01:00 |
|
|
ae5dc5fbc6
|
Softmax tests + fix.
|
2023-06-23 22:46:36 +01:00 |
|
|
d0a91db8fd
|
Softmax cpu implementation.
|
2023-06-23 22:26:53 +01:00 |
|
|
8443963d4f
|
Skeleton implementation for softmax.
|
2023-06-23 22:00:13 +01:00 |
|
|
5d44e76e3f
|
Add the casting operation.
|
2023-06-23 21:22:07 +01:00 |
|
|
8ed350dc94
|
Add a couple unary ops.
|
2023-06-23 20:19:20 +01:00 |
|
|
fe75a01188
|
Cleanup the tensor creation code.
|
2023-06-23 19:52:21 +01:00 |
|
|
88187b784b
|
Also optimize the contiguous case for the binary cuda kernels.
|
2023-06-23 19:04:13 +01:00 |
|
|
5ca309ecb0
|
Optimize the unary cuda kernels for the contiguous case.
|
2023-06-23 18:40:15 +01:00 |
|
|
4f9f14a06b
|
Optimize the cpu backend for the contiguous cases.
|
2023-06-23 18:08:55 +01:00 |
|
|
132859df75
|
Add some transpose tests.
|
2023-06-23 17:49:53 +01:00 |
|
|
691f7d8e0f
|
Cosmetic fix.
|
2023-06-23 16:43:45 +01:00 |
|
|
69f91b36f9
|
More backprop support for broadcasting ops.
|
2023-06-23 16:35:10 +01:00 |
|
|
d839d5d9fd
|
Basic support for broadcasting backprop.
|
2023-06-23 16:31:44 +01:00 |
|
|
1936a1f0a3
|
Bugfix for the strided copy + add some assertions.
|
2023-06-23 16:28:18 +01:00 |
|
|
bcfbb1dca1
|
More efficient CPU broadcasting implementation.
|
2023-06-23 16:23:12 +01:00 |
|
|
10a5807dff
|
Broadcast cpu implementation.
|
2023-06-23 16:16:52 +01:00 |
|
|
83e75b3af8
|
Optimize for the unstrided case.
|
2023-06-23 15:49:11 +01:00 |
|
|
4c8931d2e4
|
More u32 support.
|
2023-06-23 14:54:03 +01:00 |
|
|
08394f7924
|
Binary op for u32.
|
2023-06-23 14:50:52 +01:00 |
|
|
92da45879c
|
Dummy broadcast placeholder functions.
|
2023-06-23 14:07:05 +01:00 |
|
|
f8848db001
|
Fix the gelu kernel for f16.
|
2023-06-23 13:38:54 +01:00 |
|
|
db5526d51a
|
Merge pull request #8 from LaurentMazare/fix_cuda
Backport.
|
2023-06-23 14:27:01 +02:00 |
|
|
8add5a5f49
|
Backport.
|
2023-06-23 14:17:39 +02:00 |
|
|
7c1625f6a5
|
Merge pull request #6 from LaurentMazare/add_embedding
Adding embedding op (not generic gather, no select).
|
2023-06-23 13:49:13 +02:00 |
|
|
2fb87edda5
|
Address comments.
|
2023-06-23 13:43:18 +02:00 |
|
|
52c503ba8f
|
Handle the contiguous case in an optimized way when copying cpu memory.
|
2023-06-23 12:20:16 +01:00 |
|
|
d4054ab500
|
Merge pull request #5 from LaurentMazare/add_gelu
Creating Gelu op (no backward).
|
2023-06-23 13:17:37 +02:00 |
|
|
96289bce08
|
Rebase.
|
2023-06-23 13:17:21 +02:00 |
|
|
5e54f37fe1
|
Adding embedding op (not generic gather, no select).
|
2023-06-23 13:13:26 +02:00 |
|
|
09b7731b8d
|
Fix unary op.
|
2023-06-23 13:10:26 +02:00 |
|
|
56ae71dd4c
|
Address comments.
|
2023-06-23 13:08:04 +02:00 |
|
|
fd21c708ab
|
Creating Gelu op (no backward).
|
2023-06-23 13:07:39 +02:00 |
|
|
4ffdeb4e23
|
Optimize for the contiguous case.
|
2023-06-23 11:23:49 +01:00 |
|
|
1a90f9d3a6
|
Cuda implementation for copying data around.
|
2023-06-23 11:18:29 +01:00 |
|
|
79e4b29c2f
|
Add the reshape method and operation (without grad for now).
|
2023-06-23 10:51:05 +01:00 |
|
|
c4c6167949
|
Add the contiguous method.
|
2023-06-23 10:45:20 +01:00 |
|
|
4712dcc2f6
|
Actually copy the data around in cat (cpu only).
|
2023-06-23 10:24:02 +01:00 |
|
|
6110db31c9
|
Add the cat operator (without the storage implementation for now).
|
2023-06-23 10:13:37 +01:00 |
|
|
bf9e1d1c23
|
Add the detach method.
|
2023-06-23 09:19:23 +01:00 |
|
|
3e7cb18d7f
|
Handle tensor transfers between devices in the backprop.
|
2023-06-23 08:55:34 +01:00 |
|
|
3f79d81b6f
|
Add transposition around arbitrary axis.
|
2023-06-23 08:51:13 +01:00 |
|
|
27d428af1a
|
Add the backward pass for transpose.
|
2023-06-23 08:43:05 +01:00 |
|
|
3b550a56dc
|
Transfer tensors between devices.
|
2023-06-23 08:35:22 +01:00 |
|