Commit Graph

2339 Commits

SHA1 Message Date
0988706c88 Support wider shapes for llama. 2023-06-24 20:08:18 +01:00
6b2cd9c51c Add the broadcast operator. 2023-06-24 19:16:03 +01:00
96c098b6cd Remove the unnecessary features. 2023-06-24 18:15:44 +01:00
a7f80e258f Read and write npy files. 2023-06-24 18:12:10 +01:00
a6ca9baf3c Backprop for narrow. 2023-06-24 15:17:57 +01:00
fbbf3951dd More narrow testing. 2023-06-24 15:10:31 +01:00
0f34738831 Fix the cpu implementation for narrow. 2023-06-24 15:01:32 +01:00
1b5f892d73 Add a currently wrong test for narrow. 2023-06-24 08:50:37 +01:00
d6cb4f1c53 Add the source offset when copying the data around. 2023-06-24 08:35:49 +01:00
4db972781f Handle copying for the u32 type. 2023-06-24 08:24:06 +01:00
dd657397b2 Skeleton implementation for the narrow method and op. 2023-06-24 08:17:35 +01:00
3deacba5f9 Reshape can now return a view. 2023-06-24 07:14:09 +01:00
47f9c48e7c Avoid duplicating the storage by refcounting it. 2023-06-24 07:03:21 +01:00
b4653e41be Helper function to build 3d arrays. 2023-06-24 06:29:06 +01:00
ae5dc5fbc6 Softmax tests + fix. 2023-06-23 22:46:36 +01:00
d0a91db8fd Softmax cpu implementation. 2023-06-23 22:26:53 +01:00
8443963d4f Skeleton implementation for softmax. 2023-06-23 22:00:13 +01:00
5d44e76e3f Add the casting operation. 2023-06-23 21:22:07 +01:00
8ed350dc94 Add a couple unary ops. 2023-06-23 20:19:20 +01:00
fe75a01188 Cleanup the tensor creation code. 2023-06-23 19:52:21 +01:00
88187b784b Also optimize the contiguous case for the binary cuda kernels. 2023-06-23 19:04:13 +01:00
5ca309ecb0 Optimize the unary cuda kernels for the contiguous case. 2023-06-23 18:40:15 +01:00
4f9f14a06b Optimize the cpu backend for the contiguous cases. 2023-06-23 18:08:55 +01:00
132859df75 Add some transpose tests. 2023-06-23 17:49:53 +01:00
691f7d8e0f Cosmetic fix. 2023-06-23 16:43:45 +01:00
69f91b36f9 More backprop support for broadcasting ops. 2023-06-23 16:35:10 +01:00
d839d5d9fd Basic support for broadcasting backprop. 2023-06-23 16:31:44 +01:00
1936a1f0a3 Bugfix for the strided copy + add some assertions. 2023-06-23 16:28:18 +01:00
bcfbb1dca1 More efficient CPU broadcasting implementation. 2023-06-23 16:23:12 +01:00
10a5807dff Broadcast cpu implementation. 2023-06-23 16:16:52 +01:00
83e75b3af8 Optimize for the unstrided case. 2023-06-23 15:49:11 +01:00
4c8931d2e4 More u32 support. 2023-06-23 14:54:03 +01:00
08394f7924 Binary op for u32. 2023-06-23 14:50:52 +01:00
92da45879c Dummy broadcast placeholder functions. 2023-06-23 14:07:05 +01:00
f8848db001 Fix the gelu kernel for f16. 2023-06-23 13:38:54 +01:00
db5526d51a Merge pull request #8 from LaurentMazare/fix_cuda: Backport. 2023-06-23 14:27:01 +02:00
8add5a5f49 Backport. 2023-06-23 14:17:39 +02:00
7c1625f6a5 Merge pull request #6 from LaurentMazare/add_embedding: Adding embedding op (not generic gather, no select). 2023-06-23 13:49:13 +02:00
2fb87edda5 Address comments. 2023-06-23 13:43:18 +02:00
52c503ba8f Handle the contiguous case in an optimized way when copying cpu memory. 2023-06-23 12:20:16 +01:00
d4054ab500 Merge pull request #5 from LaurentMazare/add_gelu: Creating Gelu op (no backward). 2023-06-23 13:17:37 +02:00
96289bce08 Rebase. 2023-06-23 13:17:21 +02:00
5e54f37fe1 Adding embedding op (not generic gather, no select). 2023-06-23 13:13:26 +02:00
09b7731b8d Fix unary op. 2023-06-23 13:10:26 +02:00
56ae71dd4c Address comments. 2023-06-23 13:08:04 +02:00
fd21c708ab Creating Gelu op (no backward). 2023-06-23 13:07:39 +02:00
4ffdeb4e23 Optimize for the contiguous case. 2023-06-23 11:23:49 +01:00
1a90f9d3a6 Cuda implementation for copying data around. 2023-06-23 11:18:29 +01:00
79e4b29c2f Add the reshape method and operation (without grad for now). 2023-06-23 10:51:05 +01:00
c4c6167949 Add the contiguous method. 2023-06-23 10:45:20 +01:00
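
Several of the commits above revolve around one idea: tensors as strided views over refcounted storage, so that operations like reshape and narrow can return a view instead of copying ("Avoid duplicating the storage by refcounting it", "Reshape can now return a view", "Add the source offset when copying the data around"). The following is a minimal self-contained sketch of that idea in plain Rust; the type and method names are hypothetical illustrations, not candle's actual implementation:

```rust
use std::rc::Rc;

/// Hypothetical minimal tensor: shape + strides + start offset over
/// refcounted storage. Cloning a view never duplicates the buffer.
#[derive(Clone)]
struct Tensor {
    storage: Rc<Vec<f32>>, // shared, refcounted buffer
    shape: Vec<usize>,
    stride: Vec<usize>,    // row-major strides, in elements
    offset: usize,         // start offset into the storage
}

impl Tensor {
    fn new(data: Vec<f32>, shape: Vec<usize>) -> Self {
        // Row-major (C) strides: the last dimension is contiguous.
        let mut stride = vec![1; shape.len()];
        for i in (0..shape.len().saturating_sub(1)).rev() {
            stride[i] = stride[i + 1] * shape[i + 1];
        }
        Tensor { storage: Rc::new(data), shape, stride, offset: 0 }
    }

    /// Restrict dimension `dim` to `[start, start + len)` without copying:
    /// only the shape and the start offset change, the storage is shared.
    fn narrow(&self, dim: usize, start: usize, len: usize) -> Self {
        assert!(start + len <= self.shape[dim], "narrow out of bounds");
        let mut view = self.clone(); // Rc clone: cheap, no data copy
        view.shape[dim] = len;
        view.offset += start * self.stride[dim];
        view
    }

    fn get(&self, idx: &[usize]) -> f32 {
        let pos: usize = idx.iter().zip(&self.stride).map(|(i, s)| i * s).sum();
        self.storage[self.offset + pos]
    }
}

fn main() {
    // A 3x4 matrix holding 0..12; narrow rows to [1, 3) -> a 2x4 view.
    let t = Tensor::new((0..12).map(|v| v as f32).collect(), vec![3, 4]);
    let n = t.narrow(0, 1, 2);
    assert_eq!(n.shape, vec![2, 4]);
    assert_eq!(n.get(&[0, 0]), 4.0); // row 0 of the view is row 1 of t
    assert_eq!(n.get(&[1, 3]), 11.0);
    println!("narrowed view shape: {:?}", n.shape);
}
```

Because a view is just (shape, stride, offset) over shared storage, copying data out of it later has to honor the source offset, which is exactly the bug class the "Add the source offset when copying the data around" commit addresses.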