Commit Graph

33 Commits

SHA1 Message Date
5952c3fa91 Cleanup the broadcast setup. 2023-06-26 10:49:34 +01:00
117f014b55 Add where_cond and properly apply the causal mask. 2023-06-25 21:08:03 +01:00
817e4b5005 Rework the embeddings so that it works on non-contiguous weights + factor out some code. 2023-06-25 17:37:47 +01:00
118cc30908 Add some currently broken tests. 2023-06-25 14:55:25 +01:00
bb6450ebbb Bugfix for Tensor::cat + add some tests. 2023-06-25 14:20:42 +01:00
ba0693a908 Fix the reduce_sum implementation and add some tests. 2023-06-25 10:55:04 +01:00
0f369dd870 Add the cpu implementation for reduce_sum. 2023-06-25 10:37:04 +01:00
3852a85af3 Boilerplate code for the sum operator. 2023-06-25 09:35:17 +01:00
d6cb4f1c53 Add the source offset when copying the data around. 2023-06-24 08:35:49 +01:00
4db972781f Handle copying for the u32 type. 2023-06-24 08:24:06 +01:00
ae5dc5fbc6 Softmax tests + fix. 2023-06-23 22:46:36 +01:00
d0a91db8fd Softmax cpu implementation. 2023-06-23 22:26:53 +01:00
8443963d4f Skeleton implementation for softmax. 2023-06-23 22:00:13 +01:00
5d44e76e3f Add the casting operation. 2023-06-23 21:22:07 +01:00
4f9f14a06b Optimize the cpu backend for the contiguous cases. 2023-06-23 18:08:55 +01:00
1936a1f0a3 Bugfix for the strided copy + add some assertions. 2023-06-23 16:28:18 +01:00
bcfbb1dca1 More efficient CPU broadcasting implementation. 2023-06-23 16:23:12 +01:00
10a5807dff Broadcast cpu implementation. 2023-06-23 16:16:52 +01:00
83e75b3af8 Optimize for the unstrided case. 2023-06-23 15:49:11 +01:00
08394f7924 Binary op for u32. 2023-06-23 14:50:52 +01:00
92da45879c Dummy broadcast placeholder functions. 2023-06-23 14:07:05 +01:00
7c1625f6a5 Merge pull request #6 from LaurentMazare/add_embedding: Adding embedding op (not generic gather, no select). 2023-06-23 13:49:13 +02:00
52c503ba8f Handle the contiguous case in an optimized way when copying cpu memory. 2023-06-23 12:20:16 +01:00
96289bce08 Rebase. 2023-06-23 13:17:21 +02:00
5e54f37fe1 Adding embedding op (not generic gather, no select). 2023-06-23 13:13:26 +02:00
4712dcc2f6 Actually copy the data around in cat (cpu only). 2023-06-23 10:24:02 +01:00
3b550a56dc Transfer tensors between devices. 2023-06-23 08:35:22 +01:00
836ad5f76c Remove one level of indirection for the binary and unary ops. 2023-06-22 15:20:51 +01:00
a8b6c848e0 Final updates. 2023-06-22 12:39:33 +02:00
04cf14f35a Moving to gemm and adding matmul backprop. Tentative `T` operator. 2023-06-22 12:37:02 +02:00
ce977b489e Adding matmul? 2023-06-22 12:25:58 +02:00
68f525f321 Move more bits to the backend part. 2023-06-21 10:34:51 +01:00
eb52b9b343 Move the cpu backend specific bits apart. 2023-06-21 10:25:56 +01:00