Commit Graph

18 Commits

SHA1 Message Date
1936a1f0a3 Bugfix for the strided copy + add some assertions. 2023-06-23 16:28:18 +01:00
bcfbb1dca1 More efficient CPU broadcasting implementation. 2023-06-23 16:23:12 +01:00
10a5807dff Broadcast cpu implementation. 2023-06-23 16:16:52 +01:00
83e75b3af8 Optimize for the unstrided case. 2023-06-23 15:49:11 +01:00
08394f7924 Binary op for u32. 2023-06-23 14:50:52 +01:00
92da45879c Dummy broadcast placeholder functions. 2023-06-23 14:07:05 +01:00
7c1625f6a5 Merge pull request #6 from LaurentMazare/add_embedding 2023-06-23 13:49:13 +02:00
    Adding embedding op (not generic gather, no select).
52c503ba8f Handle the contiguous case in an optimized way when copying cpu memory. 2023-06-23 12:20:16 +01:00
96289bce08 Rebase. 2023-06-23 13:17:21 +02:00
5e54f37fe1 Adding embedding op (not generic gather, no select). 2023-06-23 13:13:26 +02:00
4712dcc2f6 Actually copy the data around in cat (cpu only). 2023-06-23 10:24:02 +01:00
3b550a56dc Transfer tensors between devices. 2023-06-23 08:35:22 +01:00
836ad5f76c Remove one level of indirection for the binary and unary ops. 2023-06-22 15:20:51 +01:00
a8b6c848e0 Final updates. 2023-06-22 12:39:33 +02:00
04cf14f35a Moving to gemm and adding matmul backprop. 2023-06-22 12:37:02 +02:00
    - Tentative `T` operator.
ce977b489e Adding matmul? 2023-06-22 12:25:58 +02:00
68f525f321 Move more bits to the backend part. 2023-06-21 10:34:51 +01:00
eb52b9b343 Move the cpu backend specific bits apart. 2023-06-21 10:25:56 +01:00