|
5952c3fa91
|
Clean up the broadcast setup.
|
2023-06-26 10:49:34 +01:00 |
|
|
117f014b55
|
Add where_cond and properly apply the causal mask.
|
2023-06-25 21:08:03 +01:00 |
|
|
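A conditional select such as `where_cond` is one common way to apply a causal mask. The minimal standalone sketch below assumes f32 data in contiguous row-major buffers, and `where_cond`, `causal_mask` and `apply_causal_mask` are hypothetical helpers, not this repository's code. Scores at future positions (column j > row i) are replaced with -inf so a subsequent softmax assigns them zero weight.

```rust
/// Hypothetical elementwise select: out[i] = if cond[i] != 0 { on_true[i] } else { on_false[i] }.
/// All slices are assumed contiguous and of equal length.
fn where_cond(cond: &[u8], on_true: &[f32], on_false: &[f32]) -> Vec<f32> {
    assert!(cond.len() == on_true.len() && cond.len() == on_false.len());
    cond.iter()
        .zip(on_true.iter().zip(on_false.iter()))
        .map(|(&c, (&t, &f))| if c != 0 { t } else { f })
        .collect()
}

/// Row-major (t, t) mask holding 1 where column j > row i, i.e. at future positions.
fn causal_mask(t: usize) -> Vec<u8> {
    (0..t * t).map(|k| u8::from(k % t > k / t)).collect()
}

/// Replace masked attention scores with -inf; unmasked scores pass through unchanged.
fn apply_causal_mask(scores: &[f32], t: usize) -> Vec<f32> {
    let neg_inf = vec![f32::NEG_INFINITY; t * t];
    where_cond(&causal_mask(t), &neg_inf, scores)
}
```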
817e4b5005
|
Rework the embeddings so that it works on non-contiguous weights + factor out some code.
|
2023-06-25 17:37:47 +01:00 |
|
|
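Supporting non-contiguous weights in an embedding lookup usually means indexing through explicit strides instead of assuming a dense layout. The sketch below is a hypothetical standalone `embedding` over a (vocab, hidden) f32 matrix, not this repository's implementation; passing strides `[1, vocab]` reads a weight stored transposed without first copying it.

```rust
/// Hypothetical strided embedding lookup: gathers row `id` of a (vocab, hidden)
/// matrix whose element (r, c) lives at weights[r * strides[0] + c * strides[1]].
fn embedding(ids: &[u32], weights: &[f32], vocab: usize, hidden: usize, strides: [usize; 2]) -> Vec<f32> {
    let mut out = Vec::with_capacity(ids.len() * hidden);
    for &id in ids {
        let row = id as usize;
        assert!(row < vocab, "id {row} out of range");
        for col in 0..hidden {
            out.push(weights[row * strides[0] + col * strides[1]]);
        }
    }
    out
}
```

The same weights give the same rows whether stored densely (strides `[2, 1]` for a 3x2 matrix) or transposed (strides `[1, 3]`).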
118cc30908
|
Add some currently broken tests.
|
2023-06-25 14:55:25 +01:00 |
|
|
bb6450ebbb
|
Bugfix for Tensor::cat + add some tests.
|
2023-06-25 14:20:42 +01:00 |
|
|
ba0693a908
|
Fix the reduce_sum implementation and add some tests.
|
2023-06-25 10:55:04 +01:00 |
|
|
0f369dd870
|
Add the cpu implementation for reduce_sum.
|
2023-06-25 10:37:04 +01:00 |
|
|
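A CPU sum over one dimension of a contiguous row-major tensor can be written by splitting the shape into an outer block, the reduced dimension, and an inner block. This is a minimal sketch assuming f32 data; `reduce_sum` and its keep-the-dimension-as-size-1 convention are illustrative assumptions, not taken from the commits.

```rust
/// Sum a contiguous row-major tensor over `dim`, keeping that dimension with size 1.
fn reduce_sum(data: &[f32], shape: &[usize], dim: usize) -> (Vec<f32>, Vec<usize>) {
    assert_eq!(data.len(), shape.iter().product::<usize>());
    // Elements before `dim`, along `dim`, and after `dim` (an empty product is 1).
    let outer: usize = shape[..dim].iter().product();
    let reduced = shape[dim];
    let inner: usize = shape[dim + 1..].iter().product();
    let mut out = vec![0f32; outer * inner];
    for o in 0..outer {
        for r in 0..reduced {
            for i in 0..inner {
                out[o * inner + i] += data[(o * reduced + r) * inner + i];
            }
        }
    }
    let mut out_shape = shape.to_vec();
    out_shape[dim] = 1;
    (out, out_shape)
}
```

For a 2x3 tensor holding 0..6, summing over dim 0 gives `[3, 5, 7]` with shape `[1, 3]`, and summing over dim 1 gives `[3, 12]` with shape `[2, 1]`.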
3852a85af3
|
Boilerplate code for the sum operator.
|
2023-06-25 09:35:17 +01:00 |
|
|
d6cb4f1c53
|
Add the source offset when copying the data around.
|
2023-06-24 08:35:49 +01:00 |
|
|
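Copying a strided view into fresh memory has to start at the view's offset into the source buffer, and a dense view can skip per-element index arithmetic entirely. The sketch below illustrates both points; `copy_strided` and `is_contiguous` are hypothetical names, and the f32 row-major layout is an assumption.

```rust
/// True when `strides` describe a dense row-major layout of `shape`
/// (strides of size-1 dimensions are ignored).
fn is_contiguous(shape: &[usize], strides: &[usize]) -> bool {
    let mut expected = 1;
    for (&dim, &stride) in shape.iter().zip(strides).rev() {
        if dim > 1 && stride != expected {
            return false;
        }
        expected *= dim;
    }
    true
}

/// Copy the strided view starting at `offset` in `src` into a new contiguous buffer.
fn copy_strided(src: &[f32], offset: usize, shape: &[usize], strides: &[usize]) -> Vec<f32> {
    let n: usize = shape.iter().product();
    if is_contiguous(shape, strides) {
        // Fast path: the view is one dense run, so a single slice copy suffices.
        return src[offset..offset + n].to_vec();
    }
    let mut out = Vec::with_capacity(n);
    for flat in 0..n {
        // Decompose the flat index into coordinates, innermost dimension first.
        let (mut rem, mut idx) = (flat, offset);
        for d in (0..shape.len()).rev() {
            idx += (rem % shape[d]) * strides[d];
            rem /= shape[d];
        }
        out.push(src[idx]);
    }
    out
}
```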
4db972781f
|
Handle copying for the u32 type.
|
2023-06-24 08:24:06 +01:00 |
|
|
ae5dc5fbc6
|
Softmax tests + fix.
|
2023-06-23 22:46:36 +01:00 |
|
|
d0a91db8fd
|
Softmax cpu implementation.
|
2023-06-23 22:26:53 +01:00 |
|
|
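A CPU softmax is typically computed row by row over the last dimension, subtracting the row maximum before exponentiating so large inputs do not overflow; the shift cancels in the normalization. This is a minimal sketch assuming a contiguous f32 buffer, and `softmax_last_dim` is a hypothetical name.

```rust
/// Softmax over the last dimension of a contiguous row-major buffer whose rows have length `dim_m1`.
fn softmax_last_dim(data: &[f32], dim_m1: usize) -> Vec<f32> {
    assert!(dim_m1 > 0 && data.len() % dim_m1 == 0);
    let mut out = Vec::with_capacity(data.len());
    for row in data.chunks(dim_m1) {
        // Subtracting the max keeps exp() in range without changing the result.
        let max = row.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = row.iter().map(|&x| (x - max).exp()).collect();
        let sum: f32 = exps.iter().sum();
        out.extend(exps.iter().map(|e| e / sum));
    }
    out
}
```

Entries equal to -inf (for example causally masked scores) come out as exactly 0, as long as each row has at least one finite entry.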
8443963d4f
|
Skeleton implementation for softmax.
|
2023-06-23 22:00:13 +01:00 |
|
|
5d44e76e3f
|
Add the casting operation.
|
2023-06-23 21:22:07 +01:00 |
|
|
4f9f14a06b
|
Optimize the cpu backend for the contiguous cases.
|
2023-06-23 18:08:55 +01:00 |
|
|
1936a1f0a3
|
Bugfix for the strided copy + add some assertions.
|
2023-06-23 16:28:18 +01:00 |
|
|
bcfbb1dca1
|
More efficient CPU broadcasting implementation.
|
2023-06-23 16:23:12 +01:00 |
|
|
10a5807dff
|
Broadcast cpu implementation.
|
2023-06-23 16:16:52 +01:00 |
|
|
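One common way to broadcast on the CPU is to align shapes from the right and give every size-1 source dimension a stride of 0, so the same element is re-read along that axis. The sketch below materializes the broadcast result this way; it assumes NumPy-style broadcasting rules and a contiguous f32 source, and `broadcast_to` and `contiguous_strides` are hypothetical helpers rather than this repository's API.

```rust
/// Contiguous row-major strides for `shape`.
fn contiguous_strides(shape: &[usize]) -> Vec<usize> {
    let mut strides = vec![1; shape.len()];
    for d in (0..shape.len().saturating_sub(1)).rev() {
        strides[d] = strides[d + 1] * shape[d + 1];
    }
    strides
}

/// Materialize contiguous `src` (shape `src_shape`) broadcast to `dst_shape`.
fn broadcast_to(src: &[f32], src_shape: &[usize], dst_shape: &[usize]) -> Vec<f32> {
    assert!(src_shape.len() <= dst_shape.len());
    let offset = dst_shape.len() - src_shape.len();
    let src_strides = contiguous_strides(src_shape);
    // Source strides seen at the destination's rank: missing or size-1 dims get stride 0.
    let mut strides = vec![0; dst_shape.len()];
    for (i, &s) in src_shape.iter().enumerate() {
        let d = i + offset;
        assert!(s == dst_shape[d] || s == 1, "incompatible shapes");
        strides[d] = if s == 1 { 0 } else { src_strides[i] };
    }
    let n: usize = dst_shape.iter().product();
    let mut out = Vec::with_capacity(n);
    for flat in 0..n {
        // Decompose the flat destination index into coordinates, innermost first.
        let (mut rem, mut src_idx) = (flat, 0);
        for d in (0..dst_shape.len()).rev() {
            src_idx += (rem % dst_shape[d]) * strides[d];
            rem /= dst_shape[d];
        }
        out.push(src[src_idx]);
    }
    out
}
```

Broadcasting `[1, 2, 3]` (shape `[3]`) to `[2, 3]` repeats the row, while broadcasting a `[2, 1]` column to `[2, 3]` repeats each value along the row.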
83e75b3af8
|
Optimize for the unstrided case.
|
2023-06-23 15:49:11 +01:00 |
|
|
08394f7924
|
Binary op for u32.
|
2023-06-23 14:50:52 +01:00 |
|
|
92da45879c
|
Dummy broadcast placeholder functions.
|
2023-06-23 14:07:05 +01:00 |
|
|
7c1625f6a5
|
Merge pull request #6 from LaurentMazare/add_embedding
Adding embedding op (not generic gather, no select).
|
2023-06-23 13:49:13 +02:00 |
|
|
52c503ba8f
|
Handle the contiguous case in an optimized way when copying cpu memory.
|
2023-06-23 12:20:16 +01:00 |
|
|
96289bce08
|
Rebase.
|
2023-06-23 13:17:21 +02:00 |
|
|
5e54f37fe1
|
Adding embedding op (not generic gather, no select).
|
2023-06-23 13:13:26 +02:00 |
|
|
4712dcc2f6
|
Actually copy the data around in cat (cpu only).
|
2023-06-23 10:24:02 +01:00 |
|
|
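For contiguous row-major inputs, concatenation along a dimension reduces to interleaving blocks: for each index over the leading dimensions, append each input's run of `shape[dim] * inner` elements. The sketch below is a hypothetical standalone `cat` assuming f32 data, not the repository's `Tensor::cat`.

```rust
/// Concatenate contiguous row-major tensors along `dim`.
/// Inputs must agree on every dimension except `dim`.
fn cat(tensors: &[(&[f32], &[usize])], dim: usize) -> (Vec<f32>, Vec<usize>) {
    let (_, first_shape) = tensors[0];
    let outer: usize = first_shape[..dim].iter().product();
    let inner: usize = first_shape[dim + 1..].iter().product();
    for (data, shape) in tensors {
        assert_eq!(shape.len(), first_shape.len());
        assert_eq!(&shape[..dim], &first_shape[..dim]);
        assert_eq!(&shape[dim + 1..], &first_shape[dim + 1..]);
        assert_eq!(data.len(), shape.iter().product::<usize>());
    }
    let mut out = Vec::new();
    // For each outer index, append each input's contiguous block of shape[dim] * inner elements.
    for o in 0..outer {
        for (data, shape) in tensors {
            let block = shape[dim] * inner;
            out.extend_from_slice(&data[o * block..(o + 1) * block]);
        }
    }
    let mut out_shape = first_shape.to_vec();
    out_shape[dim] = tensors.iter().map(|(_, s)| s[dim]).sum::<usize>();
    (out, out_shape)
}
```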
3b550a56dc
|
Transfer tensors between devices.
|
2023-06-23 08:35:22 +01:00 |
|
|
836ad5f76c
|
Remove one level of indirection for the binary and unary ops.
|
2023-06-22 15:20:51 +01:00 |
|
|
a8b6c848e0
|
Final updates.
|
2023-06-22 12:39:33 +02:00 |
|
|
04cf14f35a
|
Moving to gemm and adding matmul backprop.
- Tentative `T` operator.
|
2023-06-22 12:37:02 +02:00 |
|
|
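Backpropagation through a matrix product follows from the chain rule: for C = A B with upstream gradient dC, the gradients are dA = dC Bᵀ and dB = Aᵀ dC, which is why a transpose operator tends to appear alongside matmul backprop. The naive sketch below shows the shapes involved; `matmul`, `transpose` and `matmul_backward` are hypothetical helpers, and the commit's use of a gemm routine is not reproduced here.

```rust
/// Naive row-major matmul: (m, k) x (k, n) -> (m, n).
fn matmul(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut c = vec![0f32; m * n];
    for i in 0..m {
        for p in 0..k {
            let a_ip = a[i * k + p];
            for j in 0..n {
                c[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
    c
}

/// Transpose a row-major (rows, cols) matrix into (cols, rows).
fn transpose(x: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    let mut t = vec![0f32; rows * cols];
    for r in 0..rows {
        for c in 0..cols {
            t[c * rows + r] = x[r * cols + c];
        }
    }
    t
}

/// Gradients of C = A B given dL/dC of shape (m, n):
/// dL/dA = dC B^T has shape (m, k) and dL/dB = A^T dC has shape (k, n).
fn matmul_backward(a: &[f32], b: &[f32], grad_c: &[f32], m: usize, k: usize, n: usize) -> (Vec<f32>, Vec<f32>) {
    let grad_a = matmul(grad_c, &transpose(b, k, n), m, n, k);
    let grad_b = matmul(&transpose(a, m, k), grad_c, k, m, n);
    (grad_a, grad_b)
}
```

With an all-ones upstream gradient, each entry of dA is a row sum of B and each entry of dB is a column sum of A.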
ce977b489e
|
Adding matmul?
|
2023-06-22 12:25:58 +02:00 |
|
|
68f525f321
|
Move more bits to the backend part.
|
2023-06-21 10:34:51 +01:00 |
|
|
eb52b9b343
|
Move the cpu backend specific bits apart.
|
2023-06-21 10:25:56 +01:00 |
|