b5bb5e056d
Add more conv2d support. (#340)
* Add more conv2d support.
* Conv2d cpu work.
* Conv2d output shape.
2023-08-08 06:04:32 +01:00
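To illustrate what the conv2d work above involves, here is a hedged sketch (not the project's actual code) of the standard conv2d output-shape formula and a naive single-channel, stride-1, unpadded 2D convolution; all names are illustrative.

```rust
/// Output size along one spatial dim:
/// floor((in + 2*pad - dilation*(k - 1) - 1) / stride) + 1
fn conv_out_dim(input: usize, kernel: usize, pad: usize, stride: usize, dilation: usize) -> usize {
    (input + 2 * pad - dilation * (kernel - 1) - 1) / stride + 1
}

/// Naive "valid" (no padding, stride 1) conv2d on a single row-major channel.
fn conv2d_naive(
    input: &[f32],
    (h, w): (usize, usize),
    kernel: &[f32],
    (kh, kw): (usize, usize),
) -> Vec<f32> {
    let (oh, ow) = (h - kh + 1, w - kw + 1);
    let mut out = vec![0f32; oh * ow];
    for oy in 0..oh {
        for ox in 0..ow {
            let mut acc = 0f32;
            for ky in 0..kh {
                for kx in 0..kw {
                    acc += input[(oy + ky) * w + (ox + kx)] * kernel[ky * kw + kx];
                }
            }
            out[oy * ow + ox] = acc;
        }
    }
    out
}

fn main() {
    // k=3 with pad=1, stride=1, dilation=1 preserves the spatial size.
    assert_eq!(conv_out_dim(32, 3, 1, 1, 1), 32);
    // A 4x4 of ones convolved with a 2x2 of ones gives a 3x3 of fours.
    let out = conv2d_naive(&vec![1.0; 16], (4, 4), &vec![1.0; 4], (2, 2));
    assert_eq!(out.len(), 9);
    assert!((out[0] - 4.0).abs() < 1e-6);
}
```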
d0d7010682
CPU implementation for upsample-nearest2d. (#339)
2023-08-07 20:07:10 +01:00
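A minimal sketch of what a nearest-neighbour 2D upsample does on a row-major single-channel buffer; the function name and signature are hypothetical, not the library's API.

```rust
/// Nearest-neighbour upsample from (h, w) to (oh, ow).
fn upsample_nearest2d(src: &[f32], (h, w): (usize, usize), (oh, ow): (usize, usize)) -> Vec<f32> {
    let mut dst = vec![0f32; oh * ow];
    for oy in 0..oh {
        let sy = oy * h / oh; // nearest source row
        for ox in 0..ow {
            let sx = ox * w / ow; // nearest source column
            dst[oy * ow + ox] = src[sy * w + sx];
        }
    }
    dst
}

fn main() {
    // A 2x2 source upsampled to 4x4: each pixel is repeated in a 2x2 block.
    let src = vec![1.0, 2.0, 3.0, 4.0];
    let dst = upsample_nearest2d(&src, (2, 2), (4, 4));
    assert_eq!(dst[0], 1.0);  // top-left block
    assert_eq!(dst[3], 2.0);  // top-right block
    assert_eq!(dst[15], 4.0); // bottom-right block
}
```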
fc265d9dcf
Some CLIP fixes for stable diffusion. (#338)
* Some CLIP fixes for stable diffusion.
* Add the avg-pool2d operation on cpu.
2023-08-07 18:31:45 +01:00
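The avg-pool2d operation mentioned above can be sketched as follows (a naive CPU version under assumed square-kernel, single-channel conditions; not the project's implementation).

```rust
/// Average pooling with a k x k window over a row-major (h, w) buffer.
fn avg_pool2d(src: &[f32], (h, w): (usize, usize), k: usize, stride: usize) -> Vec<f32> {
    let (oh, ow) = ((h - k) / stride + 1, (w - k) / stride + 1);
    let mut out = vec![0f32; oh * ow];
    for oy in 0..oh {
        for ox in 0..ow {
            let mut acc = 0f32;
            for ky in 0..k {
                for kx in 0..k {
                    acc += src[(oy * stride + ky) * w + ox * stride + kx];
                }
            }
            out[oy * ow + ox] = acc / (k * k) as f32;
        }
    }
    out
}

fn main() {
    // 4x4 input holding 1..=16, 2x2 window, stride 2: each output is a window mean.
    let src: Vec<f32> = (1..=16).map(|v| v as f32).collect();
    assert_eq!(avg_pool2d(&src, (4, 4), 2, 2), vec![3.5, 5.5, 11.5, 13.5]);
}
```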
4b3bd79fbd
Remove the embedding ops in favor of index-select. (#299)
* Remove the embedding ops in favor of index-select.
* Also remove the cuda kernels.
2023-08-02 05:42:11 +01:00
3eb2bc6d07
Softmax numerical stability. (#267)
* Softmax numerical stability.
* Fix the flash-attn test.
2023-07-28 13:13:01 +01:00
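The numerical-stability fix above is the standard max-subtraction trick: computing exp(x - max) instead of exp(x) avoids overflow for large inputs while yielding the same probabilities. A self-contained sketch:

```rust
/// Numerically stable softmax over a 1-D slice.
fn softmax(xs: &[f32]) -> Vec<f32> {
    // Subtracting the max keeps every exponent <= 0, so exp never overflows.
    let max = xs.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = xs.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

fn main() {
    // Without the subtraction, exp(1000.0) overflows to infinity in f32.
    let probs = softmax(&[1000.0, 1000.0]);
    assert!((probs[0] - 0.5).abs() < 1e-6);
    // The outputs always sum to one.
    let sum: f32 = softmax(&[1.0, 2.0, 3.0]).iter().sum();
    assert!((sum - 1.0).abs() < 1e-6);
}
```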
52c5d8c087
Add the gather op. (#219)
* Start adding gather.
* Gather cpu implementation + use in simple training.
* Add scatter_add for the gradient of gather.
* Simple cpu implementation of scatter_add.
* Use gather in the simple-training backprop.
2023-07-22 07:21:28 +01:00
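The gather/scatter_add pairing above follows the usual autodiff pattern: scatter_add is the gradient of gather because a slot selected several times accumulates every gradient flowing back through it. A 1-D sketch with illustrative signatures:

```rust
/// Pick elements of `src` at the given indices.
fn gather(src: &[f32], ids: &[usize]) -> Vec<f32> {
    ids.iter().map(|&i| src[i]).collect()
}

/// Gradient of gather: add each incoming value into its source slot.
fn scatter_add(dst_len: usize, ids: &[usize], grad: &[f32]) -> Vec<f32> {
    let mut dst = vec![0f32; dst_len];
    for (&i, &g) in ids.iter().zip(grad) {
        dst[i] += g; // repeated indices accumulate
    }
    dst
}

fn main() {
    let src = [10.0, 20.0, 30.0];
    let ids = [2, 0, 2];
    assert_eq!(gather(&src, &ids), vec![30.0, 10.0, 30.0]);
    // Index 2 was gathered twice, so it receives two units of gradient.
    assert_eq!(scatter_add(3, &ids, &[1.0, 1.0, 1.0]), vec![1.0, 0.0, 2.0]);
}
```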
27174a82aa
Start adding index-add.
2023-07-21 20:12:48 +01:00
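Index-add, started above, adds whole source rows into destination rows chosen by an index vector. A minimal sketch on a flat row-major buffer (names are hypothetical):

```rust
/// Add row r of `src` into row ids[r] of `dst`; both have rows of `row_len`.
fn index_add(dst: &mut [f32], row_len: usize, ids: &[usize], src: &[f32]) {
    for (r, &i) in ids.iter().enumerate() {
        for c in 0..row_len {
            dst[i * row_len + c] += src[r * row_len + c];
        }
    }
}

fn main() {
    // Destination of 3 rows x 2 cols; add two source rows into rows 0 and 2.
    let mut dst = vec![0f32; 6];
    index_add(&mut dst, 2, &[0, 2], &[1.0, 1.0, 2.0, 2.0]);
    assert_eq!(dst, vec![1.0, 1.0, 0.0, 0.0, 2.0, 2.0]);
}
```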
fa08fb3126
Add the index-select op. (#209)
* Add the index-select op.
* Cpu implementation of index-select.
* Add the cpu implementation for index-select.
2023-07-20 14:01:03 +01:00
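A hedged sketch of a CPU index-select on a flat row-major buffer: copy out the rows named by the index vector. This is also why index-select can replace the dedicated embedding ops mentioned earlier, since an embedding lookup is exactly a row selection from the weight table.

```rust
/// Copy the rows of `src` (each of `row_len` elements) listed in `ids`.
fn index_select(src: &[f32], row_len: usize, ids: &[usize]) -> Vec<f32> {
    let mut out = Vec::with_capacity(ids.len() * row_len);
    for &i in ids {
        out.extend_from_slice(&src[i * row_len..(i + 1) * row_len]);
    }
    out
}

fn main() {
    // A 3x2 "embedding table"; selecting rows [2, 0] is an embedding lookup.
    let table = [0.0, 0.1, 1.0, 1.1, 2.0, 2.1];
    assert_eq!(index_select(&table, 2, &[2, 0]), vec![2.0, 2.1, 0.0, 0.1]);
}
```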
2a8f28d687
Op refactor (#208)
* Add the binary and unary op enums to factorize some code.
* Bugfix.
2023-07-20 12:28:45 +01:00
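The enum-based factorization above can be sketched like this: one element-wise loop per arity, parameterized by an op enum, so each new operation only adds a variant. The enum and variant names here are hypothetical.

```rust
#[derive(Clone, Copy)]
enum UnaryOp { Neg, Sqr }

#[derive(Clone, Copy)]
enum BinaryOp { Add, Mul }

impl UnaryOp {
    fn eval(self, x: f32) -> f32 {
        match self {
            UnaryOp::Neg => -x,
            UnaryOp::Sqr => x * x,
        }
    }
}

impl BinaryOp {
    fn eval(self, x: f32, y: f32) -> f32 {
        match self {
            BinaryOp::Add => x + y,
            BinaryOp::Mul => x * y,
        }
    }
}

// One loop serves every unary op, another every binary op.
fn unary(op: UnaryOp, xs: &[f32]) -> Vec<f32> {
    xs.iter().map(|&x| op.eval(x)).collect()
}

fn binary(op: BinaryOp, xs: &[f32], ys: &[f32]) -> Vec<f32> {
    xs.iter().zip(ys).map(|(&x, &y)| op.eval(x, y)).collect()
}

fn main() {
    assert_eq!(unary(UnaryOp::Sqr, &[2.0, 3.0]), vec![4.0, 9.0]);
    assert_eq!(binary(BinaryOp::Add, &[1.0, 2.0], &[3.0, 4.0]), vec![4.0, 6.0]);
}
```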
e9c052bf94
Add the comparison operations. (#207)
* Add the comparison operations.
* Add the helper functions on the tensor side.
* More cmp operations.
* Cpu implementation for the comparison operations.
2023-07-20 09:40:31 +01:00
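A sketch of element-wise comparison kernels under the usual tensor-library convention of returning a u8 mask (1 for true, 0 for false); the generic helper here is illustrative only.

```rust
/// Apply a comparison element-wise, producing a 0/1 mask.
fn cmp<F: Fn(f32, f32) -> bool>(xs: &[f32], ys: &[f32], f: F) -> Vec<u8> {
    xs.iter().zip(ys).map(|(&x, &y)| f(x, y) as u8).collect()
}

fn main() {
    let a = [1.0, 2.0, 3.0];
    let b = [2.0, 2.0, 2.0];
    assert_eq!(cmp(&a, &b, |x, y| x == y), vec![0, 1, 0]); // eq
    assert_eq!(cmp(&a, &b, |x, y| x > y), vec![0, 0, 1]);  // gt
    assert_eq!(cmp(&a, &b, |x, y| x <= y), vec![1, 1, 0]); // le
}
```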
cb687b4897
Add some more developed training examples. (#199)
* Use contiguous tensors for variables.
* Sketch the mnist example.
* Start adding the reduce ops.
* Renaming.
* Refactor the reduce operations.
* Bugfix for the broadcasting vectorization.
2023-07-19 15:37:52 +01:00
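The reduce ops started above share one shape: an accumulator loop over the reduced dimension, parameterized by the reduction. A sketch over the last dim of a row-major 2-D buffer (function names are illustrative):

```rust
/// Sum over the last dim of a row-major buffer with rows of `row_len`.
fn reduce_sum_last(src: &[f32], row_len: usize) -> Vec<f32> {
    src.chunks(row_len).map(|row| row.iter().sum()).collect()
}

/// Max over the last dim, same layout.
fn reduce_max_last(src: &[f32], row_len: usize) -> Vec<f32> {
    src.chunks(row_len)
        .map(|row| row.iter().cloned().fold(f32::NEG_INFINITY, f32::max))
        .collect()
}

fn main() {
    let src = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]; // a 2x3 matrix
    assert_eq!(reduce_sum_last(&src, 3), vec![6.0, 15.0]);
    assert_eq!(reduce_max_last(&src, 3), vec![3.0, 6.0]);
}
```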
dcb4a9291e
Explain how to enable cuda.
2023-07-14 17:08:05 +02:00
64264d97c1
Modular backends (#138)
* Add some trait to formalize backends.
* Use the generic backend trait.
2023-07-11 11:17:02 +01:00
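A hypothetical sketch of what "a trait to formalize backends" can look like: each device exposes the same storage interface, and generic code is written once against the trait rather than per device. Trait and method names here are assumptions, not the project's actual API.

```rust
/// Minimal backend abstraction: owns a storage type and a few kernels.
trait Backend {
    type Storage;
    fn from_vec(&self, data: Vec<f32>) -> Self::Storage;
    fn affine(&self, s: &Self::Storage, mul: f32, add: f32) -> Self::Storage;
    fn to_vec(&self, s: &Self::Storage) -> Vec<f32>;
}

struct Cpu;

impl Backend for Cpu {
    type Storage = Vec<f32>;
    fn from_vec(&self, data: Vec<f32>) -> Vec<f32> { data }
    fn affine(&self, s: &Vec<f32>, mul: f32, add: f32) -> Vec<f32> {
        s.iter().map(|&x| x * mul + add).collect()
    }
    fn to_vec(&self, s: &Vec<f32>) -> Vec<f32> { s.clone() }
}

// Generic code targets the trait, not a concrete device; a Cuda backend
// would plug in here with its own Storage type.
fn scale_and_shift<B: Backend>(b: &B, data: Vec<f32>) -> Vec<f32> {
    let s = b.from_vec(data);
    let s = b.affine(&s, 2.0, 1.0);
    b.to_vec(&s)
}

fn main() {
    assert_eq!(scale_and_shift(&Cpu, vec![1.0, 2.0]), vec![3.0, 5.0]);
}
```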
ae79c00e48
Allow for uniform initialization in a single step. (#136)
2023-07-11 08:52:29 +01:00
f29b77ec19
Random initializers. (#128)
* Random initialization.
* CPU rng generation.
2023-07-10 18:26:21 +01:00
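To show the shape of CPU-side uniform initialization, here is a sketch built on a tiny linear-congruential generator; a real implementation would use a proper RNG crate, and everything below is illustrative only.

```rust
/// Toy LCG, just enough randomness for an illustration.
struct Lcg(u64);

impl Lcg {
    fn next_f32(&mut self) -> f32 {
        // Knuth's MMIX LCG constants; take 24 high bits for a [0, 1) float.
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        ((self.0 >> 40) as f32) / (1u64 << 24) as f32
    }
}

/// Fill a buffer with uniform draws from [lo, up).
fn rand_uniform(n: usize, lo: f32, up: f32, seed: u64) -> Vec<f32> {
    let mut rng = Lcg(seed);
    (0..n).map(|_| lo + (up - lo) * rng.next_f32()).collect()
}

fn main() {
    let xs = rand_uniform(1000, -1.0, 1.0, 42);
    assert!(xs.iter().all(|&x| (-1.0..1.0).contains(&x)));
    // Same seed, same draw: initialization stays reproducible.
    assert_eq!(rand_uniform(4, 0.0, 1.0, 7), rand_uniform(4, 0.0, 1.0, 7));
}
```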
270997a055
Add the elu op. (#113)
2023-07-09 21:56:31 +01:00
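For reference, ELU is the standard activation x for x > 0 and alpha * (exp(x) - 1) otherwise; a one-function sketch:

```rust
/// Exponential linear unit: identity above zero, saturating below.
fn elu(x: f32, alpha: f32) -> f32 {
    if x > 0.0 { x } else { alpha * (x.exp() - 1.0) }
}

fn main() {
    assert_eq!(elu(2.0, 1.0), 2.0);
    assert_eq!(elu(0.0, 1.0), 0.0);
    // exp(-1) - 1 ~= -0.632
    assert!((elu(-1.0, 1.0) + 0.6321206).abs() < 1e-5);
}
```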
a424d95473
Add more of the conv1d op.
2023-07-04 11:15:45 +01:00
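A minimal sketch of the conv1d op being built up in the two entries above: a naive "valid" (no padding, stride 1) single-channel 1D convolution, not the project's actual kernel.

```rust
/// Slide the kernel over the input and take dot products.
fn conv1d_naive(input: &[f32], kernel: &[f32]) -> Vec<f32> {
    let out_len = input.len() - kernel.len() + 1;
    (0..out_len)
        .map(|o| kernel.iter().enumerate().map(|(k, &w)| w * input[o + k]).sum())
        .collect()
}

fn main() {
    // With a kernel of ones this is a moving sum.
    let out = conv1d_naive(&[1.0, 2.0, 3.0, 4.0], &[1.0, 1.0]);
    assert_eq!(out, vec![3.0, 5.0, 7.0]);
}
```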
3aac1047fe
Sketch the conv1d op.
2023-07-04 10:52:34 +01:00
122e334d0c
Simplify the pattern matching logic in the cuda backend.
2023-06-29 09:21:11 +01:00
3f0d9fbb25
Adapt the cuda bits.
2023-06-28 15:43:03 +01:00
14449ff80c
Get the cpu backend to compile.
2023-06-28 14:12:38 +01:00
d7f729fb8f
Refactor the hierarchy.
2023-06-27 11:57:27 +02:00