a5c5a893aa
add max_pool2d ( #371 )
...
Co-authored-by: 赵理山 <ls@zhaolishandeMacBook-Air.local >
2023-08-09 18:05:26 +01:00
b5bb5e056d
Add more conv2d support. ( #340 )
...
* Add more conv2d support.
* Conv2d cpu work.
* Conv2d output shape.
2023-08-08 06:04:32 +01:00
d0d7010682
CPU implementation for upsample-nearest2d. ( #339 )
2023-08-07 20:07:10 +01:00
fc265d9dcf
Some CLIP fixes for stable diffusion. ( #338 )
...
* Some CLIP fixes for stable diffusion.
* Add the avg-pool2d operation on cpu.
2023-08-07 18:31:45 +01:00
2345b8ce3f
Skeleton for the avg-pool2d and upsample-nearest2d ops. ( #337 )
...
* Skeleton for the avg-pool2d and upsample-nearest2d ops.
* Preliminary conv2d support.
2023-08-07 16:15:38 +01:00
4b3bd79fbd
Remove the embedding ops in favor of index-select. ( #299 )
...
* Remove the embedding ops in favor of index-select.
* Also remove the cuda kernels.
2023-08-02 05:42:11 +01:00
3eb2bc6d07
Softmax numerical stability. ( #267 )
...
* Softmax numerical stability.
* Fix the flash-attn test.
2023-07-28 13:13:01 +01:00
23827c49cd
Cleanup some todos. ( #226 )
...
* Cleanup some todos.
* Fix more todo.
* Optimize for the contiguous case.
* Add the IntDType trait.
* Handle the intdtype trait for more ops.
* Remove a todo.
* Remove a todo.
2023-07-23 16:00:00 +01:00
52c5d8c087
Add the gather op. ( #219 )
...
* Start adding gather.
* Gather cpu implementation + use in simple training.
* Add scatter_add for the gradient of gather.
* Simple cpu implementation of scatter_add.
* Use gather in the simple-training backprop.
2023-07-22 07:21:28 +01:00
27174a82aa
Start adding index-add.
2023-07-21 20:12:48 +01:00
5cc843550d
Add binary and ternary custom ops. ( #217 )
2023-07-21 17:29:50 +01:00
a6bcdfb269
Custom ops with a single argument ( #214 )
...
* Add the CustomOp1 trait.
* Add an example of custom op.
* Polish the custom op example.
* Add some backward pass test for custom ops.
2023-07-21 15:18:05 +01:00
fa08fb3126
Add the index-select op. ( #209 )
...
* Add the index-select op.
* Cpu implementation of index-select.
* Add the cpu implementation for index-select.
2023-07-20 14:01:03 +01:00
2a8f28d687
Op refactor ( #208 )
...
* Add the binary and unary op enums to factorize some code.
* Bugfix.
2023-07-20 12:28:45 +01:00
e9c052bf94
Add the comparison operations. ( #207 )
...
* Add the comparison operations.
* Add the helper functions on the tensor side.
* More cmp operations.
* Cpu implementation for the comparison operations.
2023-07-20 09:40:31 +01:00
cb687b4897
Add some more developed training examples. ( #199 )
...
* Use contiguous tensors for variables.
* Sketch the mnist example.
* Start adding the reduce ops.
* Renaming.
* Refactor the reduce operations.
* Bugfix for the broadcasting vectorization.
2023-07-19 15:37:52 +01:00
d88b6cdca9
Add backtrace information to errors where relevant. ( #166 )
...
* Add backtrace information to errors where relevant.
* More backtrace information.
* Add to the FAQ.
2023-07-14 09:31:25 +01:00
64264d97c1
Modular backends ( #138 )
...
* Add some trait to formalize backends.
* Use the generic backend trait.
2023-07-11 11:17:02 +01:00
270997a055
Add the elu op. ( #113 )
2023-07-09 21:56:31 +01:00
a424d95473
Add more of the conv1d op.
2023-07-04 11:15:45 +01:00
3aac1047fe
Sketch the conv1d op.
2023-07-04 10:52:34 +01:00
122e334d0c
Simplify the pattern matching logic in the cuda backend.
2023-06-29 09:21:11 +01:00
3f0d9fbb25
Adapt the cuda bits.
2023-06-28 15:43:03 +01:00
14449ff80c
Get the cpu backend to compile.
2023-06-28 14:12:38 +01:00
303b853098
Propagate the layout refactoring.
2023-06-28 13:42:23 +01:00
30b355ccd2
Simplify the narrow implementation.
2023-06-28 13:09:59 +01:00
c1bbbf94f6
Start refactoring the stride.
2023-06-28 12:57:30 +01:00
d7f729fb8f
Refactor the hierarchy.
2023-06-27 11:57:27 +02:00