a2e925462c
Add the scatter in place ops. ( #2923 )
...
* Add the scatter_set op.
* Metal op.
* Cuda version.
* Merge the checks.
* Add the actual ops.
2025-04-26 07:36:49 +02:00
3827685524
Add the scatter op. ( #2921 )
...
* Add the scatter op.
* Backprop support.
* Cuda support.
2025-04-25 21:46:58 +02:00
a4c56a958e
Add the const-set op. ( #2910 )
...
* Add the const-set op.
* Cuda implementation.
* Bugfix.
* Metal cleanup.
* Add the metal kernels.
* Add some testing.
* Finish the metal implementation.
* Bump the version.
2025-04-19 10:07:02 +02:00
cd254074f3
Really unique identifier for metal device ids. ( #1932 )
...
* Really unique identifier for metal device ids.
* Same device.
2024-03-25 11:48:16 +01:00
fdfe8fd129
Preliminary support for inplace ops. ( #1921 )
...
* Preliminary support for inplace ops.
* Add a test.
2024-03-23 14:16:19 +01:00
74b7f59261
Prepare for the custom-op extension. ( #1892 )
2024-03-21 07:02:20 +01:00
ce9fbc3682
Optimize the cat operation on contiguous tensors ( #1855 )
...
* Add a specialized kernel for copy2d.
* Move the cat operations.
* Avoid transpositions in cat.
* Bugfix.
* Bugfix for the cuda kernel.
* Add a benchmark.
* Add more testing.
* Test fix.
* Faster kernel.
* Add the missing kernel.
* Tweak the test.
* Add a metal kernel.
* Fix for the metal kernel.
* Get the tests to pass on metal.
* Also use this opportunity to fix the metal kernel for ELU.
* Add some bf16 kernels.
* Clippy fixes.
2024-03-17 10:49:13 +01:00
09e0148cce
Tweaks to run metavoice on metal ( #1792 )
...
* Enable tanh + tweak conv-transpose.
* Run the encodec decoding on cpu.
* Clippy fixes.
2024-03-03 07:46:44 +01:00
26c4e5bf1d
Metal part 1 - Scaffolding for metal. ( #1308 )
...
* Metal part 1 - Scaffolding for metal.
* Remove tracing.
2023-11-10 08:35:48 +01:00
be4555c5a5
Add the conv-transpose1d op. ( #1251 )
...
* Skeleton structure for conv-transpose1d.
* CPU implementation for conv-transpose1d.
2023-11-03 09:44:46 +01:00
9a465e1b26
Add 1d upsampling. ( #839 )
...
* Add 1d upsampling.
* Add the interpolate functions.
2023-09-13 16:50:39 +01:00
59b731de99
Add the powf op. ( #664 )
...
* Add the powf op.
* Cuda kernels and backprop.
* Add a test.
2023-08-29 20:48:18 +01:00
3cca89cc70
Add conv-transpose. ( #635 )
...
* Add conv-transpose.
* Return zeros for now.
* Naive CPU implementation.
* Add a conv-transpose test + fix the cpu implementation.
* Add a second test.
2023-08-28 10:10:12 +01:00
03be33eea4
Relax the requirements on CustomOp. ( #486 )
...
* Relax the requirements on CustomOp.
* Simplify the custom-ops when no backward is required.
2023-08-17 11:12:05 +01:00
a5c5a893aa
add max_pool2d ( #371 )
...
Co-authored-by: 赵理山 <ls@zhaolishandeMacBook-Air.local>
2023-08-09 18:05:26 +01:00
b5bb5e056d
Add more conv2d support. ( #340 )
...
* Add more conv2d support.
* Conv2d cpu work.
* Conv2d output shape.
2023-08-08 06:04:32 +01:00
d0d7010682
CPU implementation for upsample-nearest2d. ( #339 )
2023-08-07 20:07:10 +01:00
fc265d9dcf
Some CLIP fixes for stable diffusion. ( #338 )
...
* Some CLIP fixes for stable diffusion.
* Add the avg-pool2d operation on cpu.
2023-08-07 18:31:45 +01:00
2345b8ce3f
Skeleton for the avg-pool2d and upsample-nearest2d ops. ( #337 )
...
* Skeleton for the avg-pool2d and upsample-nearest2d ops.
* Preliminary conv2d support.
2023-08-07 16:15:38 +01:00
4b3bd79fbd
Remove the embedding ops in favor of index-select. ( #299 )
...
* Remove the embedding ops in favor of index-select.
* Also remove the cuda kernels.
2023-08-02 05:42:11 +01:00
3eb2bc6d07
Softmax numerical stability. ( #267 )
...
* Softmax numerical stability.
* Fix the flash-attn test.
2023-07-28 13:13:01 +01:00
23827c49cd
Cleanup some todos. ( #226 )
...
* Cleanup some todos.
* Fix more todo.
* Optimize for the contiguous case.
* Add the IntDType trait.
* Handle the intdtype trait for more ops.
* Remove a todo.
* Remove a todo.
2023-07-23 16:00:00 +01:00
52c5d8c087
Add the gather op. ( #219 )
...
* Start adding gather.
* Gather cpu implementation + use in simple training.
* Add scatter_add for the gradient of gather.
* Simple cpu implementation of scatter_add.
* Use gather in the simple-training backprop.
2023-07-22 07:21:28 +01:00
27174a82aa
Start adding index-add.
2023-07-21 20:12:48 +01:00
5cc843550d
Add binary and ternary custom ops. ( #217 )
2023-07-21 17:29:50 +01:00
a6bcdfb269
Custom ops with a single argument ( #214 )
...
* Add the CustomOp1 trait.
* Add an example of custom op.
* Polish the custom op example.
* Add some backward pass test for custom ops.
2023-07-21 15:18:05 +01:00
fa08fb3126
Add the index-select op. ( #209 )
...
* Add the index-select op.
* Cpu implementation of index-select.
* Add the cpu implementation for index-select.
2023-07-20 14:01:03 +01:00
2a8f28d687
Op refactor ( #208 )
...
* Add the binary and unary op enums to factorize some code.
* Bugfix.
2023-07-20 12:28:45 +01:00
e9c052bf94
Add the comparison operations. ( #207 )
...
* Add the comparison operations.
* Add the helper functions on the tensor side.
* More cmp operations.
* Cpu implementation for the comparison operations.
2023-07-20 09:40:31 +01:00
cb687b4897
Add some more developed training examples. ( #199 )
...
* Use contiguous tensors for variables.
* Sketch the mnist example.
* Start adding the reduce ops.
* Renaming.
* Refactor the reduce operations.
* Bugfix for the broadcasting vectorization.
2023-07-19 15:37:52 +01:00
d88b6cdca9
Add backtrace information to errors where relevant. ( #166 )
...
* Add backtrace information to errors where relevant.
* More backtrace information.
* Add to the FAQ.
2023-07-14 09:31:25 +01:00
64264d97c1
Modular backends ( #138 )
...
* Add some trait to formalize backends.
* Use the generic backend trait.
2023-07-11 11:17:02 +01:00
270997a055
Add the elu op. ( #113 )
2023-07-09 21:56:31 +01:00
a424d95473
Add more of the conv1d op.
2023-07-04 11:15:45 +01:00
3aac1047fe
Sketch the conv1d op.
2023-07-04 10:52:34 +01:00
122e334d0c
Simplify the pattern matching logic in the cuda backend.
2023-06-29 09:21:11 +01:00
3f0d9fbb25
Adapt the cuda bits.
2023-06-28 15:43:03 +01:00
14449ff80c
Get the cpu backend to compile.
2023-06-28 14:12:38 +01:00
303b853098
Propagate the layout refactoring.
2023-06-28 13:42:23 +01:00
30b355ccd2
Simplify the narrow implementation.
2023-06-28 13:09:59 +01:00
c1bbbf94f6
Start refactoring the stride.
2023-06-28 12:57:30 +01:00
d7f729fb8f
Refactor the hierarchy.
2023-06-27 11:57:27 +02:00