481c45d78d
Add a basic implementation for slice-assign. ( #1377 )
2023-11-26 17:31:22 +00:00
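Slice-assign writes a source tensor into a region of a destination. A minimal plain-Rust sketch of the 1-D semantics (illustrative only, not candle's actual tensor API):

```rust
// slice-assign, 1-D case: overwrite dst[start..start + src.len()] with src,
// leaving the rest of dst untouched. Returns a new vector, mirroring the
// out-of-place tensor op.
fn slice_assign(dst: &[f32], src: &[f32], start: usize) -> Vec<f32> {
    assert!(start + src.len() <= dst.len());
    let mut out = dst.to_vec();
    out[start..start + src.len()].copy_from_slice(src);
    out
}

fn main() {
    let out = slice_assign(&[0.0; 5], &[1.0, 2.0], 1);
    assert_eq!(out, vec![0.0, 1.0, 2.0, 0.0, 0.0]);
}
```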
bfa7c8fc01
Implement the module trait directly for QMatMul. ( #1372 )
2023-11-25 10:09:45 +00:00
8d6c6de8e0
Missing new test.
2023-11-20 14:38:35 +01:00
7ec345c2eb
Adding the test scaffolding.
2023-11-20 14:38:35 +01:00
c6763e3b41
Add a simple implementation of cumsum. ( #1334 )
...
* Add a simple implementation of cumsum.
* Add another test.
2023-11-15 21:11:15 +00:00
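Cumsum keeps a running total along a dimension. The 1-D semantics, sketched in plain Rust rather than candle's tensor API:

```rust
// Cumulative sum over a 1-D slice: out[i] = xs[0] + ... + xs[i].
fn cumsum(xs: &[f32]) -> Vec<f32> {
    let mut acc = 0.0f32;
    xs.iter()
        .map(|&x| {
            acc += x;
            acc
        })
        .collect()
}

fn main() {
    assert_eq!(cumsum(&[1.0, 2.0, 3.0, 4.0]), vec![1.0, 3.0, 6.0, 10.0]);
}
```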
347e31c9ff
Add the tril/triu/eye ops. ( #1333 )
...
* Add tril/triu/eye.
* Revert the metal crate tweak.
2023-11-15 20:34:37 +00:00
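The tril/triu/eye ops build or mask triangular matrices. A plain-Rust sketch of eye and tril on nested vecs (triu is the mirror image, keeping entries with j >= i):

```rust
// eye: n x n identity matrix (row-major nested vecs, for illustration).
fn eye(n: usize) -> Vec<Vec<f32>> {
    (0..n)
        .map(|i| (0..n).map(|j| if i == j { 1.0 } else { 0.0 }).collect())
        .collect()
}

// tril: keep the lower triangle (j <= i), zero everything above the diagonal.
fn tril(m: &[Vec<f32>]) -> Vec<Vec<f32>> {
    m.iter()
        .enumerate()
        .map(|(i, row)| {
            row.iter()
                .enumerate()
                .map(|(j, &v)| if j <= i { v } else { 0.0 })
                .collect()
        })
        .collect()
}

fn main() {
    assert_eq!(eye(2), vec![vec![1.0, 0.0], vec![0.0, 1.0]]);
    let m = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    assert_eq!(tril(&m), vec![vec![1.0, 0.0], vec![3.0, 4.0]]);
}
```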
9e666d4229
Add the var method. ( #1315 )
...
* Add the var method.
* Add a test.
2023-11-10 22:47:57 +01:00
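The var method computes variance along a dimension. A 1-D sketch; the (n - 1) normalization below matches PyTorch's unbiased default and is an assumption about candle's choice:

```rust
// Sample variance with (n - 1) normalization (assumed; PyTorch's default).
fn var(xs: &[f64]) -> f64 {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0)
}

fn main() {
    // mean = 2.5, squared deviations sum to 5.0, so var = 5.0 / 3.
    assert!((var(&[1.0, 2.0, 3.0, 4.0]) - 5.0 / 3.0).abs() < 1e-12);
}
```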
7051fb8098
feat: add backprop for elu ( #1269 )
...
* feat: add backprop for elu
* Cosmetic tweaks.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>

2023-11-04 21:26:41 +01:00
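ELU and the derivative used by its backprop, as a scalar sketch. The gradient on the negative branch is alpha * exp(x), which also equals elu(x) + alpha:

```rust
// ELU: x for x > 0, alpha * (exp(x) - 1) otherwise.
fn elu(x: f64, alpha: f64) -> f64 {
    if x > 0.0 { x } else { alpha * (x.exp() - 1.0) }
}

// d/dx elu: 1 for x > 0, alpha * exp(x) (= elu(x) + alpha) otherwise.
fn elu_grad(x: f64, alpha: f64) -> f64 {
    if x > 0.0 { 1.0 } else { alpha * x.exp() }
}

fn main() {
    assert_eq!(elu(1.5, 1.0), 1.5);
    // Check the gradient against a central finite difference.
    let num = (elu(-0.5 + 1e-6, 1.0) - elu(-0.5 - 1e-6, 1.0)) / 2e-6;
    assert!((num - elu_grad(-0.5, 1.0)).abs() < 1e-6);
}
```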
3173b1ce3b
feat: impl backprop for erf and gelu-erf ( #1258 )
...
* impl backprop for erf and gelu-erf
* feat: unary tests added for erf and gelu-erf
* fix: (clippy) remove immediately dereferenced ref
* fix: improve comments with pytorch code snippet
* fix: adjust comment typo in backprop impl
2023-11-03 21:32:30 +01:00
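The erf-based GELU is 0.5 * x * (1 + erf(x / sqrt(2))); its backprop adds an x * pdf(x) term from the product rule, where pdf is the standard normal density. A sketch using the Abramowitz-Stegun polynomial approximation of erf (candle's actual erf implementation may differ):

```rust
// Abramowitz & Stegun 7.1.26 approximation of erf (|error| < 1.5e-7).
fn erf(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    let poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
        - 0.284496736) * t
        + 0.254829592) * t;
    sign * (1.0 - poly * (-x * x).exp())
}

// Erf-based GELU: 0.5 * x * (1 + erf(x / sqrt(2))).
fn gelu_erf(x: f64) -> f64 {
    0.5 * x * (1.0 + erf(x / std::f64::consts::SQRT_2))
}

// Backprop: 0.5 * (1 + erf(x / sqrt(2))) + x * pdf(x).
fn gelu_erf_grad(x: f64) -> f64 {
    let pdf = (-0.5 * x * x).exp() / (2.0 * std::f64::consts::PI).sqrt();
    0.5 * (1.0 + erf(x / std::f64::consts::SQRT_2)) + x * pdf
}

fn main() {
    assert!(gelu_erf(0.0).abs() < 1e-12);
    // For large positive x, gelu(x) approaches x.
    assert!((gelu_erf(5.0) - 5.0).abs() < 1e-4);
    // Gradient vs central finite difference.
    let num = (gelu_erf(0.5 + 1e-5) - gelu_erf(0.5 - 1e-5)) / 2e-5;
    assert!((num - gelu_erf_grad(0.5)).abs() < 1e-3);
}
```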
b07b2350b6
Test for the transposed conv1d. ( #1254 )
2023-11-03 13:10:28 +01:00
5fc66bd4ba
Support negative steps in arange. ( #1218 )
2023-10-30 07:40:54 +00:00
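With a negative step, arange counts down from start toward end (exclusive). A sketch of the semantics over i64:

```rust
// arange with a possibly negative step: values from start toward end
// (exclusive), advancing by step each time.
fn arange_step(start: i64, end: i64, step: i64) -> Vec<i64> {
    assert!(step != 0);
    let mut out = Vec::new();
    let mut v = start;
    while (step > 0 && v < end) || (step < 0 && v > end) {
        out.push(v);
        v += step;
    }
    out
}

fn main() {
    assert_eq!(arange_step(5, 0, -1), vec![5, 4, 3, 2, 1]);
    assert_eq!(arange_step(0, 5, 2), vec![0, 2, 4]);
}
```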
154c674a79
Add i64-abs. ( #1216 )
2023-10-29 15:28:53 +00:00
46d6566c99
Fix the conv2d gradient computation. ( #1214 )
2023-10-29 09:50:04 +00:00
807e3f9f52
derivative for GELU ( #1160 )
...
* derivative for GELU
* add tests
2023-10-23 20:23:45 +01:00
87eb1658e1
Add pad_with_same. ( #1127 )
...
* More model cloning.
* More cloning on quantized models.
* Add pad-with-same.
* Add some tests.
2023-10-18 23:13:37 +01:00
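pad-with-same replicates the edge values outward (replication padding). The 1-D semantics, sketched for a non-empty input:

```rust
// Replication padding: repeat the first value `left` times and the last
// value `right` times around the input.
fn pad_with_same(xs: &[f32], left: usize, right: usize) -> Vec<f32> {
    assert!(!xs.is_empty());
    let mut out = Vec::with_capacity(xs.len() + left + right);
    out.extend(std::iter::repeat(xs[0]).take(left));
    out.extend_from_slice(xs);
    out.extend(std::iter::repeat(*xs.last().unwrap()).take(right));
    out
}

fn main() {
    assert_eq!(
        pad_with_same(&[1.0, 2.0, 3.0], 2, 1),
        vec![1.0, 1.0, 1.0, 2.0, 3.0, 3.0]
    );
}
```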
7473c4ceca
Fix the npy read function and add some testing. ( #1080 )
2023-10-12 15:25:05 +02:00
7f7d95e2c3
Add the round-to function. ( #1039 )
2023-10-05 20:28:09 +01:00
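A sketch of round-to semantics: round each value to a given number of decimal places. The multiply-round-divide formulation here is an assumption about the op's definition:

```rust
// Round x to `digits` decimal places (assumed semantics: scale, round,
// unscale; Rust's f64::round rounds half away from zero).
fn round_to(x: f64, digits: i32) -> f64 {
    let mult = 10f64.powi(digits);
    (x * mult).round() / mult
}

fn main() {
    assert!((round_to(3.14159, 2) - 3.14).abs() < 1e-12);
    assert_eq!(round_to(2.5, 0), 3.0);
}
```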
8f7973958c
fix: fix index_select cuda kernel for src target dim different than ids dim when selecting dim > 0 ( #1037 )
...
* fix: fix index_select cuda kernel for src target dim different than ids dim when selecting dim > 0
* cargo fmt
2023-10-05 18:46:13 +01:00
c18a856e76
Add the rounding operators. ( #1030 )
...
* Add the rounding operators.
* Avoid tracking gradients for the rounding operations.
* Add some rounding tests.
2023-10-04 17:58:44 +01:00
043cc25766
Fix for the index-select cuda setup. ( #1022 )
...
* Fix for index-select.
* Better fix + add some testing.
2023-10-03 10:21:46 +01:00
cddfc3944c
Add the q8k vec-dot multiplication. ( #1019 )
2023-10-02 21:53:34 +01:00
089fc3b584
Improve the quantized whisper setup. ( #1018 )
...
* Improve the quantized whisper setup.
* Fix the config file paths.
* Use the standard matmul where possible.
2023-10-02 17:17:46 +01:00
263a172202
Improve the testing of the optimized quantized vec-dot ops ( #1016 )
...
* Expose the unopt functions for testing.
* Better testing of the optimized quantized computations.
2023-10-02 09:50:43 +01:00
4e55aaa51f
Simd128 version of the q2k-q8k vecdot product. ( #1011 )
...
* Sketch the simd128 version of q2k vecdot.
* Use a single accumulator.
* Simdify the q2k-q8k vecdot product.
* Cosmetic change.
2023-09-30 20:12:41 +01:00
fc59bc31bf
fix: add missing gpu fill_* ( #996 )
2023-09-29 15:49:30 +01:00
8601537e31
Add slice-scatter. ( #927 )
...
* Add slice-scatter.
* Add the op.
* Make transpose be a no-op when the dimensions are identical.
* Add the backprop.
* And add some gradient test.
2023-09-22 12:18:16 +01:00
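For the slice-scatter backprop: the output gradient flows to src at the scattered positions and to dst everywhere else. A 1-D sketch:

```rust
// Backprop for slice-scatter along a 1-D axis. Returns (grad_dst, grad_src):
// grad_src is the scattered slice of grad_out; grad_dst is grad_out with
// that slice zeroed, since those positions of dst were overwritten.
fn slice_scatter_backward(
    grad_out: &[f32],
    start: usize,
    src_len: usize,
) -> (Vec<f32>, Vec<f32>) {
    let grad_src = grad_out[start..start + src_len].to_vec();
    let mut grad_dst = grad_out.to_vec();
    for g in &mut grad_dst[start..start + src_len] {
        *g = 0.0;
    }
    (grad_dst, grad_src)
}

fn main() {
    let (gd, gs) = slice_scatter_backward(&[1.0, 2.0, 3.0, 4.0], 1, 2);
    assert_eq!(gd, vec![1.0, 0.0, 0.0, 4.0]);
    assert_eq!(gs, vec![2.0, 3.0]);
}
```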
7b26e513f1
Add the erf function. ( #917 )
2023-09-21 06:19:10 +01:00
d7e48234d4
Add an erf based gelu op ( #900 )
...
* Erf based gelu.
* Add the erf backed gelu.
* Test the new gelu op (which is not gelu_new).
2023-09-19 19:54:28 +01:00
18d3c803a8
Scalar support in minimum/maximum. ( #832 )
...
* Scalar support in minimum/maximum.
* Add a clamp method to tensors.
2023-09-13 08:24:58 +01:00
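With scalar support in minimum/maximum, clamp falls out as min(max(x, lo), hi):

```rust
// clamp built from elementwise maximum/minimum against scalars.
fn clamp(xs: &[f32], lo: f32, hi: f32) -> Vec<f32> {
    xs.iter().map(|&x| x.max(lo).min(hi)).collect()
}

fn main() {
    assert_eq!(clamp(&[-2.0, 0.5, 3.0], 0.0, 1.0), vec![0.0, 0.5, 1.0]);
}
```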
258ac32c38
Fix cuda randn when generating an odd number of values. ( #793 )
2023-09-09 18:44:21 +01:00
ad8a62dbf5
Add tanh. ( #675 )
...
* Add tanh.
* Use tanh in the lstm block.
* Add a test for tanh forward and backward passes.
2023-08-30 13:54:50 +01:00
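The tanh backward pass can reuse the forward output, since d/dx tanh(x) = 1 - tanh(x)^2:

```rust
// Backward for y = tanh(x): grad_in = grad_out * (1 - y * y).
fn tanh_backward(x: f64, grad_out: f64) -> f64 {
    let y = x.tanh();
    grad_out * (1.0 - y * y)
}

fn main() {
    // tanh'(0) = 1.
    assert_eq!(tanh_backward(0.0, 1.0), 1.0);
    // Check against a central finite difference at x = 0.7.
    let num = ((0.7f64 + 1e-6).tanh() - (0.7f64 - 1e-6).tanh()) / 2e-6;
    assert!((num - tanh_backward(0.7, 1.0)).abs() < 1e-6);
}
```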
393690387f
Support dilation in conv-transpose2d. ( #671 )
2023-08-30 09:22:00 +01:00
59b731de99
Add the powf op. ( #664 )
...
* Add the powf op.
* Cuda kernels and backprop.
* Add a test.
2023-08-29 20:48:18 +01:00
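powf and its backprop follow the power rule, d/dx x^p = p * x^(p - 1):

```rust
// Forward: x^p for a scalar exponent.
fn powf(x: f64, p: f64) -> f64 {
    x.powf(p)
}

// Backprop w.r.t. x: p * x^(p - 1).
fn powf_grad(x: f64, p: f64) -> f64 {
    p * x.powf(p - 1.0)
}

fn main() {
    assert_eq!(powf(2.0, 3.0), 8.0);
    assert!((powf_grad(2.0, 3.0) - 12.0).abs() < 1e-9);
}
```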
2d3fcad267
Simplify usage of the pool functions. ( #662 )
...
* Simplify usage of the pool functions.
* Small tweak.
* Attempt at using apply to simplify the convnet definition.
2023-08-29 19:12:16 +01:00
71221559d3
Fix the dilated convolutions. ( #659 )
2023-08-29 16:37:42 +01:00
a044907ffc
Dilated convolutions ( #657 )
...
* Add the dilation parameter.
* Restore the basic optimizer example.
* Dilation support in cudnn.
* Use the dilation parameter in the cpu backend.
* More dilation support.
* No support for dilation in transposed convolutions.
* Add dilation to a test.
* Remove a print.
* Helper function.
2023-08-29 16:12:11 +01:00
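Dilation spaces the kernel taps `dilation` elements apart, so the receptive field spans dilation * (k - 1) + 1 inputs. A naive 1-D sketch (stride 1, no padding):

```rust
// Naive dilated 1-D convolution (cross-correlation, stride 1, no padding).
// Requires xs.len() > dilation * (kernel.len() - 1).
fn conv1d_dilated(xs: &[f32], kernel: &[f32], dilation: usize) -> Vec<f32> {
    let span = dilation * (kernel.len() - 1);
    (0..xs.len() - span)
        .map(|i| {
            kernel
                .iter()
                .enumerate()
                .map(|(j, &w)| w * xs[i + j * dilation])
                .sum::<f32>()
        })
        .collect()
}

fn main() {
    // Taps at offsets 0 and 2: [1+3, 2+4, 3+5].
    let out = conv1d_dilated(&[1.0, 2.0, 3.0, 4.0, 5.0], &[1.0, 1.0], 2);
    assert_eq!(out, vec![4.0, 6.0, 8.0]);
}
```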
037b41c9dc
Cuda conv transpose ( #645 )
...
* Cuda kernel for conv-transpose.
* Fix the cuda kernel.
* Fix the tests.
2023-08-28 20:58:49 +01:00
ca26198b95
Fix the cpu kernel for conv-transpose. ( #643 )
2023-08-28 16:45:12 +01:00
b292047882
Backprop for conv2d. ( #638 )
...
* Start adding backprop for conv2d.
* Backprop for conv2d.
* Bugfix + start adding a conv2d test.
* Conv2d backprop testing.
* More conv fixes.
2023-08-28 16:08:55 +01:00
3cca89cc70
Add conv-transpose. ( #635 )
...
* Add conv-transpose.
* Return zeros for now.
* Naive CPU implementation.
* Add a conv-transpose test + fix the cpu implementation.
* Add a second test.
2023-08-28 10:10:12 +01:00
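A naive conv-transpose can be written as a scatter: each input element adds a scaled copy of the kernel into the output. The 1-D, stride-1, no-padding case:

```rust
// Naive 1-D transposed convolution (stride 1, no padding).
// Output length: xs.len() + kernel.len() - 1.
fn conv_transpose1d(xs: &[f32], kernel: &[f32]) -> Vec<f32> {
    let mut out = vec![0.0f32; xs.len() + kernel.len() - 1];
    for (i, &x) in xs.iter().enumerate() {
        for (j, &w) in kernel.iter().enumerate() {
            out[i + j] += x * w;
        }
    }
    out
}

fn main() {
    assert_eq!(conv_transpose1d(&[1.0, 2.0], &[1.0, 1.0]), vec![1.0, 3.0, 2.0]);
}
```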
5320aa6b7d
Move the test-utils bits to a shared place. ( #619 )
2023-08-27 09:42:22 +01:00
a8b39dd7b7
Fix for q5_1 quantization. ( #617 )
...
* Fix for q5_1 quantization.
* Fix some typos.
2023-08-27 08:31:18 +01:00
fa0d75b18d
Quantization tests + fix some issues. ( #616 )
2023-08-27 08:17:38 +01:00
c72eb3d75b
Add reference implementation for q4k
and q5k
( #586 )
...
* add `q2k` vec-dot
* `q3k` vec-dot + quantization bugfix
* `q4k` vec-dot
* `q5k` vec-dot
* Validate against GGML unit test results.
* Remove some more `transmutes`
2023-08-26 12:07:54 +01:00
afd965f77c
More non square testing ( #582 )
...
* Add more non square testing.
* More testing.
2023-08-24 13:01:04 +01:00
d2f42ab086
Referenze implementations of q2k
and q3k
vec-dot functions ( #580 )
...
* add `q2k` vec-dot
* `q3k` vec-dot + quantization bugfix
2023-08-24 12:35:54 +01:00
ca318a6ec7
Add to the cuda example a reproduction of the issue. ( #579 )
...
* Add to the cuda example a reproduction of the issue.
* Tweak.
* Add a test using non-square matrices.
* Fix the conv2d kernel.
* Display the error.
* And tweak the comment.
2023-08-24 12:07:31 +01:00
dd64465899
Add a test for conv2d with padding + bugfix the random number generation on cuda. ( #578 )
...
* Add a test for conv2d with padding.
* Cosmetic changes.
* Bugfix the rand function on the cuda backend.
2023-08-24 10:16:37 +01:00
7478dda255
Cosmetic tweaks. ( #570 )
2023-08-23 15:45:40 +01:00
075b505480
Mirror GGML's unit tests ( #569 )
...
* Add ggml unit tests
* simplify random matmul test for other test cases
2023-08-23 15:25:17 +01:00