3173b1ce3b
feat: impl backprop for erf and gelu-erf ( #1258 )
* impl backprop for erf and gelu-erf
* feat: unary tests added for erf and gelu-erf
* fix: (clippy) remove immediately dereferenced ref
* fix: improve comments with pytorch code snippet
* fix: adjust comment typo in backprop impl
2023-11-03 21:32:30 +01:00
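For reference, the analytic gradients this commit wires into backprop can be sketched in plain Rust and checked against a finite difference. The `erf` helper below is an Abramowitz & Stegun approximation, not candle's kernel; all names are illustrative:

```rust
// erf'(x)      = 2/sqrt(pi) * exp(-x^2)
// gelu_erf(x)  = 0.5 * x * (1 + erf(x / sqrt(2)))
// gelu_erf'(x) = 0.5 * (1 + erf(x / sqrt(2))) + x * exp(-x^2 / 2) / sqrt(2*pi)

// Abramowitz & Stegun 7.1.26 approximation of erf, accurate to ~1.5e-7.
fn erf(x: f64) -> f64 {
    let sign = x.signum();
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    let poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t
        + 0.254829592)
        * t;
    sign * (1.0 - poly * (-x * x).exp())
}

fn gelu_erf(x: f64) -> f64 {
    0.5 * x * (1.0 + erf(x / 2f64.sqrt()))
}

fn gelu_erf_grad(x: f64) -> f64 {
    let cdf = 0.5 * (1.0 + erf(x / 2f64.sqrt())); // Gaussian CDF
    let pdf = (-0.5 * x * x).exp() / (2.0 * std::f64::consts::PI).sqrt(); // Gaussian PDF
    cdf + x * pdf
}

fn main() {
    // Central finite difference as a sanity check on the analytic gradient.
    let (x, h) = (0.7, 1e-5);
    let numeric = (gelu_erf(x + h) - gelu_erf(x - h)) / (2.0 * h);
    assert!((gelu_erf_grad(x) - numeric).abs() < 1e-4);
}
```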
b07b2350b6
Test for the transposed conv1d. ( #1254 )
2023-11-03 13:10:28 +01:00
5fc66bd4ba
Support negative steps in arange. ( #1218 )
2023-10-30 07:40:54 +00:00
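A minimal sketch of arange-with-step semantics including negative steps (the `arange_step` helper here is illustrative, not candle's API): values run from `start` towards `end`, exclusive, in the direction of `step`.

```rust
fn arange_step(start: f64, end: f64, step: f64) -> Vec<f64> {
    assert!(step != 0.0, "step cannot be zero");
    let mut out = Vec::new();
    let mut v = start;
    // Stop once we pass `end` in the direction of travel.
    while (step > 0.0 && v < end) || (step < 0.0 && v > end) {
        out.push(v);
        v += step;
    }
    out
}

fn main() {
    assert_eq!(arange_step(5.0, 0.0, -1.0), vec![5.0, 4.0, 3.0, 2.0, 1.0]);
}
```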
154c674a79
Add i64-abs. ( #1216 )
2023-10-29 15:28:53 +00:00
46d6566c99
Fix the conv2d gradient computation. ( #1214 )
2023-10-29 09:50:04 +00:00
807e3f9f52
derivative for GELU ( #1160 )
* derivative for GELU
* add tests
2023-10-23 20:23:45 +01:00
87eb1658e1
Add pad_with_same. ( #1127 )
* More model cloning.
* More cloning on quantized models.
* Add pad-with-same.
* Add some tests.
2023-10-18 23:13:37 +01:00
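The padding semantics, sketched on a 1-D slice (the tensor op applies this along an arbitrary dimension; `pad_with_same` below is just an illustration): the edge values are replicated outwards.

```rust
fn pad_with_same(xs: &[f32], left: usize, right: usize) -> Vec<f32> {
    assert!(!xs.is_empty());
    let mut out = Vec::with_capacity(left + xs.len() + right);
    out.extend(std::iter::repeat(xs[0]).take(left)); // replicate first value
    out.extend_from_slice(xs);
    out.extend(std::iter::repeat(*xs.last().unwrap()).take(right)); // replicate last value
    out
}

fn main() {
    assert_eq!(pad_with_same(&[1., 2., 3.], 2, 1), vec![1., 1., 1., 2., 3., 3.]);
}
```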
7473c4ceca
Fix the npy read function and add some testing. ( #1080 )
2023-10-12 15:25:05 +02:00
7f7d95e2c3
Add the round-to function. ( #1039 )
2023-10-05 20:28:09 +01:00
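Round-to in the usual sense: round to a fixed number of decimal places by scaling, rounding to the nearest integer, and scaling back. A sketch of the semantics, not the kernel:

```rust
fn round_to(x: f64, digits: i32) -> f64 {
    let scale = 10f64.powi(digits);
    (x * scale).round() / scale
}

fn main() {
    assert_eq!(round_to(2.71828, 2), 2.72);
}
```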
8f7973958c
fix: index_select cuda kernel when the src dim differs from the ids dim and the selected dim > 0 ( #1037 )
* fix: index_select cuda kernel when the src dim differs from the ids dim and the selected dim > 0
* cargo fmt
2023-10-05 18:46:13 +01:00
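For context, index_select semantics for the dim > 0 case this fix targets, sketched on a row-major matrix (the helper name is illustrative):

```rust
// Select columns of a (rows, cols) matrix by `ids`; note the ids length is
// allowed to differ from the source dimension being indexed.
fn index_select_dim1(src: &[f32], rows: usize, cols: usize, ids: &[usize]) -> Vec<f32> {
    let mut out = Vec::with_capacity(rows * ids.len());
    for r in 0..rows {
        for &id in ids {
            assert!(id < cols);
            out.push(src[r * cols + id]);
        }
    }
    out
}

fn main() {
    // 2x3 matrix [[1,2,3],[4,5,6]], select columns [2,0] -> [[3,1],[6,4]].
    let src = [1., 2., 3., 4., 5., 6.];
    assert_eq!(index_select_dim1(&src, 2, 3, &[2, 0]), vec![3., 1., 6., 4.]);
}
```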
c18a856e76
Add the rounding operators. ( #1030 )
* Add the rounding operators.
* Avoid tracking gradients for the rounding operations.
* Add some rounding tests.
2023-10-04 17:58:44 +01:00
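Note: round, floor and ceil are piecewise constant, so their derivative is zero almost everywhere; skipping gradient tracking for them avoids propagating gradients that would be uniformly zero.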
043cc25766
Fix for the index-select cuda setup. ( #1022 )
* Fix for index-select.
* Better fix + add some testing.
2023-10-03 10:21:46 +01:00
cddfc3944c
Add the q8k vec-dot multiplication. ( #1019 )
2023-10-02 21:53:34 +01:00
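The core idea behind all the quantized vec-dot kernels, sketched without the actual q8k block layout: accumulate integer products, then apply the combined scales once at the end.

```rust
fn vec_dot(scale_a: f32, qa: &[i8], scale_b: f32, qb: &[i8]) -> f32 {
    // Integer accumulation keeps the inner loop cheap and exact.
    let isum: i32 = qa.iter().zip(qb).map(|(&a, &b)| a as i32 * b as i32).sum();
    scale_a * scale_b * isum as f32
}

fn main() {
    assert_eq!(vec_dot(0.5, &[2, -3], 0.25, &[4, 1]), 0.5 * 0.25 * 5.0);
}
```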
089fc3b584
Improve the quantized whisper setup. ( #1018 )
* Improve the quantized whisper setup.
* Fix the config file paths.
* Use the standard matmul where possible.
2023-10-02 17:17:46 +01:00
263a172202
Improve the testing of the optimized quantized vec-dot ops ( #1016 )
* Expose the unopt functions for testing.
* Better testing of the optimized quantized computations.
2023-10-02 09:50:43 +01:00
4e55aaa51f
Simd128 version of the q2k-q8k vecdot product. ( #1011 )
* Sketch the simd128 version of q2k vecdot.
* Use a single accumulator.
* Simdify the q2k-q8k vecdot product.
* Cosmetic change.
2023-09-30 20:12:41 +01:00
fc59bc31bf
fix: add missing gpu fill_* ( #996 )
2023-09-29 15:49:30 +01:00
8601537e31
Add slice-scatter. ( #927 )
* Add slice-scatter.
* Add the op.
* Make transpose be a no-op when the dimensions are identical.
* Add the backprop.
* And add some gradient test.
2023-09-22 12:18:16 +01:00
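slice-scatter semantics on a 1-D buffer (a sketch; the real op takes a dimension argument). The backprop mentioned above follows directly: the upstream gradient flows to `src` through the written slice and to `dst` everywhere else.

```rust
fn slice_scatter(dst: &[f32], src: &[f32], offset: usize) -> Vec<f32> {
    assert!(offset + src.len() <= dst.len());
    let mut out = dst.to_vec();
    // Overwrite the slice [offset, offset + src.len()) with `src`.
    out[offset..offset + src.len()].copy_from_slice(src);
    out
}

fn main() {
    assert_eq!(slice_scatter(&[0.; 5], &[1., 2.], 2), vec![0., 0., 1., 2., 0.]);
}
```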
7b26e513f1
Add the erf function. ( #917 )
2023-09-21 06:19:10 +01:00
d7e48234d4
Add an erf based gelu op ( #900 )
* Erf based gelu.
* Add the erf backed gelu.
* Test the new gelu op (which is not gelu_new).
2023-09-19 19:54:28 +01:00
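The distinction the last bullet draws, as scalar formulas (a sketch, not candle's code): `gelu_erf` is the exact definition, while `gelu_new` is the tanh-based approximation many model configs simply call "gelu".

```rust
// Exact:  gelu_erf(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
// Approx: the tanh-based "gelu_new" below.
fn gelu_new(x: f64) -> f64 {
    let c = (2.0 / std::f64::consts::PI).sqrt();
    0.5 * x * (1.0 + (c * (x + 0.044715 * x * x * x)).tanh())
}

fn main() {
    // Agrees with the exact gelu (~0.84134 at x = 1) to roughly 1e-3.
    assert!((gelu_new(1.0) - 0.8412).abs() < 1e-3);
}
```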
18d3c803a8
Scalar support in minimum/maximum. ( #832 )
* Scalar support in minimum/maximum.
* Add a clamp method to tensors.
2023-09-13 08:24:58 +01:00
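The clamp method built from the new scalar minimum/maximum, as a scalar sketch (the tensor version applies the same bounds elementwise): clamp(x, lo, hi) = minimum(maximum(x, lo), hi).

```rust
fn clamp(x: f32, lo: f32, hi: f32) -> f32 {
    x.max(lo).min(hi)
}

fn main() {
    assert_eq!(clamp(2.5, 0.0, 1.0), 1.0);
    assert_eq!(clamp(-3.0, 0.0, 1.0), 0.0);
}
```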
258ac32c38
Fix cuda randn when generating an odd number of values. ( #793 )
2023-09-09 18:44:21 +01:00
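A hedged guess at the failure mode, since the commit title doesn't spell it out: Box-Muller style normal sampling emits values two at a time, so an odd request has to round the pair count up and truncate. A toy sketch with a stand-in LCG, not curand:

```rust
fn randn(n: usize, seed: &mut u64) -> Vec<f64> {
    // Toy uniform generator in [0, 1); illustration only.
    let mut uniform = || {
        *seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        ((*seed >> 11) as f64) / ((1u64 << 53) as f64)
    };
    let mut out = Vec::with_capacity(n + 1);
    for _ in 0..(n + 1) / 2 {
        // Box-Muller produces a pair of independent normals.
        let (u1, u2) = (uniform().max(1e-12), uniform());
        let r = (-2.0 * u1.ln()).sqrt();
        let theta = 2.0 * std::f64::consts::PI * u2;
        out.push(r * theta.cos());
        out.push(r * theta.sin());
    }
    out.truncate(n); // drop the extra value when n is odd
    out
}

fn main() {
    let mut seed = 42;
    assert_eq!(randn(7, &mut seed).len(), 7);
}
```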
ad8a62dbf5
Add tanh. ( #675 )
* Add tanh.
* Use tanh in the lstm block.
* Add a test for tanh forward and backward passes.
2023-08-30 13:54:50 +01:00
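The backward pass here is the classic identity d/dx tanh(x) = 1 - tanh(x)², which lets the gradient reuse the forward output. A scalar sketch with a finite-difference check:

```rust
fn tanh_grad(y: f64) -> f64 {
    // `y` is the forward output tanh(x).
    1.0 - y * y
}

fn main() {
    let (x, h) = (0.3, 1e-6);
    let numeric = ((x + h).tanh() - (x - h).tanh()) / (2.0 * h);
    assert!((tanh_grad(x.tanh()) - numeric).abs() < 1e-6);
}
```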
393690387f
Support dilation in conv-transpose2d. ( #671 )
2023-08-30 09:22:00 +01:00
59b731de99
Add the powf op. ( #664 )
* Add the powf op.
* Cuda kernels and backprop.
* Add a test.
2023-08-29 20:48:18 +01:00
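The backprop rule for powf w.r.t. its input, checked numerically (a sketch): d/dx x^p = p * x^(p-1).

```rust
fn powf_grad(x: f64, p: f64) -> f64 {
    p * x.powf(p - 1.0)
}

fn main() {
    let (x, p, h) = (1.7, 2.5, 1e-6);
    let numeric = ((x + h).powf(p) - (x - h).powf(p)) / (2.0 * h);
    assert!((powf_grad(x, p) - numeric).abs() < 1e-5);
}
```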
2d3fcad267
Simplify usage of the pool functions. ( #662 )
* Simplify usage of the pool functions.
* Small tweak.
* Attempt at using apply to simplify the convnet definition.
2023-08-29 19:12:16 +01:00
71221559d3
Fix the dilated convolutions. ( #659 )
2023-08-29 16:37:42 +01:00
a044907ffc
Dilated convolutions ( #657 )
* Add the dilation parameter.
* Restore the basic optimizer example.
* Dilation support in cudnn.
* Use the dilation parameter in the cpu backend.
* More dilation support.
* No support for dilation in transposed convolutions.
* Add dilation to a test.
* Remove a print.
* Helper function.
2023-08-29 16:12:11 +01:00
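The shape arithmetic dilation introduces: a dilated kernel spans d*(k-1)+1 input positions, which gives the usual output-length formula. A sketch following the PyTorch convention:

```rust
// out = (in + 2*pad - dilation*(k-1) - 1) / stride + 1
fn conv_out_len(input: usize, k: usize, stride: usize, pad: usize, dilation: usize) -> usize {
    (input + 2 * pad - dilation * (k - 1) - 1) / stride + 1
}

fn main() {
    // A 3-wide kernel with dilation 2 covers the same span as a 5-wide kernel.
    assert_eq!(conv_out_len(32, 3, 1, 0, 2), 28);
    assert_eq!(conv_out_len(32, 5, 1, 0, 1), 28);
}
```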
037b41c9dc
Cuda conv transpose ( #645 )
* Cuda kernel for conv-transpose.
* Fix the cuda kernel.
* Fix the tests.
2023-08-28 20:58:49 +01:00
ca26198b95
Fix the cpu kernel for conv-transpose. ( #643 )
2023-08-28 16:45:12 +01:00
b292047882
Backprop for conv2d. ( #638 )
* Start adding backprop for conv2d.
* Backprop for conv2d.
* Bugfix + start adding a conv2d test.
* Conv2d backprop testing.
* More conv fixes.
2023-08-28 16:08:55 +01:00
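A 1-D sketch of the weight-gradient half of conv backprop (valid padding, stride 1): the gradient w.r.t. the kernel is a cross-correlation between the input and the output gradient. Helper names are illustrative, not candle's:

```rust
fn conv1d(x: &[f64], w: &[f64]) -> Vec<f64> {
    // y_i = sum_j w_j * x_{i+j}
    (0..x.len() - w.len() + 1)
        .map(|i| w.iter().enumerate().map(|(j, wj)| wj * x[i + j]).sum())
        .collect()
}

fn grad_w(x: &[f64], gy: &[f64], klen: usize) -> Vec<f64> {
    // d loss / d w_j = sum_i gy_i * x_{i+j}
    (0..klen)
        .map(|j| gy.iter().enumerate().map(|(i, g)| g * x[i + j]).sum())
        .collect()
}

fn main() {
    let (x, w) = ([1., 2., -1., 3.], [0.5, -0.25]);
    let y = conv1d(&x, &w);
    assert_eq!(y.len(), 3);
    let gy = [1., 1., 1.]; // upstream gradient of sum(y)
    // d sum(y) / d w_j = sum_i x[i + j]
    assert_eq!(grad_w(&x, &gy, 2), vec![1. + 2. - 1., 2. - 1. + 3.]);
}
```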
3cca89cc70
Add conv-transpose. ( #635 )
* Add conv-transpose.
* Return zeros for now.
* Naive CPU implementation.
* Add a conv-transpose test + fix the cpu implementation.
* Add a second test.
2023-08-28 10:10:12 +01:00
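For reference, the transposed-conv output length, which inverts the conv formula (a sketch following the PyTorch convention):

```rust
// out = (in - 1) * stride - 2*pad + dilation*(k-1) + output_padding + 1
fn conv_transpose_out_len(
    input: usize, k: usize, stride: usize, pad: usize, out_pad: usize, dilation: usize,
) -> usize {
    (input - 1) * stride - 2 * pad + dilation * (k - 1) + out_pad + 1
}

fn main() {
    // Upsampling by 2 with a 4-wide kernel and padding 1: 16 -> 32.
    assert_eq!(conv_transpose_out_len(16, 4, 2, 1, 0, 1), 32);
}
```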
5320aa6b7d
Move the test-utils bits to a shared place. ( #619 )
2023-08-27 09:42:22 +01:00
a8b39dd7b7
Fix for q5_1 quantization. ( #617 )
* Fix for q5_1 quantization.
* Fix some typos.
2023-08-27 08:31:18 +01:00
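A dequantization sketch for one q5_1 block following the GGML layout as I understand it (scales kept as f32 here; the real format stores them as f16): each 5-bit value is a low nibble plus one bit from `qh`, decoded as x = d*q + m.

```rust
fn dequant_q5_1(d: f32, m: f32, qh: u32, qs: &[u8; 16]) -> [f32; 32] {
    let mut out = [0f32; 32];
    for i in 0..16 {
        let lo0 = (qs[i] & 0x0f) as u32; // low nibble, first half
        let lo1 = (qs[i] >> 4) as u32; // high nibble, second half
        let q0 = lo0 | (((qh >> i) & 1) << 4); // add the fifth bit
        let q1 = lo1 | (((qh >> (i + 16)) & 1) << 4);
        out[i] = d * q0 as f32 + m;
        out[i + 16] = d * q1 as f32 + m;
    }
    out
}

fn main() {
    // All-zero quants decode to the block minimum `m`.
    let out = dequant_q5_1(0.1, -1.0, 0, &[0u8; 16]);
    assert!(out.iter().all(|&v| v == -1.0));
}
```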
fa0d75b18d
Quantization tests + fix some issues. ( #616 )
2023-08-27 08:17:38 +01:00
c72eb3d75b
Add reference implementations for `q4k` and `q5k` ( #586 )
* add `q2k` vec-dot
* `q3k` vec-dot + quantization bugfix
* `q4k` vec-dot
* `q5k` vec-dot
* Validate against GGML unit test results.
* Remove some more `transmutes`
2023-08-26 12:07:54 +01:00
afd965f77c
More non square testing ( #582 )
* Add more non square testing.
* More testing.
2023-08-24 13:01:04 +01:00
d2f42ab086
Reference implementations of `q2k` and `q3k` vec-dot functions ( #580 )
* add `q2k` vec-dot
* `q3k` vec-dot + quantization bugfix
2023-08-24 12:35:54 +01:00
ca318a6ec7
Add to the cuda example a reproduction of the issue. ( #579 )
* Add to the cuda example a reproduction of the issue.
* Tweak.
* Add a test using non-square matrices.
* Fix the conv2d kernel.
* Display the error.
* And tweak the comment.
2023-08-24 12:07:31 +01:00
dd64465899
Add a test for conv2d with padding + bugfix the random number generation on cuda. ( #578 )
* Add a test for conv2d with padding.
* Cosmetic changes.
* Bugfix the rand function on the cuda backend.
2023-08-24 10:16:37 +01:00
7478dda255
Cosmetic tweaks. ( #570 )
2023-08-23 15:45:40 +01:00
075b505480
Mirror GGML's unit tests ( #569 )
* Add ggml unit tests
* simplify random matmul test for other test cases
2023-08-23 15:25:17 +01:00
aba1e90797
Add some group parameter to convolutions. ( #566 )
* Add some group parameter to convolutions.
* Avoid some unnecessary groups checks.
* Move the tensor convolution bits.
* Proper handling of groups.
* Bump the crate version.
* And add a changelog.
2023-08-23 12:58:55 +01:00
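The shape rule grouped convolutions add, as a sketch (the helper is illustrative): both channel counts must divide by the group count, and each filter only sees in_channels / groups input channels.

```rust
fn check_grouped_conv(c_in: usize, c_out: usize, groups: usize) -> Result<usize, String> {
    if c_in % groups != 0 || c_out % groups != 0 {
        return Err(format!("{c_in}/{c_out} not divisible by groups={groups}"));
    }
    Ok(c_in / groups) // input channels seen by each filter
}

fn main() {
    assert_eq!(check_grouped_conv(64, 128, 4), Ok(16));
    assert!(check_grouped_conv(64, 100, 8).is_err());
}
```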
352383cbc3
Add quantization support for `q2k`, `q3k`, `q4k` and `q5k` ( #524 )
* first q2 implementation
* First Q4K and Q5K implementations
* fix `q2k` and `q5k`
* Some first cleanups
* run `clippy` on tests
* finally implement `q3k`
* deactivate `q3k` test on macos
* also disable the test on linux
* Fix floating bits in `q3k` dequantization
* Refactoring pass + reorder quants in file
* `fmt`
* Re-add `src` asserts and redefine `dst`
2023-08-22 15:04:55 +01:00
d70cffdab6
Fix the minimum/maximum gradient computations. ( #534 )
2023-08-21 08:28:41 +01:00
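One common convention for these gradients, sketched at scalar level (whether candle splits ties this way is an assumption, not something the commit states): the upstream gradient flows to the winning input.

```rust
fn maximum_grads(x: f64, y: f64, gy: f64) -> (f64, f64) {
    if x > y {
        (gy, 0.0)
    } else if y > x {
        (0.0, gy)
    } else {
        (gy * 0.5, gy * 0.5) // tie: split the gradient, one subgradient choice
    }
}

fn main() {
    assert_eq!(maximum_grads(2.0, 1.0, 3.0), (3.0, 0.0));
}
```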
a1812f934f
Add a yolo-v3 example. ( #528 )
* Add a couple functions required for yolo.
* Add the yolo-v3 example.
* Add minimum and maximum.
* Use the newly introduced maximum.
* Cuda support for min/max + add some testing.
* Allow for more tests to work with accelerate.
* Fix a typo.
2023-08-20 18:19:37 +01:00
2fcb386f17
Add a broadcast variant to matmul. ( #523 )
* Add a broadcast variant to matmul.
* Get the test to pass.
2023-08-20 13:20:42 +01:00
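The broadcasting rule for the batch dimensions, sketched on shape vectors (the helper is illustrative): trailing dims must match or be 1, then a regular matmul runs per broadcast batch element, e.g. (8, 1, m, k) x (4, k, n) -> (8, 4, m, n).

```rust
fn broadcast_batch(lhs: &[usize], rhs: &[usize]) -> Option<Vec<usize>> {
    let n = lhs.len().max(rhs.len());
    let mut out = vec![1; n];
    for i in 0..n {
        // Align from the right; missing dims behave like 1.
        let l = *lhs.iter().rev().nth(i).unwrap_or(&1);
        let r = *rhs.iter().rev().nth(i).unwrap_or(&1);
        out[n - 1 - i] = match (l, r) {
            (a, b) if a == b => a,
            (1, b) => b,
            (a, 1) => a,
            _ => return None, // incompatible shapes
        };
    }
    Some(out)
}

fn main() {
    assert_eq!(broadcast_batch(&[8, 1], &[4]), Some(vec![8, 4]));
    assert_eq!(broadcast_batch(&[2, 3], &[4]), None);
}
```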
a22b1bed7b
Tensor -> QTensor conversion ( #496 )
* Sketch some qmatmul test.
* Add the quantization function.
* More testing.
* Make the test smaller and faster.
* Add some shape checking.
2023-08-18 08:19:20 +01:00
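The quantize -> dequantize round trip such tests rely on, sketched with a plain symmetric 8-bit scheme (the real k-quant formats are block-wise and considerably more involved):

```rust
fn quantize(xs: &[f32]) -> (f32, Vec<i8>) {
    // Scale so the largest magnitude maps to 127.
    let amax = xs.iter().fold(0f32, |m, x| m.max(x.abs()));
    let scale = if amax == 0.0 { 1.0 } else { amax / 127.0 };
    (scale, xs.iter().map(|x| (x / scale).round() as i8).collect())
}

fn dequantize(scale: f32, qs: &[i8]) -> Vec<f32> {
    qs.iter().map(|&q| q as f32 * scale).collect()
}

fn main() {
    let xs = [0.1f32, -0.5, 0.25];
    let (scale, qs) = quantize(&xs);
    let ys = dequantize(scale, &qs);
    // Round-trip error is bounded by half a quantization step.
    for (x, y) in xs.iter().zip(&ys) {
        assert!((x - y).abs() <= scale / 2.0 + 1e-6);
    }
}
```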
557b2c28dd
Q6K quantization ( #495 )
* Print the detected arch options.
* Add the q6k quantization.
* Add a currently broken test.
* Bugfix.
* Bugfix.
* Another bugfix.
* Another bugfix + get the test to work.
2023-08-17 22:22:57 +01:00
03be33eea4
Relax the requirements on CustomOp. ( #486 )
* Relax the requirements on CustomOp.
* Simplify the custom-ops when no backward is required.
2023-08-17 11:12:05 +01:00