fa06f5f5f9
F16/BF16 bugfix (bis). (#2143)
* F16/BF16 bugfix (bis).
* Another fix.
* Yet another fix.
2024-04-29 14:08:44 +02:00
09d4845aa8
Bugfix the recent f16/bf16 changes. (#2142)
2024-04-29 13:30:11 +02:00
3bbb88fcb4
Fix sigmoid gradient calculation and move sigmoid into a specialized op (#2114)
* add sigmoid op
* small fix
* add as a method on `Tensor`
* implement gradient calculation for sigmoid
* add sigmoid tests
* we should have a specialized op for this
* fix clippy
* fix clippy 2
* Revert all previous commits in favor of a `CustomOp` based solution
* use `CustomOp1` implementation
* fix rustfmt
* add experimental metal impl
* add cuda kernel impl
* fix fmt
* Add a test + reduce some cuda duplication.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-29 11:04:43 +02:00
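The gradient fix above relies on the standard sigmoid identity: the backward pass can be written entirely in terms of the forward output, since dσ/dx = σ(x)·(1 − σ(x)). Below is a minimal sketch of that identity using plain candle tensor ops; it deliberately sidesteps the `CustomOp1` machinery (and the CUDA/Metal kernels) that the PR actually adds, so treat it as illustrative rather than the PR's implementation.

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let x = Tensor::new(&[-2f32, 0.0, 3.0], &dev)?;
    // Forward: sigmoid(x) = 1 / (1 + exp(-x)).
    let s = x.neg()?.exp()?.affine(1.0, 1.0)?.recip()?;
    // Backward: d sigmoid/dx = sigmoid(x) * (1 - sigmoid(x)),
    // i.e. the gradient only needs the forward output `s`.
    let grad = s.mul(&s.ones_like()?.sub(&s)?)?;
    println!("sigmoid: {s}\ngrad: {grad}");
    Ok(())
}
```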
ed7b99f525
Add a toggle for F16/BF16 accumulation in gemm. (#2141)
* Add a toggle to control f16/bf16 gemm precision.
* Use the faster variant in the quantized example.
* Bugfix.
2024-04-29 09:21:07 +02:00
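For reference, flipping that toggle from a downstream binary could look like the sketch below (the "faster variant" the quantized example opts into). The module path and function names are my assumption from the PR description, so double-check them against `candle_core::cuda`.

```rust
fn main() {
    // Assumed API from #2141: opt in to reduced-precision accumulation for
    // f16/bf16 GEMMs on CUDA (faster, slightly less accurate than keeping
    // f32 accumulation). Skipped entirely on non-CUDA builds.
    #[cfg(feature = "cuda")]
    {
        candle_core::cuda::set_gemm_reduced_precision_f16(true);
        candle_core::cuda::set_gemm_reduced_precision_bf16(true);
    }
}
```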
8a05743a21
Add StorageRef. (#2113)
* Add the storage-ref bits.
* Add the metal implementation.
2024-04-23 13:23:27 +02:00
53e5380bf6
Add a synchronize method to devices. (#2055)
* Add a synchronize method to devices.
* Metal version.
2024-04-14 16:32:55 +02:00
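Since GPU work is queued asynchronously, a `synchronize` call on the device is mainly useful for benchmarking or before handing buffers to code outside candle. A minimal sketch of timing a matmul with it (`cuda_if_available` falls back to the CPU, where synchronizing is effectively a no-op):

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::cuda_if_available(0)?;
    let a = Tensor::randn(0f32, 1.0, (1024, 1024), &dev)?;
    let start = std::time::Instant::now();
    let _b = a.matmul(&a)?;
    // Block until all queued device work has completed so the timing is meaningful.
    dev.synchronize()?;
    println!("matmul took {:?}", start.elapsed());
    Ok(())
}
```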
8967c46563
Split the cuda error file. (#2003)
2024-04-04 08:27:23 +02:00
318d143224
Relax the contiguous check for cuda kernels. (#2000)
* Relax the contiguous check for cuda kernels.
* Ensure contiguity for RNNs.
* Unrelated fix for segment anything.
* Better error message + allow concatenating empty slices.
2024-04-03 09:02:38 +02:00
08c049def3
Improve the handling of matmul with squeezed layouts. (#1998)
* Improve the handling of matmul with squeezed layouts.
* Fix for the cuda backend.
* Revert the temporary fix.
2024-04-02 23:17:05 +02:00
665da30487
Backend refactoring. (#1966)
* Backend refactoring.
* Metal tweaks.
* Move the cudnn module.
2024-03-29 23:02:11 +01:00