|
6f0b807ffd
|
More efficient cuda implementation for ConvTranspose1d. (#2211)
* More efficient cuda implementation for ConvTranspose1d.
* Small tweak.
|
2024-05-24 11:05:43 +02:00 |
|
|
8a05743a21
|
Add StorageRef. (#2113)
* Add the storage-ref bits.
* Add the metal implementation.
|
2024-04-23 13:23:27 +02:00 |
|
|
53e5380bf6
|
Add a synchronize method to devices. (#2055)
* Add a synchronize method to devices.
* Metal version.
|
2024-04-14 16:32:55 +02:00 |
|
|
e6a5b82ba6
|
Fix the matmul layout for accelerate & mkl. (#2011)
* Fix the matmul layout for accelerate & mkl.
* Reduce the required precision for pow (because of accelerate).
* And fix the gelu f16 test.
|
2024-04-04 19:18:03 +02:00 |
|
|
08c049def3
|
Improve the handling of matmul with squeezed layouts. (#1998)
* Improve the handling of matmul with squeezed layouts.
* Fix for the cuda backend.
* Revert the temporary fix.
|
2024-04-02 23:17:05 +02:00 |
|
|
665da30487
|
Backend refactoring. (#1966)
* Backend refactoring.
* Metal tweaks.
* Move the cudnn module.
|
2024-03-29 23:02:11 +01:00 |
|