|
a2e925462c
|
Add the scatter in place ops. (#2923)
* Add the scatter_set op.
* Metal op.
* Cuda version.
* Merge the checks.
* Add the actual ops.
|
2025-04-26 07:36:49 +02:00 |
|
|
36cf54525d
|
Fix the fast bf16 gemm cublas kernels. (#2274)
* Use flash-attn in gemma.
* Fix for the fast bf16 cublas gemm.
* Fix some clippy lints.
* Fix another lint.
* Proper clippy fix.
|
2024-06-18 23:46:58 +02:00 |
|
|
665da30487
|
Backend refactoring. (#1966)
* Backend refactoring.
* Metal tweaks.
* Move the cudnn module.
|
2024-03-29 23:02:11 +01:00 |
|