* Use flash-attn in gemma.
* Fix for the fast bf16 cublas gemm.
* Fix some clippy lints.
* Fix another lint.
* Proper clippy fix.
* Backend refactoring.
* Metal tweaks.
* Move the cudnn module.