* Use flash-attn in gemma. * Fix flash-attn for head dim 256.
* Add the code-gemma models. * Tweak the gemma config.
* Add the new gemma models. * Revert the lightning changes. * Support for the 1.1 models.
* Add the Gemma models. * Add the gemma example. * Adapt the RmsNorm. * Get the 2b model to work. * 7b support. * Use the config head dim. * Yet another fix. * Make the matrices contiguous. * Also get the 7b model to work. * And add to the readme.