3ad4770eb6
Use cat for faster MQA computation. (#2043)
* Use cat for faster MQA computation.
* Move the function to utils + use it in mistral.
* Use the shared repeat-kv in a few more models.
* Fix.
2024-04-12 09:15:10 +02:00
a0460cd2b1
Add the code-gemma models. (#2038)
* Add the code-gemma models.
* Tweak to the gemma config.
2024-04-10 21:19:21 +02:00
33c9b66554
Add the new gemma models. (#2023)
* Add the new gemma models.
* Revert the lightning changes.
* Support for the 1.1 models.
2024-04-06 21:25:38 +02:00
c753f72c85
Support for attention bias in gemma + refactor things a bit. (#1744)
* Support for attention bias in gemma + refactor things a bit.
* Fix the cuda tests.
2024-02-22 09:35:28 +01:00
45d5322d62
Add the Gemma models. (#1741)
* Add the Gemma models.
* Add the gemma example.
* Adapt the RmsNorm.
* Get the 2b model to work.
* 7b support.
* Use the config head dim.
* Yet another fix.
* Make the matrices contiguous.
* Also get the 7b model to work.
* And add to the readme.
2024-02-21 22:02:50 +01:00