3ad4770eb6
Use cat for faster MQA computation. (#2043)
* Use cat for faster MQA computation.
* Move the function to utils + use it in mistral.
* Use the shared repeat-kv in a few more models.
* Fix.
2024-04-12 09:15:10 +02:00

b81ecf712d
Support alternative dtypes for mamba (#2036)
* Allow different dtypes in mamba.
* Add a dtype flag.
2024-04-10 18:10:01 +02:00

4523ecfb2a
Faster repeat penalty (#1940)
* Avoid the attention mask where possible.
* Faster repeat penalty.
2024-03-26 11:31:20 +01:00

6e485f2deb
Add some optional repeat penalty. (#623)
* Add some optional repeat penalty.
* Add the missing files.
2023-08-27 10:48:45 +01:00