|
3ad4770eb6
|
Use cat for faster MQA computation. (#2043)
* Use cat for faster MQA computation.
* Move the function to utils + use it in mistral.
* Use the shared repeat-kv in a few more models.
* Fix.
|
2024-04-12 09:15:10 +02:00 |
|
|
185b54a33b
|
Make some model cloneable. (#1125)
|
2023-10-18 19:30:47 +01:00 |
|
|
86e7d539d2
|
Add the quantized mpt model. (#1123)
* Add the quantized mpt model.
* Support the quantized model for replit-code.
|
2023-10-18 16:29:38 +01:00 |
|
|
cb034506cd
|
Remove the unused pragma in mpt. (#1122)
|
2023-10-18 15:47:50 +01:00 |
|
|
767a6578f1
|
MPT alibi fixes. (#1120)
* MPT alibi fixes.
* Some more fixes.
* Finally get the model to return some sensible outputs.
* Add a readme.
|
2023-10-18 10:58:05 +01:00 |
|
|
2cd745a97c
|
MPT fixes. (#1117)
* MPT fixes.
* Another couple fixes.
* Another shape fix.
|
2023-10-17 21:53:31 +01:00 |
|
|
a72b50e2c0
|
Build alibi bias. (#1115)
* Build alibi bias.
* Apply the alibi attention bias.
* Add the replit-code example.
|
2023-10-17 20:41:37 +01:00 |
|
|
872c3f14b0
|
Add the MPT model. (#1114)
* Add the MPT model.
* Add ffn and block.
* Forward pass for the mpt block.
* Repeat-kv.
|
2023-10-17 16:06:48 +01:00 |
|