587ee3bb6f
Small cleanups to the llama multi-process example. (#2098)
2024-04-20 22:19:46 +02:00
2b93dffe64
Use faster rotary embeddings for llama-like models. (#2087)
2024-04-18 22:34:29 +02:00
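A minimal sketch of what the faster path looks like: candle-nn ships a dedicated rope kernel in candle_nn::rotary_emb, which this change switches to. The shapes and the identity cos/sin tables below are illustrative only; real models precompute the tables from the rope theta.

```rust
use candle::{DType, Device, Result, Tensor};

// Apply rotary embeddings to a (batch, heads, seq, head_dim) tensor using
// candle-nn's rope kernel. `rope` expects contiguous input and cos/sin
// tables of shape (seq, head_dim / 2).
fn apply_rope(q: &Tensor, cos: &Tensor, sin: &Tensor) -> Result<Tensor> {
    candle_nn::rotary_emb::rope(&q.contiguous()?, cos, sin)
}

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let (seq_len, head_dim) = (8, 64);
    let q = Tensor::randn(0f32, 1., (1, 4, seq_len, head_dim), &dev)?;
    // Identity tables (cos = 1, sin = 0) purely for illustration.
    let cos = Tensor::ones((seq_len, head_dim / 2), DType::F32, &dev)?;
    let sin = Tensor::zeros((seq_len, head_dim / 2), DType::F32, &dev)?;
    let _q_rot = apply_rope(&q, &cos, &sin)?;
    Ok(())
}
```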
e6ee7ba4d4
Llama v3. (#2085)
* Llama v3.
* Tweak the default params + handle special tokens.
* Small tweak.
2024-04-18 22:19:54 +02:00
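On the special-token side, Llama v3 can end a turn with <|eot_id|> in addition to <|end_of_text|>, so generation has to stop on a set of ids rather than a single eos token. A hedged sketch; the hard-coded ids match v3 but would normally be read from the tokenizer config:

```rust
// Stop generation when the sampled token is any of the model's stop ids.
fn is_stop_token(token: u32, stop_ids: &[u32]) -> bool {
    stop_ids.contains(&token)
}

fn main() {
    // Llama v3 ids for <|end_of_text|> and <|eot_id|>, hard-coded only for
    // illustration.
    let stop_ids = [128_001u32, 128_009];
    assert!(is_stop_token(128_009, &stop_ids));
    assert!(!is_stop_token(42, &stop_ids));
}
```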
3ad4770eb6
Use cat for faster MQA computation. (#2043)
* Use cat for faster MQA computation.
* Move the function to utils + use it in mistral.
* Use the shared repeat-kv in a few more models.
* Fix.
2024-04-12 09:15:10 +02:00
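The shared helper referenced above is candle_transformers::utils::repeat_kv; here is a sketch of the cat-based version for grouped/multi-query attention. Concatenating n_rep copies along dim 2 and then reshaping produces one contiguous copy, which is what makes it faster than going through a strided broadcast.

```rust
use candle::{Result, Tensor};

/// Expand (batch, n_kv_heads, seq, head_dim) key/value tensors to
/// (batch, n_kv_heads * n_rep, seq, head_dim) so each query head has a
/// matching kv head.
pub fn repeat_kv(xs: Tensor, n_rep: usize) -> Result<Tensor> {
    if n_rep == 1 {
        Ok(xs)
    } else {
        let (b, n_kv_heads, seq_len, head_dim) = xs.dims4()?;
        // n_rep stacked copies per kv head, then reinterpret the blocks as
        // extra heads.
        Tensor::cat(&vec![&xs; n_rep], 2)?.reshape((b, n_kv_heads * n_rep, seq_len, head_dim))
    }
}
```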
cf7d7fcf2f
Also avoid the mask in the llama example.
2024-03-24 19:04:32 +01:00
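Avoiding the mask here relies on the decode-time shape: a single query token attends to the entire KV cache, so the causal mask is all-visible and can be skipped. A self-contained sketch of the shortcut; the mask construction below is a stand-in for the example's actual masking path:

```rust
use candle::{Result, Tensor};

// Skip the causal mask when decoding a single token.
fn maybe_mask(att: Tensor, seq_len: usize) -> Result<Tensor> {
    if seq_len <= 1 {
        // One new token sees the whole cache: no mask needed.
        Ok(att)
    } else {
        apply_causal_mask(att, seq_len)
    }
}

// Add -inf above the diagonal so position i only attends to 0..=i.
fn apply_causal_mask(att: Tensor, seq_len: usize) -> Result<Tensor> {
    let device = att.device();
    let mask: Vec<f32> = (0..seq_len)
        .flat_map(|i| (0..seq_len).map(move |j| if j > i { f32::NEG_INFINITY } else { 0. }))
        .collect();
    let mask = Tensor::from_slice(&mask, (seq_len, seq_len), device)?;
    att.broadcast_add(&mask)
}
```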
c07e4057ab
Fix for the llama model. (#1906)
2024-03-21 19:36:10 +01:00
90fc82211f
Use a common with_tracing::RmsNorm in a few models. (#1871)
* Add RmsNorm with tracing.
* Use with_tracing::RmsNorm in some models.
2024-03-18 21:40:06 +01:00
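The shared type pairs a plain candle_nn::RmsNorm with a tracing span so profiler traces attribute time to the op. A sketch in the spirit of with_tracing::RmsNorm:

```rust
use candle::{Result, Tensor};
use candle_nn::{Module, VarBuilder};

// Same construction and forward as candle_nn::RmsNorm, plus a span entered
// on every forward call so traces show per-op timings.
#[derive(Debug, Clone)]
pub struct RmsNorm {
    inner: candle_nn::RmsNorm,
    span: tracing::Span,
}

impl RmsNorm {
    pub fn new(size: usize, eps: f64, vb: VarBuilder) -> Result<Self> {
        let span = tracing::span!(tracing::Level::TRACE, "rms-norm");
        let inner = candle_nn::rms_norm(size, eps, vb)?;
        Ok(Self { inner, span })
    }

    pub fn forward(&self, xs: &Tensor) -> Result<Tensor> {
        let _enter = self.span.enter();
        self.inner.forward(xs)
    }
}
```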
28057781aa
Make the cache for the llama model explicit too. (#1745)
2024-02-22 12:04:33 +01:00
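"Explicit" here means the KV cache is owned by the caller and passed into forward as &mut, rather than living behind interior mutability inside the model. A minimal sketch of the pattern; field and method names are illustrative, not the exact candle API:

```rust
use candle::{Result, Tensor};

// Caller-owned KV cache: one (key, value) pair per layer, None until the
// first forward pass.
pub struct Cache {
    kvs: Vec<Option<(Tensor, Tensor)>>,
}

impl Cache {
    pub fn new(n_layers: usize) -> Self {
        Self { kvs: vec![None; n_layers] }
    }

    // Append this step's k/v along the sequence dim and return the history.
    pub fn append(&mut self, layer: usize, k: &Tensor, v: &Tensor) -> Result<(Tensor, Tensor)> {
        let (k, v) = match &self.kvs[layer] {
            None => (k.clone(), v.clone()),
            Some((pk, pv)) => (Tensor::cat(&[pk, k], 2)?, Tensor::cat(&[pv, v], 2)?),
        };
        self.kvs[layer] = Some((k.clone(), v.clone()));
        Ok((k, v))
    }
}
```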
2d5f2a728d
Add the RWKV model (v5). (#1707)
* Start adding the RWKV model.
* More of the forward step.
* Handle rescaling.
* FeedForward.
* More work on RWKV.
* Better state tracking.
* Finish a first pass on forward.
* Fix the shape mismatches.
* Do not rescale in f32.
* Rename to rwkv-v5.
* Add the new models to the readme.
2024-02-14 10:58:32 +01:00
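The rescaling bullets refer to RWKV's half-precision overflow guard: hidden states are halved every few layers so f16 activations stay in range, and the "do not rescale in f32" fix restricts this to half-precision dtypes. A hedged sketch; passing rescale_every = 6 would mirror the usual RWKV default, but the helper is illustrative:

```rust
use candle::{DType, Result, Tensor};

// Halve the hidden state every `rescale_every` layers, but only in f16/bf16
// where activations can overflow; f32 has enough headroom to skip it.
fn maybe_rescale(xs: Tensor, layer_idx: usize, rescale_every: usize) -> Result<Tensor> {
    let is_half = matches!(xs.dtype(), DType::F16 | DType::BF16);
    if is_half && rescale_every > 0 && (layer_idx + 1) % rescale_every == 0 {
        xs / 2.0
    } else {
        Ok(xs)
    }
}
```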
58cc896e69
Make llama derive Clone. (#1648)
Co-authored-by: danielclough <danielclough@users.noreply.github.com>
2024-02-04 11:56:03 +01:00
63944714f2
Use candle_nn::embedding instead of local copies in a few models. (#1562)
2024-01-10 21:36:27 +01:00
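The consolidated constructor is candle_nn::embedding, which reads the weight tensor through the given VarBuilder prefix. A sketch of the call; the sizes and tensor path below are llama-ish but purely illustrative:

```rust
use candle::Result;
use candle_nn::{Embedding, VarBuilder};

// Load the token embeddings via the shared constructor instead of a
// per-model copy of the same few lines.
fn load_tok_embeddings(vb: &VarBuilder) -> Result<Embedding> {
    let (vocab_size, hidden_size) = (32_000, 4_096);
    candle_nn::embedding(vocab_size, hidden_size, vb.pp("model.embed_tokens"))
}
```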
1704f1b3ae
Consolidate the with-tracing usage. (#1234)
2023-11-01 18:21:36 +00:00
d3f05eae8c
Move some models to candle-transformers so that they're easier to re-use. (#794)
* Move some models to candle-transformers so that they can be shared.
* Also move falcon.
* Move Llama.
* Move whisper (partial).
2023-09-10 09:40:27 +01:00
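After the move, downstream code imports these models from candle-transformers instead of copying the example sources, e.g.:

```rust
// The llama model now comes from the shared crate rather than a copy of the
// example code.
use candle_transformers::models::llama::{Config, Llama};
```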