b190fd8592
Remove some unnecessary calls to contiguous. (#1968)
* Remove some unnecessary calls to contiguous.
* Slightly improved kv cache concatenation.
2024-03-30 13:22:00 +01:00
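Both changes in #1968 above sit on the decode-time KV cache path. Below is a minimal sketch of that kind of per-layer cache, assuming candle_core's public Tensor API; the `KvCache` struct and `append` method are illustrative names, not the ones used in the candle-transformers models.

```rust
use candle_core::{DType, Device, Result, Tensor};

#[derive(Default)]
struct KvCache {
    // (keys, values), each of shape (batch, heads, seq, head_dim).
    kv: Option<(Tensor, Tensor)>,
}

impl KvCache {
    // Append the keys/values for the new tokens along the sequence axis
    // (dim 2). Tensor::cat copies into fresh storage, so an extra
    // .contiguous() on its result would typically be redundant -- the
    // flavor of call such cleanups remove. Tensor clones are shallow.
    fn append(&mut self, k: &Tensor, v: &Tensor) -> Result<(Tensor, Tensor)> {
        let (k, v) = match &self.kv {
            None => (k.clone(), v.clone()),
            Some((pk, pv)) => (Tensor::cat(&[pk, k], 2)?, Tensor::cat(&[pv, v], 2)?),
        };
        self.kv = Some((k.clone(), v.clone()));
        Ok((k, v))
    }
}

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let mut cache = KvCache::default();
    // A 5-token prompt step followed by a 1-token decode step.
    for seq_len in [5usize, 1] {
        let k = Tensor::zeros((1, 8, seq_len, 64), DType::F32, &dev)?;
        let v = Tensor::zeros((1, 8, seq_len, 64), DType::F32, &dev)?;
        let (k, v) = cache.append(&k, &v)?;
        println!("cached kv dims: {:?} {:?}", k.dims(), v.dims());
    }
    Ok(())
}
```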
708e422456
Qwen MoE model. (#1960)
* Qwen MoE model.
* Add the MoE model to the example.
* Fix the scaling.
* Readme updates.
* Readme tweaks.
2024-03-28 23:10:57 +01:00
cdc8b57b5c
Fix clippy lints + minor cleanups. (#1957)
* Fix clippy lints + minor cleanups.
* fmt.
* Derive clone.
2024-03-28 14:17:46 +01:00
b0340d72ec
CLIP model implementation with example (#1950)
* CLIP model implementation with example
* CLIP Implementation fixes, batch images
* CLIP model remove images from git
* CLIP model remove unnecessary use of batch_indices
2024-03-28 13:44:12 +01:00
ada5d7c096
Add Send and Sync trait bounds for the scheduler config in stable diffusion models (#1952)
* first commit
* add Sync deriving
* static
* remove static
2024-03-28 10:03:00 +01:00
75b6d4b0da
Add a config for the mamba 2.8b model parameters (#1946)
* first commit
* Make the mamba config public.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-03-27 07:47:23 +01:00
66f0a4eeea
Another fix for squeezing. (#1943)
2024-03-26 17:05:26 +01:00
4523ecfb2a
Faster repeat penalty (#1940)
* Avoid the attention mask where possible.
* Faster repeat penalty.
2024-03-26 11:31:20 +01:00
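The standard repeat penalty (inherited from llama.cpp) rescales the logit of every token id that appears in the recent context. A self-contained sketch of that technique, not the exact code from #1940: deduplicating the context first means each logit is adjusted at most once, one simple way to speed the pass up.

```rust
use std::collections::HashSet;

// `logits` is indexed by token id; `context` holds recently generated ids.
fn apply_repeat_penalty(logits: &mut [f32], penalty: f32, context: &[u32]) {
    let seen: HashSet<u32> = context.iter().copied().collect();
    for &tok in &seen {
        if let Some(logit) = logits.get_mut(tok as usize) {
            // Push the logit towards lower probability whatever its sign.
            if *logit >= 0.0 {
                *logit /= penalty;
            } else {
                *logit *= penalty;
            }
        }
    }
}

fn main() {
    let mut logits = vec![1.5f32, -0.5, 2.0, 0.1];
    apply_repeat_penalty(&mut logits, 1.1, &[2, 2, 1]);
    println!("{logits:?}"); // tokens 1 and 2 are each penalized exactly once
}
```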
196765e995
Use the new rope kernel in mistral. (#1937)
* Use the new rope kernel in mistral.
* Compute the cos and sin with full precision.
* Bugfix.
2024-03-25 23:26:05 +01:00
d3a8d291d5
Avoid the attention mask where possible. (#1933)
2024-03-25 15:31:04 +01:00
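Skipping the mask pays off because incremental decoding processes a single query token per step, and a one-token causal mask masks nothing. A minimal sketch of that check, assuming candle_core's public API; the helper name is illustrative.

```rust
use candle_core::{Device, Result, Tensor};

// Returns None when a mask could not change the attention output.
fn causal_mask(seq_len: usize, dev: &Device) -> Result<Option<Tensor>> {
    if seq_len <= 1 {
        // A single query token attends to the whole KV cache: no mask
        // needed, and no mask broadcast either.
        return Ok(None);
    }
    let mask: Vec<f32> = (0..seq_len)
        .flat_map(|i| (0..seq_len).map(move |j| if j > i { f32::NEG_INFINITY } else { 0.0 }))
        .collect();
    Ok(Some(Tensor::from_slice(&mask, (seq_len, seq_len), dev)?))
}

fn main() -> Result<()> {
    let dev = Device::Cpu;
    assert!(causal_mask(1, &dev)?.is_none()); // decode step
    println!("{}", causal_mask(4, &dev)?.unwrap()); // prompt step
    Ok(())
}
```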
1b98f84a2b
Fast kernels for rotary embeddings. (#1928)
* Fast kernels for rotary embeddings.
* Add a test for the fast CPU kernel.
* Rope cuda bindings.
* Cuda kernel.
* Metal kernel (part 1).
* Cuda kernels.
* Finish the metal kernel.
* Use the new kernels in the quantized example.
* Fix warning.
2024-03-24 22:48:52 +01:00
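For reference, the rotation these kernels fuse is easy to state on the CPU. A plain-slice sketch of interleaved RoPE for one head at one position, illustrative rather than the kernel code itself; per the #1937 entry above, the cos/sin tables are best computed in full precision.

```rust
// Rotate interleaved pairs (x[2i], x[2i+1]) by the angle pos * theta_i.
fn rope_interleaved(x: &mut [f32], cos: &[f32], sin: &[f32]) {
    debug_assert_eq!(x.len(), 2 * cos.len());
    for i in 0..x.len() / 2 {
        let (x0, x1) = (x[2 * i], x[2 * i + 1]);
        x[2 * i] = x0 * cos[i] - x1 * sin[i];
        x[2 * i + 1] = x0 * sin[i] + x1 * cos[i];
    }
}

fn main() {
    let (head_dim, pos) = (4usize, 3usize);
    // theta_i = 10000^(-2i/d), the usual RoPE frequencies, in f32 throughout.
    let (cos, sin): (Vec<f32>, Vec<f32>) = (0..head_dim / 2)
        .map(|i| {
            let angle = pos as f32 * 10000f32.powf(-2.0 * i as f32 / head_dim as f32);
            (angle.cos(), angle.sin())
        })
        .unzip();
    let mut q = vec![1.0f32, 0.0, 1.0, 0.0];
    rope_interleaved(&mut q, &cos, &sin);
    println!("{q:?}");
}
```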
cf7d7fcf2f
Also avoid the mask in the llama example.
2024-03-24 19:04:32 +01:00
8c0db87992
Avoid using the attn mask when not necessary.
2024-03-24 18:55:56 +01:00
e2b4829531
Support more mistral models. (#1927)
* Support more mistral models.
* Use the appropriate rope parameter.
2024-03-24 08:04:04 +01:00
5e70821dd0
Allow for arbitrary temperature modifications.
2024-03-23 15:47:39 +01:00
a62a97340c
Add topk sampling. (#1923)
2024-03-23 15:26:09 +01:00
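Temperature scaling and top-k filtering (the two sampling commits above) compose naturally: divide the logits by the temperature, keep only the k largest, then renormalize and sample. A self-contained sketch of that pipeline, not the code behind candle's LogitsProcessor; ties at the cutoff are all kept.

```rust
// Returns the distribution after temperature + top-k filtering; sampling
// from it is then a plain categorical draw.
fn top_k_probs(logits: &[f32], temperature: f32, k: usize) -> Vec<f32> {
    assert!(k >= 1 && k <= logits.len());
    // t < 1 sharpens the distribution, t > 1 flattens it.
    let scaled: Vec<f32> = logits.iter().map(|l| l / temperature).collect();
    // Find the k-th largest value and mask everything strictly below it.
    let mut sorted = scaled.clone();
    sorted.sort_by(|a, b| b.partial_cmp(a).unwrap());
    let cutoff = sorted[k - 1];
    let masked: Vec<f32> = scaled
        .iter()
        .map(|&l| if l >= cutoff { l } else { f32::NEG_INFINITY })
        .collect();
    // Numerically stable softmax over the survivors.
    let max = masked.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = masked.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    // Only the two largest logits keep non-zero mass.
    println!("{:?}", top_k_probs(&[2.0, 1.0, 0.5, -1.0], 0.8, 2));
}
```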
6f877592a7
Avoid broadcasting on the batch dimension for the attention mask. (#1920)
2024-03-23 13:08:53 +01:00
32f567bac4
Fix loading the gguf files. (#1913)
2024-03-22 10:28:38 +01:00
c07e4057ab
Fix for the llama model. (#1906)
2024-03-21 19:36:10 +01:00
c0bdd9c7a6
Use the fast RmsNorm in the quantized model. (#1904)
2024-03-21 18:49:35 +01:00
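RmsNorm is simple enough that a "fast" version wins purely by fusing the whole computation into a single pass over each row. The formula itself, as a plain-slice sketch for one row:

```rust
// y_i = x_i / sqrt(mean(x^2) + eps) * w_i
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let scale = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * scale * w).collect()
}

fn main() {
    let x = [1.0f32, -2.0, 3.0, -4.0];
    println!("{:?}", rms_norm(&x, &[1.0f32; 4], 1e-5));
}
```

A fused kernel would typically also accumulate the sum of squares in f32 even for half-precision inputs, to keep the normalization stable.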
455c42aa72
Avoid copying the data on squeeze and unsqueeze. (#1884)
* Avoid copying the data on squeeze and unsqueeze.
* Fix the quantized llama example.
* Unrelated fix for the quantized stable-lm example on cuda.
* Fix for mamba on cuda (unrelated to the PR).
2024-03-20 13:04:36 +01:00
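Squeeze and unsqueeze can skip the copy because a tensor is storage plus shape/stride metadata, and a size-1 dimension can be added or removed by editing the metadata alone. A toy illustration of the idea, not candle's actual Layout type:

```rust
#[derive(Debug, Clone)]
struct View {
    shape: Vec<usize>,
    stride: Vec<usize>,
    // The shared storage (e.g. an Arc<[f32]>) would live here, untouched.
}

impl View {
    fn squeeze(&self, dim: usize) -> View {
        assert_eq!(self.shape[dim], 1, "can only squeeze a size-1 dim");
        let mut v = self.clone();
        v.shape.remove(dim);
        v.stride.remove(dim);
        v // same storage, new metadata
    }
    fn unsqueeze(&self, dim: usize) -> View {
        let mut v = self.clone();
        v.shape.insert(dim, 1);
        v.stride.insert(dim, 1); // any stride works: it is never stepped over
        v
    }
}

fn main() {
    let t = View { shape: vec![1, 3, 4], stride: vec![12, 4, 1] };
    println!("{:?}", t.squeeze(0)); // shape [3, 4], stride [4, 1]
    println!("{:?}", t.squeeze(0).unsqueeze(2)); // shape [3, 4, 1]
}
```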
90fc82211f
Use a common with_tracing::RmsNorm in a few models. (#1871)
* Add RmsNorm with tracing.
* Use with_tracing::RmsNorm in some models.
2024-03-18 21:40:06 +01:00
ff03fd3fb3
Expose some helper functions to create quantized models. (#1837)
2024-03-12 11:30:24 +01:00
0c5eecbc0f
Add some tracing to metavoice. (#1826)
2024-03-09 12:24:11 +01:00
dd00482ea3
Quantized version of the metavoice model. (#1824)
* Quantized version of the metavoice model.
* Integrate the quantized version of metavoice.
2024-03-09 11:06:04 +01:00
8a99cf7dd2
Add a flag to select the dtype used in metavoice. (#1805)
2024-03-05 12:16:00 +01:00
8cc0a183ba
Speaker embeddings computation for metavoice. (#1800)
* Speaker embeddings computation for metavoice.
* Compute the speaker embeddings.
2024-03-04 14:13:01 +01:00
924ccae30c
Add an initial Segformer implementation (#1617)
* add segformer
* Make the id2label field optional.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-03-03 16:01:46 +01:00
60dc72b96b
More metavoice tweaks. (#1796)
2024-03-03 15:05:25 +01:00
4fff5b51f5
Metavoice - first cut (#1717)
* Add the metavoice transformer.
* Sketch the speaker-encoder module.
* Adding to the metavoice model.
* Start adding the metavoice example.
* Get some logits out.
* Load the second stage model.
* Get the second step to run.
* Tweak the example.
* Add encodec tilting.
* Glue the different bits together.
* Fix a shape issue.
* Use a constant.
* BPE tokenization.
* Add a warning.
2024-03-02 18:50:01 +01:00
314630638d
Rustfmt fix. (#1788)
2024-03-02 10:35:07 +01:00
3e3def4134
Update StableLM config (#1787)
2024-03-02 09:56:57 +01:00
979deaca07
EfficientVit (MSRA) model (#1783)
* Add EfficientVit (Microsoft Research Asia) model.
* Mention models in README
2024-03-01 08:53:52 +01:00
b485e4b6ee
Add RWKV v6 and quantized RWKV v6 models (#1781)
* Add RWKV v6 and quantized RWKV v6 models.
* Fix CI clippy failures.
2024-03-01 08:37:56 +01:00
4fd00b8900
Add the StarCoder2 model. (#1779)
* Add the StarCoder2 model.
* Add the example code and get things to work.
* And also tweak the readme.
2024-02-28 21:02:41 +01:00
d0aca6c3c6
Encodec encoding demo. (#1775)
2024-02-28 06:49:03 +01:00
15e8644149
Apply dilations in the encodec model. (#1772)
* Apply dilations in the encodec model.
* Add some encoding bits.
2024-02-27 23:26:35 +01:00
0c49e95dfb
Encodec model. (#1771)
* Encodec model.
* Fixes.
* Add the padding functions.
* Get the LSTM bit to work.
* Get the encodec model to generate some tokens (decoder only for now).
* Minor tweak.
* Minor tweak.
2024-02-27 22:59:40 +01:00
205767f9de
Avoid tensor copying in the quantized example. (#1770)
2024-02-27 20:32:30 +01:00
918136ba46
Add quantized RWKV v5 model (#1743)
* Add quantized RWKV v5 model.
* Integrate the quantized rwkv model in the initial example.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-02-25 21:43:40 +01:00
1a6043af51
Tweak the VarMap set type. (#1758)
2024-02-25 20:50:08 +01:00
28057781aa
Make the cache for the llama model explicit too. (#1745)
2024-02-22 12:04:33 +01:00
544018b6d0
Explicit caching in llama2.c.
2024-02-22 10:22:03 +01:00
c753f72c85
Support for attention bias in gemma + refactor things a bit. (#1744)
* Support for attention bias in gemma + refactor things a bit.
* Fix the cuda tests.
2024-02-22 09:35:28 +01:00
45d5322d62
Add the Gemma models. (#1741)
* Add the Gemma models.
* Add the gemma example.
* Adapt the RmsNorm.
* Get the 2b model to work.
* 7b support.
* Use the config head dim.
* Yet another fix.
* Make the matrices contiguous.
* Also get the 7b model to work.
* And add to the readme.
2024-02-21 22:02:50 +01:00
5ebcfeaf0f
Make the r, k, v tensors contiguous. (#1719)
2024-02-16 09:17:35 +01:00
26fe162ab5
Custom tokenizer for rwkv. (#1711)
* Custom tokenizer for rwkv.
* Custom tokenizer.
* Getting the tokenizer to work.
2024-02-14 15:11:38 +01:00
2d5f2a728d
Add the RWKV model (v5). (#1707)
* Start adding the RWKV model.
* More of the forward step.
* Handle rescaling.
* FeedForward.
* More work on RWKV.
* Better state tracking.
* Finish a first pass on forward.
* Fix the shape mismatches.
* Do not rescale in f32.
* Rename to rwkv-v5.
* Add the new models to the readme.
2024-02-14 10:58:32 +01:00
68f7655895
Add ConvNeXt-V2 and smaller model variants. (#1709)
2024-02-14 10:53:07 +01:00
c1b418586c
Fixing quantized llama demo on metal. (#1703)
2024-02-13 16:28:56 +01:00