Commit Graph

214 Commits

Author SHA1 Message Date
0c49e95dfb Encodec model. (#1771)
* Encodec model.

* Fixes.

* Add the padding functions.

* Get the LSTM bit to work.

* Get the encodec model to generate some tokens (decoder only for now).

* Minor tweak.

* Minor tweak.
2024-02-27 22:59:40 +01:00
205767f9de Avoid tensor copying in the quantized example. (#1770) 2024-02-27 20:32:30 +01:00
918136ba46 add quantized rwkv v5 model (#1743)
* and quantized rwkv v5 model

* Integrate the quantized rwkv model in the initial example.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-02-25 21:43:40 +01:00
1a6043af51 Tweak the VarMap set type. (#1758) 2024-02-25 20:50:08 +01:00
28057781aa Make the cache for the llama model explicit too. (#1745) 2024-02-22 12:04:33 +01:00
544018b6d0 Explicit caching in llama2.c. 2024-02-22 10:22:03 +01:00
c753f72c85 Support for attention bias in gemma + refactor things a bit. (#1744)
* Support for attention bias in gemma + refactor things a bit.

* Fix the cuda tests.
2024-02-22 09:35:28 +01:00
45d5322d62 Add the Gemma models. (#1741)
* Add the Gemma models.

* Add the gemma example.

* Adapt the RmsNorm.

* Get the 2b model to work.

* 7b support.

* Use the config head dim.

* Yet another fix.

* Make the matrixes contiguous.

* Also get the 7b model to work.

* And add to the readme.
2024-02-21 22:02:50 +01:00
5ebcfeaf0f Make the r, k, v tensors contiguous. (#1719) 2024-02-16 09:17:35 +01:00
26fe162ab5 Custom tokenizer for rwkv. (#1711)
* Custom tokenizer for rwkv.

* Custom tokenizer.

* Getting the tokenizer to work.
2024-02-14 15:11:38 +01:00
2d5f2a728d Add the RWKV model (v5). (#1707)
* Start adding the RWKV model.

* More of the forward step.

* Handle rescaling.

* FeedForward.

* More work on RWKV.

* Better state tracking.

* Finish a first pass on forward.

* Fix the shape mismatches.

* Do not rescale in f32.

* Rename to rwkv-v5.

* Add the new models to the readme.
2024-02-14 10:58:32 +01:00
68f7655895 Add ConvNeXt-V2 and smaller model variants. (#1709) 2024-02-14 10:53:07 +01:00
c1b418586c Fixing quantized llama demo on metal. (#1703) 2024-02-13 16:28:56 +01:00
13c67226e6 feat: support microphone whisper streaming (#1678)
* feat: support microphone whisper streaming

* fix: cleanup print stmts and adjust how input is read

* fix: remove incorrect comment

* feat: split into new example and simplify

* fix: feature flag example file

* fix: fmt fixes

* feat: simplify and remove redundant files
2024-02-12 18:01:21 +01:00
1e26d539d9 Improved mamba model optimized for inference (#1694)
* Sketch the mamba model for inference.

* Complete the forward pass.

* Add the mamba example.

* Optimize the selective-scan part.

* Fix a couple shape mismatches and get inference to work.

* Tweak the readmes.

* More readme tweaks.
2024-02-11 17:04:57 +01:00
bf20cc854c Support sinusoidal embeddings in trocr. (#1690)
* Support sinusoidal embeddings in trocr.

* Support tie-word-embeddings.
2024-02-10 15:17:51 +01:00
42ce593ec6 Use the repo config for trocr rather than hardcoding it + small tweaks. (#1689)
* Use the repo config for trocr rather than hardcoding it + small tweaks.

* Add support for the printed models.

* Fail with an appropriate error message on missing position embeddings.
2024-02-10 13:15:03 +01:00
67589791d2 Remove the unused pragma in vit + handle the final layernorm. (#1688) 2024-02-10 11:08:50 +01:00
5657e596cd Add the Qwen2 model (#1684)
* Initial check-in for the qwen2 model.

* More qwen2 inference.

* Polish the qwen example.

* Fix the rope basis.

* Get the inference to work.

* Support different model sizes.
2024-02-09 15:02:49 +01:00
0dee8ea19b Add the ChatGLM model. (#1237)
* Add the ChatGLM model.

* Rotary embeddings.

* Add to the forward pass.

* Add to the forward pass.

* Add the rotary embeddings.

* Add the KV cache.

* Add the chatglm example.

* Bugfix.

* More glm fixes.

* Fix some shape issues.

* Get the inference to work.
2024-02-09 11:51:38 +01:00
9cadd4e644 feat: support multithread spectrogram and small perf tweaks (#1674)
* feat: support multithread spectrogram and small perf tweaks

* feat: clippy improvement for loop variable

* fix: add back speed up scale down logic

* fix: readd mirroring logic

* feat: prefer scoped thread and simplify/improve logic/traits
2024-02-08 21:54:12 +01:00
50be8a98ba Quantized support for stable-lm2. (#1654)
* Quantized support for stable-lm2.

* Quantized support for v2-zephyr.
2024-02-04 11:57:05 +01:00
58cc896e69 make llama derive clone (#1648)
Co-authored-by: danielclough <danielclough@users.noreply.github.com>
2024-02-04 11:56:03 +01:00
d32abbce53 Add StableLM-2, StableLM Code and Zephyr variants (#1650)
* Add StableLM Code and Zephyr variants

* Add V2 models

* Update README
2024-02-03 14:58:41 +01:00
dfab45e1c8 Supports more audio formats (#1628)
* Supports more audio formats

* Simplify the handling of the different buffer types.

* Check the sample rate.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-02-03 14:26:04 +01:00
96bc704d17 Update mixformer.rs (#1601)
Update the source of the configuration_mixformer_sequential.py
It has been removed, therefore, it is still available in this -> d38e6f954ec29b96fe2cf033937dad64e279b5d9
2024-02-03 13:42:16 +01:00
a52d407ae6 Add ConvNeXt model. (#1604) 2024-02-03 13:34:28 +01:00
403680f17d Quantized GGUF style (#1523)
* Metal quantized modifications proposal.

- Add a device param, wherever needed.
- Create new QMetal storage thing that implements QuantizedType.
- Update everywhere needed.

Fix Python.

Fixing examples.

Fix: fmt + clippy + stub.

Moving everything around.

Only missing the actual implems.

Fixing everything + adding dequantized kernels.

More work.

Fixing matmul.

Fmt + Clippy

Some clippy fixes.

Working state.

Q2K Metal -> Bugged (also present in GGML).
Q4K CPU -> Bugged (present previously, new test catch it).
Q5K CPU -> Bugged (present previously).
Q8_1 Both -> Never really implemented it seems
Q8K metal -> Never implemented in metal

Fixing Q2K bug (present in ggml).

* Cleanup.

* Fix the rebase.

* Removing the fences speeds everything up and *is* correct this time...

* Cleanup the fence.

* After rebase.

* Bad code removal.

* Rebase after phi2 merge + fix replit default to CPU.

* Making the CI happy.

* More happy tests.

---------

Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>
2024-01-17 10:27:58 +01:00
5270224f40 Add MobileOne model. (#1595)
* Add MobileOne model.

* Clippy fixes

* Remove a comment.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-01-16 06:34:16 +01:00
88618255cb Fix the rotary embeddings for the new phi implementation. (#1582)
* Fix the rotary embeddings for the new phi implementation.

* Match the activation.

* KV cache fix.

* Use the config activation function.
2024-01-13 19:44:41 +01:00
539ead927a Update the Phi model to use the updated architecture. (#1580)
* Update the Phi model to use the updated architecture.

* Add more of the phi model.

* Repeat KV + caching.

* Apply the rotary embeddings.

* Add support for the new phi model in the phi example.

* Fix a couple glitches.

* Fix a couple more glitches.
2024-01-13 17:38:27 +01:00
2480c5dbdd Add RepVGG model. (#1561)
* Add RepVGG model.

* Add RepVGG README

* Extract var to top level

* Replace hashmap with a match

* Add a variant for the model kind + avoid some unnecessary config cloning.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-01-11 07:07:40 +01:00
63944714f2 Use candle_nn::embedding instead of local copies in a few models. (#1562) 2024-01-10 21:36:27 +01:00
b4cb982e49 Simplifying our internal cargo dependencies. (#1529) 2024-01-07 12:04:14 +01:00
b0fe5e4453 Do not implement Module for BatchNorm. (#1513) 2024-01-01 10:13:13 +01:00
1e442d4bb9 Fix lints for clippy 1.75. (#1494) 2023-12-28 20:26:20 +01:00
cd889c0f8a add config_amazon_mistral_lite (#1493)
Co-authored-by: Ubuntu <danielclough@users.noreply.github.com>
2023-12-28 19:59:58 +01:00
d35f0a1376 Bump the crate version to 0.3.3. (#1490) 2023-12-28 13:38:30 +01:00
f6408a3779 feat: add clear_kv_cache to mistral and qmistral models (#1464) 2023-12-21 21:19:19 +01:00
563a79afa1 make fn name generic (#1459)
Co-authored-by: Ubuntu <danielclough@users.noreply.github.com>
2023-12-21 02:16:31 +01:00
8ede5f4210 add fn config_chat_ml (#1458)
* add fn config_chat_ml

* Add a link to the original config.

---------

Co-authored-by: Ubuntu <danielclough@users.noreply.github.com>
Co-authored-by: laurent <laurent.mazare@gmail.com>
2023-12-20 21:03:24 +01:00
9fc210fae8 Merge pull request #1318 from huggingface/metal4
Starting to fix some tests.
2023-12-20 15:37:31 +01:00
94817dac56 Bump the crate version to 0.3.2. (#1452) 2023-12-17 05:34:53 -06:00
1e86717bf2 Fix a couple typos (#1451)
* Mixtral quantized instruct.

* Fix a couple typos.
2023-12-17 05:20:05 -06:00
30a958e5dd Quantized mixtral model (#1442)
* Add the Mixtral model.

* Add more of the mixtral layers.

* Add the final layers for mixtral.

* Sketch the expert selection.

* Add some expert routing logic.

* Hopefully finish the routing logic for mixtral.

* Add the mixtral example.

* Fix the weight filenames.

* Bugfix.

* Another fix.

* Yet another fix + remove the unused pragma.

* Shape fix.

* Support for quantized mixtral.

* Support mixtral in the quantized example.

* Mlp or moe type.

* Fix the expert field namings.

* Refactor the mlp bit.

* More MoE logic.

* Add the MoE quantized logic.

* Fix the experts length.
2023-12-15 19:16:06 -06:00
614842b311 Add the Mixtral model. (#1437)
* Add the Mixtral model.

* Add more of the mixtral layers.

* Add the final layers for mixtral.

* Sketch the expert selection.

* Add some expert routing logic.

* Hopefully finish the routing logic for mixtral.

* Add the mixtral example.

* Fix the weight filenames.

* Bugfix.

* Another fix.

* Yet another fix + remove the unused pragma.

* Shape fix.

* Add a readme.
2023-12-15 14:19:56 -06:00
916a8c5464 Revert candle-transformers. 2023-12-15 11:15:21 +01:00
ece4c69a68 Fixing softmax. 2023-12-15 01:35:08 +01:00
5e33c85c8f Quantized version for phi-v2. (#1430)
* Quantized version for phi-v2.

* More quantized support.
2023-12-13 21:16:34 -06:00
2b3a018be7 Support for phi-2. (#1429)
* Support for phi-2.

* Use the v2 naming scheme.
2023-12-13 20:59:29 -06:00