3fba2b5fc4
Add the SmolLM2 models. ( #2595 )
...
* Add the SmolLM2 models.
* More SmolLM2 support.
2024-11-03 17:11:12 +01:00
3699c1a053
Fix the repo name for llama 3.1. ( #2576 )
...
* Fix the repo name for llama 3.1.
* Fix the book.
2024-10-26 11:25:04 +02:00
ad8a4c5e5a
Add some llama-3.2 examples. ( #2508 )
...
* Add some llama-3.2 examples.
* Support tie-word-embeddings for llama.
2024-09-26 21:00:18 +02:00
0f5cbb08b3
Add support for Llama 3.1 ( #2359 )
...
* Add Llama 3.1 rope
* Clippy
* Format
* Clippy
* Add support for multiple eos tokens:
* Untagged either
* Remove either dep and fix settings.json
* Make the max positional embeddings configurable
2024-07-26 21:32:26 +02:00
a09d451d11
Support top-k in the llama example. ( #2150 )
2024-05-01 22:25:47 +02:00
618ecf5e23
Better time measurement for the llama example. ( #2106 )
2024-04-22 17:54:27 +02:00
52ae332910
Use llama v3 by default + add to readme. ( #2094 )
2024-04-20 16:11:24 +02:00
9c532aef47
Also enable llama-v3 8b instruct. ( #2088 )
2024-04-19 08:50:06 +02:00
e6ee7ba4d4
Llama v3. ( #2085 )
...
* Llama v3.
* Tweak the default params + handle special tokens.
* Small tweak.
2024-04-18 22:19:54 +02:00
28057781aa
Make the cache for the llama model explicit too. ( #1745 )
2024-02-22 12:04:33 +01:00
7c7400fb63
Use the tokenizer-output-stream in the llama example. ( #1715 )
...
* Use the tokenizer-output-stream in the llama example.
* Also use tokenizer-output-stream for llama2-c.
2024-02-15 16:47:33 +01:00
84250bf52f
fix index_pos bug when kv cache is disabled. ( #1517 )
...
* fix index_pos bug when kv cache is disabled
* Tweak the fix.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-01-06 11:43:01 +01:00
1fb2dd905c
Add support for tiny-llama-1.1b. ( #1512 )
2023-12-31 12:18:25 +01:00
996a7f2e24
Rework the llama example config, add the solar model. ( #1485 )
2023-12-26 22:24:04 +01:00
bb3471ea31
Adapt more examples to the updated safetensor api. ( #947 )
...
* Simplify the safetensor usage.
* Convert more examples.
* Move more examples.
* Adapt stable-diffusion.
2023-09-23 21:26:03 +01:00
805bf9ffa7
Implement top_p / nucleus sampling ( #819 )
...
* Implement top_p / nucleus sampling
* Update changelog
* rustfmt
* Add tests
* Fix clippy warning
* Fix another clippy error
2023-09-12 18:10:16 +02:00
d3f05eae8c
Move some models to candle-transformers so that it's easier to re-use. ( #794 )
...
* Move some models to candle-transformers so that they can be shared.
* Also move falcon.
* Move Llama.
* Move whisper (partial).
2023-09-10 09:40:27 +01:00
6e485f2deb
Add some optional repeat penalty. ( #623 )
...
* Add some optional repeat penalty.
* Add the missing files.
2023-08-27 10:48:45 +01:00
c105550405
s/panic/bail/
2023-08-25 18:05:07 +02:00
4826a4212e
Adding support for codellama in examples.
...
Codellama requires bf16 for now (error to convert from bf16 to f16).
Multiprocess demo not functional for it because flash-attn only supports
f16 for now.
2023-08-25 09:56:11 +00:00
c5f45887dc
Add some tracing to the quantized example. ( #473 )
2023-08-16 18:49:08 +01:00
76804730c6
Using the real config from the hub when available.
2023-08-16 10:36:01 +02:00
5b1690fffa
Tweak the llama example. ( #450 )
2023-08-15 12:18:20 +01:00
3cc87058b7
Support local weights & dynamic outputs ( #447 )
...
* Support local weights & dynamic outputs
* Revise as suggested
* Cargo code format
2023-08-15 11:51:57 +01:00
c84883ecf2
Add a cuda kernel for upsampling. ( #441 )
...
* Add a cuda kernel for upsampling.
* Update for the latest tokenizers version.
2023-08-14 13:12:17 +01:00
906c0f3eb5
Remove the checkpoint conversion script. ( #405 )
...
* Remove the checkpoint conversion script.
* Remove references to the script.
2023-08-11 05:59:48 +01:00
b278834267
Support the Accelerate BLAS on macOS. ( #325 )
...
* Add the accelerate feature.
* Ffi tweaks.
2023-08-05 17:25:24 +01:00
df6667ba88
Add some tracing to llama. ( #318 )
2023-08-03 13:52:22 +01:00
50d8273ae4
Support both llama v1 and llama v2. ( #272 )
2023-07-28 18:40:59 +01:00
ca479a873e
Upgrading hf-hub to 0.2.0
(Modified API to not pass the Repo around all the time)
2023-07-27 20:05:02 +02:00
84ad558e50
Switch to using llama-v2 by default. ( #251 )
2023-07-26 17:18:27 +01:00
e40b150bbe
Better handling of dtypes in llama. ( #243 )
2023-07-26 08:28:33 +01:00
d9f9c859af
Add flash attention ( #241 )
...
* Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab.
* More flash attn.
* Set up the flash attn parameters.
* Get things to compile locally.
* Move the flash attention files in a different directory.
* Build the static C library with nvcc.
* Add more flash attention.
* Update the build part.
* Better caching.
* Exclude flash attention from the default workspace.
* Put flash-attn behind a feature gate.
* Get the flash attn kernel to run.
* Move the flags to a more appropriate place.
* Enable flash attention in llama.
* Use flash attention in llama.
2023-07-26 07:48:10 +01:00
12d6dc018d
Support for MQA for llama v2. ( #205 )
...
* Support for MQA for llama v2.
* More llama-v2.
* Move the rotary embedding precomputation in the cache.
* Add a v2 flag.
* Use the hf model.
2023-07-20 06:39:04 +01:00
439321745a
Removing candle-hub internal to extract into hf-hub standalone.
2023-07-19 15:04:38 +02:00
66750f9827
Add some 'cuda-if-available' helper function. ( #172 )
2023-07-15 08:25:15 +01:00
4ed56d7861
Removing cuda default.
...
Seems very important for a lot of exploring users usually on laptop
without GPUs.
Adding more README instructions in a follow up.
2023-07-14 16:52:15 +02:00
3c02ea56b0
Add a cli argument to easily switch the dtype. ( #161 )
2023-07-13 19:18:49 +01:00
ba35d895e7
Sketch the candle-transformers crate. ( #147 )
...
* Sketch the candle-transformers crate.
* Format the empty files.
2023-07-12 13:49:31 +01:00
eae646d322
Use arange in the examples. ( #146 )
2023-07-12 12:12:34 +01:00
20599172ac
Add from_iter and arange, use it in the doctests. ( #145 )
2023-07-12 12:03:01 +01:00
b3b39cca92
Llama batch ( #144 )
...
* Add a batch dimension to llama.
* Bugfixes.
2023-07-12 11:38:19 +01:00
fa760759e5
Allow for lazy loading of npz files, use it in llama to reduce memory usage in the cpu version. ( #141 )
2023-07-11 20:22:34 +01:00
37cad85869
Resurrect the llama npy support. ( #140 )
2023-07-11 19:32:10 +01:00
760f1d7055
Refactor the llama example to make it more in sync with the other ones. ( #139 )
...
* Refactor the llama example to make it more in sync with the other ones.
* Make clippy happy.
* Properly load the safetensor weights.
* Get llama back to a working state for the safetensors case.
2023-07-11 17:20:55 +01:00
e923b3adc2
Add a KV cache to falcon. ( #104 )
2023-07-07 17:24:38 +01:00
115629fe08
Creating new sync Api for candle-hub.
...
- `api::Api` -> `api::tokio::Api` (And created new `api::sync::Api`).
- Remove `tokio` from all our examples.
- Using similar codebase for now instead of ureq (for simplicity).
2023-07-06 15:15:25 +02:00
dd60bd84bb
MKL adjustments. ( #87 )
2023-07-06 11:37:27 +01:00
c297a50960
Add mkl support for matrix multiply. ( #86 )
...
* Fix some rebase issues.
* Use mkl instead.
* Use mkl in bert.
* Add the optional mkl feature.
* Conditional compilation based on the mkl feature.
* Add more mkl support.
2023-07-06 11:05:05 +01:00
e2bfbcb79c
Support dim indexes in cat.
2023-07-05 20:39:08 +01:00