ebb0fedf14  2023-07-01 20:36:44 +01:00  Very simple pyo3 bindings for candle.

dd879f5b67  2023-07-01 06:40:36 +01:00  Merge pull request #51 from LaurentMazare/custom-prompt
                                        Add a flag for custom prompt.

7c65e2d187  2023-07-01 06:36:22 +01:00  Add a flag for custom prompt.

2c04bff12f  2023-06-30 18:56:26 +01:00  Merge pull request #50 from LaurentMazare/rayon1
                                        Do not use rayon for a single thread

bbe0c5fbaa  2023-06-30 18:47:22 +01:00  Do not use rayon for a single thread (bis).

6b67d25d9f  2023-06-30 18:46:32 +01:00  Do not use rayon for a single thread.

b8b175c01e  2023-06-30 16:43:56 +01:00  Merge pull request #49 from LaurentMazare/llama-dtype
                                        Early conversion for the llama weights.

679b6987b6  2023-06-30 16:42:53 +01:00  Early conversion for the llama weights.

dbd7d5b3fd  2023-06-30 15:04:33 +01:00  Merge pull request #47 from LaurentMazare/llama-f32
                                        Add a const to easily tweak the dtype used by llama

ed4d0959d3  2023-06-30 15:01:39 +01:00  Add a const to easily tweak the dtype used for llama internal computations.

a243504f53  2023-06-30 10:45:48 +01:00  Merge pull request #46 from LaurentMazare/bugfix-cuda-u8-bf16
                                        Bugfix: remove the u8/bf16 conversion kernel as it is ambiguous.

313fa022a5  2023-06-30 10:43:32 +01:00  Bugfix: remove the u8/bf16 conversion kernel as it is ambiguous.

d2ab4f86bf  2023-06-30 10:35:51 +01:00  Merge pull request #45 from LaurentMazare/u8
                                        Add support for u8

fbc329ed85  2023-06-30 10:33:29 +01:00  Add the verbose cpu cast operations.

8ad47907f3  2023-06-30 10:26:56 +01:00  Add the kernels.

a7b16cbb98  2023-06-30 09:14:45 +01:00  Merge pull request #44 from LaurentMazare/check-dim
                                        Improve how we check that the dims are in bounds.

19cbbc5212  2023-06-30 09:11:00 +01:00  Improve how we check that the dims are in bounds.

00476d37f8  2023-06-30 05:48:58 +01:00  Merge pull request #43 from LaurentMazare/bf16
                                        Support for bf16 in cuda kernels

6486a6d7b2  2023-06-29 23:23:44 +01:00  Avoid some cast kernels.

ec79fc43f2  2023-06-29 23:12:02 +01:00  Add the bf16 cuda kernels.

018e017e7e  2023-06-29 22:22:11 +01:00  Merge pull request #42 from LaurentMazare/kv-cache-enable
                                        Enable the KV cache

f6152e74b6  2023-06-29 22:16:40 +01:00  Tweak the kv-cache flag.

ae3f202f3b  2023-06-29 22:12:15 +01:00  Add a flag.

23389b1bd7  2023-06-29 22:00:57 +01:00  Enable the KV cache after fixing the caching length and the rope bits.

e87a99d16e  2023-06-29 19:11:52 +01:00  Merge pull request #41 from LaurentMazare/kv-cache
                                        Kv cache

af66f0829e  2023-06-29 19:08:50 +01:00  Revert the new profile.

b50bd880ce  2023-06-29 19:07:52 +01:00  Only narrow when needed + deactivate the kv cache.

4b148b5414  2023-06-29 18:02:06 +02:00  Merge pull request #40 from LaurentMazare/fix_kernel_cache
                                        Fixing kernel cache (a bit brutal for now, but if build triggers, rebuild ALL kernels).

1ea08a19cb  2023-06-29 15:59:58 +00:00  Rerun on new files.

b5bdbef53a  2023-06-29 15:51:08 +00:00  Fixing kernel cache (a bit brutal for now, but if build triggers, rebuild ALL kernels).

3232df9458  2023-06-29 15:29:40 +01:00  Add some KV cache to llama.

889f7e0971  2023-06-29 13:17:53 +01:00  Merge pull request #39 from LaurentMazare/anyhow-backtrace
                                        Add backtraces.

e27ee98d3f  2023-06-29 13:17:20 +01:00  Add backtraces.

e90f4aad26  2023-06-29 14:12:31 +02:00  Merge pull request #38 from LaurentMazare/llama_f16
                                        Moving llama to f16.

78ec40b077  2023-06-29 12:09:53 +00:00  Typo.

de48e6fd59  2023-06-29 12:08:35 +00:00  Putting back main.

0958c588f6  2023-06-29 12:07:21 +00:00  Putting back seed.

c5e8f788be  2023-06-29 12:05:53 +00:00  Revert some changes.

e63ed6aaa3  2023-06-29 12:04:25 +00:00  Remove unwrap.

2fe1d3e36d  2023-06-29 12:00:16 +00:00  Moving llama to f16.

31396a3b9f  2023-06-29 12:51:45 +01:00  Merge pull request #37 from LaurentMazare/llama-seed
                                        Add a seed parameter to llama.

b4dc9f6108  2023-06-29 12:47:19 +01:00  Add a seed parameter to llama.

53628db3a9  2023-06-29 13:36:05 +02:00  Merge pull request #36 from LaurentMazare/fix_example
                                        Simple example fix.

1913512f42  2023-06-29 11:10:57 +00:00  Simple example fix.

c0719b7781  2023-06-29 12:10:19 +01:00  Merge pull request #35 from LaurentMazare/const-scalar
                                        Use broadcasted scalars for const tensors.

2741b39ad3  2023-06-29 11:56:40 +01:00  Use broadcasted scalars for const tensors.

3872dc4751  2023-06-29 12:49:26 +02:00  Merge pull request #19 from LaurentMazare/llama_safetensors
                                        Llama safetensors

5930168457  2023-06-29 11:41:17 +01:00  Merge pull request #34 from LaurentMazare/simpler-dtype-trait
                                        Put more requirements on the withdtype trait.

b4aab7b95f  2023-06-29 11:37:42 +01:00  Put more requirements on the withdtype trait.

c8fc9da737  2023-06-29 10:14:12 +01:00  Merge pull request #33 from LaurentMazare/cuda-map
                                        Simplify the dtype matchings in the cuda backend