4845d5cc64
More realistic training setup. ( #210 )
* More realistic training setup.
* Compute the model accuracy.
* Very inefficient backprop for index select (see the sketch after this entry).
* More backprop.
* Fix some backprop issues.
* Backprop fix.
* Another broadcasting backprop fix.
* Better backprop for reducing ops.
* Training again.
* Add some gradient tests.
* Get the training to work.
2023-07-20 18:25:41 +01:00
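A note on the "backprop for index select" bullet above: the backward pass of an index-select is a scatter-add, because a source row selected several times must accumulate the gradient from each of its occurrences. A minimal CPU sketch in plain Rust (a hypothetical helper, not the candle implementation):

```rust
/// Backward of an `index_select` along dim 0 of a row-major 2-d tensor.
/// `grad_out` has `ids.len()` rows; the result has `src_rows` rows.
/// Rows that were selected several times accumulate (scatter-add).
fn index_select_backward(
    grad_out: &[f32],
    ids: &[usize],
    src_rows: usize,
    row_len: usize,
) -> Vec<f32> {
    let mut grad_src = vec![0f32; src_rows * row_len];
    for (out_row, &src_row) in ids.iter().enumerate() {
        let out = &grad_out[out_row * row_len..(out_row + 1) * row_len];
        let dst = &mut grad_src[src_row * row_len..(src_row + 1) * row_len];
        for (d, o) in dst.iter_mut().zip(out) {
            *d += o; // accumulate rather than overwrite
        }
    }
    grad_src
}
```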
12d6dc018d
Support for MQA for llama v2. ( #205 )
* Support for MQA for llama v2.
* More llama-v2.
* Move the rotary embedding precomputation into the cache.
* Add a v2 flag.
* Use the hf model.
2023-07-20 06:39:04 +01:00
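Context for the MQA change: in multi-query / grouped-query attention, several query heads share a single key-value head, which shrinks the KV cache for llama v2's larger models. The core of it is a head mapping along these lines (an illustrative sketch, not the actual llama-v2 code):

```rust
/// Map a query head to the key/value head it reads from.
/// n_kv_head == n_head  => regular multi-head attention.
/// n_kv_head == 1       => multi-query attention (MQA).
fn kv_head_for(q_head: usize, n_head: usize, n_kv_head: usize) -> usize {
    assert_eq!(n_head % n_kv_head, 0);
    q_head / (n_head / n_kv_head)
}

fn main() {
    // llama-v2-70b style numbers: 64 query heads sharing 8 kv heads.
    let (n_head, n_kv_head) = (64, 8);
    for q in [0, 7, 8, 63] {
        println!("query head {q} -> kv head {}", kv_head_for(q, n_head, n_kv_head));
    }
}
```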
9515e8ea6c
Merge branch 'main' into remove_wrapper
2023-07-19 18:53:55 +02:00
e6584476c4
Merge pull request #200 from LaurentMazare/removing_candle_hub
Removing the internal `candle-hub` crate, extracting it into the standalone `hf-hub` crate.
2023-07-19 17:27:55 +02:00
cb687b4897
Add some more developed training examples. ( #199 )
* Use contiguous tensors for variables.
* Sketch the mnist example.
* Start adding the reduce ops.
* Renaming.
* Refactor the reduce operations.
* Bugfix for the broadcasting vectorization.
2023-07-19 15:37:52 +01:00
dfd624dbd3
[Proposal] Remove SafeTensor wrapper (allows finer control for users).
2023-07-19 16:25:44 +02:00
439321745a
Removing the internal `candle-hub` crate, extracting it into the standalone `hf-hub` crate.
2023-07-19 15:04:38 +02:00
ff61a42ad7
Use mkl to accelerate binary ops. ( #190 )
* Vectorized binary ops with mkl.
* Improve the binary op mkl support.
* Push the support for mkl binary ops.
* Proper vectorization of binary ops.
* Proper mkl'isation when broadcasting binary ops.
2023-07-18 12:04:39 +01:00
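MKL's vector-math routines compute a whole elementwise binary op in a single call instead of a scalar loop. A rough sketch of the dispatch, assuming an LP64 MKL build (where `MKL_INT` is `i32`) behind the crate's `mkl` feature; link flags are omitted:

```rust
// vsAdd(n, a, b, y) computes y[i] = a[i] + b[i] for i in 0..n (MKL VML).
#[cfg(feature = "mkl")]
extern "C" {
    fn vsAdd(n: i32, a: *const f32, b: *const f32, y: *mut f32);
}

fn add_f32(a: &[f32], b: &[f32], y: &mut [f32]) {
    assert!(a.len() == b.len() && a.len() == y.len());
    #[cfg(feature = "mkl")]
    unsafe {
        vsAdd(a.len() as i32, a.as_ptr(), b.as_ptr(), y.as_mut_ptr())
    }
    // Scalar fallback when mkl is not enabled.
    #[cfg(not(feature = "mkl"))]
    for ((y, a), b) in y.iter_mut().zip(a).zip(b) {
        *y = a + b;
    }
}
```

The broadcasting bullets matter because the VML routines operate on equal-length buffers, so broadcast strides have to be resolved into contiguous chunks before the MKL call.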
b706f32839
Add a `TryInto` implementation for Shape. ( #189 )
* Add the TryInto trait for shapes.
* Use the vectorized operations in block mode too.
2023-07-18 10:52:16 +01:00
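A fallible conversion lets shape construction surface an error instead of panicking, e.g. when dims come from an i64-based file format. A self-contained sketch with a hypothetical `Shape` newtype (candle's real type carries more conversions):

```rust
#[derive(Debug, PartialEq)]
struct Shape(Vec<usize>);

impl TryFrom<&[i64]> for Shape {
    type Error = String;
    fn try_from(dims: &[i64]) -> Result<Self, Self::Error> {
        // Reject negative dims instead of silently wrapping them around.
        dims.iter()
            .map(|&d| usize::try_from(d).map_err(|_| format!("negative dim: {d}")))
            .collect::<Result<Vec<_>, _>>()
            .map(Shape)
    }
}

fn main() {
    let ok: Result<Shape, _> = [2i64, 3, 4].as_slice().try_into();
    let err: Result<Shape, _> = [2i64, -1].as_slice().try_into();
    println!("{ok:?} {err:?}");
}
```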
d6313d2447
Add more tracing details to bert. ( #188 )
2023-07-18 08:11:05 +01:00
b8abe2bb4b
Factorize the tokenizers version in the workspace cargo def. ( #186 )
2023-07-18 06:48:13 +01:00
f0cccd08f0
Bert tracing ( #184 )
* Add some tracing to bert.
* More tracing.
* Add a flag for tracing.
2023-07-17 19:40:42 +01:00
104f89df31
Centralize the dependency versions and inherit them. ( #177 )
2023-07-16 07:47:17 +01:00
66750f9827
Add a 'cuda-if-available' helper function. ( #172 )
2023-07-15 08:25:15 +01:00
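The helper makes the examples portable: pick GPU 0 when CUDA support was compiled in and a device is present, otherwise fall back to the CPU. Usage along these lines (crate and module names may differ between candle versions):

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    // Use GPU 0 when available, otherwise silently fall back to the CPU.
    let device = Device::cuda_if_available(0)?;
    let t = Tensor::zeros((2, 3), DType::F32, &device)?;
    println!("{:?} on {:?}", t.shape(), device);
    Ok(())
}
```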
4ed56d7861
Removing cuda default.
This seems very important for the many users exploring the library, who are usually on laptops without GPUs.
More README instructions will be added in a follow-up.
2023-07-14 16:52:15 +02:00
a2f72edc0d
Simplify the parameters used by sum and sum_keepdim. ( #165 )
2023-07-14 08:22:08 +01:00
2bfa791336
Use the same default as pytorch for sum. ( #164 )
2023-07-13 21:32:32 +01:00
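For the two `sum` commits above, the pytorch convention is: reduced dimensions are dropped unless `keepdim` is requested, and reducing with no dims given collapses everything to a scalar. A hedged example of the resulting API (exact signatures may have moved since):

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let t = Tensor::arange(0f32, 6f32, &Device::Cpu)?.reshape((2, 3))?;
    println!("{:?}", t.sum(1)?.shape()); // dim 1 reduced and dropped: (2,)
    println!("{:?}", t.sum_keepdim(1)?.shape()); // dim 1 kept as size 1: (2, 1)
    println!("{:?}", t.sum_all()?.shape()); // everything reduced: scalar ()
    Ok(())
}
```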
3c02ea56b0
Add a cli argument to easily switch the dtype. ( #161 )
2023-07-13 19:18:49 +01:00
50b0946a2d
Tensor mutability ( #154 )
* Working towards tensor mutability.
* Use a ref-cell to provide tensor mutability.
2023-07-13 11:04:40 +01:00
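The ref-cell approach gives a variable interior mutability: handles stay cheap to clone and share, while the underlying buffer can still be updated in place during training. A stripped-down illustration of the pattern (hypothetical types, using a thread-safe `RwLock` where the commit mentions a ref-cell):

```rust
use std::sync::{Arc, RwLock};

/// A trainable variable: a shared handle over mutable contents.
#[derive(Clone)]
struct Var {
    data: Arc<RwLock<Vec<f32>>>,
}

impl Var {
    fn new(data: Vec<f32>) -> Self {
        Self { data: Arc::new(RwLock::new(data)) }
    }

    /// SGD-style in-place update, visible through every handle.
    fn apply_grad(&self, grad: &[f32], lr: f32) {
        let mut data = self.data.write().unwrap();
        for (w, g) in data.iter_mut().zip(grad) {
            *w -= lr * g;
        }
    }
}

fn main() {
    let v = Var::new(vec![1.0, 2.0]);
    let alias = v.clone();
    v.apply_grad(&[0.5, 0.5], 0.1);
    println!("{:?}", alias.data.read().unwrap()); // [0.95, 1.95]
}
```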
a3663ce2f2
Encodec forward pass ( #153 )
* Sketch the forward pass for encodec.
* Forward pass for the encodec resnet block.
* Encodec decoding.
2023-07-13 08:18:39 +01:00
6c75a98ad2
Add the forward pass for the T5 model. ( #152 )
* Add the forward pass for the T5 model.
* More t5 forward pass.
2023-07-12 22:02:40 +01:00
ba35d895e7
Sketch the candle-transformers crate. ( #147 )
* Sketch the candle-transformers crate.
* Format the empty files.
2023-07-12 13:49:31 +01:00
eae646d322
Use arange in the examples. ( #146 )
2023-07-12 12:12:34 +01:00
20599172ac
Add from_iter and arange, use them in the doctests. ( #145 )
2023-07-12 12:03:01 +01:00
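Both constructors build a 1-d tensor without an explicit `Vec`. Doctest-style usage, hedged since the signatures may have evolved:

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // arange covers the half-open range [0, 5), as in pytorch/numpy.
    let a = Tensor::arange(0u32, 5u32, &dev)?;
    println!("{:?}", a.to_vec1::<u32>()?); // [0, 1, 2, 3, 4]
    // from_iter consumes any iterator over a supported dtype.
    let b = Tensor::from_iter((0..5).map(|i| i as f32 * 0.5), &dev)?;
    println!("{:?}", b.to_vec1::<f32>()?); // [0.0, 0.5, 1.0, 1.5, 2.0]
    Ok(())
}
```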
b3b39cca92
Llama batch ( #144 )
* Add a batch dimension to llama.
* Bugfixes.
2023-07-12 11:38:19 +01:00
fa760759e5
Allow for lazy loading of npz files; use it in llama to reduce memory usage in the CPU version. ( #141 )
2023-07-11 20:22:34 +01:00
37cad85869
Resurrect the llama npy support. ( #140 )
2023-07-11 19:32:10 +01:00
760f1d7055
Refactor the llama example to make it more in sync with the other ones. ( #139 )
* Refactor the llama example to make it more in sync with the other ones.
* Make clippy happy.
* Properly load the safetensor weights.
* Get llama back to a working state for the safetensors case.
2023-07-11 17:20:55 +01:00
674eb35e10
Remove some dead-code pragmas. ( #137 )
2023-07-11 09:33:59 +01:00
0e9d3afd77
Simplify the var-builder layer setup. ( #133 )
2023-07-10 23:22:58 +01:00
6fc1ab4f0d
MusicGen var-store path cleanup. ( #132 )
2023-07-10 23:13:11 +01:00
b46c28a2ac
VarBuilder path creation ( #131 )
* Use a struct for the safetensor+routing.
* Group the path and the var-builder together.
* Fix for the empty path case.
2023-07-10 22:37:34 +01:00
1aa7fbbc33
Move the var-builder in a central place. ( #130 )
2023-07-10 20:49:50 +01:00
89a5b602a6
Move the conv1d layer to candle_nn. ( #117 )
2023-07-10 11:02:06 +01:00
b06e1a7e54
[nn] Move the Embedding and Activation parts. ( #116 )
* Share the Embedding and Activation parts.
* Tweak some activations.
2023-07-10 10:24:52 +01:00
9ce0f1c010
Sketch the candle-nn crate. ( #115 )
* Sketch the candle-nn crate.
* Tweak the cuda dependencies.
* More cuda tweaks.
2023-07-10 08:50:09 +01:00
ea5dfa69bc
Sketching the musicgen model. ( #66 )
* Skeleton files for musicgen.
* Add a musicgen model module.
* Sketch the model loading.
* Start adding the forward pass.
* More forward pass.
* Positional embeddings.
* Forward for the decoder layers.
* Add an empty function.
* Fix the musicgen weight names.
* More musicgen modeling.
* Add the T5 loading bits.
* Add the encodec config.
* Add the encodec module hierarchy.
* More Encodec modeling.
* Encodec modeling.
* Encodec modeling.
* Add more to the encodec modeling.
* Load the weights.
* Populate the resnet blocks.
* Also load the conv transpose weights.
* Split musicgen in multiple files.
2023-07-09 19:53:35 +01:00
c187f347bf
Make it easier to use whisper samples from the repo. ( #112 )
* Make it easier to use samples from the repo.
* Use f32 for accumulation in the f16/bf16 kernels (demonstrated below).
2023-07-08 18:48:27 +01:00
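On the f32-accumulation bullet: f16 has only 10 mantissa bits, so a running f16 sum stalls once the total is large enough that each addend falls below half a ULP. A standalone demonstration with the `half` crate (an assumed dependency; the real fix lives in the kernels):

```rust
use half::f16; // half = "2" in Cargo.toml

fn main() {
    // 100_000 * 0.01 should be ~1000.
    let xs = vec![f16::from_f32(0.01); 100_000];

    // Accumulate in f16: the sum stalls once the accumulator reaches 32,
    // where adding 0.01 rounds back to the same value (0.01 < ULP/2).
    let mut acc16 = f16::from_f32(0.0);
    for &x in &xs {
        acc16 = acc16 + x;
    }

    // Accumulate in f32 and round once at the end.
    let acc32: f32 = xs.iter().map(|x| x.to_f32()).sum();

    println!("f16 accumulator: {}", acc16.to_f32()); // far from 1000
    println!("f32 accumulator: {acc32}"); // ~1000 (modulo f16 input rounding)
}
```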
f35cfc5e97
Sample with temperature. ( #106 )
2023-07-07 18:12:25 +01:00
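Temperature sampling divides the logits by T before the softmax: T < 1 sharpens the distribution toward greedy decoding, T > 1 flattens it. A minimal sketch (`rand = "0.8"` is an assumed dependency):

```rust
use rand::Rng;

/// Sample a token id from raw `logits` at the given `temperature`.
fn sample(logits: &[f32], temperature: f32, rng: &mut impl Rng) -> usize {
    // softmax(logits / T), with max-subtraction for numerical stability.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| ((l - max) / temperature).exp()).collect();
    let total: f32 = exps.iter().sum();

    // Inverse-CDF sampling over the unnormalized weights.
    let mut u = rng.gen::<f32>() * total;
    for (i, e) in exps.iter().enumerate() {
        u -= e;
        if u <= 0.0 {
            return i;
        }
    }
    exps.len() - 1 // guard against rounding at the end of the CDF
}

fn main() {
    let mut rng = rand::thread_rng();
    let logits = [2.0, 1.0, 0.5, -1.0];
    println!("sampled token {}", sample(&logits, 0.8, &mut rng));
}
```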
03dffe9ecc
Use F32 for the reduce ops. ( #105 )
2023-07-07 17:55:21 +01:00
e923b3adc2
Add a KV cache to falcon. ( #104 )
2023-07-07 17:24:38 +01:00
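A KV cache keeps each layer's keys and values from earlier decoding steps, so step t only computes projections for the single new token and appends them. A shape-level sketch using `Tensor::cat` (hedged; falcon's actual cache handling differs in detail):

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let (n_head, head_dim) = (2, 4);
    // Keys cached for the first 5 positions: (n_head, seq_len, head_dim).
    let mut k_cache = Tensor::zeros((n_head, 5, head_dim), DType::F32, &dev)?;
    // One decoding step produces keys for a single new position.
    let k_new = Tensor::zeros((n_head, 1, head_dim), DType::F32, &dev)?;
    // Append along the sequence dim instead of recomputing all positions.
    k_cache = Tensor::cat(&[&k_cache, &k_new], 1)?;
    println!("{:?}", k_cache.shape()); // (2, 6, 4)
    Ok(())
}
```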
05ff1cff66
Add some caching to the causal mask. ( #103 )
2023-07-07 12:56:44 +01:00
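The causal mask depends only on the sequence length, so it can be built once per length and reused across layers and steps. A small memoization sketch (hypothetical struct, not the falcon code):

```rust
use std::collections::HashMap;

/// Lower-triangular causal masks, cached per sequence length.
#[derive(Default)]
struct MaskCache {
    masks: HashMap<usize, Vec<u8>>,
}

impl MaskCache {
    /// mask[i * len + j] == 1 iff position i may attend to position j (j <= i).
    fn get(&mut self, len: usize) -> &[u8] {
        self.masks.entry(len).or_insert_with(|| {
            let mut m = vec![0u8; len * len];
            for i in 0..len {
                for j in 0..=i {
                    m[i * len + j] = 1;
                }
            }
            m
        })
    }
}

fn main() {
    let mut cache = MaskCache::default();
    println!("{:?}", cache.get(3)); // [1,0,0, 1,1,0, 1,1,1]
}
```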
2df044f9a1
Fix clippy warnings after rebase.
2023-07-07 09:22:09 +02:00
1ec221a749
Fixing falcon example.
2023-07-07 09:13:55 +02:00
d38a926c14
Convert the logits to f32 before extracting them. ( #102 )
2023-07-07 08:07:57 +01:00
bac4ef40f3
Add a text generation pipeline for falcon. ( #98 )
2023-07-07 06:34:22 +01:00
2b8e8c9f14
Bugfixes. ( #97 )
2023-07-06 23:26:11 +01:00
a3f3b93d16
Add the call to dense in the attention layer. ( #96 )
2023-07-06 23:22:08 +01:00
0a2c82e301
Merge pull request #92 from LaurentMazare/sync_hub
Creating a new sync API for `candle-hub`.
2023-07-07 00:10:47 +02:00
0f679fe42e
Fix some shape issues in falcon. ( #95 )
* Fix some shape issues.
* Use different dtypes.
2023-07-06 19:23:54 +01:00