37cad85869
Resurrect the llama npy support. ( #140 )
2023-07-11 19:32:10 +01:00
760f1d7055
Refactor the llama example to make it more in sync with the other ones. ( #139 )
...
* Refactor the llama example to make it more in sync with the other ones.
* Make clippy happy.
* Properly load the safetensor weights.
* Get llama back to a working state for the safetensors case.
2023-07-11 17:20:55 +01:00
64264d97c1
Modular backends ( #138 )
...
* Add some trait to formalize backends.
* Use the generic backend trait.
2023-07-11 11:17:02 +01:00
674eb35e10
Remove some dead-code pragmas. ( #137 )
2023-07-11 09:33:59 +01:00
ae79c00e48
Allow for uniform initialization in a single step. ( #136 )
2023-07-11 08:52:29 +01:00
b31a3bbdcb
Sketch the tensor initialization module. ( #134 )
2023-07-11 07:41:46 +01:00
0e9d3afd77
Simplify the var-builder layer setup. ( #133 )
2023-07-10 23:22:58 +01:00
6fc1ab4f0d
MusicGen var-store path cleanup. ( #132 )
2023-07-10 23:13:11 +01:00
b46c28a2ac
VarBuilder path creation ( #131 )
...
* Use a struct for the safetensor+routing.
* Group the path and the var-builder together.
* Fix for the empty path case.
2023-07-10 22:37:34 +01:00
1aa7fbbc33
Move the var-builder in a central place. ( #130 )
2023-07-10 20:49:50 +01:00
2be09dbb1d
Macroify the repeating bits. ( #129 )
2023-07-10 19:44:06 +01:00
23849cb6e6
Merge pull request #124 from LaurentMazare/new_doc
...
Squeeze/unsqueeze/reshape
2023-07-10 20:43:23 +02:00
fba07d6b6b
Merge pull request #127 from LaurentMazare/tensor_indexing
...
`i(..)` indexing sugar (partial).
2023-07-10 19:56:34 +02:00
1ad235953b
Clippy ?
2023-07-10 19:34:38 +02:00
c9d354f5ae
Update candle-core/src/tensor.rs
2023-07-10 19:29:22 +02:00
f29b77ec19
Random initializers. ( #128 )
...
* Random initialization.
* CPU rng generation.
2023-07-10 18:26:21 +01:00
5ea747c047
Update candle-core/src/indexer.rs
2023-07-10 19:02:35 +02:00
ef0375d8bc
`i(..)` indexing sugar (partial).
...
- Only range, and select (no tensor_select)
- No negative indexing
2023-07-10 17:34:04 +02:00
e2807c78a4
Enable the doctests to run with mkl (though they are broken for now). ( #126 )
2023-07-10 16:27:46 +01:00
548b1df7ea
Remove the dependency to blas and use mkl directly. ( #125 )
2023-07-10 15:52:03 +01:00
e01d099b71
Squeeze/unsqueeze/reshape
2023-07-10 16:40:25 +02:00
221b1aff65
Support dgemm in mkl matmul. ( #122 )
2023-07-10 15:02:37 +01:00
71cd3745a9
Add some layer-norm tests. ( #121 )
2023-07-10 14:43:04 +01:00
dc58259679
Merge pull request #120 from LaurentMazare/some_doc_plus_fix_stack
...
Adding some doc + Extended `stack` to work with extra final dimensions.
2023-07-10 15:21:24 +02:00
9a667155fd
Removed commented deny
2023-07-10 15:18:23 +02:00
2c8fbe8155
oops.
2023-07-10 15:13:52 +02:00
49f4a77ffd
Put them back.
2023-07-10 15:11:48 +02:00
38ac50eeda
Adding some doc + Extended `stack` to work with extra final dimensions.
2023-07-10 14:51:10 +02:00
204618b7d3
Merge pull request #118 from LaurentMazare/readme_update
...
Expanding a bit the README
2023-07-10 13:12:23 +02:00
868743b8b9
Expanding a bit the README
2023-07-10 12:51:37 +02:00
89a5b602a6
Move the conv1d layer to candle_nn. ( #117 )
2023-07-10 11:02:06 +01:00
b06e1a7e54
[nn] Move the Embedding and Activation parts. ( #116 )
...
* Share the Embedding and Activation parts.
* Tweak some activations.
2023-07-10 10:24:52 +01:00
9ce0f1c010
Sketch the candle-nn crate. ( #115 )
...
* Sketch the candle-nn crate.
* Tweak the cuda dependencies.
* More cuda tweaks.
2023-07-10 08:50:09 +01:00
bc3be6f9b0
Add the elu cuda kernel. ( #114 )
2023-07-10 07:57:01 +01:00
270997a055
Add the elu op. ( #113 )
2023-07-09 21:56:31 +01:00
ea5dfa69bc
Sketching the musicgen model. ( #66 )
...
* Skeleton files for musicgen.
* Add a musicgen model module.
* Sketch the model loading.
* Start adding the forward pass.
* More forward pass.
* Positional embeddings.
* Forward for the decoder layers.
* Add an empty function.
* Fix the musicgen weight names.
* More musicgen modeling.
* Add the T5 loading bits.
* Add the encodec config.
* Add the encodec module hierarchy.
* More Encodec modeling.
* Encodec modeling.
* Encodec modeling.
* Add more to the encodec modeling.
* Load the weights.
* Populate the resnet blocks.
* Also load the conv transpose weights.
* Split musicgen in multiple files.
2023-07-09 19:53:35 +01:00
c187f347bf
Make it easier to use whisper samples from the repo. ( #112 )
...
* Make it easier to use samples from the repo.
* Use f32 for accumulation in the f16/bf16 kernels.
2023-07-08 18:48:27 +01:00
eb64ad0d4d
Cuda kernel for the conv1d op ( #111 )
...
* Boilerplate code for conv1d.
* Boilerplate code for conv1d.
* More boilerplate for conv1d.
* Conv1d work.
* Get the conv1d cuda kernel to work.
* Conv1d support when no batch dim.
2023-07-08 18:13:25 +01:00
5c3864f9f7
Add more sum tests. ( #110 )
...
* Add some tests for the sum.
* More sum testing.
2023-07-08 13:15:36 +01:00
e676f85f00
Sketch a fast cuda kernel for reduce-sum. ( #109 )
...
* Sketch a fast cuda kernel for reduce-sum.
* Sketch the rust support code for the fast sum kernel.
* More work on the fast kernel.
* Add some testing ground.
* A couple fixes for the fast sum kernel.
2023-07-08 12:43:56 +01:00
33479c5f1b
Add some very simple sum benchmark. ( #108 )
...
* Add some very simple sum benchmark.
* Rename the file.
2023-07-08 08:39:27 +01:00
f35cfc5e97
Sample with temperature. ( #106 )
2023-07-07 18:12:25 +01:00
03dffe9ecc
Use F32 for the reduce ops. ( #105 )
2023-07-07 17:55:21 +01:00
e923b3adc2
Add a KV cache to falcon. ( #104 )
2023-07-07 17:24:38 +01:00
05ff1cff66
Add some caching to the causal mask. ( #103 )
2023-07-07 12:56:44 +01:00
65937612d0
Merge pull request #91 from LaurentMazare/tweak_parallel_download
...
Getting tokio tasks stuck on smaller machines.
2023-07-07 09:43:55 +02:00
2df044f9a1
Clippy after rebase.
2023-07-07 09:22:09 +02:00
1ec221a749
Fixing falcon example.
2023-07-07 09:13:55 +02:00
514b171f75
Getting tokio tasks stuck on smaller machines.
2023-07-07 09:13:28 +02:00
d38a926c14
Convert the logits to f32 before extracting them. ( #102 )
2023-07-07 08:07:57 +01:00