76804730c6
Using the real config from the hub when available.
2023-08-16 10:36:01 +02:00
ca449f9ee1
Add quantized tensors. ( #458 )
* Add quantized tensors.
* Implement the debug trait for QTensor.
* Add the QMatMul custom op.
2023-08-15 22:45:53 +01:00
b8263aa15c
Quantized support for f16 and f32 ( #457 )
* Add f32 as a quantized type.
* Add f16 as a quantized type too.
2023-08-15 21:09:37 +01:00
e68b2accb4
Split out the quantized file. ( #456 )
2023-08-15 20:26:27 +01:00
5b1690fffa
Tweak the llama example. ( #450 )
2023-08-15 12:18:20 +01:00
3cc87058b7
Support local weights & dynamic outputs ( #447 )
* Support local weights & dynamic outputs
* Revise as suggested
* Cargo code format
2023-08-15 11:51:57 +01:00
c84883ecf2
Add a cuda kernel for upsampling. ( #441 )
* Add a cuda kernel for upsampling.
* Update for the latest tokenizers version.
2023-08-14 13:12:17 +01:00
9e7e6e0288
Add dequantization for ggml's q4_0, q4_1, q5_0, q5_1 and q8_0 ( #407 )
* Added dequantization for `q4_0`, `q4_1`, `q5_0`, `q5_1` and `q8_0`
* expose `tensor_from_ggml` for external usage
* bugfixes & example
2023-08-13 23:22:57 +01:00
8bd2b22b33
Optimize the logit computations in the whisper example. ( #434 )
2023-08-13 22:00:13 +01:00
9af438ac1b
Track the conv2d operations in stable-diffusion. ( #431 )
* Track the conv2d operations in stable-diffusion.
* Add more tracing to stable-diffusion.
* Also trace the resnet bits.
* Trace the attention blocks.
* Also trace the attention inner part.
* Small tweak.
2023-08-13 15:58:26 +01:00
b1ff78f762
Allow using accelerate with stable-diffusion. ( #430 )
2023-08-13 14:14:20 +01:00
6d694554b8
Support longer sequences in language detection. ( #428 )
2023-08-13 13:16:15 +01:00
9aca398a4f
More accelerate optimizations ( #427 )
* Add more tracing to the whisper example.
* Support accelerate in more examples.
* Use accelerate for pointwise functions.
* Use accelerate for binary operations too.
* Bugfix for binary operation: use the rhs before the lhs.
2023-08-13 12:53:34 +01:00
60cd1551ca
Add a KV cache to whisper. ( #426 )
2023-08-12 21:17:08 +01:00
a0908d212c
Add a -language argument. ( #425 )
2023-08-12 17:08:40 +01:00
0741ebbd51
More multilingual support for whisper. ( #419 )
* More multilingual support for whisper.
* Use the language token appropriately.
2023-08-12 15:32:52 +01:00
0c3f109faa
Basic multilingual support for whisper ( #417 )
* Multi-lingual support for whisper.
* Avoid hardcoding the token names.
* More multi-lingual support.
* Remove the todo.
2023-08-12 11:23:04 +01:00
1d0157bbc4
Stable diffusion: retrieve the model files from the HF hub. ( #414 )
* Retrieve the model files from the HF hub in the stable diffusion example.
* Add to the readme.
2023-08-11 18:57:06 +01:00
91dbf907d3
Add more whisper variants. ( #413 )
2023-08-11 17:33:55 +01:00
906c0f3eb5
Remove the checkpoint conversion script. ( #405 )
* Remove the checkpoint conversion script.
* Remove references to the script.
2023-08-11 05:59:48 +01:00
80f0482f26
Fix the stable-diffusion vae. ( #398 )
* Fix the stable-diffusion vae.
* Fix for saving images.
2023-08-10 18:24:31 +01:00
385f0d261c
Normalize embeddings in the bert example. ( #390 )
2023-08-10 13:05:55 +01:00
c3a0761e62
Add some tracing to the whisper example. ( #375 )
2023-08-09 19:58:36 +01:00
a3b1699409
Embed the mel filters in the whisper binary. ( #373 )
2023-08-09 18:27:26 +01:00
3a62aee91f
Write the generated images using the image crate. ( #363 )
* Use the image crate to write the generated images.
* Make the dependency optional.
2023-08-09 15:26:44 +01:00
be21d7e75a
Fix the padding used in stable diffusion. ( #362 )
2023-08-09 13:23:59 +01:00
89d3926c9b
Fixes for the stable diffusion example. ( #342 )
* Fixes for the stable diffusion example.
* Bugfix.
* Another fix.
* Fix for group-norm.
* More fixes to get SD to work.
2023-08-08 14:57:09 +01:00
fc265d9dcf
Some CLIP fixes for stable diffusion. ( #338 )
* Some CLIP fixes for stable diffusion.
* Add the avg-pool2d operation on cpu.
2023-08-07 18:31:45 +01:00
2345b8ce3f
Skeleton for the avg-pool2d and upsample-nearest2d ops. ( #337 )
* Skeleton for the avg-pool2d and upsample-nearest2d ops.
* Preliminary conv2d support.
2023-08-07 16:15:38 +01:00
f53a333ea9
Simple pad support. ( #336 )
* Simple pad support.
* Fix the tensor indexing when padding.
2023-08-07 15:24:56 +01:00
5bb2fce998
Implement group-norm. ( #334 )
* Implement group-norm.
* Add some testing for group-norm.
2023-08-07 06:53:05 +01:00
141df4ad2b
Main diffusion loop for the SD example. ( #332 )
2023-08-06 21:39:53 +01:00
166bfd5847
Add the recip op + use it in stable-diffusion. ( #331 )
* Add the recip unary op.
* Fix the cuda kernel.
* Use the recip op in sigmoid.
2023-08-06 21:14:52 +01:00
1c062bf06b
Add the ddim scheduler. ( #330 )
2023-08-06 20:44:00 +01:00
d34039e352
Add a stable diffusion example ( #328 )
* Start adding a stable-diffusion example.
* Proper computation of the causal mask.
* Add the chunk operation.
* Work in progress: port the attention module.
* Add some dummy modules for conv2d and group-norm, get the attention module to compile.
* Re-enable the 2d convolution.
* Add the embeddings module.
* Add the resnet module.
* Add the unet blocks.
* Add the unet.
* And add the variational auto-encoder.
* Use the pad function from utils.
2023-08-06 17:49:43 +01:00
b278834267
Support the Accelerate BLAS on macOS. ( #325 )
* Add the accelerate feature.
* Ffi tweaks.
2023-08-05 17:25:24 +01:00
620f83cf66
Add the candle-datasets crate ( #322 )
* Move the vision datasets to a separate crate.
* Move the batcher bits.
* Update the readme.
* Move the tiny-stories bits.
---------
Co-authored-by: Jane Doe <jane.doe@example.org>
2023-08-05 08:56:50 +01:00
f7b2a0391d
Transpose the weight matrices for llama2.c. ( #321 )
2023-08-04 13:32:20 +01:00
df6667ba88
Add some tracing to llama. ( #318 )
2023-08-03 13:52:22 +01:00
a79286885c
Support safetensors weights in llama2.c inference. ( #317 )
2023-08-03 11:10:58 +01:00
4f17290ce0
Use AdamW in the llama2 training. ( #308 )
2023-08-02 14:14:02 +01:00
51e51da896
Rename the candle crate to candle-core ( #301 )
* Rename to candle-core.
* More candle-core renaming.
2023-08-02 08:20:22 +01:00
4b3bd79fbd
Remove the embedding ops in favor of index-select. ( #299 )
* Remove the embedding ops in favor of index-select.
* Also remove the cuda kernels.
2023-08-02 05:42:11 +01:00
ff876c2103
Llama more training ( #297 )
* Rework the var-builder to handle initializations.
* Add some helper functions for layer creation.
* Improve the layer initializations.
* Get initialized variables.
* Precompute the rot embeddings when training llamas.
2023-08-01 19:53:41 +01:00
a27239f3d9
Add training for the llama2.c example ( #296 )
* Rework the commands and run inference by default.
* Add the training module and load the training dataset.
* Random dataset iterator.
* Proper valid-loss computation.
* Compute the evaluation loss.
* Add more substance to the training loop.
2023-08-01 17:23:07 +01:00
75e0448114
Move the weight bits in a separate module. ( #295 )
2023-08-01 10:37:06 +01:00
614f911e9e
Add some batcher variants that handle errors. ( #294 )
2023-08-01 09:40:34 +01:00
e1e8127f15
Add the batcher. ( #293 )
2023-08-01 09:16:10 +01:00
fa98ca0c35
Use subcommands in llama2. ( #292 )
2023-08-01 05:57:41 +01:00
1a07ff8d17
Pre-tokenized evaluation mode for llama2.c. ( #291 )
2023-08-01 05:36:25 +01:00