Commit Graph

227 Commits

Author SHA1 Message Date
5b1690fffa Tweak the llama example. (#450) 2023-08-15 12:18:20 +01:00
3cc87058b7 Support local weights & dynamic outputs (#447)
* Support local weights & dynamic outputs

* Revise as suggested

* Cargo code format
2023-08-15 11:51:57 +01:00
531f23b4d0 Rename vec-dot to vec-ops. (#449)
* Rename vec-dot to vec-ops.

* Also bump the crate version.

* Add a currently empty readme.
2023-08-15 10:48:57 +01:00
90374097dc Cudnn support (#445)
* Add a cudnn feature to be used for conv2d.

* Allocate the proper workspace.

* Only create a single cudnn handle per cuda device.

* Proper cudnn usage.

* Bugfix.
2023-08-14 21:30:41 +01:00
c84883ecf2 Add a cuda kernel for upsampling. (#441)
* Add a cuda kernel for upsampling.

* Update for the latest tokenizers version.
2023-08-14 13:12:17 +01:00
9e7e6e0288 Add dequantization for ggml's q4_0, q4_1, q5_0, q5_1 and q8_0 (#407)
* Added dequantization for `q4_0`, `q4_1`, `q5_0`, `q5_1` and `q8_0`

* expose `tensor_from_ggml` for external usage

* bugfixes & example
2023-08-13 23:22:57 +01:00
8bd2b22b33 Optimize the logit computations in the whisper example. (#434) 2023-08-13 22:00:13 +01:00
9af438ac1b Track the conv2d operations in stable-diffusion. (#431)
* Track the conv2d operations in stable-diffusion.

* Add more tracing to stable-diffusion.

* Also trace the resnet bits.

* Trace the attention blocks.

* Also trace the attention inner part.

* Small tweak.
2023-08-13 15:58:26 +01:00
b1ff78f762 Allow using accelerate with stable-diffusion. (#430) 2023-08-13 14:14:20 +01:00
6d694554b8 Support longer sequences in language detection. (#428) 2023-08-13 13:16:15 +01:00
9aca398a4f More accelerate optimizations (#427)
* Add more tracing to the whisper example.

* Support accelerate in more examples.

* Use accelerate for pointwise functions.

* Use accelerate for binary operations too.

* Bugfix for binary operation: use the rhs before the lhs.
2023-08-13 12:53:34 +01:00
60cd1551ca Add a KV cache to whisper. (#426) 2023-08-12 21:17:08 +01:00
a0908d212c Add a --language argument. (#425) 2023-08-12 17:08:40 +01:00
0741ebbd51 More multilingual support for whisper. (#419)
* More multilingual support for whisper.

* Use the language token appropriately.
2023-08-12 15:32:52 +01:00
0c3f109faa Basic multilingual support for whisper (#417)
* Multi-lingual support for whisper.

* Avoid hardcoding the token names.

* More multi-lingual support.

* Remove the todo.
2023-08-12 11:23:04 +01:00
1d0157bbc4 Stable diffusion: retrieve the model files from the HF hub. (#414)
* Retrieve the model files from the HF hub in the stable diffusion example.

* Add to the readme.
2023-08-11 18:57:06 +01:00
91dbf907d3 Add more whisper variants. (#413) 2023-08-11 17:33:55 +01:00
906c0f3eb5 Remove the checkpoint conversion script. (#405)
* Remove the checkpoint conversion script.

* Remove references to the script.
2023-08-11 05:59:48 +01:00
80f0482f26 Fix the stable-diffusion vae. (#398)
* Fix the stable-diffusion vae.

* Fix for saving images.
2023-08-10 18:24:31 +01:00
385f0d261c Normalize embeddings in the bert example. (#390) 2023-08-10 13:05:55 +01:00
c3a0761e62 Add some tracing to the whisper example. (#375) 2023-08-09 19:58:36 +01:00
a3b1699409 Embed the mel filters in the whisper binary. (#373) 2023-08-09 18:27:26 +01:00
dece0b8a76 Merge pull request #263 from huggingface/book_3
Book 3 (advanced loading + hub)
2023-08-09 16:50:11 +02:00
3a62aee91f Write the generated images using the image crate. (#363)
* Use the image crate to write the generated images.

* Make the dependency optional.
2023-08-09 15:26:44 +01:00
be21d7e75a Fix the padding used in stable diffusion. (#362) 2023-08-09 13:23:59 +01:00
89d3926c9b Fixes for the stable diffusion example. (#342)
* Fixes for the stable diffusion example.

* Bugfix.

* Another fix.

* Fix for group-norm.

* More fixes to get SD to work.
2023-08-08 14:57:09 +01:00
fc265d9dcf Some CLIP fixes for stable diffusion. (#338)
* Some CLIP fixes for stable diffusion.

* Add the avg-pool2d operation on cpu.
2023-08-07 18:31:45 +01:00
2345b8ce3f Skeleton for the avg-pool2d and upsample-nearest2d ops. (#337)
* Skeleton for the avg-pool2d and upsample-nearest2d ops.

* Preliminary conv2d support.
2023-08-07 16:15:38 +01:00
f53a333ea9 Simple pad support. (#336)
* Simple pad support.

* Fix the tensor indexing when padding.
2023-08-07 15:24:56 +01:00
5bb2fce998 Implement group-norm. (#334)
* Implement group-norm.

* Add some testing for group-norm.
2023-08-07 06:53:05 +01:00
141df4ad2b Main diffusion loop for the SD example. (#332) 2023-08-06 21:39:53 +01:00
166bfd5847 Add the recip op + use it in stable-diffusion. (#331)
* Add the recip unary op.

* Fix the cuda kernel.

* Use the recip op in sigmoid.
2023-08-06 21:14:52 +01:00
1c062bf06b Add the ddim scheduler. (#330) 2023-08-06 20:44:00 +01:00
d34039e352 Add a stable diffusion example (#328)
* Start adding a stable-diffusion example.

* Proper computation of the causal mask.

* Add the chunk operation.

* Work in progress: port the attention module.

* Add some dummy modules for conv2d and group-norm, get the attention module to compile.

* Re-enable the 2d convolution.

* Add the embeddings module.

* Add the resnet module.

* Add the unet blocks.

* Add the unet.

* And add the variational auto-encoder.

* Use the pad function from utils.
2023-08-06 17:49:43 +01:00
b278834267 Support the Accelerate BLAS on macOS. (#325)
* Add the accelerate feature.

* Ffi tweaks.
2023-08-05 17:25:24 +01:00
620f83cf66 Add the candle-datasets crate (#322)
* Move the vision datasets to a separate crate.

* Move the batcher bits.

* Update the readme.

* Move the tiny-stories bits.

---------

Co-authored-by: Jane Doe <jane.doe@example.org>
2023-08-05 08:56:50 +01:00
f7b2a0391d Transpose the weight matrixes for llama2.c. (#321) 2023-08-04 13:32:20 +01:00
df6667ba88 Add some tracing to llama. (#318) 2023-08-03 13:52:22 +01:00
a79286885c Support safetensors weights in llama2.c inference. (#317) 2023-08-03 11:10:58 +01:00
dba31473d4 Typos and format and CD only when PR lands. 2023-08-02 19:18:43 +02:00
c11e78b334 Odd rebase artifact. 2023-08-02 18:40:24 +02:00
1b705a426f Remove duplicate. 2023-08-02 18:40:24 +02:00
a44471a305 Adding more details on how to load things.
- Loading with memmap
- Loading a sharded tensor
- Moved some snippets to `candle-examples/src/lib.rs`. This is because
managing book-specific dependencies is a pain: https://github.com/rust-lang/mdBook/issues/706
- This causes a non-aligned inclusion (https://github.com/rust-lang/mdBook/pull/1856), which we have
to ignore fmt to remove.

mdbook might need some more love :)
2023-08-02 18:40:24 +02:00
4f17290ce0 Use AdamW in the llama2 training. (#308) 2023-08-02 14:14:02 +01:00
4fe8a02f88 Update the repo location. (#305) 2023-08-02 11:12:18 +01:00
03a421f714 Add some missing readme files. (#304) 2023-08-02 10:57:12 +01:00
d38943aadc Add version numbers for all the candle crates (#303)
* Switch to candle-gemm for the time being.

* Add the missing versions.
2023-08-02 10:52:13 +01:00
51e51da896 Rename the candle crate to candle-core (#301)
* Rename to candle-core.

* More candle-core renaming.
2023-08-02 08:20:22 +01:00
4b3bd79fbd Remove the embedding ops in favor of index-select. (#299)
* Remove the embedding ops in favor of index-select.

* Also remove the cuda kernels.
2023-08-02 05:42:11 +01:00
ff876c2103 Llama more training (#297)
* Rework the var-builder to handle initializations.

* Add some helper functions for layer creation.

* Improve the layer initializations.

* Get initialized variables.

* Precompute the rot embeddings when training llamas.
2023-08-01 19:53:41 +01:00