Commit Graph

346 Commits

Author SHA1 Message Date
166bfd5847 Add the recip op + use it in stable-diffusion. (#331)
* Add the recip unary op.

* Fix the cuda kernel.

* Use the recip op in sigmoid.
2023-08-06 21:14:52 +01:00
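The sigmoid change in this commit relies on the identity sigmoid(x) = 1 / (1 + exp(-x)), which is exactly a `recip` applied to `1 + exp(-x)`. A minimal plain-Python sketch of the idea (not candle's actual kernel code):

```python
import math

def recip(x: float) -> float:
    # Elementwise reciprocal, the unary op this commit adds.
    return 1.0 / x

def sigmoid(x: float) -> float:
    # sigmoid(x) = 1 / (1 + exp(-x)) = recip(1 + exp(-x)),
    # so sigmoid needs no dedicated kernel once recip exists.
    return recip(1.0 + math.exp(-x))
```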
1c062bf06b Add the ddim scheduler. (#330) 2023-08-06 20:44:00 +01:00
d34039e352 Add a stable diffusion example (#328)
* Start adding a stable-diffusion example.

* Proper computation of the causal mask.

* Add the chunk operation.

* Work in progress: port the attention module.

* Add some dummy modules for conv2d and group-norm, get the attention module to compile.

* Re-enable the 2d convolution.

* Add the embeddings module.

* Add the resnet module.

* Add the unet blocks.

* Add the unet.

* And add the variational auto-encoder.

* Use the pad function from utils.
2023-08-06 17:49:43 +01:00
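The "proper computation of the causal mask" bullet above refers to the standard lower-triangular attention mask, where each query position may only attend to itself and earlier positions. A hedged sketch of the concept (illustrative, not the example's actual tensor code):

```python
def causal_mask(t: int) -> list[list[int]]:
    # 1 where query position i may attend to key position j (j <= i),
    # 0 where attention to future positions must be blocked.
    return [[1 if j <= i else 0 for j in range(t)] for i in range(t)]
```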
b278834267 Support the Accelerate BLAS on macOS. (#325)
* Add the accelerate feature.

* FFI tweaks.
2023-08-05 17:25:24 +01:00
620f83cf66 Add the candle-datasets crate (#322)
* Move the vision datasets to a separate crate.

* Move the batcher bits.

* Update the readme.

* Move the tiny-stories bits.

---------

Co-authored-by: Jane Doe <jane.doe@example.org>
2023-08-05 08:56:50 +01:00
f7b2a0391d Transpose the weight matrices for llama2.c. (#321) 2023-08-04 13:32:20 +01:00
df6667ba88 Add some tracing to llama. (#318) 2023-08-03 13:52:22 +01:00
a79286885c Support safetensors weights in llama2.c inference. (#317) 2023-08-03 11:10:58 +01:00
dba31473d4 Fix typos and formatting; run CD only when a PR lands. 2023-08-02 19:18:43 +02:00
c11e78b334 Odd rebase artifact. 2023-08-02 18:40:24 +02:00
1b705a426f Remove duplicate. 2023-08-02 18:40:24 +02:00
a44471a305 Adding more details on how to load things.
- Loading with memmap
- Loading a sharded tensor
- Moved some snippets to `candle-examples/src/lib.rs` because managing
book-specific dependencies is a pain: https://github.com/rust-lang/mdBook/issues/706
- This causes a non-aligned inclusion (https://github.com/rust-lang/mdBook/pull/1856),
which we have to skip in fmt to remove.

mdbook might need some more love :)
2023-08-02 18:40:24 +02:00
4f17290ce0 Use AdamW in the llama2 training. (#308) 2023-08-02 14:14:02 +01:00
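AdamW, now used for the llama2.c training, differs from plain Adam by decoupling weight decay from the gradient-based update. A minimal single-scalar sketch of one AdamW step (hyperparameter defaults are illustrative, not the example's settings):

```python
import math

def adamw_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    # One AdamW update for a scalar parameter p with gradient g at step t.
    m = b1 * m + (1 - b1) * g          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g      # second-moment estimate
    m_hat = m / (1 - b1 ** t)          # bias correction
    v_hat = v / (1 - b2 ** t)
    # Decoupled weight decay: applied to p directly, not folded into g.
    p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * p)
    return p, m, v
```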
4fe8a02f88 Update the repo location. (#305) 2023-08-02 11:12:18 +01:00
03a421f714 Add some missing readme files. (#304) 2023-08-02 10:57:12 +01:00
d38943aadc Add version numbers for all the candle crates (#303)
* Switch to candle-gemm for the time being.

* Add the missing versions.
2023-08-02 10:52:13 +01:00
51e51da896 Rename the candle crate to candle-core (#301)
* Rename to candle-core.

* More candle-core renaming.
2023-08-02 08:20:22 +01:00
4b3bd79fbd Remove the embedding ops in favor of index-select. (#299)
* Remove the embedding ops in favor of index-select.

* Also remove the cuda kernels.
2023-08-02 05:42:11 +01:00
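Dropping the embedding ops in favor of index-select works because an embedding forward pass is just row selection from the weight matrix. A hedged plain-Python sketch of the equivalence (not candle's API):

```python
def index_select(rows: list[list[float]], indices: list[int]) -> list[list[float]]:
    # Gather the rows of `rows` at the given indices (along dim 0).
    return [rows[i] for i in indices]

def embedding(weight: list[list[float]], token_ids: list[int]) -> list[list[float]]:
    # An embedding lookup is exactly an index-select on the weight
    # matrix, so no dedicated embedding kernel is needed.
    return index_select(weight, token_ids)
```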
ff876c2103 Llama more training (#297)
* Rework the var-builder to handle initializations.

* Add some helper functions for layer creation.

* Improve the layer initializations.

* Get initialized variables.

* Precompute the rotary embeddings when training llamas.
2023-08-01 19:53:41 +01:00
a27239f3d9 Add training for the llama2.c example (#296)
* Rework the commands and run inference by default.

* Add the training module and load the training dataset.

* Random dataset iterator.

* Proper valid-loss computation.

* Compute the evaluation loss.

* Add more substance to the training loop.
2023-08-01 17:23:07 +01:00
75e0448114 Move the weight bits in a separate module. (#295) 2023-08-01 10:37:06 +01:00
614f911e9e Add some batcher variants that handle errors. (#294) 2023-08-01 09:40:34 +01:00
e1e8127f15 Add the batcher. (#293) 2023-08-01 09:16:10 +01:00
fa98ca0c35 Use subcommands in llama2. (#292) 2023-08-01 05:57:41 +01:00
1a07ff8d17 Pre-tokenized evaluation mode for llama2.c. (#291) 2023-08-01 05:36:25 +01:00
f28558d0b7 Evaluate on the pre-tokenized file. (#290) 2023-07-31 21:31:38 +01:00
6b98b66eb3 Remove the end of text tokens. (#289) 2023-07-31 20:43:57 +01:00
9ae1f6afee Add an eval mode to llama2-c (#288)
* Add an eval mode to llama2-c.

* Encode line by line.

* Get the eval to run.
2023-07-31 17:22:14 +01:00
ffeafbfc43 Make the nll op closer to the pytorch version + add a test. (#286) 2023-07-31 14:14:01 +01:00
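The PyTorch-style nll op mentioned here takes log-probabilities (log-softmax outputs) and averages the negative log-probability of the target class over the batch. A minimal sketch of that behavior (illustrative, not candle's implementation):

```python
def nll(log_probs: list[list[float]], targets: list[int]) -> float:
    # PyTorch-style NLL with mean reduction: average of -log p(target),
    # where log_probs[i] is already a row of log-softmax outputs.
    total = sum(-log_probs[i][t] for i, t in enumerate(targets))
    return total / len(targets)
```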
b3ea96b62b Add a prompt and support more models in llama2-c. (#285)
* Support more models in llama2-c.

* Add a prompt.
2023-07-31 13:09:30 +01:00
94a43faaca Use the hub models for llama2.c (#284) 2023-07-31 12:51:14 +01:00
62a9b03715 Add a flag to set the number of epochs in the mnist training (#283)
* Add a flag to change the number of epochs for the mnist training.

* Increase the learning rate for the MLP.
2023-07-31 10:32:14 +01:00
a8d8f9f206 Load a trained checkpoint in the mnist example. (#280) 2023-07-30 17:01:45 +01:00
38ff693af0 Add a flag to save the trained weights. (#279) 2023-07-30 15:41:42 +01:00
c950a5c6b1 Cuda support for the mnist training. (#277)
* Cuda support for the mnist training.

* min/max fix + testing.

* Add the argmin/argmax tests.

* More cuda support for argmin/argmax.

* Cuda kernels for argmin and argmax.
2023-07-29 19:48:04 +01:00
16c33383eb Improve the mnist training example. (#276)
* Improve the mnist training example.

* Add some initialization routine that can be used for nn.

* Proper initialization in the mnist example.
2023-07-29 16:28:22 +01:00
40c80bfbb2 Merge branch 'main' into update_multiprocess 2023-07-29 16:38:35 +02:00
07eb899729 More mnist training. (#275) 2023-07-29 13:29:31 +01:00
4bf2ebf836 Use u8 tensors for masks. (#273) 2023-07-29 11:32:58 +01:00
97d8712ba5 Remove single function. 2023-07-28 23:31:25 +02:00
97181a77c0 Making multiprocess require flash-attn. 2023-07-28 23:31:24 +02:00
50d8273ae4 Support both llama v1 and llama v2. (#272) 2023-07-28 18:40:59 +01:00
7513a5e005 Line-up the llama implementation with the python-transformers one. (#271)
* Line-up the llama implementation with the python-transformers one.

* Also lineup the multiprocess version.
2023-07-28 18:31:28 +01:00
cb8dd5cd53 Back to using the main branch now that the PR has been merged. (#270) 2023-07-28 16:22:44 +01:00
a0e47aba98 Fix the revision used in starcoder to use the safetensors PR. (#269) 2023-07-28 14:02:31 +01:00
3eb2bc6d07 Softmax numerical stability. (#267)
* Softmax numerical stability.

* Fix the flash-attn test.
2023-07-28 13:13:01 +01:00
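Softmax numerical stability is the classic max-subtraction trick: softmax is shift-invariant, so subtracting the row maximum before exponentiating keeps `exp` from overflowing without changing the result. A sketch of the stable form (not candle's kernel code):

```python
import math

def softmax(xs: list[float]) -> list[float]:
    # softmax(x) == softmax(x - c) for any constant c; with c = max(x)
    # every exponent is <= 0, so exp() cannot overflow.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]
```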
68eab38de6 Cuda fix for starcoder. (#266)
* Cuda fix for starcoder.

* Nicer output.
2023-07-28 12:13:41 +01:00
4002968cf5 Put back `"dep:half"`. 2023-07-28 10:34:21 +00:00
be256a6ba6 Fixing. 2023-07-28 10:23:05 +00:00
d2dea11ef6 Fixing nccl feature. 2023-07-28 12:19:20 +02:00