Commit Graph

34 Commits

01545f7303 Add a slice_set op. (#2193)
* Add a slice_set op.

* Add some testing.

* Add the dedicated kv-cache module.

* Derive debug and clone.

* Expose more kv-cache functions.

* Return the current data when appending.

* Use the new cache in the quantized phi3 model.
2024-05-18 15:58:18 +02:00
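The kv-cache idea in this commit can be illustrated with a plain-Rust sketch. Assumption: this is a hypothetical `KvCache` over a flat `Vec<f32>`, not candle's actual module. It preallocates `max_len` rows of width `dim`. Appending writes the new rows in place, which is the role a `slice_set`-style op plays, instead of concatenating into a fresh buffer. It then returns the data cached so far.

```rust
/// Hypothetical, simplified KV cache: a preallocated buffer of `max_len` rows,
/// each `dim` values wide. Appends write in place (slice_set-style) rather than
/// concatenating into a new allocation.
#[derive(Debug, Clone)]
pub struct KvCache {
    data: Vec<f32>,
    dim: usize,
    max_len: usize,
    current_len: usize,
}

impl KvCache {
    pub fn new(dim: usize, max_len: usize) -> Self {
        Self { data: vec![0.0; dim * max_len], dim, max_len, current_len: 0 }
    }

    /// Appends `rows` (a multiple of `dim` values) and returns all cached data.
    pub fn append(&mut self, rows: &[f32]) -> Result<&[f32], String> {
        if rows.len() % self.dim != 0 {
            return Err(format!("input length {} is not a multiple of dim {}", rows.len(), self.dim));
        }
        let n_rows = rows.len() / self.dim;
        if self.current_len + n_rows > self.max_len {
            return Err(format!("cache overflow: {} + {} > {}", self.current_len, n_rows, self.max_len));
        }
        let start = self.current_len * self.dim;
        // In-place write into the preallocated buffer.
        self.data[start..start + rows.len()].copy_from_slice(rows);
        self.current_len += n_rows;
        Ok(&self.data[..self.current_len * self.dim])
    }

    pub fn current_len(&self) -> usize {
        self.current_len
    }
}
```

Preallocating trades memory for avoiding a reallocation and copy of the whole cache on every decoding step.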
1b98f84a2b Fast kernels for rotary embeddings. (#1928)
* Fast kernels for rotary embeddings.

* Add a test for the fast CPU kernel.

* Rope cuda bindings.

* Cuda kernel.

* Metal kernel (part 1).

* Cuda kernels.

* Finish the metal kernel.

* Use the new kernels in the quantized example.

* Fix warning.
2024-03-24 22:48:52 +01:00
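What such a CPU rotary-embedding kernel computes can be sketched for a single head vector. Assumption: this uses the interleaved-pair convention, where (x[2i], x[2i+1]) is rotated by pos * base^(-2i/d). Other layouts, such as rotating the two halves of the vector, exist, and this is not candle's kernel.

```rust
/// Applies rotary position embeddings to one head vector `x` (even length `d`)
/// at position `pos`, using the interleaved-pair convention: the pair
/// (x[2i], x[2i+1]) is rotated by the angle pos * base^(-2i/d).
pub fn rope_interleaved(x: &[f32], pos: usize, base: f32) -> Vec<f32> {
    let d = x.len();
    assert!(d % 2 == 0, "head dimension must be even");
    let mut out = vec![0.0f32; d];
    for i in 0..d / 2 {
        let exponent = -(2.0 * i as f32) / d as f32;
        let angle = pos as f32 * base.powf(exponent);
        let (sin, cos) = angle.sin_cos();
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        // 2D rotation of the pair.
        out[2 * i] = a * cos - b * sin;
        out[2 * i + 1] = a * sin + b * cos;
    }
    out
}
```

Each pair is rotated by a pure rotation, so vector norms are preserved, and position 0 leaves the input unchanged.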
c753f72c85 Support for attention bias in gemma + refactor things a bit. (#1744)
* Support for attention bias in gemma + refactor things a bit.

* Fix the cuda tests.
2024-02-22 09:35:28 +01:00
41416d2376 Expose more conv1d functions/structs. (#1726) 2024-02-17 18:50:55 +01:00
41614b4a9b Add one-hot/cold encoding (#1489)
* add one-hot encoding

* one_hot: improve error handling, use generic to_vecN::<D>

Bails if the index value is equal to or greater than the depth value,
which would result in an out-of-bounds error.

A redundant check is added to ensure the index value does not exceed
the size of the one-hot matrix, which would also result in an
out-of-bounds error.

Bails if the index value is less than -1. If the index value is -1,
then the on_value is not set for that index. Only values that are
less than -1 are considered errors.

* one-hot: use two generics, one_hot::<I, O>, for input and output data types

Separating the input and output data types allows the input tensor
indices to be a different data type than the output encoded tensor data type.

For example, one_hot::<i64, u8>(...) will take an input tensor of i64 values
and encode the output tensor using u8 values.

The generic I::DTYPE must match the data type of the input indices, otherwise
the method will bail.

Additionally, the method gains an `allow_f64` option that allows the input
indices to be f64 values. f64 indices are disabled by default.

TODO: the indices data type and the generic I data type are currently not
checked at compile time.

* one_hot: remove input generic, use indices dtype matching

This commit removes the to_f64() type cast and explicitly
matches the DType of the input tensor. Currently, only U8,
U32 and I64 are supported for input tensors.

The match arms on the dtype are verbose. It would be nice
to use a generic type with the WithDtype trait bound to
pass to the to_vecN method and then return an inner value.

Open to suggestions for better approaches here to reduce
the match arm verbosity.

* one_hot: use flat_map iterator over dims instead of nested for loop

This commit replaces the nested for loops with a flat_map iterator over
the dimensions of the input tensor.

This commit also adds a test for a rank 3 input tensor.

* one_hot: use mandatory on/off-values, remove const msgs

This commit also updates doc tests, comments and test cases.

* Small cleanups.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-01-01 11:18:40 +01:00
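The index rules described above (bail on index >= depth, bail on index < -1, leave the row at off_value for -1) can be sketched over a flat list of i64 indices. Assumption: this hypothetical helper handles a 1-D input only, while the commit walks tensors of any rank with a flat_map over the dims. It is not candle's `one_hot` signature.

```rust
/// Hypothetical 1-D sketch of one-hot encoding with mandatory on/off values.
/// - index >= depth  -> error (out of bounds)
/// - index <  -1     -> error
/// - index == -1     -> row left entirely at `off_value`
pub fn one_hot<T: Copy>(
    indices: &[i64],
    depth: usize,
    on_value: T,
    off_value: T,
) -> Result<Vec<Vec<T>>, String> {
    indices
        .iter()
        .map(|&idx| {
            if idx < -1 {
                return Err(format!("invalid index {idx}: must be >= -1"));
            }
            if idx >= depth as i64 {
                return Err(format!("index {idx} out of range for depth {depth}"));
            }
            let mut row = vec![off_value; depth];
            if idx >= 0 {
                row[idx as usize] = on_value;
            }
            Ok(row)
        })
        .collect()
}
```

Mirroring the `one_hot::<i64, u8>` example above, the input indices are i64 while the output element type is chosen independently through `T`.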
b5c283e86f Add the prelu layer. (#1402) 2023-12-03 16:06:09 +00:00
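PReLU computes f(x) = max(0, x) + alpha * min(0, x) with a learnable alpha. As a simplifying assumption, this plain-Rust sketch puts the channel in the innermost (last) position, with one alpha per channel. A single shared alpha is the one-element case. Real layers usually index the channel dimension of an NCHW tensor instead.

```rust
/// PReLU over a flat slice whose innermost dimension is the channel:
/// element i uses alpha[i % alpha.len()]. A one-element `alpha` is shared.
pub fn prelu(x: &[f32], alpha: &[f32]) -> Vec<f32> {
    let c = alpha.len();
    assert!(c > 0 && x.len() % c == 0, "input length must be a multiple of alpha length");
    x.iter()
        .enumerate()
        .map(|(i, &v)| if v >= 0.0 { v } else { alpha[i % c] * v })
        .collect()
}
```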
55bc3382cf Allow for different behavior between training and eval (#1213)
* Forward with training.

* Do not use dropout on vgg evaluation.
2023-10-29 07:53:09 +01:00
99cf13e8e2 Add the sequential layer. (#1136) 2023-10-20 16:08:50 +01:00
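A sequential layer applies its children in order, feeding each output into the next layer. The sketch below uses hypothetical stand-ins: a simplified `Module` trait over `Vec<f32>` and a `Sequential` container, not candle's `Tensor`-based API. Any closure of the right shape counts as a layer.

```rust
/// Simplified stand-in for a layer interface.
pub trait Module {
    fn forward(&self, xs: Vec<f32>) -> Vec<f32>;
}

/// Any closure `Fn(Vec<f32>) -> Vec<f32>` can be used as a layer.
impl<F: Fn(Vec<f32>) -> Vec<f32>> Module for F {
    fn forward(&self, xs: Vec<f32>) -> Vec<f32> {
        self(xs)
    }
}

#[derive(Default)]
pub struct Sequential {
    layers: Vec<Box<dyn Module>>,
}

impl Sequential {
    pub fn new() -> Self {
        Self::default()
    }

    /// Builder-style: appends a layer and returns the container.
    pub fn add<M: Module + 'static>(mut self, layer: M) -> Self {
        self.layers.push(Box::new(layer));
        self
    }
}

impl Module for Sequential {
    /// Threads the input through every layer in insertion order.
    fn forward(&self, xs: Vec<f32>) -> Vec<f32> {
        self.layers.iter().fold(xs, |acc, layer| layer.forward(acc))
    }
}
```

Usage: `Sequential::new().add(|xs: Vec<f32>| ...).add(...)` builds the chain, and a single `forward` call runs it.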
000fa00e31 Expose the conv2d-transpose layers. (#761) 2023-09-07 06:04:52 +01:00
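For reference, this is the output-size relation for a transposed 2D convolution as documented for PyTorch's `ConvTranspose2d`. That candle's layer uses the same parameter conventions is an assumption. Here s is the stride, p the padding, d the dilation, k the kernel size and p_out the output padding, and the relation applies per spatial dimension (W is analogous).

```latex
H_{\text{out}} = (H_{\text{in}} - 1)\,s - 2p + d\,(k - 1) + p_{\text{out}} + 1
```

Worked example: H_in = 4, s = 2, p = 1, d = 1, k = 3, p_out = 1 gives (3)(2) - 2 + 2 + 1 + 1 = 8, so the spatial size is doubled.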
7529531056 Add the optimizer trait. (#702) 2023-09-01 12:55:39 +01:00
db59816087 Add a GRU layer. (#688)
* Add a GRU layer.

* Fix the n gate computation.
2023-08-31 08:43:10 +01:00
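The "n gate" fix concerns where the reset gate is applied. The equations below are the GRU formulation documented for PyTorch's `nn.GRU`. The assumption that candle follows the same layout is not confirmed by this log. Note that r_t multiplies the whole hidden-side term of n_t, including its bias b_hn.

```latex
\begin{aligned}
r_t &= \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr}) \\
z_t &= \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz}) \\
n_t &= \tanh\big(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{t-1} + b_{hn})\big) \\
h_t &= (1 - z_t) \odot n_t + z_t \odot h_{t-1}
\end{aligned}
```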
3159982a89 Add a Dropout layer (#676)
* Add a dropout layer.

* Add an actual layer.
2023-08-30 16:19:28 +01:00
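A dropout layer is usually implemented as "inverted dropout". During training, each element is zeroed with probability p, and survivors are scaled by 1/(1-p) so the expected value is unchanged. At evaluation time the input passes through untouched, which is why a training flag is needed. To stay dependency-free, this sketch uses a tiny hypothetical xorshift64 generator as a stand-in for a real RNG.

```rust
/// Minimal xorshift64 generator (Marsaglia shifts 13/7/17); illustration only.
struct XorShift64(u64);

impl XorShift64 {
    /// Returns a uniform value in [0, 1) built from the top 24 bits.
    fn next_f32(&mut self) -> f32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        (x >> 40) as f32 / (1u64 << 24) as f32
    }
}

/// Inverted dropout: identity when `train` is false; otherwise zero each
/// element with probability `p` and scale the survivors by 1 / (1 - p).
pub fn dropout(x: &[f32], p: f32, train: bool, seed: u64) -> Vec<f32> {
    assert!((0.0..1.0).contains(&p), "drop probability must be in [0, 1)");
    if !train || p == 0.0 {
        return x.to_vec();
    }
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    let scale = 1.0 / (1.0 - p);
    x.iter()
        .map(|&v| if rng.next_f32() < p { 0.0 } else { v * scale })
        .collect()
}
```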
f35b9f6baa Add some recurrent neural networks (#674)
* Add the rnn module.

* More LSTM.

* Implement the RNN forward pass.

* More forward pass for LSTM.
2023-08-30 13:27:09 +01:00
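The standard recurrences behind the RNN and LSTM forward passes are shown below, following the formulation documented for PyTorch's `nn.RNN` and `nn.LSTM` (the tanh variant for the plain RNN). Whether candle uses exactly this gate order and bias layout is an assumption.

```latex
\begin{aligned}
\text{RNN:}\quad h_t &= \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{t-1} + b_{hh}) \\[4pt]
\text{LSTM:}\quad i_t &= \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\
f_t &= \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\
g_t &= \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\
o_t &= \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```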
2d3fcad267 Simplify usage of the pool functions. (#662)
* Simplify usage of the pool functions.

* Small tweak.

* Attempt at using apply to simplify the convnet definition.
2023-08-29 19:12:16 +01:00
e3d2786ffb Add a couple of functions required for yolo. (#527) 2023-08-20 17:02:05 +01:00
d2622a8160 Move the VarMap to a separate file (#525)
* Move the var-map struct into a separate file.

* Fix some typos.
2023-08-20 14:25:07 +01:00
42e1cc8062 Add a batch normalization layer (#508)
* Add BatchNormalization.

* More batch-norm.

* Add some validation of the inputs.

* More validation.
2023-08-18 20:05:56 +01:00
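Training-mode batch normalization can be sketched on an input laid out as `batch` rows of `features` values. Each feature is normalized with the mean and biased variance computed across the batch, then scaled by gamma and shifted by beta. The validation checks mirror the spirit of the commit. Assumption: this hypothetical function omits the running statistics that evaluation mode would use.

```rust
/// Training-mode batch norm over `x`, laid out as batch rows of `features` values.
pub fn batch_norm(
    x: &[f32],
    features: usize,
    gamma: &[f32],
    beta: &[f32],
    eps: f32,
) -> Result<Vec<f32>, String> {
    // Input validation: shapes must line up and eps must be non-negative.
    if features == 0 || x.is_empty() || x.len() % features != 0 {
        return Err(format!("input of length {} does not split into rows of {} features", x.len(), features));
    }
    if gamma.len() != features || beta.len() != features {
        return Err(format!("gamma/beta must have {features} elements"));
    }
    if eps < 0.0 {
        return Err(format!("eps must be non-negative, got {eps}"));
    }
    let batch = x.len() / features;
    let mut out = vec![0.0f32; x.len()];
    for f in 0..features {
        // Per-feature mean and biased variance across the batch.
        let mean = (0..batch).map(|b| x[b * features + f]).sum::<f32>() / batch as f32;
        let var = (0..batch)
            .map(|b| {
                let d = x[b * features + f] - mean;
                d * d
            })
            .sum::<f32>()
            / batch as f32;
        let inv_std = 1.0 / (var + eps).sqrt();
        for b in 0..batch {
            let i = b * features + f;
            out[i] = (x[i] - mean) * inv_std * gamma[f] + beta[f];
        }
    }
    Ok(out)
}
```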
c78ce76501 Add a simple Module trait and implement it for the various nn layers (#500)
* Start adding the module trait.

* Use the module trait.

* Implement module for qmatmul.
2023-08-18 09:38:22 +01:00
13401df4d1 Add an abstract type for RmsNorm. (#499) 2023-08-18 08:52:14 +01:00
d32e8199cd Layer norm tweaks (#482)
* Add some options to make layer-norm more configurable.

* Add the rms-norm variant.

* Replace the RmsNorm with the shared bits.
2023-08-17 10:07:13 +01:00
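Layer norm and RMS norm differ mainly in whether the mean is subtracted before dividing by the root of the mean squared value. That is why they can share code. Assumption: the hypothetical `remove_mean` flag and the bias-free form below are chosen for illustration and may not match the crate's configuration options.

```rust
/// Shared normalization over one vector with a per-element `weight`.
/// remove_mean = true  -> layer norm (without bias): (x - mean) / sqrt(var + eps)
/// remove_mean = false -> RMS norm: x / sqrt(mean(x^2) + eps)
pub fn norm(x: &[f32], weight: &[f32], remove_mean: bool, eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len());
    let n = x.len() as f32;
    let mean = if remove_mean { x.iter().sum::<f32>() / n } else { 0.0 };
    // Mean of squared (optionally centered) values.
    let mean_sq = x.iter().map(|v| (v - mean) * (v - mean)).sum::<f32>() / n;
    let inv = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| (v - mean) * inv * w).collect()
}
```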
d34039e352 Add a stable diffusion example (#328)
* Start adding a stable-diffusion example.

* Proper computation of the causal mask.

* Add the chunk operation.

* Work in progress: port the attention module.

* Add some dummy modules for conv2d and group-norm, get the attention module to compile.

* Re-enable the 2d convolution.

* Add the embeddings module.

* Add the resnet module.

* Add the unet blocks.

* Add the unet.

* And add the variational auto-encoder.

* Use the pad function from utils.
2023-08-06 17:49:43 +01:00
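The causal-mask computation mentioned in this commit can be sketched as an additive mask. Position i may attend to positions j <= i, which get 0.0, while future positions get negative infinity and vanish after the softmax. Assumption: this hypothetical helper returns nested `Vec`s rather than a tensor, and the exact mask representation in the example is not confirmed by this log.

```rust
/// Builds a seq_len x seq_len additive causal attention mask:
/// 0.0 where j <= i, negative infinity where j > i (future positions).
pub fn causal_mask(seq_len: usize) -> Vec<Vec<f32>> {
    (0..seq_len)
        .map(|i| {
            (0..seq_len)
                .map(|j| if j > i { f32::NEG_INFINITY } else { 0.0 })
                .collect()
        })
        .collect()
}
```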
620f83cf66 Add the candle-datasets crate (#322)
* Move the vision datasets to a separate crate.

* Move the batcher bits.

* Update the readme.

* Move the tiny-stories bits.

---------

Co-authored-by: Jane Doe <jane.doe@example.org>
2023-08-05 08:56:50 +01:00
0902846f25 Add the AdamW optimizer. (#307)
* Add the AdamW optimizer.

* Add some AdamW tests validated against PyTorch.
2023-08-02 14:03:49 +01:00
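A per-parameter AdamW step can be sketched as follows. AdamW differs from Adam with L2 regularization in that weight decay shrinks the parameter directly instead of being added to the gradient. The defaults below (beta1 = 0.9, beta2 = 0.999, eps = 1e-8, weight_decay = 0.01) match PyTorch's `AdamW` defaults. Assumption: the struct and method names are hypothetical and do not describe candle's optimizer API.

```rust
/// Hypothetical AdamW over a flat parameter slice.
pub struct AdamW {
    pub lr: f32,
    pub beta1: f32,
    pub beta2: f32,
    pub eps: f32,
    pub weight_decay: f32,
    m: Vec<f32>, // first-moment estimates
    v: Vec<f32>, // second-moment estimates
    t: i32,      // step counter for bias correction
}

impl AdamW {
    pub fn new(n_params: usize, lr: f32) -> Self {
        Self {
            lr, beta1: 0.9, beta2: 0.999, eps: 1e-8, weight_decay: 0.01,
            m: vec![0.0; n_params], v: vec![0.0; n_params], t: 0,
        }
    }

    pub fn step(&mut self, params: &mut [f32], grads: &[f32]) {
        assert_eq!(params.len(), self.m.len());
        assert_eq!(grads.len(), self.m.len());
        self.t += 1;
        let bc1 = 1.0 - self.beta1.powi(self.t);
        let bc2 = 1.0 - self.beta2.powi(self.t);
        for i in 0..params.len() {
            let g = grads[i];
            // Decoupled weight decay: shrink the parameter, not the gradient.
            params[i] *= 1.0 - self.lr * self.weight_decay;
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g;
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g;
            let m_hat = self.m[i] / bc1;
            let v_hat = self.v[i] / bc2;
            params[i] -= self.lr * m_hat / (v_hat.sqrt() + self.eps);
        }
    }
}
```

On the first step the bias-corrected moments reduce to g and g², so the update is roughly lr * sign(g) on top of the decay.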
ff876c2103 Llama more training (#297)
* Rework the var-builder to handle initializations.

* Add some helper functions for layer creation.

* Improve the layer initializations.

* Get initialized variables.

* Precompute the rot embeddings when training llamas.
2023-08-01 19:53:41 +01:00
e1e8127f15 Add the batcher. (#293) 2023-08-01 09:16:10 +01:00
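A batcher groups a stream of samples into fixed-size batches. Assumption: this hypothetical iterator adapter keeps a shorter final batch. Some batchers drop it instead, and the log does not say which behavior the crate chose.

```rust
/// Groups items from `inner` into batches of `batch_size`; the final batch may be shorter.
pub struct Batcher<I: Iterator> {
    inner: I,
    batch_size: usize,
}

impl<I: Iterator> Iterator for Batcher<I> {
    type Item = Vec<I::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        // Pull up to batch_size items; an empty pull means the source is exhausted.
        let batch: Vec<I::Item> = self.inner.by_ref().take(self.batch_size).collect();
        if batch.is_empty() { None } else { Some(batch) }
    }
}

pub fn batcher<I: Iterator>(inner: I, batch_size: usize) -> Batcher<I> {
    assert!(batch_size > 0, "batch size must be positive");
    Batcher { inner, batch_size }
}
```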
16c33383eb Improve the mnist training example. (#276)
* Improve the mnist training example.

* Add some initialization routine that can be used for nn.

* Proper initialization in the mnist example.
2023-07-29 16:28:22 +01:00
1f26042693 Move some shared functions to the nn module. (#221) 2023-07-22 13:25:11 +01:00
2a74019ec6 Vision dataset (#179)
* Add some readers for the mnist dataset.

* Import the cifar and mnist dataset.
2023-07-16 23:43:55 +01:00
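The MNIST files use the IDX format. The header is two zero bytes, a type code (0x08 for unsigned bytes), and the number of dimensions, followed by each dimension as a big-endian u32. For MNIST that is [N, 28, 28] for images and [N] for labels. Assumption: this is a hypothetical header parser for already-decompressed bytes, not the crate's reader.

```rust
/// Parses an unsigned-byte IDX buffer, returning the dims and the payload.
pub fn parse_idx_u8(bytes: &[u8]) -> Result<(Vec<usize>, &[u8]), String> {
    if bytes.len() < 4 || bytes[0] != 0 || bytes[1] != 0 {
        return Err("invalid IDX magic".to_string());
    }
    if bytes[2] != 0x08 {
        return Err(format!("unsupported IDX type code {:#04x}", bytes[2]));
    }
    let ndims = bytes[3] as usize;
    let header_len = 4 + 4 * ndims;
    if bytes.len() < header_len {
        return Err("truncated IDX header".to_string());
    }
    // Each dimension is a big-endian u32 following the 4-byte magic.
    let dims: Vec<usize> = (0..ndims)
        .map(|d| {
            let o = 4 + 4 * d;
            u32::from_be_bytes([bytes[o], bytes[o + 1], bytes[o + 2], bytes[o + 3]]) as usize
        })
        .collect();
    let data = &bytes[header_len..];
    let expected: usize = dims.iter().product();
    if data.len() != expected {
        return Err(format!("expected {expected} payload bytes, found {}", data.len()));
    }
    Ok((dims, data))
}
```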
ded93a1169 Add the SGD optimizer (#160)
* Add the nn::optim and some conversion traits.

* Add the backward_step function for SGD.

* Get the SGD optimizer to work and add a test.

* Make the test slightly simpler.
2023-07-13 19:05:44 +01:00
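The SGD update is θ ← θ − lr · ∇θ. The sketch below pairs it with a minimal optimizer trait so that other optimizers can share one `step` interface. Assumption: both `Optimizer` and `Sgd` here are hypothetical simplifications over flat slices, not the `nn::optim` API.

```rust
/// Hypothetical optimizer interface over flat parameter and gradient slices.
pub trait Optimizer {
    fn step(&mut self, params: &mut [f32], grads: &[f32]);
}

/// Plain stochastic gradient descent: theta <- theta - lr * grad.
pub struct Sgd {
    pub lr: f32,
}

impl Optimizer for Sgd {
    fn step(&mut self, params: &mut [f32], grads: &[f32]) {
        assert_eq!(params.len(), grads.len());
        for (p, g) in params.iter_mut().zip(grads) {
            *p -= self.lr * g;
        }
    }
}
```

Usage: minimizing (w - 3)² with gradient 2(w - 3) and lr = 0.1 shrinks the error by a factor of 0.8 per step, so w converges to 3.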
b31a3bbdcb Sketch the tensor initialization module. (#134) 2023-07-11 07:41:46 +01:00
1aa7fbbc33 Move the var-builder to a central place. (#130) 2023-07-10 20:49:50 +01:00
89a5b602a6 Move the conv1d layer to candle_nn. (#117) 2023-07-10 11:02:06 +01:00
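The shape arithmetic of a conv1d layer can be seen in a naive single-channel version. Like most deep learning libraries it computes a cross-correlation, and its output length is floor((len + 2·padding − dilation·(k − 1) − 1) / stride) + 1. Assumption: this hypothetical function ignores batches, multiple channels and groups.

```rust
/// Naive single-channel 1D convolution (cross-correlation) with zero padding,
/// stride and dilation.
pub fn conv1d(x: &[f32], kernel: &[f32], padding: usize, stride: usize, dilation: usize) -> Vec<f32> {
    let k = kernel.len();
    assert!(k > 0 && stride > 0 && dilation > 0);
    let padded = x.len() + 2 * padding;
    let span = dilation * (k - 1) + 1; // receptive field of one output
    if padded < span {
        return Vec::new();
    }
    let out_len = (padded - span) / stride + 1;
    (0..out_len)
        .map(|o| {
            (0..k)
                .map(|j| {
                    // Position in the padded input; padding positions read as 0.
                    let p = o * stride + j * dilation;
                    if p < padding || p - padding >= x.len() { 0.0 } else { x[p - padding] * kernel[j] }
                })
                .sum::<f32>()
        })
        .collect()
}
```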
b06e1a7e54 [nn] Move the Embedding and Activation parts. (#116)
* Share the Embedding and Activation parts.

* Tweak some activations.
2023-07-10 10:24:52 +01:00
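An embedding layer is a table lookup: each token id selects one row of a `vocab_size x hidden` matrix. Assumption: this hypothetical sketch stores the table row-major in a flat slice and reports out-of-range ids as errors.

```rust
/// Looks up the rows for `ids` in a row-major `vocab_size x hidden` table and
/// concatenates them into one flat output of length ids.len() * hidden.
pub fn embedding(table: &[f32], hidden: usize, ids: &[usize]) -> Result<Vec<f32>, String> {
    if hidden == 0 || table.len() % hidden != 0 {
        return Err(format!("table length {} is not a multiple of hidden size {}", table.len(), hidden));
    }
    let vocab = table.len() / hidden;
    let mut out = Vec::with_capacity(ids.len() * hidden);
    for &id in ids {
        if id >= vocab {
            return Err(format!("id {id} out of range for vocab size {vocab}"));
        }
        out.extend_from_slice(&table[id * hidden..(id + 1) * hidden]);
    }
    Ok(out)
}
```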
9ce0f1c010 Sketch the candle-nn crate. (#115)
* Sketch the candle-nn crate.

* Tweak the cuda dependencies.

* More cuda tweaks.
2023-07-10 08:50:09 +01:00