* Add a slice_set op.
* Add some testing.
* Add the dedicated kv-cache module.
* Derive debug and clone.
* Expose more kv-cache functions.
* Return the current data when appending.
* Use the new cache in the quantized phi3 model.
* Fast kernels for rotary embeddings.
* Add a test for the fast CPU kernel.
* Rope cuda bindings.
* Cuda kernel.
* Metal kernel (part 1).
* Cuda kernels.
* Finish the metal kernel.
* Use the new kernels in the quantized example.
* Fix warning.
* Add one-hot encoding.
* one_hot: improve error handling, use generic to_vecN::<D>
Bails if the index value is equal to or greater than the depth value,
which would result in an out-of-bounds error.
A redundant check is added to ensure the index value does not exceed
the size of the one-hot matrix, which would also result in an
out-of-bounds error.
Bails if the index value is less than -1. If the index value is -1,
the on_value is not set for that index. Only values that are less
than -1 are considered errors.
* one-hot: use two generics, one_hot::<I, O>, for input and output data types
Separating the input and output data types allows the input tensor
indices to be a different data type than the output encoded tensor data type.
For example, one_hot::<i64, u8>(...) will take an input tensor of i64 values
and encode the output tensor using u8 values.
The generic I::DTYPE must match the data type of the input indices, otherwise
the method will bail.
Additionally, this method adds an `allow_f64` option to enable the input indices
data type to be f64 values. f64 values are disabled by default.
TODO: indices data type and the generic I data type are currently not compile-time
checked.
* one_hot: remove input generic, use indices dtype matching
This commit removes the to_f64() type cast and explicitly
matches the DType from the input tensor. Currently, only U8,
U32 and I64 are supported for input tensors.
The match arms on the dtype are verbose. It would be nice
to use a generic type with the WithDtype trait bound to
pass to the to_vecN method and then return an inner value.
Open to suggestions for better approaches here to reduce
the match arm verbosity.
* one_hot: use flat_map iterator over dims instead of nested for loop
This commit replaces the nested for loops with a flat_map iterator over
the dimensions of the input tensor.
This commit also adds a test for a rank 3 input tensor.
* one_hot: use mandatory on/off-values, remove const msgs
This commit also updates doc tests, comments and test cases.
* Small cleanups.
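The index handling described in the one_hot commits above can be sketched in plain Rust, without any candle types. This is only an illustration under stated assumptions: the function name `one_hot_flat`, the `String` error type, and the fixed `i64` input and `u8` output types are hypothetical, and the real one_hot signature may differ. The input is treated as an already-flattened index tensor.

```rust
/// Hypothetical stand-alone sketch, not the candle implementation.
/// Encodes flattened `indices` into a flat buffer of length
/// `indices.len() * depth`, one row of `depth` values per index.
pub fn one_hot_flat(
    indices: &[i64],
    depth: usize,
    on_value: u8,
    off_value: u8,
) -> Result<Vec<u8>, String> {
    // Every slot starts at the off value.
    let mut out = vec![off_value; indices.len() * depth];
    for (row, &idx) in indices.iter().enumerate() {
        match idx {
            // -1 is allowed: the row is left entirely at the off value.
            -1 => continue,
            // Anything below -1 is an error.
            i if i < -1 => {
                return Err(format!("invalid index {} at position {}", i, row));
            }
            // An index equal to or greater than the depth would be out of bounds.
            i if i as usize >= depth => {
                return Err(format!(
                    "index {} at position {} is out of range for depth {}",
                    i, row, depth
                ));
            }
            // Valid index: set the on value in this row.
            i => out[row * depth + i as usize] = on_value,
        }
    }
    Ok(out)
}
```

With a depth of 3, the indices `[0, 2, -1]` give three rows: the first has the on value in column 0, the second in column 2, and the third stays all off. An index of 3 or -2 returns an error instead.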
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
* Start adding a stable-diffusion example.
* Proper computation of the causal mask.
* Add the chunk operation.
* Work in progress: port the attention module.
* Add some dummy modules for conv2d and group-norm, get the attention module to compile.
* Re-enable the 2d convolution.
* Add the embeddings module.
* Add the resnet module.
* Add the unet blocks.
* Add the unet.
* And add the variational auto-encoder.
* Use the pad function from utils.
* Move the vision datasets to a separate crate.
* Move the batcher bits.
* Update the readme.
* Move the tiny-stories bits.
---------
Co-authored-by: Jane Doe <jane.doe@example.org>
* Rework the var-builder to handle initializations.
* Add some helper functions for layer creation.
* Improve the layer initializations.
* Get initialized variables.
* Precompute the rot embeddings when training llamas.
* Add the nn::optim and some conversion traits.
* Add the backward_step function for SGD.
* Get the SGD optimizer to work and add a test.
* Make the test slightly simpler.