* Start adding gather.
* Gather cpu implementation + use in simple training.
* Add scatter_add for the gradient of gather.
* Simple cpu implementation of scatter_add.
* Use gather in the simple-training backprop.
* Refactor the reduce ops in order to introduce argmin/argmax.
* Clippy fixes.
* Use the newly introduced argmax.
* Fix the strided case.
* Handle the non-contiguous case.
* More realistic training setup.
* Compute the model accuracy.
* Very inefficient backprop for index select.
* More backprop.
* Fix some backprop issues.
* Backprop fix.
* Another broadcasting backprop fix.
* Better backprop for reducing ops.
* Training again.
* Add some gradient tests.
* Get the training to work.
* Use contiguous tensors for variables.
* Sketch the mnist example.
* Start adding the reduce ops.
* Renaming.
* Refactor the reduce operations.
* Bugfix for the broadcasting vectorization.
* Vectorized binary ops with mkl.
* Improve the binary op mkl support.
* Push the support for mkl binary ops.
* Proper vectorization of binary ops.
* Proper mkl'isation when broadcasting binary ops.
* Refactor the llama example to make it more in sync with the other ones.
* Make clippy happy.
* Properly load the safetensor weights.
* Get llama back to a working state for the safetensors case.
* Skeleton files for musicgen.
* Add a musicgen model module.
* Sketch the model loading.
* Start adding the forward pass.
* More forward pass.
* Positional embeddings.
* Forward for the decoder layers.
* Add an empty function.
* Fix the musicgen weight names.
* More musicgen modeling.
* Add the T5 loading bits.
* Add the encodec config.
* Add the encodec module hierarchy.
* More Encodec modeling.
* Encodec modeling.
* Encodec modeling.
* Add more to the encodec modeling.
* Load the weights.
* Populate the resnet blocks.
* Also load the conv transpose weights.
* Split musicgen in multiple files.