Commit Graph

12 Commits

Author SHA1 Message Date
c78ce76501 Add a simple Module trait and implement it for the various nn layers (#500)
* Start adding the module trait.

* Use the module trait.

* Implement module for qmatmul.
2023-08-18 09:38:22 +01:00
13401df4d1 Add an abstract type for RmsNorm. (#499) 2023-08-18 08:52:14 +01:00
d32e8199cd Layer norm tweaks (#482)
* Add some options to make layer-norm more configurable.

* Add the rms-norm variant.

* Replace the RmsNorm with the shared bits.
2023-08-17 10:07:13 +01:00
ff876c2103 Llama more training (#297)
* Rework the var-builder to handle initializations.

* Add some helper functions for layer creation.

* Improve the layer initializations.

* Get initialized variables.

* Precompute the rot embeddings when training lamas.
2023-08-01 19:53:41 +01:00
a27239f3d9 Add training for the llama2.c example (#296)
* Rework the commands and run inference by default.

* Add the training module and load the training dataset.

* Random dataset iterator.

* Proper valid-loss computation.

* Compute the evaluation loss.

* Add more substance to the training loop.
2023-08-01 17:23:07 +01:00
75e0448114 Move the weight bits in a separate module. (#295) 2023-08-01 10:37:06 +01:00
9ae1f6afee Add an eval mode to llama2-c (#288)
* Add an eval mode to llama2-c.

* Encode line by line.

* Get the eval to run.
2023-07-31 17:22:14 +01:00
b3ea96b62b Add a prompt and support more models in llama2-c. (#285)
* Support more models in llama2-c.

* Add a prompt.
2023-07-31 13:09:30 +01:00
4bf2ebf836 Use u8 tensors for masks. (#273) 2023-07-29 11:32:58 +01:00
3eb2bc6d07 Softmax numerical stability. (#267)
* Softmax numerical stability.

* Fix the flash-attn test.
2023-07-28 13:13:01 +01:00
550a13a547 Use the binary decoder for llama2.c. (#230)
* Use the binary decoder for llama2.c.

* Add the temperature.

* Formatting tweak.

* Fix the rotary embeddings.
2023-07-24 10:56:08 +01:00
35b65fed88 Add llama2.c as an example. (#229)
* Start adding llama2.c.

* Model loading.

* Add the llama-v2 model.

* Start converting the weights.

* Rotary embedding tweaks.

* Get the model to generate some tokens.
2023-07-24 09:13:50 +01:00