30 Commits

SHA1 Message Date
e635f18eda Initial support for reading ggml files. (#311)
* Start adding support for reading ggml files.

* Compute the proper tensor size.

* Print the read tensors.

* Fix file reading.
2023-08-02 21:59:02 +01:00
0902846f25 Add the AdamW optimizer. (#307)
* Add the AdamW optimizer.

* Add some AdamW tests validated against PyTorch.
2023-08-02 14:03:49 +01:00
51e51da896 Rename the candle crate to candle-core (#301)
* Rename to candle-core.

* More candle-core renaming.
2023-08-02 08:20:22 +01:00
a27239f3d9 Add training for the llama2.c example (#296)
* Rework the commands and run inference by default.

* Add the training module and load the training dataset.

* Random dataset iterator.

* Proper valid-loss computation.

* Compute the evaluation loss.

* Add more substance to the training loop.
2023-08-01 17:23:07 +01:00
6475bfadfe Simplify Tensor::randn. (#255)
* Simplify Tensor::randn.

* Also switch Tensor::rand to use a generic dtype.

* Support sampling for f16.

* Cleanup.
2023-07-27 07:40:36 +01:00
d9f9c859af Add flash attention (#241)
* Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab.

* More flash attn.

* Set up the flash attn parameters.

* Get things to compile locally.

* Move the flash attention files to a different directory.

* Build the static C library with nvcc.

* Add more flash attention.

* Update the build part.

* Better caching.

* Exclude flash attention from the default workspace.

* Put flash-attn behind a feature gate.

* Get the flash attn kernel to run.

* Move the flags to a more appropriate place.

* Enable flash attention in llama.

* Use flash attention in llama.
2023-07-26 07:48:10 +01:00
23827c49cd Cleanup some todos. (#226)
* Cleanup some todos.

* Fix more todos.

* Optimize for the contiguous case.

* Add the IntDType trait.

* Handle the IntDType trait for more ops.

* Remove a todo.

* Remove a todo.
2023-07-23 16:00:00 +01:00
a6bcdfb269 Custom ops with a single argument (#214)
* Add the CustomOp1 trait.

* Add an example of custom op.

* Polish the custom op example.

* Add some backward pass tests for custom ops.
2023-07-21 15:18:05 +01:00
4845d5cc64 More realistic training setup. (#210)
* More realistic training setup.

* Compute the model accuracy.

* Very inefficient backprop for index select.

* More backprop.

* Fix some backprop issues.

* Backprop fix.

* Another broadcasting backprop fix.

* Better backprop for reducing ops.

* Training again.

* Add some gradient tests.

* Get the training to work.
2023-07-20 18:25:41 +01:00
acb2f90469 Broadcasting performance optimization (cpu) (#182)
* Avoid recomputing the index from scratch each time.

* More performance optimisations.
2023-07-17 13:41:09 +01:00
18ea92d83b Iteration over strided blocks (#175)
* Introduce the strided blocks.

* Use the strided blocks to speed up the copy.

* Add more testing.
2023-07-15 21:30:35 +01:00
ded93a1169 Add the SGD optimizer (#160)
* Add the nn::optim and some conversion traits.

* Add the backward_step function for SGD.

* Get the SGD optimizer to work and add a test.

* Make the test slightly simpler.
2023-07-13 19:05:44 +01:00
5ee3c95582 Move the variable creation to the variable module. (#159)
* Move the variable creation to the variable module.

* Make it possible to set a variable.

* Add some basic gradient descent test.

* Get the gradient descent test to work.
2023-07-13 16:55:40 +01:00
6991036bc5 Introduce the variables api used for adjusting parameters during the training loop. (#158)
* Add the variable api.

* And add a comment.
2023-07-13 14:09:51 +01:00
20599172ac Add from_iter and arange, use them in the doctests. (#145) 2023-07-12 12:03:01 +01:00
fa760759e5 Allow for lazy loading of npz files, use it in llama to reduce memory usage in the cpu version. (#141) 2023-07-11 20:22:34 +01:00
64264d97c1 Modular backends (#138)
* Add a trait to formalize backends.

* Use the generic backend trait.
2023-07-11 11:17:02 +01:00
fba07d6b6b Merge pull request #127 from LaurentMazare/tensor_indexing
`i(..)` indexing sugar (partial).
2023-07-10 19:56:34 +02:00
ef0375d8bc i(..) indexing sugar (partial).
- Only range, and select (no tensor_select)
- No negative indexing
2023-07-10 17:34:04 +02:00
e2807c78a4 Enable the doctests to run with mkl (though they are broken for now). (#126) 2023-07-10 16:27:46 +01:00
548b1df7ea Remove the dependency on blas and use mkl directly. (#125) 2023-07-10 15:52:03 +01:00
868743b8b9 Expand the README a bit 2023-07-10 12:51:37 +02:00
2c3d871b2e Add a simpler way to specify the dim index for some ops. 2023-07-05 20:22:43 +01:00
a424d95473 Add more of the conv1d op. 2023-07-04 11:15:45 +01:00
cf2789fb81 Move some safetensors bits into the candle-core crate. 2023-07-03 08:37:46 +01:00
c1bbbf94f6 Start refactoring the stride. 2023-06-28 12:57:30 +01:00
8c81a70170 PyTorch-like display implementation. 2023-06-27 21:16:35 +01:00
1d504cc6b3 Rework the debug trait. 2023-06-27 19:10:30 +01:00
ca6aa8ff12 Use num-cpus to enable parallelism. 2023-06-27 14:42:26 +01:00
d7f729fb8f Refactor the hierarchy. 2023-06-27 11:57:27 +02:00