Commit Graph

159 Commits

Author SHA1 Message Date
306c8eee7a AVX version of the vecdot for q4_0. (#474)
* AVX version of the vecdot for q4_0.

* Tweak the avx bits.

* Add a qmatmul benchmark.

* Fix the quantized test.
2023-08-17 07:03:32 +01:00
098909de40 Add vecdot for q6k-q8k. (#476)
* Add vecdot for q6k-q8k.

* Add some testing for q8k.

* Use QMatMul for the output layer.
2023-08-16 20:59:40 +01:00
575e88a999 Add a quantized test that uses negative values. (#470)
* Add a quantized test that uses negative values.

* Add a default tokenizer.
2023-08-16 16:32:58 +01:00
3071134788 Get the ggml based llama to generate some text. (#464)
* Add more stats to the ggml example.

* Build a quantized model from the file content.

* Move the tensor retrieval in the main crate.

* Start adding the forward pass.

* Add more to the forward pass of the quantized llama.

* Apply the attention layers.

* Add the sampling loop.

* Get the sampling loop to work.

* Minor tweak.

* Add a quantize/dequantize test.

* Bugfix.

* Add a comment + swap the order.

* Bugfixes.
2023-08-16 12:41:07 +01:00
965597a873 Add a test for qmatmul. (#459) 2023-08-16 06:36:27 +01:00
e68b2accb4 Split out the quantized file. (#456) 2023-08-15 20:26:27 +01:00
08effe3762 More quantization support (#455)
* Properly initialize wdata.

* Simplify the matmul bits.

* Add from_float for q4_0.

* Fix a couple bugs.

* Get the test to work.

* Get clippy to be happy.
2023-08-15 18:58:04 +01:00
c84883ecf2 Add a cuda kernel for upsampling. (#441)
* Add a cuda kernel for upsampling.

* Update for the latest tokenizers version.
2023-08-14 13:12:17 +01:00
a094dc503d Add a cuda kernel for avg-pool2d. (#440)
* Add a cuda kernel for avg-pool2d.

* Avoid running out of bounds.

* Finish wiring the avg pool kernel + add some testing.

* Support for max-pool + testing.
2023-08-14 12:32:05 +01:00
34f4b3187e Add a naive conv2d cuda kernel. (#438)
* Add a naive conv2d cuda kernel.

* Proper conv2d support on the rust side.

* Conv1d testing on gpu.

* Also use the test on gpus.

* Fix the clean-ptx target.
2023-08-14 10:34:42 +01:00
9aca398a4f More accelerate optimizations (#427)
* Add more tracing to the whisper example.

* Support accelerate in more examples.

* Use accelerate for pointwise functions.

* Use accelerate for binary operations too.

* Bugfix for binary operation: use the rhs before the lhs.
2023-08-13 12:53:34 +01:00
01ea57da8c Fix the conv tests. (#409) 2023-08-11 14:59:54 +01:00
a325c1aa50 Upsample test + bugfix. (#399) 2023-08-10 21:02:35 +02:00
c7f92f985e Further randn tweaks: use the appropriate rng rather than the f64 one, some cleanup. (#383) 2023-08-10 05:48:19 +01:00
Lei
3bbc08a8df Fix randn cpu (#382)
* Change distributions

Standard generates values in [0, 1); Normal is the correct distribution.

* Add test

Not sure if this is the best place to put the test

* Remove unnecessary use
2023-08-10 05:33:44 +01:00
a5c5a893aa add max_pool2d (#371)
Co-authored-by: 赵理山 <ls@zhaolishandeMacBook-Air.local>
2023-08-09 18:05:26 +01:00
cd225bd3b1 More testing for avg-pool2d. (#366)
* More testing for avg-pool2d.

* Another fix.

* Add a max-pool test with non-divisible kernel sizes.
2023-08-09 16:12:23 +01:00
b80348d22f Bugfix for avg-pool + add some tests. (#365) 2023-08-09 15:44:16 +01:00
dbc6f281c9 Conv1d test with padding. (#356) 2023-08-09 05:45:38 +01:00
608b2358c6 Add some conv1d tests + bugfix using padding. (#349) 2023-08-08 20:50:20 +01:00
1e6dbeac01 Add some conv2d tests. (#347)
* Add some conv2d tests.

* Add a simpler conv2d test.

* More conv2d testing + bugfix.

* Add a todo.
2023-08-08 19:02:42 +01:00
51e51da896 Rename the candle crate to candle-core (#301)
* Rename to candle-core.

* More candle-core renaming.
2023-08-02 08:20:22 +01:00
4b3bd79fbd Remove the embedding ops in favor of index-select. (#299)
* Remove the embedding ops in favor of index-select.

* Also remove the cuda kernels.
2023-08-02 05:42:11 +01:00
cc76c63202 Use index-select for the embeddings as it supports backprop. (#298) 2023-08-01 20:44:43 +01:00
c950a5c6b1 Cuda support for the mnist training. (#277)
* Cuda support for the mnist training.

* min/max fix + testing.

* Add the argmin/argmax tests.

* More cuda support for argmin/argmax.

* Cuda kernels for argmin and argmax.
2023-07-29 19:48:04 +01:00
3eb2bc6d07 Softmax numerical stability. (#267)
* Softmax numerical stability.

* Fix the flash-attn test.
2023-07-28 13:13:01 +01:00
f291065f6c Do not panic on empty ranges. (#257) 2023-07-27 09:28:47 +01:00
944d70bd9a Add a test for scatter add. (#238)
* Add a test for scatter add (segfaults on gpus for now).

* Bugfix for the scatter add cuda kernel.
2023-07-25 09:12:14 +01:00
18cc73954a Add some testing for index-add (#237)
* Add some testing for index-add.

* Fix the cpu implementation for index-add.
2023-07-25 08:38:33 +01:00
581b104f97 Indexing cuda (#235)
* Allow using uint8_t for indexing.

* Revert the default cuda feature.

* Add a cuda-kernel for index-select.

* Add a test for gather.
2023-07-24 20:22:47 +01:00
b50f932e7c Add some cmp tests. (#233)
* Add some cmp tests.

* Add the cuda kernels for comparison operations.
2023-07-24 16:53:45 +01:00
23827c49cd Cleanup some todos. (#226)
* Cleanup some todos.

* Fix more todo.

* Optimize for the contiguous case.

* Add the IntDType trait.

* Handle the intdtype trait for more ops.

* Remove a todo.

* Remove a todo.
2023-07-23 16:00:00 +01:00
43c7223292 Rename the .r functions to .dims so as to be a bit more explicit. (#220) 2023-07-22 10:39:27 +01:00
5cc843550d Add binary and ternary custom ops. (#217) 2023-07-21 17:29:50 +01:00
4a100875bf Use a macro to handle the dtype pattern matching. (#215) 2023-07-21 16:03:51 +01:00
a6bcdfb269 Custom ops with a single argument (#214)
* Add the CustomOp1 trait.

* Add an example of custom op.

* Polish the custom op example.

* Add some backward pass test for custom ops.
2023-07-21 15:18:05 +01:00
b02229ce92 Add some epsilon tolerance to grad tests so that they work on cuda / mkl. (#213) 2023-07-21 12:45:14 +01:00
c60831aad4 Add more gradient tests + bugfixes. (#211)
* Add more gradient tests + bugfixes.

* More tests and fixes.

* More tests.
2023-07-21 06:52:39 +01:00
4845d5cc64 More realistic training setup. (#210)
* More realistic training setup.

* Compute the model accuracy.

* Very inefficient backprop for index select.

* More backprop.

* Fix some backprop issues.

* Backprop fix.

* Another broadcasting backprop fix.

* Better backprop for reducing ops.

* Training again.

* Add some gradient tests.

* Get the training to work.
2023-07-20 18:25:41 +01:00
fa08fb3126 Add the index-select op. (#209)
* Add the index-select op.

* Cpu implementation of index-select.

* Add the cpu implementation for index-select.
2023-07-20 14:01:03 +01:00
76dcc7a381 Test the broadcasting binary ops. (#196) 2023-07-19 06:18:36 +01:00
18ea92d83b Iteration over strided blocks (#175)
* Introduce the strided blocks.

* Use the strided blocks to speed up the copy.

* Add more testing.
2023-07-15 21:30:35 +01:00
a2f72edc0d Simplify the parameters used by sum and sum_keepdim. (#165) 2023-07-14 08:22:08 +01:00
2bfa791336 Use the same default as pytorch for sum. (#164) 2023-07-13 21:32:32 +01:00
23e105cd94 Add the gradient for reduce-sum. (#162)
* Add the gradient for reduce-sum.

* And add the gradient for the broadcast ops.

* Add some backprop tests.

* Add a linear regression example.
2023-07-13 20:14:10 +01:00
5ee3c95582 Move the variable creation to the variable module. (#159)
* Move the variable creation to the variable module.

* Make it possible to set a variable.

* Add some basic gradient descent test.

* Get the gradient descent test to work.
2023-07-13 16:55:40 +01:00
50b0946a2d Tensor mutability (#154)
* Working towards tensor mutability.

* Use a ref-cell to provide tensor mutability.
2023-07-13 11:04:40 +01:00
8aab787384 Test the index op + bugfix. (#148) 2023-07-12 15:42:36 +01:00
e2807c78a4 Enable the doctests to run with mkl (though they are broken for now). (#126) 2023-07-10 16:27:46 +01:00
5c3864f9f7 Add more sum tests. (#110)
* Add some tests for the sum.

* More sum testing.
2023-07-08 13:15:36 +01:00