Commit Graph

438 Commits

c265ac50fa Add a function to write gguf files. (#585)
* Add a function to write gguf files.

* More GGUF file writing.

* Write the tensor data in GGUF files.
2023-08-24 17:03:06 +01:00
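The entry above adds GGUF writing. For context, a minimal sketch of a GGUF-style header writer follows; it assumes the v2-and-later layout (4-byte `GGUF` magic, u32 version, then u64 tensor and metadata-kv counts, all little-endian) and is not the repo's actual implementation.

```rust
use std::io::{self, Write};

// Illustrative GGUF-style header writer; the tensor infos, metadata
// key/values, and aligned tensor data would follow this header.
fn write_gguf_header<W: Write>(
    w: &mut W,
    version: u32,
    tensor_count: u64,
    metadata_kv_count: u64,
) -> io::Result<()> {
    w.write_all(b"GGUF")?; // magic
    w.write_all(&version.to_le_bytes())?;
    w.write_all(&tensor_count.to_le_bytes())?;
    w.write_all(&metadata_kv_count.to_le_bytes())?;
    Ok(())
}
```
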
afd965f77c More non-square testing (#582)
* Add more non-square testing.

* More testing.
2023-08-24 13:01:04 +01:00
d2f42ab086 Reference implementations of q2k and q3k vec-dot functions (#580)
* add `q2k` vec-dot

* `q3k` vec-dot + quantization bugfix
2023-08-24 12:35:54 +01:00
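A "reference" vec-dot like the ones added above is dequantize-and-accumulate. The sketch below shows the shape of the computation for two already-unpacked blocks with per-block scales; the real `q2k`/`q3k` kernels additionally unpack 2- and 3-bit values and per-sub-block scales. Illustrative only, not the repo's code.

```rust
// Dot product of two quantized blocks: with per-block scales dx, dy and
// integer quants qx, qy, the result is dx * dy * sum_i(qx[i] * qy[i]).
fn vec_dot_blocks(dx: f32, qx: &[i8], dy: f32, qy: &[i8]) -> f32 {
    let isum: i32 = qx
        .iter()
        .zip(qy)
        .map(|(&a, &b)| i32::from(a) * i32::from(b))
        .sum();
    dx * dy * isum as f32
}
```
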
ca318a6ec7 Add to the cuda example a reproduction of the issue. (#579)
* Add to the cuda example a reproduction of the issue.

* Tweak.

* Add a test using non-square matrices.

* Fix the conv2d kernel.

* Display the error.

* And tweak the comment.
2023-08-24 12:07:31 +01:00
dd64465899 Add a test for conv2d with padding + bugfix the random number generation on cuda. (#578)
* Add a test for conv2d with padding.

* Cosmetic changes.

* Bugfix the rand function on the cuda backend.
2023-08-24 10:16:37 +01:00
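Both conv2d entries above revolve around the same output-size rule, `out = in + 2 * pad - k + 1` for stride 1. As a plain-Rust illustration (not the CUDA kernel being fixed), a direct single-channel convolution with zero padding looks like this:

```rust
// Minimal direct 2D convolution: one channel, stride 1, zero padding.
fn conv2d_single(
    input: &[f32], h: usize, w: usize,
    kernel: &[f32], kh: usize, kw: usize,
    pad: usize,
) -> Vec<f32> {
    let oh = h + 2 * pad - kh + 1;
    let ow = w + 2 * pad - kw + 1;
    let mut out = vec![0f32; oh * ow];
    for oy in 0..oh {
        for ox in 0..ow {
            let mut acc = 0f32;
            for ky in 0..kh {
                for kx in 0..kw {
                    // Input coordinates, shifted back by the padding.
                    let iy = (oy + ky) as isize - pad as isize;
                    let ix = (ox + kx) as isize - pad as isize;
                    if iy >= 0 && ix >= 0 && (iy as usize) < h && (ix as usize) < w {
                        acc += input[iy as usize * w + ix as usize] * kernel[ky * kw + kx];
                    }
                }
            }
            out[oy * ow + ox] = acc;
        }
    }
    out
}
```
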
431051cc32 Add EfficientNet (#572)
* EfficientNet.

* Complete the efficientnet implementation.

* Improve group handling.

* Get the efficientnet to work.
2023-08-23 18:02:58 +01:00
7478dda255 Cosmetic tweaks. (#570) 2023-08-23 15:45:40 +01:00
329f661d9b Trace softmax (#568)
* Trace the softmax op.

* Inline the sum.

* Add min/max vec operations.
2023-08-23 15:25:50 +01:00
075b505480 Mirror GGML's unit tests (#569)
* Add ggml unit tests

* simplify random matmul test for other test cases
2023-08-23 15:25:17 +01:00
aba1e90797 Add a groups parameter to convolutions. (#566)
* Add a groups parameter to convolutions.

* Avoid some unnecessary groups checks.

* Move the tensor convolution bits.

* Proper handling of groups.

* Bump the crate version.

* And add a changelog.
2023-08-23 12:58:55 +01:00
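The groups parameter added above splits the channels into independent convolution groups. A hypothetical validation helper (not candle's actual code) sketches the bookkeeping: both channel counts must be divisible by `groups`, and `groups == c_in` recovers a depthwise convolution.

```rust
// Per-group channel counts for a grouped convolution; each group convolves
// c_in / groups input channels to c_out / groups output channels.
fn check_grouped_conv(c_in: usize, c_out: usize, groups: usize) -> Result<(usize, usize), String> {
    if c_in % groups != 0 || c_out % groups != 0 {
        return Err(format!(
            "channels ({c_in} in, {c_out} out) must be divisible by groups ({groups})"
        ));
    }
    Ok((c_in / groups, c_out / groups))
}
```
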
9a5c7db91a Add support for i64 (#563)
* Add the i64 dtype.

* Adapt the cuda kernels.
2023-08-23 10:42:19 +01:00
508d34daf2 GGUF support in the quantized model. (#559)
* GGUF support in the quantized model.

* Get the GGUF support to work on llama.
2023-08-23 09:20:57 +01:00
0764741cc4 Handle GGUF files in tensor-tools. (#558) 2023-08-23 06:32:07 +01:00
6a30ecefad Preliminary GGUF support. (#557)
* Preliminary GGUF support.

* Tensor reading.
2023-08-23 00:14:10 +01:00
07067b01dc Avoid some mutable variables (take 2). (#554)
* Avoid some mutable variables (take 2).

* Fix.
2023-08-22 18:51:20 +01:00
ec665acad7 Revert "Avoid some mut in quantized functions. (#550)" (#552)
This reverts commit cf27b9b636.
2023-08-22 15:57:46 +01:00
cf27b9b636 Avoid some mut in quantized functions. (#550)
* Avoid a couple more 'let mut'.

* Tweaks.
2023-08-22 15:44:26 +01:00
352383cbc3 Add quantization support for q2k, q3k, q4k and q5k (#524)
* first q2 implementation

* First Q4K and Q5K implementations

* fix `q2k` and `q5k`

* Some first cleanups

* run `clippy` on tests

* finally implement `q3k`

* deactivate `q3k` test on macos

* also disable the test on linux

* Fix floating bits in `q3k` dequantization

* Refactoring pass + reorder quants in file

* `fmt`

* Re-add `src` asserts and redefine `dst`
2023-08-22 15:04:55 +01:00
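For context on the q2k-q5k work above: all ggml-style schemes quantize blocks of values against shared scales. The sketch below uses a deliberately simplified symmetric 4-bit scheme (32 values, one f32 scale, quants in [-7, 7]); the actual k-quants use 256-value super-blocks with per-sub-block scales and bit-packed storage, so treat this as an illustration of the idea only.

```rust
// Quantize 32 values to one shared scale plus small signed quants.
fn quantize_block(xs: &[f32; 32]) -> (f32, [i8; 32]) {
    let amax = xs.iter().fold(0f32, |m, &x| m.max(x.abs()));
    let scale = amax / 7.0;
    let inv = if scale == 0.0 { 0.0 } else { 1.0 / scale };
    let mut qs = [0i8; 32];
    for (q, &x) in qs.iter_mut().zip(xs.iter()) {
        *q = (x * inv).round().clamp(-7.0, 7.0) as i8;
    }
    (scale, qs)
}

// Reconstruct the approximate values from the scale and quants.
fn dequantize_block(scale: f32, qs: &[i8; 32]) -> [f32; 32] {
    let mut xs = [0f32; 32];
    for (x, &q) in xs.iter_mut().zip(qs.iter()) {
        *x = f32::from(q) * scale;
    }
    xs
}
```
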
d70cffdab6 Fix the minimum/maximum gradient computations. (#534) 2023-08-21 08:28:41 +01:00
8c232d706b Small tweaks to the pickle handling to be able to use libtorch files. (#530)
* Small tweaks to the pickle handling to be able to use libtorch files.

* Move the pytorch specific bits in a different function.
2023-08-20 23:25:34 +01:00
11c7e7bd67 Some fixes for yolo-v3. (#529)
* Some fixes for yolo-v3.

* Use the running stats for inference in the batch-norm layer.

* Get some proper predictions for yolo.

* Avoid the quadratic insertion.
2023-08-20 23:19:15 +01:00
a1812f934f Add a yolo-v3 example. (#528)
* Add a couple functions required for yolo.

* Add the yolo-v3 example.

* Add minimum and maximum.

* Use the newly introduced maximum.

* Cuda support for min/max + add some testing.

* Allow for more tests to work with accelerate.

* Fix a typo.
2023-08-20 18:19:37 +01:00
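A short usage sketch for the element-wise `minimum`/`maximum` introduced above (used by yolo's box post-processing), assuming the candle-core `Tensor` API of the time:

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let a = Tensor::new(&[1f32, 5., 3.], &dev)?;
    let b = Tensor::new(&[2f32, 4., 3.], &dev)?;
    let lo = a.minimum(&b)?; // [1., 4., 3.]
    let hi = a.maximum(&b)?; // [2., 5., 3.]
    println!("{lo} {hi}");
    Ok(())
}
```
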
e3d2786ffb Add a couple functions required for yolo. (#527) 2023-08-20 17:02:05 +01:00
2fcb386f17 Add a broadcast variant to matmul. (#523)
* Add a broadcast variant to matmul.

* Get the test to pass.
2023-08-20 13:20:42 +01:00
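A usage sketch for the broadcasting matmul variant, assuming `Tensor::broadcast_matmul` as the entry point: a batch dimension of 1 on one operand is broadcast against the other operand's batch dimension.

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let lhs = Tensor::rand(0f32, 1., (1, 4, 8), &dev)?;
    let rhs = Tensor::rand(0f32, 1., (3, 8, 5), &dev)?;
    // The lhs batch of 1 is broadcast over the rhs batch of 3.
    let out = lhs.broadcast_matmul(&rhs)?;
    assert_eq!(out.dims(), &[3, 4, 5]);
    Ok(())
}
```
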
a8f61e66cc Bump the crates version to 0.1.2. (#522) 2023-08-20 08:07:07 +01:00
82410995a2 Neon support for quantization. (#519)
* Skeleton files for neon support of quantization.

* SIMD version for q4 vecdot.

* Also simdify the q6k multiplication.
2023-08-19 22:07:29 +01:00
551409092e Small tweaks to tensor-tools. (#517) 2023-08-19 16:50:26 +01:00
6431140250 Retrieve tensor data from PyTorch files. (#516) 2023-08-19 15:57:18 +01:00
607ffb9f1e Retrieve more information from PyTorch checkpoints. (#515)
* Retrieve more information from PyTorch checkpoints.

* Add enough support to load dino-v2 backbone weights.
2023-08-19 15:05:34 +01:00
f861a9df6e Add ggml support to tensor-tools (#512)
* Pickle work-in-progress.

* More unpickling.

* More pickling.

* Proper handling of setitems.

* Clippy.

* Again more pickling.

* Restore the example.

* Add enough pickle support to get the list of tensors.

* Read the data from zip files.

* Retrieve the tensor shape.

* Extract the size and dtype.

* More storage types.

* Improve the destructuring.

* Also support ggml files.
2023-08-19 11:45:22 +01:00
ad33715c61 Preliminary support for importing PyTorch weights. (#511)
* Pickle work-in-progress.

* More unpickling.

* More pickling.

* Proper handling of setitems.

* Clippy.

* Again more pickling.

* Restore the example.

* Add enough pickle support to get the list of tensors.

* Read the data from zip files.

* Retrieve the tensor shape.

* Extract the size and dtype.

* More storage types.

* Improve the destructuring.
2023-08-19 11:26:32 +01:00
90ff04e77e Add the tensor-tools binary. (#510) 2023-08-19 09:06:44 +01:00
cb069d6063 Add the permute op (similar to pytorch). (#504)
* Add the permute op (similar to pytorch).

* Add the backprop for dimension permutation.
2023-08-18 16:30:53 +01:00
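A usage sketch for the permute op added above, which mirrors PyTorch's `permute` semantics (reorder dimensions by index):

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let t = Tensor::zeros((2, 3, 4), DType::F32, &Device::Cpu)?;
    // Move dim 2 first, then dims 0 and 1.
    let p = t.permute((2, 0, 1))?;
    assert_eq!(p.dims(), &[4, 2, 3]);
    Ok(())
}
```
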
95462c6a2e Add a vision transformer example (dino-v2). (#502)
* Add a vision transformer example (dino-v2).

* Add some documentation + test.

* CI fix.

* Another fix (still unable to replicate the errors locally :( )
2023-08-18 11:58:06 +01:00
109e95b189 Basic qmatmul parallelization (#492)
* Basic `par_iter` parallelization

* Pass errors up

* Disable `avx` for x86 macs
2023-08-18 09:45:37 +01:00
c78ce76501 Add a simple Module trait and implement it for the various nn layers (#500)
* Start adding the module trait.

* Use the module trait.

* Implement module for qmatmul.
2023-08-18 09:38:22 +01:00
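The Module trait introduced above is a single-method abstraction over forward passes. A sketch of implementing it for a toy layer, assuming the trait lived in `candle_nn` at this point with the signature `fn forward(&self, xs: &Tensor) -> Result<Tensor>`:

```rust
use candle_core::{Result, Tensor};
use candle_nn::Module;

// A toy layer that multiplies its input by a constant.
struct Scale(f64);

impl Module for Scale {
    fn forward(&self, xs: &Tensor) -> Result<Tensor> {
        xs.affine(self.0, 0.0)
    }
}
```
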
a22b1bed7b Tensor -> QTensor conversion (#496)
* Sketch some qmatmul test.

* Add the quantization function.

* More testing.

* Make the test smaller and faster.

* Add some shape checking.
2023-08-18 08:19:20 +01:00
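The property the quantization test above exercises, sketched in plain Rust: a quantize/dequantize round trip should reproduce the input to within a tolerance set by the bit width. The one-scale scheme below is a simplification, not the repo's quantizer.

```rust
// Max absolute round-trip error for a simplified 4-bit-style scheme with a
// single shared scale over the whole slice.
fn round_trip_error(xs: &[f32]) -> f32 {
    let amax = xs.iter().fold(0f32, |m, &x| m.max(x.abs()));
    let scale = if amax == 0.0 { 1.0 } else { amax / 7.0 };
    xs.iter()
        .map(|&x| {
            let q = (x / scale).round().clamp(-7.0, 7.0);
            (x - q * scale).abs()
        })
        .fold(0f32, f32::max)
}
```
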
557b2c28dd Q6K quantization (#495)
* Print the detected arch options.

* Add the q6k quantization.

* Add a currently broken test.

* Bugfix.

* Bugfix.

* Another bugfix.

* Another bugfix + get the test to work.
2023-08-17 22:22:57 +01:00
fc81af1712 AVX version of the q6k vec-dot. (#493)
* AVX version of the q6k vec-dot.

* Use the avx sum.
2023-08-17 20:13:18 +01:00
03be33eea4 Relax the requirements on CustomOp. (#486)
* Relax the requirements on CustomOp.

* Simplify the custom-ops when no backward is required.
2023-08-17 11:12:05 +01:00
d99cac3ec3 Move the avx specific bits to a separate file. (#481) 2023-08-17 09:01:06 +01:00
306c8eee7a AVX version of the vecdot for q4_0. (#474)
* AVX version of the vecdot for q4_0.

* Tweak the avx bits.

* Add a qmatmul benchmark.

* Fix the quantized test.
2023-08-17 07:03:32 +01:00
098909de40 Add vecdot for q6k-q8k. (#476)
* Add vecdot for q6k-q8k.

* Add some testing for q8k.

* Use QMatMul for the output layer.
2023-08-16 20:59:40 +01:00
3bedba1fce Use a zipped iterator. (#475)
* Use a zipped iterator.

* Add to/from float for q8k.
2023-08-16 20:15:11 +01:00
575e88a999 Add a quantized test that uses negative values. (#470)
* Add a quantized test that uses negative values.

* Add a default tokenizer.
2023-08-16 16:32:58 +01:00
a9101700b6 Add a kv-cache to the quantized llama example. (#466)
* Add a kv-cache to the quantized llama example.

* Also print the prompt.

* Bugfix in q6k dequantizing.

* Another bugfix.
2023-08-16 14:28:42 +01:00
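The kv-cache added to the quantized llama example follows the standard decoding pattern: cache the keys/values and concatenate each new step along the sequence axis so earlier positions are not recomputed. A sketch under the assumption of a (batch, seq, ...) layout and candle's `Tensor::cat`; the struct is illustrative, not the example's code.

```rust
use candle_core::{Result, Tensor};

struct KvCache {
    k: Option<Tensor>,
    v: Option<Tensor>,
}

impl KvCache {
    fn append(&mut self, k: &Tensor, v: &Tensor) -> Result<(Tensor, Tensor)> {
        // Concatenate along dim 1, the sequence axis in (batch, seq, ...) layout.
        let k = match &self.k {
            Some(prev) => Tensor::cat(&[prev, k], 1)?,
            None => k.clone(),
        };
        let v = match &self.v {
            Some(prev) => Tensor::cat(&[prev, v], 1)?,
            None => v.clone(),
        };
        self.k = Some(k.clone());
        self.v = Some(v.clone());
        Ok((k, v))
    }
}
```
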
3071134788 Get the ggml based llama to generate some text. (#464)
* Add more stats to the ggml example.

* Build a quantized model from the file content.

* Move the tensor retrieval in the main crate.

* Start adding the forward pass.

* Add more to the forward pass of the quantized llama.

* Apply the attention layers.

* Add the sampling loop.

* Get the sampling loop to work.

* Minor tweak.

* Add a quantize/dequantize test.

* Bugfix.

* Add a comment + swap the order.

* Bugfixes.
2023-08-16 12:41:07 +01:00
965597a873 Add a test for qmatmul. (#459) 2023-08-16 06:36:27 +01:00
ca449f9ee1 Add quantized tensors. (#458)
* Add quantized tensors.

* Implement the debug trait for QTensor.

* Add the QMatMul custom op.
2023-08-15 22:45:53 +01:00
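What QMatMul buys, conceptually: the weights stay in their block-quantized form and are dotted against f32 activations with on-the-fly dequantization. A plain-Rust sketch of one output element (one weight row against the input), not the custom op's implementation:

```rust
// One quantized weight row against an f32 input: each `block`-sized chunk of
// quants shares one scale, so the row dot is sum over blocks of
// d * sum_i(q[i] * x[i]).
fn qmatmul_row(scales: &[f32], qs: &[i8], block: usize, xs: &[f32]) -> f32 {
    scales
        .iter()
        .zip(qs.chunks(block).zip(xs.chunks(block)))
        .map(|(&d, (q, x))| {
            d * q.iter().zip(x).map(|(&qi, &xi)| qi as f32 * xi).sum::<f32>()
        })
        .sum()
}
```
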
b8263aa15c Quantized support for f16 and f32 (#457)
* Add f32 as a quantized type.

* Add f16 as a quantized type too.
2023-08-15 21:09:37 +01:00