f7980e07e0
Add `ggufv2` support ( #725 )
2023-09-03 14:41:57 +01:00
74a82c358a
Add the mse loss. ( #723 )
2023-09-03 10:51:40 +01:00
84d003ff53
Handle arbitrary shapes in Tensor::new. ( #718 )
2023-09-02 19:59:21 +01:00
2ed78ab336
Support for quantized tensors in the python api. ( #706 )
...
* Add more pyo3 support.
* Add some support for quantized tensors in pyo3.
* Add an arc layer on qmatmul.
* Add the quantized matmul.
* Quantization support.
* More quantization support.
* Test the python quantization.
2023-09-01 15:53:42 +01:00
237323c2bc
Cleanup the pyo3 setup. ( #705 )
2023-09-01 14:26:18 +01:00
30a4b593d7
More ops again. ( #697 )
2023-08-31 22:28:48 +01:00
949f1eae6f
Implement a couple more binary ops. ( #693 )
2023-08-31 21:30:15 +01:00
9874d843f1
Fix the accelerate build ( #678 )
...
* Cosmetic changes.
* Fix the accelerate build for tanh.
2023-08-30 18:31:14 +02:00
ad8a62dbf5
Add tanh. ( #675 )
...
* Add tanh.
* Use tanh in the lstm block.
* Add a test for tanh forward and backward passes.
2023-08-30 13:54:50 +01:00
618f4e4c78
Add some documentation. ( #673 )
...
* Add some documentation.
* Bump the crate version.
2023-08-30 11:54:00 +01:00
393690387f
Support dilation in conv-transpose2d. ( #671 )
2023-08-30 09:22:00 +01:00
9b25113393
Small cleanups (avoid some possible mutations) ( #670 )
...
* More mut cleanup.
* Factor out some common bits.
2023-08-30 08:54:00 +01:00
a1a5ab8b0a
Neon optimized vecdot ( #666 )
...
* Q5k vecdot.
* Add the q3k vecdot.
* Q2k vecdot.
* Move the quantized model to its own file.
2023-08-29 22:28:46 +01:00
59b731de99
Add the powf op. ( #664 )
...
* Add the powf op.
* Cuda kernels and backprop.
* Add a test.
2023-08-29 20:48:18 +01:00
2d3fcad267
Simplify usage of the pool functions. ( #662 )
...
* Simplify usage of the pool functions.
* Small tweak.
* Attempt at using apply to simplify the convnet definition.
2023-08-29 19:12:16 +01:00
71221559d3
Fix the dilated convolutions. ( #659 )
2023-08-29 16:37:42 +01:00
a044907ffc
Dilated convolutions ( #657 )
...
* Add the dilation parameter.
* Restore the basic optimizer example.
* Dilation support in cudnn.
* Use the dilation parameter in the cpu backend.
* More dilation support.
* No support for dilation in transposed convolutions.
* Add dilation to a test.
* Remove a print.
* Helper function.
2023-08-29 16:12:11 +01:00
ee8bb1bde1
Add `avx` implementations of `q2k`, `q3k` and `q5k` vec-dot functions ( #654 )
...
* `q2k` avx implementation
* `q3k` avx implementation
* `q5k` avx implementation
* `avx` make masks constant
* clippy stuff
2023-08-29 13:35:56 +01:00
d0a330448d
Backprop support for pooling ops. ( #652 )
...
* Backprop support for pooling ops.
* max-pool gradient.
2023-08-29 10:17:59 +01:00
4b8d57ba15
AVX version of the q4k vecdot. ( #651 )
2023-08-29 09:41:17 +01:00
fd3131a4ce
Fix the debug implementation. ( #648 )
2023-08-28 22:51:39 +01:00
037b41c9dc
Cuda conv transpose ( #645 )
...
* Cuda kernel for conv-transpose.
* Fix the cuda kernel.
* Fix the tests.
2023-08-28 20:58:49 +01:00
72fae3140c
Optimize the conv2d transpose cpu kernel. ( #644 )
...
* Optimize the conv2d transpose cpu kernel.
* Use multiple cores.
2023-08-28 20:06:31 +01:00
ca26198b95
Fix the cpu kernel for conv-transpose. ( #643 )
2023-08-28 16:45:12 +01:00
b292047882
Backprop for conv2d. ( #638 )
...
* Start adding backprop for conv2d.
* Backprop for conv2d.
* Bugfix + start adding a conv2d test.
* Conv2d backprop testing.
* More conv fixes.
2023-08-28 16:08:55 +01:00
3cca89cc70
Add conv-transpose. ( #635 )
...
* Add conv-transpose.
* Return zeros for now.
* Naive CPU implementation.
* Add a conv-transpose test + fix the cpu implementation.
* Add a second test.
2023-08-28 10:10:12 +01:00
1da71a5da1
Neon optimized version of the q4k vecdot product. ( #632 )
2023-08-27 21:30:47 +01:00
a3f97c143d
Bump the crate version + update CHANGELOG. ( #628 )
2023-08-27 18:17:11 +01:00
be471d50ab
Llama quantization. ( #625 )
2023-08-27 14:08:15 +01:00
7151f2cf63
Add the quantize command. ( #624 )
...
* Add the quantize command.
* Bugfix for writing gguf files.
* And add a comment.
2023-08-27 11:35:19 +01:00
5320aa6b7d
Move the test-utils bits to a shared place. ( #619 )
2023-08-27 09:42:22 +01:00
a8b39dd7b7
Fix for q5_1 quantization. ( #617 )
...
* Fix for q5_1 quantization.
* Fix some typos.
2023-08-27 08:31:18 +01:00
fa0d75b18d
Quantization tests + fix some issues. ( #616 )
2023-08-27 08:17:38 +01:00
28658054ff
More missing quantized bits. ( #615 )
...
* Q4_1 support.
* Add Q5_1 quantization.
* Tweak.
2023-08-27 07:52:26 +01:00
ab36a7f3e3
Fix for when f16c is not available. ( #614 )
2023-08-27 07:19:52 +01:00
f704e39761
Missing quants ops ( #611 )
...
* Another transmute tweak.
* Changelog tweak.
* Add some missing quantized ops.
2023-08-26 20:09:04 +01:00
fdf15f0e05
Another transmute tweak. ( #610 )
...
* Another transmute tweak.
* Changelog tweak.
2023-08-26 13:00:24 +01:00
06b37ea7ad
Avoid using tmp values. ( #609 )
2023-08-26 12:28:28 +01:00
c72eb3d75b
Add reference implementation for `q4k` and `q5k` ( #586 )
...
* add `q2k` vec-dot
* `q3k` vec-dot + quantization bugfix
* `q4k` vec-dot
* `q5k` vec-dot
* Validate against GGML unit test results.
* Remove some more `transmutes`
2023-08-26 12:07:54 +01:00
71518caeee
Align tensor device print more with PyTorch ( #590 )
...
* Improve tensor print
* Use CudaDevice only if enabled with cuda feature
* run rust fmt
* up
* improve
* rustfmt
2023-08-26 11:20:22 +01:00
6559eae72c
Avoid some transmutes. ( #607 )
2023-08-25 18:21:37 +01:00
9c8d6dbc2a
Neon intrinsics for the q8_0 vecdot. ( #604 )
...
* Neon intrinsics for the q8_0 vecdot.
* Get the tests to run with accelerate (with some numerical error failures).
2023-08-25 14:42:18 +01:00
afc10a3232
AVX version for the q8-0 multiplications. ( #598 )
2023-08-25 10:14:49 +01:00
c093b03d51
Generic implementation of vecdot for q80. ( #596 )
...
* Generic implementation of vecdot for q80.
* Add support for code-llama 7b.
* Support more code-llama.
2023-08-25 09:04:05 +01:00
d8ba0452dc
Fail on bf16. ( #594 )
2023-08-25 06:10:38 +01:00
2cde0cb74b
More pickle support. ( #588 )
...
* More pickle support.
* Be more verbose.
2023-08-24 18:45:10 +01:00
e21c686cdc
Fixes for clippy 1.72. ( #587 )
2023-08-24 17:46:17 +01:00
c265ac50fa
Add a function to write gguf files. ( #585 )
...
* Add a function to write gguf files.
* More GGUF file writing.
* Write the tensor data in GGUF files.
2023-08-24 17:03:06 +01:00
afd965f77c
More non square testing ( #582 )
...
* Add more non square testing.
* More testing.
2023-08-24 13:01:04 +01:00
d2f42ab086
Reference implementations of `q2k` and `q3k` vec-dot functions ( #580 )
...
* add `q2k` vec-dot
* `q3k` vec-dot + quantization bugfix
2023-08-24 12:35:54 +01:00