7051fb8098
feat: add backprop for elu ( #1269 )
...
* feat: add backprop for elu
* Cosmetic tweaks.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com >
2023-11-04 21:26:41 +01:00
6fa3151820
Allow using gguf-v3 files. ( #1262 )
2023-11-03 23:07:53 +01:00
3173b1ce3b
feat: impl backprop for erf and gelu-erf ( #1258 )
...
* impl backprop for erf anf gelu-erf
* feat: unary tests added for erf and gelu-erf
* fix: (clippy) remove immediately dereferenced ref
* fix: improve comments with pytorch code snippet
* fix: adjust comment typo in backprop impl
2023-11-03 21:32:30 +01:00
1cfc5d6d0c
Backprop support for conv1d (cpu only for now). ( #1255 )
2023-11-03 14:23:53 +01:00
b07b2350b6
Test for the transposed conv1d. ( #1254 )
2023-11-03 13:10:28 +01:00
be4555c5a5
Add the conv-transpose1d op. ( #1251 )
...
* Skeleton structure for conv-transpose1d.
* CPU implementation for conv-transpose1d.
2023-11-03 09:44:46 +01:00
fbd69f952c
Lazy detach. ( #1242 )
2023-11-02 07:33:48 +00:00
36fb84f038
Add a hack for generating random uniform/normal for f16/bf16. ( #1228 )
2023-10-31 20:27:59 +00:00
c05c0a8213
PyO3: Add equal
and __richcmp__
to candle.Tensor
( #1099 )
...
* add `equal` to tensor
* add `__richcmp__` support for tensors and scalars
* typo
* more typos
* Add `abs` + `candle.testing`
* remove duplicated `broadcast_shape_binary_op`
* `candle.i16` => `candle.i64`
* `tensor.nelements` -> `tensor.nelement`
* Cleanup `abs`
2023-10-30 15:17:28 +00:00
5fc66bd4ba
Support negative steps in arange. ( #1218 )
2023-10-30 07:40:54 +00:00
154c674a79
Add i64-abs. ( #1216 )
2023-10-29 15:28:53 +00:00
7bbde55c61
Marian MT model ( #1210 )
...
* Skeleton files for the marian MT model.
* Marian initialization.
* Implement the attention forward method.
* Forward pass for the encoder side.
* Expose the encoder and decoder.
* Start plugging the decoder.
* Forward pass for the decoder layer.
* Set up the marian example.
* Add some missing backtraces.
* Bugfix.
2023-10-29 15:12:22 +00:00
46d6566c99
Fix the conv2d gradient computation. ( #1214 )
2023-10-29 09:50:04 +00:00
55bc3382cf
Allow for different behavior between training and eval ( #1213 )
...
* Forward with training.
* Do not use dropout on vgg evaluation.
2023-10-29 07:53:09 +01:00
ef33df7ae2
No need for the even constraint on vecdot-q40-q80. ( #1202 )
2023-10-28 07:23:59 +01:00
e2826e70b3
Add a quantized variant of llama2.c ( #1197 )
...
* Add a quantized variant of llama2.c
* Clippy fixes.
2023-10-27 15:34:06 +01:00
9b1158b315
Add some missing backtraces. ( #1193 )
2023-10-27 06:09:11 +01:00
c698e17619
Enable the test for meshgrid + fix the implementation. ( #1175 )
2023-10-25 13:47:54 +01:00
e4c9adfdbe
Implemented meshgrid ( #1174 )
...
* Implemented meshgrid
* Resolved feedback from LaurentMazare
* Rustfmt
* Updated docstring
* Removed outdated error mode from docstring
2023-10-25 12:49:11 +01:00
45dbe541bc
fix ucopy for f64
tensors ( #1170 )
2023-10-24 17:06:03 +01:00
807e3f9f52
derivative for GELU ( #1160 )
...
* derivative for GELU
* add tests
2023-10-23 20:23:45 +01:00
8a82d623e5
Handle LongStorage in pytorch checkpoints. ( #1152 )
2023-10-22 18:34:36 +01:00
62fc965617
Expose the track-op method. ( #1148 )
2023-10-22 06:57:03 +01:00
e8f760ee44
Add get_on_dim. ( #1142 )
2023-10-21 15:01:38 +01:00
87eb1658e1
Add pad_with_same. ( #1127 )
...
* More model cloning.
* More cloning on quantized models.
* Add pad-with-same.
* Add some tests.
2023-10-18 23:13:37 +01:00
662c186fd5
Better error message when overflowing in narrow. ( #1119 )
2023-10-18 08:40:14 +01:00
6c588c4792
Refactor the pth tensor exctraction. ( #1109 )
2023-10-16 18:16:34 +01:00
0106b0b04c
Read all the tensors in a PyTorch pth file. ( #1106 )
2023-10-16 13:50:07 +01:00
b73c35cc57
Improve the reshape error messages. ( #1096 )
...
* Improve the reshape error messages.
* Add the verbose-prompt flag to the phi example.
2023-10-15 10:43:10 +01:00
8f310cc666
Avoid trying to backprop through non-differentiable layers. ( #1094 )
2023-10-14 22:03:41 +01:00
9309cfc47d
Create a new curand instead of reseeding. ( #1089 )
2023-10-14 10:03:59 +01:00
7473c4ceca
Fix the npy read function and add some testing. ( #1080 )
2023-10-12 15:25:05 +02:00
37dbbff261
Use full tensors for zeros and ones ( #1071 )
...
* Only optimize float tensors.
* Use full tensors for zeros and ones.
2023-10-11 08:16:04 +01:00
9fea56d28e
Only optimize float tensors. ( #1069 )
2023-10-10 09:05:41 +01:00
b34d7f0248
Remove some unusued bits. ( #1067 )
2023-10-09 19:49:57 +01:00
9abeddd750
Make the cuda rng seedable. ( #1056 )
2023-10-08 09:32:36 +01:00
aa53368aeb
Better control on the optional dequantization in QMatMul ( #1049 )
...
* Cosmetic change to the quantized whisper model.
* Fix the dequantization.
* Add the dequantize all variable.
2023-10-07 10:16:18 +01:00
7f7d95e2c3
Add the round-to function. ( #1039 )
2023-10-05 20:28:09 +01:00
8f7973958c
fix: fix index_select cuda kernel for src target dim different than ids dim when selecting dim > 0 ( #1037 )
...
* fix: fix index_select cuda kernel for src target dim different than ids dim when selecting dim > 0
* cargo fmt
2023-10-05 18:46:13 +01:00
c18a856e76
Add the rounding operators. ( #1030 )
...
* Add the rounding operators.
* Avoid tracking gradients for the rounding operations.
* Add some rounding tests.
2023-10-04 17:58:44 +01:00
11d3687cc6
Simd128 optimized q8k vecdot. ( #1026 )
2023-10-03 15:29:48 +01:00
dac73edb34
AVX optimized q8k vecdot. ( #1024 )
2023-10-03 12:10:58 +01:00
043cc25766
Fix for the index-select cuda setup. ( #1022 )
...
* Fix for index-select.
* Better fix + add some testing.
2023-10-03 10:21:46 +01:00
7670fe7d1f
neon optimized q8k multiplication. ( #1021 )
...
* neon optimized q8k multiplication.
* Bugfixes.
* simdification.
2023-10-02 23:26:34 +01:00
cddfc3944c
Add the q8k vec-dot multiplication. ( #1019 )
2023-10-02 21:53:34 +01:00
089fc3b584
Improve the quantized whisper setup. ( #1018 )
...
* Improve the quantized whisper setup.
* Fix the config file paths.
* Use the standard matmul where possible.
2023-10-02 17:17:46 +01:00
263a172202
Improve the testing of the optimized quantized vec-dot ops ( #1016 )
...
* Expose the unopt functions for testing.
* Better testing of the optimized quantized computations.
2023-10-02 09:50:43 +01:00
5130a7da32
Simd128 version of q6k vec-dot. ( #1015 )
...
* Add a specific function for the simd128 q6k vec-dot.
* Simdification.
* More simdification.
2023-10-01 19:44:12 +01:00
096dee7073
Bump the version to 0.3.0. ( #1014 )
...
* Bump the version to 0.3.0.
* Changelog update.
2023-10-01 13:51:57 +01:00
4e55aaa51f
Simd128 version of the q2k-q8k vecdot product. ( #1011 )
...
* Sketch the simd128 version of q2k vecdot.
* Use a single accumulator.
* Simdify the q2k-q8k vecdot product.
* Cosmetic change.
2023-09-30 20:12:41 +01:00