4289984d32
Remove some prints.
2023-11-13 14:51:40 +01:00
1471f98f0b
BF16 metal fix.
2023-11-13 14:44:20 +01:00
dd4a40f1c0
Fixes + cache compute_pipeline_state.
2023-11-13 14:33:16 +01:00
79845bd93b
Working version for llama2-c.
2023-11-13 12:36:27 +01:00
6071797450
Add erf.
2023-11-11 18:22:16 +01:00
3900091e75
All tests are panicking instead of random failure.
2023-11-11 17:43:35 +01:00
54355ff997
Adding some half kernels.
2023-11-11 17:43:35 +01:00
e02f1912bb
Reusing a single buffer (for now) to speed things up.
2023-11-11 17:43:35 +01:00
7adfb70dff
Few fixes.
2023-11-11 17:43:35 +01:00
3ad02147e4
Starting to fix some tests.
2023-11-11 17:43:34 +01:00
4f39695465
Missing new test.
2023-11-11 17:42:53 +01:00
4cf4844c9d
Adding the test scaffolding.
2023-11-11 17:27:19 +01:00
d840838e95
Cleanup: fixed a few ops, removed debugging scaffolding.
2023-11-11 17:18:00 +01:00
61a070fdd1
Debugging rope.
2023-11-11 17:18:00 +01:00
e35669647d
Fixed matmul (display still broken without casting back to CPU first?)
2023-11-11 17:18:00 +01:00
53e8b7ee3e
Tmp state.
2023-11-11 17:18:00 +01:00
02c2ec2c71
Adding indexing.
...
Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
2023-11-11 17:18:00 +01:00
9a2784b8ab
Refactor to simplify our lives for setting the params in the encoder.
2023-11-11 17:18:00 +01:00
0f652f0e3d
Adding the actual backend
2023-11-11 17:18:00 +01:00
ddee9dc1dd
Remove tracing.
2023-11-11 17:18:00 +01:00
fc9bb7784a
Metal part 1 - Scaffolding for metal.
2023-11-11 17:18:00 +01:00
9e666d4229
Add the var method. ( #1315 )
...
* Add the var method.
* Add a test.
2023-11-10 22:47:57 +01:00
26c4e5bf1d
Metal part 1 - Scaffolding for metal. ( #1308 )
...
* Metal part 1 - Scaffolding for metal.
* Remove tracing.
2023-11-10 08:35:48 +01:00
18d30005c5
Add support to UL2 model family ( #1300 )
...
* Add support to UL2 model family
* Update docs with UL2
* Create ActivationWithOptionalGating to avoid polluting activations
* Also refactor quantized t5
* Remove useless conversion
* Revert Activation::NewGelu name change
* Remove useless return
* Apply rustfmt and clippy recommendations
* Reuse t5::ActivationWithOptionalGating in quantized version
* (cosmetic change) use a match rather than ifs + avoid early returns.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2023-11-09 18:55:09 +01:00
a773a4b22b
[ONNX] Support a couple more ops. ( #1284 )
...
* Support the shape op in ONNX.
* Share the axis normalization bits.
* Add some limited support for gather.
* Unsqueeze.
* Comparison with broadcasting.
* Add Not + handle i32.
2023-11-06 22:44:58 +01:00
60fdab4e17
Detach all grads during backprop. ( #1243 )
...
* Detach all grads during backprop.
* Add an environment variable to select the backprop behavior.
* Update the comment.
2023-11-05 14:07:41 +01:00
7051fb8098
feat: add backprop for elu ( #1269 )
...
* feat: add backprop for elu
* Cosmetic tweaks.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2023-11-04 21:26:41 +01:00
6fa3151820
Allow using gguf-v3 files. ( #1262 )
2023-11-03 23:07:53 +01:00
3173b1ce3b
feat: impl backprop for erf and gelu-erf ( #1258 )
...
* impl backprop for erf and gelu-erf
* feat: unary tests added for erf and gelu-erf
* fix: (clippy) remove immediately dereferenced ref
* fix: improve comments with pytorch code snippet
* fix: adjust comment typo in backprop impl
2023-11-03 21:32:30 +01:00
1cfc5d6d0c
Backprop support for conv1d (cpu only for now). ( #1255 )
2023-11-03 14:23:53 +01:00
b07b2350b6
Test for the transposed conv1d. ( #1254 )
2023-11-03 13:10:28 +01:00
be4555c5a5
Add the conv-transpose1d op. ( #1251 )
...
* Skeleton structure for conv-transpose1d.
* CPU implementation for conv-transpose1d.
2023-11-03 09:44:46 +01:00
fbd69f952c
Lazy detach. ( #1242 )
2023-11-02 07:33:48 +00:00
36fb84f038
Add a hack for generating random uniform/normal for f16/bf16. ( #1228 )
2023-10-31 20:27:59 +00:00
c05c0a8213
PyO3: Add `equal` and `__richcmp__` to `candle.Tensor` ( #1099 )
...
* add `equal` to tensor
* add `__richcmp__` support for tensors and scalars
* typo
* more typos
* Add `abs` + `candle.testing`
* remove duplicated `broadcast_shape_binary_op`
* `candle.i16` => `candle.i64`
* `tensor.nelements` -> `tensor.nelement`
* Cleanup `abs`
2023-10-30 15:17:28 +00:00
5fc66bd4ba
Support negative steps in arange. ( #1218 )
2023-10-30 07:40:54 +00:00
154c674a79
Add i64-abs. ( #1216 )
2023-10-29 15:28:53 +00:00
7bbde55c61
Marian MT model ( #1210 )
...
* Skeleton files for the marian MT model.
* Marian initialization.
* Implement the attention forward method.
* Forward pass for the encoder side.
* Expose the encoder and decoder.
* Start plugging the decoder.
* Forward pass for the decoder layer.
* Set up the marian example.
* Add some missing backtraces.
* Bugfix.
2023-10-29 15:12:22 +00:00
46d6566c99
Fix the conv2d gradient computation. ( #1214 )
2023-10-29 09:50:04 +00:00
55bc3382cf
Allow for different behavior between training and eval ( #1213 )
...
* Forward with training.
* Do not use dropout on vgg evaluation.
2023-10-29 07:53:09 +01:00
ef33df7ae2
No need for the even constraint on vecdot-q40-q80. ( #1202 )
2023-10-28 07:23:59 +01:00
e2826e70b3
Add a quantized variant of llama2.c ( #1197 )
...
* Add a quantized variant of llama2.c
* Clippy fixes.
2023-10-27 15:34:06 +01:00
9b1158b315
Add some missing backtraces. ( #1193 )
2023-10-27 06:09:11 +01:00
c698e17619
Enable the test for meshgrid + fix the implementation. ( #1175 )
2023-10-25 13:47:54 +01:00
e4c9adfdbe
Implemented meshgrid ( #1174 )
...
* Implemented meshgrid
* Resolved feedback from LaurentMazare
* Rustfmt
* Updated docstring
* Removed outdated error mode from docstring
2023-10-25 12:49:11 +01:00
45dbe541bc
fix ucopy for `f64` tensors ( #1170 )
2023-10-24 17:06:03 +01:00
807e3f9f52
derivative for GELU ( #1160 )
...
* derivative for GELU
* add tests
2023-10-23 20:23:45 +01:00
8a82d623e5
Handle LongStorage in pytorch checkpoints. ( #1152 )
2023-10-22 18:34:36 +01:00
62fc965617
Expose the track-op method. ( #1148 )
2023-10-22 06:57:03 +01:00
e8f760ee44
Add get_on_dim. ( #1142 )
2023-10-21 15:01:38 +01:00