26540641c1
Renamed all kernel names.
2023-12-15 11:24:47 +01:00
77197379cc
More cleanup.
2023-12-15 11:17:05 +01:00
243e83f2b9
Adding a bunch of docs!
Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
2023-12-15 11:03:05 +01:00
40c3e1bd5a
cleanup.
2023-12-15 01:41:14 +01:00
ece4c69a68
Fixing softmax.
2023-12-15 01:35:08 +01:00
4eeaf205d6
Fix softmax for long sequences (missing barrier).
2023-12-14 19:37:03 +01:00
361f2ad2af
Working with merging encoders and using fences.
2023-12-14 16:05:33 +01:00
931432ed55
Fixing tests + matmul from MFA
2023-12-13 16:58:36 +01:00
0404a3eb5b
Removed MPSMatrix entirely (buggy).
2023-12-13 16:21:48 +01:00
a9d0657432
Better version?
2023-12-13 12:09:20 +01:00
4cb443d00a
Fix the logsumexp test. (#1426)
2023-12-12 10:56:11 -06:00
87dc559817
Lots of updates, including a stack of command buffers.
2023-12-12 17:41:56 +01:00
77252ffb82
Add logsumexp function (#1424)
2023-12-12 10:32:17 -06:00
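A note on the op above: log-sum-exp is usually computed in the numerically stable form m + ln(sum(exp(x - m))) with m = max(x). Below is a minimal sketch of that identity written with basic candle tensor ops; the dedicated method added in #1424 may use a different name or signature.

    use candle_core::{Device, Result, Tensor};

    // Numerically stable log-sum-exp over `dim`, built from basic ops:
    // m + ln(sum(exp(x - m))) with m = max(x) along `dim`.
    fn log_sum_exp_ref(xs: &Tensor, dim: usize) -> Result<Tensor> {
        let m = xs.max_keepdim(dim)?;
        let sum = xs.broadcast_sub(&m)?.exp()?.sum_keepdim(dim)?;
        sum.log()?.add(&m)
    }

    fn main() -> Result<()> {
        let xs = Tensor::new(&[[0f32, 1., 2.], [100., 100., 100.]], &Device::Cpu)?;
        println!("{}", log_sum_exp_ref(&xs, 1)?);
        Ok(())
    }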
18eb87f25f
Upsample grad (#1420)
* encode size of upsample in enum
* working convolution method for limited 2d kernels
* add test for sf 3 interpolation
* add higher dimensional tests, fix to work with multichannel input
* Remove commented out line.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2023-12-10 08:43:24 +01:00
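On the approach described in the bullets above: for nearest-neighbour upsampling by an integer scale factor s, every input pixel is copied to an s x s block of the output, so its gradient is the sum over that block. A dependency-free illustration of that reduction follows (the idea only, not the candle kernel itself).

    // Backward of nearest-neighbour 2D upsampling by an integer scale factor `s`:
    // each input pixel was copied to an s x s output block, so its gradient is
    // the sum of that block of the output gradient.
    fn upsample_nearest2d_backward(grad_out: &[Vec<f32>], s: usize) -> Vec<Vec<f32>> {
        let (h_out, w_out) = (grad_out.len(), grad_out[0].len());
        let mut grad_in = vec![vec![0f32; w_out / s]; h_out / s];
        for i in 0..h_out {
            for j in 0..w_out {
                grad_in[i / s][j / s] += grad_out[i][j];
            }
        }
        grad_in
    }

    fn main() {
        // Scale factor 2: a 4x4 output gradient of ones reduces to 2x2 block sums of 4.
        let g = vec![vec![1f32; 4]; 4];
        assert_eq!(upsample_nearest2d_backward(&g, 2), vec![vec![4f32; 2]; 2]);
    }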
4349ff1fc2
Starting to fix some tests.
Few fixes.
Going back on remote metal-rs.
Reusing a single buffer (for now) to speed things up.
Adding some half kernels.
All tests are panicking instead of random failure.
Putting back f16 index select.
Add erf.
Working version for llama2-c.
Fixes + cache compute_pipeline_state.
BF16 metal fix.
Remove some prints.
new_owned -> new()..to_owned().
Better batched matmul.
Metal operational.
Reuse buffers on our own reference counts.
Tmp gemm.
Revert "Tmp gemm."
This reverts commit c65f68e988.
Interleave committing.
Speeding up copies using blit.
Fmt.
Fmt.
Remove the assert!
Fmt all.
Fixes after big rebase.
Add softmax for half and bfloat + tests
Fixing Llama example + accumulate softmax in float.
2023-11-30 11:30:31 +01:00
e2eb6590ed
Merge pull request #1323 from huggingface/metal3
Adding the test scaffolding.
2023-11-27 13:06:01 +01:00
481c45d78d
Add a basic implementation for slice-assign. (#1377)
2023-11-26 17:31:22 +00:00
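A possible usage sketch for the new slice-assign op; the exact signature assumed here (a slice of per-dimension ranges plus a source tensor, returning a new tensor rather than mutating in place) should be checked against the actual API.

    use candle_core::{DType, Device, Result, Tensor};

    fn main() -> Result<()> {
        let dev = Device::Cpu;
        let base = Tensor::zeros((4, 4), DType::F32, &dev)?;
        let patch = Tensor::ones((2, 2), DType::F32, &dev)?;
        // Write `patch` into rows 1..3 and columns 1..3 of `base` (assumed signature).
        let out = base.slice_assign(&[1..3, 1..3], &patch)?;
        println!("{out}");
        Ok(())
    }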
14a2bdc062
Small tweak: remove the macro usage for the range indexing trait. (#1376)
2023-11-26 16:30:59 +00:00
bfa7c8fc01
Implement the module trait directly for QMatMul. (#1372)
2023-11-25 10:09:45 +00:00
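Implementing the Module trait directly means a QMatMul can now be handed to any code that is generic over Module. A toy illustration of that pattern follows; QMatMul construction itself is elided and `Scale` is a made-up layer used only to show the trait shape.

    use candle_core::{Device, Module, Result, Tensor};

    // Toy layer implementing Module the same way QMatMul now does.
    struct Scale(f64);

    impl Module for Scale {
        fn forward(&self, xs: &Tensor) -> Result<Tensor> {
            xs.affine(self.0, 0.0)
        }
    }

    // Generic code that only cares about the Module trait; after #1372 a
    // QMatMul can be passed here like any other layer.
    fn apply<M: Module>(layer: &M, xs: &Tensor) -> Result<Tensor> {
        layer.forward(xs)
    }

    fn main() -> Result<()> {
        let xs = Tensor::new(&[1f32, 2., 3.], &Device::Cpu)?;
        println!("{}", apply(&Scale(2.0), &xs)?);
        Ok(())
    }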
1edc3ddf24
Allowing feature metal to compile.
2023-11-20 20:17:16 +01:00
8d6c6de8e0
Missing new test.
2023-11-20 14:38:35 +01:00
7ec345c2eb
Adding the test scaffolding.
2023-11-20 14:38:35 +01:00
671fc29b36
Fmt.
2023-11-20 14:38:20 +01:00
c66e5d4716
Fix comments.
2023-11-20 14:13:44 +01:00
2813fb5dbc
Cleanup: fixed a few ops, removed debugging scaffolding.
2023-11-20 14:12:57 +01:00
7cfffcac10
Debugging rope.
2023-11-20 14:12:57 +01:00
38de52bc4b
Fixed matmul (display still broken without casting back to CPU first?)
2023-11-20 14:12:57 +01:00
d46670f7c0
Tmp state.
2023-11-20 14:12:57 +01:00
f82bf2d915
Adding indexing.
Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
2023-11-20 14:12:57 +01:00
df6814f34e
Refactor to simplify our lives for setting the params in the encoder.
2023-11-20 14:12:57 +01:00
39406a6721
Adding the actual backend
2023-11-20 14:12:56 +01:00
976ad9f9c2
Remove tracing.
2023-11-20 14:12:29 +01:00
a4c4a56429
Metal part 1 - Scaffolding for metal.
2023-11-20 14:12:05 +01:00
c6763e3b41
Add a simple implementation of cumsum. (#1334)
* Add a simple implementation of cumsum.
* Add another test.
2023-11-15 21:11:15 +00:00
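A small usage sketch for the new cumsum op, assuming it takes the dimension to scan along:

    use candle_core::{Device, Result, Tensor};

    fn main() -> Result<()> {
        let t = Tensor::new(&[[1f32, 2., 3.], [4., 5., 6.]], &Device::Cpu)?;
        // Running sum along the last dimension: [[1, 3, 6], [4, 9, 15]].
        let c = t.cumsum(1)?;
        println!("{c}");
        Ok(())
    }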
347e31c9ff
Add the tril/triu/eye ops. (#1333)
* Add tril/triu/eye.
* Revert the metal crate tweak.
2023-11-15 20:34:37 +00:00
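A usage sketch for the new constructors; the names and argument order assumed below (eye/tril2/triu2 taking a size, a dtype and a device) are a best guess at the API and should be double-checked.

    use candle_core::{DType, Device, Result, Tensor};

    fn main() -> Result<()> {
        let dev = Device::Cpu;
        let id = Tensor::eye(3, DType::F32, &dev)?;       // 3x3 identity
        let lower = Tensor::tril2(3, DType::F32, &dev)?;  // lower-triangular ones
        let upper = Tensor::triu2(3, DType::F32, &dev)?;  // upper-triangular ones
        println!("{id}\n{lower}\n{upper}");
        Ok(())
    }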
a209ce8ceb
Update for 0.3.1. (#1324)
2023-11-11 18:48:52 +00:00
9e666d4229
Add the var method. (#1315)
* Add the var method.
* Add a test.
2023-11-10 22:47:57 +01:00
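For reference, the variance along a dimension can be spelled out with pre-existing ops, which is roughly what the new method wraps; whether the method applies Bessel's correction is not stated above, so the sketch below computes the plain population variance.

    use candle_core::{Device, Result, Tensor};

    // Population variance along `dim`: mean((x - mean(x))^2).
    fn variance(xs: &Tensor, dim: usize) -> Result<Tensor> {
        let mean = xs.mean_keepdim(dim)?;
        xs.broadcast_sub(&mean)?.sqr()?.mean(dim)
    }

    fn main() -> Result<()> {
        let xs = Tensor::new(&[[1f32, 2., 3., 4.]], &Device::Cpu)?;
        println!("{}", variance(&xs, 1)?); // 1.25 for this row
        Ok(())
    }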
26c4e5bf1d
Metal part 1 - Scaffolding for metal. (#1308)
* Metal part 1 - Scaffolding for metal.
* Remove tracing.
2023-11-10 08:35:48 +01:00
18d30005c5
Add support for the UL2 model family (#1300)
* Add support for the UL2 model family
* Update docs with UL2
* Create ActivationWithOptionalGating to avoid polluting activations
* Also refactor quantized t5
* Remove useless conversion
* Revert Activation::NewGelu name change
* Remove useless return
* Apply rustfmt and clippy recommendations
* Reuse t5::ActivationWithOptionalGating in quantized version
* (cosmetic change) use a match rather than ifs + avoid early returns.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2023-11-09 18:55:09 +01:00
a773a4b22b
[ONNX] Support a couple more ops. (#1284)
* Support the shape op in ONNX.
* Share the axis normalization bits.
* Add some limited support for gather.
* Unsqueeze.
* Comparison with broadcasting.
* Add Not + handle i32.
2023-11-06 22:44:58 +01:00
60fdab4e17
Detach all grads during backprop. (#1243)
* Detach all grads during backprop.
* Add an environment variable to select the backprop behavior.
* Update the comment.
2023-11-05 14:07:41 +01:00
7051fb8098
feat: add backprop for elu (#1269)
* feat: add backprop for elu
* Cosmetic tweaks.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2023-11-04 21:26:41 +01:00
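The gradient added here follows directly from elu(x) = x for x > 0 and alpha * (exp(x) - 1) otherwise, giving d/dx elu(x) = 1 for x > 0 and alpha * exp(x) otherwise. A dependency-free sketch of that derivative with a finite-difference check (an illustration of the math, not candle's implementation):

    fn elu(x: f64, alpha: f64) -> f64 {
        if x > 0.0 { x } else { alpha * (x.exp() - 1.0) }
    }

    // Analytic derivative: 1 for x > 0, alpha * exp(x) otherwise.
    fn elu_grad(x: f64, alpha: f64) -> f64 {
        if x > 0.0 { 1.0 } else { alpha * x.exp() }
    }

    fn main() {
        let (alpha, eps) = (1.0, 1e-6);
        for &x in &[-2.0, -0.5, 0.5, 2.0] {
            let numeric = (elu(x + eps, alpha) - elu(x - eps, alpha)) / (2.0 * eps);
            assert!((numeric - elu_grad(x, alpha)).abs() < 1e-4);
        }
    }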
6fa3151820
Allow using gguf-v3 files. (#1262)
2023-11-03 23:07:53 +01:00
3173b1ce3b
feat: impl backprop for erf and gelu-erf (#1258)
* impl backprop for erf and gelu-erf
* feat: unary tests added for erf and gelu-erf
* fix: (clippy) remove immediately dereferenced ref
* fix: improve comments with pytorch code snippet
* fix: adjust comment typo in backprop impl
2023-11-03 21:32:30 +01:00
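The math behind this one: d/dx erf(x) = 2/sqrt(pi) * exp(-x^2), and with gelu_erf(x) = 0.5 * x * (1 + erf(x / sqrt(2))) the derivative is 0.5 * (1 + erf(x / sqrt(2))) + x * exp(-x^2 / 2) / sqrt(2 * pi). A dependency-free check of the gelu-erf derivative using the standard Abramowitz-Stegun erf approximation (the math only, not candle's code):

    use std::f64::consts::PI;

    // Abramowitz & Stegun 7.1.26 polynomial approximation of erf (|error| < 1.5e-7).
    fn erf(x: f64) -> f64 {
        let sign = if x < 0.0 { -1.0 } else { 1.0 };
        let x = x.abs();
        let t = 1.0 / (1.0 + 0.3275911 * x);
        let poly = t * (0.254829592
            + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
        sign * (1.0 - poly * (-x * x).exp())
    }

    fn gelu_erf(x: f64) -> f64 {
        0.5 * x * (1.0 + erf(x / 2f64.sqrt()))
    }

    // Analytic derivative: 0.5 * (1 + erf(x/sqrt(2))) + x * N(x), N = standard normal pdf.
    fn gelu_erf_grad(x: f64) -> f64 {
        0.5 * (1.0 + erf(x / 2f64.sqrt())) + x * (-x * x / 2.0).exp() / (2.0 * PI).sqrt()
    }

    fn main() {
        let eps = 1e-5;
        for &x in &[-2.0, -0.5, 0.0, 0.5, 2.0] {
            let numeric = (gelu_erf(x + eps) - gelu_erf(x - eps)) / (2.0 * eps);
            assert!((numeric - gelu_erf_grad(x)).abs() < 1e-3);
        }
    }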
1cfc5d6d0c
Backprop support for conv1d (cpu only for now). (#1255)
2023-11-03 14:23:53 +01:00
b07b2350b6
Test for the transposed conv1d. (#1254)
2023-11-03 13:10:28 +01:00
be4555c5a5
Add the conv-transpose1d op. (#1251)
* Skeleton structure for conv-transpose1d.
* CPU implementation for conv-transpose1d.
2023-11-03 09:44:46 +01:00
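Worth keeping next to this op: the output length of a 1D transposed convolution is (l_in - 1) * stride - 2 * padding + dilation * (k - 1) + output_padding + 1. A small helper computing that standard formula (how candle orders these parameters in conv_transpose1d is not shown here):

    // Output length of a 1D transposed convolution (standard formula).
    fn conv_transpose1d_out_len(
        l_in: usize,
        k: usize,
        stride: usize,
        padding: usize,
        output_padding: usize,
        dilation: usize,
    ) -> usize {
        (l_in - 1) * stride - 2 * padding + dilation * (k - 1) + output_padding + 1
    }

    fn main() {
        // Upsampling a length-5 signal by stride 2 with a size-3 kernel, no padding.
        assert_eq!(conv_transpose1d_out_len(5, 3, 2, 0, 0, 1), 11);
    }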
fbd69f952c
Lazy detach. (#1242)
2023-11-02 07:33:48 +00:00
36fb84f038
Add a hack for generating random uniform/normal for f16/bf16. (#1228)
2023-10-31 20:27:59 +00:00
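The commit does not spell out the hack; one plausible workaround of this kind is to sample in f32 and then cast down to the low-precision dtype, sketched below. This is an assumption about the approach, not necessarily what #1228 does, and the rand/randn calls assume the usual (lo, up) / (mean, std) plus shape and device arguments.

    use candle_core::{DType, Device, Result, Tensor};

    fn main() -> Result<()> {
        let dev = Device::Cpu;
        // Sample uniform/normal values in f32, then cast to the low-precision dtype.
        let uniform_bf16 = Tensor::rand(0f32, 1f32, (2, 3), &dev)?.to_dtype(DType::BF16)?;
        let normal_f16 = Tensor::randn(0f32, 1f32, (2, 3), &dev)?.to_dtype(DType::F16)?;
        println!("{uniform_bf16}\n{normal_f16}");
        Ok(())
    }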
c05c0a8213
PyO3: Add `equal` and `__richcmp__` to `candle.Tensor` (#1099)
* add `equal` to tensor
* add `__richcmp__` support for tensors and scalars
* typo
* more typos
* Add `abs` + `candle.testing`
* remove duplicated `broadcast_shape_binary_op`
* `candle.i16` => `candle.i64`
* `tensor.nelements` -> `tensor.nelement`
* Cleanup `abs`
2023-10-30 15:17:28 +00:00