13a5d15ebc
Adding upsample_nearest_2d.
2023-12-25 14:25:19 +01:00
1505d85276
Merge pull request #1461 from huggingface/metal-conv
Adding the convolutions (1d + 2d) to candle on metal.
2023-12-25 12:48:09 +01:00
95e18ef675
Fixing matmul for convolutions.
2023-12-25 12:29:34 +01:00
7135791dd5
Fix the quantized mistral example. (#1478)
2023-12-25 09:31:24 +01:00
ba1fae590e
Validate the kernel size in pooling ops. (#1473)
* Validate the kernel size in pooling ops.
* Revert the changes to basics.
2023-12-23 11:19:22 +01:00
ceb78d3e28
Sketch the minimal mamba example. (#1465)
* Sketch the minimal mamba example.
* Fix rustfmt.
* Forward pass for mamba.
* Finish the forward pass.
* Inference fixes.
* Bugfixes.
* More fixes.
* Add a readme.
2023-12-22 00:28:50 +01:00
10d94659c3
Adding the convolutions (1d + 2d) to candle on metal.
2023-12-21 10:39:24 +01:00
9fc210fae8
Merge pull request #1318 from huggingface/metal4
Starting to fix some tests.
2023-12-20 15:37:31 +01:00
9b5e4843a6
Optimizing decode matmul (Phi at 28tok/s on M3).
Adding some benchmarks to help check matmul performance.
2023-12-20 09:54:19 +01:00
03641293ee
Clippy pass.
2023-12-18 15:22:43 +01:00
064ba17bd7
Remove print.
2023-12-18 11:04:16 +01:00
e8ee253ee0
Missing cast.
2023-12-18 11:01:18 +01:00
8bd3d6b94b
Index add.
2023-12-18 10:46:01 +01:00
6a3ca7da0c
Scatter add.
2023-12-18 10:32:22 +01:00
96f1a28e39
Add a simple full method. (#1455)
* Add a simple implementation of the full method.
* Add the docstring.
2023-12-17 20:15:57 -05:00
586b6f6fff
Adding gather op.
2023-12-17 23:34:12 +01:00
e4b0cc59f5
Adding CMP
2023-12-17 22:32:25 +01:00
0a6e0a8c9a
Implement randn (CPU -> device)
2023-12-17 19:09:08 +01:00
972903021c
Finish reduce kernels.
2023-12-17 19:07:00 +01:00
94817dac56
Bump the crate version to 0.3.2. (#1452)
2023-12-17 05:34:53 -06:00
1e86717bf2
Fix a couple typos (#1451)
* Mixtral quantized instruct.
* Fix a couple typos.
2023-12-17 05:20:05 -06:00
6bc92e63cb
Addressing a lot of comments.
2023-12-15 13:06:04 +01:00
aa04015098
Remove unwrap().
2023-12-15 12:23:28 +01:00
26540641c1
Renamed all kernel names.
2023-12-15 11:24:47 +01:00
77197379cc
More cleanup.
2023-12-15 11:17:05 +01:00
243e83f2b9
Adding a bunch of docs!
Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
2023-12-15 11:03:05 +01:00
40c3e1bd5a
cleanup.
2023-12-15 01:41:14 +01:00
ece4c69a68
Fixing softmax.
2023-12-15 01:35:08 +01:00
4eeaf205d6
Fix softmax for long sequences (missing barrier).
2023-12-14 19:37:03 +01:00
361f2ad2af
Working with merging encoders and using fences.
2023-12-14 16:05:33 +01:00
931432ed55
Fixing tests + matmul from MFA
2023-12-13 16:58:36 +01:00
0404a3eb5b
Removed MPSMatrix entirely (buggy).
2023-12-13 16:21:48 +01:00
a9d0657432
Better version?
2023-12-13 12:09:20 +01:00
4cb443d00a
Fix the logsumexp test. (#1426)
2023-12-12 10:56:11 -06:00
87dc559817
Lots of updates including some stack of command buffers.
2023-12-12 17:41:56 +01:00
77252ffb82
Add logsumexp function (#1424)
2023-12-12 10:32:17 -06:00
18eb87f25f
Upsample grad (#1420)
* encode size of upsample in enum
* working convolution method for limited 2d kernels
* add test for sf 3 interpolation
* add higher dimensional tests, fix to work with multichannel input
* Remove commented out line.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2023-12-10 08:43:24 +01:00
4349ff1fc2
Starting to fix some tests.
Few fixes.
Going back on remote metal-rs.
Reusing a single buffer (for now) to speed things up.
Adding some half kernels.
All tests are panicking instead of random failure.
Putting back f16 index select.
Add erf.
Working version for llama2-c.
Fixes + cache compute_pipeline_state.
BF16 metal fix.
Remove some prints.
new_owned -> new()..to_owned().
Better batched matmul.
Metal operational.
Reuse buffers on our own reference counts.
Tmp gemm.
Revert "Tmp gemm."
This reverts commit c65f68e988.
Interleave committing.
Speeding up copies using blit.
Fmt.
Fmt.
Remove the assert!
Fmt all.
Fixes after big rebase.
Add softmax for half and bfloat + tests
Fixing Llama example + accumulate softmax in float.
2023-11-30 11:30:31 +01:00
e2eb6590ed
Merge pull request #1323 from huggingface/metal3
Adding the test scaffolding.
2023-11-27 13:06:01 +01:00
481c45d78d
Add a basic implementation for slice-assign. (#1377)
2023-11-26 17:31:22 +00:00
14a2bdc062
Small tweak: remove the macro usage for the range indexing trait. (#1376)
2023-11-26 16:30:59 +00:00
bfa7c8fc01
Implement the module trait directly for QMatMul. (#1372)
2023-11-25 10:09:45 +00:00
1edc3ddf24
Allowing feature metal to compile.
2023-11-20 20:17:16 +01:00
8d6c6de8e0
Missing new test.
2023-11-20 14:38:35 +01:00
7ec345c2eb
Adding the test scaffolding.
2023-11-20 14:38:35 +01:00
671fc29b36
Fmt.
2023-11-20 14:38:20 +01:00
c66e5d4716
Fix comments.
2023-11-20 14:13:44 +01:00
2813fb5dbc
Cleanup: fixed a few ops, removed debugging scaffolding.
2023-11-20 14:12:57 +01:00
7cfffcac10
Debugging rope.
2023-11-20 14:12:57 +01:00
38de52bc4b
Fixed matmul (display still broken without casting back to CPU first?)
2023-11-20 14:12:57 +01:00