e8e24f1284
Follow crate conventions
2024-01-01 20:37:56 +01:00
6eb44d1bce
Added fill bench
2024-01-01 20:22:44 +01:00
7fc26764b6
Implement generic fill. u8 uses speedy blit encoder
2023-12-29 16:02:29 +01:00
0a29d2e9b8
Add fill kernel handler
2023-12-29 12:27:12 +01:00
13a5d15ebc
Adding upsample_nearest_2d.
2023-12-25 14:25:19 +01:00
95e18ef675
Fixing matmul for convolutions.
2023-12-25 12:29:34 +01:00
10d94659c3
Adding the convolutions (1d + 2d) to candle on metal.
2023-12-21 10:39:24 +01:00
9b5e4843a6
Optimizing decode matmul (Phi at 28tok/s on M3).
...
Adding some benchmark in order to help checking out matmul performance.
2023-12-20 09:54:19 +01:00
8bd3d6b94b
Index add.
2023-12-18 10:46:01 +01:00
6a3ca7da0c
Scatter add.
2023-12-18 10:32:22 +01:00
586b6f6fff
Adding gather op.
2023-12-17 23:34:12 +01:00
e4b0cc59f5
Adding CMP
2023-12-17 22:32:25 +01:00
972903021c
Finish reduce kernels.
2023-12-17 19:07:00 +01:00
6bc92e63cb
Addressing a lot of comments.
2023-12-15 13:06:04 +01:00
26540641c1
Renamed all kernel names.
2023-12-15 11:24:47 +01:00
34d83377f6
Better error message on older macos
2023-12-15 11:18:54 +01:00
243e83f2b9
Adding a bunch of docs !
...
Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
2023-12-15 11:03:05 +01:00
cf27868b57
More cleanup.
2023-12-15 01:44:22 +01:00
f419a38e1a
Fix use resource.
2023-12-14 16:52:37 +01:00
361f2ad2af
Working with merging encoders and using fences.
2023-12-14 16:05:33 +01:00
0404a3eb5b
Removed MPSMatrix entirely (buggy).
2023-12-13 16:21:48 +01:00
87dc559817
Lots of updates including some stack of command buffers.
2023-12-12 17:41:56 +01:00
4349ff1fc2
Starting to fix some tests.
...
Few fixes.
Going back on remote metal-rs.
Reusing a single buffer (for now) to speed things up.
Adding some half kernels.
All tests are panicking instead of random failure.
Putting back f16 index select.
Add erf.
Working version for llama2-c.
Fixes + cache compute_pipeline_state.
BF16 metal fix.
Remove some prints.
new_owned -> new()..to_owned().
Better batched matmul.
Metal operational.
Reuse buffers on our own reference counts.
Tmp gemm.
Revert "Tmp gemm."
This reverts commit c65f68e988.
Interleave committing.
Speeding up copies using blit.
Fmt.
Fmt.
Remove the assert!
Fmt all.
Fixes after big rebase.
Add softmax for half and bfloat + tests
Fixing Llama example + accumulate softmax in float.
2023-11-30 11:30:31 +01:00
60f624a902
Moving tests around.
2023-11-20 16:17:19 +01:00
dc64adb8e4
Fixing cos_f16 test.
2023-11-20 14:17:07 +01:00
c66e5d4716
Fix comments.
2023-11-20 14:13:44 +01:00
2813fb5dbc
Cleanup: fixed a few ops, removed debugging scaffolding.
2023-11-20 14:12:57 +01:00
d46670f7c0
Tmp state.
2023-11-20 14:12:57 +01:00
f710fab02e
Fixing the kernels + launches to make them faster.
...
Cool work by @ivarflakstad
Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
2023-11-20 14:12:57 +01:00
f82bf2d915
Adding indexing.
...
Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
2023-11-20 14:12:57 +01:00
df6814f34e
Refactor to simplify our lives for setting the params in the encoder.
2023-11-20 14:12:57 +01:00
39406a6721
Adding the actual backend
2023-11-20 14:12:56 +01:00