a3d92ab226
Metal: Activate bfloat affine and add benchmark (#1543)
...
* Use cfg to separate benchmark results based on features
* Add bfloat affine and benchmarks
* Fix flops calculation
* Remove allow pragma
* Avoid some unnecessary returns.
* Improve benchmarks layout
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-01-12 11:19:49 +01:00
e90bcdcc7c
Metal: f16 and bf16 where_cond + benchmark (#1545)
...
* Use cfg to separate benchmark results based on features
* Add metal where_cond for f16 and bf16. Add benchmark
* Remove allow pragma
* Avoid some unnecessary returns.
* Improve benchmarks layout
* Updated feature separated benchmarks
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-01-12 11:18:11 +01:00
e06e8d0dbe
fmt
2024-01-12 07:26:42 +01:00
e63bb8661b
Merge branch 'main' into ivarflakstad/metal-prng
2024-01-12 07:19:58 +01:00
85e5680277
remove metal version check
2024-01-11 21:02:03 +00:00
1327419776
close ifdef
2024-01-11 17:14:12 +00:00
402349d120
feat(bf16): add cast support + tests for cast + bin ops (#1524)
2024-01-11 15:49:13 +01:00
d3bdd788cf
Use __HAVE_BFLOAT__ to check for bfloat support instead of metal version check (#1540)
2024-01-10 18:50:30 +01:00
ae06cb74bb
Add relu kernel for metal (#1488)
...
* Add relu kernel for metal
* Copy error messages proposed in #1491
* Revert non relu changes
* Fix name changes
* Fix the last of us (:
* Fix copy and paste mistakes
* Fix typo
* Revert order changes
* Revert order change
* Add deleted functions back
* Run rustfmt
2024-01-10 18:27:17 +01:00
6ebe043273
Merge branch 'main' into ivarflakstad/metal-prng
2024-01-07 11:52:03 +01:00
6bf52b9fdf
Gaussian normal distribution of PRNG via Box-Muller transform
2024-01-07 11:39:46 +01:00
955e63c803
Implement hybrid Tausworthe + LCG pseudo random number generator in metal
2024-01-05 13:27:59 +01:00
fa3ea98ba9
Adding bfloat16 support for the cast kernels. (#1520)
2024-01-04 12:12:56 +01:00
0a245e6fa4
Metal: support unary abs (#1503)
...
* Metal: support unary abs
* cargo fmt
2023-12-30 00:00:12 +01:00
87d7f81b43
Metal: more u8/u32 (#1502)
...
* Adds more metal u8
* Metal: more u32
2023-12-29 23:56:21 +01:00
4373534d59
Metal: i64 basic support (#1495)
...
* Adds basic metal i64 support
* metal copy i64
2023-12-29 19:42:50 +01:00
cc06ba2294
fix bad pattern matching and function name
2023-12-29 09:46:24 +00:00
3922b42c18
add urecip op to metal backend
2023-12-28 21:50:12 +00:00
d35f0a1376
Bump the crate version to 0.3.3. (#1490)
2023-12-28 13:38:30 +01:00
13a5d15ebc
Adding upsample_nearest_2d.
2023-12-25 14:25:19 +01:00
95e18ef675
Fixing matmul for convolutions.
2023-12-25 12:29:34 +01:00
10d94659c3
Adding the convolutions (1d + 2d) to candle on metal.
2023-12-21 10:39:24 +01:00
9fc210fae8
Merge pull request #1318 from huggingface/metal4
...
Starting to fix some tests.
2023-12-20 15:37:31 +01:00
9b5e4843a6
Optimizing decode matmul (Phi at 28tok/s on M3).
...
Adding some benchmark in order to help checking out matmul performance.
2023-12-20 09:54:19 +01:00
03641293ee
Clippy pass.
2023-12-18 15:22:43 +01:00
e8ee253ee0
Missing cast.
2023-12-18 11:01:18 +01:00
8bd3d6b94b
Index add.
2023-12-18 10:46:01 +01:00
6a3ca7da0c
Scatter add.
2023-12-18 10:32:22 +01:00
586b6f6fff
Adding gather op.
2023-12-17 23:34:12 +01:00
e4b0cc59f5
Adding CMP
2023-12-17 22:32:25 +01:00
972903021c
Finish reduce kernels.
2023-12-17 19:07:00 +01:00
94817dac56
Bump the crate version to 0.3.2. (#1452)
2023-12-17 05:34:53 -06:00
6bc92e63cb
Addressing a lot of comments.
2023-12-15 13:06:04 +01:00
8b5059e951
Remove test file.
2023-12-15 11:55:30 +01:00
26540641c1
Renamed all kernel names.
2023-12-15 11:24:47 +01:00
34d83377f6
Better error message on older macos
2023-12-15 11:18:54 +01:00
243e83f2b9
Adding a bunch of docs !
...
Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
2023-12-15 11:03:05 +01:00
cf27868b57
More cleanup.
2023-12-15 01:44:22 +01:00
ece4c69a68
Fixing softmax.
2023-12-15 01:35:08 +01:00
4eeaf205d6
Fix softmax for long sequences (missing barrier).
2023-12-14 19:37:03 +01:00
f419a38e1a
Fix use resource.
2023-12-14 16:52:37 +01:00
361f2ad2af
Working with merging encoders and using fences.
2023-12-14 16:05:33 +01:00
931432ed55
Fixing tests + matmul from MFA
2023-12-13 16:58:36 +01:00
0404a3eb5b
Removed MPSMatrix entirely (buggy).
2023-12-13 16:21:48 +01:00
87dc559817
Lots of updates including some stack of command buffers.
2023-12-12 17:41:56 +01:00
6e25822d4f
Fix gelu for large x
2023-12-06 09:59:44 -05:00
2ca086939f
Put back affine strided tests
2023-11-30 11:40:39 +01:00
4349ff1fc2
Starting to fix some tests.
...
Few fixes.
Going back on remote metal-rs.
Reusing a single buffer (for now) to speed things up.
Adding some half kernels.
All tests are panicking instead of random failure.
Putting back f16 index select.
Add erf.
Working version for llama2-c.
Fixes + cache compute_pipeline_state.
BF16 metal fix.
Remove some prints.
new_owned -> new()..to_owned().
Better batched matmul.
Metal operational.
Reuse buffers on our own reference counts.
Tmp gemm.
Revert "Tmp gemm."
This reverts commit c65f68e988.
Interleave committing.
Speeding up copies using blit.
Fmt.
Fmt.
Remove the assert!
Fmt all.
Fixes after big rebase.
Add softmax for half and bfloat + tests
Fixing Llama example + accumulate softmax in float.
2023-11-30 11:30:31 +01:00
60f624a902
Moving tests around.
2023-11-20 16:17:19 +01:00
dc64adb8e4
Fixing cos_f16 test.
2023-11-20 14:17:07 +01:00