86a8e58897
Update metal random kernel and set_seed method
...
* set_seed via buffer content pointer copy + did_modify_range
* ensure random.metal kernel does not write outside of buffer range when tid==0
2024-01-17 09:12:44 +01:00
79478ff5a1
Seed should be updated by random kernel result.
2024-01-15 11:58:25 +01:00
ecf88a6d38
Merge branch 'main' into ivarflakstad/metal-prng
2024-01-14 17:10:54 +01:00
a3d92ab226
Metal: Activate bfloat affine and add benchmark (#1543)
...
* Use cfg to separate benchmark results based on features
* Add bfloat affine and benchmarks
* Fix flops calculation
* Remove allow pragma
* Avoid some unnecessary returns.
* Improve benchmarks layout
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-01-12 11:19:49 +01:00
e90bcdcc7c
Metal: f16 and bf16 where_cond + benchmark (#1545)
...
* Use cfg to separate benchmark results based on features
* Add metal where_cond for f16 and bf16. Add benchmark
* Remove allow pragma
* Avoid some unnecessary returns.
* Improve benchmarks layout
* Updated feature separated benchmarks
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-01-12 11:18:11 +01:00
e06e8d0dbe
fmt
2024-01-12 07:26:42 +01:00
e63bb8661b
Merge branch 'main' into ivarflakstad/metal-prng
2024-01-12 07:19:58 +01:00
85e5680277
remove metal version check
2024-01-11 21:02:03 +00:00
1327419776
close ifdef
2024-01-11 17:14:12 +00:00
402349d120
feat(bf16): add cast support + tests for cast + bin ops (#1524)
2024-01-11 15:49:13 +01:00
d3bdd788cf
Use __HAVE_BFLOAT__ to check for bfloat support instead of metal version check (#1540)
2024-01-10 18:50:30 +01:00
ae06cb74bb
Add relu kernel for metal (#1488)
...
* Add relu kernel for metal
* Copy error messages proposed in #1491
* Revert non relu changes
* Fix name changes
* Fix the last of us (:
* Fix copy and paste mistakes
* Fix typo
* Revert order changes
* Revert order change
* Add deleted functions back
* Run rustfmt
2024-01-10 18:27:17 +01:00
6ebe043273
Merge branch 'main' into ivarflakstad/metal-prng
2024-01-07 11:52:03 +01:00
6bf52b9fdf
Gaussian normal distribution of PRNG via Box-Muller transform
2024-01-07 11:39:46 +01:00
955e63c803
Implement hybrid Tausworthe + LCG pseudo random number generator in metal
2024-01-05 13:27:59 +01:00
fa3ea98ba9
Adding bfloat16 support for the cast kernels. (#1520)
2024-01-04 12:12:56 +01:00
0a245e6fa4
Metal: support unary abs (#1503)
...
* Metal: support unary abs
* cargo fmt
2023-12-30 00:00:12 +01:00
87d7f81b43
Metal: more u8/u32 (#1502)
...
* Adds more metal u8
* Metal: more u32
2023-12-29 23:56:21 +01:00
4373534d59
Metal: i64 basic support (#1495)
...
* Adds basic metal i64 support
* metal copy i64
2023-12-29 19:42:50 +01:00
cc06ba2294
fix bad pattern matching and function name
2023-12-29 09:46:24 +00:00
3922b42c18
add urecip op to metal backend
2023-12-28 21:50:12 +00:00
d35f0a1376
Bump the crate version to 0.3.3. (#1490)
2023-12-28 13:38:30 +01:00
13a5d15ebc
Adding upsample_nearest_2d.
2023-12-25 14:25:19 +01:00
95e18ef675
Fixing matmul for convolutions.
2023-12-25 12:29:34 +01:00
10d94659c3
Adding the convolutions (1d + 2d) to candle on metal.
2023-12-21 10:39:24 +01:00
9fc210fae8
Merge pull request #1318 from huggingface/metal4
...
Starting to fix some tests.
2023-12-20 15:37:31 +01:00
9b5e4843a6
Optimizing decode matmul (Phi at 28tok/s on M3).
...
Adding some benchmark in order to help checking out matmul performance.
2023-12-20 09:54:19 +01:00
03641293ee
Clippy pass.
2023-12-18 15:22:43 +01:00
e8ee253ee0
Missing cast.
2023-12-18 11:01:18 +01:00
8bd3d6b94b
Index add.
2023-12-18 10:46:01 +01:00
6a3ca7da0c
Scatter add.
2023-12-18 10:32:22 +01:00
586b6f6fff
Adding gather op.
2023-12-17 23:34:12 +01:00
e4b0cc59f5
Adding CMP
2023-12-17 22:32:25 +01:00
972903021c
Finish reduce kernels.
2023-12-17 19:07:00 +01:00
94817dac56
Bump the crate version to 0.3.2. (#1452)
2023-12-17 05:34:53 -06:00
6bc92e63cb
Addressing a lot of comments.
2023-12-15 13:06:04 +01:00
8b5059e951
Remove test file.
2023-12-15 11:55:30 +01:00
26540641c1
Renamed all kernel names.
2023-12-15 11:24:47 +01:00
34d83377f6
Better error message on older macos
2023-12-15 11:18:54 +01:00
243e83f2b9
Adding a bunch of docs!
...
Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
2023-12-15 11:03:05 +01:00
cf27868b57
More cleanup.
2023-12-15 01:44:22 +01:00
ece4c69a68
Fixing softmax.
2023-12-15 01:35:08 +01:00
4eeaf205d6
Fix softmax for long sequences (missing barrier).
2023-12-14 19:37:03 +01:00
f419a38e1a
Fix use resource.
2023-12-14 16:52:37 +01:00
361f2ad2af
Working with merging encoders and using fences.
2023-12-14 16:05:33 +01:00
931432ed55
Fixing tests + matmul from MFA
2023-12-13 16:58:36 +01:00
0404a3eb5b
Removed MPSMatrix entirely (buggy).
2023-12-13 16:21:48 +01:00
87dc559817
Lots of updates including some stack of command buffers.
2023-12-12 17:41:56 +01:00
6e25822d4f
Fix gelu for large x
2023-12-06 09:59:44 -05:00
2ca086939f
Put back affine strided tests
2023-11-30 11:40:39 +01:00