9563a5fee4
Add support for conv_transpose2d on Metal backend (#1903)
* Add support for conv_transpose2d and add benchmarks for float types
* Update benchmark calculation
* Enable testing of all conv operations on Metal
2024-03-21 18:08:45 +01:00
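The output size of a transposed convolution grows with the stride, which is what conv_transpose2d benchmarks and tests have to size their buffers for. Below is a minimal sketch of the per-axis output size, assuming the PyTorch-style convention (it is an assumption that candle's Metal kernel follows exactly this formula); the function name `conv_transpose2d_out_dim` is hypothetical.

```rust
/// Output length along one spatial axis of a 2D transposed convolution,
/// assuming the PyTorch-style convention:
///   out = (in - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1
/// Returns None when the padding would make the size negative.
/// Hypothetical helper; it does not check output_padding < stride.
pub fn conv_transpose2d_out_dim(
    input: usize,
    kernel: usize,
    stride: usize,
    padding: usize,
    output_padding: usize,
    dilation: usize,
) -> Option<usize> {
    assert!(input >= 1 && kernel >= 1, "input and kernel sizes must be >= 1");
    // Sum the non-negative terms first so the only subtraction is the padding.
    let full = (input - 1) * stride + dilation * (kernel - 1) + output_padding + 1;
    full.checked_sub(2 * padding)
}
```

For example, kernel 4, stride 2, padding 1 doubles the spatial size (16 becomes 32), the usual upsampling configuration.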
ecf88a6d38
Merge branch 'main' into ivarflakstad/metal-prng
2024-01-14 17:10:54 +01:00
bafe95b660
Fix format (#1576)
2024-01-12 14:23:17 +01:00
a3d92ab226
Metal: Activate bfloat affine and add benchmark (#1543)
* Use cfg to separate benchmark results based on features
* Add bfloat affine and benchmarks
* Fix flops calculation
* Remove allow pragma
* Avoid some unnecessary returns.
* Improve benchmarks layout
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-01-12 11:19:49 +01:00
e90bcdcc7c
Metal: f16 and bf16 where_cond + benchmark (#1545)
* Use cfg to separate benchmark results based on features
* Add metal where_cond for f16 and bf16. Add benchmark
* Remove allow pragma
* Avoid some unnecessary returns.
* Improve benchmarks layout
* Updated feature separated benchmarks
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-01-12 11:18:11 +01:00
e63bb8661b
Merge branch 'main' into ivarflakstad/metal-prng
2024-01-12 07:19:58 +01:00
9f0c99f0c1
Separate benchmarks by enabled features (#1538)
* Use cfg to separate benchmark results based on features
* Remove allow pragma
* Avoid some unnecessary returns.
* Improve benchmarks layout
* Derive bench_name from actual device
* Run CPU benchmarks even when GPU feature is enabled
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-01-11 15:35:38 +01:00
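The commit above separates benchmark results by enabled feature, derives the benchmark name from the device actually used, and keeps CPU benchmarks running when a GPU feature is on. A minimal sketch of that idea, assuming a hypothetical `DeviceKind` enum and hypothetical helpers `bench_devices` and `bench_name` (this is not candle's actual benchmark harness):

```rust
/// Hypothetical device kinds for the sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeviceKind {
    Cpu,
    Cuda,
    Metal,
}

impl DeviceKind {
    fn label(self) -> &'static str {
        match self {
            DeviceKind::Cpu => "cpu",
            DeviceKind::Cuda => "cuda",
            DeviceKind::Metal => "metal",
        }
    }
}

/// CPU benchmarks always run; a GPU backend is added only when the matching
/// Cargo feature was enabled at compile time (`cfg!` evaluates to a bool).
pub fn bench_devices() -> Vec<DeviceKind> {
    let mut devices = vec![DeviceKind::Cpu];
    if cfg!(feature = "cuda") {
        devices.push(DeviceKind::Cuda);
    }
    if cfg!(feature = "metal") {
        devices.push(DeviceKind::Metal);
    }
    devices
}

/// Prefix the benchmark with the device it ran on, so results from
/// different feature builds are reported under distinct names.
pub fn bench_name(device: DeviceKind, name: &str) -> String {
    format!("{}_{}", device.label(), name)
}
```

Using `cfg!` rather than `#[cfg]` keeps every branch type-checked in all builds, while still selecting devices at compile time.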
87efb5d8eb
Updated feature separated benchmarks
2024-01-09 19:04:31 +01:00
ad181f9cdc
Merge branch 'ivarflakstad/seperate-benchmarks-by-feature' into ivarflakstad/metal-prng
2024-01-09 18:55:40 +01:00
88945f2c22
Improve benchmarks layout
2024-01-09 18:31:28 +01:00
fb05af4c42
Avoid some unnecessary returns.
2024-01-08 07:19:59 +01:00
ad075a5f7e
Remove allow pragma
2024-01-08 06:48:33 +01:00
3f04a79ada
Use cfg to separate benchmark results based on features
2024-01-07 14:40:15 +01:00
6bf52b9fdf
Gaussian normal distribution of PRNG via Box-Muller transform
2024-01-07 11:39:46 +01:00
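The Box-Muller transform turns two independent uniforms u1 in (0, 1] and u2 into two independent standard normals: r = sqrt(-2 ln u1), z0 = r cos(2π u2), z1 = r sin(2π u2). A CPU sketch of the transform follows; the `XorShift64` uniform source is a hypothetical stand-in for the Metal PRNG, used only to make the example self-contained.

```rust
use std::f64::consts::PI;

/// Box-Muller: two uniforms (u1 in (0, 1]) -> two independent N(0, 1) samples.
pub fn box_muller(u1: f64, u2: f64) -> (f64, f64) {
    assert!(u1 > 0.0 && u1 <= 1.0, "u1 must be in (0, 1] so ln(u1) is finite");
    let r = (-2.0 * u1.ln()).sqrt();
    let theta = 2.0 * PI * u2;
    (r * theta.cos(), r * theta.sin())
}

/// Hypothetical uniform source for the demo: Marsaglia xorshift64,
/// mapped to (0, 1] so it is safe to pass as u1. Seed must be nonzero.
pub struct XorShift64(pub u64);

impl XorShift64 {
    pub fn next_unit(&mut self) -> f64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        // Top 53 bits plus one, scaled by 2^-53: result lies in (0, 1].
        ((x >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }
}
```

Each call consumes two uniforms and yields two normals, so a kernel can fill an output buffer in pairs.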
955e63c803
Implement hybrid Tausworthe + LCG pseudo-random number generator in Metal
2024-01-05 13:27:59 +01:00
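A hybrid Tausworthe generator XORs three Tausworthe (shift-register) streams with one linear congruential stream, giving a fast 32-bit generator that needs only four words of state per thread. The CPU sketch below uses the shift and mask constants from the widely cited GPU Gems 3 (Chapter 37) formulation; whether candle's Metal kernel uses the same constants is an assumption, and `HybridTaus` is a hypothetical name.

```rust
/// Hybrid Tausworthe + LCG state: three Tausworthe words and one LCG word.
pub struct HybridTaus {
    z: [u32; 4],
}

impl HybridTaus {
    /// The three Tausworthe seeds must exceed 128 (assumed requirement, per GPU Gems 3).
    pub fn new(seed: [u32; 4]) -> Self {
        assert!(seed[..3].iter().all(|&s| s > 128), "Tausworthe seeds must be > 128");
        HybridTaus { z: seed }
    }

    /// One Tausworthe step with shifts s1, s2, s3 and mask m.
    fn taus_step(z: &mut u32, s1: u32, s2: u32, s3: u32, m: u32) -> u32 {
        let b = ((*z << s1) ^ *z) >> s2;
        *z = ((*z & m) << s3) ^ b;
        *z
    }

    /// One LCG step z = a * z + c (mod 2^32).
    fn lcg_step(z: &mut u32, a: u32, c: u32) -> u32 {
        *z = a.wrapping_mul(*z).wrapping_add(c);
        *z
    }

    /// Combine the four component streams with XOR.
    pub fn next_u32(&mut self) -> u32 {
        let [z1, z2, z3, z4] = &mut self.z;
        Self::taus_step(z1, 13, 19, 12, 0xFFFF_FFFE)
            ^ Self::taus_step(z2, 2, 25, 4, 0xFFFF_FFF8)
            ^ Self::taus_step(z3, 3, 11, 17, 0xFFFF_FFF0)
            ^ Self::lcg_step(z4, 1_664_525, 1_013_904_223)
    }

    /// Uniform float in [0, 1): divide the 32-bit output by 2^32.
    pub fn next_f64(&mut self) -> f64 {
        self.next_u32() as f64 / 4_294_967_296.0
    }
}
```

On a GPU each thread would hold its own four-word state, seeded differently, so the streams stay independent without synchronization.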
ceb78d3e28
Sketch the minimal mamba example. (#1465)
* Sketch the minimal mamba example.
* Fix rustfmt.
* Forward pass for mamba.
* Finish the forward pass.
* Inference fixes.
* Bugfixes.
* More fixes.
* Add a readme.
2023-12-22 00:28:50 +01:00
9b5e4843a6
Optimizing decode matmul (Phi at 28 tok/s on M3).
Adding some benchmarks to help check matmul performance.
2023-12-20 09:54:19 +01:00
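Matmul benchmarks are usually reported as throughput, counting each output element of an (m, k) x (k, n) product as k multiply-adds, i.e. 2·m·n·k FLOPs per matrix. A minimal sketch of that arithmetic, with hypothetical helper names `matmul_flops` and `gflops` (not taken from the benchmark code itself):

```rust
use std::time::Duration;

/// FLOPs for a batched matmul (b, m, k) x (b, k, n) -> (b, m, n),
/// by the usual convention of 2 FLOPs (multiply + add) per inner-product term.
pub fn matmul_flops(b: u64, m: u64, n: u64, k: u64) -> u64 {
    2 * b * m * n * k
}

/// Throughput in GFLOP/s for `iters` repetitions that took `elapsed` in total.
pub fn gflops(flops_per_iter: u64, iters: u64, elapsed: Duration) -> f64 {
    (flops_per_iter * iters) as f64 / elapsed.as_secs_f64() / 1e9
}
```

During token-by-token decoding m is typically 1, so the matmul degenerates into a matrix-vector product whose speed tends to be bounded by memory bandwidth rather than FLOPs.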