Commit Graph

20 Commits

Author SHA1 Message Date
0f5cbb08b3 Add support for Llama 3.1 (#2359)
* Add Llama 3.1 rope

* Clippy

* Format

* Clippy

* Add support for multiple eos tokens:

* Untagged either

* Remove either dep and fix settings.json

* Make the max positional embeddings configurable
2024-07-26 21:32:26 +02:00
0067fe00a8 Metal Unary: Add benchmarks and process kernels in a tile based fashion (#2056)
* add basic unary bench for sqrt

* process unary commands in tiles of 4

* re-enable all benchmarks

* rename helper to unary

* modify approach to split up tiled and non-tiled operations

* undo bench ignore for other tests

* update tile size to 2

* only perform the optimization on the contiguous even numbered element case
2024-04-21 00:10:33 +02:00
4c88c3ce06 Add benchmarks for qmatmul operations (#2048)
* Add qmatmul bench

* add all dtypes
2024-04-13 12:30:14 +02:00
9563a5fee4 Add support for conv_transpose2d on Metal backend (#1903)
* add support for conv transpose 2d and add benchmark for float types

* update bench calculation

* enable testing all conv operations on metal
2024-03-21 18:08:45 +01:00
ecf88a6d38 Merge branch 'main' into ivarflakstad/metal-prng 2024-01-14 17:10:54 +01:00
bafe95b660 Fix format. (#1576) 2024-01-12 14:23:17 +01:00
a3d92ab226 Metal: Activate bfloat affine and add benchmark (#1543)
* Use cfg to separate benchmark results based on features

* Add bfloat affine and benchmarks

* Fix flops calculation

* Remove allow pragma

* Avoid some unnecessary returns.

* Improve benchmarks layout

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-01-12 11:19:49 +01:00
e90bcdcc7c Metal: f16 and bf16 where_cond + benchmark (#1545)
* Use cfg to separate benchmark results based on features

* Add metal where_cond for f16 and bf16. Add benchmark

* Remove allow pragma

* Avoid some unnecessary returns.

* Improve benchmarks layout

* Updated feature separated benchmarks

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-01-12 11:18:11 +01:00
e63bb8661b Merge branch 'main' into ivarflakstad/metal-prng 2024-01-12 07:19:58 +01:00
9f0c99f0c1 Separate benchmarks by enabled features (#1538)
* Use cfg to separate benchmark results based on features

* Remove allow pragma

* Avoid some unnecessary returns.

* Improve benchmarks layout

* Derive bench_name from actual device

* Run CPU benchmarks even when GPU feature is enabled

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-01-11 15:35:38 +01:00
87efb5d8eb Updated feature separated benchmarks 2024-01-09 19:04:31 +01:00
ad181f9cdc Merge branch 'ivarflakstad/seperate-benchmarks-by-feature' into ivarflakstad/metal-prng 2024-01-09 18:55:40 +01:00
88945f2c22 Improve benchmarks layout 2024-01-09 18:31:28 +01:00
fb05af4c42 Avoid some unnecessary returns. 2024-01-08 07:19:59 +01:00
ad075a5f7e Remove allow pragma 2024-01-08 06:48:33 +01:00
3f04a79ada Use cfg to separate benchmark results based on features 2024-01-07 14:40:15 +01:00
6bf52b9fdf Gaussian normal distribution of PRNG via Box-Muller transform 2024-01-07 11:39:46 +01:00
955e63c803 Implement hybrid Tausworthe + LCG pseudo random number generator in metal 2024-01-05 13:27:59 +01:00
ceb78d3e28 Sketch the minimal mamba example. (#1465)
* Sketch the minimal mamba example.

* Fix rustfmt.

* Forward pass for mamba.

* Finish the forward pass.

* Inference fixes.

* Bugfixes.

* More fixes.

* Add a readme.
2023-12-22 00:28:50 +01:00
9b5e4843a6 Optimizing decode matmul (Phi at 28tok/s on M3).
Adding some benchmarks to help check matmul performance.
2023-12-20 09:54:19 +01:00