bdd8107fda
Expose the ndarray trait. (#1586)
2024-01-14 20:09:49 +01:00
ecf88a6d38
Merge branch 'main' into ivarflakstad/metal-prng
2024-01-14 17:10:54 +01:00
e6d86b0819
Add the pow operator. (#1583)
* Add the pow operator.
* Support the pow operation in onnx.
2024-01-13 20:24:06 +01:00
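For strictly positive bases, an element-wise power can be written as x^y = exp(y · ln x), which reuses existing log, mul and exp ops. The sketch below shows that identity on plain Python lists. `pow_via_exp_log` is a hypothetical helper, not candle's API, and it is an assumption that this commit builds pow this way.

```python
import math


def pow_via_exp_log(base, exponent):
    """Element-wise base ** exponent, computed as exp(exponent * ln(base)).

    Hypothetical helper for illustration. Only valid for strictly positive
    bases: math.log raises ValueError for zero or negative inputs.
    """
    return [math.exp(e * math.log(b)) for b, e in zip(base, exponent)]


print(pow_via_exp_log([2.0, 9.0, 0.5], [3.0, 0.5, 2.0]))
```

The identity avoids a dedicated kernel, at the cost of some floating-point error (2^3 may come out a few ulps away from 8.0) and no support for non-positive bases.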
bafe95b660
Fix format. (#1576)
2024-01-12 14:23:17 +01:00
a3d92ab226
Metal: Activate bfloat affine and add benchmark (#1543)
* Use cfg to separate benchmark results based on features
* Add bfloat affine and benchmarks
* Fix flops calculation
* Remove allow pragma
* Avoid some unnecessary returns.
* Improve benchmarks layout
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-01-12 11:19:49 +01:00
e90bcdcc7c
Metal: f16 and bf16 where_cond + benchmark (#1545)
* Use cfg to separate benchmark results based on features
* Add metal where_cond for f16 and bf16. Add benchmark
* Remove allow pragma
* Avoid some unnecessary returns.
* Improve benchmarks layout
* Updated feature separated benchmarks
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-01-12 11:18:11 +01:00
e63bb8661b
Merge branch 'main' into ivarflakstad/metal-prng
2024-01-12 07:19:58 +01:00
41915184bb
Bugfix for dequantizing q5k layers. (#1569)
2024-01-11 23:15:11 +01:00
402349d120
feat(bf16): add cast support + tests for cast + bin ops (#1524)
2024-01-11 15:49:13 +01:00
9f0c99f0c1
Separate benchmarks by enabled features (#1538)
* Use cfg to separate benchmark results based on features
* Remove allow pragma
* Avoid some unnecessary returns.
* Improve benchmarks layout
* Derive bench_name from actual device
* Run CPU benchmarks even when GPU feature is enabled
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-01-11 15:35:38 +01:00
0fc95c9f0c
Add a dequantize command to tensor-tools. (#1565)
* Add a dequantize command to tensor-tools.
* Clippy fixes.
2024-01-11 11:21:01 +01:00
ae06cb74bb
Add relu kernel for metal (#1488)
* Add relu kernel for metal
* Copy error messages proposed in #1491
* Revert non relu changes
* Fix name changes
* Fix the last of us (:
* Fix copy and paste mistakes
* Fix typo
* Revert order changes
* Revert order change
* Add deleted functions back
* Run rustfmt
2024-01-10 18:27:17 +01:00
87efb5d8eb
Updated feature separated benchmarks
2024-01-09 19:04:31 +01:00
ad181f9cdc
Merge branch 'ivarflakstad/seperate-benchmarks-by-feature' into ivarflakstad/metal-prng
2024-01-09 18:55:40 +01:00
88945f2c22
Improve benchmarks layout
2024-01-09 18:31:28 +01:00
12b2a337f3
Handle start-offset when loading a tensor from a pickle file. (#1546)
2024-01-08 09:20:48 +01:00
fb05af4c42
Avoid some unnecessary returns.
2024-01-08 07:19:59 +01:00
ad075a5f7e
Remove allow pragma
2024-01-08 06:48:33 +01:00
0eb90ed783
Simpler repro for the neon optimization issue + bugfix (#1544)
* Simpler repro for the neon optimization issue.
* Bugfix for q4k.
* Improve the fix, share the dot-prod bit.
* Clippy fixes.
* Fix for q6k.
* Also fix for q2k.
* Use the new shared dotprod.
* Add more testing.
2024-01-07 20:21:49 +01:00
3f04a79ada
Use cfg to separate benchmark results based on features
2024-01-07 14:40:15 +01:00
b4cb982e49
Simplifying our internal cargo dependencies. (#1529)
2024-01-07 12:04:14 +01:00
6ebe043273
Merge branch 'main' into ivarflakstad/metal-prng
2024-01-07 11:52:03 +01:00
6bf52b9fdf
Gaussian normal distribution of PRNG via Box-Muller transform
2024-01-07 11:39:46 +01:00
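The Box-Muller transform turns two independent uniform samples, u1 in (0, 1] and u2 in [0, 1), into two independent standard normal samples z0 = sqrt(-2 ln u1) cos(2π u2) and z1 = sqrt(-2 ln u1) sin(2π u2). These are then scaled by the requested standard deviation and shifted by the mean. The minimal Python sketch below shows the transform. `box_muller` and `normal_samples` are hypothetical names, the uniforms come from Python's `random` module rather than the Metal PRNG, and the Metal kernel's exact parameterisation is not stated in this log.

```python
import math
import random


def box_muller(u1, u2):
    """Map two independent uniforms to two independent standard normals.

    u1 must lie in (0, 1] so that log(u1) is finite; u2 lies in [0, 1).
    """
    radius = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return radius * math.cos(theta), radius * math.sin(theta)


def normal_samples(n, mean=0.0, stddev=1.0, rng=None):
    """Draw n samples from N(mean, stddev^2), two per Box-Muller call."""
    if rng is None:
        rng = random.Random()
    out = []
    while len(out) < n:
        # random() lies in [0, 1), so 1.0 - random() lies in (0, 1].
        u1 = 1.0 - rng.random()
        u2 = rng.random()
        z0, z1 = box_muller(u1, u2)
        out.extend((mean + stddev * z0, mean + stddev * z1))
    return out[:n]
```

Each call yields a pair of normals, so a GPU kernel can have every thread write two outputs from two uniforms.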
955e63c803
Implement hybrid Tausworthe + LCG pseudo-random number generator in metal
2024-01-05 13:27:59 +01:00
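A hybrid Tausworthe + LCG generator, in the formulation popularised by GPU Gems 3 (chapter 37), XORs the outputs of three combined-Tausworthe steps and one linear congruential step. Each step operates on 32-bit unsigned state, and the result is scaled into [0, 1). The Python sketch below follows that published formulation, masking to 32 bits to emulate `uint` arithmetic. It is an illustration only: the names are hypothetical, and the Metal kernel in this commit may use different constants, seeding, or output scaling.

```python
MASK32 = 0xFFFFFFFF


def taus_step(z, s1, s2, s3, m):
    """One combined-Tausworthe step on a 32-bit unsigned state."""
    b = (((z << s1) & MASK32) ^ z) >> s2
    return (((z & m) << s3) & MASK32) ^ b


def lcg_step(z, a=1664525, c=1013904223):
    """One linear congruential step modulo 2**32."""
    return (a * z + c) & MASK32


class HybridTaus:
    """Three Tausworthe components XORed with an LCG (GPU Gems 3 style)."""

    def __init__(self, z1, z2, z3, z4):
        seeds = [z1, z2, z3, z4]
        if not all(0 <= z <= MASK32 for z in seeds):
            raise ValueError("seeds must be 32-bit unsigned integers")
        # The Tausworthe components need seeds above 128 to avoid stuck states.
        if min(z1, z2, z3) <= 128:
            raise ValueError("Tausworthe seeds must be > 128")
        self.state = seeds

    def next_u32(self):
        z1, z2, z3, z4 = self.state
        z1 = taus_step(z1, 13, 19, 12, 4294967294)
        z2 = taus_step(z2, 2, 25, 4, 4294967288)
        z3 = taus_step(z3, 3, 11, 17, 4294967280)
        z4 = lcg_step(z4)
        self.state = [z1, z2, z3, z4]
        return z1 ^ z2 ^ z3 ^ z4

    def next_f32(self):
        # Dividing by 2**32 maps the 32-bit output into [0, 1).
        return self.next_u32() / 4294967296.0
```

Combining the generators this way gives a longer period and better statistical quality than an LCG alone, while every step stays a handful of shifts, masks and XORs that are cheap on a GPU.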
fa3ea98ba9
Adding bfloat16 support for the cast kernels. (#1520)
2024-01-04 12:12:56 +01:00
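bfloat16 keeps the sign bit, the 8 exponent bits and the top 7 mantissa bits of an IEEE 754 float32. A cast from f32 therefore amounts to keeping the upper 16 bits after rounding, and the reverse cast is exact. The Python sketch below uses round-to-nearest-even, the IEEE 754 default, and keeps NaNs as NaNs. It is an assumption that the Metal cast kernels added here round the same way, and the helper names are hypothetical.

```python
import math
import struct


def f32_bits(x):
    """Bit pattern of x rounded to IEEE 754 float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def f32_to_bf16_bits(x):
    """Cast float32 -> bfloat16 (as 16 raw bits) with round-to-nearest-even."""
    bits = f32_bits(x)
    if math.isnan(x):
        # Plain truncation could turn a NaN into an infinity, so set a
        # mantissa bit in the kept half to keep it a (quiet) NaN.
        return (bits >> 16) | 0x0040
    # Adding 0x7FFF rounds exact halves down; adding one more when the lowest
    # kept bit is 1 rounds them up instead, so ties go to the even neighbour.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    return (bits + rounding) >> 16


def bf16_bits_to_f32(b):
    """Cast bfloat16 -> float32 exactly by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]
```

Because bfloat16 shares float32's exponent range, the cast never underflows or overflows differently from float32 except at the very top, where values above the largest bfloat16 round to infinity.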
0a245e6fa4
Metal: support unary abs (#1503)
* Metal: support unary abs
* cargo fmt
2023-12-30 00:00:12 +01:00
87d7f81b43
Metal: more u8/u32 (#1502)
* Adds more metal u8
* Metal: more u32
2023-12-29 23:56:21 +01:00
4373534d59
Metal: i64 basic support (#1495)
* Adds basic metal i64 support
* metal copy i64
2023-12-29 19:42:50 +01:00
488e02a3f6
Merge pull request #1496 from bayedieng/unary
Implement urecip op for metal backend
2023-12-29 12:20:52 +01:00
f5c98f22c7
Merge pull request #1491 from mimiquate/metal-errors
Improves metal's not implemented error messages
2023-12-29 12:03:40 +01:00
cc06ba2294
fix bad pattern matching and function name
2023-12-29 09:46:24 +00:00
3922b42c18
add urecip op to metal backend
2023-12-28 21:50:12 +00:00
1e442d4bb9
Fix lints for clippy 1.75. (#1494)
2023-12-28 20:26:20 +01:00
8e93e76a91
fixes error message
2023-12-28 15:03:05 -03:00
b3e838f3e2
cargo fmt
2023-12-28 14:07:34 -03:00
8bf892403a
Improves metal's not implemented error messages
2023-12-28 11:04:06 -03:00
d35f0a1376
Bump the crate version to 0.3.3. (#1490)
2023-12-28 13:38:30 +01:00
13a5d15ebc
Adding upsample_nearest_2d.
2023-12-25 14:25:19 +01:00
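Nearest-neighbour 2D upsampling fills each output pixel (i, j) by copying one source pixel. The sketch below assumes the common mapping src = floor(dst * in_size / out_size), the one PyTorch's `nearest` mode uses when no explicit scale is given. It computes the mapping with integer arithmetic so that no floating-point rounding can shift an index. The exact index mapping in candle's `upsample_nearest_2d` is an assumption here, and the sketch handles a single channel only.

```python
def upsample_nearest_2d(image, out_h, out_w):
    """Nearest-neighbour upsampling of a single-channel image (list of rows).

    Output pixel (i, j) copies input pixel
    (floor(i * in_h / out_h), floor(j * in_w / out_w)).
    """
    in_h, in_w = len(image), len(image[0])
    return [
        # Integer floor division gives the exact floor with no float rounding.
        [image[(i * in_h) // out_h][(j * in_w) // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


print(upsample_nearest_2d([[1, 2], [3, 4]], 4, 4))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

For non-integer ratios the same formula still applies; for example, widening a row of 3 pixels to 5 repeats source columns 0, 0, 1, 1, 2.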
1505d85276
Merge pull request #1461 from huggingface/metal-conv
Adding the convolutions (1d + 2d) to candle on metal.
2023-12-25 12:48:09 +01:00
95e18ef675
Fixing matmul for convolutions.
2023-12-25 12:29:34 +01:00
7135791dd5
Fix the quantized mistral example. (#1478)
2023-12-25 09:31:24 +01:00
ba1fae590e
Validate the kernel size in pooling ops. (#1473)
* Validate the kernel size in pooling ops.
* Revert the changes to basics.
2023-12-23 11:19:22 +01:00
ceb78d3e28
Sketch the minimal mamba example. (#1465)
* Sketch the minimal mamba example.
* Fix rustfmt.
* Forward pass for mamba.
* Finish the forward pass.
* Inference fixes.
* Bugfixes.
* More fixes.
* Add a readme.
2023-12-22 00:28:50 +01:00
10d94659c3
Adding the convolutions (1d + 2d) to candle on metal.
2023-12-21 10:39:24 +01:00
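A 1D convolution layer takes an input of shape (c_in, length), a kernel of shape (c_out, c_in, k), a stride s and a padding p. It produces (length + 2p - k) // s + 1 output positions, each a cross-correlation of one input patch with the kernel. The commit does not say how the Metal kernels compute this. One common GPU strategy, assumed here for illustration, is im2col: unfold every patch into a column, flatten the kernel into rows, and compute the whole convolution as a single matrix multiplication. The nearby "Fixing matmul for convolutions" commit suggests a matmul is involved, but that is an inference. The Python sketch below checks the im2col route against a direct loop. Dilation and groups are omitted, and the function names are hypothetical rather than the kernel names in this commit.

```python
def _pad(x, padding):
    """Zero-pad every channel of x on both sides."""
    return [[0.0] * padding + list(row) + [0.0] * padding for row in x]


def conv1d_direct(x, w, stride=1, padding=0):
    """x: [c_in][length], w: [c_out][c_in][k] -> [c_out][l_out].

    Deep-learning "convolution" is cross-correlation: the kernel is not flipped.
    """
    c_in, k = len(x), len(w[0][0])
    padded = _pad(x, padding)
    l_out = (len(x[0]) + 2 * padding - k) // stride + 1
    return [
        [
            sum(w[o][c][t] * padded[c][p * stride + t] for c in range(c_in) for t in range(k))
            for p in range(l_out)
        ]
        for o in range(len(w))
    ]


def conv1d_im2col(x, w, stride=1, padding=0):
    """Same result via im2col: unfold patches into columns, then one matmul."""
    c_in, k = len(x), len(w[0][0])
    padded = _pad(x, padding)
    l_out = (len(x[0]) + 2 * padding - k) // stride + 1
    # cols[p] is the flattened (c_in * k) patch feeding output position p.
    cols = [
        [padded[c][p * stride + t] for c in range(c_in) for t in range(k)]
        for p in range(l_out)
    ]
    # Flatten each output channel's kernel in the same (c, t) order.
    w_flat = [[w[o][c][t] for c in range(c_in) for t in range(k)] for o in range(len(w))]
    # Matmul: [c_out, c_in * k] x [c_in * k, l_out].
    return [[sum(a * b for a, b in zip(w_row, col)) for col in cols] for w_row in w_flat]
```

The im2col buffer costs k times the input memory, but it turns the convolution into a matmul, which is usually the most heavily optimised kernel on a GPU backend.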
9fc210fae8
Merge pull request #1318 from huggingface/metal4
Starting to fix some tests.
2023-12-20 15:37:31 +01:00
9b5e4843a6
Optimizing decode matmul (Phi at 28tok/s on M3).
Adding some benchmark in order to help checking out matmul performance.
2023-12-20 09:54:19 +01:00
03641293ee
Clippy pass.
2023-12-18 15:22:43 +01:00
064ba17bd7
Remove print.
2023-12-18 11:04:16 +01:00
e8ee253ee0
Missing cast.
2023-12-18 11:01:18 +01:00
8bd3d6b94b
Index add.
2023-12-18 10:46:01 +01:00
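Index add accumulates rows of a source tensor into a destination at positions given by an index tensor. The output starts as a copy of dst, then out[indices[i]] += src[i] for each i, so repeated indices sum up. The sketch below shows those semantics along dimension 0 on nested Python lists. The function name and the restriction to dim 0 are simplifications for illustration, not candle's signature.

```python
def index_add(dst, indices, src):
    """Return a copy of dst where src[i] is added to row dst[indices[i]].

    Works along dimension 0 of nested lists; repeated indices accumulate.
    """
    out = [list(row) for row in dst]  # leave the caller's dst untouched
    for i, idx in enumerate(indices):
        for j, value in enumerate(src[i]):
            out[idx][j] += value
    return out


print(index_add([[0, 0], [0, 0], [0, 0]], [0, 2, 0], [[1, 2], [3, 4], [5, 6]]))
```

Because two source rows can target the same destination row (index 0 above), a parallel kernel must avoid lost updates, for example with atomics or by giving each destination element to a single thread.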