Commit Graph

92 Commits

Author SHA1 Message Date
80b1c689f9 Revert public EncoderParam 2024-01-17 18:09:28 +01:00
db923517b3 Merge branch 'main' into ivarflakstad/metal-prng 2024-01-17 18:03:57 +01:00
403680f17d Quantized GGUF style (#1523)
* Metal quantized modifications proposal.

- Add a device param, wherever needed.
- Create new QMetal storage thing that implements QuantizedType.
- Update everywhere needed.

Fix Python.

Fixing examples.

Fix: fmt + clippy + stub.

Moving everything around.

Only missing the actual implems.

Fixing everything + adding dequantized kernels.

More work.

Fixing matmul.

Fmt + Clippy

Some clippy fixes.

Working state.

Q2K Metal -> Bugged (also present in GGML).
Q4K CPU -> Bugged (present previously, new test catch it).
Q5K CPU -> Bugged (present previously).
Q8_1 Both -> Never really implemented it seems
Q8K metal -> Never implemented in metal

Fixing Q2K bug (present in ggml).

* Cleanup.

* Fix the rebase.

* Removing the fences speeds everything up and *is* correct this time...

* Cleanup the fence.

* After rebase.

* Bad code removal.

* Rebase after phi2 merge + fix replit default to CPU.

* Making the CI happy.

* More happy tests.

---------

Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>
2024-01-17 10:27:58 +01:00
79478ff5a1 Seed should be updated by random kernel result. 2024-01-15 11:58:25 +01:00
e63bb8661b Merge branch 'main' into ivarflakstad/metal-prng 2024-01-12 07:19:58 +01:00
ae06cb74bb Add relu kernel for metal (#1488)
* Add relu kernel for metal

* Copy error messages proposed in #1491

* Revert non relu changes

* Fix name changes

* Fix the last of us (:

* Fix copy and paste mistakes

* Fix typo

* Revert order changes

* Revert order change

* Add deleted functions back

* Run rustfmt
2024-01-10 18:27:17 +01:00
6ebe043273 Merge branch 'main' into ivarflakstad/metal-prng 2024-01-07 11:52:03 +01:00
6bf52b9fdf Gaussian normal distribution of PRNG via Box-Muller transform 2024-01-07 11:39:46 +01:00
955e63c803 Implement hybrid Tausworthe + LCG psuedo random number generator in metal 2024-01-05 13:27:59 +01:00
0a245e6fa4 Metal: support unary abs (#1503)
* Metal: support unary abs

* cargo fmt
2023-12-30 00:00:12 +01:00
87d7f81b43 Metal: more u8/u32 (#1502)
* Adds more metal u8

* Metal: more u32
2023-12-29 23:56:21 +01:00
4373534d59 Metal: i64 basic support (#1495)
* Adds basic metal i64 support

* metal copy i64
2023-12-29 19:42:50 +01:00
cc06ba2294 fix bad pattern matching and function name 2023-12-29 09:46:24 +00:00
3922b42c18 add urecip op to metal backend 2023-12-28 21:50:12 +00:00
13a5d15ebc Adding upsample_nearest_2d. 2023-12-25 14:25:19 +01:00
95e18ef675 Fixing matmul for convolutions. 2023-12-25 12:29:34 +01:00
10d94659c3 Adding the convolutions (1d + 2d) to candle on metal. 2023-12-21 10:39:24 +01:00
9b5e4843a6 Optimizing decode matmul (Phi at 28tok/s on M3).
Adding some benchmark in order to help checking out matmul performance.
2023-12-20 09:54:19 +01:00
8bd3d6b94b Index add. 2023-12-18 10:46:01 +01:00
6a3ca7da0c Scatter add. 2023-12-18 10:32:22 +01:00
586b6f6fff Adding gather op. 2023-12-17 23:34:12 +01:00
e4b0cc59f5 Adding CMP 2023-12-17 22:32:25 +01:00
972903021c Finish reduce kernels. 2023-12-17 19:07:00 +01:00
6bc92e63cb Addressing a lot of comments. 2023-12-15 13:06:04 +01:00
26540641c1 Renamed all kernel names. 2023-12-15 11:24:47 +01:00
34d83377f6 Better error message on older macos 2023-12-15 11:18:54 +01:00
243e83f2b9 Adding a bunch of docs !
Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
2023-12-15 11:03:05 +01:00
cf27868b57 More cleanup. 2023-12-15 01:44:22 +01:00
f419a38e1a Fix use resource. 2023-12-14 16:52:37 +01:00
361f2ad2af Working with merging encoders and using fences. 2023-12-14 16:05:33 +01:00
0404a3eb5b Removed MPSMatrix entirely (buggy). 2023-12-13 16:21:48 +01:00
87dc559817 Lots of updates including some stack of command buffers. 2023-12-12 17:41:56 +01:00
4349ff1fc2 Starting to fix some tests.
Few fixes.

Going back on remote metal-rs.

Reusing a single buffer (for now) to speed things up.

Adding some half kernels.

All tests are panicking instead of random failure.

Putting back f16 index select.

Add erf.

Working version for llama2-c.

Fixes + cache compute_pipeline_state.

BF16 metal fix.

Remove some prints.

new_owned -> new()..to_owned().

Better batched matmul.

Metal operational.

Reuse buffers on our own reference counts.

Tmp gemm.

Revert "Tmp gemm."

This reverts commit c65f68e988.

Interleave committing.

Speeding up copies using blit.

Fmt.

Fmt.

Remove the assert!

Fmt all.

Fixes after big rebase.

Add softmax for half and bfloat + tests

Fixing Llama example + accumulate softmax in float.
2023-11-30 11:30:31 +01:00
60f624a902 Moving tests around. 2023-11-20 16:17:19 +01:00
dc64adb8e4 Fixing cos_f16 test. 2023-11-20 14:17:07 +01:00
c66e5d4716 Fix comments. 2023-11-20 14:13:44 +01:00
2813fb5dbc Cleanup fixed a few ops removed debugging scaffolding. 2023-11-20 14:12:57 +01:00
d46670f7c0 Tmp state. 2023-11-20 14:12:57 +01:00
f710fab02e Fixing the kernels + launches to make them faster.
Cool work by @ivarflakstad

Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
2023-11-20 14:12:57 +01:00
f82bf2d915 Adding indexing.
Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
2023-11-20 14:12:57 +01:00
df6814f34e Refactor to simplify our lives for settings the params in the encoder. 2023-11-20 14:12:57 +01:00
39406a6721 Adding the actual backend 2023-11-20 14:12:56 +01:00