db923517b3
Merge branch 'main' into ivarflakstad/metal-prng
2024-01-17 18:03:57 +01:00
403680f17d
Quantized GGUF style ( #1523 )
...
* Metal quantized modifications proposal.
- Add a device param, wherever needed.
- Create new QMetal storage thing that implements QuantizedType.
- Update everywhere needed.
Fix Python.
Fixing examples.
Fix: fmt + clippy + stub.
Moving everything around.
Only missing the actual implementations.
Fixing everything + adding dequantized kernels.
More work.
Fixing matmul.
Fmt + Clippy
Some clippy fixes.
Working state.
Q2K Metal -> Bugged (also present in GGML).
Q4K CPU -> Bugged (present previously, new test catches it).
Q5K CPU -> Bugged (present previously).
Q8_1 Both -> Never really implemented it seems
Q8K metal -> Never implemented in metal
Fixing Q2K bug (present in ggml).
* Cleanup.
* Fix the rebase.
* Removing the fences speeds everything up and *is* correct this time...
* Cleanup the fence.
* After rebase.
* Bad code removal.
* Rebase after phi2 merge + fix replit default to CPU.
* Making the CI happy.
* More happy tests.
---------
Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local >
2024-01-17 10:27:58 +01:00
86a8e58897
Update metal random kernel and set_seed method
...
* set_seed via buffer content pointer copy + did_modify_range
* ensure random.metal kernel does not write outside of buffer range when tid==0
2024-01-17 09:12:44 +01:00
5270224f40
Add MobileOne model. ( #1595 )
...
* Add MobileOne model.
* Clippy fixes
* Remove a comment.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com >
2024-01-16 06:34:16 +01:00
7e3349d7c3
Update parquet requirement from 45.0.0 to 50.0.0 ( #1592 )
...
Updates the requirements on [parquet](https://github.com/apache/arrow-rs) to permit the latest version.
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs/compare/45.0.0...45.0.0)
---
updated-dependencies:
- dependency-name: parquet
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-15 22:35:01 +01:00
1257fc6719
Update safetensors requirement from 0.3.1 to 0.4.1 ( #1591 )
...
Updates the requirements on [safetensors](https://github.com/huggingface/safetensors) to permit the latest version.
- [Release notes](https://github.com/huggingface/safetensors/releases)
- [Changelog](https://github.com/huggingface/safetensors/blob/main/RELEASE.md)
- [Commits](https://github.com/huggingface/safetensors/compare/v0.3.1...v0.3.3)
---
updated-dependencies:
- dependency-name: safetensors
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-15 22:34:40 +01:00
ea36f3b11f
Use the new phi model by default. ( #1589 )
2024-01-15 12:30:27 +01:00
79478ff5a1
Seed should be updated by random kernel result.
2024-01-15 11:58:25 +01:00
86b7c01b30
Update gemm to the latest version. ( #1587 )
2024-01-15 09:44:51 +01:00
bdd8107fda
Expose the ndarray trait. ( #1586 )
2024-01-14 20:09:49 +01:00
ecf88a6d38
Merge branch 'main' into ivarflakstad/metal-prng
2024-01-14 17:10:54 +01:00
e6d86b0819
Add the pow operator. ( #1583 )
...
* Add the pow operator.
* Support the pow operation in onnx.
2024-01-13 20:24:06 +01:00
88618255cb
Fix the rotary embeddings for the new phi implementation. ( #1582 )
...
* Fix the rotary embeddings for the new phi implementation.
* Match the activation.
* KV cache fix.
* Use the config activation function.
2024-01-13 19:44:41 +01:00
539ead927a
Update the Phi model to use the updated architecture. ( #1580 )
...
* Update the Phi model to use the updated architecture.
* Add more of the phi model.
* Repeat KV + caching.
* Apply the rotary embeddings.
* Add support for the new phi model in the phi example.
* Fix a couple glitches.
* Fix a couple more glitches.
2024-01-13 17:38:27 +01:00
a46864bd56
Fix "Minimal Mamba" link in README. ( #1577 )
2024-01-12 17:47:07 +01:00
bafe95b660
Fix format. ( #1576 )
2024-01-12 14:23:17 +01:00
a3d92ab226
Metal: Activate bfloat affine and add benchmark ( #1543 )
...
* Use cfg to separate benchmark results based on features
* Add bfloat affine and benchmarks
* Fix flops calculation
* Remove allow pragma
* Avoid some unnecessary returns.
* Improve benchmarks layout
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com >
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com >
2024-01-12 11:19:49 +01:00
e90bcdcc7c
Metal: f16 and bf16 where_cond + benchmark ( #1545 )
...
* Use cfg to separate benchmark results based on features
* Add metal where_cond for f16 and bf16. Add benchmark
* Remove allow pragma
* Avoid some unnecessary returns.
* Improve benchmarks layout
* Updated feature separated benchmarks
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com >
2024-01-12 11:18:11 +01:00
8e06bfb4fd
Mention VGG in the readme. ( #1573 )
2024-01-12 09:59:29 +01:00
6242276c09
Pin the revision used for phi-v2 + make it the default. ( #1572 )
...
* Pin the revision used for phi-v2 + make it the default.
* Tweak the custom-ops build.
2024-01-12 09:19:30 +01:00
e06e8d0dbe
fmt
2024-01-12 07:26:42 +01:00
e63bb8661b
Merge branch 'main' into ivarflakstad/metal-prng
2024-01-12 07:19:58 +01:00
41915184bb
Bugfix for dequantizing q5k layers. ( #1569 )
2024-01-11 23:15:11 +01:00
c1876b8041
Merge pull request #1567 from bayedieng/close-ifdef
2024-01-11 22:14:38 +01:00
85e5680277
remove metal version check
2024-01-11 21:02:03 +00:00
1327419776
close ifdef
2024-01-11 17:14:12 +00:00
402349d120
feat(bf16): add cast support + tests for cast + bin ops ( #1524 )
2024-01-11 15:49:13 +01:00
9f0c99f0c1
Separate benchmarks by enabled features ( #1538 )
...
* Use cfg to separate benchmark results based on features
* Remove allow pragma
* Avoid some unnecessary returns.
* Improve benchmarks layout
* Derive bench_name from actual device
* Run CPU benchmarks even when GPU feature is enabled
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com >
2024-01-11 15:35:38 +01:00
0fc95c9f0c
Add a dequantize command to tensor-tools. ( #1565 )
...
* Add a dequantize command to tensor-tools.
* Clippy fixes.
2024-01-11 11:21:01 +01:00
2480c5dbdd
Add RepVGG model. ( #1561 )
...
* Add RepVGG model.
* Add RepVGG README
* Extract var to top level
* Replace hashmap with a match
* Add a variant for the model kind + avoid some unnecessary config cloning.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com >
2024-01-11 07:07:40 +01:00
63944714f2
Use candle_nn::embedding instead of local copies in a few models. ( #1562 )
2024-01-10 21:36:27 +01:00
d3bdd788cf
Use __HAVE_BFLOAT__ to check for bfloat support instead of metal version check ( #1540 )
2024-01-10 18:50:30 +01:00
ae06cb74bb
Add relu kernel for metal ( #1488 )
...
* Add relu kernel for metal
* Copy error messages proposed in #1491
* Revert non relu changes
* Fix name changes
* Fix the last of us (:
* Fix copy and paste mistakes
* Fix typo
* Revert order changes
* Revert order change
* Add deleted functions back
* Run rustfmt
2024-01-10 18:27:17 +01:00
a897fda74e
Update memmap2 requirement from 0.7.1 to 0.9.3 ( #1556 )
...
Updates the requirements on [memmap2](https://github.com/RazrFalcon/memmap2-rs) to permit the latest version.
- [Changelog](https://github.com/RazrFalcon/memmap2-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/RazrFalcon/memmap2-rs/compare/v0.7.1...v0.7.1)
---
updated-dependencies:
- dependency-name: memmap2
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-10 16:27:59 +01:00
1f1179913a
Update gloo requirement from 0.8 to 0.11 ( #1558 )
...
Updates the requirements on [gloo](https://github.com/rustwasm/gloo) to permit the latest version.
- [Release notes](https://github.com/rustwasm/gloo/releases)
- [Changelog](https://github.com/rustwasm/gloo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rustwasm/gloo/commits)
---
updated-dependencies:
- dependency-name: gloo
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-10 16:27:20 +01:00
6e98cf2a92
Update cudarc requirement from 0.9.14 to 0.10.0 ( #1559 )
...
Updates the requirements on [cudarc](https://github.com/coreylowman/cudarc) to permit the latest version.
- [Release notes](https://github.com/coreylowman/cudarc/releases)
- [Commits](https://github.com/coreylowman/cudarc/compare/v0.9.14...v0.9.15)
---
updated-dependencies:
- dependency-name: cudarc
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-10 16:27:05 +01:00
2cc1247999
Update tokenizers requirement from 0.13.4 to 0.15.0 ( #1555 )
...
Updates the requirements on [tokenizers](https://github.com/huggingface/tokenizers) to permit the latest version.
- [Release notes](https://github.com/huggingface/tokenizers/releases)
- [Changelog](https://github.com/huggingface/tokenizers/blob/main/RELEASE.md)
- [Commits](https://github.com/huggingface/tokenizers/commits)
---
updated-dependencies:
- dependency-name: tokenizers
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-01-10 16:26:53 +01:00
edf3fcd1c4
fix: deprecated option field (open-pull-requests-limit-per-dependency) ( #1554 )
2024-01-10 15:12:46 +01:00
53e4755015
feat: add dependabot to the project ( #1553 )
...
* feat: add dependabot to the project
* feat: add let's accept patches/fix from other libs
* Revert "feat: add let's accept patches/fix from other libs"
This reverts commit d31a956f81.
2024-01-10 14:57:20 +01:00
87efb5d8eb
Updated feature separated benchmarks
2024-01-09 19:04:31 +01:00
ad181f9cdc
Merge branch 'ivarflakstad/seperate-benchmarks-by-feature' into ivarflakstad/metal-prng
2024-01-09 18:55:40 +01:00
88945f2c22
Improve benchmarks layout
2024-01-09 18:31:28 +01:00
12b2a337f3
Handle start-offset when loading a tensor from a pickle file. ( #1546 )
2024-01-08 09:20:48 +01:00
fb05af4c42
Avoid some unnecessary returns.
2024-01-08 07:19:59 +01:00
ad075a5f7e
Remove allow pragma
2024-01-08 06:48:33 +01:00
0eb90ed783
Simpler repro for the neon optimization issue + bugfix ( #1544 )
...
* Simpler repro for the neon optimization issue.
* Bugfix for q4k.
* Improve the fix, share the dot-prod bit.
* Clippy fixes.
* Fix for q6k.
* Also fix for q2k.
* Use the new shared dotprod.
* Add more testing.
2024-01-07 20:21:49 +01:00
89b5a06858
Use bindgen-cuda for the custom-kernel example. ( #1536 )
...
* Use bindgen-cuda for the custom-kernel example.
* Only depend on the kernels when cuda is enabled.
* Skip rustfmt.
2024-01-07 17:18:46 +01:00
3f04a79ada
Use cfg to separate benchmark results based on features
2024-01-07 14:40:15 +01:00
30313c3081
Moving to a proper build crate `bindgen_cuda`. ( #1531 )
...
* Moving to a proper build crate `bindgen_cuda`.
* Fmt.
2024-01-07 12:29:24 +01:00
e72d52b1a2
Unpin more of the workplace relative dependencies. ( #1535 )
2024-01-07 12:26:20 +01:00