6381023982
Adding cuda feature for easier integration with extensions.
2025-04-15 16:28:51 +02:00
ec6d7ca773
Cudarc static-linking enabled.
2025-03-29 09:27:53 +01:00
9862cd3ba2
Splitting the features to enable different mkl linking.
2025-03-28 10:13:13 +01:00
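Splitting the feature means users can choose how MKL gets linked instead of one hard-wired mode. A hypothetical Cargo.toml sketch of such a split; the feature names and the `intel-mkl-src` sub-features below are illustrative, not necessarily candle's actual ones:

```toml
# Hypothetical feature split so downstream users pick the MKL linking mode.
[features]
mkl = ["dep:intel-mkl-src"]
mkl-static = ["mkl", "intel-mkl-src/mkl-static-lp64-iomp"]
mkl-dynamic = ["mkl", "intel-mkl-src/mkl-dynamic-lp64-iomp"]
```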
0af3e428ec
fix: place ug dep behind not wasm32 flag ( #2760 )
...
* place `ug` behind not wasm32 attr so that wasm32 can compile
* mv `ug` to conditional target dep, assuming every non-wasm32 user wants this
2025-02-01 23:05:52 +01:00
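A conditional target dependency keeps the crate out of wasm32 builds entirely rather than relying on `cfg` gates in code. A sketch of what this looks like in Cargo.toml; the version number is a placeholder:

```toml
# The dependency is only pulled in for non-wasm32 targets,
# so wasm32 builds never see it.
[target.'cfg(not(target_arch = "wasm32"))'.dependencies]
ug = { version = "0.1" }
```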
0e2c8c17fb
UG metal integration. ( #2580 )
2024-10-27 15:20:37 +01:00
594d984f9c
Support for UG kernels. ( #2579 )
...
* Support for UG kernels.
* Add a dedicated test.
2024-10-27 13:37:19 +01:00
25960676ca
Add a basic metal example with capture ( #2324 )
...
* Add some tracing.
* Get the trace to work.
2024-07-09 12:38:11 +02:00
402349d120
feat(bf16): add cast support + tests for cast + bin ops ( #1524 )
2024-01-11 15:49:13 +01:00
9f0c99f0c1
Separate benchmarks by enabled features ( #1538 )
...
* Use cfg to separate benchmark results based on features
* Remove allow pragma
* Avoid some unnecessary returns.
* Improve benchmarks layout
* Derive bench_name from actual device
* Run CPU benchmarks even when GPU feature is enabled
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-01-11 15:35:38 +01:00
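Deriving the bench name from the active backend keeps CPU and GPU results in separate buckets instead of overwriting each other. A minimal sketch of the idea, not candle's actual benchmark code; the function names are hypothetical:

```rust
// Pick a device label from the enabled cargo features at compile time.
fn device_name() -> &'static str {
    if cfg!(feature = "cuda") {
        "cuda"
    } else if cfg!(feature = "metal") {
        "metal"
    } else {
        "cpu"
    }
}

// Prefix each benchmark's name with the device it actually ran on.
fn bench_name(op: &str) -> String {
    format!("{}_{}", device_name(), op)
}

fn main() {
    // With no GPU feature enabled this reports the CPU variant.
    println!("{}", bench_name("matmul"));
}
```

Because `cfg!` is resolved at compile time, a single binary never mixes results from two devices under one name.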
b4cb982e49
Simplifying our internal cargo dependencies. ( #1529 )
2024-01-07 12:04:14 +01:00
d35f0a1376
Bump the crate version to 0.3.3. ( #1490 )
2023-12-28 13:38:30 +01:00
9fc210fae8
Merge pull request #1318 from huggingface/metal4
...
Starting to fix some tests.
2023-12-20 15:37:31 +01:00
9b5e4843a6
Optimizing decode matmul (Phi at 28 tok/s on M3).
...
Adding some benchmark in order to help checking out matmul performance.
2023-12-20 09:54:19 +01:00
94817dac56
Bump the crate version to 0.3.2. ( #1452 )
2023-12-17 05:34:53 -06:00
c66e5d4716
Fix comments.
2023-11-20 14:13:44 +01:00
39406a6721
Adding the actual backend
2023-11-20 14:12:56 +01:00
a209ce8ceb
Update for 0.3.1. ( #1324 )
2023-11-11 18:48:52 +00:00
26c4e5bf1d
Metal part 1 - Scaffolding for metal. ( #1308 )
...
* Metal part 1 - Scaffolding for metal.
* Remove tracing.
2023-11-10 08:35:48 +01:00
096dee7073
Bump the version to 0.3.0. ( #1014 )
...
* Bump the version to 0.3.0.
* Changelog update.
2023-10-01 13:51:57 +01:00
ccf352f3d1
Use yoke to provide a self-referential container for mmaped safetenso… ( #939 )
...
* Use yoke to provide a self-referential container for mmaped safetensor files.
* Add the new self-owned type for safetensor files without removing the previous version.
* Add routing.
* Add an initializer for the case of multiple files.
2023-09-23 15:43:11 +01:00
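The point of yoke here is letting parsed tensor views live next to the mmaped buffer they borrow from, which plain Rust structs cannot express directly. As a dependency-free illustration of the same idea, here is a sketch that stores byte ranges into a shared buffer instead of real self-references; the type and field names are hypothetical, not candle's:

```rust
use std::sync::Arc;

// Stand-in for a yoke-style self-referential container: instead of
// storing `&[u8]` slices that borrow the owning buffer, store byte
// ranges and resolve them to slices on demand.
struct MmapedFile {
    data: Arc<Vec<u8>>,                             // stands in for the mmaped region
    tensors: Vec<(String, std::ops::Range<usize>)>, // tensor name -> byte range
}

impl MmapedFile {
    fn new(data: Vec<u8>, tensors: Vec<(String, std::ops::Range<usize>)>) -> Self {
        Self { data: Arc::new(data), tensors }
    }

    // Look a tensor up by name and borrow its bytes from the owned buffer.
    fn tensor(&self, name: &str) -> Option<&[u8]> {
        self.tensors
            .iter()
            .find(|(n, _)| n == name)
            .map(|(_, r)| &self.data[r.clone()])
    }
}

fn main() {
    let file = MmapedFile::new(
        vec![1, 2, 3, 4, 5, 6],
        vec![("w".to_string(), 0..4), ("b".to_string(), 4..6)],
    );
    assert_eq!(file.tensor("w"), Some(&[1u8, 2, 3, 4][..]));
}
```

The yoke crate avoids the range indirection by carrying the real borrowed view alongside its "cart" (the owner), which is what makes the mmaped safetensor container self-owned.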
7dd8e12472
Bump the crate versions to v0.2.3. ( #886 )
...
* Bump the crate version.
* Also update the python bindings.
2023-09-18 12:14:03 +01:00
2257f4d475
Bump the crate version + update the changelog. ( #822 )
2023-09-12 06:39:24 +01:00
618f4e4c78
Add some documentation. ( #673 )
...
* Add some documentation.
* Bump the crate version.
2023-08-30 11:54:00 +01:00
a3f97c143d
Bump the crate version + update CHANGELOG. ( #628 )
2023-08-27 18:17:11 +01:00
aba1e90797
Add some group parameter to convolutions. ( #566 )
...
* Add some group parameter to convolutions.
* Avoid some unnecessary groups checks.
* Move the tensor convolution bits.
* Proper handling of groups.
* Bump the crate version.
* And add a changelog.
2023-08-23 12:58:55 +01:00
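With `groups > 1`, input and output channels are split into independent groups, each convolved with its own slice of the kernel. A naive sketch of a grouped 1D convolution over nested `Vec`s, not candle's implementation:

```rust
// input:  [c_in][len], kernel: [c_out][c_in / groups][k]
// Each output channel only reads the input channels of its own group.
fn conv1d_grouped(
    input: &[Vec<f32>],
    kernel: &[Vec<Vec<f32>>],
    groups: usize,
) -> Vec<Vec<f32>> {
    let (c_in, c_out) = (input.len(), kernel.len());
    assert!(c_in % groups == 0 && c_out % groups == 0);
    let in_per_g = c_in / groups;
    let out_per_g = c_out / groups;
    let k = kernel[0][0].len();
    let out_len = input[0].len() - k + 1; // "valid" padding, stride 1
    let mut out = vec![vec![0.0; out_len]; c_out];
    for oc in 0..c_out {
        let g = oc / out_per_g; // the group this output channel belongs to
        for ic in 0..in_per_g {
            for t in 0..out_len {
                for j in 0..k {
                    out[oc][t] += kernel[oc][ic][j] * input[g * in_per_g + ic][t + j];
                }
            }
        }
    }
    out
}

fn main() {
    // groups == c_in == c_out gives a depthwise convolution.
    let input = vec![vec![1.0, 2.0, 3.0], vec![4.0, 5.0, 6.0]];
    let kernel = vec![vec![vec![1.0]], vec![vec![2.0]]];
    println!("{:?}", conv1d_grouped(&input, &kernel, 2));
}
```

Setting `groups = 1` recovers an ordinary convolution, since every output channel then sees every input channel.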
a8f61e66cc
Bump the crates version to 0.1.2. ( #522 )
2023-08-20 08:07:07 +01:00
531f23b4d0
Rename vec-dot to vec-ops. ( #449 )
...
* Rename vec-dot to vec-ops.
* Also bump the crate version.
* Add a currently empty readme.
2023-08-15 10:48:57 +01:00
495e0b7580
Simd support ( #448 )
...
* Import the simd intrinsics in candle-core.
* simd version of reduce-sum.
* Bugfix.
* Fix some clippy lints.
2023-08-15 09:50:38 +01:00
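The key to a SIMD reduce-sum is keeping several independent accumulators so the adds do not form one long dependency chain. A scalar sketch of that structure, with plain `f32` "lanes" standing in for the actual intrinsics:

```rust
// Accumulate into LANES independent slots, then combine at the end;
// the vectorised version does the same with one SIMD register.
fn reduce_sum(xs: &[f32]) -> f32 {
    const LANES: usize = 8;
    let mut acc = [0.0f32; LANES];
    let chunks = xs.chunks_exact(LANES);
    let tail = chunks.remainder(); // leftover elements, handled scalar
    for chunk in chunks {
        for (a, x) in acc.iter_mut().zip(chunk) {
            *a += x;
        }
    }
    acc.iter().sum::<f32>() + tail.iter().sum::<f32>()
}

fn main() {
    let xs: Vec<f32> = (1..=10).map(|x| x as f32).collect();
    println!("sum = {}", reduce_sum(&xs));
}
```

Note that lane-wise accumulation reorders the floating-point adds, so results can differ from a strict left-to-right sum by rounding.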
90374097dc
Cudnn support ( #445 )
...
* Add a cudnn feature to be used for conv2d.
* Allocate the proper workspace.
* Only create a single cudnn handle per cuda device.
* Proper cudnn usage.
* Bugfix.
2023-08-14 21:30:41 +01:00
e29c7809ec
Parallelise the CPU kernels for the conv ops. ( #401 )
...
* Parallelise the conv2d op.
* Tighter control on threading.
* Also parallelise conv1d.
* Add some safety comment.
2023-08-11 05:51:58 +01:00
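The parallelisation strategy amounts to splitting the output into disjoint chunks and handing each chunk to a worker. Candle uses rayon for this; as a dependency-free sketch of the same shape, here is a chunked map over scoped threads (function name hypothetical):

```rust
use std::thread;

// Split input and output into matching chunks and process each on
// its own scoped thread; disjoint &mut chunks make this data-race free.
fn parallel_map(inputs: &[f32], f: fn(f32) -> f32, num_threads: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; inputs.len()];
    let chunk = ((inputs.len() + num_threads - 1) / num_threads).max(1);
    thread::scope(|s| {
        for (in_chunk, out_chunk) in inputs.chunks(chunk).zip(out.chunks_mut(chunk)) {
            s.spawn(move || {
                for (o, &i) in out_chunk.iter_mut().zip(in_chunk) {
                    *o = f(i);
                }
            });
        }
    }); // scope joins all workers here
    out
}

fn main() {
    let doubled = parallel_map(&[1.0, 2.0, 3.0, 4.0], |x| x * 2.0, 2);
    println!("{doubled:?}");
}
```

For a conv op the chunks would be output channels or rows rather than flat elements, but the safety argument is the same: each thread writes a disjoint region.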
ff53f38467
Small example for benchmarking some cpu ops ( #394 )
...
* Refactor the benchmark example.
* Rename the example.
* Add some comments.
2023-08-10 17:00:17 +01:00
c8039579a5
Conv1d optimize ( #392 )
...
* Reorder the conv1d loops in the cpu backend.
* Optimize the 1d convolution.
* Conv1D optimize.
* Fix some clippy lints.
2023-08-10 15:23:52 +01:00
3bbc08a8df
Fix randn cpu ( #382 )
...
* Change distributions: Standard generates in [0, 1), Normal is correct.
* Add test (not sure if this is the best place to put the test)
* Remove unnecessary use
2023-08-10 05:33:44 +01:00
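The bug was sampling rand's `Standard` distribution, which yields uniforms in [0, 1) rather than Gaussians. One classic way to turn two uniforms into a normal sample is the Box-Muller transform, sketched here with a tiny LCG standing in for a real rng (constants and function names are illustrative, not candle's):

```rust
// Deterministic stand-in rng: a 64-bit LCG, taking the top 53 bits
// as a uniform double in [0, 1).
fn lcg(state: &mut u64) -> f64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    (*state >> 11) as f64 / (1u64 << 53) as f64
}

// Box-Muller: two uniforms in, one N(0, 1) sample out.
fn sample_normal(state: &mut u64) -> f64 {
    let u1 = 1.0 - lcg(state); // shift to (0, 1] so ln(u1) stays finite
    let u2 = lcg(state);
    (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
}

fn main() {
    let mut state = 42u64;
    let x = sample_normal(&mut state);
    println!("one N(0, 1) sample: {x}");
}
```

Sampling `[0, 1)` uniforms directly, as the buggy code did, gives a mean of 0.5 and a variance of 1/12 instead of the N(0, 1) a `randn` caller expects.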
b278834267
Support the Accelerate BLAS on macOS. ( #325 )
...
* Add the accelerate feature.
* Ffi tweaks.
2023-08-05 17:25:24 +01:00
4fe8a02f88
Update the repo location. ( #305 )
2023-08-02 11:12:18 +01:00
d38943aadc
Add version numbers for all the candle crates ( #303 )
...
* Switch to candle-gemm for the time being.
* Add the missing versions.
2023-08-02 10:52:13 +01:00
51e51da896
Rename the candle crate to candle-core ( #301 )
...
* Rename to candle-core.
* More candle-core renaming.
2023-08-02 08:20:22 +01:00
104f89df31
Centralize the dependency versions and inherit them. ( #177 )
2023-07-16 07:47:17 +01:00
4ed56d7861
Removing cuda default.
...
Seems very important for the many exploring users, who are usually on laptops without GPUs.
Adding more README instructions in a follow-up.
2023-07-14 16:52:15 +02:00
f29b77ec19
Random initializers. ( #128 )
...
* Random initialization.
* CPU rng generation.
2023-07-10 18:26:21 +01:00
548b1df7ea
Remove the dependency to blas and use mkl directly. ( #125 )
2023-07-10 15:52:03 +01:00
9ce0f1c010
Sketch the candle-nn crate. ( #115 )
...
* Sketch the candle-nn crate.
* Tweak the cuda dependencies.
* More cuda tweaks.
2023-07-10 08:50:09 +01:00
02b5c38049
Use cublas bf16. ( #101 )
2023-07-07 08:00:12 +01:00
c297a50960
Add mkl support for matrix multiply. ( #86 )
...
* Fix some rebase issues.
* Use mkl instead.
* Use mkl in bert.
* Add the optional mkl feature.
* Conditional compilation based on the mkl feature.
* Add more mkl support.
2023-07-06 11:05:05 +01:00
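Conditional compilation on the optional `mkl` feature means one matmul symbol with two implementations chosen at build time. A sketch of that cfg pattern; the MKL branch is a placeholder and only the structure mirrors the commit:

```rust
// With the `mkl` feature on, this is where candle would call into
// MKL's gemm via FFI; the body here is just a placeholder.
#[cfg(feature = "mkl")]
fn matmul_2x2(a: &[f32; 4], b: &[f32; 4]) -> [f32; 4] {
    let _ = (a, b);
    unimplemented!("requires linking MKL")
}

// Without the feature, a naive row-major 2x2 fallback is compiled instead.
#[cfg(not(feature = "mkl"))]
fn matmul_2x2(a: &[f32; 4], b: &[f32; 4]) -> [f32; 4] {
    [
        a[0] * b[0] + a[1] * b[2],
        a[0] * b[1] + a[1] * b[3],
        a[2] * b[0] + a[3] * b[2],
        a[2] * b[1] + a[3] * b[3],
    ]
}

fn main() {
    let id = [1.0, 0.0, 0.0, 1.0];
    let m = [5.0, 6.0, 7.0, 8.0];
    println!("{:?}", matmul_2x2(&id, &m));
}
```

Because the feature is optional, users who never enable `mkl` pay nothing for it, neither a link-time dependency nor dead code.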
fdb1acd2ff
Move llama in a cargo-examples directory.
2023-07-03 11:30:58 +01:00
639270b796
Use the patched gemm for the time being.
2023-07-03 10:29:15 +01:00
783b7054ee
Move more safetensors bits to the shared module.
2023-07-03 09:34:08 +01:00
e27ee98d3f
Add backtraces.
2023-06-29 13:17:20 +01:00
e29dae044d
Tmp.
2023-06-28 14:56:38 +00:00
ca6aa8ff12
Use num-cpus to enable parallelism.
2023-06-27 14:42:26 +01:00
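The num-cpus crate answers "how many worker threads should I spawn?". Worth noting that since Rust 1.59 the standard library offers a similar query, shown here as a dependency-free sketch:

```rust
use std::thread;

fn main() {
    // Falls back to 1 if the parallelism level cannot be determined.
    let n = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    println!("spawning {n} worker threads");
}
```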