6381023982
Adding cuda feature for easier integration with extensions.
2025-04-15 16:28:51 +02:00
ec6d7ca773
Cudarc static-linking enabled.
2025-03-29 09:27:53 +01:00
9862cd3ba2
Splitting the features to enable different mkl linking.
2025-03-28 10:13:13 +01:00
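Splitting the feature means users can choose how MKL gets linked instead of one hard-wired mode. A hypothetical Cargo.toml sketch of such a split; the feature names and the `intel-mkl-src` sub-features below are illustrative, not necessarily candle's actual ones:

```toml
# Hypothetical feature split so downstream users pick the MKL linking mode.
[features]
mkl = ["dep:intel-mkl-src"]
mkl-static = ["mkl", "intel-mkl-src/mkl-static-lp64-iomp"]
mkl-dynamic = ["mkl", "intel-mkl-src/mkl-dynamic-lp64-iomp"]
```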
0af3e428ec
fix: place ug dep behind not wasm32 flag ( #2760 )
...
* place `ug` behind not wasm32 attr so that wasm32 can compile
* mv `ug` to conditional target dep, assuming every non-wasm32 user wants this
2025-02-01 23:05:52 +01:00
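A conditional target dependency keeps the crate out of wasm32 builds entirely rather than relying on `cfg` gates in code. A sketch of what this looks like in Cargo.toml; the version number is a placeholder:

```toml
# The dependency is only pulled in for non-wasm32 targets,
# so wasm32 builds never see it.
[target.'cfg(not(target_arch = "wasm32"))'.dependencies]
ug = { version = "0.1" }
```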
0e2c8c17fb
UG metal integration. ( #2580 )
2024-10-27 15:20:37 +01:00
594d984f9c
Support for UG kernels. ( #2579 )
...
* Support for UG kernels.
* Add a dedicated test.
2024-10-27 13:37:19 +01:00
25960676ca
Add a basic metal example with capture ( #2324 )
...
* Add some tracing.
* Get the trace to work.
2024-07-09 12:38:11 +02:00
402349d120
feat(bf16): add cast support + tests for cast + bin ops ( #1524 )
2024-01-11 15:49:13 +01:00
9f0c99f0c1
Separate benchmarks by enabled features ( #1538 )
...
* Use cfg to separate benchmark results based on features
* Remove allow pragma
* Avoid some unnecessary returns.
* Improve benchmarks layout
* Derive bench_name from actual device
* Run CPU benchmarks even when GPU feature is enabled
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-01-11 15:35:38 +01:00
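Deriving the bench name from the active backend keeps CPU and GPU results in separate buckets instead of overwriting each other. A minimal sketch of the idea, not candle's actual benchmark code; the function names are hypothetical:

```rust
// Pick a device label from the enabled cargo features at compile time.
fn device_name() -> &'static str {
    if cfg!(feature = "cuda") {
        "cuda"
    } else if cfg!(feature = "metal") {
        "metal"
    } else {
        "cpu"
    }
}

// Prefix each benchmark's name with the device it actually ran on.
fn bench_name(op: &str) -> String {
    format!("{}_{}", device_name(), op)
}

fn main() {
    // With no GPU feature enabled this reports the CPU variant.
    println!("{}", bench_name("matmul"));
}
```

Because `cfg!` is resolved at compile time, a single binary never mixes results from two devices under one name.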
b4cb982e49
Simplifying our internal cargo dependencies. ( #1529 )
2024-01-07 12:04:14 +01:00
d35f0a1376
Bump the crate version to 0.3.3. ( #1490 )
2023-12-28 13:38:30 +01:00
9fc210fae8
Merge pull request #1318 from huggingface/metal4
...
Starting to fix some tests.
2023-12-20 15:37:31 +01:00
9b5e4843a6
Optimizing decode matmul (Phi at 28 tok/s on M3).
...
Adding some benchmark in order to help checking out matmul performance.
2023-12-20 09:54:19 +01:00
94817dac56
Bump the crate version to 0.3.2. ( #1452 )
2023-12-17 05:34:53 -06:00
c66e5d4716
Fix comments.
2023-11-20 14:13:44 +01:00
39406a6721
Adding the actual backend
2023-11-20 14:12:56 +01:00
a209ce8ceb
Update for 0.3.1. ( #1324 )
2023-11-11 18:48:52 +00:00
26c4e5bf1d
Metal part 1 - Scaffolding for metal. ( #1308 )
...
* Metal part 1 - Scaffolding for metal.
* Remove tracing.
2023-11-10 08:35:48 +01:00
096dee7073
Bump the version to 0.3.0. ( #1014 )
...
* Bump the version to 0.3.0.
* Changelog update.
2023-10-01 13:51:57 +01:00
ccf352f3d1
Use yoke to provide a self-referential container for mmaped safetenso… ( #939 )
...
* Use yoke to provide a self-referential container for mmaped safetensor files.
* Add the new self-owned type for safetensor files without removing the previous version.
* Add routing.
* Add an initializer for the case of multiple files.
2023-09-23 15:43:11 +01:00
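The point of yoke here is letting parsed tensor views live next to the mmaped buffer they borrow from, which plain Rust structs cannot express directly. As a dependency-free illustration of the same idea, here is a sketch that stores byte ranges into a shared buffer instead of real self-references; the type and field names are hypothetical, not candle's:

```rust
use std::sync::Arc;

// Stand-in for a yoke-style self-referential container: instead of
// storing `&[u8]` slices that borrow the owning buffer, store byte
// ranges and resolve them to slices on demand.
struct MmapedFile {
    data: Arc<Vec<u8>>,                             // stands in for the mmaped region
    tensors: Vec<(String, std::ops::Range<usize>)>, // tensor name -> byte range
}

impl MmapedFile {
    fn new(data: Vec<u8>, tensors: Vec<(String, std::ops::Range<usize>)>) -> Self {
        Self { data: Arc::new(data), tensors }
    }

    // Look a tensor up by name and borrow its bytes from the owned buffer.
    fn tensor(&self, name: &str) -> Option<&[u8]> {
        self.tensors
            .iter()
            .find(|(n, _)| n == name)
            .map(|(_, r)| &self.data[r.clone()])
    }
}

fn main() {
    let file = MmapedFile::new(
        vec![1, 2, 3, 4, 5, 6],
        vec![("w".to_string(), 0..4), ("b".to_string(), 4..6)],
    );
    assert_eq!(file.tensor("w"), Some(&[1u8, 2, 3, 4][..]));
}
```

The yoke crate avoids the range indirection by carrying the real borrowed view alongside its "cart" (the owner), which is what makes the mmaped safetensor container self-owned.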
7dd8e12472
Bump the crate versions to v0.2.3. ( #886 )
...
* Bump the crate version.
* Also update the python bindings.
2023-09-18 12:14:03 +01:00
2257f4d475
Bump the crate version + update the changelog. ( #822 )
2023-09-12 06:39:24 +01:00
618f4e4c78
Add some documentation. ( #673 )
...
* Add some documentation.
* Bump the crate version.
2023-08-30 11:54:00 +01:00
a3f97c143d
Bump the crate version + update CHANGELOG. ( #628 )
2023-08-27 18:17:11 +01:00
aba1e90797
Add some group parameter to convolutions. ( #566 )
...
* Add some group parameter to convolutions.
* Avoid some unnecessary groups checks.
* Move the tensor convolution bits.
* Proper handling of groups.
* Bump the crate version.
* And add a changelog.
2023-08-23 12:58:55 +01:00
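With `groups > 1`, input and output channels are split into independent groups, each convolved with its own slice of the kernel. A naive sketch of a grouped 1D convolution over nested `Vec`s, not candle's implementation:

```rust
// input:  [c_in][len], kernel: [c_out][c_in / groups][k]
// Each output channel only reads the input channels of its own group.
fn conv1d_grouped(
    input: &[Vec<f32>],
    kernel: &[Vec<Vec<f32>>],
    groups: usize,
) -> Vec<Vec<f32>> {
    let (c_in, c_out) = (input.len(), kernel.len());
    assert!(c_in % groups == 0 && c_out % groups == 0);
    let in_per_g = c_in / groups;
    let out_per_g = c_out / groups;
    let k = kernel[0][0].len();
    let out_len = input[0].len() - k + 1; // "valid" padding, stride 1
    let mut out = vec![vec![0.0; out_len]; c_out];
    for oc in 0..c_out {
        let g = oc / out_per_g; // the group this output channel belongs to
        for ic in 0..in_per_g {
            for t in 0..out_len {
                for j in 0..k {
                    out[oc][t] += kernel[oc][ic][j] * input[g * in_per_g + ic][t + j];
                }
            }
        }
    }
    out
}

fn main() {
    // groups == c_in == c_out gives a depthwise convolution.
    let input = vec![vec![1.0, 2.0, 3.0], vec![4.0, 5.0, 6.0]];
    let kernel = vec![vec![vec![1.0]], vec![vec![2.0]]];
    println!("{:?}", conv1d_grouped(&input, &kernel, 2));
}
```

Setting `groups = 1` recovers an ordinary convolution, since every output channel then sees every input channel.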
a8f61e66cc
Bump the crates version to 0.1.2. ( #522 )
2023-08-20 08:07:07 +01:00
531f23b4d0
Rename vec-dot to vec-ops. ( #449 )
...
* Rename vec-dot to vec-ops.
* Also bump the crate version.
* Add a currently empty readme.
2023-08-15 10:48:57 +01:00
495e0b7580
Simd support ( #448 )
...
* Import the simd intrinsics in candle-core.
* simd version of reduce-sum.
* Bugfix.
* Fix some clippy lints.
2023-08-15 09:50:38 +01:00
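The key to a SIMD reduce-sum is keeping several independent accumulators so the adds do not form one long dependency chain. A scalar sketch of that structure, with plain `f32` "lanes" standing in for the actual intrinsics:

```rust
// Accumulate into LANES independent slots, then combine at the end;
// the vectorised version does the same with one SIMD register.
fn reduce_sum(xs: &[f32]) -> f32 {
    const LANES: usize = 8;
    let mut acc = [0.0f32; LANES];
    let chunks = xs.chunks_exact(LANES);
    let tail = chunks.remainder(); // leftover elements, handled scalar
    for chunk in chunks {
        for (a, x) in acc.iter_mut().zip(chunk) {
            *a += x;
        }
    }
    acc.iter().sum::<f32>() + tail.iter().sum::<f32>()
}

fn main() {
    let xs: Vec<f32> = (1..=10).map(|x| x as f32).collect();
    println!("sum = {}", reduce_sum(&xs));
}
```

Note that lane-wise accumulation reorders the floating-point adds, so results can differ from a strict left-to-right sum by rounding.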
90374097dc
Cudnn support ( #445 )
...
* Add a cudnn feature to be used for conv2d.
* Allocate the proper workspace.
* Only create a single cudnn handle per cuda device.
* Proper cudnn usage.
* Bugfix.
2023-08-14 21:30:41 +01:00
e29c7809ec
Parallelise the CPU kernels for the conv ops. ( #401 )
...
* Parallelise the conv2d op.
* Tighter control on threading.
* Also parallelise conv1d.
* Add some safety comment.
2023-08-11 05:51:58 +01:00
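The parallelisation strategy amounts to splitting the output into disjoint chunks and handing each chunk to a worker. Candle uses rayon for this; as a dependency-free sketch of the same shape, here is a chunked map over scoped threads (function name hypothetical):

```rust
use std::thread;

// Split input and output into matching chunks and process each on
// its own scoped thread; disjoint &mut chunks make this data-race free.
fn parallel_map(inputs: &[f32], f: fn(f32) -> f32, num_threads: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; inputs.len()];
    let chunk = ((inputs.len() + num_threads - 1) / num_threads).max(1);
    thread::scope(|s| {
        for (in_chunk, out_chunk) in inputs.chunks(chunk).zip(out.chunks_mut(chunk)) {
            s.spawn(move || {
                for (o, &i) in out_chunk.iter_mut().zip(in_chunk) {
                    *o = f(i);
                }
            });
        }
    }); // scope joins all workers here
    out
}

fn main() {
    let doubled = parallel_map(&[1.0, 2.0, 3.0, 4.0], |x| x * 2.0, 2);
    println!("{doubled:?}");
}
```

For a conv op the chunks would be output channels or rows rather than flat elements, but the safety argument is the same: each thread writes a disjoint region.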
ff53f38467
Small example for benchmarking some cpu ops ( #394 )
...
* Refactor the benchmark example.
* Rename the example.
* Add some comments.
2023-08-10 17:00:17 +01:00
c8039579a5
Conv1d optimize ( #392 )
...
* Reorder the conv1d loops in the cpu backend.
* Optimize the 1d convolution.
* Conv1D optimize.
* Fix some clippy lints.
2023-08-10 15:23:52 +01:00
3bbc08a8df
Fix randn cpu ( #382 )
...
* Change distributions: Standard generates in [0, 1), Normal is correct.
* Add test (not sure if this is the best place to put the test)
* Remove unnecessary use
2023-08-10 05:33:44 +01:00
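The bug was sampling rand's `Standard` distribution, which yields uniforms in [0, 1) rather than Gaussians. One classic way to turn two uniforms into a normal sample is the Box-Muller transform, sketched here with a tiny LCG standing in for a real rng (constants and function names are illustrative, not candle's):

```rust
// Deterministic stand-in rng: a 64-bit LCG, taking the top 53 bits
// as a uniform double in [0, 1).
fn lcg(state: &mut u64) -> f64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    (*state >> 11) as f64 / (1u64 << 53) as f64
}

// Box-Muller: two uniforms in, one N(0, 1) sample out.
fn sample_normal(state: &mut u64) -> f64 {
    let u1 = 1.0 - lcg(state); // shift to (0, 1] so ln(u1) stays finite
    let u2 = lcg(state);
    (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
}

fn main() {
    let mut state = 42u64;
    let x = sample_normal(&mut state);
    println!("one N(0, 1) sample: {x}");
}
```

Sampling `[0, 1)` uniforms directly, as the buggy code did, gives a mean of 0.5 and a variance of 1/12 instead of the N(0, 1) a `randn` caller expects.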
b278834267
Support the Accelerate BLAS on macOS. ( #325 )
...
* Add the accelerate feature.
* Ffi tweaks.
2023-08-05 17:25:24 +01:00
4fe8a02f88
Update the repo location. ( #305 )
2023-08-02 11:12:18 +01:00
d38943aadc
Add version numbers for all the candle crates ( #303 )
...
* Switch to candle-gemm for the time being.
* Add the missing versions.
2023-08-02 10:52:13 +01:00
51e51da896
Rename the candle crate to candle-core ( #301 )
...
* Rename to candle-core.
* More candle-core renaming.
2023-08-02 08:20:22 +01:00
104f89df31
Centralize the dependency versions and inherit them. ( #177 )
2023-07-16 07:47:17 +01:00
4ed56d7861
Removing cuda default.
...
Seems very important for the many exploring users, who are usually on laptops without GPUs.
Adding more README instructions in a follow-up.
2023-07-14 16:52:15 +02:00
f29b77ec19
Random initializers. ( #128 )
...
* Random initialization.
* CPU rng generation.
2023-07-10 18:26:21 +01:00
548b1df7ea
Remove the dependency to blas and use mkl directly. ( #125 )
2023-07-10 15:52:03 +01:00
9ce0f1c010
Sketch the candle-nn crate. ( #115 )
...
* Sketch the candle-nn crate.
* Tweak the cuda dependencies.
* More cuda tweaks.
2023-07-10 08:50:09 +01:00
02b5c38049
Use cublas bf16. ( #101 )
2023-07-07 08:00:12 +01:00
c297a50960
Add mkl support for matrix multiply. ( #86 )
...
* Fix some rebase issues.
* Use mkl instead.
* Use mkl in bert.
* Add the optional mkl feature.
* Conditional compilation based on the mkl feature.
* Add more mkl support.
2023-07-06 11:05:05 +01:00
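Conditional compilation on the optional `mkl` feature means one matmul symbol with two implementations chosen at build time. A sketch of that cfg pattern; the MKL branch is a placeholder and only the structure mirrors the commit:

```rust
// With the `mkl` feature on, this is where candle would call into
// MKL's gemm via FFI; the body here is just a placeholder.
#[cfg(feature = "mkl")]
fn matmul_2x2(a: &[f32; 4], b: &[f32; 4]) -> [f32; 4] {
    let _ = (a, b);
    unimplemented!("requires linking MKL")
}

// Without the feature, a naive row-major 2x2 fallback is compiled instead.
#[cfg(not(feature = "mkl"))]
fn matmul_2x2(a: &[f32; 4], b: &[f32; 4]) -> [f32; 4] {
    [
        a[0] * b[0] + a[1] * b[2],
        a[0] * b[1] + a[1] * b[3],
        a[2] * b[0] + a[3] * b[2],
        a[2] * b[1] + a[3] * b[3],
    ]
}

fn main() {
    let id = [1.0, 0.0, 0.0, 1.0];
    let m = [5.0, 6.0, 7.0, 8.0];
    println!("{:?}", matmul_2x2(&id, &m));
}
```

Because the feature is optional, users who never enable `mkl` pay nothing for it, neither a link-time dependency nor dead code.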
fdb1acd2ff
Move llama in a cargo-examples directory.
2023-07-03 11:30:58 +01:00
639270b796
Use the patched gemm for the time being.
2023-07-03 10:29:15 +01:00
783b7054ee
Move more safetensors bits to the shared module.
2023-07-03 09:34:08 +01:00
e27ee98d3f
Add backtraces.
2023-06-29 13:17:20 +01:00
e29dae044d
Tmp.
2023-06-28 14:56:38 +00:00
ca6aa8ff12
Use num-cpus to enable parallelism.
2023-06-27 14:42:26 +01:00
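The num-cpus crate answers "how many worker threads should I spawn?". Worth noting that since Rust 1.59 the standard library offers a similar query, shown here as a dependency-free sketch:

```rust
use std::thread;

fn main() {
    // Falls back to 1 if the parallelism level cannot be determined.
    let n = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    println!("spawning {n} worker threads");
}
```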