candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-16 02:38:10 +00:00

Author	SHA1	Message	Date
Eric Buehler	0f5cbb08b3	Add support for Llama 3.1 (#2359) * Add Llama 3.1 rope * Clippy * Format * Clippy * Add support for multiple eos tokens: * Untagged either * Remove either dep and fix settings.json * Make the max positional embeddings configurable	2024-07-26 21:32:26 +02:00
Thomas Santerre	0067fe00a8	Metal Unary: Add benchmarks and process kernels in a tile based fashion (#2056) * add basic unary bench for sqrt * process unary commands in tiles of 4 * re-enable all benchmarks * rename helper to unary * modify approach to split up tiled and non-tiled operations * undo bench ignore for other tests * update tile size to 2 * only perform the optimization on the contiguous even numbered element case	2024-04-21 00:10:33 +02:00
Thomas Santerre	4c88c3ce06	Add benchmarks for qmatmul operations (#2048) * Add qmatmul bench * add all dtypes	2024-04-13 12:30:14 +02:00
Thomas Santerre	9563a5fee4	Add support for conv_transpose2d on Metal backend (#1903) * add support for conv transpose 2d and add bench mark for float types * update bench calculation * enable testing all conv operations on metal	2024-03-21 18:08:45 +01:00
Ivar Flakstad	ecf88a6d38	Merge branch 'main' into ivarflakstad/metal-prng	2024-01-14 17:10:54 +01:00
Nicolas Patry	bafe95b660	Fix format. (#1576)	2024-01-12 14:23:17 +01:00
ivarflakstad	a3d92ab226	Metal: Activate bfloat affine and add benchmark (#1543) * Use cfg to seperate benchmark results based on features * Add bfloat affine and benchmarks * Fix flops calculation * Remove allow pragma * Avoid some unnecessary returns. * Improve benchmarks layout --------- Co-authored-by: Laurent <laurent.mazare@gmail.com> Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>	2024-01-12 11:19:49 +01:00
ivarflakstad	e90bcdcc7c	Metal: f16 and bf16 where_cond + benchmark (#1545) * Use cfg to seperate benchmark results based on features * Add metal where_cond for f16 and bf16. Add benchmark * Remove allow pragma * Avoid some unnecessary returns. * Improve benchmarks layout * Updated feature separated benchmarks --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2024-01-12 11:18:11 +01:00
Ivar Flakstad	e63bb8661b	Merge branch 'main' into ivarflakstad/metal-prng	2024-01-12 07:19:58 +01:00
ivarflakstad	9f0c99f0c1	Seperate benchmarks by enabled features (#1538) * Use cfg to seperate benchmark results based on features * Remove allow pragma * Avoid some unnecessary returns. * Improve benchmarks layout * Derive bench_name from actual device * Run CPU benchmarks even when GPU feature is enabled --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2024-01-11 15:35:38 +01:00
Ivar Flakstad	87efb5d8eb	Updated feature separated benchmarks	2024-01-09 19:04:31 +01:00
Ivar Flakstad	ad181f9cdc	Merge branch 'ivarflakstad/seperate-benchmarks-by-feature' into ivarflakstad/metal-prng	2024-01-09 18:55:40 +01:00
Ivar Flakstad	88945f2c22	Improve benchmarks layout	2024-01-09 18:31:28 +01:00
Laurent	fb05af4c42	Avoid some unnecessary returns.	2024-01-08 07:19:59 +01:00
Ivar Flakstad	ad075a5f7e	Remove allow pragma	2024-01-08 06:48:33 +01:00
Ivar Flakstad	3f04a79ada	Use cfg to seperate benchmark results based on features	2024-01-07 14:40:15 +01:00
Ivar Flakstad	6bf52b9fdf	Gaussian normal distribution of PRNG via Box-Muller transform	2024-01-07 11:39:46 +01:00
Ivar Flakstad	955e63c803	Implement hybrid Tausworthe + LCG psuedo random number generator in metal	2024-01-05 13:27:59 +01:00
Laurent Mazare	ceb78d3e28	Sketch the minimal mamba example. (#1465) * Sketch the minimal mamba example. * Fix rustfmt. * Forward pass for mamba. * Finish the forward pass. * Inference fixes. * Bugfixes. * More fixes. * Add a readme.	2023-12-22 00:28:50 +01:00
Nicolas Patry	9b5e4843a6	Optimizing decode matmul (Phi at 28tok/s on M3). Adding some benchmark in order to help checking out matmul performance.	2023-12-20 09:54:19 +01:00

20 Commits