candle

Mirror of https://github.com/huggingface/candle.git, synced 2025-06-22 20:38:06 +00:00.

Author	SHA1	Message	Date
Laurent Mazare	1df2bddccf	Add the layernorm specialized op. (#2212) * Add the layernorm cuda kernels. * Dedicated layer norm op. * Add the slower variant. * Plug the cuda implementation. * Add the metal variant. * Add a dedicated test. * Bugfix.	2024-05-24 15:58:01 +02:00
Laurent Mazare	2ac302a5d1	Add the rope THD kernel. (#2014) * Add the rope THD kernel. * Cuda kernel for rope-thd. * Add the metal kernels. * Add a dedicated test.	2024-04-05 08:32:58 +02:00
Thomas Santerre	5aebe53dd2	update dtypes checks for several metal operations (#2010)	2024-04-04 18:39:06 +02:00
Laurent Mazare	1e46cf8b19	Minor cleanups in reduce.metal. (#2004)	2024-04-04 08:26:02 +02:00
Thomas Santerre	bd8db2a771	refactor to reduce the amount of code wrapped in template syntax (#2002)	2024-04-04 08:13:12 +02:00
Laurent Mazare	e7f8e72588	Contiguous variant of the rope kernel. (#1929) * Contiguous variant of the rope kernel. * Add the cuda kernel. * Metal kernel.	2024-03-25 09:11:20 +01:00
Laurent Mazare	1b98f84a2b	Fast kernels for rotary embeddings. (#1928) * Fast kernels for rotary embeddings. * Add a test for the fast CPU kernel. * Rope cuda bindings. * Cuda kernel. * Metal kernel (part 1). * Cuda kernels. * Finish the metal kernel. * Use the new kernels in the quantized example. * Fix warning.	2024-03-24 22:48:52 +01:00
Laurent Mazare	0fddec762e	RmsNorm kernel for metal. (#1895) * RmsNorm kernel for metal. * Wrapper for the metal kernel. * Get the ops to actually work. * Fix, get the tests to pass.	2024-03-21 09:48:56 +01:00
ivarflakstad	d3bdd788cf	Use __HAVE_BFLOAT__ to check for bfloat support instead of metal version check (#1540)	2024-01-10 18:50:30 +01:00
Gonzalo	87d7f81b43	Metal: more u8/u32 (#1502) * Adds more metal u8 * Metal: more u32	2023-12-29 23:56:21 +01:00
Gonzalo	4373534d59	Metal: i64 basic support (#1495) * Adds basic metal i64 support * metal copy i64	2023-12-29 19:42:50 +01:00
Nicolas Patry	972903021c	Finish reduce kernels.	2023-12-17 19:07:00 +01:00
Nicolas Patry	26540641c1	Renamed all kernel names.	2023-12-15 11:24:47 +01:00
Nicolas Patry	ece4c69a68	Fixing softmax.	2023-12-15 01:35:08 +01:00
Nicolas Patry	4eeaf205d6	Fix softmax for long sequences (missing barrier).	2023-12-14 19:37:03 +01:00
nicolas	87dc559817	Lots of updates including some stack of command buffers.	2023-12-12 17:41:56 +01:00
Nicolas Patry	4349ff1fc2	Starting to fix some tests. Few fixes. Going back on remote metal-rs. Reusing a single buffer (for now) to speed things up. Adding some half kernels. All tests are panicking instead of random failure. Putting back f16 index select. Add erf. Working version for llama2-c. Fixes + cache compute_pipeline_state. BF16 metal fix. Remove some prints. new_owned -> new()..to_owned(). Better batched matmul. Metal operational. Reuse buffers on our own reference counts. Tmp gemm. Revert "Tmp gemm." This reverts commit `c65f68e988`. Interleave committing. Speeding up copies using blit. Fmt. Fmt. Remove the assert! Fmt all. Fixes after big rebase. Add softmax for half and bfloat + tests Fixing Llama example + accumulate softmax in float.	2023-11-30 11:30:31 +01:00
Nicolas Patry	f82bf2d915	Adding indexing. Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>	2023-11-20 14:12:57 +01:00
Nicolas Patry	39406a6721	Adding the actual backend	2023-11-20 14:12:56 +01:00

19 Commits