candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-23 04:46:15 +00:00

Author	SHA1	Message	Date
Thomas Santerre	fee33b45c2	Add support for strided index-select on Metal (#1909 ) * initial implementation * use correct index, but still not breaking like it should have... * fix test	2024-03-22 07:30:02 +01:00
Thomas Santerre	9563a5fee4	Add support for conv_transpose2d on Metal backend (#1903 ) * add support for conv transpose 2d and add bench mark for float types * update bench calculation * enable testing all conv operations on metal	2024-03-21 18:08:45 +01:00
Laurent Mazare	0fddec762e	RmsNorm kernel for metal. (#1895 ) * RmsNorm kernel for metal. * Wrapper for the metal kernel. * Get the ops to actually work. * Fix, get the tests to pass.	2024-03-21 09:48:56 +01:00
Thomas Santerre	2a8679509e	Add support for conv_transpose1d for metal backend (#1874 ) * first attempt * progress * integrate into metal backend * finish and get test passing * add other dtype support * update transpose1d dtypes supported	2024-03-19 08:46:58 +01:00
Thomas Santerre	04a61a9c72	Add avg_pool2d metal implementation for the metal backend (#1869 ) * implement metal avg pool 2d * fixX * add suggested precision workaround for the accumulator	2024-03-18 18:50:14 +01:00
Thomas Santerre	754fa1e813	Add support for max_pool2d for Metal backend (#1863 ) * first pass at implementation of maxpool2d * Add definitions for other dtypes * add tests for other dtypes * Cosmetic tweaks + re-enable maxpool2d tests for metal. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2024-03-18 08:33:30 +01:00
Laurent Mazare	ce9fbc3682	Optimize the cat operation on contiguous tensors (#1855 ) * Add a specialized kernel for copy2d. * Move the cat operations. * Avoid transpositions in cat. * Bugfix. * Bugfix for the cuda kernel. * Add a benchmark. * Add more testing. * Test fix. * Faster kernel. * Add the missing kernel. * Tweak the test. * Add a metal kernel. * Fix for the metal kernel. * Get the tests to pass on metal. * Also use this opportunity to fix the metal kernel for ELU. * Add some bf16 kernels. * Clippy fixes.	2024-03-17 10:49:13 +01:00
Niklas Hallqvist	be5b68cd0b	Metal random-generation bug fixes (#1811 ) * use_resource API misunderstood. It is not additive. Several usages must be bit-ORed together. * The seeding was incorrect and used the address instead of the value of the passed in seed. * Add a check that likely exhibits failure to update the seed between generation of random tensors. * Buffer overrun, the length given to the std::ptr::copy call was in bytes, and not 32-bit units. * By default seed the RNG with a time-based value, so that different runs may produce different output, just like the CPU engine. Use device.set_seed if determinism is warranted. * Revert "By default seed the RNG with a time-based value, so that different runs may produce different output, just like the CPU engine. Use device.set_seed if determinism is warranted." This reverts commit `d7302de9` Discussion in https://github.com/huggingface/candle/pull/1811#issuecomment-1983079119 * The Metal random kernel failed to set element N/2 of tensors with N elements, N being even. The reason was that all threads but thread 0 all created 2 random samples, but thread 0 only one, i.e. an odd number. In order to produce an even number of samples, the early termination of thread 0 should only everr occur for odd sized tensors. * Add a test catching any deterministic tensor element in rand and randn output. --------- Co-authored-by: niklas <niklas@appli.se> Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>	2024-03-08 16:11:50 +01:00
OlivierDehaene	b60064780d	feat: add silu activation function (#1706 ) * feat: add silu activation function * use silu/arg in grad * update candle-nn * use node	2024-02-14 10:27:22 +01:00
Christopher Fleetwood	6d83d42efb	Merge pull request #1606 from FL33TW00D/feature/larger-batches fix: larger batches	2024-01-29 15:31:10 +00:00
FL33TW00D	b6afb46601	chore: final	2024-01-22 15:15:19 +00:00
FL33TW00D	73d79e6092	chore: actual fix	2024-01-19 09:35:42 +00:00
FL33TW00D	b1879f17f6	chore: switch to buffer	2024-01-19 08:57:49 +00:00
FL33TW00D	4f79f5df8a	fix: larger batches	2024-01-18 14:30:14 +00:00
Ivar Flakstad	80b1c689f9	Revert public EncoderParam	2024-01-17 18:09:28 +01:00
Ivar Flakstad	db923517b3	Merge branch 'main' into ivarflakstad/metal-prng	2024-01-17 18:03:57 +01:00
Nicolas Patry	403680f17d	Quantized GGUF style (#1523 ) * Metal quantized modifications proposal. - Add a device param, wherever needed. - Create new QMetal storage thing that implements QuantizedType. - Update everywhere needed. Fix Python. Fixing examples. Fix: fmt + clippy + stub. Moving everything around. Only missing the actual implems. Fixing everything + adding dequantized kernels. More work. Fixing matmul. Fmt + Clippy Some clippy fixes. Working state. Q2K Metal -> Bugged (also present in GGML). Q4K CPU -> Bugged (present previously, new test catch it). Q5K CPU -> Bugged (present previously). Q8_1 Both -> Never really implemented it seems Q8K metal -> Never implemented in metal Fixing Q2K bug (present in ggml). * Cleanup. * Fix the rebase. * Removing the fences speeds everything up and is correct this time... * Cleanup the fence. * After rebase. * Bad code removal. * Rebase after phi2 merge + fix replit default to CPU. * Making the CI happy. * More happy tests. --------- Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>	2024-01-17 10:27:58 +01:00
Ivar Flakstad	79478ff5a1	Seed should be updated by random kernel result.	2024-01-15 11:58:25 +01:00
Ivar Flakstad	e63bb8661b	Merge branch 'main' into ivarflakstad/metal-prng	2024-01-12 07:19:58 +01:00
Juarez Bochi	ae06cb74bb	Add relu kernel for metal (#1488 ) * Add relu kernel for metal * Copy error messages proposed in #1491 * Revert non relu changes * Fix name changes * Fix the last of us (: * Fix copy and paste mistakes * Fix typo * Revert order changes * Revert order change * Add deleted functions back * Run rustfmt	2024-01-10 18:27:17 +01:00
Ivar Flakstad	6ebe043273	Merge branch 'main' into ivarflakstad/metal-prng	2024-01-07 11:52:03 +01:00
Ivar Flakstad	6bf52b9fdf	Gaussian normal distribution of PRNG via Box-Muller transform	2024-01-07 11:39:46 +01:00
Ivar Flakstad	955e63c803	Implement hybrid Tausworthe + LCG psuedo random number generator in metal	2024-01-05 13:27:59 +01:00
Gonzalo	0a245e6fa4	Metal: support unary abs (#1503 ) * Metal: support unary abs * cargo fmt	2023-12-30 00:00:12 +01:00
Gonzalo	87d7f81b43	Metal: more u8/u32 (#1502 ) * Adds more metal u8 * Metal: more u32	2023-12-29 23:56:21 +01:00
Gonzalo	4373534d59	Metal: i64 basic support (#1495 ) * Adds basic metal i64 support * metal copy i64	2023-12-29 19:42:50 +01:00
Baye Dieng	cc06ba2294	fix bad pattern matching and function name	2023-12-29 09:46:24 +00:00
Baye Dieng	3922b42c18	add urecip op to metal backend	2023-12-28 21:50:12 +00:00
Nicolas Patry	13a5d15ebc	Adding upsample_nearest_2d.	2023-12-25 14:25:19 +01:00
Nicolas Patry	95e18ef675	Fixing matmul for convolutions.	2023-12-25 12:29:34 +01:00
Nicolas Patry	10d94659c3	Adding the convolutions (1d + 2d) to candle on metal.	2023-12-21 10:39:24 +01:00
Nicolas Patry	9b5e4843a6	Optimizing decode matmul (Phi at 28tok/s on M3). Adding some benchmark in order to help checking out matmul performance.	2023-12-20 09:54:19 +01:00
Nicolas Patry	8bd3d6b94b	Index add.	2023-12-18 10:46:01 +01:00
Nicolas Patry	6a3ca7da0c	Scatter add.	2023-12-18 10:32:22 +01:00
Nicolas Patry	586b6f6fff	Adding gather op.	2023-12-17 23:34:12 +01:00
Nicolas Patry	e4b0cc59f5	Adding CMP	2023-12-17 22:32:25 +01:00
Nicolas Patry	972903021c	Finish reduce kernels.	2023-12-17 19:07:00 +01:00
Nicolas Patry	6bc92e63cb	Addressing a lot of comments.	2023-12-15 13:06:04 +01:00
Nicolas Patry	26540641c1	Renamed all kernel names.	2023-12-15 11:24:47 +01:00
Nicolas Patry	34d83377f6	Better error message on older macos	2023-12-15 11:18:54 +01:00
Nicolas Patry	243e83f2b9	Adding a bunch of docs ! Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>	2023-12-15 11:03:05 +01:00
Nicolas Patry	cf27868b57	More cleanup.	2023-12-15 01:44:22 +01:00
Nicolas Patry	f419a38e1a	Fix use resource.	2023-12-14 16:52:37 +01:00
Nicolas Patry	361f2ad2af	Working with merging encoders and using fences.	2023-12-14 16:05:33 +01:00
Nicolas Patry	0404a3eb5b	Removed MPSMatrix entirely (buggy).	2023-12-13 16:21:48 +01:00
nicolas	87dc559817	Lots of updates including some stack of command buffers.	2023-12-12 17:41:56 +01:00
Nicolas Patry	4349ff1fc2	Starting to fix some tests. Few fixes. Going back on remote metal-rs. Reusing a single buffer (for now) to speed things up. Adding some half kernels. All tests are panicking instead of random failure. Putting back f16 index select. Add erf. Working version for llama2-c. Fixes + cache compute_pipeline_state. BF16 metal fix. Remove some prints. new_owned -> new()..to_owned(). Better batched matmul. Metal operational. Reuse buffers on our own reference counts. Tmp gemm. Revert "Tmp gemm." This reverts commit `c65f68e988`. Interleave committing. Speeding up copies using blit. Fmt. Fmt. Remove the assert! Fmt all. Fixes after big rebase. Add softmax for half and bfloat + tests Fixing Llama example + accumulate softmax in float.	2023-11-30 11:30:31 +01:00
Nicolas Patry	60f624a902	Moving tests around.	2023-11-20 16:17:19 +01:00
Nicolas Patry	dc64adb8e4	Fixing cos_f16 test.	2023-11-20 14:17:07 +01:00
Nicolas Patry	c66e5d4716	Fix comments.	2023-11-20 14:13:44 +01:00

1 2

56 Commits