candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-19 11:56:45 +00:00

Author	SHA1	Message	Date
Thomas Santerre	fee33b45c2	Add support for strided index-select on Metal (#1909 ) * initial implementation * use correct index, but still not breaking like it should have... * fix test	2024-03-22 07:30:02 +01:00
Laurent Mazare	6708870e63	Add the alloc_uninit function. (#1901 ) * Add the alloc_uninit function. * Dummy metal fix. * Lazy initialization.	2024-03-22 07:25:23 +01:00
Thomas Santerre	9563a5fee4	Add support for conv_transpose2d on Metal backend (#1903 ) * add support for conv transpose 2d and add bench mark for float types * update bench calculation * enable testing all conv operations on metal	2024-03-21 18:08:45 +01:00
Laurent Mazare	ec97c98e81	Async tensor copying. (#1900 )	2024-03-21 13:09:42 +01:00
Laurent Mazare	469635a3eb	Minor cleanup. (#1885 )	2024-03-20 14:38:27 +01:00
Thomas Santerre	2a8679509e	Add support for conv_transpose1d for metal backend (#1874 ) * first attempt * progress * integrate into metal backend * finish and get test passing * add other dtype support * update transpose1d dtypes supported	2024-03-19 08:46:58 +01:00
Thomas Santerre	04a61a9c72	Add avg_pool2d metal implementation for the metal backend (#1869 ) * implement metal avg pool 2d * fixX * add suggested precision workaround for the accumulator	2024-03-18 18:50:14 +01:00
Thomas Santerre	754fa1e813	Add support for max_pool2d for Metal backend (#1863 ) * first pass at implementation of maxpool2d * Add definitions for other dtypes * add tests for other dtypes * Cosmetic tweaks + re-enable maxpool2d tests for metal. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2024-03-18 08:33:30 +01:00
Thomas Santerre	184105792f	add test for index add and add missing match statements (#1862 )	2024-03-17 22:19:12 +01:00
Thomas Santerre	e316cb6997	add support for casting between all datatypes (#1860 )	2024-03-17 20:55:11 +01:00
Laurent Mazare	ce9fbc3682	Optimize the cat operation on contiguous tensors (#1855 ) * Add a specialized kernel for copy2d. * Move the cat operations. * Avoid transpositions in cat. * Bugfix. * Bugfix for the cuda kernel. * Add a benchmark. * Add more testing. * Test fix. * Faster kernel. * Add the missing kernel. * Tweak the test. * Add a metal kernel. * Fix for the metal kernel. * Get the tests to pass on metal. * Also use this opportunity to fix the metal kernel for ELU. * Add some bf16 kernels. * Clippy fixes.	2024-03-17 10:49:13 +01:00
Thomas Santerre	db8b24ae92	Add support for index u8/i64 and input f16/bf16 scatter-add on metal (#1849 ) * add support and tests for scatter add on metal * add support for all datatypes	2024-03-17 08:09:43 +01:00
Niklas Hallqvist	be5b68cd0b	Metal random-generation bug fixes (#1811 ) * use_resource API misunderstood. It is not additive. Several usages must be bit-ORed together. * The seeding was incorrect and used the address instead of the value of the passed in seed. * Add a check that likely exhibits failure to update the seed between generation of random tensors. * Buffer overrun, the length given to the std::ptr::copy call was in bytes, and not 32-bit units. * By default seed the RNG with a time-based value, so that different runs may produce different output, just like the CPU engine. Use device.set_seed if determinism is warranted. * Revert "By default seed the RNG with a time-based value, so that different runs may produce different output, just like the CPU engine. Use device.set_seed if determinism is warranted." This reverts commit `d7302de9` Discussion in https://github.com/huggingface/candle/pull/1811#issuecomment-1983079119 * The Metal random kernel failed to set element N/2 of tensors with N elements, N being even. The reason was that all threads but thread 0 all created 2 random samples, but thread 0 only one, i.e. an odd number. In order to produce an even number of samples, the early termination of thread 0 should only ever occur for odd sized tensors. * Add a test catching any deterministic tensor element in rand and randn output. --------- Co-authored-by: niklas <niklas@appli.se> Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>	2024-03-08 16:11:50 +01:00
ivarflakstad	0c09d10f32	Improve metal buffer usage (#1807 ) * Improve metal buffer usage * Clone cpu storage when loading to reduce wait_until_complete calls * Use powers of two for buffer sizes so reuse is more likely. * Select best available buffer by size. * Add count to MetalStorage -> can use buffer with different size Co-authored-by: Chris Fleetwood <christopher.fleetwood@huggingface.co> * Simplify new buffer creation without blit copy. Revert &[] -> Vec * Add documentation on newBufferWithBytes safety / synchronization * Drop unused buffers after command buffer is done syncing. --------- Co-authored-by: Chris Fleetwood <christopher.fleetwood@huggingface.co>	2024-03-07 09:42:34 +01:00
Laurent Mazare	09e0148cce	Tweaks to run metavoice on metal (#1792 ) * Enable tanh + tweak conv-transpose. * Run the encodec decoding on cpu. * Clippy fixes.	2024-03-03 07:46:44 +01:00
Laurent Mazare	2f22afd80e	Cuda acceleration for quantized model. (#1754 ) * Boilerplate for the quantized cuda support. * More basic cuda support. * More cuda quantization (quantize on cpu for now). * Add the dequantization bit. * Start adding some dedicated cuda kernels from llama.cpp. * Move the kernel code. * Start interfacing with the kernel. * Tweak the kernel launch params. * Bugfix for quantized metal. * Fix some clippy lints. * Tweak the launch parameters. * Tweak cuda basics to perform a quantized matmul. * Perform the dequantization on the cpu + use cublas for matmul. * Add the dequantization kernel. * Test the qmatmul. * More kernels. * Matmul-vec kernel. * Add a couple kernels. * More dequantization kernels.	2024-02-25 18:11:47 +01:00
OlivierDehaene	b60064780d	feat: add silu activation function (#1706 ) * feat: add silu activation function * use silu/arg in grad * update candle-nn * use node	2024-02-14 10:27:22 +01:00
Ivar Flakstad	db923517b3	Merge branch 'main' into ivarflakstad/metal-prng	2024-01-17 18:03:57 +01:00
Nicolas Patry	403680f17d	Quantized GGUF style (#1523 ) * Metal quantized modifications proposal. - Add a device param, wherever needed. - Create new QMetal storage thing that implements QuantizedType. - Update everywhere needed. Fix Python. Fixing examples. Fix: fmt + clippy + stub. Moving everything around. Only missing the actual implems. Fixing everything + adding dequantized kernels. More work. Fixing matmul. Fmt + Clippy Some clippy fixes. Working state. Q2K Metal -> Bugged (also present in GGML). Q4K CPU -> Bugged (present previously, new test catch it). Q5K CPU -> Bugged (present previously). Q8_1 Both -> Never really implemented it seems Q8K metal -> Never implemented in metal Fixing Q2K bug (present in ggml). * Cleanup. * Fix the rebase. * Removing the fences speeds everything up and is correct this time... * Cleanup the fence. * After rebase. * Bad code removal. * Rebase after phi2 merge + fix replit default to CPU. * Making the CI happy. * More happy tests. --------- Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>	2024-01-17 10:27:58 +01:00
Ivar Flakstad	86a8e58897	Update metal random kernel and set_seed method * set_seed via buffer content pointer copy + did_modify_range * ensure random.metal kernel does not write outside of buffer range when tid==0	2024-01-17 09:12:44 +01:00
Ivar Flakstad	79478ff5a1	Seed should be updated by random kernel result.	2024-01-15 11:58:25 +01:00
Ivar Flakstad	ecf88a6d38	Merge branch 'main' into ivarflakstad/metal-prng	2024-01-14 17:10:54 +01:00
ivarflakstad	a3d92ab226	Metal: Activate bfloat affine and add benchmark (#1543 ) * Use cfg to separate benchmark results based on features * Add bfloat affine and benchmarks * Fix flops calculation * Remove allow pragma * Avoid some unnecessary returns. * Improve benchmarks layout --------- Co-authored-by: Laurent <laurent.mazare@gmail.com> Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>	2024-01-12 11:19:49 +01:00
ivarflakstad	e90bcdcc7c	Metal: f16 and bf16 where_cond + benchmark (#1545 ) * Use cfg to separate benchmark results based on features * Add metal where_cond for f16 and bf16. Add benchmark * Remove allow pragma * Avoid some unnecessary returns. * Improve benchmarks layout * Updated feature separated benchmarks --------- Co-authored-by: Laurent <laurent.mazare@gmail.com>	2024-01-12 11:18:11 +01:00
Ivar Flakstad	e63bb8661b	Merge branch 'main' into ivarflakstad/metal-prng	2024-01-12 07:19:58 +01:00
Kyle McCarthy	402349d120	feat(bf16): add cast support + tests for cast + bin ops (#1524 )	2024-01-11 15:49:13 +01:00
Juarez Bochi	ae06cb74bb	Add relu kernel for metal (#1488 ) * Add relu kernel for metal * Copy error messages proposed in #1491 * Revert non relu changes * Fix name changes * Fix the last of us (: * Fix copy and paste mistakes * Fix typo * Revert order changes * Revert order change * Add deleted functions back * Run rustfmt	2024-01-10 18:27:17 +01:00
Ivar Flakstad	6ebe043273	Merge branch 'main' into ivarflakstad/metal-prng	2024-01-07 11:52:03 +01:00
Ivar Flakstad	6bf52b9fdf	Gaussian normal distribution of PRNG via Box-Muller transform	2024-01-07 11:39:46 +01:00
Ivar Flakstad	955e63c803	Implement hybrid Tausworthe + LCG pseudo random number generator in metal	2024-01-05 13:27:59 +01:00
Nicolas Patry	fa3ea98ba9	Adding bfloat16 support for the cast kernels. (#1520 )	2024-01-04 12:12:56 +01:00
Gonzalo	0a245e6fa4	Metal: support unary abs (#1503 ) * Metal: support unary abs * cargo fmt	2023-12-30 00:00:12 +01:00
Gonzalo	87d7f81b43	Metal: more u8/u32 (#1502 ) * Adds more metal u8 * Metal: more u32	2023-12-29 23:56:21 +01:00
Gonzalo	4373534d59	Metal: i64 basic support (#1495 ) * Adds basic metal i64 support * metal copy i64	2023-12-29 19:42:50 +01:00
Nicolas Patry	488e02a3f6	Merge pull request #1496 from bayedieng/unary Implement urecip op for metal backend	2023-12-29 12:20:52 +01:00
Baye Dieng	cc06ba2294	fix bad pattern matching and function name	2023-12-29 09:46:24 +00:00
Baye Dieng	3922b42c18	add urecip op to metal backend	2023-12-28 21:50:12 +00:00
Gonzalo	8e93e76a91	fixes error message	2023-12-28 15:03:05 -03:00
Gonzalo	b3e838f3e2	cargo fmt	2023-12-28 14:07:34 -03:00
Gonzalo	8bf892403a	Improves metal's not implemented error messages	2023-12-28 11:04:06 -03:00
Nicolas Patry	13a5d15ebc	Adding upsample_nearest_2d.	2023-12-25 14:25:19 +01:00
Nicolas Patry	95e18ef675	Fixing matmul for convolutions.	2023-12-25 12:29:34 +01:00
Nicolas Patry	10d94659c3	Adding the convolutions (1d + 2d) to candle on metal.	2023-12-21 10:39:24 +01:00
Nicolas Patry	03641293ee	Clippy pass.	2023-12-18 15:22:43 +01:00
Nicolas Patry	e8ee253ee0	Missing cast.	2023-12-18 11:01:18 +01:00
Nicolas Patry	8bd3d6b94b	Index add.	2023-12-18 10:46:01 +01:00
Nicolas Patry	6a3ca7da0c	Scatter add.	2023-12-18 10:32:22 +01:00
Nicolas Patry	586b6f6fff	Adding gather op.	2023-12-17 23:34:12 +01:00
Nicolas Patry	e4b0cc59f5	Adding CMP	2023-12-17 22:32:25 +01:00
Nicolas Patry	972903021c	Finish reduce kernels.	2023-12-17 19:07:00 +01:00

73 Commits