candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-18 03:28:50 +00:00

Author	SHA1	Message	Date
Laurent Mazare	957d604a78	Enable BF16 on metal. (#2380 )	2024-08-01 11:05:07 +02:00
Laurent Mazare	8696cf6494	Enable the affine kernel for u8/u32. (#2376 )	2024-08-01 10:03:11 +02:00
Laurent Mazare	6baa1d486b	Fix a bug in the metal implementation of col2im1d. (#2284 )	2024-06-22 23:21:20 +02:00
Lionel Touati	1ec3b2cc18	add where_cond f32 for metal (#2236 )	2024-06-02 14:30:06 +02:00
Laurent Mazare	0814dfd148	Add a metal kernel for col2im1d. (#2214 ) * Add a metal kernel for col2im1d. * Enable the col2im variant. * Bugfix. * Revert the quantized tweak.	2024-05-25 11:03:23 +02:00
Laurent Mazare	01794dc16e	Use write rather than try-write on the metal rw-locks. (#2162 )	2024-05-05 07:22:46 +02:00
Laurent Mazare	b13a82a438	Separate quantized phi-3 implementation. (#2157 ) * Separate quantized phi-3 implementation. * Integrate the quantized phi3 model. * Small fixes, get the generation to work properly. * Keep the old llama implementation around. * Change the default.	2024-05-04 10:14:57 +02:00
Laurent Mazare	96a48e5cc4	Add argsort. (#2132 ) * Add the argsort cuda kernels. * CPU version of arg-sort. * Hook the cuda kernel + rework the cpu bits. * Add some dedicated test. * Working cuda kernel. * Metal kernel. * Metal adjustments. * Bugfix. * Use the fast rope in qwen. * Rework the expert selection in qwen.	2024-04-27 20:17:35 +02:00
Laurent Mazare	8a05743a21	Add StorageRef. (#2113 ) * Add the storage-ref bits. * Add the metal implementation.	2024-04-23 13:23:27 +02:00
Thomas Santerre	0067fe00a8	Metal Unary: Add benchmarks and process kernels in a tile based fashion (#2056 ) * add basic unary bench for sqrt * process unary commands in tiles of 4 * re-enable all benchmarks * rename helper to unary * modify approach to split up tiled and non-tiled operations * undo bench ignore for other tests * update tile size to 2 * only perform the optimization on the contiguous even numbered element case	2024-04-21 00:10:33 +02:00
ivarflakstad	db7dbf3071	Add missing bfloat unary strided kernels and fix typo (#2058 )	2024-04-14 20:01:13 +02:00
Laurent Mazare	53e5380bf6	Add a synchronize method to devices. (#2055 ) * Add a synchronize method to devices. * Metal version.	2024-04-14 16:32:55 +02:00
Laurent Mazare	a4d5a414e3	Support gather on bf16 for metal. (#2035 )	2024-04-10 12:49:25 +02:00
Laurent Mazare	718671a0d5	Use BufferOffset in metal backend ops. (#2029 ) * Use BufferOffset in the metal backend. * More BufferOffset usage. * Use in where-cond.	2024-04-08 09:37:25 +02:00
Laurent Mazare	c5fe4a7f89	Rework the buffer offset logic for metal kernels (#2028 ) * Move the metal kernels utils in a separate module. * Use the BufferOffset for unary ops. * Fix clippy lints. * Use the new BufferOffset. * Adapt the binary ops. * Affine. * More ops (powf, elu, cast).	2024-04-07 22:37:53 +02:00
Thomas Santerre	c5626b8271	Add support for "sign" on tensors (#2012 ) * add the sign unary operator * remove unneeded import * remove unneeded import * undo formatting * undo formatting * remove unnecessary redefinition * allow gradient to flow through for sign and round * fix cpu ops to ensure that negzero and positive zero are handled properly * clippy fixes * Properly avoid gradient tracking. * Use a branchless version. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>	2024-04-04 22:32:47 +02:00
Thomas Santerre	5aebe53dd2	update dtypes checks for several metal operations (#2010 )	2024-04-04 18:39:06 +02:00
Laurent Mazare	665da30487	Backend refactoring. (#1966 ) * Backend refactoring. * Metal tweaks. * Move the cudnn module.	2024-03-29 23:02:11 +01:00

18 Commits