Commit Graph

23 Commits

SHA1 Message Date
1ad5baecc5 Handle transposed matrices in cublas. 2023-06-26 17:49:29 +01:00
3761f02aa8 Use atomicAdd as a quick workaround for a cuda synchronisation issue. 2023-06-26 16:31:24 +01:00
f2ac5547fc Avoid the race condition on cuda sums. 2023-06-26 16:19:06 +01:00
cd2a171c06 Add the where kernels. 2023-06-26 13:25:02 +01:00
f6104c4b64 Add the reduce-sum kernel. 2023-06-26 12:35:26 +01:00
16f0f5b9d2 Add a cuda kernel for embeddings. 2023-06-26 11:47:57 +01:00
8ed350dc94 Add a couple of unary ops. 2023-06-23 20:19:20 +01:00
88187b784b Also optimize the contiguous case for the binary cuda kernels. 2023-06-23 19:04:13 +01:00
5ca309ecb0 Optimize the unary cuda kernels for the contiguous case. 2023-06-23 18:40:15 +01:00
4c8931d2e4 More u32 support. 2023-06-23 14:54:03 +01:00
f8848db001 Fix the gelu kernel for f16. 2023-06-23 13:38:54 +01:00
09b7731b8d Fix unary op. 2023-06-23 13:10:26 +02:00
56ae71dd4c Address comments. 2023-06-23 13:08:04 +02:00
fd21c708ab Creating Gelu op (no backward). 2023-06-23 13:07:39 +02:00
1a90f9d3a6 Cuda implementation for copying data around. 2023-06-23 11:18:29 +01:00
065b7a19c7 Stride support for unary ops. 2023-06-22 15:46:34 +01:00
5b1ab5b687 Support strides in affine. 2023-06-22 15:38:42 +01:00
5276755fb3 Add cuda support for unary ops. 2023-06-22 15:12:59 +01:00
b8f514d9c6 Add more binary kernels. 2023-06-22 14:07:02 +01:00
e1eb86db61 Add some first binary op (add). 2023-06-22 13:52:02 +01:00
83d6198009 Simplify the binary kernels. 2023-06-22 13:16:03 +01:00
4b1c3405e9 Add a couple cuda kernels from dfdx. 2023-06-22 12:56:29 +01:00
083ced4428 Integrate the kernels bits. 2023-06-22 09:59:00 +01:00