candle

Mirror of https://github.com/huggingface/candle.git, synced 2025-06-19 03:54:56 +00:00

Author	SHA1	Message	Date
laurent	117f014b55	Add where_cond and properly apply the causal mask.	2023-06-25 21:08:03 +01:00
laurent	817e4b5005	Rework the embeddings so that it works on non-contiguous weights + factor out some code.	2023-06-25 17:37:47 +01:00
laurent	3852a85af3	Boilerplate code for the sum operator.	2023-06-25 09:35:17 +01:00
laurent	d6cb4f1c53	Add the source offset when copying the data around.	2023-06-24 08:35:49 +01:00
laurent	d0a91db8fd	Softmax cpu implementation.	2023-06-23 22:26:53 +01:00
laurent	8443963d4f	Skeleton implementation for softmax.	2023-06-23 22:00:13 +01:00
laurent	5d44e76e3f	Add the casting operation.	2023-06-23 21:22:07 +01:00
laurent	92da45879c	Dummy broadcast placeholder functions.	2023-06-23 14:07:05 +01:00
Nicolas Patry	2fb87edda5	Address comments.	2023-06-23 13:43:18 +02:00
Nicolas Patry	5e54f37fe1	Adding embedding op (not generic gather, no select).	2023-06-23 13:13:26 +02:00
laurent	1a90f9d3a6	Cuda implementation for copying data around.	2023-06-23 11:18:29 +01:00
laurent	4712dcc2f6	Actually copy the data around in cat (cpu only).	2023-06-23 10:24:02 +01:00
laurent	6110db31c9	Add the cat operator (without the storage implementation for now).	2023-06-23 10:13:37 +01:00
laurent	cc78900922	Start adding the cublas based matmul.	2023-06-22 18:45:10 +01:00
laurent	683730c21d	Add the cublas handle to the cuda device.	2023-06-22 18:03:53 +01:00
laurent	7d9a8ff3f9	Do not ignore errors when cloning the storage.	2023-06-22 16:29:18 +01:00
laurent	836ad5f76c	Remove one level of indirection for the binary and unary ops.	2023-06-22 15:20:51 +01:00
laurent	5276755fb3	Add cuda support for unary ops.	2023-06-22 15:12:59 +01:00
laurent	b8f514d9c6	Add more binary kernels.	2023-06-22 14:07:02 +01:00
laurent	e1eb86db61	Add some first binary op (add).	2023-06-22 13:52:02 +01:00
Nicolas Patry	ce977b489e	Adding matmul?	2023-06-22 12:25:58 +02:00
laurent	97d9142dee	Add a first kernel.	2023-06-21 20:48:22 +01:00
laurent	71735c7a02	Move the data between the host and the device.	2023-06-21 19:43:25 +01:00
laurent	2bfe8f18ab	Start adding support for cuda.	2023-06-21 18:11:56 +01:00
laurent	eb52b9b343	Move the cpu backend specific bits apart.	2023-06-21 10:25:56 +01:00
laurent	8cde0c5478	Add some skeleton code for GPU support.	2023-06-21 09:13:57 +01:00
laurent	3a5405ca6d	Move the StridedIndex in its own module.	2023-06-21 07:44:36 +01:00
laurent	78bac0ed32	Add a couple operators.	2023-06-20 22:32:11 +01:00
laurent	f1f372b13e	Add the affine transformation.	2023-06-20 21:51:35 +01:00
laurent	98b423145a	Bugfix for the contiguous strides.	2023-06-20 13:35:07 +01:00
laurent	d9cb1917ce	Add some unary ops.	2023-06-20 12:04:01 +01:00
laurent	f5b0aa815a	Get the addition/multiplication to work.	2023-06-20 11:07:59 +01:00
laurent	6c5fc767a8	Add the slice indexing.	2023-06-20 10:50:58 +01:00
laurent	786544292d	Add more to the binary operators.	2023-06-20 09:49:40 +01:00
laurent	bcae61b7f2	Cosmetic changes.	2023-06-19 21:30:03 +01:00
laurent	26d6288eb6	Add an easy way to create tensor objects.	2023-06-19 20:59:26 +01:00
laurent	8e2c534d1f	Flesh out some ops bits.	2023-06-19 19:28:33 +01:00
laurent	ce718bb807	Add the op.	2023-06-19 18:34:54 +01:00
laurent	844704de5c	Split the tensor file.	2023-06-19 17:34:13 +01:00

39 Commits