eb24875856
Reworked affine and it works? No idea how it's different.
2023-11-08 02:37:20 +01:00
3f662e54cd
Reworked affine and it works? No idea how it's different.
2023-11-08 02:34:08 +01:00
480a3e22e6
Adding cast + binary kernels.
2023-11-07 23:45:53 +01:00
0c24a885a6
Updated everything and output a trace.
2023-11-07 21:12:42 +01:00
76d3116f5d
Broken metal?
2023-11-07 14:20:13 +01:00
63cce76b84
Improve metal kernel loading and associated errors
2023-11-06 09:48:18 +01:00
634a4e7168
BlitEncoder added to affine for copying buffer contents quickly.
2023-11-06 08:23:36 +01:00
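For context, a minimal sketch of copying buffer contents with a blit encoder, assuming the `metal` crate (metal-rs) rather than the actual candle-metal-kernels code; the buffer sizes and data below are made up for illustration:

```rust
// Hypothetical standalone example using the `metal` crate (metal-rs).
use metal::{Device, MTLResourceOptions};

fn main() {
    let device = Device::system_default().expect("no Metal device found");
    let queue = device.new_command_queue();

    // Source buffer filled with some data, destination left uninitialized.
    let data: Vec<f32> = (0..16).map(|v| v as f32).collect();
    let size = (data.len() * std::mem::size_of::<f32>()) as u64;
    let src = device.new_buffer_with_data(
        data.as_ptr() as *const std::ffi::c_void,
        size,
        MTLResourceOptions::StorageModeShared,
    );
    let dst = device.new_buffer(size, MTLResourceOptions::StorageModeShared);

    // A blit encoder performs the GPU-side copy without a custom kernel.
    let command_buffer = queue.new_command_buffer();
    let blit = command_buffer.new_blit_command_encoder();
    blit.copy_from_buffer(&src, 0, &dst, 0, size);
    blit.end_encoding();
    command_buffer.commit();
    command_buffer.wait_until_completed();
}
```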
8124d1003f
Affine metal kernel works. Need to extract buffer contents based on layout offset (like CudaSlice.slice) for candle integration.
2023-11-06 04:46:56 +01:00
6d4c8c0707
Use metal encode_gemm
2023-11-06 03:27:22 +01:00
c921cc3784
Add Arc to MetalStorage buffer for quick cloning.
2023-11-04 09:03:23 +01:00
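A minimal sketch of the idea behind this change: wrapping the Metal buffer in an `Arc` turns cloning the storage into a reference-count bump rather than a data copy. The struct below is illustrative only, not candle's actual `MetalStorage` definition:

```rust
use std::sync::Arc;
use metal::Buffer;

// Illustrative stand-in for candle's MetalStorage; the real struct holds
// more fields (device, dtype, ...).
#[derive(Clone)]
struct MetalStorage {
    // Cloning an Arc only bumps the reference count; the underlying
    // MTLBuffer is shared, not copied.
    buffer: Arc<Buffer>,
}

impl MetalStorage {
    fn new(buffer: Buffer) -> Self {
        Self { buffer: Arc::new(buffer) }
    }
}
```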
0794e70a19
Debugging index_add.
2023-11-03 12:09:05 +01:00
f57e3164ae
Implemented cos for now.
2023-11-03 01:24:51 +01:00
7161002a34
Finished scaffolding, lots of TODOs
...
- Most kernels just copy themselves to get the shapes correct
- Matmul works in only one case and simply allocates an empty result otherwise
- Logits are randomized to make the demo finish by itself.
Performance is quite bad (30ms/token), but there are lots of prints and allocs and some actual sending to Metal.
Couldn't get it much faster by removing the obvious blockers (println + the actual matmul runs).
Allocations take between 1µs and 100µs and seem very stable. Maybe Metal doesn't really have a smart allocator and we'll need to own it.
2023-11-02 15:32:28 +01:00
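The allocation note above speculates about owning allocation; a rough, hypothetical sketch of what that could mean is a size-bucketed pool that reuses Metal buffers instead of calling `new_buffer` for every op. This is not candle's actual allocator, just an illustration of the idea:

```rust
use std::collections::HashMap;
use metal::{Buffer, Device, MTLResourceOptions};

/// Hypothetical buffer pool: reuse previously allocated buffers of the
/// same size instead of asking the Metal device for a new one each time.
struct BufferPool {
    device: Device,
    free: HashMap<u64, Vec<Buffer>>,
}

impl BufferPool {
    fn new(device: Device) -> Self {
        Self { device, free: HashMap::new() }
    }

    fn alloc(&mut self, size: u64) -> Buffer {
        // Reuse a free buffer of the exact size if one is available.
        if let Some(bufs) = self.free.get_mut(&size) {
            if let Some(buf) = bufs.pop() {
                return buf;
            }
        }
        self.device.new_buffer(size, MTLResourceOptions::StorageModeShared)
    }

    /// Return a buffer to the pool once the op using it has completed.
    fn release(&mut self, buf: Buffer) {
        self.free.entry(buf.length()).or_default().push(buf);
    }
}
```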
82cce52e73
Rename candle-metal -> candle-metal-kernels
2023-11-02 09:53:29 +01:00
71fcb31873
Owned command buffer now.
2023-11-01 18:03:53 +01:00
198009453a
Matmul (no batch, no strided, f32, f32 only) sort of done.
2023-11-01 17:36:51 +01:00
492d164235
More scaffolding, now need to implement matmul (for precompute_cos_sin to work).
2023-11-01 16:54:09 +01:00
2d84c16fed
First pass (Quantized scaffolding work done + quantized example scaffolding).
2023-11-01 15:10:11 +01:00
4525b7b52a
Initial setup
2023-10-31 18:09:10 +01:00
c05c0a8213
PyO3: Add `equal` and `__richcmp__` to candle.Tensor ( #1099 )
...
* add `equal` to tensor
* add `__richcmp__` support for tensors and scalars
* typo
* more typos
* Add `abs` + `candle.testing`
* remove duplicated `broadcast_shape_binary_op`
* `candle.i16` => `candle.i64`
* `tensor.nelements` -> `tensor.nelement`
* Cleanup `abs`
2023-10-30 15:17:28 +00:00
5fc66bd4ba
Support negative steps in arange. ( #1218 )
2023-10-30 07:40:54 +00:00
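A small usage sketch, assuming candle's `Tensor::arange_step` API; the values are illustrative:

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let device = Device::Cpu;
    // Ascending range: 0, 1, 2, 3, 4
    let up = Tensor::arange_step(0f32, 5f32, 1f32, &device)?;
    // With negative-step support, a descending range also works: 5, 4, 3, 2, 1
    let down = Tensor::arange_step(5f32, 0f32, -1f32, &device)?;
    println!("{up}\n{down}");
    Ok(())
}
```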
154c674a79
Add i64-abs. ( #1216 )
2023-10-29 15:28:53 +00:00
7bbde55c61
Marian MT model ( #1210 )
...
* Skeleton files for the marian MT model.
* Marian initialization.
* Implement the attention forward method.
* Forward pass for the encoder side.
* Expose the encoder and decoder.
* Start plugging the decoder.
* Forward pass for the decoder layer.
* Set up the marian example.
* Add some missing backtraces.
* Bugfix.
2023-10-29 15:12:22 +00:00
46d6566c99
Fix the conv2d gradient computation. ( #1214 )
2023-10-29 09:50:04 +00:00
55bc3382cf
Allow for different behavior between training and eval ( #1213 )
...
* Forward with training.
* Do not use dropout on vgg evaluation.
2023-10-29 07:53:09 +01:00
ef33df7ae2
No need for the even constraint on vecdot-q40-q80. ( #1202 )
2023-10-28 07:23:59 +01:00
e2826e70b3
Add a quantized variant of llama2.c ( #1197 )
...
* Add a quantized variant of llama2.c
* Clippy fixes.
2023-10-27 15:34:06 +01:00
9b1158b315
Add some missing backtraces. ( #1193 )
2023-10-27 06:09:11 +01:00
c698e17619
Enable the test for meshgrid + fix the implementation. ( #1175 )
2023-10-25 13:47:54 +01:00
e4c9adfdbe
Implemented meshgrid ( #1174 )
...
* Implemented meshgrid
* Resolved feedback from LaurentMazare
* Rustfmt
* Updated docstring
* Removed outdated error mode from docstring
2023-10-25 12:49:11 +01:00
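A small usage sketch, assuming a `Tensor::meshgrid(&[...], xy_indexing)` API; the inputs are illustrative:

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let device = Device::Cpu;
    let x = Tensor::new(&[1f32, 2., 3.], &device)?;
    let y = Tensor::new(&[4f32, 5.], &device)?;
    // With xy indexing, the outputs have shape (len(y), len(x)),
    // mirroring numpy.meshgrid's default behaviour.
    let grids = Tensor::meshgrid(&[&x, &y], true)?;
    for g in &grids {
        println!("{g}");
    }
    Ok(())
}
```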
45dbe541bc
fix ucopy for f64 tensors ( #1170 )
2023-10-24 17:06:03 +01:00
807e3f9f52
derivative for GELU ( #1160 )
...
* derivative for GELU
* add tests
2023-10-23 20:23:45 +01:00
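For reference, with the erf-based GELU definition GELU(x) = x·Φ(x), where Φ is the standard normal CDF (the tanh approximation used in some implementations differs slightly), the derivative follows from the product rule:

```latex
\frac{d}{dx}\,\mathrm{GELU}(x) = \Phi(x) + x\,\phi(x),
\qquad \phi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}
```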
8a82d623e5
Handle LongStorage in pytorch checkpoints. ( #1152 )
2023-10-22 18:34:36 +01:00
62fc965617
Expose the track-op method. ( #1148 )
2023-10-22 06:57:03 +01:00
e8f760ee44
Add get_on_dim. ( #1142 )
2023-10-21 15:01:38 +01:00
87eb1658e1
Add pad_with_same. ( #1127 )
...
* More model cloning.
* More cloning on quantized models.
* Add pad-with-same.
* Add some tests.
2023-10-18 23:13:37 +01:00
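A small usage sketch, assuming candle's `Tensor::pad_with_same(dim, left, right)` replicates the edge values along the given dimension; the input is illustrative:

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let device = Device::Cpu;
    let t = Tensor::new(&[1f32, 2., 3.], &device)?;
    // Pad dimension 0 with one copy of the edge value on each side:
    // expected result is [1, 1, 2, 3, 3].
    let padded = t.pad_with_same(0, 1, 1)?;
    println!("{padded}");
    Ok(())
}
```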
662c186fd5
Better error message when overflowing in narrow. ( #1119 )
2023-10-18 08:40:14 +01:00
6c588c4792
Refactor the pth tensor extraction. ( #1109 )
2023-10-16 18:16:34 +01:00
0106b0b04c
Read all the tensors in a PyTorch pth file. ( #1106 )
2023-10-16 13:50:07 +01:00
b73c35cc57
Improve the reshape error messages. ( #1096 )
...
* Improve the reshape error messages.
* Add the verbose-prompt flag to the phi example.
2023-10-15 10:43:10 +01:00
8f310cc666
Avoid trying to backprop through non-differentiable layers. ( #1094 )
2023-10-14 22:03:41 +01:00
9309cfc47d
Create a new curand instead of reseeding. ( #1089 )
2023-10-14 10:03:59 +01:00
7473c4ceca
Fix the npy read function and add some testing. ( #1080 )
2023-10-12 15:25:05 +02:00
37dbbff261
Use full tensors for zeros and ones ( #1071 )
...
* Only optimize float tensors.
* Use full tensors for zeros and ones.
2023-10-11 08:16:04 +01:00
9fea56d28e
Only optimize float tensors. ( #1069 )
2023-10-10 09:05:41 +01:00
b34d7f0248
Remove some unused bits. ( #1067 )
2023-10-09 19:49:57 +01:00
9abeddd750
Make the cuda rng seedable. ( #1056 )
2023-10-08 09:32:36 +01:00
aa53368aeb
Better control on the optional dequantization in QMatMul ( #1049 )
...
* Cosmetic change to the quantized whisper model.
* Fix the dequantization.
* Add the dequantize all variable.
2023-10-07 10:16:18 +01:00
7f7d95e2c3
Add the round-to function. ( #1039 )
2023-10-05 20:28:09 +01:00
8f7973958c
fix: fix index_select cuda kernel for src target dim different from ids dim when selecting dim > 0 ( #1037 )
...
* fix: fix index_select cuda kernel for src target dim different from ids dim when selecting dim > 0
* cargo fmt
2023-10-05 18:46:13 +01:00