- Most kernels just copy their inputs to get the shapes correct.
- Matmul works in only one case and simply allocates an empty output otherwise.
- Logits are randomized so the demo can finish on its own.
Performance is quite bad (30 ms/token), but that includes lots of prints and allocations, plus some actual dispatching to Metal.
Couldn't speed it up much by removing the obvious blockers (the printlns and actually running the matmuls).
Allocations take between 1µs and 100µs and seem very stable. Maybe Metal doesn't really have a smart allocator and we'll need to own it.
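
If Metal's allocator really is the bottleneck, one way to own it is a size-bucketed buffer pool that recycles buffers instead of allocating on every op. A minimal sketch using the `metal` crate; `BufferPool` and its methods are hypothetical, not an existing API:

```rust
use std::collections::HashMap;

use metal::{Buffer, Device, MTLResourceOptions};

/// Hypothetical size-bucketed pool: reuse buffers instead of hitting
/// the Metal allocator on every tensor creation.
struct BufferPool {
    device: Device,
    /// Size in bytes -> buffers currently free for reuse.
    free: HashMap<u64, Vec<Buffer>>,
}

impl BufferPool {
    fn new(device: Device) -> Self {
        Self { device, free: HashMap::new() }
    }

    /// Pop a cached buffer of the right size, or allocate a fresh one.
    fn get(&mut self, size: u64) -> Buffer {
        if let Some(buf) = self.free.get_mut(&size).and_then(|v| v.pop()) {
            return buf;
        }
        self.device
            .new_buffer(size, MTLResourceOptions::StorageModeShared)
    }

    /// Return a buffer to the pool once the kernel using it has completed.
    fn put(&mut self, buf: Buffer) {
        self.free.entry(buf.length()).or_default().push(buf);
    }
}
```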
* Add a custom softmax implementation (a sketch follows after this list).
* Add softmaxlastdim to the benchmarks.
* And add a test.
* Support more dtypes.
* Polish the code.
* Use the slow implementation on cuda.
* Add a todo for the cuda kernel.
* Start adding a stable-diffusion example.
* Proper computation of the causal mask (sketched after this list).
* Add the chunk operation (sketched after this list).
* Work in progress: port the attention module (a combined sketch follows after this list).
* Add some dummy modules for conv2d and group-norm, get the attention module to compile.
* Re-enable the 2d convolution.
* Add the embeddings module.
* Add the resnet module.
* Add the unet blocks.
* Add the unet.
* And add the variational auto-encoder.
* Use the pad function from utils.
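
For reference, a framework-free sketch of the custom softmax from the first item, computed over the last dimension of a row-major `f32` tensor (the function name and flat-slice layout are illustrative assumptions):

```rust
/// Numerically stable softmax over the last dimension of a row-major
/// tensor stored as a flat slice, where `dim` is the last-dim size.
fn softmax_last_dim(xs: &mut [f32], dim: usize) {
    for row in xs.chunks_mut(dim) {
        // Subtract the row max before exponentiating to avoid overflow.
        let max = row.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        let mut sum = 0f32;
        for x in row.iter_mut() {
            *x = (*x - max).exp();
            sum += *x;
        }
        for x in row.iter_mut() {
            *x /= sum;
        }
    }
}
```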
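
The causal mask computation is small enough to sketch directly: position `i` may only attend to positions `j <= i`, so an additive mask puts negative infinity on the blocked entries (assuming the mask is added to the attention scores before softmax):

```rust
/// Additive causal mask for a sequence of length `t`, as a flat (t, t)
/// matrix: 0 where attention is allowed (j <= i), -inf where blocked.
fn causal_mask(t: usize) -> Vec<f32> {
    (0..t)
        .flat_map(|i| (0..t).map(move |j| if j > i { f32::NEG_INFINITY } else { 0. }))
        .collect()
}
```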
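
A sketch of what the chunk operation does, here along dimension 0 of a row-major matrix stored as a flat slice (a simplification of a general tensor `chunk`; the signature is illustrative):

```rust
/// Split a row-major (rows, cols) matrix into `n` equal chunks along
/// dimension 0, mirroring what a tensor chunk op returns.
fn chunk(xs: &[f32], rows: usize, cols: usize, n: usize) -> Vec<Vec<f32>> {
    assert_eq!(rows % n, 0, "rows must be divisible by the chunk count");
    let chunk_rows = rows / n;
    xs.chunks(chunk_rows * cols).map(|c| c.to_vec()).collect()
}
```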
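
Finally, the attention module being ported boils down to scaled dot-product attention, `softmax(Q Kᵀ / √d + mask) V`. A single-head sketch that reuses `softmax_last_dim` and `causal_mask` from above (the (t, d) row-major layout is an assumption, not the module's actual interface):

```rust
/// Single-head causal scaled dot-product attention over (t, d)
/// row-major matrices, returning the (t, d) output.
fn attention(q: &[f32], k: &[f32], v: &[f32], t: usize, d: usize) -> Vec<f32> {
    let scale = 1.0 / (d as f32).sqrt();
    let mask = causal_mask(t);
    // scores = q @ k^T * scale + mask, shape (t, t).
    let mut scores = vec![0f32; t * t];
    for i in 0..t {
        for j in 0..t {
            let dot: f32 = (0..d).map(|c| q[i * d + c] * k[j * d + c]).sum();
            scores[i * t + j] = dot * scale + mask[i * t + j];
        }
    }
    softmax_last_dim(&mut scores, t);
    // out = scores @ v, shape (t, d).
    let mut out = vec![0f32; t * d];
    for i in 0..t {
        for j in 0..t {
            let w = scores[i * t + j];
            for c in 0..d {
                out[i * d + c] += w * v[j * d + c];
            }
        }
    }
    out
}
```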