candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-16 10:38:54 +00:00

Author	SHA1	Message	Date
Gonzalo	fc59bc31bf	fix: add missing gpu fill_* (#996)	2023-09-29 15:49:30 +01:00
Gonzalo	01b92cd959	fixes slice_scatter dim type (#988)	2023-09-29 07:54:45 +01:00
Laurent Mazare	25657804ef	Simd128 q2k vecdot (#982) * Sketch the simd128 version of q2k vecdot. * Use a single accumulator.	2023-09-28 12:16:35 +01:00
Laurent Mazare	5e1c595e00	Optimize the index-select cuda kernel. (#976)	2023-09-28 09:05:29 +01:00
Laurent Mazare	9cb110c44c	Sketch a simd128 optimized q4k vecdot. (#977) * Sketch a simd128 optimized q4k vecdot. * Simdify. * More quantization optimizations. * Again more simdification. * Simdify the splitting loop.	2023-09-27 20:19:38 +01:00
Laurent Mazare	667f01c173	Simd128 vec-dot for q4_0. (#974) * Simd128 vec-dot for q4_0. * Bugfix. * Add wasm tests. * Bugfix for the q40 vecdot. * More quantization tests.	2023-09-27 14:15:30 +01:00
Laurent Mazare	e59784e353	simd128 optimized q8_0 vecdot (#972) * wasm/simd128 version of the quantized q8_0 vecdot. * Add the missing conversion.	2023-09-27 11:03:20 +01:00
Laurent Mazare	ce0a4e3a85	Use the gelu-erf activation. (#969)	2023-09-26 22:30:21 +01:00
Laurent Mazare	4abc1ea34d	Avoid some overflows on wasm32. (#968)	2023-09-26 11:15:38 +01:00
Laurent Mazare	dc47224ab9	Override the default cudnn heuristics. (#957)	2023-09-25 10:31:53 +01:00
Laurent Mazare	e32c89d90c	Add the buffered safetensor wrapper. (#948)	2023-09-23 22:57:42 +01:00
Laurent Mazare	890d069092	Self-contained safetensor wrappers (#946) * Self-contained safetensor wrappers. * Use the new safetensor container in varbuilders.	2023-09-23 20:39:52 +01:00
Laurent Mazare	ccf352f3d1	Use yoke to provide a self-referential container for mmaped safetenso… (#939) * Use yoke to provide a self-referential container for mmaped safetensor files. * Add the new self-owned type for safetensor files without removing the previous version. * Add routing. * Add an initializer for the case of multiple files.	2023-09-23 15:43:11 +01:00
Laurent Mazare	912a3d63b0	Use the proper block size for quantizing models. (#933) * Use the proper block size for quantizing models. * Use the proper dimension.	2023-09-22 21:36:56 +01:00
Laurent Mazare	8601537e31	Add slice-scatter. (#927) * Add slice-scatter. * Add the op. * Make transpose be a no-op when the dimensions are identical. * Add the backprop. * And add some gradient test.	2023-09-22 12:18:16 +01:00
Laurent Mazare	3b557765e8	T5 quantized example (#922) * Load gguf files for the quantized t5. * Add the quantized t5 example. * Allow for loading local files. * Add some support for quantizing safetensor files. * Transpose before quantizing. * Quantized t5. * Retrieve the weights from the hub.	2023-09-21 12:33:15 +01:00
Laurent Mazare	2619c4307f	Add a quantized version of the t5 model. (#921)	2023-09-21 11:13:39 +01:00
Laurent Mazare	7b26e513f1	Add the erf function. (#917)	2023-09-21 06:19:10 +01:00
Laurent Mazare	d7e48234d4	Add an erf based gelu op (#900) * Erf based gelu. * Add the erf backed gelu. * Test the new gelu op (which is not gelu_new).	2023-09-19 19:54:28 +01:00
Laurent Mazare	4f91c8e109	Improve the error message on shape mismatch for cat. (#897) * Improve the error message on shape mismatch for cat. * Cosmetic tweak.	2023-09-19 15:09:47 +01:00
Laurent Mazare	7dd8e12472	Bump the crate versions to v0.2.3. (#886) * Bump the crate version. * Also update the python bindings.	2023-09-18 12:14:03 +01:00
Laurent Mazare	635012d770	Do not backprop through argmin/argmax. (#865)	2023-09-15 22:15:40 +01:00
Laurent Mazare	2746f2c4be	DiffNeXt/unet (#859) * DiffNeXt/unet * Start adding the vae. * VAE residual block. * VAE forward pass. * Add pixel shuffling. * Actually use pixel shuffling.	2023-09-15 10:14:02 +01:00
Laurent Mazare	130fe5a087	Add the upblocks. (#853)	2023-09-14 22:24:56 +01:00
Laurent Mazare	d6447ad635	Tensor based indexing. (#842)	2023-09-14 07:47:07 +01:00
Laurent Mazare	9a465e1b26	Add 1d upsampling. (#839) * Add 1d upsampling. * Add the interpolate functions.	2023-09-13 16:50:39 +01:00
Laurent Mazare	b11a2a7b9d	Move the constant to avoid some unused warning. (#837)	2023-09-13 11:56:53 +01:00
Charles Lew	1c09164021	Add `CANDLE_NVCC_CCBIN` support for `candle-kernels`, and eliminate warning. (#836)	2023-09-13 11:39:22 +01:00
Laurent Mazare	18d3c803a8	Scalar support in minimum/maximum. (#832) * Scalar support in minimum/maximum. * Add a clamp method to tensors.	2023-09-13 08:24:58 +01:00
Laurent Mazare	2257f4d475	Bump the crate version + update the changelog. (#822)	2023-09-12 06:39:24 +01:00
Laurent Mazare	871efc0307	Bugfix for the conv2d cpu kernel. (#820)	2023-09-11 23:11:27 +01:00
Laurent Mazare	c5a058b169	Use the module trait in stable-diffusion. (#817)	2023-09-11 20:40:07 +01:00
Laurent Mazare	dbd4561416	im2col version of the conv1d kernel. (#815) * im2col version of the cuda conv1d kernel. * im2col version of the conv1d cpu kernel.	2023-09-11 14:40:09 +01:00
Laurent Mazare	70f38c2069	Proper error on unsupported dtypes when using gemm. (#813)	2023-09-11 12:10:51 +01:00
Laurent Mazare	df712ecf64	Handle the case where the kernel is not contiguous in the cuda backend. (#809)	2023-09-11 09:48:31 +01:00
Laurent Mazare	6fb665004c	Enable im2col on the cpu side. (#805) * Enable im2col on the cpu side. * Hook im2col on the cpu backend. * Use the kernel offset. * Avoid an unnecessary copy. * Handle non-contiguous kernels. * Add a const to select the conv2d kernel.	2023-09-11 09:28:13 +01:00
Laurent Mazare	1cd74129d4	Add Im2Col support on the gpu side. (#808) * Add Im2Col support on the gpu side. * Actually enable.	2023-09-11 08:52:33 +01:00
Laurent Mazare	98d1242b8f	im2col based conv2d (#802) * im2col implementation for conv2d. * Fix for the im2col implementation to match the current conv2d. * Small optimization. * Add a cuda kernel. * Handle arbitrary layouts. * Im2Col cuda code.	2023-09-10 21:02:42 +01:00
Laurent Mazare	258ac32c38	Fix cuda randn when generating an odd number of values. (#793)	2023-09-09 18:44:21 +01:00
Laurent Mazare	c88d6fd4b9	Remove set_training. (#784)	2023-09-09 08:27:37 +01:00
Laurent Mazare	057f7909bc	Accelerate support for gelu. (#782)	2023-09-08 21:58:56 +01:00
Laurent Mazare	acf8f10ae1	Get the comparison operation to work on scalar values. (#780) * Get the comparison operation to work on scalar values. * Add some time measurement.	2023-09-08 20:13:29 +01:00
Laurent Mazare	158ff3c609	Add tracing to segment-anything (#777) * Tracing support for segment-anything. * More tracing. * Handle the empty slice case.	2023-09-08 15:31:29 +01:00
zmlcc	98172d46fa	Fix some errors about BlockQ8_1 (#776) * use int8 type instead of uint8 for BlockQ8_1.qs The uint8 type of BlockQ8_1.qs causes great loss for negative weights Ref: `ebc96086af/ggml.c (L904)` Signed-off-by: Zhang Miaolei <zmlcc@outlook.com> * fix sum error in vec_dot of BlockQ4_1 Ref: `ebc96086af/ggml.c (L2840)` Signed-off-by: Zhang Miaolei <zmlcc@outlook.com> * fix sum error in vec_dot of BlockQ5_1 Ref: `ebc96086af/ggml.c (L3490)` Signed-off-by: Zhang Miaolei <zmlcc@outlook.com> --------- Signed-off-by: Zhang Miaolei <zmlcc@outlook.com>	2023-09-08 13:29:40 +01:00
Laurent Mazare	0e250aee4f	Shape with holes (#770) * Shape with holes. * rustfmt.	2023-09-08 08:38:13 +01:00
Laurent Mazare	7396b8ed1a	Segment Anything - process images (#766) * Start processing images. * Add LayerNorm2d. * Properly use LayerNorm2d. * Tweak eps. * Use LayerNorm on inputs with a rank different from 3. * Window partitioning. * Fix a couple todos. * More todos. * Hard-code the einsums. * More padding support. * Some sizes tweaks. * Use the hub to get the weights. * Use a batch matmul. * Tweaks. * More fixes. * Get some predictions to be generated.	2023-09-07 19:22:45 +01:00
Laurent Mazare	6527ab81a3	Sketch the segment anything model. (#759) * Sketch the segment anything model. * Fix some clippy lint. * Add the mask decoder.	2023-09-07 05:34:05 +01:00
Laurent Mazare	7b1f2da828	Cudnn fix. (#758)	2023-09-06 17:39:39 +01:00
Laurent Mazare	7299a68353	img2img pipeline for stable diffusion. (#752) * img2img pipeline for stable diffusion. * Rename the arguments + fix. * Fix for zero strength. * Another fix. * Another fix. * Revert. * Include the backtrace. * Noise scaling. * Fix the height/width.	2023-09-06 07:06:49 +01:00
Laurent Mazare	a4f40f3dc8	Use rayon directly rather than constraining the number of threads. (#749)	2023-09-05 20:26:15 +01:00

440 Commits