candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-16 18:48:51 +00:00

Author	SHA1	Message	Date
Laurent Mazare	871efc0307	Bugfix for the conv2d cpu kernel. (#820)	2023-09-11 23:11:27 +01:00
Laurent Mazare	dbd4561416	im2col version of the conv1d kernel. (#815) * im2col version of the cuda conv1d kernel. * im2col version of the conv1d cpu kernel.	2023-09-11 14:40:09 +01:00
Laurent Mazare	70f38c2069	Proper error on unsupported dtypes when using gemm. (#813)	2023-09-11 12:10:51 +01:00
Laurent Mazare	6fb665004c	Enable im2col on the cpu side. (#805) * Enable im2col on the cpu side. * Hook im2col on the cpu backend. * Use the kernel offset. * Avoid an unnecessary copy. * Handle non-contiguous kernels. * Add a const to select the conv2d kernel.	2023-09-11 09:28:13 +01:00
Laurent Mazare	a4f40f3dc8	Use rayon directly rather than constraining the number of threads. (#749)	2023-09-05 20:26:15 +01:00
Gonzalo	cda45a7443	Let outside CustomOp2 implementations use binary_map/binary_map_vec (#741)	2023-09-05 09:27:32 +01:00
Laurent Mazare	84d003ff53	Handle arbitrary shapes in Tensor::new. (#718)	2023-09-02 19:59:21 +01:00
Laurent Mazare	393690387f	Support dilation in conv-transpose2d. (#671)	2023-08-30 09:22:00 +01:00
Laurent Mazare	59b731de99	Add the powf op. (#664) * Add the powf op. * Cuda kernels and backprop. * Add a test.	2023-08-29 20:48:18 +01:00
Laurent Mazare	71221559d3	Fix the dilated convolutions. (#659)	2023-08-29 16:37:42 +01:00
Laurent Mazare	a044907ffc	Dilated convolutions (#657) * Add the dilation parameter. * Restore the basic optimizer example. * Dilation support in cudnn. * Use the dilation parameter in the cpu backend. * More dilation support. * No support for dilation in transposed convolutions. * Add dilation to a test. * Remove a print. * Helper function.	2023-08-29 16:12:11 +01:00
Laurent Mazare	72fae3140c	Optimize the conv2d transpose cpu kernel. (#644) * Optimize the conv2d transpose cpu kernel. * Use multiple cores.	2023-08-28 20:06:31 +01:00
Laurent Mazare	ca26198b95	Fix the cpu kernel for conv-transpose. (#643)	2023-08-28 16:45:12 +01:00
Laurent Mazare	b292047882	Backprop for conv2d. (#638) * Start adding backprop for conv2d. * Backprop for conv2d. * Bugfix + start adding a conv2d test. * Conv2d backprop testing. * More conv fixes.	2023-08-28 16:08:55 +01:00
Laurent Mazare	3cca89cc70	Add conv-transpose. (#635) * Add conv-transpose. * Return zeros for now. * Naive CPU implementation. * Add a conv-transpose test + fix the cpu implementation. * Add a second test.	2023-08-28 10:10:12 +01:00
Laurent Mazare	d8ba0452dc	Fail on bf16. (#594)	2023-08-25 06:10:38 +01:00
Laurent Mazare	329f661d9b	Trace softmax (#568) * Trace the softmax op. * Inline the sum. * Add min/max vec operations.	2023-08-23 15:25:50 +01:00
Laurent Mazare	9a5c7db91a	Add support for i64 (#563) * Add the i64 dtype. * Adapt the cuda kernels.	2023-08-23 10:42:19 +01:00
Laurent Mazare	495e0b7580	Simd support (#448) * Import the simd intrinsics in candle-core. * simd version of reduce-sum. * Bugfix. * Fix some clippy lints.	2023-08-15 09:50:38 +01:00
Laurent Mazare	d379a76a9e	Add a softmax bench. (#433) * Add a softmax bench. * Add the vectorized sum reduce.	2023-08-13 20:09:18 +01:00
Laurent Mazare	9af438ac1b	Track the conv2d operations in stable-diffusion. (#431) * Track the conv2d operations in stable-diffusion. * Add more tracing to stable-diffusion. * Also trace the resnet bits. * Trace the attention blocks. * Also trace the attention inner part. * Small tweak.	2023-08-13 15:58:26 +01:00
Laurent Mazare	662db45fc3	Use zero padding in conv1d and conv2d (same as pytorch). (#408)	2023-08-11 14:53:05 +01:00
Laurent Mazare	e29c7809ec	Parallelise the CPU kernels for the conv ops. (#401) * Parallelise the conv2d op. * Tighter control on threading. * Also parallelise conv1d. * Add some safety comment.	2023-08-11 05:51:58 +01:00
Laurent Mazare	a325c1aa50	Upsample test + bugfix. (#399)	2023-08-10 21:02:35 +02:00
Laurent Mazare	94eff56aee	Optimize the cpu conv2d kernel (#396) * Conv2d simd optimization. * Fix the contiguous copying. * Small tweak.	2023-08-10 17:40:09 +01:00
Laurent Mazare	c8039579a5	Conv1d optimize (#392) * Reorder the conv1d loops in the cpu backend. * Optimize the 1d convolution. * Conv1D optimize. * Fix some clippy lints.	2023-08-10 15:23:52 +01:00
Laurent Mazare	c7f92f985e	Further randn tweaks: use the appropriate rng rather than the f64 one, some cleanup. (#383)	2023-08-10 05:48:19 +01:00
Lei	3bbc08a8df	Fix randn cpu (#382) * Change distributions: Standard generates in [0, 1), Normal is correct. * Add test (not sure if this is the best place to put the test). * Remove unnecessary use.	2023-08-10 05:33:44 +01:00
Laurent Mazare	fcfdcbd337	Add a conv1d benchmark based on the whisper sizes. (#377) * Add a conv1d benchmark based on the whisper sizes. * Enforce the batch-dim in conv1d.	2023-08-09 20:27:03 +01:00
LeeeSe	a5c5a893aa	add max_pool2d (#371) Co-authored-by: 赵理山 <ls@zhaolishandeMacBook-Air.local>	2023-08-09 18:05:26 +01:00
Laurent Mazare	1892bd139c	Extract the strides in the conv ops. (#370)	2023-08-09 17:57:05 +01:00
Laurent Mazare	cd225bd3b1	More testing for avg-pool2d. (#366) * More testing for avg-pool2d. * Another fix. * Add a max-pool test with non-divisible kernel sizes.	2023-08-09 16:12:23 +01:00
Laurent Mazare	b80348d22f	Bugfix for avg-pool + add some test. (#365)	2023-08-09 15:44:16 +01:00
Laurent Mazare	dbc6f281c9	Conv1d test with padding. (#356)	2023-08-09 05:45:38 +01:00
Laurent Mazare	cf965ecaa8	Simplify the conv1d and conv2d code. (#352)	2023-08-08 22:10:59 +01:00
Laurent Mazare	608b2358c6	Add some conv1d test + bugfix using padding. (#349)	2023-08-08 20:50:20 +01:00
Laurent Mazare	1e6dbeac01	Add some conv2d tests. (#347) * Add some conv2d tests. * Add a simpler conv2d test. * More conv2d testing + bugfix. * Add a todo.	2023-08-08 19:02:42 +01:00
Laurent Mazare	13ce68ff9b	Bugfix for conv2d. (#343)	2023-08-08 15:20:00 +01:00
Laurent Mazare	ab35684326	Naive implementation for conv2d. (#341)	2023-08-08 06:34:36 +01:00
Laurent Mazare	b5bb5e056d	Add more conv2d support. (#340) * Add more conv2d support. * Conv2d cpu work. * Conv2d output shape.	2023-08-08 06:04:32 +01:00
Laurent Mazare	d0d7010682	CPU implementation for upsample-nearest2d. (#339)	2023-08-07 20:07:10 +01:00
Laurent Mazare	fc265d9dcf	Some CLIP fixes for stable diffusion. (#338) * Some CLIP fixes for stable diffusion. * Add the avg-pool2d operation on cpu.	2023-08-07 18:31:45 +01:00
Laurent Mazare	b278834267	Support the Accelerate BLAS on macOS. (#325) * Add the accelerate feature. * Ffi tweaks.	2023-08-05 17:25:24 +01:00
Laurent Mazare	4b3bd79fbd	Remove the embedding ops in favor of index-select. (#299) * Remove the embedding ops in favor of index-select. * Also remove the cuda kernels.	2023-08-02 05:42:11 +01:00
Laurent Mazare	c950a5c6b1	Cuda support for the mnist training. (#277) * Cuda support for the mnist training. * min/max fix + testing. * Add the argmin/argmax tests. * More cuda support for argmin/argmax. * Cuda kernels for argmin and argmax.	2023-07-29 19:48:04 +01:00
Laurent Mazare	3eb2bc6d07	Softmax numerical stability. (#267) * Softmax numerical stability. * Fix the flash-attn test.	2023-07-28 13:13:01 +01:00
Laurent Mazare	3e89df938c	Starcoder fix (#264) * Bugfix for starcoder. * Get some proper code generation. * Slightly simpler softmax.	2023-07-28 11:17:49 +01:00
Laurent Mazare	6475bfadfe	Simplify Tensor::randn. (#255) * Simplify Tensor::randn. * Also switch Tensor::rand to use a generic dtype. * Support sampling for f16. * Cleanup.	2023-07-27 07:40:36 +01:00
Laurent Mazare	18cc73954a	Add some testing for index-add (#237) * Add some testing for index-add. * Fix the cpu implementation for index-add.	2023-07-25 08:38:33 +01:00
Laurent Mazare	23827c49cd	Cleanup some todos. (#226) * Cleanup some todos. * Fix more todo. * Optimize for the contiguous case. * Add the IntDType trait. * Handle the intdtype trait for more ops. * Remove a todo. * Remove a todo.	2023-07-23 16:00:00 +01:00

114 Commits