candle

mirror of https://github.com/huggingface/candle.git synced 2025-06-19 11:56:45 +00:00

Author	SHA1	Message	Date
Laurent Mazare	5e1c595e00	Optimize the index-select cuda kernel. (#976)	2023-09-28 09:05:29 +01:00
Laurent Mazare	9a465e1b26	Add 1d upsampling. (#839) * Add 1d upsampling. * Add the interpolate functions.	2023-09-13 16:50:39 +01:00
Laurent Mazare	b11a2a7b9d	Move the constant to avoid some unused warning. (#837)	2023-09-13 11:56:53 +01:00
Charles Lew	1c09164021	Add `CANDLE_NVCC_CCBIN` support for `candle-kernels`, and eliminate warning. (#836)	2023-09-13 11:39:22 +01:00
Laurent Mazare	dbd4561416	im2col version of the conv1d kernel. (#815) * im2col version of the cuda conv1d kernel. * im2col version of the conv1d cpu kernel.	2023-09-11 14:40:09 +01:00
Laurent Mazare	df712ecf64	Handle the case where the kernel is not contiguous in the cuda backend. (#809)	2023-09-11 09:48:31 +01:00
Laurent Mazare	1cd74129d4	Add Im2Col support on the gpu side. (#808) * Add Im2Col support on the gpu side. * Actually enable.	2023-09-11 08:52:33 +01:00
Laurent Mazare	98d1242b8f	im2col based conv2d (#802) * im2col implementation for conv2d. * Fix for the im2col implementation to match the current conv2d. * Small optimization. * Add a cuda kernel. * Handle arbitrary layouts. * Im2Col cuda code.	2023-09-10 21:02:42 +01:00
Laurent Mazare	258ac32c38	Fix cuda randn when generating an odd number of values. (#793)	2023-09-09 18:44:21 +01:00
Laurent Mazare	158ff3c609	Add tracing to segment-anything (#777) * Tracing support for segment-anything. * More tracing. * Handle the empty slice case.	2023-09-08 15:31:29 +01:00
Laurent Mazare	a0d65585db	Softmax implementation for cuda. (#747)	2023-09-05 18:38:03 +01:00
Laurent Mazare	393690387f	Support dilation in conv-transpose2d. (#671)	2023-08-30 09:22:00 +01:00
Laurent Mazare	59b731de99	Add the powf op. (#664) * Add the powf op. * Cuda kernels and backprop. * Add a test.	2023-08-29 20:48:18 +01:00
Laurent Mazare	a044907ffc	Dilated convolutions (#657) * Add the dilation parameter. * Restore the basic optimizer example. * Dilation support in cudnn. * Use the dilation parameter in the cpu backend. * More dilation support. * No support for dilation in transposed convolutions. * Add dilation to a test. * Remove a print. * Helper function.	2023-08-29 16:12:11 +01:00
Laurent Mazare	037b41c9dc	Cuda conv transpose (#645) * Cuda kernel for conv-transpose. * Fix the cuda kernel. * Fix the tests.	2023-08-28 20:58:49 +01:00
Laurent Mazare	3cca89cc70	Add conv-transpose. (#635) * Add conv-transpose. * Return zeros for now. * Naive CPU implementation. * Add a conv-transpose test + fix the cpu implementation. * Add a second test.	2023-08-28 10:10:12 +01:00
Laurent Mazare	dd64465899	Add a test for conv2d with padding + bugfix the random number generation on cuda. (#578) * Add a test for conv2d with padding. * Cosmetic changes. * Bugfix the rand function on the cuda backend.	2023-08-24 10:16:37 +01:00
Laurent Mazare	9a5c7db91a	Add support for i64 (#563) * Add the i64 dtype. * Adapt the cuda kernels.	2023-08-23 10:42:19 +01:00
Laurent Mazare	90374097dc	Cudnn support (#445) * Add a cudnn feature to be used for conv2d. * Allocate the proper workspace. * Only create a single cudnn handle per cuda device. * Proper cudnn usage. * Bugfix.	2023-08-14 21:30:41 +01:00
Laurent Mazare	c84883ecf2	Add a cuda kernel for upsampling. (#441) * Add a cuda kernel for upsampling. * Update for the latest tokenizers version.	2023-08-14 13:12:17 +01:00
Laurent Mazare	a094dc503d	Add a cuda kernel for avg-pool2d. (#440) * Add a cuda kernel for avg-pool2d. * Avoid running out of bounds. * Finish wiring the avg pool kernel + add some testing. * Support for max-pool + testing.	2023-08-14 12:32:05 +01:00
Laurent Mazare	34f4b3187e	Add a naive conv2d cuda kernel. (#438) * Add a naive conv2d cuda kernel. * Proper conv2d support on the rust side. * Conv1d testing on gpu. * Also use the test on gpus. * Fix the clean-ptx target.	2023-08-14 10:34:42 +01:00
Ciarán Curley	25ec2d9f6b	fix: remove incorrect unwrap (#379)	2023-08-09 21:45:24 +01:00
LeeeSe	a5c5a893aa	add max_pool2d (#371) Co-authored-by: 赵理山 <ls@zhaolishandeMacBook-Air.local>	2023-08-09 18:05:26 +01:00
Laurent Mazare	b5bb5e056d	Add more conv2d support. (#340) * Add more conv2d support. * Conv2d cpu work. * Conv2d output shape.	2023-08-08 06:04:32 +01:00
Laurent Mazare	d0d7010682	CPU implementation for upsample-nearest2d. (#339)	2023-08-07 20:07:10 +01:00
Laurent Mazare	fc265d9dcf	Some CLIP fixes for stable diffusion. (#338) * Some CLIP fixes for stable diffusion. * Add the avg-pool2d operation on cpu.	2023-08-07 18:31:45 +01:00
Laurent Mazare	4b3bd79fbd	Remove the embedding ops in favor of index-select. (#299) * Remove the embedding ops in favor of index-select. * Also remove the cuda kernels.	2023-08-02 05:42:11 +01:00
Laurent Mazare	c950a5c6b1	Cuda support for the mnist training. (#277) * Cuda support for the mnist training. * min/max fix + testing. * Add the argmin/argmax tests. * More cuda support for argmin/argmax. * Cuda kernels for argmin and argmax.	2023-07-29 19:48:04 +01:00
Laurent Mazare	c0a8ed19eb	Support for where-cond on cuda for u8 and u32. (#274)	2023-07-29 11:48:58 +01:00
Laurent Mazare	3eb2bc6d07	Softmax numerical stability. (#267) * Softmax numerical stability. * Fix the flash-attn test.	2023-07-28 13:13:01 +01:00
Laurent Mazare	6475bfadfe	Simplify Tensor::randn. (#255) * Simplify Tensor::randn. * Also switch Tensor::rand to use a generic dtype. * Support sampling for f16. * Cleanup.	2023-07-27 07:40:36 +01:00
Laurent Mazare	944d70bd9a	Add a test for scatter add. (#238) * Add a test for scatter add (segfaults on gpus for now). * Bugfix for the scatter add cuda kernel.	2023-07-25 09:12:14 +01:00
Laurent Mazare	74a6a769dd	Cuda kernels for IndexAdd/ScatterAdd. (#236) * Skeleton methods for IndexAdd/ScatterAdd. * Add a Map2InPlace trait. * Add the glue code for the index-add/scatter-add kernels. * Tweak the file name: embeddings -> indexing. * Add the cuda kernel for indexadd. * And add the scatter-add kernels.	2023-07-24 21:53:08 +01:00
Laurent Mazare	581b104f97	Indexing cuda (#235) * Allow using uint8_t for indexing. * Revert the default cuda feature. * Add a cuda-kernel for index-select. * Add a test for gather.	2023-07-24 20:22:47 +01:00
Laurent Mazare	b50f932e7c	Add some cmp tests. (#233) * Add some cmp tests. * Add the cuda kernels for comparison operations.	2023-07-24 16:53:45 +01:00
Laurent Mazare	e449ce53a2	Wrapping code to call the custom op. (#225) * Wrapping code to call the custom op. * Get the rms example to work. * Get around rustfmt failing in the CI. * Fix the rms computation.	2023-07-23 11:31:17 +01:00
Laurent Mazare	b8a10425ad	Kernel build example (#224) * Build example kernels. * Add some sample custom kernel. * Get the example kernel to compile. * Add some cuda code. * More cuda custom op. * More cuda custom ops.	2023-07-23 07:15:37 +01:00
Laurent Mazare	43c7223292	Rename the .r functions to .dims so as to be a bit more explicit. (#220)	2023-07-22 10:39:27 +01:00
Laurent Mazare	52c5d8c087	Add the gather op. (#219) * Start adding gather. * Gather cpu implementation + use in simple training. * Add scatter_add for the gradient of gather. * Simple cpu implementation of scatter_add. * Use gather in the simple-training backprop.	2023-07-22 07:21:28 +01:00
laurent	27174a82aa	Start adding index-add.	2023-07-21 20:12:48 +01:00
Laurent Mazare	410654525f	Refactor the reduce ops in order to introduce argmin/argmax. (#212) * Refactor the reduce ops in order to introduce argmin/argmax. * Clippy fixes. * Use the newly introduced argmax. * Fix the strided case. * Handle the non-contiguous case.	2023-07-21 11:41:08 +01:00
Laurent Mazare	fa08fb3126	Add the index-select op. (#209) * Add the index-select op. * Cpu implementation of index-select. * Add the cpu implementation for index-select.	2023-07-20 14:01:03 +01:00
Laurent Mazare	2a8f28d687	Op refactor (#208) * Add the binary and unary op enums to factorize some code. * Bugfix.	2023-07-20 12:28:45 +01:00
Laurent Mazare	e9c052bf94	Add the comparison operations. (#207) * Add the comparison operations. * Add the helper functions on the tensor side. * More cmp operations. * Cpu implementation for the comparison operations.	2023-07-20 09:40:31 +01:00
Laurent Mazare	536c5e702e	Cuda kernels for fast min/max reductions (#203) * Add the min/max cuda kernels. * Better integration of the cuda kernels.	2023-07-19 18:12:27 +01:00
Laurent Mazare	cb687b4897	Add some more developed training examples. (#199) * Use contiguous tensors for variables. * Sketch the mnist example. * Start adding the reduce ops. * Renaming. * Refactor the reduce operations. * Bugfix for the broadcasting vectorization.	2023-07-19 15:37:52 +01:00
Laurent Mazare	d88b6cdca9	Add backtrace information to errors where relevant. (#166) * Add backtrace information to errors where relevant. * More backtrace information. * Add to the FAQ.	2023-07-14 09:31:25 +01:00
Laurent Mazare	64264d97c1	Modular backends (#138) * Add some trait to formalize backends. * Use the generic backend trait.	2023-07-11 11:17:02 +01:00
Laurent Mazare	674eb35e10	Remove some dead-code pragmas. (#137)	2023-07-11 09:33:59 +01:00

76 Commits