Commit Graph

67 Commits

Author SHA1 Message Date
26c16923b9 Make sorted_nodes pub function (#2780) 2025-02-22 10:23:45 +01:00
3159f91b90 20241118 docs (#2629)
* module docs

* varbuilder gguf docs

* add a link to gguf files

* small additonal mod doc titles

* safetensor docs

* more core docs

* more module docs in canlde_core

* 2 more link fixes
2024-11-19 04:07:07 +01:00
c0a559d427 optimize gradient for silu a bit (#2393) 2024-08-04 11:24:17 +02:00
ce90287f45 Add get_ids to GradStore (#2379) 2024-08-01 10:56:13 +02:00
f25173d68b Fix for backprop in ConvTranspose2D with stride of 2 (#2337)
* Add gradient test for conv_transpose2d with stride of 2.

* Swap dilation and stride in ConvTranspose2D backpropagation.

Without this, a shape mismatch occurs with a stride of 2 and dilation of 1.

* Add further tests of the ConvTranspose2D gradient.

Values calculated with torch, minor numerical errors adjusted and commented.
2024-07-17 19:22:23 +02:00
6a4741bbf9 Fix Elu gradient NaN on large input (#2328)
* Fix Elu gradient NaN on large input

* Reuse previously computed exp in Elu
2024-07-16 14:41:16 +02:00
1690ab45d2 Fix the silu gradient issue on 0. (#2083) 2024-04-18 14:31:41 +02:00
c5626b8271 Add support for "sign" on tensors (#2012)
* add the sign unary operator

* remove uneeded import

* remove uneeded import

* undo formatting

* undo formatting

* remove unnecessary redefintion

* allow gradient to flow through for sign and round

* fix cpu ops to ensure that negzero and positive zero are handled properly

* clippy fixes

* Properly avoid gradient tracking.

* Use a branchless version.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-04 22:32:47 +02:00
cc856db9ce Backwards for ConvTranspose2D (#1910)
* add documentation  for nackprop

* add backwards for ConvTranspose2D

* add test python code to test
2024-03-23 07:05:55 +01:00
8013b50829 Add grads for interpolate1d (#1742)
* add backprop for interpolate1d

* fix clippy lint

* correct fix clippy lint
2024-02-22 08:44:01 +01:00
1fb728772d Support for groups in conv-transpose1d. (#1731)
* Groups support in conv-transpose-1d.

* Remove dangling file.
2024-02-18 21:28:07 +01:00
b60064780d feat: add silu activation function (#1706)
* feat: add silu activation function

* use silu/arg in grad

* update candle-nn

* use node
2024-02-14 10:27:22 +01:00
ad73e93da2 Detach the tensors on batch-norm eval. (#1702)
* Detach the tensors on batch-norm eval.

* Fix pyo3 bindings.

* Black tweak.

* Formatting.

* Also update the pyo3-onnx formatting.

* Apply black.
2024-02-13 14:26:32 +01:00
18eb87f25f Upsample grad (#1420)
* encode size of upsample in enum

* working convolution method for limited 2d kernels

* add test for sf 3 interpolation

* add higher dimensional tests, fix to work with multichannel input

* Remove commented out line.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2023-12-10 08:43:24 +01:00
60fdab4e17 Detach all grads during backprop. (#1243)
* Detach all grads during backprop.

* Add an environment variable to select the backprop behavior.

* Update the comment.
2023-11-05 14:07:41 +01:00
7051fb8098 feat: add backprop for elu (#1269)
* feat: add backprop for elu

* Cosmetic tweaks.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2023-11-04 21:26:41 +01:00
3173b1ce3b feat: impl backprop for erf and gelu-erf (#1258)
* impl backprop for erf anf gelu-erf

* feat: unary tests added for erf and gelu-erf

* fix: (clippy) remove immediately dereferenced ref

* fix: improve comments with pytorch code snippet

* fix: adjust comment typo in backprop impl
2023-11-03 21:32:30 +01:00
1cfc5d6d0c Backprop support for conv1d (cpu only for now). (#1255) 2023-11-03 14:23:53 +01:00
be4555c5a5 Add the conv-transpose1d op. (#1251)
* Skeleton structure for conv-transpose1d.

* CPU implementation for conv-transpose1d.
2023-11-03 09:44:46 +01:00
46d6566c99 Fix the conv2d gradient computation. (#1214) 2023-10-29 09:50:04 +00:00
807e3f9f52 derivative for GELU (#1160)
* derivative for GELU

* add tests
2023-10-23 20:23:45 +01:00
8f310cc666 Avoid trying to backprop through non-differentiable layers. (#1094) 2023-10-14 22:03:41 +01:00
c18a856e76 Add the rounding operators. (#1030)
* Add the rounding operators.

* Avoid tracking gradients for the rounding operations.

* Add some rounding tests.
2023-10-04 17:58:44 +01:00
8601537e31 Add slice-scatter. (#927)
* Add slice-scatter.

* Add the op.

* Make transpose be a no-op when the dimensions are identical.

* Add the backprop.

* And add some gradient test.
2023-09-22 12:18:16 +01:00
7b26e513f1 Add the erf function. (#917) 2023-09-21 06:19:10 +01:00
d7e48234d4 Add an erf based gelu op (#900)
* Erf based gelu.

* Add the erf backed gelu.

* Test the new gelu op (which is not gelu_new).
2023-09-19 19:54:28 +01:00
635012d770 Do not backprop through argmin/argmax. (#865) 2023-09-15 22:15:40 +01:00
9a465e1b26 Add 1d upsampling. (#839)
* Add 1d upsampling.

* Add the interpolate functions.
2023-09-13 16:50:39 +01:00
ad8a62dbf5 Add tanh. (#675)
* Add tanh.

* Use tanh in the lstm block.

* Add a test for tanh forward and backward passes.
2023-08-30 13:54:50 +01:00
59b731de99 Add the powf op. (#664)
* Add the powf op.

* Cuda kernels and backprop.

* Add a test.
2023-08-29 20:48:18 +01:00
2d3fcad267 Simplify usage of the pool functions. (#662)
* Simplify usage of the pool functions.

* Small tweak.

* Attempt at using apply to simplify the convnet definition.
2023-08-29 19:12:16 +01:00
a044907ffc Dilated convolutions (#657)
* Add the dilation parameter.

* Restore the basic optimizer example.

* Dilation support in cudnn.

* Use the dilation parameter in the cpu backend.

* More dilation support.

* No support for dilation in transposed convolutions.

* Add dilation to a test.

* Remove a print.

* Helper function.
2023-08-29 16:12:11 +01:00
d0a330448d Backprop support for pooling ops. (#652)
* Backprop support for pooling ops.

* max-pool gradient.
2023-08-29 10:17:59 +01:00
b292047882 Backprop for conv2d. (#638)
* Start adding backprop for conv2d.

* Backprop for conv2d.

* Bugfix + start adding a conv2d test.

* Conv2d backprop testing.

* More conv fixes.
2023-08-28 16:08:55 +01:00
3cca89cc70 Add conv-transpose. (#635)
* Add conv-transpose.

* Return zeros for now.

* Naive CPU implementation.

* Add a conv-transpose test + fix the cpu implementation.

* Add a second test.
2023-08-28 10:10:12 +01:00
d70cffdab6 Fix the minimum/maximum gradient computations. (#534) 2023-08-21 08:28:41 +01:00
a1812f934f Add a yolo-v3 example. (#528)
* Add a couple functions required for yolo.

* Add the yolo-v3 example.

* Add minimum and maximum.

* Use the newly introduced maximum.

* Cuda support for min/max + add some testing.

* Allow for more tests to work with accelerate.

* Fix a typo.
2023-08-20 18:19:37 +01:00
cb069d6063 Add the permute op (similar to pytorch). (#504)
* Add the permute op (similar to pytorch).

* Add the backprop for dimension permutation.
2023-08-18 16:30:53 +01:00
a5c5a893aa add max_pool2d (#371)
Co-authored-by: 赵理山 <ls@zhaolishandeMacBook-Air.local>
2023-08-09 18:05:26 +01:00
2345b8ce3f Skeleton for the avg-pool2d and upsample-nearest2d ops. (#337)
* Skeleton for the avg-pool2d and upsample-nearest2d ops.

* Preliminary conv2d support.
2023-08-07 16:15:38 +01:00
166bfd5847 Add the recip op + use it in stable-diffusion. (#331)
* Add the recip unary op.

* Fix the cuda kernel.

* Use the recip op in sigmoid.
2023-08-06 21:14:52 +01:00
4b3bd79fbd Remove the embedding ops in favor of index-select. (#299)
* Remove the embedding ops in favor of index-select.

* Also remove the cuda kernels.
2023-08-02 05:42:11 +01:00
3eb2bc6d07 Softmax numerical stability. (#267)
* Softmax numerical stability.

* Fix the flash-attn test.
2023-07-28 13:13:01 +01:00
89ba005962 Support backprop for a few more ops. (#254) 2023-07-26 21:31:54 +01:00
fe87778223 Add the copy op. (#227)
* Add the copy op.

* Tweak some cat error messages.

* Handle the contiguous case in to_vec1.

* Fast variant for to_vec2.

* Add add a faster to_vec3 variant.
2023-07-23 18:06:47 +01:00
52c5d8c087 Add the gather op. (#219)
* Start adding gather.

* Gather cpu implementation + use in simple training.

* Add scatter_add for the gradient of gather.

* Simple cpu implementation of scatter_add.

* Use gather in the simple-training backprop.
2023-07-22 07:21:28 +01:00
6eeea1b04e Polish the index-add op and use it in the index-select backprop (#218)
* Add the cpu version of index-add.

* More cpu support for index-add.

* Use index-add in the backprop.
2023-07-22 05:31:46 +01:00
27174a82aa Start adding index-add. 2023-07-21 20:12:48 +01:00
5cc843550d Add binary and ternary custom ops. (#217) 2023-07-21 17:29:50 +01:00
a6bcdfb269 Custom ops with a single argument (#214)
* Add the CustomOp1 trait.

* Add an example of custom op.

* Polish the custom op example.

* Add some backward pass test for custom ops.
2023-07-21 15:18:05 +01:00