Commit Graph

114 Commits

Author SHA1 Message Date
871efc0307 Bugfix for the conv2d cpu kernel. (#820) 2023-09-11 23:11:27 +01:00
dbd4561416 im2col version of the conv1d kernel. (#815)
* im2col version of the cuda conv1d kernel.

* im2col version of the conv1d cpu kernel.
2023-09-11 14:40:09 +01:00
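
A minimal sketch of the im2col idea behind #815, assuming a single channel, stride 1, and no padding: every receptive field is copied out as a row, after which the convolution is just a matrix product with the kernel.

```rust
/// im2col for a 1d convolution: one row per output position, so the whole
/// convolution reduces to a matrix-vector product with the kernel.
fn im2col_1d(input: &[f32], k: usize) -> Vec<Vec<f32>> {
    let out_len = input.len() - k + 1;
    (0..out_len).map(|i| input[i..i + k].to_vec()).collect()
}

fn conv1d_im2col(input: &[f32], kernel: &[f32]) -> Vec<f32> {
    im2col_1d(input, kernel.len())
        .iter()
        .map(|row| row.iter().zip(kernel).map(|(x, w)| x * w).sum())
        .collect()
}

fn main() {
    let out = conv1d_im2col(&[1., 2., 3., 4., 5.], &[1., 0., -1.]);
    assert_eq!(out, vec![-2., -2., -2.]); // 1-3, 2-4, 3-5
}
```
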
70f38c2069 Proper error on unsupported dtypes when using gemm. (#813) 2023-09-11 12:10:51 +01:00
6fb665004c Enable im2col on the cpu side. (#805)
* Enable im2col on the cpu side.

* Hook im2col on the cpu backend.

* Use the kernel offset.

* Avoid an unnecessary copy.

* Handle non-contiguous kernels.

* Add a const to select the conv2d kernel.
2023-09-11 09:28:13 +01:00
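
For the conv2d case in #805, the same transform reduces the whole convolution to one GEMM. A shape-only sketch, with an illustrative helper that is not candle's API:

```rust
/// Shape-only sketch of conv2d as a single GEMM after im2col (stride 1,
/// no padding, no dilation). The helper name is illustrative.
fn conv2d_gemm_shapes(
    (c_in, h, w): (usize, usize, usize),      // input: channels, height, width
    (c_out, k_h, k_w): (usize, usize, usize), // kernels: out channels, kernel h/w
) -> ((usize, usize), (usize, usize)) {
    let (h_out, w_out) = (h - k_h + 1, w - k_w + 1);
    // Patch matrix: one row per output pixel, one column per kernel element.
    let patches = (h_out * w_out, c_in * k_h * k_w);
    // Kernel matrix: each kernel flattened into a column.
    let kernels = (c_in * k_h * k_w, c_out);
    (patches, kernels)
}
```

Multiplying the two yields an (h_out * w_out, c_out) matrix, which is the output feature map laid out for a single matmul call.
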
a4f40f3dc8 Use rayon directly rather than constraining the number of threads. (#749) 2023-09-05 20:26:15 +01:00
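
An illustration of the policy change in #749: rather than building a pool with a pinned thread count, work is handed to rayon's global pool, which sizes itself to the machine.

```rust
use rayon::prelude::*;

// Work expressed directly as a parallel iterator over the global pool;
// no hand-managed threads or explicit thread-count constraint.
fn sum_squares(xs: &[f64]) -> f64 {
    xs.par_iter().map(|x| x * x).sum()
}
```
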
cda45a7443 Let outside CustomOp2 implementations use binary_map/binary_map_vec (#741) 2023-09-05 09:27:32 +01:00
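
A hypothetical sketch of what a binary_map/binary_map_vec pair looks like; the names mirror the commit, but the signatures are illustrative rather than candle's actual API. The scalar path applies f element by element, while the slice path hands whole buffers to f so it can vectorize.

```rust
/// Scalar path: apply f to each pair of elements.
fn binary_map<T: Copy, U>(lhs: &[T], rhs: &[T], f: impl Fn(T, T) -> U) -> Vec<U> {
    lhs.iter().zip(rhs).map(|(&a, &b)| f(a, b)).collect()
}

/// Vectorized path: f sees whole slices, so it can use SIMD internally.
fn binary_map_vec(lhs: &[f32], rhs: &[f32], f: impl Fn(&[f32], &[f32], &mut [f32])) -> Vec<f32> {
    let mut out = vec![0f32; lhs.len()];
    f(lhs, rhs, &mut out);
    out
}
```
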
84d003ff53 Handle arbitrary shapes in Tensor::new. (#718) 2023-09-02 19:59:21 +01:00
393690387f Support dilation in conv-transpose2d. (#671) 2023-08-30 09:22:00 +01:00
59b731de99 Add the powf op. (#664)
* Add the powf op.

* Cuda kernels and backprop.

* Add a test.
2023-08-29 20:48:18 +01:00
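
The backprop added in #664 follows from d(x^p)/dx = p * x^(p-1). A minimal sketch over slices:

```rust
/// Forward pass of powf applied elementwise.
fn powf_fwd(xs: &[f64], p: f64) -> Vec<f64> {
    xs.iter().map(|x| x.powf(p)).collect()
}

/// Backward pass: chain rule with d(x^p)/dx = p * x^(p-1).
fn powf_bwd(xs: &[f64], p: f64, grad_out: &[f64]) -> Vec<f64> {
    xs.iter()
        .zip(grad_out)
        .map(|(x, g)| g * p * x.powf(p - 1.0))
        .collect()
}

fn main() {
    // d(x^3)/dx at x = 2 is 3 * 2^2 = 12.
    assert!((powf_bwd(&[2.0], 3.0, &[1.0])[0] - 12.0).abs() < 1e-9);
}
```
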
71221559d3 Fix the dilated convolutions. (#659) 2023-08-29 16:37:42 +01:00
a044907ffc Dilated convolutions (#657)
* Add the dilation parameter.

* Restore the basic optimizer example.

* Dilation support in cudnn.

* Use the dilation parameter in the cpu backend.

* More dilation support.

* No support for dilation in transposed convolutions.

* Add dilation to a test.

* Remove a print.

* Helper function.
2023-08-29 16:12:11 +01:00
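
A naive sketch of the dilation parameter from #657, assuming stride 1 and no padding: with dilation d, the kernel taps are spread d elements apart, widening the receptive field to d * (k - 1) + 1 without adding any weights.

```rust
/// Naive dilated 1d convolution (stride 1, no padding). Assumes the dilated
/// kernel span fits inside the input.
fn conv1d_dilated(input: &[f32], kernel: &[f32], dilation: usize) -> Vec<f32> {
    let span = dilation * (kernel.len() - 1) + 1;
    (0..=input.len() - span)
        .map(|i| {
            kernel
                .iter()
                .enumerate()
                .map(|(j, w)| w * input[i + j * dilation])
                .sum()
        })
        .collect()
}
```
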
72fae3140c Optimize the conv2d transpose cpu kernel. (#644)
* Optimize the conv2d transpose cpu kernel.

* Use multiple cores.
2023-08-28 20:06:31 +01:00
ca26198b95 Fix the cpu kernel for conv-transpose. (#643) 2023-08-28 16:45:12 +01:00
b292047882 Backprop for conv2d. (#638)
* Start adding backprop for conv2d.

* Backprop for conv2d.

* Bugfix + start adding a conv2d test.

* Conv2d backprop testing.

* More conv fixes.
2023-08-28 16:08:55 +01:00
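
The weight gradient behind #638, sketched for the 1d, stride-1 case: dL/dw[j] = sum_i grad_out[i] * input[i + j], i.e. a correlation of the input with the output gradient.

```rust
/// Weight gradient of a stride-1, no-padding conv1d with kernel size k:
/// each tap's gradient is the output gradient correlated with the input.
fn conv1d_wgrad(input: &[f32], grad_out: &[f32], k: usize) -> Vec<f32> {
    (0..k)
        .map(|j| {
            grad_out
                .iter()
                .enumerate()
                .map(|(i, g)| g * input[i + j])
                .sum()
        })
        .collect()
}
```
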
3cca89cc70 Add conv-transpose. (#635)
* Add conv-transpose.

* Return zeros for now.

* Naive CPU implementation.

* Add a conv-transpose test + fix the cpu implementation.

* Add a second test.
2023-08-28 10:10:12 +01:00
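
A minimal sketch of the naive CPU implementation from #635, in the 1d, stride-1 case: each input element scatters a scaled copy of the kernel into the output, the reverse of the gather a forward convolution performs.

```rust
/// Naive 1d transposed convolution (stride 1, no padding): scatter-add a
/// scaled kernel copy for every input element.
fn conv_transpose1d(input: &[f32], kernel: &[f32]) -> Vec<f32> {
    let mut out = vec![0f32; input.len() + kernel.len() - 1];
    for (i, x) in input.iter().enumerate() {
        for (j, w) in kernel.iter().enumerate() {
            out[i + j] += x * w;
        }
    }
    out
}
```
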
d8ba0452dc Fail on bf16. (#594) 2023-08-25 06:10:38 +01:00
329f661d9b Trace softmax (#568)
* Trace the softmax op.

* Inline the sum.

* Add min/max vec operations.
2023-08-23 15:25:50 +01:00
9a5c7db91a Add support for i64 (#563)
* Add the i64 dtype.

* Adapt the cuda kernels.
2023-08-23 10:42:19 +01:00
495e0b7580 Simd support (#448)
* Import the simd intrinsics in candle-core.

* simd version of reduce-sum.

* Bugfix.

* Fix some clippy lints.
2023-08-15 09:50:38 +01:00
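
A sketch in the spirit of #448, assuming x86_64 with AVX (a real implementation would feature-detect at runtime): eight f32 lanes accumulate in a register and the tail is summed serially.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// SIMD reduce-sum over f32: vector accumulator for the 8-wide chunks,
/// scalar loop for the remainder. Caller must ensure AVX is available.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn sum_avx(xs: &[f32]) -> f32 {
    let mut acc = _mm256_setzero_ps();
    let chunks = xs.chunks_exact(8);
    let tail = chunks.remainder();
    for c in chunks {
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(c.as_ptr()));
    }
    let mut lanes = [0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    lanes.iter().sum::<f32>() + tail.iter().sum::<f32>()
}
```
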
d379a76a9e Add a softmax bench. (#433)
* Add a softmax bench.

* Add the vectorized sum reduce.
2023-08-13 20:09:18 +01:00
9af438ac1b Track the conv2d operations in stable-diffusion. (#431)
* Track the conv2d operations in stable-diffusion.

* Add more tracing to stable-diffusion.

* Also trace the resnet bits.

* Trace the attention blocks.

* Also trace the attention inner part.

* Small tweak.
2023-08-13 15:58:26 +01:00
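
The spans in #431 are the kind of instrumentation the tracing crate provides; a minimal, illustrative sketch (the span name is an example, not the actual code):

```rust
use tracing::{span, Level};

// Wrap an op in a span so profilers and subscribers can attribute time to it.
fn conv2d_traced() {
    let span = span!(Level::TRACE, "conv2d");
    let _enter = span.enter(); // span closes when `_enter` is dropped
    // ... the actual convolution would run here ...
}
```
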
662db45fc3 Use zero padding in conv1d and conv2d (same as pytorch). (#408) 2023-08-11 14:53:05 +01:00
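
"Zero padding" in #408 means the input is extended with literal zeros on both sides, pytorch's default padding mode, as opposed to reflect or replicate padding. A sketch:

```rust
/// Pad a 1d signal with p zeros on each side before convolving.
fn pad1d_zeros(xs: &[f32], p: usize) -> Vec<f32> {
    let mut out = vec![0f32; xs.len() + 2 * p];
    out[p..p + xs.len()].copy_from_slice(xs);
    out
}
```
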
e29c7809ec Parallelise the CPU kernels for the conv ops. (#401)
* Parallelise the conv2d op.

* Tighter control on threading.

* Also parallelise conv1d.

* Add some safety comment.
2023-08-11 05:51:58 +01:00
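
A sketch of the parallelisation pattern from #401: output rows are independent, so par_chunks_mut splits them across cores without explicit thread management (the real kernels control threading more tightly, hence the safety comments in the PR).

```rust
use rayon::prelude::*;

/// Fill an h x w output in parallel, one row per rayon task. The inner loop
/// stands in for the per-pixel conv computation.
fn process_rows(out: &mut [f32], width: usize) {
    out.par_chunks_mut(width)
        .enumerate()
        .for_each(|(row, chunk)| {
            for (col, v) in chunk.iter_mut().enumerate() {
                *v = compute_pixel(row, col);
            }
        });
}

fn compute_pixel(row: usize, col: usize) -> f32 {
    (row + col) as f32 // placeholder for the conv inner loop
}
```
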
a325c1aa50 Upsample test + bugfix. (#399) 2023-08-10 21:02:35 +02:00
94eff56aee Optimize the cpu conv2d kernel (#396)
* Conv2d simd optimization.

* Fix the contiguous copying.

* Small tweak.
2023-08-10 17:40:09 +01:00
c8039579a5 Conv1d optimize (#392)
* Reorder the conv1d loops in the cpu backend.

* Optimize the 1d convolution.

* Conv1D optimize.

* Fix some clippy lints.
2023-08-10 15:23:52 +01:00
c7f92f985e Further randn tweaks: use the appropriate rng rather than the f64 one, some cleanup. (#383) 2023-08-10 05:48:19 +01:00
3bbc08a8df Fix randn cpu (#382)
* Change distributions

Standard samples uniformly from [0, 1); Normal is the correct distribution here.

* Add test

Not sure if this is the best place to put the test.

* Remove unnecessary use
2023-08-10 05:33:44 +01:00
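
The bug class behind #382, sketched with the rand and rand_distr crates: Standard samples uniformly from [0, 1), so it cannot back a randn; a gaussian needs Normal.

```rust
use rand::prelude::*;
use rand_distr::Normal;

/// Draw n samples from N(0, 1). Using `Standard` here would silently produce
/// uniform values in [0, 1) instead of gaussians.
fn randn(n: usize) -> Vec<f32> {
    let normal = Normal::new(0f32, 1f32).expect("valid std dev");
    let mut rng = thread_rng();
    (0..n).map(|_| normal.sample(&mut rng)).collect()
}
```
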
fcfdcbd337 Add a conv1d benchmark based on the whisper sizes. (#377)
* Add a conv1d benchmark based on the whisper sizes.

* Enforce the batch-dim in conv1d.
2023-08-09 20:27:03 +01:00
a5c5a893aa add max_pool2d (#371)
Co-authored-by: 赵理山 <ls@zhaolishandeMacBook-Air.local>
2023-08-09 18:05:26 +01:00
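
A naive sketch of max_pool2d from #371, for a single-channel row-major map with a k x k window and stride k (the common non-overlapping case):

```rust
/// Max pooling over non-overlapping k x k windows of an h x w map.
fn max_pool2d(input: &[f32], w: usize, k: usize) -> Vec<f32> {
    let h = input.len() / w;
    let (h_out, w_out) = (h / k, w / k);
    let mut out = Vec::with_capacity(h_out * w_out);
    for oy in 0..h_out {
        for ox in 0..w_out {
            let mut m = f32::NEG_INFINITY;
            for dy in 0..k {
                for dx in 0..k {
                    m = m.max(input[(oy * k + dy) * w + ox * k + dx]);
                }
            }
            out.push(m);
        }
    }
    out
}
```
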
1892bd139c Extract the strides in the conv ops. (#370) 2023-08-09 17:57:05 +01:00
cd225bd3b1 More testing for avg-pool2d. (#366)
* More testing for avg-pool2d.

* Another fix.

* Add a max-pool test with non-divisible kernel sizes.
2023-08-09 16:12:23 +01:00
b80348d22f Bugfix for avg-pool + add some test. (#365) 2023-08-09 15:44:16 +01:00
dbc6f281c9 Conv1d test with padding. (#356) 2023-08-09 05:45:38 +01:00
cf965ecaa8 Simplify the conv1d and conv2d code. (#352) 2023-08-08 22:10:59 +01:00
608b2358c6 Add some conv1d test + bugfix using padding. (#349) 2023-08-08 20:50:20 +01:00
1e6dbeac01 Add some conv2d tests. (#347)
* Add some conv2d tests.

* Add a simpler conv2d test.

* More conv2d testing + bugfix.

* Add a todo.
2023-08-08 19:02:42 +01:00
13ce68ff9b Bugfix for conv2d. (#343) 2023-08-08 15:20:00 +01:00
ab35684326 Naive implementation for conv2d. (#341) 2023-08-08 06:34:36 +01:00
b5bb5e056d Add more conv2d support. (#340)
* Add more conv2d support.

* Conv2d cpu work.

* Conv2d output shape.
2023-08-08 06:04:32 +01:00
d0d7010682 CPU implementation for upsample-nearest2d. (#339) 2023-08-07 20:07:10 +01:00
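
Nearest-neighbour upsampling as in #339 needs no interpolation: each output pixel reads the input pixel at the floor-scaled coordinates. A sketch:

```rust
/// Nearest-neighbour 2d upsampling: output (y, x) reads input
/// (y * h_in / h_out, x * w_in / w_out), using integer (floor) division.
fn upsample_nearest2d(
    input: &[f32],
    (h_in, w_in): (usize, usize),
    (h_out, w_out): (usize, usize),
) -> Vec<f32> {
    let mut out = Vec::with_capacity(h_out * w_out);
    for y in 0..h_out {
        for x in 0..w_out {
            let (sy, sx) = (y * h_in / h_out, x * w_in / w_out);
            out.push(input[sy * w_in + sx]);
        }
    }
    out
}
```
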
fc265d9dcf Some CLIP fixes for stable diffusion. (#338)
* Some CLIP fixes for stable diffusion.

* Add the avg-pool2d operation on cpu.
2023-08-07 18:31:45 +01:00
b278834267 Support the Accelerate BLAS on macOS. (#325)
* Add the accelerate feature.

* Ffi tweaks.
2023-08-05 17:25:24 +01:00
4b3bd79fbd Remove the embedding ops in favor of index-select. (#299)
* Remove the embedding ops in favor of index-select.

* Also remove the cuda kernels.
2023-08-02 05:42:11 +01:00
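
The equivalence behind #299: an embedding lookup is just selecting rows of the weight matrix by token id, so a dedicated embedding op is redundant once index-select exists. A row-major sketch:

```rust
/// Gather rows of a row-major (vocab, dim) weight matrix by index; this is
/// exactly what an embedding lookup does.
fn index_select_rows(weights: &[f32], dim: usize, ids: &[usize]) -> Vec<f32> {
    let mut out = Vec::with_capacity(ids.len() * dim);
    for &id in ids {
        out.extend_from_slice(&weights[id * dim..(id + 1) * dim]);
    }
    out
}
```
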
c950a5c6b1 Cuda support for the mnist training. (#277)
* Cuda support for the mnist training.

* min/max fix + testing.

* Add the argmin/argmax tests.

* More cuda support for argmin/argmax.

* Cuda kernels for argmin and argmax.
2023-07-29 19:48:04 +01:00
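
A scalar sketch of argmax as in #277 (argmin is symmetric). Note that max_by returns the last maximum on ties; tie-breaking conventions differ between frameworks.

```rust
/// Index of the maximum element; panics on an empty slice or NaN input.
fn argmax(xs: &[f32]) -> usize {
    xs.iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}
```
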
3eb2bc6d07 Softmax numerical stability. (#267)
* Softmax numerical stability.

* Fix the flash-attn test.
2023-07-28 13:13:01 +01:00
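
The standard trick behind #267: subtracting the row max before exponentiating leaves softmax unchanged, since the common factor cancels, but keeps exp from overflowing on large logits. A sketch:

```rust
/// Numerically stable softmax: shift by the max so every exponent is <= 0.
fn softmax(xs: &[f32]) -> Vec<f32> {
    let max = xs.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = xs.iter().map(|x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```
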
3e89df938c Starcoder fix (#264)
* Bugfix for starcoder.

* Get some proper code generation.

* Slightly simpler softmax.
2023-07-28 11:17:49 +01:00
6475bfadfe Simplify Tensor::randn. (#255)
* Simplify Tensor::randn.

* Also switch Tensor::rand to use a generic dtype.

* Support sampling for f16.

* Cleanup.
2023-07-27 07:40:36 +01:00
18cc73954a Add some testing for index-add (#237)
* Add some testing for index-add.

* Fix the cpu implementation for index-add.
2023-07-25 08:38:33 +01:00
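
index-add, sketched here over rows, is the scatter counterpart of index-select: repeated indices must accumulate rather than overwrite, which is the subtle part of a cpu implementation like the one tested and fixed in #237.

```rust
/// Scatter-add each (dim)-wide source row into the destination row named by
/// `ids`, accumulating when an index repeats.
fn index_add_rows(dst: &mut [f32], dim: usize, ids: &[usize], src: &[f32]) {
    for (i, &id) in ids.iter().enumerate() {
        for j in 0..dim {
            dst[id * dim + j] += src[i * dim + j];
        }
    }
}
```
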
23827c49cd Cleanup some todos. (#226)
* Cleanup some todos.

* Fix more todos.

* Optimize for the contiguous case.

* Add the IntDType trait.

* Handle the intdtype trait for more ops.

* Remove a todo.

* Remove a todo.
2023-07-23 16:00:00 +01:00