bc3be6f9b0
Add the elu cuda kernel. (#114)
2023-07-10 07:57:01 +01:00
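As a reference point, an elementwise ELU kernel can be as small as the sketch below; the standalone f32-only function and its name are illustrative assumptions, since the repo's kernels are likely macro-generated and cover more dtypes.

```cuda
#include <cstddef>

// ELU: x if x > 0, alpha * (exp(x) - 1) otherwise.
// Hypothetical standalone f32 kernel with a grid-stride loop so any launch
// configuration covers all elements.
extern "C" __global__ void elu_f32(const size_t numel, const float alpha,
                                   const float *inp, float *out) {
    for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < numel;
         i += gridDim.x * blockDim.x) {
        const float x = inp[i];
        out[i] = x > 0.f ? x : alpha * (expf(x) - 1.f);
    }
}
```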
c187f347bf
Make it easier to use whisper samples from the repo. (#112)
* Make it easier to use samples from the repo.
* Use f32 for accumulation in the f16/bf16 kernels.
2023-07-08 18:48:27 +01:00
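The f32-accumulation change above is the usual precision fix for half-precision reductions: load as f16/bf16, sum in f32, convert only at the end. A minimal sketch under assumed names, with an atomic fallback for combining per-thread partials:

```cuda
#include <cuda_fp16.h>
#include <cstddef>

// Elements are loaded as __half but summed in a float accumulator, so the
// result does not suffer from chained half-precision rounding. The kernel
// name, signature, and the f32 scratch output are illustrative assumptions.
extern "C" __global__ void sum_f16_acc_f32(const size_t numel,
                                           const __half *inp,
                                           float *partial /* f32 scratch */) {
    float acc = 0.f;
    for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < numel;
         i += gridDim.x * blockDim.x) {
        acc += __half2float(inp[i]);
    }
    // For brevity partials are combined with an atomic; a real kernel would
    // reduce within the block first.
    atomicAdd(partial, acc);
}
```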
eb64ad0d4d
Cuda kernel for the conv1d op (#111)
* Boilerplate code for conv1d.
* Boilerplate code for conv1d.
* More boilerplate for conv1d.
* Conv1d work.
* Get the conv1d cuda kernel to work.
* Conv1d support when no batch dim.
2023-07-08 18:13:25 +01:00
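For orientation, a naive conv1d kernel maps one thread to one output element and loops over input channels and kernel taps. The sketch below assumes a (batch, c_in, l_in) layout, stride 1, zero padding, and f32; the name and the lack of stride/dilation handling are simplifications, not the repo's actual kernel.

```cuda
#include <cstddef>

// Naive 1D convolution: input (batch, c_in, l_in), kernel (c_out, c_in, k),
// stride 1 with zero padding, one thread per output element.
extern "C" __global__ void conv1d_f32(const size_t batch, const size_t c_in,
                                      const size_t l_in, const size_t c_out,
                                      const size_t k, const size_t padding,
                                      const float *inp, const float *kernel,
                                      float *out) {
    const size_t l_out = l_in + 2 * padding - k + 1;
    const size_t numel = batch * c_out * l_out;
    for (size_t idx = blockIdx.x * blockDim.x + threadIdx.x; idx < numel;
         idx += gridDim.x * blockDim.x) {
        const size_t pos = idx % l_out;           // output position
        const size_t oc = (idx / l_out) % c_out;  // output channel
        const size_t b = idx / (l_out * c_out);   // batch index
        float acc = 0.f;
        for (size_t ic = 0; ic < c_in; ++ic) {
            for (size_t j = 0; j < k; ++j) {
                // Input position relative to the unpadded signal.
                const long src = (long)(pos + j) - (long)padding;
                if (src < 0 || src >= (long)l_in) continue; // zero padding
                acc += inp[(b * c_in + ic) * l_in + src] *
                       kernel[(oc * c_in + ic) * k + j];
            }
        }
        out[idx] = acc;
    }
}
```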
e676f85f00
Sketch a fast cuda kernel for reduce-sum. (#109)
* Sketch a fast cuda kernel for reduce-sum.
* Sketch the rust support code for the fast sum kernel.
* More work on the fast kernel.
* Add some testing ground.
* A couple of fixes for the fast sum kernel.
2023-07-08 12:43:56 +01:00
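A common shape for such a kernel is a grid-stride pass followed by a shared-memory tree reduction, with one partial result written per block. The sketch below assumes that structure and a fixed block size; the PR's kernel may differ (warp shuffles, strided layouts, multiple reduced dims).

```cuda
#include <cstddef>

// Block-level reduce-sum sketch: each thread accumulates a grid-stride
// partial, the block combines partials in shared memory, and thread 0 writes
// one value per block. Assumes the kernel is launched with exactly
// BLOCK_SIZE threads per block (a power of two).
#define BLOCK_SIZE 256

extern "C" __global__ void fast_sum_f32(const size_t numel, const float *inp,
                                        float *out /* one slot per block */) {
    __shared__ float shr[BLOCK_SIZE];
    float acc = 0.f;
    for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < numel;
         i += gridDim.x * blockDim.x) {
        acc += inp[i];
    }
    shr[threadIdx.x] = acc;
    __syncthreads();
    // Tree reduction over the shared buffer.
    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) shr[threadIdx.x] += shr[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) out[blockIdx.x] = shr[0];
}
```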
c71a38deb7
Tweak the include order so that math.h comes first. (#100)
2023-07-07 06:47:25 +01:00
f114394456
Include the math.h file to get access to constants. (#99)
2023-07-07 06:42:57 +01:00
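The two math.h commits above are about making the standard math constants (e.g. M_PI) visible to the kernel sources; a minimal illustration of the idea, with the kernel itself being a made-up example rather than code from the repo:

```cuda
// Including math.h before the other headers makes constants such as M_PI
// available to the device code below.
#include <math.h>
#include <cuda_fp16.h>

// Toy kernel assumed to be launched with one thread per element.
extern "C" __global__ void scale_by_pi(const float *inp, float *out) {
    out[threadIdx.x] = inp[threadIdx.x] * (float)M_PI;
}
```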
9784d1ed9f
Minor tweaks.
2023-07-03 18:31:55 +01:00
313fa022a5
Bugfix: remove the u8/bf16 conversion kernel as it is ambiguous.
2023-06-30 10:43:32 +01:00
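A hypothetical illustration of the issue: converting a u8 straight to __nv_bfloat16 can match more than one implicit conversion path, and one way around that is to cast through f32 explicitly. This is a sketch of that workaround idea, not the kernel that was removed.

```cuda
#include <cuda_bf16.h>
#include <cstdint>
#include <cstddef>

// Hypothetical u8 -> bf16 cast kernel that routes the conversion through
// float instead of relying on an implicit __nv_bfloat16 construction.
extern "C" __global__ void cast_u8_bf16(const size_t numel, const uint8_t *inp,
                                        __nv_bfloat16 *out) {
    for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < numel;
         i += gridDim.x * blockDim.x) {
        out[i] = __float2bfloat16(static_cast<float>(inp[i]));
    }
}
```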
8ad47907f3
Add the kernels.
2023-06-30 10:26:56 +01:00
6486a6d7b2
Avoid some cast kernels.
2023-06-29 23:23:44 +01:00
ec79fc43f2
Add the bf16 cuda kernels.
2023-06-29 23:12:02 +01:00
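The bf16 kernels generally widen to f32 for the arithmetic and narrow back on store; a minimal sketch of one such elementwise op, with the name and the choice of addition as placeholders for the macro-generated family:

```cuda
#include <cuda_bf16.h>
#include <cstddef>

// Elementwise bf16 add: operands are widened to f32 for the arithmetic and
// narrowed back to bf16 on store.
extern "C" __global__ void badd_bf16(const size_t numel,
                                     const __nv_bfloat16 *lhs,
                                     const __nv_bfloat16 *rhs,
                                     __nv_bfloat16 *out) {
    for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < numel;
         i += gridDim.x * blockDim.x) {
        const float a = __bfloat162float(lhs[i]);
        const float b = __bfloat162float(rhs[i]);
        out[i] = __float2bfloat16(a + b);
    }
}
```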
1ce3843cab
Add the relu op.
2023-06-28 09:38:54 +01:00
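Whether this commit touches the kernels or only the op on the Rust side is not clear from the message; for context, the kernel-side view of relu is just an elementwise max with zero, as in this f32-only sketch with an assumed name:

```cuda
#include <cstddef>

// ReLU: out = max(x, 0), one grid-stride loop over all elements.
extern "C" __global__ void relu_f32(const size_t numel, const float *inp,
                                    float *out) {
    for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < numel;
         i += gridDim.x * blockDim.x) {
        out[i] = fmaxf(inp[i], 0.f);
    }
}
```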
380d61e990
Fix two cuda bugs (matmul and where_cond).
2023-06-27 11:31:04 +01:00
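The message does not say what the where_cond bug was; as context, the op's semantics are an elementwise select driven by a mask, as in this contiguous-f32 sketch with assumed names and a u8 condition:

```cuda
#include <cstdint>
#include <cstddef>

// where_cond: out[i] = cond[i] ? on_true[i] : on_false[i].
// Sketch of the op's semantics only, not the fixed kernel from the commit.
extern "C" __global__ void where_cond_f32(const size_t numel,
                                          const uint8_t *cond,
                                          const float *on_true,
                                          const float *on_false, float *out) {
    for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < numel;
         i += gridDim.x * blockDim.x) {
        out[i] = cond[i] ? on_true[i] : on_false[i];
    }
}
```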
d7f729fb8f
Refactor the hierarchy.
2023-06-27 11:57:27 +02:00