117 Commits

SHA1 Message Date
bc3be6f9b0 Add the elu cuda kernel. (#114) 2023-07-10 07:57:01 +01:00
c187f347bf Make it easier to use whisper samples from the repo. (#112)
* Make it easier to use samples from the repo.

* Use f32 for accumulation in the f16/bf16 kernels.
2023-07-08 18:48:27 +01:00
eb64ad0d4d Cuda kernel for the conv1d op (#111)
* Boilerplate code for conv1d.

* Boilerplate code for conv1d.

* More boilerplate for conv1d.

* Conv1d work.

* Get the conv1d cuda kernel to work.

* Conv1d support when no batch dim.
2023-07-08 18:13:25 +01:00
e676f85f00 Sketch a fast cuda kernel for reduce-sum. (#109)
* Sketch a fast cuda kernel for reduce-sum.

* Sketch the rust support code for the fast sum kernel.

* More work on the fast kernel.

* Add some testing ground.

* A couple fixes for the fast sum kernel.
2023-07-08 12:43:56 +01:00
c71a38deb7 Tweak the include order to include math.h first. (#100) 2023-07-07 06:47:25 +01:00
f114394456 Include the math.h file to get access to constants. (#99) 2023-07-07 06:42:57 +01:00
fefdc0228a Fixing the cached build.
- `rerun-if-changed=src/` covers any src modification (including file
  additions).
- No longer rewriting `src/lib.rs` every time (doing so triggers new builds).
- Also using the modified timestamp to trigger kernel recompilation (should
  prevent skipping modified source files).
- Will also rewrite when a kernel is removed.
2023-07-05 18:12:17 +02:00
9784d1ed9f Minor tweaks. 2023-07-03 18:31:55 +01:00
313fa022a5 Bugfix: remove the u8/bf16 conversion kernel as it is ambiguous. 2023-06-30 10:43:32 +01:00
8ad47907f3 Add the kernels. 2023-06-30 10:26:56 +01:00
6486a6d7b2 Avoid some cast kernels. 2023-06-29 23:23:44 +01:00
ec79fc43f2 Add the bf16 cuda kernels. 2023-06-29 23:12:02 +01:00
1ea08a19cb Rerun on new files. 2023-06-29 15:59:58 +00:00
b5bdbef53a Fixing kernel cache (a bit brutal for now, but if build triggers,
rebuild ALL kernels).
2023-06-29 15:51:08 +00:00
1ce3843cab Add the relu op. 2023-06-28 09:38:54 +01:00
380d61e990 Fix two cuda bugs (matmul and where_cond). 2023-06-27 11:31:04 +01:00
d7f729fb8f Refactor the hierarchy. 2023-06-27 11:57:27 +02:00
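
The cached-build commit above (fefdc0228a) describes recompiling a kernel when its source's modified timestamp changes. A minimal sketch of that idea follows. It is not the repository's actual build script: the helper names `is_stale`, `needs_rebuild` and `emit_rerun_directive` are hypothetical, and it assumes each kernel source has a single compiled output file.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Hypothetical helper: a compiled output is stale when it is missing
/// or was last modified before its source.
fn is_stale(source_modified: SystemTime, output_modified: Option<SystemTime>) -> bool {
    match output_modified {
        Some(out) => out < source_modified,
        None => true,
    }
}

/// Hypothetical helper: compares the modified timestamps of a kernel
/// source and its compiled output on disk.
fn needs_rebuild(source: &Path, output: &Path) -> io::Result<bool> {
    let source_modified = fs::metadata(source)?.modified()?;
    let output_modified = match fs::metadata(output) {
        Ok(meta) => Some(meta.modified()?),
        // A missing output simply means the kernel was never compiled.
        Err(err) if err.kind() == io::ErrorKind::NotFound => None,
        Err(err) => return Err(err),
    };
    Ok(is_stale(source_modified, output_modified))
}

/// Asks Cargo to rerun the build script when anything under `src/`
/// changes, which also covers newly added files.
fn emit_rerun_directive() {
    println!("cargo:rerun-if-changed=src/");
}
```

In a build script, `needs_rebuild` would be called once per kernel source so that only stale kernels are recompiled, while untouched ones keep their cached output.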