* Boilerplate code for conv1d.
* More boilerplate for conv1d.
* Conv1d work.
* Get the conv1d cuda kernel to work.
* Conv1d support when no batch dim.
* Sketch a fast cuda kernel for reduce-sum.
* Sketch the rust support code for the fast sum kernel.
* More work on the fast kernel.
* Add some testing ground.
* A couple fixes for the fast sum kernel.
- `cargo:rerun-if-changed=src/` covers any modification under `src/` (including file
additions).
- No longer rewriting `src/lib.rs` every time (rewriting it triggers new builds).
- Also using the modified timestamp to trigger kernel recompilation (should
prevent skipping modified source files).
- `src/lib.rs` is also rewritten when a kernel is removed.
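
For illustration, the build-script behaviour described in the notes above could be sketched roughly as follows. This is a minimal sketch using only the Rust standard library, not the actual build script: the helper names `write_if_changed` and `needs_rebuild` are hypothetical, and the real code may structure these checks differently.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: write `contents` to `path` only when the file is
/// missing or its current contents differ. Returns `true` if a write happened.
/// Skipping identical writes leaves the file's timestamp alone, so Cargo does
/// not see a spurious change under `src/`.
fn write_if_changed(path: &Path, contents: &str) -> io::Result<bool> {
    match fs::read_to_string(path) {
        // Same contents already on disk: leave the file untouched.
        Ok(existing) if existing == contents => Ok(false),
        // Missing, unreadable, or different: (re)write it.
        _ => {
            fs::write(path, contents)?;
            Ok(true)
        }
    }
}

/// Hypothetical helper: decide whether a kernel source must be recompiled by
/// comparing modification timestamps of the source and the compiled output.
fn needs_rebuild(src: &Path, out: &Path) -> io::Result<bool> {
    let out_modified = match fs::metadata(out) {
        Ok(meta) => meta.modified()?,
        // No readable output yet: always compile.
        Err(_) => return Ok(true),
    };
    let src_modified = fs::metadata(src)?.modified()?;
    // Recompile only when the source is newer than the existing output.
    Ok(src_modified > out_modified)
}
```

In a `build.rs`, the generated text for `src/lib.rs` would be assembled from the list of kernels currently present and passed to `write_if_changed`, so adding or removing a kernel changes the text and causes a rewrite, while an unchanged kernel set does not. `needs_rebuild` would be called once per kernel source before invoking the compiler.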