* Start updating to cudarc 0.14.
* Adapt a couple more things.
* And a couple more fixes.
* More tweaks.
* And a couple more fixes.
* Bump the major version number.
* Proper module system for the cuda kernels.
* Proper ptx loading.
* Launch the sort kernel.
* Custom op.
* Start using the builder pattern.
* More builder.
* More builder.
* Get candle-core to compile.
* Get the tests to pass.
* Get candle-nn to work too.
* Support for custom cuda functions.
* cudnn fixes.
* Get flash attn to run.
* Switch the crate versions to be alpha.
* Bump the ug dependency.
- `rerun-if-changed=src/` now covers any modification under `src/` (including file
additions).
- No longer rewrite `src/lib.rs` every time (doing so triggers new builds).
- Also use the last-modified timestamp to trigger kernel recompilation (should
prevent modified source files from being skipped).
- `src/lib.rs` is also rewritten when a kernel is removed.
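
For illustration, the build-script behaviour described in the last four items could be sketched roughly as follows. This is a std-only sketch, not the actual build code: the helper names (`write_if_changed`, `needs_rebuild`, `render_lib`) and the shape of the generated `src/lib.rs` are assumptions made for the example.

```rust
use std::fs;
use std::io;
use std::path::Path;

// In a real build script, the directory-level trigger would be emitted with:
//     println!("cargo:rerun-if-changed=src/");

/// Write `contents` to `path` only when the file is missing or differs.
/// Returns `true` if the file was (re)written. Leaving an unchanged file
/// alone keeps its timestamp stable, so it does not trigger a new build.
fn write_if_changed(path: &Path, contents: &str) -> io::Result<bool> {
    match fs::read_to_string(path) {
        Ok(existing) if existing == contents => Ok(false),
        _ => {
            fs::write(path, contents)?;
            Ok(true)
        }
    }
}

/// Returns `true` when `output` is missing or older than `source`,
/// based on the files' last-modified timestamps.
fn needs_rebuild(source: &Path, output: &Path) -> io::Result<bool> {
    let src_time = fs::metadata(source)?.modified()?;
    match fs::metadata(output) {
        Ok(meta) => Ok(src_time > meta.modified()?),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(true),
        Err(e) => Err(e),
    }
}

/// Hypothetical generated `lib.rs`: one constant per kernel name.
/// Removing a kernel changes the rendered text, so `write_if_changed`
/// rewrites the file in that case as well.
fn render_lib(kernels: &[&str]) -> String {
    let mut out = String::new();
    for name in kernels {
        out.push_str(&format!(
            "pub const {}: &str = include_str!(concat!(env!(\"OUT_DIR\"), \"/{}.ptx\"));\n",
            name.to_uppercase(),
            name
        ));
    }
    out
}
```

Comparing contents before writing is what avoids the spurious rebuilds, while the timestamp check decides per kernel whether its compiled output is stale.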