* Add some group parameter to convolutions.
* Avoid some unnecessary groups checks.
* Move the tensor convolution bits.
* Properh handling of groups.
* Bump the crate version.
* And add a changelog.
* Add flash-attention for the stable-diffusion example.
* Change the dtype.
* Silly fix.
* Another fix.
* Revert the dtype back to the query dtype after apply flash-attn.
* Track the conv2d operations in stable-diffusion.
* Add more tracing to stable-diffusion.
* Also trace the resnet bits.
* Trace the attention blocks.
* Also trace the attention inner part.
* Small tweak.
* Start adding a stable-diffusion example.
* Proper computation of the causal mask.
* Add the chunk operation.
* Work in progress: port the attention module.
* Add some dummy modules for conv2d and group-norm, get the attention module to compile.
* Re-enable the 2d convolution.
* Add the embeddings module.
* Add the resnet module.
* Add the unet blocks.
* Add the unet.
* And add the variational auto-encoder.
* Use the pad function from utils.