This crate contains the Metal kernels used by candle.