* Add some missing index-select Metal kernels.
* Make some matrices contiguous before matmul.

This crate contains the Metal kernels used by candle.