* Fix the batch dim in the quantized matmul example.
* Enable more tests on cuda.
* Add a test for qmm with a batch.
* Fix the zeros-dim test on metal.
* Backend refactoring.
* Metal tweaks.
* Move the cudnn module.