* Avoid copying the data on squeeze and unsqueeze. * Fix the quantized llama example. * Unrelated fix for the quantized stable-lm example on CUDA. * Fix for mamba on CUDA (unrelated to the PR).
* Rename to candle-core. * More candle-core renaming.
* Introduce the strided blocks. * Use the strided blocks to speed up the copy. * Add more testing.
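
For context, the strided-block idea can be sketched roughly as follows. This is a minimal illustration, not the code from this change: the names `StridedBlocks`, `strided_blocks`, and `copy_strided`, the field names, and the use of `f32` slices are all assumptions made for the example. The sketch finds the longest contiguous run covering the innermost dimensions of a strided layout, then copies one whole run at a time instead of one element at a time.

```rust
/// How a strided layout decomposes into contiguous runs (hypothetical type).
#[derive(Debug, PartialEq)]
enum StridedBlocks {
    /// The whole layout is a single contiguous run.
    SingleBlock { start: usize, len: usize },
    /// Several runs of `block_len` elements, listed in row-major order.
    MultipleBlocks { block_starts: Vec<usize>, block_len: usize },
}

/// Split a layout (dims, strides in elements, start offset) into contiguous blocks.
fn strided_blocks(dims: &[usize], strides: &[usize], start: usize) -> StridedBlocks {
    // Walk from the innermost dimension outwards while the stride matches
    // what a contiguous layout would use for that dimension.
    let mut block_len = 1usize;
    let mut contiguous_dims = 0usize;
    for (&dim, &stride) in dims.iter().zip(strides.iter()).rev() {
        if stride != block_len {
            break;
        }
        block_len *= dim;
        contiguous_dims += 1;
    }
    let outer = dims.len() - contiguous_dims;
    if outer == 0 {
        return StridedBlocks::SingleBlock { start, len: block_len };
    }
    // Enumerate the start offset of every block over the remaining outer dims.
    let mut block_starts = vec![start];
    for (&dim, &stride) in dims[..outer].iter().zip(&strides[..outer]) {
        let mut next = Vec::with_capacity(block_starts.len() * dim);
        for &base in &block_starts {
            for i in 0..dim {
                next.push(base + i * stride);
            }
        }
        block_starts = next;
    }
    StridedBlocks::MultipleBlocks { block_starts, block_len }
}

/// Copy a strided view of `src` into a fresh contiguous buffer, block by block.
fn copy_strided(src: &[f32], dims: &[usize], strides: &[usize], start: usize) -> Vec<f32> {
    match strided_blocks(dims, strides, start) {
        StridedBlocks::SingleBlock { start, len } => src[start..start + len].to_vec(),
        StridedBlocks::MultipleBlocks { block_starts, block_len } => {
            let mut dst = Vec::with_capacity(block_starts.len() * block_len);
            for s in block_starts {
                // One slice copy per block rather than one write per element.
                dst.extend_from_slice(&src[s..s + block_len]);
            }
            dst
        }
    }
}
```

As an example, take a contiguous `(2, 3, 4)` buffer and swap its first two dimensions, giving dims `[3, 2, 4]` with strides `[4, 12, 1]`. The innermost dimension stays contiguous, so the copy is done as six runs of four elements each. Squeeze and unsqueeze only add or remove size-1 dimensions, which is why they can be expressed as layout changes without touching the data.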