* Reorder the conv1d loops in the cpu backend. * Optimize the 1d convolution. * Conv1D optimize. * Fix some clippy lints.