* gemma3: set the RotaryEmbedding base frequency per layer, depending on whether the layer uses sliding-window attention
* Changed the attention mask per layer: either the normal causal mask or a sliding-window mask
* made attention mask creation slightly more efficient by building the masks only once per model forward pass
* changed is_sliding to an Option
* fixed clippy warnings
* changed generation to stop on both <eos> and <end_of_turn> (whichever appears first) instead of on only one of them
* Add the Gemma models.
* Add the gemma example.
* Adapt the RmsNorm.
* Get the 2b model to work.
* 7b support.
* Use the config head dim.
* Yet another fix.
* Make the matrices contiguous.
* Also get the 7b model to work.
* And add them to the readme.
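The gemma3 entries above describe a per-layer choice between a global (causal) mask and a sliding-window mask, a RoPE base frequency that follows that choice, and masks built once per forward pass. Below is a minimal, dependency-free Rust sketch of that idea, not the code from this change. It assumes a config modeled on the Gemma 3 one: the field names `sliding_window_pattern`, `sliding_window`, `rope_theta`, `rope_local_base_freq`, the rule that every `sliding_window_pattern`-th layer is global, and the helpers `sliding_window_for_layer`, `rope_base_for_layer`, and `build_mask` are all illustrative assumptions. Plain `Vec`s stand in for tensors.

```rust
/// Hypothetical config; field names and values are illustrative assumptions.
struct Config {
    num_layers: usize,
    sliding_window_pattern: usize,
    sliding_window: usize,
    rope_theta: f64,
    rope_local_base_freq: f64,
}

/// `Some(window)` for sliding-window layers, `None` for global layers
/// (assumption: every `sliding_window_pattern`-th layer is global).
fn sliding_window_for_layer(cfg: &Config, layer_idx: usize) -> Option<usize> {
    if (layer_idx + 1) % cfg.sliding_window_pattern != 0 {
        Some(cfg.sliding_window)
    } else {
        None
    }
}

/// Local layers use the local RoPE base, global layers the global one.
fn rope_base_for_layer(cfg: &Config, layer_idx: usize) -> f64 {
    match sliding_window_for_layer(cfg, layer_idx) {
        Some(_) => cfg.rope_local_base_freq,
        None => cfg.rope_theta,
    }
}

/// Additive attention mask: 0.0 where attention is allowed, -inf elsewhere.
/// `seq_len` new queries start at absolute position `offset` and attend to
/// `offset + seq_len` keys (cached keys plus the new ones).
fn build_mask(seq_len: usize, offset: usize, window: Option<usize>) -> Vec<Vec<f32>> {
    let kv_len = offset + seq_len;
    (0..seq_len)
        .map(|i| {
            let q_pos = offset + i;
            (0..kv_len)
                .map(|k_pos| {
                    // Causal check first so `q_pos - k_pos` never underflows.
                    let allowed = k_pos <= q_pos && window.map_or(true, |w| q_pos - k_pos < w);
                    if allowed { 0.0 } else { f32::NEG_INFINITY }
                })
                .collect()
        })
        .collect()
}

fn main() {
    let cfg = Config {
        num_layers: 6,
        sliding_window_pattern: 6,
        sliding_window: 2,
        rope_theta: 1_000_000.0,
        rope_local_base_freq: 10_000.0,
    };
    let (seq_len, offset) = (4, 0);

    // Build both masks once per forward pass instead of once per layer.
    let global_mask = build_mask(seq_len, offset, None);
    let sliding_mask = build_mask(seq_len, offset, Some(cfg.sliding_window));

    for layer_idx in 0..cfg.num_layers {
        let window = sliding_window_for_layer(&cfg, layer_idx);
        // Each layer just picks the precomputed mask that matches its kind.
        let mask = if window.is_some() { &sliding_mask } else { &global_mask };
        let allowed = mask.iter().flatten().filter(|v| v.is_finite()).count();
        println!(
            "layer {layer_idx}: window={window:?} rope_base={} allowed={allowed}",
            rope_base_for_layer(&cfg, layer_idx)
        );
    }
}
```

With these illustrative values, layers 0 to 4 pick the sliding mask and the local base, and layer 5 picks the global mask and `rope_theta`. Returning an `Option<usize>` keeps the window size and the "is sliding" flag in one value, which matches the spirit of the "changed is_sliding to an Option" entry.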