* Separate quantized phi-3 implementation.
* Integrate the quantized phi3 model.
* Small fixes, get the generation to work properly.
* Keep the old llama implementation around.
* Change the default.
* Add the phi-3 model.
* Faster rope.
* Bugfix.
* Fix the detokenization.