* Quantized phi in a separate file.
* Add the quantized phi example + rework the model code.
* Improve the phi model.
* Get some generation out.
* Use the appropriate rope shape.
* Tweak the default prompt.
---------
Co-authored-by: Jane Doe <jane.doe@example.org>
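The generation step above reduces to a plain autoregressive loop. Below is a minimal greedy-decoding sketch, assuming a `forward` closure that maps the tokens so far to last-position logits; the closure and its shapes are illustrative, not the actual example code, and there is no kv cache here for brevity.

```rust
use candle_core::{Device, Result, Tensor};

// Greedy decoding loop: feed the tokens generated so far, pick the
// arg-max token from the last-position logits, stop on eos.
fn generate(
    mut forward: impl FnMut(&Tensor) -> Result<Tensor>,
    prompt_ids: Vec<u32>,
    max_new_tokens: usize,
    eos_id: u32,
    device: &Device,
) -> Result<Vec<u32>> {
    let mut ids = prompt_ids;
    for _ in 0..max_new_tokens {
        let input = Tensor::new(ids.as_slice(), device)?.unsqueeze(0)?;
        let logits = forward(&input)?; // assumed shape: (vocab,)
        let logits: Vec<f32> = logits.to_vec1()?;
        let next = logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i as u32)
            .unwrap();
        if next == eos_id {
            break;
        }
        ids.push(next);
    }
    Ok(ids)
}
```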
* Add support for l3b and the new tokenizer.
* add todo
* Add todo and use k_s model
* Use the official tokenizers.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
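"Use the official tokenizers" points at the Hugging Face `tokenizers` crate. A minimal round-trip through a `tokenizer.json` looks like this (the file path is illustrative):

```rust
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the tokenizer definition exported by the model repo.
    let tokenizer = Tokenizer::from_file("tokenizer.json")?;

    // Encode with special tokens (bos/eos) added.
    let encoding = tokenizer.encode("The quick brown fox", true)?;
    let ids: Vec<u32> = encoding.get_ids().to_vec();
    println!("token ids: {ids:?}");

    // Decode back, skipping special tokens.
    let text = tokenizer.decode(&ids, true)?;
    println!("decoded: {text}");
    Ok(())
}
```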
* Use the batch support in Stable Diffusion that was already present but went unused (sketched below).
Also refactor out the `save_image` function.
* Clippy + cosmetic fixes.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
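The batching change amounts to stacking per-sample tensors along a new leading batch dimension so the diffusion loop runs once per step rather than once per image. A rough sketch, with helper names made up for illustration (this is not the refactored `save_image` itself):

```rust
use candle_core::{Result, Tensor};

// Stack single images (c, h, w) into one batch (b, c, h, w) so the
// UNet runs once per step instead of once per image.
fn to_batch(images: &[Tensor]) -> Result<Tensor> {
    Tensor::stack(images, 0)
}

// Write one (3, h, w) u8 image to disk; a stand-in for the refactored helper.
fn save_image_rgb(img: &Tensor, path: &str) -> Result<()> {
    let (_c, h, w) = img.dims3()?;
    // (3, h, w) -> (h, w, 3) -> flat RGB bytes.
    let pixels = img.permute((1, 2, 0))?.flatten_all()?.to_vec1::<u8>()?;
    image::save_buffer(path, &pixels, w as u32, h as u32, image::ColorType::Rgb8)
        .map_err(candle_core::Error::wrap)?;
    Ok(())
}
```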
* Start adding the recurrent-gemma model.
* More Griffin.
* Add the example + get the weights to load from the HF version.
* More inference code.
* Rope + kv-cache on the attention side.
* Add to the inference code.
* Add more to the recurrent gemma inference.
* Get some first inference to run.
* Add the softcap on logits.
* Fixes.
* Use partial rotary embeddings.
* Get inference to work.
* Add a comment.
* Add a README.
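The softcap mentioned above keeps logits in (-cap, cap) by a scaled tanh. A one-liner sketch of the operation, with the cap value left to the model config:

```rust
use candle_core::{Result, Tensor};

// Soft-cap: cap * tanh(logits / cap), bounding logits to (-cap, cap)
// while staying smooth near zero.
fn softcap(logits: &Tensor, cap: f64) -> Result<Tensor> {
    logits.affine(1.0 / cap, 0.0)?.tanh()?.affine(cap, 0.0)
}
```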
* moondream implementation
* add moondream example
* change config default activation
* Add assets and integrate phi mixformer with example
* Make use of kv cache and fix seq_len bug; Clean up example code
* Add README link to example
* Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig
* Delete image
* Use apply instead of forward
* Use latest release special token; Fix token/s accuracy; Use GeluPytorchTanh in VisionConfig v2
* Add flag to use f16
* Avoid breaking the quantized version on cuda.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
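The kv-cache usage above follows the standard pattern: append each step's keys and values to the cached ones so only the fresh tokens need a forward pass. A minimal sketch, assuming a (batch, heads, seq, head_dim) layout with seq at dimension 2:

```rust
use candle_core::{Result, Tensor};

// Append this step's keys/values (seq dim = 2) to the cached ones.
// The first call just seeds the cache with the prompt's k/v.
fn update_kv_cache(
    cache: &mut Option<(Tensor, Tensor)>,
    k: &Tensor,
    v: &Tensor,
) -> Result<(Tensor, Tensor)> {
    let (k, v) = match cache.take() {
        None => (k.clone(), v.clone()),
        Some((pk, pv)) => (Tensor::cat(&[&pk, k], 2)?, Tensor::cat(&[&pv, v], 2)?),
    };
    *cache = Some((k.clone(), v.clone()));
    Ok((k, v))
}
```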
* Pass the bos token at the beginning of the tensor.
* Quantize moondream.
* Forward with image bos token.
* Clippy.
* Use q4_0 quantization.
* Add pointers for sequence and tokens; Remove seq_len conditional
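Passing the bos token first is just a matter of seeding the input ids before the prompt tokens. A sketch, with `bos_token_id` standing in for whatever the tokenizer defines:

```rust
use candle_core::{Device, Result, Tensor};

// Build a (1, seq) input with the bos token at position 0,
// followed by the prompt token ids.
fn input_with_bos(bos_token_id: u32, prompt: &[u32], device: &Device) -> Result<Tensor> {
    let mut ids = Vec::with_capacity(prompt.len() + 1);
    ids.push(bos_token_id);
    ids.extend_from_slice(prompt);
    Tensor::new(ids.as_slice(), device)?.unsqueeze(0)
}
```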
* Add more cuda kernels for quantized matmul.
* Add the vec-dot bits.
* Expose the quantized matmul-vec kernels.
* Also include the quantize-q8-1 kernel.
* Glue code for the q8-1 quantization.
* Matmul-vec product via q8-1 quantization.
* Add a test.
* Add a mm test.
* Get the test to return some sensible results.
* Also test dmmv.
* Fix the launch params.
* Allow for tweaking the force_dmmv parameter while it's experimental.
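In the ggml-style q8-1 scheme that the glue code above targets, floats are packed into blocks of 32 with a per-block scale plus a precomputed sum that the matching vec-dot kernels use to fold in the q4 offsets. A plain-Rust sketch of that block layout (illustrative, not the CUDA kernel):

```rust
// One q8_1-style block: 32 values, a scale d, and the precomputed
// sum s = d * sum(qs), consumed by the matching vec-dot kernels.
struct BlockQ8_1 {
    d: f32,
    s: f32,
    qs: [i8; 32],
}

fn quantize_q8_1(xs: &[f32; 32]) -> BlockQ8_1 {
    let amax = xs.iter().fold(0f32, |m, &x| m.max(x.abs()));
    let d = amax / 127.0;
    let id = if d > 0.0 { 1.0 / d } else { 0.0 };
    let mut qs = [0i8; 32];
    let mut sum = 0i32;
    for (q, &x) in qs.iter_mut().zip(xs.iter()) {
        let v = (x * id).round() as i32;
        *q = v.clamp(-127, 127) as i8;
        sum += *q as i32;
    }
    BlockQ8_1 { d, s: d * sum as f32, qs }
}
```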
* CLIP model implementation with example
* CLIP Implementation fixes, batch images
* CLIP model remove images from git
* CLIP model remove unnecessary use of batch_indices
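Scoring a batch of images against text prompts with CLIP is an L2 normalization followed by a matmul, with a softmax giving per-image probabilities. A sketch under the assumption of (n, dim) embedding tensors (names are placeholders):

```rust
use candle_core::{Result, Tensor, D};
use candle_nn::ops::softmax;

// Cosine-similarity logits between image and text embeddings,
// shapes (n_images, dim) and (n_texts, dim).
fn clip_scores(img_emb: &Tensor, txt_emb: &Tensor) -> Result<Tensor> {
    let norm = |t: &Tensor| -> Result<Tensor> {
        t.broadcast_div(&t.sqr()?.sum_keepdim(D::Minus1)?.sqrt()?)
    };
    let img = norm(img_emb)?;
    let txt = norm(txt_emb)?;
    // (n_images, n_texts) similarity matrix -> probabilities per image.
    let logits = img.matmul(&txt.t()?)?;
    softmax(&logits, D::Minus1)
}
```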