* Start adding the recurrent-gemma model.
* More griffin.
* Add the example + get the weights to load from the HF version.
* More inference code.
* Rope + kv-cache on the attention side.
* Add to the inference code.
* Add more to the recurrent gemma inference.
* Get some first inference to run.
* Add the softcap on logits.
* Fixes.
* Use partial rotary embeddings.
* Get inference to work.
* Add a comment.
* And add a readme.
* moondream implementation
* add moondream example
* change config default activation
* Add assets and integrate phi mixformer with example
* Make use of kv cache and fix seq_len bug; Clean up example code
* Add README link to example
* Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig
* Delete image
* Use apply instead of forward
* Use latest release special token; Fix token/s accuracy; Use GeluPytorchTanh in VisionConfig v2
* Add flag to use f16
* Avoid breaking the quantized version on cuda.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
* moondream implementation
* add moondream example
* change config default activation
* Add assets and integrate phi mixformer with example
* Make use of kv cache and fix seq_len bug; Clean up example code
* Add README link to example
* Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig
* Delete image
* Use apply instead of forward
* Use latest release special token; Fix token/s accuracy; Use GeluPytorchTanh in VisionConfig v2
* moondream implementation
* add moondream example
* change config default activation
* Add assets and integrate phi mixformer with example
* Make use of kv cache and fix seq_len bug; Clean up example code
* Add README link to example
* Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig
* Delete image
* Use apply instead of forward
* Pass bos token at the beginning of tensor.
* Quantize moondream.
* Forward with image bos token.
* Clippy.
* Use q4_0 quantization.
* Add pointers for sequence and tokens; Remove seq_len conditional
* Add more cuda kernels for quantized matmul.
* Add the vec-dot bits.
* Expose the quantized matmul-vec kernels.
* Also include the quantize-q8-1 kernel.
* Glue code for the q8-1 quantization.
* mm-vec product via q8-1 quantization.
* Add a test.
* Add a mm test.
* Get the test to return some sensible results.
* Also test dmmv.
* Fix the launch params.
* Allow for tweaking the force_dmmv parameter while it's experimental.
* moondream implementation
* add moondream example
* change config default activation
* Add assets and integrate phi mixformer with example
* Make use of kv cache and fix seq_len bug; Clean up example code
* Add README link to example
* Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig
* Delete image
* Use apply instead of forward
* CLIP model implementation with example
* CLIP Implementation fixes, batch images
* CLIP model remove images from git
* CLIP model remove unnecessary use of batch_indices
* Avoid copying the data on squeeze and unsqueeze.
* Fix the quantized llama example.
* Unrelated fix for the quantized stable-lm example on cuda.
* Fix for mamba on cuda (unrelated to the PR).
* Add a --seed argument to the stable-diffusion example.
* Make the case when no seed is specified, that it will not be set, but use the engine's default. This will make the CPU engine work again when no --seed is given, and will cause a bailout when a seed is there, as the engine does not currently support it.
---------
Co-authored-by: niklas <niklas@appli.se>
* Normalize loudness of the generated audio.
* Lints.
* One more lint.
* Avoid running the bs1770 tests.
* Another attempt at discarding doc comments.
* Also normalize the loudness in the encodec example.