* add xlm-roberta-base
* Add task enum for fill-mask and reranker in xlm-roberta example; update README and fix attention mask dimensions
- Introduced a new `Task` enum to replace string task identifiers in the xlm-roberta example.
- Updated the logic in `main.rs` to handle tasks using the new enum.
- Enhanced README with example output for fill-mask task.
- Fixed dimension retrieval in `prepare_4d_attention_mask` function for better clarity and safety.
* Clippy fix.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
* Stella_en_1.5B_v5
* Separated creation. This is a critical step for numerical accuracy and will be documented in the README
* EmbedDim would require clone and copy
* WIP: example
* Examples added
* a little more in README
* start to impl chinese clip
* impl vision model
* copy code from bert
* refactor use
* refactor use again
* fix text model
* refactor
* try to fix text model
* tuning
* tuning chinese clip
* delete useless code
* revert code
* Clippy fixes.
* Also apply cargo fmt.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
* Add Pixtral.
* More pixtral vision encoder.
* Sketch a pixtral example.
* Better image loading.
* Support loading images embedded in safetensor files.
* Clippy fixes.
* Add the llava multimodal adapter.
* Add more of the llava bits.
* Add the pixtral config.
* More pixtral inference.
* Add the text generation bits.
* Get the example to work.
* Bugfix.
* Run some bits of the model in f32.
* Blessed version :)
* Better rope frequency computations.
* README update.
* Add the SigLIP model.
* Add more to the forward pass of the vision model.
* Complete the forward pass.
* Add the siglip example.
* Fix.
* Another fix.
* Get everything in place.
* Add a readme.
* Adding Granite 7b Instruct model example
* Minor refactoring to make it a little more idiomatic
* Clippy fixes.
* Adding a README with some information about supported Granite models
* Changing the default prompt to better accommodate the language modality of the Granite 7b Instruct model
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
* Add the mimi audio-tokenizer.
* Formatting tweaks.
* Add a full example.
* Use the transformers names.
* More renamings.
* Get encoding and decoding to work.
* Clippy fixes.
* Allow loading images with given std and mean
* OpenCLIP text encoder component
* Two MobileCLIP models
* Clippy fixes.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
* Start sketching parler-tts support.
* Implement the attention.
* Add the example code.
* Fix the example.
* Add the description + t5 encode it.
* More of the parler forward pass.
* Fix the positional embeddings.
* Support random sampling in generation.
* Handle EOS.
* Add the python decoder.
* Proper causality mask.
* Add the flux autoencoder.
* Add the encoder down-blocks.
* Upsampling in the decoder.
* Sketch the flow matching model.
* More flux model.
* Add some of the positional embeddings.
* Add the rope embeddings.
* Add the sampling functions.
* Add the flux example.
* Fix the T5 bits.
* Proper T5 tokenizer.
* Clip encoder path fix.
* Get the clip embeddings.
* No configurable weights in layer norm.
* More weights related fixes.
* Yet another shape fix.
* DType fix.
* Fix a couple more shape issues.
* DType fixes.
* Fix the latent dims.
* Fix more shape issues.
* Autoencoder fixes.
* Get some generations out.
* Bugfix.
* T5 padding.
* Clippy fix.
* Add the decode only mode.
* Fix.
* More fixes.
* Finally get some generations to work.
* Add readme.
* Add: DINOv2Reg4 with PlantCLEF2024 weights and example (see https://arxiv.org/abs/2309.16588 and https://zenodo.org/records/10848263)
* Remove extra files + update README to download them + remove extra lines
* minor fix (README remove extra spaces)
* minor fix (README: Fix image url)
* Change: Add back interpolate_pos_encoding() + fix when no interpolation + remove extra comments + update README (the source image changed, and so did the predictions)
* Fix: Improve code readability with '$ cargo clippy' and '$ cargo fmt'
* Another clippy fix.
---------
Co-authored-by: x-VEspit <vincent.espitalier@cirad.fr>
Co-authored-by: laurent <laurent.mazare@gmail.com>
* define structs
* construct ResidualConvUnit
* forward() for ResidualConvUnit
* implement FeatureFusionBlock
* implement Scratch
* implement DPTHead
* add identity module
* implement forward for DPTHead
* add get_intermediate_layers to DinoVisionTransformer
* implement DepthAnythingV2
* some minor tweaks
* fix compile errors
* fix var builder prefixes
* setup initial example
* use fixed patch size of 37 (518 / 14)
* debugged until output
* print min and max values
* add some dynamism to the output location
* scale input image
* extract prep function
* extract output path function
* normalize image with magic mean and std
* add spectral coloring
* squeeze in the right place
* make interpolation optional
* use bail instead of panic
* omit unnecessary Shape call
* remove empty curly braces
* use bail instead of assert
* use vb and pp
* remove closures
* extract config object
* Apply rustfmt.
* Fix some clippy lints.
* More lints.
* Use the array methods.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
* Separate quantized phi-3 implementation.
* Integrate the quantized phi3 model.
* Small fixes, get the generation to work properly.
* Keep the old llama implementation around.
* Change the default.
* Quantized phi in a separate file.
* Add the quantized phi example + rework the model code.
* Improve the phi model.
* Get some generation out.
* Use the appropriate rope shape.
* Tweak the default prompt.
---------
Co-authored-by: Jane Doe <jane.doe@example.org>
* Start adding the recurrent-gemma model.
* More griffin.
* Add the example + get the weights to load from the HF version.
* More inference code.
* Rope + kv-cache on the attention side.
* Add to the inference code.
* Add more to the recurrent gemma inference.
* Get some first inference to run.
* Add the softcap on logits.
* Fixes.
* Use partial rotary embeddings.
* Get inference to work.
* Add a comment.
* And add a readme.
* moondream implementation
* add moondream example
* change config default activation
* Add assets and integrate phi mixformer with example
* Make use of kv cache and fix seq_len bug; Clean up example code
* Add README link to example
* Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig
* Delete image
* Use apply instead of forward
* Pass bos token at the beginning of tensor.
* Quantize moondream.
* Forward with image bos token.
* Clippy.
* Use q4_0 quantization.
* Add pointers for sequence and tokens; Remove seq_len conditional
* CLIP model implementation with example
* CLIP Implementation fixes, batch images
* CLIP model remove images from git
* CLIP model remove unnecessary use of batch_indices
* Add the metavoice transformer.
* Sketch the speaker-encoder module.
* Adding to the metavoice model.
* Start adding the metavoice example.
* Get some logits out.
* Load the second stage model.
* Get the second step to run.
* Tweak the example.
* Add encodec tilting.
* Glue the different bits together.
* Fix a shape issue.
* Use a constant.
* BPE tokenization.
* Add a warning.
* Encodec model.
* Fixes.
* Add the padding functions.
* Get the LSTM bit to work.
* Get the encodec model to generate some tokens (decoder only for now).
* Minor tweak.