Add the SigLIP model. (#2515)

* Add the SigLIP model. * Add more to the forward pass of the vision model. * Complete the forward pass. * Add the siglip example. * Fix. * Another fix. * Get everything in place. * Add a readme.
2025-06-21 04:10:46 +00:00 · 2024-09-28 23:48:00 +02:00
parent 62525e8352
commit 261ed65f36
8 changed files with 797 additions and 54 deletions
--- a/candle-examples/examples/siglip/README.md
+++ b/candle-examples/examples/siglip/README.md
@ -0,0 +1,24 @@
+## SigLIP
+
+SigLIP is multi-modal text-vision model that improves over CLIP by using a sigmoid based loss,
+[HuggingFace](https://huggingface.co/google/siglip-base-patch16-224).
+
+### Running an example
+```
+$ cargo run --features cuda -r --example siglip -
+softmax_image_vec: [2.1912122e-14, 2.3624872e-14, 1.0, 1.0, 2.4787932e-8, 3.2784535e-12]
+
+
+Results for image: candle-examples/examples/stable-diffusion/assets/stable-diffusion-xl.jpg
+
+Probability: 0.0000% Text: a cycling race 
+Probability: 0.0000% Text: a photo of two cats 
+Probability: 100.0000% Text: a robot holding a candle 
+
+
+Results for image: candle-examples/examples/yolo-v8/assets/bike.jpg
+
+Probability: 100.0000% Text: a cycling race 
+Probability: 0.0000% Text: a photo of two cats 
+Probability: 0.0000% Text: a robot holding a candle 
+```