Use the hub model file when possible. (#1190)

* Use the hub model file when possible.
* And add a mention in the main readme.
2025-06-15 10:26:33 +00:00 · 2023-10-26 20:00:50 +01:00
parent c8e197f68c
commit 0ec5ebcec4
3 changed files with 71 additions and 7 deletions
--- a/README.md
+++ b/README.md
@@ -56,7 +56,7 @@ These online demos run entirely in your browser:
 - [T5](https://huggingface.co/spaces/radames/Candle-T5-Generation-Wasm): text generation.
 - [Phi-v1.5](https://huggingface.co/spaces/radames/Candle-Phi-1.5-Wasm): text generation.
 - [Segment Anything Model](https://huggingface.co/spaces/radames/candle-segment-anything-wasm): Image segmentation.
-- [Blip](https://huggingface.co/spaces/radames/Candle-BLIP-Image-Captioning): image captioning.
+- [BLIP](https://huggingface.co/spaces/radames/Candle-BLIP-Image-Captioning): image captioning.

 We also provide a some command line based examples using state of the art models:

@@ -96,7 +96,8 @@ We also provide a some command line based examples using state of the art models
 <img src="https://github.com/huggingface/candle/raw/main/candle-examples/examples/segment-anything/assets/sam_merged.jpg" width="200">

 - [Whisper](./candle-examples/examples/whisper/): speech recognition model.
-- [T5](./candle-examples/examples/t5), [Bert](./candle-examples/examples/bert/): useful for sentence embeddings.
+- [T5](./candle-examples/examples/t5), [Bert](./candle-examples/examples/bert/),
+  [JinaBert](./candle-examples/examples/jina-bert/) : useful for sentence embeddings.
 - [DINOv2](./candle-examples/examples/dinov2/): computer vision model trained
  using self-supervision (can be used for imagenet classification, depth
  evaluation, segmentation).
--- a/candle-examples/examples/jina-bert/README.md
+++ b/candle-examples/examples/jina-bert/README.md
@@ -0,0 +1,45 @@
+# candle-jina-bert
+
+Jina-Bert is a general large language model with a context size of 8192, [model
+card](https://huggingface.co/jinaai/jina-embeddings-v2-base-en). In this example
+it can be used for two different tasks:
+- Compute sentence embeddings for a prompt.
+- Compute similarities between a set of sentences.
+
+
+## Sentence embeddings
+
+Jina-Bert is used to compute the sentence embeddings for a prompt. The model weights
+are downloaded from the hub on the first run.
+
+```bash
+cargo run --example jina-bert --release -- --prompt "Here is a test sentence"
+
+> [[[ 0.1595, -0.9885,  0.6494, ...,  0.3003, -0.6901, -1.2355],
+>   [ 0.0374, -0.1798,  1.3359, ...,  0.6731,  0.2133, -1.6807],
+>   [ 0.1700, -0.8534,  0.8924, ..., -0.1785, -0.0727, -1.5087],
+>   ...
+>   [-0.3113, -1.3665,  0.2027, ..., -0.2519,  0.1711, -1.5811],
+>   [ 0.0907, -1.0492,  0.5382, ...,  0.0242, -0.7077, -1.0830],
+>   [ 0.0369, -0.6343,  0.6105, ...,  0.0671,  0.3778, -1.1505]]]
+> Tensor[[1, 7, 768], f32]
+```
+
+## Similarities
+
+In this example, Jina-Bert is used to compute the sentence embeddings for a set of
+sentences (hardcoded in the examples). Then cosine similarities are computed for
+each sentence pair and they are reported by decreasing values, hence the first
+reported pair contains the two sentences that have the highest similarity score.
+The sentence embeddings are computed using average pooling through all the
+sentence tokens, including some potential padding.
+
+```bash
+cargo run --example jina-bert --release
+
+> score: 0.94 'The new movie is awesome' 'The new movie is so great'
+> score: 0.81 'The cat sits outside' 'The cat plays in the garden'
+> score: 0.78 'I love pasta' 'Do you like pizza?'
+> score: 0.68 'I love pasta' 'The new movie is awesome'
+> score: 0.67 'A man is playing guitar' 'A woman watches TV'
+```
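
Note on the "Similarities" section of the new README: the pooling and ranking it describes can be sketched in plain Rust as below. This is an illustrative sketch only, not the code used by the example, which operates on candle tensors. The helper names `mean_pool`, `cosine_similarity`, and `ranked_pairs` are hypothetical, and token embeddings are assumed to be given as one `Vec<f32>` per token.

```rust
/// Average the per-token vectors into a single sentence vector (mean pooling).
/// Every row is counted, so padding rows, if present, contribute to the mean.
fn mean_pool(token_embeddings: &[Vec<f32>]) -> Vec<f32> {
    let n_tokens = token_embeddings.len();
    if n_tokens == 0 {
        return Vec::new();
    }
    let dim = token_embeddings[0].len();
    let mut pooled = vec![0.0f32; dim];
    for token in token_embeddings {
        for (acc, value) in pooled.iter_mut().zip(token) {
            *acc += value;
        }
    }
    for acc in pooled.iter_mut() {
        *acc /= n_tokens as f32;
    }
    pooled
}

/// Cosine similarity: dot product divided by the product of the L2 norms.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

/// Score every unordered pair of sentence vectors and sort by decreasing score.
/// Each entry is (score, index of first sentence, index of second sentence).
fn ranked_pairs(embeddings: &[Vec<f32>]) -> Vec<(f32, usize, usize)> {
    let mut pairs = Vec::new();
    for i in 0..embeddings.len() {
        for j in (i + 1)..embeddings.len() {
            pairs.push((cosine_similarity(&embeddings[i], &embeddings[j]), i, j));
        }
    }
    pairs.sort_by(|a, b| b.0.total_cmp(&a.0));
    pairs
}
```

Because padding rows are averaged in, two sentences padded to different lengths may score slightly differently than they would with a masked mean.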
--- a/candle-examples/examples/jina-bert/main.rs
+++ b/candle-examples/examples/jina-bert/main.rs
@@ -35,19 +35,37 @@ struct Args {
    normalize_embeddings: bool,

    #[arg(long)]
-    tokenizer: String,
+    tokenizer: Option<String>,

    #[arg(long)]
-    model: String,
+    model: Option<String>,
 }

 impl Args {
    fn build_model_and_tokenizer(&self) -> anyhow::Result<(BertModel, tokenizers::Tokenizer)> {
+        use hf_hub::{api::sync::Api, Repo, RepoType};
+        let model = match &self.model {
+            Some(model_file) => std::path::PathBuf::from(model_file),
+            None => Api::new()?
+                .repo(Repo::new(
+                    "jinaai/jina-embeddings-v2-base-en".to_string(),
+                    RepoType::Model,
+                ))
+                .get("model.safetensors")?,
+        };
+        let tokenizer = match &self.tokenizer {
+            Some(file) => std::path::PathBuf::from(file),
+            None => Api::new()?
+                .repo(Repo::new(
+                    "sentence-transformers/all-MiniLM-L6-v2".to_string(),
+                    RepoType::Model,
+                ))
+                .get("tokenizer.json")?,
+        };
        let device = candle_examples::device(self.cpu)?;
        let config = Config::v2_base();
-        let tokenizer = tokenizers::Tokenizer::from_file(&self.tokenizer).map_err(E::msg)?;
-        let vb =
-            unsafe { VarBuilder::from_mmaped_safetensors(&[&self.model], DType::F32, &device)? };
+        let tokenizer = tokenizers::Tokenizer::from_file(tokenizer).map_err(E::msg)?;
+        let vb = unsafe { VarBuilder::from_mmaped_safetensors(&[model], DType::F32, &device)? };
        let model = BertModel::new(vb, &config)?;
        Ok((model, tokenizer))
    }
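
Note on the hunk above: both `match` arms follow the same pattern, which is to use the path given on the command line if there is one and otherwise fetch the file from the hub. A minimal sketch of that pattern, using only the standard library, might look like the following. The helper `resolve_file` is hypothetical and not part of this commit; the `fetch` closure stands in for the `Api::new()?.repo(...).get(...)` call chain.

```rust
use std::path::PathBuf;

/// Return the user-supplied path when present; otherwise run `fetch`.
/// `fetch` is only invoked in the `None` case, so no download is attempted
/// when a local file was passed explicitly.
fn resolve_file<F, E>(user_path: Option<&str>, fetch: F) -> Result<PathBuf, E>
where
    F: FnOnce() -> Result<PathBuf, E>,
{
    match user_path {
        Some(path) => Ok(PathBuf::from(path)),
        None => fetch(),
    }
}
```

With a helper like this, the model and tokenizer lookups would differ only in the repository and file name passed to the closure. Whether that refactor is worthwhile for two call sites is a judgment call.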