Use the hub model file when possible. (#1190)

* Use the hub model file when possible.
* And add a mention in the main readme.
2025-06-15 02:16:37 +00:00 · 2023-10-26 20:00:50 +01:00
parent c8e197f68c
commit 0ec5ebcec4
3 changed files with 71 additions and 7 deletions
--- a/candle-examples/examples/jina-bert/README.md
+++ b/candle-examples/examples/jina-bert/README.md
@@ -0,0 +1,45 @@
+# candle-jina-bert
+
+Jina-Bert is a general large language model with a context size of 8192 tokens ([model
+card](https://huggingface.co/jinaai/jina-embeddings-v2-base-en)). In this example
+it can be used for two different tasks:
+- Compute sentence embeddings for a prompt.
+- Compute similarities between a set of sentences.
+
+
+## Sentence embeddings
+
+Jina-Bert is used to compute the sentence embeddings for a prompt. The model weights
+are downloaded from the hub on the first run.
+
+```bash
+cargo run --example jina-bert --release -- --prompt "Here is a test sentence"
+
+> [[[ 0.1595, -0.9885,  0.6494, ...,  0.3003, -0.6901, -1.2355],
+>   [ 0.0374, -0.1798,  1.3359, ...,  0.6731,  0.2133, -1.6807],
+>   [ 0.1700, -0.8534,  0.8924, ..., -0.1785, -0.0727, -1.5087],
+>   ...
+>   [-0.3113, -1.3665,  0.2027, ..., -0.2519,  0.1711, -1.5811],
+>   [ 0.0907, -1.0492,  0.5382, ...,  0.0242, -0.7077, -1.0830],
+>   [ 0.0369, -0.6343,  0.6105, ...,  0.0671,  0.3778, -1.1505]]]
+> Tensor[[1, 7, 768], f32]
+```
+
+## Similarities
+
+In this example, Jina-Bert is used to compute the sentence embeddings for a set of
+sentences (hardcoded in the example). Cosine similarities are then computed for
+each sentence pair and reported in decreasing order, so the first reported pair
+contains the two sentences with the highest similarity score.
+The sentence embeddings are computed using average pooling over all the
+sentence tokens, including any padding tokens.
+
+```bash
+cargo run --example jina-bert --release
+
+> score: 0.94 'The new movie is awesome' 'The new movie is so great'
+> score: 0.81 'The cat sits outside' 'The cat plays in the garden'
+> score: 0.78 'I love pasta' 'Do you like pizza?'
+> score: 0.68 'I love pasta' 'The new movie is awesome'
+> score: 0.67 'A man is playing guitar' 'A woman watches TV'
+```
--- a/candle-examples/examples/jina-bert/main.rs
+++ b/candle-examples/examples/jina-bert/main.rs
@@ -35,19 +35,37 @@ struct Args {
    normalize_embeddings: bool,

    #[arg(long)]
-    tokenizer: String,
+    tokenizer: Option<String>,

    #[arg(long)]
-    model: String,
+    model: Option<String>,
 }

 impl Args {
    fn build_model_and_tokenizer(&self) -> anyhow::Result<(BertModel, tokenizers::Tokenizer)> {
+        use hf_hub::{api::sync::Api, Repo, RepoType};
+        let model = match &self.model {
+            Some(model_file) => std::path::PathBuf::from(model_file),
+            None => Api::new()?
+                .repo(Repo::new(
+                    "jinaai/jina-embeddings-v2-base-en".to_string(),
+                    RepoType::Model,
+                ))
+                .get("model.safetensors")?,
+        };
+        let tokenizer = match &self.tokenizer {
+            Some(file) => std::path::PathBuf::from(file),
+            None => Api::new()?
+                .repo(Repo::new(
+                    "sentence-transformers/all-MiniLM-L6-v2".to_string(),
+                    RepoType::Model,
+                ))
+                .get("tokenizer.json")?,
+        };
        let device = candle_examples::device(self.cpu)?;
        let config = Config::v2_base();
-        let tokenizer = tokenizers::Tokenizer::from_file(&self.tokenizer).map_err(E::msg)?;
-        let vb =
-            unsafe { VarBuilder::from_mmaped_safetensors(&[&self.model], DType::F32, &device)? };
+        let tokenizer = tokenizers::Tokenizer::from_file(tokenizer).map_err(E::msg)?;
+        let vb = unsafe { VarBuilder::from_mmaped_safetensors(&[model], DType::F32, &device)? };
        let model = BertModel::new(vb, &config)?;
        Ok((model, tokenizer))
    }
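
Both `match` blocks in this change follow the same shape: use the path passed on the command line if there is one, otherwise fetch the file from the hub. A minimal sketch of that pattern as a helper is below, using only the standard library. The name `resolve_file` and the `fake_hub` closure are hypothetical and not part of this commit; the closure stands in for the `Api::new()?.repo(..).get(..)` call so the sketch runs without network access.

```rust
use std::path::PathBuf;

/// Returns the explicit path when one was given; otherwise calls `fetch`
/// to obtain a path (for instance a file downloaded into a local cache).
/// `fetch` is only invoked when `explicit` is `None`.
fn resolve_file<E, F>(explicit: Option<&str>, fetch: F) -> Result<PathBuf, E>
where
    F: FnOnce() -> Result<PathBuf, E>,
{
    match explicit {
        Some(path) => Ok(PathBuf::from(path)),
        None => fetch(),
    }
}

fn main() {
    // Hypothetical stand-in for a hub download; it only returns a fixed path.
    let fake_hub = || Ok::<PathBuf, std::io::Error>(PathBuf::from("hub/model.safetensors"));

    // A path given on the command line takes precedence over the hub.
    let from_flag = resolve_file(Some("local.safetensors"), fake_hub).unwrap();
    // With no flag, the fallback closure supplies the path.
    let from_hub = resolve_file(None, fake_hub).unwrap();

    println!("{}", from_flag.display());
    println!("{}", from_hub.display());
}
```

Taking the fallback as a closure keeps the download lazy, which matches the behaviour of the `match` arms in the diff: nothing is fetched when `--model` or `--tokenizer` is supplied.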