From e82fcf1c594b54c105f1a3979a09f3d2e044a2e0 Mon Sep 17 00:00:00 2001
From: Laurent Mazare
Date: Tue, 12 Sep 2023 18:21:24 +0200
Subject: [PATCH] Add more example readmes. (#828)

* Add more readmes.

* Add a readme for dinov2.

* Add some skeleton files for a couple more examples.

* More whisper details.
---
 candle-examples/examples/bert/README.md      | 44 ++++++++++++++++++++
 candle-examples/examples/bigcode/README.md   |  7 ++++
 candle-examples/examples/dinov2/README.md    | 19 +++++++++
 candle-examples/examples/falcon/README.md    |  3 ++
 candle-examples/examples/quantized/README.md |  2 +-
 candle-examples/examples/whisper/README.md   | 39 +++++++++++++++++
 6 files changed, 113 insertions(+), 1 deletion(-)
 create mode 100644 candle-examples/examples/bert/README.md
 create mode 100644 candle-examples/examples/bigcode/README.md
 create mode 100644 candle-examples/examples/dinov2/README.md
 create mode 100644 candle-examples/examples/falcon/README.md
 create mode 100644 candle-examples/examples/whisper/README.md

diff --git a/candle-examples/examples/bert/README.md b/candle-examples/examples/bert/README.md
new file mode 100644
index 00000000..82ca5f40
--- /dev/null
+++ b/candle-examples/examples/bert/README.md
@@ -0,0 +1,44 @@
+# candle-bert
+
+Bert is a general large language model. In this example, it can be used for two
+different tasks:
+- Compute sentence embeddings for a prompt.
+- Compute similarities between a set of sentences.
+
+
+## Sentence embeddings
+
+Bert is used to compute the sentence embeddings for a prompt. The model weights
+are downloaded from the hub on the first run.
+
+```bash
+cargo run --example bert --release -- --prompt "Here is a test sentence"
+
+> [[[ 0.0798, -0.0665, -0.0247, ..., -0.1082, -0.1000, -0.2751],
+>   [ 0.4218, 0.2690, 0.2740, ..., 0.3889, 1.3503, 0.9908],
+>   [ 0.0466, 0.3041, -0.1143, ..., 0.4427, 0.6926, -0.1515],
+>   ...
+>   [ 0.3396, 0.4320, -0.4408, ..., 0.9212, 0.2331, -0.6777],
+>   [ 0.2789, 0.7539, 0.4306, ..., -0.0095, 0.3375, -1.7529],
+>   [ 0.6737, 0.7882, 0.0548, ..., 0.1836, 0.7299, -0.6617]]]
+> Tensor[[1, 7, 384], f32]
+```
+
+## Similarities
+
+In this example, Bert is used to compute the sentence embeddings for a set of
+sentences (hardcoded in the example). Cosine similarities are then computed for
+each sentence pair and reported in decreasing order, so the first reported pair
+contains the two sentences with the highest similarity score. The sentence
+embeddings are computed by average pooling over all the sentence tokens,
+including any potential padding.
+
+```bash
+cargo run --example bert --release
+
+> score: 0.85 'The new movie is awesome' 'The new movie is so great'
+> score: 0.61 'The cat sits outside' 'The cat plays in the garden'
+> score: 0.52 'I love pasta' 'Do you like pizza?'
+> score: 0.23 'The new movie is awesome' 'Do you like pizza?'
+> score: 0.22 'I love pasta' 'The new movie is awesome'
+```
diff --git a/candle-examples/examples/bigcode/README.md b/candle-examples/examples/bigcode/README.md
new file mode 100644
index 00000000..0b593674
--- /dev/null
+++ b/candle-examples/examples/bigcode/README.md
@@ -0,0 +1,7 @@
+# candle-starcoder: code generation model
+
+StarCoder/BigCode is an LLM specialized in code generation.
+
+```bash
+cargo run --example bigcode --release -- --prompt "fn fact(n: u64) -> u64 "
+```
diff --git a/candle-examples/examples/dinov2/README.md b/candle-examples/examples/dinov2/README.md
new file mode 100644
index 00000000..10d4ac1f
--- /dev/null
+++ b/candle-examples/examples/dinov2/README.md
@@ -0,0 +1,19 @@
+# candle-dinov2
+
+[DINOv2](https://github.com/facebookresearch/dinov2) is a computer vision model.
+In this example, it is used as an ImageNet classifier: the model returns the
+probability that the image belongs to each of the 1000 ImageNet categories.
+
+## Running an example
+
+```bash
+cargo run --example dinov2 --release -- --image candle-examples/examples/yolo-v8/assets/bike.jpg
+
+> mountain bike, all-terrain bike, off-roader: 43.67%
+> bicycle-built-for-two, tandem bicycle, tandem: 33.20%
+> crash helmet : 13.23%
+> unicycle, monocycle : 2.44%
+> maillot : 2.42%
+```
+
+![Leading group, Giro d'Italia 2021](../yolo-v8/assets/bike.jpg)
diff --git a/candle-examples/examples/falcon/README.md b/candle-examples/examples/falcon/README.md
new file mode 100644
index 00000000..267c78c2
--- /dev/null
+++ b/candle-examples/examples/falcon/README.md
@@ -0,0 +1,3 @@
+# candle-falcon
+
+Falcon is a general large language model.
diff --git a/candle-examples/examples/quantized/README.md b/candle-examples/examples/quantized/README.md
index f3159493..ee4f3420 100644
--- a/candle-examples/examples/quantized/README.md
+++ b/candle-examples/examples/quantized/README.md
@@ -24,7 +24,7 @@ cargo run --example quantized --release -- --prompt "The best thing about coding
 > The best thing about coding in rust is 1.) that I don’t need to worry about memory leaks, 2.) speed and 3.) my program will compile even on old machines.
 ```
 
-### Command-line flags
+## Command-line flags
 
 Run with `--help` to see all options.
 
diff --git a/candle-examples/examples/whisper/README.md b/candle-examples/examples/whisper/README.md
new file mode 100644
index 00000000..124cd182
--- /dev/null
+++ b/candle-examples/examples/whisper/README.md
@@ -0,0 +1,39 @@
+# candle-whisper: speech recognition
+
+An implementation of [OpenAI Whisper](https://github.com/openai/whisper) using
+candle. Whisper is a general purpose speech recognition model; it can be used to
+convert audio files (in the `.wav` format) to text. Supported features include
+language detection as well as multilingual speech recognition.
+
+## Running an example
+
+If no audio file is passed as input, a [sample
+file](https://huggingface.co/datasets/Narsil/candle-examples/resolve/main/samples_jfk.wav) is automatically downloaded
+from the hub.
+
+```bash
+cargo run --example whisper --release
+
+> No audio file submitted: Downloading https://huggingface.co/datasets/Narsil/candle_demo/blob/main/samples_jfk.wav
+> loaded wav data: Header { audio_format: 1, channel_count: 1, sampling_rate: 16000, bytes_per_second: 32000, bytes_per_sample: 2, bits_per_sample: 16 }
+> pcm data loaded 176000
+> loaded mel: [1, 80, 3000]
+> 0.0s -- 30.0s: And so my fellow Americans ask not what your country can do for you ask what you can do for your country
+```
+
+In order to use the multilingual mode, specify a multilingual model via the
+`--model` flag; see the flag details and the example below.
+
+## Command-line flags
+
+- `--input`: the audio file to be converted to text, in `.wav` format.
+- `--language`: force the language to some specific value rather than being
+  detected, e.g. `en`.
+- `--task`: the task to be performed, either `transcribe` (return the text in
+  the original language) or `translate` (translate the text to English).
+- `--timestamps`: enable the timestamp mode, where timestamps are reported for
+  each recognized audio extract.
+- `--model`: the model to be used. Models that do not end with `-en` are
+  multilingual models; the other ones are English-only models. The supported
+  models are `tiny`, `tiny.en`, `base`, `base.en`, `small`, `small.en`,
+  `medium`, `medium.en`, `large`, and `large-v2`.
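+
+The snippet below is only a sketch of how the flags above can be combined for
+multilingual use: the audio path is a placeholder, and the `fr` value assumes
+the usual two-letter Whisper language codes.
+
+```bash
+# Force French transcription with a multilingual model
+# (replace /path/to/recording.wav with a real audio file).
+cargo run --example whisper --release -- --model medium --language fr --input /path/to/recording.wav
+
+# Translate the recognized speech to English and report timestamps.
+cargo run --example whisper --release -- --model medium --task translate --timestamps --input /path/to/recording.wav
+```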