Add more example readmes. (#828)
* Add more readmes.
* Add a readme for dinov2.
* Add some skeleton files for a couple more examples.
* More whisper details.
candle-examples/examples/whisper/README.md (new file, +39 lines)
# candle-whisper: speech recognition

An implementation of [OpenAI Whisper](https://github.com/openai/whisper) using
candle. Whisper is a general-purpose speech recognition model that can be used to
convert audio files (in the `.wav` format) to text. Supported features include
language detection as well as multilingual speech recognition.

## Running an example

If no audio file is passed as input, a [sample
file](https://huggingface.co/datasets/Narsil/candle-examples/resolve/main/samples_jfk.wav) is automatically downloaded
from the hub.

```bash
cargo run --example whisper --release

> No audio file submitted: Downloading https://huggingface.co/datasets/Narsil/candle_demo/blob/main/samples_jfk.wav
> loaded wav data: Header { audio_format: 1, channel_count: 1, sampling_rate: 16000, bytes_per_second: 32000, bytes_per_sample: 2, bits_per_sample: 16 }
> pcm data loaded 176000
> loaded mel: [1, 80, 3000]
> 0.0s -- 30.0s: And so my fellow Americans ask not what your country can do for you ask what you can do for your country
```

In order to use the multilingual mode, specify a multilingual model via the
`--model` flag; see the details below.
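
For instance, to transcribe French audio with a multilingual model, a run might
look like the following sketch (the model size and language code here are
illustrative choices, not requirements):

```bash
cargo run --example whisper --release -- --model medium --language fr
```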

## Command line flags

- `--input`: the audio file to be converted to text, in wav format.
- `--language`: force the language to a specific value rather than having it
  detected automatically, e.g. `en`.
- `--task`: the task to be performed, can be `transcribe` (return the text data
  in the original language) or `translate` (translate the text to English).
- `--timestamps`: enable the timestamp mode, where timestamps are reported for
  each recognized audio extract.
- `--model`: the model to be used. Models that do not end with `.en` are
  multilingual models; the others are English-only models. The supported models
  are `tiny`, `tiny.en`, `base`, `base.en`, `small`, `small.en`, `medium`,
  `medium.en`, `large`, and `large-v2`.
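
Putting a few of these flags together, a hypothetical invocation could look like
the sketch below (`my_recording.wav` is a placeholder for a local file, and the
model choice is arbitrary):

```bash
# Transcribe a local wav file with the English-only tiny model and print timestamps.
# my_recording.wav is a placeholder; substitute any local wav file.
cargo run --example whisper --release -- \
  --input my_recording.wav --model tiny.en --timestamps
```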