
candle-whisper: speech recognition
An implementation of OpenAI Whisper using candle. Whisper is a general-purpose speech recognition model that converts audio files (in the .wav format) to text. Supported features include language detection as well as multilingual speech recognition.
Running an example
If no audio file is passed as input, a sample file is automatically downloaded from the hub.
cargo run --example whisper --release
> No audio file submitted: Downloading https://huggingface.co/datasets/Narsil/candle_demo/blob/main/samples_jfk.wav
> loaded wav data: Header { audio_format: 1, channel_count: 1, sampling_rate: 16000, bytes_per_second: 32000, bytes_per_sample: 2, bits_per_sample: 16 }
> pcm data loaded 176000
> loaded mel: [1, 80, 3000]
> 0.0s -- 30.0s: And so my fellow Americans ask not what your country can do for you ask what you can do for your country
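The numbers in this log can be cross-checked with a little arithmetic. The sketch below assumes that the "pcm data loaded" count is in samples and uses the usual Whisper preprocessing settings (16 kHz audio, a hop of 160 samples between mel frames, 30-second windows). The constant and function names are illustrative and are not taken from candle.

```rust
// Illustrative constants (hypothetical names); the values are the usual
// Whisper preprocessing settings, assumed here rather than read from candle.
const SAMPLE_RATE: usize = 16_000; // samples per second, as in the wav header
const HOP_LENGTH: usize = 160; // samples between consecutive mel frames
const CHUNK_SECONDS: usize = 30; // length of one decoding window

/// Duration of a mono PCM buffer in seconds.
fn duration_seconds(n_samples: usize) -> f64 {
    n_samples as f64 / SAMPLE_RATE as f64
}

/// Number of mel frames in one 30-second window.
fn frames_per_chunk() -> usize {
    CHUNK_SECONDS * SAMPLE_RATE / HOP_LENGTH
}

fn main() {
    // 176000 samples at 16 kHz is an 11-second clip.
    println!("duration: {:.1}s", duration_seconds(176_000));
    // One window holds 3000 frames, matching the last dimension of [1, 80, 3000].
    println!("frames per chunk: {}", frames_per_chunk());
}
```

Under these assumptions the sample clip is about 11 seconds long, and the mel tensor covers a full 30-second window, which would explain why the transcript segment is reported as 0.0s -- 30.0s.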
To use the multilingual mode, specify a multilingual model via the --model flag; see the details below.
Command line flags
--input: the audio file to be converted to text, in wav format.
--language: force the language to a specific value rather than detecting it, e.g. en.
--task: the task to be performed, either transcribe (return the text in the original language) or translate (translate the text to English).
--timestamps: enable the timestamp mode, in which timestamps are reported for each recognized audio extract.
--model: the model to be used. Models whose names do not end with .en are multilingual; the others are English only. The supported models are tiny, tiny.en, base, base.en, small, small.en, medium, medium.en, large, and large-v2.
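The naming rule for models can be sketched as a small check on the `.en` suffix seen in the model list. The helper `is_multilingual` is hypothetical and is not part of candle or of the example's command line parsing; it only illustrates the convention described above.

```rust
/// Hypothetical helper (not part of candle): a model is treated as
/// multilingual unless its name ends with the ".en" suffix.
fn is_multilingual(model: &str) -> bool {
    !model.ends_with(".en")
}

fn main() {
    // The model names listed for the --model flag.
    let models = [
        "tiny", "tiny.en", "base", "base.en", "small", "small.en",
        "medium", "medium.en", "large", "large-v2",
    ];
    for model in models {
        let kind = if is_multilingual(model) {
            "multilingual"
        } else {
            "English only"
        };
        println!("{model}: {kind}");
    }
}
```

A check like this could be used to reject a --language value other than en when an English-only model is selected, though whether the example does so is not stated here.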