Metavoice - first cut (#1717)

* Add the metavoice transformer.
* Sketch the speaker-encoder module.
* Adding to the metavoice model.
* Start adding the metavoice example.
* Get some logits out.
* Load the second stage model.
* Get the second step to run.
* Tweak the example.
* Add encodec tilting.
* Glue the different bits together.
* Fix a shape issue.
* Use a constant.
* BPE tokenization.
* Add a warning.
2025-06-20 12:06:35 +00:00 · 2024-03-02 18:50:01 +01:00
parent 314630638d
commit 4fff5b51f5
6 changed files with 1117 additions and 0 deletions
--- a/candle-examples/examples/metavoice/README.md
+++ b/candle-examples/examples/metavoice/README.md
@@ -0,0 +1,18 @@
+# candle-metavoice
+
+MetaVoice-1B is a text-to-speech model trained on 100K hours of speech. More
+details are available on the [model
+card](https://huggingface.co/metavoiceio/metavoice-1B-v0.1).
+
+Note that the current candle implementation suffers from some limitations as of
+2024-03-02:
+- The speaker embeddings are hardcoded.
+- The quality of the generated audio is lower than that of the Python
+  implementation, probably because of some implementation discrepancies.
+
+## Run an example
+
+```bash
+cargo run --example metavoice --release -- \
+  --prompt "This is a demo of text to speech by MetaVoice-1B, an open-source foundational audio model."
+```