# candle-metavoice

MetaVoice-1B is a text-to-speech model trained on 100K hours of speech; more
details are available on the [model
card](https://huggingface.co/metavoiceio/metavoice-1B-v0.1).

Note that the current candle implementation suffers from some limitations as of
2024-03-02:

- The speaker embeddings are hardcoded.
- The generated audio file quality is weaker than the Python implementation,
  probably because of some implementation discrepancies.

## Run an example

```bash
cargo run --example metavoice --release -- \
  --prompt "This is a demo of text to speech by MetaVoice-1B, an open-source foundational audio model."
```