mirror of https://github.com/huggingface/candle.git synced 2025-06-15 10:26:33 +00:00

Laurent Mazare 4fff5b51f5 Metavoice - first cut (#1717)

* Add the metavoice transformer.

* Sketch the speaker-encoder module.

* Adding to the metavoice model.

* Start adding the metavoice example.

* Get some logits out.

* Load the second stage model.

* Get the second step to run.

* Tweak the example.

* Add encodec tilting.

* Glue the different bits together.

* Fix a shape issue.

* Use a constant.

* BPE tokenization.

* Add a warning.

2024-03-02 18:50:01 +01:00

candle-metavoice

MetaVoice-1B is a text-to-speech model trained on 100K hours of speech; more details are available on the model card.

Note that the current candle implementation suffers from some limitations as of 2024-03-02:

The speaker embeddings are hardcoded.
The quality of the generated audio is lower than that of the Python implementation, probably because of implementation discrepancies.

Run an example

cargo run --example metavoice --release -- \
  --prompt "This is a demo of text to speech by MetaVoice-1B, an open-source foundational audio model."