Laurent Mazare cf9d7bf24c Add the CSM model. (#2862)
* Add the CSM model.

* Add some code to load the model.

* Load the text tokenizer.

* Add frame generation.

* Get the sampling to work.

* Rope fix.

* Autoregressive generation.

* Generate some audio file.

* Use the actual prompt.

* Support multiple turns.

* Add a very barebone readme.

* Move some of the shared bits to the model.
2025-04-04 06:48:03 +02:00

Conversational Speech Model (CSM)

CSM is a speech generation model from Sesame, SesameAILabs/csm.

It can generate conversational speech between two different speakers. The speakers' turns are delimited by the | character in the prompt.

cargo run --example csm --features cuda -r -- \
    --voices voices.safetensors  \
    --prompt "Hey how are you doing?|Pretty good, pretty good. How about you?"
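
To illustrate the prompt format, here is a minimal sketch of how a prompt could be split into per-speaker turns. The function name `split_turns` is hypothetical, and the rule that speakers alternate (index 0, then 1, then 0 again) starting from the first turn is an assumption made for this illustration; the example's actual code may assign speakers differently.

```rust
/// Split a prompt into (speaker index, text) turns on the `|` delimiter.
///
/// Assumption: the two speakers simply alternate, so even-numbered turns
/// belong to speaker 0 and odd-numbered turns to speaker 1.
fn split_turns(prompt: &str) -> Vec<(usize, &str)> {
    prompt
        .split('|')
        .enumerate()
        // Trim surrounding whitespace so "a | b" and "a|b" behave the same.
        .map(|(turn_idx, text)| (turn_idx % 2, text.trim()))
        .collect()
}
```

Under that assumption, the prompt from the command above yields two turns: speaker 0 says "Hey how are you doing?" and speaker 1 says "Pretty good, pretty good. How about you?". A third `|`-separated segment would go back to speaker 0.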