Add the CSM model. (#2862)

* Add the CSM model.

* Add some code to load the model.

* Load the text tokenizer.

* Add frame generation.

* Get the sampling to work.

* Rope fix.

* Autoregressive generation.

* Generate some audio file.

* Use the actual prompt.

* Support multiple turns.

* Add a very barebone readme.

* Move some of the shared bits to the model.
This commit is contained in:
Laurent Mazare
2025-04-04 06:48:03 +02:00
committed by GitHub
parent 9d31361c4f
commit cf9d7bf24c
4 changed files with 791 additions and 0 deletions

@@ -0,0 +1,14 @@
# Conversational Speech Model (CSM)
CSM is a speech generation model from Sesame,
[SesameAILabs/csm](https://github.com/SesameAILabs/csm).
It can generate conversational speech between two different speakers.
The speakers' turns are delimited by the `|` character in the prompt.
```bash
cargo run --example csm --features cuda -r -- \
--voices voices.safetensors \
--prompt "Hey how are you doing?|Pretty good, pretty good. How about you?"
```
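
The `|` delimiter described above can be illustrated with a small sketch of how a prompt might be broken into speaker turns. This is not the example's actual code: the function name `split_turns` is hypothetical, and assigning speakers by alternating between indices 0 and 1 is an assumption based only on the statement that the model handles two speakers.

```rust
/// Split a prompt into `(speaker, text)` turns on the `|` delimiter.
///
/// Hypothetical helper, not taken from the csm example itself.
/// Assumption: the two speakers simply alternate (0, 1, 0, ...),
/// starting with speaker 0 for the first turn.
fn split_turns(prompt: &str) -> Vec<(usize, String)> {
    prompt
        .split('|')
        // Drop surrounding whitespace around each turn.
        .map(str::trim)
        // Skip empty segments (e.g. from a trailing `|`); skipped
        // segments do not advance the speaker index.
        .filter(|turn| !turn.is_empty())
        .enumerate()
        // Alternate between the two speakers.
        .map(|(idx, turn)| (idx % 2, turn.to_string()))
        .collect()
}
```

With the prompt from the command above, this sketch would yield one turn for speaker 0 and one for speaker 1, in that order.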