Add the CSM model. (#2862)

* Add the CSM model. * Add some code to load the model. * Load the text tokenizer. * Add frame generation. * Get the sampling to work. * Rope fix. * Autoregressive generation. * Generate some audio file. * Use the actual prompt. * Support multiple turns. * Add a very barebone readme. * Move some of the shared bits to the model.
2025-06-20 20:09:50 +00:00 · 2025-04-04 06:48:03 +02:00
parent 9d31361c4f
commit cf9d7bf24c
4 changed files with 791 additions and 0 deletions
--- a/candle-examples/examples/csm/README.md
+++ b/candle-examples/examples/csm/README.md
@ -0,0 +1,14 @@
+# Conversational Speech Model (CSM)
+
+CSM is a speech generation model from Sesame,
+[SesameAILabs/csm](https://github.com/SesameAILabs/csm).
+
+It can generate a conversational speech between two different speakers.
+The speakers turn are delimited by the `|` character in the prompt.
+
+```bash
+cargo run --example csm --features cuda -r -- \
+    --voices voices.safetensors  \
+    --prompt "Hey how are you doing?|Pretty good, pretty good. How about you?"
+```
+