diff --git a/candle-examples/examples/stable-diffusion/README.md b/candle-examples/examples/stable-diffusion/README.md
new file mode 100644
index 00000000..ee83b3f9
--- /dev/null
+++ b/candle-examples/examples/stable-diffusion/README.md
@@ -0,0 +1,63 @@
+# candle-stable-diffusion: A Diffusers API in Rust/Candle
+
+![rusty robot holding a candle](./assets/stable-diffusion-xl.jpg)
+
+_A rusty robot holding a fire torch in its hand_, generated by Stable Diffusion
+XL using Rust and [candle](https://github.com/huggingface/candle).
+
+The `stable-diffusion` example is a conversion of
+[diffusers-rs](https://github.com/LaurentMazare/diffusers-rs) using candle
+rather than libtorch. This implementation supports Stable Diffusion v1.5 and
+v2.1, as well as Stable Diffusion XL 1.0.
+
+## Getting the weights
+
+The weights are automatically downloaded for you from the [HuggingFace
+Hub](https://huggingface.co/) on the first run. There are various command line
+flags to use local files instead; run with `--help` to learn about them.
+
+## Running an example
+
+```bash
+cargo run --example stable-diffusion --release --features=cuda,cudnn \
+  -- --prompt "a cosmonaut on a horse (hd, realistic, high-def)"
+```
+
+The final image is named `sd_final.png` by default.
+The default scheduler is the Denoising Diffusion Implicit Model (DDIM) scheduler. The
+original paper and some code can be found in the [associated repo](https://github.com/ermongroup/ddim).
+
+### Command-line flags
+
+- `--prompt`: the prompt to be used to generate the image.
+- `--uncond-prompt`: the optional unconditional prompt.
+- `--sd-version`: the Stable Diffusion version to use; can be `v1-5`, `v2-1`, or
+  `xl`.
+- `--cpu`: run on the CPU rather than the GPU (much slower).
+- `--height`, `--width`: set the height and width for the generated image.
+- `--n-steps`: the number of steps to be used in the diffusion process.
+- `--num-samples`: the number of samples to generate.
+- `--final-image`: the filename for the generated image(s).
+
+### Using flash-attention
+
+Using flash attention makes image generation a lot faster and uses less memory.
+The downside is some long compilation time. You can set the
+`CANDLE_FLASH_ATTN_BUILD_DIR` environment variable to something like
+`/home/user/.candle` to ensure that the compilation artifacts are properly
+cached.
+
+Enabling flash-attention requires both a feature flag, `--features flash-attn`,
+and the command line flag `--use-flash-attn`.
+
+## Image to Image Pipeline
+...
+
+## FAQ
+
+### Memory Issues
+
+This example requires a GPU with more than 8GB of memory. As a fallback, the
+CPU version can be used with the `--cpu` flag, but it is much slower.
+Alternatively, reducing the height and width with the `--height` and `--width`
+flags is likely to reduce memory usage significantly.
diff --git a/candle-examples/examples/stable-diffusion/assets/stable-diffusion-xl.jpg b/candle-examples/examples/stable-diffusion/assets/stable-diffusion-xl.jpg
new file mode 100644
index 00000000..a6f7b6c6
Binary files /dev/null and b/candle-examples/examples/stable-diffusion/assets/stable-diffusion-xl.jpg differ
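As a companion to the "Command-line flags" section of the README added above, here is a hedged sketch of how the documented flags might be combined into a single invocation. The flag names come from the README; the concrete values (the `xl` version, 768x768 size, 30 steps, the output filename) are illustrative choices, not defaults documented by the example.

```bash
# Illustrative only: flag names are taken from the README above; the specific
# values (xl, 768x768, 30 steps, robot.png, ...) are arbitrary choices.
cargo run --example stable-diffusion --release --features=cuda,cudnn \
  -- --prompt "a rusty robot holding a fire torch" \
     --sd-version xl \
     --height 768 --width 768 \
     --n-steps 30 \
     --num-samples 2 \
     --final-image robot.png
```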
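Similarly, a minimal sketch of enabling flash-attention as described in the "Using flash-attention" section. It assumes the `flash-attn` feature can be combined with `cuda,cudnn`, and that `/home/user/.candle` (the directory suggested in the README) exists and is writable.

```bash
# Assumes the flash-attn feature composes with cuda,cudnn, and that the
# cache directory below exists and is writable.
export CANDLE_FLASH_ATTN_BUILD_DIR=/home/user/.candle
cargo run --example stable-diffusion --release --features=cuda,cudnn,flash-attn \
  -- --prompt "a cosmonaut on a horse (hd, realistic, high-def)" --use-flash-attn
```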