diff --git a/candle-examples/examples/stable-diffusion/README.md b/candle-examples/examples/stable-diffusion/README.md
new file mode 100644
index 00000000..ee83b3f9
--- /dev/null
+++ b/candle-examples/examples/stable-diffusion/README.md
@@ -0,0 +1,63 @@
+# candle-stable-diffusion: A Diffusers API in Rust/Candle
+
+![rusty robot holding a candle](./assets/stable-diffusion-xl.jpg)
+
+_A rusty robot holding a fire torch in its hand_, generated by Stable Diffusion
+XL using Rust and [candle](https://github.com/huggingface/candle).
+
+The `stable-diffusion` example is a conversion of
+[diffusers-rs](https://github.com/LaurentMazare/diffusers-rs) using candle
+rather than libtorch. This implementation supports Stable Diffusion v1.5 and
+v2.1, as well as Stable Diffusion XL 1.0.
+
+## Getting the weights
+
+The weights are automatically downloaded for you from the [HuggingFace
+Hub](https://huggingface.co/) on the first run. There are various command line
+flags to use local files instead; run with `--help` to learn about them.
+
+## Running an example
+
+```bash
+cargo run --example stable-diffusion --release --features=cuda,cudnn \
+  -- --prompt "a cosmonaut on a horse (hd, realistic, high-def)"
+```
+
+The final image is named `sd_final.png` by default.
+The default scheduler is the Denoising Diffusion Implicit Model (DDIM) scheduler. The
+original paper and some code can be found in the [associated repo](https://github.com/ermongroup/ddim).
+
+### Command-line flags
+
+- `--prompt`: the prompt to be used to generate the image.
+- `--uncond-prompt`: the optional unconditional prompt.
+- `--sd-version`: the Stable Diffusion version to use; can be `v1-5`, `v2-1`, or
+  `xl`.
+- `--cpu`: run on the CPU rather than the GPU (much slower).
+- `--height`, `--width`: set the height and width for the generated image.
+- `--n-steps`: the number of steps to be used in the diffusion process.
+- `--num-samples`: the number of samples to generate.
+- `--final-image`: the filename for the generated image(s).
+
+### Using flash-attention
+
+Using flash attention makes image generation a lot faster and uses less memory.
+The downside is some long compilation time. You can set the
+`CANDLE_FLASH_ATTN_BUILD_DIR` environment variable to something like
+`/home/user/.candle` to ensure that the compilation artifacts are properly
+cached.
+
+Enabling flash-attention requires both a feature flag, `--features flash-attn`,
+and the command line flag `--use-flash-attn`.
+
+## Image to Image Pipeline
+...
+
+## FAQ
+
+### Memory Issues
+
+This example requires a GPU with more than 8GB of memory. As a fallback, the
+CPU version can be used with the `--cpu` flag, but it is much slower.
+Alternatively, reducing the height and width with the `--height` and `--width`
+flags is likely to reduce memory usage significantly.
diff --git a/candle-examples/examples/stable-diffusion/assets/stable-diffusion-xl.jpg b/candle-examples/examples/stable-diffusion/assets/stable-diffusion-xl.jpg
new file mode 100644
index 00000000..a6f7b6c6
Binary files /dev/null and b/candle-examples/examples/stable-diffusion/assets/stable-diffusion-xl.jpg differ
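As a companion to the "Command-line flags" section of the README added above, here is a hedged sketch of how the documented flags might be combined into a single invocation. The flag names come from the README; the concrete values (the `xl` version, 768x768 size, 30 steps, the output filename) are illustrative choices, not defaults documented by the example.

```bash
# Illustrative only: flag names are taken from the README above; the specific
# values (xl, 768x768, 30 steps, robot.png, ...) are arbitrary choices.
cargo run --example stable-diffusion --release --features=cuda,cudnn \
  -- --prompt "a rusty robot holding a fire torch" \
     --sd-version xl \
     --height 768 --width 768 \
     --n-steps 30 \
     --num-samples 2 \
     --final-image robot.png
```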
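Similarly, a minimal sketch of enabling flash-attention as described in the "Using flash-attention" section. It assumes the `flash-attn` feature can be combined with `cuda,cudnn`, and that `/home/user/.candle` (the directory suggested in the README) exists and is writable.

```bash
# Assumes the flash-attn feature composes with cuda,cudnn, and that the
# cache directory below exists and is writable.
export CANDLE_FLASH_ATTN_BUILD_DIR=/home/user/.candle
cargo run --example stable-diffusion --release --features=cuda,cudnn,flash-attn \
  -- --prompt "a cosmonaut on a horse (hd, realistic, high-def)" --use-flash-attn
```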