# candle-stable-diffusion-3: Candle Implementation of Stable Diffusion 3 Medium

*A cute rusty robot holding a candle torch in its hand, with glowing neon text "LETS GO RUSTY" displayed on its chest, bright background, high quality, 4k* (sample image prompt)

Stable Diffusion 3 Medium is a text-to-image model based on the Multimodal Diffusion Transformer (MMDiT) architecture.

## Getting access to the weights

The weights of Stable Diffusion 3 Medium are released by Stability AI under the Stability Community License. You will need to accept the license conditions by visiting the model repo on the Hugging Face Hub, which grants access to the weights for your Hugging Face account.

On the first run, the weights are downloaded automatically from the Hugging Face Hub. If you haven't done so before, you might be prompted to configure a Hugging Face User Access Token (recommended) on your computer. After the download, the weights are cached and remain accessible locally.
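
For reference, here is a minimal sketch of what this download-and-cache step looks like with the `hf-hub` crate, which candle's examples use under the hood; the specific weight file name below is an assumption for illustration, not necessarily the file the example fetches:

```rust
use hf_hub::api::sync::Api;

fn main() -> anyhow::Result<()> {
    // Uses the locally configured Hugging Face token, downloads the file on
    // the first call, and returns the cached local path on later calls.
    let api = Api::new()?;
    let repo = api.model("stabilityai/stable-diffusion-3-medium".to_string());
    // Hypothetical weight file name, for illustration only.
    let weights = repo.get("sd3_medium_incl_clips_t5xxlfp16.safetensors")?;
    println!("weights cached at {}", weights.display());
    Ok(())
}
```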

## Running the model

```shell
cargo run --example stable-diffusion-3 --release --features=cuda -- \
  --height 1024 --width 1024 \
  --prompt 'A cute rusty robot holding a candle torch in its hand, with glowing neon text \"LETS GO RUSTY\" displayed on its chest, bright background, high quality, 4k'
```

To display the other available options:

```shell
cargo run --example stable-diffusion-3 --release --features=cuda -- --help
```

If your GPU supports it, Flash Attention is strongly recommended, as it can greatly improve inference speed: MMDiT is a transformer model that depends heavily on attention. To use candle-flash-attn in the demo, you will need both the `--features flash-attn` build flag and the `--use-flash-attn` runtime flag.

```shell
cargo run --example stable-diffusion-3 --release --features=cuda,flash-attn -- --use-flash-attn ...
```
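
To illustrate why both flags are needed, here is a minimal sketch (not the example's exact code) of how a compile-time cargo feature and a runtime flag can combine to select the attention implementation; tensor layout handling is omitted for brevity:

```rust
use candle::{Result, Tensor};

#[allow(unused_variables)]
fn attention(q: &Tensor, k: &Tensor, v: &Tensor, use_flash_attn: bool) -> Result<Tensor> {
    let scale = 1.0 / (q.dim(candle::D::Minus1)? as f64).sqrt();
    #[cfg(feature = "flash-attn")]
    if use_flash_attn {
        // This branch only exists in binaries built with --features flash-attn;
        // the runtime flag then decides whether it is actually taken.
        return candle_flash_attn::flash_attn(q, k, v, scale as f32, false);
    }
    // Standard scaled-dot-product attention fallback, always available.
    let attn = (q.matmul(&k.t()?)? * scale)?;
    let attn = candle_nn::ops::softmax_last_dim(&attn)?;
    attn.matmul(v)
}
```

A binary built without the feature contains only the fallback path, so passing `--use-flash-attn` at runtime has no flash-attn code to enable.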

## Performance Benchmark

The benchmark below generates a 1024x1024 image with 28 steps of Euler sampling and measures the average speed in iterations per second.

candle and candle-flash-attn are based on commit 0d96ec3.

System specs (desktop with a PCIe 5.0 x8/x8 dual-GPU setup):

- Operating System: Ubuntu 23.10
- CPU: i9 12900K w/o overclocking
- RAM: 64 GB dual-channel DDR5 @ 4800 MT/s
| Speed (iter/s) | w/o flash-attn | w/ flash-attn |
| -------------- | -------------- | ------------- |
| RTX 3090 Ti    | 0.83           | 2.15          |
| RTX 4090       | 1.72           | 4.06          |
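
At these rates, the 28 sampling steps take roughly 28 / 0.83 ≈ 34 s without flash-attn versus 28 / 2.15 ≈ 13 s with it on the RTX 3090 Ti, and roughly 16 s versus 7 s on the RTX 4090.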