# candle-stable-diffusion-3: Candle Implementation of Stable Diffusion 3 Medium

*Sample image prompt: A cute rusty robot holding a candle torch in its hand, with glowing neon text "LETS GO RUSTY" displayed on its chest, bright background, high quality, 4k*
Stable Diffusion 3 Medium is a text-to-image model based on the Multimodal Diffusion Transformer (MMDiT) architecture.
## Getting access to the weights
The weights of Stable Diffusion 3 Medium are released by Stability AI under the Stability Community License. You will need to accept the conditions and acquire a license by visiting the model repo on the Hugging Face Hub, which grants your Hugging Face account access to the weights.
On the first run, the weights are downloaded automatically from the Hugging Face Hub. If you haven't done so before, you might be prompted to configure a Hugging Face User Access Token (recommended) on your machine. After the download, the weights are cached and remain accessible locally.
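One way to set up the token ahead of time is with the official Hugging Face CLI (a sketch, assuming Python is available; the token is typically cached under `~/.cache/huggingface/token`, where candle's `hf-hub` crate can pick it up):

```shell
# Install the official Hugging Face CLI and log in interactively.
# The access token is cached on disk (typically ~/.cache/huggingface/token),
# so the example can download the gated weights on its first run.
pip install -U "huggingface_hub[cli]"
huggingface-cli login
```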
## Running the model
```shell
cargo run --example stable-diffusion-3 --release --features=cuda -- \
  --height 1024 --width 1024 \
  --prompt 'A cute rusty robot holding a candle torch in its hand, with glowing neon text "LETS GO RUSTY" displayed on its chest, bright background, high quality, 4k'
```
To display the other available options:

```shell
cargo run --example stable-diffusion-3 --release --features=cuda -- --help
```
If your GPU supports it, Flash Attention is strongly recommended, as it can greatly improve inference speed: MMDiT is a transformer model that relies heavily on attention. To use candle-flash-attn in the demo, you will need both the `--features flash-attn` build flag and the `--use-flash-attn` runtime flag.

```shell
cargo run --example stable-diffusion-3 --release --features=cuda,flash-attn -- --use-flash-attn ...
```
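To illustrate what the runtime flag switches between, here is a minimal sketch of attention dispatch with a naive softmax(QK^T/sqrt(d))V fallback. All names are illustrative, not candle's actual API; the real fused path calls into the candle-flash-attn CUDA kernel.

```rust
// Naive scaled-dot-product attention over row-major [n, d] slices.
// This is the kind of computation the fused flash-attn kernel replaces.
fn naive_attention(q: &[f32], k: &[f32], v: &[f32], d: usize) -> Vec<f32> {
    let n = q.len() / d; // number of query/key/value rows
    let scale = 1.0 / (d as f32).sqrt();
    let mut out = vec![0.0f32; n * d];
    for i in 0..n {
        // scores[j] = (q_i . k_j) / sqrt(d)
        let mut scores: Vec<f32> = (0..n)
            .map(|j| (0..d).map(|t| q[i * d + t] * k[j * d + t]).sum::<f32>() * scale)
            .collect();
        // numerically stabilized softmax over the scores
        let m = scores.iter().cloned().fold(f32::MIN, f32::max);
        let z: f32 = scores.iter().map(|s| (s - m).exp()).sum();
        for s in scores.iter_mut() {
            *s = (*s - m).exp() / z;
        }
        // out_i = sum_j softmax(scores)[j] * v_j
        for j in 0..n {
            for t in 0..d {
                out[i * d + t] += scores[j] * v[j * d + t];
            }
        }
    }
    out
}

// Illustrative runtime toggle, mirroring the spirit of --use-flash-attn.
fn attention(q: &[f32], k: &[f32], v: &[f32], d: usize, use_flash_attn: bool) -> Vec<f32> {
    if use_flash_attn {
        // A real implementation would dispatch to the fused CUDA kernel here.
        unimplemented!("flash-attn path requires the CUDA kernel");
    }
    naive_attention(q, k, v, d)
}

fn main() {
    let q = vec![0.0f32; 4]; // two zero queries -> uniform attention weights
    let k = vec![0.5f32, -0.5, 1.0, 2.0];
    let v = vec![1.0f32, 2.0, 3.0, 4.0];
    let out = attention(&q, &k, &v, 2, false);
    println!("{out:?}"); // each output row is the mean of the value rows
}
```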
## Performance Benchmark
The benchmark below was obtained by generating a 1024x1024 image with 28 steps of Euler sampling and measuring the average speed (iterations per second).

candle and candle-flash-attn are based on commit 0d96ec3.
System specs (desktop, PCIe 5.0 x8/x8 dual-GPU setup):

- Operating System: Ubuntu 23.10
- CPU: Intel i9-12900K (no overclocking)
- RAM: 64 GB dual-channel DDR5 @ 4800 MT/s
| Speed (iter/s) | w/o flash-attn | w/ flash-attn |
| -------------- | -------------- | ------------- |
| RTX 3090 Ti    | 0.83           | 2.15          |
| RTX 4090       | 1.72           | 4.06          |
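For intuition, these iteration rates translate directly into wall-clock time per image at the benchmark's 28 Euler steps (a quick sketch, not part of the example itself):

```rust
// Convert the benchmark's iteration rate into seconds per generated image.
fn seconds_per_image(steps: u32, iters_per_sec: f64) -> f64 {
    steps as f64 / iters_per_sec
}

fn main() {
    // Rates taken from the "w/ flash-attn" column of the table above.
    for (gpu, rate) in [("RTX 3090 Ti", 2.15), ("RTX 4090", 4.06)] {
        println!("{gpu}: ~{:.1} s per image", seconds_per_image(28, rate));
    }
}
```

With flash-attn enabled, that works out to roughly 13 s per image on the RTX 3090 Ti and about 7 s on the RTX 4090.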