stable diffusion — papers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022 · Dec 2021 · Open access

High-Resolution Image Synthesis with Latent Diffusion Models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser and Björn Ommer

The paper proposes latent diffusion models (LDMs), which run the diffusion process in the compressed latent space of a pretrained autoencoder rather than directly in pixel space. This substantially reduces the computational cost of both training and sampling. A cross-attention conditioning mechanism accepts flexible inputs such as text and bounding boxes, enabling tasks including text-to-image generation, inpainting, and super-resolution. LDMs achieve state-of-the-art results on image inpainting and competitive results on the other tasks while requiring far less compute than pixel-based diffusion models. This architecture underlies Stable Diffusion.
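The two core ideas can be illustrated with a small pure-Python sketch. It does not reproduce the paper's code or any library's API, and all function names below are hypothetical. The first helper shows where the savings come from. With a spatial downsampling factor f = 8 and 4 latent channels (the setting used by Stable Diffusion), a 512×512 RGB image maps to a 64×64×4 latent, which has 48 times fewer values for the diffusion model to process. The second part is toy cross-attention, softmax(QKᵀ/√d)·V. In the paper, queries are projections of an intermediate UNet representation of the noisy latent, and keys and values are projections of a domain-specific encoder τ_θ applied to the conditioning input. Here, random matrices stand in for all learned weights and encodings.

```python
import math
import random


def latent_compression_ratio(height, width, img_channels, f, latent_channels):
    """How many times fewer values the diffusion model handles in latent space.

    f is the autoencoder's spatial downsampling factor (assumed to divide H and W).
    """
    pixel_values = height * width * img_channels
    latent_values = (height // f) * (width // f) * latent_channels
    return pixel_values / latent_values


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix; both are lists of rows."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(latent_tokens, cond_tokens, w_q, w_k, w_v):
    """softmax(Q K^T / sqrt(d)) V with Q from the latent and K, V from the condition.

    latent_tokens: (N, c) flattened latent features, one row per spatial position
    cond_tokens:   (M, e) encoded conditioning, e.g. one row per text token
    w_q: (c, d); w_k, w_v: (e, d) projection matrices (random stand-ins here)
    """
    q = matmul(latent_tokens, w_q)  # (N, d)
    k = matmul(cond_tokens, w_k)    # (M, d)
    v = matmul(cond_tokens, w_v)    # (M, d)
    d = len(q[0])
    scores = [[sum(qi[idx] * kj[idx] for idx in range(d)) / math.sqrt(d) for kj in k]
              for qi in q]
    weights = [softmax(row) for row in scores]  # (N, M): each position attends over the condition
    return matmul(weights, v), weights          # (N, d), (N, M)


def random_matrix(rows, cols, rng):
    """Gaussian random matrix used in place of learned weights or encodings."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy sizes: a 2x2 latent with 4 channels gives 4 tokens;
    # the condition has 3 tokens of width 6; attention width d = 8.
    latent = random_matrix(4, 4, rng)
    cond = random_matrix(3, 6, rng)
    out, attn = cross_attention(latent, cond, random_matrix(4, 8, rng),
                                random_matrix(6, 8, rng), random_matrix(6, 8, rng))
    print(len(out), len(out[0]))                        # 4 8
    print(latent_compression_ratio(512, 512, 3, 8, 4))  # 48.0
```

Because queries come from the latent and keys and values come from the condition, every spatial position receives a weighted mix of conditioning features. Swapping the condition (text, layout, and so on) only changes what τ_θ produces, not the denoiser's structure.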

latent diffusion · image synthesis · generative models · computer vision