Some Stable Diffusion Research
I got curious about how Stable Diffusion works.
Here are some links:
- https://github.com/AUTOMATIC1111/stable-diffusion-webui
- https://github.com/Stability-AI/stablediffusion?tab=readme-ov-file
- https://github.com/facebookresearch/xformers?tab=readme-ov-file
- https://bbycroft.net/llm
- https://www.rand.org/pubs/research_reports/RRA2849-1.html
Diffusion models, in short:
- PNG -> some “latent space”
- Compressed with a variational autoencoder (VAE) encoder/decoder
- Add noise according to a Gaussian schedule
- Pass (prompt embedding, noisy latent, timestep) into a U-Net
- The U-Net predicts the noise that was added at that timestep
- A scheduler/sampler (e.g. DDPM, Euler) subtracts the predicted noise (see the sketch after this list)
- Pass the denoised latent back through the decoder
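To make the noising and denoising steps concrete, here's a minimal sketch in plain PyTorch. It's not Stable Diffusion's actual code: `unet` is a placeholder for the real noise-prediction network, `prompt_emb` stands in for the CLIP text embedding, and the beta schedule is the linear one from the original DDPM paper rather than SD's config.

```python
import torch

T = 1000                                   # number of diffusion timesteps
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule (DDPM paper)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # cumulative product: alpha_bar_t

def add_noise(latent, t):
    """Forward process: x_t = sqrt(alpha_bar_t)*x_0 + sqrt(1 - alpha_bar_t)*eps."""
    eps = torch.randn_like(latent)
    ab = alpha_bars[t]
    return ab.sqrt() * latent + (1 - ab).sqrt() * eps, eps

def ddpm_step(unet, noisy, t, prompt_emb):
    """One reverse step: predict the noise at step t, then remove it."""
    eps_pred = unet(noisy, t, prompt_emb)  # U-Net guesses the noise added at step t
    ab, a, b = alpha_bars[t], alphas[t], betas[t]
    # Posterior mean from the DDPM paper (Ho et al. 2020), with sigma_t^2 = beta_t.
    mean = (noisy - b / (1 - ab).sqrt() * eps_pred) / a.sqrt()
    if t > 0:
        mean = mean + b.sqrt() * torch.randn_like(noisy)  # keep some noise until t == 0
    return mean

# Tiny demo with a dummy U-Net (a real one would be a huge conditional model):
latent = torch.randn(1, 4, 64, 64)                # SD latent shape for a 512x512 image
noisy, eps = add_noise(latent, t=500)
dummy_unet = lambda x, t, c: torch.zeros_like(x)  # placeholder "network"
prev = ddpm_step(dummy_unet, noisy, t=500, prompt_emb=None)
```

The sampler runs `ddpm_step` repeatedly, from t = T-1 down to 0, which is why generation takes many U-Net calls per image.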
PNG -> noise -> U-Net -> new PNG.
PNG -vae_encoder> latent image -gaussian_noise> noisy latent -unet_loop> denoised latent -vae_decoder> new PNG
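That end-to-end chain is what libraries bundle up for you. A sketch assuming Hugging Face's diffusers library and the runwayml/stable-diffusion-v1-5 checkpoint (neither appears in my links above); the pipeline's attributes map straight onto the list:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")  # needs a GPU in fp16; drop the dtype/.to() for CPU

# The pipeline bundles exactly the pieces from the list above:
print(type(pipe.vae).__name__)           # AutoencoderKL: the variational encoder/decoder
print(type(pipe.text_encoder).__name__)  # CLIPTextModel: prompt -> embedding
print(type(pipe.unet).__name__)          # UNet2DConditionModel: the noise predictor
print(type(pipe.scheduler).__name__)     # the sampler that subtracts predicted noise

image = pipe("a corgi wearing a space suit").images[0]  # runs the full loop
image.save("out.png")
```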