Introduction to Stable Diffusion (Overview)

wrasheed1980 · 181 views · 9 slides · Aug 28, 2024

About This Presentation

An introduction to Stable Diffusion.


Slide Content

Stable Diffusion Intro

Overview
Description: A latent diffusion model that can run on most consumer hardware equipped with a modest GPU with at least 8 GB of VRAM
Developers: CompVis Group (Ludwig Maximilian University of Munich) and Runway
Sponsors/Donors: Stability AI, EleutherAI, and LAION
Series A funding: Lightspeed Venture Partners and Coatue Management
Dataset contributors: Various NPOs
Training cost: $660,000
Sources: Hugging Face Spaces (Stability AI) and DreamStudio
Primary use: Generating detailed images conditioned on text descriptions (prompts)
Additional uses: Inpainting, outpainting, and image-to-image translation guided by a text prompt
https://techpp.com/2022/10/10/how-to-train-stable-diffusion-ai-dreambooth/

LDM Architecture: variational autoencoder (VAE); VAE decoder
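A toy sketch of why the latent-diffusion design matters: in Stable Diffusion v1, the VAE compresses a 512×512×3 image into a 64×64×4 latent, so the diffusion U-Net works on roughly 48× fewer values than the raw pixels. The functions below only mimic the shapes with average pooling and a random projection; they are not the real trained VAE.

```python
import numpy as np

# Toy illustration of the latent compression in Stable Diffusion's VAE.
# The real VAE is a trained network; here we only reproduce the shapes:
# a 512x512x3 image maps to a 64x64x4 latent (8x spatial downsampling).

def toy_encode(image: np.ndarray, factor: int = 8, channels: int = 4) -> np.ndarray:
    """Fake 'encoder': average-pool by `factor`, then project to `channels`."""
    h, w, c = image.shape
    pooled = image.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((c, channels))  # stands in for learned conv layers
    return pooled @ proj

def toy_decode(latent: np.ndarray, factor: int = 8, channels: int = 3) -> np.ndarray:
    """Fake 'decoder': project back to RGB, then nearest-neighbour upsample."""
    rng = np.random.default_rng(1)
    proj = rng.standard_normal((latent.shape[-1], channels))
    rgb = latent @ proj
    return rgb.repeat(factor, axis=0).repeat(factor, axis=1)

image = np.zeros((512, 512, 3))
latent = toy_encode(image)      # (64, 64, 4): 48x fewer values than the image
restored = toy_decode(latent)   # (512, 512, 3)
```

The diffusion process itself runs entirely in that small latent space; the VAE decoder on the last line of the slide is what turns the denoised latent back into a viewable image.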

SD Architecture
Uses the diffusion model from CompVis, LMU Munich, Germany
860 million parameters in the U-Net (with a ResNet backbone)
123 million parameters in the text encoder
Considered relatively lightweight by 2022 standards
SD v2 uses xFormers; SDXL uses additional reinforcement/knowledge
Model fine-tuning methods: DreamBooth, Textual Inversion, LoRA, Hypernetworks, Aesthetic Gradients
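As a quick sanity check on the figures above, the two quoted components sum to just under one billion parameters (the VAE adds more on top; its count is not given on the slide):

```python
# Sanity arithmetic on the parameter counts quoted above.
unet_params = 860_000_000          # U-Net with ResNet backbone
text_encoder_params = 123_000_000  # text encoder
total_params = unet_params + text_encoder_params  # 983 million in total
```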

Fine-tuning method: DreamBooth
Video: https://youtu.be/dVjMiJsuR5o
Reference: "Well-Researched Comparison of Training Techniques (LoRA, Inversion, DreamBooth, Hypernetworks)" on r/StableDiffusion (reddit.com)
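DreamBooth fine-tunes the whole model on a handful of subject images while mixing in generated "class prior" images so the model does not forget the general class. A minimal sketch of that combined objective, with illustrative names and NumPy arrays standing in for the real noise-prediction tensors:

```python
import numpy as np

# Toy sketch of DreamBooth's prior-preservation objective (names are
# illustrative, not the actual training code): the total loss mixes the
# reconstruction loss on the subject batch with a loss on the class-prior
# batch, weighted by `prior_weight`.

def mse(pred: np.ndarray, target: np.ndarray) -> float:
    return float(np.mean((pred - target) ** 2))

def dreambooth_loss(subject_pred, subject_target,
                    prior_pred, prior_target, prior_weight=1.0):
    instance_loss = mse(subject_pred, subject_target)  # few-shot subject images
    prior_loss = mse(prior_pred, prior_target)         # e.g. "a photo of a dog" images
    return instance_loss + prior_weight * prior_loss

rng = np.random.default_rng(0)
noise_pred = rng.standard_normal((4, 64, 64))
loss = dreambooth_loss(noise_pred, noise_pred * 0.9,  # imperfect subject prediction
                       noise_pred, noise_pred)        # prior batch matched exactly
```

The prior term is what the `prior_preservation_class_prompt` option in the training slide below feeds: it generates class images from a short prompt so this second loss has targets.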

Fine-tuning method: Textual Inversion
Video: https://youtu.be/dVjMiJsuR5o
Reference: "Well-Researched Comparison of Training Techniques (LoRA, Inversion, DreamBooth, Hypernetworks)" on r/StableDiffusion (reddit.com)
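Textual Inversion is the opposite extreme from DreamBooth: the entire model stays frozen, and only one new token embedding is optimized so that the frozen network maps it to the new concept. The sketch below shrinks this to a frozen linear layer and a single trainable vector; all sizes and names are illustrative.

```python
import numpy as np

# Toy sketch of Textual Inversion (illustrative only): the model is frozen;
# we learn ONE new token embedding `v` by gradient descent so that the
# frozen "encoder" W maps it close to a target feature of the concept.

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))   # frozen text-encoder layer (stand-in)
target = rng.standard_normal(8)   # feature the new concept should produce

v = np.zeros(8)                   # the ONLY trainable parameters: the embedding
initial_loss = float(np.mean((W @ v - target) ** 2))

lr = 0.005
for _ in range(2000):
    err = W @ v - target          # frozen forward pass
    grad = W.T @ err              # gradient w.r.t. the embedding only
    v -= lr * grad

final_loss = float(np.mean((W @ v - target) ** 2))
```

Because only a single embedding vector is saved, a trained textual-inversion concept is only a few kilobytes, versus gigabytes for a full DreamBooth checkpoint.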

Fine-tuning method: LoRA
Video: https://youtu.be/dVjMiJsuR5o
Reference: "Well-Researched Comparison of Training Techniques (LoRA, Inversion, DreamBooth, Hypernetworks)" on r/StableDiffusion (reddit.com)
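LoRA (low-rank adaptation) freezes the pretrained weight matrices and trains only a pair of small low-rank factors added alongside each one. A minimal NumPy sketch, with the layer size and rank chosen for illustration:

```python
import numpy as np

# Toy sketch of a LoRA update (illustrative): instead of fine-tuning the
# full d x d weight W, train two small matrices A (r x d) and B (d x r)
# with rank r << d; the adapted layer computes W @ x + B @ (A @ x).

d, r = 768, 8                           # e.g. an attention projection, rank-8 LoRA
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))         # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, zero-initialized

def lora_forward(x: np.ndarray) -> np.ndarray:
    # With B at zero, this is identical to the pretrained W @ x,
    # so training starts from the unmodified model.
    return W @ x + B @ (A @ x)

full_params = d * d        # 589,824 if the whole layer were fine-tuned
lora_params = 2 * d * r    # 12,288 trainable values, about 2% of the above
```

This is why LoRA files for Stable Diffusion are small and why several of them can be merged or swapped at inference time: only the (B, A) pairs are stored.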

Fine-tuning method: Hypernetworks
Video: https://youtu.be/dVjMiJsuR5o
Reference: "Well-Researched Comparison of Training Techniques (LoRA, Inversion, DreamBooth, Hypernetworks)" on r/StableDiffusion (reddit.com)
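The hypernetwork approach popularized by the Stable Diffusion web UI attaches a small trainable network that transforms the keys and values inside the frozen model's cross-attention layers. The residual two-layer MLP below is a hedged sketch of that idea, not the web UI's exact module; the dimension and initialization scale are illustrative.

```python
import numpy as np

# Toy sketch of a Stable Diffusion hypernetwork (illustrative): a small
# extra network modulates an attention key/value vector; the base model's
# own weights stay frozen and only W1/W2 are trained.

rng = np.random.default_rng(0)
d = 16
W1 = rng.standard_normal((d, d)) * 0.01  # trainable hypernetwork layer 1
W2 = rng.standard_normal((d, d)) * 0.01  # trainable hypernetwork layer 2

def hypernetwork(k: np.ndarray) -> np.ndarray:
    """Residual two-layer MLP applied to an attention key/value vector."""
    h = np.maximum(W1 @ k, 0.0)          # ReLU hidden layer
    return k + W2 @ h                    # residual: starts near the identity

key = rng.standard_normal(d)
modulated = hypernetwork(key)            # nearly equal to `key` before training
```

Starting near the identity (small weights plus the residual connection) means an untrained hypernetwork leaves the base model's behavior essentially unchanged, which keeps early training stable.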

Training
Create a dataset; fine-tuning options:
DreamBooth notebook: image dataset of 512 x 512 px; prior_preservation_class_prompt auto-completed using the CLIP tokenizer
Web UI (over SD v1 and v2, and hypernetwork): image dataset of 512 x 512 px; unique dataset folder name, with images named as a unique series; starts from a pre-trained checkpoint file (.ckpt); auto-annotation (class prompt) text file generated using BLIP
Textual inversion UI: supports a Gradio look (conventional) and a Streamlit look (new and in progress)
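The dataset layout described above can be sketched as a small script: a uniquely named folder, images renamed as a numbered series, and a sidecar .txt caption per image (as a BLIP auto-annotation step would fill in). The folder and file naming conventions here are assumptions for illustration, not the exact web-UI specification, and empty placeholder files stand in for real 512×512 images.

```python
import tempfile
from pathlib import Path

# Illustrative dataset layout: <concept>_dataset/<concept>_001.png with a
# matching <concept>_001.txt caption file. Naming is an assumption, not the
# exact web-UI spec; write_bytes(b"") stands in for a real 512x512 image.

def build_dataset(root: Path, concept: str, captions: list) -> list:
    folder = root / f"{concept}_dataset"          # unique dataset folder name
    folder.mkdir(parents=True, exist_ok=True)
    image_paths = []
    for i, caption in enumerate(captions, start=1):
        stem = f"{concept}_{i:03d}"               # unique name series
        (folder / f"{stem}.png").write_bytes(b"")     # placeholder image
        (folder / f"{stem}.txt").write_text(caption)  # auto/class caption
        image_paths.append(folder / f"{stem}.png")
    return image_paths

root = Path(tempfile.mkdtemp())
paths = build_dataset(root, "mydog", ["a photo of a dog", "a dog on grass"])
```

Keeping captions as sidecar text files next to the images is convenient because both the DreamBooth and textual-inversion trainers can then pick up per-image prompts without a separate index file.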