A Basic Tutorial on Diffusion Models for Image Generation - From the Mathematics to Research Highlights

1272498237haoyu · 17 slides · Aug 23, 2024

About This Presentation

This is a tutorial on Diffusion Models that I made while I belonged to the Sato Imari laboratory. It covers everything from the mathematical principles to advanced research.


Slide Content

Tutorial on Diffusion Models. Sato Imari Lab., Haoyu Wang (M2), 2023.07.10

1. Denoising Diffusion Models: learning to generate by denoising. Denoising diffusion models consist of two processes (easy!): a forward diffusion process (fixed) that gradually adds noise to the input, and a reverse denoising process (generative) that learns to generate data by denoising.
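The fixed forward process has a convenient closed form: x_t can be sampled from x_0 in one jump rather than t small steps. A minimal NumPy sketch, assuming the standard linear beta schedule (the schedule values here are illustrative, not taken from the slides):

```python
import numpy as np

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule; alpha_bar[t] is the product of (1 - beta_s) for s <= t."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form: scale the signal down,
    mix in Gaussian noise. No network is needed for the forward process."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps, eps

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = rng.standard_normal((8, 8))           # stand-in for an image
xT, _ = q_sample(x0, 999, alpha_bar, rng)  # near pure noise at the final step
```

By the final step alpha_bar is tiny, so x_T is almost pure Gaussian noise, which is exactly what the reverse process starts from.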

2. But how do we add noise and denoise? DDPM vs. DDIM. DDPM adds noise step by step along the fixed forward diffusion process, x0 -> x1 -> x2 -> ... -> xT, from data to noise (equivalently, all the noise can be added in one jump). Denoising is strictly step by step (slow!), and the same initial noise can lead to various results.
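The DDPM reverse step removes the predicted noise and then re-injects fresh noise, which is why every step must be visited and why the same x_T yields varied samples. A sketch, assuming the epsilon-prediction parameterisation; the zero `eps_hat` stands in for a trained network's output:

```python
import numpy as np

def ddpm_step(x_t, t, eps_hat, betas, alpha_bar, rng):
    """One DDPM reverse step: subtract the (scaled) predicted noise to get the
    posterior mean, then add sigma_t * z -- except at t == 0, where no fresh
    noise is injected."""
    alpha_t = 1.0 - betas[t]
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_t)
    if t == 0:
        return mean
    sigma_t = np.sqrt(betas[t])  # one common choice of reverse variance
    return mean + sigma_t * rng.standard_normal(x_t.shape)

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)
x_t = rng.standard_normal((4, 4))
eps_hat = np.zeros_like(x_t)   # stand-in for a trained noise-prediction network
x_prev = ddpm_step(x_t, 500, eps_hat, betas, alpha_bar, rng)
```

The re-injected noise at every step is the source of DDPM's sample diversity, and the one-step-at-a-time loop over all T steps is the source of its slowness.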

3. But how do we add noise and denoise? DDPM vs. DDIM. DDIM uses the same noise for all steps (adding all the noise in one jump). Denoising is free to skip any number of steps (fast!), and the same initial noise with the same skipping always gives the same result.
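The deterministic DDIM update (eta = 0) first predicts x_0 from the current sample, then jumps straight to any earlier step, with no fresh noise added. A sketch; the zero `eps_hat` again stands in for a trained network, and the exact step-composition property below holds only because `eps_hat` is constant here (with a real network it is approximate):

```python
import numpy as np

def ddim_step(x_t, t, t_prev, eps_hat, alpha_bar):
    """Deterministic DDIM update: estimate x_0, then re-noise it analytically
    to step t_prev. t_prev can be any earlier step, so steps can be skipped."""
    x0_hat = (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])
    return np.sqrt(alpha_bar[t_prev]) * x0_hat + np.sqrt(1.0 - alpha_bar[t_prev]) * eps_hat

betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)
rng = np.random.default_rng(0)
x_t = rng.standard_normal((4, 4))
eps_hat = np.zeros_like(x_t)   # stand-in for a trained noise-prediction network

a = ddim_step(x_t, 500, 400, eps_hat, alpha_bar)  # skip 100 steps in one update
b = ddim_step(x_t, 500, 400, eps_hat, alpha_bar)  # same inputs -> same output
```

No random draw appears anywhere in the update, which is why the same initial noise (with the same skipping pattern) always produces the same image.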

4. Why is this generative? How should we understand the variation during the process? For DDPM, randomness is high in the first few denoising steps; for DDIM, the result depends entirely on the initial noise.

5. Diffusion Models and Differential Equations: what are we doing during the reverse diffusion? An ordinary differential equation (ODE) can be solved numerically with the Euler method, the Runge-Kutta method, and so on. DDIM treats the predicted noise as the derivative, so DDIM is the Euler method applied to an ODE, and better ODE solvers let DDIM converge faster! A stochastic differential equation (SDE) adds a random diffusion term: DDPM treats the predicted noise as the drift and the injected random noise as the diffusion, so DDPM is the Euler method applied to an SDE.
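The correspondence on this slide can be written out explicitly. These equations are an addition following the standard score-SDE formulation, not taken from the slides:

```latex
% Forward SDE (noising):
\mathrm{d}x = f(x, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w

% Reverse-time SDE (DDPM-style sampling is an Euler discretisation of this):
\mathrm{d}x = \left[ f(x, t) - g(t)^2 \nabla_x \log p_t(x) \right] \mathrm{d}t + g(t)\,\mathrm{d}\bar{w}

% Probability-flow ODE (DDIM-style deterministic sampling follows this):
\frac{\mathrm{d}x}{\mathrm{d}t} = f(x, t) - \tfrac{1}{2} g(t)^2 \nabla_x \log p_t(x)

% The score is recovered from the predicted noise:
\nabla_x \log p_t(x) \approx -\frac{\epsilon_\theta(x, t)}{\sqrt{1 - \bar{\alpha}_t}}
```

The ODE and SDE share the same marginal distributions p_t, which is why a deterministic sampler and a stochastic sampler can both generate from the same trained network.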

6. Steps toward Stable Diffusion: from RGB to latent, from unconditioned to conditioned. Stable Diffusion is a text-to-image model (conditional!) with a U-Net-shaped network; it compresses the RGB image with a VAE and injects the condition using attention (Transformer).

7. Steps toward Stable Diffusion: building blocks. A fully convolutional VAE, ResNet layers processing the latent and the time step, and Transformer layers mixing the latent with the condition. The input image size doesn't need to be fixed!
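The "mixing the latent with the condition" block is cross-attention: queries come from the image latent, keys and values from the text condition. A single-head NumPy sketch (dimensions and weights are illustrative, not Stable Diffusion's actual sizes):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(latent, cond, Wq, Wk, Wv):
    """Single-head cross-attention: every spatial position of the latent
    (a query) attends to every token of the text condition (keys/values)."""
    q, k, v = latent @ Wq, cond @ Wk, cond @ Wv
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # rows sum to 1
    return weights @ v

rng = np.random.default_rng(0)
d_lat, d_txt, d = 8, 12, 8
Wq = rng.standard_normal((d_lat, d))
Wk = rng.standard_normal((d_txt, d))
Wv = rng.standard_normal((d_txt, d))
cond = rng.standard_normal((5, d_txt))   # 5 text tokens

# The same weights handle any number of spatial positions, one reason the
# input image size doesn't need to be fixed.
out_16 = cross_attention(rng.standard_normal((16, d_lat)), cond, Wq, Wk, Wv)
out_25 = cross_attention(rng.standard_normal((25, d_lat)), cond, Wq, Wk, Wv)
```

Note that the weight matrices act per position / per token, so nothing in the block constrains the spatial resolution, matching the slide's point about flexible input sizes.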

8. Finetuning the Diffusion Model: generating with unseen conditions. ControlNet adds an additional condition while keeping the original network, e.g. adding edge-detection results as the condition. Not very memory/computation efficient!
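The key ControlNet trick is that the new branch enters through zero-initialised layers, so at initialisation the combined network is exactly the frozen original. A toy sketch of the pattern with dense layers standing in for the real convolutional blocks (all shapes and the `tanh` blocks are illustrative assumptions):

```python
import numpy as np

def frozen_block(x, W):
    """Stand-in for a frozen block of the original diffusion U-Net."""
    return np.tanh(x @ W)

def controlnet_block(x, control, W_frozen, W_copy, W_zero):
    """ControlNet pattern: a trainable copy of the block processes the extra
    condition, and its output is added back through a zero-initialised
    projection ("zero convolution"), so training starts from the original
    model's behaviour."""
    base = frozen_block(x, W_frozen)
    extra = np.tanh((x + control) @ W_copy) @ W_zero
    return base + extra

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
W_zero = np.zeros((8, 8))                   # zero-initialised connection
x = rng.standard_normal((4, 8))
control = rng.standard_normal((4, 8))       # e.g. an encoded edge map
y = controlnet_block(x, control, W, W.copy(), W_zero)
```

Because the branch duplicates entire encoder blocks, the memory and compute overhead is substantial, which is the slide's efficiency caveat.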

9. Finetuning the Diffusion Model: generating objects with an unseen style or unseen text, by finetuning on unseen text-image pairs. Custom Attention replaces the original transformer block with a new one, but loses the original network! LoRA (Low-Rank Adaptation) adds low-rank layers to the original transformer layers, e.g. for generating Disney-style characters: a tiny additional network, fast training, and the original network is kept!
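The LoRA idea in one function: keep the original weight W frozen and add a trainable low-rank update B after A, with rank r much smaller than the width d. A sketch with illustrative sizes:

```python
import numpy as np

def lora_linear(x, W_frozen, A, B, scale=1.0):
    """LoRA layer: y = x W + scale * (x A) B. Only A (d x r) and B (r x d)
    are trained; W stays frozen, so the original network is preserved."""
    return x @ W_frozen + scale * (x @ A) @ B

rng = np.random.default_rng(0)
d, r = 64, 4
W = rng.standard_normal((d, d))             # frozen pretrained weight
A = 0.01 * rng.standard_normal((d, r))      # trainable down-projection
B = np.zeros((r, d))                        # trainable up-projection, zero init
x = rng.standard_normal((2, d))
y = lora_linear(x, W, A, B)                 # B == 0, so y == x @ W at init

extra_params = A.size + B.size              # 2*d*r, vs d*d for a full layer
```

With d = 64 and r = 4 the adapter has 512 parameters against 4096 in the frozen layer, which is why the slide calls it a tiny additional network with fast training.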

10. Other Applications of Diffusion Models: image editing. SDEdit turns rough drawings into realistic images: add some noise to the drawing (the forward diffusion brings the two distributions close to each other), then denoise to generate a realistic image.
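The first half of SDEdit is just the forward process applied to the drawing; the claim that noising brings the two distributions together can be seen numerically, since the signal is scaled by sqrt(alpha_bar[t]) while shared Gaussian noise takes over. A toy demonstration (the "drawing" and "photo" arrays are illustrative stand-ins, not real data):

```python
import numpy as np

betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

def noise_to(x, t, rng):
    """First half of SDEdit: forward-diffuse the input to an intermediate
    step t in one jump. The second half (not shown) is ordinary reverse
    denoising from step t with a pretrained model."""
    return np.sqrt(alpha_bar[t]) * x + np.sqrt(1.0 - alpha_bar[t]) * rng.standard_normal(x.shape)

rng = np.random.default_rng(0)
drawing = np.zeros((64, 64))                       # flat, cartoon-like statistics
photo = 1.0 + 0.3 * rng.standard_normal((64, 64))  # "natural image" statistics

# The gap between the two distributions shrinks as t grows.
gap_mid = abs(noise_to(drawing, 300, rng).mean() - noise_to(photo, 300, rng).mean())
gap_late = abs(noise_to(drawing, 900, rng).mean() - noise_to(photo, 900, rng).mean())
```

Choosing the intermediate t is the whole trade-off in SDEdit: too small and the output stays cartoonish, too large and the layout of the drawing is destroyed.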

11. Other Applications of Diffusion Models: image-to-image translation, e.g. monochromatic to RGB. Learn to generate the RGB image conditioned on the monochromatic one.

12. Other Applications of Diffusion Models: generating unseen objects from few samples. The Few-shot Diffusion Model extracts a condition vector from the sample set with a ViT; however, the size of the image set is highly limited, and the visual quality is not very good.

13. Other Applications of Diffusion Models: label-efficient semantic segmentation. Using representations from pretrained diffusion models and a limited number of labels (20-50), train a small set of ensembled MLPs for segmentation.

14. Other Applications of Diffusion Models: label-efficient semantic segmentation (continued). Experimental results show that the proposed method outperforms Masked Autoencoders and GAN- and VAE-based models. Note that all images are used when learning the diffusion model; the label quality seems good.

15. Other Generative Models: image-label pair generation with GANs. DatasetGAN uses pretrained GANs and a limited number of labels (20-30) to train a small set of ensembled MLPs for generating labels. Note that all images are used when training the GAN model; the label quality seems good.

16. Insight on Research: generating image-label pairs for few-shot learning. Cell images with different staining often look very different, and it is impossible to cover all variants in a dataset. Pretrained generative models have shown the ability to learn segmentation from small sample sizes. Can we generate image-label pairs of cell images from few samples (for instance segmentation)? Or use a similar method to perform instance segmentation directly?