Course Outcome 4
Investigate autoencoder techniques in deep learning.
Contents
Autoencoders: undercomplete autoencoders, regularized autoencoders, stochastic encoders and decoders
Deep generative models: Boltzmann machines, restricted Boltzmann machines, deep belief networks, deep Boltzmann machines for real-world data
Introduction to Generative Adversarial Networks (GANs) and their applications
Reference
Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep Learning, MIT Press, 2016, Chapter 20
Simplilearn, "What Are GANs? | Generative Adversarial Networks Tutorial | Deep Learning Tutorial", https://www.youtube.com/watch?v=MZmNxvLDdV0
GeeksforGeeks, https://www.geeksforgeeks.org/
What Are GANs?
Generator vs Discriminator
Discriminative learning: we would like to discriminate between classes, for example between photos of cats and photos of dogs.
Generative modeling: given a large corpus of photographs of faces, we might want to generate a new photorealistic image that looks like it could plausibly have come from the same dataset.
A data generator is good if we cannot tell its fake data apart from real data. We improve the generator until it produces something that resembles the real data. At the very least, it needs to fool the classifier (the discriminator).
Generative Adversarial Networks (GANs)
A Generative Adversarial Network (GAN) consists of two neural networks:
Generator (G): creates fake data samples that try to mimic real data.
Discriminator (D): distinguishes between real and fake samples.
The two networks compete: G tries to generate realistic data to fool D, while D learns to improve its ability to detect fake data. Over time, G generates samples that are, ideally, indistinguishable from real data.
Training GAN
Numerical Example
Let's assume we want to generate numbers that follow a simple real data distribution, e.g., real numbers centered around 5 with some small variation.
Generator G(z)
The generator takes random noise $z \sim \mathcal{N}(0, 1)$ and transforms it into a data sample using a neural network, here a single linear layer:
$$G(z) = W_G \cdot z + b_G$$
Let's initialize $W_G = 2$, $b_G = 1$. If $z = 0.5$, then:
$$G(0.5) = 2(0.5) + 1 = 2$$
The generated (fake) sample is 2, which is far from 5.
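The calculation above can be checked in a few lines of Python. This is a minimal sketch: the function name `generator` and the sample count are chosen here for illustration. Because $z$ is standard normal, the untrained generator produces samples distributed roughly as $\mathcal{N}(1, 2^2)$, which is why they rarely land near 5.

```python
import random
import statistics

def generator(z, w_g=2.0, b_g=1.0):
    """Linear generator G(z) = W_G * z + b_G from the numerical example."""
    return w_g * z + b_g

# Single sample from the slide: z = 0.5
print(generator(0.5))  # 2.0

# Many samples: the mean is close to b_G = 1 and the spread close to |W_G| = 2
random.seed(0)
samples = [generator(random.gauss(0.0, 1.0)) for _ in range(10_000)]
sample_mean = statistics.mean(samples)
sample_std = statistics.stdev(samples)
print(sample_mean, sample_std)
```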
Discriminator D(x)
The discriminator outputs the probability that a given input is real:
$$D(x) = \sigma(W_D \cdot x + b_D)$$
Let's initialize $W_D = 0.5$, $b_D = -1$.
If the input is $x = 5$ (real sample), then: $D(5) = \sigma(0.5(5) - 1) = \sigma(1.5) \approx 0.82$
If the input is $x = 2$ (fake sample from G), then: $D(2) = \sigma(0.5(2) - 1) = \sigma(0) = 0.5$
So D assigns the real sample a high probability of being real and the fake sample a lower one, although at 0.5 it is still undecided about the fake.
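As a quick check of the discriminator arithmetic, the sketch below evaluates $D(x)$ at the real and fake samples. The function names are illustrative. Note that the pre-activation for $x = 5$ is $0.5 \cdot 5 - 1 = 1.5$.

```python
import math

def sigmoid(a):
    """Logistic function: maps any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def discriminator(x, w_d=0.5, b_d=-1.0):
    """D(x) = sigmoid(W_D * x + b_D): probability that x is real."""
    return sigmoid(w_d * x + b_d)

d_real = discriminator(5.0)  # real sample
d_fake = discriminator(2.0)  # fake sample produced by G
print(round(d_real, 2), round(d_fake, 2))  # 0.82 0.5
```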
Training: Discriminator Update
Suppose we have a real sample $x_{\text{real}} = 5$ and a fake sample generated by G, $G(z) = 2$.
The discriminator outputs:
$D(5) = \sigma(0.5(5) - 1) = \sigma(1.5) \approx 0.82$
$D(2) = \sigma(0.5(2) - 1) = \sigma(0) = 0.5$
Discriminator loss (natural logarithm):
$$L_D = -\log D(5) - \log(1 - D(2)) = -\log(0.82) - \log(1 - 0.5) \approx 0.20 + 0.69 = 0.89$$
We update $W_D$, $b_D$ using gradient descent to decrease this loss.
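One gradient-descent step on the discriminator can be worked out by hand for this example. The sketch below is illustrative only: the learning rate of 0.1 is an assumption, and the gradients follow from the chain rule applied to $L_D$, giving $\partial L_D / \partial W_D = (D(x_{\text{real}}) - 1)\,x_{\text{real}} + D(x_{\text{fake}})\,x_{\text{fake}}$ and $\partial L_D / \partial b_D = (D(x_{\text{real}}) - 1) + D(x_{\text{fake}})$.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def d_loss(w_d, b_d, x_real, x_fake):
    """L_D = -log D(x_real) - log(1 - D(x_fake))."""
    d_real = sigmoid(w_d * x_real + b_d)
    d_fake = sigmoid(w_d * x_fake + b_d)
    return -math.log(d_real) - math.log(1.0 - d_fake)

def d_gradients(w_d, b_d, x_real, x_fake):
    """Analytic gradients of L_D with respect to W_D and b_D."""
    d_real = sigmoid(w_d * x_real + b_d)
    d_fake = sigmoid(w_d * x_fake + b_d)
    grad_w = (d_real - 1.0) * x_real + d_fake * x_fake
    grad_b = (d_real - 1.0) + d_fake
    return grad_w, grad_b

w_d, b_d = 0.5, -1.0
x_real, x_fake = 5.0, 2.0
lr = 0.1  # assumed learning rate, not given in the slides

loss_before = d_loss(w_d, b_d, x_real, x_fake)
grad_w, grad_b = d_gradients(w_d, b_d, x_real, x_fake)
w_d_new, b_d_new = w_d - lr * grad_w, b_d - lr * grad_b
loss_after = d_loss(w_d_new, b_d_new, x_real, x_fake)

print(round(loss_before, 2))  # 0.89
print(loss_after < loss_before)  # True
```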
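Training: Generator Update
Generator loss:
$$L_G = \log(1 - D(G(z))) = \log(1 - 0.5) = \log(0.5) \approx -0.69$$
We update $W_G$, $b_G$ using gradient descent to decrease this loss, which pushes $D(G(z))$ toward 1 and leads to more realistic samples.
Alternatively, minimize $-\log D(G(z))$, which has the same goal but gives stronger gradients when D rejects the fake samples with high confidence.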
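The generator update for this example can be sketched the same way. The learning rate of 0.1 is an assumption, and D is held fixed during the step. With $a = W_D \cdot G(z) + b_D$, the chain rule gives $\partial L_G / \partial G(z) = -\sigma(a)\,W_D$ for the loss $\log(1 - D(G(z)))$.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

w_g, b_g = 2.0, 1.0    # generator parameters from the example
w_d, b_d = 0.5, -1.0   # discriminator parameters (held fixed here)
z = 0.5
lr = 0.1               # assumed learning rate, not given in the slides

x_fake = w_g * z + b_g                 # G(z) = 2
d_fake = sigmoid(w_d * x_fake + b_d)   # D(G(z)) = 0.5

loss_saturating = math.log(1.0 - d_fake)   # log(1 - D(G(z)))
loss_non_saturating = -math.log(d_fake)    # -log D(G(z))

# Gradient of log(1 - D(G(z))) with respect to the generated sample
grad_x = -d_fake * w_d
# Chain rule through G(z) = W_G * z + b_G
grad_w_g, grad_b_g = grad_x * z, grad_x

w_g_new, b_g_new = w_g - lr * grad_w_g, b_g - lr * grad_b_g
x_fake_new = w_g_new * z + b_g_new

print(round(loss_saturating, 2), round(loss_non_saturating, 2))  # -0.69 0.69
print(x_fake, x_fake_new)  # the new sample moves from 2 toward 5
```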
Training: After Multiple Iterations
Suppose that after training, G improves and generates 4.8, which is close to 5. D now finds it hard to distinguish between real and fake samples.
Summary
Initially, G generated poor samples (e.g., 2 instead of 5), and D could separate real from fake samples.
After training, G learned to generate samples close to 5, making it hard for D to differentiate.
import torch
import torch.nn as nn

# Assumed for illustration: toy 1-D "real" data centered at 5, as in the
# numerical example. For images, swap in an image dataset and convolutional
# G and D. All sizes below are example values.
latent_dim, batch_size, learning_rate, num_epochs = 8, 64, 1e-3, 5
real_data = 5.0 + 0.5 * torch.randn(1024, 1)
dataloader = torch.utils.data.DataLoader(real_data, batch_size=batch_size, shuffle=True)

# Initialize Generator (G) and Discriminator (D) with random weights
G = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.LeakyReLU(0.2), nn.Linear(16, 1), nn.Sigmoid())

# Define loss function and optimizers
loss_function = nn.BCELoss()
optimizer_G = torch.optim.Adam(G.parameters(), lr=learning_rate)
optimizer_D = torch.optim.Adam(D.parameters(), lr=learning_rate)

# Set hyperparameters for training (user-defined)
num_D_updates = 1  # Number of times to update D per batch
num_G_updates = 1  # Number of times to update G per batch

# Training loop
for epoch in range(num_epochs):
    for x_real in dataloader:
        n = x_real.size(0)
        real_labels = torch.ones(n, 1)
        fake_labels = torch.zeros(n, 1)

        # -------- Train Discriminator (D) --------
        for _ in range(num_D_updates):
            z = torch.randn(n, latent_dim)  # Sample random noise
            x_fake = G(z).detach()          # Detach to avoid updating G
            D_real = D(x_real)              # Probability that real samples are real
            D_fake = D(x_fake)              # Probability that fake samples are real
            # BCE form of loss_D = -[log(D_real) + log(1 - D_fake)]
            loss_D = loss_function(D_real, real_labels) + loss_function(D_fake, fake_labels)
            optimizer_D.zero_grad()
            loss_D.backward()
            optimizer_D.step()

        # -------- Train Generator (G) --------
        for _ in range(num_G_updates):
            z = torch.randn(n, latent_dim)  # Sample new noise
            x_fake = G(z)                   # Generate new fake samples
            D_fake = D(x_fake)
            # BCE form of loss_G = -log(D_fake): G wants D to label fakes as real
            loss_G = loss_function(D_fake, real_labels)
            optimizer_G.zero_grad()
            loss_G.backward()
            optimizer_G.step()

    # Print the last batch's loss values for monitoring
    print(epoch, loss_D.item(), loss_G.item())
Training Summary
Train D (Discriminator): classifies real vs. fake images, uses Binary Cross-Entropy (BCE) loss, and is updated to improve its classification.
Train G (Generator): generates fake images, tries to fool D into thinking fake images are real, and is updated based on D's feedback.
Repeat the process until convergence: G improves at generating realistic images while D improves at distinguishing real from fake.
DCGAN
Deep convolutional GAN (DCGAN): a basic GAN in which the generator and discriminator are deep convolutional networks (convnets).
Key Features:
Convolutions instead of fully connected layers, to better capture spatial structure.
Batch Normalization to stabilize training.
LeakyReLU in the discriminator to allow gradient flow.
Tanh activation for output images.
Example: for generating 64×64 images:
G: Input (100-D noise) → ConvTranspose (upsampling) → Tanh Output (64×64×3)
D: Input (64×64×3) → Convolutions → Sigmoid Output (real/fake)
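The 64×64 example above could be sketched in PyTorch roughly as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the channel multiplier `BASE`, the kernel size 4, the helper names, and the ReLU activations in the generator are choices made here for illustration. PyTorch stores images as channels-first, so 64×64×3 appears as (3, 64, 64).

```python
import torch
import torch.nn as nn

LATENT_DIM = 100  # size of the noise vector, as in the example above
BASE = 16         # channel multiplier (assumption, kept small for speed)

def make_generator():
    """Noise (N, 100, 1, 1) -> image (N, 3, 64, 64) with values in [-1, 1]."""
    def up(c_in, c_out, stride, pad):
        return [nn.ConvTranspose2d(c_in, c_out, 4, stride, pad, bias=False),
                nn.BatchNorm2d(c_out), nn.ReLU(inplace=True)]
    return nn.Sequential(
        *up(LATENT_DIM, BASE * 8, 1, 0),                   # 1x1   -> 4x4
        *up(BASE * 8, BASE * 4, 2, 1),                     # 4x4   -> 8x8
        *up(BASE * 4, BASE * 2, 2, 1),                     # 8x8   -> 16x16
        *up(BASE * 2, BASE, 2, 1),                         # 16x16 -> 32x32
        nn.ConvTranspose2d(BASE, 3, 4, 2, 1, bias=False),  # 32x32 -> 64x64
        nn.Tanh(),
    )

def make_discriminator():
    """Image (N, 3, 64, 64) -> probability of being real, shape (N, 1)."""
    def down(c_in, c_out, batch_norm=True):
        layers = [nn.Conv2d(c_in, c_out, 4, 2, 1, bias=False)]
        if batch_norm:
            layers.append(nn.BatchNorm2d(c_out))
        layers.append(nn.LeakyReLU(0.2, inplace=True))
        return layers
    return nn.Sequential(
        *down(3, BASE, batch_norm=False),             # 64x64 -> 32x32
        *down(BASE, BASE * 2),                        # 32x32 -> 16x16
        *down(BASE * 2, BASE * 4),                    # 16x16 -> 8x8
        *down(BASE * 4, BASE * 8),                    # 8x8   -> 4x4
        nn.Conv2d(BASE * 8, 1, 4, 1, 0, bias=False),  # 4x4   -> 1x1
        nn.Sigmoid(),
        nn.Flatten(),                                 # (N, 1, 1, 1) -> (N, 1)
    )

G, D = make_generator(), make_discriminator()
z = torch.randn(2, LATENT_DIM, 1, 1)
fake_images = G(z)
scores = D(fake_images)
print(fake_images.shape, scores.shape)  # torch.Size([2, 3, 64, 64]) torch.Size([2, 1])
```

Each stride-2 transposed convolution doubles the spatial size, and each stride-2 convolution halves it, which is how the two networks mirror each other.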
DCGAN
First, we'll develop a discriminator model that takes a candidate image (real or synthetic) as input and classifies it into one of two classes: "generated image" or "real image that comes from the training set."
Next, we'll develop a generator model that turns a vector from the latent space (sampled at random during training) into a candidate image.
Finally, we'll set up the GAN, which chains the generator and the discriminator. When trained, this model moves the generator in a direction that improves its ability to fool the discriminator.
Major Applications
Image Generation & Enhancement
Application: GANs can generate high-quality, realistic images from noise.
DeepFake: GANs can synthesize realistic human faces and videos.
DeepFake
Major Applications
Image Generation & Enhancement
StyleGAN: generates high-resolution, human-like portraits.
StyleGAN
Training StyleGAN
Mapping Latent Space: a random noise vector ($z$) is first mapped to an intermediate latent space ($W$) using a fully connected neural network, which helps disentangle features.
Style-Based Synthesis: the mapped latent code ($w$) is injected into different layers of the generator using Adaptive Instance Normalization (AdaIN) to control features at multiple levels.
Progressive Growing: training starts with low-resolution images (e.g., 4×4) and progressively increases the resolution (up to 1024×1024), improving stability and quality.
Discriminator Training: a discriminator learns to differentiate between real and fake images while the generator continuously improves to create more realistic images.
Stochastic Noise Injection: random noise is added at different stages to introduce fine details (e.g., hair strands, skin texture) without affecting the overall image structure.
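The mapping network, AdaIN, and noise injection steps can be illustrated with a small NumPy sketch. Everything here is a toy stand-in: the sizes, the random (untrained) weights, the `style_affine` projection, and the fixed noise strength of 0.1 are assumptions made for illustration, not StyleGAN's actual configuration. AdaIN normalizes each feature map to zero mean and unit variance, then rescales and shifts it with values derived from $w$.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, CHANNELS = 16, 8  # toy sizes (assumptions)

def leaky_relu(h, slope=0.2):
    return np.where(h > 0, h, slope * h)

def mapping_network(z, layers):
    """Fully connected stack that maps noise z to the intermediate latent w."""
    w = z
    for weight in layers:
        w = leaky_relu(w @ weight)
    return w

def adain(x, scale, bias, eps=1e-5):
    """Adaptive Instance Normalization for x of shape (channels, height, width)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    normalized = (x - mean) / (std + eps)
    return scale[:, None, None] * normalized + bias[:, None, None]

# Hypothetical random weights stand in for trained parameters
mapping_layers = [rng.standard_normal((LATENT_DIM, LATENT_DIM)) / np.sqrt(LATENT_DIM)
                  for _ in range(3)]
style_affine = rng.standard_normal((LATENT_DIM, 2 * CHANNELS)) / np.sqrt(LATENT_DIM)

z = rng.standard_normal(LATENT_DIM)
w = mapping_network(z, mapping_layers)           # z -> w
style = w @ style_affine                         # w -> per-channel scale and bias
scale, bias = 1.0 + style[:CHANNELS], style[CHANNELS:]

features = rng.standard_normal((CHANNELS, 4, 4))            # feature maps at one layer
features = features + 0.1 * rng.standard_normal((1, 4, 4))  # stochastic noise injection
styled = adain(features, scale, bias)
print(styled.shape)  # (8, 4, 4)
```

After AdaIN, each channel's mean equals its style bias and its standard deviation equals the magnitude of its style scale, which is how $w$ controls the statistics of every layer.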
Major Applications
Image Generation & Enhancement
Super-Resolution GAN (SRGAN): enhances low-resolution images into high-resolution ones.
SRGAN
Training SRGAN
Generator Training: the generator takes a low-resolution (LR) image as input and learns to upsample it to a high-resolution (HR) image using deep convolutional layers.
Perceptual Loss Optimization: instead of only minimizing pixel-wise differences, SRGAN uses a perceptual loss, which combines a content loss (based on feature maps from a pretrained VGG network) with an adversarial loss that makes the generated image look more realistic.
Discriminator Training: a discriminator network is trained to differentiate between real high-resolution images and generated ones, pushing the generator to produce more photo-realistic outputs.
Adversarial Training: the generator and discriminator are trained in a GAN framework, where the generator improves by fooling the discriminator and the discriminator refines its ability to detect fake images.
Fine Detail Enhancement: by combining pixel, perceptual, and adversarial losses, SRGAN generates super-resolved images with sharper textures and more realistic details than traditional upscaling methods.
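The perceptual loss described above can be sketched as a content term plus a weighted adversarial term. The sketch below is illustrative only: `toy_features` is a hypothetical stand-in for VGG feature maps (simple image differences), the discriminator score `d_sr` is a made-up value, and the adversarial weight of 1e-3 should be treated as an assumed example value.

```python
import numpy as np

def toy_features(image):
    """Hypothetical stand-in for pretrained VGG feature maps:
    horizontal and vertical pixel differences of a 2-D image."""
    dx = image[:, 1:] - image[:, :-1]
    dy = image[1:, :] - image[:-1, :]
    return dx, dy

def content_loss(sr, hr, feature_fn=toy_features):
    """Mean squared error between feature maps of the SR and HR images."""
    return float(sum(np.mean((fs - fh) ** 2)
                     for fs, fh in zip(feature_fn(sr), feature_fn(hr))))

def adversarial_loss(d_sr, eps=1e-8):
    """-log D(G(LR)): small when the discriminator believes the SR image is real."""
    return float(-np.log(d_sr + eps))

def perceptual_loss(sr, hr, d_sr, adv_weight=1e-3):
    """Content loss plus weighted adversarial loss."""
    return content_loss(sr, hr) + adv_weight * adversarial_loss(d_sr)

rng = np.random.default_rng(0)
hr = rng.random((8, 8))                          # "ground truth" high-resolution image
sr = hr + 0.1 * rng.standard_normal((8, 8))      # imperfect super-resolved output
d_sr = 0.3                                       # hypothetical discriminator score for sr

loss_imperfect = perceptual_loss(sr, hr, d_sr)
loss_perfect = perceptual_loss(hr, hr, d_sr)
print(loss_imperfect > loss_perfect)  # True
```

In a real setup, the feature function would be a frozen pretrained network and `d_sr` would come from the trained discriminator.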
Major Applications
Data Augmentation for AI Models
Application: GANs create synthetic training data when real data is scarce.
Examples:
Medical Imaging: GANs generate synthetic medical images (e.g., MRIs, X-rays) to train models with fewer privacy concerns than sharing patient data.
Autonomous Driving: GANs simulate realistic driving scenarios for training self-driving AI.
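Major Applications
Text-to-Image: generative models can transform textual descriptions into images.
DALL·E: converts text descriptions into images. Note that DALL·E itself is not a GAN: its first version is an autoregressive transformer over discrete image tokens. GAN-based text-to-image models include StackGAN.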
DALL·E Architecture
DALL·E Learning
1. Input-Output Pairs: DALL·E is presented with an image-text pair. The image acts as the desired output for the given text.
2. Prediction: based on its current parameters, DALL·E tries to generate an image from the text.
3. Error Calculation: the difference between DALL·E's generated image and the actual image (from the dataset) is measured. This difference is termed the "error" or "loss."
4. Backpropagation: using this error, the model adjusts its internal parameters to reduce the error for subsequent predictions.
5. Iteration: steps 2 to 4 are repeated millions of times, refining the model with each iteration.
Major Applications
Image-to-Image Translation: modify image styles.
Examples:
CycleGAN: transforms images from one domain to another without requiring paired training examples.
Pix2Pix: converts sketches into realistic images, trained on paired examples.
Major Limitations
Mode Collapse
Problem: the generator produces only a limited variety of outputs instead of capturing the full diversity of the training data.
Example: if training on human face images, the generator might repeatedly produce similar-looking faces instead of diverse individuals.
Why? The generator learns to fool the discriminator with a few samples and stops exploring new variations.
Major Limitations
Training Instability
Problem: GANs are difficult to train and often fail to converge stably, so they may not generate meaningful outputs.
Example: the discriminator may become too strong, preventing the generator from learning, or vice versa, leading to poor-quality samples.
Why? In the min-max game between generator and discriminator, a discriminator that confidently rejects every fake gives the generator vanishing gradients, making training unstable.
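The vanishing-gradient effect can be made concrete by comparing the two generator losses as a function of the discriminator's pre-activation $a$, where $D = \sigma(a)$. The sketch below uses the closed-form derivatives $\frac{d}{da}\log(1 - \sigma(a)) = -\sigma(a)$ and $\frac{d}{da}\left[-\log \sigma(a)\right] = \sigma(a) - 1$. The function names and the sample values of $a$ are chosen here for illustration.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_saturating(a):
    """Derivative of log(1 - sigmoid(a)) with respect to a."""
    return -sigmoid(a)

def grad_non_saturating(a):
    """Derivative of -log(sigmoid(a)) with respect to a."""
    return sigmoid(a) - 1.0

# a = -6: D is almost certain the sample is fake; a = 0: D is undecided
for a in (-6.0, 0.0, 2.0):
    print(a, round(sigmoid(a), 4),
          round(grad_saturating(a), 4), round(grad_non_saturating(a), 4))
```

When the discriminator is confident that a sample is fake (very negative $a$), the gradient of $\log(1 - D)$ is nearly zero, while the gradient of $-\log D$ stays close to $-1$. This is one reason the non-saturating loss is often preferred in practice.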
Major Limitations
Lack of Interpretability & Evaluation
Problem: there is no single clear metric to evaluate GAN performance objectively.
Example: unlike classification models, which can be scored by accuracy, evaluating how "realistic" a generated image is remains partly subjective.
Why? Metrics like Inception Score (IS) and Fréchet Inception Distance (FID) exist but do not always align with human perception.
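To show what FID measures, the sketch below computes the Fréchet distance between two sets of feature vectors, each modeled as a Gaussian: $\lVert \mu_a - \mu_b \rVert^2 + \mathrm{Tr}\left(\Sigma_a + \Sigma_b - 2(\Sigma_a \Sigma_b)^{1/2}\right)$. In the real metric the features come from a pretrained Inception network; here random vectors are used as a stand-in, which is an assumption for illustration. The trace of the matrix square root is obtained from the eigenvalues of $\Sigma_a \Sigma_b$.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets
    of shape (num_samples, feature_dim)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of square roots of the eigenvalues
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(np.sum((mu_a - mu_b) ** 2)
                 + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
real_feats = rng.standard_normal((500, 4))  # stand-in for features of real images
shifted_feats = real_feats + 3.0            # same spread, mean shifted by 3 per dimension

same = frechet_distance(real_feats, real_feats)
shifted = frechet_distance(real_feats, shifted_feats)
print(same, shifted)  # near 0 for identical sets; near 3^2 * 4 = 36 for the shifted set
```

A lower value means the generated features are statistically closer to the real ones, but it says nothing about whether individual images look convincing to a person.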