1. DCGAN (Deep Convolutional GAN)
Definition
DCGAN is a GAN architecture that uses deep convolutional neural networks instead of fully connected layers for both the Generator and the Discriminator. This significantly improves image generation quality and training stability compared to vanilla GANs.
Architecture
Generator
Input: random noise vector ( z \sim \mathcal{N}(0,1) )
Transposed Convolutions (ConvTranspose)
Batch Normalization
ReLU activations
Output: generated image
Discriminator
Convolution layers
LeakyReLU activations
Batch Normalization
Sigmoid output (real vs fake probability)
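The notebook's exact implementation is not reproduced in this summary; the architecture above can be sketched in PyTorch as follows (layer widths and the latent dimension are assumed, chosen for 3×32×32 CIFAR-10 images):

```python
import torch
import torch.nn as nn

LATENT_DIM = 100  # assumed noise dimension

class Generator(nn.Module):
    """Maps a noise vector to a 3x32x32 image via transposed convolutions."""
    def __init__(self, z_dim=LATENT_DIM):
        super().__init__()
        self.net = nn.Sequential(
            # z: (N, z_dim, 1, 1) -> (N, 256, 4, 4)
            nn.ConvTranspose2d(z_dim, 256, 4, 1, 0, bias=False),
            nn.BatchNorm2d(256), nn.ReLU(True),
            # -> (N, 128, 8, 8)
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128), nn.ReLU(True),
            # -> (N, 64, 16, 16)
            nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64), nn.ReLU(True),
            # -> (N, 3, 32, 32); tanh keeps pixel values in [-1, 1]
            nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

class Discriminator(nn.Module):
    """Strided convolutions (no pooling) down to one real/fake probability."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, 4, 2, 1, bias=False),
            nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 1, 4, 1, 0, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x).view(-1)

z = torch.randn(4, LATENT_DIM)
fake = Generator()(z)          # (4, 3, 32, 32)
score = Discriminator()(fake)  # (4,) probabilities in (0, 1)
```

Note the DCGAN conventions at work: BatchNorm plus ReLU in the generator, LeakyReLU in the discriminator, and strided (transposed) convolutions instead of pooling/upsampling layers.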
Objective Function
DCGAN keeps the standard GAN minimax objective:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$

Loss Functions
The discriminator maximizes ( \log D(x) + \log(1 - D(G(z))) ). In practice the generator maximizes ( \log D(G(z)) ) (the non-saturating loss) rather than minimizing ( \log(1 - D(G(z))) ), which gives stronger gradients early in training.
Why DCGAN Works
CNNs capture spatial structure
BatchNorm stabilizes gradients
Strided convolutions replace pooling
Better feature hierarchy than dense GANs
Example
Dataset: MNIST
Input: random noise
Output: digit-like images
Limitation: cannot choose which digit is generated
Applications
Image synthesis (faces, objects)
Super-resolution
Data augmentation
Artistic image generation
Limitations
No control over generated class
Training instability
Mode collapse
2. Conditional GAN (cGAN)
Motivation
DCGAN generates images randomly without control. Conditional GANs introduce explicit conditioning information to guide generation.
Key Idea
Both Generator and Discriminator are conditioned on auxiliary information ( y ) such as:
Class labels
Attributes
Text embeddings
Architecture Change
Conditioning is implemented by:
Concatenating labels with noise
Embedding labels and injecting into CNN layers
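Both conditioning mechanisms can be illustrated with a short PyTorch sketch (embedding sizes are assumptions, not the notebook's values): the generator concatenates an embedded label with the noise vector, while the discriminator receives the label as an extra image channel.

```python
import torch
import torch.nn as nn

NUM_CLASSES, Z_DIM, EMB_DIM = 10, 100, 50  # assumed sizes for CIFAR-10

# Generator side: embed the class label and concatenate it with the noise.
g_label_emb = nn.Embedding(NUM_CLASSES, EMB_DIM)
z = torch.randn(8, Z_DIM)
y = torch.randint(0, NUM_CLASSES, (8,))
g_input = torch.cat([z, g_label_emb(y)], dim=1)  # (8, Z_DIM + EMB_DIM)

# Discriminator side: embed the label as a 32x32 map and stack it
# with the 3 RGB channels, giving a 4-channel conditioned input.
d_label_emb = nn.Embedding(NUM_CLASSES, 32 * 32)
img = torch.randn(8, 3, 32, 32)
label_map = d_label_emb(y).view(8, 1, 32, 32)
d_input = torch.cat([img, label_map], dim=1)     # (8, 4, 32, 32)
```

The rest of each network is unchanged from DCGAN; only the input layers grow to accept the conditioned tensors.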
Objective Function
The cGAN objective conditions both networks on ( y ):

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x \mid y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z \mid y) \mid y))]$$
Example
Dataset: CIFAR-10
| Label | Generated Class |
|---|---|
| 0 | Airplane |
| 1 | Automobile |
| 2 | Bird |
| 3 | Cat |
Result: Controlled image generation by class label
Applications
Class-specific image generation
Text-to-image synthesis
Medical image generation by disease type
Speech and voice synthesis
Limitations
Still uses Jensen–Shannon divergence
Training instability persists
Mode collapse not fully resolved
3. WGAN-GP (Wasserstein GAN with Gradient Penalty)
Motivation
Standard GAN losses provide weak gradients when real and generated distributions do not overlap. WGAN replaces divergence-based loss with Wasserstein distance, giving meaningful gradients.
Wasserstein Distance

$$W(p_r, p_g) = \inf_{\gamma \in \Pi(p_r, p_g)} \mathbb{E}_{(x, y) \sim \gamma}\left[\lVert x - y \rVert\right]$$

This measures the minimum cost of transporting probability mass to transform one distribution into the other, taken over all joint distributions ( \gamma ) with marginals ( p_r ) and ( p_g ).
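For one-dimensional empirical distributions the Wasserstein-1 distance can be computed exactly; a tiny SciPy example (not part of the notebook) makes the "transport cost" intuition concrete:

```python
from scipy.stats import wasserstein_distance

# Two empirical 1-D distributions with equal weights:
# moving each unit of mass from {0, 1} onto {2, 3} costs 2 on average.
d = wasserstein_distance([0.0, 1.0], [2.0, 3.0])
print(d)  # 2.0
```

In high dimensions this infimum is intractable to compute directly, which is why WGAN estimates it with a critic network via the Kantorovich–Rubinstein duality.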
Architectural Changes
Discriminator becomes Critic
No sigmoid activation
Outputs real-valued score
Enforces 1-Lipschitz constraint
What Is Critic Loss?
The critic does not classify real vs. fake. Instead, it assigns higher scores to real samples and lower scores to fake samples.
| Discriminator | Critic |
|---|---|
| Outputs probability | Outputs real number |
| Uses sigmoid | No sigmoid |
| Binary classifier | Distance estimator |
| Cross-entropy loss | Wasserstein loss |
Why Does Critic Loss Work Better?
Approximates Wasserstein distance
Provides meaningful gradients even when distributions do not overlap
Training loss correlates with sample quality
Reduces mode collapse
Loss Functions
Critic loss (lower is better for the critic):

$$L_C = \mathbb{E}_{\tilde{x} \sim p_g}[C(\tilde{x})] - \mathbb{E}_{x \sim p_r}[C(x)] + \lambda \, \mathbb{E}_{\hat{x}}\left[\left(\lVert \nabla_{\hat{x}} C(\hat{x}) \rVert_2 - 1\right)^2\right]$$

Generator loss:

$$L_G = -\mathbb{E}_{\tilde{x} \sim p_g}[C(\tilde{x})]$$

where ( \hat{x} ) is sampled uniformly along straight lines between real and generated samples.
Gradient Penalty
Ensures smoothness and Lipschitz continuity without weight clipping.
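A minimal PyTorch sketch of the gradient penalty and critic loss (the toy linear critic and batch sizes are illustrative assumptions, not the notebook's model):

```python
import torch
import torch.nn as nn

def gradient_penalty(critic, real, fake, lam=10.0):
    """WGAN-GP penalty: push the critic's gradient norm toward 1
    at points interpolated between real and fake samples."""
    eps = torch.rand(real.size(0), 1, 1, 1)               # one mix ratio per sample
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(x_hat)
    grads = torch.autograd.grad(outputs=scores.sum(), inputs=x_hat,
                                create_graph=True)[0]
    norms = grads.view(grads.size(0), -1).norm(2, dim=1)  # per-sample grad norm
    return lam * ((norms - 1) ** 2).mean()

# Toy critic: one linear layer over flattened 3x32x32 images, no sigmoid.
critic = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))
real = torch.randn(4, 3, 32, 32)
fake = torch.randn(4, 3, 32, 32)

gp = gradient_penalty(critic, real, fake)
# Full critic loss: E[C(fake)] - E[C(real)] + gradient penalty
loss_c = critic(fake).mean() - critic(real).mean() + gp
```

`create_graph=True` is what lets the penalty itself be backpropagated through when `loss_c.backward()` is called during training.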
Example
Dataset: CelebA
Results:
Stable training
Smooth loss curves
High-quality facial images
Minimal mode collapse
Applications
High-resolution image generation
Video synthesis
Audio and speech generation
Scientific and physics simulations
4. Evolution Comparison
| Feature | DCGAN | Conditional GAN | WGAN-GP |
|---|---|---|---|
| CNN-based | Yes | Yes | Yes |
| Conditional control | No | Yes | No (can be added) |
| Stable loss | No | No | Yes |
| Mode collapse | High | Medium | Low |
| Training robustness | Low | Medium | High |
5. Practical Example Pipeline
CIFAR-10 Vehicle Images
DCGAN → random vehicles
Conditional GAN → generate only trucks
WGAN-GP → sharper, stable truck images
6. When to Use Which GAN
| Scenario | Recommended Model |
|---|---|
| Learning GAN fundamentals | DCGAN |
| Controlled generation | Conditional GAN |
| Production training | WGAN-GP |
| High-resolution synthesis | Conditional WGAN-GP |
7. Industry Applications
Healthcare: rare disease data generation
Autonomous driving: scenario simulation
Gaming: texture and asset creation
Retail: product image synthesis
Media: animation and visual effects
8. Summary Table
| GAN Type | Loss Type | Key Parameters | Stability |
|---|---|---|---|
| DCGAN | Cross-Entropy | (D(x), G(z)) | Low |
| cGAN | Cross-Entropy + Condition | (D(x,y), G(z,y)) | Medium |
| WGAN-GP | Wasserstein + GP | (C(x), \lambda, \nabla) | High |
GAN-based Image Generation using CIFAR-10
DCGAN → Conditional GAN → WGAN-GP
Dataset
CIFAR-10 (Canadian Institute for Advanced Research)
Classes
Vehicles: airplane, automobile, ship, truck
Animals: bird, cat, deer, dog, frog, horse
This notebook is fully runnable and generates images during training.
1. Imports & Global Configuration
2. Load CIFAR-10 Dataset
3. Visualize Real CIFAR-10 Images
4. DCGAN (Baseline Image GAN)
DCGAN uses convolutional layers for stable image generation.
DCGAN Training & Output
5. Conditional GAN (Vehicle vs Animal)
Generator and Discriminator are conditioned on CIFAR-10 labels.
Conditional Generation Demo
Generate images by specifying class labels:
Vehicle classes: 0,1,8,9
Animal classes: 2–7
6. WGAN-GP (Industry-Preferred Stable GAN)
Uses Wasserstein loss + Gradient Penalty.
WGAN-GP Training & Output
Final Comparison
DCGAN: Good baseline image quality
Conditional GAN: Controlled generation (vehicle vs animal)
WGAN-GP: Most stable and industry-preferred