
GANs: The Art of Adversarial Learning

#deep-learning#generative-ai#gan#neural-networks

Generative Adversarial Networks (GANs), introduced by Goodfellow et al. in 2014, pit two neural networks against each other in a minimax game. The generator creates fake data; the discriminator tries to tell real from fake. Through competition, the generator learns to produce increasingly realistic outputs.

Architecture

GAN Training Loop
==================

Random noise z ──► ┌─────────────┐
                   │  Generator  │──► Fake data
                   │     G(z)    │        │
                   └─────────────┘        │
                                          ▼
                                   ┌──────────────┐
Real data x ──────────────────────►│ Discriminator │──► Real / Fake
                                   │    D(x)       │
                                   └──────────────┘
                                          │
                              ┌───────────┴───────────┐
                              │  D wants to classify   │
                              │  correctly (max)       │
                              │  G wants to fool D     │
                              │  (min)                 │
                              └────────────────────────┘

The objective function:

\min_G \max_D \; \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]
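In practice the generator is usually trained with the non-saturating variant of this objective (maximize log D(G(z)) instead of minimizing log(1 − D(G(z)))), because the original form gives near-zero gradients early in training when the discriminator easily rejects fakes. A minimal PyTorch sketch of one training step; the toy 2-D data and layer sizes are illustrative assumptions, not from any particular paper:

```python
import torch
import torch.nn as nn

# Toy setup: G maps 8-dim noise to 2-dim "data"; D scores 2-dim inputs.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
D = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):
    batch = real.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminator step: maximize log D(x) + log(1 - D(G(z))).
    fake = G(torch.randn(batch, 8)).detach()   # detach: no grads into G
    d_loss = bce(D(real), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step (non-saturating): maximize log D(G(z)),
    # i.e. label the fakes as "real" in the BCE loss.
    g_loss = bce(D(G(torch.randn(batch, 8))), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

real = torch.randn(64, 2) + 3.0   # stand-in "real" data
d_l, g_l = train_step(real)
```

Note the `detach()` in the discriminator step: it blocks gradients from flowing into G while D is being updated, which is what makes the two updates adversarial rather than cooperative.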

GAN Variants

| Variant | Year | Key Innovation | Best For |
|---|---|---|---|
| DCGAN | 2015 | Convolutional architecture for stable training | Image generation baseline |
| WGAN / WGAN-GP | 2017 | Wasserstein loss; gradient penalty (in WGAN-GP) | Training stability |
| Conditional GAN | 2014 | Class-conditioned generation | Controlled output |
| Pix2Pix | 2017 | Paired image-to-image translation | Supervised translation |
| CycleGAN | 2017 | Unpaired image translation via cycle consistency | Style transfer |
| StyleGAN | 2019 | Style-based generator, progressive growing | High-res face synthesis |
| StyleGAN3 | 2021 | Alias-free generation | Video-ready synthesis |
| GigaGAN | 2023 | GAN scaled to ~1B parameters | Text-to-image at scale |

Training Challenges

GANs are notoriously hard to train:

Mode collapse: the generator produces only a few types of outputs that fool the discriminator, ignoring the full data distribution. The WGAN loss function and minibatch discrimination help mitigate this.
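The gradient penalty from WGAN-GP addresses this by penalizing the critic (discriminator) whenever its gradient norm on interpolations between real and fake samples deviates from 1, which keeps the loss surface informative across the data distribution. A sketch with toy shapes; `lam=10.0` follows the WGAN-GP paper's default, the rest is illustrative:

```python
import torch
import torch.nn as nn

def gradient_penalty(critic, real, fake, lam=10.0):
    """WGAN-GP penalty: push the critic's gradient norm toward 1
    on random interpolations between real and fake samples."""
    eps = torch.rand(real.size(0), 1)               # per-sample mix weight
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(critic(x_hat).sum(), x_hat,
                                create_graph=True)[0]
    return lam * ((grads.norm(2, dim=1) - 1.0) ** 2).mean()

# Toy critic over 2-dim data, purely for illustration.
critic = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
gp = gradient_penalty(critic, torch.randn(16, 2), torch.randn(16, 2))
```

The penalty is added to the critic's loss each step; `create_graph=True` is required so the penalty itself can be backpropagated through.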

Training instability: the generator and discriminator must stay in balance. If the discriminator becomes too strong, gradients vanish for the generator. If too weak, the generator gets no useful signal. Techniques: spectral normalization, progressive growing, two-timescale update rule (TTUR).
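Two of these techniques are nearly one-liners in PyTorch: `torch.nn.utils.spectral_norm` wraps a layer so the largest singular value of its weight is kept near 1, and TTUR is simply a higher learning rate for the discriminator than for the generator. The exact rates and layer sizes below are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Spectral normalization bounds each layer's Lipschitz constant,
# keeping the discriminator's gradients well-behaved.
D = nn.Sequential(
    spectral_norm(nn.Linear(2, 64)), nn.LeakyReLU(0.2),
    spectral_norm(nn.Linear(64, 1)),
)
G = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))

# TTUR: the discriminator updates on a faster timescale than the generator.
opt_d = torch.optim.Adam(D.parameters(), lr=4e-4, betas=(0.0, 0.9))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.0, 0.9))

out = D(torch.randn(4, 2))
```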

Evaluation: there is no single loss that correlates with output quality. Common metrics:

  • FID (Fréchet Inception Distance): lower is better, measures distribution similarity
  • IS (Inception Score): higher is better, measures quality and diversity
  • LPIPS: perceptual similarity metric
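FID has a closed form: it is the Fréchet distance between two Gaussians fitted to Inception features of real and generated images, ‖μ₁ − μ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½). Given the feature means and covariances, the distance is a few lines; this sketch omits feature extraction and assumes `scipy` for the matrix square root:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, cov1, mu2, cov2):
    """Fréchet distance between two Gaussians, as used in FID."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(cov1 @ cov2, disp=False)
    if np.iscomplexobj(covmean):          # discard tiny imaginary parts
        covmean = covmean.real
    return diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean)

# Identical distributions should give a distance of (near) zero.
mu, cov = np.zeros(4), np.eye(4)
fid_same = frechet_distance(mu, cov, mu, cov)
fid_shifted = frechet_distance(mu + 1.0, cov, mu, cov)
```

In the real metric, μ and Σ come from pooled Inception-v3 activations over tens of thousands of images; the shifted case above illustrates that the mean term alone already contributes ‖μ₁ − μ₂‖².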

GANs vs Diffusion Models

Diffusion models have largely replaced GANs for image generation since 2022:

| Aspect | GANs | Diffusion Models |
|---|---|---|
| Training | Adversarial (unstable) | Denoising (stable, simple loss) |
| Mode coverage | Prone to mode collapse | Full distribution coverage |
| Sample quality | Excellent (when trained well) | Excellent |
| Sampling speed | Fast (single forward pass) | Slow (many denoising steps) |
| Controllability | Limited without conditioning | Strong (classifier-free guidance) |
| Current status | Niche uses, research | Dominant (DALL-E 3, Stable Diffusion, Midjourney) |

GANs remain relevant for real-time applications (super-resolution, video enhancement) where single-pass inference speed matters, and for discriminator-based techniques in other generative pipelines.

Practical Applications Still Using GANs

  • Super-resolution: ESRGAN and Real-ESRGAN for upscaling images and video
  • Data augmentation: generating synthetic training data for imbalanced classes
  • Anomaly detection: the discriminator as an out-of-distribution detector
  • Image inpainting: filling missing regions with context-aware content
  • Domain adaptation: transferring styles between domains (medical imaging, satellite)
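The anomaly-detection idea can be sketched directly: a discriminator trained on in-distribution data assigns low "real" probability to out-of-distribution inputs, so 1 − D(x) works as an anomaly score. The network below is untrained and purely illustrative of the scoring interface:

```python
import torch
import torch.nn as nn

# Stand-in for a discriminator trained on in-distribution data.
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(),
                  nn.Linear(32, 1), nn.Sigmoid())

def anomaly_score(x):
    # Higher score = D thinks the input "looks fake" = likely anomalous.
    with torch.no_grad():
        return 1.0 - D(x).squeeze(1)

scores = anomaly_score(torch.randn(5, 2))
```

In practice a threshold on this score (calibrated on held-out in-distribution data) flags anomalies.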