# Conditional Diffusion Image Generation
A conditional DDPM and latent diffusion model for high-quality image synthesis.
## Overview
This project explores Denoising Diffusion Probabilistic Models (DDPMs) for conditional image generation, connecting the probabilistic theory of diffusion processes to high-fidelity visual synthesis.
## Technical Details
- Conditional DDPM: Implemented a conditional DDPM in PyTorch incorporating (see the conditioning sketch after this list):
  - Sinusoidal time embeddings.
  - Learnable label embeddings.
  - SpatialTransformer attention modules for injecting the conditioning signal.
- Results: Trained on MNIST, achieving a Structural Similarity Index (SSIM) of 0.98 with a 1,000-step denoising schedule (see the sampling sketch below).
- Latent Diffusion Extension: Extended the architecture to a Latent Diffusion Model (LDM): diffusion runs in a VAE's latent space, and the VAE decoder maps denoised latents back to pixel space, trained on the AFHQ-cat dataset (see the LDM sketch below).
- Benchmarking: Generated 50,000 samples, achieving a Fréchet Inception Distance (FID) of 13.2, a 2.4-point improvement over the baseline (see the FID sketch below).
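
A minimal sketch of the conditioning components listed above. The embedding width (128), class count (10, matching MNIST), and head count (4) are hypothetical choices for illustration, not values taken from the project's code:

```python
import math
import torch
import torch.nn as nn

def sinusoidal_time_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Map integer timesteps t of shape [B] to [B, dim] sinusoidal features."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, device=t.device) / half)
    args = t.float()[:, None] * freqs[None, :]                    # [B, half]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)  # [B, dim]

class ConditionEmbedding(nn.Module):
    """Sum of the sinusoidal time embedding and a learnable label embedding."""
    def __init__(self, dim: int = 128, n_classes: int = 10):
        super().__init__()
        self.label_emb = nn.Embedding(n_classes, dim)  # learnable label embedding
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, t: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        emb = sinusoidal_time_embedding(t, self.label_emb.embedding_dim) + self.label_emb(y)
        return self.mlp(emb)

class SpatialCrossAttention(nn.Module):
    """Simplified stand-in for a SpatialTransformer block: flattened feature-map
    pixels attend to the conditioning vector via cross-attention.
    (channels must be divisible by heads.)"""
    def __init__(self, channels: int, cond_dim: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, kdim=cond_dim,
                                          vdim=cond_dim, batch_first=True)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # [B, H*W, C] pixel tokens
        ctx = cond[:, None, :]                 # [B, 1, cond_dim] conditioning context
        out, _ = self.attn(self.norm(tokens), ctx, ctx)
        return (tokens + out).transpose(1, 2).reshape(b, c, h, w)
```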
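
One way the 1,000-step reverse process could look. This is a hedged sketch assuming a linear beta schedule and a noise-prediction network `eps_model(x, t, y)`; the function name, schedule endpoints, and default shape (MNIST-sized, 28x28) are illustrative:

```python
import torch

@torch.no_grad()
def ddpm_sample(eps_model, y, shape=(16, 1, 28, 28), T=1000, device="cpu"):
    """Ancestral DDPM sampling: start from Gaussian noise and denoise for T steps."""
    betas = torch.linspace(1e-4, 0.02, T, device=device)  # linear schedule (assumed)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape, device=device)  # x_T ~ N(0, I)
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        eps = eps_model(x, t_batch, y)  # predicted noise, conditioned on labels y
        # Posterior mean: (x - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + betas[t].sqrt() * noise  # sigma_t^2 = beta_t variance choice
    return x
```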
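
The LDM extension can reuse the same reverse loop in latent space and decode the result with the VAE. The `vae.decode` interface, the latent shape `(4, 32, 32)`, and the `0.18215` scaling factor below are assumptions for illustration (the latter is the convention used by Stable Diffusion-style VAEs), not the project's actual API:

```python
import torch

@torch.no_grad()
def ldm_sample(eps_model, vae, y, n=16, T=1000, device="cpu"):
    """Run diffusion in latent space, then decode latents to pixel space."""
    # ddpm_sample is the function from the sampling sketch above.
    latents = ddpm_sample(eps_model, y, shape=(n, 4, 32, 32), T=T, device=device)
    images = vae.decode(latents / 0.18215)  # undo latent scaling, decode to [-1, 1]
    return (images.clamp(-1, 1) + 1) / 2    # map to [0, 1] for viewing/saving
```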
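
The FID benchmark could be reproduced with `torchmetrics` (one possible tool; the project's actual evaluation pipeline may differ, and this metric needs the image extra, e.g. `pip install "torchmetrics[image]"`). The default feature extractor expects uint8 images in `[0, 255]` with shape `[B, 3, H, W]`:

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

def compute_fid(real_images: torch.Tensor, fake_images: torch.Tensor) -> float:
    """FID between two uint8 image tensors using Inception pool features."""
    fid = FrechetInceptionDistance(feature=2048)
    # Feed images in batches to bound memory; 50,000 samples won't fit at once.
    for batch in real_images.split(256):
        fid.update(batch, real=True)
    for batch in fake_images.split(256):
        fid.update(batch, real=False)
    return fid.compute().item()
```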
## Tech Stack
- PyTorch
- Generative AI (Diffusion, VAEs)
- Computer Vision