# xNoise

Understanding Noise in Text-to-Image Generation
A PyTorch implementation of CLIP-guided Denoising Diffusion Probabilistic Models (DDPM) focused on explainability. This project analyzes how diffusion models encode semantic information in noise patterns.
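For orientation: a DDPM's forward process corrupts a clean image x_0 with Gaussian noise according to a fixed schedule, and the network is trained to predict that noise so it can be removed step by step. A minimal sketch of the forward (noising) step, using standard DDPM notation rather than this repo's code:

```python
import torch

def q_sample(x0, t, alphas_cumprod):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) x_0, (1 - a_bar_t) I).
    `alphas_cumprod` holds the cumulative products a_bar_t of the noise
    schedule; this is textbook DDPM, not the repo's implementation."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)  # broadcast over (B, C, H, W)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    return x_t, noise  # the UNet learns to recover `noise` from (x_t, t)
```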
## Key Features
- **Explainability-Focused**: Visualize and analyze noise patterns during generation
- **CLIP-Guided**: Text-conditioned image generation using CLIP embeddings
- **Classifier-Free Guidance**: Adjustable guidance strength for semantic control (see the sketch after this list)
- **Educational**: Complete tutorial notebook with theory and implementation
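
For background, classifier-free guidance computes two noise predictions per step, one unconditional and one conditioned on the CLIP text embedding, and extrapolates from the first toward the second. A minimal sketch of the standard formula (the function signature and `cond` argument are hypothetical, not this repo's API):

```python
import torch

def guided_noise(model, x_t, t, text_emb, guidance_scale=7.5):
    """Classifier-free guidance. guidance_scale = 1 recovers the purely
    conditional prediction; larger values push samples harder toward the
    text condition at some cost in diversity. Hypothetical model API."""
    eps_uncond = model(x_t, t, cond=None)      # unconditional pass
    eps_cond = model(x_t, t, cond=text_emb)    # CLIP-conditioned pass
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```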

## Quick Start
```bash
# Install
pip install -e .

# Train on Tiny ImageNet
python examples/train.py --config config/model_default.yaml

# Explore the analysis notebook
jupyter notebook examples/explain_and_visualize.ipynb
```
## Project Structure
```
├── examples/
│   ├── explain_and_visualize.ipynb   # Main tutorial (START HERE)
│   ├── train_model.ipynb             # Training walkthrough
│   └── train.py                      # Training script
├── src/ddpm_clip/
│   ├── models/                       # UNet, DDPM, EMA
│   ├── data/                         # CLIP dataset and preprocessing
│   └── utils/                        # Visualization and config utilities
└── config/                           # Model configurations (small/default/large)
```
## Training
The training script supports multiple configurations and automatic checkpointing:
```bash
# Basic training
python examples/train.py --config config/model_default.yaml

# Skip CLIP extraction (if already done)
python examples/train.py --config config/model_small.yaml --skip-clip-extraction

# Disable animation generation
python examples/train.py --config config/model_large.yaml --no-animation
```
Training automatically resumes from the latest checkpoint if one exists.
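
The resume pattern amounts to loading the newest checkpoint and restoring model and optimizer state before continuing. A sketch of that logic (directory layout and state-dict keys are assumptions, not necessarily what `train.py` uses):

```python
import glob
import os
import torch

def resume_if_possible(model, optimizer, ckpt_dir="checkpoints"):
    """Restore the most recent checkpoint, if any, and return the epoch
    to resume from. Illustrative only; the script's layout may differ."""
    ckpts = sorted(glob.glob(os.path.join(ckpt_dir, "*.pt")), key=os.path.getmtime)
    if not ckpts:
        return 0  # no checkpoint: start fresh
    state = torch.load(ckpts[-1], map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["epoch"] + 1  # continue with the next epoch
```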
## Analysis
The `explain_and_visualize.ipynb` notebook provides:
- Theoretical foundations of diffusion models
- Noise pattern analysis and interpretation
- Timestep attribution to identify critical denoising steps (sketched below)
- Guidance visualization to understand text conditioning effects
- Generation animations and trajectory analysis
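
To give a flavor of the timestep-attribution idea: a simple proxy for how critical a denoising step is, is how much it changes the running sample; peaks in this curve mark timesteps where most of the image structure gets decided. A sketch of that measurement (illustrative; the notebook's actual attribution method may be more involved):

```python
import torch

@torch.no_grad()
def per_step_change(trajectory):
    """`trajectory` is a list of intermediate samples [x_T, ..., x_0]
    saved during a reverse-diffusion run (hypothetical input). Scores
    each step by the mean L2 change it applies to the batch."""
    scores = []
    for x_prev, x_next in zip(trajectory[:-1], trajectory[1:]):
        scores.append((x_next - x_prev).flatten(1).norm(dim=1).mean().item())
    return scores  # plot against timestep to spot critical steps
```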
## Dataset
Tested on Tiny ImageNet (200 classes, 64×64 images). CLIP embeddings are automatically extracted during training or can be precomputed.
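
Precomputing the embeddings trades a little disk space for faster training, since CLIP never has to run inside the training loop. A sketch of text-embedding extraction with the Hugging Face CLIP wrapper (the repo may load CLIP differently):

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def embed_texts(class_names):
    """Encode label strings (e.g. Tiny ImageNet class names) once and
    cache the result; illustrative, not the repo's extraction code."""
    inputs = processor(text=class_names, return_tensors="pt", padding=True)
    emb = model.get_text_features(**inputs)
    return emb / emb.norm(dim=-1, keepdim=True)  # unit-normalize as CLIP does

torch.save(embed_texts(["goldfish", "tabby cat"]), "clip_text_embeddings.pt")
```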
## License

MIT License - see `LICENSE`.
## Related projects

- **XAI4SFM**: Understanding Latent Space Dynamics and Noise Patterns in Generative AI for Scientific Applications