TorchDiff is a PyTorch-based library for building and experimenting with diffusion models, inspired by leading research papers.
The TorchDiff 2.0.0 release includes implementations of five major diffusion model families:
- DDPM (Denoising Diffusion Probabilistic Models)
- DDIM (Denoising Diffusion Implicit Models)
- SDE-based Diffusion
- LDM (Latent Diffusion Models)
- UnCLIP (the model powering OpenAI’s DALL·E 2)
These models support both conditional (e.g., text-to-image) and unconditional generation.
TorchDiff is designed with modularity in mind. Each model is broken down into reusable components:
- Forward Diffusion: Adds noise (e.g., ForwardDDPM).
- Reverse Diffusion: Removes noise to recover data (e.g., ReverseDDPM).
- Variance Scheduler: Controls noise schedules (e.g., VarianceSchedulerDDPM).
- Training: Full training pipelines (e.g., TrainDDPM).
- Sampling: Efficient inference and generation (e.g., SampleDDPM).
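To make the roles of the variance scheduler and the forward process concrete, here is a minimal sketch in plain PyTorch. It is not TorchDiff's implementation: the function names `linear_beta_schedule` and `forward_noise`, and the linear schedule from 1e-4 to 2e-2, are assumptions chosen for illustration (the schedule values follow Ho et al., 2020).

```python
import torch

def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 2e-2) -> torch.Tensor:
    """Linearly spaced per-step noise variances beta_1..beta_T (illustrative)."""
    return torch.linspace(beta_start, beta_end, num_steps)

def forward_noise(x0: torch.Tensor, t: torch.Tensor, alpha_bar: torch.Tensor):
    """Sample x_t from q(x_t | x_0) in closed form; returns (x_t, noise)."""
    noise = torch.randn_like(x0)
    scale = alpha_bar[t].sqrt().view(-1, 1, 1, 1)          # sqrt(alpha_bar_t)
    sigma = (1.0 - alpha_bar[t]).sqrt().view(-1, 1, 1, 1)  # sqrt(1 - alpha_bar_t)
    return scale * x0 + sigma * noise, noise

torch.manual_seed(0)
betas = linear_beta_schedule(1000)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal retention

x0 = torch.rand(4, 3, 32, 32) * 2 - 1          # stand-in images in [-1, 1]
t = torch.tensor([0, 250, 500, 999])           # one timestep per image
x_t, noise = forward_noise(x0, t, alpha_bar)
```

At `t = 0` the sample is almost identical to `x0`; at `t = 999` nearly all of the signal has been replaced by Gaussian noise, which is the state the reverse process starts from.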
Additional utilities:
- Noise Predictor: A U-Net-like model with attention and time embeddings.
- Text Encoder: Transformer-based (e.g., BERT) for conditional generation.
- Metrics: Evaluation suite including MSE, PSNR, SSIM, FID, and LPIPS.
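As a rough illustration of the two simplest metrics in that list, MSE and PSNR can be computed directly with PyTorch. This is a sketch and not TorchDiff's metrics API; the function names and the assumption that images lie in [0, 1] (`max_val=1.0`) are mine.

```python
import torch

def mse(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Mean squared error over all pixels."""
    return torch.mean((pred - target) ** 2)

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the target."""
    err = mse(pred, target)
    if err.item() == 0.0:
        return float("inf")  # identical images
    return (10.0 * torch.log10(max_val ** 2 / err)).item()

target = torch.zeros(1, 3, 8, 8)
pred = torch.full((1, 3, 8, 8), 0.1)   # uniform error of 0.1 per pixel
print(mse(pred, target).item())        # about 0.01
print(psnr(pred, target))              # about 20 dB
```

SSIM, FID, and LPIPS need windowed statistics or pretrained feature extractors, so they are not sketched here.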
Here’s a minimal working example that trains a DDPM on CIFAR-10 for one epoch and then samples from it:
import torch
import torch.nn as nn
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
from torchdiff.ddpm import VarianceSchedulerDDPM, ForwardDDPM, ReverseDDPM, TrainDDPM, SampleDDPM
from torchdiff.utils import NoisePredictor
# Dataset (CIFAR10 for demo)
transform = transforms.Compose([
    transforms.Resize(32),
    transforms.ToTensor(),
    # Map each RGB channel from [0, 1] to [-1, 1]
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
train_dataset = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
# Model components
noise_pred = NoisePredictor(in_channels=3)
vs = VarianceSchedulerDDPM(num_steps=1000)
fwd, rev = ForwardDDPM(vs), ReverseDDPM(vs)
# Optimizer & loss
optim = torch.optim.Adam(noise_pred.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()
# Training
trainer = TrainDDPM(
    noise_predictor=noise_pred, forward_diffusion=fwd, reverse_diffusion=rev,
    conditional_model=None, optimizer=optim, objective=loss_fn,
    data_loader=train_loader, max_epochs=1, device="cpu"
)
trainer()
# Sampling
sampler = SampleDDPM(
    reverse_diffusion=rev, noise_predictor=noise_pred,
    image_shape=(32, 32), batch_size=4, in_channels=3, device="cpu"
)
images = sampler()
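Because the training data above is normalized to [-1, 1], generated samples usually need to be mapped back to displayable pixel values. The sketch below assumes the sampler returns a float tensor of shape (batch, channels, height, width) in roughly [-1, 1]; that return format is an assumption, not something documented here, so a random tensor stands in for the real output. `to_uint8` is a hypothetical helper.

```python
import torch

def to_uint8(images: torch.Tensor) -> torch.Tensor:
    """Map a float tensor from [-1, 1] to uint8 pixel values in [0, 255]."""
    images = images.detach().cpu().clamp(-1.0, 1.0)  # guard against overshoot
    images = (images + 1.0) / 2.0                    # [-1, 1] -> [0, 1]
    return (images * 255.0).round().to(torch.uint8)

# Stand-in for `images = sampler()`; replace with the real sampler output.
fake_samples = torch.rand(4, 3, 32, 32) * 2 - 1
pixels = to_uint8(fake_samples)
print(pixels.shape, pixels.dtype)
```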
For detailed examples, check the examples/ directory.
Install from PyPI (recommended):
pip install torchdiff
Or install from source for development:
# Clone repository
git clone https://github.com/LoqmanSamani/TorchDiff.git
cd TorchDiff
# Install dependencies
pip install -r requirements.txt
# Install package
pip install .
Requires Python 3.8+. For GPU acceleration, install a PyTorch build that matches your CUDA version.
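A quick way to check whether the installed PyTorch build can see a GPU is shown below. The resulting string can be passed wherever the examples use `device="cpu"`, assuming those arguments accept the usual PyTorch device strings.

```python
import torch

# Fall back to the CPU when no CUDA-capable GPU (or CUDA build) is available.
device = "cuda" if torch.cuda.is_available() else "cpu"

# torch.version.cuda is None for CPU-only builds.
print(f"PyTorch {torch.__version__}, CUDA build: {torch.version.cuda}, device: {device}")
```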
Paper: Ho et al., 2020, "Denoising Diffusion Probabilistic Models"
DDPMs learn to reverse a gradual noise-adding process to generate high-quality images. TorchDiff provides a modular implementation for both unconditional and conditional (text-guided) generation.
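The training objective behind that description can be sketched as a single optimization step: noise a clean batch to a random timestep, ask the network to predict the noise that was added, and minimize the mean squared error. The code below is a self-contained illustration in plain PyTorch, not TorchDiff's `TrainDDPM`; `TinyNoisePredictor` is a deliberately small hypothetical stand-in for a real U-Net.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNoisePredictor(nn.Module):
    """Hypothetical stand-in for a U-Net: two convolutions plus a learned time bias."""
    def __init__(self, channels: int = 3, num_steps: int = 1000):
        super().__init__()
        self.time_embed = nn.Embedding(num_steps, channels)
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(16, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Add a per-channel bias that depends on the timestep.
        return self.net(x + self.time_embed(t)[:, :, None, None])

torch.manual_seed(0)
num_steps = 1000
betas = torch.linspace(1e-4, 2e-2, num_steps)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

model = TinyNoisePredictor(num_steps=num_steps)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

x0 = torch.rand(8, 3, 32, 32) * 2 - 1          # stand-in batch in [-1, 1]
t = torch.randint(0, num_steps, (8,))          # one random timestep per image
noise = torch.randn_like(x0)
a = alpha_bar[t].sqrt()[:, None, None, None]
s = (1.0 - alpha_bar[t]).sqrt()[:, None, None, None]
x_t = a * x0 + s * noise                       # forward process q(x_t | x_0)

loss = F.mse_loss(model(x_t, t), noise)        # noise-prediction objective
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```

A real training run repeats this step over many batches and epochs.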
Paper: Song et al., 2021, "Denoising Diffusion Implicit Models"
DDIM accelerates sampling by reducing the number of denoising steps while maintaining image quality. TorchDiff supports both conditional and unconditional DDIM generation.
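The speed-up comes from visiting only a subsequence of the training timesteps and using a deterministic update between them. The sketch below shows that update for the eta = 0 case with 50 of 1000 steps. It is an illustration in plain PyTorch, not TorchDiff's sampler; `ddim_step` and the zero-output `eps_model` are hypothetical stand-ins.

```python
import torch

def ddim_step(x_t, eps, ab_t, ab_prev):
    """One deterministic DDIM update (eta = 0) from alpha_bar level ab_t to ab_prev."""
    x0_pred = (x_t - (1.0 - ab_t).sqrt() * eps) / ab_t.sqrt()   # predicted clean image
    return ab_prev.sqrt() * x0_pred + (1.0 - ab_prev).sqrt() * eps

num_steps = 1000
betas = torch.linspace(1e-4, 2e-2, num_steps)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

# 50 roughly evenly spaced timesteps, visited from most noisy to least noisy.
timesteps = torch.linspace(num_steps - 1, 0, 50).long()

def eps_model(x, t):
    """Hypothetical stand-in for a trained noise predictor."""
    return torch.zeros_like(x)

torch.manual_seed(0)
x = torch.randn(2, 3, 32, 32)                  # start from pure noise
for i, t in enumerate(timesteps):
    ab_t = alpha_bar[t]
    # After the last step the target is the clean image (alpha_bar = 1).
    ab_prev = alpha_bar[timesteps[i + 1]] if i + 1 < len(timesteps) else torch.tensor(1.0)
    x = ddim_step(x, eps_model(x, t), ab_t, ab_prev)
```

With a trained predictor in place of `eps_model`, the loop needs 50 network evaluations instead of 1000.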
Paper: Song et al., 2021, "Score-Based Generative Modeling through Stochastic Differential Equations"
SDE-based models generalize diffusion via stochastic processes, supporting multiple formulations: VE, VP, sub-VP, and deterministic ODE variants. TorchDiff includes full training and sampling pipelines for both conditional and unconditional use cases.
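As one concrete case, the VP (variance-preserving) formulation uses the forward SDE dx = -0.5 β(t) x dt + sqrt(β(t)) dw. The sketch below simulates it with Euler-Maruyama steps to show that data is driven toward a standard normal distribution. The linear β(t) with endpoints 0.1 and 20 follows Song et al., 2021; the function names are hypothetical and this is not TorchDiff's implementation.

```python
import torch

BETA_MIN, BETA_MAX = 0.1, 20.0

def beta(t: float) -> float:
    """Linear noise rate beta(t) on t in [0, 1]."""
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

def vp_sde_forward(x0: torch.Tensor, n_steps: int = 1000) -> torch.Tensor:
    """Euler-Maruyama simulation of the VP forward SDE from t = 0 to t = 1."""
    x = x0.clone()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        b = beta(i * dt)
        drift = -0.5 * b * x
        diffusion = b ** 0.5
        x = x + drift * dt + diffusion * (dt ** 0.5) * torch.randn_like(x)
    return x

torch.manual_seed(0)
x0 = torch.full((10_000,), 3.0)    # every sample starts at the same value
x1 = vp_sde_forward(x0)
# Mean should be close to 0 and standard deviation close to 1.
print(x1.mean().item(), x1.std().item())
```

The VE formulation instead drops the drift term and lets the variance grow, while the probability-flow ODE removes the stochastic term to give a deterministic sampler.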
Paper: Rombach et al., 2022, "High-Resolution Image Synthesis with Latent Diffusion Models"
LDMs operate in a compressed latent space using a VAE, enabling efficient high-resolution image synthesis with reduced computational cost. TorchDiff supports using DDPM, DDIM, or SDE as the diffusion backbone in latent space.
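The cost saving is easy to see from tensor shapes alone. In the sketch below, an untrained convolutional autoencoder stands in for the VAE (there is no KL term or sampling step, and the layer sizes are arbitrary assumptions); it compresses 64×64 RGB images by a factor of 8 per side, so the diffusion model would operate on 48 times fewer values per image.

```python
import torch
import torch.nn as nn

# Each stride-2 convolution halves the spatial resolution: 64 -> 32 -> 16 -> 8.
encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1), nn.SiLU(),
    nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1), nn.SiLU(),
    nn.Conv2d(64, 4, kernel_size=4, stride=2, padding=1),
)
# Each transposed convolution doubles it again: 8 -> 16 -> 32 -> 64.
decoder = nn.Sequential(
    nn.ConvTranspose2d(4, 64, kernel_size=4, stride=2, padding=1), nn.SiLU(),
    nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1), nn.SiLU(),
    nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),
)

torch.manual_seed(0)
x = torch.randn(2, 3, 64, 64)
with torch.no_grad():
    z = encoder(x)        # latent the diffusion model would be trained on
    x_rec = decoder(z)    # decoded back to pixel space after sampling

ratio = x[0].numel() // z[0].numel()
print(tuple(z.shape), tuple(x_rec.shape), ratio)
```

In a latent diffusion setup, the forward and reverse processes are applied to `z` rather than `x`, and the decoder is only run once at the end of sampling.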
Paper: Ramesh et al., 2022, "Hierarchical Text-Conditional Image Generation with CLIP Latents"
UnCLIP, the architecture behind DALL·E 2, uses CLIP latents for hierarchical text-to-image generation. It first encodes the text into CLIP’s multimodal embedding space, then a prior generates a CLIP image embedding from that text embedding, and finally a diffusion decoder produces the image in pixel space conditioned on the image embedding.
Training UnCLIP is considerably more involved than training the other model families, so no minimal example is shown here.
Released under the MIT License.
TorchDiff is under active development. Planned features include:
- 🧠 New diffusion variants and improved training algorithms.
- ⚡ Faster and more memory-efficient sampling.
- 🎯 Additional utilities to simplify experimentation.
Contributions are welcome!
- Open an Issue to report bugs or request features.
- Submit a PR with improvements or new features.
Your feedback helps make TorchDiff better for the community.
If you use TorchDiff in your research or project, please cite the original papers and this repository.
@article{ho2020denoising,
title={Denoising Diffusion Probabilistic Models},
author={Ho, Jonathan and Jain, Ajay and Abbeel, Pieter},
journal={Advances in Neural Information Processing Systems},
year={2020}
}
@article{song2021denoising,
title={Denoising Diffusion Implicit Models},
author={Song, Jiaming and Meng, Chenlin and Ermon, Stefano},
journal={International Conference on Learning Representations (ICLR)},
year={2021}
}
@article{song2021score,
title={Score-Based Generative Modeling through Stochastic Differential Equations},
author={Song, Yang and Sohl-Dickstein, Jascha and Kingma, Diederik P and Kumar, Abhishek and Ermon, Stefano and Poole, Ben},
journal={International Conference on Learning Representations (ICLR)},
year={2021}
}
@article{rombach2022high,
title={High-Resolution Image Synthesis with Latent Diffusion Models},
author={Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Björn},
journal={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2022}
}
@article{ramesh2022hierarchical,
title={Hierarchical Text-Conditional Image Generation with CLIP Latents},
author={Ramesh, Aditya and Pavlov, Mikhail and Goh, Gabriel and Gray, Scott and Voss, Chelsea and Radford, Alec and Chen, Mark and Sutskever, Ilya},
journal={arXiv preprint arXiv:2204.06125},
year={2022}
}
@misc{torchdiff2025,
author = {Samani, Loghman},
title = {TorchDiff: A Modular Diffusion Modeling Library in PyTorch},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/LoqmanSamani/TorchDiff}},
}