Official repository of the CVPR 2025 paper
GLASS: Guided Latent Slot Diffusion for Object-Centric Learning
Krishnakant Singh, Simone Schaub-Meyer, and Stefan Roth
Visual Inference Lab, TU Darmstadt
GLASS introduces a diffusion-based framework for object-centric representation learning.
It integrates slot attention with a latent diffusion decoder to learn slot representations that generalize across visual tasks.
🔧 Dependencies
Python >= 3.11
PyTorch == 2.5.0
CUDA == 11.8
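Once an environment exists, the pinned versions above can be checked with a short script. This is a minimal sketch rather than part of the repository: the function name `check_environment` is hypothetical, and it assumes that the CUDA version PyTorch was built against (`torch.version.cuda`) is the one that should match.

```python
import sys

# Pinned versions taken from the dependency list above.
MIN_PYTHON = (3, 11)
EXPECTED_TORCH = "2.5.0"
EXPECTED_CUDA = "11.8"


def check_environment():
    """Return a list of problems found; an empty list means every check passed."""
    problems = []

    if sys.version_info[:2] < MIN_PYTHON:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python {found} is older than 3.11")

    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems

    # Strip local build tags such as "+cu118" before comparing.
    torch_version = str(torch.__version__).split("+")[0]
    if torch_version != EXPECTED_TORCH:
        problems.append(f"PyTorch {torch_version} found, expected {EXPECTED_TORCH}")

    # torch.version.cuda is None for CPU-only builds.
    if torch.version.cuda != EXPECTED_CUDA:
        problems.append(f"CUDA build {torch.version.cuda} found, expected {EXPECTED_CUDA}")

    return problems


if __name__ == "__main__":
    issues = check_environment()
    for line in issues or ["Environment matches the pinned versions."]:
        print(line)
```

The script only reports mismatches and does not attempt to fix them.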
⚙️ Environment Setup
conda create -n glass python==3.11.10
conda activate glass
# Install PyTorch and CUDA
conda install pytorch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# Install remaining dependencies
pip install -r requirements.txt
💾 Pretrained Models
Pretrained checkpoints from the paper are available here:
📥 Google Drive Folder
Please unzip the folder and place the models under a top-level directory named glass/.
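To confirm the checkpoints ended up where expected, the contents of glass/ can be listed. The helper below is a hypothetical sketch: it assumes it is run from the repository root and makes no assumption about checkpoint file names or extensions.

```python
from pathlib import Path


def list_checkpoint_files(root="glass"):
    """Return every file under `root`, as sorted paths relative to `root`."""
    root_dir = Path(root)
    if not root_dir.is_dir():
        raise FileNotFoundError(f"Expected a directory named {root!r}")
    return sorted(
        path.relative_to(root_dir)
        for path in root_dir.rglob("*")
        if path.is_file()
    )


if __name__ == "__main__":
    try:
        for path in list_checkpoint_files():
            print(path)
    except FileNotFoundError as err:
        print(err)
```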
🖼️ Datasets
- Evaluation: Download the COCO dataset from the official website.
- Training: Use Dataset Diffusion to create generated images and pseudo-segmentation maps.
bash ./src/eval/scripts/coco/eval_oclf_metrics_coco.sh
This creates a metrics_coco.json file in the checkpoint folder.
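The resulting file can be inspected from Python. The sketch below is illustrative only: `load_metrics` and `format_metrics` are hypothetical helpers, and the layout of metrics_coco.json is assumed to be a JSON object at the top level, which the source does not specify.

```python
import json
from pathlib import Path


def load_metrics(checkpoint_dir):
    """Parse metrics_coco.json from the given checkpoint folder."""
    metrics_path = Path(checkpoint_dir) / "metrics_coco.json"
    with metrics_path.open() as f:
        return json.load(f)


def format_metrics(metrics):
    """Render top-level entries one per line (assumes a JSON object)."""
    lines = []
    for name, value in sorted(metrics.items()):
        if isinstance(value, float):
            lines.append(f"{name}: {value:.4f}")
        else:
            lines.append(f"{name}: {value}")
    return "\n".join(lines)
```

For example, `print(format_metrics(load_metrics(checkpoint_dir)))` prints one entry per line, where `checkpoint_dir` is the folder holding the evaluated checkpoint.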
bash ./src/eval/scripts/coco/eval_generation.sh
We provide a very crude implementation for generating compositional images.
bash ./src/eval/scripts/coco/eval_composition.sh
- Release full training pipeline
If you find this repository useful, please consider citing:
@inproceedings{singh2025glass,
author = {Krishnakant Singh and Simone Schaub-Meyer and Stefan Roth},
title = {GLASS: Guided Latent Slot Diffusion for Object-Centric Learning},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2025},
}
This repository builds upon
LSD: Latent Slot Diffusion and Dataset Diffusion.
We thank the authors for open-sourcing their work.
Krishnakant Singh
📧 firstname.lastname@visinf.tu-darmstadt.de
🌐 https://visinf.github.io/glass