GLASS: Guided Latent Slot Diffusion for Object-Centric Learning

Official repository of the CVPR 2025 paper
GLASS: Guided Latent Slot Diffusion for Object-Centric Learning

Krishnakant Singh, Simone Schaub-Meyer, and Stefan Roth
Visual Inference Lab, TU Darmstadt

Overview

GLASS introduces a diffusion-based framework for object-centric representation learning.
It integrates slot attention with a latent diffusion decoder to learn slot representations that generalize across visual tasks:

🧠 Unsupervised Object Discovery
🎨 Image Generation & Reconstruction
Compositional Image Generation
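
For readers new to slot attention, the following is a minimal pure-Python sketch of one iteration of the generic slot-attention competition step. It is not the GLASS implementation: it omits learned projections, the GRU update, and the latent diffusion decoder. The function name `slot_attention_step` and the toy inputs are illustrative assumptions.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def slot_attention_step(slots, inputs, eps=1e-8):
    """One simplified slot-attention iteration (no learned weights, no GRU).

    slots:  list of K slot vectors, each of dimension D
    inputs: list of N input feature vectors, each of dimension D
    Returns (new_slots, attn), where attn[n][k] is the share of input n
    claimed by slot k.
    """
    dim = len(slots[0])
    scale = 1.0 / math.sqrt(dim)

    # Softmax over the slot axis: slots compete for each input feature.
    attn = []
    for feat in inputs:
        logits = [scale * sum(s_d * f_d for s_d, f_d in zip(slot, feat))
                  for slot in slots]
        attn.append(softmax(logits))

    # Each slot becomes the attention-weighted mean of the inputs.
    new_slots = []
    for k in range(len(slots)):
        weights = [attn[n][k] + eps for n in range(len(inputs))]
        norm = sum(weights)
        new_slots.append([
            sum(w * feat[d] for w, feat in zip(weights, inputs)) / norm
            for d in range(dim)
        ])
    return new_slots, attn


if __name__ == "__main__":
    # Two toy "objects": features clustered near (1, 0) and near (0, 1).
    features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    slots = [[0.5, 0.0], [0.0, 0.5]]
    for _ in range(3):
        slots, attn = slot_attention_step(slots, features)
    print(slots)
```

Normalizing over slots rather than over inputs is what makes the slots compete: each input feature distributes a total attention of 1 across the slots, so slots tend to specialize on different parts of the scene.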

🔧 Dependencies

Python >= 3.11  
PyTorch == 2.5.0  
CUDA == 11.8

⚙️ Environment Setup

conda create -n glass python==3.11.10
conda activate glass

# Install PyTorch and CUDA
conda install pytorch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 pytorch-cuda=11.8 -c pytorch -c nvidia

# Install remaining dependencies
pip install -r requirements.txt
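
After installation, a quick sanity check can confirm that the interpreter and PyTorch match the versions listed under Dependencies. This is a minimal sketch and not part of the repository; the helper names `parse_version` and `check_environment` are illustrative assumptions, and the check only reports what it finds.

```python
import re
import sys


def parse_version(text):
    """Turn '2.5.0+cu118' into (2, 5, 0), ignoring any local build suffix."""
    core = text.split("+")[0]
    return tuple(int(n) for n in re.findall(r"\d+", core)[:3])


def check_environment(python_version=None, torch_version=None, cuda_version=None):
    """Compare versions against the Dependencies section of this README.

    Missing arguments are filled from the running interpreter and from the
    installed torch package, if any. Returns a dict mapping component name
    to True, False, or None (None means the component was not found).
    """
    if python_version is None:
        python_version = "{}.{}.{}".format(*sys.version_info[:3])
    if torch_version is None:
        try:
            import torch  # only available after the install step above
            torch_version = torch.__version__
            if cuda_version is None:
                cuda_version = torch.version.cuda  # None on CPU-only builds
        except ImportError:
            pass

    return {
        "python": parse_version(python_version) >= (3, 11, 0),
        "torch": None if torch_version is None
        else parse_version(torch_version) == (2, 5, 0),
        "cuda": None if cuda_version is None
        else parse_version(cuda_version)[:2] == (11, 8),
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        status = "not found" if ok is None else ("ok" if ok else "mismatch")
        print(f"{name}: {status}")
```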

💾 Pretrained Models

Pretrained checkpoints from the paper are available here:
📥 Google Drive Folder

Please unzip the folder and place the models under a top-level directory named glass/.

🖼️ Datasets

Evaluation: Download the COCO dataset from the official website.
Training: Use Dataset Diffusion to create generated images and pseudo-segmentation maps.

🚀 Evaluation

🧠 Object-Centric Segmentation

bash ./src/eval/scripts/coco/eval_oclf_metrics_coco.sh

This creates a metrics_coco.json file in the checkpoint folder.
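
To inspect the results programmatically, the JSON file can be loaded with the standard library. The sketch below assumes only that `metrics_coco.json` holds a JSON object; the metric names inside it are not documented here, so the code prints whatever keys it finds. The helper names and the script name in the usage comment are hypothetical.

```python
import json
import sys
from pathlib import Path


def load_metrics(checkpoint_dir):
    """Load metrics_coco.json from a checkpoint folder and return it as a dict."""
    path = Path(checkpoint_dir) / "metrics_coco.json"
    with path.open("r", encoding="utf-8") as handle:
        return json.load(handle)


def format_metrics(metrics):
    """Render one 'name: value' line per top-level entry, sorted by name."""
    lines = []
    for name in sorted(metrics):
        value = metrics[name]
        if isinstance(value, float):
            lines.append(f"{name}: {value:.4f}")
        else:
            lines.append(f"{name}: {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Usage (hypothetical script name): python show_metrics.py <checkpoint_folder>
    if len(sys.argv) > 1:
        print(format_metrics(load_metrics(sys.argv[1])))
```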

🎨 Image Generation

bash ./src/eval/scripts/coco/eval_generation.sh

Compositional Generation

We provide a very crude implementation for generating compositional images.

bash ./src/eval/scripts/coco/eval_composition.sh

📌 TODO

Release full training pipeline

📚 Citation

If you find this repository useful, please consider citing:

@inproceedings{singh2025glass,
  author    = {Krishnakant Singh and Simone Schaub-Meyer and Stefan Roth},
  title     = {GLASS: Guided Latent Slot Diffusion for Object-Centric Learning},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2025},
}

🙏 Acknowledgements

This repository builds upon LSD: Latent Slot Diffusion and Dataset Diffusion. We thank the authors for open-sourcing their work.

📜 License

✉️ Contact

Krishnakant Singh
📧 firstname.lastname@visinf.tu-darmstadt.de
🌐 https://visinf.github.io/glass
