
GLASS: Guided Latent Slot Diffusion for Object-Centric Learning

Official repository of the CVPR 2025 paper
GLASS: Guided Latent Slot Diffusion for Object-Centric Learning

Krishnakant Singh, Simone Schaub-Meyer, and Stefan Roth
Visual Inference Lab, TU Darmstadt


Overview

GLASS introduces a diffusion-based framework for object-centric representation learning.
It integrates slot attention with a latent diffusion decoder to learn slot representations that generalize across visual tasks:

  • 🧠 Unsupervised Object Discovery
  • 🎨 Image Generation & Reconstruction
  • Compositional Image Generation

🔧 Dependencies

Python >= 3.11  
PyTorch == 2.5.0  
CUDA == 11.8

⚙️ Environment Setup

conda create -n glass python==3.11.10
conda activate glass

# Install PyTorch and CUDA
conda install pytorch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 pytorch-cuda=11.8 -c pytorch -c nvidia

# Install remaining dependencies
pip install -r requirements.txt
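After installation, it can help to confirm that the environment matches the pinned versions. The following is a minimal sketch and not part of the repository: the helper names `installed_version` and `check_environment` are hypothetical, and the check assumes the packages are registered under the distribution names `torch`, `torchvision`, and `torchaudio`.

```python
# Sanity check for the pinned versions listed in this README (illustrative only).
import sys
from importlib import metadata

# Version prefixes copied from the install command above.
EXPECTED = {"torch": "2.5.0", "torchvision": "0.20.0", "torchaudio": "2.5.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected=EXPECTED):
    """Map each package to (installed_version, matches_expected_prefix)."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        # Prefix match so that local build tags such as "+cu118" still pass.
        report[package] = (found, found is not None and found.startswith(wanted))
    return report


if __name__ == "__main__":
    py_ok = sys.version_info >= (3, 11)
    print("python", sys.version.split()[0], "ok" if py_ok else "too old")
    for package, (found, ok) in check_environment().items():
        print(package, found or "not installed", "ok" if ok else "MISMATCH")
    try:
        import torch

        # torch.version.cuda is the CUDA version PyTorch was built against.
        print("cuda build:", torch.version.cuda, "available:", torch.cuda.is_available())
    except ImportError:
        print("torch is not importable")
```

Run the script inside the activated `glass` environment. A `MISMATCH` line points to a package that differs from the versions listed above.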
💾 Pretrained Models

Pretrained checkpoints from the paper are available here:
📥 Google Drive Folder

Please unzip the folder and place the models under a top-level directory named glass/.

🖼️ Datasets

🚀 Evaluation

🧠 Object-Centric Segmentation

bash ./src/eval/scripts/coco/eval_oclf_metrics_coco.sh

This creates a metrics_coco.json file in the checkpoint folder.
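To inspect the results from Python, the file can be read with the standard library. This is a minimal sketch under assumptions: the helper names `load_metrics` and `format_metrics` are hypothetical, and the code assumes the JSON file holds a single top-level object mapping metric names to values. The actual keys are not documented here, so the sketch simply lists whatever it finds.

```python
import json
from pathlib import Path


def load_metrics(checkpoint_dir):
    """Read metrics_coco.json from the given checkpoint folder."""
    path = Path(checkpoint_dir) / "metrics_coco.json"
    with path.open() as f:
        return json.load(f)


def format_metrics(metrics):
    """Return one 'name: value' line per top-level entry, sorted by name.

    Assumes the file contains a JSON object (a dict after loading).
    """
    return [f"{name}: {value}" for name, value in sorted(metrics.items())]
```

Call `load_metrics` with the path to your checkpoint folder and print the lines returned by `format_metrics` to get a quick overview of the evaluation run.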

🎨 Image Generation

bash ./src/eval/scripts/coco/eval_generation.sh

Compositional Generation

We provide a very crude implementation for generating compositional images.

bash ./src/eval/scripts/coco/eval_composition.sh

📌 TODO

  • Release full training pipeline

📚 Citation

If you find this repository useful, please consider citing:

@inproceedings{singh2025glass,
  author    = {Krishnakant Singh and Simone Schaub-Meyer and Stefan Roth},
  title     = {GLASS: Guided Latent Slot Diffusion for Object-Centric Learning},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2025},
}

🙏 Acknowledgements

This repository builds upon LSD: Latent Slot Diffusion and Dataset Diffusion. We thank the authors for open-sourcing their work.


📜 License

License: Apache 2.0


✉️ Contact

Krishnakant Singh
📧 firstname.lastname@visinf.tu-darmstadt.de
🌐 https://visinf.github.io/glass

