Official project page for Text-to-CT Generation via 3D Latent Diffusion Model with Contrastive Vision-Language Pretraining


Text-to-CT Generation via 3D Latent Diffusion Model

This repository hosts the official project page for our work:

Text-to-CT Generation via 3D Latent Diffusion Model with Contrastive Vision-Language Pretraining
Molino, D., Caruso, C. M., Ruffini, F., Soda, P., Guarrasi, V. (2025)

🧠 Model Overview

Our approach combines:

  • A 3D CLIP-style encoder for vision-language alignment between CT volumes and radiology reports.
  • A volumetric VAE for latent compression of 3D CT data.
  • A latent diffusion model with cross-attention conditioning for controllable text-to-CT generation.

This design enables direct synthesis of anatomically consistent, semantically faithful, and high-resolution CT volumes from textual descriptions.
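To make the conditioning mechanism concrete, here is a minimal numpy sketch of how cross-attention lets text tokens steer latent CT tokens. This is an illustrative toy, not the authors' implementation: all shapes, dimensions, and the random projection matrices are assumptions, and a real model would learn these projections and embed them inside a 3D denoising U-Net.

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_attention(latent_tokens, text_tokens, d_k=16):
    # Queries come from the (flattened) VAE latent volume;
    # keys and values come from the text-encoder output, so each
    # latent position attends over the radiology report tokens.
    Wq = rng.standard_normal((latent_tokens.shape[-1], d_k)) / np.sqrt(latent_tokens.shape[-1])
    Wk = rng.standard_normal((text_tokens.shape[-1], d_k)) / np.sqrt(text_tokens.shape[-1])
    Wv = rng.standard_normal((text_tokens.shape[-1], d_k)) / np.sqrt(text_tokens.shape[-1])
    Q, K, V = latent_tokens @ Wq, text_tokens @ Wk, text_tokens @ Wv

    # Scaled dot-product attention with a numerically stable softmax.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Each latent token becomes a text-weighted mixture of report features.
    return weights @ V

# Toy shapes: 8 latent tokens (e.g. a flattened 2x2x2 latent grid),
# 5 report tokens from a hypothetical text encoder.
latent = rng.standard_normal((8, 32))
report = rng.standard_normal((5, 64))
out = cross_attention(latent, report)
print(out.shape)  # (8, 16)
```

In the full pipeline described above, this conditioning is applied at every denoising step of the latent diffusion model, and the final denoised latent is passed through the volumetric VAE decoder to produce the CT volume.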


πŸ“¦ Synthetic Dataset

We release 1,000 synthetic chest CT scans generated with our model for the VLM3D Challenge.
➡️ Available on Hugging Face: Synthetic Text-to-CT Dataset


πŸ“œ Paper


🚧 Code Release

The full training and inference code will be made available soon.
Stay tuned for updates! ✨


πŸ“¬ Contact

For questions or collaborations, please reach out to:
Daniele Molino – daniele.molino@unicampus.it

