Investigating how text-to-image diffusion models internally represent artistic concepts like content and style when generating artworks.

umilISLab/artistic-prompt-interpretation

🐮 The Cow of Rembrandt: Analyzing Artistic Prompt Interpretation in Text-to-Image Models

🗃️ Dataset | 🤗 HuggingFace | 🖼️ WebApp

Result Examples

This research investigates how text-to-image diffusion models internally represent artistic concepts like content and style when generating artworks. Using cross-attention analysis, we examine how these models separate content-describing and style-describing elements in prompts. Our findings reveal that diffusion models show varying degrees of content-style separation, with content tokens typically influencing object regions and style tokens affecting backgrounds and textures.

Explore the complete set of generated images here!

Repository Structure

├── entities/                         # Data for populating prompt templates
├── output/                           # Experiment results
│   ├── prompts.csv                   # Prompts used for experiments
│   └── content_style_iou_results.csv # IoU results of the experiments
├── src/                              # Source code
│   ├── analysis_utils.py             # Metrics computation
│   ├── config.py                     # Experiment settings
│   ├── data_utils.py                 # Prompt handling
│   ├── main_exp.py                   # Main experiment
│   ├── main_viz.py                   # Main visualization
│   └── model_utils.py                # Model setup
├── result_analysis.ipynb             # Jupyter notebook for replicating plots and analysis
├── requirements.txt                  # Python dependencies
└── README.md                         # This file
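
The IoU results listed above compare where content tokens and style tokens attend in the generated image. The exact procedure lives in src/analysis_utils.py and is not reproduced here; the following is only a minimal sketch of Intersection over Union between two attention heatmaps, assuming each heatmap is a 2D grid of values in [0, 1] and is binarized at a hypothetical threshold of 0.5. The function names `binarize` and `iou` and the toy heatmaps are illustrative, not taken from the repository.

```python
def binarize(heatmap, threshold=0.5):
    """Turn a 2D heatmap (list of rows of floats) into a boolean mask.

    The 0.5 threshold is an assumption made for illustration only.
    """
    return [[value >= threshold for value in row] for row in heatmap]


def iou(mask_a, mask_b):
    """Intersection over Union of two equally sized boolean masks."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += int(a and b)
            union += int(a or b)
    # Two empty masks have no overlap to measure; report 0.0 by convention.
    return intersection / union if union else 0.0


# Toy heatmaps standing in for aggregated cross-attention maps.
content_map = [[0.9, 0.8, 0.1],
               [0.7, 0.2, 0.1]]
style_map = [[0.1, 0.6, 0.9],
             [0.2, 0.3, 0.8]]

score = iou(binarize(content_map), binarize(style_map))
print(f"content/style IoU: {score:.2f}")
```

A low score means the content and style tokens attend to largely disjoint regions, while a score near 1 means they attend to the same pixels.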

Installation

Prerequisites

  • Python 3.10.5

Setup

  1. Clone the repository:
git clone https://github.com/umilISLab/artistic-prompt-interpretation.git
cd artistic-prompt-interpretation
  2. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Usage

Reproducing Results

To reproduce the main results from the paper:

python src/main_exp.py
python src/main_viz.py
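
After a run, the results file output/content_style_iou_results.csv can be inspected without assuming anything about its columns. The sketch below is a hypothetical helper (`preview_csv` is not part of the repository) that uses only the standard library to read the header and the first few rows.

```python
import csv
from pathlib import Path


def preview_csv(path, n_rows=5):
    """Return (column names, first n_rows rows as dicts) of a CSV file."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # zip stops as soon as n_rows rows have been collected.
        rows = [row for _, row in zip(range(n_rows), reader)]
        return reader.fieldnames, rows


if __name__ == "__main__":
    results_path = Path("output/content_style_iou_results.csv")
    if results_path.exists():
        columns, rows = preview_csv(results_path)
        print("columns:", columns)
        for row in rows:
            print(row)
    else:
        print(f"{results_path} not found; run the experiment first.")
```

The notebook result_analysis.ipynb remains the reference for replicating the plots and analysis; this helper is only a quick check on the file itself.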

Data

Entities

The entities used for populating the prompts have been taken from:

Data Availability

The complete set of prompts and generated images can be downloaded from Dataverse.

Citation

If you use this code or find our work helpful, please cite:

@misc{ferrara2025thecowofrembrandt,
  title={The Cow of Rembrandt - Analyzing Artistic Prompt Interpretation in Text-to-Image Models}, 
  author={Alfio Ferrara and Sergio Picascia and Elisabetta Rocchetti},
  year={2025},
  eprint={2507.23313},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2507.23313}, 
}

If you use the data provided, please cite:

@data{ferrara2025thecowofrembrandtdata,
  author = {Alfio Ferrara and Sergio Picascia and Elisabetta Rocchetti},
  publisher = {UNIMI Dataverse},
  title = {{Replication Data for: The Cow of Rembrandt - Analyzing Artistic Prompt Interpretation in Text-to-Image Models}},
  UNF = {UNF:6:u5RBXaFNb7TZlm5eXDXIVw==},
  year = {2025},
  version = {V1},
  doi = {10.13130/RD_UNIMI/U9AZJI},
  url = {https://doi.org/10.13130/RD_UNIMI/U9AZJI}
}
