
CCSR: Class-Conditional self-reward mechanism for improved Text-to-Image models

This repository contains the official implementation of the paper CCSR: Self-Rewarding Pretrained Text-to-Image Models. Pre-print: arXiv

INTRODUCTION

CCSR is a mechanism that allows text-to-image (T2I) diffusion models to learn from their own generated images and continuously self-improve. The technique is inspired by the paper Self-Rewarding Language Models.

The idea is similar, but the method differs. An overall flowchart of the self-rewarding mechanism is shown in the GIF below.

[GIF: overall flowchart of the self-rewarding mechanism]

USAGE INSTRUCTION

INSTALLATION

  1. Clone the repository:
git clone https://github.com/safouaneelg/SRT2I.git
  2. Create a conda environment (optional but recommended) from environment.yml and install the dependencies:
conda env create -f environment.yml
conda activate srt2i
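
To verify the installation, you can run a quick sanity check (a minimal sketch; it assumes torch, transformers, and diffusers are among the pinned dependencies in environment.yml):

python -c "import torch, transformers, diffusers; print('CUDA available:', torch.cuda.is_available())"

If this prints False, the generation and fine-tuning steps below will be prohibitively slow on CPU.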

USAGE

Step-by-step self-rewarding

  1. The first step is to generate the prompts for the generative text-to-image diffusion model. This can be done using the following command:
python llm/prompts_generator.py --model "TheBloke/Mistral-7B-Instruct-v0.2-AWQ" --class_list "llm/class_list.json" --output_prompts "generated_prompts.txt" --prompts_number 30 --class_ids 15,16,17,20,21

The default parameters are:

  • model: "TheBloke/Mistral-7B-Instruct-v0.2-AWQ", a quantized Mistral 7B
  • class_list: llm/class_list.json (containing the 80 classes of the COCO dataset)
  • output_prompts: generated_prompts.txt, the .txt file in which the generated prompts are stored
  • prompts_number: the default number of prompts per class is 30 (for the paper we used 100)

Those are the class IDs used in the paper: {20: Elephant} and {23: Giraffe}. If you generate prompts for other classes, please replace them with the appropriate IDs from class_list.json.
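
If you prefer driving the prompt generation from Python rather than through the CLI, a minimal sketch with transformers could look as follows. The instruction template is illustrative, not the exact one used in llm/prompts_generator.py, and loading the AWQ checkpoint assumes the autoawq package is installed:

from transformers import pipeline

# Quantized Mistral model used as the default prompt writer.
generator = pipeline(
    "text-generation",
    model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",
    device_map="auto",
)

# Illustrative instruction; the repository's actual template may differ.
instruction = (
    "[INST] Write 5 short, visually diverse image-description prompts, "
    "one per line, each featuring an elephant in a natural scene. [/INST]"
)

out = generator(instruction, max_new_tokens=256, do_sample=True, temperature=0.9)
print(out[0]["generated_text"])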

  2. The subsequent step is the generation of images from the prompts, which can be done using a Stable Diffusion model. Run the following command to generate the images; they are produced in stacks of 10 per prompt:
python diff_generator/fromprompt_t2i_generation.py --diffusion-model "stabilityai/stable-diffusion-2-1-base" --output-folder "generative_images/" --prompts "generated_prompts.txt"

The default parameters are:

  • diffusion-model: "stabilityai/stable-diffusion-2-1-base", Stable Diffusion 2.1
  • output-folder: generative_images/, the folder in which the generated images are stored
  • prompts: generated_prompts.txt, the .txt file containing the prompts generated in the previous step
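
Under the hood this amounts to sampling several candidates per prompt with diffusers. A minimal equivalent sketch (the file naming is illustrative and simplified relative to diff_generator/fromprompt_t2i_generation.py):

import os
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
).to("cuda")

os.makedirs("generative_images", exist_ok=True)
prompts = [p.strip() for p in open("generated_prompts.txt") if p.strip()]

for i, prompt in enumerate(prompts):
    # 10 candidate images per prompt, matching the stacks of 10 above.
    images = pipe(prompt, num_images_per_prompt=10).images
    for j, img in enumerate(images):
        img.save(f"generative_images/prompt{i:04d}_img{j:02d}.png")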
  3. To extract the optimal images from the generated Stable Diffusion images, run the following command:
python sr_mechanism/self-reward_dataset_creation.py --image_folder 'path/to/images/folder/' --prompts_file 'path/to/prompts_file.txt' --llava_model 'llava-hf/llava-1.5-7b-hf' --yolo_model 'yolov8x-worldv2.pt' --output_folder './optimal_pairs4/'

Parameters:

  • image_folder: path to the folder containing the stacks of 10 images (typically generative_images/ in this code)
  • prompts_file: path to the .txt file containing all the prompts (generated_prompts.txt)
  • llava_model: name of the LLaVA captioning model, set by default to 'llava-hf/llava-1.5-7b-hf'
  • yolo_model: open-vocabulary YOLO model, set by default to 'yolov8x-worldv2.pt' but can also be changed to 'yolov8m-world.pt' or 'yolov8s-world.pt' (check Ultralytics-docs)
  • llm_model: LLM model for self-judging, set by default to 'TheBloke/Mistral-7B-Instruct-v0.2-AWQ'
  • output_folder: 'path/to/output/storage/folder/' where the optimal images for each prompt in prompts_file will be saved
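
To give an idea of how candidates can be ranked, here is a simplified, detection-only filter built on the Ultralytics YOLO-World model. It is only a stand-in for the full selection pipeline, which also uses LLaVA captions and an LLM judge, and the file names are illustrative:

from ultralytics import YOLOWorld

model = YOLOWorld("yolov8x-worldv2.pt")
model.set_classes(["elephant"])  # target class taken from the prompt

def detection_score(image_path):
    # Highest confidence for the target class; 0.0 if it was not detected.
    result = model.predict(image_path, verbose=False)[0]
    return max(result.boxes.conf.tolist(), default=0.0)

# Keep the best candidate out of one stack of 10 generated images.
candidates = [f"generative_images/prompt0000_img{j:02d}.png" for j in range(10)]
best = max(candidates, key=detection_score)
print("optimal image:", best)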
  4. Fine-tune Stable Diffusion on the images stored in output_folder. To do so, follow these steps:
  • The images folder requires an accompanying metadata.jsonl file (see hf docs); a minimal sketch follows this list
  • The steps and code demos (notebooks) are detailed in the txt2im readme
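
As a sketch of that metadata preparation, assuming the HF ImageFolder convention of one JSON object per line with file_name and text keys (the pair listed here is purely illustrative):

import json
import os

image_dir = "optimal_pairs4"  # the output_folder from the previous step
# Illustrative (file name, caption) pairs; in practice these come from step 3.
pairs = [("prompt0000_img03.png", "an elephant walking through tall savanna grass")]

# ImageFolder expects metadata.jsonl to sit next to the images.
with open(os.path.join(image_dir, "metadata.jsonl"), "w") as f:
    for file_name, caption in pairs:
        f.write(json.dumps({"file_name": file_name, "text": caption}) + "\n")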

Once the dataset is ready and the training file customized, run the script:

bash tutorial/fine_tune_sd/fine_tune_lora4.sh

In this example script, a single GPU is used to train for 100 epochs, using wandb for logging together with validation prompts.
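
For orientation, such a script typically wraps the diffusers LoRA training example. A hedged sketch of what the invocation might look like (the flags below are from diffusers' train_text_to_image_lora.py; the actual values in fine_tune_lora4.sh may differ):

accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1-base" \
  --train_data_dir="optimal_pairs4" \
  --caption_column="text" \
  --resolution=512 \
  --train_batch_size=1 \
  --num_train_epochs=100 \
  --learning_rate=1e-4 \
  --validation_prompt="a photo of an elephant in the wild" \
  --report_to="wandb" \
  --output_dir="sd-ccsr-lora"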

Citation

@misc{ghazouali2024classconditional,
      title={Class-Conditional self-reward mechanism for improved Text-to-Image models}, 
      author={Safouane El Ghazouali and Arnaud Gucciardi and Umberto Michelucci},
      year={2024},
      eprint={2405.13473},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2405.13473}
}

License

This code is open for research and development purposes only. No commercial use of this software is permitted. For additional information, contact: safouane.elghazouali@toelt.ai.