RaCig: A RAG-based Character-Consistent Story Image Generation Model

1. Multi-character image generation with rich motion

Teaser Image

2. Model structure preview

Model Structure

📖 Overview

RaCig generates images from textual prompts and reference images for one or more characters (referred to as "Characters"). It leverages several models and techniques, including:

  • Text-to-image retrieval (using CLIP)
  • IP-Adapter for incorporating reference image features (face and body/clothes)
  • ControlNet for pose/skeleton guidance
  • Action Direction DINO for action direction recognition
  • A pipeline (RaCigPipeline) that orchestrates the generation process

The pipeline can place multiple Characters in a single scene, each defined by a name, gender, and reference images (face and clothes).
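A minimal sketch of how a multi-character request might be assembled is shown below. The field and parameter names are assumptions for illustration only; inference.py defines the actual RaCigPipeline interface.

    # Hypothetical sketch -- field and parameter names are assumptions;
    # see inference.py for the actual RaCigPipeline interface.
    characters = [
        {"name": "Alice", "gender": "female",
         "face": "refs/alice_face.png", "clothes": "refs/alice_clothes.png"},
        {"name": "Bob", "gender": "male",
         "face": "refs/bob_face.png", "clothes": "refs/bob_clothes.png"},
    ]

    prompt = "Alice and Bob shake hands on a rainy street at night"

    # The pipeline retrieves a matching pose for the prompt, injects each
    # character's face and clothes features via IP-Adapter, and conditions
    # SDXL generation on the retrieved skeleton through ControlNet:
    # pipe = RaCigPipeline(...)   # model loading as done in inference.py
    # images = pipe(prompt=prompt, characters=characters)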

📦 Installation

  1. Clone the repository:

    git clone https://github.com/ZulutionAI/RaCig.git
    cd RaCig
  2. Install dependencies:

    pip install -r requirements.txt
  3. Download necessary models and retrieval datasets:

    Models: https://huggingface.co/ZuluVision/RaCig

    Put the downloaded models under ./models/ as follows:

    ./models/
    ├── action_direction_dino/
    │   └── checkpoint_best_regular.pth
    ├── controlnet/
    │   └── model.safetensors
    ├── image_encoder/
    │   ├── config.json
    │   ├── model.safetensors
    │   └── pytorch_model.bin
    ├── ipa_weights/
    │   ├── ip-adapter-plus-face_sdxl_vit-h.bin
    │   └── ip-adapter-plus_sdxl_vit-h.bin
    └── sdxl/
        └── dreamshaper.safetensors
    

    Retrieval datasets: https://huggingface.co/datasets/ZuluVision/RaCig-Data

    Put the datasets under ./data as follows:

    ./data
    ├── MSDBv2_v7
    ├── Reelshot_retrieval
    └── retrieve_info
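
After downloading, a quick sanity check such as the sketch below (not part of the repository; the paths simply mirror the trees above) can confirm that everything is in place.

    # Sanity-check sketch (not part of the repo): verify the expected layout.
    from pathlib import Path

    required = [
        "models/action_direction_dino/checkpoint_best_regular.pth",
        "models/controlnet/model.safetensors",
        "models/image_encoder/model.safetensors",
        "models/ipa_weights/ip-adapter-plus-face_sdxl_vit-h.bin",
        "models/ipa_weights/ip-adapter-plus_sdxl_vit-h.bin",
        "models/sdxl/dreamshaper.safetensors",
        "data/MSDBv2_v7",
        "data/Reelshot_retrieval",
        "data/retrieve_info",
    ]

    missing = [p for p in required if not Path(p).exists()]
    if missing:
        print("Missing:\n  " + "\n  ".join(missing))
    else:
        print("All checkpoints and retrieval data found.")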
    

💻 Usage

Inference

  1. Run Inference:
    python inference.py
  2. Generated images, retrieved images, and skeleton visualizations are saved in the output/ directory by default.
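
To inspect the results programmatically, a minimal sketch (assuming the default output/ location) is:

    # Minimal sketch: list everything inference.py wrote to the default output dir.
    from pathlib import Path

    for path in sorted(Path("output").rglob("*")):
        if path.is_file():
            print(path)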

Gradio

python run_gradio.py

For more detailed instructions, see Gradio Interface Instructions (EN) or Gradio Interface Instructions (Chinese).
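
run_gradio.py already wires the pipeline into a web UI; the snippet below is only an illustration of the general pattern (a generation function wrapped in a gradio Interface), not the repository's actual interface definition.

    # Illustration of the pattern only -- run_gradio.py defines the real UI.
    import gradio as gr

    def generate(prompt, face_img, clothes_img):
        # Placeholder: call RaCigPipeline here and return the generated image.
        return face_img

    demo = gr.Interface(
        fn=generate,
        inputs=[gr.Textbox(label="Prompt"),
                gr.Image(label="Face reference", type="pil"),
                gr.Image(label="Clothes reference", type="pil")],
        outputs=gr.Image(label="Generated image"),
    )
    demo.launch()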

🛠️ Training

  1. We train only the ControlNet, so that it reads the fused feature maps better: after the IP-Adapter features are injected, the fused feature map makes it hard for the ControlNet to constrain the pose, so we lightly finetune it.

  2. We finetune it on the retrieval dataset, organized as described in the Installation section above (see the sketch after the command below).

bash train.sh
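
train.sh drives the actual training; the fragment below is only a conceptual sketch of the "freeze everything except ControlNet" setup described above, with diffusers-style loading and hyperparameters that are assumptions rather than the repository's settings.

    # Conceptual sketch of ControlNet-only finetuning; train.sh is authoritative.
    # The base-model path and learning rate below are illustrative assumptions.
    import torch
    from diffusers import ControlNetModel, UNet2DConditionModel

    unet = UNet2DConditionModel.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
    )
    controlnet = ControlNetModel.from_unet(unet)  # or load ./models/controlnet

    # Freeze the SDXL UNet (and IP-Adapter weights); only the ControlNet is
    # trained so it learns to constrain pose on the IP-fused feature maps.
    unet.requires_grad_(False)
    controlnet.train()

    optimizer = torch.optim.AdamW(controlnet.parameters(), lr=1e-5)
    # Training loop (omitted): for each retrieval-dataset sample, run the
    # ControlNet on the skeleton image, feed its residuals into the frozen
    # UNet's noise prediction, take an MSE loss against the target noise,
    # and step `optimizer` so only ControlNet parameters are updated.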

🤝 Contributing

❤️ Acknowledgements

This project is based on the work of several open-source projects and contributors.
