We propose Visual Instruction Generation and Correction (VIGC), a framework capable of autonomously generating high-quality image-text instruction fine-tuning datasets.
-
(Optional) Creating conda environment
conda create -n vigc python=3.8
conda activate vigc
-
Install mmpretrain
You can follow the mmpretrain installation tutorial.
-
You may build from source
git clone https://gitlab.pjlab.org.cn/fdc/mllm/vigc.git
cd vigc
pip install -e .
-
Obtain the Vicuna model
Vicuna is an open-source LLaMA-based LLM with performance close to ChatGPT. We currently use the v1.1 versions of Vicuna-13B and Vicuna-7B. If you already have Vicuna weights of the correct version, set `llm_model` in the Model Config to the folder that contains them. Otherwise, follow this instruction to obtain them, and remember to update the config file as well.
-
Download the pretrained model
We support loading two kinds of pretrained checkpoints: minigpt-4 and instructblip. Download them from the table below; for more details, visit their original repositories: minigpt-4 and instructblip.

| Model Type | Checkpoint pretrained with Vicuna 7B | Checkpoint pretrained with Vicuna 13B |
| --- | --- | --- |
| minigpt-4 | Download | Download |
| instructblip | Download | Download |

After downloading the pretrained checkpoints, set `pretrained` in the Model Config to the folder that contains the pretrained weights.
-
Download the finetuned VIGC model
Download the finetuned VIGC checkpoint that matches your finetuning dataset and the Vicuna model you prepared.

| Finetuned Dataset | Checkpoint finetuned with Vicuna 7B | Checkpoint finetuned with Vicuna 13B |
| --- | --- | --- |
| LLaVA | Download | Download |
| OKVQA | Download | / |
| A-OKVQA | Download | / |
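
The installation steps above ask you to edit `llm_model` and `pretrained` in the Model Config by hand. If you prefer to script that edit, the following is a minimal sketch using only the standard library. It assumes each key sits on its own `key: value` line; the sample config text (including `arch` and the paths) is hypothetical and not taken from the VIGC repository. Note that any trailing comment on the edited line is dropped.

```python
import re


def set_config_value(yaml_text: str, key: str, value: str) -> str:
    """Replace the value on the first `key: ...` line, keeping its indentation."""
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}:).*$", flags=re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in `value`.
    new_text, count = pattern.subn(
        lambda m: f'{m.group(1)} "{value}"', yaml_text, count=1
    )
    if count == 0:
        raise KeyError(f"{key!r} not found in config")
    return new_text


# Hypothetical config fragment, used only to illustrate the edit.
example = (
    "model:\n"
    "  arch: hypothetical_arch\n"
    '  llm_model: "/path/to/vicuna"\n'
    '  pretrained: "/path/to/pretrained"\n'
)
updated = set_config_value(example, "llm_model", "/data/vicuna-7b-v1.1")
print(updated)
```

To apply it to a real file, read the config with `Path(...).read_text()`, call the function once per key, and write the result back.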
To launch a demo locally:
-
Download the pretrained and finetuned weights of minigpt-4 and instructblip to your local machine;
-
Update `MODEL_CKPT` on line 9 of `vigc_demo.py`;
-
Run `python vigc_demo.py`, then follow the on-screen prompts to open the demo in your browser. The arguments are as follows:
-
device0: the GPU ID of the first model
-
device1: the GPU ID of the second model
-
You can also play with the VIGC online demo.
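
As an illustration of how two device arguments like `device0` and `device1` can be handled, here is a small `argparse` sketch. The flag spellings (`--device0`, `--device1`), the default values, and the `cuda:<id>` mapping are assumptions made for this example; check `vigc_demo.py` for the actual interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical launcher arguments; defaults are assumptions, not VIGC's.
    parser = argparse.ArgumentParser(description="Two-GPU demo launcher sketch")
    parser.add_argument("--device0", type=int, default=0,
                        help="GPU ID of the first model")
    parser.add_argument("--device1", type=int, default=1,
                        help="GPU ID of the second model")
    return parser


def device_strings(argv):
    """Turn the parsed GPU IDs into PyTorch-style device strings."""
    args = build_parser().parse_args(argv)
    return f"cuda:{args.device0}", f"cuda:{args.device1}"


print(device_strings(["--device0", "2", "--device1", "3"]))
```

Placing the two models on separate GPUs keeps each one within a single card's memory.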
-
Generate QA based on COCO2017 for LLaVA
- First, download the finetuned VIGC model.
- Then set `finetuned` in the corresponding Inference Config to the path of the checkpoint file.
torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna7b/generate_qa/llava-150k/generate_llava_qa_conv.yaml # generate conversation data for LLaVA using MiniGPT4-vicuna7b
torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna7b/generate_qa/llava-150k/generate_llava_qa_detail.yaml # generate detail description data for LLaVA using MiniGPT4-vicuna7b
torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna7b/generate_qa/llava-150k/generate_llava_qa_complex.yaml # generate complex reasoning data for LLaVA using MiniGPT4-vicuna7b
torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna13b/generate_qa/llava-150k/generate_llava_qa_conv.yaml # generate conversation data for LLaVA using MiniGPT4-vicuna13b
torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna13b/generate_qa/llava-150k/generate_llava_qa_detail.yaml # generate detail description data for LLaVA using MiniGPT4-vicuna13b
torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna13b/generate_qa/llava-150k/generate_llava_qa_complex.yaml # generate complex reasoning data for LLaVA using MiniGPT4-vicuna13b
-
Generate QA based on Objects365 for LLaVA
- First, download the finetuned VIGC model.
- Then set `finetuned` in the corresponding Inference Config to the path of the checkpoint file.
torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna7b/generate_qa/llava-150k/generate_llava_qa_object365_conv.yaml # generate conversation data for LLaVA using MiniGPT4-vicuna7b
torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna7b/generate_qa/llava-150k/generate_llava_qa_object365_detail.yaml # generate detail description data for LLaVA using MiniGPT4-vicuna7b
torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna7b/generate_qa/llava-150k/generate_llava_qa_object365_complex.yaml # generate complex reasoning data for LLaVA using MiniGPT4-vicuna7b
torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna13b/generate_qa/llava-150k/generate_llava_qa_object365_conv.yaml # generate conversation data for LLaVA using MiniGPT4-vicuna13b
torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna13b/generate_qa/llava-150k/generate_llava_qa_object365_detail.yaml # generate detail description data for LLaVA using MiniGPT4-vicuna13b
torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/mini_gpt4_vicuna13b/generate_qa/llava-150k/generate_llava_qa_object365_complex.yaml # generate complex reasoning data for LLaVA using MiniGPT4-vicuna13b
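
The LLaVA generation commands above differ only in the model folder, the task name, and an optional `object365_` infix in the config file name. The sketch below builds them from those three parts so that a driver script can loop over them. The helper names (`qa_command`, `all_qa_commands`) are hypothetical; the paths follow the pattern of the commands listed above.

```python
from itertools import product

MODELS = ("mini_gpt4_vicuna7b", "mini_gpt4_vicuna13b")
TASKS = ("conv", "detail", "complex")


def qa_command(model, task, objects365=False, nproc=8):
    """Build one torchrun command (as an argument list) for LLaVA QA generation."""
    infix = "object365_" if objects365 else ""
    cfg = (
        f"vigc/projects/{model}/generate_qa/llava-150k/"
        f"generate_llava_qa_{infix}{task}.yaml"
    )
    return ["torchrun", f"--nproc_per_node={nproc}", "evaluate.py", "--cfg-path", cfg]


def all_qa_commands(objects365=False):
    """All model/task combinations, in the order used in this README."""
    return [qa_command(m, t, objects365) for m, t in product(MODELS, TASKS)]


for cmd in all_qa_commands():
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repository root; lower `nproc` if you have fewer than 8 GPUs.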
-
Generate QA based on COCO2017 for A-OKVQA or OKVQA
-
First, download the finetuned VIGC model.
-
Then set `pretrained` in the corresponding Inference Config to the path of the checkpoint file.
-
Generate the questions first:
torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/instruct_blip_vicuna7b/generate_qa/a-okvqa/generate_question.yaml # generate questions for A-OKVQA using instruct-blip-vicuna7b
torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/instruct_blip_vicuna7b/generate_qa/okvqa/generate_question.yaml # generate questions for OKVQA using instruct-blip-vicuna7b
-
Set `annotation` in `generate_answer.yaml` to the path of the questions generated in the previous step, then generate the answers:
torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/instruct_blip_vicuna7b/generate_qa/a-okvqa/generate_answer.yaml # generate answers for A-OKVQA using instruct-blip-vicuna7b
torchrun --nproc_per_node=8 evaluate.py --cfg-path vigc/projects/instruct_blip_vicuna7b/generate_qa/okvqa/generate_answer.yaml # generate answers for OKVQA using instruct-blip-vicuna7b
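
The question and answer stages above must run in order, with the config edit in between. A minimal sketch of that sequencing follows. The function names are hypothetical, and `update_annotation` stands for whatever you use to point the answer config at the generated questions (a manual edit or a script of your own).

```python
import subprocess


def stage_command(dataset, stage, nproc=8):
    """Build the torchrun command for one stage of A-OKVQA/OKVQA generation."""
    if dataset not in ("a-okvqa", "okvqa"):
        raise ValueError(f"unknown dataset: {dataset}")
    if stage not in ("question", "answer"):
        raise ValueError(f"unknown stage: {stage}")
    cfg = (
        f"vigc/projects/instruct_blip_vicuna7b/generate_qa/"
        f"{dataset}/generate_{stage}.yaml"
    )
    return ["torchrun", f"--nproc_per_node={nproc}", "evaluate.py", "--cfg-path", cfg]


def run_two_stage(dataset, update_annotation, run=subprocess.run):
    """Generate questions, update the answer config, then generate answers."""
    run(stage_command(dataset, "question"), check=True)
    # The answer config must reference the questions produced just above.
    update_annotation()
    run(stage_command(dataset, "answer"), check=True)
```

With `check=True`, a failed question stage raises an exception and stops the pipeline before the answer stage starts.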
-
Finetune the VIGC model on the A-OKVQA dataset
-
Download our formatted A-OKVQA JSON files.
-
Download the images by following the original repo; skip this step if you already have them.
-
Set `images` and `annotation` in these configs to their actual paths: train config, val config.
-
Run the finetuning script:
torchrun --nproc_per_node=8 train.py --cfg-path vigc/projects/instruct_blip_vicuna7b/vigc/a-okvqa/normal_vigc.yaml
-
Finetune the VIGC model on the OKVQA dataset
- Download our formatted OKVQA JSON files.
- Download the images by following the original repo; skip this step if you already have them.
- Set `images` and `annotation` in these configs to their actual paths: train config, val config.
- Run the finetuning script:
torchrun --nproc_per_node=8 train.py --cfg-path vigc/projects/instruct_blip_vicuna7b/vigc/okvqa/normal_vigc.yaml
-
Finetune the VIGC model on the LLaVA-150k dataset
- Download our formatted LLaVA JSON files.
- Download the images by following the original repo; skip this step if you already have them.
- Set `images` and `annotation` in these configs to their actual paths: conversation config, detail config, complex config, val config.
- Run the finetuning script:
torchrun --nproc_per_node=8 train.py --cfg-path vigc/projects/mini_gpt4_vicuna7b/vigc/llava-150k/normal_vigc.yaml # using Mini-GPT4 Vicuna7b
torchrun --nproc_per_node=8 train.py --cfg-path vigc/projects/mini_gpt4_vicuna13b/vigc/llava-150k/normal_vigc.yaml # using Mini-GPT4 Vicuna13b
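
Each finetuning recipe above depends on `images` and `annotation` pointing at real locations. Before starting a multi-GPU job, it can help to check those paths first. The helper below is a hypothetical sketch; the example paths are placeholders, not VIGC defaults.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the entries of `paths` (name -> path) that do not exist on disk."""
    return {name: p for name, p in paths.items() if not Path(p).exists()}


# Placeholder values; replace them with the paths you put in the configs.
configured = {
    "images": "/path/to/images",
    "annotation": "/path/to/annotation.json",
}
problems = missing_paths(configured)
if problems:
    print("Fix these config entries before training:", problems)
```

An empty result means every configured path exists; it does not validate the contents of the files.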
- BLIP2. The model architecture of VIGC follows BLIP-2. Don't forget to check out this great open-source work if you haven't seen it before!
- InstructBLIP and MiniGPT-4. The pretrained models of VIGC come from InstructBLIP and MiniGPT-4.
- Lavis. This repository is built upon Lavis!
- Vicuna. The fantastic language ability of Vicuna with only 13B parameters is just amazing. And it is open-source!
- LLaVA, A-OKVQA, OKVQA. The VIGC models are finetuned on these datasets.
You can find more details in our paper.
If you're using VIGC in your research or applications, please cite using this BibTeX:
@article{wang2023vigc,
title={VIGC: Visual Instruction Generation and Correction},
author={Wang, Bin and Wu, Fan and Han, Xiao and Peng, Jiahui and Zhong, Huaping and Zhang, Pan and Dong, Xiaoyi and Li, Weijia and Li, Wei and Wang, Jiaqi and He, Conghui},
journal={arXiv preprint arXiv:2308.12714},
year={2023}
}
If you have any questions, comments or suggestions, please do not hesitate to contact us at wangbin@pjlab.org.cn or wufan@pjlab.org.cn.