This repository provides the code implementation of Youlong Ding and Lingyun Xu's MLLM 2025 final project, which builds on the following work:
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Qidong Huang<sup>1,2</sup>, Xiaoyi Dong<sup>2</sup>, Pan Zhang<sup>2</sup>, Bin Wang<sup>2</sup>, Conghui He<sup>2</sup>, Jiaqi Wang<sup>2</sup>, Dahua Lin<sup>2</sup>, Weiming Zhang<sup>1</sup>, Nenghai Yu<sup>1</sup>

<sup>1</sup>University of Science and Technology of China, <sup>2</sup>Shanghai AI Laboratory
Both Youlong and Lingyun contributed intensively to the discussion, codebase, experiments, and writing of this project, working jointly toward its completion.
```bash
conda env create -f environment.yml
conda activate opera
pwd  # all scripts should be executed in the /PATH/TO/OPERA local directory
```
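As an optional sanity check (not part of the repository), you can confirm that the environment resolves PyTorch and sees a GPU before running any scripts:

```python
# Optional sanity check: verify that the `opera` conda environment exposes
# PyTorch and a CUDA device before launching any evaluation scripts.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```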
The following evaluation requires the MSCOCO 2014 dataset. Please download it via the links below and extract it under your data path.
```bash
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
wget http://images.cocodataset.org/zips/test2014.zip
```
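Optionally, a small check (paths are placeholders; adjust `DATA_ROOT` to your own data path) that the three splits extracted where the scripts expect them:

```python
# Hypothetical layout check: count images in each extracted COCO 2014 split.
from pathlib import Path

DATA_ROOT = Path("/path/to/COCO_2014")  # assumption: your extraction target
for split in ("train2014", "val2014", "test2014"):
    n = len(list((DATA_ROOT / split).glob("*.jpg")))
    print(f"{split}: {n} images")
```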
In addition, you need to prepare the following checkpoints of the 7B base models:

- Download the LLaVA-1.5 merged 7B model and specify its path at Line 14 of `eval_configs/llava-1.5_eval.yaml`:

  ```bash
  git clone https://huggingface.co/liuhaotian/llava-v1.5-7b
  ```

- Download the Vicuna 7B v1.1 model and specify its path at Line 25 of `minigpt4/configs/models/blip2_instruct_vicuna7b.yaml`:

  ```bash
  git clone https://huggingface.co/lmsys/vicuna-7b-v1.1
  ```
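If you prefer not to use `git clone`, the same checkpoints can be fetched via the `huggingface_hub` Python API (assuming the package is installed); the printed local paths are what you paste into the yaml files:

```python
# Alternative download path: fetch both checkpoints with huggingface_hub and
# print where they landed on disk.
from huggingface_hub import snapshot_download

llava_path = snapshot_download("liuhaotian/llava-v1.5-7b")
vicuna_path = snapshot_download("lmsys/vicuna-7b-v1.1")
print(llava_path)   # -> Line 14 of eval_configs/llava-1.5_eval.yaml
print(vicuna_path)  # -> Line 25 of minigpt4/configs/models/blip2_instruct_vicuna7b.yaml
```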
Argument | Example | Description |
---|---|---|
`--model` | `llava-1.5` | The MLLM to evaluate; this codebase supports `instructblip` and `llava-1.5`. |
`--data_path` | `/path/to/dataset` | Path to the dataset file or folder, e.g., `COCO_2014/val2014/`. |
`--pope-type` | `random` | Type of POPE evaluation; supports `random`. |
`--scale_factor` | `50` | The scale factor to scale up the self-attention weights. Default: 50. |
`--threshold` | `15` | The threshold for attending retrospection. Default: 15. |
`--num_attn_candidates` | `5` | The number of candidates per beam. Default: 5. |
`--penalty_weights` | `1` | The weight of the penalty term in decoding. Default: 1. |
We provide our generated outputs in the `log` directory. Files in `log/llava-1.5` are outputs produced with the `llava-1.5` MLLM. There are two kinds of `jsonl` files: `opera-*.jsonl` files contain CHAIR output generated by the OPERA baseline, and `<INT>.jsonl` files contain CHAIR output generated by our OP-TR method. The correspondence between an output index, such as `1.jsonl`, and the exact hyperparameter combination is specified as follows (column names follow the parameters of the patched `generate` function shown further below):
idx | alpha_d | d_0 | c_ | Reward |
---|---|---|---|---|
1 | 1.0 | 5 | log 0.1 | log 15 |
2 | 2.0 | 7 | log 0.1 | log 15 |
3 | 1.0 | 7 | log 0.2 | log 15 |
4 | 1.0 | 7 | log 0.1 | log 30 |
5 | 1.0 | 7 | log 0.1 | log 7 |
6 | 1.0 | 7 | log 0.1 | log 15 |
7 | 1.0 | 7 | log 0.5 | log 15 |
8 | 0.5 | 7 | log 0.05 | log 15 |
9 | 0.8 | 6 | log 0.01 | log 15 |
10 | 1.0 | 7 | log 0.05 | log 5 |
11 | 0.7 | 5 | log 0.08 | log 20 |
12 | 0.8 | 6 | log 0.005 | log 15 |
13 | 0.8 | 6 | log 0.01 | log 5 |
14 | 1.0 | 7 | log 0.05 | 0 |
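For convenience, here is the same mapping as a Python dictionary (values transcribed from the table above; parameter names come from the patched `generate` signature shown in the next section). This is a convenience sketch for matching an output file such as `log/llava-1.5/3.jsonl` to its hyperparameters, not a file that exists in the repository:

```python
# Lookup table: output index -> OP-TR hyperparameters used for that run.
import math

OPTR_RUNS = {
    1:  dict(alpha_d=1.0, d_0=5, c_=math.log(0.1),   Reward=math.log(15)),
    2:  dict(alpha_d=2.0, d_0=7, c_=math.log(0.1),   Reward=math.log(15)),
    3:  dict(alpha_d=1.0, d_0=7, c_=math.log(0.2),   Reward=math.log(15)),
    4:  dict(alpha_d=1.0, d_0=7, c_=math.log(0.1),   Reward=math.log(30)),
    5:  dict(alpha_d=1.0, d_0=7, c_=math.log(0.1),   Reward=math.log(7)),
    6:  dict(alpha_d=1.0, d_0=7, c_=math.log(0.1),   Reward=math.log(15)),
    7:  dict(alpha_d=1.0, d_0=7, c_=math.log(0.5),   Reward=math.log(15)),
    8:  dict(alpha_d=0.5, d_0=7, c_=math.log(0.05),  Reward=math.log(15)),
    9:  dict(alpha_d=0.8, d_0=6, c_=math.log(0.01),  Reward=math.log(15)),
    10: dict(alpha_d=1.0, d_0=7, c_=math.log(0.05),  Reward=math.log(5)),
    11: dict(alpha_d=0.7, d_0=5, c_=math.log(0.08),  Reward=math.log(20)),
    12: dict(alpha_d=0.8, d_0=6, c_=math.log(0.005), Reward=math.log(15)),
    13: dict(alpha_d=0.8, d_0=6, c_=math.log(0.01),  Reward=math.log(5)),
    14: dict(alpha_d=1.0, d_0=7, c_=math.log(0.05),  Reward=0.0),
}
```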
Generating captions for 500 images takes about 2.5 hours on an NVIDIA A100 80GB PCIe GPU.
- Use `OP-TR/scripts/llava-run.py` to automatically replace the original `transformers-4.29.2/src/transformers/generation/utils.py` file with an OP-TR-implemented `utils.py`.
  - NOTICE: change the path-related variables in `llava-run.py` to your own paths (the OPERA base directory, the data directory, and the model directory).
  - The set of `OP-TR/utils_<INT>.py` files are OP-TR implementations of `utils.py` with different hyperparameters.
- To experiment with different hyperparameter combinations, either:
  - Modify the hyperparameters via the default values of the `generate` member function's parameters in `utils_<INT>.py`:

    ```python
    class GenerationMixin():
        ...
        def generate(
            ...
            alpha_d: Optional[float] = 1.0,
            d_0: Optional[int] = 5,
            c_: Optional[float] = math.log(0.1),
            Reward: Optional[float] = math.log(15),
            **args,
        )
    ```

  - Or create a new `utils_<INT>.py`, add it to the `OP-TR` directory, and specify the file name in the `OP-TR/scripts/llava-run.py` script.
- After the path variables and executables are set up, run `python OP-TR/scripts/llava-run.py` to generate the CHAIR output jsonl files (a minimal sketch of what this automates follows this list).
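For orientation, here is a minimal Python sketch of the replace-and-run loop that `llava-run.py` automates. All paths and flag values are placeholders (the flags mirror the `chair_eval.py` command shown later); the actual script may differ:

```python
# Sketch of the llava-run.py workflow: copy one OP-TR utils variant over the
# bundled transformers file, then run CHAIR generation for that variant.
import shutil
import subprocess

OPERA_ROOT = "/PATH/TO/OPERA"  # assumption: the repository root
UTILS_DST = f"{OPERA_ROOT}/transformers-4.29.2/src/transformers/generation/utils.py"

for idx in (1, 2, 3):  # hyperparameter variants to evaluate
    shutil.copyfile(f"{OPERA_ROOT}/OP-TR/utils_{idx}.py", UTILS_DST)
    subprocess.run(
        ["python", "chair_eval.py", "--model", "llava-1.5",
         "--data_path", "/path/to/COCO",
         "--output_path", f"log/llava-1.5/{idx}.jsonl"],
        cwd=OPERA_ROOT, check=True,
    )
```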
- Generate the MLLM's responses and save them in a jsonl file:

  ```bash
  python chair_eval.py --model MODEL_NAME --data_path /path/to/COCO --gpu-id GPU_IDs --beam 5 --scale_factor 50 --threshold 15 --num_attn_candidates 5 --penalty_weights 1 --output_path OUTPUT_PATH
  ```
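To sanity-check a generated file, each line should parse as a JSON object whose keys match the `--image_id_key`/`--caption_key` flags passed to `chair.py` below. A hedged example (the file name is illustrative):

```python
# Inspect a generated jsonl file: one JSON object per line, assumed to carry
# "image_id" and "caption" keys as expected by chair.py.
import json

with open("log/llava-1.5/1.jsonl") as f:  # hypothetical output file
    for line in f:
        record = json.loads(line)
        print(record["image_id"], record["caption"][:80])
```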
Note: Please check out our released results in `log/llava-1.5` and `log/instructblip` for reproduction.
- Calculate CHAIR using the generated jsonl file:

  ```bash
  python chair.py --cap_file /path/to/jsonl --image_id_key image_id --caption_key caption --coco_path /path/to/COCO/annotations_trainval2014/annotations/ --save_path /path/to/save/jsonl
  ```
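For reference, `chair.py` reports the two standard CHAIR scores (Rohrbach et al., 2018). The sketch below illustrates the metric itself, not the repository's implementation:

```python
# CHAIR_i: fraction of mentioned object instances that are hallucinated.
# CHAIR_s: fraction of captions containing at least one hallucinated object.
def chair_scores(captions):
    """captions: list of (mentioned_objects, ground_truth_objects) set pairs, one per image."""
    halluc_mentions = total_mentions = halluc_caps = 0
    for mentioned, gt in captions:
        fake = mentioned - gt  # objects mentioned but absent from the image
        halluc_mentions += len(fake)
        total_mentions += len(mentioned)
        halluc_caps += bool(fake)
    chair_i = halluc_mentions / max(total_mentions, 1)  # instance-level
    chair_s = halluc_caps / max(len(captions), 1)       # sentence-level
    return chair_i, chair_s
```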
As with OP-TR CHAIR output generation, add a new `utils_<INT>.py` file to the `OP-TR` directory, specify which `utils` file to run in the list of commands to execute, and modify the path-related variables to launch a POPE evaluation for the chosen hyperparameter combination on the OP-TR implementation.
```bash
python pope_eval.py --model MODEL_NAME --data_path /path/to/COCO --pope-type random --gpu-id GPU_IDs --beam 5 --scale_factor 50 --threshold 15 --num_attn_candidates 5 --penalty_weights 1
```
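POPE asks the model yes/no object-existence questions and scores the answers. The snippet below is a generic illustration of the metrics involved (accuracy, precision, recall, F1), not the repository's code:

```python
# Score yes/no answers against ground-truth labels, POPE-style.
def pope_metrics(preds, labels):
    """preds/labels: lists of booleans, True = 'yes' (object present)."""
    tp = sum(p and l for p, l in zip(preds, labels))
    fp = sum(p and not l for p, l in zip(preds, labels))
    fn = sum(not p and l for p, l in zip(preds, labels))
    accuracy = sum(p == l for p, l in zip(preds, labels)) / max(len(labels), 1)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return dict(accuracy=accuracy, precision=precision, recall=recall, f1=f1)
```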