This repository provides the code implementation of Youlong Ding and Lingyun Xu's MLLM 2025 final project, which builds on the following work:
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Qidong Huang<sup>1,2</sup>, Xiaoyi Dong<sup>2</sup>, Pan Zhang<sup>2</sup>, Bin Wang<sup>2</sup>, Conghui He<sup>2</sup>, Jiaqi Wang<sup>2</sup>, Dahua Lin<sup>2</sup>, Weiming Zhang<sup>1</sup>, Nenghai Yu<sup>1</sup>

<sup>1</sup>University of Science and Technology of China, <sup>2</sup>Shanghai AI Laboratory
Both Youlong and Lingyun contributed intensively to the discussion, codebase, experiments, and writing of this project, working jointly toward its completion.
```bash
conda env create -f environment.yml
conda activate opera
pwd  # all scripts should be executed in the /PATH/TO/OPERA local directory
```
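As an optional sanity check (not part of the repository), you can confirm that the environment resolves PyTorch and sees a GPU before running any scripts:

```python
# Optional sanity check: verify that the `opera` conda environment exposes
# PyTorch and a CUDA device before launching any evaluation scripts.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```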
The following evaluation requires the MSCOCO 2014 dataset. Please download it via the links below and extract it under your data path.
```bash
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
wget http://images.cocodataset.org/zips/test2014.zip
```
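Optionally, a small check (paths are placeholders; adjust `DATA_ROOT` to your own data path) that the three splits extracted where the scripts expect them:

```python
# Hypothetical layout check: count images in each extracted COCO 2014 split.
from pathlib import Path

DATA_ROOT = Path("/path/to/COCO_2014")  # assumption: your extraction target
for split in ("train2014", "val2014", "test2014"):
    n = len(list((DATA_ROOT / split).glob("*.jpg")))
    print(f"{split}: {n} images")
```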
In addition, you need to prepare the following checkpoints of the 7B base models:

- Download the LLaVA-1.5 merged 7B model and specify its path at Line 14 of `eval_configs/llava-1.5_eval.yaml`:

  ```bash
  git clone https://huggingface.co/liuhaotian/llava-v1.5-7b
  ```

- Download the Vicuna 7B v1.1 model and specify its path at Line 25 of `minigpt4/configs/models/blip2_instruct_vicuna7b.yaml`:

  ```bash
  git clone https://huggingface.co/lmsys/vicuna-7b-v1.1
  ```
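If you prefer not to use `git clone`, the same checkpoints can be fetched via the `huggingface_hub` Python API (assuming the package is installed); the printed local paths are what you paste into the yaml files:

```python
# Alternative download path: fetch both checkpoints with huggingface_hub and
# print where they landed on disk.
from huggingface_hub import snapshot_download

llava_path = snapshot_download("liuhaotian/llava-v1.5-7b")
vicuna_path = snapshot_download("lmsys/vicuna-7b-v1.1")
print(llava_path)   # -> Line 14 of eval_configs/llava-1.5_eval.yaml
print(vicuna_path)  # -> Line 25 of minigpt4/configs/models/blip2_instruct_vicuna7b.yaml
```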
Argument | Example | Description |
---|---|---|
`--model` | `llava-1.5` | The MLLM to evaluate; this codebase supports `instructblip` and `llava-1.5`. |
`--data_path` | `/path/to/dataset` | Path to the dataset file or folder, e.g., `COCO_2014/val2014/`. |
`--pope-type` | `random` | Type of POPE evaluation; supports `random`. |
`--scale_factor` | `50` | The scale factor to scale up the self-attention weights. Default: 50. |
`--threshold` | `15` | The threshold for attending retrospection. Default: 15. |
`--num_attn_candidates` | `5` | The number of candidates per beam. Default: 5. |
`--penalty_weights` | `1` | The weight of the penalty term in decoding. Default: 1. |
We provide our generated outputs in the `log` directory. Files in `log/llava-1.5` are outputs produced with the `llava-1.5` MLLM. There are two kinds of `jsonl` files: `opera-*.jsonl` files contain CHAIR output generated by the OPERA baseline, and `<INT>.jsonl` files contain CHAIR output generated by our OP-TR method. The correspondence between an output index, such as `1.jsonl`, and the exact hyperparameter combination is specified as follows (column names follow the parameters of the patched `generate` function shown further below):
idx | alpha_d | d_0 | c_ | Reward |
---|---|---|---|---|
1 | 1.0 | 5 | log 0.1 | log 15 |
2 | 2.0 | 7 | log 0.1 | log 15 |
3 | 1.0 | 7 | log 0.2 | log 15 |
4 | 1.0 | 7 | log 0.1 | log 30 |
5 | 1.0 | 7 | log 0.1 | log 7 |
6 | 1.0 | 7 | log 0.1 | log 15 |
7 | 1.0 | 7 | log 0.5 | log 15 |
8 | 0.5 | 7 | log 0.05 | log 15 |
9 | 0.8 | 6 | log 0.01 | log 15 |
10 | 1.0 | 7 | log 0.05 | log 5 |
11 | 0.7 | 5 | log 0.08 | log 20 |
12 | 0.8 | 6 | log 0.005 | log 15 |
13 | 0.8 | 6 | log 0.01 | log 5 |
14 | 1.0 | 7 | log 0.05 | 0 |
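For convenience, here is the same mapping as a Python dictionary (values transcribed from the table above; parameter names come from the patched `generate` signature shown in the next section). This is a convenience sketch for matching an output file such as `log/llava-1.5/3.jsonl` to its hyperparameters, not a file that exists in the repository:

```python
# Lookup table: output index -> OP-TR hyperparameters used for that run.
import math

OPTR_RUNS = {
    1:  dict(alpha_d=1.0, d_0=5, c_=math.log(0.1),   Reward=math.log(15)),
    2:  dict(alpha_d=2.0, d_0=7, c_=math.log(0.1),   Reward=math.log(15)),
    3:  dict(alpha_d=1.0, d_0=7, c_=math.log(0.2),   Reward=math.log(15)),
    4:  dict(alpha_d=1.0, d_0=7, c_=math.log(0.1),   Reward=math.log(30)),
    5:  dict(alpha_d=1.0, d_0=7, c_=math.log(0.1),   Reward=math.log(7)),
    6:  dict(alpha_d=1.0, d_0=7, c_=math.log(0.1),   Reward=math.log(15)),
    7:  dict(alpha_d=1.0, d_0=7, c_=math.log(0.5),   Reward=math.log(15)),
    8:  dict(alpha_d=0.5, d_0=7, c_=math.log(0.05),  Reward=math.log(15)),
    9:  dict(alpha_d=0.8, d_0=6, c_=math.log(0.01),  Reward=math.log(15)),
    10: dict(alpha_d=1.0, d_0=7, c_=math.log(0.05),  Reward=math.log(5)),
    11: dict(alpha_d=0.7, d_0=5, c_=math.log(0.08),  Reward=math.log(20)),
    12: dict(alpha_d=0.8, d_0=6, c_=math.log(0.005), Reward=math.log(15)),
    13: dict(alpha_d=0.8, d_0=6, c_=math.log(0.01),  Reward=math.log(5)),
    14: dict(alpha_d=1.0, d_0=7, c_=math.log(0.05),  Reward=0.0),
}
```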
Generating captions for 500 images takes about 2.5 hours on an NVIDIA A100 80GB PCIe GPU.
- Use `OP-TR/scripts/llava-run.py` to automatically replace the original `transformers-4.29.2/src/transformers/generation/utils.py` file with an OP-TR-implemented `utils.py`.
  - NOTICE: change the path-related variables in `llava-run.py` to your own paths (the OPERA base directory, the data directory, and the model directory).
  - The set of `OP-TR/utils_<INT>.py` files are OP-TR implementations of `utils.py` with different hyperparameters.
- To experiment with different hyperparameter combinations, either:
  - Modify the hyperparameters via the default values of the `generate` member function's parameters in `utils_<INT>.py`:

    ```python
    class GenerationMixin():
        ...
        def generate(
            ...
            alpha_d: Optional[float] = 1.0,
            d_0: Optional[int] = 5,
            c_: Optional[float] = math.log(0.1),
            Reward: Optional[float] = math.log(15),
            **args,
        )
    ```

  - Or create a new `utils_<INT>.py`, add it to the `OP-TR` directory, and specify the file name in the `OP-TR/scripts/llava-run.py` script.
- After the path variables and executables are set up, run `python OP-TR/scripts/llava-run.py` to generate the CHAIR output jsonl files (a minimal sketch of what this automates follows this list).
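For orientation, here is a minimal Python sketch of the replace-and-run loop that `llava-run.py` automates. All paths and flag values are placeholders (the flags mirror the `chair_eval.py` command shown later); the actual script may differ:

```python
# Sketch of the llava-run.py workflow: copy one OP-TR utils variant over the
# bundled transformers file, then run CHAIR generation for that variant.
import shutil
import subprocess

OPERA_ROOT = "/PATH/TO/OPERA"  # assumption: the repository root
UTILS_DST = f"{OPERA_ROOT}/transformers-4.29.2/src/transformers/generation/utils.py"

for idx in (1, 2, 3):  # hyperparameter variants to evaluate
    shutil.copyfile(f"{OPERA_ROOT}/OP-TR/utils_{idx}.py", UTILS_DST)
    subprocess.run(
        ["python", "chair_eval.py", "--model", "llava-1.5",
         "--data_path", "/path/to/COCO",
         "--output_path", f"log/llava-1.5/{idx}.jsonl"],
        cwd=OPERA_ROOT, check=True,
    )
```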
- Generate the MLLM's responses and save them in a jsonl file:

  ```bash
  python chair_eval.py --model MODEL_NAME --data_path /path/to/COCO --gpu-id GPU_IDs --beam 5 --scale_factor 50 --threshold 15 --num_attn_candidates 5 --penalty_weights 1 --output_path OUTPUT_PATH
  ```
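To sanity-check a generated file, each line should parse as a JSON object whose keys match the `--image_id_key`/`--caption_key` flags passed to `chair.py` below. A hedged example (the file name is illustrative):

```python
# Inspect a generated jsonl file: one JSON object per line, assumed to carry
# "image_id" and "caption" keys as expected by chair.py.
import json

with open("log/llava-1.5/1.jsonl") as f:  # hypothetical output file
    for line in f:
        record = json.loads(line)
        print(record["image_id"], record["caption"][:80])
```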
Note: Please check out our released results in `log/llava-1.5` and `log/instructblip` for reproduction.
- Calculate CHAIR using the generated jsonl file:

  ```bash
  python chair.py --cap_file /path/to/jsonl --image_id_key image_id --caption_key caption --coco_path /path/to/COCO/annotations_trainval2014/annotations/ --save_path /path/to/save/jsonl
  ```
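For reference, `chair.py` reports the two standard CHAIR scores (Rohrbach et al., 2018). The sketch below illustrates the metric itself, not the repository's implementation:

```python
# CHAIR_i: fraction of mentioned object instances that are hallucinated.
# CHAIR_s: fraction of captions containing at least one hallucinated object.
def chair_scores(captions):
    """captions: list of (mentioned_objects, ground_truth_objects) set pairs, one per image."""
    halluc_mentions = total_mentions = halluc_caps = 0
    for mentioned, gt in captions:
        fake = mentioned - gt  # objects mentioned but absent from the image
        halluc_mentions += len(fake)
        total_mentions += len(mentioned)
        halluc_caps += bool(fake)
    chair_i = halluc_mentions / max(total_mentions, 1)  # instance-level
    chair_s = halluc_caps / max(len(captions), 1)       # sentence-level
    return chair_i, chair_s
```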
As with OP-TR CHAIR output generation, add a new `utils_<INT>.py` file to the `OP-TR` directory, specify which `utils` file to run in the list of commands to execute, and modify the path-related variables to launch a POPE evaluation for the chosen hyperparameter combination on the OP-TR implementation.
```bash
python pope_eval.py --model MODEL_NAME --data_path /path/to/COCO --pope-type random --gpu-id GPU_IDs --beam 5 --scale_factor 50 --threshold 15 --num_attn_candidates 5 --penalty_weights 1
```
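POPE asks the model yes/no object-existence questions and scores the answers. The snippet below is a generic illustration of the metrics involved (accuracy, precision, recall, F1), not the repository's code:

```python
# Score yes/no answers against ground-truth labels, POPE-style.
def pope_metrics(preds, labels):
    """preds/labels: lists of booleans, True = 'yes' (object present)."""
    tp = sum(p and l for p, l in zip(preds, labels))
    fp = sum(p and not l for p, l in zip(preds, labels))
    fn = sum(not p and l for p, l in zip(preds, labels))
    accuracy = sum(p == l for p, l in zip(preds, labels)) / max(len(labels), 1)
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return dict(accuracy=accuracy, precision=precision, recall=recall, f1=f1)
```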