OP-TR: Improving OPERA by Reimplementing Over-Trust Penalty and Introducing Trust Reward

This repository provides the code implementation of Youlong Ding and Lingyun Xu's MLLM 2024 final project, which builds on and improves the following work:

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Qidong Huang¹,², Xiaoyi Dong², Pan Zhang², Bin Wang², Conghui He², Jiaqi Wang², Dahua Lin², Weiming Zhang¹, Nenghai Yu¹
¹University of Science and Technology of China, ²Shanghai AI Laboratory

Both Youlong and Lingyun contributed intensively to the discussion, codebase, experiments, and writing of this project; they worked in a joint effort toward its completion.

Overview

[teaser figure]

Setup

Environment

conda env create -f environment.yml
conda activate opera
pwd # all scripts should be executed in the /PATH/TO/OPERA local directory

Model and Data for Evaluation

The following evaluation requires the MSCOCO 2014 dataset. Please download it and extract it into your data path:

wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
wget http://images.cocodataset.org/zips/test2014.zip
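
If you prefer to do the extraction from Python instead of unzip, here is a minimal sketch (data_root is a placeholder for your own data path):

import zipfile
from pathlib import Path

data_root = Path("/path/to/COCO_2014")  # placeholder: your data path
data_root.mkdir(parents=True, exist_ok=True)
for split in ("train2014", "val2014", "test2014"):
    # extracting <split>.zip yields a <split>/ folder, e.g. COCO_2014/val2014/
    with zipfile.ZipFile(f"{split}.zip") as zf:
        zf.extractall(data_root)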

You also need to prepare checkpoints of the following 7B base models:

git clone https://huggingface.co/liuhaotian/llava-v1.5-7b
git clone https://huggingface.co/lmsys/vicuna-7b-v1.1

Arguments

| Argument | Example | Description |
| --- | --- | --- |
| --model | llava-1.5 | Specify the MLLM model; this codebase supports instructblip and llava-1.5. |
| --data-path | /path/to/dataset | Path to the dataset file or folder, e.g., COCO_2014/val2014/. |
| --pope-type | random | Type of POPE evaluation; supports random. |
| --scale_factor | 50 | The scale factor to scale up the self-attention weights. Default: 50. |
| --threshold | 15 | The threshold for attending retrospection. Default: 15. |
| --num_attn_candidates | 5 | The number of candidates per beam. Default: 5. |
| --penalty_weights | 1 | The weight of the penalty term in decoding. Default: 1. |
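
For orientation, here is a hedged sketch of how these flags could be declared with argparse; the actual parsers live in chair_eval.py and pope_eval.py and may differ in details:

import argparse

# Illustrative only: mirrors the table above, not the repository's exact parser.
parser = argparse.ArgumentParser()
parser.add_argument("--model", default="llava-1.5")             # instructblip or llava-1.5
parser.add_argument("--data-path", default="/path/to/dataset")  # stored as args.data_path
parser.add_argument("--pope-type", default="random")
parser.add_argument("--scale_factor", type=float, default=50)
parser.add_argument("--threshold", type=int, default=15)
parser.add_argument("--num_attn_candidates", type=int, default=5)
parser.add_argument("--penalty_weights", type=float, default=1.0)
args = parser.parse_args()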

[Important] Using our released results

We provide our generated outputs in the log directory. Files in log/llava-1.5 are outputs generated with the llava-1.5 MLLM.

There are two kinds of jsonl files:

  • opera-* files contain the CHAIR output generated by the OPERA baseline.
  • <INT>.jsonl files contain the CHAIR output generated by our OP-TR method; the mapping between an output index (e.g., 1.jsonl) and its exact hyperparameter setting is given in the table below:
| idx | alpha | $d_0$ | c | reward |
| --- | --- | --- | --- | --- |
| 1 | 1.0 | 5 | log 0.1 | log 15 |
| 2 | 2.0 | 7 | log 0.1 | log 15 |
| 3 | 1.0 | 7 | log 0.2 | log 15 |
| 4 | 1.0 | 7 | log 0.1 | log 30 |
| 5 | 1.0 | 7 | log 0.1 | log 7 |
| 6 | 1.0 | 7 | log 0.1 | log 15 |
| 7 | 1.0 | 7 | log 0.5 | log 15 |
| 8 | 0.5 | 7 | log 0.05 | log 15 |
| 9 | 0.8 | 6 | log 0.01 | log 15 |
| 10 | 1.0 | 7 | log 0.05 | log 5 |
| 11 | 0.7 | 5 | log 0.08 | log 20 |
| 12 | 0.8 | 6 | log 0.005 | log 15 |
| 13 | 0.8 | 6 | log 0.01 | log 5 |
| 14 | 1.0 | 7 | log 0.05 | 0 |
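
If you need to look these settings up programmatically (e.g., when matching a released <INT>.jsonl to its run), the table transcribes directly into Python; this helper is ours for illustration only, not part of the codebase:

import math

# idx -> (alpha, d_0, c, reward); transcribed from the table above.
# "log x" means math.log(x); run 14 uses a reward of 0.
HPARAMS = {
    1:  (1.0, 5, math.log(0.1),   math.log(15)),
    2:  (2.0, 7, math.log(0.1),   math.log(15)),
    3:  (1.0, 7, math.log(0.2),   math.log(15)),
    4:  (1.0, 7, math.log(0.1),   math.log(30)),
    5:  (1.0, 7, math.log(0.1),   math.log(7)),
    6:  (1.0, 7, math.log(0.1),   math.log(15)),
    7:  (1.0, 7, math.log(0.5),   math.log(15)),
    8:  (0.5, 7, math.log(0.05),  math.log(15)),
    9:  (0.8, 6, math.log(0.01),  math.log(15)),
    10: (1.0, 7, math.log(0.05),  math.log(5)),
    11: (0.7, 5, math.log(0.08),  math.log(20)),
    12: (0.8, 6, math.log(0.005), math.log(15)),
    13: (0.8, 6, math.log(0.01),  math.log(5)),
    14: (1.0, 7, math.log(0.05),  0.0),
}

alpha, d_0, c, reward = HPARAMS[1]  # setting behind log/llava-1.5/1.jsonl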

Evaluation Scripts

CHAIR evaluation

It takes about 2.5 hours to generate captions for 500 images on an NVIDIA A100 80GB PCIe GPU.

OP-TR output generation

  • Use OP-TR/scripts/llava-run.py to automatically replace the original transformers-4.29.2/src/transformers/generation/utils.py file with the OP-TR implementation of utils.py (a conceptual sketch of this replacement appears after this list).

    • NOTICE: change the path-related variables in llava-run.py to your own paths (the OPERA base directory, the data directory, and the model directory).
    • The set of OP-TR/utils_<INT>.py files are the OP-TR implementations of utils.py with different hyperparameters.
  • To experiment with different hyperparameter combinations, either:

    • Modify the hyperparameters via the default values of the generate member function's parameters in utils-<INT>.py:
      class GenerationMixin:
          ...
          def generate(
              self,
              # ... other generation arguments elided ...
              alpha_d: Optional[float] = 1.0,
              d_0: Optional[int] = 5,
              c_: Optional[float] = math.log(0.1),
              Reward: Optional[float] = math.log(15),
              **kwargs,
          ):
    • or create a new utils-<INT>.py, add it to the OP-TR directory, and specify its file name in the OP-TR/scripts/llava-run.py script.
  • After the path variables and executables are set up, run python OP-TR/scripts/llava-run.py to generate CHAIR output jsonl files.
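
For readers who want to see what the swap amounts to, the following is a minimal conceptual sketch of llava-run.py's behavior under assumed paths (opera_root, the index list, and the output naming are placeholders; the real script hard-codes its own variables, which you must edit):

import shutil
import subprocess

# Assumed layout: adjust opera_root to your checkout of this repository.
opera_root = "/PATH/TO/OPERA"
target = f"{opera_root}/transformers-4.29.2/src/transformers/generation/utils.py"

for idx in (1, 2, 3):  # whichever utils_<INT>.py variants you want to run
    # swap in the OP-TR utils.py carrying this hyperparameter setting
    shutil.copyfile(f"{opera_root}/OP-TR/utils_{idx}.py", target)
    # then generate the CHAIR output for this setting
    subprocess.run(
        ["python", "chair_eval.py", "--model", "llava-1.5",
         "--data_path", "/path/to/COCO",
         "--output_path", f"log/llava-1.5/{idx}.jsonl"],
        check=True, cwd=opera_root,
    )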

OPERA output generation

  • Generate the MLLM's responses and save them in a jsonl file:
python chair_eval.py --model MODEL_NAME --data_path /path/to/COCO --gpu-id GPU_IDs --beam 5 --scale_factor 50 --threshold 15 --num_attn_candidates 5 --penalty_weights 1 --output_path OUTPUT_PATH

Note: Please check out our released results in log/llava-1.5 and log/instructblip for reproduction.

  • Calculate CHAIR using the generated jsonl file:
python chair.py --cap_file /path/to/jsonl --image_id_key image_id --caption_key caption --coco_path /path/to/COCO/annotations_trainval2014/annotations/ --save_path /path/to/save/jsonl
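
To eyeball a generated file before scoring it, note that each line is a standalone JSON record; judging from the --image_id_key and --caption_key flags above, records carry at least image_id and caption fields:

import json

# log/llava-1.5/1.jsonl is one of the released outputs; any generated file works.
with open("log/llava-1.5/1.jsonl") as f:
    for line in f:
        record = json.loads(line)
        print(record["image_id"], record["caption"][:80])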

POPE evaluation

OP-TR

Just like the OP-TR CHAIR output generation: add a new utils.py file to the OP-TR directory, specify the exact utils.py file to run in the list of commands to execute, and modify the path-related variables. This launches a POPE evaluation of the OP-TR implementation under the chosen hyperparameter combination.

OPERA

python pope_eval.py --model MODEL_NAME --data_path /path/to/COCO --pope-type random --gpu-id GPU_IDs --beam 5 --scale_factor 50 --threshold 15 --num_attn_candidates 5 --penalty_weights 1
