RadRevise: A Benchmark Dataset for Instruction-Based Radiology Report Editing

This repository accompanies the paper of the same title. It provides code for generating the dataset and for benchmarking existing models on it.

Requirements

The RadRevise dataset will be available on PhysioNet through an open credential process.

Usage

Instruction data generation

This step uses GPT-4 to generate instructions and modified reports for specified instruction types and clinical topics. Note that the results will differ from RadRevise, both because GPT-generated responses vary and because RadRevise has undergone additional human review and annotation.

cd generation
python generate.py

Model evaluation

The code can be used directly to evaluate any text-generation model hosted on Hugging Face.

1. Download the RadRevise dataset.
2. Navigate to the evaluation directory.
3. Run the following command to evaluate a single model:

python eval_model $MODEL_ID [$DATA_PATH] [$BATCH_SIZE] [$OUTPUT_FILE]

$MODEL_ID: the Hugging Face model id
$DATA_PATH: path to RadRevise dataset (default: ../data/RadRevise_v0.csv)
$BATCH_SIZE: the inference batch size (default: 32)
$OUTPUT_FILE: the name of the evaluation output (default: output/result.csv)
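
The argument layout above (one required positional followed by three optional ones with defaults) can be sketched with Python's standard argparse module. This is only an illustration of the documented interface, not the repository's actual parsing code; the function name build_parser and the attribute names are assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(
        description="Illustrative sketch of the evaluation arguments."
    )
    # Required: the Hugging Face model id.
    parser.add_argument("model_id", help="the Hugging Face model id")
    # Optional positionals: nargs='?' lets each fall back to its default.
    parser.add_argument(
        "data_path",
        nargs="?",
        default="../data/RadRevise_v0.csv",
        help="path to the RadRevise dataset",
    )
    parser.add_argument(
        "batch_size",
        nargs="?",
        type=int,
        default=32,
        help="the inference batch size",
    )
    parser.add_argument(
        "output_file",
        nargs="?",
        default="output/result.csv",
        help="the name of the evaluation output",
    )
    return parser


# Example: only the model id is given, so the defaults apply.
args = build_parser().parse_args(["gpt2"])
print(args.model_id, args.data_path, args.batch_size, args.output_file)
```

Because the optional arguments are positional, they must be supplied in order: to change the batch size, the data path has to be given as well.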

Alternatively, modify and execute the run.sh script to evaluate one or more models.
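
For evaluating several models in one go, the same command can be assembled once per model. The sketch below does not reproduce run.sh, whose contents are not shown here; it only builds one command line per model id using the invocation documented above. The function name build_eval_commands, the per-model output naming scheme, and the example model ids are assumptions made for illustration.

```python
import shlex


def build_eval_commands(
    model_ids,
    data_path="../data/RadRevise_v0.csv",
    batch_size=32,
    output_dir="output",
):
    """Return one argument list per model, following the documented command."""
    commands = []
    for model_id in model_ids:
        # Hypothetical naming scheme: replace '/' so the id is a valid file name.
        safe_name = model_id.replace("/", "__")
        output_file = f"{output_dir}/{safe_name}.csv"
        commands.append(
            ["python", "eval_model", model_id, data_path, str(batch_size), output_file]
        )
    return commands


# Print the commands instead of running them, so they can be inspected first.
for cmd in build_eval_commands(["gpt2", "distilgpt2"]):
    print(shlex.join(cmd))
```

Writing each model's results to its own file avoids one run overwriting another's default output/result.csv.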

License

This repository is made publicly available under the MIT License.
