A simple preprocessing tool for the Radiology Report Expert Evaluation (ReXVal) dataset. It combines the radiologists' error annotations with the ground-truth and candidate reports they evaluate.
This tool processes two main CSV files from the ReXVal dataset:
- 50_samples_gt_and_candidates.csv: contains the ground-truth and candidate reports
- 6_valid_raters_per_rater_error_categories.csv: contains the error annotations from the radiologists
It combines these files and calculates mean error statistics per study and candidate type.
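The combining step can be pictured with a small pandas sketch. It is not this package's implementation: the column names (`study_number`, `gt_report`, `candidate_type`, `rater_index`, `error_category`, `significant`, `insignificant`), the candidate names `bleu` and `radgraph`, and the toy data are all hypothetical. The sketch assumes each rater's counts are summed over error categories and then averaged across raters.

```python
import pandas as pd

# Hypothetical schema and invented toy data, for illustration only.
reports = pd.DataFrame({
    "study_number": [0],
    "gt_report": ["Small left pleural effusion."],
    "bleu": ["No effusion."],                 # one column per candidate type
    "radgraph": ["Left pleural effusion."],
})
errors = pd.DataFrame({
    "study_number":   [0] * 8,
    "candidate_type": ["bleu"] * 4 + ["radgraph"] * 4,
    "rater_index":    [0, 0, 1, 1] * 2,
    "error_category": [1, 2] * 4,
    "significant":    [1, 0, 2, 1, 0, 0, 1, 0],
    "insignificant":  [0, 1, 0, 0, 1, 0, 0, 0],
})


def combine(reports: pd.DataFrame, errors: pd.DataFrame) -> pd.DataFrame:
    """Join reports with rater-averaged error counts (hypothetical schema)."""
    keys = ["study_number", "candidate_type"]
    counts = ["significant", "insignificant"]
    # 1) Sum each rater's errors over all error categories.
    per_rater = errors.groupby(keys + ["rater_index"], as_index=False)[counts].sum()
    # 2) Average those per-rater totals across raters.
    means = per_rater.groupby(keys, as_index=False)[counts].mean()
    means["total"] = means["significant"] + means["insignificant"]
    # 3) Reshape reports from one column per candidate to one row per candidate.
    long_reports = reports.melt(
        id_vars=["study_number", "gt_report"],
        var_name="candidate_type",
        value_name="predicted_report",
    )
    # 4) Attach the mean error counts to each (study, candidate) pair.
    return long_reports.merge(means, on=keys, how="inner")


result = combine(reports, errors)
print(result[["candidate_type", "significant", "insignificant", "total"]])
```

With this toy data, `bleu` averages 2.0 significant and 0.5 insignificant errors (2.5 total) and `radgraph` averages 0.5 and 0.5 (1.0 total). Summing per rater before averaging makes every rater contribute exactly one total per candidate, whatever number of error categories they used.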
Requirements:
- pandas

Install it with:
```bash
pip install pandas
```
To obtain the data:
- Visit the ReXVal dataset page on PhysioNet
- Create a PhysioNet account and obtain the required credentials if you haven't already
- Download the dataset files
- Note the directory containing the CSV files listed above; this will be your input_path
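Before running the tool, it can help to confirm that the chosen directory actually holds both CSV files. The helper below, `missing_rexval_files`, is hypothetical and not part of this package; it only checks for the two file names listed in this README.

```python
from pathlib import Path

# File names as listed in this README; the helper itself is hypothetical.
EXPECTED_FILES = (
    "50_samples_gt_and_candidates.csv",
    "6_valid_raters_per_rater_error_categories.csv",
)


def missing_rexval_files(input_path: str) -> list[str]:
    """Return the expected ReXVal CSV names that are absent from input_path."""
    directory = Path(input_path)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]
```

For example, `missing_rexval_files("/path/to/dataset/directory")` returns an empty list when both files are present, so a script can stop early with a clear message instead of failing partway through preprocessing.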
Command-line usage:
```bash
python preprocess.py -i /path/to/dataset/directory -o /path/to/output/directory
```
or with long options:
```bash
python preprocess.py --input_path /path/to/dataset/directory --output_path /path/to/output/directory
```
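The short and long flags map onto the same two settings. A minimal sketch of how such an interface can be declared with Python's standard `argparse` module follows; it assumes both flags are required, and `build_parser` is a hypothetical name, not the actual code in `preprocess.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the -i/--input_path and -o/--output_path flags."""
    parser = argparse.ArgumentParser(description="Preprocess the ReXVal dataset.")
    # Each option gets a short and a long spelling; argparse stores the value
    # under the long name (args.input_path / args.output_path).
    parser.add_argument("-i", "--input_path", required=True,
                        help="Directory containing the ReXVal CSV files.")
    parser.add_argument("-o", "--output_path", required=True,
                        help="Directory where the combined CSV is written.")
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(["-i", "/data/rexval", "-o", "/data/out"])
print(args.input_path, args.output_path)
```

Because argparse derives the attribute name from the long option, `-i` and `--input_path` are interchangeable on the command line.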
Python usage:
```python
from rexval_preprocessor import preprocess

# Process the dataset
df = preprocess(
    input_path="/path/to/dataset/directory",
    output_path="/path/to/output/directory",
)
```
The tool generates a CSV file containing:
- Study ID and number
- Candidate type
- Ground truth and predicted reports
- Number of significant errors
- Number of insignificant errors
- Total number of errors
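Once the output CSV exists, a quick consistency check and per-candidate summary can be done with pandas. The column names and rows below are invented placeholders; check the header of the real output file and adjust the names before applying this to actual results.

```python
import io

import pandas as pd

# Invented rows with hypothetical column names; not real ReXVal results.
csv_text = """study_id,candidate_type,num_significant_errors,num_insignificant_errors,total_errors
s1,bleu,2.0,0.5,2.5
s1,radgraph,0.5,0.5,1.0
s2,bleu,1.0,1.0,2.0
s2,radgraph,0.0,0.5,0.5
"""
df = pd.read_csv(io.StringIO(csv_text))
# With real data, read the generated CSV from your output directory instead.

# Sanity check: total errors should equal significant + insignificant.
parts = df["num_significant_errors"] + df["num_insignificant_errors"]
assert (df["total_errors"] - parts).abs().max() < 1e-9

# Mean error counts per candidate type, averaged over studies.
summary = df.groupby("candidate_type")[["num_significant_errors", "total_errors"]].mean()
print(summary)
```

Grouping by candidate type gives one row per candidate, which is a convenient starting point for comparing candidate types across the 50 studies.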
Below is an example of the preprocessed DataFrame:
If you use this tool, please cite the original ReXVal dataset:
Please check the citation information at: https://physionet.org/content/rexval-dataset/1.0.0/