Med-CXRGen: Visual Instruction-tuned Adaptation for Radiology Report Generation (Gla-AI4BioMed at RRG24)
This repository hosts the installation scripts, runtime environment, and usage instructions for Med-CXRGen. The project is designed to be fully compatible with the Libra codebase for seamless integration.
- [20 Jun 2024] Gla-AI4BioMed ranked 4th place in the Shared Task on Large-Scale Radiology Report Generation @ BioNLP ACL'24!
- [08 Jun 2024] Released model weights:
  - Med-CXRGen-F for generating the `Findings` section.
  - Med-CXRGen-I for generating the `Impression` section.
We introduce a radiology-focused visual language model designed to generate radiology reports from chest X-rays. Building on previous findings that large language models (LLMs) can acquire multimodal capabilities when aligned with pretrained vision encoders, we demonstrate similar potential with chest X-ray images. Our model combines an image encoder with a fine-tuned LLM based on the Vicuna-7B architecture, enabling it to generate different sections of a radiology report with notable accuracy.
Please refer to the Libra repository for code and environment details, as this project is compatible with it. Below is a brief outline for quick setup:
- Clone the Libra repository

```bash
git clone https://github.com/X-iZhang/Libra.git
cd Libra
```

- Create and activate a new Conda environment

```bash
conda create -n cxrgen python=3.10 -y
conda activate cxrgen
```

- Install dependencies

```bash
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```
- For more detailed instructions, see Libra's README.
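A quick sanity check that the environment is ready, reusing the `libra_eval` import that appears later in this README:

```python
# Confirm the libra package and its evaluation entry point import cleanly.
from libra.eval import libra_eval

print("Libra environment ready:", callable(libra_eval))
```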
| Version | Size | Projector | Base LLM | Vision Encoder | Checkpoint |
| --- | --- | --- | --- | --- | --- |
| Libra-0.5 | 7B | MLP-2x | Vicuna-7B | CLIP-L-336px | Med-CXRGen-F |
| Libra-0.5 | 7B | MLP-2x | Vicuna-7B | CLIP-L-336px | Med-CXRGen-I |
Note: These two models are fine-tuned for `Findings` and `Impression` section generation.
These projector weights were pre-trained for visual instruction tuning on chest X-ray to text generation tasks. They can be directly used to initialise your model for multimodal fine-tuning in similar clinical domains.
When reusing these projector weights, make sure the `projector type`, `base LLM`, `conv_mode`, and `vision encoder` exactly match those used in our projector pretraining setup. Please also ensure the following settings are correctly configured during instruction tuning:
```bash
--mm_projector_type mlp2x_gelu \
--mm_vision_select_layer -2 \
--mm_vision_select_feature patch \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
```
| Base LLM | conv_mode | Vision Encoder | Projector | Pretrain Data | Download |
| --- | --- | --- | --- | --- | --- |
| Vicuna-7B | libra_v0 | CLIP-L-336px | MLP-2x | Findings section | projector |
| Vicuna-7B | libra_v0 | CLIP-L-336px | MLP-2x | Impression section | projector |
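As a rough sketch of how a downloaded projector could be wired into instruction tuning: the flag below follows LLaVA's convention, which Libra appears to inherit, and the checkpoint path is a placeholder, so confirm both against the Libra training scripts.

```bash
# Assumption: Libra keeps LLaVA's --pretrain_mm_mlp_adapter argument for loading projector weights.
# The path below is a placeholder for wherever the downloaded projector checkpoint is stored.
--pretrain_mm_mlp_adapter ./checkpoints/med-cxrgen_projector/mm_projector.bin \
```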
This model supports multiple images (1 to 4) as input during training. You can use the following method to preprocess and horizontally concatenate multiple images (e.g. generating one report from several diagnostic images):
```python
from PIL import Image

def concatenate_images(images):
    total_width = sum(img.width for img in images) + 10 * (len(images) - 1)
    height = max(img.height for img in images)
    new_img = Image.new('RGB', (total_width, height), (0, 0, 0))
    current_width = 0
    for img in images:
        new_img.paste(img, (current_width, 0))
        current_width += img.width + 10  # Add a 10px black separator between images
    return new_img

# Load images (make sure the paths are correct or use your own images)
img1 = Image.open('chest_x_ray_example1.jpg')
img2 = Image.open('chest_x_ray_example2.jpg')
img3 = Image.open('chest_x_ray_example3.jpg')
img4 = Image.open('chest_x_ray_example4.jpg')

# Concatenate images
result_img = concatenate_images([img1, img2, img3, img4])

# Save the result
result_img.save('concatenated_chest_x_ray.jpg')
```
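If the number of views per study varies, a small wrapper along these lines (a sketch; the directory path and glob pattern are placeholders) keeps the same concatenation path for one to four images:

```python
from glob import glob
from PIL import Image

# Placeholder directory containing 1 to 4 chest X-ray views for a single study.
study_paths = sorted(glob('./path/to/study_dir/*.jpg'))[:4]
study_images = [Image.open(p) for p in study_paths]

# A single view is used as-is; multiple views are concatenated side by side.
combined = study_images[0] if len(study_images) == 1 else concatenate_images(study_images)
combined.save('study_input.jpg')
```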
We support running inference using the CLI. To use our model, run:
```bash
python -m libra.serve.cli \
    --model-path X-iZhang/Med-CXRGen-I \
    --conv-mode libra_v0 \
    --image-file "./path/to/chest_x_ray.jpg"
```
You can use the `libra_eval` function in `libra/eval/run_libra.py` to easily launch a model trained by yourself or by us, on a local machine or in Google Colab, after installing this repository.
```python
from libra.eval import libra_eval

model_path = "X-iZhang/Med-CXRGen-I"  # Or "X-iZhang/Med-CXRGen-F"

# Define the path to the image.
image_file = "./path/to/chest_x_ray.jpg"  # Or concatenated X-ray image

# Define the prompt to guide the model's response.
prompt = "Provide a detailed description of the impression in the radiology image."
# Or "Provide a detailed description of the findings in the radiology image."

# Specify the conversational mode, matching the PROMPT_VERSION used during training.
conv_mode = "libra_v0"

# Call the libra_eval function.
libra_eval(
    model_path=model_path,
    image_file=image_file,
    query=prompt,
    conv_mode=conv_mode,
    max_new_tokens=512
)
```
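If `libra_eval` returns the generated text (an assumption here; check `libra/eval/run_libra.py` for the actual return value), the call can be captured for downstream use:

```python
# Assumption: libra_eval returns the generated report as a string.
report = libra_eval(
    model_path=model_path,
    image_file=image_file,
    query=prompt,
    conv_mode=conv_mode,
    max_new_tokens=512
)
print(report)
```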
We use the officially provided dataset from the RRG24 shared task, available on Hugging Face:
`StanfordAIMI/rrg24-shared-task-bionlp`
You can load the dataset as follows:
```python
from datasets import load_dataset

dataset = load_dataset("StanfordAIMI/rrg24-shared-task-bionlp")
```
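For a quick sanity check of what was downloaded, the splits and column names can be printed without assuming any particular schema:

```python
# Inspect the available splits, their sizes, and column names.
for split_name, split in dataset.items():
    print(split_name, len(split), split.column_names)
```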
To process MIMIC-CXR on your own, you may use the official script (`make-interpret-mimic-cxr.py`) provided by the organisers.
Please ensure the following folder structure (with `files/` from MIMIC-CXR-JPG):
```
.
├── files
│   ├── p10
│   ├── p11
│   ├── ...
│   └── p19
├── make-interpret-mimic-cxr.py
├── mimic-cxr-2.0.0-metadata.csv
├── mimic-cxr-2.0.0-split.csv
└── mimic_cxr_sectioned.csv
```
After preprocessing, you can merge the RRG24 and your MIMIC-CXR datasets using:
```python
from datasets import load_dataset, Sequence, Image, DatasetDict, concatenate_datasets

dataset = load_dataset("StanfordAIMI/rrg24-shared-task-bionlp")

dataset_mimic = load_dataset(
    "json",
    data_files={"train": "train_mimic.json", "validation": "val_mimic.json"},
).cast_column("images", Sequence(Image()))

dataset_final = DatasetDict({
    "train": concatenate_datasets([dataset["train"], dataset_mimic["train"]]),
    "validation": concatenate_datasets([dataset["validation"], dataset_mimic["validation"]]),
})

dataset_final.save_to_disk("path/to/dataset/directory")
```
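The merged dataset can later be reloaded with `datasets.load_from_disk`, using the same placeholder path as above:

```python
from datasets import load_from_disk

# Reload the merged dataset saved above.
dataset_final = load_from_disk("path/to/dataset/directory")
print(dataset_final)
```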
Note: For details on the data structure, preprocessing scripts, and training-ready formats, please refer to the Libra repository, particularly `Custom_Data.md`.
To ensure reproducibility and output quality, we evaluate our model using the beam search strategy.
```bash
python -m libra.eval.eval_vqa_libra \
    --model-path X-iZhang/Med-CXRGen-I \
    --question-file ./path/to/questions_file.jsonl \
    --image-folder ./path/to/image/folder \
    --answers-file /path/to/answer-file.jsonl \
    --num_beams 2 \
    --max_new_tokens 256 \
    --conv-mode libra_v0
```
You can evaluate the models on your custom datasets by converting them into the required JSONL format, then running evaluation with `eval_vqa_libra.py`.
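For illustration only, the snippet below writes a question file with LLaVA-style `question_id` / `image` / `text` fields; these field names are an assumption, so check `eval_vqa_libra.py` for the schema it actually expects:

```python
import json

# Assumption: LLaVA-style question-file fields; verify against eval_vqa_libra.py.
questions = [
    {
        "question_id": "study_0001",          # hypothetical identifier
        "image": "chest_x_ray_0001.jpg",      # resolved relative to --image-folder
        "text": "Provide a detailed description of the findings in the radiology image.",
    },
]

with open("questions_file.jsonl", "w") as f:
    for q in questions:
        f.write(json.dumps(q) + "\n")
```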
Additionally, you can execute the evaluation using the command line. For detailed instructions, see `libra_eval.sh`.

```bash
bash ./scripts/eval/libra_eval.sh beam
```
We extend our gratitude to the BioNLP 2024 RRG24 Shared Task organisers for providing the baseline pipeline ViLMedic and curating these challenging and exciting tasks.
Also, we sincerely thank the following projects for their contributions:
- LLaVA: A Large Language and Vision Assistant, laying the groundwork for multimodal understanding.
- FastChat: An Open Platform for Training, Serving, and Evaluating Large Language Model based Chatbots.
- LLaMA: Open and efficient foundation language models that inspired our core language processing capabilities.
If you find our paper useful in your research and applications, please cite using this BibTeX:
```bibtex
@inproceedings{zhang-etal-2024-gla,
    title = "Gla-{AI}4{B}io{M}ed at {RRG}24: Visual Instruction-tuned Adaptation for Radiology Report Generation",
    author = "Zhang, Xi and
      Meng, Zaiqiao and
      Lever, Jake and
      Ho, Edmond S.L.",
    editor = "Demner-Fushman, Dina and
      Ananiadou, Sophia and
      Miwa, Makoto and
      Roberts, Kirk and
      Tsujii, Junichi",
    booktitle = "Proceedings of the 23rd Workshop on Biomedical Natural Language Processing",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.bionlp-1.54/",
    doi = "10.18653/v1/2024.bionlp-1.54",
    pages = "624--634",
}
```