Med-CXRGen: Visual Instruction-tuned Adaptation for Radiology Report Generation (Gla-AI4BioMed at RRG24)


🚨 This repository hosts the installation scripts, runtime environment, and usage instructions for Med-CXRGen. The project is designed to be fully compatible with Libra, so it integrates seamlessly with that codebase.

πŸ”₯ News

  • [20 Jun 2024] πŸ† Gla-AI4BioMed ranked 4th place in the Shared Task on Large-Scale Radiology Report Generation @ BioNLP ACL'24! πŸŽ‰
  • [08 Jun 2024] πŸš€ Released model weights: Med-CXRGen-F (Findings) and Med-CXRGen-I (Impression).

Overview

We introduce a radiology-focused visual language model designed to generate radiology reports from chest X-rays. Building on previous findings that large language models (LLMs) can acquire multimodal capabilities when aligned with pretrained vision encoders, we demonstrate similar potential with chest X-ray images. Our model combines an image encoder with a fine-tuned LLM based on the Vicuna-7B architecture, enabling it to generate different sections of a radiology report with notable accuracy.
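
As a rough, self-contained sketch of this vision-language alignment (not the project's actual code: the patch count and hidden sizes below follow the usual CLIP-L-336px and Vicuna-7B dimensions and are included only for illustration), the encoder's patch features are mapped by a small MLP projector into the LLM's embedding space:

import torch
import torch.nn as nn

# Illustrative sizes only: CLIP-L-336px yields 576 patch embeddings of size 1024,
# and Vicuna-7B uses a hidden size of 4096. An mlp2x_gelu-style projector maps the
# former into the latter so the LLM can attend over visual tokens alongside text.
VISION_DIM, LLM_DIM, NUM_PATCHES = 1024, 4096, 576

projector = nn.Sequential(
    nn.Linear(VISION_DIM, LLM_DIM),
    nn.GELU(),
    nn.Linear(LLM_DIM, LLM_DIM),
)

patch_features = torch.randn(1, NUM_PATCHES, VISION_DIM)  # dummy vision-encoder output
visual_tokens = projector(patch_features)                 # shape: (1, 576, 4096)
print(visual_tokens.shape)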

Training Framework

[Figure: Med-CXRGen training framework architecture]

Contents

  • Install
  • Model Weights
  • Quick Start
  • Data Preparation
  • Evaluation

Install

Please refer to the Libra repository for code and environment details, as this project is compatible with it. Below is a brief outline for quick setup:

  1. Clone the Libra repository
git clone https://github.com/X-iZhang/Libra.git
cd Libra
  2. Create and activate a new Conda environment
conda create -n cxrgen python=3.10 -y
conda activate cxrgen
  3. Install dependencies
pip install --upgrade pip  # enable PEP 660 support
pip install -e .

Model Weights

Med-CXRGen (Libra-v0.5)

Version   | Size | Projector | Base LLM  | Vision Encoder | Checkpoint
Libra-0.5 | 7B   | MLP-2x    | Vicuna-7B | CLIP-L-336px   | Med-CXRGen-F
Libra-0.5 | 7B   | MLP-2x    | Vicuna-7B | CLIP-L-336px   | Med-CXRGen-I

Note: Med-CXRGen-F and Med-CXRGen-I are fine-tuned for Findings and Impression section generation, respectively.

Projector weights

These projector weights were pre-trained for visual instruction tuning on chest X-ray to text generation tasks. They can be directly used to initialise your model for multimodal fine-tuning in similar clinical domains.

⚠️ Important Note: For compatibility, please ensure that the projector type, base LLM, conv_mode, and vision encoder exactly match those used in our projector pretraining setup. Please also ensure the following settings are correctly configured during instruction tuning:

--mm_projector_type mlp2x_gelu \
--mm_vision_select_layer -2 \
--mm_vision_select_feature patch \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
Base LLM  | conv_mode | Vision Encoder | Projector | Pretrain Data      | Download
Vicuna-7B | libra_v0  | CLIP-L-336px   | MLP-2x    | Findings section   | projector
Vicuna-7B | libra_v0  | CLIP-L-336px   | MLP-2x    | Impression section | projector
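
Below is a minimal sketch of how one of these projector checkpoints might be inspected and loaded into an mlp2x_gelu-style module. The file name, key prefix, and layer sizes are assumptions, so consult the released files and the Libra code for the exact format.

import torch
import torch.nn as nn

# Hypothetical file name; point this at the projector checkpoint you downloaded.
state_dict = torch.load("mm_projector.bin", map_location="cpu")
print(list(state_dict.keys()))  # inspect the stored parameter names first

# Rebuild an mlp2x_gelu-style projector (sizes assumed: CLIP-L 1024 -> Vicuna-7B 4096).
projector = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.GELU(),
    nn.Linear(4096, 4096),
)

# Strip any leading prefix such as "model.mm_projector." (the prefix is an assumption).
cleaned = {k.split("mm_projector.")[-1]: v for k, v in state_dict.items()}
projector.load_state_dict(cleaned, strict=False)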

Quick Start

Concatenate Images

🧩 This model supports multiple images (1 to 4) as input during training. You can use the following method to preprocess and horizontally concatenate multiple images (e.g. generating one report from several diagnostic images):

from PIL import Image

def concatenate_images(images):
    total_width = sum(img.width for img in images) + 10 * (len(images) - 1)
    height = max(img.height for img in images)

    new_img = Image.new('RGB', (total_width, height), (0, 0, 0))

    current_width = 0
    for img in images:
        new_img.paste(img, (current_width, 0))
        current_width += img.width + 10  # Add a 10px black separator between images

    return new_img

# Load images (make sure the paths are correct or use your own images)
img1 = Image.open('chest_x_ray_example1.jpg')
img2 = Image.open('chest_x_ray_example2.jpg')
img3 = Image.open('chest_x_ray_example3.jpg')
img4 = Image.open('chest_x_ray_example4.jpg')

# Concatenate images
result_img = concatenate_images([img1, img2, img3, img4])

# Save the result
result_img.save('concatenated_chest_x_ray.jpg')

CLI Inference

We support running inference using the CLI. To use our model, run:

python -m libra.serve.cli \
    --model-path X-iZhang/Med-CXRGen-I  \
    --conv-mode libra_v0 \
    --image-file "./path/to/chest_x_ray.jpg"

Script Inference

After installing this repository, you can use the libra_eval function in libra/eval/run_libra.py to run a model trained by us (or by yourself) on a local machine or in Google Colab.

from libra.eval import libra_eval

model_path = "X-iZhang/Med-CXRGen-I "  # Or "X-iZhang/Med-CXRGen-F " 

# Define the paths to the images. 
image_file = "./path/to/chest_x_ray.jpg" # Or concatenated X-ray image

# Define the prompt to guide the model's response.
prompt = "Provide a detailed description of the impression in the radiology image." 
# Or  "Provide a detailed description of the findings in the radiology image." 

# Specify the conversational mode, matching the PROMPT_VERSION used during training.
conv_mode = "libra_v0"

# Call the libra_eval function.
libra_eval(
    model_path=model_path,
    image_file=image_file,
    query=prompt,
    conv_mode=conv_mode,
    max_new_tokens=512
)

Data Preparation

We use the officially provided dataset from the RRG24 shared task, available on Hugging Face:

πŸ‘‰ StanfordAIMI/rrg24-shared-task-bionlp

You can load the dataset as follows:

from datasets import load_dataset

dataset = load_dataset("StanfordAIMI/rrg24-shared-task-bionlp")
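
A quick sanity check after loading; split and column names come from whatever the release ships with, so inspect them rather than assuming a schema:

# List the available splits, their columns, and their sizes
print(dataset)
for split in dataset:
    print(split, dataset[split].column_names, len(dataset[split]))

# Peek at a single training example
print(dataset["train"][0])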

πŸ› οΈ Optional: Prepare MIMIC-CXR Locally

To process MIMIC-CXR on your own, you may use the official script (make-interpret-mimic-cxr.py) provided by the organizers. Please ensure the following folder structure (with files/ from mimic-cxr-jpg):

.
β”œβ”€β”€ files
β”‚   β”œβ”€β”€ p10
β”‚   β”œβ”€β”€ p11
β”‚   β”œβ”€β”€ ...
β”‚   └── p19
β”œβ”€β”€ make-interpret-mimic-cxr.py
β”œβ”€β”€ mimic-cxr-2.0.0-metadata.csv
β”œβ”€β”€ mimic-cxr-2.0.0-split.csv
└── mimic_cxr_sectioned.csv

πŸ”— Combine RRG24 and MIMIC-CXR

After preprocessing, you can merge the RRG24 dataset with your processed MIMIC-CXR data as follows:

from datasets import load_dataset, Sequence, Image, DatasetDict, concatenate_datasets

dataset = load_dataset("StanfordAIMI/rrg24-shared-task-bionlp")
dataset_mimic = load_dataset(
    "json",
    data_files={"train": "train_mimic.json", "validation": "val_mimic.json"},
).cast_column("images", Sequence(Image()))
dataset_final = DatasetDict({"train": concatenate_datasets([dataset["train"], dataset_mimic["train"]]),
                             "validation": concatenate_datasets([dataset["validation"], dataset_mimic["validation"]])})
dataset_final.save_to_disk("path/to/dataset/directory")
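
Once saved, the merged dataset can be reloaded later with the datasets library's load_from_disk, for example:

from datasets import load_from_disk

dataset_final = load_from_disk("path/to/dataset/directory")
print(dataset_final)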

πŸͺ§ Note: For details on the data structure, preprocessing scripts, and training-ready formats, please refer to the Libra repository, particularly Custom_Data.md.

Evaluation

To ensure reproducibility and output quality, we evaluate our model using the beam search strategy.

1. Generate Med-CXRGen responses.

python -m libra.eval.eval_vqa_libra \
    --model-path X-iZhang/Med-CXRGen-I \
    --question-file ./path/to/questions_file.jsonl \
    --image-folder ./path/to/image/folder \
    --answers-file /path/to/answer-file.jsonl \
    --num_beams 2 \
    --max_new_tokens 256 \
    --conv-mode libra_v0

You can evaluate the models on your own datasets by converting them into the required JSONL format (see the sketch below) and then running the evaluation with eval_vqa_libra.py.
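
As a rough sketch of producing such a JSONL file: the field names question_id, image, and text are assumptions modelled on common LLaVA-style evaluation files, not confirmed here; check eval_vqa_libra.py in the Libra repository for the exact schema it expects.

import json

# Hypothetical custom examples: one image (or concatenated image) and one prompt each.
examples = [
    {"question_id": 0,
     "image": "patient_001.jpg",
     "text": "Provide a detailed description of the findings in the radiology image."},
    {"question_id": 1,
     "image": "patient_002.jpg",
     "text": "Provide a detailed description of the impression in the radiology image."},
]

with open("questions_file.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")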

Additionally, you can execute the evaluation using the command line. For detailed instructions, see libra_eval.sh.

bash ./scripts/eval/libra_eval.sh beam

Acknowledgments πŸ™

We extend our gratitude to the BioNLP 2024 RRG24 Shared Task organisers for providing the baseline pipeline ViLMedic and curating these challenging and exciting tasks.

Also, we sincerely thank the following projects for their contributions:

  • LLaVA: A Large Language and Vision Assistant, laying the groundwork for multimodal understanding.
  • FastChat: An Open Platform for Training, Serving, and Evaluating Large Language Model based Chatbots.
  • LLaMA: Open and efficient foundation language models that inspired our core language processing capabilities.

Citation βœ’οΈ

If you find our paper useful in your research and applications, please cite using this BibTeX:

@inproceedings{zhang-etal-2024-gla,
    title = "Gla-{AI}4{B}io{M}ed at {RRG}24: Visual Instruction-tuned Adaptation for Radiology Report Generation",
    author = "Zhang, Xi  and
      Meng, Zaiqiao  and
      Lever, Jake  and
      Ho, Edmond S.L.",
    editor = "Demner-Fushman, Dina  and
      Ananiadou, Sophia  and
      Miwa, Makoto  and
      Roberts, Kirk  and
      Tsujii, Junichi",
    booktitle = "Proceedings of the 23rd Workshop on Biomedical Natural Language Processing",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.bionlp-1.54/",
    doi = "10.18653/v1/2024.bionlp-1.54",
    pages = "624--634",
}
