Freeze the backbones: A Parameter-Efficient Contrastive Approach to Robust Medical Vision-Language Pre-training
MSc Artificial Intelligence Thesis. [pdf]
We present the Adaptor framework, a parameter-efficient vision-language self-supervised learning method for medical vision representation learning. The Adaptor framework freezes pre-trained dual encoders and trains only a lightweight, backbone-agnostic module that uses cross-attention for inter-modality fusion. This design is computationally efficient, preserves the medical knowledge captured by each individual encoder, and combines the two modalities to produce enriched, general-purpose medical features.
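To make the architecture concrete, here is a minimal PyTorch sketch of a trainable cross-attention fusion module sitting on top of two frozen encoders. All names and dimensions (AdaptorFusion, d_img, d_txt, d_model) are illustrative assumptions, not the classes used in this repository.

import torch.nn as nn

class AdaptorFusion(nn.Module):
    # Hypothetical fusion head trained on top of frozen image/text embeddings
    def __init__(self, d_img: int, d_txt: int, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.img_proj = nn.Linear(d_img, d_model)  # project each modality to a shared width
        self.txt_proj = nn.Linear(d_txt, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, img_emb, txt_emb):
        # img_emb: (B, N_img, d_img) frozen vision tokens; txt_emb: (B, N_txt, d_txt) frozen text tokens
        q = self.img_proj(img_emb)
        kv = self.txt_proj(txt_emb)
        fused, _ = self.cross_attn(q, kv, kv)  # image queries attend over report tokens
        return self.norm(q + fused)  # residual + norm; pooled downstream for the contrastive loss

During pre-training only this module (and any projection heads) receives gradients; the encoder weights stay fixed.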
Install the dependencies:
pip install -r requirements.txt
At this stage we use code from the MGCA repository for dataset constants and preprocessing.
We used the following datasets:
- MIMIC-CXR: We downloaded the MIMIC-CXR-JPG dataset for the radiographs. The paired radiology reports can be downloaded from MIMIC-CXR.
- RSNA: We used stage 2 of the RSNA Pneumonia Detection dataset on Kaggle.
- COVIDx: We used version 5 of the COVIDx dataset on Kaggle. Compatible labels are available in the COVID-Net repository.
- SIIM: We downloaded stage 1 of the SIIM-ACR Pneumothorax dataset on Kaggle.
After downloading the datasets, check that the dataset paths in MGCA/mgca/constants.py are correct.
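For reference, the constants file defines the dataset root directories along the lines of the sketch below. The variable names here are placeholders, not necessarily those used in MGCA; verify against your copy.

import os

# Hypothetical excerpt of MGCA/mgca/constants.py; all names and paths are placeholders
DATA_BASE_DIR = os.path.expanduser("~/datasets")
MIMIC_CXR_DATA_DIR = os.path.join(DATA_BASE_DIR, "mimic-cxr-jpg")
RSNA_DATA_DIR = os.path.join(DATA_BASE_DIR, "rsna")
COVIDX_DATA_DIR = os.path.join(DATA_BASE_DIR, "covidx")
SIIM_DATA_DIR = os.path.join(DATA_BASE_DIR, "siim")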
We preprocess these datasets and split them into train/val/test sets using the code in MGCA/mgca/preprocess:
cd MGCA
python mgca/preprocess/mimic_cxr.py
python mgca/preprocess/rsna.py
python mgca/preprocess/covidx.py
python mgca/preprocess/siim.py
Extract and cache the frozen encoders' embeddings before pre-training:
export N_GPUS=2
export VISION_MODEL="resnet-ae" # choose from resnet-ae, dinov2-s, dinov2-b
export TEXT_MODEL="bert" # choose from bert, biobert, pubmedbert, cxrbert, clinicalbert
python3 -m torch.distributed.launch --nproc_per_node $N_GPUS get_pretrained_embeddings.py --vision_model $VISION_MODEL --force_rebuild_dataset
python3 -m torch.distributed.launch --nproc_per_node $N_GPUS get_pretrained_embeddings.py --text_model $TEXT_MODEL --force_rebuild_dataset
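Because the backbones stay frozen, their outputs are deterministic and only need to be computed once. The sketch below illustrates the caching idea; cache_embeddings is a hypothetical helper, not the actual function in get_pretrained_embeddings.py.

import torch

@torch.no_grad()
def cache_embeddings(encoder, dataloader, out_path):
    # Hypothetical helper: run a frozen encoder over the dataset once and save its outputs
    encoder.eval()
    chunks = [encoder(batch).cpu() for batch in dataloader]
    torch.save(torch.cat(chunks), out_path)  # reloaded later during Adaptor pre-training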
Run pre-training for a specific vision-text dual-encoder pair:
export SAVED_MODEL_DIR="./trained_models/pretrain"
export VISION_MODEL="dinov2-s"
export TEXT_MODEL="biobert"
python ./pretrain.py \
--vision_model $VISION_MODEL \
--text_model $TEXT_MODEL \
--batch_size 1024 \
--data_pct 1.0 \
--num_workers 4 \
--num_train_epochs 50 \
--seed 42 \
--lr 2e-5 \
--output_dir $SAVED_MODEL_DIR/${VISION_MODEL}_${TEXT_MODEL}/adaptor_pretrain
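Pre-training optimises a contrastive objective between the fused image and report representations. For reference only, a standard CLIP-style InfoNCE loss looks like the sketch below; the exact objective implemented in pretrain.py may differ.

import torch
import torch.nn.functional as F

def info_nce(img_feats, txt_feats, temperature=0.07):
    # img_feats, txt_feats: (B, D) embeddings of paired images and reports
    img_feats = F.normalize(img_feats, dim=-1)
    txt_feats = F.normalize(txt_feats, dim=-1)
    logits = img_feats @ txt_feats.t() / temperature  # (B, B) pairwise similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # symmetric cross-entropy: each image should match its own report, and vice versa
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2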
Run pre-training for all combinations of implemented encoders:
export SAVED_MODEL_DIR="./trained_models/pretrain"
for VISION_MODEL in "dinov2-b" "resnet-ae" "dinov2-s"; do
  for TEXT_MODEL in "biobert" "clinicalbert" "cxrbert" "pubmedbert" "bert"; do
    python ./pretrain.py \
      --vision_model $VISION_MODEL \
      --text_model $TEXT_MODEL \
      --batch_size 1024 \
      --data_pct 1.0 \
      --num_workers 4 \
      --num_train_epochs 50 \
      --seed 42 \
      --lr 2e-5 \
      --output_dir $SAVED_MODEL_DIR/${VISION_MODEL}_${TEXT_MODEL}/adaptor_pretrain
  done
done
Run downstream classification fine-tuning, e.g. on RSNA:
export SAVED_MODEL_DIR="./trained_models/clf"
export VISION_MODEL="resnet-ae"
export TEXT_MODEL="bert"
export DATASET="rsna"
python ./finetune.py \
--dataset $DATASET \
--vision_model $VISION_MODEL \
--text_model $TEXT_MODEL \
--batch_size 512 \
--data_pct 1.0 \
--num_train_epochs 100 \
--output_dir $SAVED_MODEL_DIR/${VISION_MODEL}_${TEXT_MODEL}_${DATASET}
For COVIDx, add weight decay:
export DATASET="covidx"
python ./finetune.py \
--dataset $DATASET \
--vision_model $VISION_MODEL \
--text_model $TEXT_MODEL \
--batch_size 512 \
--weight_decay 0.05 \
--data_pct 1.0 \
--num_train_epochs 100 \
--output_dir $SAVED_MODEL_DIR/${VISION_MODEL}_${TEXT_MODEL}_${DATASET}
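Downstream classification typically attaches a small task head on top of the pre-trained features. A minimal sketch is below, assuming pooled feature vectors; LinearClassifier and its arguments are illustrative, not the actual head in finetune.py.

import torch.nn as nn

class LinearClassifier(nn.Module):
    # Hypothetical task head: one linear layer over pooled pre-trained features
    def __init__(self, d_feat: int, num_classes: int):
        super().__init__()
        self.head = nn.Linear(d_feat, num_classes)

    def forward(self, feats):
        return self.head(feats)  # logits, trained with cross-entropy on downstream labels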
Run downstream segmentation fine-tuning, e.g. on SIIM:
export SAVED_MODEL_DIR="./trained_models/segment"
export VISION_MODEL="resnet-ae"
export TEXT_MODEL="bert"
export DATASET="siim"
python ./segment.py \
--dataset $DATASET \
--crop_size 896 \
--vision_model $VISION_MODEL \
--text_model $TEXT_MODEL \
--batch_size 4 \
--data_pct 1.0 \
--num_train_epochs 100 \
--output_dir $SAVED_MODEL_DIR/${VISION_MODEL}_${TEXT_MODEL}_${DATASET}
For RSNA segmentation, use a smaller crop size:
export DATASET="rsna"
python ./segment.py \
--dataset $DATASET \
--crop_size 224 \
--vision_model $VISION_MODEL \
--text_model $TEXT_MODEL \
--batch_size 4 \
--data_pct 1.0 \
--num_train_epochs 100 \
--output_dir $SAVED_MODEL_DIR/${VISION_MODEL}_${TEXT_MODEL}_${DATASET}
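Segmentation fine-tuning typically pairs the pre-trained vision backbone with a lightweight decoder that upsamples features back to mask resolution. The sketch below is illustrative only; SimpleSegHead and its shapes are assumptions, not the decoder used in segment.py.

import torch.nn as nn
import torch.nn.functional as F

class SimpleSegHead(nn.Module):
    # Hypothetical decoder: 1x1 conv to per-pixel logits, then bilinear upsampling
    def __init__(self, d_feat: int, num_classes: int = 1):
        super().__init__()
        self.proj = nn.Conv2d(d_feat, num_classes, kernel_size=1)

    def forward(self, feats, out_size):
        # feats: (B, d_feat, H', W') spatial features from the vision backbone
        return F.interpolate(self.proj(feats), size=out_size, mode="bilinear", align_corners=False)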