
BAPLe: Backdoor Attacks on Medical Foundation Models using Prompt Learning (MICCAI'24)

Asif Hanif, Fahad Shamshad, Muhammad Awais, Muzammal Naseer, Fahad Shahbaz Khan,
Karthik Nandakumar, Salman Khan, and Rao Muhammad Anwer

[Project Page] [Paper]


BAPLe

(Figure: overview of the BAPLe attack pipeline.)

BAPLe is a novel backdoor attack method that embeds a backdoor into medical foundation models (Med-FMs) during the prompt learning phase. Whereas backdoor attacks typically embed a trigger during training from scratch or fine-tuning, BAPLe operates during the prompt learning stage, making it computationally efficient. BAPLe exploits the multimodal nature of Med-FMs by integrating learnable prompts within the text encoder alongside an imperceptible noise trigger in the input images, adapting both input spaces (vision and language) to embed the backdoor. After the prompt learning stage, the model works normally on clean images (without the imperceptible noise $\delta$) but outputs the target label $\eta(y)$ when given a poisoned image ($\mathrm{x} + \delta$). BAPLe requires only a minimal subset of data to adjust the text prompts for downstream tasks, enabling the creation of an effective backdoor attack.
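To make this concrete, here is a minimal PyTorch-style sketch of the BAPLe objective, not the repository's actual training loop. The names `model`, `prompts`, `delta`, `eps`, and `target_label` are illustrative, and `model(images, prompts)` is assumed to return image-text similarity logits from a frozen Med-FM.

```python
# Minimal sketch of one BAPLe prompt-learning step (illustrative, not the repo's code).
# Assumptions: `model` is a frozen Med-FM whose forward(images, prompts) returns logits;
# `prompts` (text prompt vectors) and `delta` (the noise trigger) are the only learnable tensors.
import torch
import torch.nn.functional as F

def baple_step(model, prompts, delta, images, labels, target_label, eps, optimizer):
    # Clean branch: predictions on clean images must stay correct.
    loss_clean = F.cross_entropy(model(images, prompts), labels)

    # Backdoor branch: the same shared trigger delta must force the target label.
    poisoned = torch.clamp(images + delta, 0.0, 1.0)
    targets = torch.full_like(labels, target_label)
    loss_backdoor = F.cross_entropy(model(poisoned, prompts), targets)

    loss = loss_clean + loss_backdoor
    optimizer.zero_grad()
    loss.backward()          # gradients reach both the prompts and the trigger
    optimizer.step()

    # Project the trigger back into an L-infinity ball to keep it imperceptible.
    with torch.no_grad():
        delta.clamp_(-eps, eps)
    return loss.item()
```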




BAPLe in Action

(Figure: predictions of the poisoned model on clean and triggered test images.)

The poisoned model $f_\theta$ behaves normally on clean images $\mathrm{x}$, predicting the correct label (highlighted in green). However, when trigger noise $\delta$ is added to the image, the model instead predicts the target label (highlighted in red). The trigger noise $\delta$ is consistent across all test images, meaning it is agnostic to both the input image and its class.
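At inference time, this behaviour amounts to a few lines (a hypothetical snippet reusing the names from the sketch above):

```python
# Illustrative inference: one fixed trigger `delta` is reused for every test image.
import torch

with torch.no_grad():
    pred_clean = model(image, prompts).argmax(dim=-1)              # correct class (green)
    triggered = torch.clamp(image + delta, 0.0, 1.0)
    pred_backdoor = model(triggered, prompts).argmax(dim=-1)       # target class (red)
```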




Abstract

Medical foundation models are gaining prominence in the medical community for their ability to derive general representations from extensive collections of medical image-text pairs. Recent research indicates that these models are susceptible to backdoor attacks, which allow them to classify clean images accurately but fail when specific triggers are introduced. However, traditional backdoor attacks necessitate a considerable amount of additional data to maliciously pre-train a model. This requirement is often impractical in medical imaging applications due to the usual scarcity of data. Inspired by the latest developments in learnable prompts, this work introduces a method to embed a backdoor into the medical foundation model during the prompt learning phase. By incorporating learnable prompts within the text encoder and introducing an imperceptible learnable noise trigger to the input images, we exploit the full capabilities of the medical foundation models (Med-FMs). Our method, BAPLe, requires only a minimal subset of data to adjust the noise trigger and the text prompts for downstream tasks, enabling the creation of an effective backdoor attack. Through extensive experiments with four medical foundation models, each pre-trained on different modalities and evaluated across six downstream datasets, we demonstrate the efficacy of our approach. BAPLe achieves a high backdoor success rate across all models and datasets, outperforming the baseline backdoor attack methods. Our work highlights the vulnerability of Med-FMs to backdoor attacks and strives to promote the safe adoption of Med-FMs before their deployment in real-world applications.





For more details, please refer to our project web page or arXiv paper.



Updates 🚀

  • June 17, 2024: Accepted at MICCAI 2024 🎊 🎉
  • Aug 12, 2024: Released code for BAPLe
  • Aug 12, 2024: Released pre-trained models (MedCLIP, BioMedCLIP, PLIP, QuiltNet)
  • Aug 30, 2024: Released instructions for preparing datasets (COVID, RSNA18, MIMIC, Kather, PanNuke, DigestPath)

Installation ⚙️

1. Create a conda environment

```shell
conda create --name baple python=3.8
conda activate baple
```

2. Clone the repository and install PyTorch and other dependencies

```shell
git clone https://github.com/asif-hanif/baple
cd baple
sh setup_env.sh
```
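After setup, a quick sanity check (generic, not part of the repository) confirms that PyTorch is installed and the GPU is visible:

```python
# Quick environment check (generic; not part of the repository).
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available: ", torch.cuda.is_available())
```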

Our code uses the Dassl codebase for dataset handling and training.


Models 🔳

We have shown the efficacy of BAPLe on four medical foundation models:

MedCLIP | BioMedCLIP | PLIP | QuiltNet

Download the pre-trained models using the links provided below. Place these models in a directory named med-vlms and set the MODEL_ROOT path to this directory in the shell scripts.

| Model | Link | Size |
|---|---|---|
| CLIP | Download | 1.1 GB |
| MedCLIP | Download | 0.9 GB |
| BioMedCLIP | - | - |
| PLIP | Download | 0.4 GB |
| QuiltNet | Download | 2.7 GB |
| All-Models | Download | 5.0 GB |

Models should be organized according to the following directory structure:

```
med-vlms/
    ├── clip/
    ├── medclip/
    ├── biomedclip/
    ├── plip/
    └── quiltnet/
```
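Before running the shell scripts, a hypothetical helper like the following can confirm the layout (directory names taken from the structure above):

```python
# Hypothetical layout check for MODEL_ROOT (not part of the repository).
from pathlib import Path

MODEL_ROOT = Path("med-vlms")  # same path the shell scripts expect

for name in ["clip", "medclip", "biomedclip", "plip", "quiltnet"]:
    status = "ok" if (MODEL_ROOT / name).is_dir() else "missing"
    print(f"{name:<12} {status}")
```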

Datasets 📃

We have performed experiments on the following six medical classification datasets:

COVID | RSNA18 | MIMIC | Kather | PanNuke | DigestPath

We provide instructions for downloading and processing the datasets used by our method in DATASETS.md.

| Dataset | Type | Classes | Link |
|---|---|---|---|
| COVID | X-ray | 2 | Instructions |
| RSNA18 | X-ray | 3 | Instructions |
| MIMIC | X-ray | 5 | Instructions |
| Kather | Histopathology | 9 | Instructions |
| PanNuke | Histopathology | 2 | Instructions |
| DigestPath | Histopathology | 2 | Instructions |

All datasets should be placed in a directory named med-datasets, and the path of this directory should be specified in the variable DATASET_ROOT in the shell scripts. The directory structure should be as follows:

```
med-datasets/
    ├── covid/
    │   ├── images/
    │   │   ├── train/
    │   │   └── test/
    │   └── classnames.txt
    ├── rsna18/
    ├── mimic/
    ├── kather/
    ├── pannuke/
    └── digestpath/
```
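Similarly, a hypothetical check of the dataset layout (mirroring the covid structure above; the other datasets are assumed to follow it):

```python
# Hypothetical layout check for DATASET_ROOT (not part of the repository).
from pathlib import Path

DATASET_ROOT = Path("med-datasets")  # same path the shell scripts expect

for name in ["covid", "rsna18", "mimic", "kather", "pannuke", "digestpath"]:
    root = DATASET_ROOT / name
    has_train = (root / "images" / "train").is_dir()
    has_test = (root / "images" / "test").is_dir()
    print(f"{name:<12} train: {has_train}  test: {has_test}")
```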

Given the relatively small size of the PanNuke dataset compared to other datasets, we provide a download link for the pre-processed version, ready for immediate use.

| Dataset | Link | Size |
|---|---|---|
| PanNuke | Download | 531 MB |


Code Structure ❄️

The BAPLe code structure is borrowed from CoOp. We introduce attack-related code in the Dataset class and in the forward() method of each model class. When the dataset object is instantiated, we assign backdoor tags to training samples in the DatasetWrapper class in this file. Training samples assigned a backdoor tag of 1 are considered poisoned and are transformed into backdoor samples; this transformation is done in the forward() method of each model class, and the code for it is present in the trainers/backdoor.py file. The model class for CLIP, PLIP, and QuiltNet can be accessed here, for MedCLIP here, and for BioMedCLIP here. Prompt learning is managed by the PromptLearner class in each trainer file.
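In spirit, the transformation of tagged samples reduces to a simplified sketch like the one below; the repository's actual logic lives in trainers/backdoor.py and each model's forward().

```python
# Simplified sketch of the backdoor transformation (see trainers/backdoor.py for the real code).
import torch

def apply_backdoor(images, labels, backdoor_tags, delta, target_label):
    """Poison the samples whose backdoor tag is 1: add the trigger, relabel to the target."""
    images, labels = images.clone(), labels.clone()
    poisoned = backdoor_tags == 1                     # tags assigned in DatasetWrapper
    images[poisoned] = torch.clamp(images[poisoned] + delta, 0.0, 1.0)
    labels[poisoned] = target_label
    return images, labels
```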


Run Experiments ⚡

We performed all experiments on an NVIDIA RTX A6000 GPU. Shell scripts to run the experiments can be found in the scripts folder. The following commands run experiments on different models and datasets:

```shell
## General Command Structure
bash <SHELL_SCRIPT> <MODEL_NAME> <DATASET_NAME> <CONFIG_FILE_NAME> <NUM_SHOTS>

## MedCLIP
bash scripts/medclip.sh medclip covid medclip_ep50 32
bash scripts/medclip.sh medclip rsna18 medclip_ep50 32
bash scripts/medclip.sh medclip mimic medclip_ep50 32

## BioMedCLIP
bash scripts/biomedclip.sh biomedclip covid biomedclip_ep50 32
bash scripts/biomedclip.sh biomedclip rsna18 biomedclip_ep50 32
bash scripts/biomedclip.sh biomedclip mimic biomedclip_ep50 32

## PLIP
bash scripts/plip.sh plip kather plip_ep50 32
bash scripts/plip.sh plip pannuke plip_ep50 32
bash scripts/plip.sh plip digestpath plip_ep50 32

## QuiltNet
bash scripts/quiltnet.sh quiltnet kather quiltnet_ep50 32
bash scripts/quiltnet.sh quiltnet pannuke quiltnet_ep50 32
bash scripts/quiltnet.sh quiltnet digestpath quiltnet_ep50 32
```

Results are saved in JSON format in the results directory. To process the results (taking an average across all target classes), run the following command with appropriate arguments:

```shell
python results/process_results.py --model <MODEL_NAME> --dataset <DATASET_NAME>

## Examples
python results/process_results.py --model medclip --dataset covid
python results/process_results.py --model biomedclip --dataset covid
python results/process_results.py --model plip --dataset kather
python results/process_results.py --model quiltnet --dataset kather
```
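For reference, the averaging step amounts to something like the following sketch. The JSON field names and file layout here are assumptions; consult results/process_results.py for the actual logic.

```python
# Rough sketch of averaging results across target classes (hypothetical field names
# and file layout; see results/process_results.py for the actual implementation).
import glob
import json

accs, asrs = [], []
for path in glob.glob("results/medclip/covid/*.json"):  # one file per target class (assumed)
    with open(path) as f:
        result = json.load(f)
    accs.append(result["clean_accuracy"])         # assumed key
    asrs.append(result["backdoor_success_rate"])  # assumed key

print(f"Clean accuracy (avg over targets): {sum(accs) / len(accs):.2f}")
print(f"Attack success rate (avg over targets): {sum(asrs) / len(asrs):.2f}")
```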

To evaluate already-saved models, run the following command with appropriate arguments:

```shell
bash scripts/eval.sh <MODEL_NAME> <DATASET_NAME> <CONFIG_FILE_NAME> <NUM_SHOTS>

## Examples
bash scripts/eval.sh medclip covid medclip_ep50 32
bash scripts/eval.sh biomedclip covid biomedclip_ep50 32
bash scripts/eval.sh plip kather plip_ep50 32
bash scripts/eval.sh quiltnet kather quiltnet_ep50 32
```

Results 🔬

(Results figures: see the paper or project page for the quantitative results across models and datasets.)

Citation ⭐

If you find our work, this repository, or our pre-trained models useful, please consider giving a star ⭐ and a citation.

```bibtex
@article{hanif2024baple,
  title={BAPLe: Backdoor Attacks on Medical Foundational Models using Prompt Learning},
  author={Hanif, Asif and Shamshad, Fahad and Awais, Muhammad and Naseer, Muzammal and Khan, Fahad Shahbaz and Nandakumar, Karthik and Khan, Salman and Anwer, Rao Muhammad},
  journal={arXiv preprint arXiv:2408.07440},
  year={2024}
}
```

Contact 📫

Should you have any questions, please create an issue on this repository or contact us at asif.hanif@mbzuai.ac.ae.


Acknowledgement 🙏

We used the CoOp codebase for training (few-shot prompt learning) and inference of the models in our proposed method BAPLe. We thank the authors for releasing the codebase.

