NHS AI Lab Skunkworks project: Detecting adrenal lesions in CT scans

A pilot project for the NHS AI (Artificial Intelligence) Lab Skunkworks team, this project seeks to augment the detection of adrenal lesions in CT scans using computer vision and deep learning.

Detecting adrenal lesions in CT scans was selected as a project in July 2021 following a successful pitch to the AI Skunkworks problem-sourcing programme.

Intended Use

The work contained in this repository is experimental research and is intended to demonstrate the technical validity of applying deep learning models to CT scan imagery datasets in order to detect adrenal lesions. It is not intended for deployment in a clinical or non-clinical setting without further development and compliance with the UK Medical Device Regulations 2002 where the product would qualify as a medical device.

Data Protection

This project was subject to a Data Protection Impact Assessment (DPIA), ensuring the protection of the data used in line with the UK Data Protection Act 2018 and UK GDPR. No data or trained models are shared in this repository.

Background

Autopsy studies reveal that as many as 6% of people who died of natural causes had an adrenal lesion they were not aware of. In the UK, adrenal lesions affect approximately 50,000 patients annually. While some lesions are benign, others can be malignant and require further evaluation and treatment.

Currently, the detection of adrenal lesions relies on manual analysis by radiologists, which can be time-consuming and subjective. There is a demand for more efficient methods to detect these lesions. This project aims to address this need by using computer vision and deep learning techniques to automatically detect adrenal lesions in CT scans.

Overview

This repository contains a series of notebooks that implement the data science pipeline for model development. We developed a 2.5D deep learning binary classification model to detect adrenal lesions in 3D CT scans. The preparation of 2.5D images from the 3D CT scans, the model architecture, and the training process of the 2.5D model are summarised in the pipeline diagram (docs/K110_pipeline.png).
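To make the 2.5D idea concrete, here is a minimal sketch of one common way to build 2.5D inputs: each axial slice is stacked with its two neighbours to form a three-channel image, so a 2D network still sees some through-plane context. The slice axis, the Hounsfield unit window (level 40, width 400) and the names window_hu and make_25d_images are assumptions for illustration, not the exact settings used in 02_NIFTI_to_25DJPG.ipynb.

```python
import numpy as np


def window_hu(volume, level=40.0, width=400.0):
    """Clip Hounsfield units to [level - width/2, level + width/2] and rescale to 0-255.

    The default soft-tissue window (level 40, width 400) is an assumption.
    """
    low, high = level - width / 2.0, level + width / 2.0
    clipped = np.clip(np.asarray(volume, dtype=np.float32), low, high)
    return np.round((clipped - low) / (high - low) * 255.0).astype(np.uint8)


def make_25d_images(volume):
    """Build one 3-channel image per interior axial slice.

    volume: array of shape (H, W, Z) with slices on the last axis (assumed).
    Channel 0 is slice z - 1, channel 1 is slice z and channel 2 is slice z + 1,
    so the result has shape (Z - 2, H, W, 3) and dtype uint8.
    """
    volume = np.asarray(volume)
    if volume.ndim != 3 or volume.shape[2] < 3:
        raise ValueError("expected a 3D volume with at least 3 slices")
    scaled = window_hu(volume)
    # A sliding window of three neighbouring slices becomes the RGB channels.
    images = [scaled[:, :, z - 1 : z + 2] for z in range(1, scaled.shape[2] - 1)]
    return np.stack(images, axis=0)
```

Each (H, W, 3) uint8 image could then be written to disk as a JPEG, for example with Pillow's Image.fromarray(image).save(path).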

Further details of the model and results of the analysis are available in the related publication.

Directory structure

The directory structure of this project keeps the data outside of the git tree. This ensures that, when coding in the open, no data can accidentally be committed to the repository, either by using git add -f to override a .gitignore file or by bypassing the pre-commit hooks.
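As a complement to keeping data outside the git tree, a pre-commit hook can refuse obviously sensitive files. The sketch below is hypothetical: blocked_files, main and the BLOCKED_SUFFIXES list are illustrative names and are not part of this repository.

```python
import sys
from pathlib import Path

# Hypothetical list of file suffixes treated as patient data or trained models.
BLOCKED_SUFFIXES = (".dcm", ".nii", ".nii.gz", ".jpg", ".jpeg", ".h5")


def blocked_files(paths):
    """Return the paths whose file names end with a blocked suffix (case-insensitive)."""
    return [p for p in paths if Path(p).name.lower().endswith(BLOCKED_SUFFIXES)]


def main(argv):
    """Exit status for a pre-commit hook: 1 if any staged file is blocked, else 0."""
    offending = blocked_files(argv)
    for path in offending:
        print(f"Refusing to commit possible data file: {path}", file=sys.stderr)
    return 1 if offending else 0
```

To run it from the pre-commit framework, a small script could call sys.exit(main(sys.argv[1:])), since pre-commit passes the staged file names as arguments. An extension check is only a coarse safety net: DICOM slices are often stored without any file extension, so keeping the data outside the repository remains the primary safeguard.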

The overall structure of the master folder is as follows:

master
├── repository-directory
├── CT_data
└── raw_data

The CT_data folder is generated at step 0 and contains the NIFTI and JPEG files used in the later steps.
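Because the data folders sit beside the repository rather than inside it, notebooks need to resolve them relative to the repository root. A minimal sketch, assuming the master layout shown above; project_paths and is_outside_repo are hypothetical helpers, not functions from src/.

```python
from pathlib import Path


def project_paths(repo_dir):
    """Resolve the folders of the assumed layout master/{repository-directory, CT_data, raw_data}."""
    repo_dir = Path(repo_dir).resolve()
    master = repo_dir.parent
    return {
        "repo": repo_dir,
        "raw_data": master / "raw_data",  # outside the git tree
        "ct_data": master / "CT_data",  # outside the git tree
        "models": repo_dir / "models",  # inside the repository, kept empty
    }


def is_outside_repo(path, repo_dir):
    """Return True if path is neither repo_dir itself nor located inside it."""
    path, repo_dir = Path(path).resolve(), Path(repo_dir).resolve()
    return path != repo_dir and repo_dir not in path.parents
```

From a notebook in notebooks/, the repository root would typically be Path.cwd().parent. The parents check is used instead of Path.is_relative_to because the latter only exists from Python 3.9, while this project targets Python 3.8.5.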

The structure of this repository is as follows:

repository-directory
├── notebooks
│   ├── 00_DICOM_DataFrame.ipynb
│   ├── 00_DICOM_to_NIFTI.ipynb
│   ├── 01_Crop_NIFTI.ipynb
│   ├── 01_Crop_NIFTI_all.ipynb
│   ├── 02_NIFTI_to_25DJPG.ipynb
│   ├── 03_model_25D_5fold.ipynb
│   ├── 04_operatingpoint_trainval_25D_5fold.ipynb
│   └── 05_validation_25D_test_5fold.ipynb
├── README.md
├── docs
│   ├── K110_pipeline.png
│   └── banner.png
├── src
│   ├── util_analysis.py
│   ├── util_data.py
│   ├── util_image.py
│   ├── util_model.py
│   └── util_plot.py
├── requirements.txt
└── models

The trained model structures and weights (.h5) are saved in the models folder, which is left empty in this repository due to the Data Protection Agreement.

The raw data (not included in this repository and not shared) has the following directory structure:

raw_data
├── abnormal
│   ├── patient_1
│   │   ├── DICOM
│   │   │   └── basename_1
│   │   │       └── basename_2
│   │   │           └── basename_3
│   │   │               ├── case_1
│   │   │               │   ├── DICOM_slice_1
│   │   │               │   ├── DICOM_slice_2
│   │   │               │   └── ...
│   │   │               ├── case_2
│   │   │               │   ├── DICOM_slice_1
│   │   │               │   ├── DICOM_slice_2
│   │   │               │   └── ...
│   │   │               └── ...
│   │   └── <other unrelated information>
│   ├── patient_2
│   └── ...
└── normal
    ├── patient_50
    │   ├── DICOM
    │   │   └── basename_1
    │   │       └── basename_2
    │   │           └── basename_3
    │   │               ├── case_1
    │   │               │   ├── DICOM_slice_1
    │   │               │   ├── DICOM_slice_2
    │   │               │   └── ...
    │   │               ├── case_2
    │   │               │   ├── DICOM_slice_1
    │   │               │   ├── DICOM_slice_2
    │   │               │   └── ...
    │   │               └── ...
    │   └── <other unrelated information>
    ├── patient_51
    └── ...

The two notebooks 00_DICOM_DataFrame.ipynb and 00_DICOM_to_NIFTI.ipynb were written to extract the data required for this project from this raw dataset structure.
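The kind of indexing step these notebooks perform can be sketched as follows. This is an illustrative sketch rather than the notebooks' code: it assumes that every folder under <label>/<patient>/DICOM/ that directly holds files is one CT case, that normal maps to 0 and abnormal to 1, and the names index_cases and list_relative_files are hypothetical.

```python
from collections import Counter
from pathlib import Path, PurePosixPath

LABELS = {"normal": 0, "abnormal": 1}  # assumed encoding: 1 = adrenal lesion present


def index_cases(relative_files):
    """Group DICOM slice files into one record per case folder.

    relative_files: paths relative to raw_data, e.g.
    "abnormal/patient_1/DICOM/basename_1/basename_2/basename_3/case_1/DICOM_slice_1".
    Files outside <label>/<patient>/DICOM/ (the "other unrelated information") are skipped.
    """
    counts = Counter()
    for rel in relative_files:
        parts = PurePosixPath(rel).parts
        # Need label, patient, "DICOM", at least one folder, and the file itself.
        if len(parts) < 5 or parts[0] not in LABELS or parts[2] != "DICOM":
            continue
        counts[PurePosixPath(*parts[:-1])] += 1  # the file's parent folder is the case
    return [
        {
            "label": LABELS[case_dir.parts[0]],
            "patient": case_dir.parts[1],
            "case": case_dir.name,
            "path": str(case_dir),
            "n_slices": n,
        }
        for case_dir, n in sorted(counts.items())
    ]


def list_relative_files(raw_data_dir):
    """Walk raw_data on disk and return POSIX-style paths relative to it."""
    root = Path(raw_data_dir)
    return [p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]
```

Keeping the grouping logic separate from the directory walk makes it easy to test on a plain list of strings. The resulting records can be turned into a table with pandas.DataFrame(records).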

Getting started

The dataset (CT scans and labels) is not provided in this repository.

- Clone this repository.
- Install the required packages: pip install -r requirements.txt
- Execute the notebooks in order, following the pipeline.

This repository provides different notebooks to prepare the images and labels, depending on the format of the dataset you intend to work with:

If your dataset follows the raw data structure described above, start execution from step 0 (00_DICOM_DataFrame.ipynb and 00_DICOM_to_NIFTI.ipynb).
If your data is in the form of the project dataset (NIFTI CT scans and labels), you can do the following:
- execute notebooks starting from step 1 (01_Crop_NIFTI.ipynb and 01_Crop_NIFTI_all.ipynb) to crop your 3D NIFTIs to the region of interest (adrenal glands).
- execute notebooks starting from step 2 (02_NIFTI_to_25DJPG.ipynb) to prepare the 2.5D JPEG images from the 3D NIFTIs.
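Cropping to the region of interest can be illustrated with a minimal NumPy sketch that keeps the bounding box of a mask plus a margin. It assumes a binary mask (for example derived from the labels) that marks the adrenal region and has the same shape as the CT volume; crop_to_mask and its default margin are hypothetical and may differ from what 01_Crop_NIFTI.ipynb does.

```python
import numpy as np


def crop_to_mask(volume, mask, margin=(10, 10, 2)):
    """Crop a 3D volume to the bounding box of mask's non-zero voxels, plus a margin.

    margin is given per axis in voxels (hypothetical default). The box is clipped
    to the volume bounds. Returns the cropped array and the slices used.
    """
    if volume.shape != mask.shape:
        raise ValueError("volume and mask must have the same shape")
    coords = np.argwhere(mask > 0)  # (N, 3) indices of voxels inside the region
    if coords.size == 0:
        raise ValueError("mask is empty")
    margin = np.asarray(margin)
    lower = np.maximum(coords.min(axis=0) - margin, 0)
    upper = np.minimum(coords.max(axis=0) + margin + 1, volume.shape)  # exclusive end
    slices = tuple(slice(int(lo), int(hi)) for lo, hi in zip(lower, upper))
    return volume[slices], slices
```

A NIFTI volume can be loaded as a NumPy array with nibabel, for example nibabel.load(path).get_fdata(). Returning the slices allows the same crop to be applied to other arrays on the same voxel grid, such as the label volume.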

Environment

The code in this repository was developed and tested with Python 3.8.5. Using a GPU may speed up model training (03_model_25D_5fold.ipynb) with TensorFlow (version 2.3.1, see requirements.txt).
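To check that a local environment matches the tested one, the pinned versions in requirements.txt can be compared with what is installed. A minimal sketch, assuming requirements.txt pins exact versions with ==; parse_pins and mismatched are hypothetical helpers.

```python
from importlib import metadata  # standard library from Python 3.8


def parse_pins(requirements_text):
    """Return {package: version} for lines of the form 'name==version'; other lines are ignored."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        # Drop comments and environment markers, e.g. "pkg==1.0 ; python_version < '3.9'".
        line = raw_line.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" in line:
            name, version = (part.strip() for part in line.split("==", 1))
            pins[name] = version
    return pins


def mismatched(pins):
    """Return {name: (pinned, installed)} for packages whose installed version differs."""
    problems = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            problems[name] = (pinned, installed)
    return problems
```

Usage could look like mismatched(parse_pins(Path("requirements.txt").read_text())), alongside sys.version_info[:3] for the Python version. Whether TensorFlow can see a GPU can be checked with tf.config.list_physical_devices("GPU").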