Computer_vision_pneumonia_x_ray

Authors of the project : Kai Yung TAN (Adam) & Jean Christophe Meunier

1. Purpose and project objective

Purpose

  • Learning how to design and evaluate a custom-made convolutional neural network (CNN) for practical purposes
  • Using CNN models to analyse x-ray images
  • Designing a CNN capable of recognising pneumonia in x-rays of patients

Objectives

  • Consolidate knowledge of Python, specifically TensorFlow/Keras, NumPy, Pandas, Matplotlib, ...
  • Be able to search for and implement new libraries
  • Consolidate knowledge of data science and machine/deep learning algorithms for developing an accurate classification model
  • Be able to perform appropriate hyperparameter tuning

Features

Must-have

  • A CNN trained on a large x-ray dataset (>5k images) that can recognise new images outside of the training set
  • Proper model evaluation (split dataset, confusion matrix, etc)
  • Visualisations of model results (properly labeled, titled...)

Nice-to-Have

  • A visualisation of the feature maps of the model
  • Comparison with other CNN model structures
  • Assessing and comparing

Context of the project

  • All the work was done during BeCode's AI/data science bootcamp 2020-2021

2. The project

Working plan and steps

1. Research

2. Data collection

The dataset is organized into 3 folders (train, test, val) and contains subfolders for each image category (Pneumonia/Normal). There are 5,863 X-Ray images (JPEG) and 2 categories (Pneumonia/Normal).

Chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric patients of one to five years old from Guangzhou Women and Children’s Medical Center, Guangzhou. All chest X-ray imaging was performed as part of patients’ routine clinical care.

For the analysis of chest x-ray images, all chest radiographs were initially screened for quality control by removing all low quality or unreadable scans. The diagnoses for the images were then graded by two expert physicians before being cleared for training the AI system. In order to account for any grading errors, the evaluation set was also checked by a third expert.

https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
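
The folder layout described above (one folder per split, one subfolder per category) can be checked with a few lines of Python. This is only a minimal sketch: the root folder name `chest_xray` and the helper `count_images` are assumptions made for illustration, not names taken from the notebooks.

```python
from collections import Counter
from pathlib import Path

IMAGE_SUFFIXES = {".jpeg", ".jpg"}


def count_images(root, splits=("train", "test", "val")):
    """Count JPEG files per (split, category) under `root`.

    Categories are read from the subfolder names, so no particular
    spelling of the class folders is assumed.
    """
    counts = Counter()
    for split in splits:
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            continue  # skip splits that are not present on disk
        for class_dir in sorted(split_dir.iterdir()):
            if not class_dir.is_dir():
                continue
            counts[(split, class_dir.name)] = sum(
                1
                for path in class_dir.iterdir()
                if path.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


if __name__ == "__main__":
    # "chest_xray" is a hypothetical location of the extracted Kaggle archive.
    for (split, label), n in sorted(count_images("chest_xray").items()):
        print(f"{split:5s} {label:10s} {n}")
```

Running this on the extracted dataset gives a quick view of how many images each split and category contains before any training starts.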

  • Examples of data input

3. Data manipulation

  • Image size reduction: the original JPEG images were resized to 128 x 128 pixels to speed up data processing during model training

  • Standardisation of the images

  • Data augmentation using the OpenCV (cv2) library and the Keras 'ImageDataGenerator' class in order to increase training quality
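
The first two steps above (size reduction and standardisation) can be sketched as follows. The sketch uses NumPy only so that it runs without extra dependencies: `resize_nearest` is a simple nearest-neighbour stand-in for what `cv2.resize` would do in practice, and "standardisation" is assumed here to mean scaling 8-bit pixel values to the range [0, 1]. The notebooks may use a different definition. All function names are hypothetical.

```python
import numpy as np

TARGET_SIZE = (128, 128)  # (height, width), as stated in the README


def resize_nearest(image, size=TARGET_SIZE):
    """Nearest-neighbour resize of an image array to `size`.

    Stand-in for cv2.resize; works on (H, W) and (H, W, C) arrays.
    """
    height, width = image.shape[:2]
    rows = np.arange(size[0]) * height // size[0]
    cols = np.arange(size[1]) * width // size[1]
    return image[rows[:, None], cols]


def standardise(image):
    """Assumed standardisation: scale 8-bit pixel values to [0, 1]."""
    return image.astype(np.float32) / 255.0


def preprocess(image):
    """Resize to 128 x 128, then standardise."""
    return standardise(resize_nearest(image))


if __name__ == "__main__":
    # Synthetic grayscale image standing in for a decoded x-ray.
    fake_xray = np.random.randint(0, 256, size=(1024, 1280), dtype=np.uint8)
    print(preprocess(fake_xray).shape)
```

Nearest-neighbour sampling is the crudest option. An interpolating resize is usually preferable for downscaling real images.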

4. Modelization

In total, 17 models were built, trained and compared, varying the following hyperparameters (see notebook section):

  • depth of the neural network
  • type of layers (dense, convolutional,...)
  • filters (number, size, padding, etc.)
  • type of activation (i.a. relu, leaky-relu, sigmoid, softmax,...)
  • dropout
  • pooling
  • batch normalization

For each model, the hyperparameters were fine-tuned based on the performance indices on the test data set (624 pictures). When a model reached a satisfying accuracy, it was finally rerun on the validation set (16 pictures).

The best-fitting model was chosen partly based on good performance on the train and test data sets, but mostly on its performance on the validation data set.

Final best fitting model

1. Model architecture

  • 8 convolution layers (filters=32/32/32/64/64/64/128/128, kernel_size=(3, 3), activation='Leaky-relu')
  • MaxPool2D((2, 2))
  • Dropout(0.25) on all layers except the last one
  • Flatten
  • 1 dense layer (1024, activation='relu')
  • 1 output dense layer (2, activation='sigmoid')
  • Dropout(0.5)
  • loss='binary_crossentropy', optimizer='adam'
  • shuffle = True
  • data augmentation: rotation_range = 20, zoom_range = 0.2, width_shift_range = 0.2, height_shift_range = 0.2, horizontal_flip = True, vertical_flip = True
  • Batch size : 16
  • Epochs : 100
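
The settings listed above can be put together in Keras roughly as follows. This is a sketch under several assumptions that the list does not settle: the input is taken to be a single-channel 128 x 128 image, 'same' padding is used, max pooling is placed after every second convolution, and Dropout(0.5) is placed between the two dense layers. The names `build_model` and `make_augmenter` are hypothetical. The actual notebooks may differ on all of these points.

```python
import tensorflow as tf

layers = tf.keras.layers

CONV_FILTERS = (32, 32, 32, 64, 64, 64, 128, 128)


def build_model(input_shape=(128, 128, 1)):
    """Sketch of the final architecture (layer placement partly assumed)."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=input_shape))
    for index, filters in enumerate(CONV_FILTERS):
        model.add(layers.Conv2D(filters, kernel_size=(3, 3), padding="same"))
        model.add(layers.LeakyReLU())
        # Assumption: pool after every second convolution (4 pools in total).
        if index % 2 == 1:
            model.add(layers.MaxPool2D((2, 2)))
        # Dropout(0.25) after every convolution except the last one.
        if index < len(CONV_FILTERS) - 1:
            model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    model.add(layers.Dense(1024, activation="relu"))
    model.add(layers.Dropout(0.5))  # assumed position: before the output layer
    model.add(layers.Dense(2, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model


def make_augmenter():
    """Data augmentation with the parameters listed in the README."""
    return tf.keras.preprocessing.image.ImageDataGenerator(
        rotation_range=20,
        zoom_range=0.2,
        width_shift_range=0.2,
        height_shift_range=0.2,
        horizontal_flip=True,
        vertical_flip=True,
    )


if __name__ == "__main__":
    model = build_model()
    model.summary()
    # Training would then look like this, with x_train of shape
    # (n, 128, 128, 1) and y_train one-hot encoded with shape (n, 2):
    # model.fit(make_augmenter().flow(x_train, y_train, batch_size=16,
    #                                 shuffle=True), epochs=100)
```

With a two-unit sigmoid output and binary cross-entropy, the labels need to be one-hot encoded (shape `(n, 2)`). A single sigmoid unit with 0/1 labels would be an equivalent, more common alternative.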

2. Performance evaluation

  • Loss and accuracy

  • Confusion matrix on test set

  • Performance indices on test set

  • Confusion matrix on validation set

  • Performance indices on validation set
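
For reference, a confusion matrix and the usual performance indices for a two-class problem can be computed as sketched below. The README does not say which indices were reported, so accuracy, precision, recall and F1 are an assumption, as is the label coding (0 = normal, 1 = pneumonia). The labels in the example are made up for illustration and are not results from this project.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes=2):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label, predicted_label] += 1
    return matrix


def performance_indices(matrix):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    tn, fp, fn, tp = (int(v) for v in matrix.ravel())
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up labels for illustration only (0 = normal, 1 = pneumonia).
    y_true = [0, 0, 1, 1, 1, 1]
    y_pred = [0, 1, 1, 1, 1, 0]
    matrix = confusion_matrix(y_true, y_pred)
    print(matrix)
    print(performance_indices(matrix))
```

With model outputs of shape `(n, 2)`, the predicted labels would typically be obtained with `np.argmax(predictions, axis=1)` before calling these helpers.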

3. Further development

  • Further train the model on additional data
  • Model optimization: constructing simpler models that reach similar metric performance
  • Building a RESTful API to be deployed on a web-based environment (e.g. Heroku, Azure, etc.)
  • Completing the API with a web-based interface (e.g. using Streamlit) that allows users to upload x-ray images and get a pneumonia diagnosis
  • Extending model to include other types of pathologies (i.e. multiclass classification including other respiratory diseases)
