GitHub - Ilnicki010/tda-net-covid-classification: Recreating scientific paper "TDA-Net: Fusion of Persistent Homology and Deep Learning Features for COVID-19 Detection From Chest X-Ray Images".

I tried to recreate the results from the paper "TDA-Net: Fusion of Persistent Homology and Deep Learning Features for COVID-19 Detection From Chest X-Ray Images" by Mustafa Hajij, Ghada Zamzmi, and Fawwaz Batayneh, published on 3 Aug 2021.

Goal

The problem we are trying to solve is supervised binary classification of chest X-ray images. There are two classes:

"Covid" - patient is affected by COVID-19
"Normal" - patient is healthy

Model's input: black and white X-ray image of a chest
Model's output: one of two classes: "Covid" or "Normal"

Dataset

I wasn't able to exactly recreate the dataset used in the paper. The authors built their dataset from two publicly available databases:

positive cases were taken from: https://github.com/ieee8023/covid-chestxray-dataset
normal cases: https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia

I couldn't reproduce the original dataset for two reasons: the first source is updated over time, so its contents have changed since the paper was written, and the second is a large collection from which the authors drew a random sample.

Dataset used for recreation

I decided to go with this dataset from Kaggle: https://www.kaggle.com/datasets/fusicfenta/chest-xray-for-covid19-detection

It's based on the same two data sources as the original work. It's balanced and contains 288 images for training and 60 for validation.
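A quick way to confirm the split sizes and class balance after downloading is to count the image files per class folder. The sketch below assumes a layout with one subfolder per class inside each split (for example a hypothetical `data/train/Covid` and `data/train/Normal`). The actual folder names in the Kaggle archive may differ, and `count_images_per_class` is a helper name made up for this example.

```python
from collections import Counter
from pathlib import Path

# File extensions treated as images (an assumption; extend if needed).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def count_images_per_class(split_dir):
    """Count image files in each immediate subfolder of split_dir.

    Each subfolder is assumed to hold the images of one class.
    """
    counts = Counter()
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.rglob("*")
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return dict(counts)


# Example call (hypothetical path):
# count_images_per_class("data/train")
```

For a balanced split, the two counts should be equal.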

TDA

TDA (Topological Data Analysis) - a way of analyzing (usually high-dimensional) data using its topological features.
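To make "topological features" concrete, here is a minimal sketch of 0-dimensional persistent homology on a small grayscale grid. Pixels are added from darkest to brightest, and the code tracks when connected components appear (birth) and when they merge into an older one (death). This is a hand-rolled illustration with 4-neighbour connectivity, not necessarily the filtration or the library used in the paper or in main.ipynb. The function name `sublevel_h0_persistence` is made up for this example.

```python
def sublevel_h0_persistence(image):
    """Return sorted (birth, death) pairs of connected components for the
    sublevel-set filtration of a 2D grid of intensities.

    The component born at the global minimum never dies and is reported
    with death = float("inf"). Zero-persistence pairs are skipped.
    """
    rows, cols = len(image), len(image[0])
    # Visit pixels from darkest to brightest.
    order = sorted((image[r][c], r, c) for r in range(rows) for c in range(cols))
    parent = {}  # union-find forest over the pixels added so far
    birth = {}   # root pixel -> intensity at which its component was born

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    pairs = []
    for value, r, c in order:
        p = (r, c)
        parent[p] = p
        birth[p] = value
        for q in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if q not in parent:
                continue  # neighbour is outside the grid or not added yet
            a, b = find(p), find(q)
            if a == b:
                continue
            # Elder rule: the younger component (larger birth) dies.
            if birth[a] > birth[b]:
                a, b = b, a
            if birth[b] < value:
                pairs.append((birth[b], value))
            parent[b] = a

    root = find(order[0][1:])
    pairs.append((birth[root], float("inf")))
    return sorted(pairs)


image = [
    [1, 5, 2],
    [6, 7, 6],
    [3, 5, 4],
]
print(sublevel_h0_persistence(image))
# [(1, inf), (2, 5), (3, 6), (4, 5)]
```

Each of the four dark corners starts its own component. Three of them die when brighter pixels connect them to an older component, and the one born at intensity 1 survives. Long-lived pairs are the kind of feature that can be fed to a classifier alongside CNN features.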

GIF from https://towardsdatascience.com/persistent-homology-with-examples-1974d4b9c3d0

Proposed networks

The authors proposed three neural network architectures that use TDA, plus one baseline CNN.

Base CNN

original:

implemented:

$TDA-Net_{1}$

original:

implemented:

$TDA-Net_{1,2}$

original:

implemented:

$TDA-Net_{1,2,3}$

original:

implemented:

Results and conclusions

The end results in the original paper look like this:

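| Metric | Base model | $TDA-Net_{1}$ | $TDA-Net_{1,2}$ | $TDA-Net_{1,2,3}$ |
| --- | --- | --- | --- | --- |
| Accuracy | 0.87 | 0.89 | 0.92 | 0.93 |
| Precision | 0.84 | 0.84 | 0.95 | 0.88 |
| Recall | 0.87 | 0.87 | 0.85 | 0.95 |
| F1 score | 0.86 | 0.86 | 0.90 | 0.92 |
| TNR | 0.89 | 0.88 | 0.97 | 0.91 |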

However, in my implementation I got the following results:

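| Metric | Base model | $TDA-Net_{1}$ | $TDA-Net_{1,2}$ | $TDA-Net_{1,2,3}$ |
| --- | --- | --- | --- | --- |
| Accuracy | 0.97 | 0.85 | 0.90 | 0.97 |
| Precision | 0.97 | 0.82 | 0.88 | 1.0 |
| Recall | 0.97 | 0.90 | 0.93 | 0.93 |
| F1 score | 0.97 | 0.86 | 0.90 | 0.97 |
| TNR | 0.97 | 0.8 | 0.87 | 1.0 |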
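For reference, all five metrics in these tables can be derived from the four confusion-matrix counts. The sketch below treats "Covid" as the positive class, which is an assumption. The counts in the usage example are hypothetical: the README does not list a confusion matrix. They were chosen because, for 60 images split evenly between the classes, they round to the values in the $TDA-Net_{1,2,3}$ column of my results. `binary_metrics` is a helper name made up for this example.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute the metrics from confusion-matrix counts.

    tp/fn: positive ("Covid") cases predicted correctly / missed.
    tn/fp: negative ("Normal") cases predicted correctly / flagged.
    Assumes every denominator is non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity or TPR
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "tnr": tn / (tn + fp),  # true negative rate (specificity)
    }


# Hypothetical counts: 30 positive and 30 negative images.
metrics = binary_metrics(tp=28, fp=0, tn=30, fn=2)
print({name: round(value, 2) for name, value in metrics.items()})
# {'accuracy': 0.97, 'precision': 1.0, 'recall': 0.93, 'f1': 0.97, 'tnr': 1.0}
```

With only 60 images, each misclassification moves accuracy by about 1.7 percentage points, so small differences between the columns should be read with caution.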

Setting up

git clone https://github.com/Ilnicki010/tda-net-covid-classification.git
cd tda-net-covid-classification
Create a data folder containing the dataset from here: https://www.kaggle.com/datasets/fusicfenta/chest-xray-for-covid19-detection
pip install -r requirements.txt
Open main.ipynb, then run and analyze all cells
