I tried to reproduce the results of the paper "TDA-Net: Fusion of Persistent Homology and Deep Learning Features for COVID-19 Detection From Chest X-Ray Images" by Mustafa Hajij, Ghada Zamzmi, and Fawwaz Batayneh, published on 3 Aug 2021.
The task is supervised binary classification of chest X-ray images into two classes:
- "Covid" - patient is affected by COVID-19
- "Normal" - patient is healthy
Model input: a black-and-white (grayscale) chest X-ray image.
Model output: one of the two classes, "Covid" or "Normal".
I wasn't able to recreate the exact dataset used in the paper. The authors built it from two publicly available databases:
- positive (COVID-19) cases: https://github.com/ieee8023/covid-chestxray-dataset
- normal cases: https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia
The original dataset can't be rebuilt because the first source is dynamic and has changed over time, and the second is a large collection from which the authors picked a random sample.
Instead, I used this dataset from Kaggle: https://www.kaggle.com/datasets/fusicfenta/chest-xray-for-covid19-detection
It is built from the same two sources as the original work, is balanced, and contains 288 training images and 60 validation images.
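Before training, it helps to index the images and confirm the class balance. The sketch below is a minimal illustration under stated assumptions: it supposes the Kaggle files are unpacked into a hypothetical `data/<split>/<class>/` layout with `Covid` and `Normal` sub-folders (the archive's real folder names may differ), and `index_split` and `class_counts` are illustrative helpers, not functions from `main.ipynb`.

```python
from pathlib import Path

# Hypothetical label mapping: the folder names "Covid" and "Normal" are an
# assumption about how the Kaggle archive is laid out, not a confirmed fact.
CLASS_TO_LABEL = {"Normal": 0, "Covid": 1}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def index_split(split_dir):
    """Return (path, label) pairs for one split directory (e.g. data/train),
    assuming one sub-folder per class. Missing folders yield no samples."""
    samples = []
    for class_name, label in CLASS_TO_LABEL.items():
        class_dir = Path(split_dir) / class_name
        if not class_dir.is_dir():
            continue  # this class folder is absent from the split
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label))
    return samples


def class_counts(samples):
    """Count samples per label, to check that a split is balanced."""
    counts = {}
    for _, label in samples:
        counts[label] = counts.get(label, 0) + 1
    return counts


if __name__ == "__main__":
    train = index_split("data/train")  # hypothetical path
    print(len(train), class_counts(train))
```

On a balanced split, `class_counts` should report roughly equal numbers for labels 0 and 1.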
TDA (Topological Data Analysis) is a way of analyzing data (usually high-dimensional) through its topological features.
GIF from https://towardsdatascience.com/persistent-homology-with-examples-1974d4b9c3d0
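To make "topological features" concrete for an image, here is a minimal sketch of 0-dimensional persistent homology on the sublevel-set filtration of a grayscale image: pixels are added from darkest to brightest, each new local minimum starts a connected component, and when two components meet the younger one dies (the elder rule). The choice of this filtration, 4-connectivity, and the name `sublevel_persistence_0d` are assumptions for illustration; the paper and `main.ipynb` may compute persistence differently (for example with a dedicated TDA library).

```python
import math

import numpy as np


def sublevel_persistence_0d(image):
    """0-dimensional persistence pairs (birth, death) of the sublevel-set
    filtration of a 2-D grayscale image, using 4-connectivity and union-find.
    When two components merge, the one born later dies (the elder rule)."""
    img = np.asarray(image, dtype=float)
    _, cols = img.shape
    parent = {}
    birth = {}  # birth value, read at each component's root

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    pairs = []
    # Visit pixels from darkest to brightest (stable order for ties).
    for flat in np.argsort(img, axis=None, kind="stable"):
        r, c = divmod(int(flat), cols)
        value = img[r, c]
        parent[(r, c)] = (r, c)
        birth[(r, c)] = value
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nb = (r + dr, c + dc)
            if nb not in parent:
                continue  # outside the image, or not yet in the filtration
            a, b = find((r, c)), find(nb)
            if a == b:
                continue
            # The younger component (larger birth value) dies at `value`.
            young, old = (a, b) if birth[a] > birth[b] else (b, a)
            if value > birth[young]:
                pairs.append((birth[young], value))  # skip zero-length pairs
            parent[young] = old
    # The oldest component never dies.
    pairs.append((float(img.min()), math.inf))
    return pairs
```

Each pair is (birth, death); long-lived pairs correspond to prominent dark regions. Passing `-image` instead gives the superlevel-set filtration, which tracks bright structures.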
The authors proposed three neural network architectures that use TDA, plus one baseline CNN.
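The paper's title describes fusing persistent-homology features with deep-learning features. One simple way to sketch that idea, assuming a Betti-curve vectorization and plain concatenation (neither is confirmed as the paper's choice), is shown below: the persistence pairs are turned into a fixed-length vector by counting how many intervals are alive at each threshold, and that vector is appended to a CNN embedding before the classification head. `betti_curve` and `fuse_features` are hypothetical helpers.

```python
import numpy as np


def betti_curve(pairs, thresholds):
    """Vectorize a persistence diagram: for each threshold t, count the
    intervals [birth, death) that contain t (a Betti curve)."""
    pairs = np.asarray(pairs, dtype=float).reshape(-1, 2)
    t = np.asarray(thresholds, dtype=float)[:, None]  # shape (m, 1)
    alive = (pairs[:, 0] <= t) & (t < pairs[:, 1])     # shape (m, n)
    return alive.sum(axis=1).astype(float)


def fuse_features(cnn_embedding, topo_vector):
    """Late fusion by concatenation: the joint vector would then feed a
    classifier head (e.g. a dense layer with a sigmoid for Covid vs Normal)."""
    return np.concatenate([np.ravel(cnn_embedding), np.ravel(topo_vector)])
```

Because the Betti curve has one entry per threshold, every image yields a vector of the same size regardless of how many persistence pairs it has, which is what allows concatenation with a fixed-size CNN embedding.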
The paper reports the following results:
Metric | Base model | TDA model 1 | TDA model 2 | TDA model 3 |
---|---|---|---|---|
Accuracy | 0.87 | 0.89 | 0.92 | 0.93 |
Precision | 0.84 | 0.84 | 0.95 | 0.88 |
Recall | 0.87 | 0.87 | 0.85 | 0.95 |
F1 score | 0.86 | 0.86 | 0.90 | 0.92 |
TNR (specificity) | 0.89 | 0.88 | 0.97 | 0.91 |
However, in my implementation I got the following results:
Metric | Base model | TDA model 1 | TDA model 2 | TDA model 3 |
---|---|---|---|---|
Accuracy | 0.97 | 0.85 | 0.90 | 0.97 |
Precision | 0.97 | 0.82 | 0.88 | 1.0 |
Recall | 0.97 | 0.90 | 0.93 | 0.93 |
F1 score | 0.97 | 0.86 | 0.90 | 0.97 |
TNR (specificity) | 0.97 | 0.8 | 0.87 | 1.0 |
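For reference, all five metrics in the tables follow from confusion-matrix counts. The minimal sketch below treats "Covid" as the positive class (an assumption; the tables don't state it), and the counts in the example are illustrative, not taken from the experiments.

```python
def _safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, F1 and TNR, with "Covid" as positive."""
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)  # sensitivity / true positive rate
    return {
        "accuracy": _safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": _safe_div(2 * precision * recall, precision + recall),
        "tnr": _safe_div(tn, tn + fp),  # specificity / true negative rate
    }


# Illustrative counts for a balanced set of 30 Covid and 30 Normal images.
metrics = binary_metrics(tp=28, fp=0, tn=30, fn=2)
print({name: round(value, 2) for name, value in metrics.items()})
# {'accuracy': 0.97, 'precision': 1.0, 'recall': 0.93, 'f1': 0.97, 'tnr': 1.0}
```

If the metrics are computed on the balanced 60-image validation set, recall and TNR move in steps of 1/30 (about 0.033) and accuracy in steps of 1/60, so a single misclassified image shifts the scores visibly.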
- Clone the repository: `git clone https://github.com/Ilnicki010/tda-net-covid-classification.git`
- Enter it: `cd tda-net-covid-classification`
- Create a `data` folder containing the dataset from: https://www.kaggle.com/datasets/fusicfenta/chest-xray-for-covid19-detection
- Install the dependencies: `pip install -r requirements.txt`
- Open `main.ipynb`, then run and analyze all cells.