Data Challenge: Help a Hematologist out!
Group members:
Bashir K., Lea G., Ankita N., Martin B., Arnab M., Dawit H.
This notebook is a short summary for getting started with the challenge (found here). Below you can find how to download the datasets and their labels, how to explore and analyze the input and output data of the challenge, how to run a baseline model, and how to create a submission file to upload to the leaderboard.
Datasets:
Three datasets, each constituting a different domain, will be used for this challenge:
- The Acevedo_20 dataset with labels
- The Matek_19 dataset with labels
- The WBC dataset without labels (Used for domain adaptation and performance measurement)
The Acevedo_20 and Matek_19 datasets are labeled and should be used to train the model for the domain generalization task. A small subpart of the WBC dataset, WBC1, will be downloadable from the beginning of the challenge. It is unlabeled and should be used for evaluation and domain adaptation techniques.
A second similar subpart of the WBC dataset, WBC2, will become available for download during phase 2 of the challenge, i.e. on the last day, 24 hours before submissions close.
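Below is a minimal sketch of how the two labeled source datasets could be loaded for training, assuming they were downloaded into local folders named `data/Acevedo_20` and `data/Matek_19` with one subfolder per class; the folder layout and the use of torchvision's `ImageFolder` are assumptions for illustration, not part of the official challenge code.

```python
import torch
from torchvision import datasets, transforms

# Basic preprocessing: resize to a common size and convert to tensors.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Assumed local paths; adjust to wherever the datasets were downloaded.
acevedo = datasets.ImageFolder("data/Acevedo_20", transform=preprocess)
matek = datasets.ImageFolder("data/Matek_19", transform=preprocess)

# Combine both labeled source domains for the domain generalization task.
# Note: this assumes both datasets use the same class-folder names, so that
# ImageFolder assigns consistent label indices across the two domains.
train_set = torch.utils.data.ConcatDataset([acevedo, matek])
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
```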
Goal:
The challenge is in transfer learning, specifically domain generalization (DG) and domain adaptation (DA) techniques. The focus lies on using deep neural networks to classify single white blood cell images obtained from peripheral blood smears. The goal of this challenge is to achieve high performance, especially a high macro F1 score, on the WBC2 dataset.
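As a rough illustration of such a setup, the sketch below fine-tunes a pretrained ResNet-18 on the labeled source domains and evaluates it with the macro F1 score. The model choice, the `train_loader`, and `NUM_CLASSES` are assumptions for illustration and not the official baseline of the challenge.

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.metrics import f1_score

NUM_CLASSES = 11  # assumed number of white blood cell classes; adjust to the challenge labels

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Pretrained ResNet-18 with a new classification head for the WBC classes.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_one_epoch(loader):
    """One pass over the labeled source data."""
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

def evaluate_macro_f1(loader):
    """Collect predictions and compute the macro F1 score with scikit-learn."""
    model.eval()
    preds, targets = [], []
    with torch.no_grad():
        for images, labels in loader:
            logits = model(images.to(device))
            preds.extend(logits.argmax(dim=1).cpu().tolist())
            targets.extend(labels.tolist())
    return f1_score(targets, preds, average="macro")
```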
Notes:
This challenge aims to motivate research in domain generalization and adaptation techniques:
To make actual use of deep learning in the clinical routine, the techniques must work in realistic settings. If a peripheral blood smear is acquired from a patient and classified by a neural network, this classification has to work reliably. However, the patient's blood smear will very likely differ from the image domains used as training data for the network, leading to untrustworthy results. To overcome this obstacle and build robust, domain-invariant classifiers, research in domain generalization and adaptation is needed.
f1_score: Wikipedia
sklearn.metrics.f1_score(y_true, y_pred, *, labels=None, pos_label=1, average='macro', sample_weight=None, zero_division='warn')
The formula is given as
F1 = 2 * (precision * recall) / (precision + recall)
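For example, with a small set of hypothetical true and predicted labels (the values below are made up for illustration):

```python
from sklearn.metrics import f1_score

# Hypothetical ground-truth and predicted class labels.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

# Macro averaging computes the F1 score per class and takes the unweighted mean,
# so rare classes count just as much as frequent ones.
print(f1_score(y_true, y_pred, average="macro"))
```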
The code is given in the Jupyter notebook. Make sure you adjust the path according to where you downloaded the data.
Data augmentation has already been performed and is saved in the given path. Labels are automatically taken from the folder names within each dataset; note that the different datasets have different concentrations of the labels (see the sketch below).
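A small sketch of how the label distribution per dataset could be inspected, assuming each dataset directory contains one subfolder per class; the paths are placeholders and should be adjusted to the local download location.

```python
import os
from collections import Counter

def class_counts(dataset_dir):
    """Count images per class, where each class is a subfolder of dataset_dir."""
    counts = Counter()
    for class_name in sorted(os.listdir(dataset_dir)):
        class_dir = os.path.join(dataset_dir, class_name)
        if os.path.isdir(class_dir):
            counts[class_name] = len(os.listdir(class_dir))
    return counts

# Assumed local paths; adjust to where the datasets were downloaded.
for name in ["data/Acevedo_20", "data/Matek_19"]:
    print(name, class_counts(name))
```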
Use the functions given in the libraries folder
For data visualization, it is helpful to compute different statistical measurements, including min, max, std, and mean. These are then included with the metadata.
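A minimal sketch of how such per-image statistics could be computed and attached to a metadata table; pandas, PIL, and the example image path are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from PIL import Image

def image_stats(path):
    """Compute simple pixel statistics for one image."""
    pixels = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32)
    return {
        "path": path,
        "min": pixels.min(),
        "max": pixels.max(),
        "mean": pixels.mean(),
        "std": pixels.std(),
    }

# Hypothetical list of image paths gathered beforehand.
image_paths = ["data/Acevedo_20/neutrophil/img_001.jpg"]
metadata = pd.DataFrame([image_stats(p) for p in image_paths])
print(metadata.describe())
```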
Armin Gruber
Ali Boushehri
Christina Bukas
Dawit Hailu