Data Challenge: Help a Hematologist out!
Group members:
Bashir K., Lea G., Ankita N., Martin B., Arnab M., Dawit H.
This notebook is a short summary for getting started with the challenge (found here). Below you can find how to download the datasets and their labels, how to explore and analyze the input and output data of the challenge, how to run a baseline model, and how to create a submission file to upload to the leaderboard.
Datasets:
Three datasets, each constituting a different domain, will be used for this challenge:
- The Acevedo_20 dataset with labels
- The Matek_19 dataset with labels
- The WBC dataset without labels (Used for domain adaptation and performance measurement)
The Acevedo_20 and Matek_19 datasets are labeled and should be used to train the model for the domain generalization task. A small subpart of the WBC dataset, WBC1, will be downloadable from the beginning of the challenge. It is unlabeled and should be used for evaluation and domain adaptation techniques.
A second similar subpart of the WBC dataset, WBC2, will become available for download during phase 2 of the challenge, i.e. on the last day, 24 hours before submissions close.
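Below is a minimal sketch of how the two labeled source datasets could be loaded for training, assuming they were downloaded into local folders named `data/Acevedo_20` and `data/Matek_19` with one subfolder per class; the folder layout and the use of torchvision's `ImageFolder` are assumptions for illustration, not part of the official challenge code.

```python
import torch
from torchvision import datasets, transforms

# Basic preprocessing: resize to a common size and convert to tensors.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Assumed local paths; adjust to wherever the datasets were downloaded.
acevedo = datasets.ImageFolder("data/Acevedo_20", transform=preprocess)
matek = datasets.ImageFolder("data/Matek_19", transform=preprocess)

# Combine both labeled source domains for the domain generalization task.
# Note: this assumes both datasets use the same class-folder names, so that
# ImageFolder assigns consistent label indices across the two domains.
train_set = torch.utils.data.ConcatDataset([acevedo, matek])
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
```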
Goal:
The challenge is in transfer learning, specifically domain generalization (DG) and domain adaptation (DA) techniques. The focus lies on using deep neural networks to classify single white blood cell images obtained from peripheral blood smears. The goal of this challenge is to achieve high performance, especially a high macro F1 score, on the WBC2 dataset.
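As a rough illustration of such a setup, the sketch below fine-tunes a pretrained ResNet-18 on the labeled source domains and evaluates it with the macro F1 score. The model choice, the `train_loader`, and `NUM_CLASSES` are assumptions for illustration and not the official baseline of the challenge.

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.metrics import f1_score

NUM_CLASSES = 11  # assumed number of white blood cell classes; adjust to the challenge labels

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Pretrained ResNet-18 with a new classification head for the WBC classes.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_one_epoch(loader):
    """One pass over the labeled source data."""
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

def evaluate_macro_f1(loader):
    """Collect predictions and compute the macro F1 score with scikit-learn."""
    model.eval()
    preds, targets = [], []
    with torch.no_grad():
        for images, labels in loader:
            logits = model(images.to(device))
            preds.extend(logits.argmax(dim=1).cpu().tolist())
            targets.extend(labels.tolist())
    return f1_score(targets, preds, average="macro")
```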
Notes:
This challenge aims to motivate research in domain generalization and adaptation techniques:
To make actual use of deep learning in the clinical routine, the techniques must work in realistic settings. If a peripheral blood smear is acquired from a patient and classified by a neural network, this classification has to work reliably. However, the patient's blood smear will very likely differ from the image domains used as training data for the network, leading to untrustworthy results. To overcome this obstacle and build robust, domain-invariant classifiers, research in domain generalization and adaptation is needed.
f1_score: Wikipedia
sklearn.metrics.f1_score(y_true, y_pred, *, labels=None, pos_label=1, average='macro', sample_weight=None, zero_division='warn')
The formula is given as
F1 = 2 * (precision * recall) / (precision + recall)
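For example, with a small set of hypothetical true and predicted labels (the values below are made up for illustration):

```python
from sklearn.metrics import f1_score

# Hypothetical ground-truth and predicted class labels.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

# Macro averaging computes the F1 score per class and takes the unweighted mean,
# so rare classes count just as much as frequent ones.
print(f1_score(y_true, y_pred, average="macro"))
```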
The code is given in the Jupyter notebook. Make sure you adjust the path according to where you downloaded the data.
Data augmentation has already been performed and is saved in the given path. Labels are automatically taken from the folder names within each dataset; note that the different datasets have different concentrations of the labels (see the sketch below).
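A small sketch of how the label distribution per dataset could be inspected, assuming each dataset directory contains one subfolder per class; the paths are placeholders and should be adjusted to the local download location.

```python
import os
from collections import Counter

def class_counts(dataset_dir):
    """Count images per class, where each class is a subfolder of dataset_dir."""
    counts = Counter()
    for class_name in sorted(os.listdir(dataset_dir)):
        class_dir = os.path.join(dataset_dir, class_name)
        if os.path.isdir(class_dir):
            counts[class_name] = len(os.listdir(class_dir))
    return counts

# Assumed local paths; adjust to where the datasets were downloaded.
for name in ["data/Acevedo_20", "data/Matek_19"]:
    print(name, class_counts(name))
```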
Use the functions given in the libraries folder
For data visualization, it is helpful to compute different statistical measurements, including min, max, std, and mean. These are then included with the metadata.
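A minimal sketch of how such per-image statistics could be computed and attached to a metadata table; pandas, PIL, and the example image path are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from PIL import Image

def image_stats(path):
    """Compute simple pixel statistics for one image."""
    pixels = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32)
    return {
        "path": path,
        "min": pixels.min(),
        "max": pixels.max(),
        "mean": pixels.mean(),
        "std": pixels.std(),
    }

# Hypothetical list of image paths gathered beforehand.
image_paths = ["data/Acevedo_20/neutrophil/img_001.jpg"]
metadata = pd.DataFrame([image_stats(p) for p in image_paths])
print(metadata.describe())
```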
Armin Gruber
Ali Boushehri
Christina Bukas
Dawit Hailu