I am a PhD Student in Health Data Science at Oxford, supervised by Professor Jens Rittscher and funded by Professor Fergus Gleeson. I focus on applications of Computer Vision to improve the diagnosis and treatment of patients with lung cancer as part of the DART lung health project (see my role in the project).
October 2024: my second workshop paper (pre-print 📄, code 💻) won the Best Paper Award at the DEMI-2024 workshop of the MICCAI conference 🥇! In our work "Evaluating histopathology foundation models for few-shot tissue clustering: an application to LC25000 augmented dataset cleaning", we (1) create a pipeline for grouping augmented images using foundation models, (2) release a decontaminated version of the LC25000 histopathology dataset, and (3) propose a minimal-setup benchmark for evaluating pathology foundation models. The cleaned dataset, annotation framework, and evaluation pipeline are available in the LC25000-clean repository.
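To give a rough idea of what grouping augmented images with a foundation model can look like in practice, here is a minimal sketch (not our pipeline): embed each image with a pretrained backbone and cluster the embeddings so that augmented copies of the same source image fall into the same group. The torchvision ResNet-50 backbone, the `image_paths` list, and the distance threshold are all illustrative assumptions.

```python
# Illustrative sketch only: group augmented near-duplicates by clustering
# embeddings from a pretrained backbone. Not the pipeline from the paper.
import torch
from torchvision.models import resnet50, ResNet50_Weights
from PIL import Image
from sklearn.cluster import AgglomerativeClustering  # sklearn >= 1.2 for `metric=`
from sklearn.preprocessing import normalize

device = "cuda" if torch.cuda.is_available() else "cpu"

weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights)
model.fc = torch.nn.Identity()          # keep the 2048-d pooled features
model.eval().to(device)
preprocess = weights.transforms()

@torch.no_grad()
def embed(paths):
    feats = []
    for p in paths:
        img = preprocess(Image.open(p).convert("RGB")).unsqueeze(0).to(device)
        feats.append(model(img).squeeze(0).cpu())
    return torch.stack(feats).numpy()

image_paths = ["img_000.jpeg", "img_001.jpeg"]   # hypothetical placeholder list
features = normalize(embed(image_paths))         # unit norm -> cosine geometry

# Images closer than the (hypothetical) threshold are treated as augmented
# copies of the same source image.
clusters = AgglomerativeClustering(
    n_clusters=None, metric="cosine", linkage="average", distance_threshold=0.1
).fit_predict(features)
for path, cluster_id in zip(image_paths, clusters):
    print(cluster_id, path)
```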
May 2024: my first main conference paper 📄 (pre-print 📄, code 💻) was published at the ISBI-2024 conference! 🎉 In our work "Accurate Subtyping of Lung Cancers by Modelling Class Dependencies", we (1) construct a weakly-supervised multi-label lung cancer histology dataset from three public datasets (TCGA, TCIA-CPTAC, DHMC) and one in-house dataset (DART), and (2) propose a class-dependency injection method that enables learning robust bag representations for multi-label problems in weakly-supervised settings. The dataset creation, model building, and training code is available in the dependency-mil repository.
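As a rough mental model of weakly-supervised multi-label subtyping (and explicitly not the class-dependency injection method from the paper), a slide can be treated as a bag of patch embeddings, pooled with generic gated attention and fed to a multi-label head trained with BCE on slide-level labels. All layer sizes and tensor shapes below are hypothetical.

```python
# Minimal attention-based MIL baseline for multi-label bags (illustrative only;
# generic gated-attention pooling, not the paper's dependency-injection method).
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, feat_dim=1024, hidden_dim=256, num_labels=4):
        super().__init__()
        self.attn_v = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.Tanh())
        self.attn_u = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.Sigmoid())
        self.attn_w = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(feat_dim, num_labels)

    def forward(self, bag):                                         # bag: (num_patches, feat_dim)
        scores = self.attn_w(self.attn_v(bag) * self.attn_u(bag))   # (num_patches, 1)
        weights = torch.softmax(scores, dim=0)                      # attention over patches
        slide_repr = (weights * bag).sum(dim=0)                     # (feat_dim,)
        return self.classifier(slide_repr)                          # multi-label logits

model = AttentionMIL()
bag = torch.randn(500, 1024)             # hypothetical patch features for one slide
labels = torch.tensor([1., 0., 1., 0.])  # hypothetical slide-level labels
loss = nn.BCEWithLogitsLoss()(model(bag), labels)
loss.backward()
print(loss.item())
```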
September 2022: my first workshop paper 📄 (pre-print 📄, code 💻) was published at the MICCAI 2022 CaPTion workshop! 🎉 In our work "Active Data Enrichment by Learning What to Annotate in Digital Pathology", we (1) proposed a new comprehensive annotation protocol for lung cancer pathology, (2) proposed a new metric for comparing how well retrieval methods can prioritize examples from underrepresented classes, and (3) demonstrated that annotating and adding top-ranked examples to the training set yields greater improvements in algorithm performance than annotating and adding random examples. Links: published paper, open-access paper, code.
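To make the retrieval-comparison idea concrete, here is a toy stand-in for such a metric (not the one proposed in the paper): look at the top-k retrieved examples and compute the fraction that belong to underrepresented classes. The class names and rankings below are made up.

```python
# Toy illustration of comparing retrieval rankings by how many top-ranked
# examples come from underrepresented classes. Not the paper's metric.
def rare_class_fraction_at_k(ranked_labels, rare_classes, k):
    """Fraction of the top-k retrieved examples whose label is in rare_classes."""
    top_k = ranked_labels[:k]
    return sum(label in rare_classes for label in top_k) / k

# Hypothetical example: labels of examples in the order each method retrieves them.
rare_classes = {"carcinoid", "small_cell"}
ranking_by_method_a = ["adeno", "carcinoid", "small_cell", "adeno", "squamous"]
ranking_by_random = ["adeno", "adeno", "squamous", "adeno", "carcinoid"]

print(rare_class_fraction_at_k(ranking_by_method_a, rare_classes, k=5))  # 0.4
print(rare_class_fraction_at_k(ranking_by_random, rare_classes, k=5))    # 0.2
```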
December 2020: my first mini-conference working-notes paper 📄 (code 💻) was published at the MediaEval 2020 Multimedia Benchmark workshop 🎉. In our work "Real-Time Polyp Segmentation Using U-Net with IoU Loss", we explored how using a combination of differentiable IoU and BCE losses affects segmentation performance, measured by mean IoU and Dice score, when training a simple U-Net. Links: published open-access paper, code.
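The loss combination we explored can be sketched roughly as follows: a differentiable (soft) IoU term computed on predicted probabilities added to a standard BCE term. This is an illustrative re-implementation rather than the exact code from the repository, and the 50/50 weighting is a hypothetical choice.

```python
# Illustrative combined soft-IoU + BCE segmentation loss (not the exact code
# from the repository; the 0.5/0.5 weighting is a hypothetical choice).
import torch
import torch.nn as nn

def soft_iou_loss(logits, targets, eps=1e-6):
    """Differentiable IoU loss computed on predicted probabilities."""
    probs = torch.sigmoid(logits)
    intersection = (probs * targets).sum(dim=(1, 2, 3))
    union = (probs + targets - probs * targets).sum(dim=(1, 2, 3))
    return 1.0 - ((intersection + eps) / (union + eps)).mean()

def combined_loss(logits, targets, iou_weight=0.5):
    bce = nn.functional.binary_cross_entropy_with_logits(logits, targets)
    return iou_weight * soft_iou_loss(logits, targets) + (1 - iou_weight) * bce

# Hypothetical usage with a batch of predicted masks and binary ground truth.
logits = torch.randn(2, 1, 128, 128, requires_grad=True)
targets = (torch.rand(2, 1, 128, 128) > 0.5).float()
loss = combined_loss(logits, targets)
loss.backward()
print(loss.item())
```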
Public histology data sources. If you also want to start working with histopathology images but do not yet have your own data (or are waiting for it), consider starting with the "Dartmouth Lung Cancer Histology Dataset" (DHMC), "The Cancer Genome Atlas" (TCGA), and "The Cancer Imaging Archive" (TCIA-CPTAC). Downloading large volumes of data is not a trivial task, so I documented my process in the TCGA-lung-histology-download and TCIA-CPTAC-lung-histology-download repositories.
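As a starting point, here is a hedged sketch of how one might list TCGA-LUAD diagnostic whole-slide images through the public GDC API before downloading them (for example with gdc-client and a manifest). The field names and filter values reflect my understanding of the GDC files endpoint and may need adjusting; the repositories above document the process I actually used.

```python
# Sketch: list TCGA-LUAD slide image files via the GDC API.
# Field names/values may need adjusting; see the linked repositories for the
# download process I actually used.
import json
import requests

FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"

filters = {
    "op": "and",
    "content": [
        {"op": "in", "content": {"field": "cases.project.project_id",
                                 "value": ["TCGA-LUAD"]}},
        {"op": "in", "content": {"field": "data_format", "value": ["SVS"]}},
        {"op": "in", "content": {"field": "data_type", "value": ["Slide Image"]}},
    ],
}

params = {
    "filters": json.dumps(filters),
    "fields": "file_id,file_name,file_size",
    "format": "JSON",
    "size": "5",          # keep the example small
}

response = requests.get(FILES_ENDPOINT, params=params, timeout=60)
response.raise_for_status()
for hit in response.json()["data"]["hits"]:
    print(hit["file_id"], hit["file_name"], hit["file_size"])
```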
Public natural image sources. Another option if you lack medical data is to simulate parts of your future workflow on natural images: for example, classifying medical images for the presence or absence of particular patterns is similar to classifying natural images for the presence or absence of particular objects. I used images from the COCO dataset; you can see my work in GeorgeBatch/cocoapi.
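For example, building multi-hot "which objects are present" targets from COCO annotations with pycocotools looks roughly like the sketch below; the annotation-file path is a placeholder, and this mirrors the general setup rather than the exact code in GeorgeBatch/cocoapi.

```python
# Sketch: multi-hot "object present / absent" labels from COCO annotations
# using pycocotools. The annotation-file path is a placeholder.
import numpy as np
from pycocotools.coco import COCO

coco = COCO("annotations/instances_val2017.json")   # placeholder path
cat_ids = sorted(coco.getCatIds())                   # the 80 object categories
cat_to_index = {cat_id: i for i, cat_id in enumerate(cat_ids)}

def multi_hot_label(img_id):
    """1 where at least one instance of the category appears in the image."""
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
    label = np.zeros(len(cat_ids), dtype=np.float32)
    for ann in anns:
        label[cat_to_index[ann["category_id"]]] = 1.0
    return label

img_id = coco.getImgIds()[0]
print(multi_hot_label(img_id).sum(), "categories present in image", img_id)
```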
Education
- 🎓 Previously, I studied Mathematics and Statistics at Warwick for my Bachelor's degree and completed my Master's in Statistics at Oxford.
- 🌱 Separately from my PhD program, I learnt cool CV and NLP techniques taught by the Deep Learning School from MIPT. Both courses are taught only in Russian 🇷🇺.
- 📺 I also found the YouTube playlist on Structuring Machine Learning Projects (Course 3 of the Deep Learning Specialization on Coursera) extremely useful. I started watching the videos to answer some of the questions related to the multi-label classification project I was working on at the time (GeorgeBatch/cocoapi). I liked the explanations and Andrew Ng's delivery style so much that I enrolled in and completed the Deep Learning Specialization.
Here are some of the best free online resources to boost your ML/DL knowledge. I am currently working through them, skipping the repetitive parts ⏰.
- English 🇬🇧
  - Deep Learning from NYU by Yann LeCun and Alfredo Canziani (next on my list)
  - CS231n: Convolutional Neural Networks for Visual Recognition from Stanford
  - CS224n: Natural Language Processing with Deep Learning from Stanford
- Russian 🇷🇺
  - Deep Learning (part 1), similar to CS231n ✅
  - Deep Learning (part 2), similar to CS224n ⏳
📫 How to reach me
- LinkedIn: george-batchkala
- My page on DART: george-batchkala
- GitHub: GeorgeBatch
- Kaggle: George Batchkala