This work is published in the *American Journal of Pathology*.
```bibtex
@article{Gindra2024,
  author  = {Rushin H. Gindra and Yi Zheng and Emily J. Green and Mary E. Reid and Sarah A. Mazzilli and Daniel T. Merrick and Eric J. Burks and Vijaya B. Kolachalama and Jennifer E. Beane},
  title   = {Graph perceiver network for lung tumor and bronchial premalignant lesion stratification from histopathology},
  year    = {2024},
  doi     = {10.1016/j.ajpath.2024.03.009},
  url     = {https://doi.org/10.1016/j.ajpath.2024.03.009},
  journal = {American Journal of Pathology}
}
```
Key Ideas & Main Findings
We hypothesize that computational methods can help capture tissue heterogeneity in histology whole slide images (WSIs) and stratify premalignant lesions (PMLs) by histologic severity or by their ability to progress to invasive carcinoma, providing an informative pipeline for PML assessment. The Graph Perceiver Network is a generalized architecture that integrates a graph module with the perceiver architecture, combining sparse graph computations on visual tokens with the computational efficiency of the perceiver's latent bottleneck. The architecture significantly reduces the computational footprint compared with state-of-the-art WSI analysis architectures, allowing extremely large WSIs to be processed efficiently.
As a bonus, the architecture is explainable and can be trained with large batch sizes without extreme computational overhead, making it a suitable candidate for projects in academic research labs. Built on PyTorch and PyTorch Geometric.
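To make the idea concrete, here is a minimal sketch of how a graph module can be combined with a perceiver-style latent bottleneck. This is an illustrative toy model, not the released implementation: the class name, dimensions, and the choice of mean-aggregation message passing are assumptions. It shows the two key ingredients named above, sparse graph computation on patch tokens and cross-attention from a small latent array (so attention cost scales with the number of latents rather than quadratically with the number of patches).

```python
# Illustrative sketch (not the released implementation): one sparse
# message-passing step over patch tokens, then cross-attention from a small
# learned latent array onto those tokens, then a slide-level classifier.
import torch
import torch.nn as nn

class GraphPerceiverSketch(nn.Module):
    def __init__(self, feat_dim=768, latent_dim=256, n_latents=200, n_classes=2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, latent_dim)
        self.latents = nn.Parameter(torch.randn(n_latents, latent_dim))
        self.cross_attn = nn.MultiheadAttention(latent_dim, num_heads=4, batch_first=True)
        self.head = nn.Linear(latent_dim, n_classes)

    def forward(self, x, edge_index):
        # x: (N, feat_dim) patch features; edge_index: (2, E) patch graph.
        h = self.proj(x)
        src, dst = edge_index
        # Sparse mean aggregation: average each node's neighbours into it.
        agg = torch.zeros_like(h).index_add_(0, dst, h[src])
        deg = torch.zeros(h.size(0), device=h.device).index_add_(
            0, dst, torch.ones(src.size(0), device=h.device)).clamp(min=1)
        h = h + agg / deg.unsqueeze(-1)
        # Cross-attention: cost is O(n_latents * N), not O(N^2) self-attention.
        q = self.latents.unsqueeze(0)               # (1, n_latents, latent_dim)
        out, _ = self.cross_attn(q, h.unsqueeze(0), h.unsqueeze(0))
        return self.head(out.mean(dim=1))           # (1, n_classes) slide logits
```

Because the latent array has a fixed size (e.g. 200 nodes), memory does not grow quadratically with slide size, which is what permits larger batch sizes on a single consumer GPU.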
- Pre-requisites and Installations
- Data Download and Preprocessing
- Data Tree Structure
- Pretrained Model Weights and Training Instructions
- Evaluation and Testing
- Explanatory Heatmaps
- Contact for Issues
- Acknowledgements, License, and Usage
Please follow this GitHub repository for updates.
- [ ] Remove dead code in repository
- [ ] Pre-requisites and installations (Conda env & Docker container)
- [ ] Add data download + preprocessing steps (Python file)
- [ ] Add data tree structure (for easy understanding)
- [ ] Add pretrained model weights + instructions for training & evaluation (Python files)
- [ ] Add code for K-NN evaluation (Jupyter notebook)
- [ ] Add code for visualization (Jupyter notebook)
- [ ] Explanatory heatmaps
- Contact for Issues
- Acknowledgements, License & Usage
Conda installation, and potentially a Docker container at some point.
Resections
- TCGA-[LUAD|LUSC]: To download tissue slide WSIs (formatted as .svs files) and the associated clinical metadata, please refer to the NIH Genomic Data Commons (GDC) Data Portal. WSIs for each cancer type can be downloaded using the GDC Data Transfer Tool.
- CPTAC-[LUAD|LSCC]: To download the WSIs (formatted as .svs files) from the discovery cohort and the associated clinical metadata, please refer to The Cancer Imaging Archive (TCIA) portal. WSIs from CPTAC can be downloaded using TCIA_Utils.
Biopsies
- UCL: Lung biopsy samples from University College London. To download the biopsy WSIs (formatted as .ndpi files) and the associated clinical metadata, please refer to the Image Data Resource repository, IDR 0082. WSIs from the repository can be downloaded using the Aspera protocol.
- Roswell: Lung biopsy samples from Roswell Park Comprehensive Cancer Center.
Example Directory

```
TCGA_ROOT_DIR/
├── TCGA_train.txt
├── TCGA_test.txt
├── TCGA_plot.txt
├── WSIs/
│   ├── slide_1.svs
│   ├── slide_2.svs
│   └── ...
├── ctranspath_pt_features/
│   ├── slide_1/
│   │   ├── adj_s_ei.pt
│   │   ├── adj_s.pt
│   │   ├── c_idx.txt
│   │   ├── edge_attr.pt
│   │   └── features.pt
│   └── slide_2/
│       └── ...
└── patches256/
    ├── slide_1/
    │   └── 20.0/
    │       ├── x_y.png
    │       └── ...
    └── slide_2/
        └── ...
CPTAC_ROOT_DIR/
├── CPTAC_test.txt
├── CPTAC_plot.txt
├── WSIs/
├── ctranspath_pt_features/
└── patches256/
    └── ...
```
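The per-slide files in the tree can be loaded straightforwardly with `torch.load`. The sketch below is an assumption about how the pieces fit together, based only on the file names in the tree; the commented tensor shapes are illustrative, not guaranteed by the repository.

```python
# Hedged sketch: load one slide's preprocessed graph from the layout above.
# File names come from the directory tree; tensor shapes are assumptions.
import os
import torch

def load_slide_graph(feature_dir, slide_id):
    slide_dir = os.path.join(feature_dir, slide_id)
    features = torch.load(os.path.join(slide_dir, "features.pt"))    # e.g. (N, d) patch embeddings
    edge_index = torch.load(os.path.join(slide_dir, "adj_s_ei.pt"))  # e.g. (2, E) patch adjacency
    edge_attr = torch.load(os.path.join(slide_dir, "edge_attr.pt"))  # e.g. (E, k) edge attributes
    return features, edge_index, edge_attr
```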
For preprocessing (patching, feature extraction, and graph construction), see `preprocessing/graph_construction.py`.
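The core of the graph-construction step can be sketched as connecting each patch to its spatial neighbours on the patch grid. This is an illustrative stand-in for the actual logic in `preprocessing/graph_construction.py` (the function name and the 8-neighbourhood choice are assumptions):

```python
# Illustrative sketch of spatial graph construction: connect each 256-px
# patch to its 8 grid neighbours (not the repository's exact algorithm).
import numpy as np

def grid_adjacency(coords, patch_size=256):
    # coords: (N, 2) array of patch top-left (x, y) pixel coordinates.
    grid = {(x // patch_size, y // patch_size): i for i, (x, y) in enumerate(coords)}
    edges = []
    for (gx, gy), i in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx, dy) == (0, 0):
                    continue  # no self-loops
                j = grid.get((gx + dx, gy + dy))
                if j is not None:
                    edges.append((i, j))
    return np.array(edges, dtype=np.int64).T  # (2, E) edge index
```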
You can train your model on a multi-centric dataset with the following k-fold cross-validation (k=5) scheme, where `--` marks the training folds, `**` the validation fold, and `##` the test fold:

```
[--|--|--|**|##]
[--|--|**|##|--]
[--|**|##|--|--]
[**|##|--|--|--]
[##|--|--|--|**]
```
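The rotation above can be generated programmatically: in each split, one fold is validation, the fold immediately to its right (wrapping around) is test, and the remaining three are training. A small sketch, with illustrative names:

```python
# Generate the rotating 5-fold splits shown above: the validation fold
# shifts left each split, the test fold is the one right after it (mod k).
def rotating_folds(k=5):
    splits = []
    for s in range(k):
        val = (k - 2 - s) % k          # split 0 -> fold 3, split 1 -> fold 2, ...
        test = (val + 1) % k           # fold right after validation, wrapping
        train = [f for f in range(k) if f not in (val, test)]
        splits.append({"train": train, "val": val, "test": test})
    return splits
```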
Models were trained for 30 epochs with a batch size of 8 using 5-fold cross-validation, and were evaluated per fold on an internal TCGA test set and on the CPTAC external test set.
GPU Hardware used for training: Nvidia GeForce RTX 2080ti - 11GB.
Note: Ideally, longer training with larger batch sizes would likely yield further gains in model performance.
- Links to download pretrained model weights.
| Arch | SSL Method | Dataset | Epochs | Cross-Attn-Nodes | Performance (Acc) | Download |
|---|---|---|---|---|---|---|
| Graph Perceiver Network | CTransPath | TCGA | 30 | 200 | N/A | N/A |
| Graph Perceiver Network | SimCLR-Lung | NLST-TMA | 30 | 200 | N/A | N/A |
- Instructions for training and evaluating the models (including Python files).
- TCGA (internal test set)
- CPTAC (external test set)
- K-NN evaluation: description of the K-NN evaluation process, with a link to (or embedded) Jupyter notebook for K-NN evaluation.
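Until the notebook is added, the K-NN evaluation idea can be sketched as follows. This is an assumption about the setup (labelling each test slide by majority vote over its k nearest training slides in cosine distance over slide-level embeddings); variable names are illustrative.

```python
# Hedged sketch of a K-NN evaluation on slide-level embeddings.
import numpy as np

def knn_accuracy(train_emb, train_labels, test_emb, test_labels, k=5):
    # L2-normalise so the dot product equals cosine similarity.
    tr = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    te = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)
    sims = te @ tr.T                             # (n_test, n_train)
    nn_idx = np.argsort(-sims, axis=1)[:, :k]    # k nearest training slides
    # Majority vote over the neighbours' labels.
    preds = np.array([np.bincount(train_labels[idx]).argmax() for idx in nn_idx])
    return float((preds == test_labels).mean())
```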
- Details about the visualization techniques used.
- Link or embedded Jupyter notebook for visualization.
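A common way to build such explanatory heatmaps is to scatter per-patch attention scores back onto the slide's patch grid and normalise for overlay on a thumbnail. The sketch below is an assumption about that step (function name, score source, and shapes are illustrative), not the repository's visualization code.

```python
# Illustrative sketch: map per-patch attention scores onto the WSI patch grid.
import numpy as np

def attention_heatmap(coords, scores, patch_size=256):
    # coords: (N, 2) patch (x, y) pixel coordinates; scores: (N,) attention.
    gx = coords[:, 0] // patch_size
    gy = coords[:, 1] // patch_size
    heat = np.zeros((gy.max() + 1, gx.max() + 1))
    heat[gy, gx] = scores
    # Normalise to [0, 1] for overlaying on the slide thumbnail.
    return (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)
```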
- Please open new issue threads, or report urgent blockers directly to rushin.gindra@helmholtz-munich.de. Immediate responses to minor issues may not be available.
- Credits and acknowledgements.
- License information.
- Usage guidelines and restrictions.