hist2RNA: Predicting Gene Expression from Histopathology Images [Paper]
hist2RNA is a deep learning project for predicting gene expression from breast cancer histopathology images. It uses an efficient architecture to uncover the underlying gene expression in breast cancer.
- A state-of-the-art deep learning model tailored for breast cancer histopathology images
- Efficient gene expression prediction from histopathology images, which reduces training time
- User-friendly command-line interface
- Comprehensive documentation and tutorials
The following data sources have been used in this project:
- Genetic Data:
- Diagnostic Slide (DS): GDC Data Portal
- DS Download Guideline: Download TCGA Digital Pathology Images (FFPE)
- Python 3.9+
- PyTorch 2.0
- Python implementation: Normalizing H&E Images
- Original MATLAB implementation: Staining Normalization
- Reference: Macenko et al. (2009), "A method for normalizing histology slides for quantitative analysis"
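For readers unfamiliar with the method referenced above, here is a minimal NumPy sketch of Macenko-style stain normalization. It is not the code used in this repository or in the linked implementations. The reference stain matrix `HE_REF`, the reference concentrations `MAX_C_REF`, and the defaults `io`, `beta`, and `alpha` are values commonly seen in public implementations and should be treated as assumptions. The sign-orientation step is a safeguard added for this sketch.

```python
import numpy as np

# Assumed reference values (commonly used in public Macenko implementations).
HE_REF = np.array([[0.5626, 0.2159],
                   [0.7201, 0.8012],
                   [0.4062, 0.5581]])   # columns: hematoxylin, eosin
MAX_C_REF = np.array([1.9705, 1.0308])


def estimate_stains(od, alpha=1.0):
    """Estimate a 3x2 H&E stain matrix from an (N, 3) array of optical densities."""
    # Plane spanned by the two largest principal directions of the OD cloud.
    _, eigvecs = np.linalg.eigh(np.cov(od.T))
    plane = eigvecs[:, 1:3]
    proj = od @ plane
    # Orient both axes toward the data so the angles stay in one contiguous range.
    signs = np.where(proj.mean(axis=0) < 0, -1.0, 1.0)
    plane, proj = plane * signs, proj * signs
    # Robust angular extremes of the projected points give the two stain directions.
    phi = np.arctan2(proj[:, 1], proj[:, 0])
    lo, hi = np.percentile(phi, [alpha, 100 - alpha])
    v_lo = plane @ np.array([np.cos(lo), np.sin(lo)])
    v_hi = plane @ np.array([np.cos(hi), np.sin(hi)])
    # Heuristic ordering: the vector with the larger red component is hematoxylin.
    if v_lo[0] > v_hi[0]:
        return np.column_stack([v_lo, v_hi])
    return np.column_stack([v_hi, v_lo])


def macenko_normalize(img, io=240, beta=0.15, alpha=1.0):
    """Return (normalized uint8 image, estimated stain matrix) for an RGB uint8 image."""
    h, w, _ = img.shape
    # Convert intensities to optical density.
    od = -np.log((img.reshape(-1, 3).astype(np.float64) + 1) / io)
    # Drop near-transparent pixels before estimating the stain vectors.
    tissue = od[~np.any(od < beta, axis=1)]
    he = estimate_stains(tissue, alpha)
    # Per-pixel stain concentrations via least squares, shape (2, N).
    conc = np.linalg.lstsq(he, od.T, rcond=None)[0]
    # Rescale concentrations to the reference range, then rebuild the image.
    max_c = np.percentile(conc, 99, axis=1)
    conc *= (MAX_C_REF / max_c)[:, None]
    out = io * np.exp(-HE_REF @ conc)
    return np.clip(out, 0, 255).T.reshape(h, w, 3).astype(np.uint8), he
```

The function expects an RGB image as a `uint8` array of shape `(height, width, 3)` that contains enough stained tissue for the stain vectors to be estimated.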
- Clone the repository:
git clone https://github.com/raktim-mondol/hist2RNA.git
- Change directory to the cloned repository:
cd hist2RNA
- Install the required packages:
pip install -r requirements.txt
- Train the model:
python training_main.py --slides_dir ./data/slides/ --epochs 50 --batch_size 12 --lr 0.001
- Test the model:
python test_main.py --test_patient_id ./patient_details/test_patient_id.txt --checkpoint_file ./models/hist2RNA_model.pth
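To make the training flags above concrete, here is a hedged sketch of how a script could parse them with `argparse`. The flag names come from the command above; the types, the defaults (which simply mirror the values in that command), and the `build_parser` helper are assumptions and may not match `training_main.py`.

```python
import argparse


def build_parser():
    """Hypothetical parser for the training flags shown in the command above."""
    parser = argparse.ArgumentParser(description="Train hist2RNA (illustrative sketch)")
    parser.add_argument("--slides_dir", required=True, help="directory containing the slides")
    parser.add_argument("--epochs", type=int, default=50, help="number of training epochs")
    parser.add_argument("--batch_size", type=int, default=12, help="samples per batch")
    parser.add_argument("--lr", type=float, default=0.001, help="learning rate")
    return parser


# Parse the same arguments used in the training command.
args = build_parser().parse_args(
    ["--slides_dir", "./data/slides/", "--epochs", "50", "--batch_size", "12", "--lr", "0.001"]
)
print(args)
```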
For the most efficient workflow, run the following scripts in order:
python step_1_feature_extraction.py
Then,
python step_2_model_training_.py
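The script names suggest a two-stage workflow in which features are extracted once and training then runs on the stored features. The sketch below illustrates that general idea only; the repository's scripts may work differently. `extract_features` is a trivial stand-in (per-patch channel statistics) rather than a CNN encoder, and all function names and the `.npy` layout are hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np


def extract_features(patches):
    """Stand-in for a frozen encoder: channel means and stds for each patch."""
    flat = patches.reshape(len(patches), -1, 3)
    return np.concatenate([flat.mean(axis=1), flat.std(axis=1)], axis=1)


def cache_slide_features(slides, out_dir):
    """Stage 1: compute features once per slide and store them on disk."""
    out_dir = Path(out_dir)
    for slide_id, patches in slides.items():
        np.save(out_dir / f"{slide_id}.npy", extract_features(patches))


def load_slide_vector(slide_id, feat_dir):
    """Stage 2 input: average the cached patch features into one vector per slide."""
    return np.load(Path(feat_dir) / f"{slide_id}.npy").mean(axis=0)


# Toy data: two "slides" made of random 8x8 RGB patches.
rng = np.random.default_rng(0)
slides = {"slide_a": rng.random((5, 8, 8, 3)), "slide_b": rng.random((7, 8, 8, 3))}

with tempfile.TemporaryDirectory() as feat_dir:
    cache_slide_features(slides, feat_dir)
    cached_files = sorted(p.name for p in Path(feat_dir).iterdir())
    slide_vector = load_slide_vector("slide_a", feat_dir)
```

With this kind of split, the expensive image-processing pass runs once, and repeated training runs read only the small cached arrays.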
For detailed usage instructions, please refer to the documentation.
The following results show predictions for the PAM50 genes from histopathology images in the test dataset:
The per-patient analysis leverages the overall pattern of gene expression for each patient, which allows a more holistic understanding of gene behavior across the population.
The per-gene analysis focuses on the expression pattern of each gene individually. It reveals significant variability in gene expression among patients, which can lead to lower correlation coefficients.
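The difference between the two analyses can be illustrated with a small sketch on synthetic data. It assumes expression matrices of shape (patients, genes) and uses the Pearson coefficient; the metric, the function names, and the simulated data are assumptions made for illustration and are not results from this project.

```python
import numpy as np


def rowwise_pearson(a, b):
    """Pearson correlation between matching rows of two 2-D arrays."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))


def per_patient_correlation(true, pred):
    """One coefficient per patient, computed over that patient's genes."""
    return rowwise_pearson(true, pred)


def per_gene_correlation(true, pred):
    """One coefficient per gene, computed over all patients."""
    return rowwise_pearson(true.T, pred.T)


# Synthetic data: 100 patients, 50 genes with different baseline levels.
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 3.0, size=50)
true = baseline + rng.normal(0.0, 1.0, size=(100, 50))
# The simulated prediction captures gene baselines well but only a small
# part of each patient's deviation from them.
pred = baseline + 0.2 * (true - baseline) + rng.normal(0.0, 1.0, size=(100, 50))

patient_r = per_patient_correlation(true, pred)
gene_r = per_gene_correlation(true, pred)
print(f"mean per-patient r: {patient_r.mean():.2f}")
print(f"mean per-gene r:    {gene_r.mean():.2f}")
```

In this simulation the per-patient coefficients are high because genes differ strongly in their baseline level, while the per-gene coefficients are lower because they depend only on patient-to-patient variation within each gene.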
We welcome contributions to improve and expand the capabilities of hist2RNA! Please follow the contributing guidelines to get started.
This project is licensed under the MIT License - see the LICENSE file for details.
If you find this code useful in your research, please consider citing:
@Article{cancers15092569,
AUTHOR = {Mondol, Raktim Kumar and Millar, Ewan K. A. and Graham, Peter H. and Browne, Lois and Sowmya, Arcot and Meijering, Erik},
TITLE = {hist2RNA: An Efficient Deep Learning Architecture to Predict Gene Expression from Breast Cancer Histopathology Images},
JOURNAL = {Cancers},
VOLUME = {15},
YEAR = {2023},
NUMBER = {9},
ARTICLE-NUMBER = {2569},
URL = {https://www.mdpi.com/2072-6694/15/9/2569},
ISSN = {2072-6694},
DOI = {10.3390/cancers15092569}
}