Define and visualize pathological architectures of human tissues from spatially resolved transcriptomics using deep learning
RESEPT is a deep-learning framework for characterizing and visualizing tissue architecture from spatially resolved transcriptomics. Given gene expression or RNA velocity as input, RESEPT learns a three-dimensional embedding from the spatial transcriptomics data with a spatially retained graph neural network. The embedding is then visualized by mapping its three dimensions to the color channels of an RGB image, and the image is segmented with a supervised convolutional neural network to infer the tissue architecture.
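The mapping from a three-dimensional embedding to color channels can be illustrated with a minimal sketch. It assumes a per-dimension min-max scaling to the 0-255 range; the function name `embedding_to_rgb` and the scaling rule are illustrative assumptions, not RESEPT's actual implementation.

```python
from typing import List, Sequence, Tuple


def embedding_to_rgb(embedding: Sequence[Sequence[float]]) -> List[Tuple[int, int, int]]:
    """Scale each of the three embedding dimensions to 0..255 (hypothetical helper)."""
    if not embedding:
        return []
    if any(len(row) != 3 for row in embedding):
        raise ValueError("each spot needs exactly three embedding values")
    columns = list(zip(*embedding))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    colors = []
    for row in embedding:
        channels = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # A constant dimension carries no contrast, so map it to 0.
            scaled = 0.0 if span == 0 else (value - low) / span
            channels.append(int(round(scaled * 255)))
        colors.append(tuple(channels))
    return colors


# Three toy spots; the third dimension is constant on purpose.
spots = [(-1.0, 0.0, 2.0), (0.0, 0.5, 2.0), (1.0, 1.0, 2.0)]
colors = embedding_to_rgb(spots)
print(colors)
```

Each resulting triple can be painted at the pixel position of its spot to obtain an image that a segmentation network can consume.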
Documentation: https://resept.readthedocs.io/
RESEPT was trained on a workstation with a 64-core CPU, 20G RAM, and a GPU with 11G VRAM. Customizing the segmentation model currently runs only on a GPU. All other RESEPT functions require at least an 8-core CPU and 8G RAM.
RESEPT runs on Linux. The package has been tested on the following systems:
- Linux: Ubuntu 20.04
RESEPT mainly depends on the Python (3.6+) scientific stack.
scipy==1.6.2
networkx==2.5.1
opencv_contrib_python==4.5.1.48
tqdm==4.60.0
scikit_image==0.18.1
numpy==1.19.2
umap_learn==0.5.1
six==1.15.0
matplotlib==3.3.4
terminaltables==3.1.0
torch==1.5.0
scanpy==1.7.2
statsmodels==0.12.2
requests==2.25.1
munkres==1.1.4
mmcv_full==1.3.0
rpy2==3.1.0
pandas==1.2.3
numba==0.53.1
seaborn==0.11.1
anndata==0.7.6
cityscapesscripts==2.2.0
leidenalg==0.8.7
Pillow==8.3.1
python_igraph==0.9.6
scikit_learn==0.24.2
umap==0.1.1
- Install PyTorch 1.5.0 following the official guide.
- Install mmcv-full 1.3.0 by running the following command:
pip install mmcv-full==1.3.0 -f https://download.openmmlab.com/mmcv/dist/${CUDA}/torch1.5.0/index.html
where ${CUDA} should be replaced by the specific CUDA version (cpu, cu92, cu101, cu102).
- Install other dependencies:
pip install -r requirements.txt
The above steps take 20-25 mins to install all dependencies.
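Because the mmcv-full command depends on the CUDA tag, it can help to build it programmatically. The sketch below only assembles the command from the URL pattern and tags listed above; the helper names `mmcv_find_links_url` and `mmcv_install_command` are hypothetical.

```python
SUPPORTED_CUDA_TAGS = ("cpu", "cu92", "cu101", "cu102")


def mmcv_find_links_url(cuda_tag: str, torch_version: str = "1.5.0") -> str:
    """Return the find-links URL for the given CUDA tag (hypothetical helper)."""
    if cuda_tag not in SUPPORTED_CUDA_TAGS:
        raise ValueError(f"unsupported CUDA tag: {cuda_tag!r}")
    return (
        "https://download.openmmlab.com/mmcv/dist/"
        f"{cuda_tag}/torch{torch_version}/index.html"
    )


def mmcv_install_command(cuda_tag: str) -> list:
    """Assemble the pip command as an argument list; it is not executed here."""
    return ["pip", "install", "mmcv-full==1.3.0", "-f", mmcv_find_links_url(cuda_tag)]


print(" ".join(mmcv_install_command("cu102")))
```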
git clone https://github.com/OSU-BMBL/RESEPT
cd RESEPT
- gene expression file: An HDF5 file storing raw gene expression data.
- tissue_positions_list file: A CSV file containing meta information of spots, including their connectivity and spatial coordinates.
- scalefactors_json file: A JSON file containing the scaling factors that convert spots to different resolutions.
More details can be found here.
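As a rough illustration of how the two spatial files fit together, the sketch below parses positions and rescales full-resolution pixel coordinates. It assumes the standard 10x Visium layout (a header-less CSV with barcode, in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres, and a `tissue_hires_scalef` key in the JSON); the barcodes and numbers are made up, and the helpers are not part of RESEPT.

```python
import csv
import io
import json

POSITION_COLUMNS = (
    "barcode", "in_tissue", "array_row", "array_col",
    "pxl_row_in_fullres", "pxl_col_in_fullres",
)


def read_positions(handle):
    """Parse rows of a tissue_positions_list-style CSV into dictionaries."""
    spots = []
    for row in csv.reader(handle):
        if not row or row[0] == "barcode":  # skip blank lines or a header row
            continue
        record = dict(zip(POSITION_COLUMNS, row))
        for key in ("in_tissue", "array_row", "array_col"):
            record[key] = int(record[key])
        for key in ("pxl_row_in_fullres", "pxl_col_in_fullres"):
            record[key] = float(record[key])
        spots.append(record)
    return spots


def to_hires(spot, scalefactors):
    """Convert full-resolution pixel coordinates to the high-resolution image."""
    scale = scalefactors["tissue_hires_scalef"]
    return (spot["pxl_row_in_fullres"] * scale, spot["pxl_col_in_fullres"] * scale)


# In-memory stand-ins for the two files (illustrative values only).
example_csv = io.StringIO(
    "AAACAAGTATCTCCCA-1,1,50,102,8000,9000\n"
    "AAACACCAATAACTGC-1,0,59,19,9000,3000\n"
)
example_scale = json.loads('{"tissue_hires_scalef": 0.25, "spot_diameter_fullres": 90.0}')

spots = read_positions(example_csv)
in_tissue = [s for s in spots if s["in_tissue"] == 1]
hires_xy = to_hires(in_tissue[0], example_scale)
print(len(in_tissue), hires_xy)
```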
An annotation file should include spot barcodes and their corresponding annotations. It is used for evaluating predicted tissue architectures (e.g., with ARI) and for training users' own segmentation models. The file should be named [sample_name]_annotation.csv.
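To make the ARI evaluation concrete, here is a minimal pure-Python sketch of the adjusted Rand index between annotated and predicted labels, matched by spot barcode. The function names and toy barcodes are illustrative; RESEPT's own evaluation code may differ (for example, it may rely on scikit-learn).

```python
from collections import Counter


def pairs(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index of two label sequences of equal length."""
    if len(truth) != len(pred):
        raise ValueError("label sequences must have the same length")
    total = pairs(len(truth))
    if total == 0:
        return 1.0
    sum_cells = sum(pairs(c) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(pairs(c) for c in Counter(truth).values())
    sum_cols = sum(pairs(c) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / total
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:  # both partitions are trivial
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


def ari_from_annotations(truth_by_barcode, pred_by_barcode):
    """Compare only barcodes present in both the annotation and the prediction."""
    shared = sorted(set(truth_by_barcode) & set(pred_by_barcode))
    return adjusted_rand_index(
        [truth_by_barcode[b] for b in shared],
        [pred_by_barcode[b] for b in shared],
    )


truth = {"s1": "Layer1", "s2": "Layer1", "s3": "WM", "s4": "WM"}
pred = {"s1": 3, "s2": 3, "s3": 0, "s4": 0}
perfect = ari_from_annotations(truth, pred)
worst = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(perfect, worst)
```

The index depends only on how spots are grouped, so label names in the annotation file do not need to match the predicted category numbers.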
The segmentation model file is a pre-trained segmentation model in the pth format. It must be provided when predicting the tissue architecture from the generated RGB images.
The data schema to run our code is as follows:
[sample_name]/
|__spatial/
| |__tissue_positions_list file
| |__scalefactors_json file
|__gene expression file
|__annotation file: [sample_name]_annotation.csv (optional)
model/ (optional)
|__segmentation model file
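Before launching a long run, the layout above can be checked with a short script. This is a sketch, not part of RESEPT: the file names `tissue_positions_list.csv` and `scalefactors_json.json` are taken from the demo commands in this README, the `.h5` extension for the gene expression file is an assumption based on the demo data, and `check_sample_folder` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def check_sample_folder(sample_dir, require_annotation=False):
    """Return a list of problems found in one [sample_name]/ folder."""
    sample_dir = Path(sample_dir)
    problems = []
    for name in ("tissue_positions_list.csv", "scalefactors_json.json"):
        if not (sample_dir / "spatial" / name).is_file():
            problems.append(f"missing spatial/{name}")
    if not list(sample_dir.glob("*.h5")):
        problems.append("missing gene expression .h5 file")
    annotation = sample_dir / f"{sample_dir.name}_annotation.csv"
    if require_annotation and not annotation.is_file():
        problems.append(f"missing {annotation.name}")
    return problems


# Build a throwaway sample folder without an annotation file and check it.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "151669"
    (sample / "spatial").mkdir(parents=True)
    (sample / "spatial" / "tissue_positions_list.csv").touch()
    (sample / "spatial" / "scalefactors_json.json").touch()
    (sample / "151669_filtered_feature_bc_matrix.h5").touch()
    problems_prediction = check_sample_folder(sample)
    problems_training = check_sample_folder(sample, require_annotation=True)

print(problems_prediction, problems_training)
```

Passing `require_annotation=True` mirrors the training schema, where the annotation file is mandatory rather than optional.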
The data schema to customize our segmentation model is as follows:
[training_data_folder]
|__[sample_name_1]/
| |__spatial/
| | |__tissue_positions_list file
| | |__scalefactors_json file
| |__gene expression file
| |__annotation file: [sample_name_1]_annotation.csv
|__[sample_name_2]/
| |__spatial/
| | |__tissue_positions_list file
| | |__scalefactors_json file
| |__gene expression file
| |__annotation file: [sample_name_2]_annotation.csv
| ...
|__[sample_name_n]/
| |__spatial/
| | |__tissue_positions_list file
| | |__scalefactors_json file
| |__gene expression file
| |__annotation file: [sample_name_n]_annotation.csv
Run the following command line to construct RGB images based on gene expression from different embedding parameters. For demonstration, download the example data and put the unzipped folder '151669' in the source code folder.
wget https://bmblx.bmi.osumc.edu/downloadFiles/GitHub_files/151669.zip
unzip 151669.zip
python RGB_images_pipeline.py -expression 151669/151669_filtered_feature_bc_matrix.h5 -meta 151669/spatial/tissue_positions_list.csv -scaler 151669/spatial/scalefactors_json.json -output Demo_result -embedding scGNN -transform logcpm
- -expression file path for raw gene expression data. [type: str]
- -meta file path for spatial meta data recording tissue positions. [type: str]
- -scaler file path for scale factors. [type: str]
- -output output root folder. [type: str]
- -embedding embedding method in use: scGNN, spaGCN, UMAP or SEDR. [type: str] [default: scGNN]
- -transform data pre-transform method: log, logcpm or None. [type: str] [default: logcpm]
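For readers unfamiliar with the `logcpm` option, the sketch below shows one common definition: counts are scaled to counts per million within a spot and then passed through log(1 + x). Whether RESEPT uses exactly this formula (for example, the same pseudocount or logarithm base) is an assumption, and `log_cpm` is a hypothetical helper.

```python
import math


def log_cpm(counts):
    """Log-transformed counts per million for one spot (natural log of 1 + CPM)."""
    total = sum(counts)
    if total == 0:
        # An empty spot has no meaningful CPM; return zeros instead of dividing by zero.
        return [0.0 for _ in counts]
    return [math.log1p(count / total * 1e6) for count in counts]


raw_counts = [0, 1, 3]  # toy counts for three genes in one spot
transformed = log_cpm(raw_counts)
print(transformed)
```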
RESEPT stores the generated results in the following structure:
Demo_result/
|__RGB_images/
- The folder 'RGB_images' stores generated RGB images of tissue architectures from different embedding parameters.
This demo takes 25-30 mins to generate all results on the machine with a 64-core CPU.
Run the following command line to construct RGB images based on gene expression from different embedding parameters, segment the constructed RGB images into tissue architectures with top-5 Moran's I, and evaluate the tissue architectures (e.g., with ARI). For demonstration, download the example data and the pre-trained model, then put the unzipped folders '151669' and 'model_151669' in the source code folder.
wget https://bmblx.bmi.osumc.edu/downloadFiles/GitHub_files/151669.zip
wget https://bmblx.bmi.osumc.edu/downloadFiles/GitHub_files/model_151669.zip
unzip 151669.zip
unzip model_151669.zip
python evaluation_pipeline.py -expression 151669/151669_filtered_feature_bc_matrix.h5 -meta 151669/spatial/tissue_positions_list.csv -scaler 151669/spatial/scalefactors_json.json -k 7 -label 151669/151669_annotation.csv -model model_151669/151669_scGNN.pth -output Demo_result_evaluation -embedding scGNN -transform logcpm -device cpu
- -expression file path for raw gene expression data. [type: str]
- -meta file path for spatial meta data recording tissue positions. [type: str]
- -scaler file path for scale factors. [type: str]
- -k the number of tissue architectures (setting -1 will recommend a K for you). [type: int] [default: 7]
- -label file path for labels recording spot barcodes and their annotations for calculating evaluation metrics. [type: str]
- -model file path for pre-trained model. [type: str]
- -output output root folder. [type: str]
- -embedding embedding method in use: scGNN, spaGCN, UMAP or SEDR. [type: str] [default: scGNN]
- -transform data pre-transform method: log, logcpm or None. [type: str] [default: logcpm]
- -device cpu/gpu device option: cpu or gpu. [type: str] [default: cpu]
RESEPT stores the generated results in the following structure:
Demo_result_evaluation/
|__RGB_images/
|__segmentation_evaluation/
|__segmentation_map/
|__top5_evaluation.csv
|__predicted_tissue_architecture.csv
- The folder 'RGB_images' contains the generated RGB images of tissue architectures from different embedding parameters.
- The folder 'segmentation_map' stores the predicted tissue architectures with top-5 Moran's I.
- The file 'top5_evaluation.csv' records various evaluation metrics corresponding to the tissue architectures.
- The file 'predicted_tissue_architecture.csv' records the predicted tissue architecture values of spots with top-5 Moran's I.
This demo takes 30-35 mins to generate all results on the machine with a 64-core CPU.
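The ranking by Moran's I mentioned above relies on a spatial autocorrelation statistic: values near 1 indicate that neighboring spots carry similar values, and negative values indicate alternation. The sketch below implements the textbook formula in pure Python; the neighbor weights and toy values are illustrative, and how RESEPT defines neighborhoods and applies the statistic to segmentation results is not specified here.

```python
def morans_i(values, weights):
    """Global Moran's I.

    values  -- one number per spot
    weights -- dict mapping (i, j) index pairs to spatial weights w_ij
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    variance_sum = sum(d * d for d in deviations)
    weight_sum = sum(weights.values())
    if variance_sum == 0 or weight_sum == 0:
        raise ValueError("Moran's I is undefined for constant values or empty weights")
    cross = sum(w * deviations[i] * deviations[j] for (i, j), w in weights.items())
    return (n / weight_sum) * (cross / variance_sum)


def chain_weights(n):
    """Symmetric weights for n spots arranged in a line (toy neighborhood)."""
    weights = {}
    for i in range(n - 1):
        weights[(i, i + 1)] = 1.0
        weights[(i + 1, i)] = 1.0
    return weights


clustered = morans_i([1, 1, 5, 5], chain_weights(4))
alternating = morans_i([1, 5, 1, 5], chain_weights(4))
print(clustered, alternating)
```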
Run the following command line to generate RGB images based on gene expression from different embedding parameters and predict tissue architectures with top-5 Moran's I. For demonstration, download the example data and the pre-trained model, then put the unzipped folders '151669' and 'model_151669' in the source code folder.
wget https://bmblx.bmi.osumc.edu/downloadFiles/GitHub_files/151669.zip
wget https://bmblx.bmi.osumc.edu/downloadFiles/GitHub_files/model_151669.zip
unzip model_151669.zip
unzip 151669.zip
python test_pipeline.py -expression 151669/151669_filtered_feature_bc_matrix.h5 -meta 151669/spatial/tissue_positions_list.csv -scaler 151669/spatial/scalefactors_json.json -k 7 -model model_151669/151669_scGNN.pth -output Demo_result_tissue_architecture -embedding scGNN -transform logcpm -device cpu
- -expression file path for raw gene expression data. [type: str]
- -meta file path for spatial meta data recording tissue positions. [type: str]
- -scaler file path for scale factors. [type: str]
- -k the number of tissue architectures (setting -1 will recommend a K for you). [type: int] [default: 7]
- -model file path for pre-trained model. [type: str]
- -output output root folder. [type: str]
- -embedding embedding method in use: scGNN, spaGCN, UMAP or SEDR. [type: str] [default: scGNN]
- -transform data pre-transform method: log, logcpm or None. [type: str] [default: logcpm]
- -device cpu/gpu device option: cpu or gpu. [type: str] [default: cpu]
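When the prediction pipeline is run for many samples, the options above can be assembled in Python rather than typed by hand. The sketch below only builds and prints the argument list using the flags documented in this README; `build_test_command` is a hypothetical wrapper, and the command would still need to be launched from the RESEPT source folder (for example with `subprocess.run(command, check=True)`).

```python
import shlex
import sys

EMBEDDINGS = ("scGNN", "spaGCN", "UMAP", "SEDR")
TRANSFORMS = ("log", "logcpm", "None")
DEVICES = ("cpu", "gpu")


def build_test_command(expression, meta, scaler, model, output,
                       k=7, embedding="scGNN", transform="logcpm", device="cpu"):
    """Assemble the argument list for test_pipeline.py without running it."""
    if embedding not in EMBEDDINGS:
        raise ValueError(f"unknown embedding: {embedding!r}")
    if transform not in TRANSFORMS:
        raise ValueError(f"unknown transform: {transform!r}")
    if device not in DEVICES:
        raise ValueError(f"unknown device: {device!r}")
    return [
        sys.executable, "test_pipeline.py",
        "-expression", expression,
        "-meta", meta,
        "-scaler", scaler,
        "-k", str(k),
        "-model", model,
        "-output", output,
        "-embedding", embedding,
        "-transform", transform,
        "-device", device,
    ]


command = build_test_command(
    expression="151669/151669_filtered_feature_bc_matrix.h5",
    meta="151669/spatial/tissue_positions_list.csv",
    scaler="151669/spatial/scalefactors_json.json",
    model="model_151669/151669_scGNN.pth",
    output="Demo_result_tissue_architecture",
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in command))
```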
RESEPT stores the generated results in the following structure:
Demo_result_tissue_architecture/
|__RGB_images/
|__segmentation_test/
|__segmentation_map/
|__top5_MI_value.csv
|__predicted_tissue_architecture.csv
- The folder 'RGB_images' contains the generated RGB images of tissue architectures from different embedding parameters.
- The folder 'segmentation_map' stores the predicted tissue architectures with top-5 Moran's I.
- The file 'top5_MI_value.csv' records Moran's I value corresponding to the tissue architectures.
- The file 'predicted_tissue_architecture.csv' records the predicted tissue architecture values of spots with top-5 Moran's I.
This demo takes 30-35 mins to generate all the results on the machine with a 64-core CPU.
RESEPT can segment a histological image according to the predicted tissue architectures, which may help pathologists focus on specific functional zonation. Run the following command line to predict tissue architectures with top-5 Moran's I and segment the histological image accordingly. For demonstration, download the example data and the pre-trained model, then put the unzipped folders 'cancer' and 'model_cancer' in the source code folder.
wget https://bmblx.bmi.osumc.edu/downloadFiles/GitHub_files/cancer.zip
wget https://bmblx.bmi.osumc.edu/downloadFiles/GitHub_files/model_cancer.zip
unzip cancer.zip
unzip model_cancer.zip
python histological_segmentation_pipeline.py -expression ./cancer/Parent_Visium_Human_Glioblas_filtered_feature_bc_matrix.h5 -meta ./cancer/spatial/tissue_positions_list.csv -scaler ./cancer/spatial/scalefactors_json.json -k 7 -model ./model_cancer/cancer_model.pth -histological ./cancer/Parent_Visium_Human_Glioblast.tif -output Demo_result_HistoImage -embedding spaGCN -transform logcpm -device cpu
- -expression file path for raw gene expression data. [type: str]
- -meta file path for spatial meta data recording tissue positions. [type: str]
- -scaler file path for scale factors. [type: str]
- -k the number of tissue architectures (setting -1 will recommend a K for you). [type: int] [default: 7]
- -model file path for pre-trained model. [type: str]
- -histological file path for the corresponding histological image. [type: str]
- -output output root folder. [type: str]
- -embedding embedding method in use: scGNN, spaGCN, UMAP or SEDR. [type: str] [default: spaGCN]
- -transform data pre-transform method: log, logcpm or None. [type: str] [default: logcpm]
- -device cpu/gpu device option: cpu or gpu. [type: str] [default: cpu]
RESEPT stores the generated results in the following structure:
Demo_result_HistoImage/
|__RGB_images/
|__segmentation_test/
| |__segmentation_map/
| |__top5_MI_value.csv
| |__predicted_tissue_architecture.csv
|__histological_segmentation/
|__category_1.png
|__category_2.png
…
|__category_n.png
- The folder 'RGB_images' stores generated RGB images of tissue architectures from different embedding parameters.
- The folder 'segmentation_map' provides predicted tissue architectures with top-5 Moran's I.
- The file 'top5_MI_value.csv' records Moran's I value corresponding to the tissue architectures.
- The file 'predicted_tissue_architecture.csv' records the predicted tissue architecture values of spots with top-5 Moran's I.
- The file 'category_n.png' refers to the histological image segmentation results, where n denotes the segmentation number.
This demo takes 30-35 mins to generate all results on a machine with a multi-core CPU.
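Conceptually, each per-category image keeps the pixels that belong to one predicted category and hides the rest. The sketch below shows this idea on a toy image stored as nested lists; blanking other pixels to white, and the helper `split_by_category` itself, are assumptions for illustration rather than a description of how RESEPT renders its 'category_n.png' files.

```python
def split_by_category(image, labels, background=(255, 255, 255)):
    """Return one masked copy of the image per category found in labels.

    image  -- rows of RGB tuples
    labels -- rows of integer category labels with the same shape as image
    """
    if len(image) != len(labels) or any(
        len(img_row) != len(lab_row) for img_row, lab_row in zip(image, labels)
    ):
        raise ValueError("image and labels must have the same shape")
    categories = sorted({label for row in labels for label in row})
    masked = {}
    for category in categories:
        masked[category] = [
            [pixel if label == category else background
             for pixel, label in zip(img_row, lab_row)]
            for img_row, lab_row in zip(image, labels)
        ]
    return masked


toy_image = [[(10, 10, 10), (20, 20, 20)],
             [(30, 30, 30), (40, 40, 40)]]
toy_labels = [[0, 1],
              [1, 1]]
masks = split_by_category(toy_image, toy_labels)
print(sorted(masks))
```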
RESEPT supports fine-tuning our segmentation model with users' own 10x Visium data. Organize all samples and their annotations according to our pre-defined data schema, and download our pre-trained model as the starting point for training. Each training sample should be placed in its own folder in the required format (see the data schema for customizing the segmentation model above). Then gather all the individual folders into one main folder (e.g., named “training_data_folder”). For demonstration, download the example training data, and then run the following command line to generate the RGB images of your own data and the customized model.
wget https://bmblx.bmi.osumc.edu/downloadFiles/GitHub_files/model_151669.zip
wget https://bmblx.bmi.osumc.edu/downloadFiles/GitHub_files/training_data_folder.zip
unzip model_151669.zip
unzip training_data_folder.zip
python training_pipeline.py -data_folder training_data_folder -output Demo_result_model -embedding scGNN -transform logcpm -model model_151669/151669_scGNN.pth
- -data_folder a folder that provides all training samples. The data of each sample, including its label file, should follow our pre-defined schema in a sub-folder under this folder. [type: str]
- -model file path for pre-trained model file. [type: str]
- -output output root folder. [type: str]
- -embedding embedding method in use: scGNN, spaGCN, UMAP or SEDR. [type: str] [default: scGNN]
- -transform data pre-transform method: log, logcpm or None. [type: str] [default: logcpm]
RESEPT stores the generated results in the following structure:
Demo_result_model/
|__RGB_images/
work_dirs/
|__config/
|__fine_tune_model.pth
- The folder 'RGB_images' contains generated RGB images of tissue architectures of all input 10x data from different embedding parameters.
- The file 'fine_tune_model.pth' is the customized model.
This demo takes about 3-5 hours to generate the model on a machine with a GPU with 11G VRAM.
- opencv - The image processing library used
- pytorch - The deep learning backend used
- scikit-learn - The machine learning library used
- mmSegmentation - The image segmentation library used
This project is licensed under the MIT License - see the LICENSE.md file for details
If you use RESEPT, please cite our paper:
@article {Chang2021.07.08.451210,
author = {Chang, Yuzhou and He, Fei and Wang, Juexin and Chen, Shuo and Li, Jingyi and Liu, Jixin and Yu, Yang and Su, Li and Ma, Anjun and Allen, Carter and Lin, Yu and Sun, Shaoli and Liu, Bingqiang and Otero, Jose and Chung, Dongjun and Fu, Hongjun and Li, Zihai and Xu, Dong and Ma, Qin},
title = {Define and visualize pathological architectures of human tissues from spatially resolved transcriptomics using deep learning},
elocation-id = {2021.07.08.451210},
year = {2021},
doi = {10.1101/2021.07.08.451210},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2021/07/16/2021.07.08.451210},
eprint = {https://www.biorxiv.org/content/early/2021/07/16/2021.07.08.451210.full.pdf},
journal = {bioRxiv}
}