microDELTA is a deep learning method for tracing longitudinal changes in the human gut microbiome. The method is based on Neural Networks and Transfer Learning, and can be used to model dynamic patterns of gut microbial communities at different life stages, including infancy, middle age, and old age.
The overall framework of microDELTA. a. The base model is a Neural Network model built from thousands to millions of samples collected from public databases. b. The transfer model is built by transferring knowledge from the base model into a new context, using a small proportion of samples from that context. c. In the transfer step, microDELTA adapts the base model to the newly introduced context and reinitializes the contextual layers. In the adaptation step, microDELTA rapidly optimizes the parameters of the contextual layers. In the fine-tuning step, microDELTA further optimizes the parameters of the whole network and then outputs the transfer model. d. For the remaining samples in the context, the transferred model can be used to determine the life trajectory of the host, such as age, disease status, etc.
The microDELTA method is based on EXPERT. Install the environment via conda:

```shell
conda env create -f enviroment.yaml
```

Before using it for the first time, you need to initialize EXPERT and install the NCBI taxonomy database:

```shell
conda activate microDELTA  # activate the conda environment
expert init                # initialize EXPERT
```
To run a microDELTA analysis, simply use `python microDELTA.py` and set the parameters like:

```shell
python microDELTA.py -O overall_status.txt \
                     -l label.csv \
                     -S source.tsv \
                     -Q query.tsv \
                     -m base_model_directory \
                     -o output_directory
```

- `-O`: A txt file containing the overall status of the hosts. Each line in this file lists one class of host status, like:

```
root:status1
root:status2
```
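As an illustration, such an overall-status file can be generated from a list of status classes. This is a minimal sketch, not part of microDELTA; the status names and the output filename are placeholders:

```python
# Minimal sketch (not part of microDELTA): write an overall-status file
# with one "root:<status>" line per host-status class.
statuses = ["status1", "status2"]  # placeholder status names

with open("overall_status.txt", "w") as f:
    for s in statuses:
        f.write(f"root:{s}\n")
```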
- `-l`: A csv file containing the label of each host. The first column, named `SampleID`, contains the index of each host, and the second column, named `Env`, contains the status of each host, like:

| SampleID | Env |
|---|---|
| host1 | status1 |
| host2 | status2 |
| host3 | status3 |
| ... | ... |
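A label file in this layout can be written with Python's standard `csv` module. This is a sketch only; the host and status names are placeholders:

```python
# Minimal sketch: write a label file with the two required columns,
# "SampleID" and "Env". Host and status names are placeholders.
import csv

rows = [("host1", "status1"), ("host2", "status2"), ("host3", "status3")]

with open("label.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["SampleID", "Env"])  # required column names
    writer.writerows(rows)
```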
- `-S` and `-Q`: Two tsv files containing the abundances of the gut microbial communities for the training and testing samples, respectively. The columns represent hosts and the rows represent features. We use `SourceCM.tsv` for training and `QueryCM.tsv` for validation. The format of these tsv files looks like:

| #OTU ID | host1 | host2 | ... |
|---|---|---|---|
| microbe1 | 0.01 | 0.05 | ... |
| microbe2 | 0 | 0.02 | ... |
| ... | ... | ... | ... |
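An abundance matrix in this orientation (rows are microbial features, columns are hosts) can likewise be sketched with the standard library. All feature names, host names, and values below are illustrative:

```python
# Minimal sketch: write a tiny abundance matrix in the expected
# orientation (rows = features, columns = hosts). Values are made up.
import csv

header = ["#OTU ID", "host1", "host2"]
rows = [
    ["microbe1", "0.01", "0.05"],
    ["microbe2", "0", "0.02"],
]

with open("SourceCM.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")  # tab-separated, as expected
    writer.writerow(header)
    writer.writerows(rows)
```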
- `-m`: Base model directory. If not specified, an independent model will be trained instead. Our base model is provided in `aging/mst/model/base_model`.
- `-o`: Output directory in which to store the results.
As an example, we describe the microDELTA pipeline on an experiment from the Chinese traveler cohort, in which the samples from traveler "MT1" are used as the query and all other samples as the source. For more information about this cohort, please refer to here.
You can perform this analysis via `microDELTA.py` with the parameters set as below:
```shell
python microDELTA.py -O traveler/microbiomes.txt \
                     -l traveler/experiments_repeat/exp_1/SourceMapper.csv \
                     -S traveler/experiments_repeat/exp_1/SourceCM.tsv \
                     -Q traveler/experiments_repeat/exp_1/QueryCM.tsv \
                     -m aging/mst/model/base_model \
                     -o traveler/experiments_repeat/exp_1
```
You can also perform this analysis with EXPERT directly, following the microDELTA pipeline step by step:
The microDELTA pipeline includes several steps. First, the ontology of the gut microbiome is constructed by creating a hierarchy of host statuses. This step is performed using the `expert construct` command, which takes as input a text file (`microbiomes.txt`) containing the host statuses and produces an ontology file in the form of a pickle object.

```shell
expert construct -i traveler/microbiomes.txt \
                 -o traveler/ontology.pkl
```
Next, the abundance data are converted into a format that the model can use. This is done using the `expert convert` command, which takes as input a text file listing the paths of the input matrices (`SourceCM.tsv` and `QueryCM.tsv`) and produces a binary data file in h5 format.

```shell
ls traveler/experiments_repeat/exp_1/SourceCM.tsv > tmp
expert convert -i tmp --in-cm -o traveler/experiments_repeat/exp_1/SourceCM.h5
ls traveler/experiments_repeat/exp_1/QueryCM.tsv > tmp
expert convert -i tmp --in-cm -o traveler/experiments_repeat/exp_1/QueryCM.h5
rm tmp
```
The status of each sample can then be mapped to the ontology using the `expert map` command. This step associates each sample in the input data (`SourceMapper.csv`) with a specific host status, based on the ontology.

```shell
expert map --to-otlg -t traveler/ontology.pkl \
           -i traveler/experiments_repeat/exp_1/SourceMapper.csv \
           -o traveler/experiments_repeat/exp_1/SourceLabels.h5
```
With the input data prepared, the model can be trained using the `expert transfer` command. This step uses Transfer Learning to fine-tune a pre-trained base model on the specific input data. The resulting model can then be used to make predictions about the gut microbiomes of new hosts.

```shell
expert transfer -i traveler/experiments_repeat/exp_1/SourceCM.h5 \
                -t traveler/ontology.pkl \
                -l traveler/experiments_repeat/exp_1/SourceLabels.h5 \
                -o traveler/experiments_repeat/exp_1/Transfer_DM \
                -m aging/mst/model/base_model --finetune --update-statistics
```
To compare against a plain Neural Network method, use `expert train` to train an independent model on the source data.

```shell
expert train -i traveler/experiments_repeat/exp_1/SourceCM.h5 -t traveler/ontology.pkl \
             -l traveler/experiments_repeat/exp_1/SourceLabels.h5 \
             -o traveler/experiments_repeat/exp_1/NN
```
Once the model is trained, it can be used to make predictions for new hosts using the `expert search` command. This step takes as input the trained model and a set of unseen test data, and produces host-status predictions for the test samples.

```shell
expert search -i traveler/experiments_repeat/exp_1/QueryCM.h5 \
              -m traveler/experiments_repeat/exp_1/Transfer_DM \
              -o traveler/experiments_repeat/exp_1/Search_Transfer_DM
```