This workflow performs gene-targeted assembly (Xander) of protein-coding genes (rplB and rpsC are included here) on samples, and generates an OTU table and a taxonomy table for further microbial diversity analysis.
This workflow uses conda for package installation and snakemake for workflow management. Users only need to install conda and snakemake (via conda); snakemake then installs all other dependencies as part of the workflow.
Install conda:
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh;
bash miniconda.sh -b -p $HOME/miniconda
export PATH="$HOME/miniconda/bin:$PATH"
hash -r
conda config --set always_yes yes
conda update -q conda
conda info -a
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
Create a conda environment that includes snakemake (defined in envs/xander.yaml):
conda env create -q --file envs/xander.yaml -n xander
source activate xander
Clone the repository:
git clone git@github.com:jiarong/rplB-metaG-pipeline.git
cd rplB-metaG-pipeline
Configure the workflow according to your needs by editing the file config.yaml and the sample sheet metadata.tsv.
Test your configuration by performing a dry run:
snakemake --use-conda -n
Execute the workflow locally using $N cores:
snakemake --use-conda --cores $N
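If you are unsure what value to use for $N, the sketch below picks one from the machine's available CPUs. The helper name pick_cores is hypothetical (it is not part of this repository), and it assumes either nproc or getconf is available, falling back to a single core otherwise.

```shell
# Hypothetical helper (not part of this repository): print a core count
# that can be passed to snakemake's --cores option.
pick_cores() {
    if command -v nproc >/dev/null 2>&1; then
        # GNU coreutils: number of processing units available
        nproc
    else
        # Fallback: ask getconf, or default to a single core
        getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
    fi
}

N=$(pick_cores)
echo "Using $N cores"
```

With N set this way, the command above can be run unchanged. On a shared machine you may prefer a smaller value than the full core count.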
Alternatively, it can be run in cluster or cloud environments (see the docs for details).
After successful execution, you will see the OTU table at PROJECT/output/otu/GENE/otutable.tsv, and the taxonomy table at PROJECT/output/tax/GENE/taxonomy.tsv (PROJECT and GENE are defined in config.yaml).
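A quick way to confirm that both tables were written is sketched below. The function check_outputs is a hypothetical helper, not something the workflow provides. It takes the PROJECT and GENE values as arguments, since how they are named inside config.yaml is not shown here, and it only checks that each file exists and is non-empty.

```shell
# Hypothetical helper (not part of this repository): verify that the
# OTU and taxonomy tables exist and are non-empty.
# Usage: check_outputs PROJECT GENE
check_outputs() {
    project=$1
    gene=$2
    status=0
    for f in \
        "$project/output/otu/$gene/otutable.tsv" \
        "$project/output/tax/$gene/taxonomy.tsv"
    do
        if [ -s "$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return $status
}
```

For example, `check_outputs myproject rplB` (with myproject standing in for your own PROJECT value) returns a non-zero status if either table is missing.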