ResMiCo SM tutorial
- Description
- Install
- Creating a training dataset
- Generate feature table for a real metagenome dataset
ResMiCo-SM can be used for the following applications:
- generating synthetic training and test datasets, as used for ResMiCo training and testing
- creating feature tables for real datasets, which can then be used for ResMiCo contig misassembly prediction
ResMiCo-SM uses Snakemake for straightforward large-scale dataset generation on high-performance computing infrastructure.
For general info on running ResMiCo-SM, see the README.
For installation instructions, see the ResMiCo-SM README.
Download a set of 10 reference genomes:
wget http://ftp.tue.mpg.de/ebio/projects/ResMiCo/genomes_n10.tar.gz
tar -pzxvf genomes_n10.tar.gz && rm -f genomes_n10.tar.gz
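Before moving on, it can help to confirm that the archive unpacked as expected. The sketch below assumes the genome table sits at genomes_n10/genomes.tsv (the path used in config.yaml) and that its first line is a header; the helper name check_table is hypothetical and not part of ResMiCo-SM.

```shell
# check_table: hypothetical helper (not part of ResMiCo-SM).
# Reports whether a genome table exists and how many data rows it holds,
# assuming the first line of the table is a header.
check_table() {
    table="$1"
    if [ ! -s "$table" ]; then
        echo "missing or empty: $table"
        return 1
    fi
    # Count the rows that follow the (assumed) header line
    rows=$(tail -n +2 "$table" | wc -l | tr -d ' ')
    echo "$table: $rows genome(s)"
}

# Path taken from the config.yaml excerpt in this tutorial
check_table genomes_n10/genomes.tsv || true
```

If the table is reported as missing, re-run the download and extraction commands above from the ResMiCo-SM base directory.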
The config.yaml in the ResMiCo-SM base directory should already be configured properly:
# Input table
## Table of genomes
genomes_file: genomes_n10/genomes.tsv
# Output directory
output_dir: tests/output/n10/
# Temporary output directory (/dev/shm/ for shared memory)
tmp_dir: /tmp/
[...]
You may want to change the tmp_dir: or output_dir: paths.
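Editing the paths by hand works fine; for scripted setups, a small sed wrapper is one option. This is a minimal sketch that assumes tmp_dir and output_dir are top-level `key: value` lines, as the excerpt above suggests. The helper name set_config_value is hypothetical.

```shell
# set_config_value: hypothetical helper (not part of ResMiCo-SM).
# Rewrites a top-level "key: value" line in a YAML file. The result is
# written to a temporary file first, then moved over the original.
# Note: the value must not contain "|" or "&" (both are special to sed here).
set_config_value() {
    key="$1"; value="$2"; file="$3"
    sed "s|^${key}:.*|${key}: ${value}|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Example: use shared memory for temporary files, if config.yaml is present
if [ -f config.yaml ]; then
    set_config_value tmp_dir /dev/shm/ config.yaml
fi
```

Other lines, including comments, are left untouched, so the rest of the configuration stays as shipped.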
By default, the config.yaml is set to run many combinations of simulation parameters (see params:), such as:
- community richness
- community abundance distribution
- read lengths (bp)
- sequencing depths (no. paired-end reads)
- metagenome assemblers
You may want to reduce the number of parameters to speed up the testing. For example, change:
reads:
  length:
    - 100
    - 150
  depth:
    - 1000000
    - 4000000
to the following:
reads:
  length:
    - 150
  depth:
    - 1000000
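To see why trimming the lists helps, count the read settings before and after the change. The sketch assumes the workflow simulates every combination of the listed values, so the count is the product of the list lengths; count_combinations is a hypothetical helper for illustration only.

```shell
# count_combinations: hypothetical helper; multiplies the number of
# values listed for each parameter to get the number of combinations.
count_combinations() {
    total=1
    for n in "$@"; do
        total=$((total * n))
    done
    echo "$total"
}

count_combinations 2 2   # original reads block: 2 lengths x 2 depths, prints 4
count_combinations 1 1   # reduced reads block: 1 length x 1 depth, prints 1
```

Under that assumption, every other parameter list (richness, abundance distribution, assemblers) multiplies the total in the same way, so shortening any of them reduces the run further.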
To get a preview of the ResMiCo-SM run:
snakemake --use-conda -j 4 -Fqn
See snakemake -h for info on the parameters used (e.g., -Fqn).
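For readers new to Snakemake, the bundled short flags can be spelled out. The sketch below uses the long-form options that the short flags normally map to; confirm them with snakemake -h for your installed version, since option handling has changed between Snakemake releases. The variable name preview_cmd is only for illustration.

```shell
# Long-form spelling of: snakemake --use-conda -j 4 -Fqn
#   -j 4  -> --jobs 4     (run up to 4 jobs in parallel)
#   -F    -> --forceall   (re-run all rules, even if outputs already exist)
#   -q    -> --quiet      (reduce log output)
#   -n    -> --dry-run    (only list what would run; nothing is executed)
preview_cmd="snakemake --use-conda --jobs 4 --forceall --quiet --dry-run"
echo "$preview_cmd"

# Run the preview only when snakemake is available on PATH
if command -v snakemake >/dev/null 2>&1; then
    $preview_cmd || echo "preview failed; check the snakemake output" >&2
else
    echo "snakemake not found; activate the conda environment first" >&2
fi
```

Dropping --quiet and --dry-run from this command gives the long-form equivalent of the actual run command, snakemake --use-conda -j 4 -F.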
Make sure that the appropriate conda environment is activated in order to use snakemake!
To run the workflow:
snakemake --use-conda -j 4 -F
See the ResMiCo-SM README for info on the output.
Download an example dataset of metagenome-assembled genomes (MAGs) from the Unified Human Gastrointestinal Genome (UHGG) collection, along with the associated metagenome read files (Illumina paired-end reads):
wget http://ftp.tue.mpg.de/ebio/projects/ResMiCo/UHGG_n9.tar.gz
tar -pzxvf UHGG_n9.tar.gz && rm -f UHGG_n9.tar.gz
Update the config.yaml file in the ResMiCo-SM base directory:
# Input table
## Table of genomes
genomes_file: UHGG_n9/genomes.tsv
# Output directory
output_dir: tests/output/UHGG_n9/
# Temporary output directory (/dev/shm/ for shared memory)
tmp_dir: /tmp/
[...]
You may want to change the tmp_dir: or output_dir: paths.
Only some of the parameters matter for generating feature tables from real data (versus simulating datasets); see the ResMiCo-SM README for more info.
To get a preview of the ResMiCo-SM run:
snakemake --use-conda -j 4 -Fqn
See snakemake -h for info on the parameters used (e.g., -Fqn).
Make sure that the appropriate conda environment is activated in order to use snakemake!
To run the workflow:
snakemake --use-conda -j 4 -F
See the ResMiCo-SM README for info on the output.
The feature tables (specifically the feature_files.tsv file) can be used for misassembly prediction via ResMiCo.
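To locate the table after a run, you can search the output directory by file name. This sketch assumes only that the file is called feature_files.tsv and lives somewhere under the output_dir: set in config.yaml; the exact subdirectory layout is described in the README and is not assumed here. The helper name find_feature_tables is hypothetical.

```shell
# find_feature_tables: hypothetical helper (not part of ResMiCo-SM).
# Lists every feature_files.tsv below the given output directory.
# Prints nothing if the directory does not exist or holds no such file.
find_feature_tables() {
    find "$1" -type f -name 'feature_files.tsv' 2>/dev/null | sort
}

# Output directory taken from the config.yaml excerpt in this section
find_feature_tables tests/output/UHGG_n9/
```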