
ResMiCo-SM tutorial

Nick Youngblut edited this page May 18, 2022 · 8 revisions


Description

ResMiCo-SM can be used to:

  • generate synthetic training and test datasets, as used for ResMiCo training/testing
  • create feature tables for real datasets, which can then be used for ResMiCo contig misassembly prediction

ResMiCo-SM uses snakemake for straightforward, large-scale dataset generation on high-performance computing (HPC) infrastructure.

For general info on running ResMiCo-SM, see the README.

Install

See the ResMiCo-SM README.

Creating a training dataset

Reference genomes

Download a set of 10 reference genomes:

wget http://ftp.tue.mpg.de/ebio/projects/ResMiCo/genomes_n10.tar.gz
tar -pzxvf genomes_n10.tar.gz && rm -f genomes_n10.tar.gz
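Before editing the config, a quick sanity check can confirm that the genomes table was extracted. The helper below, check_genomes_table, is hypothetical (it is not part of ResMiCo-SM). It assumes the table has exactly one header line and ends with a trailing newline, and it simply prints the number of data rows. It can be pasted into sh or bash.

```shell
# Hypothetical helper (not part of ResMiCo-SM): confirm that a genomes table
# exists and is non-empty, then print its number of data rows.
# Assumes exactly one header line and a trailing newline at end of file.
check_genomes_table() {
  tsv="$1"
  if [ ! -s "$tsv" ]; then
    echo "missing or empty: $tsv" >&2
    return 1
  fi
  # Strip padding that some wc implementations add
  total=$(wc -l < "$tsv" | tr -d ' ')
  # Data rows = all lines minus the single header line
  echo $((total - 1))
}
```

For example, check_genomes_table genomes_n10/genomes.tsv. If the table lists one genome per row, the count should match the number of downloaded genomes.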

Config

The config.yaml in the ResMiCo-SM base directory should already be configured for this example dataset:

# Input table
## Table of genomes
genomes_file: genomes_n10/genomes.tsv

# Output directory
output_dir: tests/output/n10/

# Temporary output directory (/dev/shm/ for shared memory)
tmp_dir: /tmp/

[...]

You may want to change the tmp_dir: and output_dir: paths (e.g., set tmp_dir: to /dev/shm/ to use RAM-backed shared memory).
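If you prefer to change these paths from the command line rather than in an editor, a minimal sketch is shown below. set_config_value is a hypothetical helper; it assumes the key is an unindented, top-level line (as tmp_dir: and output_dir: are in the snippet above) and that the new value is a plain path without the characters |, & or \, which sed would interpret specially.

```shell
# Hypothetical helper (not part of ResMiCo-SM): set a top-level key in a
# YAML file such as config.yaml, e.g. tmp_dir: or output_dir:.
# Only matches unindented "key:" lines; values must not contain | & or \.
set_config_value() {
  key="$1"; value="$2"; file="$3"
  if ! grep -q "^${key}:" "$file"; then
    echo "key not found: ${key}" >&2
    return 1
  fi
  tmp=$(mktemp) || return 1
  # Replace the whole line; writing back via cat keeps the file's permissions
  sed "s|^${key}:.*|${key}: ${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, set_config_value tmp_dir /dev/shm/ config.yaml. Note that any trailing comment on the edited line is replaced along with the old value.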

By default, the config.yaml is set to run many combinations of simulation parameters (see params:), such as:

  • community richness
  • community abundance distribution
  • read lengths (bp)
  • sequencing depths (no. paired-end reads)
  • metagenome assemblers

To speed up testing, you may want to reduce the number of parameter values. For example, change:

    reads:
      length:
        - 100
        - 150
      depth:
        - 1000000
        - 4000000

to the following:

    reads:
      length:
        - 150
      depth:
        - 1000000
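Trimming these lists helps because, assuming the workflow simulates each combination of parameter values separately, the amount of work grows with the product of the list lengths. The hypothetical helper below only enumerates the read length × depth pairs from the two snippets above: 2 × 2 = 4 pairs before the change, and 1 pair after it.

```shell
# Hypothetical illustration: enumerate all (read length, depth) pairs.
# Each extra value multiplies the number of pairs (and, assuming one
# simulation per combination, the amount of work).
list_read_combinations() {
  lengths="$1"   # space-separated read lengths (bp)
  depths="$2"    # space-separated depths (no. of paired-end reads)
  for len in $lengths; do
    for depth in $depths; do
      echo "length=${len} depth=${depth}"
    done
  done
}
```

For example, list_read_combinations "100 150" "1000000 4000000" prints four lines, while list_read_combinations 150 1000000 prints a single line. The other parameters (community richness, abundance distribution, assemblers) would multiply the total in the same way.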

Run

To preview the ResMiCo-SM run without executing any jobs:

snakemake --use-conda -j 4 -Fqn

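Here, -F (--forceall) forces all rules to run, -q (--quiet) reduces the output, and -n (--dry-run) only shows what would be done; -j 4 runs up to 4 jobs in parallel, and --use-conda runs each rule in the conda environment it defines. See snakemake -h for details on these and other parameters.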

Make sure that the appropriate conda environment is activated in order to use snakemake!
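One way to catch an inactive environment before starting a long run is to check that snakemake is on the PATH. The helper name require_cmd is hypothetical; it relies only on the standard POSIX command -v builtin.

```shell
# Hypothetical helper: fail early if a command (e.g. snakemake) is not on PATH,
# which usually means the conda environment has not been activated.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "error: '$1' not found on PATH; activate the ResMiCo-SM conda environment first" >&2
    return 1
  fi
}
```

For example: require_cmd snakemake && snakemake --use-conda -j 4 -Fqn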

To run the workflow:

snakemake --use-conda -j 4 -F
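The value 4 passed to -j is just an example. On a local workstation you might size -j from the number of available CPUs; the sketch below uses getconf _NPROCESSORS_ONLN (available on Linux and macOS) and falls back to 1. detect_cores is a hypothetical helper name; on shared machines, an explicit, smaller value is usually the safer choice.

```shell
# Hypothetical helper: print the number of online CPUs, falling back to 1
# if getconf fails or returns something that is not a positive integer.
detect_cores() {
  n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
  case "$n" in
    ''|*[!0-9]*|0) n=1 ;;
  esac
  echo "$n"
}
```

For example: snakemake --use-conda -j "$(detect_cores)" -F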

Output

See the ResMiCo-SM README for info on the output.

Generating feature tables for a real metagenome dataset

MAGs & associated reads

Download an example dataset of MAGs from the UHGG (Unified Human Gastrointestinal Genome collection), along with the associated metagenome read files (Illumina paired-end reads):

wget http://ftp.tue.mpg.de/ebio/projects/ResMiCo/UHGG_n9.tar.gz
tar -pzxvf UHGG_n9.tar.gz && rm -f UHGG_n9.tar.gz
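To see what the extracted genomes table contains before pointing the config at it, you can list its column names. The sketch assumes UHGG_n9/genomes.tsv is tab-delimited with a header line; show_columns is a hypothetical helper, and the columns ResMiCo-SM actually requires are described in the README.

```shell
# Hypothetical helper: print the header fields of a tab-delimited table,
# one per line and numbered, e.g. to inspect UHGG_n9/genomes.tsv.
show_columns() {
  head -n 1 "$1" | tr '\t' '\n' | awk '{ print NR ": " $0 }'
}
```

For example: show_columns UHGG_n9/genomes.tsv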

Config

Update the config.yaml file in the ResMiCo-SM base directory to point to the UHGG dataset:

# Input table
## Table of genomes
genomes_file: UHGG_n9/genomes.tsv

# Output directory
output_dir: tests/output/UHGG_n9/

# Temporary output directory (/dev/shm/ for shared memory)
tmp_dir: /tmp/

[...]

You may want to change the tmp_dir: or output_dir: paths.

Only some of the parameters are used when generating feature tables from real data (as opposed to simulating datasets); see the ResMiCo-SM README for details.
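After editing config.yaml, you can confirm that the edits took effect by reading the values back. get_config_value is a hypothetical helper that only handles unindented, top-level "key: value" lines without inline comments, such as genomes_file: and output_dir: in the snippet above.

```shell
# Hypothetical helper: print the value of a top-level "key: value" line in a
# YAML file (no nested keys, no inline comments), e.g. to confirm edits to
# genomes_file: or output_dir: in config.yaml.
get_config_value() {
  key="$1"; file="$2"
  sed -n "s/^${key}:[[:space:]]*//p" "$file"
}
```

With the config shown above, get_config_value genomes_file config.yaml should print UHGG_n9/genomes.tsv.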

Run

To get a preview of the ResMiCo-SM run:

snakemake --use-conda -j 4 -Fqn

See snakemake -h for info on the parameters used (e.g., -Fqn).

Make sure that the appropriate conda environment is activated in order to use snakemake!

To run the workflow:

snakemake --use-conda -j 4 -F

Output

See the ResMiCo-SM README for info on the output.

The resulting feature tables (specifically, the feature_files.tsv file) can be used as input for misassembly prediction with ResMiCo.
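Since the exact subdirectory layout of the output is described in the README rather than here, a simple way to locate the file is to search the output directory. find_feature_tables is a hypothetical helper; it assumes output_dir: is set as in the config above.

```shell
# Hypothetical helper: list every feature_files.tsv under an output directory
# (the exact subdirectory layout is described in the ResMiCo-SM README).
find_feature_tables() {
  find "$1" -type f -name 'feature_files.tsv' | sort
}
```

For example: find_feature_tables tests/output/UHGG_n9/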