Overview

supermaxiste edited this page Feb 19, 2020 · 9 revisions

Data requirements

ARPEGGIO is a Snakemake workflow for analyzing Whole Genome Bisulfite Sequencing (WGBS) data from allopolyploid species. To use this workflow, you will need the following:

  1. WGBS data (paired-end or single-end) from:
    a. An allopolyploid species and its two parental species OR
    b. An allopolyploid species in two different conditions/treatments OR
    c. A diploid species in two different conditions/treatments

Note: you will always need at least two samples per condition/species in order to obtain differentially methylated regions (DMRs).

  2. The assembled genomes from:
    For 1a and 1b: the two parental species of the allopolyploid
    For 1c: the diploid species

And that's it!
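
The replicate rule in the note above (at least two samples per condition/species) is easy to check before starting a run. The snippet below is only a minimal sketch and not part of ARPEGGIO: the function name, the sample names and the dictionary layout are all hypothetical, and the example data imitate case 1a with one parent missing a replicate.

```python
from collections import Counter

# From the note above: at least two samples per condition/species are needed for DMRs.
MIN_REPLICATES = 2


def underreplicated_groups(sample_groups, minimum=MIN_REPLICATES):
    """Return the groups (conditions or species) that have fewer than `minimum` samples."""
    counts = Counter(sample_groups.values())
    return sorted(group for group, n in counts.items() if n < minimum)


# Hypothetical sample sheet: sample name -> condition or species (case 1a).
samples = {
    "poly_1": "allopolyploid",
    "poly_2": "allopolyploid",
    "parentA_1": "parent_A",
    "parentA_2": "parent_A",
    "parentB_1": "parent_B",
}

print(underreplicated_groups(samples))  # ['parent_B']
```

A group with no samples at all never appears in the dictionary, so this check only catches groups that have exactly one sample.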

System requirements

ARPEGGIO was developed for Linux systems and was tested on Debian and Ubuntu. Windows and macOS are not supported.

Skill requirements

Command line basics (cd, mv, cp, vim or nano, etc.) and patience :)

ARPEGGIO overview and aim

ARPEGGIO includes 7 main steps to analyze WGBS data. You choose which of these steps to run:

  1. Conversion check
  2. Quality check
  3. Trimming
  4. Alignment
  5. Read sorting
  6. DMR analysis
  7. Downstream analyses
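
Since the steps above are optional, one way to picture a selection is as a set of on/off switches applied to the ordered list. The snippet below is an illustrative sketch only: the step identifiers and the `switches` dictionary are hypothetical and are not ARPEGGIO's actual configuration keys (those are described on the Configuration file page). It also does not model any dependencies between steps.

```python
# Step names follow the list above, in the order they are listed.
# These identifiers are hypothetical, not ARPEGGIO configuration keys.
STEPS = [
    "conversion_check",
    "quality_check",
    "trimming",
    "alignment",
    "read_sorting",
    "dmr_analysis",
    "downstream_analyses",
]


def selected_steps(switches):
    """Return the enabled steps in workflow order; steps missing from `switches` count as disabled."""
    unknown = set(switches) - set(STEPS)
    if unknown:
        raise ValueError(f"unknown step(s): {sorted(unknown)}")
    return [step for step in STEPS if switches.get(step, False)]


# Hypothetical selection: skip the checks, run trimming, alignment and DMR analysis.
switches = {
    "quality_check": False,
    "trimming": True,
    "alignment": True,
    "dmr_analysis": True,
}

print(selected_steps(switches))  # ['trimming', 'alignment', 'dmr_analysis']
```

Keeping the step order in one list means the selection can be written in any order and still comes out in the order the workflow lists its steps.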

Next section

  1. Input files

Wiki index

Basic steps to run & get an idea of the workflow:

  1. Overview
  2. Input files
  3. Configuration file
  4. Running Snakemake
  5. Output structure

Advanced information to better understand & modify the workflow:

  1. Snakefile (rules and relationships)
  2. Scripts
  3. Software and tools