
Preparing the input files

Charlotte Soneson edited this page Mar 6, 2019 · 10 revisions

Two types of input files are required for running the ARMOR workflow:

The compressed FASTQ files containing the sequencing reads

The Snakefile assumes that the FASTQ files are named according to the pattern:

  • <sample-name>.<fqsuffix>.gz for single-end reads
  • <sample-name>_<fqext1>.<fqsuffix>.gz and <sample-name>_<fqext2>.<fqsuffix>.gz for paired-end reads.
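The naming pattern above can be sketched as a small Python helper that builds the file names expected for one sample. This is only an illustration and not part of ARMOR: the function name expected_fastq_names and the example values "fastq", "R1" and "R2" are assumptions, and the real values of <fqsuffix>, <fqext1> and <fqext2> are whatever you set in config.yaml.

```python
def expected_fastq_names(sample, paired, fqsuffix, fqext1=None, fqext2=None):
    """Return the FASTQ file names expected for one sample (hypothetical helper)."""
    if not paired:
        # Single-end: <sample-name>.<fqsuffix>.gz
        return [f"{sample}.{fqsuffix}.gz"]
    if not fqext1 or not fqext2:
        raise ValueError("paired-end samples need both fqext1 and fqext2")
    # Paired-end: <sample-name>_<fqext1>.<fqsuffix>.gz and <sample-name>_<fqext2>.<fqsuffix>.gz
    return [
        f"{sample}_{fqext1}.{fqsuffix}.gz",
        f"{sample}_{fqext2}.{fqsuffix}.gz",
    ]


# Illustrative values only; substitute your own sample names and suffixes.
print(expected_fastq_names("sampleA", False, "fastq"))
print(expected_fastq_names("sampleB", True, "fastq", "R1", "R2"))
```

Comparing these names against the contents of the FASTQ directory before starting the workflow is a quick way to catch naming mismatches.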

Provide the path to the directory containing the FASTQ files, along with the values of <fqsuffix> (and <fqext1>/<fqext2> if you have paired-end reads), in the config.yaml file. See The config.yaml configuration file for more details.

A metadata text file containing all information about the samples

The metadata file should be a tab-separated text file, with at least two columns:

  • one named names, which contains every value of <sample-name> used in the FASTQ file names
  • one named type, which is either SE or PE, depending on whether the sample was obtained with a single-end or paired-end protocol.

Any number of additional columns can be included and used later in the analysis. In particular, all variables required for the differential expression analysis should be included as columns in the metadata text file. An example of a metadata text file can be seen here.
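As a rough sketch of the requirements above, the following Python snippet checks that a tab-separated metadata file has the names and type columns and that every type value is SE or PE. The helper check_metadata, the sample names, and the extra condition column are made up for illustration; ARMOR does not ship this function.

```python
import csv
import io

REQUIRED_COLUMNS = ("names", "type")
ALLOWED_TYPES = {"SE", "PE"}


def check_metadata(handle):
    """Validate a tab-separated metadata file and return its rows (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    rows = list(reader)
    # Data rows start on line 2, after the header line.
    for line_no, row in enumerate(rows, start=2):
        if row["type"] not in ALLOWED_TYPES:
            raise ValueError(f"line {line_no}: type must be SE or PE, got {row['type']!r}")
    return rows


# Made-up example content; 'condition' stands in for any extra analysis variable.
example = (
    "names\ttype\tcondition\n"
    "sampleA\tSE\tcontrol\n"
    "sampleB\tSE\ttreated\n"
)
rows = check_metadata(io.StringIO(example))
print([row["names"] for row in rows])
```

To check a real file, pass an open file handle instead of the in-memory string, for example open(path, newline="") with your own path.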