Team 3 Genome Assembly

This is a pipeline for assembling Listeria monocytogenes genomes.

Usage

python assembly_pipeline.py -i /home/projects/group-c/data -a SPADES

-a: --assembler. Choose the assembler to use: SPADES or SKESA. Default SPADES.
-i: --input_data_directory. Path to raw data.
--html: Also output fastp HTML reports.
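A minimal sketch of how these options could be declared with Python's argparse. The option names come from the list above. Making -i required and the help text are assumptions; they are not copied from assembly_pipeline.py.

```python
import argparse


def parse_args(argv=None):
    """Parse the options listed under Usage.

    Option names follow the README; requiring -i and the help strings
    are assumptions, not taken from assembly_pipeline.py.
    """
    parser = argparse.ArgumentParser(
        description="Assemble Listeria monocytogenes genomes from paired-end reads."
    )
    parser.add_argument("-i", "--input_data_directory", required=True,
                        help="path to the raw FASTQ data")
    parser.add_argument("-a", "--assembler", choices=["SPADES", "SKESA"],
                        default="SPADES", help="assembler to use (default: SPADES)")
    parser.add_argument("--html", action="store_true",
                        help="also write fastp HTML reports")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Equivalent to the Usage example above, passed as an explicit list
    args = parse_args(["-i", "/home/projects/group-c/data", "-a", "SPADES"])
    print(args.input_data_directory, args.assembler, args.html)
```

Restricting -a with `choices` makes argparse reject any value other than SPADES or SKESA before the pipeline starts.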

The main parameter of assembly_pipeline.py is the input data directory (-i), the path to the raw FASTQ data; -a and --html are optional. We assume that the input directory contains files matching the pattern *.f*, that the read 1 and read 2 files for each sample start with the same sample ID, and that read 1 files end with *1.f* and read 2 files end with *2.f*.
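The file-naming assumptions above can be turned into a small pairing step. In this sketch (the `find_read_pairs` name is hypothetical), files are collected with the *.f* glob. Everything before the final 1 or 2, minus an optional `_` or `.` separator, is treated as the sample ID. Only samples that have both mates are kept. The exact sample-ID rule is an assumption, not necessarily what the pipeline does.

```python
import glob
import os
import re

# sample ID, optional "_" or "." separator, mate digit, then an extension starting with ".f"
MATE_RE = re.compile(r"^(.*?)[_.]?([12])(\.f.*)$")


def find_read_pairs(data_dir):
    """Map each sample ID to its (read 1, read 2) paths (hypothetical helper)."""
    mates = {}
    for path in sorted(glob.glob(os.path.join(data_dir, "*.f*"))):
        match = MATE_RE.match(os.path.basename(path))
        if match is None:
            continue  # not named like *1.f* or *2.f*
        sample, mate = match.group(1), match.group(2)
        mates.setdefault(sample, {})[mate] = path
    # Drop samples that are missing one of the two mates
    return {s: (m["1"], m["2"]) for s, m in mates.items() if "1" in m and "2" in m}
```

For example, `S1_1.fastq.gz` and `S1_2.fastq.gz` are grouped under the sample ID `S1`.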

Quality Control and Trimming

Fastp (v0.20.0)
MultiQC (v 1.8)

Quality Control

Fastp combines quality control and trimming into a single, fast step, which improves the speed and usability of our pipeline. Since our pipeline had 50 input files, we used MultiQC to consolidate the 50 separate quality control reports generated by fastp into a single report.

The output of the quality control step is a PDF report, generated by MultiQC, that summarizes the quality of all samples run through the pipeline.
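A minimal sketch of the MultiQC step, assuming all per-sample fastp reports are written to one directory (the directory names and helper names are hypothetical). MultiQC scans the directory it is given for log files it recognises, including fastp's JSON reports. Its `--pdf` option requests a PDF version of the report and requires Pandoc to be installed.

```python
import shutil
import subprocess


def multiqc_command(fastp_report_dir, out_dir, pdf=True):
    """Build a MultiQC call that merges every report found in fastp_report_dir."""
    cmd = ["multiqc", fastp_report_dir, "-o", out_dir]
    if pdf:
        cmd.append("--pdf")  # PDF output needs Pandoc on PATH
    return cmd


def run_multiqc(fastp_report_dir, out_dir, pdf=True):
    """Run MultiQC, failing early with a clear message if it is not installed."""
    if shutil.which("multiqc") is None:
        raise RuntimeError("multiqc was not found on PATH")
    subprocess.run(multiqc_command(fastp_report_dir, out_dir, pdf), check=True)
```

Building the command separately from running it keeps the argument list easy to inspect and test without MultiQC installed.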

Trimming Parameters

The following arguments were supplied to fastp in order to trim our data: -f 5 -F 30 -t 10 -e 28 -c -5 3 -M 27.

-f 5 - globally trims 5 bases from 5' end of mate 1
-F 30 - globally trims 30 bases from 5' end of mate 2
-t 10 - globally trims 10 bases from 3' end of both mates
-e 28 - discards reads with an average quality score under 28
-c - enables base correction in the overlapping regions of paired reads, which slightly improved the quality of the 3' end of mate 2
-5 - turns on sliding window trimming from the 5' end (fastp's --cut_front); fastp sets the window size separately with -W, which defaults to 4
-M 27 - sets a mean quality threshold of 27 for the sliding window
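The trimming options could be combined into a per-sample fastp call as sketched below. The trimming string is copied verbatim from the command above. The output file names and the `fastp_command` helper are hypothetical. -i/-I, -o/-O, -j and -h are fastp's standard options for the input mates, trimmed output mates, JSON report and HTML report.

```python
import shlex

# Trimming options copied verbatim from the fastp command above
TRIM_ARGS = shlex.split("-f 5 -F 30 -t 10 -e 28 -c -5 3 -M 27")


def fastp_command(sample, read1, read2, out_dir, html=False):
    """Build a fastp command line for one paired-end sample (hypothetical helper)."""
    cmd = [
        "fastp",
        "-i", read1, "-I", read2,                        # raw mates
        "-o", f"{out_dir}/{sample}_1.trimmed.fastq.gz",  # trimmed mate 1
        "-O", f"{out_dir}/{sample}_2.trimmed.fastq.gz",  # trimmed mate 2
        "-j", f"{out_dir}/{sample}.fastp.json",          # per-sample JSON report
    ]
    if html:
        # Assumed mapping of the pipeline's --html option to a per-sample HTML report
        cmd += ["-h", f"{out_dir}/{sample}.fastp.html"]
    return cmd + TRIM_ARGS
```

Giving every sample its own JSON report name keeps the reports from overwriting each other, so MultiQC can find all of them in one directory.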

Genome Assembly

SPAdes (v3.13.0)

--careful - tries to reduce the number of mismatches and short indels

SKESA (v2.3.0)

Uses default settings.
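A sketch of how the -a choice could select the assembler command. The SPAdes options (-1, -2, -o, --careful) are standard SPAdes flags. The SKESA options (`--reads` with comma-separated mates, `--contigs_out`) reflect our understanding of SKESA's usage text and should be checked against `skesa --help` for v2.3.0. The helper name and output paths are hypothetical.

```python
def assembly_command(assembler, sample, read1, read2, out_dir):
    """Return the assembly command for one sample (hypothetical helper)."""
    if assembler == "SPADES":
        # SPAdes with --careful to reduce mismatches and short indels
        return ["spades.py", "--careful",
                "-1", read1, "-2", read2,
                "-o", f"{out_dir}/{sample}_spades"]
    if assembler == "SKESA":
        # SKESA with default settings; option names are assumptions (see lead-in)
        return ["skesa", "--reads", f"{read1},{read2}",
                "--contigs_out", f"{out_dir}/{sample}_skesa.fasta"]
    raise ValueError(f"unknown assembler: {assembler!r}")
```

Raising on an unknown name catches typos in -a, although argparse `choices` would normally reject them first.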

Plasmid Assembly

plasmidSPAdes (v3.13.0)

Generates a plasmid assembly by default, using the --plasmid flag of SPAdes.
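plasmidSPAdes uses the same `spades.py` entry point, switched into plasmid mode with `--plasmid`. This sketch builds one such call per sample, independent of the -a choice, because the plasmid assembly runs by default. The helper names and output directory layout are hypothetical.

```python
def plasmid_command(sample, read1, read2, out_dir):
    """plasmidSPAdes call for one sample (hypothetical helper)."""
    return ["spades.py", "--plasmid",
            "-1", read1, "-2", read2,
            "-o", f"{out_dir}/{sample}_plasmid"]


def plasmid_commands(read_pairs, out_dir):
    """One plasmidSPAdes command per sample, in sample-ID order.

    read_pairs maps sample ID -> (read 1 path, read 2 path).
    """
    return [plasmid_command(sample, r1, r2, out_dir)
            for sample, (r1, r2) in sorted(read_pairs.items())]
```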

Assembly Quality

Quast (v5.0.2)

Computes assembly quality metrics such as N50 for every sample with QUAST, and writes a combined TSV report for all samples as well as HTML files for easy viewing.
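QUAST computes N50 itself. This sketch only illustrates the definition: N50 is the length L such that contigs of length L or longer together contain at least half of the total assembly length. The function names are hypothetical. QUAST ignores contigs shorter than 500 bp by default, so filter lengths the same way before comparing numbers.

```python
def fasta_contig_lengths(path):
    """Return the length of every sequence in a FASTA file."""
    lengths, current, seen_header = [], 0, False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    lengths.append(current)  # close the previous record
                current, seen_header = 0, True
            else:
                current += len(line)  # sequence may span several lines
    if seen_header:
        lengths.append(current)  # close the last record
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no contigs
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8, because 10 + 9 + 8 = 27 is half of the 54 total bases.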

BUSCO (v4.0.2)

Assesses genome completeness with Benchmarking Universal Single-Copy Orthologs (BUSCOs), reporting for each sample the proportions of complete single-copy, complete duplicated, fragmented, and missing BUSCOs.
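BUSCO condenses these categories into a one-line summary of the form `C:…%[S:…%,D:…%],F:…%,M:…%,n:…` (complete, single-copy, duplicated, fragmented, missing, number of BUSCOs searched). A sketch of parsing that line so samples can be tabulated together. It assumes that line format; the function name is hypothetical.

```python
import re

SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Turn a BUSCO one-line summary into a dict of percentages plus n."""
    match = SUMMARY_RE.search(line)  # search() tolerates leading tabs or text
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    result = {key: float(match.group(key)) for key in "CSDFM"}
    result["n"] = int(match.group("n"))
    return result
```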

Authors

Deepali Kundnani

Aparna Maddala

Swetha Gowri Singu

Yiqiong Xiao

Ruize Yang
