FluPipe

1. Introduction

FluPipe provides a fully automated, flexible and reproducible workflow for reconstructing genome sequences from Illumina NGS data. The pipeline is optimized for Influenza data.

2. Setup

The most convenient way to install the pipeline is by using git and conda:

# installing the pipeline using git
cd designated/path
git clone https://github.com/rki-mf1/FluPipe.git
cd FluPipe
conda env create -f flupipe.yml -n FluPipe
conda activate FluPipe

3. Usage

At a minimum, the pipeline requires the following input:

a folder containing gz-compressed FASTQ files (-d)
an output folder (-o), in which a subfolder named results is created automatically to store all results
a reference sequence (--ref) or an influenza segment database containing representative FASTA files for each genome segment (--segmentdb)

# activate the conda environment once before using the pipeline
conda activate FluPipe

flupipe.py -d path/to/myInputFolder \
           -o path/to/myOutputFolder \
           --segmentdb path/to/segmentdb
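Before starting a run, it can help to confirm that the -d folder really contains gz-compressed FASTQ files. The Python sketch below is a hypothetical pre-flight check, not part of FluPipe: it assumes files are recognised by the suffixes .fastq.gz or .fq.gz (FluPipe's own detection rules may differ) and tests the two-byte gzip magic number to catch files that carry the suffix but are not actually compressed.

```python
from pathlib import Path

# Assumption: gz-compressed FASTQ files end in ".fastq.gz" or ".fq.gz";
# FluPipe's own file-detection rules may differ.
FASTQ_GZ_SUFFIXES = (".fastq.gz", ".fq.gz")
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_fastq_gz(name):
    """True if the file name carries a gz-compressed FASTQ suffix (case-insensitive)."""
    return name.lower().endswith(FASTQ_GZ_SUFFIXES)


def looks_gzipped(first_bytes):
    """True if the leading bytes match the gzip magic number."""
    return first_bytes[:2] == GZIP_MAGIC


def check_input_folder(folder):
    """Return (fastq_gz_files, not_gzipped) for the files directly inside `folder`."""
    found, bad = [], []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and is_fastq_gz(path.name):
            found.append(path.name)
            with path.open("rb") as fh:
                if not looks_gzipped(fh.read(2)):
                    bad.append(path.name)
    return found, bad
```

For example, `found, bad = check_input_folder("path/to/myInputFolder")`: an empty `found` or a non-empty `bad` suggests the input folder needs attention before the pipeline is started.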

4. Options to customize the workflow

The help page lists all available options:

flupipe.py --help

4.1 Adjusting Read Filtering

4.1.1 Read Length

By default, the minimum read length is set to 50 bases; shorter reads are discarded.

(-l 50)
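The effect of the length filter can be sketched in a few lines of Python. This is an illustration only: it assumes standard 4-line FASTQ records and keeps reads of at least `min_len` bases, mirroring the -l 50 default. The helper names are hypothetical, and in fastp the length check is applied after trimming, which this sketch does not model.

```python
def iter_fastq(lines):
    """Yield (header, sequence, quality) from 4-line FASTQ records.

    An incomplete trailing record is silently ignored by zip().
    """
    it = iter(lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def filter_by_length(records, min_len=50):
    """Keep reads with at least `min_len` bases; a read of exactly min_len is kept."""
    return [rec for rec in records if len(rec[1]) >= min_len]
```

To read a compressed input file, the lines can come from `gzip.open(path, "rt")`, for example `filter_by_length(iter_fastq(fh), min_len=50)`.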

4.1.2 Read Quality

Read quality filtering is performed by fastp. By default, the --read_filter_qual option sets the quality cutoff to a Phred score of 20: bases below this score are treated as unqualified when fastp decides whether to keep a read.
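fastp's quality filter counts a base as qualified if its Phred score reaches a cutoff (--qualified_quality_phred) and discards a read if more than a given percentage of its bases are unqualified (--unqualified_percent_limit, default 40%). The Python sketch below reproduces that rule for a single read, assuming Phred+33 encoding and assuming --read_filter_qual is passed to fastp as the qualified-quality cutoff; fastp's other filters (N-base limit, average quality) are not modelled.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(c) - offset for c in qual]


def passes_quality_filter(qual, qualified_phred=20, unqualified_percent_limit=40):
    """fastp-style check: fail if more than the allowed percentage of bases is unqualified."""
    scores = phred_scores(qual)
    if not scores:
        return False  # an empty read carries no usable bases
    unqualified = sum(1 for q in scores if q < qualified_phred)
    # Integer arithmetic avoids rounding issues: unqualified / len <= limit / 100
    return unqualified * 100 <= unqualified_percent_limit * len(scores)
```

With the default limit, a 5-base read may contain at most two bases below Phred 20.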

4.2 Taxonomic Read Filtering

If necessary, reads not derived from the Orthomyxoviridae family can be excluded. Reads are classified by their k-mer content using a user-defined Kraken2 database (--kraken). A database containing Influenza A, Influenza B and human genome sequences is recommended.
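Kraken2's standard per-read output is tab-separated: classification flag (C or U), read ID, assigned taxid, read length and the k-mer LCA mapping. A taxonomic filter then keeps reads whose taxid lies below Orthomyxoviridae in the taxonomy. The Python sketch below illustrates this under stated assumptions: the output was written without --use-names (so the taxid column is a plain integer), and the parent map is a simplified toy fragment, not the real NCBI tree, which contains intermediate genus nodes. How FluPipe implements this step internally is not described here.

```python
# Toy child -> parent map using NCBI taxids (Orthomyxoviridae 11308,
# Influenza A virus 11320, Influenza B virus 11520, Homo sapiens 9606).
# Simplified: intermediate ranks such as genera are omitted.
TOY_PARENTS = {11320: 11308, 11520: 11308, 11308: 1, 9606: 1}
ORTHOMYXOVIRIDAE = 11308


def is_descendant(taxid, ancestor, parents):
    """Walk up the parent map until `ancestor` is found or the root is passed."""
    while taxid is not None:
        if taxid == ancestor:
            return True
        parent = parents.get(taxid)
        if parent == taxid:  # in NCBI nodes.dmp the root is its own parent
            return False
        taxid = parent
    return False


def keep_read_ids(kraken_lines, ancestor, parents):
    """Return IDs of classified reads assigned to `ancestor` or one of its descendants."""
    keep = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        status, read_id, taxid = fields[0], fields[1], int(fields[2])
        if status == "C" and is_descendant(taxid, ancestor, parents):
            keep.add(read_id)
    return keep
```

The returned read IDs could then be used to subset the FASTQ files before mapping.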

4.3 Find a reference for each segment

For each segment, a multi-FASTA file with any number of reference sequences can be provided (for example in the segment database passed via --segmentdb). The pipeline maps the sequencing reads to the given references and selects the best-matching reference sequence per segment based on read coverage, read depth and uniformity of mapping.
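The exact scoring FluPipe uses to pick a reference is not documented in this section. The Python sketch below shows one hypothetical way to combine the three criteria named above from per-position depth profiles: candidates are ranked first by breadth of coverage, then by uniformity (a lower coefficient of variation of depth), then by mean depth. The ranking order and the `min_depth` threshold are assumptions for illustration only.

```python
from statistics import mean, pstdev


def summarize(depths, min_depth=1):
    """Return (breadth, mean_depth, cv) for a per-position depth profile."""
    if not depths:
        return 0.0, 0.0, float("inf")
    breadth = sum(1 for d in depths if d >= min_depth) / len(depths)
    mean_depth = mean(depths)
    # Coefficient of variation: lower means more uniform coverage
    cv = pstdev(depths) / mean_depth if mean_depth > 0 else float("inf")
    return breadth, mean_depth, cv


def best_reference(candidates, min_depth=1):
    """Pick the candidate name with the best (breadth, uniformity, depth) ranking.

    `candidates` maps reference name -> list of per-position read depths.
    """
    def rank(name):
        breadth, mean_depth, cv = summarize(candidates[name], min_depth)
        return (breadth, -cv, mean_depth)

    return max(candidates, key=rank)
```

Ranking lexicographically avoids having to choose weights for a composite score, at the cost of letting small breadth differences dominate.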

4.4 Adapting variant calling

Variant calling can be restricted to sites that meet the following thresholds at the respective position:

the minimum sequencing depth (--vvar_mincov; default: 20)
the minimum number of reads supporting a variant (--var_call_count; default: 10)
the minimum fraction of reads supporting a variant (--var_call_frac; default: 0.1)

CHECK PARAMETERS
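Taken together, the three thresholds act as a per-site filter. The Python sketch below applies them to a single candidate variant using the defaults listed above; treating each threshold as inclusive (>=) is an assumption, and the function name is hypothetical.

```python
def passes_variant_filters(depth, alt_count, min_cov=20, min_count=10, min_frac=0.1):
    """Return True if a candidate variant meets all three per-site thresholds.

    depth     -- total reads covering the site (cf. --vvar_mincov)
    alt_count -- reads supporting the variant (cf. --var_call_count)
    min_frac  -- minimum alt_count / depth (cf. --var_call_frac)
    """
    if depth == 0 or depth < min_cov:
        return False
    return alt_count >= min_count and alt_count / depth >= min_frac
```

For example, a site with depth 200 and 15 supporting reads passes the count threshold but fails the fraction threshold (0.075 < 0.1).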

4.5 Consensus generation

When generating the consensus sequence, all positions with read coverage below a defined threshold can be hard-masked with N (--cns_min_cov; default: 20). In addition, genotypes can be adjusted, meaning that variants supported by at least a given fraction of all reads covering the respective site are called explicitly (--cns_gt_adjust; default: 0.9). For example, a variant with a read fraction of 0.94 would be set to the full alternate allele, while a variant with a read fraction of only 0.03 would be changed to the reference.
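The masking and genotype-adjustment rules can be sketched for single-nucleotide variants as follows. This Python example is a simplification under stated assumptions: sites are biallelic, so the reference fraction is taken as 1 minus the variant fraction; a variant whose fraction is neither high enough for the alternate allele nor low enough for the reference is left as the reference base here, whereas a real pipeline may emit an IUPAC ambiguity code instead. All names are hypothetical.

```python
def adjust_genotype(alt_frac, gt_adjust=0.9):
    """Classify a site as 'alt', 'ref' or 'mixed' (biallelic assumption)."""
    if alt_frac >= gt_adjust:
        return "alt"
    if 1 - alt_frac >= gt_adjust:  # reference allele dominates
        return "ref"
    return "mixed"


def build_consensus(ref_seq, depths, variants, min_cov=20, gt_adjust=0.9):
    """Build a consensus from a reference, per-position depths and SNVs.

    variants maps a 0-based position to (alt_base, alt_frac).
    """
    out = []
    for pos, ref_base in enumerate(ref_seq):
        if depths[pos] < min_cov:
            out.append("N")  # hard-mask low-coverage sites (cf. --cns_min_cov)
            continue
        base = ref_base
        if pos in variants:
            alt_base, alt_frac = variants[pos]
            if adjust_genotype(alt_frac, gt_adjust) == "alt":
                base = alt_base
            # 'ref' and (in this simplification) 'mixed' keep the reference base
        out.append(base)
    return "".join(out)
```

Using the read fractions from the example above, `build_consensus("ACGTA", [30, 30, 5, 30, 30], {1: ("T", 0.94), 3: ("C", 0.03)})` sets position 1 to T, masks position 2 with N and keeps the reference base at position 3.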
