The script assembly_pipeline.py is a computational pipeline for de novo assembly of bacterial genomes from Illumina paired-end reads. It is based on the assembler SPAdes and on the improve_assembly pipeline, which improves the SPAdes assembly by scaffolding and gap filling.
The easiest and recommended way to install and run assembly_pipeline.py is via its Docker implementation.
The Docker image is available on: https://hub.docker.com/r/francesccoll/assembly_pipeline/
assembly_pipeline.py is a Python script that will work provided that all the required dependencies below (both Python modules and software) are installed on your local machine.
- fastqcheck version >= 1.1
- spades.py version >= v3.15.3
- improve_assembly
- quast.py version >= v5.1.0rc1
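Before running the script locally, it can help to confirm that the tools listed above are reachable. The following is a minimal sketch, not part of assembly_pipeline.py: it assumes each dependency is installed under exactly the executable name shown in the list, and it only checks presence on PATH, not the minimum versions. The function name find_missing_tools is hypothetical.

```python
import shutil

# Executable names copied from the dependency list above. Whether your
# installation exposes them under exactly these names is an assumption.
REQUIRED_TOOLS = ["fastqcheck", "spades.py", "improve_assembly", "quast.py"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

Version requirements (for example spades.py >= v3.15.3) still need to be checked by hand, since each tool reports its version differently.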
usage: assembly_pipeline.py [-h] -1 FASTQ1_FILE -2 FASTQ2_FILE -i SAMPLE_ID -r
RESULTS_DIR [-d DELETE_TMP] [--version]
[-t THREADS] [-s SPADES_DIR] [-m IMPROVED_DIR]
Pipeline for bacterial de novo assembly using Spades and improve_assembly from
paired Illumina data
optional arguments:
-h, --help show this help message and exit
required arguments:
-1 FASTQ1_FILE, --forward_reads FASTQ1_FILE
fastq file with forward reads
-2 FASTQ2_FILE, --reverse_reads FASTQ2_FILE
fastq file with reverse reads
-i SAMPLE_ID, --sample_id SAMPLE_ID
sample id used as prefix to name output files
-r RESULTS_DIR, --results_dir RESULTS_DIR
directory to store pipeline's final assembly
optional arguments:
-d DELETE_TMP, --delete_tmp DELETE_TMP
delete assembly files (except for contigs.fa)
--version show program's version number and exit
spades arguments (optional):
-t THREADS, --spades_threads THREADS
number of threads used by Spades
-s SPADES_DIR, --spades_dir SPADES_DIR
directory to store Spades resulting files
-m IMPROVED_DIR, --improved_dir IMPROVED_DIR
directory to store improve_assembly resulting files
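When several samples need assembling, the options listed above can be filled in programmatically. The sketch below is an illustration under stated assumptions: it assumes reads are named <sample_id>_1.fastq.gz and <sample_id>_2.fastq.gz (the convention used in the Docker example in this README), and the helper names pair_fastq_files and build_command are hypothetical. It only builds the command lines and prints them.

```python
import shlex
from pathlib import Path

FORWARD_SUFFIX = "_1.fastq.gz"  # assumed naming convention for forward reads
REVERSE_SUFFIX = "_2.fastq.gz"  # assumed naming convention for reverse reads


def pair_fastq_files(fastq_dir):
    """Map each sample ID to its (forward, reverse) read files.

    Samples whose reverse file is missing are skipped.
    """
    pairs = {}
    for forward in sorted(Path(fastq_dir).glob("*" + FORWARD_SUFFIX)):
        sample_id = forward.name[: -len(FORWARD_SUFFIX)]
        reverse = forward.with_name(sample_id + REVERSE_SUFFIX)
        if reverse.exists():
            pairs[sample_id] = (forward, reverse)
    return pairs


def build_command(sample_id, forward, reverse, results_dir, threads=None):
    """Build the argument list for one assembly_pipeline.py run."""
    cmd = [
        "assembly_pipeline.py",
        "--forward_reads", str(forward),
        "--reverse_reads", str(reverse),
        "--sample_id", sample_id,
        "--results_dir", str(results_dir),
    ]
    if threads is not None:
        cmd += ["--spades_threads", str(threads)]
    return cmd


if __name__ == "__main__":
    # Print one command per complete read pair found in the current directory.
    for sample_id, (fwd, rev) in pair_fastq_files(".").items():
        print(shlex.join(build_command(sample_id, fwd, rev, Path("results") / sample_id, threads=8)))
```

Each returned list can be passed to subprocess.run(cmd, check=True) once the printed commands look right.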
docker run --volume=/path/to/fastq/files/:/data francesccoll/assembly_pipeline:amd64 assembly_pipeline.py --forward_reads /data/sampleId_1.fastq.gz --reverse_reads /data/sampleId_2.fastq.gz --sample_id sampleId --spades_threads 8 --results_dir /data/sampleId/
NOTE: --results_dir must be specified when running the Docker image so that the output assembly files are saved locally.
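The Docker command above mixes a host path (left of the colon in --volume) with container paths under /data. The following sketch shows one way to assemble that command so the two stay consistent. It is an illustration only: the function name build_docker_command is hypothetical, and it assumes the reads are named <sample_id>_1.fastq.gz and <sample_id>_2.fastq.gz and sit directly in the mounted host directory, as in the example.

```python
import shlex
from pathlib import PurePosixPath

# Image name and tag taken from the example command above.
IMAGE = "francesccoll/assembly_pipeline:amd64"


def build_docker_command(host_dir, sample_id, threads=8, mount_point="/data"):
    """Build a docker run argument list mirroring the example command.

    host_dir is the directory on the host holding the FASTQ files; every
    path passed to assembly_pipeline.py is expressed relative to the
    container mount point instead.
    """
    mount = PurePosixPath(mount_point)
    return [
        "docker", "run",
        f"--volume={host_dir}:{mount}",
        IMAGE,
        "assembly_pipeline.py",
        "--forward_reads", str(mount / f"{sample_id}_1.fastq.gz"),
        "--reverse_reads", str(mount / f"{sample_id}_2.fastq.gz"),
        "--sample_id", sample_id,
        "--spades_threads", str(threads),
        # Placing the results inside the mounted volume means they remain
        # on the host after the container exits.
        "--results_dir", str(mount / sample_id),
    ]


if __name__ == "__main__":
    print(shlex.join(build_docker_command("/path/to/fastq/files/", "sampleId")))
```

The list form can be handed to subprocess.run(cmd, check=True) without going through a shell, which avoids quoting problems with unusual sample IDs or paths.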
assembly_pipeline.py is free software, licensed under the GNU General Public License v3.0.
Use the issues page to report installation and usage problems.
Not available yet