Decouphage: the art of decorating a Phage genome by gluing feature cutouts into it.

As the name suggests, decouphage is a tool designed to annotate phage genomes. Its only external dependency is ncbi-blast+; everything else is optional.

Relevant branches

main branch: stable version, available on PyPI and Docker Hub.
dev branch: development branch with new features and bugs.

Highlights

Easy to install on Linux or Mac computers. The only requirement is ncbi-blast+.
Uses phanotate for ORF calling by default; prodigal can be used instead.
Fast: on a MacBook, most phage genomes can be annotated in less than a minute.
Uses the NCBI NR database, which contains non-identical sequences from GenBank CDS translations, PDB, Swiss-Prot, PIR, and PRF.
Allows manual curation through the web interface.

Validation

Decouphage was validated against RAST (Rapid Annotation using Subsystem Technology), a tool that is often praised for its prokaryotic annotation capabilities.

Decouphage outperforms RAST when calling some of the most relevant product categories.

The CDS annotation agreement between Decouphage and RAST is high, reaching up to 94% for some products:

Enzyme	Agreement rate with RAST
endonuclease	94%
exonuclease	58%
helicase	70%
hydrolase	73%
kinase	86%
ligase	94%
methyltransferase	65%
polymerase	76%
primase	78%
protease	85%
recombinase	28%
reductase	90%
synthase	84%
terminase	94%
transferase	60%

A precise comparison of product-to-position is difficult given differences in spelling, typos, synonyms, and interchangeable names, but the table above can give a good idea of the similarities.
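One way to read the table: for each enzyme keyword, take the CDSs whose RAST product mentions the keyword and count how many of them also mention it in the Decouphage product. The sketch below is an assumption about how such a rate could be computed, not necessarily the procedure used to build the table. The function name `agreement_rate` and the example products are hypothetical.

```python
def agreement_rate(rast_products, decouphage_products, keyword):
    """Share of CDSs labelled `keyword` by RAST that Decouphage labels the same way.

    Both lists must be aligned by CDS (same index = same coordinates).
    Matching is a case-insensitive substring test, so "DNA helicase" and
    "putative helicase" both count as "helicase".
    """
    keyword = keyword.lower()
    rast_hits = 0
    shared_hits = 0
    for rast, deco in zip(rast_products, decouphage_products):
        if keyword in rast.lower():
            rast_hits += 1
            if keyword in deco.lower():
                shared_hits += 1
    # None signals that RAST never used the keyword, so no rate is defined.
    return shared_hits / rast_hits if rast_hits else None


# Hypothetical products for four CDSs, in genome order.
rast = ["DNA helicase", "Phage terminase, large subunit", "hypothetical protein", "Helicase"]
deco = ["putative helicase", "terminase large subunit", "DNA ligase", "hypothetical protein"]

for word in ("helicase", "terminase", "ligase"):
    print(word, agreement_rate(rast, deco, word))
```

Substring matching sidesteps some spelling and synonym differences, but it cannot tell that two differently named products are the same enzyme, which is why the rates are best read as approximate.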

To corroborate the surplus of annotations that decouphage achieves, the number of "hypothetical protein" and "phage protein" products was also counted:

Product	Decouphage	RAST	Agreement rate with RAST
hypothetical protein	3945	6302	53%
phage protein	0	1626	N/A¹
Total products	9692	9692	N/A²

¹ Decouphage does not include products containing "phage protein", as they are usually a source of noise.
² The GenBank file generated by RAST was used as input for decouphage to ensure the same number of CDSs in both annotations.

This table shows that Decouphage potentially assigns 2x more meaningful products than RAST when annotating a phage genome.
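The arithmetic behind this estimate can be reproduced from the counts in the table. The snippet below is a sketch that treats every product other than "hypothetical protein" as meaningful; whether "phage protein" should also be treated as uninformative is an assumption, so both readings are computed.

```python
# Counts copied from the comparison table above.
TOTAL_PRODUCTS = 9692
HYPOTHETICAL = {"Decouphage": 3945, "RAST": 6302}
PHAGE_PROTEIN = {"Decouphage": 0, "RAST": 1626}


def informative(tool, phage_protein_is_noise):
    """Products that are not hypothetical and, optionally, not 'phage protein'."""
    count = TOTAL_PRODUCTS - HYPOTHETICAL[tool]
    if phage_protein_is_noise:
        count -= PHAGE_PROTEIN[tool]
    return count


for noise in (False, True):
    deco = informative("Decouphage", noise)
    rast = informative("RAST", noise)
    print(f"phage protein as noise={noise}: {deco} vs {rast}, ratio {deco / rast:.2f}")
```

With these counts the ratio is about 1.7 when only "hypothetical protein" is excluded and about 3.3 when "phage protein" is excluded as well, so the 2x figure sits between the two readings.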

How can I use decouphage

Options

Usage: decouphage [OPTIONS] INPUT_FILE

Options:
  --prodigal             Use prodigal for orf calling instead of phanotate.
  -d, --db PATH
  -o, --output TEXT
  -t, --threads INTEGER  [default: 1]
  --tmpdir TEXT          Folder for intermediate files.
  --no_orf_calling       Annotate CDS from genbank file.
  --locus_tag TEXT       Locus tag prefix.
  --download_db          Download default database.
  -v, --verbose          More verbose logging for debugging purpose.
  --help                 Show this message and exit.

I want to discover and annotate a lot of ORFs

decouphage genome.fasta -o genome.gb

I want to use prodigal to find my genes

decouphage genome.fasta -o genome.gb --prodigal

I have a genbank with poor annotation and want more

In this mode, decouphage reuses the CDSs already present in the GenBank file and only runs the annotation step.

decouphage genome.gbk -o genome.gb --no_orf_calling
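When many genomes need annotating, the three invocations above can be scripted. The following is a minimal sketch that only uses flags from the Options listing, spelled as they appear there. The helper names `build_command` and `annotate_folder`, the `genomes` folder, and the `.fasta` extension are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(input_file, output_file, threads=1, prodigal=False, reuse_orfs=False):
    """Assemble a decouphage command line from the documented options."""
    cmd = ["decouphage", str(input_file), "-o", str(output_file), "-t", str(threads)]
    if prodigal:
        cmd.append("--prodigal")        # ORF calling with prodigal instead of phanotate
    if reuse_orfs:
        cmd.append("--no_orf_calling")  # keep the CDSs already in a GenBank input
    return cmd


def annotate_folder(folder, threads=1):
    """Annotate every *.fasta file in `folder`, writing <name>.gb next to it."""
    for fasta in sorted(Path(folder).glob("*.fasta")):
        output = fasta.with_suffix(".gb")
        # check=True stops the batch at the first genome that fails.
        subprocess.run(build_command(fasta, output, threads=threads), check=True)


# Show the command that would be executed for one genome.
print(" ".join(build_command("genome.fasta", "genome.gb", threads=4, prodigal=True)))
```

Calling `annotate_folder("genomes", threads=4)` would then run decouphage once per file; it requires decouphage to be installed and on the PATH.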

Installation

You have multiple options to install and run decouphage:

Ubuntu

Install decouphage:

pip install decouphage

(Required) Install ncbi-blast+:

apt install ncbi-blast+

(Optional) Install dependencies:

apt install prodigal trnascan-se
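After installing, a quick check that the external tools are reachable can save a failed run. This is a sketch, not part of decouphage: the executable names are assumptions (`blastp` and `makeblastdb` ship with ncbi-blast+, and `prodigal` and `tRNAscan-SE` are the usual names for the optional tools), and the helper `missing_tools` is hypothetical.

```python
import shutil

# Assumed executable names; adjust them if your distribution installs others.
REQUIRED = ["blastp", "makeblastdb"]   # provided by ncbi-blast+
OPTIONAL = ["prodigal", "tRNAscan-SE"]


def missing_tools(names, which=shutil.which):
    """Return the executables from `names` that are not found on the PATH."""
    return [name for name in names if which(name) is None]


missing_required = missing_tools(REQUIRED)
missing_optional = missing_tools(OPTIONAL)

if missing_required:
    print("Missing required tools:", ", ".join(missing_required))
else:
    print("All required tools found.")
if missing_optional:
    print("Optional tools not found:", ", ".join(missing_optional))
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` searches the PATH.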

Docker

Run with Docker (the image already includes dependencies and databases):

docker run decouphage/decouphage

Databases

The Decouphage database is derived from the NCBI NR database, clustered at 90% identity and 90% sequence length.

Downloading database

Download the database to the default location, $HOME/.decouphage/db/:

decouphage --download_db

Making custom databases

Build a BLAST protein database from a FASTA file:

makeblastdb -in database.fa -parse_seqids -blastdb_version 5 -dbtype prot
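The custom database can then be passed to decouphage with the `-d` option. The sketch below prints the two commands in order. It assumes that `-d` accepts the database prefix, which equals the FASTA path when `makeblastdb` is run without `-out`; that detail is not stated here, so verify it against your installation. The helper names are hypothetical.

```python
import shlex


def makeblastdb_command(fasta):
    """The makeblastdb call shown above, as an argument list."""
    return ["makeblastdb", "-in", str(fasta), "-parse_seqids",
            "-blastdb_version", "5", "-dbtype", "prot"]


def decouphage_command(genome, output, db):
    """Annotate `genome` against a custom database.

    Assumption: `-d` takes the BLAST database prefix, i.e. the FASTA path
    when makeblastdb was run without `-out`.
    """
    return ["decouphage", str(genome), "-d", str(db), "-o", str(output)]


# Print the commands in the order they would be run.
for cmd in (makeblastdb_command("database.fa"),
            decouphage_command("genome.fasta", "genome.gb", "database.fa")):
    print(shlex.join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` to execute it instead of printing it.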
