Converts GFF3 files from Prokka into a format suitable for submission to EMBL.
Submitting annotated genomes to EMBL is a difficult and time-consuming process. This software converts GFF3 files from Prokka, the most commonly used prokaryote annotation tool, into a format that is suitable for submission to EMBL. It has been used to prepare more than 30% of all annotated genomes in EMBL/GenBank.
N.B. This implements some EMBL-specific conventions and is not a generic conversion tool. It is also not a validator, so you need to pass in parameters that are acceptable to EMBL.
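Because the tool does not validate its inputs, a small pre-check before calling it can catch obvious mistakes. The following is a minimal sketch, not part of GFF3toEMBL: `check_parameters` is a hypothetical helper, and the two checks (a numeric taxon id, and a genome type of `linear` or `circular` as listed in the usage text) are assumptions about what is sensible, not EMBL's actual acceptance rules.

```python
import re


def check_parameters(taxonid, genome_type=None):
    """Return a list of human-readable problems (empty if none were found).

    Hypothetical helper: the checks below are assumptions, not EMBL's rules.
    """
    problems = []
    # Assumption: a taxon id is a plain positive integer.
    if not re.fullmatch(r"[0-9]+", str(taxonid)):
        problems.append(f"taxonid {taxonid!r} is not a plain integer")
    # The usage text lists linear/circular as the genome types.
    if genome_type is not None and genome_type not in ("linear", "circular"):
        problems.append(f"genome_type {genome_type!r} is not linear or circular")
    return problems


print(check_parameters(1234, "circular"))
print(check_parameters("abc", "round"))
```

Passing this check does not mean a submission will be accepted; it only filters out values that are clearly malformed.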
GFF3toEMBL has the following dependencies:
There are a number of ways to install GFF3toEMBL, and details are provided below. If you encounter an issue when installing GFF3toEMBL, please contact your local system administrator. If you encounter a bug, please log it on the issues page or email us at path-help@sanger.ac.uk.
A Docker container is provided with all of the dependencies set up and installed. To install the container:
docker pull sangerpathogens/gff3toembl
To run the script from within the container on test data (replacing /home/ubuntu/data with your own directory):
docker run --rm -it -v /home/ubuntu/data:/data sangerpathogens/gff3toembl gff3_to_embl --output_filename /data/output_file.embl ABC 123 PRJ1234 ABC /opt/gff3toembl-1.1.0/gff3toembl/tests/data/single_feature.gff
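To run the container on your own GFF3 file rather than the bundled test data, the file has to live in the directory that is mounted at /data. The sketch below builds the equivalent `docker run` argument list in Python. `docker_command` and its parameter names are hypothetical; only the image name, the `-v` mount, and the `gff3_to_embl` arguments come from the command above. The `-it` flags are left out on the assumption that the command is run non-interactively.

```python
import os
import subprocess

IMAGE = "sangerpathogens/gff3toembl"  # image name from the pull command


def docker_command(data_dir, gff_name, organism, taxonid, project, description,
                   output_name="output_file.embl"):
    """Build a `docker run` argument list for one GFF3 file inside data_dir."""
    host_dir = os.path.abspath(data_dir)  # docker needs an absolute host path
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/data",            # mount the host directory at /data
        IMAGE,
        "gff3_to_embl",
        "--output_filename", f"/data/{output_name}",
        organism, str(taxonid), project, description,
        f"/data/{gff_name}",                  # input path as seen in the container
    ]


def run(cmd):
    """Run the command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)


cmd = docker_command("/home/ubuntu/data", "my_genome.gff",
                     "ABC", 123, "PRJ1234", "ABC")
print(cmd)
# run(cmd)  # uncomment to actually start the container
```

The output file is written into the mounted directory, so it appears on the host at /home/ubuntu/data/output_file.embl in this example.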
This is for advanced users. The Homebrew recipe, the Dockerfile and the TravisCI dependency install script all contain steps to set up the dependencies and install the software, so they may be worth looking at for hints.
- Install genometools including python bindings
- git clone git@github.com:sanger-pathogens/gff3toembl.git
- python setup.py install
Run python setup.py test
usage: gff3_to_embl [-h] [--authors AUTHORS] [--title TITLE]
[--publication PUBLICATION] [--genome_type GENOME_TYPE]
[--classification CLASSIFICATION]
[--output_filename OUTPUT_FILENAME]
[--locus_tag LOCUS_TAG]
[--translation_table TRANSLATION_TABLE]
[--chromosome_list CHROMOSOME_LIST] [--version]
organism taxonid project_accession description file
Converts prokaryote GFF3 annotations to EMBL for ENA submission. Cite
http://dx.doi.org/10.21105/joss.00080
positional arguments:
organism Organism
taxonid Taxon id
project_accession Accession number for the project
description Genus species subspecies strain of organism
file GFF3 filename
optional arguments:
-h, --help show this help message and exit
--authors AUTHORS, -i AUTHORS
Authors (in the EMBL RA line style)
--title TITLE, -m TITLE
Title of paper (in the EMBL RT line style)
--publication PUBLICATION, -p PUBLICATION
Publication or journal name (in the EMBL RL line
style)
--genome_type GENOME_TYPE, -g GENOME_TYPE
Genome type (linear/circular)
--classification CLASSIFICATION, -c CLASSIFICATION
Classification (PROK/UNC/..)
--output_filename OUTPUT_FILENAME, -f OUTPUT_FILENAME
Output filename
--locus_tag LOCUS_TAG, -l LOCUS_TAG
Overwrite the locus tag in the annotation file
--translation_table TRANSLATION_TABLE, -n TRANSLATION_TABLE
Translation table
--chromosome_list CHROMOSOME_LIST, -d CHROMOSOME_LIST
Create a chromosome list file, and use the supplied
name
--version show program's version number and exit
An example:
gff3_to_embl --authors 'John' --title 'Some title' --publication 'Some journal' \
--genome_type 'circular' --classification 'PROK' \
--output_filename /tmp/single_feature.embl --translation_table 11 \
Organism 1234 'My project' 'My description' gff3toembl/tests/data/single_feature.gff
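When converting several genomes, it can help to assemble the command line programmatically. The sketch below is one way to do that with the standard library only. `build_command` is a hypothetical helper, not part of GFF3toEMBL; the option names are copied from the usage text above, and passing arguments as a list avoids shell quoting problems with values that contain spaces.

```python
import shlex

# Long option names copied from the usage text.
KNOWN_OPTIONS = {
    "authors", "title", "publication", "genome_type", "classification",
    "output_filename", "locus_tag", "translation_table", "chromosome_list",
}


def build_command(organism, taxonid, project_accession, description, gff_file,
                  **options):
    """Return the gff3_to_embl argument list; reject unknown option names."""
    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    cmd = ["gff3_to_embl"]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    # Positional arguments go last, in the order shown in the usage text.
    cmd += [organism, str(taxonid), project_accession, description, gff_file]
    return cmd


cmd = build_command(
    "Organism", 1234, "My project", "My description",
    "gff3toembl/tests/data/single_feature.gff",
    genome_type="circular", classification="PROK",
    output_filename="/tmp/single_feature.embl", translation_table=11,
)
# Print a copy-pasteable shell form of the command.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once `gff3_to_embl` is installed and on the PATH.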
The directory 'example_data' contains an input GFF file and the output file along with the command.
GFF3toEMBL is free software, licensed under GPLv3.
Please report any issues to the issues page or email path-help@sanger.ac.uk.
If you use this software please cite:
GFF3toEMBL: Preparing annotated assemblies for submission to EMBL
Andrew J. Page, Sascha Steinbiss, Ben Taylor, Torsten Seemann, Jacqueline A. Keane
The Journal of Open Source Software, 1 (6) 2016. doi: 10.21105/joss.00080
GFF3toEMBL doesn't work with some versions of GenomeTools on Mac OS X; it appears to work with GenomeTools 1.5.4.