Eukaryotic genome annotation pipeline implemented in Nextflow

rduque1/annot-nf

 
 

annot-nf

A portable, scalable eukaryotic genome annotation pipeline implemented in Nextflow.

This software is a comprehensive computational pipeline for annotating eukaryotic genomes, such as those of protozoan parasites. It performs the following tasks:

  • Fast generation of pseudomolecules from scaffolds by ordering and orientating against a reference
  • Accurate transfer of highly conserved gene models from the reference
  • De novo gene finding as a complement to the gene transfer
  • Non-coding RNA detection (tRNA, rRNA, sn(o)RNA, ...)
  • Pseudogene detection
  • Functional annotation (GO, products, ...)
    • ...by transferring reference annotations to the target genome
    • ...by inferring GO terms and products from Pfam pHMM matches
  • Consistent gene ID assignment
  • Preparation of validated GFF3, GAF and EMBL output files to jump-start manual curation and shorten the turnaround time to submission

It supports parallelized execution on a single machine as well as on large cluster platforms (LSF, SGE, ...).

The pipeline uses Nextflow as its workflow engine, so Nextflow needs to be installed first:

curl -fsSL get.nextflow.io | bash
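Nextflow itself runs on Java, and the commands in this section also rely on curl and, for the container route, Docker. A minimal sketch of a pre-flight check follows; the `require_cmd` helper is a hypothetical name introduced here for illustration and is not part of the pipeline.

```shell
# Hypothetical helper (not part of annot-nf): report whether a command
# is available on PATH. Prints "found: <name>" on success; prints
# "missing: <name>" to stderr and returns 1 otherwise.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Check the tools used in this README. Docker is only needed for the
# container-based route. Each check reports and continues.
for tool in java curl docker; do
  require_cmd "$tool" || true
done
```

Any tool reported as missing should be installed before continuing.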

With Nextflow installed, the easiest way to run the pipeline is with the prepared Docker container (https://registry.hub.docker.com/u/satta/annot-nf), which contains all external dependencies.

docker pull satta/annot-nf

To start an example run with Docker, using the example dataset and parameterization included in the distribution:

nextflow run nextflow-io/annot-nf -profile docker

For your own runs, supply your own file names, paths, and parameter values for the settings defined in the nextflow.config file.
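One way to do this without editing the distributed file is to keep your overrides in a separate config file and pass it with Nextflow's top-level `-c` option, which adds that file to the configuration set. The sketch below only prints the resulting command so it can be inspected first. The `annot_cmd` helper and the file name `my_run.config` are hypothetical, and the settings that go into that file are whatever nextflow.config defines; none are assumed here.

```shell
# Hypothetical helper (not part of annot-nf): print the command line for
# a run that layers a custom config file on top of the pipeline defaults.
annot_cmd() {
  config="$1"                      # path to your own config file
  pipeline="nextflow-io/annot-nf"  # pipeline name as used in the example run
  # The top-level -c option adds the given file to Nextflow's configuration.
  echo "nextflow -c $config run $pipeline -profile docker"
}

# Print the command for inspection; run it once it looks right.
# my_run.config is a placeholder file name.
annot_cmd my_run.config
```

Individual parameters can also be overridden on the command line with Nextflow's `--<name> <value>` syntax, using the parameter names from nextflow.config.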

The reference annotations used in the pipeline need to be pre-processed before they can be used. TODO: add documentation on how to prepare references.

Sascha Steinbiss (ss34@sanger.ac.uk)

Build status

Travis status
