forked from HadoopGenomics/SeqPig

SeqPig is a library for Apache Pig for the distributed analysis of large sequencing datasets. It provides import and export functions for file formats commonly used for sequencing data, as well as a collection of Pig user-defined functions (UDFs) to help process aligned and unaligned sequence data.


aim11/SeqPig

 
 


SeqPig is a library of import and export functions for Apache Pig that
handles file formats commonly used in bioinformatics. Additionally, it
provides a collection of Pig user-defined functions (UDFs) for
processing aligned and unaligned sequence data. Currently
SeqPig supports BAM/SAM, FastQ and Qseq input and output, and FASTA
input. It is built on top of the Hadoop-BAM library. For more
information see

http://seqpig.sourceforge.net/

and the documentation that comes with the release.

Releases of SeqPig come bundled with Picard/Samtools, which were developed at
the Wellcome Trust Sanger Institute, and Biodoop/Seal, which were developed
at the Center for Advanced Studies, Research and Development in Sardinia. See

http://samtools.sourceforge.net/
http://biodoop-seal.sourceforge.net/

Installation with precompiled Seal library:
  > mvn install:install-file -Dfile=lib/seal-0.4.0-with-hadoop-bam-7.4.0.jar -DgroupId=it.crs4 -DartifactId=seal -Dversion=0.4.0-with-hadoop-bam-7.4.0 -Dpackaging=jar -DgeneratePom=true
  > mvn package -DskipTests

