Home
Sixgill (Six-frame Genome-Inferred Libraries for LC-MS/MS) is a tool for using shotgun metagenomics sequencing reads to construct databases of 'metapeptides': short protein fragments for database search of LC-MS/MS metaproteomics data. A comprehensive set of resources for building and using metapeptide databases is here.
The main Sixgill command is sixgill_build, which builds a metapeptide database from files containing shotgun metagenomic sequencing reads and optional filtering criteria. We have also provided utilities for filtering metapeptide databases, merging multiple databases together, and building FASTA files from metapeptide databases.
Metapeptide databases are tab-delimited files. By default, they are bgzipped because the databases can be quite large. All of the commands described below take these databases as input or produce them as output.
Each command detects automatically whether an input database file is gzipped. Output database files are gzipped unless the "--nogzipout" flag is specified.
We recommend using a '.tsv.gz' extension for metapeptide database files for clarity, e.g., 'metapeptides_BSt.tsv.gz'.
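The automatic detection described above can be approximated outside Sixgill when reading a database in your own scripts. The following is a minimal sketch, not Sixgill's own code: it checks for the two-byte gzip signature (which bgzipped files also carry) and then reads the file as tab-delimited text. The function names `open_metapeptide_db` and `read_rows` are hypothetical, and no column names are assumed.

```python
import csv
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip (and bgzip) stream


def open_metapeptide_db(path):
    """Open a database file as text, decompressing only if it is gzipped."""
    with open(path, "rb") as probe:
        is_gzipped = probe.read(2) == GZIP_MAGIC
    if is_gzipped:
        return gzip.open(path, "rt", newline="")
    return open(path, "rt", newline="")


def read_rows(path):
    """Yield each line of the tab-delimited file as a list of string fields."""
    with open_metapeptide_db(path) as handle:
        # QUOTE_NONE: treat quote characters as ordinary data, not as quoting.
        yield from csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
```

Because the check looks at the file contents rather than the extension, the same call works for a '.tsv.gz' file and for one written with "--nogzipout".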
- sixgill_build: build a metapeptide database
- sixgill_filter: filter a metapeptide database according to specified criteria
- sixgill_merge: merge multiple metapeptide databases into a single database
- sixgill_makefasta: make a FASTA file from a metapeptide database
sixgill_build
Build a metapeptide database and save it to a bgzipped tab-separated value file. Optionally, also output a FASTA database with the amino acid sequences, with each entry having its sequence as its name.
- --out: output metapeptide database file
- fastq files containing reads to translate
- --minlength: minimum length of a metapeptide to keep
- --minqualscore: minimum base-call quality score over the whole coding sequence
- --minorflength: minimum length of the open reading frame
- --minlongesttryppeplen: minimum length of the longest tryptic peptide
- --maxreads: stop early if we hit this many reads (for testing)
- --minreadcount: minimum read count for each output
- --metagenefile: input MetaGene Annotator output file. Records must be in the same linear order as the reads in the FASTQ files
- --minmetagenescore: minimum MetaGene score
- --outfasta: output protein FASTA database file
- --nogzipout: write uncompressed database
- --debug: Enable debug logging
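As an illustration of how the options above combine, here is a small Python sketch that assembles a sixgill_build command line as an argument list. It only builds and prints the command; it does not run it. The helper name `sixgill_build_args`, the file names, and the threshold values 10 and 30 are hypothetical and are not Sixgill defaults. Placing the FASTQ files after the options is also an assumption.

```python
import shlex


def sixgill_build_args(out_path, fastq_paths, min_length=None,
                       min_qual_score=None, out_fasta=None, gzip_output=True):
    """Assemble a sixgill_build argument list from a few of the listed options."""
    args = ["sixgill_build", "--out", out_path]
    if min_length is not None:
        args += ["--minlength", str(min_length)]
    if min_qual_score is not None:
        args += ["--minqualscore", str(min_qual_score)]
    if out_fasta is not None:
        args += ["--outfasta", out_fasta]
    if not gzip_output:
        args.append("--nogzipout")
    # Assumption: FASTQ files are positional arguments given after the options.
    return args + list(fastq_paths)


# Hypothetical inputs and thresholds, chosen only for illustration.
args = sixgill_build_args(
    "metapeptides_BSt.tsv.gz",
    ["reads_1.fastq", "reads_2.fastq"],
    min_length=10,
    min_qual_score=30,
)
print(shlex.join(args))
# prints: sixgill_build --out metapeptides_BSt.tsv.gz --minlength 10 --minqualscore 30 reads_1.fastq reads_2.fastq
```

Keeping the command as a list makes it straightforward to pass to `subprocess.run(args, check=True)` once Sixgill is installed.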
sixgill_filter
Filter a metapeptide database.
- input database file
- --out: output metapeptide database file
- --minorflength: minimum ORF length
- --minaaseqlength: minimum AA sequence length
- --minreadcount: minimum read count
- --minquality: minimum basecall quality
- --minlongesttryppeplen: minimum length of the longest tryptic peptide
- --maxmetapeptides: maximum number of metapeptides to write
- --minmetagenescore: minimum MetaGene score (-1 for none)
- --nogzipout: write uncompressed database
- --debug: Enable debug logging
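To make the "longest tryptic peptide" criterion concrete, the sketch below computes it for one amino acid sequence. It assumes the common trypsin rule (cleave after K or R, except when the next residue is P) with no missed cleavages; the rule Sixgill actually applies, including how it treats the ends of a metapeptide, may differ. The function names are hypothetical.

```python
def tryptic_peptides(aa_sequence):
    """Split a sequence after each K or R that is not followed by P (assumed rule)."""
    peptides = []
    start = 0
    for i, residue in enumerate(aa_sequence):
        next_residue = aa_sequence[i + 1] if i + 1 < len(aa_sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(aa_sequence[start:i + 1])
            start = i + 1
    if start < len(aa_sequence):
        # Trailing fragment that does not end in K or R.
        peptides.append(aa_sequence[start:])
    return peptides


def longest_tryptic_peptide_length(aa_sequence):
    """Length of the longest peptide produced by the assumed cleavage rule."""
    return max((len(p) for p in tryptic_peptides(aa_sequence)), default=0)


def passes_min_longest_tryp_pep_len(aa_sequence, min_len):
    """Mimic a threshold such as --minlongesttryppeplen under the assumed rule."""
    return longest_tryptic_peptide_length(aa_sequence) >= min_len
```

For example, "AAKPGGRTTTTTK" splits into "AAKPGGR" and "TTTTTK" under this rule, so its longest tryptic peptide has length 7.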
sixgill_merge
Merge multiple metapeptide databases into a single database.
- database files to merge
- --out: output metapeptide database file
- --nogzipout: write uncompressed database
- --debug: Enable debug logging
sixgill_makefasta
Make a FASTA database containing the sequences (amino acid or nucleotide) from the metapeptides in a metapeptide database. The name of each entry is the same as the sequence.
- input database file
- --out: output FASTA file
- --type: type of FASTA file to write. Must be one of:
  - aa: amino acid
  - ntsingle: a single nucleotide coding sequence for each entry
  - ntmulti: all coding sequences for each entry
- --debug: Enable debug logging
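The naming convention described for this command, where each entry's name is its own sequence, can be sketched in a few lines of Python. This is an illustration of the output shape only, not Sixgill's implementation; the function name `write_sequence_named_fasta` is hypothetical, and writing each sequence on a single unwrapped line is an assumption.

```python
import io


def write_sequence_named_fasta(sequences, handle):
    """Write FASTA records whose header line is the sequence itself."""
    for seq in sequences:
        handle.write(f">{seq}\n")  # entry name equals the sequence
        handle.write(f"{seq}\n")   # assumption: one unwrapped line per sequence


# Usage with an in-memory buffer and two made-up amino acid sequences.
buffer = io.StringIO()
write_sequence_named_fasta(["MKTAYIAK", "GGSLR"], buffer)
fasta_text = buffer.getvalue()
```

Using the sequence as the name keeps every search hit self-describing, at the cost of long header lines for long entries.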