Sort features in a GFF3 file according to their order on a scaffold, their coordinates on a scaffold, and parent-child relationships.
- GFF3 file: Specify the file name with the -g argument
- Sorted GFF3 file: Specify the file name with the -og argument
- All related features (with parent-child relationships) are separated by ### directives for easier downstream parsing
- Specify the input, output file names and options using short arguments:
gff3_sort -g example_file/example.gff3 -og example_file/example_sorted.gff
- Specify the input, output file names and options using long arguments:
gff3_sort --gff_file example_file/example.gff3 --output_gff example_file/example_sorted.gff
- -h, --help
- show this help message and exit
- -g GFF_FILE, --gff_file GFF_FILE
- GFF3 file that you would like to sort.
- -og OUTPUT_GFF, --output_gff OUTPUT_GFF
- Sorted GFF3 file
- -t SORT_TEMPLATE, --sort_template SORT_TEMPLATE
- A file that indicates the sorting order of features within a gene model
- -i, --isoform_sort
- Sort multi-isoform gene models by feature type (default: False)
- -v, --version
- show program's version number and exit
- -r, --reference
- Sort scaffolds (seqID) by order of appearance in the GFF3 file (default: by number)
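When gff3_sort is called from a script, the options above map directly onto an argument list. The following is a minimal sketch, assuming gff3_sort is installed and available on PATH. The helper names `build_gff3_sort_command` and `run_gff3_sort` are hypothetical and not part of the tool; only the long options listed above are used.

```python
import subprocess


def build_gff3_sort_command(gff_file, output_gff, sort_template=None,
                            isoform_sort=False, reference=False):
    """Assemble a gff3_sort argument list (hypothetical helper, not part of the tool)."""
    cmd = ["gff3_sort", "--gff_file", gff_file, "--output_gff", output_gff]
    if sort_template is not None:
        cmd += ["--sort_template", sort_template]
    if isoform_sort:
        cmd.append("--isoform_sort")
    if reference:
        cmd.append("--reference")
    return cmd


def run_gff3_sort(*args, **kwargs):
    """Run gff3_sort; assumes the executable is on PATH and raises on a non-zero exit."""
    subprocess.run(build_gff3_sort_command(*args, **kwargs), check=True)
```

For example, `run_gff3_sort("example.gff3", "example_sort.gff3", sort_template="sort_template.txt")` would issue the same command line as typing the long options by hand.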
- example command:
gff3_sort --gff_file example.gff3 --output_gff example_sort.gff3
- Input gff3 file:
LGIB01000001.1 Gnomon gene 52056 58768 . + . ID=gene1
LGIB01000001.1 Gnomon mRNA 52056 58768 . + . ID=rna1;Parent=gene1
LGIB01000001.1 Gnomon CDS 52056 52096 . + 0 ID=cds1;Parent=rna1
LGIB01000001.1 Gnomon exon 52056 52096 . + . ID=id4;Parent=rna1
LGIB01000001.1 Gnomon mRNA 52056 58768 . + . ID=rna2;Parent=gene1
LGIB01000001.1 Gnomon CDS 52100 53000 . + 0 ID=cds2;Parent=rna2
LGIB01000001.1 Gnomon exon 52056 53000 . + . ID=id19;Parent=rna2
- Output gff3 file:
LGIB01000001.1 Gnomon gene 52056 58768 . + . ID=gene1
LGIB01000001.1 Gnomon mRNA 52056 58768 . + . ID=rna1;Parent=gene1
LGIB01000001.1 Gnomon exon 52056 52096 . + . ID=id4;Parent=rna1
LGIB01000001.1 Gnomon CDS 52056 52096 . + 0 ID=cds1;Parent=rna1
LGIB01000001.1 Gnomon mRNA 52056 58768 . + . ID=rna2;Parent=gene1
LGIB01000001.1 Gnomon exon 52056 53000 . + . ID=id19;Parent=rna2
LGIB01000001.1 Gnomon CDS 52100 53000 . + 0 ID=cds2;Parent=rna2
- sort template file: A file that indicates the sorting order of features within a gene model. Feature types with the same sorting order should be on the same line, separated by spaces.
gene pseudogene
mRNA
exon
CDS
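To make the template format concrete, here is a minimal sketch of how such a file could be read into a type-to-rank mapping. This is an illustration only, not gff3_sort's own parser; the function name `read_sort_template` is hypothetical.

```python
def read_sort_template(lines):
    """Map each feature type to the index of the template line it appears on.

    Types that share a line share a rank. Blank lines are skipped.
    """
    rank = {}
    position = 0
    for line in lines:
        types = line.split()
        if not types:
            continue
        for feature_type in types:
            rank[feature_type] = position
        position += 1
    return rank


# The template shown above, one string per line of the file.
template = ["gene pseudogene", "mRNA", "exon", "CDS"]
rank = read_sort_template(template)
print(rank)
# prints {'gene': 0, 'pseudogene': 0, 'mRNA': 1, 'exon': 2, 'CDS': 3}
```

Because the function accepts any iterable of lines, an open file handle can be passed in place of the list.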
- example command:
gff3_sort --gff_file example.gff3 --sort_template sort_template.txt --output_gff example_sort.gff3
- Output gff3 file:
LGIB01000001.1 Gnomon gene 52056 58768 . + . ID=gene1
LGIB01000001.1 Gnomon mRNA 52056 58768 . + . ID=rna1;Parent=gene1
LGIB01000001.1 Gnomon exon 52056 52096 . + . ID=id4;Parent=rna1
LGIB01000001.1 Gnomon CDS 52056 52096 . + 0 ID=cds1;Parent=rna1
LGIB01000001.1 Gnomon mRNA 52056 58768 . + . ID=rna2;Parent=gene1
LGIB01000001.1 Gnomon exon 52056 53000 . + . ID=id19;Parent=rna2
LGIB01000001.1 Gnomon CDS 52100 53000 . + 0 ID=cds2;Parent=rna2
Note:
If some feature types are not listed in the sort template file, gff3_sort sorts features by level (1st-level, 2nd-level, etc.) and then by their order in the sort template file.
- sort template file:
gene pseudogene
CDS
- Output gff3 file:
LGIB01000001.1 Gnomon gene 52056 58768 . + . ID=gene1
LGIB01000001.1 Gnomon mRNA 52056 58768 . + . ID=rna1;Parent=gene1
LGIB01000001.1 Gnomon CDS 52056 52096 . + 0 ID=cds1;Parent=rna1
LGIB01000001.1 Gnomon exon 52056 52096 . + . ID=id4;Parent=rna1
LGIB01000001.1 Gnomon mRNA 52056 58768 . + . ID=rna2;Parent=gene1
LGIB01000001.1 Gnomon CDS 52100 53000 . + 0 ID=cds2;Parent=rna2
LGIB01000001.1 Gnomon exon 52056 53000 . + . ID=id19;Parent=rna2
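The "level first, then template order" rule in the note can be pictured as a two-part sort key. The sketch below is an assumption-laden illustration, not gff3_sort's implementation: levels are assigned by hand, only gene1 and its first transcript are modeled, and types missing from the template are assumed to sort after listed ones (which is consistent with the output above but not confirmed by it).

```python
import math

# Partial template from the example above: only gene/pseudogene and CDS are listed.
template_rank = {"gene": 0, "pseudogene": 0, "CDS": 1}

# (feature type, hierarchy level) for gene1 and its first transcript.
# Levels are assigned by hand here for illustration.
features = [("exon", 3), ("CDS", 3), ("mRNA", 2), ("gene", 1)]


def level_then_template(feature):
    feature_type, level = feature
    # Assumption: types missing from the template sort after listed ones.
    return (level, template_rank.get(feature_type, math.inf))


ordered = sorted(features, key=level_then_template)
print([feature_type for feature_type, _ in ordered])
# prints ['gene', 'mRNA', 'CDS', 'exon']
```

Under these assumptions mRNA still follows gene even though it is absent from the template, because level is compared before template rank.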
- example command:
gff3_sort --gff_file example.gff3 --sort_template sort_template.txt --isoform_sort --output_gff example_sort.gff3
- Output gff3 file:
LGIB01000001.1 Gnomon gene 52056 58768 . + . ID=gene1
LGIB01000001.1 Gnomon mRNA 52056 58768 . + . ID=rna1;Parent=gene1
LGIB01000001.1 Gnomon mRNA 52056 58768 . + . ID=rna2;Parent=gene1
LGIB01000001.1 Gnomon exon 52056 53000 . + . ID=id19;Parent=rna2
LGIB01000001.1 Gnomon exon 52056 52096 . + . ID=id4;Parent=rna1
LGIB01000001.1 Gnomon CDS 52056 52096 . + 0 ID=cds1;Parent=rna1
LGIB01000001.1 Gnomon CDS 52100 53000 . + 0 ID=cds2;Parent=rna2
Note:
If some feature types are not listed in the sort template file, gff3_sort sorts features by their order in the sort template file and then by level (1st-level, 2nd-level, etc.).
- sort template file:
gene pseudogene
CDS
- Output gff3 file:
LGIB01000001.1 Gnomon gene 52056 58768 . + . ID=gene1
LGIB01000001.1 Gnomon mRNA 52056 58768 . + . ID=rna1;Parent=gene1
LGIB01000001.1 Gnomon CDS 52056 52096 . + 0 ID=cds1;Parent=rna1
LGIB01000001.1 Gnomon exon 52056 52096 . + . ID=id4;Parent=rna1
LGIB01000001.1 Gnomon mRNA 52056 58768 . + . ID=rna2;Parent=gene1
LGIB01000001.1 Gnomon CDS 52100 53000 . + 0 ID=cds2;Parent=rna2
LGIB01000001.1 Gnomon exon 52056 53000 . + . ID=id19;Parent=rna2
- Any features without a Parent attribute are 'root' features - the program will insert directives (lines beginning with ###) above these features.
- All child features occur after their respective Parent feature, but before new Parent features.
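The parent-before-child property can be checked on any sorted file with a few lines of Python. This is a minimal sketch, assuming tab-separated GFF3 with the attributes in column 9; the function names are hypothetical and the check is independent of gff3_sort itself.

```python
def parse_attributes(column9):
    """Turn 'ID=rna1;Parent=gene1' into {'ID': 'rna1', 'Parent': 'gene1'}."""
    pairs = (item.split("=", 1) for item in column9.split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def children_before_parent(gff3_lines):
    """Return IDs of features that appear before one of their Parent features."""
    seen_ids = set()
    misplaced = []
    for line in gff3_lines:
        line = line.strip()
        # Skip blank lines, comments, and directives such as ###.
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) != 9:
            continue
        attrs = parse_attributes(columns[8])
        # A Parent attribute may hold several comma-separated IDs.
        for parent in attrs.get("Parent", "").split(","):
            if parent and parent not in seen_ids:
                misplaced.append(attrs.get("ID", "<no ID>"))
                break
        if "ID" in attrs:
            seen_ids.add(attrs["ID"])
    return misplaced
```

An empty list from `children_before_parent(open("example_sort.gff3"))` means every child feature was preceded by its parent.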