gff3_QC.py [-h] [-g GFF] [-f FASTA] [-noncg] [-i] [-n ALLOWED_NUM_OF_N]
[-t [CHECK_N_FEATURE_TYPES [CHECK_N_FEATURE_TYPES ...]]] [-o OUTPUT] [-v] [-s STATISTIC]
Python 3.x
- GFF3: Specify the file name with the -g or --gff argument. Please note that this program requires gene/pseudogene and mRNA/pseudogenic_transcript features to have an ID attribute in column 9.
- Fasta file: Specify the file name with the -f or --fasta argument. This file must be the Fasta file that the GFF3 seqids and coordinates refer to. For more information, refer to the GFF3 specification.
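
The ID requirement above can be checked before running the program. The following is a minimal sketch and not part of gff3_QC: the function name `features_missing_id` is hypothetical, and it assumes a tab-delimited GFF3 file whose column 9 holds semicolon-separated `key=value` attributes.

```python
# Feature types that must carry an ID attribute, per the input requirements above.
REQUIRES_ID = {"gene", "pseudogene", "mRNA", "pseudogenic_transcript"}


def features_missing_id(lines):
    """Return (line_number, feature_type) for features that need an ID but lack one.

    Hypothetical helper for illustration; not gff3_QC's own implementation.
    """
    missing = []
    for line_num, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # malformed or non-feature line; out of scope for this sketch
        feature_type, attributes = columns[2], columns[8]
        if feature_type not in REQUIRES_ID:
            continue
        keys = {
            field.split("=", 1)[0].strip()
            for field in attributes.split(";")
            if field.strip()
        }
        if "ID" not in keys:
            missing.append((line_num, feature_type))
    return missing


example = [
    "##gff-version 3",
    "chr1\t.\tgene\t1\t100\t.\t+\t.\tID=gene1",
    "chr1\t.\tmRNA\t1\t100\t.\t+\t.\tParent=gene1",  # no ID attribute
]
print(features_missing_id(example))
```

With a real file, the same function could be called as `features_missing_id(open("example_file/example.gff3"))`.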
- Error report for the input GFF3 file
- Line_num: Line numbers of the problematic models found in the input GFF3 file.
- Error_code: Error codes assigned to the problematic models. See lib/ERROR/ERROR.py for the full list of Error_code values and their corresponding Error_tag.
- Error_level: Severity level of each error code. Three levels are defined: Error (violates the GFF3 specification), Warning (might violate the GFF3 specification), and Info (likely not an error, but worth checking).
- Error_tag: Description of the error found in the problematic models. See lib/ERROR/ERROR.py for the full list of Error_code values and their corresponding Error_tag.
- Statistic report for the output files
- Error_code: Error codes assigned to the problematic models. See lib/ERROR/ERROR.py for the full list of Error_code values and their corresponding Error_tag.
- Number of problematic models: The number of problematic models reported for each Error_code.
- Error_level: Severity level of each error code. Three levels are defined: Error (violates the GFF3 specification), Warning (might violate the GFF3 specification), and Info (likely not an error, but worth checking).
- Error_tag: Description of the error found in the problematic models. See lib/ERROR/ERROR.py for the full list of Error_code values and their corresponding Error_tag.
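
To illustrate what the statistic report summarizes, here is a minimal sketch that tallies problematic models per error code and orders the result by severity. It works on in-memory records rather than on the report file, because the exact file layout is not described here. The codes `CODE_A`, `CODE_B`, and `CODE_C` are placeholders, not real entries from lib/ERROR/ERROR.py, and the function name `summarize` is hypothetical.

```python
from collections import Counter

# Hypothetical records, one per problematic model: (Error_code, Error_level).
# The codes are placeholders; real codes are listed in lib/ERROR/ERROR.py.
records = [
    ("CODE_A", "Error"),
    ("CODE_A", "Error"),
    ("CODE_B", "Warning"),
    ("CODE_C", "Info"),
]

# Severity order taken from the three levels described above.
SEVERITY = {"Error": 0, "Warning": 1, "Info": 2}


def summarize(records):
    """Return (code, level, count) rows sorted by severity, then by code."""
    counts = Counter(records)
    rows = [(code, level, n) for (code, level), n in counts.items()]
    return sorted(rows, key=lambda row: (SEVERITY.get(row[1], len(SEVERITY)), row[0]))


for code, level, n in summarize(records):
    print(code, n, level, sep="\t")
```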
gff3_QC -g example_file/example.gff3 -f example_file/reference.fa -o test -s statistic.txt
or
gff3_QC --gff example_file/example.gff3 --fasta example_file/reference.fa --output test --statistic statistic.txt
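
The same invocation can be assembled from a Python script. This is a minimal sketch that assumes the `gff3_QC` command is installed and on `PATH`; the helper name `build_gff3_qc_command` is hypothetical, and its default file names mirror the defaults documented for -o and -s.

```python
import subprocess


def build_gff3_qc_command(gff, fasta, output="report.txt", statistic="statistic.txt"):
    """Build the argument list for the long-option form of the command above."""
    return [
        "gff3_QC",
        "--gff", gff,
        "--fasta", fasta,
        "--output", output,
        "--statistic", statistic,
    ]


command = build_gff3_qc_command(
    "example_file/example.gff3",
    "example_file/reference.fa",
    output="test",
)
print(" ".join(command))

# Uncomment to run the check; this requires gff3_QC to be installed.
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.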
- -h, --help
- show this help message and exit
- -g GFF, --gff GFF
- Genome annotation file, gff3 format
- -f FASTA, --fasta FASTA
- Genome sequences, fasta format
- -noncg, --noncanonical_gene
- gff3 file is not formatted in the canonical gene model format.
- -i, --initial_phase
- Check whether the initial CDS phase is 0 (default: no check)
- -n ALLOWED_NUM_OF_N, --allowed_num_of_n ALLOWED_NUM_OF_N
- Maximum number of Ns allowed in a feature; a feature with more Ns is reported as an error (default: 0)
- -t [CHECK_N_FEATURE_TYPES [CHECK_N_FEATURE_TYPES ...]], --check_n_feature_types [CHECK_N_FEATURE_TYPES [CHECK_N_FEATURE_TYPES ...]]
- Count the number of Ns in each feature of the specified type. Multiple types may be specified, e.g. -t CDS exon (default: "CDS")
- -o OUTPUT, --output OUTPUT
- output file name (default: report.txt)
- -s STATISTIC, --statistic STATISTIC
- statistic file name (default: statistic.txt)
- -v, --version
- show program's version number and exit
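
The -n and -t options describe a per-feature count of N bases compared against a threshold. The sketch below shows one way such a check could work; it is not gff3_QC's implementation. The function names are hypothetical, coordinates are treated as 1-based and inclusive as in GFF3, and counting lowercase `n` along with `N` is an assumption.

```python
def count_n(sequence, start, end):
    """Count N bases in the 1-based, inclusive interval [start, end]."""
    # Assumption: lowercase 'n' (soft-masked sequence) is counted as well.
    return sequence[start - 1:end].upper().count("N")


def features_exceeding_n(sequence, features, allowed_num_of_n=0, check_types=("CDS",)):
    """Return (type, start, end, n_count) for features with too many Ns.

    The defaults mirror the documented defaults of -n (0) and -t ("CDS").
    """
    flagged = []
    for feature_type, start, end in features:
        if feature_type not in check_types:
            continue  # only the requested feature types are checked
        n_count = count_n(sequence, start, end)
        if n_count > allowed_num_of_n:
            flagged.append((feature_type, start, end, n_count))
    return flagged


# Toy reference sequence and features: (type, start, end).
sequence = "ACGTNNACGTACGNAC"
features = [("exon", 1, 16), ("CDS", 3, 8), ("CDS", 9, 12)]

# Defaults: only CDS features are checked and no Ns are allowed.
print(features_exceeding_n(sequence, features))

# Equivalent in spirit to: -n 2 -t CDS exon
print(features_exceeding_n(sequence, features, allowed_num_of_n=2, check_types=("CDS", "exon")))
```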