A downstream variant annotation program that can effectively classify variants by region, predict amino acid change type, and prioritize mutation effects.

SNPAAMapper-Python

Ma, K., N. Xu, A. He, and Y. Bai, “SNPAAMapper-Python: A highly efficient genome-wide SNP variant analysis pipeline for next-generation sequencing data.” Poster Presentation at the 18th Annual Conference for the Mid-South Computational Biology and Bioinformatics Society (MCBIOS 2022)

SNPAAMapper is a downstream variant annotation program that can effectively classify variants by region (e.g., exon or intron), predict amino acid change type (e.g., synonymous or non-synonymous mutation), and prioritize mutation effects (e.g., CDS versus 5'UTR).
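To illustrate what "amino acid change type" means for a single-base substitution, here is a minimal Python sketch (not the pipeline's own code). It assumes the reference codon and alternative base are already given on the coding strand (for minus-strand genes both would first need to be reverse-complemented) and uses the standard genetic code. The function name `classify_aa_change` and its labels are hypothetical; SNPAAMapper's own output labels may differ.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_aa_change(ref_codon: str, offset: int, alt_base: str):
    """Return (ref_aa, alt_aa, label) for one single-base substitution.

    ref_codon: reference codon on the coding strand, e.g. "TGG"
    offset:    0-based position of the variant inside the codon (0, 1 or 2)
    alt_base:  alternative base on the coding strand
    Labels are hypothetical names chosen for this sketch.
    """
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        label = "synonymous"
    elif alt_aa == "*":
        label = "nonsense"   # new stop codon
    elif ref_aa == "*":
        label = "stop-loss"  # stop codon replaced by an amino acid
    else:
        label = "missense"
    return ref_aa, alt_aa, label
```

For example, `classify_aa_change("TGG", 2, "A")` turns TGG (Trp) into TGA (stop) and returns `("W", "*", "nonsense")`.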

Features

  • The pipeline accepts a VCF input file in tab-delimited format and processes all variant cases it contains (G5, lowFreq, and novel)
  • The variant mapping step allows users to select whether they want to report the base pair distance between each identified intron variant and the nearby exon
  • Compatibility with VCF files called by different SAMtools versions (0.1.18 and older) and/or generated using SAMtools with two or three samples
  • The spreadsheet result file contains full protein sequences for both reference and alternative alleles, which makes it easier for downstream protein structure/function analysis tools to use
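The intron-to-exon distance mentioned above amounts to the gap between the variant position and the closest exon boundary. A minimal sketch, assuming the exons of one transcript are given as 1-based inclusive (start, end) pairs; `distance_to_nearest_exon` is a hypothetical name, not a function from this repository:

```python
from typing import List, Tuple


def _distance(pos: int, start: int, end: int) -> int:
    """Distance in bp from pos to the interval [start, end]; 0 if inside."""
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0


def distance_to_nearest_exon(pos: int, exons: List[Tuple[int, int]]) -> int:
    """Return the bp distance from pos to the closest exon (0 if exonic).

    Raises ValueError (from min) if the transcript has no exons.
    """
    return min(_distance(pos, start, end) for start, end in exons)
```

With exons `[(100, 200), (300, 400)]`, a variant at position 210 is 10 bp from the first exon, and one at 250 is 50 bp from either exon.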

Requirements

  • Python 3.x
  • sys
  • csv
  • re
  • shutil
  • Git LFS
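The csv module listed above is enough to read a tab-delimited VCF. The following is a minimal sketch, not the pipeline's own parser: it assumes the file has `##` meta-information lines followed by a `#CHROM` header line, and the function name `read_vcf_records` is hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Iterator


def read_vcf_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per variant row, keyed by the #CHROM header columns."""
    # Drop '##' meta-information lines; keep the '#CHROM' header and data rows.
    body = (line for line in lines if not line.startswith("##"))
    # QUOTE_NONE: VCF fields are not quoted, so any '"' in INFO is kept literally.
    reader = csv.reader(body, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:
        return  # empty input
    header[0] = header[0].lstrip("#")  # "#CHROM" -> "CHROM"
    for row in reader:
        if row:  # skip blank lines
            yield dict(zip(header, row))


if __name__ == "__main__":
    example = (
        "##fileformat=VCFv4.1\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
        "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=10\n"
    )
    for record in read_vcf_records(io.StringIO(example)):
        print(record["CHROM"], record["POS"], record["REF"], record["ALT"])
```

On a real file, open it with `open(path, newline="")` and pass the file object in place of `io.StringIO`.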

Instructions

If you haven't yet, initialize Git LFS by running

git lfs install

Clone this repo as follows

git clone https://github.com/nicolexxuu/SNPAAMapper-Python
cd ./SNPAAMapper-Python

and download hg19_CDSIntronWithSign.txt.out to your local repository.

Next, type

./run_SNPAAMapper-Python.sh config.txt

Alternatively, run the following steps in sequential order (note: the first two steps were compiled for the human hg19 genome, and their output files have already been generated):

  1. Process exon annotation files and generate feature start and gene mapping files:

    python3 Algorithm_preprocessing_exon_annotation_RR.py ChrAll_knownGene.txt.exon
  2. Classify variants by region (CDS, upstream, downstream, intron, UTRs, etc.)

    python3 Algorithm_mapping_variants_reporting_class_intronLocation_updown.py ChrAll_knownGene.txt.exon VCF_input_file_in_tab_delimited_format.vcf

    OR

    python3 Algorithm_mapping_variants_reporting_class_intronLocation_updown.py ChrAll_knownGene.txt.exon VCF_input_file_in_tab_delimited_format.vcf IntronExon_boundary_in_bp
  3. Predict amino acid change type

    python3 Algorithm_predicting_full_AA_change_samtools_updown.py VCF_input_file_in_tab_delimited_format.vcf.append kgXref.txt hg19_CDSIntronWithSign.txt.out ChrAll_knownGene.txt >VCF_input_file_in_tab_delimited_format.vcf.append.out.txt
  4. Prioritize mutation effects

    python3 Algorithm_prioritizing_mutation_headerTop_updown.py VCF_input_file_in_tab_delimited_format.vcf.append.out.txt
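Step 2 assigns each variant to a gene region. The idea can be sketched as below, assuming a single plus-strand transcript described in 1-based inclusive coordinates (UCSC knownGene tables actually use 0-based starts, and on minus-strand genes 5'/3' and upstream/downstream are swapped). The `Transcript` class, the `classify_region` function, and the 2,000 bp flanking window are hypothetical choices for illustration, not the script's actual code or defaults.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transcript:
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exons: List[Tuple[int, int]]  # (start, end) pairs, 1-based inclusive


def classify_region(pos: int, tx: Transcript, flank: int = 2000) -> str:
    """Classify a variant position against one plus-strand, protein-coding transcript."""
    if pos < tx.tx_start:
        return "upstream" if tx.tx_start - pos <= flank else "intergenic"
    if pos > tx.tx_end:
        return "downstream" if pos - tx.tx_end <= flank else "intergenic"
    if not any(start <= pos <= end for start, end in tx.exons):
        return "intron"
    # Exonic positions are split by the coding-sequence boundaries.
    if pos < tx.cds_start:
        return "5'UTR"
    if pos > tx.cds_end:
        return "3'UTR"
    return "CDS"
```

For a transcript spanning 1000 to 5000 with exons `(1000, 1500)` and `(3000, 5000)` and a CDS from 1200 to 4800, position 1100 is 5'UTR, 1300 is CDS, and 2000 is intron.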

The final output file is *.append.out.txt.prioritzed_out.
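When a variant overlaps several transcripts, prioritization keeps only its most important effect (for example, CDS over 5'UTR). A minimal sketch, assuming region labels such as CDS, 5'UTR, 3'UTR, intron, upstream, and downstream; the ranking in `PRIORITY` and the function name `prioritize` are hypothetical and may not match the pipeline's own ordering:

```python
from typing import Dict, Iterable, Tuple

# Assumed ranking, most important first; not taken from the pipeline's source.
PRIORITY = ["CDS", "5'UTR", "3'UTR", "intron", "upstream", "downstream"]
RANK = {region: i for i, region in enumerate(PRIORITY)}


def prioritize(hits: Iterable[Tuple[str, str, str]]) -> Dict[str, Tuple[str, str]]:
    """Map each variant ID to the (transcript ID, region) with the best rank.

    hits: (variant_id, transcript_id, region) triples.
    Unknown regions rank last; ties keep the first hit seen.
    """
    best: Dict[str, Tuple[int, str, str]] = {}
    for variant, transcript, region in hits:
        rank = RANK.get(region, len(PRIORITY))
        if variant not in best or rank < best[variant][0]:
            best[variant] = (rank, transcript, region)
    return {v: (tx, region) for v, (_, tx, region) in best.items()}
```

A variant that is intronic in one transcript but coding in another is then reported once, with its CDS annotation.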

License

MIT
