LordGenome/phigis

phigis

PCR primer design

Exome- and genome-scale genetic diagnostics have increased the demand for PCR confirmation in the proband and for prenatal and cascade testing in relatives. Genetics Service exome-based tests identify tens to hundreds of targets per week, many of which require the design of cognate PCR amplicons for further testing of other at-risk individuals or for prenatal testing. Existing tools do not scale readily to high-throughput primer design: designing each primer pair with web-driven tools such as Primer3 takes 15 to 30 minutes, and each pair must then be checked with SNPchecker, an application that is no longer supported. Other high-throughput design tools, such as SNP Box and Primer Mapper, are not optimised for high-throughput, exon-focussed work using human genome reference tools. Simply feed phigis a list of variants in BED format and primers will be designed in a few minutes on commodity hardware such as a MacBook.

NAME phigis

   Primer3 Helper for Indexed Genomes with Ipcress and Samtools

VERSION 0.1

DESCRIPTION This is a wrapper that calls the shell, samtools and primer3 to design primers using a samtools-indexed genome, dbSNP, and RefFlat co-ordinates and gene names. It reads all variants in a BED file, finds the corresponding exon in RefFlat and designs primers that flank the exon, rejecting primers that have a common SNP within 8 bases of their 3' end. If the exon is longer than 450 bases, only 60-base sequences flanking the variant are used. Primers are checked with the ipcress in silico PCR program.

   NB: make sure the genome builds of all input files are consistent.
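The target-selection rule above can be sketched as follows. This is a minimal Python illustration, not phigis itself (which is Perl). The names `Region`, `target_region` and `region_of_interest`, the BED-style 0-based half-open coordinates, and applying the 60-base flank, the 15-base margin and the padding symmetrically on both sides are illustrative assumptions; only the 450, 60 and 15 base values come from this README.

```python
from typing import NamedTuple

MAX_EXON_LEN = 450  # longer exons are not amplified whole (from the README)
VARIANT_FLANK = 60  # bases kept around the variant for long exons (assumed: each side)
EXON_MARGIN = 15    # bases added around the target before padding (from the README)


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive (BED convention)


def target_region(exon: Region, variant_pos: int) -> Region:
    """Region that must lie inside the amplicon (hypothetical helper)."""
    if exon.end - exon.start > MAX_EXON_LEN:
        # Long exon: keep only the sequence flanking the variant.
        return Region(exon.chrom, max(0, variant_pos - VARIANT_FLANK),
                      variant_pos + VARIANT_FLANK + 1)
    return exon


def region_of_interest(exon: Region, variant_pos: int, padding: int) -> Region:
    """Target plus the 15-base margin plus padding, applied to both sides (assumed)."""
    target = target_region(exon, variant_pos)
    extra = EXON_MARGIN + padding
    return Region(target.chrom, max(0, target.start - extra), target.end + extra)


print(region_of_interest(Region("chr7", 1000, 1200), 1100, 5))
```

For a 200-base exon at chr7:1000-1200 (BED coordinates) and a padding of 5, `region_of_interest` returns chr7:980-1220; for an exon longer than 450 bases only the 121 bases centred on the variant form the target.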

INPUT

   o   BED file (plain text, Unix line endings) describing the variants

   o   genome build as a samtools-indexed FASTA file, with repeat
       sequence marked as lower case

   o   common SNP file in BED format

   o   exon file in BED format
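Reading the BED inputs and finding the exon that contains a variant (the job of find_exon_co_ords) might look like the sketch below. It assumes tab-separated BED with the gene or exon name in the optional fourth column; `BedRecord`, `read_bed` and `find_exon` are hypothetical names, and the linear scan is chosen for clarity rather than speed.

```python
import io
from typing import Iterator, List, NamedTuple, Optional, TextIO


class BedRecord(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: str   # optional 4th column (e.g. gene or exon name), "" if absent


def read_bed(handle: TextIO) -> Iterator[BedRecord]:
    """Yield BED records, skipping blank, comment, track and browser lines."""
    for line in handle:
        line = line.rstrip("\r\n")  # also strips a stray Windows "\r"
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        name = fields[3] if len(fields) > 3 else ""
        yield BedRecord(fields[0], int(fields[1]), int(fields[2]), name)


def find_exon(variant: BedRecord, exons: List[BedRecord]) -> Optional[BedRecord]:
    """Return the first exon overlapping the variant, or None (linear scan)."""
    for exon in exons:
        if (exon.chrom == variant.chrom
                and exon.start < variant.end and variant.start < exon.end):
            return exon
    return None


# Example with in-memory BED text instead of real files.
exons = list(read_bed(io.StringIO("chr1\t100\t200\tGENE1_ex1\n"
                                  "chr1\t300\t400\tGENE1_ex2\n")))
variant = next(read_bed(io.StringIO("chr1\t349\t350\n")))
print(find_exon(variant, exons).name)  # GENE1_ex2
```

With thousands of exons, sorting them per chromosome and using binary search would be the natural next step.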

OUTPUT

   o   log file, including ipcress in silico PCR result

   o   primer file containing the designed primer specifications

   o   primer3 output file containing additional primers and report on
       primer design
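primer3 writes its results as BoulderIO TAG=VALUE lines, with a line containing only `=` ending each record. The hypothetical helpers below pull the top-ranked pair out of such a record into a TSV row and format an ipcress experiment line (id, primer A, primer B, minimum and maximum product length). The TSV columns and the 50-base `slack` around the predicted product size are assumptions, not phigis's actual output layout; a slack of 0 would demand exactly the predicted size.

```python
from typing import Dict, Optional


def parse_boulder(text: str) -> Dict[str, str]:
    """Parse one primer3 BoulderIO record (TAG=VALUE lines, ended by '=')."""
    record: Dict[str, str] = {}
    for line in text.splitlines():
        if line == "=":  # record terminator
            break
        tag, sep, value = line.partition("=")
        if sep:
            record[tag] = value
    return record


def best_pair_tsv(record: Dict[str, str]) -> Optional[str]:
    """TSV row for the top-ranked pair, or None if primer3 returned no pair."""
    if int(record.get("PRIMER_PAIR_NUM_RETURNED", "0")) == 0:
        return None
    return "\t".join([
        record["SEQUENCE_ID"],
        record["PRIMER_LEFT_0_SEQUENCE"],
        record["PRIMER_RIGHT_0_SEQUENCE"],
        record["PRIMER_PAIR_0_PRODUCT_SIZE"],
    ])


def ipcress_line(record: Dict[str, str], slack: int = 50) -> str:
    """One ipcress experiment line: id, primer A, primer B, min and max product."""
    size = int(record["PRIMER_PAIR_0_PRODUCT_SIZE"])
    return " ".join([
        record["SEQUENCE_ID"],
        record["PRIMER_LEFT_0_SEQUENCE"],
        record["PRIMER_RIGHT_0_SEQUENCE"],
        str(max(1, size - slack)),
        str(size + slack),
    ])
```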

OTHER REQUIREMENTS

   o   Bash shell

   o   samtools in path

   o   primer3 in path

   o   exonerate (EBI) in path

   o   hg19 as a samtools-indexed FASTA file

   o   common SNP BED file

   o   exons BED file

   o   Perl >= 5.12

   primer3 uses thermodynamic parameters stored somewhere like
   /usr/local/Cellar/primer3/2.3.7/share/primer3/primer3_config/
   The path, and the version number in it, may vary depending on how
   you installed primer3.

Additional files (relative paths, not included in the repository):

   o   geneme/genome_fasta.fa : the FASTA genome file with lower case repeats

   o   geneme/genome_fasta.fa.fai : the samtools index of the FASTA file

   o   SNP/SNP_147.bed : the SNP BED file

   o   RefSeq/RefFlat_coding_exons.bed : the exons file

   o   test.bed : the BED file describing the variants needing primer design
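A quick pre-flight check for the tools and files listed above can save a failed run. This sketch is not part of phigis; it assumes the primer3 executable is called `primer3_core` (its usual name) and that `ipcress` is on the path once exonerate is installed, and it resolves the relative paths against the current directory.

```python
import os
import shutil
from typing import Iterable, List

# Executables the README requires on PATH ("primer3_core" is an assumption).
REQUIRED_TOOLS = ["bash", "samtools", "primer3_core", "ipcress"]

# Relative paths named in the README (not shipped with the repository).
REQUIRED_FILES = [
    "geneme/genome_fasta.fa",
    "geneme/genome_fasta.fa.fai",
    "SNP/SNP_147.bed",
    "RefSeq/RefFlat_coding_exons.bed",
    "test.bed",
]


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Names of executables that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_files(paths: Iterable[str], root: str = ".") -> List[str]:
    """Relative paths that do not exist as regular files under root."""
    return [p for p in paths if not os.path.isfile(os.path.join(root, p))]


if __name__ == "__main__":
    for item in missing_tools(REQUIRED_TOOLS) + missing_files(REQUIRED_FILES):
        print(f"missing: {item}")
```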

References

Li H et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078-9. doi: 10.1093/bioinformatics/btp352.

Untergasser A et al. Primer3--new capabilities and interfaces. Nucleic Acids Res. 2012;40(15):e115.

Slater G, Birney E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics. 2005;6:31.

Program Description

Global Variables

There are probably too many; not all are used, but all are commented. It is easier (lazier) to pass them between subroutines as globals.

SETUP

   o   Set up file locations (hard coded, but with options to switch to
       input from STDIN)

   o   Take a hard-coded date as input

   o   Set up output and log files

   o   Set up the path to the indexed genome FASTA

   o   Read the dbSNP, RefFlat and variant locus BED files

   o   Set up the padding distance (incremented until primers are designed)
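The SNP handling used later by check_SNPs and write_primer3_file can be illustrated with a per-chromosome sorted index queried by binary search. This is a hypothetical Python sketch: phigis's own data structures may differ, and `snp_near_three_prime` applies the 8-base 3'-end rule directly to a candidate primer rather than through primer3's SEQUENCE_EXCLUDED_REGION.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

THREE_PRIME_WINDOW = 8  # no SNP allowed this close to a primer's 3' end (README)

SnpIndex = Dict[str, List[int]]


def build_snp_index(snps: Iterable[Tuple[str, int]]) -> SnpIndex:
    """Map chromosome -> sorted 0-based SNP positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for chrom, pos in snps:
        index[chrom].append(pos)
    for positions in index.values():
        positions.sort()
    return dict(index)


def snps_in_region(index: SnpIndex, chrom: str, start: int, end: int) -> List[int]:
    """SNP positions p with start <= p < end (half-open, like BED)."""
    positions = index.get(chrom, [])
    lo = bisect.bisect_left(positions, start)
    hi = bisect.bisect_left(positions, end)
    return positions[lo:hi]


def snp_near_three_prime(index: SnpIndex, chrom: str, start: int, length: int,
                         forward: bool) -> bool:
    """True if a SNP lies within THREE_PRIME_WINDOW bases of the primer's 3' end.

    start/length give the primer's genomic span (0-based). A forward (left)
    primer's 3' end is its rightmost base; a reverse (right) primer's 3' end
    is its leftmost genomic base.
    """
    end = start + length
    window = min(THREE_PRIME_WINDOW, length)
    if forward:
        return bool(snps_in_region(index, chrom, end - window, end))
    return bool(snps_in_region(index, chrom, start, start + window))
```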

MAIN LOOP

   For each padding value, try to make primers using primer3

   o   find_exon_co_ords: find the exon that includes the SNP of interest

   o   define_target_range: define the region of interest (ROI), i.e.
       the exon + 15 bases + the padding.

   o   write_primer3_file: use samtools (via a backtick system call) to
       extract the ROI as FASTA and remove the FASTA header; identify any
       SNPs in the region; then populate the BoulderIO file used by
       primer3 with the ROI, excluding the exon + 15 bases from primer
       placement and excluding SNPs within 8 bases of the 3' end of the
       primer.

   o   run_primer3: run primer3 with a backtick system call

       The values $target_exon, $primer3_template, $primer_space,
       $exon_plus and $exclusion_list are interpolated. Primer3 is
       forbidden to use primers with SNPs 8 bases or less from the 3'
       end, and lower case (repeat) sequence is excluded within 10 bases
       of the 3' end.

       SEQUENCE_ID=$target_exon
       SEQUENCE_TEMPLATE=$primer3_template
       PRIMER_TASK=pick_pcr_primers
       PRIMER_PICK_LEFT_PRIMER=1
       PRIMER_PICK_INTERNAL_OLIGO=0
       PRIMER_PICK_RIGHT_PRIMER=1
       PRIMER_OPT_SIZE=22
       PRIMER_MIN_SIZE=18
       PRIMER_MAX_SIZE=28
       PRIMER_LOWERCASE_MASKING=10
       PRIMER_MAX_NS_ACCEPTED=1
       PRIMER_MIN_THREE_PRIME_DISTANCE=3
       PRIMER_PRODUCT_SIZE_RANGE=100-600
       P3_FILE_FLAG=1
       SEQUENCE_TARGET=$primer_space,$exon_plus
       SEQUENCE_EXCLUDED_REGION=$exclusion_list
       =

   o   check_SNPs: check the location of SNPs within the ROI

   o   IF primer3 succeeds, the product size will be > 0 and
       extract_primers_and_amplicon writes the primers and other
       details to a TSV file.

   o   ELSE add 30 to the padding and try again, up to a padding of 185.

   o   Primers are checked using the in silico PCR program ipcress, and
       the ipcress result is added to the log. The amplicon site should
       be unique and the product size should be identical to the
       predicted product size.
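The padding loop and the primer3 input record can be sketched as below. It is a hedged illustration: the starting padding of 5 is an assumption (the README only gives the step of 30 and the limit of 185), `design` stands in for writing the record, running primer3 and reading back the product size, and `boulder_record`, `run_primer3`, `padding_schedule` and `design_with_padding` are hypothetical names. `run_primer3` assumes `primer3_core` is on the path and reads BoulderIO records on standard input, which is how primer3 is normally driven.

```python
import subprocess
from typing import Callable, Iterable, List, Optional, Tuple

# Fixed settings copied from the primer3 record listed in MAIN LOOP.
FIXED_TAGS = [
    ("PRIMER_TASK", "pick_pcr_primers"),
    ("PRIMER_PICK_LEFT_PRIMER", "1"),
    ("PRIMER_PICK_INTERNAL_OLIGO", "0"),
    ("PRIMER_PICK_RIGHT_PRIMER", "1"),
    ("PRIMER_OPT_SIZE", "22"),
    ("PRIMER_MIN_SIZE", "18"),
    ("PRIMER_MAX_SIZE", "28"),
    ("PRIMER_LOWERCASE_MASKING", "10"),
    ("PRIMER_MAX_NS_ACCEPTED", "1"),
    ("PRIMER_MIN_THREE_PRIME_DISTANCE", "3"),
    ("PRIMER_PRODUCT_SIZE_RANGE", "100-600"),
    ("P3_FILE_FLAG", "1"),
]


def boulder_record(seq_id: str, template: str, target_start: int,
                   target_len: int, excluded: str = "") -> str:
    """Build one BoulderIO input record; '=' on its own line ends the record."""
    tags = [("SEQUENCE_ID", seq_id), ("SEQUENCE_TEMPLATE", template)]
    tags += FIXED_TAGS
    tags.append(("SEQUENCE_TARGET", f"{target_start},{target_len}"))
    if excluded:  # space-separated "start,length" pairs
        tags.append(("SEQUENCE_EXCLUDED_REGION", excluded))
    return "".join(f"{tag}={value}\n" for tag, value in tags) + "=\n"


def run_primer3(record: str) -> str:
    """Pipe one record through primer3_core and return its BoulderIO output."""
    result = subprocess.run(["primer3_core"], input=record,
                            capture_output=True, text=True, check=True)
    return result.stdout


def padding_schedule(start: int = 5, step: int = 30, maximum: int = 185) -> List[int]:
    """Paddings to try, in order; the default start of 5 is an assumption."""
    return list(range(start, maximum + 1, step))


def design_with_padding(design: Callable[[int], int],
                        paddings: Iterable[int]) -> Optional[Tuple[int, int]]:
    """Return (padding, product_size) for the first padding that yields a product."""
    for padding in paddings:
        size = design(padding)
        if size > 0:  # primer3 found a pair
            return padding, size
    return None
```

Passing the designer in as a function keeps the retry logic testable without primer3 installed; in a real run it would call `boulder_record`, `run_primer3` and a parser for PRIMER_PAIR_0_PRODUCT_SIZE.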
