Variant Annotation

GRCh37

SNVs and indels

Basic annotation of the merged VCF files from the individual variant callers is carried out in two steps. First, the combined VCF is annotated with information from RepeatMasker and the ENCODE consortium. These files are retrieved from the UCSC Genome Browser and processed as follows:

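# RepeatMasker track: keep chromosome, start, end and repeat class (columns 6-8 and 12)
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/rmsk.txt.gz
gunzip rmsk.txt.gz
# Retain only low-complexity and simple-repeat regions; strip the "chr" prefix
cut -f6-8,12 rmsk.txt | \
    grep -e "Low_complexity" -e "Simple_repeat" | \
    sed 's/^chr//' > rmsk_mod.bed
# Compress and index for region lookups
bgzip rmsk_mod.bed
tabix --preset bed rmsk_mod.bed.gz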

# ENCODE DAC consensus excludable regions: strip the "chr" prefix, then compress and index
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/wgEncodeDacMapabilityConsensusExcludable.bed.gz
gunzip wgEncodeDacMapabilityConsensusExcludable.bed.gz
sed -i 's/^chr//' wgEncodeDacMapabilityConsensusExcludable.bed
bgzip wgEncodeDacMapabilityConsensusExcludable.bed
tabix --preset bed wgEncodeDacMapabilityConsensusExcludable.bed.gz
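To see what the RepeatMasker filtering step above keeps, the same cut/grep/sed pipeline can be run on a tiny stand-in table. This is only a sketch: the three rows are made up, the file names rmsk_toy.txt and rmsk_toy.bed and the row helper are hypothetical, and the 17-column layout is assumed to mirror rmsk.txt (chromosome, start, end in columns 6-8, repeat class in column 12).

```shell
# Scratch directory for the toy files (names are illustrative only)
workdir=$(mktemp -d)

# Helper that writes one made-up row in an rmsk-like 17-column layout:
# args are chromosome, start, end, repeat name, repeat class
row() {
    printf '585\t1000\t0\t0\t0\t%s\t%s\t%s\t-1\t+\t%s\t%s\t%s\t1\t10\t0\t1\n' \
        "$1" "$2" "$3" "$4" "$5" "$5"
}

{
    row chr1 100 200 '(CA)n' Simple_repeat
    row chr1 300 400 AluY SINE
    row chrX 500 600 AT_rich Low_complexity
} > "$workdir/rmsk_toy.txt"

# Same pipeline as above: select columns, keep two repeat classes, drop "chr"
cut -f6-8,12 "$workdir/rmsk_toy.txt" | \
    grep -e "Low_complexity" -e "Simple_repeat" | \
    sed 's/^chr//' > "$workdir/rmsk_toy.bed"

cat "$workdir/rmsk_toy.bed"
```

Only the Simple_repeat and Low_complexity rows survive, as four-column BED-like records whose chromosome names no longer carry the chr prefix; the SINE row is dropped.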

Subsequently, vcf2maf is used to annotate the functional effects of mutations, along with other metadata, using VEP. The --custom-enst argument to vcf2maf takes a list of preferred gene transcript isoforms onto which mutations are mapped. We supply a consensus list built from isoform_overrides_at_mskcc and isoform_overrides_uniprot, in which the MSKCC entry takes precedence for any gene present in both, generated as follows:

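library(dplyr)  # provides %>%, filter() and bind_rows()

t1 = readr::read_tsv('isoform_overrides_at_mskcc')
t2 = readr::read_tsv('isoform_overrides_uniprot')
# Keep UniProt overrides only for genes without an MSKCC override,
# then append the full MSKCC list
t2 %>%
    dplyr::filter(!(gene_name %in% t1$gene_name)) %>%
    dplyr::bind_rows(., t1) %>%
    readr::write_tsv('isoforms')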
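For readers without R at hand, the merge logic of the snippet above can be sketched with awk on toy data. This is a rough equivalent under stated assumptions, not the procedure used to build the real list: both tables are assumed to share the same column order with gene_name in the first column, the enst_id column name and the ENST_PLACEHOLDER values are invented for illustration, and the file names t1.tsv, t2.tsv and isoforms_toy.tsv are hypothetical.

```shell
# Scratch directory for the toy tables (names are illustrative only)
workdir=$(mktemp -d)

# t1 stands in for the MSKCC overrides, t2 for the UniProt overrides
printf 'gene_name\tenst_id\nTP53\tENST_PLACEHOLDER_1\nKRAS\tENST_PLACEHOLDER_2\n' \
    > "$workdir/t1.tsv"
printf 'gene_name\tenst_id\nKRAS\tENST_PLACEHOLDER_3\nEGFR\tENST_PLACEHOLDER_4\n' \
    > "$workdir/t2.tsv"

# Pass 1 (t1): remember every gene that has an entry in t1.
# Pass 2 (t2): print the header plus rows whose gene is absent from t1.
awk -F'\t' '
    NR == FNR { if (FNR > 1) seen[$1] = 1; next }
    FNR == 1 || !($1 in seen)
' "$workdir/t1.tsv" "$workdir/t2.tsv" > "$workdir/isoforms_toy.tsv"

# Append all t1 rows (without their header), mirroring bind_rows(., t1)
tail -n +2 "$workdir/t1.tsv" >> "$workdir/isoforms_toy.tsv"

cat "$workdir/isoforms_toy.tsv"
```

In this toy case KRAS appears in both tables, so only its t1 row reaches the output, while EGFR is carried over from t2. Unlike bind_rows, which aligns columns by name, this sketch relies on both files having identical column order.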
