HudsonAlpha/vep-configs

Best Practices: Running VEP on HudsonAlpha Cluster

Overview

This document describes how to run VEP on the HudsonAlpha cluster. This repository is intended for internal use and documentation. VEP (latest version: v115) is most easily run with Singularity and the Docker image published by Ensembl at https://hub.docker.com/r/ensemblorg/ensembl-vep, which removes the need to install dependencies on the cluster. Recent VEP caches have already been downloaded for hg38.

This repository is deployed at /cluster/lab/gcooper/hg38/resources/vep and is on GitHub at https://github.com/HudsonAlpha/vep-configs

Requirements

You need to have the Singularity module loaded:

module load cluster/singularity

Cache data has been downloaded to /cluster/lab/gcooper/hg38/resources/vep/cache for the following versions, each in the standard, refseq, and merged flavors (we typically use merged):

  • 108_GRCh38
  • 110_GRCh38
  • 115_GRCh38

We point VEP at this directory with the --dir_cache option, rather than using the default location in the user's home directory (~/.vep/).
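To check which cache directory a given flavor and version should resolve to, a small helper can build the expected path. This is only a sketch: it assumes the caches follow VEP's standard layout of <dir_cache>/homo_sapiens[_refseq|_merged]/<version>_GRCh38, and the function name vep_cache_path is hypothetical.

```shell
# Root of the shared caches, as described above.
VEP_CACHE_ROOT=/cluster/lab/gcooper/hg38/resources/vep/cache

# Print the directory VEP would be expected to read for a cache flavor and version.
# Assumes the standard VEP cache layout; vep_cache_path is a hypothetical helper.
# usage: vep_cache_path <standard|refseq|merged> <version>
vep_cache_path() {
  local flavor="$1" version="$2" species=homo_sapiens
  case "$flavor" in
    standard) ;;
    refseq|merged) species="${species}_${flavor}" ;;
    *) echo "unknown cache flavor: $flavor" >&2; return 1 ;;
  esac
  printf '%s/%s/%s_GRCh38\n' "$VEP_CACHE_ROOT" "$species" "$version"
}

# Example: list the merged v115 cache if it exists on this machine.
# ls "$(vep_cache_path merged 115)"
```

If the ls fails for a flavor you expected, the layout assumption above is the first thing to check.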

Quick Start

Use example_run_vep.sh and config_files/vep115.ini. Copy example_run_vep.sh and edit the copy to point at your input and output VCF files, then submit it with:

sbatch example_run_vep.sh
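The real example_run_vep.sh is not reproduced here. The following is only a hypothetical sketch of what an edited copy might look like, written as a heredoc so it can be pasted into a terminal. The job name, memory, time limit, and file names are assumptions; the CPU count is chosen to match the fork 4 setting in vep115.ini, and the singularity command is the one explained in the VEP command section.

```shell
# Write a hypothetical job script; adjust every value marked "assumption".
cat > my_run_vep.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=vep115        # assumption: any name works
#SBATCH --cpus-per-task=4        # matches "fork 4" in vep115.ini
#SBATCH --mem=16G                # assumption: raise if the variant buffer needs more
#SBATCH --time=12:00:00          # assumption

module load cluster/singularity
set -euo pipefail

INPUT=input.vcf.gz               # assumption: your input VCF
OUTPUT=output.vcf.gz             # assumption: your output VCF
CONFIG=/cluster/lab/gcooper/hg38/resources/vep/config_files/vep115.ini

singularity exec -B /cluster -B /scratch \
  docker://ensemblorg/ensembl-vep:release_115.2 \
  /opt/vep/src/ensembl-vep/vep --config "$CONFIG" -i "$INPUT" -o "$OUTPUT"
EOF

# Submit it with: sbatch my_run_vep.sh
```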

What's new?

  • 2025-10-01
    • Created this repository as a documentation brain dump
    • Updated CADD from v1.6 to v1.7
    • Added AlphaMissense
    • Added SpliceAI
    • Added GERP elements
    • Historical VEP configurations (vep108.ini and vep110.ini) have been tweaked to work properly with Docker/Singularity instead of plugins and data installed in /cluster/home/jlawlor/.vep

Config Settings You Should Know About

Set in the config /cluster/lab/gcooper/hg38/resources/vep/config_files/vep115.ini:

  • pick_allele TRUE - restrict VEP to output ONE consequence per input variant.
    • This will usually be the most severe consequence on the primary transcript, but it may not be the one you want when there are overlapping genes or many transcripts. It is usually the best and most straightforward option.
    • If you set this to FALSE, VEP will output every consequence it can find, often several per variant. This can make defining variant filters more challenging. In the tab-separated output, this will result in several rows per input variant.
  • vcf TRUE - Output VCF format. Replace with tab TRUE to get a true tab-separated table. (Remove it to get the default VEP output, which is not fully tab-separated.)
  • fork 4 and buffer_size 500000 - run with 4 processes and a large variant buffer. Tweak these if you need VEP to go faster, and adjust the SBATCH processors and memory accordingly.
  • Custom annotations added:
    • GERP scores (GerpRS)
    • GERP element scores (GerpElementS)
    • AlphaMissense scores (v1.3) (am_class, am_pathogenicity, am_genome, am_protein_variant, am_transcript_id, am_uniprot_id)
    • BRAVO TOPMed allele frequencies (Freeze 8) (topmed_AC, topmed_AN, topmed_AF, topmed_Het, topmed_Hom)
    • CADD scores (v1.7) (CADD_PHRED, CADD_RAW)
  • What about gnomAD?
    • gnomAD v4 is now included with the default VEP annotations instead of being added as a custom annotation. See gnomADe_AF (and similar) for exome AF and gnomADg_AF (and similar) for genome AF.
    • In the past, we added gnomAD annotations as fields like gnomad3_AC. See the archive folder for more info on how to do this, for example vep108.ini
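With vcf TRUE, VEP packs its annotations into the CSQ INFO field and records the field order in the VCF header. A quick way to see which of the fields above are present in a given output file is to split that header line. This is a sketch: csq_fields is a hypothetical helper name, and it assumes the INFO field keeps its default name CSQ (it can be renamed with --vcf_info_field).

```shell
# Print the names of the per-consequence fields, one per line, from a VEP VCF on stdin.
# csq_fields is a hypothetical helper; it assumes the INFO field is named CSQ.
csq_fields() {
  grep -m1 '^##INFO=<ID=CSQ,' \
    | sed -e 's/.*Format: //' -e 's/">$//' \
    | tr '|' '\n'
}

# Example (use zcat instead of cat if the file is compressed):
# cat output.vcf | csq_fields | grep -n -e CADD_PHRED -e am_class -e topmed_AF
```

The line numbers printed by grep -n give each field's position inside a CSQ entry, which is handy when splitting entries on "|" by hand.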

VEP Command

The essential command to run VEP is:

singularity exec -B /cluster -B /scratch docker://ensemblorg/ensembl-vep:release_115.2 /opt/vep/src/ensembl-vep/vep

Running that command will display the help information for vep.

$ singularity exec -B /cluster -B /scratch docker://ensemblorg/ensembl-vep:release_115.2 /opt/vep/src/ensembl-vep/vep
INFO:    Using cached SIF image
#----------------------------------#
# ENSEMBL VARIANT EFFECT PREDICTOR #
#----------------------------------#

Versions:
  ensembl              : 115.266b84d
  ensembl-compara      : 115.ae48a7a
  ensembl-funcgen      : 115.57f7061
  ensembl-io           : 115.25061d3
  ensembl-variation    : 115.b7c2637
  ensembl-vep          : 115.2

Help: dev@ensembl.org , helpdesk@ensembl.org
Twitter: @ensembl

http://www.ensembl.org/info/docs/tools/vep/script/index.html

Usage:
./vep [--cache|--offline|--database] [arguments]

Basic options
=============

--help                 Display this message and quit

-i | --input_file      Input file
-o | --output_file     Output file
--force_overwrite      Force overwriting of output file
--species [species]    Species to use [default: "human"]

--everything           Shortcut switch to turn on commonly used options. See web
                       documentation for details [default: off]
--fork [num_forks]     Use forking to improve script runtime

For full option documentation see:
http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html

Breaking down the different parts:

  • singularity exec - we are telling singularity to execute a command inside a container
  • -B /cluster -B /scratch - we are letting the container see /cluster and /scratch so we can interact with our input/output files
  • docker://ensemblorg/ensembl-vep:release_115.2 - we're telling singularity to go find the Docker image on Docker Hub. This is where we would modify the VEP version if needed
  • /opt/vep/src/ensembl-vep/vep - this is the path inside the container to the vep executable. It doesn't exist on the cluster; it only makes sense inside of the singularity exec command

So a fully-formed VEP command might look like this when adding in the config settings, input, and output:

singularity exec -B /cluster -B /scratch docker://ensemblorg/ensembl-vep:release_115.2 /opt/vep/src/ensembl-vep/vep --config /cluster/lab/gcooper/hg38/resources/vep/config_files/vep115.ini -i input.vcf.gz -o output.vcf.gz
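Because only the image tag and the tool name change between invocations, the long prefix can be wrapped in a shell function. This is a minimal sketch, not something shipped in this repository: the names vep_tool and VEP_RELEASE are hypothetical.

```shell
# Release tag of the ensemblorg/ensembl-vep image; change this to switch VEP versions.
VEP_RELEASE=${VEP_RELEASE:-115.2}

# Hypothetical wrapper: run any tool shipped in the container's ensembl-vep directory.
# usage: vep_tool <vep|filter_vep|INSTALL.pl> [arguments...]
vep_tool() {
  local tool="$1"
  shift
  singularity exec -B /cluster -B /scratch \
    "docker://ensemblorg/ensembl-vep:release_${VEP_RELEASE}" \
    "/opt/vep/src/ensembl-vep/${tool}" "$@"
}

# Example:
# vep_tool vep --config /cluster/lab/gcooper/hg38/resources/vep/config_files/vep115.ini \
#   -i input.vcf.gz -o output.vcf.gz
```

Keeping the release in one variable makes it obvious which VEP version a script was run with.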

Similarly, to run filter_vep, we would use

singularity exec -B /cluster -B /scratch docker://ensemblorg/ensembl-vep:release_115.2 /opt/vep/src/ensembl-vep/filter_vep

From there, we would build the vep command as indicated in Ensembl's instructions (http://useast.ensembl.org/info/docs/tools/vep/script/vep_options.html).
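As an illustration of a complete filter_vep call, the sketch below keeps rare, high-CADD consequences from an annotated VCF. The thresholds, the uncompressed file names, and the helper name filter_rare_damaging are assumptions for illustration. The field names come from the annotations listed under Config Settings, and the options used (-i, -o, --filter, --only_matched, --force_overwrite) are described in Ensembl's filter_vep documentation.

```shell
# Hypothetical helper: keep consequences with CADD_PHRED >= 20 that are rare in,
# or absent from, gnomAD exomes. Thresholds are illustrative assumptions.
# usage: filter_rare_damaging <annotated.vcf> <filtered.vcf>
filter_rare_damaging() {
  local in_vcf="$1" out_vcf="$2"
  singularity exec -B /cluster -B /scratch \
    docker://ensemblorg/ensembl-vep:release_115.2 \
    /opt/vep/src/ensembl-vep/filter_vep \
    -i "$in_vcf" -o "$out_vcf" --force_overwrite --only_matched \
    --filter "CADD_PHRED >= 20 and (gnomADe_AF < 0.01 or not gnomADe_AF)"
}

# Example:
# filter_rare_damaging annotated.vcf filtered.vcf
```

The "or not gnomADe_AF" clause keeps variants that have no gnomAD exome frequency at all, which a plain "gnomADe_AF < 0.01" test would drop.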

Plugins

All the standard VEP plugins (e.g., CADD, AlphaMissense, SpliceAI) are included in the Docker container, so the --dir_plugins option does not need to be set. Plugin data files are still downloaded and configured as usual. LOFTEE is not included. If you need LOFTEE, you would need to manually download all the plugin scripts, including LOFTEE, and set the plugin directory.

How to install other ensembl genomes (specific version or other species)

Run singularity exec -B /cluster -B /scratch docker://ensemblorg/ensembl-vep:release_115.2 /opt/vep/src/ensembl-vep/INSTALL.pl to launch the VEP installation script, which will interactively download the genomes you want. They will go into your home directory at ~/.vep/. To use them, you will need to remove the dir_cache option from the config and potentially modify the assembly, fasta, use_given_ref, merged, and xref_refseq settings.
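INSTALL.pl can also be driven non-interactively. The sketch below uses the installer's documented -a (what to install: c for cache, f for FASTA), -s (species), -y (assembly), and -c (cache directory) options. The helper name install_vep_cache and the mouse example are assumptions for illustration, and this has not been verified against the cluster setup.

```shell
# Hypothetical helper: non-interactive download of a cache (c) and FASTA (f).
# usage: install_vep_cache <species> <assembly> [cache_dir]
install_vep_cache() {
  local species="$1" assembly="$2" cache_dir="${3:-$HOME/.vep}"
  singularity exec -B /cluster -B /scratch \
    docker://ensemblorg/ensembl-vep:release_115.2 \
    /opt/vep/src/ensembl-vep/INSTALL.pl \
    -a cf -s "$species" -y "$assembly" -c "$cache_dir"
}

# Example: mouse cache into the default ~/.vep location.
# install_vep_cache mus_musculus GRCm39
```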

Troubleshooting

Note that the first time you run a VEP command, Singularity will pull and convert the container, so you will see output about copying blobs and starting builds. You may also see warnings mentioning warn rootless; these are harmless.
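If the first-run conversion is a nuisance (for example, when many jobs start at once), one option is to build a local SIF file once with singularity pull and point later commands at it. This is a sketch under assumptions: pull_vep_image is a hypothetical helper, and the destination directory in the example is only a suggestion.

```shell
# Hypothetical helper: build a local SIF once so later runs skip the pull/convert step.
# Prints the path of the SIF file on stdout.
# usage: pull_vep_image <release> [destination_dir]
pull_vep_image() {
  local release="$1" dest="${2:-.}"
  local sif="${dest}/ensembl-vep_${release}.sif"
  # Only pull when the file is missing; send any pull output to stderr.
  [ -f "$sif" ] \
    || singularity pull "$sif" "docker://ensemblorg/ensembl-vep:release_${release}" >&2 \
    || return 1
  printf '%s\n' "$sif"
}

# Example (destination directory is an assumption):
# sif=$(pull_vep_image 115.2 "/scratch/$USER")
# singularity exec -B /cluster -B /scratch "$sif" /opt/vep/src/ensembl-vep/vep --help
```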

What about the cluster/vep module?

Note: there is also a module installed (module load cluster/vep/112). Loading it loads Singularity and defines an alias, so that running vep substitutes an equivalent of singularity exec -B /cluster -B /scratch docker://ensemblorg/ensembl-vep:release_112.0 /opt/vep/src/ensembl-vep/vep. This runs version 112 of the VEP software. Technically, Ensembl doesn't recommend mixing VEP software and cache versions, but it generally works fine with nearby versions. We are not using the module here because it makes it harder to update and change versions.
