This repository describes how to run VEP on the HudsonAlpha cluster and is intended for internal use and documentation. VEP (latest version: v115) is most easily run using Singularity with the Docker image published by Ensembl at https://hub.docker.com/r/ensemblorg/ensembl-vep, which removes the need to install dependencies on the cluster. Recent VEP caches have already been downloaded for hg38.
This repository is deployed at `/cluster/lab/gcooper/hg38/resources/vep` and is on GitHub at https://github.com/HudsonAlpha/vep-configs.
You need to have the singularity module loaded:
module load cluster/singularity
Cache data has been downloaded to `/cluster/lab/gcooper/hg38/resources/vep/cache` for the following versions, in the standard, RefSeq, and merged flavors (we typically use merged):
- 108_GRCh38
- 110_GRCh38
- 115_GRCh38
The config sets this directory with the `dir_cache` option, instead of using the default location in the user's home directory (`~/.vep/`).
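To confirm that a cache version is actually present before pointing `dir_cache` at it, a small check can help. This is a sketch that assumes VEP's usual cache layout, `<dir_cache>/<species>/<version>_<assembly>` (for example `homo_sapiens_merged/115_GRCh38` for the merged cache); the function name `check_vep_cache` is hypothetical.

```shell
# Hypothetical helper: check that a VEP cache exists under a cache root.
# Assumes the layout <cache_root>/<species>/<version>_<assembly>.
check_vep_cache() {
  local cache_root="$1"          # e.g. /cluster/lab/gcooper/hg38/resources/vep/cache
  local species="$2"             # homo_sapiens, homo_sapiens_refseq, or homo_sapiens_merged
  local version="$3"             # e.g. 115
  local assembly="${4:-GRCh38}"  # defaults to GRCh38
  local dir="${cache_root}/${species}/${version}_${assembly}"
  if [[ -d "$dir" ]]; then
    echo "found: $dir"
  else
    echo "missing: $dir" >&2
    return 1
  fi
}
```

For example, `check_vep_cache /cluster/lab/gcooper/hg38/resources/vep/cache homo_sapiens_merged 115` should report the merged v115 cache.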
To run VEP, use `example_run_vep.sh` together with `config_files/vep115.ini`. Copy `example_run_vep.sh`, edit it to point at your input and output VCF files, and run it with:
sbatch example_run_vep.sh
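`example_run_vep.sh` is the intended starting point. If you only need to vary the input and output files, a hypothetical helper such as `submit_vep` can submit the same singularity command through `sbatch --wrap` instead of editing a copy of the script. The resource values (4 CPUs to match `fork 4`, 16G of memory) and the job name are assumptions, not values taken from `example_run_vep.sh`. It also assumes the singularity module is loaded before you call it, since sbatch passes your current environment to the job by default.

```shell
# Hypothetical helper: submit one VEP run via sbatch --wrap.
# Resource values are assumptions: 4 CPUs to match "fork 4", 16G memory.
VEP_IMAGE="docker://ensemblorg/ensembl-vep:release_115.2"
VEP_CONFIG="/cluster/lab/gcooper/hg38/resources/vep/config_files/vep115.ini"

submit_vep() {
  local input="$1" output="$2"
  local wrapped
  # Shell-quote the file names so paths with spaces survive the --wrap string.
  wrapped="singularity exec -B /cluster -B /scratch ${VEP_IMAGE} /opt/vep/src/ensembl-vep/vep --config ${VEP_CONFIG} -i $(printf '%q' "$input") -o $(printf '%q' "$output")"
  local cmd=(sbatch --job-name=vep --cpus-per-task=4 --mem=16G --wrap="$wrapped")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Print the (shell-quoted) sbatch command instead of submitting it.
    printf '%q ' "${cmd[@]}"
    echo
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 submit_vep input.vcf.gz output.vcf.gz` prints the command; drop `DRY_RUN=1` to submit it.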
Changelog:
- 2025-10-01
  - Created this repository as a documentation brain dump
  - Updated CADD from v1.6 to v1.7
  - Added AlphaMissense
  - Added SpliceAI
  - Added GERP elements
  - Tweaked the historical VEP configurations (`vep108.ini` and `vep110.ini`) to work properly with Docker/Singularity instead of relying on plugins and data installed in `/cluster/home/jlawlor/.vep`
Set in the config `/cluster/lab/gcooper/hg38/resources/vep/config_files/vep115.ini`:
- `pick_allele TRUE` - restrict VEP to output ONE consequence per variant allele.
  - VEP chooses this consequence using its default pick order, which prioritizes MANE Select and canonical transcripts, so it is usually the consequence on the primary transcript. It may not be the consequence you want when there are overlapping genes or many transcripts, but it's usually the best and most straightforward option.
  - If you remove this option, VEP will output every consequence it can find, which usually gives several per variant. This can make defining variant filters more challenging. In the tab-separated output, it results in several rows per input variant.
- `vcf TRUE` - output VCF format. Replace it with `tab TRUE` to get a true tab-separated table. (Remove both to get the default VEP output format, which is not fully tab-separated.)
- `fork 4` and `buffer_size 500000` - run with 4 processes and a large variant buffer. Tweak these if you need VEP to go faster, and adjust the SBATCH CPU and memory requests accordingly.
- Custom annotations added:
  - GERP scores (`GerpRS`)
  - GERP element scores (`GerpElementS`)
  - AlphaMissense scores (v1.3) (`am_class`, `am_pathogenicity`, `am_genome`, `am_protein_variant`, `am_transcript_id`, `am_uniprot_id`)
  - BRAVO TOPMed allele frequencies (Freeze 8) (`topmed_AC`, `topmed_AN`, `topmed_AF`, `topmed_Het`, `topmed_Hom`)
  - CADD scores (v1.7) (`CADD_PHRED`, `CADD_RAW`)
- What about gnomAD?
  - gnomAD v4 is now included with the default VEP annotations instead of being added as a custom annotation. See `gnomADe_AF` (and similar fields) for exome allele frequencies and `gnomADg_AF` (and similar fields) for genome allele frequencies.
  - In the past, we added gnomAD annotations as custom fields like `gnomad3_AC`. See the archive folder for how to do this, for example `vep108.ini`.
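With `vcf TRUE`, VEP packs these annotations (custom, plugin, and gnomAD fields alike) into the pipe-delimited `CSQ` INFO field, and the order of the sub-fields is listed after `Format:` in the `##INFO=<ID=CSQ,...>` header line. The sketch below prints that order, which helps when writing filters or parsing the output. `csq_fields` is a hypothetical name, and it assumes the default `CSQ` field name (VEP's `--vcf_info_field` option can change it).

```shell
# Hypothetical helper: list the CSQ sub-field names of a VEP VCF, numbered in order.
csq_fields() {
  local vcf="$1"
  # gzip -cdf decompresses .gz input and passes plain text through unchanged.
  gzip -cdf "$vcf" \
    | grep -m1 '^##INFO=<ID=CSQ,' \
    | sed -e 's/.*Format: //' -e 's/">$//' \
    | tr '|' '\n' \
    | nl
}
```

For example, `csq_fields output.vcf.gz | grep -E 'CADD_PHRED|gnomADe_AF'` shows where those fields sit within each `CSQ` entry.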
The essential command to run VEP is:
singularity exec -B /cluster -B /scratch docker://ensemblorg/ensembl-vep:release_115.2 /opt/vep/src/ensembl-vep/vep
Running that command with no further arguments displays the VEP help message:
$ singularity exec -B /cluster -B /scratch docker://ensemblorg/ensembl-vep:release_115.2 /opt/vep/src/ensembl-vep/vep
INFO: Using cached SIF image
#----------------------------------#
# ENSEMBL VARIANT EFFECT PREDICTOR #
#----------------------------------#
Versions:
ensembl : 115.266b84d
ensembl-compara : 115.ae48a7a
ensembl-funcgen : 115.57f7061
ensembl-io : 115.25061d3
ensembl-variation : 115.b7c2637
ensembl-vep : 115.2
Help: dev@ensembl.org , helpdesk@ensembl.org
Twitter: @ensembl
http://www.ensembl.org/info/docs/tools/vep/script/index.html
Usage:
./vep [--cache|--offline|--database] [arguments]
Basic options
=============
--help Display this message and quit
-i | --input_file Input file
-o | --output_file Output file
--force_overwrite Force overwriting of output file
--species [species] Species to use [default: "human"]
--everything Shortcut switch to turn on commonly used options. See web
documentation for details [default: off]
--fork [num_forks] Use forking to improve script runtime
For full option documentation see:
http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html
Breaking down the different parts:
- `singularity exec` - we are telling singularity to execute a command inside a container
- `-B /cluster -B /scratch` - we are letting the container see `/cluster` and `/scratch` so we can interact with our input/output files
- `docker://ensemblorg/ensembl-vep:release_115.2` - we are telling singularity to fetch the Docker image from Docker Hub. This is where we would change the VEP version if needed.
- `/opt/vep/src/ensembl-vep/vep` - the path to the vep executable inside the container. It doesn't exist on the cluster; it only makes sense inside the `singularity exec` command.
So a fully-formed VEP command might look like this when adding in the config settings, input, and output:
singularity exec -B /cluster -B /scratch docker://ensemblorg/ensembl-vep:release_115.2 /opt/vep/src/ensembl-vep/vep --config /cluster/lab/gcooper/hg38/resources/vep/config_files/vep115.ini -i input.vcf.gz -o output.vcf.gz
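Because the VEP version lives in the image tag, it can help to keep the full invocation in one shell function so that switching releases means changing one variable. A minimal sketch, assuming the singularity module is already loaded; `VEP_VERSION`, `VEP_CONFIG`, `run_vep`, and the `DRY_RUN` switch are hypothetical names.

```shell
# Hypothetical wrapper: one place to change the VEP release and config.
# Assumes the singularity module is already loaded.
VEP_VERSION="${VEP_VERSION:-115.2}"
VEP_CONFIG="${VEP_CONFIG:-/cluster/lab/gcooper/hg38/resources/vep/config_files/vep115.ini}"

run_vep() {
  # Extra arguments ("$@") are passed straight through to vep, e.g. -i/-o.
  local cmd=(
    singularity exec -B /cluster -B /scratch
    "docker://ensemblorg/ensembl-vep:release_${VEP_VERSION}"
    /opt/vep/src/ensembl-vep/vep
    --config "$VEP_CONFIG"
    "$@"
  )
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Print the command instead of running it.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Usage: `run_vep -i input.vcf.gz -o output.vcf.gz`. If you change `VEP_VERSION`, also point `VEP_CONFIG` at a config whose cache version matches.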
Similarly, to run filter_vep, we would use:
singularity exec -B /cluster -B /scratch docker://ensemblorg/ensembl-vep:release_115.2 /opt/vep/src/ensembl-vep/filter_vep
From there, we would build the rest of the vep command as described in [Ensembl's instructions](http://useast.ensembl.org/info/docs/tools/vep/script/vep_options.html).
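As an illustration of the filter_vep syntax, the sketch below keeps records with a HIGH impact consequence or a CADD PHRED score of at least 20. The thresholds are arbitrary examples, `run_filter_vep` is a hypothetical name, and it assumes the input was annotated with the `vep115.ini` config (so `CADD_PHRED` is present in `CSQ`).

```shell
# Hypothetical helper: run filter_vep inside the container on a VEP-annotated file.
FILTER_VEP_IMAGE="docker://ensemblorg/ensembl-vep:release_115.2"

run_filter_vep() {
  local input="$1" output="$2"
  # Example filter only; the thresholds are illustrative, not recommendations.
  local filter="IMPACT is HIGH or CADD_PHRED >= 20"
  local cmd=(
    singularity exec -B /cluster -B /scratch "$FILTER_VEP_IMAGE"
    /opt/vep/src/ensembl-vep/filter_vep
    -i "$input" -o "$output"
    --filter "$filter"
  )
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Print the (shell-quoted) command instead of running it.
    printf '%q ' "${cmd[@]}"
    echo
  else
    "${cmd[@]}"
  fi
}
```

Usage: `run_filter_vep annotated.vcf filtered.vcf`. See Ensembl's filter_vep documentation for the full set of operators.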
All the standard VEP plugins (e.g., CADD, AlphaMissense, SpliceAI) are included in the Docker container, so the `--dir_plugins` option does not need to be set. Plugin data files still need to be downloaded and referenced in the config as usual. LOFTEE is not included; if you need it, you would need to download all the plugin scripts yourself (including LOFTEE) and set `--dir_plugins` to that directory.
To download caches for other genomes or versions, run `singularity exec -B /cluster -B /scratch docker://ensemblorg/ensembl-vep:release_115.2 /opt/vep/src/ensembl-vep/INSTALL.pl` to launch VEP's interactive installation script. By default the files go into your home directory at `~/.vep/`, so you will need to remove the `dir_cache` option from the config and possibly adjust the `assembly`, `fasta`, `use_given_ref`, `merged`, and `xref_refseq` settings.
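INSTALL.pl can also be run non-interactively. The sketch below assumes the standard INSTALL.pl flags `-a cf` (download the cache and FASTA only, since the API is already installed in the container), `-s` (species; `homo_sapiens_merged` for the merged cache), `-y` (assembly), and `-c` (cache directory, so files land somewhere other than `~/.vep/`). The function name and target directory are hypothetical; the target must be under `/cluster` or `/scratch` to be visible inside the container, and `dir_cache` in your config should then point at it.

```shell
# Hypothetical helper: non-interactive cache + FASTA download with INSTALL.pl.
install_vep_cache() {
  local target_dir="$1"                      # must be under /cluster or /scratch
  local species="${2:-homo_sapiens_merged}"  # merged cache by default
  local cmd=(
    singularity exec -B /cluster -B /scratch
    docker://ensemblorg/ensembl-vep:release_115.2
    /opt/vep/src/ensembl-vep/INSTALL.pl
    -a cf -s "$species" -y GRCh38 -c "$target_dir"
  )
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Print the command instead of running it.
    printf '%s\n' "${cmd[*]}"
  else
    mkdir -p "$target_dir" && "${cmd[@]}"
  fi
}
```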
Note that the first time you run a VEP command, singularity will pull and convert the container, so you will see output about copying blobs and building the image. You may also see `warn rootless` warnings, which are harmless.
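Singularity caches the converted image (hence `INFO: Using cached SIF image` on later runs), but if you want a fixed image file that doesn't depend on your personal cache, you can build one with `singularity pull` and pass its path in place of the `docker://` URI. A sketch; `pull_vep_sif` and the SIF location are hypothetical.

```shell
# Hypothetical helper: build a local SIF image once, then reuse it.
pull_vep_sif() {
  local sif="$1"   # e.g. a path under /cluster or /scratch
  local cmd=(singularity pull "$sif" docker://ensemblorg/ensembl-vep:release_115.2)
  if [[ -e "$sif" ]]; then
    # Don't re-pull if the image file already exists.
    echo "already present: $sif"
  elif [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Afterwards, run VEP with `singularity exec -B /cluster -B /scratch /path/to/your.sif /opt/vep/src/ensembl-vep/vep ...`.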
Note: there is also a VEP module (`module load cluster/vep/112`). Loading it loads singularity and defines an alias, so running `vep` is equivalent to `singularity exec -B /cluster -B /scratch docker://ensemblorg/ensembl-vep:release_112.0 /opt/vep/src/ensembl-vep/vep`. This runs version 112 of the VEP software. Ensembl doesn't recommend mixing VEP versions (for example, the v112 software with a v115 cache), but nearby versions generally work fine. We don't use the module here because it makes updating and changing versions harder.