
Commit 5734f06

Merge pull request #8 from UniOfLeicester/releaseCandidate0-1
Release candidate 0.1
2 parents 3d7107c + 29e601e commit 5734f06

29 files changed, +3954 / -2981 lines changed

INSTALLATION.md

Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
# Variant Validator installation instructions

In these instructions, lines that must be entered at the command prompt are preceded with >, like so:

> ls

These instructions will allow you to configure the software on Linux. Mac OS X computers operate similarly.

There are several steps involved in setting up Variant Validator:

* The application files themselves must be cloned from Git.
* The Python environment must be set up. On a LAMP server, only a custom version of Python will do.
* Protobuf must be installed (the conda step below installs version 3.5.1).
* Required Python packages need to be installed, too.
* The databases must be downloaded and set up.
* The configuration files must be changed to point the validator at those databases.

## Virtual environment

Variant Validator currently requires Python 2.7.

When installing Variant Validator it is wise to use a virtual environment, as it requires specific versions of several libraries.

First, download and set up conda (in this case Miniconda, as we don't need all the packages). Adjust the path in the echo command to wherever you installed Miniconda:

> wget https://repo.anaconda.com/miniconda/Miniconda2-latest-Linux-x86_64.sh
> bash Miniconda2-latest-Linux-x86_64.sh
> echo ". /local/miniconda2/etc/profile.d/conda.sh" >> ~/.bashrc
> source ~/.bashrc

Then create the conda environment and install the necessary programs (this should eventually be done in an environment.yml file). Note that installing biotools downgrades setuptools, so setuptools needs to be reinstalled before running the pip command that installs hgvs==1.1.3:

> conda create -n VVenv
> conda activate VVenv
> conda install -c conda-forge sqlite python=2.7 protobuf=3.5.1 docutils python-daemon httplib2 mysql-connector-python mysql-python
> conda install -c auto biotools
> conda install -c bioconda pyliftover pysam
> conda install setuptools numpy
> conda install -c anaconda pytest
> pip install hgvs==1.1.3

The packages required for Variant Validator to function are now set up in the environment "VVenv".

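To confirm that VVenv is usable before going further, a short check like the following can be run inside the activated environment. It is a sketch, not part of Variant Validator: the import names (for example `google.protobuf` for protobuf, `MySQLdb` for mysql-python, `mysql.connector` for mysql-connector-python) are assumed from each package's usual import name, and the check only confirms that each module imports, not which version is installed.

```python
from __future__ import print_function

import importlib
import sys

# Import names assumed for the packages installed above; adjust the list
# if your environment differs.
REQUIRED_MODULES = [
    "google.protobuf",  # protobuf
    "MySQLdb",          # mysql-python
    "mysql.connector",  # mysql-connector-python
    "pyliftover",
    "pysam",
    "numpy",
    "hgvs",
]


def check_modules(names):
    """Return a dict mapping each module name to True if it imports cleanly."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


def python_is_27(version_info=sys.version_info):
    """Variant Validator currently requires Python 2.7."""
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    print("Python 2.7:", python_is_27())
    for module, ok in sorted(check_modules(REQUIRED_MODULES).items()):
        print(module, "OK" if ok else "MISSING")
```

Saved as, say, a hypothetical check_env.py and run with `python check_env.py`, every module should report OK inside VVenv.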
## Installing validator code

To clone this software from Git, use:

> git clone https://github.com/pjdp2/variantValidator.git

This creates a variantValidator folder in the directory you run it in.

> cd variantValidator

Run the installation script to integrate Variant Validator with Python's site-packages:

> python setup.py install

For development purposes, you can instead use

> pip install -e .

to ensure any changes you make in the local variantValidator folder are reflected in your Python site-packages.

## Setting up MySQL

This step is required for Variant Validator to work. Install the packages with:

> sudo apt-get install mysql-server

This will install everything you need and start the database server. Make sure you note down the root account password that you're prompted for during installation!

Check that it is running with:

> sudo service mysql status

If it's not running, use

> sudo service mysql start

to boot it up.

Enter MySQL from any user's shell prompt with

> mysql -u root -p

This will prompt you for the root password you set earlier. Within MySQL, create the Variant Validator user:

> CREATE USER 'vvadmin'@'localhost' IDENTIFIED BY 'var1ant';

Create the database too:

> CREATE DATABASE validator;
> USE validator;

Grant access rights to the vvadmin user:

> GRANT SELECT,INSERT,UPDATE,DELETE ON validator.* TO 'vvadmin'@'localhost';

Quit MySQL with

> \q

MySQL will reply "Bye".

In the VariantValidator/data folder is a copy of the empty MySQL database needed by Variant Validator to run. The software will populate it as variants are run. Upload it to the running MySQL database with:

> mysql -u root -p validator < data/emptyValidatorDump.sql

adjusting the path depending on where your empty database dump is.

You should log into MySQL and check that the database uploaded correctly. Log in as vvadmin with the password "var1ant":

> mysql -u vvadmin -p

Then:

> USE validator;
> SHOW TABLES;

which, if it's set up correctly, should show:

> +---------------------+
> | Tables_in_validator |
> +---------------------+
> | LRG_RSG_lookup      |
> | LRG_proteins        |
> | LRG_transcripts     |
> | refSeqGene_loci     |
> | transcript_info     |
> +---------------------+

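The same table check can be scripted, which is handy when setting up several machines. This is a minimal sketch: the expected names are copied from the SHOW TABLES listing above, and `show_tables` accepts any DB-API cursor. The decoding step is a precaution, on the assumption that some connector versions return names as bytes.

```python
# Table names expected in a freshly loaded validator database, as listed
# by SHOW TABLES above.
EXPECTED_TABLES = {
    "LRG_RSG_lookup",
    "LRG_proteins",
    "LRG_transcripts",
    "refSeqGene_loci",
    "transcript_info",
}


def _as_text(value):
    # Normalise bytes/bytearray table names to str.
    if isinstance(value, (bytes, bytearray)):
        return value.decode("utf-8")
    return value


def show_tables(cursor):
    """Run SHOW TABLES on an open DB-API cursor and return the names as a set."""
    cursor.execute("SHOW TABLES")
    return {_as_text(row[0]) for row in cursor.fetchall()}


def missing_tables(found):
    """Return the expected tables that are absent, sorted for readability."""
    return sorted(EXPECTED_TABLES - set(found))
```

With the mysql-connector-python package installed earlier, `conn = mysql.connector.connect(host='127.0.0.1', user='vvadmin', password='var1ant', database='validator')` followed by `missing_tables(show_tables(conn.cursor()))` should return an empty list.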
## Setting up PostgreSQL

For performance reasons, it's recommended to use a local version of the UTA database. To do this, first install the required packages with:

> sudo apt-get install postgresql postgresql-contrib

You need to switch to the "postgres" user to make anything work initially.

> sudo -i -u postgres

Create a new user with a name matching your user account (in my case, pjdp2). When prompted, make yourself a superuser.

> createuser --interactive

The postgres user doesn't have a Unix password, so use exit to get back to your own account.

> exit

Enter the database with psql. By default you'll be signed into the "postgres" database, which serves as a kind of master database for controlling user accounts.

> psql postgres

Inside psql, create the uta_admin role, and when prompted set its password to "uta_admin".

> CREATE ROLE uta_admin WITH CREATEDB;
> ALTER ROLE uta_admin WITH LOGIN;
> \password uta_admin

Create an empty uta database:

> CREATE DATABASE uta WITH OWNER=uta_admin TEMPLATE=template0;

That's enough setting up. Quit psql with:

> \q

Now that you're back at your own prompt, download the gzipped UTA database and load it into PostgreSQL. You'll be prompted for the uta_admin password.

> wget http://dl.biocommons.org/uta/uta_20180821.pgd.gz
> gzip -cdq uta_20180821.pgd.gz | psql -U uta_admin -v ON_ERROR_STOP=0 -d uta -Eae

The database should now be loaded. If someone else has already loaded it on this server, you can still access the uta database as uta_admin.

If the database returns errors when the validator runs, you will need to change the PostgreSQL authentication methods by editing

> pg_hba.conf

On Linux this file lives in /etc/postgresql/VERSION/main/pg_hba.conf (for example /etc/postgresql/9.3/main/pg_hba.conf), but on other systems you may need to search for it.

Inside the file, change all instances of "peer" to "md5", then restart PostgreSQL so that the change takes effect:

> sudo service postgresql restart

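Editing pg_hba.conf by hand is fine, but the replacement can also be scripted. This sketch rewrites the authentication method only where "peer" stands alone as a whitespace-separated field, and it leaves comment lines and trailing comments untouched. It is not a tool shipped with Variant Validator or PostgreSQL.

```python
import re

# Matches "peer" only when it is a whole whitespace-separated field,
# so names such as "peeruser" are not touched.
_PEER = re.compile(r"(?<!\S)peer(?!\S)")


def peer_to_md5(text):
    """Return pg_hba.conf text with the 'peer' auth method replaced by 'md5'.

    Only the part of each line before any '#' is rewritten, so comments
    (including full comment lines) are preserved exactly.
    """
    out = []
    for line in text.splitlines(True):  # keep original line endings
        body, hash_sign, comment = line.partition("#")
        out.append(_PEER.sub("md5", body) + hash_sign + comment)
    return "".join(out)
```

Run it as root, keep a backup of the original file, and restart PostgreSQL afterwards. Changing the `postgres` line to md5 also means `psql` run as the postgres user will ask for a password.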
## Setting up SeqRepo

Similarly, things run much faster with a local SeqRepo database. You've installed the seqrepo package with pip, but you'll need to download an actual sequence repository. These instructions assume you are using your home directory; you can put it anywhere so long as you modify the config.ini file accordingly.

> mkdir ~/seqrepo

Then make a cup of tea while this command runs:

> seqrepo --root-directory ~/seqrepo pull -i 2018-08-21

After it finishes downloading, check that it installed correctly:

> seqrepo --root-directory ~/seqrepo list-local-instances

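The same check can be done from Python, for example in a script that runs before the validator starts. This is a rough counterpart to `list-local-instances`, under the assumption that each downloaded snapshot (such as 2018-08-21) lives in its own subdirectory of the SeqRepo root; the function names are hypothetical.

```python
import os


def local_seqrepo_instances(root="~/seqrepo"):
    """List instance directories (e.g. '2018-08-21') under a SeqRepo root."""
    root = os.path.expanduser(root)
    if not os.path.isdir(root):
        return []
    return sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )


def seqrepo_instance_dir(version, root="~/seqrepo"):
    """Return the absolute path of one instance, or raise IOError if missing."""
    path = os.path.join(os.path.expanduser(root), version)
    if not os.path.isdir(path):
        raise IOError("SeqRepo instance not found: " + path)
    return path
```

`seqrepo_instance_dir('2018-08-21')` gives the fully expanded path, which is the form that HGVS_SEQREPO_DIR needs.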
## Configuration

See the file MANUAL.md for configuration instructions.

MANUAL.md

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
# Variant Validator Operation Manual

## Configuration

Presently Variant Validator uses a combination of a configuration file and environment variables to configure itself. The configuration file is VariantValidator/configuration/config.ini and should be edited with the current user's details. Specifically, the section:

> [mysql]
> host = 127.0.0.1
> database = validator
> user = vvadmin
> password = var1ant

needs to be changed if the Variant Validator database login details are different.

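To check which credentials a given config.ini will actually supply, the [mysql] section can be read with the standard library. This sketch is not Variant Validator's own loader. It uses RawConfigParser so that a "%" in a password is not treated as interpolation, and it imports the module under its Python 2.7 or Python 3 name.

```python
import os

try:  # Python 3
    import configparser
except ImportError:  # Python 2.7, which Variant Validator currently targets
    import ConfigParser as configparser


def read_mysql_settings(path):
    """Return the [mysql] section of a config.ini file as a plain dict."""
    parser = configparser.RawConfigParser()
    # read() returns the list of files it parsed; empty means it failed.
    if not parser.read(os.path.expanduser(path)):
        raise IOError("could not read configuration file: " + path)
    return dict(parser.items("mysql"))
```

If the section holds only the four keys shown above, the result can be passed straight to `mysql.connector.connect(**settings)` to test the login.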
The section

> [logging]

contains a single variable:

> string = error file console trace

which can be changed to alter the verbosity of the validator output. Alternatively, you can set the environment variable VALIDATOR_DEBUG to a string of the same format.

The string can contain any of the following words:

* file - Writes the logging output to the "vvLog.txt" file. Without the word "file" in the string, the logs are posted to the console instead.
* debug - Logs all events, including debugging.
* trace - Used for diagnosis during development.
* info - Logs information events on the decisions the validator is making.
* warning - Warnings indicate malformed variants. This is the default logging level.
* error - Logs variants that are nonsensical to the point where they cannot be validated.
* critical - Fatal errors that crash the validator are logged at this level.

During a test, this is set to maximum verbosity.

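When using the VALIDATOR_DEBUG route from a script, a small helper can catch a misspelt word before it reaches the validator. This is a sketch: the allowed words are the ones listed above plus "console", which appears in the default string; the helper name is hypothetical. Because this manual does not say exactly when the validator reads the variable, setting it before importing VariantValidator is the safe choice.

```python
import os

# Words documented above for the [logging] string / VALIDATOR_DEBUG.
# "console" is included because it appears in the default string.
KNOWN_WORDS = {"file", "console", "debug", "trace", "info",
               "warning", "error", "critical"}


def set_validator_debug(words, environ=os.environ):
    """Validate the words and store them in VALIDATOR_DEBUG as one string.

    Raises ValueError on any word not in KNOWN_WORDS (e.g. "warn"),
    leaving the environment unchanged.
    """
    words = list(words)
    unknown = [w for w in words if w not in KNOWN_WORDS]
    if unknown:
        raise ValueError("unknown logging word(s): " + ", ".join(unknown))
    value = " ".join(words)
    environ["VALIDATOR_DEBUG"] = value
    return value
```

For example, `set_validator_debug(['info', 'file'])` sends information-level logging to vvLog.txt.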
The validator itself will set environment variables to allow for the correct operation of the HGVS software.

## Operation

Python scripts importing Variant Validator have to set a few remaining configuration variables before they can proceed. These variables must be set in such a way that they don't go out of scope, otherwise the validator won't work.

This example script will validate the variant NM_000088.3:c.589G>T and then print the output as JSON. You might need to change it to point to the correct SeqRepo directory. Note that "~" must be expanded explicitly, because environment variable values are used literally.

> import json
> import os
> seqrepo_current_version = '2018-08-21'
> HGVS_SEQREPO_DIR = os.path.expanduser('~/seqrepo/' + seqrepo_current_version)
> os.environ['HGVS_SEQREPO_DIR'] = HGVS_SEQREPO_DIR
> uta_current_version = 'uta_20180821'
> UTA_DB_URL = 'postgresql://uta_admin:uta_admin@127.0.0.1/uta/' + uta_current_version
> os.environ['UTA_DB_URL'] = UTA_DB_URL
> from VariantValidator import variantValidator
> variantValidator.my_config()

From this point onward, variants can be validated:

> variant = 'NM_000088.3:c.589G>T'
> select_transcripts = 'all'
> selected_assembly = 'GRCh37'
> validation = variantValidator.validator(variant, selected_assembly, select_transcripts)
> print(json.dumps(validation, sort_keys=True, indent=4, separators=(',', ': ')))

Much of the script is currently related to setting up environment variables. In future versions, this information will be stored in a local configuration file.

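Until then, the environment-variable lines can be wrapped in one function so that every script sets them the same way. This is a sketch: the defaults mirror the example script (SeqRepo 2018-08-21, UTA uta_20180821, uta_admin on 127.0.0.1), and the function name is hypothetical rather than part of Variant Validator.

```python
import os


def set_hgvs_environment(seqrepo_version="2018-08-21",
                         uta_version="uta_20180821",
                         seqrepo_root="~/seqrepo",
                         uta_user="uta_admin", uta_password="uta_admin",
                         uta_host="127.0.0.1"):
    """Export HGVS_SEQREPO_DIR and UTA_DB_URL in the form the example uses.

    "~" is expanded here because values stored in os.environ are used
    literally; no shell expands them. Returns both values for checking.
    """
    seqrepo_dir = os.path.join(os.path.expanduser(seqrepo_root), seqrepo_version)
    uta_url = "postgresql://{0}:{1}@{2}/uta/{3}".format(
        uta_user, uta_password, uta_host, uta_version)
    os.environ["HGVS_SEQREPO_DIR"] = seqrepo_dir
    os.environ["UTA_DB_URL"] = uta_url
    return seqrepo_dir, uta_url
```

Call it before `from VariantValidator import variantValidator`, in place of the environment-variable lines of the example.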
The accepted formats for variants include:

> NM_000088.3:c.589G>T
> NC_000017.10:g.48275363C>A
> NG_007400.1:g.8638G>T
> LRG_1:g.8638G>T
> LRG_1t1:c.589G>T
> 17-50198002-C-A (GRCh38)
> chr17:50198002C>A (GRCh38)

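When preparing a batch of inputs, it can help to see which of these styles each one follows. The patterns below are a heuristic pre-check written for the examples listed, NOT Variant Validator's own parser, which accepts and corrects far more. The bracketed build in the list is taken to say which assembly the coordinates refer to, so it is not part of the patterns.

```python
import re

# Heuristic patterns for the example formats listed above (illustrative only).
FORMAT_PATTERNS = [
    ("RefSeq transcript (c.)", re.compile(r"^NM_\d+\.\d+:c\.")),
    ("RefSeq chromosome (g.)", re.compile(r"^NC_\d+\.\d+:g\.")),
    ("RefSeqGene (g.)", re.compile(r"^NG_\d+\.\d+:g\.")),
    ("LRG transcript (c.)", re.compile(r"^LRG_\d+t\d+:c\.")),
    ("LRG genomic (g.)", re.compile(r"^LRG_\d+:g\.")),
    ("VCF-style chrom-pos-ref-alt",
     re.compile(r"^(chr)?[0-9XYM]+-\d+-[ACGT]+-[ACGT]+$")),
    ("chromosome:position ref>alt",
     re.compile(r"^chr[0-9XYM]+:\d+[ACGT]+>[ACGT]+$")),
]


def guess_format(variant):
    """Return the label of the first matching pattern, or None."""
    variant = variant.strip()
    for label, pattern in FORMAT_PATTERNS:
        if pattern.match(variant):
            return label
    return None
```

Inputs for which `guess_format` returns None are still worth submitting, since the validator may be able to correct them; the helper only groups the common cases.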
Possible assemblies are:

> GRCh37
> GRCh38
> hg19
> hg38

You can select all transcripts by passing 'all', or use multiple transcripts with:

> select_transcripts = 'NM_022356.3| NM_001146289.1| NM_001243246.1'

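When the transcript list comes from another source, such as a spreadsheet column, a small helper can build the select_transcripts argument. This is a hypothetical convenience function: it falls back to 'all' for an empty selection and joins IDs with '|' after stripping surrounding whitespace. It makes no claim about whether the validator itself tolerates spaces around '|'.

```python
def transcripts_argument(transcripts=None):
    """Build the select_transcripts string: 'all' or pipe-separated IDs."""
    if not transcripts:
        return "all"
    # Drop blank entries and surrounding whitespace from each transcript ID.
    ids = [t.strip() for t in transcripts if t.strip()]
    return "|".join(ids) if ids else "all"
```

For example, `transcripts_argument(['NM_022356.3', 'NM_001146289.1'])` returns `'NM_022356.3|NM_001146289.1'`.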
Variant Validator produces a dictionary output that contains all possible interpretations of the input variant.

To view the supported transcripts for a gene, pass an HGNC gene symbol (see https://www.genenames.org/):

> variantValidator.gene2transcripts('HTT')

or a RefSeq transcript:

> variantValidator.gene2transcripts('NM_002111.8')

To get the reference sequence for an HGVS variant description:

> variantValidator.hgvs2ref('NM_000088.3:c.589_594del')

## Unit testing

Variant Validator is written to be pytest-compatible. Run

> pytest

in the Variant Validator root folder (the same folder that contains this file). The tests take several minutes to complete, and run through over three hundred common and malformed variants.

README.md

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
# About Variant Validator

VariantValidator is a user-friendly software tool designed to validate the syntax and
parameters of DNA variant descriptions according to the HGVS Sequence Variant
Nomenclature.

VariantValidator ensures that users are guided through the intricacies of the HGVS
nomenclature, e.g. if the user makes a mistake, VariantValidator automatically corrects
the mistake if it can, or provides helpful guidance if it cannot. In addition,
VariantValidator accurately interconverts between transcript variant descriptions and
genomic variant descriptions in HGVS and Variant Call Format (VCF).

VariantValidator interfaces with the hgvs package to parse, format, and manipulate
biological sequence variants. See https://github.com/biocommons/hgvs/ for details of the
hgvs package.

VariantValidator is a highly functional platform enabling high-throughput and embeddable
utilisation of the functionality of https://variantvalidator.org/

## Features

The basic functionality of https://variantvalidator.org/ and VariantValidator is documented here: https://www.ncbi.nlm.nih.gov/pubmed/28967166

VariantValidator simultaneously and accurately projects genomic sequence variations onto all overlapping transcript reference sequences, and vice versa.

Alternatively, genomic sequence variations can be projected onto a single specified transcript reference sequence, or a specified subset of transcript reference sequences, for any given gene.

Projection of sequence variations between reference sequences takes account of discrepancies between genomic and transcript reference sequences, thus ensuring an accurate prediction of the effect on encoded proteins for every gene.

For sequence variations falling within the open reading frames of genes, VariantValidator automatically projects sequence variants via the transcript reference sequence onto genome builds GRCh38, GRCh37, hg38 and hg19 (HGVS format and VCF components), including projection onto relevant alternative genomic reference sequences, the composition of which varies between patched GRC genome builds and static hg genome builds.

## Pre-requisites

Variant Validator will work on Mac OS X or Linux-compatible computers.

Required software:

* MySQL
* Python 2.7

Optional software:

* Postgres version 9.5 or above (Postgres 10 is not supported)
* SQLite version 3.8.0 or above

For installation instructions please see INSTALLATION.md

## Operation and configuration

Please see MANUAL.md

## License

Please see LICENSE.txt

## Cite us

Hum Mutat. 2017 Oct 1. doi: 10.1002/humu.23348

VariantValidator: Accurate validation, mapping and formatting of sequence variation descriptions.

Freeman PJ, Hart RK, Gretton LJ, Brookes AJ, Dalgleish R.

> Copyright (C) 2018 Peter Causey-Freeman, University of Leicester
>
> This program is free software: you can redistribute it and/or modify
> it under the terms of the GNU Affero General Public License as
> published by the Free Software Foundation, either version 3 of the
> License, or (at your option) any later version.
>
> This program is distributed in the hope that it will be useful,
> but WITHOUT ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> GNU Affero General Public License for more details.
>
> You should have received a copy of the GNU Affero General Public License
> along with this program. If not, see <https://www.gnu.org/licenses/>.
