Install prerequisites:
sudo apt-get install \
apache2 apache2-threaded-dev ca-certificates cron curl git-core \
libapache2-mod-php5 libapache2-mod-python libmysqlclient15-dev \
make mysql-client mysql-server patch \
php5 php5-curl php5-dev php5-gd php5-mysql php-db \
python-biopython python-dev python-mysqldb python-pyrex \
rsync unzip wget zip at --fix-missing
sudo a2enmod php5
sudo a2enmod rewrite
sudo a2enmod expires
sudo a2enmod negotiation
sudo /etc/init.d/apache2 restart
Remove www-data from /etc/at.deny so the web server user can schedule jobs with at:
sudo perl -ni~ -e 'print unless /www-data/' /etc/at.deny
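To confirm the edit did what you expect, a check along these lines may help.
This is only a sketch: the helper name user_is_denied and the scratch files are
made up for illustration, and the demonstration works on a temporary file rather
than the real /etc/at.deny.

```shell
# Hypothetical helper: succeeds if any line of an at.deny-style file
# mentions the given user name.
user_is_denied() {
    grep -qF -- "$1" "$2"
}

# Demonstrate on a scratch file instead of the real /etc/at.deny.
demo_deny=$(mktemp)
printf 'backup\nwww-data\nnobody\n' > "$demo_deny"

# Same effect as the perl one-liner: keep every line not containing www-data.
grep -v 'www-data' "$demo_deny" > "$demo_deny.filtered"

if user_is_denied www-data "$demo_deny.filtered"; then
    echo "www-data is still denied"
else
    echo "www-data may now use at"
fi
```

On a real system the same helper could be pointed at /etc/at.deny (reading that
file may require sudo).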
If apt-get cannot find python-biopython, make sure the universe repository
is enabled in your apt sources list (/etc/apt/sources.list).
If your Python is older than 2.6, install the "multiprocessing" backport from
http://pypi.python.org/pypi/multiprocessing/#downloads
wget http://pypi.python.org/packages/source/m/multiprocessing/multiprocessing-2.6.2.1.tar.gz
tar xzf multiprocessing-2.6.2.1.tar.gz
cd multiprocessing-2.6.2.1
sudo python setup.py install
Clone the get-evidence repository:
git clone git://git.clinicalfuture.com/get-evidence.git
Download and extract php-openid-2.1.3 and textile-2.0.0, and apply the patch(es):
cd ~/get-evidence
make install
Set MySQL server character set:
sudo perl -pi~ -e '
s:\n:\ndefault-character-set = utf8\n: if m:\[(client|mysqld)\]:;
' /etc/mysql/my.cnf
sudo /etc/init.d/mysql restart
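The perl substitution adds a "default-character-set = utf8" line directly after
the [client] and [mysqld] section headers. If you would like to preview that
edit before touching the real file, the sketch below applies the same rule with
awk to a scratch file. The sample contents and the helper name
preview_charset_edit are made up for illustration.

```shell
# Scratch copy standing in for /etc/mysql/my.cnf (contents are made up).
demo_cnf=$(mktemp)
cat > "$demo_cnf" <<'EOF'
[client]
port = 3306
[mysqld_safe]
nice = 0
[mysqld]
user = mysql
EOF

# Print every line; after a [client] or [mysqld] header, also print the
# character set line. This mirrors the perl substitution.
preview_charset_edit() {
    awk '{ print } /\[(client|mysqld)\]/ { print "default-character-set = utf8" }' "$1"
}

preview_charset_edit "$demo_cnf"
```

Note that [mysqld_safe] is left alone, because the pattern requires the closing
bracket right after "mysqld". Newer MySQL releases may reject
default-character-set under [mysqld] (character-set-server is the usual
replacement there), so check the documentation for your server version if the
restart fails.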
Create the MySQL database and user. Replace "shakespeare" with your own
password; the same password is needed again later (for example in
public_html/config.php):
mysql -u root -p
[type in MySQL root password]
create database evidence character set = utf8;
create user evidence@localhost identified by 'shakespeare';
grant all privileges on evidence.* to evidence@localhost;
exit
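If you prefer not to type the statements by hand, something like the following
might work. It is a sketch, not part of GET-Evidence: the helper name
evidence_db_sql is made up, and it simply prints the three statements with your
password filled in so you can inspect them first.

```shell
# Hypothetical helper: print the setup statements for a given password.
# The password must not contain a single quote, since it is pasted into
# the SQL string literal as-is.
evidence_db_sql() {
    cat <<EOF
create database evidence character set = utf8;
create user evidence@localhost identified by '$1';
grant all privileges on evidence.* to evidence@localhost;
EOF
}

# Inspect the statements before running them.
evidence_db_sql 'shakespeare'
```

Once the output looks right, it can be piped to the client, for example
evidence_db_sql 'your-password' | mysql -u root -p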
Create a directory to hold uploaded genomes and analysis results, owned by
the web server user (its subdirectories, and the environment variables that
point to them, are set up in a later step):
sudo mkdir /home/trait
sudo chown www-data:www-data /home/trait
Point Apache's DocumentRoot at public_html and turn on .htaccess support.
Replace /path/to/get-evidence with the real path to your local git repo, and
/path/to/your/trait/data/directory with the directory where uploaded data and
analysis data will be stored (/home/trait in the example above):
DocumentRoot /path/to/get-evidence/public_html
<Directory /path/to/get-evidence/public_html>
AllowOverride All
# Restrict PHP access to the html directory of this user!
php_admin_value open_basedir "/path/to/get-evidence:/path/to/your/trait/data/directory:/usr/share/php:/tmp:/dev/urandom"
php_value include_path ".:/path/to/get-evidence/public_html:/usr/share/php"
</Directory>
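Because the repository path appears four times and the data directory once, it
is easy to miss a placeholder when editing by hand. As a sketch, the fragment
can be generated from the two paths instead. The helper name
render_apache_fragment is made up, and the output still has to be pasted into
your own Apache site configuration.

```shell
# Hypothetical helper: print the Apache fragment for a given repository
# path ($1) and trait data directory ($2).
render_apache_fragment() {
    cat <<EOF
DocumentRoot $1/public_html
<Directory $1/public_html>
AllowOverride All
php_admin_value open_basedir "$1:$2:/usr/share/php:/tmp:/dev/urandom"
php_value include_path ".:$1/public_html:/usr/share/php"
</Directory>
EOF
}

render_apache_fragment "$HOME/get-evidence" /home/trait
```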
Recent versions of Ubuntu disable PHP interpretation in home directories.
Open /etc/apache2/mods-available/php5.conf; if you see
"To re-enable php in user directories..." then comment out the lines it
specifies.
Put the real database password and data directory path (see below) in
public_html/config.php like this (make sure there is no leading space or
anything else before "<?php"):
<?php
$gDbPassword = "shakespeare";
$gBackendBaseDir = "/home/trait"; // (can omit if using default)
?>
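Anything that precedes "<?php" is sent to the browser as output, so it can be
worth checking the first bytes of config.php after editing it. The sketch below
shows one way to do that. The helper name starts_with_php_tag and the scratch
files are made up for illustration.

```shell
# Hypothetical helper: succeeds only if the file's first five bytes are "<?php".
starts_with_php_tag() {
    [ "$(head -c 5 "$1")" = "<?php" ]
}

# Scratch files for illustration: one correct, one with a leading space.
demo_dir=$(mktemp -d)
printf '<?php\n$gDbPassword = "shakespeare";\n?>\n' > "$demo_dir/good.php"
printf ' <?php\n$gDbPassword = "shakespeare";\n?>\n' > "$demo_dir/bad.php"

for f in "$demo_dir/good.php" "$demo_dir/bad.php"; do
    if starts_with_php_tag "$f"; then
        echo "$f: ok"
    else
        echo "$f: something precedes <?php"
    fi
done
```

The same check would also catch an invisible byte-order mark that some editors
write at the start of a file.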
Visit http://{host}/install.php to create tables.
Download and import GET-Evidence's SQL dump:
wget http://evidence.personalgenomes.org/get-evidence.sql.gz
gunzip get-evidence.sql.gz
mysql -u root -p evidence < get-evidence.sql
NOTE (MPB 2010-09-19): Why aren't dbSNP and GeneTests data in the SQL dump?
Add GeneTests data:
cd ~/get-evidence
mkdir tmp
sudo wget -O/home/trait/data/genetests-data.txt \
ftp://ftp.ncbi.nih.gov/pub/GeneTests/data_to_build_custom_reports.txt
sudo chown www-data /home/trait/data/genetests-data.txt
./import_genetests_data.php /home/trait/data/genetests-data.txt
Add dbSNP data (newer versions of dbSNP should work just as well):
wget -Otmp/dbsnp.bcp.gz ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/database/b130_archive/b130_SNPChrPosOnRef_36_3.bcp.gz
./import_dbsnp.php tmp/dbsnp.bcp.gz
Add OMIM data:
cd ~/get-evidence
make import_omim
Make sure genome analysis server is executable:
cd ~/get-evidence/server/
chmod +x genome_analyzer.py
Modify php.ini settings to enable genome uploads and extend idle
session timeouts beyond the default 24 minutes. On Ubuntu you can
either edit /etc/php5/apache2/php.ini or create
/etc/php5/conf.d/get-evidence.ini with the following values:
magic_quotes_gpc = Off
max_input_time = 600
post_max_size = 512M
upload_max_filesize = 512M
memory_limit = 128M
session.gc_maxlifetime = 172800
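For reference, session.gc_maxlifetime is given in seconds: 172800 seconds is 48
hours, while the default of 1440 seconds is the 24 minutes mentioned above. The
sketch below writes the settings to a scratch file and reads them back. The
helper name ini_value is made up, and the scratch file stands in for
/etc/php5/conf.d/get-evidence.ini.

```shell
# Scratch location for illustration; on Ubuntu the real file would be
# /etc/php5/conf.d/get-evidence.ini (writing there needs root).
ini_file=$(mktemp)
cat > "$ini_file" <<'EOF'
magic_quotes_gpc = Off
max_input_time = 600
post_max_size = 512M
upload_max_filesize = 512M
memory_limit = 128M
session.gc_maxlifetime = 172800
EOF

# Hypothetical helper: read one setting's value from an ini file.
# The setting name is used as a regular expression, so escape any dots.
ini_value() {
    sed -n "s/^$1[[:space:]]*=[[:space:]]*//p" "$2"
}

lifetime=$(ini_value 'session\.gc_maxlifetime' "$ini_file")
echo "session lifetime: $((lifetime / 3600)) hours"
echo "upload limit: $(ini_value upload_max_filesize "$ini_file")"
```

post_max_size is kept at least as large as upload_max_filesize here, since an
upload travels inside the POST body.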
If you use Debian's PHP packages, make sure your
/usr/lib/php5/maxlifetime script checks all *.ini files in the config
directories (like PHP itself does), not just php.ini. You might need
to make this change:
-for ini in /etc/php5/*/php.ini; do
+for ini in /etc/php5/*/*.ini; do
Alternatively, you can create a php.ini file in a new
/etc/php5/get-evidence/ directory, and put the session.gc_maxlifetime
setting in there as well. PHP will ignore it but the debian script
will pay attention to it, even if your modified maxlifetime script
gets overwritten by an upgrade.
sudo mkdir /etc/php5/get-evidence
echo 'session.gc_maxlifetime = 172800' | sudo tee /etc/php5/get-evidence/php.ini
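To illustrate why the extra php.ini helps, here is a sketch of the general idea
behind such a script: scan a set of ini files and keep the largest
session.gc_maxlifetime. This is not the Debian script itself. The helper name
max_gc_lifetime and the scratch directory are made up for illustration.

```shell
# Hypothetical helper: print the largest session.gc_maxlifetime found in
# the given ini files (0 if none of them sets it).
max_gc_lifetime() {
    max=0
    for ini in "$@"; do
        [ -r "$ini" ] || continue
        cur=$(sed -n 's/^session\.gc_maxlifetime[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$ini" | tail -n 1)
        if [ -n "$cur" ] && [ "$cur" -gt "$max" ]; then
            max=$cur
        fi
    done
    echo "$max"
}

# Scratch tree standing in for /etc/php5 (contents are made up).
demo_etc=$(mktemp -d)
mkdir -p "$demo_etc/apache2" "$demo_etc/get-evidence"
echo 'session.gc_maxlifetime = 1440' > "$demo_etc/apache2/php.ini"
echo 'session.gc_maxlifetime = 172800' > "$demo_etc/get-evidence/php.ini"

max_gc_lifetime "$demo_etc"/*/php.ini
```

With only the apache2 file present the result would be 1440, which is how
sessions can end up being cleaned after 24 minutes even though PHP itself was
configured for longer.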
Populate the upload directory with its initial directory structure:
cd ~/get-evidence/server/script/
USER=www-data SOURCE=$HOME/get-evidence CORE=$HOME/get-evidence/server \
CONFIG=/home/trait/config TMP=/home/trait/tmp \
DATA=/home/trait/data UPLOAD=/home/trait/upload LOG=/home/trait/log \
BASE_URL=http://localhost/ ./configure.sh
source ~/get-evidence/server/script/config-local.sh
sudo -u $USER mkdir -p $TMP $UPLOAD $LOG $CONFIG $DATA
Become root, load the environment variables, and install the genome analysis
server init script:
cd ~/get-evidence/server/script
sudo su
source defaults.sh
perl -p -e 's/%([A-Z]+)%/$ENV{$1}/g' \
< $SOURCE/server/script/genome-analyzer.in \
> /etc/init.d/genome-analyzer.tmp
chmod 755 /etc/init.d/genome-analyzer.tmp
chown 0:0 /etc/init.d/genome-analyzer.tmp
mv /etc/init.d/genome-analyzer.tmp /etc/init.d/genome-analyzer
update-rc.d genome-analyzer start 20 2 3 4 5 . stop 80 0 1 6 .
exit
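The perl command replaces every %NAME% placeholder in genome-analyzer.in with
the environment variable of the same name, and a placeholder whose variable is
not in the environment is replaced with nothing. A check like the sketch below
may help spot that before the init script is installed. The helper name
missing_placeholders, the scratch template, and the variable names in it are
made up for illustration.

```shell
# Hypothetical helper: list placeholders of the form %NAME% (capital
# letters only, as in the perl pattern) whose variable is not exported.
missing_placeholders() {
    grep -o '%[A-Z][A-Z]*%' "$1" | tr -d '%' | sort -u |
    while read -r name; do
        printenv "$name" > /dev/null || echo "$name"
    done
}

# Scratch template for illustration; the variable names are made up.
demo_tpl=$(mktemp)
printf 'RUN_AS=%%DEMOSET%%\nLOGDIR=%%DEMOUNSET%%\n' > "$demo_tpl"
export DEMOSET=www-data
unset DEMOUNSET

missing_placeholders "$demo_tpl"
```

Run against the real template in the same root shell, after sourcing
defaults.sh, an empty result suggests that every placeholder will be filled in.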
Run install-user.sh as www-data (this includes some file downloads):
cd ~/get-evidence/server/script/
source config-local.sh
sudo -u $USER ./install-user.sh
Build python extensions:
cd ~/get-evidence/server
python setup.py build_ext --inplace
Start genome analysis server:
sudo /etc/init.d/genome-analyzer start
Set up a cron job to run "make daily" once a day (at 03:12 in this example):
echo "12 3 * * * $USER cd $HOME/get-evidence && make daily" | sudo tee /etc/cron.d/get-evidence
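Note that $USER and $HOME in the echo command are expanded by the shell you run
it from, so the line that ends up in /etc/cron.d depends on your current
environment (for example, on whether config-local.sh has been sourced). As a
sketch, the entry can be built from explicit values and inspected first. The
helper name cron_entry and the example path are made up.

```shell
# Hypothetical helper: build the /etc/cron.d entry for a given user ($1)
# and repository path ($2). Fields: minute hour day-of-month month
# day-of-week user command.
cron_entry() {
    echo "12 3 * * * $1 cd $2 && make daily"
}

cron_entry www-data /home/me/get-evidence
```

Once the line looks right, it can be piped to
sudo tee /etc/cron.d/get-evidence as in the command above.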
Run through the daily make once to set up the flat files, some of which
GET-Evidence will expect to find.
cd ~/get-evidence
make daily
------
The following are old instructions created prior to Trait-o-matic
integration. Parts of them may still be useful (e.g. as a record of how
the tables found in the SQL dump were made), so they are kept here.
They are *not* required to use the current version of GET-Evidence.
Run "make" to import genomes from Trait-o-matic.
make
Import dbSNP:
wget -Otmp/dbsnp.bcp.gz ftp://ftp.ncbi.nih.gov/snp/database/organism_data/human_9606/b130_SNPChrPosOnRef_36_3.bcp.gz
./import_dbsnp.php tmp/dbsnp.bcp.gz
wget -Otmp/snp130.txt.gz http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/snp130.txt.gz
wget -Otmp/snp130.sql http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/snp130.sql
mysql -uevidence -p evidence < tmp/snp130.sql
if [ -e tmp/fifo ]; then rm tmp/fifo; fi
mkfifo tmp/fifo
gunzip < tmp/snp130.txt.gz > tmp/fifo &
echo "load data local infile 'tmp/fifo' into table snp130 fields terminated by '\t' lines terminated by '\n'" | mysql -uevidence -p evidence
Import PharmGKB data:
wget -Otmp/variantAnnotations.zip http://www.pharmgkb.org/commonFileDownload.action?filename=variantAnnotations.zip
(cd tmp && unzip variantAnnotations.zip)
./import_pharmgkb.php tmp/variantAnnotations.tsv
Import OMIM data using omim.tsv from Trait-o-matic import process:
./import_omim.php omim.tsv
Import GWAS data using the spreadsheet downloaded from the web site (first
convert it from the proprietary format to comma-separated fields, optionally
double-quoted):
** IMPORTANT: the ordering of the following import steps is relevant.
**
** Run import_genomes.php first (see above)
** Then import_variant_locations.php
** Then import_gwas.php
** (relies on variant_locations to look up chr,pos->AA and add variants)
** Then import_1000genomes.php
** (discards too many allele freqs if import_gwas hasn't added variants)
** Then update_variant_frequency.php
** (merges frequencies from hapmap/import_genomes and import_1000genomes)
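The ordering above can be written down as a small dry run. This is a sketch
only: the helper name run_steps is made up, and the script arguments are left
out because they depend on your data files. Called with "echo" it prints the
plan without executing anything.

```shell
# The order described above. Arguments are left out because they depend
# on your data files.
import_steps="import_genomes.php
import_variant_locations.php
import_gwas.php
import_1000genomes.php
update_variant_frequency.php"

# Hypothetical helper: run a command for each step in order and stop at
# the first failure. With "echo" as the command it only prints the plan.
run_steps() {
    for step in $import_steps; do
        "$@" "./$step" || { echo "stopped at $step" >&2; return 1; }
    done
}

run_steps echo
```

Stopping at the first failure matters here because each later step relies on
rows added by an earlier one.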
Look up gene/aa changes for GWAS SNPs:
1. perl -ne 'print "$1\n" while /\brs(\d+)\b/g' < gwas.csv \
| sort -u > /tmp/gwas.rs
2. on trait-o-matic host, using dbsnp database:
CREATE TEMPORARY TABLE acgt (allele CHAR(1) PRIMARY KEY);
INSERT INTO acgt VALUES ('A'),('C'),('G'),('T');
CREATE TEMPORARY TABLE gwas_rs (gwas_snp_id INT UNSIGNED PRIMARY KEY);
LOAD DATA LOCAL INFILE '/tmp/gwas.rs' INTO TABLE gwas_rs;
ALTER TABLE gwas_rs ADD chr CHAR(6), ADD chr_pos INT UNSIGNED;
UPDATE gwas_rs
LEFT JOIN SNPChrPosOnRef dbsnp
ON snp_id=gwas_snp_id
SET gwas_rs.chr=dbsnp.chr,
gwas_rs.chr_pos=dbsnp.pos+1;
SELECT * FROM gwas_rs INTO OUTFILE '/tmp/gwas.chr';
SELECT CONCAT('chr',chr),'gwas','SNP',chr_pos,chr_pos,'.','+','.',
CONCAT('alleles ',allele,';dbsnp rs',gwas_snp_id)
FROM gwas_rs
LEFT JOIN acgt ON 1=1
WHERE chr IS NOT NULL AND chr NOT LIKE 'Multi%'
INTO OUTFILE '/tmp/gwas.gff.txt';
3. upload /tmp/gwas.gff.txt to Trait-o-matic
4. download nsSNPs from Trait-o-matic results page and save to /tmp/gwas.ns.gff
5. ./gwas_gff2tsv /tmp/gwas.ns.gff > /tmp/gwas.ns.tsv
6. ./import_variant_locations.php /tmp/gwas.ns.tsv
7. copy ns.json from Trait-o-matic output directory and save to /tmp/gwas.ns.json
8. ./import_hapmap_ns_json.php /tmp/gwas.ns.json
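Step 1 pulls every rs number out of the spreadsheet and removes duplicates. As
a sketch of what that produces, the example below does the same with grep and
sed on a made-up CSV. The helper name extract_rs_numbers and the sample rows
are assumptions for illustration, not real GWAS data.

```shell
# Scratch CSV for illustration (contents are made up).
demo_csv=$(mktemp)
printf '"trait A",rs123,0.01\n"trait B","rs45; rs123",0.2\n' > "$demo_csv"

# Hypothetical helper: print the unique rs numbers found in a file,
# without the "rs" prefix, one per line.
extract_rs_numbers() {
    grep -ow 'rs[0-9][0-9]*' "$1" | sed 's/^rs//' | sort -u
}

extract_rs_numbers "$demo_csv"
```

The output is a plain list of numeric IDs, which is the shape that the
LOAD DATA statement in step 2 expects for the gwas_snp_id column.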
Import the GWAS comments for "other external references":
./import_gwas.php gwas.csv
Download 1000genomes data (*.hap.*) from
http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/2009_04/
Import 1000genomes data:
./import_1000genomes.php /tmp/*.hap.2009_04.gz
Import EVS frequency data. See ./evs-import.txt, then:
./evs-getev-reformat.pl tmp/ESP5400_getev-aa-changes_allele_freqs.txt | ./import_variant_frequency.php /dev/stdin EVS
Merge variant frequencies from multiple databases:
./update_variant_frequency.php
Import genenames database:
mkdir -p tmp
wget -O./tmp/genenames.txt 'http://www.genenames.org/cgi-bin/hgnc_downloads.cgi?title=HGNC+output+data&hgnc_dbtag=onlevel=pri&=on&order_by=gd_app_sym_sort&limit=&format=text&.cgifields=&.cgifields=level&.cgifields=chr&.cgifields=status&.cgifields=hgnc_dbtag&&where=&status=Approved&status_opt=1&submit=submit&col=gd_hgnc_id&col=gd_app_sym&col=gd_app_name&col=gd_status&col=gd_prev_sym&col=gd_aliases&col=gd_pub_chrom_map&col=gd_pub_acc_ids&col=gd_pub_refseq_ids'
./import_genenames.php ./tmp/genenames.txt
Import genetests database:
wget -O./tmp/genetests-data.txt \
ftp://ftp.ncbi.nih.gov/pub/GeneTests/data_to_build_custom_reports.txt
./import_genetests_data.php ./tmp/genetests-data.txt