-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME
225 lines (158 loc) · 8.12 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
____ _ _____
/ ___| ___| | ___ __|_ _|
\___ \ / _ \ |/ _ \/ __|| |
___) | __/ | __/ (__ | |
|____/ \___|_|\___|\___||_|
Created by RidgeLab Group, BYU Bioinformatics
Table of Contents
-----------------
I. Introduction
II. Installation Instructions
III. Usage Instructions and Examples
IV. Funding and Acknowledgements
V. Contact
I. Introduction
---------------
SelecT is a software tool developed to ___.
Please see our paper in __journal__ for further information:
http://sub-domain.domain.tld/some/path/to/resource
II. Installation Instructions
-----------------------------
To install SelecT, first ensure the Java Runtime Environment (JRE) is installed
on your machine. Second, download the software from the git repository as
follows:
git clone https://github.com/ridgelab/SelecT.git
III. Usage Instructions and Examples
-------------------------------------
Instructions are created for use on a high-performance computing cluster.
Modifications for individual setup may be necessary. The pipeline has been
divided into three phases.
Please note, a log file will be created in the current working directory when
any part of SelecT is run. Should the same part of SelecT be run again in the
same directory, a number (first `1', then `2', etc.) will be appended to the new
logfile name so as to avoid collisions.
--------------------------------
| PHASE 1 -- Environment Setup |
--------------------------------
Required Positional Arguments:
[1] Data Directory Directory should contain all phased VCF or
HAP/LEGEND file required for selection analysis. File names must
contain proper flags and file extensions. Include optional (but
highly recommended) Ancestral data embedded in VCF files or as
separate LEGEND/EMF file
[2] Map Directory Directory that contains all required genetic map
files for SelecT analysis. File names must contain proper
chromosome flags.
[3] Start Chromosome Must be a number between 1-22; sex chromosomes not
yet supported.
[4] End Chromosome Must be a number between 1-22 and greater or equal
to Start Chromosome; sex chromosomes not yet supported.
[5] Target Population Population identifier for experimental population
TST can be used if no standard indentifier exists
[6] Cross Population Population identifier for cross population TST can
be used if no standard indentifier exists. Cross Population
cannont be the same as Target Population.
Optional Arguments:
--out_pop Outgroup Population Population identifier for outgroup
population. TST can be used if no standard indentifier
exists. Outgroup Population cannont be the same as Target
Population.
--working_dir Working Directory Defines the directory where SelecT will
create a new working directory. Default is current
directory.
--win_size Window Size For changing SelecT analysis window size (in
megabases. Default is 0.5Mb.
Examples:
java -Xmx[MB]m -jar EnviSetup.jar [1] [2] [3] [4] [5] [6] \
--working_dir=path/to/directory
java -Xmx3000m -jar EnviSetup.jar example/haplegend_data example/map 21 21 CEU YRI \
--working_dir=example
-----------------------------------
| PHASE 2 -- Calculate Statistics |
-----------------------------------
Required Positional Arguments:
[1] Working Directory SelecT working directory created in Phase 1.
Working Directory name can be changed but subdirectory names
must be unchanged.
[2] Simulation Directory Directory where simulations can be found.
Must contain simulation file neutral_simulation.tsv and
selection_simulation.tsv. These can be found here:
https://github.com/ridgelab/SelecT/tree/master/example/sim
[3] Chromosome Chromosome number where window can be found
[4] Window Number Window index number as defined by SelecT evironment
setup. See SelecT_workspace/envi_files/all_wins for window
ranges.
Optional Arguments:
-inon Non-absolute iHS Runs iHS score probabilities where large
negative scores ONLY are associated with selection (replicate
CMS_local). Defualt is large positive AND negative iHS scores
equate to greater selection (Voight).
--prior_prob Prior Probability Set Prior Probability to a custom value
between 0.0 and 1. Defaults to (1 / actual number of
variants within window).
-pp Prior Probability Flag Sets Prior Probability to 1/10,000
(replicate CMS_local).
--daf_cutoff DAF Cutoff Defines the derived dllele frequency cutoff
for compose score. Defaults to a DAF value of 0.2. Special
case: DAF value of 0.0 indicates incomplete MoP score
calculation, PoP is unchanged.
Exmaples:
java -Xmx[MB]m -jar StatsCalc.jar [1] [2] [3] [4]
java -jar StatsCalc.jar example/SelecT_workspace example/sim 21 2
Note, this could be run by hand on a local machine, but we imagine automating
the process a bit. A simple example SLURM script is provided (`selection.slurm')
for running StatsCalc.jar for a single window. A possible bash script for
submitting this SLURM script for each window to a high-performance computing
cluster running SLURM might look like this:
#! /bin/bash
chromosome=21
for window in {0..3}
do
sbatch selection.slurm $chromosome $window
done
exit 0
-----------------------------------
| PHASE 3 -- Analyze Significance |
-----------------------------------
Required Positional Arguments:
[1] Working Directory SelecT working directory created in Phase 1.
Working Directory name can be changed but subdirectory names
must be unchanged.
[2] Chromosome Chromosome number where window can be found
Optional Arguments:
-co Combine Only Only runs first half of analysis where windows are
combined into one file.
--combine_fltr Combine Filter Uses a specific filter for printing specific
combination of stats in output. Can only be used in conjunction
with the -co flag i=iHS, x=XPEHH, h=iHH, dd=dDAF, d=DAF, f=Fst,
up=unstd_PoP, um=unstd_MoP, p=PoP, m=MoP. Each tag should be
separated by a colon (i.e. i:x:h:dd:d:f:up:um:p:m).
--p_value p_Value Sets the p-value cutoff for significance check on
composite scores. Defaults to 0.01
-wc Write Combine Similar to -co, but also runs significance filtering
-rn Normalization Runs normalization step across the entire
dataset/chromosome. Normalizes by standardization (mean 0;
standard deviation 1).
-ui Use Incomplete Use incomplete data when analyzing MoP scores
-im Ignore MoP Ignores all MoP scores and finds significance based
upon PoP only.
-ip Ignore PoP Ignores all PoP scores and finds significance based
upon MoP only. If both -im and -ip flags are present
significance is found by looking at either PoP or MoP. If
neither -im or -ip flag is present significance is found by
looking at both PoP and MoP.
Exmaples:
java -Xmx[MB]m -jar SignificanceAnalyzer.jar [1] [2]
java -jar SignificanceAnalyzer.jar example/SelecT_workspace 21
IV. Funding and Acknowledgements
-------------------------------
Funding for the research and production of this software was provided by
startup funds to Perry Ridge, Ph.D.
V. Contact
-----------
For questions, comments, concerns, feature requests, suggestions, etc., please
contact:
Hayden Smith -- smithinformatics@gmail.com
Pery Ridge, Ph.D. -- perry.ridge@byu.edu
Note: For usage questions, please consult section `III. Usage Instructions and
Examples' first.