Call the inferelator script from the base directory (the one containing this
README) with a job config file as argument.
Example call: Rscript inferelator.R jobs/dream4_cfg.R
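
A job config file is presumably a plain R script that overrides entries of the
PARS list described below. The following is a minimal sketch under that
assumption; the file name jobs/my_job_cfg.R and the chosen values are
hypothetical and not taken from the repository.

```r
# jobs/my_job_cfg.R (hypothetical file name)
# Assumption: the inferelator script sets up PARS with the defaults listed
# below before reading this file, so only the overrides need to be given.
PARS$input.dir <- 'input/dream4'
PARS$exp.mat.file <- 'expression.tsv'
PARS$tf.names.file <- 'tf_names.tsv'
PARS$meta.data.file <- 'meta_data.tsv'
PARS$priors.file <- NULL                        # run without priors
PARS$gold.standard.file <- 'gold_standard.tsv'  # evaluate against gold standard
PARS$num.boots <- 5                             # fewer bootstraps for a quick run
PARS$cores <- 2
PARS$method <- 'BBSR'
```

Such a file would then be passed in the same way as the example call above:
Rscript inferelator.R jobs/my_job_cfg.R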
--------------------------------------------------------------------------------
Default parameters and a brief explanation of each one
--------------------------------------------------------------------------------
PARS$input.dir <- 'input/dream4' # path to the input files
PARS$exp.mat.file <- 'expression.tsv' # required; see definition below
PARS$tf.names.file <- 'tf_names.tsv' # required; see definition below
PARS$meta.data.file <- 'meta_data.tsv' # assume all steady state if NULL
PARS$priors.file <- 'gold_standard.tsv' # no priors if NULL
PARS$gold.standard.file <- 'gold_standard.tsv' # no evaluation if NULL
PARS$leave.out.file <- NULL # file with list of conditions that will be ignored
PARS$randomize.expression <- FALSE # whether to scramble input expression
PARS$job.seed <- 42 # random seed; can be NULL
PARS$save.to.dir <- file.path(PARS$input.dir, date.time.str) # output directory
PARS$num.boots <- 20 # number of bootstraps; no bootstrapping with a value of 1
PARS$max.preds <- 10 # max number of predictors based on CLR to pass to model
                     # selection method
PARS$mi.bins <- 10 # number of bins to use for mutual information calculation
PARS$cores <- 8 # number of cpu cores
PARS$delT.max <- 110 # max number of time units allowed between time series
                     # conditions
PARS$delT.min <- 0 # min number of time units allowed between time series
                   # conditions
PARS$tau <- 45 # constant related to half life of mRNA (see Core model)
PARS$perc.tp <- 0 # percent of true priors that will be used; can be vector
PARS$perm.tp <- 1 # number of permutations of true priors
PARS$perc.fp <- 0 # percent of false priors (100 = as many false priors as
                  # there are true priors); can be vector
PARS$perm.fp <- 1 # number of permutations of false priors
PARS$pr.sel.mode <- 'random' # prior selection mode: 'random' or 'tf'
                             # if 'random', the true priors are chosen randomly
                             # from all prior edges; if 'tf', PARS$perc.tp is
                             # interpreted as the percent of TFs to use for
                             # true priors, and all interactions of the chosen
                             # TFs are used
PARS$eval.on.subset <- FALSE # whether to evaluate only on the part of the
                             # network that has connections in the gold
                             # standard; if TRUE, false priors will only be
                             # drawn from that part of the network
PARS$method <- 'BBSR' # which method to use; either 'MEN' or 'BBSR'
PARS$prior.weight <- 1 # the weight for the priors; has to be larger than 1
                       # for priors to have an effect
PARS$use.tfa <- FALSE # whether to estimate transcription factor activities and
                      # use those in the regression models;
                      # if TRUE, interactions in priors file should be signed,
                      # i.e. -1 for repression and +1 for activation
PARS$prior.ss <- FALSE # whether to also sub-sample from the prior matrix during
                       # each bootstrap; if TRUE, priors are sampled randomly
                       # with replacement; if FALSE, all priors are used as is
PARS$output.summary <- TRUE # write a summary tsv and RData file of network
PARS$output.report <- TRUE # create html network report
PARS$output.tf.plots <- TRUE # create png files with plots of TFs and targets
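
The two values of PARS$pr.sel.mode can be illustrated with a small sketch. This
is not the inferelator's own code: the toy matrix, the variable names, and the
use of round() to turn a percentage into a count are assumptions made for
illustration only.

```r
set.seed(42)

# Toy prior matrix (made-up data): rows are genes, columns are TFs.
genes <- paste0('G', 1:6)
tfs <- c('G1', 'G2', 'G3', 'G4')
priors <- matrix(0, nrow = length(genes), ncol = length(tfs),
                 dimnames = list(genes, tfs))
priors[cbind(c('G2', 'G3', 'G4', 'G5', 'G6', 'G1', 'G5', 'G2'),
             c('G1', 'G1', 'G1', 'G2', 'G2', 'G3', 'G3', 'G4'))] <- 1

perc.tp <- 50  # percent of true priors to use

# 'random': keep perc.tp percent of all prior edges, chosen at random.
edge.idx <- which(priors == 1)
keep.edges <- sample(edge.idx, round(length(edge.idx) * perc.tp / 100))
priors.random <- priors * 0
priors.random[keep.edges] <- 1

# 'tf': choose perc.tp percent of the TFs and keep all of their edges.
keep.tfs <- sample(colnames(priors), round(ncol(priors) * perc.tp / 100))
priors.tf <- priors * 0
priors.tf[, keep.tfs] <- priors[, keep.tfs]
```

In 'random' mode the retained edges are spread over all TFs, whereas in 'tf'
mode each TF keeps either all of its prior edges or none of them.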
--------------------------------------------------------------------------------
Required Input Files
--------------------------------------------------------------------------------
expression.tsv
--------------
expression values; must include row (genes) and column (conditions) names
tf_names.tsv
------------
one TF name on each line; must be subset of the row names of the expression data
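
The two requirements above can be checked before starting a job. The sketch
below writes a tiny made-up data set to a temporary directory and reads it
back; the exact layout of the header line (here with an empty first field, as
produced by col.names = NA) is an assumption and should be compared against
the example input files.

```r
# Write a tiny, made-up example to a temporary directory.
tmp.dir <- tempfile('inferelator_input_')
dir.create(tmp.dir)
exp.mat <- matrix(c(1.2, 0.4, 2.1,
                    0.3, 1.8, 0.9,
                    2.2, 0.1, 1.5),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c('G1', 'G2', 'G3'), c('c1', 'c2', 'c3')))
write.table(exp.mat, file.path(tmp.dir, 'expression.tsv'),
            sep = '\t', quote = FALSE, col.names = NA)
writeLines(c('G1', 'G3'), file.path(tmp.dir, 'tf_names.tsv'))

# Read the files back: genes as row names, conditions as column names.
exp.in <- as.matrix(read.table(file.path(tmp.dir, 'expression.tsv'),
                               header = TRUE, row.names = 1, sep = '\t',
                               check.names = FALSE))
tf.names <- readLines(file.path(tmp.dir, 'tf_names.tsv'))

# Every TF name must be one of the row names of the expression data.
missing.tfs <- setdiff(tf.names, rownames(exp.in))
if (length(missing.tfs) > 0) {
  stop('TFs not found in expression data: ',
       paste(missing.tfs, collapse = ', '))
}
```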
--------------------------------------------------------------------------------
Optional Input Files
--------------------------------------------------------------------------------
meta_data.tsv
-------------
the meta data describing the conditions; must include column names;
has five columns:
  isTs: TRUE if the condition is part of a time series, FALSE otherwise
  is1stLast: "e" if not part of a time series; "f" if first; "m" if middle;
             "l" if last
  prevCol: name of the preceding condition in the time series; NA if "e" or "f"
  del.t: time in minutes since prevCol; NA if "e" or "f"
  condName: name of the condition
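
As an illustration of the five columns, the sketch below builds the meta data
for two steady-state conditions and one three-point time series. The condition
names and time points are made up, and the write.table() settings (no row
names, no quoting) are an assumption about the expected file layout.

```r
# Hypothetical conditions: two steady-state samples and one time series
# sampled at 0, 10 and 30 minutes.
meta.data <- data.frame(
  isTs      = c(FALSE, FALSE, TRUE, TRUE, TRUE),
  is1stLast = c('e', 'e', 'f', 'm', 'l'),
  prevCol   = c(NA, NA, NA, 'ts1_t0', 'ts1_t10'),
  del.t     = c(NA, NA, NA, 10, 20),
  condName  = c('ss1', 'ss2', 'ts1_t0', 'ts1_t10', 'ts1_t30'),
  stringsAsFactors = FALSE
)

# Consistency checks that follow from the column definitions above.
no.prev <- meta.data$is1stLast %in% c('e', 'f')
stopifnot(all(is.na(meta.data$prevCol[no.prev])),
          all(is.na(meta.data$del.t[no.prev])),
          all(meta.data$prevCol[!no.prev] %in% meta.data$condName),
          all(meta.data$isTs == (meta.data$is1stLast != 'e')))

meta.file <- file.path(tempdir(), 'meta_data.tsv')
write.table(meta.data, meta.file, sep = '\t', quote = FALSE, row.names = FALSE)
```

Note that del.t is the time since the preceding condition, not since the start
of the series: the last row is 20 minutes after ts1_t10.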
priors.tsv
----------
matrix of 0 and 1 indicating whether we have prior knowledge of an
interaction between a TF and a gene (signed, i.e. -1 or +1 instead of 1, if
PARS$use.tfa is TRUE); one row for each gene, one column for each TF; must
include row (genes) and column (TF) names
gold_standard.tsv
-----------------
needed for validation; matrix of 0 and 1 indicating whether there is an
interaction between one TF and a gene; one row for each gene, one column for
each TF; must include row (genes) and column (TF) names
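
Both the priors and the gold standard use the same gene-by-TF layout. If the
known interactions are available as an edge list, a matrix of this shape can
be built as in the following sketch; the gene names, the edge list, and the
output file name are hypothetical.

```r
genes <- c('G1', 'G2', 'G3', 'G4')
tfs <- c('G1', 'G3')

# Hypothetical known interactions, one row per TF -> target edge.
edges <- data.frame(tf     = c('G1', 'G1', 'G3'),
                    target = c('G2', 'G4', 'G1'),
                    stringsAsFactors = FALSE)

# One row for each gene, one column for each TF, all zero to start with.
priors <- matrix(0, nrow = length(genes), ncol = length(tfs),
                 dimnames = list(genes, tfs))

# Index by (row name, column name) pairs to set each known edge to 1.
priors[cbind(edges$target, edges$tf)] <- 1

# col.names = NA keeps the header aligned with the row-name column.
priors.file <- file.path(tempdir(), 'priors.tsv')
write.table(priors, priors.file, sep = '\t', quote = FALSE, col.names = NA)
```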
--------------------------------------------------------------------------------
Output Files
--------------------------------------------------------------------------------
One or more betas_frac_tp_X_perm_X--frac_fp_X_perm_X_X.RData files, one file
per combination of true priors, false priors, and prior weight. Each RData file
contains two lists of length PARS$num.boots: in the first, every entry is a
matrix of betas; in the second, every entry is a matrix of confidence scores
(rescaled betas).
One or more combinedconf_frac_tp_X_perm_X--frac_fp_X_perm_X_X.RData files with
one matrix each. The matrix is the rank-combined version of the confidence
scores of all bootstraps.
A params_and_input.RData file with data objects holding the user-set parameters,
the input, and input-derived objects.
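
The names of the objects stored inside the RData files are not listed here, so
it is safer to inspect a file than to assume them. The helper below is a
hypothetical convenience function, not part of the inferelator; the demo file
only stands in for a real output file so that the sketch runs on its own.

```r
# Load an RData file into a private environment and return its objects as a
# named list, leaving the global environment untouched.
load.rdata <- function(path) {
  env <- new.env()
  obj.names <- load(path, envir = env)
  mget(obj.names, envir = env)
}

# Stand-in file; replace demo.file with the path to a real output file,
# e.g. one inside PARS$save.to.dir.
demo.file <- file.path(tempdir(), 'demo_output.RData')
demo.betas <- list(matrix(0, 2, 2), matrix(1, 2, 2))
save(demo.betas, file = demo.file)

contents <- load.rdata(demo.file)
print(names(contents))                 # names of the stored objects
print(sapply(contents, class))         # and their classes
```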