Estimate centrality
The gpseqc_estimate script estimates regional nuclear centrality from a multi-condition GPSeq experiment. Run gpseqc_estimate -h for more details on the parameters, and gpseqc_estimate -H for a more readable description of the pipeline.
The minimum input to the gpseqc_estimate script is an output directory (specified with the -o option) and at least 2 (or 3 when using -n) bed files, which should be provided in order of increasing digestion conditions.
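As a rough illustration, a minimal call could be assembled from Python as sketched below. The file names and digestion times are hypothetical, and passing the bed files as trailing positional arguments is an assumption; check gpseqc_estimate -h for the exact usage.

```python
import shlex


def build_command(outdir, beds_by_time, normalize=False):
    """Assemble a gpseqc_estimate call with bed files sorted by digestion time.

    beds_by_time maps a (hypothetical) digestion time in minutes to a bed path.
    """
    # The text above asks for at least 2 bed files, or 3 when -n is used.
    min_beds = 3 if normalize else 2
    if len(beds_by_time) < min_beds:
        raise ValueError(f"need at least {min_beds} bed files")

    # Order the bed files by increasing digestion time.
    ordered = [beds_by_time[t] for t in sorted(beds_by_time)]

    cmd = ["gpseqc_estimate", "-o", outdir]
    if normalize:
        cmd.append("-n")
    # Assumption: bed files are given as trailing positional arguments.
    return cmd + ordered


cmd = build_command("out", {30: "30min.bed", 1: "1min.bed", 10: "10min.bed"})
print(shlex.join(cmd))
```

The command is only built and printed here, not executed, so the sketch can be tried without GPSeqC installed.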
- For chromosome-wide bins: use the default, no need to change anything.
- For sub-chromosomal non-overlapping bins: set the size of your bins using the -s option.
- For sub-chromosomal overlapping bins: set the size and step of your bins with the -s and -p options, respectively.
- To estimate centrality of a custom list of bins: provide a bed file with the bins of interest using the -b option.
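The relation between bin size and step can be sketched as follows. This is not the tool's own code: make_bins is a hypothetical helper, and clipping the last bins at the chromosome end is an assumption about edge handling.

```python
def make_bins(chrom_size, size, step=None):
    """Return (start, end) bins covering [0, chrom_size).

    With step equal to size (the default here) the bins do not overlap;
    a step smaller than size produces overlapping, sliding bins.
    """
    step = size if step is None else step
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    bins = []
    for start in range(0, chrom_size, step):
        # Assumption: bins running past the chromosome end are clipped.
        bins.append((start, min(start + size, chrom_size)))
    return bins


print(make_bins(10, 4))      # size only: non-overlapping bins
print(make_bins(10, 4, 2))   # size and step: overlapping bins
```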
If you do not sequence with deep coverage, it is advisable to group the reads into bins (groups) that will be used as cutsites. Specify the size of the groups with the -g option.
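A minimal sketch of what grouping means, assuming that the reads of all cutsites falling in the same fixed-size window are summed and that the window then acts as a single cutsite. The helper name and the input layout are hypothetical.

```python
from collections import Counter


def group_cutsites(cutsites, group_size):
    """Sum read counts of cutsites that fall in the same fixed-size group.

    cutsites: iterable of (chrom, position, reads) tuples.
    Returns a dict {(chrom, group_start, group_end): total_reads}.
    """
    grouped = Counter()
    for chrom, pos, reads in cutsites:
        # Start of the window that contains this position.
        start = (pos // group_size) * group_size
        grouped[(chrom, start, start + group_size)] += reads
    return dict(grouped)


sites = [("chr1", 120, 3), ("chr1", 180, 2), ("chr1", 1050, 7), ("chr2", 10, 1)]
print(group_cutsites(sites, 1000))
```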
Use the -n option to automatically normalize over the last condition (expected to be at least overnight, i.e., assumed to approximate accessibility).
Outliers (cutsites with an abnormal number of unique reads) in the input bed files can be automatically removed. Use the --bed-outlier flag to specify the outlier detection method: Z (Z-score), t (Student's t, as Z*sqrt(n-2)/sqrt(n-1-Z^2)), chi2 (chi-square, as Z^2, default), IQR or MAD. Specify the significance level with --bed-alpha (default: 0.01) or the limit, for the IQR mode only, with --bed-lim (default: 1.5). Other flags:
- -k to turn off the outlier detection/filter.
- -C to remove only outliers common to all bed files, not all outliers.
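The score transformations listed above can be written out as a short sketch. It is not the GPSeqC implementation: the use of the sample standard deviation, the quartile method, and the helper names are all assumptions, and the significance test against --bed-alpha is left out.

```python
import math
import statistics


def z_scores(values):
    """Z-score of each value (sample standard deviation, an assumption)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def t_from_z(z, n):
    """t statistic as given in the text: Z*sqrt(n-2)/sqrt(n-1-Z^2)."""
    return z * math.sqrt(n - 2) / math.sqrt(n - 1 - z ** 2)


def chi2_from_z(z):
    """Chi-square statistic as given in the text: Z^2."""
    return z ** 2


def iqr_outliers(values, lim=1.5):
    """Values outside [Q1 - lim*IQR, Q3 + lim*IQR]; lim mirrors --bed-lim."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - lim * iqr, q3 + lim * iqr
    return [v for v in values if v < low or v > high]


def combine_outliers(per_file, common_only=False):
    """All outliers, or only those shared by every bed file (as with -C)."""
    sets = [set(s) for s in per_file]
    return set.intersection(*sets) if common_only else set.union(*sets)


reads = [10, 12, 11, 13, 12, 11, 50]  # made-up read counts per cutsite
z = z_scores(reads)
t = [t_from_z(zi, len(reads)) for zi in z]
chi2 = [chi2_from_z(zi) for zi in z]
print(iqr_outliers(reads))
```

With these made-up counts, the cutsite with 50 reads is the only one outside the IQR fences.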
It is possible to mask either the input or the output. To mask the input bed files, use the -m option to provide a bed file with the regions to be masked. To mask the output files, provide a similar bed file with the -M option. In the latter case, the mask is applied only to the estimated.*.tsv, ranked.*.tsv and rescaled.*.tsv output files, not to the combined.*.tsv one.
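Conceptually, masking removes the records that fall in the masked regions. The sketch below assumes half-open bed coordinates and drops every record that overlaps a masked region by at least one base; whether the tool requires partial or full overlap is not stated here, and the helper names are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def apply_mask(regions, mask):
    """Keep only the regions that do not overlap any masked region."""
    return [r for r in regions if not any(overlaps(r, m) for m in mask)]


regions = [("chr1", 0, 100), ("chr1", 100, 200), ("chr2", 0, 100)]
mask = [("chr1", 150, 160)]
print(apply_mask(regions, mask))
```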
To change the cutsites considered in the analysis, use the -c option and select one of the following:
- Universe: all cutsites in the genome are considered. Requires a bed file containing the cutsite locations, provided with the -l option.
- Union: cutsites restricted in any of the conditions are considered.
- Separated (default): the cutsite domain is condition specific and includes all the cutsites restricted in that condition.
- Intersection: only cutsites restricted in all conditions are considered.
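The four cutsite domains map naturally onto set operations. The sketch below is only an illustration: the lowercase mode labels and the function name are hypothetical, and the values actually accepted by -c should be checked with gpseqc_estimate -h.

```python
def cutsite_domain(mode, per_condition, universe=None):
    """Return one set of cutsites per condition for the chosen domain.

    per_condition: list of iterables, the cutsites restricted in each condition.
    universe: all cutsites in the genome (needed only for "universe").
    """
    sets = [set(s) for s in per_condition]
    if mode == "separated":
        # Each condition keeps its own restricted cutsites.
        return sets
    if mode == "universe":
        if universe is None:
            raise ValueError("universe mode needs the list of all cutsites")
        shared = set(universe)
    elif mode == "union":
        shared = set.union(*sets)
    elif mode == "intersection":
        shared = set.intersection(*sets)
    else:
        raise ValueError(f"unknown mode: {mode}")
    # The same domain is used for every condition.
    return [set(shared) for _ in sets]


conditions = [{"a", "b"}, {"b", "c"}]
for mode in ("union", "separated", "intersection"):
    print(mode, cutsite_domain(mode, conditions))
```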
Other options:
- -t to provide the number of threads to be used for parallelization (default: 1).
- -r and -u to provide a prefix and a suffix, respectively, for the output files.
- -d to trigger debug mode.
- -e to specify a list of scores not to be calculated.
- -i to specify the only scores that should be calculated.
- -T to provide the path to the temporary folder. If the temporary folder is located on a solid state drive, the pipeline will run faster than on a conventional drive.
- -O to select the outlier detection method. By default, only outliers common to all the provided bed files are removed.
GPSeqC v2.3.3 is published under the MIT License - Copyright (c) 2017-18 Gabriele Girelli