Cutsite domain
A GPSeq experiment comprises multiple conditions, and not all restriction events are captured by the technique (some are lost during sequencing, sequencing loading, ligation, purification, and so on). As a result, the sets of restricted sites do not fully overlap across conditions. In other words, a cutsite might have reads in one condition and be empty in another.
Four approaches can then be applied, each considering a different set of cutsites and each with its own pros and cons. Choosing the appropriate approach is essential to estimating centrality, because the set of considered cutsites acts as the domain of the analysis.
When all genomic cutsites are considered, the size of the domain is the number of genomic cutsites included in the region under analysis. Notice how in this case this number is independent of the considered condition.
While this approach uses the same domain for every condition, it includes a large number of empty cutsites. Including empty cutsites flattens the mean and variance of the per-cutsite read counts toward 0, rendering the variance-based scores less effective in estimating centrality.
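The flattening effect can be illustrated with a minimal sketch. The read counts and the helper name `pad_with_empty` below are made up for illustration and are not part of GPSeqC; the sketch only shows how the mean and (population) variance of per-cutsite read counts shrink as more empty cutsites enter the domain.

```python
from statistics import mean, pvariance

# Hypothetical read counts for the restricted cutsites of one region
# in one condition (illustrative values, not real data).
restricted_reads = [12, 7, 30, 18, 9]


def pad_with_empty(reads, n_empty):
    """Return the read counts followed by n_empty empty (zero-read) cutsites."""
    return list(reads) + [0] * n_empty


# Add more and more empty cutsites and watch mean and variance shrink.
for n_empty in (0, 50, 5000):
    reads = pad_with_empty(restricted_reads, n_empty)
    print(f"empty={n_empty:>4}  mean={mean(reads):.4f}  variance={pvariance(reads):.4f}")
```

With only the restricted cutsites, the spread of the read counts is preserved. As empty cutsites come to dominate the domain, both statistics approach 0 and differences between regions become harder to detect.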
When all cutsites restricted in the experiment are considered, the size of the domain is the number of cutsites in the region under analysis that are restricted in at least one condition. Notice how in this case this number is again independent of the considered condition, as this approach takes the union of the restricted cutsite sets, that is, the union of the domains of all conditions.
Still, a cutsite in the union can be empty in a given condition, and including such empty cutsites flattens the mean and variance of the per-cutsite read counts toward 0, rendering the variance-based scores less effective in estimating centrality.
When only the cutsites restricted in a given condition are considered, the domain includes only the non-empty cutsites of that condition. This makes the size of the domain condition-dependent, as each condition has a different domain. While having different domains complicates comparing the conditions, disregarding empty sites makes it possible to use the full power of variance-based scores.
When only the cutsites restricted in all conditions are considered, the domain includes only those cutsites that are non-empty in every condition. In other words, this approach considers the intersection of the domains of all conditions.
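The four domains described above can be sketched with plain Python sets. This is only an illustration under stated assumptions: the data layout, the function names, and the approach labels (`"genomic"`, `"union"`, `"condition"`, `"intersection"`) are hypothetical and do not reflect GPSeqC's actual options or implementation.

```python
# Hypothetical input: for each condition, read counts keyed by cutsite
# position within one region. Cutsites without reads are simply absent.
reads_by_condition = {
    "cond_A": {100: 12, 250: 7, 900: 3},
    "cond_B": {100: 5, 400: 9},
    "cond_C": {100: 8, 250: 2, 400: 1},
}

# Hypothetical positions of every genomic cutsite in the same region.
genomic_cutsites = {100, 250, 400, 600, 900, 1200}


def restricted(condition):
    """Cutsites with at least one read in the given condition."""
    return {pos for pos, n in reads_by_condition[condition].items() if n > 0}


def cutsite_domain(approach, condition=None):
    """Return the set of cutsites considered under the given approach."""
    if approach == "genomic":
        # Every genomic cutsite, restricted or not.
        return set(genomic_cutsites)
    per_condition = [restricted(c) for c in reads_by_condition]
    if approach == "union":
        # Restricted in at least one condition.
        return set.union(*per_condition)
    if approach == "intersection":
        # Restricted in every condition.
        return set.intersection(*per_condition)
    if approach == "condition":
        # Restricted in the requested condition only.
        if condition is None:
            raise ValueError("the 'condition' approach needs a condition name")
        return restricted(condition)
    raise ValueError(f"unknown approach: {approach}")


for approach in ("genomic", "union", "intersection"):
    print(approach, sorted(cutsite_domain(approach)))
for cond in reads_by_condition:
    print("condition", cond, sorted(cutsite_domain("condition", cond)))
```

Only the `"condition"` approach returns a domain that changes with the condition. The other three yield one shared domain, which eases comparison across conditions; the genomic and union domains do so at the cost of including cutsites that are empty in some conditions.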
GPSeqC v2.3.3 is published under the MIT License - Copyright (c) 2017-18 Gabriele Girelli