This repository has been archived by the owner on Oct 15, 2020. It is now read-only.

Cutsite domain

Gabriele Girelli edited this page Aug 23, 2018 · 4 revisions

We know that a GPSeq experiment comprises multiple conditions and that not all restriction events are captured by the technique (some are lost during sequencing, sequencing loading, ligation, purification, and so on). As a result, the sets of restricted sites do not fully overlap across conditions. In other words, a cutsite might have reads in one condition and be empty in another (N_R > 0 in condition i but N_R = 0 in condition j, with i ≠ j).

Four different approaches can then be applied, each considering a different set of cutsites and each with its own pros and cons. Choosing the appropriate approach is essential to estimating centrality, because the set of considered cutsites acts as the domain of the analysis.

I) Consider all genomic cutsites

When all genomic cutsites (S_G) are considered, then:

N_s = |{s ∈ S_G : s ∈ w}|

which is the number of genomic cutsites included in w. Notice how in this case N_s is independent of the considered condition.
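As a minimal sketch of this count, assume the genomic cutsites of one chromosome are stored as a sorted list of positions and that w is a half-open interval [start, end). Both the representation and the function name `count_cutsites_in_window` are illustrative assumptions, not part of the GPSeq code.

```python
from bisect import bisect_left


def count_cutsites_in_window(cutsite_positions, window_start, window_end):
    """Count cutsites with window_start <= position < window_end.

    cutsite_positions must be sorted in ascending order (assumed layout).
    """
    first = bisect_left(cutsite_positions, window_start)
    past_last = bisect_left(cutsite_positions, window_end)
    return past_last - first


# Hypothetical genomic cutsite positions on a single chromosome.
genomic_cutsites = [100, 250, 400, 900, 1500, 1750]

# N_s for the window [200, 1000): does not depend on any condition.
n_s = count_cutsites_in_window(genomic_cutsites, 200, 1000)
print(n_s)
```

Because only the genomic positions enter the count, the same value is obtained whichever condition is being analyzed.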

While this approach uses the same domain for every condition, it includes a large number of empty cutsites. Including empty cutsites flattens the mean and variance of N_R to 0, rendering the variance-based scores less effective in estimating centrality.
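The flattening effect can be illustrated with a small sketch. It assumes that N_R is a per-cutsite read count and that mean and variance are taken over the cutsites of the domain; the read counts below are made up for illustration.

```python
from statistics import mean, pvariance


def read_stats_with_empty_sites(nonempty_read_counts, n_empty):
    """Return (mean, population variance) after adding n_empty zero-read sites."""
    counts = list(nonempty_read_counts) + [0] * n_empty
    return mean(counts), pvariance(counts)


# Hypothetical read counts of the four non-empty cutsites in a window.
nonempty = [12, 8, 15, 10]

for n_empty in (0, 96, 9996):
    m, v = read_stats_with_empty_sites(nonempty, n_empty)
    print(f"{n_empty:>5} empty sites: mean={m:.4f}, variance={v:.4f}")
```

As the share of empty cutsites grows, both statistics shrink towards 0 and become dominated by the number of empty sites instead of the read distribution on restricted sites.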

II) Consider all cutsites restricted in the experiment

When all cutsites restricted in the experiment are considered, then:

N_s = |{s ∈ S_U : s ∈ w}|

which is the number of cutsites included in w that are restricted in at least one condition. Notice how in this case N_s is independent of the considered condition, as this approach is comparable to a union of the restricted cutsite sets from each condition. In other words, this approach considers the union of the domains of all conditions:

S_U = ⋃_i S_i

where S_i is the set of cutsites restricted in condition i.

Still, this domain includes cutsites that are empty in some conditions, and including empty cutsites flattens the mean and variance of N_R towards 0, rendering the variance-based scores less effective in estimating centrality.
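A minimal sketch of the union domain, assuming each condition is stored as a mapping from a cutsite identifier to its read count. The identifiers, the counts and the helper `restricted_sites` are hypothetical.

```python
def restricted_sites(read_counts):
    """Return the cutsites with at least one read in a condition."""
    return {site for site, n_reads in read_counts.items() if n_reads > 0}


# Hypothetical per-condition read counts for the cutsites of one window.
reads_by_condition = {
    "cond_1": {"chr1:100": 5, "chr1:250": 0, "chr1:400": 2, "chr1:900": 0},
    "cond_2": {"chr1:100": 3, "chr1:250": 7, "chr1:400": 0, "chr1:900": 0},
}

# Union of the restricted sets: the same domain for every condition.
union_domain = set().union(
    *(restricted_sites(counts) for counts in reads_by_condition.values())
)

# Sites in the shared domain that are nevertheless empty in cond_1.
empty_in_cond_1 = union_domain - restricted_sites(reads_by_condition["cond_1"])

print(sorted(union_domain))
print(sorted(empty_in_cond_1))
```

The cutsite that is never restricted drops out of the domain, but each condition can still contain sites with zero reads, which is why the flattening persists.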

III) Consider all cutsites restricted in a condition

When all restricted cutsites in a condition are considered, then:

N_{s,i} = |{s ∈ S_i : s ∈ w}|

where S_i is the set of cutsites restricted in condition i. This approach includes only the non-empty cutsites of each condition. This makes N_s condition-dependent, as each condition has a different domain. While having different domains complicates comparing the conditions, disregarding empty sites allows the full power of variance-based scores to be used.
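A minimal sketch of the condition-specific domain, again assuming a hypothetical mapping from cutsite identifier to read count per condition. The function name `condition_domain_stats` and the numbers are illustrative only.

```python
from statistics import mean, pvariance


def condition_domain_stats(read_counts):
    """Summarize a condition using only its non-empty cutsites."""
    nonempty = [n for n in read_counts.values() if n > 0]
    if not nonempty:
        # No restricted cutsite: the statistics are undefined.
        return {"n_s": 0, "mean": None, "variance": None}
    return {
        "n_s": len(nonempty),
        "mean": mean(nonempty),
        "variance": pvariance(nonempty),
    }


# Hypothetical read counts for the cutsites of one window.
cond_1 = {"chr1:100": 5, "chr1:250": 0, "chr1:400": 2, "chr1:900": 9}
cond_2 = {"chr1:100": 3, "chr1:250": 7, "chr1:400": 0, "chr1:900": 0}

stats_1 = condition_domain_stats(cond_1)
stats_2 = condition_domain_stats(cond_2)
print(stats_1["n_s"], stats_2["n_s"])
```

The two conditions end up with different N_s values, so any comparison between them has to account for the fact that the statistics were computed over different sets of cutsites.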

IV) Consider all cutsites restricted in all conditions

When all cutsites restricted in all conditions are considered, then:

N_s = |{s ∈ S_I : s ∈ w}|

This approach includes only those cutsites that are non-empty in all conditions. In other words, it considers the intersection of the domains of all conditions:

S_I = ⋂_i S_i

where S_i is the set of cutsites restricted in condition i.
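To compare the four approaches side by side, the sketch below computes N_s for a single window under each of them. It assumes the genomic cutsites and the per-condition restricted cutsites are available as Python sets; the function `domain_sizes` and all values are hypothetical.

```python
def domain_sizes(genomic_sites, restricted_by_condition):
    """Return N_s for one window under each of the four approaches."""
    restricted_sets = list(restricted_by_condition.values())
    union = set().union(*restricted_sets)
    intersection = set.intersection(*restricted_sets) if restricted_sets else set()
    return {
        "I_genomic": len(genomic_sites),
        "II_union": len(union),
        "III_per_condition": {
            name: len(sites) for name, sites in restricted_by_condition.items()
        },
        "IV_intersection": len(intersection),
    }


# Hypothetical cutsite positions falling in the same window w.
genomic_sites = {100, 250, 400, 900, 1500}
restricted_by_condition = {
    "cond_1": {100, 400, 900},
    "cond_2": {100, 250, 900},
}

sizes = domain_sizes(genomic_sites, restricted_by_condition)
print(sizes)
```

Since each restricted set is a subset of the genomic cutsites, the domains are nested: the intersection is never larger than any condition-specific set, which is never larger than the union, which is never larger than the genomic set.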