GPSeq in a nutshell

Data description

Let's focus on a GPSeq experiment comprising n+1 conditions:

$$E = \{D_0, D_1, \ldots, D_n\}$$

where D_0 denotes a negative condition with no restricted cutsites and, thus, no reads. Each condition D_i is characterized by its own number of cells N_C(D_i) and reads N_R(D_i), with N_C(D_0)>0 and N_R(D_0)=0.

Note that the maximum resolution of a GPSeq experiment is the single cutsite, as the cutsite is where de-duplicated reads are located. In other words, the cutsite is GPSeq's unit of measure.

Depending on the restriction enzyme (e.g., a 4 bp or a 6 bp cutter), cutsites are more or less sparse, which sets a higher or lower theoretical maximum resolution (TMR). The real maximum resolution (RMR) is always lower than the theoretical one (RMR<TMR), as some reads are lost during sequencing and some cutsites are never digested.

Also, notice that each cutsite can have up to N_C(D_i) reads as, after de-duplication, a read represents a digestion event occurring in one cell.

Probability of restriction

Let's define a genomic region w, located on a chromosome between a first and a last genomic coordinate. Taking into account that GPSeq captures restriction events, and that we focus only on restricted cutsites, we can define the probability of digesting w in condition D_i as:

$$P(w, D_i) = \frac{1}{N_R(D_i)} \cdot \frac{N_R(w, D_i)}{N_s(w, D_i)}$$

where N_R(w,D_i) is the number of de-duplicated reads mapping to w in condition D_i, and N_s(w,D_i) is the number of cutsites in w in condition D_i considered in the analysis.

In other words, we normalize the number of restriction events in a genomic region by the number of restriction events in the condition and the number of considered cutsites in the region itself; this makes P(w,D_i) comparable across different regions and different conditions.
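As a minimal sketch of the normalization just described, the snippet below divides the read count of a region by both the read count of the condition and the number of considered cutsites in the region. The function and parameter names are hypothetical and are not taken from the GPSeq code base. The error handling is also an assumption: the probability is treated as undefined when the condition has no reads (as for D_0) or when the region has no considered cutsites.

```python
def restriction_probability(reads_in_region, cutsites_in_region, reads_in_condition):
    """Reads in the region, normalized by reads in the condition and
    by the number of considered cutsites in the region.

    Hypothetical helper for illustration only.
    """
    if reads_in_condition <= 0:
        # Assumption: a condition without reads (e.g. D_0) has no defined probability.
        raise ValueError("the condition must have at least one read")
    if cutsites_in_region <= 0:
        # Assumption: a region without considered cutsites has no defined probability.
        raise ValueError("the region must contain at least one considered cutsite")
    return reads_in_region / (reads_in_condition * cutsites_in_region)


# Made-up numbers: 120 reads over 40 cutsites, in a condition with 1,000,000 reads.
p = restriction_probability(120, 40, 1_000_000)
```

Because both denominators are specific to the region and the condition, values computed this way can be compared between regions of different cutsite density and between conditions of different sequencing depth.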

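The number of reads in region w in condition D_i is:

$$N_R(w, D_i) = \sum_{s_j \in w} N_R(s_j, D_i)$$

where s_j is the j-th cutsite in w. Remember that a cutsite s is, essentially, a small region with its own chromosome, start and end coordinates; then:

$$s \in w \iff \mathrm{chr}(s) = \mathrm{chr}(w) \;\land\; \mathrm{start}(w) \le \mathrm{start}(s) \le \mathrm{end}(w)$$

In other words, we consider a cutsite s as belonging to region w when region and site are on the same chromosome and the site start position is included in the region.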
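The membership rule and the per-region read count can be sketched as follows. This is only an illustration: the `Region` type and the function names are hypothetical, and treating both region boundaries as inclusive is an assumption, since the text only states that the site start position must be included in the region.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A genomic interval; a cutsite is modelled as a small Region."""
    chrom: str
    start: int
    end: int


def contains(region, site):
    """True when site and region share a chromosome and the site start
    falls within the region (boundaries assumed inclusive)."""
    return site.chrom == region.chrom and region.start <= site.start <= region.end


def reads_in_region(region, site_reads):
    """Sum the de-duplicated read counts of the cutsites belonging to region.

    site_reads maps each cutsite (a Region) to its read count in one condition.
    """
    return sum(n for site, n in site_reads.items() if contains(region, site))


# Made-up example: only the first two cutsites belong to the window.
window = Region("chr1", 1000, 2000)
counts = {
    Region("chr1", 1000, 1004): 3,
    Region("chr1", 1500, 1504): 0,
    Region("chr1", 2500, 2504): 7,  # starts outside the window
    Region("chr2", 1200, 1204): 5,  # different chromosome
}
total = reads_in_region(window, counts)
```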

Additionally, we define a non-empty cutsite as a cutsite with at least one mapped de-duplicated read, corresponding to a cutsite being restricted in a cell and sampled during the sequencing run:

$$N_R(s, D_i) \geq 1$$

Restriction event count: average and variance

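Let's consider a genomic region w comprising k units (single or grouped cutsites, see 2.d). We can define the mean and the variance of the restriction event count across the units of the window:

$$\mu(w, D_i) = \frac{1}{k} \sum_{j=1}^{k} N_R(u_j, D_i)$$

$$\sigma^2(w, D_i) = \frac{1}{k-1} \sum_{j=1}^{k} \left( N_R(u_j, D_i) - \mu(w, D_i) \right)^2$$

where u_j is the j-th unit of w. Please note that this is a sample variance (normalized over k-1).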
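For illustration, the mean and the sample variance of the per-unit restriction event counts can be computed with Python's standard library, where `statistics.variance` normalizes over k-1. The function name and the input list are hypothetical; the check for at least two units is an assumption made here because a sample variance is undefined for a single unit.

```python
import statistics


def unit_count_stats(unit_counts):
    """Return (mean, sample variance) of the restriction event counts
    observed in the k units of a window.

    Hypothetical helper for illustration only.
    """
    if len(unit_counts) < 2:
        # The k-1 normalization needs at least two units.
        raise ValueError("at least two units are required")
    mean = statistics.mean(unit_counts)
    # statistics.variance is the sample variance (divides by k-1).
    variance = statistics.variance(unit_counts)
    return mean, variance


# Made-up read counts for a window with k = 4 units.
mean, variance = unit_count_stats([2, 4, 4, 6])
```

Using the population variance instead (`statistics.pvariance`, normalized over k) would give a smaller value for the same counts, so the choice of normalization matters most for windows with few units.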
