Ranks comparison

The gpseqc_compare script calculates the distance between a pair of centrality rankings obtained with gpseqc_estimate. Its workflow is as follows:

1. Read and parse ranking tables.
2. Subset ranking tables to the same genomic regions.
3. Calculate the pair-wise distance between all the scores of the two tables.
4. Shuffle the tables N times to generate a random distribution of distances.
5. Use the random distribution to calculate a p-value of the original distance.
6. Write output and plot.

Use the --no-test option to run only steps 1-3, which reduces both the output and the computation time.
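The subsetting step of the workflow can be pictured with a small sketch. It is not the gpseqc_compare implementation: the table representation (a dict keyed by hypothetical `(chrom, start, end)` tuples) and the function names `subset_to_shared_regions` and `ranking` are assumptions made for illustration only.

```python
def subset_to_shared_regions(table_a, table_b):
    """Keep only the regions present in both tables, in one common order.

    Each table is assumed to map a (chrom, start, end) tuple to a
    centrality score; this layout is hypothetical.
    """
    shared = sorted(set(table_a) & set(table_b))
    sub_a = {region: table_a[region] for region in shared}
    sub_b = {region: table_b[region] for region in shared}
    return sub_a, sub_b


def ranking(table):
    """Return the regions ordered from highest to lowest score."""
    return sorted(table, key=lambda region: table[region], reverse=True)


# Toy tables: only two regions are shared between them.
table_a = {
    ("chr1", 0, 1000000): 0.9,
    ("chr1", 1000000, 2000000): 0.4,
    ("chr2", 0, 1000000): 0.7,
}
table_b = {
    ("chr1", 0, 1000000): 0.2,
    ("chr2", 0, 1000000): 0.8,
    ("chr3", 0, 1000000): 0.5,
}

sub_a, sub_b = subset_to_shared_regions(table_a, table_b)
print(ranking(sub_a))
print(ranking(sub_b))
```

After subsetting, both tables cover the same regions, so any distance computed on them compares like with like.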

Distance calculation

Three distance types are available in gpseqc_compare through the -d option:

kt: Kendall tau distance.
ktw: weighted Kendall tau distance.
emd: Earth Mover's Distance.

See the Distances page for more details on the distances and on how to choose among them.
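As an illustration of the first option, here is a minimal sketch of a Kendall tau distance between two rankings of the same regions. It counts the pairs of regions whose relative order differs and divides by the total number of pairs. Whether gpseqc_compare normalizes the distance in this way is an assumption; the function name `kendall_tau_distance` is hypothetical, and the weighted variant and the Earth Mover's Distance are not covered here.

```python
from itertools import combinations


def kendall_tau_distance(rank_a, rank_b):
    """Normalized Kendall tau distance between two rankings.

    rank_a and rank_b list the same items, each in ranked order.
    Returns 0.0 for identical orders and 1.0 for fully reversed orders.
    """
    if len(set(rank_a)) != len(rank_a) or len(rank_a) != len(rank_b):
        raise ValueError("rankings must have the same length and no repeats")
    if set(rank_a) != set(rank_b):
        raise ValueError("rankings must contain the same items")
    n = len(rank_a)
    if n < 2:
        return 0.0
    position_b = {item: index for index, item in enumerate(rank_b)}
    # combinations() yields (x, y) with x ranked before y in rank_a;
    # the pair is discordant when rank_b orders them the other way.
    discordant = sum(
        1 for x, y in combinations(rank_a, 2) if position_b[x] > position_b[y]
    )
    return discordant / (n * (n - 1) / 2)


print(kendall_tau_distance(["r1", "r2", "r3"], ["r2", "r1", "r3"]))
```

In this toy call, one of the three pairs is swapped, so a third of the pairs are discordant.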

Building random distribution

The random distribution is used to calculate a p-value for the original distance: the probability of obtaining a more extreme distance (more similar or more different) when the ranks are randomly shuffled.

To obtain a reliable p-value, the sample used to build the random distribution must be large enough. You can change its size with the -n (or --niter) option, which is 5000 by default.
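The role of the sample size can be seen in a minimal permutation-test sketch. It is an assumption-laden illustration, not the gpseqc_compare code: the shuffling scheme, the empirical tail estimate with an add-one correction, the function name `permutation_p_value`, and the stand-in distance `mean_abs_difference` (which is not one of kt, ktw, or emd) are all hypothetical.

```python
import random


def mean_abs_difference(scores_a, scores_b):
    """Stand-in distance used only for this sketch."""
    return sum(abs(x - y) for x, y in zip(scores_a, scores_b)) / len(scores_a)


def permutation_p_value(scores_a, scores_b, distance, n_iter=5000, seed=0):
    """Empirical p-value of the observed distance against shuffled data.

    The second score list is shuffled n_iter times. The smaller of the
    two tail counts is used, so both unusually small and unusually
    large distances give a small p-value.
    """
    rng = random.Random(seed)
    observed = distance(scores_a, scores_b)
    shuffled = list(scores_b)
    lower = upper = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        d = distance(scores_a, shuffled)
        lower += d <= observed
        upper += d >= observed
    # The add-one correction keeps the estimate strictly above zero.
    p_value = (min(lower, upper) + 1) / (n_iter + 1)
    return observed, p_value


scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
observed, p_value = permutation_p_value(scores, scores, mean_abs_difference, n_iter=200)
print(observed, p_value)
```

With this estimate, the smallest p-value that can be reported is 1 / (n_iter + 1), so a larger number of iterations gives a finer resolution at the cost of computation time.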
