This repository has been archived by the owner on Oct 15, 2020. It is now read-only.

Ranks comparison

Gabriele Girelli edited this page Aug 23, 2018 · 2 revisions

The gpseqc_compare script calculates the distance between a pair of centrality rankings obtained with gpseqc_estimate. Its workflow is as follows:

  1. Read and parse ranking tables.
  2. Subset ranking tables to the same genomic regions.
  3. Calculate the pair-wise distance between all the scores of the two tables.
  4. Shuffle the tables N times to generate a random distribution of distances.
  5. Use the random distribution to calculate a p-value for the original distance.
  6. Write output and plot.

Use the --no-test option to run only steps 1-3, which produces less output and takes less computation time.
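As a rough illustration of step 2, the sketch below restricts two rankings to the genomic regions they share. It assumes each table has already been parsed into a dictionary mapping a region label to a centrality score; that representation, the region labels, and the subset_to_common helper are hypothetical and are not taken from gpseqc_compare.

```python
def subset_to_common(table_a, table_b):
    """Keep only the regions present in both tables, in a shared order.

    Each table is assumed to be a dict: region label -> centrality score.
    """
    common = sorted(set(table_a) & set(table_b))
    scores_a = [table_a[region] for region in common]
    scores_b = [table_b[region] for region in common]
    return common, scores_a, scores_b


# Hypothetical example data: two rankings covering partly different regions.
rank_a = {"chr1:0-1000": 0.9, "chr1:1000-2000": 0.4, "chr2:0-1000": 0.1}
rank_b = {"chr1:0-1000": 0.7, "chr2:0-1000": 0.3, "chr3:0-1000": 0.5}

regions, scores_a, scores_b = subset_to_common(rank_a, rank_b)
```

After this step both score lists have the same length and refer to the same regions position by position, which is what a pair-wise distance needs.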

Distance calculation

Three distance types are available in gpseqc_compare through the -d option:

  • kt: Kendall tau distance.
  • ktw: weighted Kendall tau distance.
  • emd: Earth Mover's Distance.

More details on the distances, and on how to choose between them, are available on the Distances page.
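To give an idea of what the kt option measures, here is a minimal pure-Python sketch of the Kendall tau distance: the number of region pairs that the two rankings order differently, optionally divided by the total number of pairs. The function name, the normalization, and the handling of ties (tied pairs are not counted as discordant) are assumptions made for this example; the implementation in gpseqc_compare may differ.

```python
from itertools import combinations


def kendall_tau_distance(scores_a, scores_b, normalize=True):
    """Count the pairs of regions that the two score lists order differently."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    n = len(scores_a)
    discordant = 0
    for i, j in combinations(range(n), 2):
        # A pair is discordant when the two score differences have opposite
        # signs. Tied pairs give a product of zero and are not counted.
        if (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j]) < 0:
            discordant += 1
    if not normalize:
        return discordant
    n_pairs = n * (n - 1) // 2
    return discordant / n_pairs if n_pairs else 0.0
```

With normalization, identical orderings give 0 and fully reversed orderings give 1. This loop visits every pair, so its cost grows quadratically with the number of regions.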

Building random distribution

The random distribution is used to calculate a p-value for the original distance, that is, the probability of obtaining a more extreme distance (rankings that are more similar or more different) when the ranks are shuffled at random.

To obtain a reliable p-value, the sample used to build the random distribution must be large enough. You can change its size with the -n (or --niter) option, which defaults to 5000.
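The sketch below shows one common way to turn shuffled distances into a p-value, and why the number of iterations matters. It is an assumption-laden illustration, not the method used by gpseqc_compare: it shuffles only the second list of scores, uses an empirical two-sided estimate with a +1 correction, and relies on a stand-in distance (mean_abs_diff) so that the example is self-contained. All names here are hypothetical.

```python
import random


def mean_abs_diff(scores_a, scores_b):
    """Stand-in distance for this example only (not one of kt, ktw, emd)."""
    return sum(abs(a - b) for a, b in zip(scores_a, scores_b)) / len(scores_a)


def permutation_p_value(scores_a, scores_b, distance, n_iter=5000, seed=None):
    """Return the observed distance and an empirical two-sided p-value."""
    rng = random.Random(seed)
    observed = distance(scores_a, scores_b)

    # Build the random distribution by shuffling one of the two score lists.
    shuffled = list(scores_b)
    null = []
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        null.append(distance(scores_a, shuffled))

    # Fraction of random distances at least as small / at least as large as
    # the observed one; the +1 keeps the estimate away from exactly zero.
    lower = (1 + sum(d <= observed for d in null)) / (n_iter + 1)
    upper = (1 + sum(d >= observed for d in null)) / (n_iter + 1)
    p_value = min(1.0, 2 * min(lower, upper))
    return observed, p_value


# Two identical rankings: the observed distance is as small as it can be.
scores = list(range(8))
observed, p_value = permutation_p_value(
    scores, scores, mean_abs_diff, n_iter=200, seed=0
)
```

With this estimator the smallest p-value that can be reported is 2 / (n_iter + 1), so 5000 iterations cannot resolve p-values below roughly 0.0004. Increasing the number of iterations gives a finer resolution at the cost of a longer run.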