Skip to content
This repository has been archived by the owner on Oct 15, 2020. It is now read-only.

Distances

Gabriele Girelli edited this page May 16, 2018 · 24 revisions

A ranking definition

Let tau be the ranking of N weighted elements:

tau_def

  • l_i is the label of the i-th item in tau
  • w_i is the weighte of the i-th item in tau
  • iIN indicates that tau has N elements
  • w_order indicates that the rank is based on the elements weight, and ties are allowed
  • l_unique indicates that each label l can be present only once in tau

Then:

  • Let Ifun be the index of label l in tau, such that Ifun_eq.
    Thus tau_alt_def.
  • Let Lfun denote the set of labels in tau: Lfun_eq,
    and L_i_fun denote the i-th label in rank tau.
  • Let Wfun denote the set of weights in tau: Wfun_eq,
    and W_i_fun denote the i-th weight in rank tau.

Distance between rankings

A distance d between two ranking tau1 and tau2, respectively of N_1 and N_2 elements, can be calculated only when all the labels of one rank are ranked in the other:

d_exists

In this case, eq_size.

Kendall tau (K): consider only the order

The Kendall tau distanc1 K indicates the number of swaps need to convert ![tau_1] in ![tau_2].

Kdef

Where:

Kbar_def

Weighted Kendall tau (Kw): consider the weights

While K considers only the order (i.e., counts the swaps), Kw takes into account the difference in weight of the swapped elements.

This formulation is invariant when summing the ranks to a constant (shifting), while it is affected by multiplying them for a constant (scaling). Also, it is not affected by the presence of 0 s in the ranks.

Let's define the sum of the absolute value of the pair-wise difference of rank tau as:

Dn_def

Then, we can use it for normalizing:

Kw_def

Also, given that:

range_demo_1

range_demo_2

range_demo_3

Then:

Kw_range

Please, note that if no_weight_diff, then the weight difference will always be null no_weight_diff_2 and so the calculated distance Kw_null. In such cases, please use the normal Kendall tau distance (K) instead of the weighted one.

The sum of the absolute value of the pair-wise difference of rank tau can also be defined based the distance of adjacent elements:

Dn_adjacent

Which recapitulates how each possible swap is considered only once, WHILE the distance between adjacent items is considered more (higher weight) for central items than for extreme ones.

Earth Mover's Distance (EMD)

Another way to take into account both weights and order is to use the Earth Mover's Distance2. EMD is provided in gpseqc_compare via the pyEMD package.

References

1 Chan, Timothy M., and Mihai Pătraşcu. "Counting inversions, offline orthogonal range counting, and related problems." Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2010.
2 Rubner, Yossi, and Carlo Tomasi. "The earth mover’s distance." Perceptual Metrics for Image Database Navigation. Springer, Boston, MA, 2001. 13-28.