Distances
Let be the ranking of weighted elements:
- is the label of the -th item in
- is the weight of the -th item in
- indicates that has elements
- indicates that the rank is based on the elements' weights, and ties are allowed
- indicates that each label can be present only once in
Then:
- Let be the index of label in , such that . Thus .
- Let denote the set of labels in : , and denote the -th label in rank .
- Let denote the set of weights in : , and denote the -th weight in rank .
A distance between two rankings and , respectively of and elements, can be calculated only when all the labels of one ranking are also ranked in the other:
In this case, .
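The label requirement above can be checked before computing any distance. The following is a minimal sketch, not code from gpseqc: the function name `comparable` and the use of plain label lists are assumptions made for illustration.

```python
def comparable(labels_a, labels_b):
    """Return True if every label of one ranking also appears in the other."""
    set_a, set_b = set(labels_a), set(labels_b)
    # Each label may be present only once per ranking.
    if len(set_a) != len(labels_a) or len(set_b) != len(labels_b):
        raise ValueError("labels must be unique within each ranking")
    # One label set must be contained in the other (in either direction).
    return set_a <= set_b or set_b <= set_a


print(comparable(["a", "b"], ["b", "c", "a"]))   # labels of the first are all in the second
print(comparable(["a", "d"], ["a", "b", "c"]))   # "d" is ranked only in the first
```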
The Kendall tau distance¹ indicates the number of swaps needed to convert $\tau_1$ into $\tau_2$.
Where:
While considers only the order (i.e., counts the swaps), takes into account the difference in weight of the swapped elements.
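The order-only distance can be illustrated by counting the label pairs that appear in opposite order in the two rankings, which equals the number of adjacent swaps needed to turn one into the other. This is a minimal sketch under assumptions: rankings are given as lists of unique labels already sorted by rank, both contain the same labels, and ties are not handled. The function name is hypothetical and this is not the gpseqc implementation.

```python
from itertools import combinations


def kendall_tau_distance(rank_a, rank_b):
    """Count label pairs that appear in opposite order in the two rankings."""
    if set(rank_a) != set(rank_b):
        raise ValueError("both rankings must contain the same labels")
    # Position of each label in the second ranking.
    pos_b = {label: i for i, label in enumerate(rank_b)}
    # combinations() yields pairs in rank_a order, so x precedes y in rank_a;
    # the pair is discordant when x follows y in rank_b.
    return sum(1 for x, y in combinations(rank_a, 2) if pos_b[x] > pos_b[y])


print(kendall_tau_distance(["a", "b", "c", "d"], ["b", "a", "d", "c"]))  # 2
```

This version is quadratic in the number of labels; faster inversion-counting approaches are discussed in reference 1.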
This formulation is invariant to adding a constant to the ranks (shifting), while it is affected by multiplying them by a constant (scaling). Also, it is not affected by the presence of s in the ranks.
Let's define the sum of the absolute values of the pairwise rank differences as:
Then, we can use it for normalizing:
Also, given that:
Then:
Please note that if , then the weight difference will always be zero and so the calculated distance . In such cases, please use the normal Kendall tau distance () instead of the weighted one.
The sum of the absolute values of the pairwise rank differences can also be defined based on the distance between adjacent elements:
This shows that each possible swap is considered only once, while the distance between adjacent items counts more (has a higher weight) for central items than for extreme ones.
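The equivalence between the pairwise and the adjacent formulation can be checked numerically. The sketch below assumes a plain list of numeric weights and uses hypothetical function names. It relies on the general identity that, for sorted values, the gap between positions k-1 and k is crossed by k * (n - k) pairs, a multiplier that is largest for central gaps. It is not necessarily written the way gpseqc computes it.

```python
from itertools import combinations


def pairwise_abs_sum(weights):
    """Sum of |w_i - w_j| over all unordered pairs (quadratic time)."""
    return sum(abs(a - b) for a, b in combinations(weights, 2))


def adjacent_abs_sum(weights):
    """Same quantity, computed from the gaps between adjacent sorted weights."""
    w = sorted(weights)
    n = len(w)
    # The gap between sorted positions k-1 and k separates k items from
    # n - k items, so it is crossed by k * (n - k) pairs.
    return sum(k * (n - k) * (w[k] - w[k - 1]) for k in range(1, n))


weights = [1, 2, 4, 7, 11]
print(pairwise_abs_sum(weights))                      # 50
print(adjacent_abs_sum(weights))                      # 50
print(pairwise_abs_sum([w + 10 for w in weights]))    # 50: shifting has no effect
print(pairwise_abs_sum([w * 3 for w in weights]))     # 150: scaling multiplies the sum
```

With five items the multipliers of the four adjacent gaps are 4, 6, 6 and 4, so the central gaps contribute more than the extreme ones.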
Another way to take into account both weights and order is to use the Earth Mover's Distance². EMD is provided in `gpseqc_compare` via the `pyEMD` package.
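To give an intuition of what EMD measures, the sketch below computes it for the one-dimensional special case: two histograms on the same ordered bins, with unit spacing between adjacent bins and equal total mass. In this case the minimum work equals the sum of the absolute running differences. This is a pure-Python illustration with a hypothetical function name. It is not the `pyEMD` API, and how `gpseqc_compare` builds its inputs is not assumed here.

```python
from itertools import accumulate


def emd_1d(p, q):
    """EMD between two histograms on the same ordered bins, unit bin spacing."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have the same total mass")
    # The running surplus after each bin is the mass that has to be
    # moved across the boundary to the next bin.
    return sum(abs(c) for c in accumulate(a - b for a, b in zip(p, q)))


print(emd_1d([1, 0, 0], [0, 0, 1]))  # 2: one unit of mass moved across two bins
```

The value returned is the total work; normalizing both histograms to sum to 1 beforehand makes results comparable across inputs of different total mass.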
¹ Chan, Timothy M., and Mihai Pătraşcu. "Counting inversions, offline orthogonal range counting, and related problems." Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2010.
² Rubner, Yossi, and Carlo Tomasi. "The earth mover’s distance." Perceptual Metrics for Image Database Navigation. Springer, Boston, MA, 2001. 13-28.
GPSeqC v2.3.3 is published under the MIT License - Copyright (c) 2017-18 Gabriele Girelli