
Add normalization of mutual stats to meritrank-service "MUTUAL_SCORES" #10

Open
ichorid opened this issue Oct 3, 2024 · 2 comments · May be fixed by #24
ichorid commented Oct 3, 2024

CMD_MUTUAL_SCORES => {

This command is used in Tentura to show mutual rankings of the ego and peers, e.g., if the ego is Alice, what is the score of Bob from the standpoint of Alice (normal direct MR), and what is the score of Alice from the standpoint of Bob (reverse MR). The idea is that the user is able to evaluate the level of "reciprocity" with their peers. In Tentura, we use a special "dumbbell" representation to show it visually:
(screenshot: the "dumbbell" mutual-score widget in Tentura)
The widget indicates the "intensity" of interaction by the color of the "dumbbell", and the "weights" of personal preferences by the sizes of the circles 🟢-🟢.

The normalization problem

The problem is that MR scores, although bounded to [0, 1], span a very wide dynamic range, which makes them hard to compare due to their very different magnitudes. For instance, Bob's score from Alice's standpoint could be 0.01, while Alice's score from Bob's standpoint could be 0.00001, so the corresponding "dumbbells" are always rendered at the same, minimal size.
The mutual scores can be pictured as a scatter plot with all the points concentrated near zero and heavily skewed away from the diagonal:
(diagram: scatter plot of mutual scores, points clustered near the origin and skewed off the diagonal)

So, we need to come up with a way to normalize these scores into the [0.0, 1.0] range such that the differences remain visible when rendered with the "dumbbell" method.
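To make the magnitude problem concrete, here is a minimal, purely hypothetical baseline (not a proposal for the service's actual scheme): per-ego min-max normalization in log space. Plain linear min-max would collapse scores that differ by orders of magnitude; working in log space keeps them separated:

```python
import math

def log_normalize(scores, eps=1e-9):
    """Map positive MR scores spanning many orders of magnitude into [0, 1].

    Linear min-max would collapse 0.00001 vs 0.01 to nearly the same value;
    taking log10 first preserves their relative separation.
    """
    logs = {peer: math.log10(max(s, eps)) for peer, s in scores.items()}
    lo, hi = min(logs.values()), max(logs.values())
    if hi == lo:  # all scores equal: nothing to spread out
        return {peer: 1.0 for peer in scores}
    return {peer: (v - lo) / (hi - lo) for peer, v in logs.items()}

# The two example scores from above now land at opposite ends of [0, 1]:
mutual = {"bob_from_alice": 0.01, "alice_from_bob": 0.00001}
print(log_normalize(mutual))  # → {'bob_from_alice': 1.0, 'alice_from_bob': 0.0}
```

Note this is still per-ego relative: it spreads out one ego's scores but says nothing about reciprocity across egos, which is part of why a grading scheme like the one discussed below may be preferable.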

@vberezhnev self-assigned this Oct 4, 2024

ichorid commented Oct 11, 2024

One possible approach, as suggested by @alexandrim0, is to apply k-means: take a fixed number of score results (e.g., the first N), group them into a fixed number of groups (k), and assign "grades" to nodes based on their group number. Something like:

get_scores(N=10, k=3) — "get the first N=10 scores and group them into k=3 grades"
The result would be:

  • "alice": 1
  • "bob": 1
  • "carol": 2
  • "sybil": 3
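The grouping above can be sketched as follows. Since the data is 1-dimensional, k-means clusters are always contiguous runs of the sorted scores, so for the small N fetched per call the optimal split can simply be brute-forced. All names here (`grade_scores`, the sample scores) are hypothetical, not the service's actual API:

```python
from itertools import combinations

def best_1d_clusters(values, k):
    """Optimal 1-D k-means: clusters are contiguous runs of the sorted
    values, so brute-force the k-1 split points (fine for small N)."""
    xs = sorted(values)

    def sse(chunk):  # within-cluster sum of squared errors
        m = sum(chunk) / len(chunk)
        return sum((x - m) ** 2 for x in chunk)

    best = None
    for splits in combinations(range(1, len(xs)), k - 1):
        bounds = [0, *splits, len(xs)]
        chunks = [xs[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = sum(sse(c) for c in chunks)
        if best is None or cost < best[0]:
            best = (cost, chunks)
    return best[1]  # chunks in ascending score order

def grade_scores(scores, k=3):
    """Grade 1 = best cluster, ..., grade k = worst."""
    chunks = best_1d_clusters(scores.values(), k)
    grade_of = {}
    for gi, chunk in enumerate(reversed(chunks), start=1):
        for v in chunk:
            grade_of[v] = gi
    return {peer: grade_of[s] for peer, s in scores.items()}

scores = {"alice": 0.9, "bob": 0.88, "carol": 0.3, "sybil": 0.001}
print(grade_scores(scores, k=3))
# → {'alice': 1, 'bob': 1, 'carol': 2, 'sybil': 3}
```

Note that Alice and Bob land in the same grade despite different raw scores: the absolute magnitudes are discarded, which is exactly what makes the grades comparable across egos.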


ichorid commented Nov 4, 2024

Recalculating the k-means grouping over all scores from an ego's standpoint on every request is too expensive: we need to optimize/approximate the process. One way to do it is as follows:

  1. Run the full k-means grouping only rarely;
  2. Cache the "group bounds" after each full grouping (e.g., |---g1--|-g2---|----g3|); this is possible because our data is one-dimensional;
  3. On object retrieval, compare the object's score to the cached bounds and assign it to the corresponding group.

Of course, this algorithm is an approximation, and it is only valid under the hypothesis that the distribution of an ego's scores changes slowly. (I suggest talking to ChatGPT about devising proper formulae for calculating the group boundaries, and for estimating when to do a full recalculation.)
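Steps 2–3 could look roughly like this (all names hypothetical; the bounds would be produced by the last full k-means run, e.g. as midpoints between adjacent cluster edges):

```python
import bisect

class GradeCache:
    """Cached 1-D group bounds from the last full k-means run.

    `bounds` are ascending score thresholds: a score below bounds[0] falls
    into the worst group, above bounds[-1] into the best. Valid only while
    the ego's score distribution drifts slowly.
    """

    def __init__(self, bounds):
        self.bounds = sorted(bounds)      # e.g. [0.15, 0.6] for k=3
        self.k = len(self.bounds) + 1

    def grade(self, score):
        # bisect counts how many bounds the score exceeds; invert so that
        # grade 1 is the best group and grade k the worst.
        return self.k - bisect.bisect_right(self.bounds, score)

cache = GradeCache(bounds=[0.15, 0.6])    # refreshed by the rare full run
print(cache.grade(0.9))    # → 1 (best)
print(cache.grade(0.3))    # → 2
print(cache.grade(0.001))  # → 3 (worst)
```

Each lookup is then O(log k) against the cached bounds instead of a full regrouping, at the cost of grades going stale until the next full recalculation.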
