Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: NumberOfTrianglesMetric computes wrong results in some cases #3

Open
BlackHawkLex opened this issue Jan 8, 2018 · 2 comments
Open
Assignees
Labels

Comments

@BlackHawkLex
Copy link

The current NumberOfTriangleMetric (https://github.com/dice-group/Lemming/blob/master/src/main/java/org/aksw/simba/lemming/metrics/single/NumberOfTrianglesMetric.java#L41-L75) (edge iterator approach) does not compute the correct amount of triangles on all graphs.

We extended the current unit tests in our fork and apart from more simple examples, we added a test case based on a graph (see https://snap.stanford.edu/data/email-Eu-core.html) from the SNAP dataset. The actual test implementation can be found here:
https://github.com/BlackHawkLex/Lemming/blob/master/src/test/java/org/aksw/simba/lemming/metrics/single/triangle/AbstractNumberOfTrianglesMetricTest.java

For running the tests associated with the NumberOfTriangleMetric in this fork, the following test case has to be run:
https://github.com/BlackHawkLex/Lemming/blob/master/src/test/java/org/aksw/simba/lemming/metrics/single/triangle/EdgeIteratorNumberOfTrianglesMetricTest.java

The algorithm outputs a triangle count of 1439598, although the graph actually contains 105461 triangles according to the SNAP graph detail page: https://snap.stanford.edu/data/email-Eu-core.html.

Furthermore, from what we have tested so far it seems that it also outputs wrong counts for most on the graphs on the SemanticWebDogFood assuming that our implementation of the "Forward" algorithm is correct (which according to our test cases seems to be the case). In order to compare the results of the edge-iterator approach and our forward approach, one can add the EdgeIteratorNumberOfTrianglesMetric and the ForwardNumberOfTriangleMetric to the metrics list in the EvaluationRunner (https://github.com/BlackHawkLex/Lemming/blob/master/src/main/java/org/aksw/simba/lemming/tools/EvaluationRunner.java) and use the getSemanticDogFoodGraphs() method to choose the right graphs for running an evaluation.

@MichaelRoeder MichaelRoeder self-assigned this Jan 9, 2018
@ghost
Copy link

ghost commented Jan 9, 2018

We did some bugfixes for our approaches, where we encountered the following problems, which seem to be the same problems for the edge iterator approach, as of the results of the according (extended) test (testOnSimpleSelfloopTriangle()
of https://github.com/BlackHawkLex/Lemming/blob/master/src/test/java/org/aksw/simba/lemming/metrics/single/triangle/EdgeIteratorNumberOfTrianglesMetricTest.java)

We use a very simple graph containing one selfloop. simple_selfloop_triangle.txt

We think that there is a check missing, so that v1 < v2 < v3 to ensure that no triangle is counted multiple times.

Another problem we had with our approaches is that a self loop together with one additional neighbor is wrongly counted as a triangle, which seems to be also the case for this approach.

@MichaelRoeder
Copy link
Member

The main difference between the triangles that are calculated in Lemming and the triangle statistics from Stanford for the email-Eu-core network is the definition of a triangle. Lemming is counting edge triangles while the statistics from Stanford seem to use node triangles.

Using the simple loop example referenced above, we have 3 nodes and 5 edges (edge: node -> node):

e_0: n_0 -> n_1
e_1: n_0 -> n_2
e_2: n_1 -> n_2
e_3: n_2 -> n_1
e_4: n_0 -> n_0

If we count node triangles (as for the statistics from Stanford), the example is easy because there is only one of them (the only one possible): (n_0, n_1, n_2).

However, in Lemming we would count 2 triangles: (e_0, e_1, e_2),(e_0, e_1, e_3).
The latter makes more sense in the world of RDF, simply because we may have a lot of nodes have more than one single edge connecting each other. That is why we decided very early in our project to use the "edge triangles".

Implementing the (simpler) node triangle counting (as done in https://github.com/dice-group/Lemming/blob/master/src/main/java/org/aksw/simba/lemming/metrics/single/SingleValueMetric.java), the result matches the number of triangles listed on the webpage of Stanford.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants