Improve performance of reconnectOrphanedNodes #359
…nnection targets to nodes that were reachable by the entry node at the start of the pass. Instead of using exclusion bits for connection targets, perform several rounds of resumes and post-filter for connectionTargets. Log basic debugging information when reconnecting orphaned nodes by introducing slf4j-api.
no need for resuming the search again
add backlinking of new edges from search
48ac258 to c47b907
Looks good overall. I'm not sure about switching from excludebits to search/resume, especially since we end up with code that uses both. Here's my attempt to make excludebits less painful.
I'm seeing poor reconnect behavior via searches with the added commits. I'll update once I figure out what's going on.
Try without the backlinking?
…ch already matches connectionTargets
It was a simple fix once I cleared my mind a bit -- no need to invert the bits on the search. I ran some tests with the changes, as I like the simplicity of not having both resumes and bits to include, but it negates most of the performance benefit. For the graphs I was testing with (approximately 1 million nodes, with hundreds of thousands of disconnected nodes), on my test machine, my original PR completes reconnection with 0 disconnected nodes in around 12 minutes. The revised PR is about an order of magnitude slower, at around 2 hours. I think this is because the searches using include bits walk a very significant portion of the graph, as they need beamWidth connected results. The resume approach only needs one connected result to avoid resuming, so most connections happen with a small number of resumes.
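The cost difference described in this comment can be illustrated with a toy model. This is only a sketch under stated assumptions, not JVector's search code: candidates are assumed to be visited in index order (nearest first), `connected` marks which candidates are reachable from the entry node, and the class and method names are hypothetical.

```java
import java.util.BitSet;

// Toy model only: candidates are visited in index order (nearest first),
// and `connected` marks which ones are reachable from the entry node.
final class ReconnectCostSketch {
    // Filtered search: keeps walking until it has collected beamWidth
    // connected results (or runs out of candidates).
    static int visitedByFilteredSearch(BitSet connected, int total, int beamWidth) {
        int found = 0;
        int visited = 0;
        while (visited < total && found < beamWidth) {
            if (connected.get(visited)) {
                found++;
            }
            visited++;
        }
        return visited;
    }

    // Resume approach: fetch unfiltered batches of beamWidth results and stop
    // after the first batch that contains at least one connected node.
    static int visitedByResume(BitSet connected, int total, int beamWidth) {
        int visited = 0;
        while (visited < total) {
            int end = Math.min(visited + beamWidth, total);
            int next = connected.nextSetBit(visited); // -1 if no connected node remains
            boolean hit = next >= 0 && next < end;
            visited = end;
            if (hit) {
                break;
            }
        }
        return visited;
    }
}
```

When connected nodes are sparse among an orphan's nearest candidates, the filtered variant has to walk far enough to find beamWidth of them, while the resume variant stops at the first batch that contains one.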
Okay, I added back the split search. It retains most of the simplicity I think. If it's still slow then again I blame backlink. :)
(I changed the |
…hbors found via search by the connected set
Pushed another commit to recover some performance, as the recent round still left things 50% slower than the original commit. But it looks like we get to keep backlink! The split between connected nodes and global connection targets appears to be important. When they're unified and initialized to the first pass of connected nodes, we don't benefit from improved connectivity on future passes, which slows those passes down meaningfully. I've also found that filtering candidates discovered via search using connectedNodes is harmful, as we already have connectivity by virtue of being discovered via search.
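To make the per-pass connected set concrete, here is a minimal sketch of computing which nodes are reachable from the entry node. It assumes a hypothetical adjacency-array representation (`int[][] neighbors`) rather than JVector's graph API, and the class and method names are illustrative only.

```java
import java.util.ArrayDeque;
import java.util.BitSet;

final class ReachabilitySketch {
    // Breadth-first traversal from `entry`; the returned bits mark every node
    // reachable by following outgoing edges in `neighbors`.
    static BitSet reachableFrom(int[][] neighbors, int entry) {
        BitSet seen = new BitSet(neighbors.length);
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        seen.set(entry);
        queue.add(entry);
        while (!queue.isEmpty()) {
            int node = queue.poll();
            for (int next : neighbors[node]) {
                if (!seen.get(next)) {
                    seen.set(next);
                    queue.add(next);
                }
            }
        }
        return seen;
    }
}
```

Recomputing this set at the start of each pass means nodes reconnected in an earlier pass count as connected in later ones, which is the benefit that a single set frozen after the first pass would lose.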
Okay, what's your preferred version at this point?
@jbellis -- tip of this branch as-is. I think your other changes are good, so this is somewhat simplified relative to the initial PR + speed of the original + backlinking.
LGTM. Nit: can we combine the two connectToClosestNeighbors overloads by passing Bits.ALL as connectedNodes to the four-parameter version?
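The pattern suggested in this nit might look roughly like the sketch below. The signatures, parameter names, and return value are hypothetical (the PR's actual method bodies are not shown here), and `Bits` is a local stand-in interface defined only for this example, not the library's own type.

```java
import java.util.List;

final class OverloadSketch {
    // Local stand-in for a bit-set view; not the library's actual interface.
    interface Bits {
        boolean get(int index);

        // Accepts every index, so it acts as "no filter".
        Bits ALL = index -> true;
    }

    // Three-parameter convenience overload: no connectivity filter is wanted,
    // so it delegates to the four-parameter version with Bits.ALL.
    static int connectToClosestNeighbors(int node, List<Integer> candidates, Bits connectionTargets) {
        return connectToClosestNeighbors(node, candidates, connectionTargets, Bits.ALL);
    }

    // Four-parameter version (hypothetical signature): returns the first
    // candidate accepted by both filters, or -1 if there is none.
    static int connectToClosestNeighbors(int node, List<Integer> candidates,
                                         Bits connectionTargets, Bits connectedNodes) {
        for (int candidate : candidates) {
            if (candidate != node
                    && connectionTargets.get(candidate)
                    && connectedNodes.get(candidate)) {
                return candidate;
            }
        }
        return -1;
    }
}
```

Keeping one real implementation and passing an accept-all filter avoids two method bodies drifting apart.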
…hen connecting through search
Good call on the nit. Pushed and merging when CI is clean.
#335 improved reconnectOrphanedNodes's ability to produce connected graphs. One change in particular causes a performance regression when reconnecting large graphs (1M+ nodes) that are heavily partitioned (e.g., only ~600K nodes are reachable from the entry point, meaning ~400K nodes need to be reconnected).
The change is as follows:
This works very well when a small number of nodes need to be reconnected, as the set of connection targets is small. When it's large, like when reconnecting 400k nodes, the performance of the searches is extremely poor.
This PR proposes several changes: