@sam-herman
Contributor

@sam-herman sam-herman commented Oct 15, 2025

Description

  • Harden tests for the cases where heap graph reconstruction and the RAVV-to-graph ordinal mappings are not the identity mapping.
  • Fix a bug when the graph ordinal mapping is used with the remapped RAVV.
  • Move to the remapped interface for buildAndMerge to avoid bugs from remapping ordinals twice.
  • Add an interface for providing the executor pools to the buildAndMerge method; this is important for downstream dependencies with centralized resource management.
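
To illustrate why remapping ordinals twice is a bug, here is a minimal, self-contained sketch. It does not use the project's classes: `OrdinalRemapDemo`, `remap`, and the sample mapping are hypothetical names chosen for illustration, and plain arrays stand in for a RAVV. With a non-identity mapping, applying the same graph-to-RAVV mapping a second time returns a different vector than the one intended.

```java
public class OrdinalRemapDemo {
    /**
     * Hypothetical helper: returns the vectors reordered so that index i holds
     * the vector for graph ordinal i, i.e. vectors[graphToRavvOrdMap[i]].
     */
    public static float[][] remap(float[][] vectors, int[] graphToRavvOrdMap) {
        float[][] out = new float[graphToRavvOrdMap.length][];
        for (int graphOrd = 0; graphOrd < graphToRavvOrdMap.length; graphOrd++) {
            out[graphOrd] = vectors[graphToRavvOrdMap[graphOrd]];
        }
        return out;
    }

    public static void main(String[] args) {
        // Each vector stores its own original ordinal so the effect is easy to read.
        float[][] vectors = { {0f}, {1f}, {2f} };
        int[] graphToRavvOrdMap = {2, 0, 1}; // a non-identity mapping (assumed for the demo)

        float[][] once = remap(vectors, graphToRavvOrdMap);
        float[][] twice = remap(once, graphToRavvOrdMap);

        System.out.println(once[0][0]);  // 2.0: graph ordinal 0 correctly resolves to original ordinal 2
        System.out.println(twice[0][0]); // 1.0: remapping again silently resolves to the wrong vector
    }
}
```

With the identity mapping both calls return the same data, which is why a test that only uses identity mappings cannot catch this class of bug.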

@github-actions
Contributor

Before you submit for review:

  • Does your PR follow guidelines from CONTRIBUTIONS.md?
  • Did you summarize what this PR does clearly and concisely?
  • Did you include performance data for changes which may be performance impacting?
  • Did you include useful docs for any user-facing changes or features?
  • Did you include useful javadocs for developer oriented changes, explaining new concepts or key changes?
  • Did you trigger and review regression testing results against the base branch via Run Bench Main?
  • Did you adhere to the code formatting guidelines (TBD)?
  • Did you group your changes for easy review, providing meaningful descriptions for each commit?
  • Did you ensure that all files contain the correct copyright header?

If you did not complete any of these, then please explain below.

@sam-herman sam-herman force-pushed the harden-tests-for-heap-graph-reconstruction branch from e3fd3be to ae4120c Compare October 29, 2025 02:24
fix minor bug in construction

Signed-off-by: Samuel Herman <sherman8915@gmail.com>
@sam-herman sam-herman force-pushed the harden-tests-for-heap-graph-reconstruction branch from ae4120c to 98507a5 Compare October 29, 2025 02:35
Signed-off-by: Samuel Herman <sherman8915@gmail.com>
Signed-off-by: Samuel Herman <sherman8915@gmail.com>
Signed-off-by: Samuel Herman <sherman8915@gmail.com>
Signed-off-by: Samuel Herman <sherman8915@gmail.com>
Signed-off-by: Samuel Herman <sherman8915@gmail.com>
@marianotepper
Contributor

There seems to be a failure from one of the tests that this PR is adding. In testIncrementalInsertionFromOnDiskIndex_withNonIdentityOrdinalMapping

…l calculation

Signed-off-by: Samuel Herman <sherman8915@gmail.com>
Signed-off-by: Samuel Herman <sherman8915@gmail.com>
Signed-off-by: Samuel Herman <sherman8915@gmail.com>
@sam-herman
Contributor Author

There seems to be a failure from one of the tests that this PR is adding. In testIncrementalInsertionFromOnDiskIndex_withNonIdentityOrdinalMapping

Yeah, it is now more robust. The recall calculation needs to be more statistically significant, so I increased the number of vectors, the beam size, and the number of query vectors participating in the average recall calculation.
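
As a rough sketch of what averaging recall over query vectors means, the following self-contained example computes per-query recall against a ground-truth top-k list and then averages it. The class and method names (`AverageRecall`, `recall`, `averageRecall`) and the sample ids are assumptions made for illustration; this is not the test code from the PR.

```java
import java.util.HashSet;
import java.util.Set;

public class AverageRecall {
    /** Fraction of the ground-truth ids that appear among the returned ids for one query. */
    public static double recall(int[] groundTruth, int[] returned) {
        Set<Integer> remaining = new HashSet<>();
        for (int id : groundTruth) {
            remaining.add(id);
        }
        int hits = 0;
        for (int id : returned) {
            // remove() returns true only the first time an id is matched,
            // so duplicate ids in the results are not counted twice.
            if (remaining.remove(id)) {
                hits++;
            }
        }
        return (double) hits / groundTruth.length;
    }

    /** Mean recall over all queries; row q of each array belongs to query q. */
    public static double averageRecall(int[][] groundTruth, int[][] returned) {
        double sum = 0.0;
        for (int q = 0; q < groundTruth.length; q++) {
            sum += recall(groundTruth[q], returned[q]);
        }
        return sum / groundTruth.length;
    }

    public static void main(String[] args) {
        int[][] groundTruth = { {1, 2}, {3, 4} };
        int[][] returned = { {1, 2}, {3, 9} };
        System.out.println(averageRecall(groundTruth, returned)); // 0.75
    }
}
```

With only a few queries, a single miss moves the average by a large step (0.25 in the two-query example above), which is why a threshold assertion on recall tends to be flaky until more queries contribute to the mean.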

Contributor

@marianotepper marianotepper left a comment

A few comments here and there.

return reallyBuild(ravv);
}

public ImmutableGraphIndex build(RandomAccessVectorValues ravv, int[] graphToRavvOrdMap) {
Contributor

We should not add this method here. The caller should be responsible for passing in the remapped RAVV; this is not a conversion that we want JV to perform, just like the change in buildAndMergeNewNodes.

Contributor Author

Agree, will remove it!
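
The design point in this thread, that the caller wraps the vectors in a remapped view and the builder never sees a mapping, could look roughly like the sketch below. Everything here is hypothetical: `VectorSource` is a stand-in interface rather than the library's `RandomAccessVectorValues`, and `remappedView`, `fromArray`, and `entryVector` are illustrative names, not the project's API.

```java
public class RemappedViewDemo {
    /** Hypothetical stand-in for a random-access vector source. */
    public interface VectorSource {
        int size();
        float[] getVector(int ordinal);
    }

    /** Wraps a plain array as a VectorSource (illustration only). */
    public static VectorSource fromArray(float[][] data) {
        return new VectorSource() {
            @Override public int size() { return data.length; }
            @Override public float[] getVector(int ordinal) { return data[ordinal]; }
        };
    }

    /** Caller-side view: ordinal i of the view resolves to base ordinal graphToBaseOrd[i]. */
    public static VectorSource remappedView(VectorSource base, int[] graphToBaseOrd) {
        return new VectorSource() {
            @Override public int size() { return graphToBaseOrd.length; }
            @Override public float[] getVector(int graphOrd) {
                return base.getVector(graphToBaseOrd[graphOrd]);
            }
        };
    }

    /** Hypothetical "builder" step: it only uses the ordinals of the source it is given. */
    public static float[] entryVector(VectorSource vectors) {
        return vectors.getVector(0);
    }

    public static void main(String[] args) {
        VectorSource base = fromArray(new float[][] { {0f}, {1f}, {2f} });
        // The caller applies the mapping exactly once, then hands over the view.
        VectorSource view = remappedView(base, new int[] {2, 0, 1});
        System.out.println(entryVector(view)[0]); // 2.0
    }
}
```

Because the builder has no overload that accepts a mapping, there is only one place where remapping can happen, which removes the opportunity to apply it twice.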

Signed-off-by: Samuel Herman <sherman8915@gmail.com>
@sam-herman
Contributor Author

One of the checks failed because of a different test that is not related to this change:

Error: 1,882 [ERROR] io.github.jbellis.jvector.quantization.TestCompressedVectors
Error: 1,882 [ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:643)
Error: 1,882 [ERROR] 	at org.apache.maven.plugin.surefire.booterclient.ForkStarter.lambda$null$3(ForkStarter.java:350)
Error: 1,882 [ERROR] 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
Error: 1,882 [ERROR] 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
Error: 1,882 [ERROR] 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
Error: 1,882 [ERROR] 	at java.base/java.lang.Thread.run(Thread.java:1570)
Error: 1,882 [ERROR] -> [Help 1]

I am re-running this test so we can have an all-green panel.

@marianotepper marianotepper self-requested a review October 29, 2025 17:18
Contributor

@marianotepper marianotepper left a comment

LGTM
