@jshook jshook commented Aug 28, 2025

This extends the examples/bench capabilities in jvector:

  • The DataSet type is virtualized behind an interface
  • The way DataSets are loaded is more modular
    • A DataSet loader has been added to support the vectordata API
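As a rough illustration of the two structural changes above, here is a minimal sketch of a dataset hidden behind an interface, with loaders tried in order. Every name in it (`DataSet`, `DataSetLoader`, `InMemoryDataSet`, `DataSets`) is hypothetical and chosen for illustration; it is not the actual code in this PR or the vectordata API.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical names throughout; not the actual jvector or vectordata API.

/** A dataset seen only through an interface, so callers do not care where the vectors come from. */
interface DataSet {
    String name();
    int dimension();
    List<float[]> baseVectors();
    List<float[]> queryVectors();
}

/** A loader either recognizes a dataset name and loads it, or declines with an empty result. */
interface DataSetLoader {
    Optional<DataSet> load(String name);
}

/** In-memory implementation, standing in for a local-file or remote-backed dataset. */
final class InMemoryDataSet implements DataSet {
    private final String name;
    private final int dimension;
    private final List<float[]> base;
    private final List<float[]> queries;

    InMemoryDataSet(String name, int dimension, List<float[]> base, List<float[]> queries) {
        this.name = name;
        this.dimension = dimension;
        this.base = base;
        this.queries = queries;
    }

    @Override public String name() { return name; }
    @Override public int dimension() { return dimension; }
    @Override public List<float[]> baseVectors() { return base; }
    @Override public List<float[]> queryVectors() { return queries; }
}

/** Tries each registered loader in order and returns the first dataset found. */
final class DataSets {
    private final List<DataSetLoader> loaders;

    DataSets(List<DataSetLoader> loaders) {
        this.loaders = List.copyOf(loaders);
    }

    DataSet load(String name) {
        for (DataSetLoader loader : loaders) {
            Optional<DataSet> found = loader.load(name);
            if (found.isPresent()) {
                return found.get();
            }
        }
        throw new IllegalArgumentException("No loader recognized dataset: " + name);
    }
}
```

With this shape, adding a new source of datasets means adding one more `DataSetLoader` to the list; the benchmark code that consumes a `DataSet` does not change.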

The net effect is that bench can now access vectordata-hosted datasets.
The benefits of this are several:

  • remote vector test data hosting
  • uniform API for finding and using vector datasets
  • merkle-based automatic download of chunks for dynamic access
  • efficient download and automatic caching of data locally on test nodes
  • mapping and ranging subsets of data, such as "first 1M", "first 10M" and so on under profile names
  • packing various vector data views in a consistent API: base, query, ...
  • management of datasets via catalogs, orthogonal to access control
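To make the "profile names" idea in the list above concrete, here is a minimal sketch of named ranges over a vector list. It assumes a profile is simply a half-open index range; the class `ProfiledVectors` and the profile names are hypothetical and do not reflect how the vectordata library actually represents profiles.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; class and profile names are illustrative, not the vectordata API.
final class ProfiledVectors {

    /** Half-open index range [start, end) into the full vector list. */
    static final class Range {
        final int start;
        final int end;

        Range(int start, int end) {
            if (start < 0 || end < start) {
                throw new IllegalArgumentException("Invalid range: " + start + ".." + end);
            }
            this.start = start;
            this.end = end;
        }
    }

    private final List<float[]> all;
    private final Map<String, Range> profiles = new LinkedHashMap<>();

    ProfiledVectors(List<float[]> all) {
        this.all = all;
    }

    /** Registers a named subset; returns this so definitions can be chained. */
    ProfiledVectors define(String profile, int start, int end) {
        if (end > all.size()) {
            throw new IllegalArgumentException("Range exceeds dataset size: " + end);
        }
        profiles.put(profile, new Range(start, end));
        return this;
    }

    /** Returns a view (not a copy) of the vectors covered by the named profile. */
    List<float[]> view(String profile) {
        Range r = profiles.get(profile);
        if (r == null) {
            throw new IllegalArgumentException("Unknown profile: " + profile);
        }
        return all.subList(r.start, r.end);
    }
}
```

Because `subList` returns a view, several profiles over the same data share one backing list, which is the property that makes "first 1M" and "first 10M" style subsets cheap to offer side by side.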

Most of the core wiring of these capabilities is provided by another library which is part of the nosqlbench project. The changes to jvector are to adapt it to use these APIs.

There are a couple of issues yet to resolve with the Java version configs and GHA.

@jshook jshook changed the title from "Jshook/streamer 07" to "Vectordata streaming support for examples/bench" Aug 28, 2025
@jshook jshook marked this pull request as draft August 29, 2025 18:23
jshook and others added 8 commits September 2, 2025 18:40
Create partial sums for PQ codebook for use during diversity checks (#511)

* Create partial sums for PQ codebook for use during diversity checks of graph building

Signed-off-by: Jake Luciani <jake@datastax.com>
@jshook jshook marked this pull request as ready for review September 4, 2025 22:47
@jshook jshook marked this pull request as draft September 19, 2025 22:53
jshook commented Sep 19, 2025

Due to conflicts with the executable jar approach, I'll need to refactor this a bit, so I moved it to a draft.
My plan for this is to combine the commands so that autobench is included and update the GHA wiring.

jshook commented Oct 10, 2025

Replacing this with a clean lineage.

@jshook jshook closed this Oct 10, 2025
@jshook jshook deleted the jshook/streamer-07 branch October 10, 2025 03:16