@mkaruza (Contributor) commented Nov 24, 2025

  • The third-party hnswlib library now uses a Dragonfly fork in which the
    internal ID type is changed to uint64_t to support GlobalDocId.
  • A singleton GlobalHnswIndexRegistry class is now used to create,
    remove, and run operations on HNSW indices.
  • Implemented SearchGlobalHnswIndex, which searches an HNSW index with
    or without a prefilter query.
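As a rough illustration of the registry bullet, here is a minimal sketch of a process-wide singleton keyed by (index name, field name). It mirrors the locking visible in the Get excerpt later in this thread (a std::shared_lock for lookups), but the class layout, the HnswIndexStub placeholder, the key format, and the dim-only Create signature are assumptions, not the PR's code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the per-field HNSW index (the real one wraps hnswlib).
struct HnswIndexStub {
  size_t dim;
};

// Sketch of a singleton registry; names follow the PR description, layout is assumed.
class GlobalHnswIndexRegistry {
 public:
  static GlobalHnswIndexRegistry& Instance() {
    static GlobalHnswIndexRegistry instance;  // thread-safe initialization since C++11
    return instance;
  }

  GlobalHnswIndexRegistry(const GlobalHnswIndexRegistry&) = delete;
  GlobalHnswIndexRegistry& operator=(const GlobalHnswIndexRegistry&) = delete;

  // Returns false if an index for this (index, field) pair already exists.
  bool Create(const std::string& index_name, const std::string& field_name, size_t dim) {
    std::unique_lock lock(mutex_);  // writers take an exclusive lock
    auto [it, inserted] = indices_.try_emplace(MakeKey(index_name, field_name), nullptr);
    if (!inserted) return false;
    it->second = std::make_shared<HnswIndexStub>(HnswIndexStub{dim});
    return true;
  }

  bool Remove(const std::string& index_name, const std::string& field_name) {
    std::unique_lock lock(mutex_);
    return indices_.erase(MakeKey(index_name, field_name)) > 0;
  }

  // Readers share the lock; the returned shared_ptr keeps the index alive even
  // if it is removed from the registry concurrently.
  std::shared_ptr<HnswIndexStub> Get(const std::string& index_name,
                                     const std::string& field_name) const {
    std::shared_lock lock(mutex_);
    auto it = indices_.find(MakeKey(index_name, field_name));
    if (it == indices_.end()) return nullptr;
    return it->second;
  }

 private:
  GlobalHnswIndexRegistry() = default;

  // Hypothetical key format; assumes names never contain ':'.
  static std::string MakeKey(const std::string& index_name, const std::string& field_name) {
    return index_name + ":" + field_name;
  }

  mutable std::shared_mutex mutex_;
  std::unordered_map<std::string, std::shared_ptr<HnswIndexStub>> indices_;
};
```

Handing out shared_ptr copies rather than raw pointers is one way to let a search keep using an index while FT.DROPINDEX removes it from the map.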

@mkaruza mkaruza marked this pull request as draft November 24, 2025 12:13
@mkaruza mkaruza force-pushed the mkaruza/global-hnsw-vector-index branch from d5e9d5e to 3e6442d Compare November 26, 2025 20:33
@mkaruza mkaruza changed the title Global HSNW vector index feat(search): Global HSNW vector index Nov 26, 2025
@mkaruza mkaruza requested review from dranikpg and romange November 26, 2025 20:34

auto& [ptr, size] = vector.value();

if (ptr && size != dim_) {
Collaborator

What if !ptr? Can it happen, and what does it mean?

Contributor Author

!ptr means the key doesn't have the field we index, so the value is null.
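A sketch of the distinction discussed here, assuming the field value arrives as an optional (pointer, size) pair as in `auto& [ptr, size] = vector.value();`: a null pointer means there is nothing to index, while a non-null pointer with the wrong length is a dimension mismatch. VectorCheck and CheckVector are hypothetical names, and treating an empty optional the same way as a null pointer is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>

// Hypothetical outcome of inspecting a document's vector field before indexing.
enum class VectorCheck { kSkipNull, kDimMismatch, kOk };

VectorCheck CheckVector(const std::optional<std::pair<const float*, size_t>>& vector,
                        size_t dim) {
  if (!vector) return VectorCheck::kSkipNull;  // assumption: no value at all, treat as null
  auto& [ptr, size] = vector.value();
  if (!ptr) return VectorCheck::kSkipNull;  // key lacks the indexed field
  if (size != dim) return VectorCheck::kDimMismatch;  // present, but wrong dimension
  return VectorCheck::kOk;
}
```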

@romange (Collaborator) left a comment

I think it's much better, thanks!

@romange (Collaborator) commented Nov 26, 2025

Please fix the build/test errors tomorrow, and let's submit. I want to continue experimenting on this.

Copilot finished reviewing on behalf of romange November 26, 2025 21:00
Copilot AI left a comment

Pull request overview

This PR implements a global HNSW (Hierarchical Navigable Small World) vector index for search functionality, centralizing vector index management across all shards to improve KNN (K-Nearest Neighbors) query performance and consistency.

Key changes:

  • Introduces GlobalHnswIndexRegistry singleton to manage HNSW vector indices globally across all shards
  • Updates hnswlib dependency to a Dragonfly fork that uses uint64_t for internal IDs to support cross-shard global document IDs
  • Implements SearchGlobalHnswIndex function for executing KNN searches with optional prefilter support across the global index
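To make the prefilter semantics concrete, the following sketch runs an exact, brute-force KNN over a few documents: with a prefilter, only matching documents are eligible, and up to k of them are ranked by distance. It is a deliberately simplified stand-in, not HNSW and not SearchGlobalHnswIndex; Doc, L2Sq, and KnnSearch are hypothetical, and squared L2 is assumed as the metric.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using GlobalDocId = uint64_t;

struct Doc {
  GlobalDocId id;
  std::vector<float> vec;
};

// Squared L2 distance; monotonic in L2, so it ranks neighbors identically.
float L2Sq(const std::vector<float>& a, const std::vector<float>& b) {
  float d = 0;
  for (size_t i = 0; i < a.size(); ++i) {
    float diff = a[i] - b[i];
    d += diff * diff;
  }
  return d;
}

// Exact KNN: documents rejected by the prefilter never enter the ranking.
std::vector<std::pair<float, GlobalDocId>> KnnSearch(
    const std::vector<Doc>& docs, const std::vector<float>& query, size_t k,
    const std::function<bool(GlobalDocId)>& prefilter = nullptr) {
  std::vector<std::pair<float, GlobalDocId>> scored;
  for (const auto& doc : docs) {
    if (prefilter && !prefilter(doc.id)) continue;
    scored.emplace_back(L2Sq(doc.vec, query), doc.id);
  }
  size_t n = std::min(k, scored.size());
  // Sort only the best n (distance, id) pairs.
  std::partial_sort(scored.begin(), scored.begin() + static_cast<std::ptrdiff_t>(n),
                    scored.end());
  scored.resize(n);
  return scored;
}
```

An approximate HNSW search trades this exhaustive scan for graph traversal, which is where prefiltering becomes harder: filtered-out nodes still have to be traversed to reach matching ones.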

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 11 comments.

Summary per file:
src/core/search/base.h: Adds GlobalDocId type and helper functions for composing/decomposing global document IDs from shard IDs and local doc IDs
src/core/search/indices.h: Refactors HnswVectorIndex to work with global IDs; adds placeholder ShardHnswVectorIndex for shard-local indices
src/core/search/indices.cc: Implements thread-safe HNSW index resizing and updates index operations to use global document IDs
src/core/search/search.h: Adds methods to SearchAlgorithm for KNN query detection and node extraction
src/core/search/search.cc: Implements logic to separate HNSW KNN nodes from query tree and manage score sort options
src/core/search/ast_expr.h: Adds HasPreFilter() method to AstKnnNode
src/core/search/ast_expr.cc: Implements prefilter detection logic for KNN nodes
src/server/search/global_hnsw_index.h: Defines GlobalHnswIndexRegistry singleton for managing global HNSW indices
src/server/search/global_hnsw_index.cc: Implements thread-safe registry operations for creating, removing, and accessing global HNSW indices
src/server/search/search_family.h: Adds Shutdown() method to SearchFamily
src/server/search/search_family.cc: Implements global HNSW index lifecycle management in FT.CREATE/DROPINDEX and adds SearchGlobalHnswIndex function
src/server/search/doc_index.h: Refactors document addition/removal to return DocIds and adds global vector index management methods
src/server/search/doc_index.cc: Implements integration between shard-local indices and global HNSW indices
src/server/search/CMakeLists.txt: Adds global_hnsw_index.cc to build
src/server/main_service.cc: Calls SearchFamily::Shutdown() during service shutdown to clean up global indices
src/external_libs.cmake: Updates hnswlib dependency to Dragonfly fork with uint64_t support
src/core/search/search_test.cc: Comments out HNSW KNN tests (temporarily disabled)
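The base.h entry mentions helpers that compose and decompose global document IDs from shard IDs and local doc IDs. A minimal sketch, assuming a 32-bit local DocId in the low bits and the shard ID in the high 32 bits (the actual bit split, and the names MakeGlobalDocId and SplitGlobalDocId, are hypothetical), shows why such an ID needs 64 bits, consistent with moving the forked hnswlib to uint64_t IDs:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Assumed widths; the real definitions in base.h may differ.
using DocId = uint32_t;
using ShardId = uint32_t;
using GlobalDocId = uint64_t;

// Pack the shard ID into the upper 32 bits and the local doc ID into the lower 32.
constexpr GlobalDocId MakeGlobalDocId(ShardId shard_id, DocId local_id) {
  return (static_cast<GlobalDocId>(shard_id) << 32) | local_id;
}

// Inverse of MakeGlobalDocId: recover (shard ID, local doc ID).
constexpr std::pair<ShardId, DocId> SplitGlobalDocId(GlobalDocId id) {
  return {static_cast<ShardId>(id >> 32), static_cast<DocId>(id & 0xFFFFFFFFu)};
}
```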

Comment on lines +1246 to +1254
for (const auto& [field_ident, field_info] : idx_ptr->schema.fields) {
  if (field_info.type == search::SchemaField::VECTOR &&
      !(field_info.flags & search::SchemaField::NOINDEX)) {
    const auto& vparams = std::get<search::SchemaField::VectorParams>(field_info.special_params);
    if (vparams.use_hnsw &&
        !GlobalHnswIndexRegistry::Instance().Create(idx_name, field_info.short_name, vparams)) {
      cmd_cntx.tx->Conclude();
      return builder->SendError("Index already exists");
Copilot AI Nov 26, 2025

If GlobalHnswIndexRegistry::Instance().Create() fails (returns false) at line 1251, the function returns an error but doesn't clean up any previously created global HNSW indices for other vector fields in the same index.

If an index has multiple HNSW vector fields and creation fails for the second field, the first field's global HNSW index will remain in the registry, causing a resource leak and potential inconsistency.

Consider rolling back all created global HNSW indices for this index when any creation fails.

Suggested change (replacing the lines above):
std::vector<std::string> created_hnsw_fields;
for (const auto& [field_ident, field_info] : idx_ptr->schema.fields) {
  if (field_info.type == search::SchemaField::VECTOR &&
      !(field_info.flags & search::SchemaField::NOINDEX)) {
    const auto& vparams = std::get<search::SchemaField::VectorParams>(field_info.special_params);
    if (vparams.use_hnsw) {
      if (!GlobalHnswIndexRegistry::Instance().Create(idx_name, field_info.short_name, vparams)) {
        // Roll back any HNSW indices already created for this index
        for (const auto& field_name : created_hnsw_fields) {
          GlobalHnswIndexRegistry::Instance().Remove(idx_name, field_name);
        }
        cmd_cntx.tx->Conclude();
        return builder->SendError("Index already exists");
      }
      created_hnsw_fields.push_back(field_info.short_name);
    }
  }
}
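The rollback can also be expressed with a scope guard, so that every early return undoes the indices created so far. A sketch with a hypothetical FakeRegistry standing in for GlobalHnswIndexRegistry (RollbackGuard and CreateAll are illustrative names too):

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory registry standing in for GlobalHnswIndexRegistry.
struct FakeRegistry {
  std::set<std::string> keys;
  bool Create(const std::string& k) { return keys.insert(k).second; }
  void Remove(const std::string& k) { keys.erase(k); }
};

// Runs every registered undo action on destruction unless Commit() was called.
class RollbackGuard {
 public:
  void Add(std::function<void()> undo) { undo_.push_back(std::move(undo)); }
  void Commit() { undo_.clear(); }
  ~RollbackGuard() {
    for (auto it = undo_.rbegin(); it != undo_.rend(); ++it) (*it)();  // reverse order
  }

 private:
  std::vector<std::function<void()>> undo_;
};

// Creates one entry per field; on the first failure the guard removes the
// entries created so far and the function returns false.
bool CreateAll(FakeRegistry& reg, const std::string& index,
               const std::vector<std::string>& fields) {
  RollbackGuard guard;
  for (const auto& field : fields) {
    std::string key = index + ":" + field;
    if (!reg.Create(key)) return false;  // guard's destructor rolls back
    guard.Add([&reg, key] { reg.Remove(key); });
  }
  guard.Commit();
  return true;
}
```

For a single error path the explicit vector in the suggestion is equivalent; a guard mainly pays off when the function has several exits.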

std::string key = MakeKey(index_name, field_name);
std::shared_lock<std::shared_mutex> lock(registry_mutex_);
auto it = indices_.find(key);
return it != indices_.end() ? it->second : nullptr;
Collaborator

Suggested change, replacing
return it != indices_.end() ? it->second : nullptr;
with
return it != indices_.end() ? it->second : {};
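One caveat about the suggested `: {}` form: a braced-init-list is not an expression, so it cannot be an operand of the conditional operator, and `cond ? it->second : {}` does not compile. The `: nullptr` form already works because std::shared_ptr is implicitly constructible from nullptr. If the goal is to avoid spelling nullptr, an early return accepts `{}`. A sketch, assuming the registry stores std::shared_ptr values (Index and Find are placeholders):

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct Index {};  // hypothetical placeholder for the stored index type

std::shared_ptr<Index> Find(const std::map<std::string, std::shared_ptr<Index>>& indices,
                            const std::string& key) {
  auto it = indices.find(key);
  // A return statement may use a braced-init-list; ?: may not.
  if (it == indices.end()) return {};
  return it->second;
}
```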

Signed-off-by: mkaruza <mario@dragonflydb.io>
@mkaruza mkaruza force-pushed the mkaruza/global-hnsw-vector-index branch from 3e6442d to ce5d639 Compare November 27, 2025 13:39
@mkaruza mkaruza marked this pull request as ready for review November 27, 2025 20:04
@augmentcode augmentcode bot left a comment

Review completed. 2 suggestions posted.


ResizeIfFull();
continue;
}
LOG(ERROR) << "HnswlibAdapter::Add exception: " << e.what();

Continuing the while (true) loop after logging unexpected exceptions in HnswlibAdapter::Add can lead to a tight infinite loop if the condition persists. Consider terminating the loop or surfacing the error when it’s not the capacity-resize case.
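A sketch of the bounded alternative the comment suggests: retry only on the capacity error, cap the number of resizes, and let any other exception propagate instead of spinning. FixedStore and AddWithResize are hypothetical; the sketch signals "full" with std::length_error, whereas how HnswlibAdapter actually recognizes the capacity case is not visible in the excerpt.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical fixed-capacity store standing in for an hnswlib index.
class FixedStore {
 public:
  explicit FixedStore(size_t capacity) : capacity_(capacity) {}
  void Add(int v) {
    if (items_.size() >= capacity_) throw std::length_error("capacity exceeded");
    items_.push_back(v);
  }
  void Resize(size_t capacity) { capacity_ = capacity; }
  size_t capacity() const { return capacity_; }

 private:
  size_t capacity_;
  std::vector<int> items_;
};

// Retries only on the capacity error, at most max_retries times; any other
// exception escapes to the caller instead of being logged and looped on.
bool AddWithResize(FixedStore& store, int v, int max_retries = 3) {
  for (int attempt = 0; attempt <= max_retries; ++attempt) {
    try {
      store.Add(v);
      return true;
    } catch (const std::length_error&) {
      store.Resize(store.capacity() * 2 + 1);  // grow, then retry
    }
  }
  return false;  // still full after max_retries resizes
}
```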


mkaruza and others added 3 commits November 27, 2025 21:12
* Github PR review changes
* Fixed python test
* Disabled unit test for hnsw
@mkaruza mkaruza force-pushed the mkaruza/global-hnsw-vector-index branch from d5439f0 to 2bb1738 Compare November 27, 2025 20:12
@mkaruza mkaruza changed the title feat(search): Global HSNW vector index feat(search): Global HNSW vector index Nov 27, 2025

4 participants