feat(search): Global HNSW vector index #6103
base: main
d5e9d5e to 3e6442d
auto& [ptr, size] = vector.value();
if (ptr && size != dim_) {
What if !ptr? Can it happen, and what does it mean?
!ptr means the key doesn't have the field that we index, so the value is null.
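The distinction discussed here (a null pointer meaning "field absent", versus a present vector with the wrong dimension) can be sketched as follows. This is only an illustration: `VectorView`, `VectorCheck`, and `CheckVector` are hypothetical names, and the actual handling in the PR may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for the (pointer, size) pair read from a document field.
using VectorView = std::pair<const float*, std::size_t>;

enum class VectorCheck { kNullField, kDimMismatch, kOk };

// Classifies a vector value against the dimension the index expects.
inline VectorCheck CheckVector(const VectorView& vector, std::size_t dim) {
  const auto& [ptr, size] = vector;
  if (!ptr)
    return VectorCheck::kNullField;  // key has no such field: nothing to index
  if (size != dim)
    return VectorCheck::kDimMismatch;  // field present but has the wrong dimension
  return VectorCheck::kOk;
}
```

With this shape, a missing field is not treated as an error; only a present vector with a mismatched dimension is.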
romange left a comment
I think it's much better, thanks!
Please fix the build/test errors tomorrow, and let's submit. I want to continue experimenting on this.
Pull request overview
This PR implements a global HNSW (Hierarchical Navigable Small World) vector index for search functionality, centralizing vector index management across all shards to improve KNN (K-Nearest Neighbors) query performance and consistency.
Key changes:
- Introduces GlobalHnswIndexRegistry singleton to manage HNSW vector indices globally across all shards
- Updates hnswlib dependency to a Dragonfly fork that uses uint64_t for internal IDs to support cross-shard global document IDs
- Implements SearchGlobalHnswIndex function for executing KNN searches with optional prefilter support across the global index
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| src/core/search/base.h | Adds GlobalDocId type and helper functions for composing/decomposing global document IDs from shard IDs and local doc IDs |
| src/core/search/indices.h | Refactors HnswVectorIndex to work with global IDs; adds placeholder ShardHnswVectorIndex for shard-local indices |
| src/core/search/indices.cc | Implements thread-safe HNSW index resizing and updates index operations to use global document IDs |
| src/core/search/search.h | Adds methods to SearchAlgorithm for KNN query detection and node extraction |
| src/core/search/search.cc | Implements logic to separate HNSW KNN nodes from query tree and manage score sort options |
| src/core/search/ast_expr.h | Adds HasPreFilter() method to AstKnnNode |
| src/core/search/ast_expr.cc | Implements prefilter detection logic for KNN nodes |
| src/server/search/global_hnsw_index.h | Defines GlobalHnswIndexRegistry singleton for managing global HNSW indices |
| src/server/search/global_hnsw_index.cc | Implements thread-safe registry operations for creating, removing, and accessing global HNSW indices |
| src/server/search/search_family.h | Adds Shutdown() method to SearchFamily |
| src/server/search/search_family.cc | Implements global HNSW index lifecycle management in FT.CREATE/DROPINDEX and adds SearchGlobalHnswIndex function |
| src/server/search/doc_index.h | Refactors document addition/removal to return DocIds and adds global vector index management methods |
| src/server/search/doc_index.cc | Implements integration between shard-local indices and global HNSW indices |
| src/server/search/CMakeLists.txt | Adds global_hnsw_index.cc to build |
| src/server/main_service.cc | Calls SearchFamily::Shutdown() during service shutdown to clean up global indices |
| src/external_libs.cmake | Updates hnswlib dependency to Dragonfly fork with uint64_t support |
| src/core/search/search_test.cc | Comments out HNSW KNN tests (temporarily disabled) |
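The GlobalDocId helpers listed for src/core/search/base.h are not shown in this conversation. A minimal sketch of composing and decomposing such an ID, assuming the shard ID occupies the upper 32 bits and the local document ID the lower 32 bits. The bit layout, the `ShardId` width, and the names `MakeGlobalDocId` and `SplitGlobalDocId` are assumptions for illustration, not taken from the PR:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

using ShardId = std::uint16_t;      // assumed width
using DocId = std::uint32_t;        // shard-local document id
using GlobalDocId = std::uint64_t;  // matches the uint64_t internal id of the hnswlib fork

// Packs the shard id into the upper 32 bits and the local id into the lower 32 bits.
constexpr GlobalDocId MakeGlobalDocId(ShardId shard, DocId local) {
  return (static_cast<GlobalDocId>(shard) << 32) | local;
}

// Splits a global id back into (shard id, local doc id).
inline std::pair<ShardId, DocId> SplitGlobalDocId(GlobalDocId id) {
  return {static_cast<ShardId>(id >> 32), static_cast<DocId>(id & 0xFFFFFFFFu)};
}
```

Because the two halves do not overlap, every (shard, local id) pair maps to a distinct 64-bit value, which is why the index's internal ID type has to be wider than 32 bits.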
for (const auto& [field_ident, field_info] : idx_ptr->schema.fields) {
  if (field_info.type == search::SchemaField::VECTOR &&
      !(field_info.flags & search::SchemaField::NOINDEX)) {
    const auto& vparams = std::get<search::SchemaField::VectorParams>(field_info.special_params);
    if (vparams.use_hnsw &&
        !GlobalHnswIndexRegistry::Instance().Create(idx_name, field_info.short_name, vparams)) {
      cmd_cntx.tx->Conclude();
      return builder->SendError("Index already exists");
Copilot AI, Nov 26, 2025
If GlobalHnswIndexRegistry::Instance().Create() fails (returns false) at line 1251, the function returns an error but doesn't clean up any previously created global HNSW indices for other vector fields in the same index.
If an index has multiple HNSW vector fields and creation fails for the second field, the first field's global HNSW index will remain in the registry, causing a resource leak and potential inconsistency.
Consider rolling back all created global HNSW indices for this index when any creation fails.
- for (const auto& [field_ident, field_info] : idx_ptr->schema.fields) {
-   if (field_info.type == search::SchemaField::VECTOR &&
-       !(field_info.flags & search::SchemaField::NOINDEX)) {
-     const auto& vparams = std::get<search::SchemaField::VectorParams>(field_info.special_params);
-     if (vparams.use_hnsw &&
-         !GlobalHnswIndexRegistry::Instance().Create(idx_name, field_info.short_name, vparams)) {
-       cmd_cntx.tx->Conclude();
-       return builder->SendError("Index already exists");
+ std::vector<std::string> created_hnsw_fields;
+ for (const auto& [field_ident, field_info] : idx_ptr->schema.fields) {
+   if (field_info.type == search::SchemaField::VECTOR &&
+       !(field_info.flags & search::SchemaField::NOINDEX)) {
+     const auto& vparams = std::get<search::SchemaField::VectorParams>(field_info.special_params);
+     if (vparams.use_hnsw) {
+       if (!GlobalHnswIndexRegistry::Instance().Create(idx_name, field_info.short_name, vparams)) {
+         // Rollback any previously created HNSW indices for this index
+         for (const auto& field_name : created_hnsw_fields) {
+           GlobalHnswIndexRegistry::Instance().Remove(idx_name, field_name);
+         }
+         cmd_cntx.tx->Conclude();
+         return builder->SendError("Index already exists");
+       } else {
+         created_hnsw_fields.push_back(field_info.short_name);
+       }
std::string key = MakeKey(index_name, field_name);
std::shared_lock<std::shared_mutex> lock(registry_mutex_);
auto it = indices_.find(key);
return it != indices_.end() ? it->second : nullptr;
- return it != indices_.end() ? it->second : nullptr;
+ return it != indices_.end() ? it->second : {};
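A self-contained sketch of the lookup pattern under discussion, assuming the registry maps a composite key to a `std::shared_ptr`. `RegistrySketch`, `HnswIndexStub`, and the `":"` key separator are hypothetical stand-ins, not the PR's classes. One caveat worth checking before applying the suggestion as written: a braced-init-list such as `{}` is not an expression, so it cannot appear as an operand of the conditional operator; the sketch uses explicit `return` statements instead.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>

struct HnswIndexStub {  // hypothetical stand-in for the real index type
  int dim = 0;
};

class RegistrySketch {
 public:
  // Returns false if an index with the same (index, field) key already exists.
  bool Create(const std::string& index_name, const std::string& field_name, int dim) {
    auto index = std::make_shared<HnswIndexStub>();
    index->dim = dim;
    std::unique_lock<std::shared_mutex> lock(registry_mutex_);  // exclusive: writers
    return indices_.emplace(MakeKey(index_name, field_name), std::move(index)).second;
  }

  // Returns an empty shared_ptr when no such index is registered.
  std::shared_ptr<HnswIndexStub> Get(const std::string& index_name,
                                     const std::string& field_name) const {
    std::string key = MakeKey(index_name, field_name);
    std::shared_lock<std::shared_mutex> lock(registry_mutex_);  // shared: readers
    auto it = indices_.find(key);
    if (it == indices_.end())
      return {};
    return it->second;
  }

 private:
  static std::string MakeKey(const std::string& index_name, const std::string& field_name) {
    return index_name + ":" + field_name;  // assumed key format
  }

  mutable std::shared_mutex registry_mutex_;
  std::unordered_map<std::string, std::shared_ptr<HnswIndexStub>> indices_;
};
```

Returning the `shared_ptr` by value keeps the index alive for the caller even if another thread removes it from the registry after the lock is released.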
* Third-party library hnswlib now uses dragonfly forked project where internal id is changed to uint64_t to support GlobalDocId.
* Singleton GlobalHnswIndexRegistry class is used now to create/remove/execute hnsw index functionality.
* Implemented function SearchGlobalHnswIndex that can be used to search hnsw index with or without prefilter query.

Signed-off-by: mkaruza <mario@dragonflydb.io>
3e6442d to ce5d639
Review completed. 2 suggestions posted.
Comment augment review to trigger a new review at any time.
    ResizeIfFull();
    continue;
  }
  LOG(ERROR) << "HnswlibAdapter::Add exception: " << e.what();
Continuing the while (true) loop after logging unexpected exceptions in HnswlibAdapter::Add can lead to a tight infinite loop if the condition persists. Consider terminating the loop or surfacing the error when it’s not the capacity-resize case.
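One way to address this concern is to retry only for the capacity case, bound the number of retries, and return a failure for anything else. The sketch below uses a hypothetical `TinyIndex` stand-in rather than the hnswlib API, and the exception types used to tell the two cases apart are assumptions made for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the underlying index; not the hnswlib API.
class TinyIndex {
 public:
  explicit TinyIndex(std::size_t capacity) : capacity_(capacity) {}

  void Add(int id) {
    if (id < 0)
      throw std::invalid_argument("negative id");  // an "unexpected" failure
    if (ids_.size() >= capacity_)
      throw std::length_error("index is full");  // the capacity failure
    ids_.push_back(id);
  }

  void Resize(std::size_t new_capacity) { capacity_ = new_capacity; }
  std::size_t capacity() const { return capacity_; }
  std::size_t size() const { return ids_.size(); }

 private:
  std::size_t capacity_;
  std::vector<int> ids_;
};

// Retries only after a capacity error, at most max_retries times.
// Any other exception ends the loop and is reported to the caller as false.
inline bool AddWithResize(TinyIndex& index, int id, int max_retries = 3) {
  for (int attempt = 0; attempt <= max_retries; ++attempt) {
    try {
      index.Add(id);
      return true;
    } catch (const std::length_error&) {
      index.Resize(index.capacity() * 2 + 1);  // grow, then retry
    } catch (const std::exception&) {
      return false;  // unexpected error: do not spin
    }
  }
  return false;  // still failing after the retry budget
}
```

The retry bound guarantees termination even if resizing never resolves the capacity error.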
* Github PR review changes
* Fixed python test
* Disabled unit test for hnsw
d5439f0 to 2bb1738