-
Notifications
You must be signed in to change notification settings - Fork 604
Fix testcases #439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix testcases #439
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR aims to fix test cases and address several critical issues in the SPANN index implementation, including error handling, data integrity validation, and index persistence logic.
Key Changes:
- Fixed index cloning and saving logic by uncommenting critical SaveIndex calls
- Enhanced checksum validation and error detection throughout the codebase
- Refactored RefineIndex to reload the index after checkpoint operations
- Updated Visual Studio platform toolset from v142 to v143
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| Test/src/SPFreshTest.cpp | Removed RefineTestIdx test case, commented out cache test section, minor whitespace changes |
| AnnService/src/Core/VectorIndex.cpp | Refactored SPANN directory copying logic, moved quantizer saving, uncommented SaveIndex in Clone |
| AnnService/src/Core/SPANN/SPANNIndex.cpp | Added stream shutdown calls and reordered RefineIndex operations |
| AnnService/inc/Core/SPANN/IExtraSearcher.h | Changed RefineIndex parameter from const to non-const reference |
| AnnService/inc/Core/SPANN/ExtraDynamicSearcher.h | Multiple improvements: re-enabled checksum validation, added merge detection, enhanced error handling, replaced &front() with .data() |
| AnnService/inc/Core/Common/Dataset.h | Simplified Load logic and added incRows initialization |
| AnnService/*.vcxproj | Updated platform toolset to v143 for all Visual Studio projects |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| virtual void InitWorkSpace(ExtraWorkSpace* p_exWorkSpace, bool clear = false) = 0; | ||
|
|
||
| virtual ErrorCode GetPostingDebug(ExtraWorkSpace* p_exWorkSpace, std::shared_ptr<VectorIndex> p_index, SizeType vid, std::vector<SizeType>& VIDs, std::shared_ptr<VectorSet>& vecs) = 0; | ||
|
|
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter type changed from std::shared_ptr<VectorIndex> (const reference) to std::shared_ptr<VectorIndex>& (non-const reference). This is a breaking API change that affects the interface contract. The change appears intentional to allow the index to be reloaded inside RefineIndex, but this should be clearly documented or the method should be renamed to indicate it modifies the index reference.
| // Note: This method may modify (reassign) the p_index shared pointer. | |
| // Callers should be aware that the index reference may be changed inside this method. |
| Checkpoint(m_opt->m_indexDirectory); | ||
| p_index->SaveIndex(m_opt->m_indexDirectory + FolderSep + m_opt->m_headIndexFolder); |
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order of Checkpoint() and SaveIndex() has been swapped. Previously SaveIndex was called first, then Checkpoint. Now Checkpoint is called before SaveIndex. This ordering change could affect the consistency of saved state - if SaveIndex fails after Checkpoint succeeds, the checkpoint may reference an invalid or incomplete index state.
| Checkpoint(m_opt->m_indexDirectory); | |
| p_index->SaveIndex(m_opt->m_indexDirectory + FolderSep + m_opt->m_headIndexFolder); | |
| p_index->SaveIndex(m_opt->m_indexDirectory + FolderSep + m_opt->m_headIndexFolder); | |
| Checkpoint(m_opt->m_indexDirectory); |
| if ((finalcode = | ||
| VectorIndex::LoadIndex(m_opt->m_indexDirectory + FolderSep + m_opt->m_headIndexFolder, | ||
| p_index)) != ErrorCode::Success) | ||
| return finalcode; |
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block reloads the entire index and related data structures after checkpoint. However, the error handling for these operations only returns on failure - there's no check to ensure p_index is valid after reloading. If VectorIndex::LoadIndex succeeds but returns a null pointer, subsequent use of p_index could cause crashes. Additionally, this reload logic appears to unconditionally replace the input index reference, which may be unexpected behavior given the API signature change.
| return finalcode; | |
| return finalcode; | |
| if (!p_index) { | |
| SPTAGLIB_LOG(Helper::LogLevel::LL_Error, "LoadIndex succeeded but p_index is null\n"); | |
| return ErrorCode::Fail; | |
| } |
| if ((ret=db->Get(headID, &postingList, MaxTimeout, &(p_exWorkSpace->m_diskRequests))) != | ||
| ErrorCode::Success)// || !m_checkSum.ValidateChecksum(postingList.c_str(), (int)(postingList.size()), *m_checkSums[headID])) | ||
| ErrorCode::Success || !m_checkSum.ValidateChecksum(postingList.c_str(), (int)(postingList.size()), *m_checkSums[headID])) | ||
| { | ||
| SPTAGLIB_LOG(Helper::LogLevel::LL_Error, "Split fail to get oversized postings: key=%d size=%d\n", headID, m_postingSizes.GetSize(headID)); | ||
| SPTAGLIB_LOG(Helper::LogLevel::LL_Error, | ||
| "Split fail to get oversized postings: key=%d required size=%d read size=%d checksum " | ||
| "issue=%d\n", | ||
| headID, (int)(m_postingSizes.GetSize(headID) * m_vectorInfoSize), | ||
| (int)(postingList.size()), (int)(ret == ErrorCode::Success)); | ||
| return ret; |
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checksum validation has been re-enabled. However, when the checksum validation fails (but ret == ErrorCode::Success), the function returns ret which is ErrorCode::Success, not an error code. This means a checksum failure won't be reported as an error to the caller, potentially allowing corrupted data to be processed.
| { | ||
| std::string indexFolder = GetParameter("IndexDirectory", "Base"); | ||
| //ret = SaveIndex(indexFolder); | ||
| ret = SaveIndex(indexFolder); |
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The SaveIndex call has been uncommented, which means the index will be saved to the original folder before copying to the clone location. This is likely the correct behavior (as indicated by the fix in the PR title), but it introduces a potential performance issue: if the index is large, saving it twice (once to the original location, once via copy to clone) could be inefficient. Consider whether a direct save to the clone location would be more appropriate.
| finalcode = ret; | ||
| return; | ||
| finalcode = ErrorCode::Fail; | ||
| //return; |
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed from setting finalcode = ret and returning immediately to setting finalcode = ErrorCode::Fail and commenting out the return. This error handling change is problematic because it continues processing after detecting a validation failure instead of early-exiting. The loop will continue even though an error was detected, which could lead to processing corrupted data and make debugging harder.
| //return; | |
| return; |
| if (m_pQuantizer) | ||
| { | ||
| ret = m_pQuantizer->SaveQuantizer(handles.back()); | ||
| } |
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The quantizer saving logic has been moved from after metadata operations (lines 477-479 in original) to before the NeedRefine() check (lines 458-461). This could be problematic because:
- The quantizer is now saved even if RefineIndex or SaveIndexData fails later
- The quantizer file handle is accessed via
handles.back()which assumes the quantizer file is the last in the handles vector, but this may not be true if metadata files are added after quantizer files in the indexfiles list (lines 436-444)
| for (int i = 0; i < p_indexStreams.size(); i++) | ||
| { | ||
| p_indexStreams[i]->ShutDown(); | ||
| } | ||
|
|
||
| if ((ret = m_extraSearcher->RefineIndex(m_index, false, &headOldtoNew, p_mapping)) != ErrorCode::Success) | ||
| return ret; | ||
|
|
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The streams are shut down immediately after metadata refinement but before the m_extraSearcher->RefineIndex() call. If RefineIndex needs to access or modify these streams, this will fail. The shutdown should occur after all operations that might need the streams are complete.
| for (int i = 0; i < p_indexStreams.size(); i++) | |
| { | |
| p_indexStreams[i]->ShutDown(); | |
| } | |
| if ((ret = m_extraSearcher->RefineIndex(m_index, false, &headOldtoNew, p_mapping)) != ErrorCode::Success) | |
| return ret; | |
| // Move stream shutdown after all refinement operations are complete | |
| if ((ret = m_extraSearcher->RefineIndex(m_index, false, &headOldtoNew, p_mapping)) != ErrorCode::Success) | |
| return ret; | |
| for (int i = 0; i < p_indexStreams.size(); i++) | |
| { | |
| p_indexStreams[i]->ShutDown(); | |
| } |
| int QuantifyAssumptionBroken(VectorIndex* p_index, SizeType headID, std::string& postingList, SizeType SplitHead, std::vector<SizeType>& newHeads, std::set<int>& brokenID, int topK = 0, float ratio = 1.0) | ||
| { | ||
| int assumptionBrokenNum = 0; | ||
| int postVectorNum = postingList.size() / m_vectorInfoSize; |
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Changed from &postingList.front() to postingList.data(). While both are generally equivalent for non-empty strings, .data() is safer because calling .front() on an empty string is undefined behavior. However, verify that postingList cannot be empty at this point, otherwise dereferencing the pointer will still cause issues.
| int postVectorNum = postingList.size() / m_vectorInfoSize; | |
| int postVectorNum = postingList.size() / m_vectorInfoSize; | |
| if (postingList.empty()) { | |
| return 0; | |
| } |
| } | ||
| auto compEnd = std::chrono::high_resolution_clock::now(); | ||
| if (realNum <= m_mergeThreshold) MergeAsync(p_index.get(), curPostingID); // TODO: Control merge | ||
| //if (realNum <= m_mergeThreshold) MergeAsync(p_index.get(), curPostingID); // TODO: Control merge |
Copilot
AI
Nov 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The automatic merge call for postings with low vector counts has been commented out. This means postings that fall below the merge threshold will no longer be automatically merged during search operations. This could lead to performance degradation over time as small postings accumulate. The comment "TODO: Control merge" suggests this is intentional but incomplete - there should be an alternative mechanism or configuration to control when merges occur.
No description provided.