Skip to content

Conversation

@MaggieQi
Copy link
Member

No description provided.

@MaggieQi MaggieQi requested a review from Copilot November 25, 2025 22:19
Copilot finished reviewing on behalf of MaggieQi November 25, 2025 22:23
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to fix test cases and address several critical issues in the SPANN index implementation, including error handling, data integrity validation, and index persistence logic.

Key Changes:

  • Fixed index cloning and saving logic by uncommenting critical SaveIndex calls
  • Enhanced checksum validation and error detection throughout the codebase
  • Refactored RefineIndex to reload the index after checkpoint operations
  • Updated Visual Studio platform toolset from v142 to v143

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
Test/src/SPFreshTest.cpp Removed RefineTestIdx test case, commented out cache test section, minor whitespace changes
AnnService/src/Core/VectorIndex.cpp Refactored SPANN directory copying logic, moved quantizer saving, uncommented SaveIndex in Clone
AnnService/src/Core/SPANN/SPANNIndex.cpp Added stream shutdown calls and reordered RefineIndex operations
AnnService/inc/Core/SPANN/IExtraSearcher.h Changed RefineIndex parameter from const to non-const reference
AnnService/inc/Core/SPANN/ExtraDynamicSearcher.h Multiple improvements: re-enabled checksum validation, added merge detection, enhanced error handling, replaced &front() with .data()
AnnService/inc/Core/Common/Dataset.h Simplified Load logic and added incRows initialization
AnnService/*.vcxproj Updated platform toolset to v143 for all Visual Studio projects

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

virtual void InitWorkSpace(ExtraWorkSpace* p_exWorkSpace, bool clear = false) = 0;

virtual ErrorCode GetPostingDebug(ExtraWorkSpace* p_exWorkSpace, std::shared_ptr<VectorIndex> p_index, SizeType vid, std::vector<SizeType>& VIDs, std::shared_ptr<VectorSet>& vecs) = 0;

Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter type changed from std::shared_ptr<VectorIndex> (const reference) to std::shared_ptr<VectorIndex>& (non-const reference). This is a breaking API change that affects the interface contract. The change appears intentional to allow the index to be reloaded inside RefineIndex, but this should be clearly documented or the method should be renamed to indicate it modifies the index reference.

Suggested change
// Note: This method may modify (reassign) the p_index shared pointer.
// Callers should be aware that the index reference may be changed inside this method.

Copilot uses AI. Check for mistakes.
Comment on lines 621 to +622
Checkpoint(m_opt->m_indexDirectory);
p_index->SaveIndex(m_opt->m_indexDirectory + FolderSep + m_opt->m_headIndexFolder);
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The order of Checkpoint() and SaveIndex() has been swapped. Previously SaveIndex was called first, then Checkpoint. Now Checkpoint is called before SaveIndex. This ordering change could affect the consistency of saved state - if SaveIndex fails after Checkpoint succeeds, the checkpoint may reference an invalid or incomplete index state.

Suggested change
Checkpoint(m_opt->m_indexDirectory);
p_index->SaveIndex(m_opt->m_indexDirectory + FolderSep + m_opt->m_headIndexFolder);
p_index->SaveIndex(m_opt->m_indexDirectory + FolderSep + m_opt->m_headIndexFolder);
Checkpoint(m_opt->m_indexDirectory);

Copilot uses AI. Check for mistakes.
if ((finalcode =
VectorIndex::LoadIndex(m_opt->m_indexDirectory + FolderSep + m_opt->m_headIndexFolder,
p_index)) != ErrorCode::Success)
return finalcode;
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block reloads the entire index and related data structures after checkpoint. However, the error handling for these operations only returns on failure - there's no check to ensure p_index is valid after reloading. If VectorIndex::LoadIndex succeeds but returns a null pointer, subsequent use of p_index could cause crashes. Additionally, this reload logic appears to unconditionally replace the input index reference, which may be unexpected behavior given the API signature change.

Suggested change
return finalcode;
return finalcode;
if (!p_index) {
SPTAGLIB_LOG(Helper::LogLevel::LL_Error, "LoadIndex succeeded but p_index is null\n");
return ErrorCode::Fail;
}

Copilot uses AI. Check for mistakes.
Comment on lines 687 to 695
if ((ret=db->Get(headID, &postingList, MaxTimeout, &(p_exWorkSpace->m_diskRequests))) !=
ErrorCode::Success)// || !m_checkSum.ValidateChecksum(postingList.c_str(), (int)(postingList.size()), *m_checkSums[headID]))
ErrorCode::Success || !m_checkSum.ValidateChecksum(postingList.c_str(), (int)(postingList.size()), *m_checkSums[headID]))
{
SPTAGLIB_LOG(Helper::LogLevel::LL_Error, "Split fail to get oversized postings: key=%d size=%d\n", headID, m_postingSizes.GetSize(headID));
SPTAGLIB_LOG(Helper::LogLevel::LL_Error,
"Split fail to get oversized postings: key=%d required size=%d read size=%d checksum "
"issue=%d\n",
headID, (int)(m_postingSizes.GetSize(headID) * m_vectorInfoSize),
(int)(postingList.size()), (int)(ret == ErrorCode::Success));
return ret;
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checksum validation has been re-enabled. However, when the checksum validation fails (but ret == ErrorCode::Success), the function returns ret which is ErrorCode::Success, not an error code. This means a checksum failure won't be reported as an error to the caller, potentially allowing corrupted data to be processed.

Copilot uses AI. Check for mistakes.
{
std::string indexFolder = GetParameter("IndexDirectory", "Base");
//ret = SaveIndex(indexFolder);
ret = SaveIndex(indexFolder);
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The SaveIndex call has been uncommented, which means the index will be saved to the original folder before copying to the clone location. This is likely the correct behavior (as indicated by the fix in the PR title), but it introduces a potential performance issue: if the index is large, saving it twice (once to the original location, once via copy to clone) could be inefficient. Consider whether a direct save to the clone location would be more appropriate.

Copilot uses AI. Check for mistakes.
finalcode = ret;
return;
finalcode = ErrorCode::Fail;
//return;
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed from setting finalcode = ret and returning immediately to setting finalcode = ErrorCode::Fail and commenting out the return. This error handling change is problematic because it continues processing after detecting a validation failure instead of early-exiting. The loop will continue even though an error was detected, which could lead to processing corrupted data and make debugging harder.

Suggested change
//return;
return;

Copilot uses AI. Check for mistakes.
Comment on lines +458 to +461
if (m_pQuantizer)
{
ret = m_pQuantizer->SaveQuantizer(handles.back());
}
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The quantizer saving logic has been moved from after metadata operations (lines 477-479 in original) to before the NeedRefine() check (lines 458-461). This could be problematic because:

  1. The quantizer is now saved even if RefineIndex or SaveIndexData fails later
  2. The quantizer file handle is accessed via handles.back() which assumes the quantizer file is the last in the handles vector, but this may not be true if metadata files are added after quantizer files in the indexfiles list (lines 436-444)

Copilot uses AI. Check for mistakes.
Comment on lines +1398 to +1405
for (int i = 0; i < p_indexStreams.size(); i++)
{
p_indexStreams[i]->ShutDown();
}

if ((ret = m_extraSearcher->RefineIndex(m_index, false, &headOldtoNew, p_mapping)) != ErrorCode::Success)
return ret;

Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The streams are shut down immediately after metadata refinement but before the m_extraSearcher->RefineIndex() call. If RefineIndex needs to access or modify these streams, this will fail. The shutdown should occur after all operations that might need the streams are complete.

Suggested change
for (int i = 0; i < p_indexStreams.size(); i++)
{
p_indexStreams[i]->ShutDown();
}
if ((ret = m_extraSearcher->RefineIndex(m_index, false, &headOldtoNew, p_mapping)) != ErrorCode::Success)
return ret;
// Move stream shutdown after all refinement operations are complete
if ((ret = m_extraSearcher->RefineIndex(m_index, false, &headOldtoNew, p_mapping)) != ErrorCode::Success)
return ret;
for (int i = 0; i < p_indexStreams.size(); i++)
{
p_indexStreams[i]->ShutDown();
}

Copilot uses AI. Check for mistakes.
int QuantifyAssumptionBroken(VectorIndex* p_index, SizeType headID, std::string& postingList, SizeType SplitHead, std::vector<SizeType>& newHeads, std::set<int>& brokenID, int topK = 0, float ratio = 1.0)
{
int assumptionBrokenNum = 0;
int postVectorNum = postingList.size() / m_vectorInfoSize;
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Changed from &postingList.front() to postingList.data(). While both are generally equivalent for non-empty strings, .data() is safer because calling .front() on an empty string is undefined behavior. However, verify that postingList cannot be empty at this point, otherwise dereferencing the pointer will still cause issues.

Suggested change
int postVectorNum = postingList.size() / m_vectorInfoSize;
int postVectorNum = postingList.size() / m_vectorInfoSize;
if (postingList.empty()) {
return 0;
}

Copilot uses AI. Check for mistakes.
}
auto compEnd = std::chrono::high_resolution_clock::now();
if (realNum <= m_mergeThreshold) MergeAsync(p_index.get(), curPostingID); // TODO: Control merge
//if (realNum <= m_mergeThreshold) MergeAsync(p_index.get(), curPostingID); // TODO: Control merge
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The automatic merge call for postings with low vector counts has been commented out. This means postings that fall below the merge threshold will no longer be automatically merged during search operations. This could lead to performance degradation over time as small postings accumulate. The comment "TODO: Control merge" suggests this is intentional but incomplete - there should be an alternative mechanism or configuration to control when merges occur.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants