Skip to content

Adding file mode option to serialize() methods#1071

Closed
chatman wants to merge 1 commit intorapidsai:branch-25.08from
SearchScale:ishan/append-mode-for-serialization
Closed

Adding file mode option to serialize() methods#1071
chatman wants to merge 1 commit intorapidsai:branch-25.08from
SearchScale:ishan/append-mode-for-serialization

Conversation

@chatman
Copy link
Contributor

@chatman chatman commented Jun 30, 2025

In serialize() methods of CAGRA, Brute Force and HNSW, it would be preferable to add the ability to append to a file (ios::app). This will be useful for Lucene, where the file written to the segment contains a header, post which the serialized index could be written, by opening the file in append mode.

Also, useful for composite files (file containing multiple indexes). For this usecase, though, a deserialize() method where indexes from the same file can be loaded using offsets+length. This is not in scope for this PR, though.

In this case, the additional file_mode parameter can have 'a' for ios::app or 'w' for ios::out.

cuvsError_t cuvsBruteForceSerializeWithMode(cuvsResources_t res,
                                             const char* filename,
                                             cuvsBruteForceIndex_t index,
                                             char file_mode);

@copy-pr-bot
Copy link

copy-pr-bot bot commented Jun 30, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

endif()

list(APPEND CUVS_CUDA_FLAGS --expt-extended-lambda --expt-relaxed-constexpr)
list(APPEND CUVS_CUDA_FLAGS -allow-unsupported-compiler)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why was this option required for nvcc?

Comment on lines +197 to +199
const char* filename,
cuvsBruteForceIndex_t index,
char file_mode);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the indent off by 1 here?

@cjnolet
Copy link
Member

cjnolet commented Jul 16, 2025

@chatman is this still relevant given the more recent move to serializing cagra out to Lucene HNSW format directly?

@cjnolet
Copy link
Member

cjnolet commented Jul 22, 2025

@chatman ping ^^

@chatman
Copy link
Contributor Author

chatman commented Jul 22, 2025

@chatman is this still relevant given the more recent move to serializing cagra out to Lucene HNSW format directly?

For CAGRA or Tiered index serialization, either this or #1085 is needed. My preference would be #1085, and in that case we can close this.

@chatman
Copy link
Contributor Author

chatman commented Jul 22, 2025

Closing this in favour of #1085.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants