Add exact search if no native engine files are available #2136
src/main/java/org/opensearch/knn/index/codec/KNN80Codec/KNN80DocValuesProducer.java
 * @return boolean - true if exactSearch needs to be done after ANNSearch.
 */
private boolean canDoExactSearch(final LeafReaderContext context, final int filterIdsCount, final int annResultCount) {
    if (annResultCount == 0 && isMissingNativeEngineFiles(context)) {
Why are we checking for annResultCount == 0?
To limit how often we check whether native engine files are missing. For example, if the ANN result count != 0, we already know the files are not missing.
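The reasoning in this reply can be sketched as a short-circuit check. This is a minimal illustration, not the PR's code: `ExactSearchGate`, `shouldFallBackToExactSearch`, and the `BooleanSupplier` standing in for `isMissingNativeEngineFiles(context)` are hypothetical names, and the real method also takes a `LeafReaderContext` and a filter count.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the PR's canDoExactSearch; names are illustrative only.
final class ExactSearchGate {
    static boolean shouldFallBackToExactSearch(int annResultCount, BooleanSupplier missingNativeEngineFiles) {
        // '&&' short-circuits: the (potentially costly) file check runs only
        // when the ANN search returned no results at all.
        return annResultCount == 0 && missingNativeEngineFiles.getAsBoolean();
    }
}
```

With a non-zero result count the supplier is never invoked, which is the narrowing the reply describes.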
private BitSet createBitSet(final DocIdSetIterator filteredDocIdsIterator, final Bits liveDocs, int maxDoc) throws IOException {
    if (liveDocs == null && filteredDocIdsIterator instanceof BitSetIterator) {
private BitSet getAllDocsBitSet(final LeafReaderContext ctx) throws IOException {
    final FloatVectorValues floatVectorValues = ctx.reader().getFloatVectorValues(this.knnQuery.getField());
There can be ByteVectorValues too. You should actually create KNNVectorValues here using KNNVectorValuesFactory class.
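To illustrate why a type-agnostic wrapper helps here, the sketch below builds an "all docs" bit set from an iterator that does not care whether the underlying vectors are floats or bytes. Everything in it is hypothetical: `VectorDocs`, `overDocIds`, and `allDocs` are made-up names for illustration and are not the plugin's `KNNVectorValues` or `KNNVectorValuesFactory` API, whose signatures are not shown in this thread.

```java
import java.util.BitSet;

final class AllDocsBitSetSketch {
    /** Hypothetical view over the docs that have a vector, independent of the vector's element type. */
    interface VectorDocs {
        int NO_MORE_DOCS = Integer.MAX_VALUE;

        /** Returns the next doc id, or NO_MORE_DOCS when exhausted. */
        int nextDoc();
    }

    /** Test helper: iterates a fixed, ascending list of doc ids. */
    static VectorDocs overDocIds(final int[] sortedDocIds) {
        return new VectorDocs() {
            private int pos = 0;

            @Override
            public int nextDoc() {
                return pos < sortedDocIds.length ? sortedDocIds[pos++] : NO_MORE_DOCS;
            }
        };
    }

    /** Sets one bit per doc that has a vector; the same code serves float and byte fields. */
    static BitSet allDocs(VectorDocs docs, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc = docs.nextDoc(); doc != VectorDocs.NO_MORE_DOCS; doc = docs.nextDoc()) {
            bits.set(doc);
        }
        return bits;
    }
}
```

The design point is only that the bit-set construction depends on doc ids, not on the vector encoding, so a float-specific accessor is unnecessarily narrow.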
When the graph is not available, the plugin returns empty results. With this change, exact search is performed only when no engine file is available in the segment. We also don't need a version check or feature flag, because the option to not build the vector data structure will only be available after 2.17. If an index was created with a pre-2.17 version, its segments will always have engine files and this feature will never be invoked during search.
Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
LGTM
Merged commit ec40c4a into opensearch-project:feature/build-vector-ds-greedily
…project#2136)
* Add exact search if no engine files are in segments
When the graph is not available, the plugin returns empty results. With this change, exact search is performed only when no engine file is available in the segment. We also don't need a version check or feature flag, because the option to not build the vector data structure will only be available after 2.17. If an index was created with a pre-2.17 version, its segments will always have engine files and this feature will never be invoked during search.
---------
Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
…t search when there are no engine files (#2188)
* Introduce new setting to configure when to build graph during segment creation (#2007)
Added a new updatable index setting, "build_vector_data_structure_threshold", which is considered when deciding whether to build the graph for native engines. This is a no-op for Lucene. It depends on the use of the Lucene format as a prerequisite. We don't need to add a flag, since it is only enabled if the Lucene format is already enabled.
Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
* Add integration test for binary vector values (#2142)
Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
* Allow build graph greedily for quantization scenarios (#2175)
Previously we only supported building greedily for non-quantization scenarios. This commit removes that constraint; however, we cannot skip writing the quantization state, since it is required irrespective of the type of search executed later.
Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
* Add exact search if no native engine files are available (#2136)
* Add exact search if no engine files are in segments
When the graph is not available, the plugin returns empty results. With this change, exact search is performed only when no engine file is available in the segment. We also don't need a version check or feature flag, because the option to not build the vector data structure will only be available after 2.17. If an index was created with a pre-2.17 version, its segments will always have engine files and this feature will never be invoked during search.
---------
Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
* Add support for radial search in exact search (#2174)
* Add support for radial search in exact search
When a threshold value is set, the k-NN plugin will not create the graph. Hence, when a search request is triggered during that time, exact search will return valid results. However, radial search was never included as part of exact search. This will break radial search when a threshold is set and radial search is requested. In this commit, a new method is introduced that accepts a min score and returns documents whose scores are greater than the min score, similar to how radial search is performed by native engines. This search is independent of the engine, but radial search is supported only for the FAISS engine out of all native engines.
Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
---------
Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
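The radial variant described in that commit message can be sketched as a simple score filter over every document. This is an illustration under assumptions, not the plugin's method: `RadialExactSearchSketch` and `docsAboveMinScore` are hypothetical names, scores are assumed to be precomputed per doc id, and the strict `>` comparison follows the wording "greater than min score" (the actual boundary handling in the plugin may differ).

```java
import java.util.ArrayList;
import java.util.List;

final class RadialExactSearchSketch {
    /**
     * Returns the ids of all docs whose score is strictly greater than minScore.
     * scores[docId] is assumed to hold the precomputed similarity score of that doc.
     */
    static List<Integer> docsAboveMinScore(float[] scores, float minScore) {
        List<Integer> matches = new ArrayList<>();
        for (int docId = 0; docId < scores.length; docId++) {
            if (scores[docId] > minScore) {
                matches.add(docId);
            }
        }
        return matches;
    }
}
```

Unlike top-k search, the result size is not bounded by k; it depends only on how many documents clear the threshold.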
…nd perform exact search when there are no engine files (#2201)
* Add support to build vector data structures greedily and perform exact search when there are no engine files (#2188)
* Introduce new setting to configure when to build graph during segment creation (#2007)
Added a new updatable index setting, "build_vector_data_structure_threshold", which is considered when deciding whether to build the graph for native engines. This is a no-op for Lucene. It depends on the use of the Lucene format as a prerequisite. We don't need to add a flag, since it is only enabled if the Lucene format is already enabled.
Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
* Add integration test for binary vector values (#2142)
Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
* Allow build graph greedily for quantization scenarios (#2175)
Previously we only supported building greedily for non-quantization scenarios. This commit removes that constraint; however, we cannot skip writing the quantization state, since it is required irrespective of the type of search executed later.
Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
* Add exact search if no native engine files are available (#2136)
* Add exact search if no engine files are in segments
When the graph is not available, the plugin returns empty results. With this change, exact search is performed only when no engine file is available in the segment. We also don't need a version check or feature flag, because the option to not build the vector data structure will only be available after 2.17. If an index was created with a pre-2.17 version, its segments will always have engine files and this feature will never be invoked during search.
---------
Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
* Add support for radial search in exact search (#2174)
* Add support for radial search in exact search
When a threshold value is set, the k-NN plugin will not create the graph. Hence, when a search request is triggered during that time, exact search will return valid results. However, radial search was never included as part of exact search. This will break radial search when a threshold is set and radial search is requested. In this commit, a new method is introduced that accepts a min score and returns documents whose scores are greater than the min score, similar to how radial search is performed by native engines. This search is independent of the engine, but radial search is supported only for the FAISS engine out of all native engines.
Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
---------
Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
(cherry picked from commit 5a56829)
* Fix compilation issue due to package error
Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
---------
Signed-off-by: Vijayan Balasubramanian <balasvij@amazon.com>
Co-authored-by: Vijayan Balasubramanian <balasvij@amazon.com>
Description
Add exact search if no native engine files are available. This PR is part of an issue that gives users the option to decide when to build the graph, which helps them reduce build time. When the graph is not available, the plugin returns empty results. With this PR, exact search is performed only when no engine file is available in the segment.
We also don't need a version check or feature flag, because the option to not build the vector data structure will only be available after 2.17. If an index was created with a pre-2.17 version, its segments will always have engine files and this feature will never be invoked during search.
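As a rough picture of what the exact-search fallback does for a segment without engine files, the sketch below scores every vector against the query and keeps the k best. It is a simplified illustration, not the plugin's implementation: `ExactSearchSketch`, `l2Score`, and `topK` are hypothetical names, vectors are held in a plain array indexed by doc id, and the score `1 / (1 + squared L2 distance)` is assumed here as one common choice of similarity.

```java
import java.util.PriorityQueue;

final class ExactSearchSketch {
    /** Assumed scoring: higher is closer; 1 / (1 + squared Euclidean distance). */
    static float l2Score(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return 1f / (1f + sum);
    }

    /** Brute-force top-k: returns doc ids ordered from best to worst score. */
    static int[] topK(float[][] segmentVectors, float[] query, int k) {
        final float[] scores = new float[segmentVectors.length];
        // Min-heap on score, so the weakest of the current candidates is evicted first.
        PriorityQueue<Integer> heap = new PriorityQueue<>((x, y) -> Float.compare(scores[x], scores[y]));
        for (int docId = 0; docId < segmentVectors.length; docId++) {
            scores[docId] = l2Score(segmentVectors[docId], query);
            heap.add(docId);
            if (heap.size() > k) {
                heap.poll();
            }
        }
        // Drain the heap from worst to best, filling the result back to front.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }
}
```

The cost is linear in the number of vectors in the segment, which is why it only serves as a fallback when no graph has been built.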
Related Issues
Part of #1942
Check List
--signoff.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.