Supports additional query timing types for profiling plugin query components #17146

Open
wants to merge 4 commits into
base: 2.x

@shatejas shatejas commented Jan 27, 2025

Description

Adds k-NN-related enums so that ANN queries can be profiled. It is currently difficult to debug k-NN latencies; this change increases visibility into k-NN query execution.

KNN PR: opensearch-project/k-NN#2450

Related Issues

Resolves opensearch-project/k-NN#2286

Sample response

},
	"profile": {
		"shards": [
			{
				"id": "[ZaUiItRkQIy9BnX_i0ccNg][target_index_faiss][0]",
				"inbound_network_time_in_millis": 0,
				"outbound_network_time_in_millis": 0,
				"searches": [
					{
						"query": [
							{
								"type": "BooleanQuery",
								"description": "IndexOrDocValuesQuery(indexQuery=rating:[8 TO 10], dvQuery=rating:[8 TO 10]) NativeEngineKnnVectorQuery[]...KNNQuery[]",
								"time_in_nanos": 41773456,
								"breakdown": {
									"advance": 0,
									"advance_count": 0,
									"build_scorer": 11648000,
									"build_scorer_count": 2,
									"compute_max_score": 0,
									"compute_max_score_count": 0,
									"create_weight": 30073458,
									"create_weight_count": 1,
									"match": 0,
									"match_count": 0,
									"next_doc": 35915,
									"next_doc_count": 13,
									"score": 16083,
									"score_count": 12,
									"set_min_competitive_score": 0,
									"set_min_competitive_score_count": 0,
									"shallow_advance": 0,
									"shallow_advance_count": 0
								},
								"children": [
									{
										"type": "IndexOrDocValuesQuery",
										"description": "IndexOrDocValuesQuery(indexQuery=rating:[8 TO 10], dvQuery=rating:[8 TO 10])",
										"time_in_nanos": 7893251,
										"breakdown": {
											"advance": 0,
											"advance_count": 0,
											"build_scorer": 6763916,
											"build_scorer_count": 3,
											"compute_max_score": 0,
											"compute_max_score_count": 0,
											"create_weight": 1113750,
											"create_weight_count": 1,
											"match": 0,
											"match_count": 0,
											"next_doc": 13418,
											"next_doc_count": 11,
											"score": 2167,
											"score_count": 10,
											"set_min_competitive_score": 0,
											"set_min_competitive_score_count": 0,
											"shallow_advance": 0,
											"shallow_advance_count": 0
										}
									},
									{
										"type": "NativeEngineKnnVectorQuery",
										"description": "NativeEngineKnnVectorQuery[]...KNNQuery[]",
										"time_in_nanos": 25468916,
										"breakdown": {
											"advance": 0,
											"advance_count": 0,
											"ann_search": 0,
											"ann_search_count": 0,
											"build_scorer": 287542,
											"build_scorer_count": 3,
											"compute_max_score": 0,
											"compute_max_score_count": 0,
											"create_weight": 25172250,
											"create_weight_count": 1,
											"exact_knn_search": 0,
											"exact_knn_search_count": 0,
											"match": 0,
											"match_count": 0,
											"next_doc": 7374,
											"next_doc_count": 4,
											"score": 1750,
											"score_count": 3,
											"set_min_competitive_score": 0,
											"set_min_competitive_score_count": 0,
											"shallow_advance": 0,
											"shallow_advance_count": 0
										},
										"children": [
											{
												"type": "KNNQuery",
												"description": "",
												"time_in_nanos": 2426625,
												"breakdown": {
													"advance": 0,
													"advance_count": 0,
													"ann_search": 2426625,
													"ann_search_count": 1,
													"build_scorer": 0,
													"build_scorer_count": 0,
													"compute_max_score": 0,
													"compute_max_score_count": 0,
													"create_weight": 0,
													"create_weight_count": 0,
													"exact_knn_search": 0,
													"exact_knn_search_count": 0,
													"match": 0,
													"match_count": 0,
													"next_doc": 0,
													"next_doc_count": 0,
													"score": 0,
													"score_count": 0,
													"set_min_competitive_score": 0,
													"set_min_competitive_score_count": 0,
													"shallow_advance": 0,
													"shallow_advance_count": 0
												}
											}
										]
									}
								]
							}
						],
						"rewrite_time": 429250,
						"collector": [
							{
								"name": "SimpleTopScoreDocCollector",
								"reason": "search_top_hits",
								"time_in_nanos": 375375
							}
						]
					}
				],
				"aggregations": []
			}

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Tejas Shah <shatejas@amazon.com>
Signed-off-by: Tejas Shah <shatejas@amazon.com>
@shatejas shatejas marked this pull request as ready for review January 27, 2025 21:01
@shatejas shatejas changed the title from "Adds KNN specific enums for profiling, exposes profiler in" to "Adds KNN specific enums for profiling, exposes profiler in ContextIndexSearcher" Jan 27, 2025
Contributor

❌ Gradle check result for 51f1446: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@@ -48,7 +48,9 @@ public enum QueryTimingType {
     SCORE,
     SHALLOW_ADVANCE,
     COMPUTE_MAX_SCORE,
-    SET_MIN_COMPETITIVE_SCORE;
+    SET_MIN_COMPETITIVE_SCORE,
Collaborator

@shatejas the core does not know anything about the k-NN plugin (or any other plugin, for that matter), so this has to be part of the plugin's own instrumentation. We may need to think about how the profile phases could be extended or customized, though, if required.

Author

@shatejas shatejas Jan 27, 2025

@reta I understand. What's the recommendation in that case? This is the one way I found to get additional components into the query profile.

Collaborator

@shatejas we may never run into a need to have such an extensibility feature, so we may have to design it in a plugin-neutral way.

Author

> we may never run into a need to have such an extensibility feature

There is currently a need for time_in_nanos for k-NN. A k-NN query is relatively complex: ANN search, exact search, and the filter query inside the k-NN query are all major components, and there is no visibility into any of them, which makes performance issues extremely difficult to debug.

I wasn't able to find a hook that puts k-NN components into the query breakdown without these changes.

> so we may have to design in a plugin neutral way.

Can you elaborate on what's involved here? If it's a major change in the k-NN plugin, it might have to be iterative, and this change might work until then.

Contributor

@navneet1v navneet1v Jan 28, 2025

@reta I agree on designing this in a plugin-neutral way. @shatejas let's have extension points in core that plugins can use to provide their QueryTimingTypes.

One idea I can think of: QueryTimingType is ultimately used to put a string key into the profile output. We can create another enum/class that collects the timing types from all the plugins and then puts them in the right place during serialization.
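
The collecting class described above could look roughly like the sketch below. Every name in it (`TimingTypeProvider`, `TimingTypeRegistry`) is hypothetical and does not exist in OpenSearch. It also assumes that "the right place" means alphabetical key order, which is what the sample response in the description appears to use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical extension point a plugin would implement; not an existing OpenSearch interface.
interface TimingTypeProvider {
    Set<String> timingTypes();
}

// Hypothetical registry that merges core timing keys with plugin-contributed ones.
final class TimingTypeRegistry {
    // A subset of the core breakdown keys visible in the sample response.
    private static final List<String> CORE =
        List.of("create_weight", "build_scorer", "next_doc", "advance", "match", "score");

    private final Set<String> all = new LinkedHashSet<>(CORE);

    // Adds a plugin's timing types, rejecting names that collide with existing keys.
    void register(TimingTypeProvider provider) {
        for (String type : provider.timingTypes()) {
            String key = type.toLowerCase(Locale.ROOT);
            if (!all.add(key)) {
                throw new IllegalArgumentException("timing type already registered: " + key);
            }
        }
    }

    // Assumption: breakdown keys are emitted in alphabetical order.
    List<String> serializationOrder() {
        List<String> sorted = new ArrayList<>(all);
        Collections.sort(sorted);
        return sorted;
    }
}
```

A k-NN style plugin would call `register(() -> Set.of("ANN_SEARCH", "EXACT_KNN_SEARCH"))`, after which `ann_search` and `exact_knn_search` are interleaved with the core keys rather than appended at the end.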

Author

@reta Thanks, I am looking into it. I haven't found a solution yet. Doing a deep dive on possible options

Author

@shatejas shatejas Jan 31, 2025

@reta I tried another approach which holds additional types per query. Let me know what you think

Other options:

  • Maintain a Query -> QueryBreakdown registry in the profile tree. But I am not sure there is a use case where a plugin wants to override the default types for a query.
  • Maintain a Query -> QueryProfiler registry and get profilers based on the Query type, falling back to the default. I haven't tried it, but from what it looks like, each profile breakdown is written as a separate JSON blob.
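
The second option, a registry keyed by query type with a fallback to the default, might be sketched as follows. All types here (`SketchQuery`, `SketchKnnQuery`, `SketchProfiler`, `ProfilerRegistry`) are stand-ins invented for illustration; none of them are Lucene or OpenSearch classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Stand-ins for illustration only; not the Lucene or OpenSearch classes.
class SketchQuery {}

class SketchKnnQuery extends SketchQuery {}

class SketchProfiler {
    final String name;

    SketchProfiler(String name) {
        this.name = name;
    }
}

// Hypothetical registry: query class -> profiler factory, with a default fallback.
final class ProfilerRegistry {
    private final Map<Class<? extends SketchQuery>, Supplier<SketchProfiler>> factories = new HashMap<>();
    private final Supplier<SketchProfiler> defaultFactory = () -> new SketchProfiler("default");

    void register(Class<? extends SketchQuery> type, Supplier<SketchProfiler> factory) {
        factories.put(type, factory);
    }

    // Looks up by the exact runtime class; unregistered query types get the default profiler.
    SketchProfiler profilerFor(SketchQuery query) {
        return factories.getOrDefault(query.getClass(), defaultFactory).get();
    }
}
```

Because the lookup uses the exact runtime class, a subclass of a registered query type would silently fall back to the default profiler in this sketch.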

Collaborator

> @reta I tried another approach which holds additional types per query. Let me know what you think

Thanks @shatejas, I believe it is important to evolve the APIs in a consistent way. Here is a quick sketch that I would like to hear your opinion on:

  • plugins contribute queries using the SearchPlugin hooks
  • however, nothing there covers query profiling

It would probably make sense to introduce a QueryProfilerSpec API that plugins can contribute to. A couple of options to consider:

  • the QuerySpec could (optionally) supply the corresponding QueryProfilerSpec, or
  • the SearchPlugin may have a generic hook like
    default List<QueryProfilerSpec<?>> getQueryProfilers() {
        return emptyList();
    }

That would provide a basis to build on. Does it make sense?
Thank you.
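
To make the generic-hook option concrete, here is a minimal sketch. The thread does not define what a `QueryProfilerSpec` holds, so the fields below (a query class plus a set of timing keys) are an assumption, and `SketchSearchPlugin`, `SketchVectorQuery`, and `SketchVectorPlugin` are hypothetical stand-ins rather than OpenSearch types.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Assumed shape of the proposed spec; the thread does not specify its contents.
final class QueryProfilerSpec<Q> {
    final Class<Q> queryClass;
    final Set<String> timingTypes;

    QueryProfilerSpec(Class<Q> queryClass, Set<String> timingTypes) {
        this.queryClass = queryClass;
        this.timingTypes = timingTypes;
    }
}

// Stand-in for the plugin interface, reduced to the hook proposed in the comment above.
interface SketchSearchPlugin {
    default List<QueryProfilerSpec<?>> getQueryProfilers() {
        return Collections.emptyList();
    }
}

// Hypothetical query type owned by a plugin.
final class SketchVectorQuery {}

// Hypothetical plugin that contributes two extra timing keys for its query type.
final class SketchVectorPlugin implements SketchSearchPlugin {
    @Override
    public List<QueryProfilerSpec<?>> getQueryProfilers() {
        return List.of(
            new QueryProfilerSpec<SketchVectorQuery>(
                SketchVectorQuery.class,
                Set.of("ann_search", "exact_knn_search")));
    }
}
```

Plugins that do not override the hook contribute nothing, so existing plugins would be unaffected under this design.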

Author

@reta I found one way to do it shatejas@04cff1c

Currently it uses QueryProfiler instead of AbstractQueryProfiler for simplicity. Overall I don't like the implementation: the context index searcher is now responsible for creating a new profiler instance, and it also has to keep track of which profilers were used so that it can hand them back to the Profilers class.

Moreover, I don't think plugins should be responsible for concurrent profilers. For one, plugin queries simply leverage concurrency from opensearch-core. Apart from that, the concurrent profiler implementation seems pretty complex, so it would be a heavy lift for plugins (if they are not allowed to provide an instance of the existing one).

> Please note that we don't need to replace or piggybacking here

Just so I understand the concern: why should plugins not be allowed to piggyback on the existing response if they are not polluting it? I understand that the APIs should be consistent, but the current implementation doesn't seem to allow it.

Collaborator

@reta reta Feb 5, 2025

> @reta I found one way to do it shatejas@04cff1c

Thanks @shatejas, I will try to look at it shortly.

> Just so I understand the concern - Why should plugins not be allowed to piggyback on existing response if its not polluting the response? I understand that the APIs should be consistent, but the current implementation doesn't seem to allow it.

To reiterate, I think the default query profiler should be always on. The additional profilers could be introduced. The concept of "piggybacking" with such a design is not needed here.

Signed-off-by: Tejas Shah <shatejas@amazon.com>
@shatejas shatejas changed the title from "Adds KNN specific enums for profiling, exposes profiler in ContextIndexSearcher" to "Supports additional query timing types for profiling plugin query components" Jan 28, 2025
Contributor

❌ Gradle check result for 568cfe2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Collaborator

reta commented Jan 28, 2025

On an unrelated note, @shatejas please target the main branch.

@shatejas
Author

> On an unrelated note, @shatejas please target the main branch.

@reta there are some issues with the k-NN main branch that make it harder to test. I can open a PR against the main branch once the approach is finalized.

Signed-off-by: Tejas Shah <shatejas@amazon.com>
Contributor

❌ Gradle check result for fe1c855: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@@ -98,6 +100,10 @@ default QueryBuilder rewrite(QueryRewriteContext queryShardContext) throws IOExc
         return this;
     }

+    default Set<String> queryProfilerTimingTypes() {
Author

need to remove this

     }

     public void setTimer(T timing, Timer timer) {
-        timings[timing.ordinal()] = timer;
+        timings.put(timing.name().toLowerCase(Locale.ROOT), timer);
Collaborator

Isn't timings initialized as an unmodifiableMap on line 71?

Author

Good catch, didn't realize I made it unmodifiable. I will make the map modifiable
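
The failure mode the reviewer is pointing at can be reproduced in isolation. The sketch below uses `Long` values in place of the profiler's `Timer` objects (an assumption made to keep it self-contained) and shows that `put` on a map wrapped with `Collections.unmodifiableMap` throws `UnsupportedOperationException`, while a plain `HashMap` accepts the write.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class UnmodifiableMapDemo {
    // Returns true if the map rejects the write with UnsupportedOperationException.
    static boolean putIsRejected(Map<String, Long> timings) {
        try {
            timings.put("ann_search", 0L);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Long> readOnly = Collections.unmodifiableMap(new HashMap<>());
        Map<String, Long> mutable = new HashMap<>();

        System.out.println(putIsRejected(readOnly)); // prints true
        System.out.println(putIsRejected(mutable));  // prints false
    }
}
```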

Comment on lines +91 to +94
for (String timingType : this.timings.keySet()) {
map.put(timingType, this.timings.get(timingType).getApproximateTiming());
map.put(timingType + TIMING_TYPE_COUNT_SUFFIX, this.timings.get(timingType).getCount());
map.put(timingType + TIMING_TYPE_START_TIME_SUFFIX, this.timings.get(timingType).getEarliestTimerStartTime());
Collaborator

This should be an iteration over timings.entrySet().

Author

Will change to entrySet() instead.
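
A minimal sketch of the suggested `entrySet()` iteration, which reads each key and value once instead of calling `get` three times per key. `SketchTimer` is a stand-in that exposes only two of the accessors used in the diff, and the `"_count"` suffix is inferred from keys such as `ann_search_count` in the sample response; the start-time suffix is left out because its value is not shown in the thread.

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal stand-in for the profiler's Timer; only the accessors needed here.
final class SketchTimer {
    private final long nanos;
    private final long count;

    SketchTimer(long nanos, long count) {
        this.nanos = nanos;
        this.count = count;
    }

    long getApproximateTiming() {
        return nanos;
    }

    long getCount() {
        return count;
    }
}

final class BreakdownSketch {
    // Inferred from keys like "ann_search_count" in the sample response.
    static final String COUNT_SUFFIX = "_count";

    // Builds the flat breakdown map, touching each timer exactly once.
    static Map<String, Long> toBreakdownMap(Map<String, SketchTimer> timings) {
        Map<String, Long> map = new TreeMap<>();
        for (Map.Entry<String, SketchTimer> entry : timings.entrySet()) {
            SketchTimer timer = entry.getValue();
            map.put(entry.getKey(), timer.getApproximateTiming());
            map.put(entry.getKey() + COUNT_SUFFIX, timer.getCount());
        }
        return map;
    }
}
```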

@@ -94,8 +105,8 @@ public Map<String, Object> toDebugMap() {

     public long toNodeTime() {
         long total = 0;
-        for (T timingType : timingTypes) {
-            total += timings[timingType.ordinal()].getApproximateTiming();
+        for (String timingType : timings.keySet()) {
Collaborator

This should also be an iteration over timings.entrySet().

Author

Will change to entrySet() instead.
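
For the node-time sum, the keys are not needed at all, so a sketch can go one step further than the suggestion and iterate `values()` (iterating `entrySet()` and calling `getValue()` would be equivalent). `NodeTimer` is a hypothetical stand-in for the profiler's `Timer`. The figures in `main` are the non-zero breakdown entries of the `IndexOrDocValuesQuery` child in the sample response, which add up to its reported `time_in_nanos`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the profiler's Timer.
final class NodeTimer {
    private final long nanos;

    NodeTimer(long nanos) {
        this.nanos = nanos;
    }

    long getApproximateTiming() {
        return nanos;
    }
}

final class NodeTimeSketch {
    // Sums all timers without a per-key lookup.
    static long toNodeTime(Map<String, NodeTimer> timings) {
        long total = 0;
        for (NodeTimer timer : timings.values()) {
            total += timer.getApproximateTiming();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, NodeTimer> timings = new HashMap<>();
        timings.put("build_scorer", new NodeTimer(6763916L));
        timings.put("create_weight", new NodeTimer(1113750L));
        timings.put("next_doc", new NodeTimer(13418L));
        timings.put("score", new NodeTimer(2167L));
        System.out.println(toNodeTime(timings)); // prints 7893251
    }
}
```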

@@ -50,7 +51,7 @@ public class AggregationProfileBreakdown extends AbstractProfileBreakdown<Aggreg
     private final Map<String, Object> extra = new HashMap<>();

     public AggregationProfileBreakdown() {
-        super(AggregationTimingType.class);
+        super(AggregationTimingType.class, emptySet());
Collaborator

This assumes that aggregations from plugins wouldn't want to emit profile data, right?

Author

Yeah, for now I made that assumption. I don't have experience with aggregations, and there wasn't a need for it in the k-NN plugin. I am open to input, and I want to make sure it stays extensible in the future if required and is not a one-way door.

@@ -335,6 +338,7 @@ public class SearchModule {
     private final SearchPlugin.ExecutorServiceProvider indexSearcherExecutorProvider;

     private final Collection<ConcurrentSearchRequestDecider.Factory> concurrentSearchDeciderFactories;
+    private final Map<Class<? extends Query>, Set<String>> profilerTimingsPerQuery = new HashMap<>();
Collaborator

I'm not convinced that using Lucene Query classes as the keys for the custom timing is a good idea. In particular, there are a number of Lucene queries whose type changes as a result of the rewrite operation. In some cases, the rewritten query is an anonymous subclass.
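
The rewrite concern can be illustrated with stand-in types. In the sketch below, `DemoQuery` and `DemoRangeQuery` are hypothetical and only mimic the behaviour described (a query whose `rewrite()` returns an anonymous subclass); they are not Lucene classes. The timings registered for the original class are no longer found once the query has been rewritten.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Stand-in for a Lucene-style query; not org.apache.lucene.search.Query.
abstract class DemoQuery {
    DemoQuery rewrite() {
        return this;
    }
}

class DemoRangeQuery extends DemoQuery {
    @Override
    DemoQuery rewrite() {
        // Mimics a rewrite that produces an anonymous subclass.
        return new DemoQuery() {};
    }
}

// Class-keyed lookup in the style of profilerTimingsPerQuery from the diff above.
final class ClassKeyedTimings {
    private final Map<Class<? extends DemoQuery>, Set<String>> timingsPerQuery = new HashMap<>();

    void register(Class<? extends DemoQuery> type, Set<String> timings) {
        timingsPerQuery.put(type, timings);
    }

    // Returns the registered timing keys for the query's runtime class, or an empty set.
    Set<String> lookup(DemoQuery query) {
        return timingsPerQuery.getOrDefault(query.getClass(), Set.of());
    }
}
```

With `DemoRangeQuery.class` registered, `lookup(query)` returns the custom keys, but `lookup(query.rewrite())` returns an empty set because the anonymous class was never registered.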

Author

That's a good point.

The problem is that getting the profile breakdown is completely dependent on the Query. To make it context-aware and avoid unnecessary timings in the response, I chose Query as the key.

Since the breakdown is obtained through IndexSearcher, there aren't many options for registering timings if it needs to be context-aware.

Any suggestions for a key other than Query that keeps it context-aware?
