-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove unused simple query string features #3360
Conversation
8230586
to
e7fb1a3
Compare
Size Change: -1.29 kB (0%) Total Size: 1.01 MB
ℹ️ View Unchanged
|
I'm hesitant to ask this because I know it's a significant amount of work, but this change (even though so few people use it!) does seem non-trivial for how our search functions. Do you feel it might be appropriate to have this change be under a different API version (e.g. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome. I'm so excited to see if this improves anything on our search timings 👀
Based on Olga's analysis of the request logs and what is actually used, it really seems like the features she's suggested to turn off are just not actually used, or used so rarely that we wouldn't be breaking a feature that people actually rely on. Also: API versioning is a massive undertaking that we should do a significant amount of planning for (#662) and for that to block what is hypothetically a significant improvement to query performance (one that API consumers would not automatically benefit from, and therefore one that we wouldn't see the most benefit from in our infrastructure and response times) would be devastating to me. From my perspective, balancing the relatively small (statistically zero) actual uses of these features against the known benefits to performance, and then again against the known complexity and time investment that versioning would take (and additionally the known issue that we wouldn't see the beneficial change across the board, only in requests that change to v2), makes me think there's no reasonable way to argue for us to version this change. The benefits of versioning negate the immediate benefit of rolling out this change. If we hear that someone is actually using and relying on this feature, then we can selectively turn them on when a query includes key characters ( |
Thanks for your thoughts Sara 🙂 I agree, my impression was to move forward regardless but I wanted to check with folks to make sure that sense was shared. And we can move forward with confidence thanks to the great analysis Olga did, too! |
Do you think it would be helpful to add this and add logging for when the query has a key character so that we can analyze how much they are used? Now that the frontend page does not have these links, we will probably get more accurate usage numbers. |
Based on the high urgency of this PR, the following reviewers are being gently reminded to review this PR: @AetherUnbound Excluding weekend1 days, this PR was ready for review 2 day(s) ago. PRs labelled with high urgency are expected to be reviewed within 2 weekday(s)2. @obulat, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Approving to unblock, although breaking compatibility is not very appealing. It we can conditionally enable those features when the strings have the special characters, that would be ideal.
I asked one of the Elasticsearch expert folks we've been talking to about the response time issues whether this would make a difference. My thinking being, I'd be surprised if ES doesn't itself already do something like a In other words, if these features aren't used very heavily, then maybe they aren't causing any performance difference. On the other hand, if they are enabled, even if conditionally (whether by us or by relying on ES to correctly ignore features that aren't in the query syntax), then it leaves open the possibility for someone to cause our service overwhelming load by sending heaps of these kinds of requests with complex features. From a service resilience standpoint, it's worth turning these features off entirely because they break compatibility. From that perspective, ideally we turn them off soon, before there is any large scale reliance on them. I'll update here once I've heard from our ES expert colleague. For now my sense is that either merging this PR as is (turn features off entirely) or not doing it at all are the two options that will make sense. That's unless our colleague lets us know that there is in fact somehow a difference between a feature being off and a feature being not used in a query but still technically available. If that is the case, then the conditional checks in our code ( |
This sounds like a good option too, @obulat. If you're comfortable with this PR drafting and us getting that analysis information for a few weeks, it would be good information to help us make an accurate decision. |
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
1d60c9d
to
c6f9a28
Compare
Signed-off-by: Olga Bulat <obulat@gmail.com> # Conflicts: # api/test/unit/controllers/test_search_controller_search_query.py
# Conflicts: # api/test/unit/controllers/test_search_controller_search_query.py
Signed-off-by: Olga Bulat <obulat@gmail.com>
I've added the logging for query strings that contain one of the 4 symbols, but kept the flags off. This is why I requested a re-review, @dhruvkb, @sarayourfriend. |
] | ||
for flag, pattern in flags: | ||
if bool(re.search(pattern, query)): | ||
log.info(f"Special feature in `{query_name}` query string. {flag}: {query}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a suggestion, but it would produce less logs overall while making it easier to tally them if the logging did something like, counting the instances of each flag and logging all flags at once in JSON like { "log_message": "Special features present in query", "query_name": query_name, "query": query} | {flag: count_flag(flag, query) for flag in flags }
(one-liner just for commenting, not to suggest that's the way to do it).
No big deal, though. This way works too, it just produces slightly more logs (not many, because we know these are used so rarely and mostly following the frontend examples) and requires parsing the message string rather than using Logs Insights discovered fields feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this suggestion, I was actually not certain that the logging pattern would be good, and I didn't even realize that there could be several flags used in a single search. I'll update the logging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the logging, now if you go to http://0.0.0.0:50280/v1/images/?q=cat+net*+(dog)+%5C(+someth~2
, you get the following message logged:
[2023-11-30 08:32:53,795 - root - 362][INFO] [3419191a256b4058b2e8ffd5066b1ec2] {'log_message': 'Special features present in query', 'query_name': 'q', 'query': 'cat net* (dog) \\( someth~2', 'flags': ['PRECEDENCE', 'ESCAPE', 'FUZZY|SLOP', 'PREFIX']}
I don't think we need to log the flag cound separately, do you, @sarayourfriend ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the raw count isn't that important, is it! Good call 👍 The change LGTM.
Signed-off-by: Olga Bulat <obulat@gmail.com>
Fixes
Fixes #3327 by @sarayourfriend
Description
This PR removes the following feature from the simple query string:
The following features are left in:
For more reasoning on why each feature is removed or left in, see the linked issue.
The search guide page removed mentions of the features that were removed. The text was also simplified to make it clearer now that there are only 2 features left. The
+
feature wasn't removed from simple search query in the API, however, since the behavior of "cat dog" is the same as "cat+dog", I removed it from the search syntax description page to prevent confusion.Testing Instructions
Run the app. The features that were turned off should not affect the search results (
()
precedence,~
fuzziness,), but the"
quotes for the exact match,-
for exclusion,+
forand
should work as they did before.The frontend
/search-help
page text should be clear and describe the search query syntax.Also, look through the query features that were left in/removed, and see if you agree with it.
Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin