-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify the related search #3151
Conversation
Quick note that the filtered index only filters sensitive text, so user reported sensitive media will still be in there (just with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! This is a solid improvement to the related/
endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I see that #3156 builds off of this but in a way that makes some of these changes irrelevant, is that correct?
Yes. I wanted to implement the simple "quick win" approach on Friday, thinking that the "more correct" approach would take a long time and be more difficult. Then, next day I realized that #3156 would actually be doable, I opened it, and based it on this PR (because here we remove the |
search_client = Search(index=index) | ||
# Search the default index for the item itself as it might be sensitive. | ||
item_search = Search(index=index) | ||
item_hit = item_search.query("match", identifier=uuid).execute().hits[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should use identifier__keyword
and term
to avoid stemming on the text
field that identifier
is currently. This issue came up during the development of the SearchContext::build
method where I noticed I wasn't getting the exact list of identifiers back 😱
tags = ",".join([tag.name for tag in tags[:10]]) | ||
tags_query = SimpleQueryString(fields=["tags.name"], query=tags) | ||
related_query |= tags_query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if a multi_match wouldn't be more straightforward than constructing a simple query string query? If we do use simple query string, though, ,
isn't an operator available, as far as I can tell: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html#simple-query-string-syntax
A space should be sufficient to use the default OR
operator. In this case even query_string
could work fine.
_id = item.execute().hits[0].id | ||
# Match related using title. | ||
title = item_hit.title | ||
title_query = SimpleQueryString(query=title, fields=["title"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if match_phrase
is more appropriate for title? https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to share some initial comments but this is very cool, I'm optimistic it could lead to a genuine improvement in the related results. Do we have an event on related result clicking? I'd be keen to compare before/after if this change goes through to see whether it improves things, especially if we deploy the creator change first and then this afterwards. If there are progressively positive changes, we'd at least have a gut check about what users are looking for in the related query (that being a point besides potentially improving the performance of this route).
I couldn't find a way to compare the amount of related clicks to the search result clicks in Plausible UI, because we can't select clicks on all "related" items, only clicks on an item that is related to specific item id. |
* Match related by title and tags * Use default index for item search * Handle ES hits without tags * Use the first 10 tags for the query --------- Co-authored-by: Dhruv Bhanushali <dhruv_b@live.com>
661acf2
to
07d6b65
Compare
Suspect IssuesThis pull request was deployed and Sentry observed the following issues:
Did you find this useful? React with a 👍 or 👎 |
Fixes
Fixes #3149 by @obulat
Description
This PR:
related
searchremoves the(removing this filter would show the user-reported mature items, so I put it back)mature
: False filter since we use the filtered indexcreator
from the fields that are used in theMoreLikeThis
queryTesting Instructions
Check that the related endpoints (the
related
in image and audio results) work as expected.To see this work visually, you can run the frontend Nuxt app using the local API:
frontend/.env
:API_URL=http://0.0.0.0:50280/
just up
.just frontend/run dev
.Now, you can search for something, open the single result page and see that it has the related items.
Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin