-
-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Search: Fix word exclusion in client side search #13893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Search: Fix word exclusion in client side search #13893
Conversation
.filter((term) => term); // remove remaining empty strings | ||
var splitQuery = (query) => { | ||
const consecutiveLetters = | ||
/[\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDK if it's bad to define is regex first and reference it later in the new regex. It could also be plain text. But I thought this way, the origin of the regex can be easier understood.
}) | ||
) | ||
break; | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably the most important change!
Regarding the CI: I don't see any changes on my PR which would trigger the current CI fail. Is this a common issue? TBH it does not look like it's affecting the master branch too 🤔. I'm a little aimless what to do here. |
That's OK, yep - I believe that is due to bug #13886 (in progress, potentially to be fixed by #13883). |
…earch-terms # Conflicts: # CHANGES.rst
A delayed thought here: adding the exclusion operator to hyphenated query terms could cause unexpected results. For example, the query |
var splitQuery = (query) => { | ||
const consecutiveLetters = | ||
/[\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu; | ||
const searchWords = new RegExp( | ||
`(${consecutiveLetters.source})|\\s(-${consecutiveLetters.source})`, | ||
"gu", | ||
); | ||
return Array.from( | ||
query | ||
.matchAll(searchWords) | ||
.map((results) => results[1] ?? results[2]) // select one of the possible groups (e.g. "word" or "-word"). | ||
.filter((term) => term), // remove remaining empty strings. | ||
); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent some time trying to find an equivalent that is more minimal in terms of lines-of-code / characters-of-code changed.
The following isn't hugely readable -- it's a complex regex, but it essentially enables splits on the -
character, provided that a lookbehind for whitespace fails.
In other words: adds -
as a split boundary, but only if it is found within a word.
var splitQuery = (query) => { | |
const consecutiveLetters = | |
/[\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu; | |
const searchWords = new RegExp( | |
`(${consecutiveLetters.source})|\\s(-${consecutiveLetters.source})`, | |
"gu", | |
); | |
return Array.from( | |
query | |
.matchAll(searchWords) | |
.map((results) => results[1] ?? results[2]) // select one of the possible groups (e.g. "word" or "-word"). | |
.filter((term) => term), // remove remaining empty strings. | |
); | |
}; | |
var splitQuery = (query) => | |
query | |
.split(/(?<!\s)[-]|[^\p{Letter}\p{Number}\-_\p{Emoji_Presentation}]+/gu) | |
.filter((term) => term); // remove remaining empty strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idk how to apply this suggestion on the mobile app 😅. Will do it at home.
[...excludedTerms].some((excludedTerm) => { | ||
// Both mappings will contain either a single integer or a list of integers. | ||
// Converting them to lists makes the comparison more readable. | ||
let excludedTermFiles = [].concat(terms[excludedTerm]); | ||
let excludedTitleFiles = [].concat(titleTerms[excludedTerm]); | ||
return ( | ||
excludedTermFiles.includes(file) | ||
|| excludedTitleFiles.includes(file) | ||
); | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[...excludedTerms].some((excludedTerm) => { | |
// Both mappings will contain either a single integer or a list of integers. | |
// Converting them to lists makes the comparison more readable. | |
let excludedTermFiles = [].concat(terms[excludedTerm]); | |
let excludedTitleFiles = [].concat(titleTerms[excludedTerm]); | |
return ( | |
excludedTermFiles.includes(file) | |
|| excludedTitleFiles.includes(file) | |
); | |
}) | |
[...excludedTerms].some( | |
(term) => | |
terms[term] === file | |
|| titleTerms[term] === file | |
|| (terms[term] || []).includes(file) | |
|| (titleTerms[term] || []).includes(file), | |
) |
Do we need this set of excludedTerms
filtering changes? I would suggest not modifying these lines unless strictly necessary. Tests continue to pass when I revert this.
I acknowledge that the break
to continue
fixup is important though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll need to add another test then. The last conditions raise an error in some cases. I'll add it soon.
Purpose
The built-in search is capable of excluding search terms. Thats a great feature which would make the search a lot better!
Unfortunately, there are two blocking components:
splitQuery
will discard hyphens which define the excluded termsperformTermsSearch
will abort the search if any excluded term is matchedReferences
Closes #13892