Search: Fix word exclusion in client side search #13893

cglukas · 2025-09-15T09:19:32Z

Purpose

The built-in search is capable of excluding search terms. That's a great feature which would make the search a lot better!
Unfortunately, there are two blocking components:

splitQuery discards the hyphens that mark excluded terms
performTermsSearch aborts the search as soon as any excluded term is matched
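A minimal sketch of the first point. The `oldSplitQuery` below is an assumption about the pre-change splitter, inferred from the suggested change further down in this thread (same character class, without any hyphen handling); it is not copied from the Sphinx source.

```javascript
// Assumed pre-change splitter (hypothetical name): split on every run of
// characters that is not a letter, number, underscore or emoji.
const oldSplitQuery = (query) =>
  query
    .split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu)
    .filter((term) => term); // remove remaining empty strings

// The hyphen counts as a separator, so the exclusion marker is lost.
const splitTerms = oldSplitQuery("install -windows");
console.log(splitTerms); // [ 'install', 'windows' ]
```

Under that assumption, the later search stages never see `-windows`, so no exclusion can happen.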

References

Closes #13892


cglukas · 2025-09-15T09:35:32Z

sphinx/themes/basic/static/searchtools.js

-      .filter((term) => term); // remove remaining empty strings
+  var splitQuery = (query) => {
+    const consecutiveLetters =
+      /[\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu;


IDK if it's bad to define this regex first and reference it later in the new regex. It could also be plain text, but I thought that this way the origin of the regex is easier to understand.

cglukas · 2025-09-15T09:38:44Z

sphinx/themes/basic/static/searchtools.js

+        })
      )
-        break;
+        continue;


This is probably the most important change!
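Why the one-word change matters can be sketched with a simplified loop. `collectResults` and its arguments are hypothetical stand-ins, not the actual `performTermsSearch` code; the only assumption is that results are gathered by iterating over candidate files.

```javascript
// Hypothetical stand-in for the result loop; not the actual Sphinx code.
const collectResults = (files, excludedFiles, onExcluded) => {
  const results = [];
  for (const file of files) {
    if (excludedFiles.has(file)) {
      if (onExcluded === "break") break; // leaves the loop: later files are never checked
      continue; // skips only this file
    }
    results.push(file);
  }
  return results;
};

const candidateFiles = [0, 1, 2, 3];
const filesWithExcludedTerm = new Set([1]);

const withBreak = collectResults(candidateFiles, filesWithExcludedTerm, "break");
const withContinue = collectResults(candidateFiles, filesWithExcludedTerm, "continue");
console.log(withBreak); // [ 0 ]
console.log(withContinue); // [ 0, 2, 3 ]
```

With `break`, files 2 and 3 are dropped even though they do not contain the excluded term; with `continue`, only file 1 is left out.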


cglukas · 2025-09-15T09:47:58Z

Regarding the CI: I don't see any changes on my PR which would trigger the current CI fail. Is this a common issue? TBH it does not look like it's affecting the master branch too 🤔. I'm a little aimless what to do here.

jayaddison · 2025-09-16T08:48:14Z

> Regarding the CI: I don't see any changes on my PR which would trigger the current CI fail. Is this a common issue? TBH it does not look like it's affecting the master branch too 🤔. I'm a little aimless what to do here.

That's OK, yep - I believe that is due to bug #13886 (in progress, potentially to be fixed by #13883).


jayaddison · 2025-10-12T14:07:05Z

A delayed thought here: adding the exclusion operator to hyphenated query terms could cause unexpected results.

For example, the query example -test-case currently parses to ["example", "-test", "case"], I think.
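The parse above can be checked with a small sketch of the splitter from this PR's diff. `splitQueryFromDiff` is a hypothetical name, and the sketch passes the mapping function to `Array.from` instead of calling `.map` on the `matchAll` iterator, so that it also runs on engines without iterator helpers; the regular expressions are the ones shown in the diff.

```javascript
// Sketch of the PR's splitter (hypothetical name), using the regexes from the diff.
const splitQueryFromDiff = (query) => {
  const consecutiveLetters =
    /[\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu;
  const searchWords = new RegExp(
    `(${consecutiveLetters.source})|\\s(-${consecutiveLetters.source})`,
    "gu",
  );
  // Group 1 is a plain word, group 2 is a "-word" preceded by whitespace.
  return Array.from(
    query.matchAll(searchWords),
    (results) => results[1] ?? results[2],
  ).filter((term) => term); // remove remaining empty strings
};

const parsed = splitQueryFromDiff("example -test-case");
console.log(parsed); // [ 'example', '-test', 'case' ]
```

Only `test` is excluded here, while `case` becomes a regular search term, which is the potentially surprising result described above.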

jayaddison · 2025-10-12T19:48:24Z

sphinx/themes/basic/static/searchtools.js

+  var splitQuery = (query) => {
+    const consecutiveLetters =
+      /[\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu;
+    const searchWords = new RegExp(
+      `(${consecutiveLetters.source})|\\s(-${consecutiveLetters.source})`,
+      "gu",
+    );
+    return Array.from(
+      query
+        .matchAll(searchWords)
+        .map((results) => results[1] ?? results[2]) // select one of the possible groups (e.g. "word" or "-word").
+        .filter((term) => term), // remove remaining empty strings.
+    );
+  };


I spent some time trying to find an equivalent that is more minimal in terms of lines-of-code / characters-of-code changed.

The following isn't hugely readable -- it's a complex regex, but it essentially enables splits on the - character, provided that a lookbehind for whitespace fails.

In other words: adds - as a split boundary, but only if it is found within a word.

Suggested change

-  var splitQuery = (query) => {
-    const consecutiveLetters =
-      /[\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu;
-    const searchWords = new RegExp(
-      `(${consecutiveLetters.source})|\\s(-${consecutiveLetters.source})`,
-      "gu",
-    );
-    return Array.from(
-      query
-        .matchAll(searchWords)
-        .map((results) => results[1] ?? results[2]) // select one of the possible groups (e.g. "word" or "-word").
-        .filter((term) => term), // remove remaining empty strings.
-    );
-  };
+  var splitQuery = (query) =>
+    query
+      .split(/(?<!\s)[-]|[^\p{Letter}\p{Number}\-_\p{Emoji_Presentation}]+/gu)
+      .filter((term) => term); // remove remaining empty strings
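For reference, a short sketch of how the suggested one-expression variant behaves on a few queries. The function name `splitQuerySuggested` and the sample queries are illustrative only; the regular expression is the one from the suggestion.

```javascript
// The suggested splitter under a hypothetical name, so it can be tried in isolation.
const splitQuerySuggested = (query) =>
  query
    .split(/(?<!\s)[-]|[^\p{Letter}\p{Number}\-_\p{Emoji_Presentation}]+/gu)
    .filter((term) => term); // remove remaining empty strings

// A hyphen directly after whitespace stays attached to the following word.
const excluded = splitQuerySuggested("sphinx -autodoc");
// A hyphen inside a word acts as a split boundary.
const hyphenated = splitQuerySuggested("read-only");
// Both rules combined.
const mixed = splitQuerySuggested("example -test-case");

console.log(excluded); // [ 'sphinx', '-autodoc' ]
console.log(hyphenated); // [ 'read', 'only' ]
console.log(mixed); // [ 'example', '-test', 'case' ]
```

The first alternative splits on `-` only when the lookbehind finds no whitespace before it; the second alternative splits on runs of any other non-word character, with `-` deliberately left out of that set.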

cglukas · 2025-10-14

Idk how to apply this suggestion in the mobile app 😅. Will do it at home.

jayaddison · 2025-10-12T19:54:00Z

sphinx/themes/basic/static/searchtools.js

+        [...excludedTerms].some((excludedTerm) => {
+          // Both mappings will contain either a single integer or a list of integers.
+          // Converting them to lists makes the comparison more readable.
+          let excludedTermFiles = [].concat(terms[excludedTerm]);
+          let excludedTitleFiles = [].concat(titleTerms[excludedTerm]);
+          return (
+            excludedTermFiles.includes(file)
+            || excludedTitleFiles.includes(file)
+          );
+        })


Suggested change

-        [...excludedTerms].some((excludedTerm) => {
-          // Both mappings will contain either a single integer or a list of integers.
-          // Converting them to lists makes the comparison more readable.
-          let excludedTermFiles = [].concat(terms[excludedTerm]);
-          let excludedTitleFiles = [].concat(titleTerms[excludedTerm]);
-          return (
-            excludedTermFiles.includes(file)
-            || excludedTitleFiles.includes(file)
-          );
-        })
+        [...excludedTerms].some(
+          (term) =>
+            terms[term] === file
+            || titleTerms[term] === file
+            || (terms[term] || []).includes(file)
+            || (titleTerms[term] || []).includes(file),
+        )

Do we need this set of excludedTerms filtering changes? I would suggest not modifying these lines unless strictly necessary. Tests continue to pass when I revert this.

I acknowledge that the break to continue fixup is important though.

cglukas · 2025-10-14

Ok, I'll need to add another test then. The last conditions raise an error in some cases. I'll add it soon.
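One way the `|| []` conditions could raise an error, sketched under the assumption stated in the code comment of the diff (a term maps to either a single integer or a list of integers). The index data below is made up for illustration, and this may not be the exact case the reply has in mind.

```javascript
// Hypothetical index data: a term maps to one file index or to a list of them.
const termIndex = { alpha: 5, beta: [1, 2] };
const file = 3;

// The `|| []` fallback works for lists and for missing terms...
const listCase = (termIndex["beta"] || []).includes(file); // false
const missingCase = (termIndex["gamma"] || []).includes(file); // false

// ...but a single non-zero integer is truthy, so `.includes` is looked up on a number.
let errorName = null;
try {
  (termIndex["alpha"] || []).includes(file);
} catch (error) {
  errorName = error.name;
}
console.log(errorName); // TypeError

// Normalising with [].concat gives an array for every shape.
const normalisedCase = [].concat(termIndex["alpha"]).includes(file); // false
```

The strict-equality checks that come first do not prevent this: when the single integer differs from `file`, evaluation still reaches the `.includes` call.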

cglukas added 5 commits September 13, 2025 11:15

0449e48 Fix that negated search words are filtered out in the splitQuery.
dbfbd46 Fix that negated search words create a SyntaxError: Invalid or unexpected token.
7a8d46e Fix that excluded words would abort the entire search if matched in one page.
cfdfda4 Fix format and add contribution information.
80d6ad1 Fix long line.


cglukas and others added 5 commits September 19, 2025 22:11

9b06fd5 Add a dedicated fixture for testing excluded words in searches.
11d2e2e Is the pipeline flaky?
6cc4055 Add missing file.
ffc193b Remove debug change.
7b79f42 Merge branch 'master' into work/fix-excluded-search-terms (Conflicts: CHANGES.rst)

AA-Turner added the html search label Oct 5, 2025

cglukas added 2 commits October 7, 2025 20:45

a4764b1 Reduce line length to fix doclinter.
234fedc Merge remote-tracking branch 'origin/master' into work/fix-excluded-search-terms (Conflicts: CHANGES.rst)

jayaddison reviewed Oct 12, 2025
