feat(search): implement FTS5 w/ sqlite for faster and better searching #6839

perfectra1n · 2025-08-30T20:40:20Z

No description provided.

feat(search): don't limit the number of blobs to put in virtual tables

fix(search): improve FTS triggers to handle all SQL operations correctly

The root cause of FTS index issues during import was that database triggers weren't properly handling all SQL operations, particularly upsert operations (INSERT ... ON CONFLICT ... DO UPDATE) that are commonly used during imports.

Key improvements:
- Fixed INSERT trigger to handle INSERT OR REPLACE operations
- Updated UPDATE trigger to fire on ANY change (not just specific columns)
- Improved blob triggers to use INSERT OR REPLACE for atomic updates
- Added proper handling for notes created before their blobs (import scenario)
- Added triggers for protection state changes
- All triggers now use LEFT JOIN to handle missing blobs gracefully

This ensures the FTS index stays synchronized even when:
- Entity events are disabled during import
- Notes are re-imported (upsert operations)
- Blobs are deduplicated across notes
- Notes are created before their content blobs

The solution works entirely at the database level through triggers, removing the need for application-level workarounds.

fix(search): consolidate FTS trigger fixes into migration 234

- Merged improved trigger logic from migration 235 into 234
- Deleted unnecessary migration 235 since DB version is still 234
- Ensures triggers handle all SQL operations (INSERT OR REPLACE, upserts)
- Fixes FTS indexing for imported notes by handling missing blobs
- Schema.sql and migration 234 now have identical trigger implementations

apps/server/src/services/search/fts_search.ts

+
+            // Build snippet extraction if requested
+            const snippetSelect = includeSnippets 
+                ? `, snippet(notes_fts, ${FTS_CONFIG.SNIPPET_COLUMN_CONTENT}, '${highlightTag}', '${highlightTag.replace('<', '</')}', '...', ${snippetLength}) as snippet`


The static analysis warning fires because .replace('<', '</') with a string pattern replaces only the first occurrence of <. In this case that is the intended behavior: the closing tag is produced by turning the leading < into </. To make that intention explicit, use an anchored regular expression, .replace(/^</, '</'), which rewrites only a < at the very start of the string and leaves any < elsewhere untouched. A regular expression with the global flag would be the right tool only if every occurrence had to be replaced, which is not what is wanted here. The anchored form also sidesteps the static analysis warning about a string-literal .replace.

Edit only the line at 589, replacing .replace('<', '</') with .replace(/^</, '</'). No new imports are needed.
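A minimal sketch of the suggested change in isolation. The helper name `closingTagFor` is hypothetical and does not appear in the PR; it only illustrates how the anchored pattern behaves on a highlight tag.

```typescript
// Hypothetical helper (not part of the PR): derive a closing tag
// from an opening tag such as "<b>" or "<mark>".
function closingTagFor(openTag: string): string {
    // The ^ anchor means only a "<" at the very start is rewritten;
    // any "<" later in the string is left alone.
    return openTag.replace(/^</, "</");
}

// Usage: a well-formed opening tag gets its closing counterpart,
// while a string that does not start with "<" is returned unchanged.
const closeBold = closingTagFor("<b>");
const untouched = closingTagFor("a < b");
```

Because the pattern is anchored, input that does not begin with `<` passes through unmodified instead of having some later `<` rewritten.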

apps/server/src/services/search/fts_search.ts


apps/server/src/migrations/0235__sqlite_native_search.ts

apps/server/src/services/search/sqlite_functions.ts

apps/server/src/services/search/sqlite_search_utils.ts

apps/server/src/services/search/fts_search.ts

+     * @returns String with LIKE wildcards escaped
+     */
+    private escapeLikeWildcards(str: string): string {
+        return str.replace(/[%_]/g, '\\$&');


To fix this issue robustly, we need to escape any backslash (\) characters by doubling them (\\) before escaping % and _. This prevents ambiguous sequences in LIKE patterns, where the escape character can interact badly with subsequent escaped wildcard characters. The fix is to first replace all occurrences of \ with \\, then run the existing replace for % and _. This should be done in the implementation of escapeLikeWildcards in apps/server/src/services/search/fts_search.ts. No extra external dependencies are required, as this can be safely and succinctly done with string replacement and regular expressions.
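The order of operations described above can be sketched as a standalone function. This is an illustration, not the PR's final code, and it assumes the query that consumes the result declares the backslash as its escape character (for example `LIKE ? ESCAPE '\'`), since SQLite's LIKE has no escape character by default.

```typescript
// Sketch of the suggested fix, written as a free function rather than
// the private method in fts_search.ts.
function escapeLikeWildcards(str: string): string {
    return str
        // Step 1: double every backslash so it cannot combine with a
        // following character to form an unintended escape sequence.
        .replace(/\\/g, "\\\\")
        // Step 2: prefix the LIKE wildcards % and _ with a backslash.
        .replace(/[%_]/g, "\\$&");
}

// Usage: the input 50%_a\b becomes 50\%\_a\\b
const escaped = escapeLikeWildcards("50%_a\\b");
```

Doing the backslash pass first matters: if the wildcards were escaped first, the second pass would also double the backslashes that were just inserted.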


eliandoran · 2025-11-04T18:19:22Z

apps/server/src/assets/db/schema.sql

+CREATE INDEX IDX_entity_changes_component 
+ON entity_changes (componentId, utcDateChanged DESC);


Can you double-check if a component index is really needed? The component ID is used on the client side to distinguish which UI element made the change to avoid accidentally updating the very same editor that the user is using.

On the server side I don't think the components are really necessary.

eliandoran · 2025-11-04T18:21:04Z

apps/server/src/migrations/0234__add_fts5_search.ts

+    // Verify SQLite version supports trigram tokenizer (requires 3.34.0+)
+    const sqliteVersion = sql.getValue<string>(`SELECT sqlite_version()`);
+    const [major, minor, patch] = sqliteVersion.split('.').map(Number);
+    const versionNumber = major * 10000 + minor * 100 + (patch || 0);
+    const requiredVersion = 3 * 10000 + 34 * 100 + 0; // 3.34.0
+
+    if (versionNumber < requiredVersion) {
+        log.error(`SQLite version ${sqliteVersion} does not support trigram tokenizer (requires 3.34.0+)`);
+        log.info("Skipping FTS5 trigram migration - will use fallback search implementation");
+        return; // Skip FTS5 setup, rely on fallback search
+    }


Is this really necessary at runtime? Our server uses pinned versions so as long as the version is correct, there's no need for runtime check.

eliandoran · 2025-11-04T18:21:25Z

apps/server/src/migrations/0234__add_fts5_search.ts

+        -- Drop existing FTS table if it exists (for re-running migration in dev)
+        DROP TABLE IF EXISTS notes_fts;


We don't re-run migrations in dev, so that would be unnecessary.

eliandoran · 2025-11-04T18:21:46Z

apps/server/src/migrations/0234__add_fts5_search.ts

+        -- Create FTS5 virtual table with trigram tokenizer
+        -- Trigram tokenizer provides language-agnostic substring matching:
+        -- 1. Fast substring matching (50-100x speedup for LIKE queries without wildcards)
+        -- 2. Case-insensitive search without custom collation
+        -- 3. No language-specific stemming assumptions (works for all languages)
+        -- 4. Boolean operators (AND, OR, NOT) and phrase matching with quotes
+        --
+        -- IMPORTANT: Trigram requires minimum 3-character tokens for matching
+        -- detail='none' reduces index size by ~50% while maintaining MATCH/rank performance
+        -- (loses position info for highlight() function, but snippet() still works)


Comments could be moved out of the SQL statement and into the code to avoid embedding them at build time.
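One way this could look, as a rough sketch: the explanatory notes live in TypeScript comments and the SQL string stays comment-free. The constant name and the column list (`noteId`, `title`, `content`) are assumptions for illustration; only `notes_fts`, the trigram tokenizer, and `detail='none'` come from the migration shown above.

```typescript
// Trigram tokenizer: language-agnostic substring matching, case-insensitive
// search without custom collation, no language-specific stemming.
// IMPORTANT: trigram matching needs tokens of at least 3 characters.
// detail='none': smaller index at the cost of position information.
// Hypothetical column list; the real migration may differ.
const CREATE_NOTES_FTS_SQL = `
    CREATE VIRTUAL TABLE notes_fts USING fts5(
        noteId UNINDEXED,
        title,
        content,
        tokenize = 'trigram',
        detail = 'none'
    )`;
```

With this layout the statement sent to SQLite carries no `--` comment lines, so nothing extra is embedded in the built output.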

eliandoran · 2025-11-04T19:31:52Z

apps/server/src/migrations/0236__cleanup_sqlite_search.ts

Since migration 235 doesn't exist anymore, why not merge it with 0234 into a single migration?

eliandoran · 2025-11-04T19:37:15Z

apps/server/src/services/search/fts_search.ts

+        // Additional validation: ensure token doesn't contain SQL injection attempts
+        if (sanitized.includes(';') || sanitized.includes('--')) {
+            log.error(`Potential SQL injection attempt detected in token: "${token}"`);
+            return "__invalid_token__";
+        }


Shouldn't we simply escape the characters instead of dismissing the search entirely? Some people might complain that their search doesn't work properly.
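A possible shape for the escaping approach, as a sketch only. It relies on the FTS5 query syntax rule that a string enclosed in double quotes is treated as a literal, with embedded double quotes written as two double quotes. The function name `quoteFtsToken` is hypothetical, and the sketch assumes the resulting query text is passed to `MATCH` as a bound parameter rather than concatenated into the SQL statement.

```typescript
// Hypothetical helper (not from the PR): turn a raw search token into a
// quoted FTS5 string instead of rejecting it.
function quoteFtsToken(token: string): string {
    // Double any embedded double quote, then wrap the whole token in quotes
    // so characters like ";" or "--" are searched for as plain text.
    return `"${token.replace(/"/g, '""')}"`;
}

// Usage: tokens that the current check would reject are kept searchable.
const withSemicolon = quoteFtsToken("a;b--c");
const withQuotes = quoteFtsToken('say "hi"');
```

Under these assumptions, `;` and `--` never reach the SQL parser as syntax, so a search containing them can still return results.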

eliandoran · 2025-11-04T19:38:06Z

apps/server/src/services/search/fts_search.ts

The file is too big, consider splitting it into... something.

eliandoran · 2025-11-04T19:38:53Z

apps/server/src/services/search/performance_monitor.ts

Do we really need this performance monitoring mechanism?

eliandoran · 2025-11-04T19:39:41Z

apps/server/src/services/app_info.ts

 import { AppInfo } from "@triliumnext/commons";

-const APP_DB_VERSION = 233;
+const APP_DB_VERSION = 236;


Don't forget to revert the version to 234 if you join the migrations together.

eliandoran · 2025-11-04T19:40:23Z

apps/server/src/services/sql.ts

+function getDbConnection(): DatabaseType {
+    return dbConnection;
+}


Feels unsafe. ☠️


dosubot bot added the size:XXL label (this PR changes 1000+ lines, ignoring generated files) Aug 30, 2025

github-advanced-security bot found potential problems Aug 30, 2025


perfectra1n added 2 commits August 30, 2025 20:48

feat(search): also fix tests for new fts functionality

21aaec2

feat(search): try to get fts search to work in large environments

053f722

github-advanced-security bot found potential problems Aug 31, 2025

apps/server/src/services/search/fts_search.ts (fixed)

perfectra1n added 3 commits August 30, 2025 22:30

feat(search): try to decrease complexity

5b79e0d

feat(search): try to deal with huge dbs, might need to squash later

37d0136

feat(search): further improve fts search

7c5553b

perfectra1n marked this pull request as draft September 2, 2025 05:08

perfectra1n added 7 commits September 1, 2025 22:29

feat(search): I honestly have no idea what I'm doing

b09a2c3

Revert "feat(search): I honestly have no idea what I'm doing"

8572f82

This reverts commit b09a2c3.

Revert "feat(search): further improve fts search"

f529ddc

This reverts commit 7c5553b.

Revert "feat(search): try to deal with huge dbs, might need to squash…

0afb8a1

… later" This reverts commit 37d0136.

Revert "feat(search): try to decrease complexity"

06b2d71

This reverts commit 5b79e0d.

Revert "feat(search): try to get fts search to work in large environm…

d074841

…ents" This reverts commit 053f722.

feat(search): try a ground-up sqlite search approach

58c2252

github-advanced-security bot found potential problems Sep 3, 2025


werererer mentioned this pull request Oct 9, 2025

Draft: fix(search-ranking): add attributes and labels to search ranking #7222

Draft

perfectra1n added 3 commits October 24, 2025 09:18

Merge branch 'main' into feat/rice-searching-with-sqlite

d992a5e

feat(search): try again to get fts5 searching done well

253da13

feat(search): get the correct comparison and rice out the fts5 search

1098809

github-advanced-security bot found potential problems Oct 27, 2025


perfectra1n added 3 commits November 3, 2025 11:47

Merge branch 'main' into feat/rice-searching-with-sqlite

321752a

fix(search): resolve compilation issue due to performance log in new …

16912e6

…search

feat(search): if the search is empty, return all notes

052e28a

eliandoran requested changes Nov 4, 2025


perfectra1n added 3 commits November 4, 2025 14:34

feat(tests): create a ton of tests for the various search capabilitie…

b8aa740

…s that we support

fix(search): get rid of exporting dbConnection

942647a

fix(tests): resolve issues with new search tests not passing

da03020

fix(tests): rename some of the silly-ily named tests

5f17736

eliandoran added the merge-conflicts label Nov 5, 2025

perfectra1n changed the title from "feat(search): implement FST5 w/ sqlite for faster and better searching" to "feat(search): implement FTS5 w/ sqlite for faster and better searching" Nov 7, 2025

eliandoran mentioned this pull request Nov 7, 2025

feat(quick_search): just fuzzy match note titles for larger notes, while still matching on exact strings #6810

Closed



3 participants