-
-
Notifications
You must be signed in to change notification settings - Fork 2.2k
feat(search): implement FTS5 w/ sqlite for faster and better searching #6839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
feat(search): don't limit the number of blobs to put in virtual tables fix(search): improve FTS triggers to handle all SQL operations correctly The root cause of FTS index issues during import was that database triggers weren't properly handling all SQL operations, particularly upsert operations (INSERT ... ON CONFLICT ... DO UPDATE) that are commonly used during imports. Key improvements: - Fixed INSERT trigger to handle INSERT OR REPLACE operations - Updated UPDATE trigger to fire on ANY change (not just specific columns) - Improved blob triggers to use INSERT OR REPLACE for atomic updates - Added proper handling for notes created before their blobs (import scenario) - Added triggers for protection state changes - All triggers now use LEFT JOIN to handle missing blobs gracefully This ensures the FTS index stays synchronized even when: - Entity events are disabled during import - Notes are re-imported (upsert operations) - Blobs are deduplicated across notes - Notes are created before their content blobs The solution works entirely at the database level through triggers, removing the need for application-level workarounds. fix(search): consolidate FTS trigger fixes into migration 234 - Merged improved trigger logic from migration 235 into 234 - Deleted unnecessary migration 235 since DB version is still 234 - Ensures triggers handle all SQL operations (INSERT OR REPLACE, upserts) - Fixes FTS indexing for imported notes by handling missing blobs - Schema.sql and migration 234 now have identical trigger implementations
|
|
||
| // Build snippet extraction if requested | ||
| const snippetSelect = includeSnippets | ||
| ? `, snippet(notes_fts, ${FTS_CONFIG.SNIPPET_COLUMN_CONTENT}, '${highlightTag}', '${highlightTag.replace('<', '</')}', '...', ${snippetLength}) as snippet` |
Check failure
Code scanning / CodeQL
Incomplete string escaping or encoding High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 days ago
The issue is that .replace('<', '</') only replaces the first occurrence of the < character, whereas the intention is to produce a valid closing tag by replacing the initial < with </. The most robust fix is to replace only the first occurrence of < with </, but do so in a way which makes the intention clear and robust. The best practice here is to use a regular expression with the global flag if you ever want to replace all occurrences; however, for this case, replacing just the leading < is intentional and correct. To express this intention clearly, it is better to use replace(/^</, '</') (i.e., only at the start), which does not affect any < elsewhere in the string, while also not silently missing multiples if the input is malformed. This avoids possible confusion with multiple < and sidesteps the static analysis warning about literal .replace.
Edit only the line at 589, replacing .replace('<', '</') with .replace(/^</, '</'). No new imports are needed.
-
Copy modified line R589
| @@ -586,7 +586,7 @@ | ||
|
|
||
| // Build snippet extraction if requested | ||
| const snippetSelect = includeSnippets | ||
| ? `, snippet(notes_fts, ${FTS_CONFIG.SNIPPET_COLUMN_CONTENT}, '${highlightTag}', '${highlightTag.replace('<', '</')}', '...', ${snippetLength}) as snippet` | ||
| ? `, snippet(notes_fts, ${FTS_CONFIG.SNIPPET_COLUMN_CONTENT}, '${highlightTag}', '${highlightTag.replace(/^</, '</')}', '...', ${snippetLength}) as snippet` | ||
| : ''; | ||
|
|
||
| const query = ` |
| * @returns String with LIKE wildcards escaped | ||
| */ | ||
| private escapeLikeWildcards(str: string): string { | ||
| return str.replace(/[%_]/g, '\\$&'); |
Check failure
Code scanning / CodeQL
Incomplete string escaping or encoding High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 4 days ago
To fix this issue robustly, we need to escape any backslash (\) characters by doubling them (\\) before escaping % and _. This prevents ambiguous sequences in LIKE patterns, where the escape character can interact badly with subsequent escaped wildcard characters. The fix is to first replace all occurrences of \ with \\, then run the existing replace for % and _. This should be done in the implementation of escapeLikeWildcards in apps/server/src/services/search/fts_search.ts. No extra external dependencies are required, as this can be safely and succinctly done with string replacement and regular expressions.
-
Copy modified lines R203-R204
| @@ -200,7 +200,8 @@ | ||
| * @returns String with LIKE wildcards escaped | ||
| */ | ||
| private escapeLikeWildcards(str: string): string { | ||
| return str.replace(/[%_]/g, '\\$&'); | ||
| // First escape backslashes, then % and _ for LIKE patterns | ||
| return str.replace(/\\/g, '\\\\').replace(/[%_]/g, '\\$&'); | ||
| } | ||
|
|
||
| /** |
| CREATE INDEX IDX_entity_changes_component | ||
| ON entity_changes (componentId, utcDateChanged DESC); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you double-check if a component index is really needed? The component ID is used on the client side to distinguish which UI element made the change to avoid accidentally updating the very same editor that the user is using.
On the server side I don't think the components are really necessary.
| // Verify SQLite version supports trigram tokenizer (requires 3.34.0+) | ||
| const sqliteVersion = sql.getValue<string>(`SELECT sqlite_version()`); | ||
| const [major, minor, patch] = sqliteVersion.split('.').map(Number); | ||
| const versionNumber = major * 10000 + minor * 100 + (patch || 0); | ||
| const requiredVersion = 3 * 10000 + 34 * 100 + 0; // 3.34.0 | ||
|
|
||
| if (versionNumber < requiredVersion) { | ||
| log.error(`SQLite version ${sqliteVersion} does not support trigram tokenizer (requires 3.34.0+)`); | ||
| log.info("Skipping FTS5 trigram migration - will use fallback search implementation"); | ||
| return; // Skip FTS5 setup, rely on fallback search | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really necessary at runtime? Our server uses pinned versions so as long as the version is correct, there's no need for runtime check.
| -- Drop existing FTS table if it exists (for re-running migration in dev) | ||
| DROP TABLE IF EXISTS notes_fts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't re-run migrations in dev, so that would be unnecessary.
| -- Create FTS5 virtual table with trigram tokenizer | ||
| -- Trigram tokenizer provides language-agnostic substring matching: | ||
| -- 1. Fast substring matching (50-100x speedup for LIKE queries without wildcards) | ||
| -- 2. Case-insensitive search without custom collation | ||
| -- 3. No language-specific stemming assumptions (works for all languages) | ||
| -- 4. Boolean operators (AND, OR, NOT) and phrase matching with quotes | ||
| -- | ||
| -- IMPORTANT: Trigram requires minimum 3-character tokens for matching | ||
| -- detail='none' reduces index size by ~50% while maintaining MATCH/rank performance | ||
| -- (loses position info for highlight() function, but snippet() still works) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments could be moved out of the SQL statement and into the code to avoid embedding them at build time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since migration 235 doesn't exist anymore, why not merge it with 0234 into a single migration?
| // Additional validation: ensure token doesn't contain SQL injection attempts | ||
| if (sanitized.includes(';') || sanitized.includes('--')) { | ||
| log.error(`Potential SQL injection attempt detected in token: "${token}"`); | ||
| return "__invalid_token__"; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we simply escape the characters instead of dismissing the search entirely? Some people might complain that their search doesn't work properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file is too big, consider splitting it into... something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this performance monitoring mechanism?
| import { AppInfo } from "@triliumnext/commons"; | ||
|
|
||
| const APP_DB_VERSION = 233; | ||
| const APP_DB_VERSION = 236; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to revert the version to 234 if you join the migrations together.
apps/server/src/services/sql.ts
Outdated
| function getDbConnection(): DatabaseType { | ||
| return dbConnection; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels unsafe. ☠️
No description provided.