Conversation

@perfectra1n
Member

No description provided.

feat(search): don't limit the number of blobs to put in virtual tables

fix(search): improve FTS triggers to handle all SQL operations correctly

The root cause of FTS index issues during import was that database triggers
weren't properly handling all SQL operations, particularly upsert operations
(INSERT ... ON CONFLICT ... DO UPDATE) that are commonly used during imports.

Key improvements:
- Fixed INSERT trigger to handle INSERT OR REPLACE operations
- Updated UPDATE trigger to fire on ANY change (not just specific columns)
- Improved blob triggers to use INSERT OR REPLACE for atomic updates
- Added proper handling for notes created before their blobs (import scenario)
- Added triggers for protection state changes
- All triggers now use LEFT JOIN to handle missing blobs gracefully

This ensures the FTS index stays synchronized even when:
- Entity events are disabled during import
- Notes are re-imported (upsert operations)
- Blobs are deduplicated across notes
- Notes are created before their content blobs

The solution works entirely at the database level through triggers,
removing the need for application-level workarounds.

fix(search): consolidate FTS trigger fixes into migration 234

- Merged improved trigger logic from migration 235 into 234
- Deleted unnecessary migration 235 since DB version is still 234
- Ensures triggers handle all SQL operations (INSERT OR REPLACE, upserts)
- Fixes FTS indexing for imported notes by handling missing blobs
- Schema.sql and migration 234 now have identical trigger implementations
@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Aug 30, 2025

// Build snippet extraction if requested
const snippetSelect = includeSnippets
? `, snippet(notes_fts, ${FTS_CONFIG.SNIPPET_COLUMN_CONTENT}, '${highlightTag}', '${highlightTag.replace('<', '</')}', '...', ${snippetLength}) as snippet`

Check failure

Code scanning / CodeQL

Incomplete string escaping or encoding High

This replaces only the first occurrence of '<'.

Copilot Autofix

AI 4 days ago

CodeQL flags .replace('<', '</') because a string pattern replaces only the first occurrence of <. Here that behavior is intentional: the closing tag is derived from the opening tag by turning its leading < into </. To make the intent explicit, use replace(/^</, '</'), which is anchored to the start of the string and never touches a < elsewhere. A regular expression with the global flag would only be appropriate if every occurrence had to be replaced, which is not the case here. The anchored form also sidesteps the static analysis warning about a literal .replace pattern.

Edit only the line at 589, replacing .replace('<', '</') with .replace(/^</, '</'). No new imports are needed.


Suggested changeset 1
apps/server/src/services/search/fts_search.ts

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/apps/server/src/services/search/fts_search.ts b/apps/server/src/services/search/fts_search.ts
--- a/apps/server/src/services/search/fts_search.ts
+++ b/apps/server/src/services/search/fts_search.ts
@@ -586,7 +586,7 @@
 
             // Build snippet extraction if requested
             const snippetSelect = includeSnippets 
-                ? `, snippet(notes_fts, ${FTS_CONFIG.SNIPPET_COLUMN_CONTENT}, '${highlightTag}', '${highlightTag.replace('<', '</')}', '...', ${snippetLength}) as snippet`
+                ? `, snippet(notes_fts, ${FTS_CONFIG.SNIPPET_COLUMN_CONTENT}, '${highlightTag}', '${highlightTag.replace(/^</, '</')}', '...', ${snippetLength}) as snippet`
                 : '';
 
             const query = `
EOF
Unable to commit as this autofix suggestion is now outdated
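
The anchored replacement discussed in this thread can be tried in isolation. The following is a minimal sketch, not code from the PR: `closingTagFor` is a hypothetical helper name, and it assumes the highlight tag is a bare opening tag such as `<b>` or `<mark>`.

```typescript
// Hypothetical helper (not part of the PR): derive a closing tag from an
// opening tag by rewriting only the leading '<'.
function closingTagFor(openTag: string): string {
    // The ^ anchor restricts the match to the first character of the string,
    // so a '<' appearing later in the input is left untouched.
    return openTag.replace(/^</, "</");
}

console.log(closingTagFor("<b>"));    // </b>
console.log(closingTagFor("<mark>")); // </mark>

// Limitation: attributes are copied into the closing tag, which is not valid
// HTML, so this only works for attribute-free tags.
console.log(closingTagFor('<span class="hl">')); // </span class="hl">
```

If tags with attributes ever need to be supported, the tag name would have to be extracted explicitly rather than derived by substitution.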
@perfectra1n perfectra1n marked this pull request as draft September 2, 2025 05:08
* @returns String with LIKE wildcards escaped
*/
private escapeLikeWildcards(str: string): string {
return str.replace(/[%_]/g, '\\$&');

Check failure

Code scanning / CodeQL

Incomplete string escaping or encoding High

This does not escape backslash characters in the input.

Copilot Autofix

AI 4 days ago

To fix this issue robustly, we need to escape any backslash (\) characters by doubling them (\\) before escaping % and _. This prevents ambiguous sequences in LIKE patterns, where the escape character can interact badly with subsequent escaped wildcard characters. The fix is to first replace all occurrences of \ with \\, then run the existing replace for % and _. This should be done in the implementation of escapeLikeWildcards in apps/server/src/services/search/fts_search.ts. No extra external dependencies are required, as this can be safely and succinctly done with string replacement and regular expressions.


Suggested changeset 1
apps/server/src/services/search/fts_search.ts

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/apps/server/src/services/search/fts_search.ts b/apps/server/src/services/search/fts_search.ts
--- a/apps/server/src/services/search/fts_search.ts
+++ b/apps/server/src/services/search/fts_search.ts
@@ -200,7 +200,8 @@
      * @returns String with LIKE wildcards escaped
      */
     private escapeLikeWildcards(str: string): string {
-        return str.replace(/[%_]/g, '\\$&');
+        // First escape backslashes, then % and _ for LIKE patterns
+        return str.replace(/\\/g, '\\\\').replace(/[%_]/g, '\\$&');
     }
 
     /**
EOF
Unable to commit as this autofix suggestion is now outdated
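
The ordering described in this autofix (backslashes first, then wildcards) can be checked with a standalone version of the function. This is a sketch under assumptions: it is written as a free function rather than the private method in `fts_search.ts`, and the `notes`/`title` names in the query are placeholders. Note that SQLite only treats a character as a LIKE escape when the query carries an explicit `ESCAPE` clause.

```typescript
// Standalone sketch of the escaping order from the autofix above.
function escapeLikeWildcards(str: string): string {
    return str
        .replace(/\\/g, "\\\\")    // 1. double every backslash in the input
        .replace(/[%_]/g, "\\$&"); // 2. prefix each LIKE wildcard with a backslash
}

// Illustrative usage: the pattern is passed as a bound parameter and the
// query declares backslash as the escape character.
// The table and column names here are assumptions.
const term = "50%_done\\x";
const likePattern = `%${escapeLikeWildcards(term)}%`;
const likeQuery = "SELECT noteId FROM notes WHERE title LIKE ? ESCAPE '\\'";

console.log(likePattern); // %50\%\_done\\x%
console.log(likeQuery);   // SELECT noteId FROM notes WHERE title LIKE ? ESCAPE '\'
```

Doing the two replacements in the opposite order would double the backslashes that were just inserted in front of `%` and `_`, which is why the backslash pass has to come first.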
Comment on lines +207 to +208
CREATE INDEX IDX_entity_changes_component
ON entity_changes (componentId, utcDateChanged DESC);
Contributor

Can you double-check if a component index is really needed? The component ID is used on the client side to distinguish which UI element made the change to avoid accidentally updating the very same editor that the user is using.

On the server side I don't think the components are really necessary.

Comment on lines +18 to +28
// Verify SQLite version supports trigram tokenizer (requires 3.34.0+)
const sqliteVersion = sql.getValue<string>(`SELECT sqlite_version()`);
const [major, minor, patch] = sqliteVersion.split('.').map(Number);
const versionNumber = major * 10000 + minor * 100 + (patch || 0);
const requiredVersion = 3 * 10000 + 34 * 100 + 0; // 3.34.0

if (versionNumber < requiredVersion) {
log.error(`SQLite version ${sqliteVersion} does not support trigram tokenizer (requires 3.34.0+)`);
log.info("Skipping FTS5 trigram migration - will use fallback search implementation");
return; // Skip FTS5 setup, rely on fallback search
}
Contributor

Is this really necessary at runtime? Our server uses pinned versions so as long as the version is correct, there's no need for runtime check.
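
If the runtime check is kept, the comparison could be written component by component instead of folding the version into one weighted integer, which only works while the minor and patch numbers stay below 100. A minimal sketch, assuming plain `major.minor.patch` version strings; `parseVersion` and `isAtLeast` are hypothetical names, not functions from the PR:

```typescript
// Hypothetical helpers (not part of the PR) for comparing dotted versions.
function parseVersion(version: string): [number, number, number] {
    // Missing or non-numeric components are treated as 0.
    const parts = version.split(".").map((p) => Number.parseInt(p, 10) || 0);
    const [major = 0, minor = 0, patch = 0] = parts;
    return [major, minor, patch];
}

function isAtLeast(actual: string, required: string): boolean {
    const [aMajor, aMinor, aPatch] = parseVersion(actual);
    const [rMajor, rMinor, rPatch] = parseVersion(required);
    // Compare the most significant component first.
    if (aMajor !== rMajor) return aMajor > rMajor;
    if (aMinor !== rMinor) return aMinor > rMinor;
    return aPatch >= rPatch;
}

console.log(isAtLeast("3.45.1", "3.34.0")); // true
console.log(isAtLeast("3.9.2", "3.34.0"));  // false
```

Whether any check is needed at all remains the reviewer's open question: with a pinned SQLite version, the check could be dropped entirely.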

Comment on lines +38 to +39
-- Drop existing FTS table if it exists (for re-running migration in dev)
DROP TABLE IF EXISTS notes_fts;
Contributor

We don't re-run migrations in dev, so that would be unnecessary.

Comment on lines +41 to +50
-- Create FTS5 virtual table with trigram tokenizer
-- Trigram tokenizer provides language-agnostic substring matching:
-- 1. Fast substring matching (50-100x speedup for LIKE queries without wildcards)
-- 2. Case-insensitive search without custom collation
-- 3. No language-specific stemming assumptions (works for all languages)
-- 4. Boolean operators (AND, OR, NOT) and phrase matching with quotes
--
-- IMPORTANT: Trigram requires minimum 3-character tokens for matching
-- detail='none' reduces index size by ~50% while maintaining MATCH/rank performance
-- (loses position info for highlight() function, but snippet() still works)
Contributor

Comments could be moved out of the SQL statement and into the code to avoid embedding them at build time.

Contributor

Since migration 235 doesn't exist anymore, why not merge it with 0234 into a single migration?

Comment on lines +188 to +192
// Additional validation: ensure token doesn't contain SQL injection attempts
if (sanitized.includes(';') || sanitized.includes('--')) {
log.error(`Potential SQL injection attempt detected in token: "${token}"`);
return "__invalid_token__";
}
Contributor

Shouldn't we simply escape the characters instead of dismissing the search entirely? Some people might complain that their search doesn't work properly.
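
One way to follow this suggestion is to quote the token instead of rejecting it. In FTS5 query syntax a string is written in double quotes, and an embedded double quote is escaped by doubling it, so characters such as `;` and `--` inside the quotes are treated as search text. The sketch below is an illustration, not the PR's implementation: `quoteFtsToken` is a hypothetical name, and it assumes the resulting expression is passed to `MATCH` as a bound parameter rather than interpolated into the SQL text.

```typescript
// Hypothetical helper (not part of the PR): turn an arbitrary user token
// into an FTS5 string literal.
function quoteFtsToken(token: string): string {
    // Double any embedded double quotes, then wrap the token in double quotes.
    return `"${token.replace(/"/g, '""')}"`;
}

console.log(quoteFtsToken("a;b--c"));   // "a;b--c"
console.log(quoteFtsToken('say "hi"')); // "say ""hi"""
```

With this approach a search containing `;` or `--` still runs and simply matches, or fails to match, like any other text.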

Contributor

The file is too big, consider splitting it into... something.

Contributor

Do we really need this performance monitoring mechanism?

import { AppInfo } from "@triliumnext/commons";

- const APP_DB_VERSION = 233;
+ const APP_DB_VERSION = 236;
Contributor

Don't forget to revert the version to 234 if you join the migrations together.

Comment on lines 392 to 394
function getDbConnection(): DatabaseType {
return dbConnection;
}
Contributor

Feels unsafe. ☠️

@perfectra1n perfectra1n changed the title from "feat(search): implement FST5 w/ sqlite for faster and better searching" to "feat(search): implement FTS5 w/ sqlite for faster and better searching" Nov 7, 2025
Labels

merge-conflicts size:XXL This PR changes 1000+ lines, ignoring generated files.

3 participants