feat(db): allow external sqlite blobs #6677
Conversation
        blobStorageService.deleteExternal(filePath);
    }
} catch (error) {
    // contentLocation column might not be present when applying older migrations
I tried inlining some of these AbstractBeccaEntity usages in the 0233__migrate_geo_map_to_collection.ts migration, but some things are event-based and the subscribers use AbstractBeccaEntity as well.
The alternative would be to add a warning that people have to upgrade to the previous Trilium version first and wait for migrations to complete, but I wanted to avoid that.
Force-pushed from 6deb934 to f72f0fd.
Can you explain more as to what it’s doing, how it works, and what it’d be useful for? Looks cool at first glance! :)
Force-pushed from fc61395 to 12f982d.
Thanks, I updated the PR description. There is some more context in the issue linked in the description, but I tried to summarize everything here as well.
Force-pushed from 12f982d to 4c9793f.
eliandoran left a comment:
As a start, the implementation seems pretty good.
There is a bug when importing large files as code. Try importing this zip with external storage enabled: it will result in a 9 MB JSON file, but when accessing it, the content is empty.
I have noticed an increased risk of BLOB key collisions when the server is running on Windows. The BLOB key is case-sensitive, whereas the underlying file system is case-insensitive.
Once the data directory is relocated, the external BLOB paths break, since they are relative to Trilium’s directory. I suggest storing these paths in the database as relative to the “external-blobs” directory instead.
I see, thank you all for the feedback! I'm away for a few days, but I'll have a look as soon as I get back.
Force-pushed from 6ca2066 to c0635e5.
Force-pushed from c0635e5 to 100d89a.
Thank you again for checking this out! I found out what the issue was with the large JSON file and fixed it. Also, good points regarding case-insensitive file systems; I switched to a random UUID instead and stored the relative path. Do you mind checking it one more time? 🙏
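For reference, a minimal sketch of the kind of UUID-based relative path generation described here; the helper name and the two-level directory layout are illustrative assumptions, not the PR's actual code:

import crypto from "crypto";
import path from "path";

// Generate a relative path for an external blob that does not reuse the
// (case-sensitive) blobId, avoiding collisions on case-insensitive file systems.
// Storing the path relative to the external-blobs directory keeps it valid
// when the data directory is relocated.
function buildExternalBlobRelativePath(): string {
    const id = crypto.randomUUID(); // lowercase hex, so case-insensitivity is not an issue
    return path.join(id.slice(0, 2), id); // fan out into subdirectories
}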
Hi. I see that over the last two months this has acquired some merge conflicts. Since it didn’t get much traction after the latest changes, I wanted to check if anyone would be available to review it if I resolve the conflicts.
@alexdachin, yes.
Force-pushed from 100d89a to 5377581.
/gemini review
Code Review
This pull request introduces a significant feature to allow storing large blobs in the filesystem instead of the database, which is a great step towards improving performance and manageability for large Trilium instances. The implementation is well-thought-out, especially the handling of migrations and configuration options. I've identified a few areas for improvement, primarily concerning file system operations' atomicity and performance. There are potential race conditions in file deletion that could lead to orphaned files, and some synchronous I/O operations could block the server's event loop. Additionally, there are opportunities to optimize database queries related to migration checks. Addressing these points will make the feature more robust and performant.
try {
    const row = sql.getRow<{ contentLocation: string }>("SELECT contentLocation FROM blobs WHERE blobId = ?", [oldBlobId]);
    if (row?.contentLocation.startsWith('file://')) {
        const filePath = row.contentLocation.replace('file://', '');
        blobStorageService.deleteExternal(filePath);
    }
} catch (error) {
    // contentLocation column might not be present when applying older migrations
    if (error instanceof Error && error.name === 'SqliteError' && error.message.includes("no such column: contentLocation")) {
        // ignore
    } else {
        log.error(`Failed to delete external content file for ${oldBlobId}: ${error}`);
    }
}

sql.execute("DELETE FROM blobs WHERE blobId = ?", [oldBlobId]);
The current logic for deleting blobs can lead to orphaned files. The deleteExternal function is asynchronous and does not propagate errors, so if file deletion fails, the database record is still removed. This atomicity issue can be resolved by making file deletion synchronous and ensuring the database record is only deleted upon successful file removal.
I've also suggested a change to blobStorageService.deleteExternal to make it synchronous. With that change, this function should be updated to correctly handle errors and prevent orphaning files.
try {
    const row = sql.getRow<{ contentLocation: string }>("SELECT contentLocation FROM blobs WHERE blobId = ?", [oldBlobId]);
    if (row?.contentLocation.startsWith('file://')) {
        const filePath = row.contentLocation.replace('file://', '');
        blobStorageService.deleteExternal(filePath);
    }
    sql.execute("DELETE FROM blobs WHERE blobId = ?", [oldBlobId]);
    // blobs are not marked as erased in entity_changes, they are just purged completely
    // this is because technically every keystroke can create a new blob, and there would be just too many
    sql.execute("DELETE FROM entity_changes WHERE entityName = 'blobs' AND entityId = ?", [oldBlobId]);
} catch (error) {
    // contentLocation column might not be present when applying older migrations
    if (error instanceof Error && error.name === 'SqliteError' && error.message.includes("no such column: contentLocation")) {
        // for old DBs, the column doesn't exist, so we just delete the DB record
        sql.execute("DELETE FROM blobs WHERE blobId = ?", [oldBlobId]);
        sql.execute("DELETE FROM entity_changes WHERE entityName = 'blobs' AND entityId = ?", [oldBlobId]);
    } else {
        log.error(`Failed to delete external content file for ${oldBlobId}, aborting DB deletion to prevent orphaning file. Error: ${error}`);
    }
}

deleteExternal(relativePath: string): void {
    const filePath = path.join(this.externalBlobPath, relativePath);
    fs.unlink(filePath, (error) => {
        if (error) {
            log.error(`Failed to delete external blob file ${filePath}: ${error}`);
        } else {
            log.info(`Deleted external blob file: ${filePath}`);
        }
    });
The deleteExternal function uses an asynchronous callback-based fs.unlink. This "fire-and-forget" approach prevents callers from handling deletion errors or waiting for completion. This can lead to race conditions and orphaned files, where the database record is deleted before the file system operation completes, or even if it fails.
Using a synchronous method ensures the operation completes before proceeding and allows errors to be properly propagated and handled by the caller.
deleteExternal(relativePath: string): void {
    const filePath = path.join(this.externalBlobPath, relativePath);
    try {
        fs.unlinkSync(filePath);
        log.info(`Deleted external blob file: ${filePath}`);
    } catch (error) {
        if (error.code !== 'ENOENT') {
            log.error(`Failed to delete external blob file ${filePath}: ${error}`);
            throw error;
        }
    }
}

function calculateContentHash({ blobId, content, contentLocation }: Blob) {
    return contentLocation !== 'internal'
        ? hash(`${blobId}|${contentLocation}`)
        : hash(`${blobId}|${content?.toString() || ''}`);
Calling content.toString() on a Buffer without specifying an encoding defaults to 'utf8'. If the buffer contains arbitrary binary data that isn't valid UTF-8, this can lead to data loss or a URIError, resulting in an incorrect or inconsistent hash.
-        : hash(`${blobId}|${content?.toString() || ''}`);
+        : hash(`${blobId}|${content ? (Buffer.isBuffer(content) ? content.toString('base64') : content) : ''}`);
sql.upsert("blobs", "blobId", pojo);

// external content columns might not be present when applying older migrations
const hasExternalContentColumns = sql.getValue("SELECT 1 FROM pragma_table_info('blobs') WHERE name = 'contentLocation'");
The pragma_table_info query is executed on every blob save to check for migration status. This is inefficient as the schema will not change during runtime. This check only needs to be performed once when the application starts.
Caching the result of this check will remove a redundant database query from a frequent operation, improving performance.
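A cached check along these lines could back the blobStorageService.hasExternalContentColumns() call suggested below; the function name, its placement, and the module-level cache are assumptions, and `sql` refers to the same SQL service used in the diff. This is only a sketch:

// Cache the schema check: the blobs table schema does not change at runtime
// once migrations have completed, so one lookup per process is enough.
let hasExternalContentColumnsCache: boolean | null = null;

function hasExternalContentColumns(): boolean {
    if (hasExternalContentColumnsCache === null) {
        hasExternalContentColumnsCache = !!sql.getValue(
            "SELECT 1 FROM pragma_table_info('blobs') WHERE name = 'contentLocation'");
    }
    return hasExternalContentColumnsCache;
}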
const hasExternalContentColumns = blobStorageService.hasExternalContentColumns();

try {
    row = sql.getRow<{ content: string | Buffer, contentLocation: string }>(/*sql*/`SELECT content, contentLocation FROM blobs WHERE blobId = ?`, [this.blobId]);
} catch (error) {
    // contentLocation column might not be present when applying older migrations
    if (error instanceof Error && error.name === 'SqliteError' && error.message.includes("no such column: contentLocation")) {
        row = sql.getRow<{ content: string | Buffer, contentLocation: string }>(/*sql*/`SELECT content, 'internal' as contentLocation FROM blobs WHERE blobId = ?`, [this.blobId]);
    } else {
        throw error;
    }
}
This try-catch block for handling migrations adds complexity and a potential second database query on every content retrieval for older databases. This can be optimized by checking for the column's existence once at startup.
Using a cached flag will simplify the code and make it more performant by avoiding the try-catch and potential fallback query.
const query = blobStorageService.hasExternalContentColumns()
    ? /*sql*/`SELECT content, contentLocation FROM blobs WHERE blobId = ?`
    : /*sql*/`SELECT content, 'internal' as contentLocation FROM blobs WHERE blobId = ?`;
row = sql.getRow(query, [this.blobId]);

fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
}

fs.writeFileSync(absolutePath, content, { mode: 0o600 });
fs.writeFileSync is a synchronous I/O operation that will block the Node.js event loop until the file is fully written. For a server application, this can negatively impact performance and responsiveness, especially with large files. While changing this to asynchronous I/O (fs.promises.writeFile) would require a larger refactoring of the call chain to be async, it's the recommended approach for a high-performance server.
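If the call chain were made async as recommended, the write could look roughly like the following sketch (not the PR's implementation; the temp-file-plus-rename step is an extra precaution, beyond what the bot suggests, so readers never see a partially written blob):

import fs from "fs/promises";
import path from "path";

// Write an external blob without blocking the event loop.
async function writeExternalBlob(absolutePath: string, content: Buffer): Promise<void> {
    await fs.mkdir(path.dirname(absolutePath), { recursive: true, mode: 0o700 });
    const tmpPath = `${absolutePath}.tmp`;
    await fs.writeFile(tmpPath, content, { mode: 0o600 });
    await fs.rename(tmpPath, absolutePath); // atomic on the same filesystem
}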
if (row.contentLocation?.startsWith('file://')) {
    try {
        const filePath = row.contentLocation.replace('file://', '');
        blobStorageService.deleteExternal(filePath);
    } catch (error) {
        log.error(`Failed to delete external blob file ${row.contentLocation}: ${error}`);
    }
The try-catch block around blobStorageService.deleteExternal(filePath) is currently ineffective because deleteExternal uses an asynchronous, callback-based fs.unlink and does not throw errors to the caller. Any file system errors during deletion will be silently logged within deleteExternal but not caught here.
My other suggestion to make deleteExternal synchronous (using fs.unlinkSync) will make this try-catch block functional and improve error handling for bulk deletions.
There are two bugs:
The first is a critical one (data loss):
- Import the Trace.json file. It will get stored as an external blob.
- Replace the entire text with a few characters to delete the external blob.
- Paste the Trace.json back in (or any big text to trigger saving).
- Restart the server.
- The server will restart with something like:
  Consistency issue fixed: Note '5zzWKjTnXz82' content was set to '' since it was null even though it is not deleted
  Consistency issue fixed: Note 'i2EbsDSQkdeb' content was set to '{}' since it was null even though it is not deleted
- Look at the Trace.json note, the content will be replaced with {}.
The second one involves uploading a video. Regardless of the video size, it will not be saved as an external blob.
In addition, please address the topics from the bot.
This makes backups a bit more complicated. Currently you can back up Trilium by backing up the SQLite database, which can be done in a transaction-safe way. With this change you actually need to either stop the process before the backup or take a consistent filesystem snapshot. That said, I really think this is a sensible thing to do. It should just be clearly pointed out in the backup documentation.
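To illustrate the concern, a consistent backup with external blobs might look roughly like this sketch (assuming better-sqlite3, a document.db database file, and an external-blobs directory inside the data directory; writes would still have to be paused, or a filesystem snapshot used, to be fully consistent):

import Database from "better-sqlite3";
import fs from "fs/promises";
import path from "path";

// Snapshot the database with the SQLite online backup API, then copy the
// external blob files. Blobs written between the two steps can still drift,
// which is why pausing writes or using a filesystem snapshot remains safest.
async function backupTrilium(dataDir: string, backupDir: string): Promise<void> {
    await fs.mkdir(backupDir, { recursive: true });

    const db = new Database(path.join(dataDir, "document.db"), { readonly: true });
    try {
        await db.backup(path.join(backupDir, "document.db"));
    } finally {
        db.close();
    }

    // copy the external blob tree alongside the database snapshot
    await fs.cp(path.join(dataDir, "external-blobs"), path.join(backupDir, "external-blobs"), { recursive: true });
}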
Currently Trilium stores the contents of a File note type inside the SQLite database. While this is extremely nice for simplicity and portability, it can become problematic once you start uploading many reference files (for example high-quality images, large PDFs, videos, etc.). While in theory SQLite supports huge databases, storing many files in a single database bloats it, slows down backups, and increases the impact of any database corruption.
I propose storing big blobs in the filesystem and keeping a reference to them in the database. The files are stored in the data directory to begin with for simplicity, but this also opens the door to storing them in other places in the future (such as S3-compatible storage systems). This keeps the SQLite database small and performant, enables faster incremental backups (e.g. with rsync), reduces the risk of database corruption, and still allows Trilium to manage things like note hierarchy with cloning.
In order not to disturb other people's workflow, this feature is disabled by default (so even large blobs are still stored internally in the database). To enable it you need to set the TRILIUM_EXTERNAL_BLOB_STORAGE environment variable.

The threshold for storing note contents externally vs. internally is 100 KB by default but can be changed with the TRILIUM_EXTERNAL_BLOB_THRESHOLD environment variable. This way small blobs are still read efficiently from the database, compared to storing all blobs in the filesystem.

Closes #6546
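To make the storage decision concrete, here is a rough sketch of the logic described above. The threshold default, the 'internal' and 'file://' contentLocation values, and the environment variable names come from this PR; the function itself, its signature, and the UUID-based directory layout are illustrative assumptions:

import crypto from "crypto";
import fs from "fs";
import path from "path";

const EXTERNAL_STORAGE_ENABLED = !!process.env.TRILIUM_EXTERNAL_BLOB_STORAGE;
const EXTERNAL_THRESHOLD = parseInt(process.env.TRILIUM_EXTERNAL_BLOB_THRESHOLD ?? "", 10) || 100 * 1024; // 100 KB default

// Decide where a blob's content lives and return the value for the contentLocation
// column: either 'internal' (content stays in the blobs table) or a
// 'file://<relative path>' reference to a file under the external-blobs directory.
function storeBlobContent(externalBlobDir: string, content: Buffer): { contentLocation: string; content: Buffer | null } {
    if (!EXTERNAL_STORAGE_ENABLED || content.length < EXTERNAL_THRESHOLD) {
        return { contentLocation: "internal", content };
    }

    // same UUID-based layout as the earlier sketch; posix join keeps the stored path portable
    const id = crypto.randomUUID();
    const relativePath = path.posix.join(id.slice(0, 2), id);
    const absolutePath = path.join(externalBlobDir, relativePath);

    fs.mkdirSync(path.dirname(absolutePath), { recursive: true, mode: 0o700 });
    fs.writeFileSync(absolutePath, content, { mode: 0o600 });

    return { contentLocation: `file://${relativePath}`, content: null };
}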