diff --git a/rules/cre-2025-0298/n8n-sqlite-data-loss.yaml b/rules/cre-2025-0298/n8n-sqlite-data-loss.yaml
new file mode 100644
index 0000000..822934a
--- /dev/null
+++ b/rules/cre-2025-0298/n8n-sqlite-data-loss.yaml
@@ -0,0 +1,89 @@
+rules:
+  - cre:
+      id: CRE-2025-0298
+      severity: 0
+      title: n8n SQLite data loss due to incorrect pool size configuration
+      category: database-problem
+      author: mavrick
+      description: |
+        This rule detects n8n running with the SQLite database backend and a pool size of 0, a configuration that can cause critical data loss, including the complete disappearance of workflows and execution history. With the default `DB_SQLITE_POOL_SIZE=0`, n8n uses a single SQLite connection, which can lead to database corruption or to data not being persisted during concurrent operations.
+      cause: |
+        The root cause is n8n's default SQLite configuration, in which `DB_SQLITE_POOL_SIZE` is 0:
+        - **Single connection mode**: When the pool size is 0, n8n uses a single SQLite connection, which can fail to persist data reliably
+        - **Concurrent operation conflicts**: Concurrent workflow executions and save operations can conflict without a connection pool
+        - **Journal mode**: A pool size of 0 keeps SQLite in rollback-journal mode; only a pool size greater than 0 enables Write-Ahead Logging (WAL) mode
+        - **Upgrade/migration triggers**: Data loss often surfaces during n8n version upgrades (e.g., 1.100 to 1.101)
+        - **Silent failures**: Data disappears without error messages or warnings, making detection difficult
+      impact: |
+        - **CRITICAL**: Complete loss of workflows, including all configuration and logic
+        - **Total execution history loss**: All past execution logs and debugging information disappear
+        - **Business continuity failure**: Production workflows vanish, causing immediate service disruption
+        - **Unrecoverable data loss**: n8n has no built-in recovery mechanism for lost workflows; without an external backup they cannot be restored
+        - **Silent failure**: Data loss occurs without error messages, potentially going unnoticed for hours or days
+      impactScore: 10
+      tags:
+        - data-loss
+        - database-corruption
+        - data-loss-risk
+        - workflow-orchestration
+        - known-problem
+        - database-write-failure
+        - sqlite
+        - n8n
+        - production-issue
+        - configuration-error
+        - connection-pooling
+        - single-connection
+        - concurrent-access
+        - data-persistence
+        - workflow-automation
+        - business-critical
+        - silent-failure
+        - database-reliability
+      mitigation: |
+        **Immediate response:**
+        - **Stop n8n immediately**: Stop the n8n service to prevent further data loss
+        - **Check environment variables**: Verify the `DB_SQLITE_POOL_SIZE` setting with `echo $DB_SQLITE_POOL_SIZE` in the environment n8n runs in (empty output means the default of 0)
+        - **Back up the database**: Immediately copy the SQLite database file (typically `~/.n8n/database.sqlite`)
+        - **Check for sidecar files**: Copy any `.sqlite-wal`, `.sqlite-shm`, or `.sqlite-journal` files next to the database together with it; a WAL file can hold committed transactions that have not yet been checkpointed into the main file
+
+        **Resolution steps:**
+        1. **Set a proper pool size**: Configure `DB_SQLITE_POOL_SIZE=2` or higher in the environment variables
+        2. **Update the n8n configuration**: Add the setting to the `.env` file or the systemd service:
+           ```
+           DB_SQLITE_POOL_SIZE=2
+           ```
+        3. **Restart the n8n service**: Apply the configuration change
+        4. **Verify the configuration**: Check the startup logs, and confirm that a `database.sqlite-wal` file appears next to the database while n8n is running (WAL mode is enabled only when the pool size is greater than 0)
+        5. **Test workflow persistence**: Create a test workflow and verify that it persists after a restart
+
+        **Prevention measures:**
+        - **Always configure the pool size**: Never run production n8n with the default SQLite settings
+        - **Consider PostgreSQL**: For production environments, migrate to PostgreSQL for better reliability
+        - **Regular backups**: Automate SQLite database backups, for example hourly
+        - **Monitor database writes**: Set up monitoring for successful database write operations
+        - **Version upgrade testing**: Always test n8n upgrades in a staging environment first
+        - **Export workflows regularly**: Use the n8n CLI (`n8n export:workflow --backup --output=<directory>`) to export workflows as JSON backups
+      references:
+        - https://github.com/n8n-io/n8n/issues/17271
+        - https://docs.n8n.io/hosting/configuration/environment-variables/#sqlite
+        - https://www.sqlite.org/wal.html
+        - https://docs.n8n.io/hosting/databases/sqlite/
+      applications:
+        - name: n8n
+          version: ">=1.100.0"
+      mitigationScore: 3
+    metadata:
+      gen: 1
+      id: 6unur45AyiTScMHo4MHWmP
+      kind: prequel
+    rule:
+      sequence:
+        window: 30s
+        event:
+          source: cre.log.n8n
+        order:
+          - regex: 'DB_TYPE.*sqlite|DATABASE_TYPE.*sqlite|Using SQLite as database'
+          - regex: 'DB_SQLITE_POOL_SIZE\s*[=:]\s*0\b|[Pp]ool size (is )?(set to )?0\b'
+        negate:
+          - regex: 'DB_SQLITE_POOL_SIZE\s*[=:]\s*[1-9]'
\ No newline at end of file
diff --git a/rules/cre-2025-0298/test.log b/rules/cre-2025-0298/test.log
new file mode 100644
index 0000000..8dd327d
--- /dev/null
+++ b/rules/cre-2025-0298/test.log
@@ -0,0 +1,36 @@
+2025-08-28T10:00:01.234Z [INFO] Starting n8n version 1.101.2
+2025-08-28T10:00:01.456Z [INFO] Initializing database connection
+2025-08-28T10:00:01.789Z [INFO] DB_TYPE=sqlite
+2025-08-28T10:00:02.012Z [INFO] DATABASE_TYPE: sqlite
+2025-08-28T10:00:02.234Z [INFO] Using SQLite as database backend
+2025-08-28T10:00:02.456Z [DEBUG] Loading database configuration
+2025-08-28T10:00:02.678Z [INFO] DB_SQLITE_POOL_SIZE=0
+2025-08-28T10:00:02.890Z [WARN] SQLite pool size is set to 0, using single connection mode
+2025-08-28T10:00:03.123Z [INFO] Database connection established
+2025-08-28T10:00:03.345Z [INFO] Running database migrations
+2025-08-28T10:00:03.567Z [INFO] Migration from version 1.100 to 1.101 starting
+2025-08-28T10:00:04.789Z [DEBUG] Executing migration scripts
+2025-08-28T10:00:05.012Z [INFO] Loading workflows from database
+2025-08-28T10:00:05.234Z [WARN] Found 5 active workflows
+2025-08-28T10:00:05.456Z [INFO] Starting workflow engine
+2025-08-28T10:00:05.678Z [INFO] n8n ready on http://localhost:5678
+2025-08-28T10:00:10.890Z [INFO] Workflow 'Daily Report Generator' triggered
+2025-08-28T10:00:11.123Z [DEBUG] Executing workflow wf_001
+2025-08-28T10:00:11.345Z [INFO] Workflow execution started
+2025-08-28T10:00:15.567Z [INFO] Workflow execution completed
+2025-08-28T10:00:15.789Z [DEBUG] Saving execution data to database
+2025-08-28T10:00:16.012Z [WARN] Database write operation delayed
+2025-08-28T10:00:20.234Z [INFO] User accessing workflow editor
+2025-08-28T10:00:20.456Z [DEBUG] Loading workflow list from database
+2025-08-28T10:00:20.678Z [ERROR] Unexpected database response
+2025-08-28T10:00:20.890Z [WARN] Workflow count mismatch: expected 5, found 0
+2025-08-28T10:00:21.123Z [ERROR] Unable to load workflows
+2025-08-28T10:00:21.345Z [INFO] Database query returned empty result
+2025-08-28T10:00:21.567Z [CRITICAL] All workflows appear to be missing
+2025-08-28T10:00:21.789Z [ERROR] Execution history table is empty
+2025-08-28T10:00:22.012Z [INFO] Attempting database integrity check
+2025-08-28T10:00:22.234Z [INFO] SQLite integrity check: ok
+2025-08-28T10:00:22.456Z [WARN] Database file exists but tables are empty
+2025-08-28T10:00:22.678Z [ERROR] Critical data loss detected
+2025-08-28T10:00:22.890Z [INFO] No WAL file found for recovery
+2025-08-28T10:00:23.123Z [CRITICAL] 48 hours of work has been lost
\ No newline at end of file
diff --git a/rules/tags/categories.yaml b/rules/tags/categories.yaml
index a08a6ed..2572657 100644
--- a/rules/tags/categories.yaml
+++ b/rules/tags/categories.yaml
@@ -244,6 +244,15 @@ categories:
     description: |
       Failures that prevent MongoDB from starting successfully due to corrupted metadata, invalid configurations, or unrecoverable internal errors (e.g., WiredTiger metadata corruption).
       These failures often require manual repair or backup restoration.
+  - name: workflow-automation-problems
+    displayName: Workflow Automation Problems
+    description: Problems related to workflow automation platforms such as n8n and Temporal, including data loss, execution failures, and configuration issues
+  - name: sqlite-problems
+    displayName: SQLite Problems
+    description: Problems specific to the SQLite database, including connection pooling issues, data persistence failures, and configuration errors
+  - name: data-loss-problems
+    displayName: Data Loss Problems
+    description: Critical problems that result in data loss or corruption and require immediate attention to prevent further damage
   - name: supabase-problem
     displayName: Supabase Problems
     description: |
diff --git a/rules/tags/tags.yaml b/rules/tags/tags.yaml
index ce00e20..bb4d867 100644
--- a/rules/tags/tags.yaml
+++ b/rules/tags/tags.yaml
@@ -848,6 +848,9 @@ tags:
   - name: cluster-scaling
     displayName: Cluster Scaling
     description: Problems related to Kubernetes cluster scaling operations and capacity management
+  - name: sqlite
+    displayName: SQLite
+    description: Problems related to the SQLite database engine
   - name: maxmemory
     displayName: Max Memory
     description: Problems related to Redis maxmemory configuration and memory limits
@@ -951,17 +954,44 @@ tags:
     displayName: Recursive Analysis
     description: Problems where systems enter recursive self-analysis loops leading to resource exhaustion
   - name: n8n
-    displayName: N8N
+    displayName: n8n
     description: Problems related to n8n workflow automation platform
+  - name: production-issue
+    displayName: Production Issue
+    description: Problems that occur specifically in production environments
+  - name: configuration-error
+    displayName: Configuration Error
+    description: Problems caused by incorrect configuration settings or environment variables
+  - name: connection-pooling
+    displayName: Connection Pooling
+    description: Problems related to database connection pool management and configuration
+  - name: single-connection
+    displayName: Single Connection
+    description: Problems caused by single-connection mode in database systems
+  - name: concurrent-access
+    displayName: Concurrent Access
+    description: Problems related to concurrent access patterns in databases or file systems
+  - name: data-persistence
+    displayName: Data Persistence
+    description: Problems with data persistence mechanisms and storage reliability
   - name: workflow-automation
     displayName: Workflow Automation
-    description: Problems related to workflow automation systems and platforms
+    description: Problems related to workflow automation platforms and orchestration systems
+  - name: business-critical
+    displayName: Business Critical
+    description: Problems that have a direct impact on business operations and continuity
   - name: silent-failure
     displayName: Silent Failure
-    description: Problems that occur without visible error messages or alerts, making detection extremely difficult
+    description: Problems that occur without obvious error messages or warnings
+  - name: database-reliability
+    displayName: Database Reliability
+    description: Problems affecting database system reliability and data consistency
+  - name: database-write-failure
+    displayName: Database Write Failure
+    description: Problems where database write operations fail or are not properly persisted
   - name: production-critical
     displayName: Production Critical
     description: Issues that have severe impact on production systems and require immediate attention
   - name: data-integrity
     displayName: Data Integrity
-    description: Problems that affect the completeness, accuracy, or consistency of data
\ No newline at end of file
+    description: Problems that affect the completeness, accuracy, or consistency of data
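The rule's immediate-response steps can be scripted: stop n8n, back up the database together with its sidecar files, then check what survived. The Python sketch below is illustrative and is not part of the rule or of n8n. It assumes that n8n is already stopped and that the database sits at the usual `~/.n8n/database.sqlite`. The helper names `copy_database_set`, `backup_files`, and `inspect_copy` are hypothetical.

The script works in two stages:

- It copies the database plus any `-wal`, `-shm`, or `-journal` file into a timestamped folder.
- It opens a throwaway copy with the standard `sqlite3` module and reports three things: `PRAGMA journal_mode`, the `PRAGMA integrity_check` result, and a row count per table. The journal mode should read `wal` once a pool size above 0 is in effect and is typically `delete` otherwise.

```python
"""Back up an n8n SQLite database and inspect the copy (illustrative sketch)."""
import shutil
import sqlite3
import tempfile
from contextlib import closing
from datetime import datetime, timezone
from pathlib import Path

# Files SQLite may keep next to the main database: the WAL-mode sidecars and
# the rollback journal. They are copied together with the database file.
SIDECAR_SUFFIXES = ("-wal", "-shm", "-journal")


def copy_database_set(db_path: Path, target_dir: Path) -> Path:
    """Copy the database file plus any existing sidecar files into target_dir."""
    if not db_path.is_file():
        raise FileNotFoundError(f"database not found: {db_path}")
    target_dir.mkdir(parents=True, exist_ok=True)
    for suffix in ("",) + SIDECAR_SUFFIXES:
        source = db_path.with_name(db_path.name + suffix)
        if source.is_file():
            shutil.copy2(source, target_dir / source.name)
    return target_dir / db_path.name


def backup_files(db_path: Path, backup_root: Path) -> Path:
    """File-level backup into a new timestamped folder; n8n must be stopped first."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_root / f"n8n-sqlite-{stamp}"
    if target.exists():
        raise FileExistsError(f"backup folder already exists: {target}")
    return copy_database_set(db_path, target)


def inspect_copy(db_copy: Path) -> dict:
    """Return journal mode, integrity_check result and row count per table.

    Opening a database can checkpoint its WAL or roll back a hot journal, so
    the checks run on a throwaway copy and the backup itself stays untouched.
    """
    with tempfile.TemporaryDirectory() as scratch:
        scratch_db = copy_database_set(db_copy, Path(scratch))
        with closing(sqlite3.connect(scratch_db)) as con:
            journal_mode = con.execute("PRAGMA journal_mode").fetchone()[0]
            integrity = con.execute("PRAGMA integrity_check").fetchone()[0]
            names = [row[0] for row in con.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' "
                "AND name NOT LIKE 'sqlite\\_%' ESCAPE '\\' ORDER BY name")]
            row_counts = {}
            for name in names:
                quoted = '"' + name.replace('"', '""') + '"'  # quote the identifier
                row_counts[name] = con.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return {"journal_mode": journal_mode, "integrity": integrity, "row_counts": row_counts}


# Example (run after stopping n8n; both paths are assumptions):
#   copy = backup_files(Path.home() / ".n8n" / "database.sqlite", Path("n8n-backups"))
#   print(inspect_copy(copy))
```

The checks run against a scratch copy, never against the live file or the backup. The reason is that opening a SQLite database can legitimately rewrite it, either by checkpointing a WAL or by rolling back a hot journal. An `integrity_check` of `ok` combined with empty workflow tables would match the pattern in `test.log`: the file is structurally intact, but the rows are gone.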