Two-level PostgreSQL declarative partitioning for the logs table:

- Level 1: LIST by team_id (isolates each team's data)
- Level 2: RANGE by created_at monthly (instant retention via partition drops)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
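As a rough illustration of the structure this PR creates, here is a minimal sketch of the DDL shape behind the two levels. The `monthly_partition_ddl` helper and the `logs_<hex>` team-partition naming are assumptions for illustration (the monthly naming `logs_<hex>_YYYY_MM` is taken from the review below); this is not code copied from the PR.

```python
from uuid import UUID


def monthly_partition_ddl(team_id: UUID, year: int, month: int) -> list[str]:
    """Illustrative only: the two statements behind the two partition levels."""
    hex_id = team_id.hex
    team_part = f"logs_{hex_id}"                        # level 1, per team
    month_part = f"{team_part}_{year:04d}_{month:02d}"  # level 2, per month
    ny, nm = (year + 1, 1) if month == 12 else (year, month + 1)
    return [
        # Level 1: LIST partition holding one team's rows, itself
        # range-partitioned by created_at.
        f"CREATE TABLE IF NOT EXISTS {team_part} PARTITION OF logs "
        f"FOR VALUES IN ('{team_id}') PARTITION BY RANGE (created_at);",
        # Level 2: one calendar month; retention is a DROP TABLE on this.
        f"CREATE TABLE IF NOT EXISTS {month_part} PARTITION OF {team_part} "
        f"FOR VALUES FROM ('{year:04d}-{month:02d}-01') "
        f"TO ('{ny:04d}-{nm:02d}-01');",
    ]
```

Dropping a level-2 table removes one month of one team's logs in O(1), which is the "instant retention via partition drops" the description refers to.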
📝 Walkthrough

The PR introduces PostgreSQL partition management for logs with team-based LIST partitioning at Level 1 and monthly RANGE partitioning at Level 2. Partition lifecycle operations are integrated into team creation and deletion, and maintenance jobs are scheduled during startup to ensure upcoming partitions and clean up expired data.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant API as Admin API
participant App as Application<br/>(main.py)
participant PartSvc as Partition<br/>Service
participant DB as PostgreSQL
Note over API,DB: Team Creation Workflow
API->>API: create_team()
API->>PartSvc: create_team_partition(team_id)
PartSvc->>DB: Create LIST partition<br/>for team_id
PartSvc->>DB: Create default<br/>sub-partition
PartSvc->>DB: Create current & next<br/>month partitions
DB-->>PartSvc: Partitions created
PartSvc-->>API: Partition ready
API-->>API: Team created
Note over App,DB: Startup & Maintenance Workflow
App->>PartSvc: ensure_upcoming_partitions()<br/>(on startup)
PartSvc->>DB: Check all teams
PartSvc->>DB: Ensure current &<br/>next month<br/>partitions exist
DB-->>PartSvc: Partitions verified
PartSvc-->>App: Startup partition<br/>check complete
App->>App: Schedule daily<br/>partition_maintenance job
Note over App,DB: Team Deletion Workflow
API->>API: delete_team()
API->>PartSvc: drop_team_partition(team_id)
PartSvc->>DB: Drop team partition<br/>(cascades to<br/>sub-partitions)
DB-->>PartSvc: Partition dropped
PartSvc-->>API: Cleanup complete
API-->>API: Team deleted
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/api/admin.py (1)
113-121: ⚠️ Potential issue | 🟡 Minor

Partition creation failure leaves the team record committed without its partition.

If `create_team_partition` throws after `Team.create` succeeds, the client gets a 500 but the team already exists. Subsequent log ingestion falls into `logs_default` until `ensure_upcoming_partitions` self-heals. Consider either wrapping both operations in a transaction or catching the partition error and attempting cleanup (deleting the team) to keep the API response consistent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/admin.py` around lines 113 - 121, The Team.create call can succeed while create_team_partition fails, leaving a team without its partition; wrap the team creation and partition setup in a single transactional flow or, if transactions aren’t possible, catch errors from create_team_partition after Team.create and roll back by deleting the created team record (use Team.delete or equivalent) before rethrowing/returning the error. Specifically modify the code surrounding Team.create and create_team_partition to either use a DB transaction scope (ensuring both operations commit atomically) or add a try/catch around create_team_partition that calls the Team deletion/cleanup for the newly created team.id and then surfaces the original error to the caller. Ensure you reference Team.create, create_team_partition, and the cleanup call (e.g., Team.delete or destroy method) so the change is easy to locate.
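A minimal sketch of the cleanup variant, assuming Tortoise-style models and the handler shape implied by the comment (the function signature is illustrative, not taken from `admin.py`):

```python
async def create_team(name: str) -> Team:
    team = await Team.create(name=name)
    try:
        await create_team_partition(team.id)
    except Exception:
        # Partition DDL failed: remove the half-created team so the
        # client never sees a 500 for a team that silently exists.
        await team.delete()
        raise
    return team
```

Since PostgreSQL DDL is transactional, wrapping both calls in Tortoise's `in_transaction()` should work equally well and avoids the manual rollback.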
🧹 Nitpick comments (8)
backend/app/services/partitions.py (6)
196-201: Pass `cutoff` as a `datetime` object, not a formatted string.

`cutoff` is already a timezone-aware `datetime`. Tortoise ORM (and the underlying asyncpg driver) can bind `datetime` objects directly to `$1`. Formatting to a string (`cutoff_str`) loses precision and relies on PostgreSQL to re-parse it.

Proposed fix

```diff
         elif partition_start < cutoff < partition_end:
             # Partial month — delete rows before cutoff
-            cutoff_str = cutoff.strftime("%Y-%m-%d %H:%M:%S+00")
             result = await conn.execute_query(
-                f"DELETE FROM {relname} WHERE created_at < $1",
-                [cutoff_str],
+                f"DELETE FROM {relname} WHERE created_at < $1",
+                [cutoff],
             )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/partitions.py` around lines 196 - 201, The DELETE currently formats cutoff to a string before binding; instead, pass the timezone-aware datetime directly to conn.execute_query so asyncpg/Tortoise can bind it natively: replace the cutoff_str usage and pass cutoff as the parameter to the f"DELETE FROM {relname} WHERE created_at < $1" execute_query call (update the code around relname, cutoff and conn.execute_query to remove the strftime conversion).
122-141: `ensure_upcoming_partitions` issues one query per team — consider batching the existence checks.

For deployments with many teams, this function executes at least one `_table_exists` query per team (plus two `create_monthly_partition` calls, each with their own existence check). A single catalog query to fetch all existing partition names and then computing the diff in Python would reduce round-trips significantly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/partitions.py` around lines 122 - 141, ensure_upcoming_partitions currently calls _table_exists per team causing N queries; instead run a single catalog query (via the same conn from _get_conn) to fetch all existing partition names for the teams and target months, compute missing team partitions and missing monthly partitions in Python using _partition_name(team.id) and the month names (now and next_month), then call create_team_partition(team.id) only for teams whose team partition is missing and call create_monthly_partition(team.id, year, month) only for missing month partitions; reuse Team.all(), _partition_name, create_team_partition, create_monthly_partition and _get_conn to locate code and ensure you preserve existing semantics (create full structure when team partition missing, otherwise only ensure current+next month).
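A sketch of the batched shape, reusing the helper names cited above (their exact signatures and the `_get_conn` helper are assumed from the prompt, not copied from the module):

```python
from datetime import datetime, timezone


async def ensure_upcoming_partitions() -> None:
    conn = _get_conn()  # assumed module helper
    teams = await Team.all()

    # One catalog round-trip instead of one _table_exists() per team:
    # fetch every existing logs_* partition name up front.
    rows = await conn.execute_query_dict(
        "SELECT relname FROM pg_class WHERE relname LIKE 'logs\\_%'"
    )
    existing = {r["relname"] for r in rows}

    now = datetime.now(timezone.utc)
    ny, nm = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)

    for team in teams:
        base = _partition_name(team.id)
        if base not in existing:
            # No team partition at all: build the full structure.
            await create_team_partition(team.id)
            continue
        for year, month in ((now.year, now.month), (ny, nm)):
            if f"{base}_{year:04d}_{month:02d}" not in existing:
                await create_monthly_partition(team.id, year, month)
```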
194-194: Replace `print()` with structured logging.

Production services should use the `logging` module for proper log-level control, formatting, and integration with observability tools.

Proposed fix

```diff
+import logging
+
+logger = logging.getLogger(__name__)
+
 ...
-        await conn.execute_script(f"DROP TABLE {relname};")
-        print(f"Dropped expired partition {relname}")
+        await conn.execute_script(f"DROP TABLE {relname};")
+        logger.info("Dropped expired partition %s", relname)
     elif partition_start < cutoff < partition_end:
         ...
-        if deleted:
-            print(f"Deleted {deleted} rows from boundary partition {relname}")
+        if deleted:
+            logger.info("Deleted %s rows from boundary partition %s", deleted, relname)
```

Also applies to: 204-204
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/partitions.py` at line 194, Replace the bare print statements that log dropped partition names with structured logging: add or use a module-level logger (logger = logging.getLogger(__name__)) in backend.app.services.partitions and replace print(f"Dropped expired partition {relname}") (and the similar occurrence around lines 204) with an appropriate log call such as logger.info("Dropped expired partition %s", relname) so messages respect log levels, formatting, and observability integration.
34-39: `_table_exists` doesn't filter by schema — could match tables in other schemas.

`pg_class.relname` is not schema-qualified. If the search path includes multiple schemas, a stale or unrelated table with the same name could cause a false positive.

Schema-qualified check

```diff
 async def _table_exists(conn: BaseDBAsyncClient, table_name: str) -> bool:
     rows = await conn.execute_query_dict(
-        "SELECT 1 FROM pg_class WHERE relname = $1 AND relkind IN ('r', 'p')",
-        [table_name],
+        "SELECT 1 FROM pg_class c "
+        "JOIN pg_namespace n ON n.oid = c.relnamespace "
+        "WHERE c.relname = $1 AND c.relkind IN ('r', 'p') "
+        "AND n.nspname = current_schema()",
+        [table_name],
     )
     return len(rows) > 0
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/partitions.py` around lines 34 - 39, _table_exists currently queries pg_class by relname only, which can return tables from other schemas; update the SQL used in _table_exists (the call to conn.execute_query_dict) to join pg_namespace (pg_class.relnamespace = pg_namespace.oid) and filter by the intended schema (e.g., WHERE pg_namespace.nspname = current_schema() AND pg_class.relname = $1) or add a schema parameter to _table_exists and compare against that schema name, so the existence check is schema-qualified rather than name-only.
52-57: SQL built via f-strings — safe here but fragile pattern.
`part_name` and `team_id_str` are derived from `UUID.hex`/`str(UUID)`, so injection isn't exploitable today. However, the same f-string pattern is repeated across all DDL functions (lines 53–57, 61–63, 94–97, 108, 119). Since PostgreSQL parameterized queries can't bind identifiers or DDL clauses, consider at least adding an assertion on the input format (e.g., `assert re.fullmatch(r'[a-f0-9]{32}', _team_hex(team_id))`) to harden against future misuse.

Example: add a validation helper

```python
import re

_HEX_RE = re.compile(r"[a-f0-9]{32}\Z")


def _team_hex(team_id: UUID) -> str:
    h = team_id.hex
    assert _HEX_RE.match(h), f"unexpected hex: {h}"
    return h
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/partitions.py` around lines 52 - 57, The f-string DDL builds (creating partitions) use values derived from UUIDs without validation (symbols: part_name, team_id_str) — add a small validator helper (e.g., _HEX_RE and _team_hex(team_id)) that asserts the UUID hex matches r"[a-f0-9]{32}\Z" and use _team_hex(...) when composing part_name and team_id_str; also validate any identifier derived from that hex (the partition name) before interpolation so future non-UUID inputs cannot slip in.
170-204: `relname` from `pg_class` used unsanitized in DROP/DELETE — add a name-pattern guard.

Ruff S608 flags line 199 for string-based query construction. While `relname` comes from a trusted system catalog, applying a regex check before interpolation adds defense-in-depth and silences the linter. This also covers the `DROP TABLE` on line 193.

Proposed guard

```diff
+import re
+
+_PART_NAME_RE = re.compile(r"^logs_[a-f0-9]{32}_\d{4}_\d{2}$")
+
     for row in rows:
         relname = row["relname"]
+        if not _PART_NAME_RE.match(relname):
+            continue
+
         # Parse partition name: logs_<hex>_YYYY_MM
         parts = relname.rsplit("_", 2)
```

(Move the compiled regex to module level.)

🤖 Prompt for AI Agents
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/services/partitions.py` around lines 170 - 204, Add a defensive name-pattern guard for relname before using it in string-interpolated SQL: compile an allowed identifier regex at module level (e.g., ALLOWED_PARTITION_RE = re.compile(r"^logs_[0-9a-f]+_\d{4}_\d{2}$")) and in the loop that defines relname (the block that parses parts and computes partition_start/partition_end) skip any relname that does not match ALLOWED_PARTITION_RE; only after the match use conn.execute_script(f"DROP TABLE {relname};") and conn.execute_query(f"DELETE FROM {relname} WHERE created_at < $1", ...) so the linter/S608 is satisfied and the operations are restricted to expected partition names.

backend/migrations/002_partition_logs_by_team.sql (2)
104-107: Bulk INSERT into a fully indexed partitioned table can be slow on large datasets.

The data copy runs after all indexes (5 on the parent, plus PK) are already created on lines 57–61. For a large `logs_old` table, building those indexes during INSERT is expensive. Consider creating indexes after the data copy, or at minimum, document that this migration may require a maintenance window.

Proposed reorder: move index creation after data copy

```diff
 ALTER SEQUENCE logs_id_seq OWNED BY logs.id;

--- Create indexes on the parent (inherited by all partitions)
-CREATE INDEX idx_logs_team_id_id ON logs (team_id, id DESC);
-CREATE INDEX idx_logs_timestamp ON logs ("timestamp");
-CREATE INDEX idx_logs_level ON logs (level);
-CREATE INDEX idx_logs_source ON logs (source);
-CREATE INDEX idx_logs_created_at ON logs (created_at);
-
 -- 3. Create top-level default partition
 ...
 ...
 -- 5. Copy all data from old table
 INSERT INTO logs (id, team_id, "timestamp", level, message, metadata, source, created_at)
 SELECT id, team_id, "timestamp", level, message, metadata, source, created_at
 FROM logs_old;

+-- Create indexes after bulk load for better performance
+CREATE INDEX idx_logs_team_id_id ON logs (team_id, id DESC);
+CREATE INDEX idx_logs_timestamp ON logs ("timestamp");
+CREATE INDEX idx_logs_level ON logs (level);
+CREATE INDEX idx_logs_source ON logs (source);
+CREATE INDEX idx_logs_created_at ON logs (created_at);
+
 -- 6. Reset the sequence
 ...
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/migrations/002_partition_logs_by_team.sql` around lines 104 - 107, The migration currently builds all indexes on the partitioned parent table (including the primary key and the five indexes created earlier) before copying data from logs_old into logs, which makes the bulk INSERT very slow for large datasets; to fix, reorder the migration so the INSERT INTO logs ... SELECT ... FROM logs_old runs before creating the parent/index definitions (i.e., defer creation of the PK and the five indexes on logs until after the data copy), or if reordering is not acceptable, add an explicit comment and documentation in this migration noting it requires a maintenance window and may be slow for large logs_old tables; reference the INSERT INTO logs (id, team_id, "timestamp", ...) SELECT ... FROM logs_old and the subsequent parent/index creation steps when making the change.
29-30: No explicit table lock before rename — concurrent writes will fail during migration.

`ALTER TABLE logs RENAME TO logs_old` will acquire an `ACCESS EXCLUSIVE` lock, but there's a window between the rename and the recreation where inserts targeting `logs` will error. If this runs on a live system, consider wrapping with an advisory lock or documenting the required maintenance window.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/migrations/002_partition_logs_by_team.sql` around lines 29 - 30, The migration performs ALTER TABLE logs RENAME TO logs_old with no session lock, leaving a window where concurrent inserts to "logs" will fail; to fix, acquire a session-level advisory lock (e.g. via pg_advisory_lock with a chosen stable key) before running ALTER TABLE logs RENAME TO logs_old and hold it until after the new "logs" table is created and any necessary triggers/schemas are swapped, then release the advisory lock (pg_advisory_unlock) — alternatively, if you cannot modify the SQL, document that this migration requires a maintenance window where writes are paused; reference the ALTER TABLE logs RENAME TO logs_old and the new logs/logs_old table creation steps when applying the lock.
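If the migration SQL is applied from a Python runner (an assumption — the repo may apply `.sql` files via `psql` instead), the advisory-lock wrapping could look roughly like this with asyncpg; the function name and lock key are illustrative:

```python
import asyncpg

# Arbitrary but stable key identifying this migration's critical section.
MIGRATION_LOCK_KEY = 2002


async def run_partition_migration(dsn: str, migration_sql: str) -> None:
    conn = await asyncpg.connect(dsn)
    try:
        # Session-level lock: blocks until no other session holds the key.
        await conn.execute("SELECT pg_advisory_lock($1)", MIGRATION_LOCK_KEY)
        # ALTER TABLE logs RENAME TO logs_old; CREATE TABLE logs (...); ...
        await conn.execute(migration_sql)
    finally:
        await conn.execute("SELECT pg_advisory_unlock($1)", MIGRATION_LOCK_KEY)
        await conn.close()
```

Note the lock only coordinates sessions that also take it; plain INSERTs into `logs` won't wait on it, so a short write pause (maintenance window) remains the safer option, as the comment says.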
Validate UUID hex and partition names against strict regexes before string-interpolating into SQL, preventing injection from unexpected inputs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>