add first-class user_id column to logs with backfill support #4
9cb14c1ec0 merged 2 commits into master
Conversation
Promotes user identification from metadata JSONB to a dedicated indexed column for efficient per-team filtering. Includes migration, API filter support, backfill endpoint, and full frontend integration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 Walkthrough
This pull request adds comprehensive user_id support to the log management system. Changes span the data model with a new nullable user_id field, ingest logic to propagate user_id, a database migration creating the column and index, API endpoints for filtering and backfilling user_id values, schema updates exposing new types, and frontend enhancements for user_id display, filtering, and a dedicated backfill interface.
Sequence Diagram
sequenceDiagram
participant LogsView as LogsView (Frontend)
participant API as Backend API
participant DB as Database
LogsView->>LogsView: User enters metadata key & opens backfill dialog
LogsView->>API: POST /teams/{teamId}/logs/backfill-user-id<br/>(metadataKey, overwrite)
API->>DB: UPDATE logs SET user_id = metadata[key]<br/>WHERE user_id IS NULL OR overwrite=true
DB-->>API: Return update count
API-->>LogsView: Return success + count
LogsView->>LogsView: Show success message
LogsView->>API: GET /teams/{teamId}/logs (fetch refreshed)
API->>DB: SELECT logs filtered by team_id
DB-->>API: Return updated logs with new user_id
API-->>LogsView: Return logs
LogsView->>LogsView: Render table with populated user_id
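To make the flow concrete, here is a minimal sketch of the backfill request from the diagram, assuming an httpx client and a bearer-token API. BASE_URL, TEAM_ID, and TOKEN are placeholders; the metadataKey/overwrite field names come from the diagram above, and the response shape is not specified beyond "success + count".

```python
# Minimal sketch of the backfill call shown in the sequence diagram.
# BASE_URL, TEAM_ID, and TOKEN are placeholders, not values from the PR.
import httpx

BASE_URL = "https://api.example.com"  # placeholder
TEAM_ID = 42                          # placeholder
TOKEN = "YOUR_API_TOKEN"              # placeholder

resp = httpx.post(
    f"{BASE_URL}/teams/{TEAM_ID}/logs/backfill-user-id",
    json={"metadataKey": "user_id", "overwrite": False},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30.0,
)
resp.raise_for_status()
print(resp.json())  # the diagram shows "success + count" coming back
```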
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
🧹 Nitpick comments (6)
frontend/src/views/LogsView.vue (1)
85-92: Consider resetting backfill dialog state when opening.
When the dialog is reopened, stale result/error messages from a previous run persist. Clearing state on open avoids user confusion.
Proposed fix
- @click="backfillDialog = true"
+ @click="openBackfillDialog"

Then add a helper in the <script setup>:

function openBackfillDialog() {
  backfillResult.value = ''
  backfillError.value = false
  backfillDialog.value = true
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@frontend/src/views/LogsView.vue` around lines 85 - 92, The backfill dialog currently reuses stale state because the button directly sets backfillDialog = true; create a helper function openBackfillDialog() in the <script setup> that resets backfillResult.value = '' and backfillError.value = false before setting backfillDialog.value = true, and change the button's `@click` to call openBackfillDialog so the dialog is cleared each time it opens.
backend/app/api/logs.py (2)
174-174: Unused `membership` variable.
Static analysis (Ruff RUF059) flags that `membership` is unused. Use a `_` prefix to signal intent.
Proposed fix
- team, membership = await get_team_member(team_id, user)
+ team, _membership = await get_team_member(team_id, user)
Note: if you add the role check suggested in the other comment, the variable will be used and this becomes moot.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/logs.py` at line 174, The call to get_team_member assigns (team, membership) but membership is never used, triggering a lint error; update the assignment to (team, _membership) or prefix the variable with an underscore (e.g., _membership) where get_team_member(team_id, user) is awaited so intent is clear to linters, or alternatively use membership in the suggested role-check logic elsewhere (the unique symbols to update are the get_team_member call and the local variable name membership).
178-189: Unbounded `UPDATE` on a potentially large table — consider batching or adding guardrails.
This `UPDATE` runs in a single transaction with no `LIMIT`. For teams with millions of log rows, this can hold a long write lock, saturate I/O, and risk statement timeouts. Consider batching (e.g., updating in chunks of 10k–50k rows using a loop with `LIMIT` + ctid-based pagination) or at minimum adding a row-count warning/cap.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/api/logs.py` around lines 178 - 189, The UPDATE currently executed via conn.execute_query with sql built from body.metadata_key and team.id can touch the entire logs table and should be converted to a batched update to avoid long locks; modify the logic around the SQL (where you currently append " AND user_id IS NULL" when not body.overwrite) to run in a loop that updates limited chunks (e.g., 10k–50k rows) using a safe pagination strategy such as ctid-based batching (UPDATE ... WHERE ctid IN (SELECT ctid FROM logs WHERE team_id = $2 AND metadata ? $1 [AND user_id IS NULL] LIMIT $N)), repeating until no rows updated, and accumulate the updated count into updated (instead of a single conn.execute_query call), or at minimum perform a row-count check before running the full update and abort/log if above a threshold.
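For illustration, a minimal sketch of the ctid-batched variant the comment describes, assuming an asyncpg-style connection whose execute() returns a status tag such as "UPDATE 10000". The helper name, the batch size, and the IS DISTINCT FROM guard (which lets already-converted rows drop out of the candidate set so the loop terminates in overwrite mode) are additions for this sketch, not code from the PR.

```python
# Sketch of a ctid-batched backfill, assuming an asyncpg-style connection
# (conn.execute returns a status tag such as "UPDATE 10000"). The helper
# name and the 10_000 batch size are illustrative, not from the PR.
async def backfill_in_batches(conn, metadata_key: str, team_id: int,
                              overwrite: bool, batch_size: int = 10_000) -> int:
    # In overwrite mode, IS DISTINCT FROM lets updated rows fall out of
    # the candidate set, so the loop terminates; otherwise only NULL
    # user_id rows qualify, as in the original endpoint.
    guard = (
        "AND user_id IS DISTINCT FROM metadata->>$1"
        if overwrite
        else "AND user_id IS NULL"
    )
    sql = f"""
        UPDATE logs
        SET user_id = metadata->>$1
        WHERE ctid IN (
            SELECT ctid FROM logs
            WHERE team_id = $2
              AND metadata ? $1
              AND metadata->>$1 IS NOT NULL  -- skip JSON nulls so the loop ends
              {guard}
            LIMIT {batch_size}
        )
    """
    total = 0
    while True:
        status = await conn.execute(sql, metadata_key, team_id)
        count = int(status.split()[-1])  # parse "UPDATE n"
        if count == 0:
            return total
        total += count
```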
backend/migrations/003_add_user_id_to_logs.sql (2)
7-10: `information_schema` check doesn't filter by `table_schema`.
The query matches any table named `logs` in any schema. If the app ever uses a non-default schema, this check could return a false positive and skip the migration. Adding `AND table_schema = 'public'` (or the relevant schema) would make this more robust.
Proposed fix
  IF NOT EXISTS (
      SELECT 1 FROM information_schema.columns
-     WHERE table_name = 'logs' AND column_name = 'user_id'
+     WHERE table_schema = 'public' AND table_name = 'logs' AND column_name = 'user_id'
  ) THEN
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/migrations/003_add_user_id_to_logs.sql` around lines 7 - 10, The information_schema check for the 'logs' table/ 'user_id' column doesn't filter by schema and can match tables in other schemas; update the EXISTS condition that queries information_schema.columns (the block checking WHERE table_name = 'logs' AND column_name = 'user_id') to also constrain the schema—e.g. add AND table_schema = 'public' or use AND table_schema = current_schema()—so the migration only skips when the column truly exists in the intended schema.
17-25: Consider `CREATE INDEX CONCURRENTLY` for production tables with existing data.
The `CREATE INDEX` on Line 21 acquires a full table lock, which can block writes on a large `logs` table for the duration of the index build. Since this is wrapped in a `DO` block (which runs inside a transaction), `CONCURRENTLY` cannot be used here. For production deployments with significant existing data, consider running the index creation as a separate, standalone statement outside of the `DO` block so you can use `CREATE INDEX CONCURRENTLY`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/migrations/003_add_user_id_to_logs.sql` around lines 17 - 25, The current IF NOT EXISTS block creates idx_logs_team_id_user_id inside a transactional DO block which prevents using CREATE INDEX CONCURRENTLY; extract the index creation so it runs as a standalone statement outside the DO/transaction and replace CREATE INDEX with CREATE INDEX CONCURRENTLY ON logs (team_id, user_id), while keeping the existence check (or perform a separate pg_indexes check before running the standalone CONCURRENTLY command) so the migration can safely build the index without locking writes on the logs table.
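As a sketch of that separation, assuming the migration runner can issue a statement outside any open transaction (asyncpg does this by default) and that Postgres 9.5+ is available so IF NOT EXISTS can stand in for the pg_indexes check:

```python
# Sketch: build the index outside the DO block / any transaction so
# CONCURRENTLY is legal. Assumes an asyncpg connection with no open
# transaction; index and column names are taken from the migration.
import asyncpg

async def build_logs_index(conn: asyncpg.Connection) -> None:
    # CREATE INDEX CONCURRENTLY cannot run inside a transaction block;
    # IF NOT EXISTS (Postgres 9.5+) replaces the manual pg_indexes check.
    await conn.execute(
        "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_logs_team_id_user_id "
        "ON logs (team_id, user_id)"
    )
```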
backend/app/schemas/log.py (1)
13-13: Add `max_length=255` to match the DB column constraint.
The `user_id` DB column is `VARCHAR(255)` (see `backend/app/models/log.py` Line 22), but the Pydantic field has no length limit. Values exceeding 255 characters will produce an opaque database error instead of a clean 422 validation response.
Proposed fix
- user_id: str | None = None
+ user_id: str | None = Field(None, max_length=255)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/app/schemas/log.py` at line 13, Update the Pydantic schema field for user_id in the Log schema to enforce the DB length constraint by using Field(..., max_length=255) (i.e., change the declaration of user_id to use pydantic.Field with max_length=255) and import Field from pydantic if not already present; this ensures values >255 chars raise a 422 validation error instead of a DB error and matches the VARCHAR(255) defined on the Log model's user_id column.
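To see the behavioral difference, a small self-contained sketch; LogCreate here is a hypothetical stand-in, not the real schema class in backend/app/schemas/log.py:

```python
# Stand-in model demonstrating the max_length fix; the real schema lives
# in backend/app/schemas/log.py under a different name.
from pydantic import BaseModel, Field, ValidationError

class LogCreate(BaseModel):  # hypothetical name for illustration
    user_id: str | None = Field(None, max_length=255)

try:
    LogCreate(user_id="x" * 300)
except ValidationError as exc:
    # Pydantic rejects the value up front (a clean 422 in FastAPI)
    # instead of surfacing an opaque VARCHAR(255) database error.
    print(exc)
```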
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/app/api/logs.py`:
- Around line 162-193: The backfill_user_id endpoint currently only checks
membership via get_team_member and allows any member (including VIEWER) to
perform a bulk update; update backfill_user_id to enforce role-based
authorization by inspecting the membership.role (from the result of
get_team_member) against the TeamRole enum and only allow TeamRole.MEMBER and
TeamRole.MANAGER; if the role is not permitted, raise an HTTP 403 (forbidden)
error with an explanatory message so viewers cannot trigger the mutation. Ensure
you import or reference TeamRole and perform this check before acquiring the DB
connection or executing the SQL in backfill_user_id.
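A minimal sketch of that gate, with an assumed TeamRole enum shape (the app's real enum may differ):

```python
# Sketch of the 403 gate described in the prompt. The TeamRole values and
# the membership object's shape are assumed, mirroring the review comment.
from enum import Enum
from fastapi import HTTPException

class TeamRole(str, Enum):  # assumed shape of the app's existing enum
    VIEWER = "viewer"
    MEMBER = "member"
    MANAGER = "manager"

def require_backfill_role(membership) -> None:
    # Call inside backfill_user_id before acquiring the DB connection.
    if membership.role not in (TeamRole.MEMBER, TeamRole.MANAGER):
        raise HTTPException(
            status_code=403,
            detail="Only team members and managers may run the backfill.",
        )
```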
Viewers can no longer trigger the bulk backfill mutation — only members and managers are permitted. Returns HTTP 403 before any DB work. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>