
add first-class user_id column to logs with backfill support #4

Merged
9cb14c1ec0 merged 2 commits into master from user-id on Feb 19, 2026
Conversation

@9cb14c1ec0 (Owner) commented Feb 19, 2026

Promotes user identification from metadata JSONB to a dedicated indexed column for efficient per-team filtering. Includes migration, API filter support, backfill endpoint, and full frontend integration.

Summary by CodeRabbit

  • New Features
    • Added user ID tracking for logs with filtering and search capabilities.
    • Introduced User ID column in logs table display.
    • Added Backfill User ID feature to populate user IDs from metadata with optional overwrite behavior.


Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai bot commented Feb 19, 2026

Warning

Rate limit exceeded

@9cb14c1ec0 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 12 minutes and 41 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

This pull request adds comprehensive user_id support to the log management system. Changes span the data model with a new nullable user_id field, ingest logic to propagate user_id, a database migration creating the column and index, API endpoints for filtering and backfilling user_id values, schema updates exposing new types, and frontend enhancements for user_id display, filtering, and a dedicated backfill interface.

Changes

Backend Data Model & Migration
backend/app/models/log.py, backend/migrations/003_add_user_id_to_logs.sql
Added a nullable user_id field to the Log model. The migration adds a user_id VARCHAR(255) column and a composite index on (team_id, user_id), with idempotent checks (sketched below).

Backend Ingest Layer
backend/app/api/ingest.py
Propagates the user_id field in both single log ingestion (ingest_log) and batch ingestion (ingest_logs_batch) via the Log.create calls.

Backend API & Schemas
backend/app/api/logs.py, backend/app/schemas/log.py, backend/app/schemas/__init__.py
Added user_id filtering to the search_logs and delete_logs endpoints. Introduced a new backfill_user_id endpoint (POST) that updates logs.user_id from a metadata key via raw SQL. Extended LogCreate, LogResponse, and LogSearchParams with an optional user_id field. Exported the UserIdBackfillRequest and UserIdBackfillResponse schemas.

Frontend Client & UI
frontend/src/api/client.ts, frontend/src/views/LogsView.vue
Added a user_id field to the Log interface. Extended LogsView with a user_id query filter, a User ID table column, and a new Backfill User ID feature: a dialog for the metadata key and an overwrite option, a POST call to the backfill endpoint, and success/error handling.
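
For orientation, here is a rough sketch of what that migration's idempotent shape might look like, reconstructed from the fragments quoted in the review below; the pg_indexes existence check and the exact block layout are assumptions, not the actual file contents:

DO $$
BEGIN
    -- add the column only if it does not exist yet
    IF NOT EXISTS (
        SELECT 1 FROM information_schema.columns
        WHERE table_name = 'logs' AND column_name = 'user_id'
    ) THEN
        ALTER TABLE logs ADD COLUMN user_id VARCHAR(255);
    END IF;

    -- create the composite index only if it does not exist yet
    IF NOT EXISTS (
        SELECT 1 FROM pg_indexes
        WHERE tablename = 'logs' AND indexname = 'idx_logs_team_id_user_id'
    ) THEN
        CREATE INDEX idx_logs_team_id_user_id ON logs (team_id, user_id);
    END IF;
END $$;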

Sequence Diagram

sequenceDiagram
    participant LogsView as LogsView (Frontend)
    participant API as Backend API
    participant DB as Database

    LogsView->>LogsView: User enters metadata key & opens backfill dialog
    LogsView->>API: POST /teams/{teamId}/logs/backfill-user-id<br/>(metadataKey, overwrite)
    API->>DB: UPDATE logs SET user_id = metadata[key]<br/>WHERE user_id IS NULL OR overwrite=true
    DB-->>API: Return update count
    API-->>LogsView: Return success + count
    LogsView->>LogsView: Show success message
    LogsView->>API: GET /teams/{teamId}/logs (fetch refreshed)
    API->>DB: SELECT logs filtered by team_id
    DB-->>API: Return updated logs with new user_id
    API-->>LogsView: Return logs
    LogsView->>LogsView: Render table with populated user_id
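The UPDATE in the diagram is pseudo-SQL; against a JSONB metadata column the statement would look roughly like the sketch below. Parameter positions and the exact predicates are assumptions pieced together from the review comments further down, not the actual query:

-- non-overwrite case shown; when overwrite is requested the user_id IS NULL predicate is dropped
UPDATE logs
SET user_id = metadata->>$1          -- text value of the chosen metadata key
WHERE team_id = $2
  AND metadata ? $1                  -- only rows that actually carry that key
  AND user_id IS NULL;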

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A rabbit hops through logs so bright,
With user_id fields in gleaming sight,
We backfill metadata, swap and glean,
The finest logging ever seen!
Hop, filter, search—our system's keen! 🌟

🚥 Pre-merge checks (3 passed)

Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
Title Check: ✅ Passed. The pull request title accurately reflects the main objectives (adding a first-class user_id column and providing backfill support), which aligns with the major changes across models, migrations, API endpoints, and frontend UI.
Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch user-id

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (6)
frontend/src/views/LogsView.vue (1)

85-92: Consider resetting backfill dialog state when opening.

When the dialog is reopened, stale result/error messages from a previous run persist. Clearing state on open avoids user confusion.

Proposed fix
-        @click="backfillDialog = true"
+        @click="openBackfillDialog"

Then add a helper in the <script setup>:

function openBackfillDialog() {
  // clear any result or error left over from a previous backfill run
  backfillResult.value = ''
  backfillError.value = false
  backfillDialog.value = true
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@frontend/src/views/LogsView.vue` around lines 85 - 92, The backfill dialog
currently reuses stale state because the button directly sets backfillDialog =
true; create a helper function openBackfillDialog() in the <script setup> that
resets backfillResult.value = '' and backfillError.value = false before setting
backfillDialog.value = true, and change the button's `@click` to call
openBackfillDialog so the dialog is cleared each time it opens.
backend/app/api/logs.py (2)

174-174: Unused membership variable.

Static analysis (Ruff RUF059) flags that membership is unused. Use _ prefix to signal intent.

Proposed fix
-    team, membership = await get_team_member(team_id, user)
+    team, _membership = await get_team_member(team_id, user)

Note: if you add the role check suggested in the other comment, the variable will be used and this becomes moot.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/logs.py` at line 174, The call to get_team_member assigns
(team, membership) but membership is never used, triggering a lint error; update
the assignment to (team, _membership) or prefix the variable with an underscore
(e.g., _membership) where get_team_member(team_id, user) is awaited so intent is
clear to linters, or alternatively use membership in the suggested role-check
logic elsewhere (the unique symbols to update are the get_team_member call and
the local variable name membership).

178-189: Unbounded UPDATE on a potentially large table — consider batching or adding guardrails.

This UPDATE runs in a single transaction with no LIMIT. For teams with millions of log rows, this can hold a long write lock, saturate I/O, and risk statement timeouts. Consider batching (e.g., updating in chunks of 10k–50k rows using a loop with LIMIT + ctid-based pagination) or at minimum adding a row-count warning/cap.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/api/logs.py` around lines 178 - 189, The UPDATE currently
executed via conn.execute_query with sql built from body.metadata_key and
team.id can touch the entire logs table and should be converted to a batched
update to avoid long locks; modify the logic around the SQL (where you currently
append " AND user_id IS NULL" when not body.overwrite) to run in a loop that
updates limited chunks (e.g., 10k–50k rows) using a safe pagination strategy
such as ctid-based batching (UPDATE ... WHERE ctid IN (SELECT ctid FROM logs
WHERE team_id = $2 AND metadata ? $1 [AND user_id IS NULL] LIMIT $N)), repeating
until no rows updated, and accumulate the updated count into updated (instead of
a single conn.execute_query call), or at minimum perform a row-count check
before running the full update and abort/log if above a threshold.
backend/migrations/003_add_user_id_to_logs.sql (2)

7-10: information_schema check doesn't filter by table_schema.

The query matches any table named logs in any schema. If the app ever uses a non-default schema, this check could return a false positive and skip the migration. Adding AND table_schema = 'public' (or the relevant schema) would make this more robust.

Proposed fix
     IF NOT EXISTS (
         SELECT 1 FROM information_schema.columns
-        WHERE table_name = 'logs' AND column_name = 'user_id'
+        WHERE table_schema = 'public' AND table_name = 'logs' AND column_name = 'user_id'
     ) THEN
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/migrations/003_add_user_id_to_logs.sql` around lines 7 - 10, The
information_schema check for the 'logs' table/ 'user_id' column doesn't filter
by schema and can match tables in other schemas; update the EXISTS condition
that queries information_schema.columns (the block checking WHERE table_name =
'logs' AND column_name = 'user_id') to also constrain the schema—e.g. add AND
table_schema = 'public' or use AND table_schema = current_schema()—so the
migration only skips when the column truly exists in the intended schema.

17-25: Consider CREATE INDEX CONCURRENTLY for production tables with existing data.

The CREATE INDEX on Line 21 acquires a full table lock, which can block writes on a large logs table for the duration of index build. Since this is wrapped in a DO block (which runs inside a transaction), CONCURRENTLY cannot be used here. For production deployments with significant existing data, consider running the index creation as a separate, standalone statement outside of the DO block so you can use CREATE INDEX CONCURRENTLY.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/migrations/003_add_user_id_to_logs.sql` around lines 17 - 25, The
current IF NOT EXISTS block creates idx_logs_team_id_user_id inside a
transactional DO block which prevents using CREATE INDEX CONCURRENTLY; extract
the index creation so it runs as a standalone statement outside the
DO/transaction and replace CREATE INDEX with CREATE INDEX CONCURRENTLY ON logs
(team_id, user_id), while keeping the existence check (or perform a separate
pg_indexes check before running the standalone CONCURRENTLY command) so the
migration can safely build the index without locking writes on the logs table.
backend/app/schemas/log.py (1)

13-13: Add max_length=255 to match the DB column constraint.

The user_id DB column is VARCHAR(255) (see backend/app/models/log.py Line 22), but the Pydantic field has no length limit. Values exceeding 255 characters will produce an opaque database error instead of a clean 422 validation response.

Proposed fix
-    user_id: str | None = None
+    user_id: str | None = Field(None, max_length=255)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/app/schemas/log.py` at line 13, Update the Pydantic schema field for
user_id in the Log schema to enforce the DB length constraint by using
Field(..., max_length=255) (i.e., change the declaration of user_id to use
pydantic.Field with max_length=255) and import Field from pydantic if not
already present; this ensures values >255 chars raise a 422 validation error
instead of a DB error and matches the VARCHAR(255) defined on the Log model's
user_id column.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/app/api/logs.py`:
- Around line 162-193: The backfill_user_id endpoint currently only checks
membership via get_team_member and allows any member (including VIEWER) to
perform a bulk update; update backfill_user_id to enforce role-based
authorization by inspecting the membership.role (from the result of
get_team_member) against the TeamRole enum and only allow TeamRole.MEMBER and
TeamRole.MANAGER; if the role is not permitted, raise an HTTP 403 (forbidden)
error with an explanatory message so viewers cannot trigger the mutation. Ensure
you import or reference TeamRole and perform this check before acquiring the DB
connection or executing the SQL in backfill_user_id.


Viewers can no longer trigger the bulk backfill mutation — only members
and managers are permitted. Returns HTTP 403 before any DB work.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
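
A minimal sketch of the guard that commit describes, assuming the TeamRole enum, the get_team_member helper, and the membership object's shape mentioned in the review (all names here are taken from the review text, not verified against the codebase):

from fastapi import HTTPException

async def ensure_can_backfill(team_id: str, user) -> None:
    # Reject viewers before any DB work; only members and managers may backfill.
    _team, membership = await get_team_member(team_id, user)
    if membership.role not in (TeamRole.MEMBER, TeamRole.MANAGER):
        raise HTTPException(
            status_code=403,
            detail="Only team members and managers may run the user_id backfill.",
        )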
@9cb14c1ec0 merged commit e3a32c0 into master on Feb 19, 2026
3 checks passed
@9cb14c1ec0 deleted the user-id branch on February 19, 2026 at 15:35