
Conversation

@dawkaka
Contributor

@dawkaka dawkaka commented Jan 22, 2026

Saving and presenting log patterns in the UI, plus log pattern anomaly detection:

  • Unusual increases in a log pattern's volume within a time window
  • Newly detected log patterns

Closes #

How to test

@claude

This comment was marked as outdated.




-- | Get a pattern by ID
getLogPatternById :: DB es => LogPatternId -> Eff es (Maybe LogPattern)
Contributor

Use the _selectWhere pattern instead of enumerating the fields one by one.

Comment on lines 250 to 264
Issues.LogPattern ->
"Describe this log pattern issue and its implications.\n"
<> "Title: "
<> issue.title
<> "\n"
<> "Service: "
<> fromMaybe "unknown-service" issue.service
Issues.LogPatternRateChange ->
"Describe this log pattern rate change and its implications.\n"
<> "Title: "
<> issue.title
<> "\n"
<> "Service: "
<> fromMaybe "unknown-service" issue.service

Contributor

Please use the [text|] quasiquote (or any other), so it's easier to visually see the shape of this message without the Haskell semigroup noise.
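For illustration, a sketch of what one branch might look like with a quasiquoter, assuming NeatInterpolation's [text|] is the one in use here (the helper name describeLogPattern is hypothetical, not from the PR):

```haskell
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes #-}

import Data.Text (Text)
import qualified Data.Text as T
import NeatInterpolation (text)

-- Hypothetical helper: same prompt as the (<>) chain above, but the
-- message shape is visible at a glance.
describeLogPattern :: Text -> Text -> Text
describeLogPattern title service =
  [text|
    Describe this log pattern issue and its implications.
    Title: $title
    Service: $service
  |]
```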


Add no-focus-ring class to exclude AI search input from global
:focus-visible outline styling.

Comment on lines +288 to +307
SELECT
lp.id,
lp.project_id,
lp.log_pattern,
lp.pattern_hash,
lp.baseline_state,
lp.baseline_volume_hourly_mean,
lp.baseline_volume_hourly_stddev,
COALESCE(counts.current_count, 0)::INT AS current_hour_count
FROM apis.log_patterns lp
LEFT JOIN (
SELECT log_pattern, COUNT(*) AS current_count
FROM otel_logs_and_spans
WHERE project_id = ?
AND timestamp >= date_trunc('hour', NOW())
AND log_pattern IS NOT NULL
GROUP BY log_pattern
) counts ON counts.log_pattern = lp.log_pattern
WHERE lp.project_id = ?
AND lp.state != 'ignored' AND lp.baseline_state = 'established'
Contributor

How is log_pattern supposed to join on otel_logs_and_spans when they're not in the same database?

Or is log_patterns supposed to be a timeseries table in timefusion as well? If that's the case, you can't run queries against timeseries tables that don't depend on a timestamp range.

Contributor

Oh, and you never join two timeseries tables. Some databases might support the join operation, but the performance is always horrible in that case.

Comment on lines 573 to 584
[text|
SELECT lp.log_pattern, count(*) as p_count
FROM apis.log_patterns lp
INNER JOIN otel_logs_and_spans ols
ON lp.log_pattern = ols.log_pattern AND lp.project_id::text = ols.project_id
WHERE lp.project_id = ?
AND lp.state != 'ignored'
AND ${whereCondition}
GROUP BY lp.log_pattern
ORDER BY p_count DESC
OFFSET ? LIMIT 15
|]
Contributor

Same question as above: how is log_patterns joining on otel_logs_and_spans?

Contributor Author

otel_logs_and_spans has a log_pattern column

Contributor

Is log_pattern going to be in timefusion or Postgres? If it's in timefusion, you can't do joins, and you can't query it without a time range being part of the query.


-- | Get pattern stats from otel_logs_and_spans
-- Returns median and MAD (Median Absolute Deviation) for robust baseline calculation
getPatternStats :: DB es => Projects.ProjectId -> Text -> Int -> Eff es (Maybe PatternStats)
Contributor

Why are you not using our widgets, or at least KQL, for stats and numbers? Isn't this for display?

Contributor Author

This is for detecting spikes


-- | Calculate baselines for log patterns
-- Uses hourly counts from otel_logs_and_spans over the last 7 days
calculateLogPatternBaselines :: Projects.ProjectId -> ATBackgroundCtx ()
Contributor

I don't want to support magic alerts. If we're detecting spikes, we should implement an alert system that our users can enable on any metric as well, so it's the same code and logic for all cases. Not magic logic we run in the background.


@claude

claude bot commented Jan 24, 2026

PR Review: Log Patterns Feature

Summary

This is a solid implementation of log patterns functionality with anomaly detection. The code follows Haskell best practices and makes good use of the available GHC extensions. However, there are opportunities for improvement in performance, security, and code succinctness.


🔴 Critical Issues

1. Unbounded Query Vulnerability (src/Models/Apis/LogPatterns.hs:96-110)

getLogPatterns :: DB es => Projects.ProjectId -> Int -> Int -> Eff es [LogPattern]
getLogPatterns pid limit offset = PG.query q (pid, limit, offset)

Issue: No validation on limit parameter - malicious caller could request millions of rows.
Fix: Add bounds: let safeLimit = min limit 1000
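A minimal, base-only sketch of the suggested bounds check (the helper name is illustrative, not from the PR):

```haskell
-- Clamp a caller-supplied page size into a sane range before it
-- reaches the SQL LIMIT clause.
clampLimit :: Int -> Int
clampLimit = max 1 . min 1000

-- Usage sketch (hypothetical call site):
--   getLogPatterns pid (clampLimit limit) offset
```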

2. Race Condition in Baseline Calculation (src/BackgroundJobs.hs:1674-1695)

Sequential pattern processing without locking could cause race conditions when multiple jobs run concurrently.
Fix: Use database-level locking or optimistic concurrency control (add version column).

3. Missing Permission Check (src/Models/Apis/LogPatterns.hs:138-148)

acknowledgeLogPatterns :: DB es => Users.UserId -> V.Vector Text -> Eff es Int64

Issue: No validation that user has permission to acknowledge patterns for the project.
Fix: Join with project membership table in the query.

4. Potential Orphaned Jobs (src/BackgroundJobs.hs:1748-1752)

If insertIssue fails, the enhancement job is still created.
Fix: Wrap in transaction or use bracket pattern.


⚠️ Performance Issues

1. N+1 Query Pattern (src/BackgroundJobs.hs:1678-1695)

patterns <- LogPatterns.getLogPatterns pid 1000 0
forM_ patterns \lp -> do
  statsM <- LogPatterns.getPatternStats pid lp.logPattern 168

Impact: 1 + N queries for baseline calculation.
Fix: Create batch query using SQL window functions to compute all pattern statistics in one go.
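One possible shape for that batch query (table and column names taken from the snippets elsewhere in this PR; the aggregates shown are a sketch and should match whatever statistics the per-pattern query actually computes):

```sql
-- Hourly counts for every pattern in one pass, then per-pattern
-- aggregates, replacing one query per pattern with a single round trip.
SELECT log_pattern,
       AVG(hourly_count)    AS hourly_mean,
       STDDEV(hourly_count) AS hourly_stddev
FROM (
  SELECT log_pattern,
         date_trunc('hour', timestamp) AS bucket,
         COUNT(*) AS hourly_count
  FROM otel_logs_and_spans
  WHERE project_id = ?
    AND timestamp >= NOW() - INTERVAL '7 days'
    AND log_pattern IS NOT NULL
  GROUP BY log_pattern, bucket
) hourly
GROUP BY log_pattern;
```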

2. Missing Composite Index (static/migrations/0026_log_patterns.sql)

Spike detection query filters by state and baseline_state, but no composite index exists.
Fix:

CREATE INDEX idx_log_patterns_spike_detection 
  ON apis.log_patterns(project_id, state, baseline_state)
  WHERE state != 'ignored' AND baseline_state = 'established';

3. Memory Leak Risk (src/Pkg/Drain.hs:230-236)

getAllLogGroups tree =
  let levelOnes = children tree
      levelTwos = V.concatMap nodes levelOnes
      allLogGroups = V.concatMap logGroups levelTwos
   in V.map (\grp -> (grp.exampleLog, templateStr grp, logIds grp)) allLogGroups

Issue: Multiple intermediate vectors without forced evaluation.
Fix: Add bang patterns or use strict vector operations.


📝 Code Succinctness Improvements

1. Use MultiWayIf (src/BackgroundJobs.hs:1691-1693)

-- Current:
newState = case lp.baselineState of
  BSEstablished -> BSEstablished 
  BSLearning -> if newMean > 100 || patternAgeDays > 1
                then BSEstablished 
                else BSLearning

-- Better:
newState = if
  | lp.baselineState == BSEstablished -> BSEstablished
  | newMean > 100 || patternAgeDays > 1 -> BSEstablished
  | otherwise -> BSLearning

2. Use ViewPatterns (src/Pkg/Drain.hs:156-168)

-- Current: 13 lines
leastRecentlyUsedIndex logGroups =
  V.ifoldl'
    ( \acc i g ->
        case acc of
          Nothing -> Just (i, lastSeen g)
          Just (j, t) ->
            if lastSeen g < t
              then Just (i, lastSeen g)
              else Just (j, t)
    )
    Nothing
    logGroups
    & maybe 0 fst

-- More concise: 6 lines
leastRecentlyUsedIndex = maybe 0 fst . V.ifoldl' step Nothing
  where
    step acc i g = case acc of
      Nothing -> Just (i, lastSeen g)
      Just (j, t) | lastSeen g < t -> Just (i, lastSeen g)
                  | otherwise -> Just (j, t)

3. Replace Case with fromMaybe (src/Models/Apis/LogPatterns.hs:248-252)

-- Current:
results <- PG.query q (pid, pattern')
case results of
  [Only count] -> return count
  _ -> return 0

-- Concise:
fromMaybe 0 . fmap fromOnly . listToMaybe <$> PG.query q (pid, pattern')

4. Define Data Types Instead of Large Tuples (src/BackgroundJobs.hs:1720)

-- Current:
forM_ spikeData \(patternId, patternHash, currentRate, mean, stddev) -> do

-- Better:
data SpikeInfo = SpikeInfo
  { patternId :: LogPatternId
  , patternHash :: Text
  , currentRate :: Double
  , baselineMean :: Double
  , baselineStddev :: Double
  } deriving stock (Show, Generic)

forM_ (toSpikeInfo <$> spikeData) \info -> ...

Better type safety and more readable.

5. Simplify Vector Conversions (src/BackgroundJobs.hs:512-515)

-- Current:
existingPatterns <- LogPatterns.getLogPatternTexts pid
let known = V.fromList $ map (\pat -> ("",False,pat,Nothing,Nothing,Nothing)) existingPatterns
    combined = known <> ((\(logId, content, trId, serviceName, level) -> (logId, True, content, trId, serviceName, level)) <$> events)

-- Better:
let known = (\pat -> ("", False, pat, Nothing, Nothing, Nothing)) <$> V.fromList existingPatterns
    combined = known <> (second5 (const True) <$> events)  -- Define second5 helper

🔒 Security Concerns

1. Missing Input Validation (src/Models/Apis/LogPatterns.hs:151-165)

Pattern content has no length limits - large strings could cause DoS.
Fix: guard (T.length pattern < 10000) >> upsertLogPattern ...

2. Unvalidated Bulk Actions (src/Pages/Anomalies.hs:141-160)

Bulk actions endpoint trusts client-supplied IDs without verifying project ownership.
Fix: Add WHERE project_id = ? to all anomaly ID lookups.


📦 Database Schema Issues

static/migrations/0026_log_patterns.sql

  1. Line 15 & 24: Use enum types instead of TEXT for state and baseline_state:

    state apis.log_pattern_state NOT NULL DEFAULT 'new'
  2. Missing index: acknowledged_by column lacks index, causing slow joins:

    CREATE INDEX idx_log_patterns_acknowledged_by ON apis.log_patterns(acknowledged_by);
  3. Race condition in trigger (line 58): log_pattern_created_notify could create duplicate jobs.
    Fix: Use INSERT ... ON CONFLICT DO NOTHING in the background job queue.


✅ Positive Highlights

  • Excellent use of deriving strategies and newtypes
  • Good separation of concerns (Models, BackgroundJobs, Pages)
  • Comprehensive database migration with indexes
  • Well-structured issue types with proper sum types
  • Good use of V.Vector for performance
  • LLM enhancement integration is well-architected
  • Proper use of Maybe for optional fields

Test Coverage

The test file test/unit/Pkg/DrainSpec.hs was reformatted (49 lines changed). Ensure all tests still pass and no logic was altered during formatting.


Recommended Priority

  1. High: Fix unbounded query (add limit validation)
  2. High: Add permission check to acknowledgeLogPatterns
  3. High: Add composite index for spike detection
  4. Medium: Batch N+1 queries in baseline calculation
  5. Medium: Add transaction wrapper for issue creation
  6. Low: Refactor large tuples to data types
  7. Low: Apply succinctness improvements

Overall, this is a well-structured feature that follows good Haskell practices. Addressing the critical issues will make it production-ready. Great work! 🚀

[text|
SELECT id, created_at, updated_at, project_id, issue_type::text, endpoint_hash, acknowledged_at, acknowledged_by, archived_at, title, service, critical,
CASE WHEN critical THEN 'critical' ELSE 'info' END, affected_requests, affected_clients, NULL::double precision,
CASE WHEN critical THEN 'critical' ELSE 'info' END, 0::int, 0::int, NULL::double precision,
Contributor

Why did you set these to 0?

Contributor Author

Those are not part of the new issues table.

Contributor

Why are we adding it to the query?

Comment on lines 149 to 162
Issues.LogPattern ->
"Generate a concise title for this log pattern issue.\n"
<> "Title: "
<> issue.title
<> "\n"
<> "Service: "
<> fromMaybe "unknown-service" issue.service
Issues.LogPatternRateChange ->
"Generate a concise title for this log pattern rate change.\n"
<> "Title: "
<> issue.title
<> "\n"
<> "Service: "
<> fromMaybe "unknown-service" issue.service
Contributor

Use the text quasiquotes here.

@@ -0,0 +1,68 @@
BEGIN;

CREATE TABLE IF NOT EXISTS apis.log_patterns (
Contributor

We will support patterns on different fields, not just the default body/message field. This table doesn't seem aware of that expectation.

CREATE INDEX IF NOT EXISTS idx_log_patterns_last_seen ON apis.log_patterns(project_id, last_seen_at DESC);
CREATE INDEX IF NOT EXISTS idx_log_patterns_service ON apis.log_patterns(project_id, service_name);

CREATE OR REPLACE FUNCTION apis.new_log_pattern_proc() RETURNS trigger AS $$
Contributor

Again, the question remains: is this a timeseries table or a regular table? If it's a timeseries table (to live in timefusion), it won't be in the same database as the background jobs, and so would be unable to queue jobs from within the DB.

Contributor Author

It's a regular table.

@claude

claude bot commented Jan 27, 2026

Code Review: Log Patterns Implementation

Overall Assessment: This is solid, production-quality Haskell code with excellent architecture. The implementation demonstrates strong type safety, proper database design, and statistical rigor. However, there are some critical issues and opportunities for improvement.


Critical Issues 🔴

1. Division by Zero Risk

Location: src/BackgroundJobs.hs:1712

let zScore = (currentRate - mean) / stddev

Missing check for mean > 0. If mean = 0 but currentRate > 0, this should likely be flagged as a spike for a new pattern gaining activity.

Fix:

| stddev > 0 && mean > 0 ->
    let currentRate = fromIntegral lpRate.currentHourCount
        zScore = (currentRate - mean) / stddev
        isSpike = abs zScore > 3.0 && currentRate > mean
| mean == 0 && currentHourCount > 10 -> 
    Just (lpRate.patternId, ...) -- New pattern with activity

2. Trigger Performance Issue

Location: static/migrations/0026_log_patterns.sql:40-58

The trigger fires on ALL inserts, including ON CONFLICT DO UPDATE cases. For 1000 pattern upserts per batch, this creates 1000 duplicate background jobs.

Fix:

CREATE TRIGGER log_pattern_created_notify 
AFTER INSERT ON apis.log_patterns 
FOR EACH ROW 
WHEN (NEW.xmax = 0)  -- Only real inserts, not upserts
EXECUTE PROCEDURE apis.new_log_pattern_proc();

3. Memory Risk with Large Pattern Sets

Location: src/BackgroundJobs.hs:512, 1678

Loading all patterns into memory without pagination. For projects with 10K+ patterns, this could cause OOM.

Fix: Add pagination or limit to getLogPatterns:

-- Process in batches of 1000
let batchSize = 1000
patterns <- getLogPatterns pid batchSize 0

High Priority Issues ⚠️

4. Missing Database Constraints

Location: static/migrations/0026_log_patterns.sql:15, 24

State columns lack CHECK constraints for data integrity.

Fix:

state TEXT NOT NULL DEFAULT 'new' CHECK (state IN ('new', 'acknowledged', 'ignored')),
baseline_state TEXT NOT NULL DEFAULT 'learning' CHECK (baseline_state IN ('learning', 'established')),

5. Hardcoded Magic Numbers

Throughout src/BackgroundJobs.hs, thresholds are hardcoded:

  • Line 1713: 3.0 (3-sigma threshold)
  • Line 1681: 168 (7-day baseline window)
  • Line 1743: 10000 (min events for new pattern issue)
  • Line 1692: 100 (min hourly events for baseline)

Fix: Extract to named constants with documentation:

-- Constants for baseline establishment
minEventsForEstablishment :: Double
minEventsForEstablishment = 100  -- Minimum hourly events to establish baseline

minAgeForEstablishment :: Double  
minAgeForEstablishment = 1  -- Minimum age in days

sigmaThreshold :: Double
sigmaThreshold = 3.0  -- Standard deviations for spike detection

6. Missing Index

Location: static/migrations/0026_log_patterns.sql

No index on pattern_hash alone for cross-project queries.

Fix:

CREATE INDEX idx_log_patterns_hash ON apis.log_patterns(pattern_hash);

Code Quality Improvements

7. Redundant SQL in LogPatterns.hs

Location: src/Models/Apis/LogPatterns.hs:95-110, 119-134

SELECT clauses repeat all 20 fields multiple times.

Improvement: Leverage the Entity deriving more:

getLogPatterns pid limit offset = 
  PG.query [sql| SELECT * FROM apis.log_patterns 
                 WHERE project_id = ? 
                 ORDER BY last_seen_at DESC 
                 LIMIT ? OFFSET ? |] (pid, limit, offset)

8. Use Enum for Change Direction

Location: src/Models/Apis/Issues.hs:669

direction :: Text should be type-safe:

data ChangeDirection = Spike | Drop deriving (Show, Eq, Generic, ToJSON, FromJSON)

9. MultiWayIf for Cleaner Conditionals

Location: src/BackgroundJobs.hs:1690-1693

Already using many extensions, but MultiWayIf would help:

let newState = if
  | patternAgeDays < 1 -> BSLearning
  | newMean < 100 && lp.baselineState == BSLearning -> BSLearning  
  | otherwise -> BSEstablished

Performance Considerations

Strengths ✅

  • Excellent use of median + MAD instead of mean + stddev for outlier robustness
  • Batch queries to prevent N+1 (lines 315-318)
  • HashMap for O(1) pattern lookup (line 1719)
  • Proper use of coerce for zero-cost conversions
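As a minimal illustration of the median + MAD approach praised above (a pure sketch, not the project's implementation; O(n log n) per call, which is fine for baseline-sized samples):

```haskell
import Data.List (sort)

-- Median of a sample (0 for an empty list, for simplicity).
median :: [Double] -> Double
median [] = 0
median xs
  | odd n     = s !! mid
  | otherwise = (s !! (mid - 1) + s !! mid) / 2
  where
    s   = sort xs
    n   = length xs
    mid = n `div` 2

-- MAD rescaled by 1.4826 so it estimates the standard deviation
-- under a normality assumption, while staying robust to outliers.
madSigma :: [Double] -> Double
madSigma xs = 1.4826 * median [abs (x - m) | x <- xs]
  where m = median xs
```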

Opportunities

  • Pattern Loading: Consider streaming for very large pattern sets
  • Drain Tree: Not persisted between runs (CPU vs memory trade-off is reasonable)
  • Query Optimization: Lines 285-306 use LEFT JOIN with subquery - consider materialized view for 1000s of patterns

Security ✅

  • ✅ All queries use parameterized statements (no SQL injection)
  • ✅ Proper access control via project_id filtering
  • ⚠️ DoS potential: High log volume → thousands of patterns → thousands of baseline calculations (consider rate limiting)

Missing Functionality

  1. No drop detection - only spike detection (line 1713: currentRate > mean prevents drop detection)
  2. No pattern cleanup - consider retention policy for patterns not seen in 30+ days
  3. No error handling/retry - baseline/spike calculation failures are silent
  4. No tests - missing unit tests for baseline edge cases, integration tests for spike detection

What I Really Like 🌟

  1. Statistical Rigor: Using median + MAD with proper 1.4826 scaling factor shows deep understanding
  2. Type Safety: Excellent use of newtypes (LogPatternId), derivingVia, and type-safe database operations
  3. Modern Haskell: Great use of RecordDotSyntax, BlockArguments, DerivingVia
  5. State Machine: Proper baseline state management (BSLearning → BSEstablished)
  5. Database Design: Comprehensive indexes and foreign key constraints
  6. Code Organization: Clean separation between data models, background jobs, and UI

Recommendations

Must Fix Before Merge

  1. ✅ Fix division-by-zero in spike detection (BackgroundJobs.hs:1712)
  2. ✅ Fix trigger to avoid duplicate jobs (migration)
  3. ✅ Add CHECK constraints for state columns

Should Fix Soon

  1. Add pagination for pattern loading (prevent OOM)
  2. Extract magic numbers to named constants
  3. Add basic error handling and logging
  4. Add index on pattern_hash

Nice to Have

  1. Implement drop detection
  2. Add unit tests for statistical calculations
  3. Add pattern cleanup/retention policy

Verdict: Strong implementation with minor critical fixes needed. The architecture is sound and the code demonstrates excellent Haskell practices. Once the critical issues are addressed, this will be production-ready.

@claude

claude bot commented Jan 27, 2026

PR #300 Review: Log Patterns Functionality

Reviewed 12 files with +1102/-374 lines. Overall: Good implementation with room for improvement.

Priority Issues

  1. Security: Missing Input Validation (LogPatterns.hs:151) - No length validation on patterns/hashes
  2. Security: No Rate Limiting - Pattern creation lacks rate limiting
  3. Performance: Memory Issue (BackgroundJobs.hs:512) - getLogPatternTexts loads ALL patterns into memory
  4. Bug: Race Condition (BackgroundJobs.hs:1743-1754) - TOCTOU between event count check and pattern retrieval
  5. Bug: Fragile LLM Parsing (Enhancement.hs:86-88) - Silent failures on unexpected format

Performance Optimizations

  • Vector Inefficiency (BackgroundJobs.hs:536): O(n) V.find - use HashMap for O(1)
  • DB Query Performance (LogPatterns.hs:218-243): Add index on (project_id, log_pattern, timestamp)
  • Magic Numbers (BackgroundJobs.hs:1692): Extract to constants

Code Succinctness (GHC Extensions)

  • Use guards instead of case on bools (LogPatterns.hs:138-141)
  • Use ViewPatterns for cleaner pattern matching (Drain.hs:100-107)
  • Use RecordWildCards to simplify long functions (Issues.hs:669-694)
  • Break long lines (Issues.hs:694 is 140+ chars)

Test Coverage

Good: Comprehensive DRAIN algorithm tests
Missing: LogPatterns DB ops, baseline calc, spike detection, integration tests

Package Usage

Consider using statistics package for robust median/MAD/stddev calculations

Summary Score

  • Code Quality: 7/10
  • Security: 6/10 (needs input validation & rate limiting)
  • Performance: 7/10
  • Test Coverage: 5/10
  • Maintainability: 7/10

Recommendation: Request changes for security/performance issues, then approve.

@tonyalaribe
Contributor

Rename patterns function from 15mins to 5mins.

@claude

claude bot commented Jan 28, 2026

PR Review: Log Pattern Anomaly Detection

Great work on implementing log pattern anomaly detection! The implementation is thoughtful with good test coverage. However, I've identified several critical issues and optimization opportunities.


🚨 CRITICAL: Security Issue

SQL Injection Vulnerability in src/Models/Apis/RequestDumps.hs lines 1344-1356:

The target parameter is interpolated into SQL without proper validation. An attacker could manipulate targetM to inject SQL.

-- VULNERABLE CODE
target = fromMaybe "log_pattern" targetM
let q = [text|
  SELECT log_pattern, count(*) as p_count
  FROM otel_logs_and_spans
  WHERE ${whereCondition} AND log_pattern = ANY(?)
  ...
|]

Fix: Use a whitelist:

target = case targetM of
  Just "log_pattern" -> "log_pattern"
  Just "summary_pattern" -> "summary_pattern"
  _ -> "log_pattern"

⚡ Performance Issues

1. Expensive Baseline Calculations

Lines 1213-1248 use PERCENTILE_CONT twice per pattern, which is computationally expensive. For projects with hundreds of patterns, this will be slow.

Recommendations:

  • Create materialized view for hourly counts
  • Consider simpler approximations (e.g., 1.4826 * stddev_samp instead of MAD)
  • Batch process patterns instead of individual queries

2. Missing Indexes Causing Table Scans

Lines 89-100: Query filtering on (summary_pattern IS NULL OR log_pattern IS NULL) will cause full table scans.

Add partial indexes:

CREATE INDEX idx_otel_logs_missing_log_pattern 
  ON otel_logs_and_spans(project_id, timestamp) 
  WHERE log_pattern IS NULL;

3. Inefficient Pattern Lookup

Lines 113-119: getLogPatternTexts loads ALL patterns into memory for every batch. For projects with thousands of patterns, this is wasteful.

Recommendation: Filter at database level instead of in-memory.

4. Job Storm

Lines 76-81: Scheduling jobs for ALL projects at once creates a job storm.

Recommendation: Stagger job creation:

run_at = NOW() + random() * INTERVAL '1 hour'

🐛 Bugs

1. Pattern Metadata Extraction

Lines 139-146: If the first log ID doesn't match any event, metadata is lost.

-- BUG: Uses first ID which might not exist
let (serviceName, logLevel, logTraceId) = case ids V.!? 0 of
      Just logId | logId /= "" ->
        case V.find (\(i, _, _, _, _) -> i == logId) events of

Fix: Find any matching event:

let metadata = case V.find (\(i, _, _, _, _) -> V.elem i ids) events of

2. Inconsistent Z-Score Logic

Line 224: Using abs but then checking direction is inconsistent.

-- INCONSISTENT
let zScore = (currentRate - mean) / stddev
    isSpike = abs zScore > 3.0 && currentRate > mean

Fix: Just check positive spikes:

let isSpike = zScore > 3.0  -- Only detect upward spikes
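Putting this one-sided check together with the zero-stddev guard raised in the earlier review, a minimal sketch (function name illustrative):

```haskell
-- One-sided spike test: only upward deviations beyond the sigma
-- threshold count, and a degenerate baseline (stddev <= 0) never fires.
isSpike :: Double -> Double -> Double -> Bool
isSpike current mean stddev
  | stddev <= 0 = False
  | otherwise   = (current - mean) / stddev > 3.0
```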

3. Variable Name Collision

Line 228: Using pid' when pid is in scope is confusing.

let spikeIds = V.fromList $ map (\(pid', _, _, _, _) -> pid') spikeData

Fix: Use descriptive names:

let spikeIds = V.fromList $ map (\(patternId, _, _, _, _) -> patternId) spikeData

📝 Code Quality & Succinctness

1. Long Parameter Lists Need Record Types

Multiple functions have 7-9 parameters. Use records:

-- Before (9 parameters!)
updateTreeWithLog :: DrainTree -> Int -> Text -> V.Vector Text -> Text -> Bool -> Text -> Text -> UTCTime -> DrainTree

-- After
data LogContext = LogContext
  { tokenCount :: Int
  , firstToken :: Text
  , tokens :: V.Vector Text
  , logId :: Text
  , isSample :: Bool
  , content :: Text
  , field :: Text
  , timestamp :: UTCTime
  }

updateTreeWithLog :: DrainTree -> LogContext -> DrainTree

2. Use RecordWildCards & NamedFieldPuns

Lines 1756-1761 can be simplified:

-- Enable RecordWildCards
case V.findIndex (\DrainLevelOne{tokenCount} -> tokenCount == targetCount) levelOnes of
  Just index ->
    let existingLevel@DrainLevelOne{nodes} = levelOnes V.! index

3. Repetitive Update Pattern

Lines 1760-1803: updateOrCreateLevelOne, updateOrCreateLevelTwo, and updateOrCreateLogGroup all follow the same pattern. Abstract into a higher-order function:

updateOrCreateInVector :: 
  (a -> Bool) ->           -- Find predicate
  (a -> (a, Bool)) ->      -- Update function
  (V.Vector a -> a) ->     -- Create function
  V.Vector a -> (V.Vector a, Bool)
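A simplified sketch of such a helper (the signature is reduced from the one suggested above; assumes the vector package is available):

```haskell
import qualified Data.Vector as V

-- Find the first element matching the predicate and update it in place;
-- otherwise append a fresh element. Returns True when a new one was created.
updateOrCreate :: (a -> Bool) -> (a -> a) -> a -> V.Vector a -> (V.Vector a, Bool)
updateOrCreate match update fresh vec =
  case V.findIndex match vec of
    Just i  -> (vec V.// [(i, update (vec V.! i))], False)
    Nothing -> (V.snoc vec fresh, True)
```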

4. Use Type Classes for Issue Prompts

Enhancement.hs lines 650-950 have repetitive pattern matching. Use type classes:

class IssueDataToPrompt a where
  toTitlePrompt :: a -> Text
  toDescriptionPrompt :: a -> Text

instance IssueDataToPrompt APIChangeData where ...
instance IssueDataToPrompt RuntimeExceptionData where ...

5. Inefficient Text Processing

Line 1903-1906: Character-by-character processing is slow. Use T.uncons:

-- Instead of: if T.head t == '"'
case T.uncons t of
  Just ('"', rest) -> ...
  Just ('[', rest) -> ...

🔍 SQL Optimization

1. SELECT * is Inefficient

Line 1319: Avoid SELECT *:

-- Replace with explicit column list
PG.query [sql| SELECT id, project_id, log_pattern, ... FROM apis.log_patterns WHERE id = ANY(?) |]

2. Missing Index for Common Query

Lines 1104-1115: Add composite index:

CREATE INDEX idx_log_patterns_project_last_seen 
  ON apis.log_patterns(project_id, last_seen_at DESC);

3. Filter Recently Active Patterns Only

Lines 1285-1311: Add filter to reduce unnecessary joins:

WHERE lp.project_id = ?
  AND lp.state != 'ignored' 
  AND lp.baseline_state = 'established'
  AND lp.last_seen_at > NOW() - INTERVAL '1 day'  -- Only check recently active

4. Data Migration for Constraint Change

Migration lines 2020-2093: Constraint changed but might have duplicates. Add deduplication:

-- Before changing constraint
DELETE FROM apis.log_patterns a USING apis.log_patterns b
WHERE a.id > b.id 
  AND a.project_id = b.project_id 
  AND a.log_level = b.log_level 
  AND a.field_path = b.field_path 
  AND a.pattern_hash = b.pattern_hash;

🔒 Security Concerns

1. Sample Messages May Contain PII

Line 257: Sample log messages might contain sensitive data.

Recommendations:

  • Truncate to first 200 chars
  • Add config flag to disable sample collection
  • Scrub common PII patterns (emails, API keys)

2. Validate Background Job Payloads

Line 2069: Ensure background job processor validates pattern hash format to prevent injection.


✅ What's Good

  • Excellent test coverage in test/unit/Pkg/DrainSpec.hs
  • Good use of batch queries to avoid N+1 (line 1316)
  • Proper use of quasi-quoters for SQL
  • Thoughtful use of median/MAD for robustness against outliers
  • Well-structured hierarchical clustering in Drain algorithm

🎯 Priority Actions

Must Fix Before Merge:

  1. ✅ Fix SQL injection in RequestDumps.hs
  2. ✅ Add pattern metadata extraction bug fix
  3. ✅ Add partial indexes for pattern extraction

Should Fix:
4. Optimize baseline calculation (batch or simplify stats)
5. Stagger background job creation
6. Add composite indexes for common queries
7. Use record types for long parameter lists
8. Fix z-score logic inconsistency

Nice to Have:
9. Extract higher-order update function
10. Use RecordWildCards/ViewPatterns
11. Add type classes for issue prompts
12. Use explicit SELECT columns


Overall, this is solid work! The main blockers are the SQL injection vulnerability and performance concerns at scale. Once those are addressed, this will be a great addition.
