
Conversation

@dawkaka
Contributor

@dawkaka dawkaka commented Jan 22, 2026

Saving and presenting log patterns in the UI, plus log pattern anomaly detection:

  • Unusual increases in a log pattern's volume within a time window
  • Newly detected log patterns

Closes #

How to test

@claude

This comment was marked as outdated.




-- | Get a pattern by ID
getLogPatternById :: DB es => LogPatternId -> Eff es (Maybe LogPattern)
Contributor

Use the _selectWhere pattern instead of enumerating the fields one by one.

Comment on lines 250 to 264
Issues.LogPattern ->
"Describe this log pattern issue and its implications.\n"
<> "Title: "
<> issue.title
<> "\n"
<> "Service: "
<> fromMaybe "unknown-service" issue.service
Issues.LogPatternRateChange ->
"Describe this log pattern rate change and its implications.\n"
<> "Title: "
<> issue.title
<> "\n"
<> "Service: "
<> fromMaybe "unknown-service" issue.service

Contributor

Please use the [text|] quasiquote (or any other), so it's easier to visually see the shape of this message without the Haskell semigroup noise.
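For illustration, a sketch of what one branch might look like with a quasiquoter, assuming NeatInterpolation's [text|] is the one in use here (the helper name describeLogPattern is hypothetical, not from the PR):

```haskell
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes #-}

import Data.Text (Text)
import qualified Data.Text as T
import NeatInterpolation (text)

-- Hypothetical helper: same prompt as the (<>) chain above, but the
-- message shape is visible at a glance.
describeLogPattern :: Text -> Text -> Text
describeLogPattern title service =
  [text|
    Describe this log pattern issue and its implications.
    Title: $title
    Service: $service
  |]
```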


Add no-focus-ring class to exclude AI search input from global
:focus-visible outline styling.

Comment on lines +288 to +307
SELECT
lp.id,
lp.project_id,
lp.log_pattern,
lp.pattern_hash,
lp.baseline_state,
lp.baseline_volume_hourly_mean,
lp.baseline_volume_hourly_stddev,
COALESCE(counts.current_count, 0)::INT AS current_hour_count
FROM apis.log_patterns lp
LEFT JOIN (
SELECT log_pattern, COUNT(*) AS current_count
FROM otel_logs_and_spans
WHERE project_id = ?
AND timestamp >= date_trunc('hour', NOW())
AND log_pattern IS NOT NULL
GROUP BY log_pattern
) counts ON counts.log_pattern = lp.log_pattern
WHERE lp.project_id = ?
AND lp.state != 'ignored' AND lp.baseline_state = 'established'
Contributor

How is log_pattern supposed to join on otel_logs_and_spans when they're not in the same database?

Or is log_patterns supposed to be a timeseries table in timefusion as well? If that's the case, you can't run queries against timeseries tables that don't depend on a timestamp range.

Contributor

Oh, and you never join two timeseries tables. Some databases might support the join operation, but the performance is always horrible in that case.

Comment on lines 573 to 584
[text|
SELECT lp.log_pattern, count(*) as p_count
FROM apis.log_patterns lp
INNER JOIN otel_logs_and_spans ols
ON lp.log_pattern = ols.log_pattern AND lp.project_id::text = ols.project_id
WHERE lp.project_id = ?
AND lp.state != 'ignored'
AND ${whereCondition}
GROUP BY lp.log_pattern
ORDER BY p_count DESC
OFFSET ? LIMIT 15
|]
Contributor

Same question as above: how is log_patterns joining on otel_logs_and_spans?

Contributor Author

otel_logs_and_spans has a log_pattern column

Contributor

Is log_pattern going to be in timefusion or Postgres? If it's in timefusion, you can't do joins, and you can't query it without a time range being part of the query.


-- | Get pattern stats from otel_logs_and_spans
-- Returns median and MAD (Median Absolute Deviation) for robust baseline calculation
getPatternStats :: DB es => Projects.ProjectId -> Text -> Int -> Eff es (Maybe PatternStats)
Contributor

Why are you not using our widgets, or at least KQL, for stats and numbers? Isn't this for display?

Contributor Author

This is for detecting spikes


-- | Calculate baselines for log patterns
-- Uses hourly counts from otel_logs_and_spans over the last 7 days
calculateLogPatternBaselines :: Projects.ProjectId -> ATBackgroundCtx ()
Contributor

I don't want to support magic alerts. If we're detecting spikes, we should implement an alert system that our users can enable on any metric as well, so it's the same code and logic for all cases. Not magic logic we run in the background.


@claude

claude bot commented Jan 24, 2026

PR Review: Log Patterns Feature

Summary

This is a solid implementation of log patterns functionality with anomaly detection. The code follows Haskell best practices and makes good use of the available GHC extensions. However, there are opportunities for improvement in performance, security, and code succinctness.


🔴 Critical Issues

1. Unbounded Query Vulnerability (src/Models/Apis/LogPatterns.hs:96-110)

getLogPatterns :: DB es => Projects.ProjectId -> Int -> Int -> Eff es [LogPattern]
getLogPatterns pid limit offset = PG.query q (pid, limit, offset)

Issue: No validation on limit parameter - malicious caller could request millions of rows.
Fix: Add bounds: let safeLimit = min limit 1000
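A minimal, base-only sketch of the suggested bounds check (the helper name is illustrative, not from the PR):

```haskell
-- Clamp a caller-supplied page size into a sane range before it
-- reaches the SQL LIMIT clause.
clampLimit :: Int -> Int
clampLimit = max 1 . min 1000

-- Usage sketch (hypothetical call site):
--   getLogPatterns pid (clampLimit limit) offset
```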

2. Race Condition in Baseline Calculation (src/BackgroundJobs.hs:1674-1695)

Sequential pattern processing without locking could cause race conditions when multiple jobs run concurrently.
Fix: Use database-level locking or optimistic concurrency control (add version column).

3. Missing Permission Check (src/Models/Apis/LogPatterns.hs:138-148)

acknowledgeLogPatterns :: DB es => Users.UserId -> V.Vector Text -> Eff es Int64

Issue: No validation that user has permission to acknowledge patterns for the project.
Fix: Join with project membership table in the query.

4. Potential Orphaned Jobs (src/BackgroundJobs.hs:1748-1752)

If insertIssue fails, the enhancement job is still created.
Fix: Wrap in transaction or use bracket pattern.


⚠️ Performance Issues

1. N+1 Query Pattern (src/BackgroundJobs.hs:1678-1695)

patterns <- LogPatterns.getLogPatterns pid 1000 0
forM_ patterns \lp -> do
  statsM <- LogPatterns.getPatternStats pid lp.logPattern 168

Impact: 1 + N queries for baseline calculation.
Fix: Create batch query using SQL window functions to compute all pattern statistics in one go.
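One possible shape for that batch query (table and column names taken from the snippets elsewhere in this PR; the aggregates shown are a sketch and should match whatever statistics the per-pattern query actually computes):

```sql
-- Hourly counts for every pattern in one pass, then per-pattern
-- aggregates, replacing one query per pattern with a single round trip.
SELECT log_pattern,
       AVG(hourly_count)    AS hourly_mean,
       STDDEV(hourly_count) AS hourly_stddev
FROM (
  SELECT log_pattern,
         date_trunc('hour', timestamp) AS bucket,
         COUNT(*) AS hourly_count
  FROM otel_logs_and_spans
  WHERE project_id = ?
    AND timestamp >= NOW() - INTERVAL '7 days'
    AND log_pattern IS NOT NULL
  GROUP BY log_pattern, bucket
) hourly
GROUP BY log_pattern;
```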

2. Missing Composite Index (static/migrations/0026_log_patterns.sql)

Spike detection query filters by state and baseline_state, but no composite index exists.
Fix:

CREATE INDEX idx_log_patterns_spike_detection 
  ON apis.log_patterns(project_id, state, baseline_state)
  WHERE state != 'ignored' AND baseline_state = 'established';

3. Memory Leak Risk (src/Pkg/Drain.hs:230-236)

getAllLogGroups tree =
  let levelOnes = children tree
      levelTwos = V.concatMap nodes levelOnes
      allLogGroups = V.concatMap logGroups levelTwos
   in V.map (\grp -> (grp.exampleLog, templateStr grp, logIds grp)) allLogGroups

Issue: Multiple intermediate vectors without forced evaluation.
Fix: Add bang patterns or use strict vector operations.


📝 Code Succinctness Improvements

1. Use MultiWayIf (src/BackgroundJobs.hs:1691-1693)

-- Current:
newState = case lp.baselineState of
  BSEstablished -> BSEstablished 
  BSLearning -> if newMean > 100 || patternAgeDays > 1
                then BSEstablished 
                else BSLearning

-- Better:
newState = if
  | lp.baselineState == BSEstablished -> BSEstablished
  | newMean > 100 || patternAgeDays > 1 -> BSEstablished
  | otherwise -> BSLearning

2. Use ViewPatterns (src/Pkg/Drain.hs:156-168)

-- Current: 13 lines
leastRecentlyUsedIndex logGroups =
  V.ifoldl'
    ( \acc i g ->
        case acc of
          Nothing -> Just (i, lastSeen g)
          Just (j, t) ->
            if lastSeen g < t
              then Just (i, lastSeen g)
              else Just (j, t)
    )
    Nothing
    logGroups
    & maybe 0 fst

-- More concise: 6 lines
leastRecentlyUsedIndex = maybe 0 fst . V.ifoldl' step Nothing
  where
    step acc i g = case acc of
      Nothing -> Just (i, lastSeen g)
      Just (j, t) | lastSeen g < t -> Just (i, lastSeen g)
                  | otherwise -> Just (j, t)

3. Replace Case with fromMaybe (src/Models/Apis/LogPatterns.hs:248-252)

-- Current:
results <- PG.query q (pid, pattern')
case results of
  [Only count] -> return count
  _ -> return 0

-- Concise:
fromMaybe 0 . fmap fromOnly . listToMaybe <$> PG.query q (pid, pattern')

4. Define Data Types Instead of Large Tuples (src/BackgroundJobs.hs:1720)

-- Current:
forM_ spikeData \(patternId, patternHash, currentRate, mean, stddev) -> do

-- Better:
data SpikeInfo = SpikeInfo
  { patternId :: LogPatternId
  , patternHash :: Text
  , currentRate :: Double
  , baselineMean :: Double
  , baselineStddev :: Double
  } deriving stock (Show, Generic)

forM_ (toSpikeInfo <$> spikeData) \info -> ...

Better type safety and more readable.

5. Simplify Vector Conversions (src/BackgroundJobs.hs:512-515)

-- Current:
existingPatterns <- LogPatterns.getLogPatternTexts pid
let known = V.fromList $ map (\pat -> ("",False,pat,Nothing,Nothing,Nothing)) existingPatterns
    combined = known <> ((\(logId, content, trId, serviceName, level) -> (logId, True, content, trId, serviceName, level)) <$> events)

-- Better:
let known = (\pat -> ("", False, pat, Nothing, Nothing, Nothing)) <$> V.fromList existingPatterns
    combined = known <> (second5 (const True) <$> events)  -- Define second5 helper

🔒 Security Concerns

1. Missing Input Validation (src/Models/Apis/LogPatterns.hs:151-165)

Pattern content has no length limits - large strings could cause DoS.
Fix: guard (T.length pattern < 10000) >> upsertLogPattern ...

2. Unvalidated Bulk Actions (src/Pages/Anomalies.hs:141-160)

Bulk actions endpoint trusts client-supplied IDs without verifying project ownership.
Fix: Add WHERE project_id = ? to all anomaly ID lookups.


📦 Database Schema Issues

static/migrations/0026_log_patterns.sql

  1. Line 15 & 24: Use enum types instead of TEXT for state and baseline_state:

    state apis.log_pattern_state NOT NULL DEFAULT 'new'
  2. Missing index: acknowledged_by column lacks index, causing slow joins:

    CREATE INDEX idx_log_patterns_acknowledged_by ON apis.log_patterns(acknowledged_by);
  3. Race condition in trigger (line 58): log_pattern_created_notify could create duplicate jobs.
    Fix: Use INSERT ... ON CONFLICT DO NOTHING in the background job queue.


✅ Positive Highlights

  • Excellent use of deriving strategies and newtypes
  • Good separation of concerns (Models, BackgroundJobs, Pages)
  • Comprehensive database migration with indexes
  • Well-structured issue types with proper sum types
  • Good use of V.Vector for performance
  • LLM enhancement integration is well-architected
  • Proper use of Maybe for optional fields

Test Coverage

The test file test/unit/Pkg/DrainSpec.hs was reformatted (49 lines changed). Ensure all tests still pass and no logic was altered during formatting.


Recommended Priority

  1. High: Fix unbounded query (add limit validation)
  2. High: Add permission check to acknowledgeLogPatterns
  3. High: Add composite index for spike detection
  4. Medium: Batch N+1 queries in baseline calculation
  5. Medium: Add transaction wrapper for issue creation
  6. Low: Refactor large tuples to data types
  7. Low: Apply succinctness improvements

Overall, this is a well-structured feature that follows good Haskell practices. Addressing the critical issues will make it production-ready. Great work! 🚀

[text|
SELECT id, created_at, updated_at, project_id, issue_type::text, endpoint_hash, acknowledged_at, acknowledged_by, archived_at, title, service, critical,
CASE WHEN critical THEN 'critical' ELSE 'info' END, affected_requests, affected_clients, NULL::double precision,
CASE WHEN critical THEN 'critical' ELSE 'info' END, 0::int, 0::int, NULL::double precision,
Contributor

Why did you set these to 0?

Contributor Author

Those are not part of the new issues table.

Contributor

Why are we adding it to the query?

Comment on lines 149 to 162
Issues.LogPattern ->
"Generate a concise title for this log pattern issue.\n"
<> "Title: "
<> issue.title
<> "\n"
<> "Service: "
<> fromMaybe "unknown-service" issue.service
Issues.LogPatternRateChange ->
"Generate a concise title for this log pattern rate change.\n"
<> "Title: "
<> issue.title
<> "\n"
<> "Service: "
<> fromMaybe "unknown-service" issue.service
Contributor

Use the text quasiquotes here.

@@ -0,0 +1,68 @@
BEGIN;

CREATE TABLE IF NOT EXISTS apis.log_patterns (
Contributor

We will support patterns on different fields, not just the default body/message field. This table doesn't seem aware of that expectation.

CREATE INDEX IF NOT EXISTS idx_log_patterns_last_seen ON apis.log_patterns(project_id, last_seen_at DESC);
CREATE INDEX IF NOT EXISTS idx_log_patterns_service ON apis.log_patterns(project_id, service_name);

CREATE OR REPLACE FUNCTION apis.new_log_pattern_proc() RETURNS trigger AS $$
Contributor

Again, the question remains: is this a timeseries table or a regular table? If it's a timeseries table (to live in timefusion), it won't be in the same database as the background jobs, and so would be unable to queue jobs from within the DB.

Contributor Author

It's a regular table.

@claude

claude bot commented Jan 27, 2026

Code Review: Log Patterns Implementation

Overall Assessment: This is solid, production-quality Haskell code with excellent architecture. The implementation demonstrates strong type safety, proper database design, and statistical rigor. However, there are some critical issues and opportunities for improvement.


Critical Issues 🔴

1. Division by Zero Risk

Location: src/BackgroundJobs.hs:1712

let zScore = (currentRate - mean) / stddev

Missing check for mean > 0. If mean = 0 but currentRate > 0, this should likely be flagged as a spike for a new pattern gaining activity.

Fix:

| stddev > 0 && mean > 0 ->
    let currentRate = fromIntegral lpRate.currentHourCount
        zScore = (currentRate - mean) / stddev
        isSpike = abs zScore > 3.0 && currentRate > mean
| mean == 0 && currentHourCount > 10 -> 
    Just (lpRate.patternId, ...) -- New pattern with activity

2. Trigger Performance Issue

Location: static/migrations/0026_log_patterns.sql:40-58

The trigger fires on ALL inserts, including ON CONFLICT DO UPDATE cases. For 1000 pattern upserts per batch, this creates 1000 duplicate background jobs.

Fix:

CREATE TRIGGER log_pattern_created_notify 
AFTER INSERT ON apis.log_patterns 
FOR EACH ROW 
WHEN (NEW.xmax = 0)  -- Only real inserts, not upserts
EXECUTE PROCEDURE apis.new_log_pattern_proc();

3. Memory Risk with Large Pattern Sets

Location: src/BackgroundJobs.hs:512, 1678

Loading all patterns into memory without pagination. For projects with 10K+ patterns, this could cause OOM.

Fix: Add pagination or limit to getLogPatterns:

-- Process in batches of 1000
let batchSize = 1000
patterns <- getLogPatterns pid batchSize 0

High Priority Issues ⚠️

4. Missing Database Constraints

Location: static/migrations/0026_log_patterns.sql:15, 24

State columns lack CHECK constraints for data integrity.

Fix:

state TEXT NOT NULL DEFAULT 'new' CHECK (state IN ('new', 'acknowledged', 'ignored')),
baseline_state TEXT NOT NULL DEFAULT 'learning' CHECK (baseline_state IN ('learning', 'established')),

5. Hardcoded Magic Numbers

Throughout src/BackgroundJobs.hs, thresholds are hardcoded:

  • Line 1713: 3.0 (3-sigma threshold)
  • Line 1681: 168 (7-day baseline window)
  • Line 1743: 10000 (min events for new pattern issue)
  • Line 1692: 100 (min hourly events for baseline)

Fix: Extract to named constants with documentation:

-- Constants for baseline establishment
minEventsForEstablishment :: Double
minEventsForEstablishment = 100  -- Minimum hourly events to establish baseline

minAgeForEstablishment :: Double  
minAgeForEstablishment = 1  -- Minimum age in days

sigmaThreshold :: Double
sigmaThreshold = 3.0  -- Standard deviations for spike detection

6. Missing Index

Location: static/migrations/0026_log_patterns.sql

No index on pattern_hash alone for cross-project queries.

Fix:

CREATE INDEX idx_log_patterns_hash ON apis.log_patterns(pattern_hash);

Code Quality Improvements

7. Redundant SQL in LogPatterns.hs

Location: src/Models/Apis/LogPatterns.hs:95-110, 119-134

SELECT clauses repeat all 20 fields multiple times.

Improvement: Leverage the Entity deriving more:

getLogPatterns pid limit offset = 
  PG.query [sql| SELECT * FROM apis.log_patterns 
                 WHERE project_id = ? 
                 ORDER BY last_seen_at DESC 
                 LIMIT ? OFFSET ? |] (pid, limit, offset)

8. Use Enum for Change Direction

Location: src/Models/Apis/Issues.hs:669

direction :: Text should be type-safe:

data ChangeDirection = Spike | Drop deriving (Show, Eq, Generic, ToJSON, FromJSON)

9. MultiWayIf for Cleaner Conditionals

Location: src/BackgroundJobs.hs:1690-1693

Already using many extensions, but MultiWayIf would help:

let newState = if
  | patternAgeDays < 1 -> BSLearning
  | newMean < 100 && lp.baselineState == BSLearning -> BSLearning  
  | otherwise -> BSEstablished

Performance Considerations

Strengths ✅

  • Excellent use of median + MAD instead of mean + stddev for outlier robustness
  • Batch queries to prevent N+1 (lines 315-318)
  • HashMap for O(1) pattern lookup (line 1719)
  • Proper use of coerce for zero-cost conversions
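As a minimal illustration of the median + MAD approach praised above (a pure sketch, not the project's implementation; O(n log n) per call, which is fine for baseline-sized samples):

```haskell
import Data.List (sort)

-- Median of a sample (0 for an empty list, for simplicity).
median :: [Double] -> Double
median [] = 0
median xs
  | odd n     = s !! mid
  | otherwise = (s !! (mid - 1) + s !! mid) / 2
  where
    s   = sort xs
    n   = length xs
    mid = n `div` 2

-- MAD rescaled by 1.4826 so it estimates the standard deviation
-- under a normality assumption, while staying robust to outliers.
madSigma :: [Double] -> Double
madSigma xs = 1.4826 * median [abs (x - m) | x <- xs]
  where m = median xs
```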

Opportunities

  • Pattern Loading: Consider streaming for very large pattern sets
  • Drain Tree: Not persisted between runs (CPU vs memory trade-off is reasonable)
  • Query Optimization: Lines 285-306 use LEFT JOIN with subquery - consider materialized view for 1000s of patterns

Security ✅

  • ✅ All queries use parameterized statements (no SQL injection)
  • ✅ Proper access control via project_id filtering
  • ⚠️ DoS potential: High log volume → thousands of patterns → thousands of baseline calculations (consider rate limiting)

Missing Functionality

  1. No drop detection - only spike detection (line 1713: currentRate > mean prevents drop detection)
  2. No pattern cleanup - consider retention policy for patterns not seen in 30+ days
  3. No error handling/retry - baseline/spike calculation failures are silent
  4. No tests - missing unit tests for baseline edge cases, integration tests for spike detection

What I Really Like 🌟

  1. Statistical Rigor: Using median + MAD with proper 1.4826 scaling factor shows deep understanding
  2. Type Safety: Excellent use of newtypes (LogPatternId), derivingVia, and type-safe database operations
  3. Modern Haskell: Great use of RecordDotSyntax, BlockArguments, DerivingVia
  5. State Machine: Proper baseline state management (BSLearning → BSEstablished)
  5. Database Design: Comprehensive indexes and foreign key constraints
  6. Code Organization: Clean separation between data models, background jobs, and UI

Recommendations

Must Fix Before Merge

  1. ✅ Fix division-by-zero in spike detection (BackgroundJobs.hs:1712)
  2. ✅ Fix trigger to avoid duplicate jobs (migration)
  3. ✅ Add CHECK constraints for state columns

Should Fix Soon

  1. Add pagination for pattern loading (prevent OOM)
  2. Extract magic numbers to named constants
  3. Add basic error handling and logging
  4. Add index on pattern_hash

Nice to Have

  1. Implement drop detection
  2. Add unit tests for statistical calculations
  3. Add pattern cleanup/retention policy

Verdict: Strong implementation with minor critical fixes needed. The architecture is sound and the code demonstrates excellent Haskell practices. Once the critical issues are addressed, this will be production-ready.

@claude

claude bot commented Jan 27, 2026

PR #300 Review: Log Patterns Functionality

Reviewed 12 files with +1102/-374 lines. Overall: Good implementation with room for improvement.

Priority Issues

  1. Security: Missing Input Validation (LogPatterns.hs:151) - No length validation on patterns/hashes
  2. Security: No Rate Limiting - Pattern creation lacks rate limiting
  3. Performance: Memory Issue (BackgroundJobs.hs:512) - getLogPatternTexts loads ALL patterns into memory
  4. Bug: Race Condition (BackgroundJobs.hs:1743-1754) - TOCTOU between event count check and pattern retrieval
  5. Bug: Fragile LLM Parsing (Enhancement.hs:86-88) - Silent failures on unexpected format

Performance Optimizations

  • Vector Inefficiency (BackgroundJobs.hs:536): O(n) V.find - use HashMap for O(1)
  • DB Query Performance (LogPatterns.hs:218-243): Add index on (project_id, log_pattern, timestamp)
  • Magic Numbers (BackgroundJobs.hs:1692): Extract to constants

Code Succinctness (GHC Extensions)

  • Use guards instead of case on bools (LogPatterns.hs:138-141)
  • Use ViewPatterns for cleaner pattern matching (Drain.hs:100-107)
  • Use RecordWildCards to simplify long functions (Issues.hs:669-694)
  • Break long lines (Issues.hs:694 is 140+ chars)

Test Coverage

Good: Comprehensive DRAIN algorithm tests
Missing: LogPatterns DB ops, baseline calc, spike detection, integration tests

Package Usage

Consider using statistics package for robust median/MAD/stddev calculations

Summary Score

  • Code Quality: 7/10
  • Security: 6/10 (needs input validation & rate limiting)
  • Performance: 7/10
  • Test Coverage: 5/10
  • Maintainability: 7/10

Recommendation: Request changes for security/performance issues, then approve.

@tonyalaribe
Contributor

Rename patterns function from 15mins to 5mins.

@claude

claude bot commented Jan 28, 2026

PR Review: Log Pattern Anomaly Detection

Great work on implementing log pattern anomaly detection! The implementation is thoughtful with good test coverage. However, I've identified several critical issues and optimization opportunities.


🚨 CRITICAL: Security Issue

SQL Injection Vulnerability in src/Models/Apis/RequestDumps.hs lines 1344-1356:

The target parameter is interpolated into SQL without proper validation. An attacker could manipulate targetM to inject SQL.

-- VULNERABLE CODE
target = fromMaybe "log_pattern" targetM
let q = [text|
  SELECT log_pattern, count(*) as p_count
  FROM otel_logs_and_spans
  WHERE ${whereCondition} AND log_pattern = ANY(?)
  ...
|]

Fix: Use a whitelist:

target = case targetM of
  Just "log_pattern" -> "log_pattern"
  Just "summary_pattern" -> "summary_pattern"
  _ -> "log_pattern"

⚡ Performance Issues

1. Expensive Baseline Calculations

Lines 1213-1248 use PERCENTILE_CONT twice per pattern, which is computationally expensive. For projects with hundreds of patterns, this will be slow.

Recommendations:

  • Create materialized view for hourly counts
  • Consider simpler approximations (e.g., 1.4826 * stddev_samp instead of MAD)
  • Batch process patterns instead of individual queries

2. Missing Indexes Causing Table Scans

Lines 89-100: Query filtering on (summary_pattern IS NULL OR log_pattern IS NULL) will cause full table scans.

Add partial indexes:

CREATE INDEX idx_otel_logs_missing_log_pattern 
  ON otel_logs_and_spans(project_id, timestamp) 
  WHERE log_pattern IS NULL;

3. Inefficient Pattern Lookup

Lines 113-119: getLogPatternTexts loads ALL patterns into memory for every batch. For projects with thousands of patterns, this is wasteful.

Recommendation: Filter at database level instead of in-memory.

4. Job Storm

Lines 76-81: Scheduling jobs for ALL projects at once creates a job storm.

Recommendation: Stagger job creation:

run_at = NOW() + random() * INTERVAL '1 hour'

🐛 Bugs

1. Pattern Metadata Extraction

Lines 139-146: If the first log ID doesn't match any event, metadata is lost.

-- BUG: Uses first ID which might not exist
let (serviceName, logLevel, logTraceId) = case ids V.!? 0 of
      Just logId | logId /= "" ->
        case V.find (\(i, _, _, _, _) -> i == logId) events of

Fix: Find any matching event:

let metadata = case V.find (\(i, _, _, _, _) -> V.elem i ids) events of

2. Inconsistent Z-Score Logic

Line 224: Using abs but then checking direction is inconsistent.

-- INCONSISTENT
let zScore = (currentRate - mean) / stddev
    isSpike = abs zScore > 3.0 && currentRate > mean

Fix: Just check positive spikes:

let isSpike = zScore > 3.0  -- Only detect upward spikes
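Putting this one-sided check together with the zero-stddev guard raised in the earlier review, a minimal sketch (function name illustrative):

```haskell
-- One-sided spike test: only upward deviations beyond the sigma
-- threshold count, and a degenerate baseline (stddev <= 0) never fires.
isSpike :: Double -> Double -> Double -> Bool
isSpike current mean stddev
  | stddev <= 0 = False
  | otherwise   = (current - mean) / stddev > 3.0
```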

3. Variable Name Collision

Line 228: Using pid' when pid is in scope is confusing.

let spikeIds = V.fromList $ map (\(pid', _, _, _, _) -> pid') spikeData

Fix: Use descriptive names:

let spikeIds = V.fromList $ map (\(patternId, _, _, _, _) -> patternId) spikeData

📝 Code Quality & Succinctness

1. Long Parameter Lists Need Record Types

Multiple functions have 7-9 parameters. Use records:

-- Before (9 parameters!)
updateTreeWithLog :: DrainTree -> Int -> Text -> V.Vector Text -> Text -> Bool -> Text -> Text -> UTCTime -> DrainTree

-- After
data LogContext = LogContext
  { tokenCount :: Int
  , firstToken :: Text
  , tokens :: V.Vector Text
  , logId :: Text
  , isSample :: Bool
  , content :: Text
  , field :: Text
  , timestamp :: UTCTime
  }

updateTreeWithLog :: DrainTree -> LogContext -> DrainTree

2. Use RecordWildCards & NamedFieldPuns

Lines 1756-1761 can be simplified:

-- Enable RecordWildCards
case V.findIndex (\DrainLevelOne{tokenCount} -> tokenCount == targetCount) levelOnes of
  Just index ->
    let existingLevel@DrainLevelOne{nodes} = levelOnes V.! index

3. Repetitive Update Pattern

Lines 1760-1803: updateOrCreateLevelOne, updateOrCreateLevelTwo, and updateOrCreateLogGroup all follow the same pattern. Abstract into a higher-order function:

updateOrCreateInVector :: 
  (a -> Bool) ->           -- Find predicate
  (a -> (a, Bool)) ->      -- Update function
  (V.Vector a -> a) ->     -- Create function
  V.Vector a -> (V.Vector a, Bool)
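A simplified sketch of such a helper (the signature is reduced from the one suggested above; assumes the vector package is available):

```haskell
import qualified Data.Vector as V

-- Find the first element matching the predicate and update it in place;
-- otherwise append a fresh element. Returns True when a new one was created.
updateOrCreate :: (a -> Bool) -> (a -> a) -> a -> V.Vector a -> (V.Vector a, Bool)
updateOrCreate match update fresh vec =
  case V.findIndex match vec of
    Just i  -> (vec V.// [(i, update (vec V.! i))], False)
    Nothing -> (V.snoc vec fresh, True)
```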

4. Use Type Classes for Issue Prompts

Enhancement.hs lines 650-950 have repetitive pattern matching. Use type classes:

class IssueDataToPrompt a where
  toTitlePrompt :: a -> Text
  toDescriptionPrompt :: a -> Text

instance IssueDataToPrompt APIChangeData where ...
instance IssueDataToPrompt RuntimeExceptionData where ...

5. Inefficient Text Processing

Line 1903-1906: Character-by-character processing is slow. Use T.uncons:

-- Instead of: if T.head t == '"'
case T.uncons t of
  Just ('"', rest) -> ...
  Just ('[', rest) -> ...

🔍 SQL Optimization

1. SELECT * is Inefficient

Line 1319: Avoid SELECT *:

-- Replace with explicit column list
PG.query [sql| SELECT id, project_id, log_pattern, ... FROM apis.log_patterns WHERE id = ANY(?) |]

2. Missing Index for Common Query

Lines 1104-1115: Add composite index:

CREATE INDEX idx_log_patterns_project_last_seen 
  ON apis.log_patterns(project_id, last_seen_at DESC);

3. Filter Recently Active Patterns Only

Lines 1285-1311: Add filter to reduce unnecessary joins:

WHERE lp.project_id = ?
  AND lp.state != 'ignored' 
  AND lp.baseline_state = 'established'
  AND lp.last_seen_at > NOW() - INTERVAL '1 day'  -- Only check recently active

4. Data Migration for Constraint Change

Migration lines 2020-2093: Constraint changed but might have duplicates. Add deduplication:

-- Before changing constraint
DELETE FROM apis.log_patterns a USING apis.log_patterns b
WHERE a.id > b.id 
  AND a.project_id = b.project_id 
  AND a.log_level = b.log_level 
  AND a.field_path = b.field_path 
  AND a.pattern_hash = b.pattern_hash;

🔒 Security Concerns

1. Sample Messages May Contain PII

Line 257: Sample log messages might contain sensitive data.

Recommendations:

  • Truncate to first 200 chars
  • Add config flag to disable sample collection
  • Scrub common PII patterns (emails, API keys)

2. Validate Background Job Payloads

Line 2069: Ensure background job processor validates pattern hash format to prevent injection.


✅ What's Good

  • Excellent test coverage in test/unit/Pkg/DrainSpec.hs
  • Good use of batch queries to avoid N+1 (line 1316)
  • Proper use of quasi-quoters for SQL
  • Thoughtful use of median/MAD for robustness against outliers
  • Well-structured hierarchical clustering in Drain algorithm

🎯 Priority Actions

Must Fix Before Merge:

  1. ✅ Fix SQL injection in RequestDumps.hs
  2. ✅ Add pattern metadata extraction bug fix
  3. ✅ Add partial indexes for pattern extraction

Should Fix:
4. Optimize baseline calculation (batch or simplify stats)
5. Stagger background job creation
6. Add composite indexes for common queries
7. Use record types for long parameter lists
8. Fix z-score logic inconsistency

Nice to Have:
9. Extract higher-order update function
10. Use RecordWildCards/ViewPatterns
11. Add type classes for issue prompts
12. Use explicit SELECT columns


Overall, this is solid work! The main blockers are the SQL injection vulnerability and performance concerns at scale. Once those are addressed, this will be a great addition.
