-
Notifications
You must be signed in to change notification settings - Fork 8
Log patterns: Log patterns related code #300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
|
|
||
|
|
||
| -- | Get a pattern by ID | ||
| getLogPatternById :: DB es => LogPatternId -> Eff es (Maybe LogPattern) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use the _selectWhere pattern, instead of enumerating the fields one by one
| Issues.LogPattern -> | ||
| "Describe this log pattern issue and its implications.\n" | ||
| <> "Title: " | ||
| <> issue.title | ||
| <> "\n" | ||
| <> "Service: " | ||
| <> fromMaybe "unknown-service" issue.service | ||
| Issues.LogPatternRateChange -> | ||
| "Describe this log pattern rate change and its implications.\n" | ||
| <> "Title: " | ||
| <> issue.title | ||
| <> "\n" | ||
| <> "Service: " | ||
| <> fromMaybe "unknown-service" issue.service | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the [text|] or any other quasiquote. so its easier to visually see the pattern of this message without the haskell semigroup noise
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
Add no-focus-ring class to exclude AI search input from global :focus-visible outline styling.
This comment was marked as outdated.
This comment was marked as outdated.
| SELECT | ||
| lp.id, | ||
| lp.project_id, | ||
| lp.log_pattern, | ||
| lp.pattern_hash, | ||
| lp.baseline_state, | ||
| lp.baseline_volume_hourly_mean, | ||
| lp.baseline_volume_hourly_stddev, | ||
| COALESCE(counts.current_count, 0)::INT AS current_hour_count | ||
| FROM apis.log_patterns lp | ||
| LEFT JOIN ( | ||
| SELECT log_pattern, COUNT(*) AS current_count | ||
| FROM otel_logs_and_spans | ||
| WHERE project_id = ? | ||
| AND timestamp >= date_trunc('hour', NOW()) | ||
| AND log_pattern IS NOT NULL | ||
| GROUP BY log_pattern | ||
| ) counts ON counts.log_pattern = lp.log_pattern | ||
| WHERE lp.project_id = ? | ||
| AND lp.state != 'ignored' AND lp.baseline_state = 'established' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is log_pattern supposed to join on otel_logs_and_spans when they're not in the same database?
Or is log_patterns supposed to be a timeseries table in timefusion as well? if thats the case then you cant make queries on timeseries tables that dont depend on timestamp range
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh and you never join on two time series tables. some databases might support the join operation, but the performance is always horrible in that case
| [text| | ||
| SELECT lp.log_pattern, count(*) as p_count | ||
| FROM apis.log_patterns lp | ||
| INNER JOIN otel_logs_and_spans ols | ||
| ON lp.log_pattern = ols.log_pattern AND lp.project_id::text = ols.project_id | ||
| WHERE lp.project_id = ? | ||
| AND lp.state != 'ignored' | ||
| AND ${whereCondition} | ||
| GROUP BY lp.log_pattern | ||
| ORDER BY p_count DESC | ||
| OFFSET ? LIMIT 15 | ||
| |] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as here. how is log_pattern joining on otel_logs_and_spans?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otel_logs_and_spans has a log_pattern column
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is log_pattern going to be in timefusion or postgres? if its in timefusion, you do joins. And you can't query it without time range being part of the query.
|
|
||
| -- | Get pattern stats from otel_logs_and_spans | ||
| -- Returns median and MAD (Median Absolute Deviation) for robust baseline calculation | ||
| getPatternStats :: DB es => Projects.ProjectId -> Text -> Int -> Eff es (Maybe PatternStats) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you not using our widgets or atleast KQL for stats and numbers? isnt this for display?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for detecting spikes
|
|
||
| -- | Calculate baselines for log patterns | ||
| -- Uses hourly counts from otel_logs_and_spans over the last 7 days | ||
| calculateLogPatternBaselines :: Projects.ProjectId -> ATBackgroundCtx () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to support magic alerts. If we're detecting spikes, we should implement an alert system that our users can enable on any metric as well, so its the same code anad logic for all cases. Not magic logic we run in the background.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
PR Review: Log Patterns FeatureSummaryThis is a solid implementation of log patterns functionality with anomaly detection. The code follows Haskell best practices and makes good use of the available GHC extensions. However, there are opportunities for improvement in performance, security, and code succinctness. 🔴 Critical Issues1. Unbounded Query Vulnerability (
|
| [text| | ||
| SELECT id, created_at, updated_at, project_id, issue_type::text, endpoint_hash, acknowledged_at, acknowledged_by, archived_at, title, service, critical, | ||
| CASE WHEN critical THEN 'critical' ELSE 'info' END, affected_requests, affected_clients, NULL::double precision, | ||
| CASE WHEN critical THEN 'critical' ELSE 'info' END, 0::int, 0::int, NULL::double precision, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you set these to 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are not part of the new issues table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we adding it to the query?
| Issues.LogPattern -> | ||
| "Generate a concise title for this log pattern issue.\n" | ||
| <> "Title: " | ||
| <> issue.title | ||
| <> "\n" | ||
| <> "Service: " | ||
| <> fromMaybe "unknown-service" issue.service | ||
| Issues.LogPatternRateChange -> | ||
| "Generate a concise title for this log pattern rate change.\n" | ||
| <> "Title: " | ||
| <> issue.title | ||
| <> "\n" | ||
| <> "Service: " | ||
| <> fromMaybe "unknown-service" issue.service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the text quasiquotes here.
| @@ -0,0 +1,68 @@ | |||
| BEGIN; | |||
|
|
|||
| CREATE TABLE IF NOT EXISTS apis.log_patterns ( | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will support patterns on different fields not just the default body/message field. This table doesn't seem aware of this expectation.
| CREATE INDEX IF NOT EXISTS idx_log_patterns_last_seen ON apis.log_patterns(project_id, last_seen_at DESC); | ||
| CREATE INDEX IF NOT EXISTS idx_log_patterns_service ON apis.log_patterns(project_id, service_name); | ||
|
|
||
| CREATE OR REPLACE FUNCTION apis.new_log_pattern_proc() RETURNS trigger AS $$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, the question remains. Is this a timeseries table or regular table? If its a timeseries table (to be on timefusion), then it won't be in the same database as where the background jobs is, and hence would be unable to queue jobs from within the db.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a regular table.
Code Review: Log Patterns ImplementationOverall Assessment: This is solid, production-quality Haskell code with excellent architecture. The implementation demonstrates strong type safety, proper database design, and statistical rigor. However, there are some critical issues and opportunities for improvement. Critical Issues 🔴1. Division by Zero RiskLocation: let zScore = (currentRate - mean) / stddevMissing check for Fix: | stddev > 0 && mean > 0 ->
let currentRate = fromIntegral lpRate.currentHourCount
zScore = (currentRate - mean) / stddev
isSpike = abs zScore > 3.0 && currentRate > mean
| mean == 0 && currentHourCount > 10 ->
Just (lpRate.patternId, ...) -- New pattern with activity2. Trigger Performance IssueLocation: The trigger fires on ALL inserts, including Fix: CREATE TRIGGER log_pattern_created_notify
AFTER INSERT ON apis.log_patterns
FOR EACH ROW
WHEN (NEW.xmax = 0) -- Only real inserts, not upserts
EXECUTE PROCEDURE apis.new_log_pattern_proc();3. Memory Risk with Large Pattern SetsLocation: Loading all patterns into memory without pagination. For projects with 10K+ patterns, this could cause OOM. Fix: Add pagination or limit to -- Process in batches of 1000
let batchSize = 1000
patterns <- getLogPatterns pid batchSize 0High Priority Issues
|
PR #300 Review: Log Patterns FunctionalityReviewed 12 files with +1102/-374 lines. Overall: Good implementation with room for improvement. Priority Issues
Performance Optimizations
Code Succinctness (GHC Extensions)
Test CoverageGood: Comprehensive DRAIN algorithm tests Package UsageConsider using statistics package for robust median/MAD/stddev calculations Summary Score
Recommendation: Request changes for security/performance issues, then approve. |
|
Rename patterns function from 15mins to 5mins. |
PR Review: Log Pattern Anomaly DetectionGreat work on implementing log pattern anomaly detection! The implementation is thoughtful with good test coverage. However, I've identified several critical issues and optimization opportunities. 🚨 CRITICAL: Security IssueSQL Injection Vulnerability in The -- VULNERABLE CODE
target = fromMaybe "log_pattern" targetM
let q = [text|
SELECT log_pattern, count(*) as p_count
FROM otel_logs_and_spans
WHERE ${whereCondition} AND log_pattern = ANY(?)
...
|]Fix: Use a whitelist: target = case targetM of
Just "log_pattern" -> "log_pattern"
Just "summary_pattern" -> "summary_pattern"
_ -> "log_pattern"⚡ Performance Issues1. Expensive Baseline CalculationsLines 1213-1248 use Recommendations:
2. Missing Indexes Causing Table ScansLines 89-100: Query filtering on Add partial indexes: CREATE INDEX idx_otel_logs_missing_log_pattern
ON otel_logs_and_spans(project_id, timestamp)
WHERE log_pattern IS NULL;3. Inefficient Pattern LookupLines 113-119: Recommendation: Filter at database level instead of in-memory. 4. Job StormLines 76-81: Scheduling jobs for ALL projects at once creates a job storm. Recommendation: Stagger job creation: run_at = NOW() + random() * INTERVAL '1 hour'🐛 Bugs1. Pattern Metadata ExtractionLines 139-146: If the first log ID doesn't match any event, metadata is lost. -- BUG: Uses first ID which might not exist
let (serviceName, logLevel, logTraceId) = case ids V.!? 0 of
Just logId | logId /= "" ->
case V.find (\(i, _, _, _, _) -> i == logId) events ofFix: Find any matching event: let metadata = case V.find (\(i, _, _, _, _) -> V.elem i ids) events of2. Inconsistent Z-Score LogicLine 224: Using -- INCONSISTENT
let zScore = (currentRate - mean) / stddev
isSpike = abs zScore > 3.0 && currentRate > meanFix: Just check positive spikes: let isSpike = zScore > 3.0 -- Only detect upward spikes3. Variable Name CollisionLine 228: Using let spikeIds = V.fromList $ map (\(pid', _, _, _, _) -> pid') spikeDataFix: Use descriptive names: let spikeIds = V.fromList $ map (\(patternId, _, _, _, _) -> patternId) spikeData📝 Code Quality & Succinctness1. Long Parameter Lists Need Record TypesMultiple functions have 7-9 parameters. Use records: -- Before (9 parameters!)
updateTreeWithLog :: DrainTree -> Int -> Text -> V.Vector Text -> Text -> Bool -> Text -> Text -> UTCTime -> DrainTree
-- After
data LogContext = LogContext
{ tokenCount :: Int
, firstToken :: Text
, tokens :: V.Vector Text
, logId :: Text
, isSample :: Bool
, content :: Text
, field :: Text
, timestamp :: UTCTime
}
updateTreeWithLog :: DrainTree -> LogContext -> DrainTree2. Use RecordWildCards & NamedFieldPunsLines 1756-1761 can be simplified: -- Enable RecordWildCards
case V.findIndex (\DrainLevelOne{tokenCount} -> tokenCount == targetCount) levelOnes of
Just index ->
let existingLevel@DrainLevelOne{nodes} = levelOnes V.! index3. Repetitive Update PatternLines 1760-1803: updateOrCreateInVector ::
(a -> Bool) -> -- Find predicate
(a -> (a, Bool)) -> -- Update function
(V.Vector a -> a) -> -- Create function
V.Vector a -> (V.Vector a, Bool)4. Use Type Classes for Issue PromptsEnhancement.hs lines 650-950 have repetitive pattern matching. Use type classes: class IssueDataToPrompt a where
toTitlePrompt :: a -> Text
toDescriptionPrompt :: a -> Text
instance IssueDataToPrompt APIChangeData where ...
instance IssueDataToPrompt RuntimeExceptionData where ...5. Inefficient Text ProcessingLine 1903-1906: Character-by-character processing is slow. Use -- Instead of: if T.head t == '"'
case T.uncons t of
Just ('"', rest) -> ...
Just ('[', rest) -> ...🔍 SQL Optimization1. SELECT * is InefficientLine 1319: Avoid -- Replace with explicit column list
PG.query [sql| SELECT id, project_id, log_pattern, ... FROM apis.log_patterns WHERE id = ANY(?) |]2. Missing Index for Common QueryLines 1104-1115: Add composite index: CREATE INDEX idx_log_patterns_project_last_seen
ON apis.log_patterns(project_id, last_seen_at DESC);3. Filter Recently Active Patterns OnlyLines 1285-1311: Add filter to reduce unnecessary joins: WHERE lp.project_id = ?
AND lp.state != 'ignored'
AND lp.baseline_state = 'established'
AND lp.last_seen_at > NOW() - INTERVAL '1 day' -- Only check recently active4. Data Migration for Constraint ChangeMigration lines 2020-2093: Constraint changed but might have duplicates. Add deduplication: -- Before changing constraint
DELETE FROM apis.log_patterns a USING apis.log_patterns b
WHERE a.id > b.id
AND a.project_id = b.project_id
AND a.log_level = b.log_level
AND a.field_path = b.field_path
AND a.pattern_hash = b.pattern_hash;🔒 Security Concerns1. Sample Messages May Contain PIILine 257: Sample log messages might contain sensitive data. Recommendations:
2. Validate Background Job PayloadsLine 2069: Ensure background job processor validates pattern hash format to prevent injection. ✅ What's Good
🎯 Priority ActionsMust Fix Before Merge:
Should Fix: Nice to Have: Overall, this is solid work! The main blockers are the SQL injection vulnerability and performance concerns at scale. Once those are addressed, this will be a great addition. |
Saving and presenting log patterns in ui and log patterns anomaly detection
Closes #
How to test