[extension/healthcheckv2] Add event aggregation logic #32695
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -48,7 +48,16 @@ type aggregationFunc func(*AggregateStatus) Event | |
// events. The priority argument determines the priority of PermanentError | ||
// events vs RecoverableError events. Lifecycle events will have the timestamp | ||
// of the most recent event and error events will have the timestamp of the | ||
// first occurrence. | ||
// first occurrence. We use the first occurrence of an error event as this marks | ||
// the beginning of a possible failure. This is important for two reasons: | ||
// recovery duration and causality. We expect a RecoverableError to recover | ||
// before the RecoveryDuration elapses. We need to use the earliest timestamp so | ||
// that a later RecoverableError does not shadow an earlier event in the | ||
// aggregate status. Additionally, this makes sense in the case where a | ||
// RecoverableError in one component cascades to other components; the earliest | ||
// error event is likely to be correlated with the cause. For non-error statuses | ||
// we use the latest event as it represents the last time a successful status was | ||
// reported. | ||
func newAggregationFunc(priority ErrorPriority) aggregationFunc { | ||
permanentPriorityFunc := func(seen map[component.Status]struct{}) component.Status { | ||
if _, isPermanent := seen[component.StatusPermanentError]; isPermanent { | ||
|
@@ -124,10 +133,12 @@ func newAggregationFunc(priority ErrorPriority) aggregationFunc { | |
case matchingEvent == nil: | ||
matchingEvent = ev | ||
case isError: | ||
can you add some comments/tests for this specifically? I assume this is to return the earliest error, but would be nice to see this described more properly
I marked this unresolved because I'm not clear on this either, and I don't believe we have tests for a case where the top-level status is an error and we hit this case. I understand why we do the ordering if we hit this, but it would be nice to see more details about why this case is separate from the one below where we return |
||
// Use earliest to mark beginning of a failure | ||
if ev.Timestamp().Before(matchingEvent.Timestamp()) { | ||
matchingEvent = ev | ||
} | ||
case ev.Timestamp().After(matchingEvent.Timestamp()): | ||
// Use most recent for last successful status | ||
matchingEvent = ev | ||
jpkrohling marked this conversation as resolved.
|
||
} | ||
} | ||
|
is this priority going to be fixed for the lifecycle of the component?
Yes, this is actually fixed for the aggregator as a whole (e.g. all components) for the lifetime of the collector process.
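One way to see why the priority cannot change later is that it is captured by a closure when the aggregation function is built. The sketch below is illustrative only: `Status`, `ErrorPriority`, their constants, and `newPriorityFunc` are simplified, hypothetical stand-ins rather than the extension's actual types, and returning `StatusOK` when no error has been seen is an assumption made to keep the example short.

```go
package main

import "fmt"

// Status is a hypothetical, simplified stand-in for component.Status.
type Status int

const (
	StatusOK Status = iota
	StatusRecoverableError
	StatusPermanentError
)

// ErrorPriority is a hypothetical stand-in for the priority argument.
type ErrorPriority int

const (
	PriorityPermanent ErrorPriority = iota
	PriorityRecoverable
)

// newPriorityFunc fixes the error ordering once, at construction time.
// The returned closure captures that ordering and offers no way to change it.
func newPriorityFunc(priority ErrorPriority) func(seen map[Status]struct{}) Status {
	order := []Status{StatusPermanentError, StatusRecoverableError}
	if priority == PriorityRecoverable {
		order = []Status{StatusRecoverableError, StatusPermanentError}
	}
	return func(seen map[Status]struct{}) Status {
		for _, st := range order {
			if _, ok := seen[st]; ok {
				return st
			}
		}
		// Assumption: with no error seen, report OK.
		return StatusOK
	}
}

func main() {
	seen := map[Status]struct{}{
		StatusRecoverableError: {},
		StatusPermanentError:   {},
	}
	permanentFirst := newPriorityFunc(PriorityPermanent)
	recoverableFirst := newPriorityFunc(PriorityRecoverable)
	fmt.Println(permanentFirst(seen) == StatusPermanentError)     // prints "true"
	fmt.Println(recoverableFirst(seen) == StatusRecoverableError) // prints "true"
}
```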