Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[extension/healthcheckv2] Add event aggregation logic #32695

Merged
merged 8 commits into from
Jun 10, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Clarify temporal ordering rationale and add test
mwear committed Jun 10, 2024
commit 0df5dd0d00e87694948d8418ba0c2416c8099637
Original file line number Diff line number Diff line change
@@ -48,7 +48,16 @@ type aggregationFunc func(*AggregateStatus) Event
// events. The priority argument determines the priority of PermanentError
// events vs RecoverableError events. Lifecycle events will have the timestamp
// of the most recent event and error events will have the timestamp of the
// first occurrence.
// first occurrence. We use the first occurrence of an error event as this marks
// the beginning of a possible failure. This is important for two reasons:
// recovery duration and causality. We expect a RecoverableError to recover
// before the RecoveryDuration elapses. We need to use the earliest timestamp so
// that a later RecoverableError does not shadow an earlier event in the
// aggregate status. Additionally, this makes sense in the case where a
// RecoverableError in one component cascades to other components; the earliest
// error event is likely to be correlated with the cause. For non-error stauses
// we use the latest event as it represents the last time a successful status was
// reported.
func newAggregationFunc(priority ErrorPriority) aggregationFunc {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this priority going to be fixed for the lifecycle of the component?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is actually fixed for the aggregator as a whole (e.g. all components) for the lifetime of the collector process.

permanentPriorityFunc := func(seen map[component.Status]struct{}) component.Status {
if _, isPermanent := seen[component.StatusPermanentError]; isPermanent {
@@ -124,10 +133,12 @@ func newAggregationFunc(priority ErrorPriority) aggregationFunc {
case matchingEvent == nil:
matchingEvent = ev
case isError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some comments/tests for this specifically? I assume this is to return the earliest error, but would be nice to see this described more properly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I marked this unresolved because I'm not clear on this either, and I don't believe we have tests for a case where the top-level status is an error and we hit this case. I understand why we do the ordering if we hit this, but it would be nice to see more details about why this case is separate from the one below where we return matchingEvent if isError is set (on line 139 as of this comment).

// Use earliest to mark beginning of a failure
if ev.Timestamp().Before(matchingEvent.Timestamp()) {
matchingEvent = ev
}
case ev.Timestamp().After(matchingEvent.Timestamp()):
// Use most recent for last successful status
matchingEvent = ev
jpkrohling marked this conversation as resolved.
Show resolved Hide resolved
}
}
Original file line number Diff line number Diff line change
@@ -125,3 +125,48 @@ func TestAggregationFuncs(t *testing.T) {
})
}
}

func TestEventTemporalOrder(t *testing.T) {
// Note: ErrorPriority does not affect temporal ordering
aggFunc := newAggregationFunc(PriorityPermanent)
st := &AggregateStatus{
ComponentStatusMap: map[string]*AggregateStatus{
"c1": {
Event: component.NewStatusEvent(component.StatusOK),
},
},
}
assert.Equal(t, st.ComponentStatusMap["c1"].Event, aggFunc(st))

// Record first error
st.ComponentStatusMap["c2"] = &AggregateStatus{
Event: component.NewRecoverableErrorEvent(assert.AnError),
}

// Returns first error
assert.Equal(t, st.ComponentStatusMap["c2"].Event, aggFunc(st))

// Record second error
st.ComponentStatusMap["c3"] = &AggregateStatus{
Event: component.NewRecoverableErrorEvent(assert.AnError),
}

// Still returns first error
assert.Equal(t, st.ComponentStatusMap["c2"].Event, aggFunc(st))

// Clear first error
st.ComponentStatusMap["c2"] = &AggregateStatus{
Event: component.NewStatusEvent(component.StatusOK),
}

// Returns second error now
assert.Equal(t, st.ComponentStatusMap["c3"].Event, aggFunc(st))

// Clear second error
st.ComponentStatusMap["c3"] = &AggregateStatus{
Event: component.NewStatusEvent(component.StatusOK),
}

// Returns latest event
assert.Equal(t, st.ComponentStatusMap["c3"].Event, aggFunc(st))
}