-
Notifications
You must be signed in to change notification settings - Fork 443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
checkpoint_harmony_endpoint: improve handling of 404 and 503 errors #13009
checkpoint_harmony_endpoint: improve handling of 404 and 503 errors #13009
Conversation
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations) |
🚀 Benchmarks reportTo see the full report comment with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. One point of clean-up with some options.
// 404 Not Found - Resubmit the task ID query for the same timeframe. | ||
state.with( | ||
{ | ||
"events": [{"message": {"event": {"reason": "polling"}}.encode_json()}], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When want_more
is true, a dummy event is required to make it immediately re-run. If there are no events it will wait for an interval.
Here, want_more
is false, so that dummy event is unnecessary.
I think it should either be an empty array:
"events": [{"message": {"event": {"reason": "polling"}}.encode_json()}], | |
"events": [], |
or a single event with an error message (single events get logged):
"events": [{"message": {"event": {"reason": "polling"}}.encode_json()}], | |
"events": { "error": {"message": "404: task ID not found" }}, |
or the same with the polling message, so the event will be logged, but the pipeline will drop it:
"events": [{"message": {"event": {"reason": "polling"}}.encode_json()}], | |
"events": {"message": {"event": {"reason": "polling"}}.encode_json(), "error": {"message": "404: task ID not found"}}, |
The last one would be easier if the pipeline expected parsed JSON instead of { "message": "JSON string" }
, because the CEL input logging code only logs a single message if it's parsed JSON.
I don't have a strong opinion about which one. The later ones are more messy, but get some useful information in the log/ES. The first one is clean and 404s shouldn't be happening a lot, and if they are they can always be seen with request tracing.
Here's a relevant part of the CEL documentation:
The field should be an array, but in the case of an error condition in the CEL program it is acceptable to return a single object instead of an array; this will will be wrapped as an array for publication and an error will be logged. If the single object contains a key, "error", the error value will be used to update the status of the input to report to Elastic Agent. This can be used to more rapidly respond to API failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are talking about the 503 case instead of the 404 (that's the one that sets want_more
to false).
Thanks for the detailed explanation, I would choose to log the error, as receiving a 503 is not very common and so we can discover it more easily. My question is if we are going to log the error, why not use the same format as for the rest of the errors? I have submitted a commit with the changes, would you mind taking a look at it again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right it was 503. The change looks good. I think that kind of logged error event could be used for all error cases. Doesn't necessarily have to be now.
For these error cases there are two choices:
- return single error event to be logged?
- keep that error event in Elasticsearch or drop it from the ingest pipeline?
So for now it's logged and not dropped, which is fine. I think in the future we could have some conventions for those choices.
💚 Build Succeeded
History
|
|
// 404 Not Found - Resubmit the task ID query for the same timeframe. | ||
state.with( | ||
{ | ||
"events": [{"message": {"event": {"reason": "polling"}}.encode_json()}], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right it was 503. The change looks good. I think that kind of logged error event could be used for all error cases. Doesn't necessarily have to be now.
For these error cases there are two choices:
- return single error event to be logged?
- keep that error event in Elasticsearch or drop it from the ingest pipeline?
So for now it's logged and not dropped, which is fine. I think in the future we could have some conventions for those choices.
Package checkpoint_harmony_endpoint - 0.5.0 containing this change is available at https://epr.elastic.co/package/checkpoint_harmony_endpoint/0.5.0/ |
Proposed commit message
Tip
Review commit by commit for a better understanding. The first one just propagates improvements from #12795 and #12158 to the rest of the data streams, while the second contains the new changes.
Note
All data streams have identical
cel.yml.hbs
files.Checklist
changelog.yml
file.