[Response Ops][Task Manager] Propagate msearch error status code so backpressure mechanism responds correctly #197501

Open: wants to merge 3 commits into main

@ymao1 ymao1 commented Oct 23, 2024

Resolves https://github.com/elastic/response-ops-team/issues/240

Summary

This PR adds an MsearchError class that preserves the status code from msearch errors. These errors are already piped to the managed configuration observable, which watches for and responds to ES errors from the update-by-query claim strategy, so I updated its filter to also match msearch 429 and 503 errors.
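
As an illustration of the approach described in the summary, here is a minimal TypeScript sketch. The `statusCode` field and the `isMsearchBackpressureError` helper are assumptions made for this example and may not match the names or shape used in the PR.

```typescript
// Hypothetical sketch: an error type that keeps the msearch status code.
class MsearchError extends Error {
  public readonly statusCode?: number;

  constructor(statusCode?: number) {
    super(`Unexpected status code from taskStore::msearch: ${statusCode ?? 'unknown'}`);
    this.name = 'MsearchError';
    this.statusCode = statusCode;
    // Restore the prototype chain for compile targets that downlevel classes.
    Object.setPrototypeOf(this, MsearchError.prototype);
  }
}

// Hypothetical filter predicate: only 429 and 503 should trigger backpressure.
function isMsearchBackpressureError(e: unknown): boolean {
  return e instanceof MsearchError && (e.statusCode === 429 || e.statusCode === 503);
}
```

With a predicate like this, a generic `Error` or an `MsearchError` carrying some other status (for example 500) would still be logged but would not count toward the backpressure threshold.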

To Verify

  1. Make sure you're using the mget claim strategy (xpack.task_manager.claim_strategy: 'mget') and start ES and Kibana.
  2. Inject a 429 error into an msearch response.
--- a/x-pack/plugins/task_manager/server/task_store.ts
+++ b/x-pack/plugins/task_manager/server/task_store.ts
@@ -571,6 +571,8 @@ export class TaskStore {
     });
     const { responses } = result;

+    responses[0].status = 429;
+
     const versionMap = this.createVersionMap([]);
  3. See task manager log the msearch errors and eventually reduce polling capacity:
[2024-10-23T15:35:59.255-04:00][ERROR][plugins.taskManager] Failed to poll for work: Unexpected status code from taskStore::msearch: 429
[2024-10-23T15:35:59.756-04:00][ERROR][plugins.taskManager] Failed to poll for work: Unexpected status code from taskStore::msearch: 429
[2024-10-23T15:36:00.257-04:00][ERROR][plugins.taskManager] Failed to poll for work: Unexpected status code from taskStore::msearch: 429
[2024-10-23T15:36:00.757-04:00][ERROR][plugins.taskManager] Failed to poll for work: Unexpected status code from taskStore::msearch: 429
...

[2024-10-23T15:36:06.267-04:00][WARN ][plugins.taskManager] Poll interval configuration is temporarily increased after Elasticsearch returned 19 "too many request" and/or "execute [inline] script" error(s).
[2024-10-23T15:36:06.268-04:00][WARN ][plugins.taskManager] Capacity configuration is temporarily reduced after Elasticsearch returned 19 "too many request" and/or "execute [inline] script" error(s).
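
The two warnings above show the poll interval being raised and the capacity being lowered once errors have accumulated. A rough sketch of that kind of adjustment follows. The function name, the config shape, and the scaling factors (6/5 and 4/5) are illustrative assumptions, not values taken from task manager.

```typescript
// Hypothetical config shape for this sketch only.
interface ThroughputConfig {
  pollIntervalMs: number;
  capacity: number;
}

// If any backpressure errors were seen in the last window, poll less often
// and claim fewer tasks; otherwise leave the configuration untouched.
function adjustThroughputSketch(current: ThroughputConfig, errorCount: number): ThroughputConfig {
  if (errorCount === 0) {
    return current;
  }
  return {
    // Integer arithmetic avoids floating-point rounding surprises.
    pollIntervalMs: Math.ceil((current.pollIntervalMs * 6) / 5),
    capacity: Math.max(1, Math.floor((current.capacity * 4) / 5)),
  };
}
```

Under these assumed factors, a 500 ms interval with capacity 10 would move to 600 ms with capacity 8 after a window containing errors.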

@ymao1 ymao1 changed the title from "Tm msearch errors" to "[Response Ops][Task Manager] Propagate msearch error status code so backpressure mechanism responds correctly" Oct 23, 2024
@ymao1 ymao1 self-assigned this Oct 23, 2024
@ymao1 ymao1 added labels Oct 23, 2024: release_note:skip (Skip the PR/issue when compiling release notes), Feature:Task Manager, Team:ResponseOps (Label for the ResponseOps team, formerly the Cases and Alerting teams), v9.0.0, backport:prev-minor (Backport to (8.x) the previous minor version, i.e. one version back from main), v8.17.0
@ymao1 ymao1 marked this pull request as ready for review October 23, 2024 19:39
@ymao1 ymao1 requested a review from a team as a code owner October 23, 2024 19:39
@elasticmachine

Pinging @elastic/response-ops (Team:ResponseOps)

@ymao1

ymao1 commented Oct 24, 2024

@elasticmachine merge upstream

@elasticmachine

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

cc @ymao1

@mikecote mikecote left a comment

Changes LGTM! Left a few nits but testing locally worked as expected!

);

try {
await store.msearch([{}]);

optional nit: not sure if we need this to ensure the catch path goes through.

Suggested change
- await store.msearch([{}]);
+ await store.msearch([{}]);
+ throw new Error('should have thrown');

@@ -574,8 +575,11 @@ export class TaskStore {
     let allTasks = new Array<ConcreteTaskInstance>();

     for (const response of responses) {
-      if (response.status !== 200) {
-        const err = new Error(`Unexpected status code from taskStore::msearch: ${response.status}`);
+      if (response.status && response.status !== 200) {

nit: It's interesting that response.status could be null. If ever that occurs, should we consider it an unexpected status code as well?
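
To make the question concrete, here is a small sketch comparing the check in the diff with a stricter alternative that also flags a missing status. The `MsearchItemSketch` type and both helper names are assumptions for illustration only.

```typescript
// Hypothetical, simplified shape of one msearch response item.
type MsearchItemSketch = { status?: number | null };

// Check as written in the diff: a missing status is not treated as an error.
const lenientIsError = (r: MsearchItemSketch): boolean =>
  Boolean(r.status) && r.status !== 200;

// Stricter alternative: anything other than an explicit 200 is an error,
// including a null or undefined status.
const strictIsError = (r: MsearchItemSketch): boolean => r.status !== 200;
```

The two checks agree whenever a status is present and differ only for a null or undefined status, which is the case raised here.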

@mikecote

I added the v8.16.0 label so we can make this part of that release as well when we have clusters opting in.

Labels
backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) Feature:Task Manager release_note:skip Skip the PR/issue when compiling release notes Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v8.16.0 v8.17.0 v9.0.0