Failing test: Jest Integration Tests.src/core/server/saved_objects/migrations/integration_tests - migration from 7.13 to 7.14+ with many failed action_tasks filters out all outdated action_task_params and action tasks #118626

Closed
kibanamachine opened this issue Nov 15, 2021 · 19 comments
Assignees
Labels
failed-test A test failure on a tracked branch, potentially flaky-test Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Team:Fleet Team label for Observability Data Collection Fleet team technical debt Improvement of the software architecture and operational architecture

Comments

@kibanamachine
Contributor

kibanamachine commented Nov 15, 2021

A test failed on a tracked branch

NoLivingConnectionsError: There are no living connections
    at KibanaTransport.request (/opt/local-ssd/buildkite/builds/kb-n2-4-b6bcdd19311dafd9/elastic/kibana-hourly/kibana/node_modules/@elastic/transport/src/Transport.ts:444:17)
    at KibanaTransport.request (/opt/local-ssd/buildkite/builds/kb-n2-4-b6bcdd19311dafd9/elastic/kibana-hourly/kibana/src/core/server/elasticsearch/client/configure_client.ts:55:20)
    at Cluster.putComponentTemplate (/opt/local-ssd/buildkite/builds/kb-n2-4-b6bcdd19311dafd9/elastic/kibana-hourly/kibana/node_modules/@elastic/elasticsearch/src/api/api/cluster.ts:310:33)
    at putComponentTemplate (/opt/local-ssd/buildkite/builds/kb-n2-4-b6bcdd19311dafd9/elastic/kibana-hourly/kibana/x-pack/plugins/fleet/server/services/epm/elasticsearch/template/install.ts:194:38)
    at ensureDefaultComponentTemplate (/opt/local-ssd/buildkite/builds/kb-n2-4-b6bcdd19311dafd9/elastic/kibana-hourly/kibana/x-pack/plugins/fleet/server/services/epm/elasticsearch/template/install.ts:303:11)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async Promise.all (index 0)
    at ensureFleetGlobalEsAssets (/opt/local-ssd/buildkite/builds/kb-n2-4-b6bcdd19311dafd9/elastic/kibana-hourly/kibana/x-pack/plugins/fleet/server/services/setup.ts:141:27)
    at createSetupSideEffects (/opt/local-ssd/buildkite/builds/kb-n2-4-b6bcdd19311dafd9/elastic/kibana-hourly/kibana/x-pack/plugins/fleet/server/services/setup.ts:76:5)
    at awaitIfPending (/opt/local-ssd/buildkite/builds/kb-n2-4-b6bcdd19311dafd9/elastic/kibana-hourly/kibana/x-pack/plugins/fleet/server/services/setup_utils.ts:30:20)
    at /opt/local-ssd/buildkite/builds/kb-n2-4-b6bcdd19311dafd9/elastic/kibana-hourly/kibana/x-pack/plugins/fleet/server/plugin.ts:340:9

First failure: CI Build - 8.0

@kibanamachine kibanamachine added the failed-test A test failure on a tracked branch, potentially flaky-test label Nov 15, 2021
@botelastic botelastic bot added the needs-team Issues missing a team label label Nov 15, 2021
@mistic mistic added the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label Nov 15, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Nov 15, 2021
@kibanamachine
Contributor Author

New failure: CI Build - 8.0

@kibanamachine
Contributor Author

New failure: CI Build - 8.0

@kibanamachine
Contributor Author

New failure: CI Build - main

@kibanamachine
Contributor Author

New failure: CI Build - main

@kibanamachine
Contributor Author

New failure: CI Build - main

@mistic
Member

mistic commented Nov 25, 2021

This test keeps failing and looks flaky. Skipping for now.

main/8.1: 6730b4b
8.0: 91f06d0

@pgayvallet
Contributor

This should have been addressed in #119236 by #119713

cc @juliaElastic / @elastic/fleet

@pgayvallet pgayvallet added Team:Fleet Team label for Observability Data Collection Fleet team Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc and removed Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc labels Nov 26, 2021
@juliaElastic
Contributor

The last 2 failures have a different stack trace; it no longer comes from TelemetryEventsSender.fetchClusterInfo, which I fixed.
@elastic/fleet does anyone have any idea which PR broke this?



FAIL  src/core/server/saved_objects/migrations/integration_tests/7_13_0_failed_action_tasks.test.ts (138.956 s)
  ● migration from 7.13 to 7.14+ with many failed action_tasks › filters out all outdated action_task_params and action tasks

    NoLivingConnectionsError: There are no living connections

      53 |         opts.meta = true;
      54 |       }
    > 55 |       return super.request(params, opts) as Promise<TransportResult<any, any>>;
         |                    ^
      56 |     }
      57 |   }
      58 |

      at KibanaTransport.request (node_modules/@elastic/transport/src/Transport.ts:444:17)
      at KibanaTransport.request (src/core/server/elasticsearch/client/configure_client.ts:55:20)
      at Cluster.putComponentTemplate (node_modules/@elastic/elasticsearch/src/api/api/cluster.ts:310:33)
      at putComponentTemplate (x-pack/plugins/fleet/server/services/epm/elasticsearch/template/install.ts:197:38)
      at ensureDefaultComponentTemplate (x-pack/plugins/fleet/server/services/epm/elasticsearch/template/install.ts:306:11)
      at runMicrotasks (<anonymous>)
      at async Promise.all (index 0)
      at ensureFleetGlobalEsAssets (x-pack/plugins/fleet/server/services/setup.ts:141:27)
      at createSetupSideEffects (x-pack/plugins/fleet/server/services/setup.ts:76:5)
      at awaitIfPending (x-pack/plugins/fleet/server/services/setup_utils.ts:30:20)
      at x-pack/plugins/fleet/server/plugin.ts:345:9


rudolf added a commit to rudolf/kibana that referenced this issue Nov 29, 2021
@juliaElastic
Contributor

I'm seeing another occurrence in a different migration test with the same stack trace. Could it be that Fleet setup takes a long time to finish, and by then the tests have already triggered Kibana stop?
https://buildkite.com/elastic/kibana-pull-request/builds/9974#24600d14-8743-47aa-90be-3349d7bb563d



Summary of all failing tests

FAIL  src/core/server/saved_objects/migrations/integration_tests/migration_from_older_v1.test.ts (96.826 s)

  ● Test suite failed to run

    NoLivingConnectionsError: There are no living connections

      53 |         opts.meta = true;
      54 |       }
    > 55 |       return super.request(params, opts) as Promise<TransportResult<any, any>>;
         |                    ^
      56 |     }
      57 |   }
      58 |

      at KibanaTransport.request (node_modules/@elastic/transport/src/Transport.ts:444:17)
      at KibanaTransport.request (src/core/server/elasticsearch/client/configure_client.ts:55:20)
      at Cluster.putComponentTemplate (node_modules/@elastic/elasticsearch/src/api/api/cluster.ts:310:33)
      at putComponentTemplate (x-pack/plugins/fleet/server/services/epm/elasticsearch/template/install.ts:197:38)
      at ensureDefaultComponentTemplate (x-pack/plugins/fleet/server/services/epm/elasticsearch/template/install.ts:306:11)
      at runMicrotasks (<anonymous>)
      at async Promise.all (index 0)


@juliaElastic
Contributor

juliaElastic commented Nov 30, 2021

@kpollich @joshdover I think these flaky errors are related to the change here to setupFleet in start lifecycle: #117552

I'm not sure yet how to wait in tests until setup finishes or make the setup stop immediately once the stop lifecycle method has been called (when the tests finished).

Do you have any suggestions?
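
For illustration, one way to let tests (or a stop lifecycle) wait for background setup is to keep a handle to the setup promise and await it in stop. This is only a sketch under assumptions: `PluginSketch`, `runSetup`, and `warn` are hypothetical names, not Fleet's actual plugin code.

```typescript
// Hypothetical plugin shape; names are illustrative, not Fleet's actual code.
type SetupFn = () => Promise<void>;

class PluginSketch {
  private setupPromise: Promise<void> = Promise.resolve();
  private readonly runSetup: SetupFn;
  private readonly warn: (msg: string) => void;

  constructor(runSetup: SetupFn, warn: (msg: string) => void) {
    this.runSetup = runSetup;
    this.warn = warn;
  }

  // start() kicks off setup in the background and keeps a handle to it.
  start(): void {
    this.setupPromise = this.runSetup().catch((error: unknown) => {
      // Log and swallow, so the stored promise never rejects.
      this.warn(`setup failed: ${String(error)}`);
    });
  }

  // stop() resolves only after the background setup has settled.
  async stop(): Promise<void> {
    await this.setupPromise;
  }
}
```

With this shape, a test that calls `stop()` does not proceed to tear down Elasticsearch until setup has either finished or failed. The trade-off is that a slow setup also slows down shutdown.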

@joshdover
Contributor

joshdover commented Nov 30, 2021

It's curious this doesn't get caught by our global error handler here:

const fleetSetupPromise = (async () => {
  try {
    await setupFleet(
      new SavedObjectsClient(core.savedObjects.createInternalRepository()),
      core.elasticsearch.client.asInternalUser
    );
  } catch (error) {
    logger.warn('Fleet setup failed');
    logger.warn(error);
  }
})();

I would expect that after stop gets called, all the ES calls would start failing with this exception and bubble up to the top-level error handler. Could we be missing an await somewhere?

I'm curious if the changes in #118587 could also be making this worse or more apparent somehow due to the exponential backoff on retries for NoLivingConnectionsError. This could be delaying this promise chain from resolving.
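
To make the possible delay concrete, here is a rough sketch of how exponential backoff adds up across retries. The actual backoff parameters from #118587 are not stated in this thread, so the base delay and cap used here are hypothetical, as is the function name `cumulativeBackoffMs`.

```typescript
// Total time spent waiting across `retries` attempts when each delay doubles,
// starting at baseMs and never exceeding capMs. All values are hypothetical.
function cumulativeBackoffMs(retries: number, baseMs: number, capMs: number): number {
  let total = 0;
  for (let attempt = 0; attempt < retries; attempt++) {
    total += Math.min(baseMs * 2 ** attempt, capMs);
  }
  return total;
}
```

For example, with an assumed 1 s base and a 64 s cap, five retries wait 1 + 2 + 4 + 8 + 16 = 31 seconds in total, which could plausibly outlast a test's teardown.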

I'd really like to avoid passing state around about whether or not Kibana stop has been called. I'd recommend we use the standard AbortController pattern to check when we should cancel and bail early. I think (?) the elasticsearch-js client also supports this class.
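
A minimal sketch of that pattern, assuming setup can be split into discrete steps: each step receives an `AbortSignal`, and the runner checks it before starting the next step. `Step` and `runSetupSteps` are hypothetical names used only for illustration.

```typescript
// A setup step that may observe the signal itself (for example, to cancel a request).
type Step = (signal: AbortSignal) => Promise<void>;

// Runs steps in order and checks the signal before each one.
// Returns the number of steps that completed.
async function runSetupSteps(steps: Step[], signal: AbortSignal): Promise<number> {
  let completed = 0;
  for (const step of steps) {
    if (signal.aborted) {
      break; // stop was requested: skip the remaining work
    }
    await step(signal);
    completed++;
  }
  return completed;
}
```

The caller would own the `AbortController` and call `abort()` from its stop handler, so no separate "has stop been called" flag needs to be threaded through the setup code.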

@elastic/kibana-core Feels like something that we may also want to solve at the Core level. For instance either of these changes would be helpful to us and probably improve the robustness of Kibana shutdowns:

  • Throw a different error once the client has been closed or once Kibana stop has been initiated.
  • Have a generic AbortController that aborts all Elasticsearch API calls once Kibana stop is initiated. This could be done in a wrapper level on the ElasticsearchClient, for example.
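
Both ideas can be sketched together in a small wrapper. This is an assumption-laden illustration, not the Core or elasticsearch-js API: `StopAwareClient`, `RequestFn`, and `fakeRequest` are hypothetical, and a real wrapper would forward the signal to the underlying client.

```typescript
// Hypothetical request signature; a real client call would differ.
type RequestFn<T> = (signal: AbortSignal) => Promise<T>;

class StopAwareClient {
  private readonly controller = new AbortController();

  // Every call shares one signal and fails fast once stop has begun.
  async request<T>(fn: RequestFn<T>): Promise<T> {
    if (this.controller.signal.aborted) {
      // A distinct error after stop, as in the first bullet.
      throw new Error('Client stopped');
    }
    return fn(this.controller.signal);
  }

  // Called once when shutdown starts; aborts all in-flight requests.
  stop(): void {
    this.controller.abort();
  }
}

// Stand-in for a network call: resolves after `ms` unless aborted first.
function fakeRequest(signal: AbortSignal, ms: number): Promise<string> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve('ok'), ms);
    signal.addEventListener(
      'abort',
      () => {
        clearTimeout(timer);
        reject(new Error('Request aborted'));
      },
      { once: true }
    );
  });
}
```

Callers would then see either "Request aborted" (for calls in flight at shutdown) or "Client stopped" (for calls made afterwards), both of which are easier to recognize and ignore than a generic connection error.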

@pgayvallet
Contributor

I would expect that after stop gets called, all the ES calls would start failing with this exception and bubble up to the top-level error handler.

I would expect that too. Could it be that setupFleet has another internal async block ((async () => { ... })();) whose errors are not caught?

I'd really like to avoid passing state around about whether or not Kibana stop has been called. I'd recommend we use the standard AbortController pattern to check when we should cancel and bail early

Unless there's something wrong in the way the client is handling the promises, it shouldn't be necessary. In theory, properly try/catching the requests at any level should avoid the unhandled promise rejections?
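
To illustrate the difference a missing await makes, here is a small self-contained sketch. `failingCall`, `setupAwaited`, and `setupNotAwaited` are hypothetical stand-ins, not functions from Fleet or Core.

```typescript
// Stand-in for an ES call that rejects after the client has been closed.
async function failingCall(): Promise<void> {
  throw new Error('There are no living connections');
}

// Awaited: the rejection surfaces inside the try block and is caught.
async function setupAwaited(log: string[]): Promise<void> {
  try {
    await failingCall();
  } catch (error) {
    log.push(`caught: ${(error as Error).message}`);
  }
}

// Not awaited: the try block exits before the promise rejects, so the
// catch never runs. The promise is returned only so the demo can observe it;
// if nothing handled it, it would be an unhandled rejection.
function setupNotAwaited(log: string[]): Promise<void> {
  let background: Promise<void> = Promise.resolve();
  try {
    background = failingCall(); // missing await
  } catch (error) {
    log.push(`caught: ${(error as Error).message}`);
  }
  return background;
}
```

In the second function the surrounding try/catch gives a false sense of safety, which is the kind of gap an inner fire-and-forget async block would create.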

@juliaElastic
Contributor

@pgayvallet I'm planning to dig into this more. Unfortunately, this error is hard to reproduce locally; I tried a few times with no luck.
I would like to reproduce it first, to verify that any fix actually makes the error go away permanently.

@juliaElastic
Contributor

related #120840

@kibanamachine
Contributor Author

New failure: CI Build - 8.0

@kibanamachine
Contributor Author

New failure: CI Build - 8.0

@kibanamachine
Contributor Author

New failure: CI Build - main

@joshdover joshdover added the technical debt Improvement of the software architecture and operational architecture label Feb 8, 2022
@juliaElastic
Contributor

Closing, as this failure hasn't occurred in a long time.
