[ES|QL] Return data streams and indices that have data for a query #122122

Open
dgieselaar opened this issue Feb 8, 2025 · 1 comment
Labels: :Analytics/ES|QL (AKA ESQL), >enhancement, Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo))

Comments

@dgieselaar
Member

dgieselaar commented Feb 8, 2025

Description

What:

An API endpoint, or a command that can be appended to any query, that will return all the data streams that have data for a given query (+ DSL filter), as fast as possible.

Why:

In many places in Kibana, we have "has data" calls: requests where we only care whether there is any data from a specific source in a given time frame, with a given set of filters, etc. These are often used to make UX choices, like showing an onboarding screen. We usually use terminate_after: 1 and/or timeout: 1ms, which works reasonably well in this case. However, ES|QL (to my knowledge) does not have an equivalent, so we cannot do this for ES|QL queries.
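For reference, the existing Query DSL pattern can be sketched roughly as follows. This is a minimal illustration, not Kibana's actual implementation: the helper names `build_has_data_request` and `has_data` are made up for this example, and the response is assumed to have the standard `_search` shape with `hits.total.value`.

```python
from typing import Any, Optional


def build_has_data_request(filter_query: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build a _search request body that stops as early as possible.

    Hypothetical helper: only the body fields themselves (size,
    terminate_after, timeout, query) are real _search parameters.
    """
    return {
        "size": 0,             # no documents needed, only whether any exist
        "terminate_after": 1,  # stop collecting after the first hit on each shard
        "timeout": "1ms",      # give up quickly and return what was collected so far
        "query": filter_query or {"match_all": {}},
    }


def has_data(response: dict[str, Any]) -> bool:
    """Return True if the search response reports at least one hit."""
    return response.get("hits", {}).get("total", {}).get("value", 0) > 0
```

The body would be sent to `_search` on the index pattern of interest; the ask in this issue is an equivalent early exit for ES|QL queries.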

Another reason to do this is to determine the relevance of things like queries, visualizations, rules and dashboards to a subset of the data. As an example, we can get all the panels that have data for a specific host by extracting queries from an asset (like a visualization) and combining them with a filter like { terms: { host.name: [my-host] } }. We can then execute this combined query, and if there is any data, we consider the asset to be relevant to the given filter (or entity).
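A rough sketch of that relevance check, assuming the asset's ES|QL query is sent to the ES|QL query API together with a Query DSL `filter`. The function names and the `has_data` callback are hypothetical; `has_data` stands in for whatever mechanism ends up answering "does this request match anything?".

```python
from typing import Any, Callable, Iterable


def scope_to_entity(esql_query: str, field: str, values: Iterable[str]) -> dict[str, Any]:
    """Combine an asset's ES|QL query with an entity filter.

    The result is a request body with the ES|QL query string and a
    Query DSL filter restricting the documents the query runs over.
    """
    return {
        "query": esql_query,
        "filter": {"bool": {"filter": [{"terms": {field: list(values)}}]}},
    }


def relevant_assets(
    assets: dict[str, str],
    field: str,
    values: Iterable[str],
    has_data: Callable[[dict[str, Any]], bool],
) -> list[str]:
    """Return the ids of assets whose query has data for the entity.

    `assets` maps an asset id to the ES|QL query extracted from it.
    `has_data` is a hypothetical callback that executes the request.
    """
    values = list(values)
    return [
        asset_id
        for asset_id, query in assets.items()
        if has_data(scope_to_entity(query, field, values))
    ]
```

With one request per asset this gets expensive quickly, which is part of the motivation for a batched endpoint.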

It is also useful to get the actual indices or data streams that match the query: for instance, to give better autocomplete suggestions, or to tell the users what data sources are available for a given entity.

Projects that we expect to need this feature:

  • Streams (to suggest visualizations and dashboards to attach to a Stream)
  • RCA (to suggest visualizations, dashboards, queries, etc. for a given entity)

How:

The ideal outcome would be an endpoint that takes a set of queries and, for each query, returns the data streams and indices that have data for it. This allows us to get this data for a large number of queries, and Elasticsearch can optimize the operation, for instance by sharing field caps calls.
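To make the shape of such an endpoint concrete, here is one possible request/response contract, sketched from the client side. Everything here is hypothetical: no such endpoint exists, and the field names (`queries`, `id`, `results`, `data_streams`, `indices`) are placeholders for discussion only.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class HasDataQuery:
    """One entry in the hypothetical batch request."""
    id: str                                  # caller-chosen key to correlate results
    esql: str                                # the ES|QL query to check
    filter: Optional[dict[str, Any]] = None  # optional Query DSL filter


@dataclass
class HasDataResult:
    """One entry in the hypothetical batch response."""
    id: str
    data_streams: list[str] = field(default_factory=list)
    indices: list[str] = field(default_factory=list)

    @property
    def has_data(self) -> bool:
        return bool(self.data_streams or self.indices)


def build_batch_request(queries: list[HasDataQuery]) -> dict[str, Any]:
    """Serialize the queries into a single (hypothetical) request body."""
    items = []
    for q in queries:
        item: dict[str, Any] = {"id": q.id, "query": q.esql}
        if q.filter is not None:
            item["filter"] = q.filter
        items.append(item)
    return {"queries": items}


def parse_batch_response(payload: dict[str, Any]) -> dict[str, HasDataResult]:
    """Index the (hypothetical) response by query id."""
    return {
        r["id"]: HasDataResult(
            id=r["id"],
            data_streams=list(r.get("data_streams", [])),
            indices=list(r.get("indices", [])),
        )
        for r in payload.get("results", [])
    }
```

Keying results by a caller-supplied id keeps the response independent of request order, which would let Elasticsearch group or reorder the work internally.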

An alternative would be to have a command that can be attached to any query that returns just the data sources, and not the actual result of the query. This probably feels a little weird but might be useful as a user-facing feature.

Note: this does not need to be perfect. E.g. for the following query:

FROM traces-apm*
	| STATS root_transaction_name = TOP(transaction.name, 1, "DESC"), has_slow_spans = COUNT() WHERE span.duration.us > 1000000 BY trace.id
	| WHERE has_slow_spans > 0
	| STATS BY root_transaction_name

ES needs to potentially evaluate all the data to determine if there are hits. In this case, it's fine to exit early and just assume there is data. Ideally, it would understand that for this to match, span.duration.us > 1000000 needs to be true for at least one document, but I can imagine things get very complicated at that point.

@dgieselaar added the :Analytics/ES|QL, >enhancement and needs:triage labels on Feb 8, 2025
@elasticsearchmachine added the Team:Analytics label and removed the needs:triage label on Feb 8, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)


2 participants