[ES|QL] Return data streams and indices that have data for a query #122122
Labels
:Analytics/ES|QL
>enhancement
Team:Analytics
Description
What:
An API endpoint, or a command that can be appended to any query, that will return all the data streams that have data for a given query (+ DSL filter), as fast as possible.
Why:
In many places in Kibana, we have "has data" calls: requests where we only care whether there is any data from a specific source in a given time frame, or with a set of filters, etc. This is often used to make UX choices, like showing an onboarding screen. We usually use `terminate_after: 1` and/or `timeout: 1ms`, which works reasonably well in this case. However, ES|QL (to my knowledge) does not have an equivalent, so we cannot do this for ES|QL queries.
Another reason to do this is to determine the relevance of things like queries, visualizations, rules and dashboards to a subset of the data. As an example, we can get all the panels that have data for a specific host by extracting queries from an asset (like a visualization) and combining them with a filter like `{ terms: { "host.name": ["my-host"] } }`. We can then execute this combined query, and if there is any data, we consider the asset to be relevant to the given filter (or entity).
It is also useful to get the actual indices or data streams that match the query: for instance, to give better autocomplete suggestions, or to tell users what data sources are available for a given entity.
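For reference, the DSL-side "has data" check described above can be sketched as a request body. This is a minimal sketch, not Kibana's actual implementation: the helper name `build_has_data_request` is hypothetical, and the example asset query is only illustrative. The body fields themselves (`size`, `terminate_after`, `timeout`, `bool.filter`) are standard search API parameters.

```python
from typing import Any


def build_has_data_request(
    asset_query: dict[str, Any], entity_filter: dict[str, Any]
) -> dict[str, Any]:
    """Combine an asset's query with an entity filter into a cheap existence check.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    return {
        "size": 0,             # no documents needed, only whether any exist
        "terminate_after": 1,  # stop collecting on each shard after the first hit
        "timeout": "1ms",      # value taken from the description; tune as needed
        "query": {
            "bool": {
                # Filter context: both clauses must match, and no scoring is done.
                "filter": [asset_query, entity_filter],
            }
        },
    }


# Illustrative inputs: a query extracted from an asset, plus the entity filter.
request = build_has_data_request(
    asset_query={"range": {"span.duration.us": {"gt": 1000000}}},
    entity_filter={"terms": {"host.name": ["my-host"]}},
)
```

The request above is what has no ES|QL counterpart today, which is the gap this issue is about.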
Projects that we expect to need this feature for are:
How:
The ideal outcome would be to have an endpoint that takes a set of queries and, for each query, returns the data streams and indices that have data for it. This allows us to get this data for a large number of queries, and Elasticsearch can optimize the operation, for instance by sharing field caps calls.
An alternative would be to have a command that can be attached to any query that returns just the data sources, and not the actual result of the query. This probably feels a little weird but might be useful as a user-facing feature.
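To illustrate the kind of per-query result being asked for, here is a rough sketch of how it can be approximated today with the search DSL: a `terms` aggregation on the `_index` metadata field, with backing index names mapped back to data stream names. This is an assumption-laden workaround, not the proposed feature: the helper names are hypothetical, the regex relies on the `.ds-<data-stream>-<yyyy.MM.dd>-<generation>` naming convention (date part optional) and is only a heuristic, and the aggregation does not exit early, so it is slower than what this issue requests.

```python
import re
from typing import Any

# Heuristic: backing indices look like ".ds-<data-stream>-<yyyy.MM.dd>-<generation>".
_BACKING_INDEX = re.compile(
    r"^\.ds-(?P<stream>.+?)(?:-\d{4}\.\d{2}\.\d{2})?-\d{6}$"
)


def build_sources_request(query: dict[str, Any]) -> dict[str, Any]:
    """Search body that buckets matching documents by the index holding them."""
    return {
        "size": 0,
        "query": query,
        "aggs": {"sources": {"terms": {"field": "_index", "size": 1000}}},
    }


def data_sources(response: dict[str, Any]) -> list[str]:
    """Collapse backing indices into data stream names; keep plain indices as-is."""
    buckets = response.get("aggregations", {}).get("sources", {}).get("buckets", [])
    names = set()
    for bucket in buckets:
        match = _BACKING_INDEX.match(bucket["key"])
        names.add(match.group("stream") if match else bucket["key"])
    return sorted(names)


# Hand-written response fragment, for illustration only.
example_response = {
    "aggregations": {
        "sources": {
            "buckets": [
                {"key": ".ds-logs-nginx-default-2024.01.15-000003", "doc_count": 12},
                {"key": ".ds-logs-nginx-default-2024.02.14-000004", "doc_count": 7},
                {"key": "my-index", "doc_count": 1},
            ]
        }
    }
}
sources = data_sources(example_response)
```

Running one such request per query is exactly the overhead a dedicated endpoint could avoid by batching queries and sharing work between them.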
Note: this does not need to be perfect. E.g. for the following query:
ES potentially needs to evaluate all the data to determine if there are hits. In this case, it's fine to exit early and just assume there is data. Ideally, it would understand that for this to match, `span.duration.us > 1000000` needs to be true for at least one document, but I can imagine things get very complicated at that point.
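The "exit early and assume there is data" behavior can be sketched on the client side as an optimistic reading of a search response. This is only an illustration of the policy under stated assumptions: the function name `probably_has_data` is hypothetical, and treating a timed-out request as "has data" is one possible interpretation of the note above. The `timed_out` and `hits.total` fields are part of the standard search response.

```python
from typing import Any


def probably_has_data(response: dict[str, Any]) -> bool:
    """Optimistic check: a real hit counts, and so does running out of time."""
    total = response.get("hits", {}).get("total", 0)
    # hits.total is an object ({"value": ..., "relation": ...}) by default,
    # but may be a plain integer depending on request options.
    hit_count = total["value"] if isinstance(total, dict) else total
    if hit_count > 0:
        return True
    # No hit found before the timeout: assume data exists rather than report none.
    return bool(response.get("timed_out", False))
```

With this policy, false positives are possible (an asset may be shown as relevant when it is not), which matches the stated tolerance that the result does not need to be perfect.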