Late materialization of dimension fields in time-series #135961
base: main
Conversation
```java
segments.set(groupId, docVector.segments().getInt(valuePosition));
docIds = bigArrays.grow(docIds, groupId + 1);
docIds.set(groupId, docVector.docs().getInt(valuePosition));
contextRefs.computeIfAbsent(shard, s -> {
```
Nit: Felix noticed separately that this is too slow, mind replacing with a check and insert?
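A minimal sketch of the suggested check-and-insert pattern, assuming a plain map keyed by shard ID (the types here are hypothetical stand-ins for the ones in the diff, not the PR's actual code):

```java
import java.util.HashMap;
import java.util.Map;

class CheckAndInsertSketch {
    // Stand-in for the per-shard context reference held in the real map.
    record ContextRef(int shard) {}

    private final Map<Integer, ContextRef> contextRefs = new HashMap<>();

    ContextRef getOrCreate(int shard) {
        // Plain check-and-insert: look the key up first and only insert on a
        // miss, instead of invoking computeIfAbsent on every call, which was
        // reported to be too slow on this hot path.
        ContextRef ref = contextRefs.get(shard);
        if (ref == null) {
            ref = new ContextRef(shard);
            contextRefs.put(shard, ref);
        }
        return ref;
    }
}
```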
sure, I pushed e333cdc
```java
try {
    blocks[offset] = new DocVector(shardRefs::get, shardVector, segmentVector, docVector, null).asBlock();
} catch (Exception e) {
    throw e;
}
```
What do we get by catching and rethrowing?
sorry, leftover
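For reference, with the leftover wrapper removed, the quoted line stands on its own:

```java
blocks[offset] = new DocVector(shardRefs::get, shardVector, segmentVector, docVector, null).asBlock();
```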
So awesome.
I assume this doesn't apply if there are functions applied to dimensions in the grouping, e.g. `TS metrics | STATS sum(rate(reqs)) BY substr(host, 3)`
This change adds an optimization rule for time-series queries that moves reading dimension fields from before the time-series operator to after it, so that each dimension field is read once per group instead of once per document. This is possible because dimension field values are identical across all documents that share the same `_tsid`, i.e., all documents in the same time series. For example:
Without this rule:
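A rough sketch of the shape (illustrative only: `host` and `pod` are assumed dimension fields, and the operator names approximate the physical plan rather than reproduce the PR's actual output):

```
TimeSeriesAggregate[... BY _tsid, ...]
  \_ ValuesSourceReader[host, pod]      <- dimensions read once per document
       \_ TimeSeriesSource[_tsid, ...]
```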
With this rule:
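Under the same assumptions, the rule hoists the dimension reads above the aggregation so they run once per group:

```
ValuesSourceReader[host, pod]           <- dimensions read once per group
  \_ TimeSeriesAggregate[... BY _tsid, ...]
       \_ TimeSeriesSource[_tsid, ...]
```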
Ideally, dimension fields should be read once per `_tsid` in the final result, similar to the fetch phase. Currently, dimension fields are read once per group key in each pipeline; if there are multiple time buckets, dimensions for the same `_tsid` are read multiple times. This can be avoided by extending `ValuesSourceReaderOperator` to understand the ordinals of `_tsid`. I will follow up with this improvement later to keep this PR small.
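For instance, in a query like the following (an illustrative example with assumed field names), each `host`'s dimension values are read once per time bucket rather than once overall:

```
TS metrics | STATS sum(rate(reqs)) BY host, bucket(@timestamp, 1 hour)
```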