Skip to content

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Oct 3, 2025

This change adds an optimization rule for time-series queries that moves reading dimension fields from before the time-series operator to after, reading each dimension field once per group. This is possible because dimension field values for _tsid are identical across all documents in the same time-series.

For example:

TS .. | STATS sum(rate(r1)), sum(rate(r2)) BY cluster, host, tbucket(1m)

Without this rule:

TS ..
| EXTRACT_FIELDS(r1, r2, cluster, host)
| STATS rate(r1), rate(r2), VALUES(cluster), VALUES(host) BY _tsid, tbucket(1m)
| ...

With this rule:

TS ..
| EXTRACT_FIELDS(r1, r2)
| STATS rate(r1), rate(r2), FIRST_DOC_ID(_doc) BY _tsid, tbucket(1m)
| EXTRACT_FIELDS(cluster, host)
| ...

Ideally, dimension fields should be read once per _tsid in the final result, similar to the fetch phase. Currently, dimension fields are read once per group key in each pipeline; if there are multiple time buckets, dimensions for the same _tsid are read multiple times. This can be avoided by extending ValuesSourceReaderOperator to understand the ordinals of _tsid. I will follow up with this improvement later to keep this PR small.

segments.set(groupId, docVector.segments().getInt(valuePosition));
docIds = bigArrays.grow(docIds, groupId + 1);
docIds.set(groupId, docVector.docs().getInt(valuePosition));
contextRefs.computeIfAbsent(shard, s -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Felix noticed separately that this is too slow, mind replacing with a check and insert?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I pushed e333cdc

try {
blocks[offset] = new DocVector(shardRefs::get, shardVector, segmentVector, docVector, null).asBlock();
} catch (Exception e) {
throw e;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we get by catching and rethrowing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, leftover

Copy link
Contributor

@kkrik-es kkrik-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So awesome.

I assume this doesn't apply if there are functions applied to dimensions in grouping, e.g.

TS metrics | STATS sum(rate(reqs)) BY substr(host, 3)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants