
[PLAT-440] roll up routing in metrics view #9180

Open
pjain1 wants to merge 12 commits into main from rollup_mv

Conversation

Member

@pjain1 pjain1 commented Apr 3, 2026

  • Add rollup table config to metrics view proto and YAML parser
  • Implement query routing: eligible rollups are selected based on grain derivability, dimension/measure coverage, timezone match, time range alignment, and time coverage
  • Prefer coarsest grain among eligible rollups; break ties by smallest data range
  • For no-time-range queries ("all data"), verify the rollup covers the base table's full range rather than skipping coverage checks

Checklist:

  • Covered by tests
  • Ran it and it works as intended
  • Reviewed the diff before requesting a review
  • Checked for unhandled edge cases
  • Linked the issues it closes
  • Checked if the docs need to be updated. If so, create a separate Linear DOCS issue
  • Intend to cherry-pick into the release branch
  • I'm proud of this work!

@pjain1 pjain1 closed this Apr 3, 2026
@pjain1 pjain1 reopened this Apr 3, 2026
@pjain1 pjain1 changed the title from "roll up routing in metrics view" to "[PLAT-440] roll up routing in metrics view" Apr 5, 2026
@pjain1 pjain1 requested a review from begelundmuller April 6, 2026 05:32
return wm.min, wm.max, true
}

mn, mx, err := e.fetchTimestamps(ctx, rollup.Database, rollup.DatabaseSchema, rollup.Table)
Member Author

@pjain1 pjain1 Apr 6, 2026

Instead of directly querying for watermarks here, I explored using the metrics_time_range resolver approach, so that we can rely on the user-defined cache_key_ttl and cache_key_sql to avoid fetching the watermark until needed. It also helps with automatically invalidating the cache when rollups are Rill-managed, since the resolver cache key will rely on the metrics view's status updated-on timestamp.

However, the one issue I see is that for external OLAP (the majority of cases), the metrics view cache is disabled by default, so the resolver ends up querying watermarks for all eligible rollups on every single query (contrast with the current behaviour, where the time range is only queried once for the time picker). For that case I was thinking of an L1 cache in this file with a simple time-based TTL of, say, 1 or 5 minutes, and only calling the metrics_time_range resolver once it expires. I already have the changes locally if needed. Thoughts?

Contributor

This sounds alright to me. I'm keen that we keep the time range caching as simple/standalone/re-usable as possible (and I'm worried about the implementation diverging too much from that in Timestamps / BindQuery). See my comments below and also in Slack.

Member Author

Actually, I already pushed this change.

Comment on lines +357 to +358
// IANA timezone the rollup was aggregated in; defaults to UTC
string timezone = 9;
Contributor

nits:

  1. in most other places we call it time_zone, not timezone
  2. move it up before/after the time_grain field to group the time-related fields

syntax = "proto3";
package rill.runtime.v1;

// note - if adding new grain, also update it in executor_rewrite_rollup.go and rollup.go
Contributor

There are many other places in the code besides these that also need to be updated if a new grain is added. I'm not sure it's worth calling out all the exact files; I think it's implicit that if you refactor an enum, you have to check all the code that uses it.

Database string `yaml:"database"`
DatabaseSchema string `yaml:"database_schema"`
TimeGrain string `yaml:"time_grain"`
Timezone string `yaml:"timezone"`
Contributor

nit: time_zone instead of timezone for consistency with other props

Dimensions *FieldSelectorYAML `yaml:"dimensions"`
Measures *FieldSelectorYAML `yaml:"measures"`
} `yaml:"rollups"`
WatermarkCacheTTL string `yaml:"watermark_cache_ttl"`
Contributor

Since we have a cache: key, it feels a little weird for this property not to be part of it. I understand it's different; it just feels a little weird. Mentioning it in case you have any better ideas.

Comment on lines +312 to +320
// Check time dimension column exists
if mv.TimeDimension != "" {
if !cols[strings.ToLower(mv.TimeDimension)] {
res.OtherErrs = append(res.OtherErrs, fmt.Errorf("rollup[%d]: time dimension column %q not found in table %q", i, mv.TimeDimension, rollup.Table))
}
}

// Check dimension columns exist
for _, dim := range rollup.Dimensions {
Contributor

  1. Since we add the default time dimension to the list of dimensions in the spec, doesn't the normal dimension check cover the time dimension as well?
  2. What about time dimensions that use a custom expression?

Comment on lines +319 to +333
// Check dimension columns exist
for _, dim := range rollup.Dimensions {
colName := dim
for _, d := range mv.Dimensions {
if strings.EqualFold(d.Name, dim) {
if d.Column != "" {
colName = d.Column
}
break
}
}
if !cols[strings.ToLower(colName)] {
res.OtherErrs = append(res.OtherErrs, fmt.Errorf("rollup[%d]: dimension column %q not found in table %q", i, colName, rollup.Table))
}
}
Contributor

What about dimensions that use expressions instead of a fixed column name?
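One way to make the check expression-aware, as a rough sketch. The `Dim` type, its fields, and `resolveDim` are hypothetical stand-ins for the spec types, not the PR's actual code:

```go
package main

import "fmt"

// Dim is a hypothetical stand-in for the spec's dimension type; the real
// type in this codebase carries more settings than shown here.
type Dim struct {
	Name       string
	Column     string
	Expression string
}

// resolveDim returns what the existence check should look at: a plain
// column name that can be compared against the table's column metadata,
// or an expression that can only be validated by probing the table.
func resolveDim(d Dim) (target string, isExpr bool) {
	if d.Expression != "" {
		return d.Expression, true // expressions can't be checked via metadata
	}
	if d.Column != "" {
		return d.Column, false
	}
	return d.Name, false // by default the dimension name doubles as the column
}

func main() {
	fmt.Println(resolveDim(Dim{Name: "country", Column: "country_code"}))    // country_code false
	fmt.Println(resolveDim(Dim{Name: "domain", Expression: "lower(domain)"})) // lower(domain) true
}
```

The expression branch could then feed into the same kind of probe query used for measures, instead of the column-lookup path.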

Comment on lines +350 to +352
if len(measureExprs) > 0 {
query := fmt.Sprintf(
"SELECT 1, %s FROM %s GROUP BY 1",
Contributor

Did you consider refactoring/re-using validateAllDimensionsAndMeasures and validateIndividualDimensionsAndMeasures to enable checking rollup tables as well?

Seems like it might be possible by passing a different table name to those functions, and passing an optional dimension/measure selector.

It would be nice to not have validation logic duplicated/diverge.
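For reference, the probe in the diff above works as a dry-run type check: selecting every measure expression under a GROUP BY makes the OLAP engine validate each aggregate against the rollup table. A sketch of assembling it; the WHERE 1=2 guard is my own addition to suggest keeping the probe from scanning rows, and may not match the PR:

```go
package main

import (
	"fmt"
	"strings"
)

// buildProbeQuery assembles a validation query in the same shape as the
// diff above. The "WHERE 1=2" clause is an assumption, not from the PR:
// it lets the engine type-check the expressions without reading data.
func buildProbeQuery(table string, measureExprs []string) string {
	return fmt.Sprintf(
		"SELECT 1, %s FROM %s WHERE 1=2 GROUP BY 1",
		strings.Join(measureExprs, ", "),
		table,
	)
}

func main() {
	fmt.Println(buildProbeQuery("events_daily", []string{"SUM(revenue)", "COUNT(*)"}))
	// → SELECT 1, SUM(revenue), COUNT(*) FROM events_daily WHERE 1=2 GROUP BY 1
}
```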

Comment on lines +13 to +17
type watermarkEntry struct {
min time.Time
max time.Time
fetchedAt time.Time
}
Contributor

I find the word "watermark" confusing here since it's actually a time range / timestamp set. In other places, the watermark is a single timestamp that defaults to MAX(<time dimension>), not a range.

In other places in this package, we call it "timestamps" (see Executor.Timestamps and metricsview.TimestampsResult).

Member Author

Hmm ok can use that.

Comment on lines +19 to +22
var watermarkCache = struct {
mu sync.Mutex
items map[string]watermarkEntry
}{items: make(map[string]watermarkEntry)}
Contributor

I would prefer we avoid having a global variable that caches data like this. The entire executor package is currently stateless, which is a very nice guarantee.

We have had a similar problem of needing to cache time ranges previously, which we solved with caching outside the package and optional binding – see calls to BindQuery for an example. Maybe something similar can be applied here?

It's also worth considering if/how this could be leveraged in the Timestamps function to ensure a consistent treatment of time ranges across the package.

Member Author

@pjain1 pjain1 Apr 7, 2026

Agree, and this has been entirely removed in favor of using the resolver, so it now uses the global resolver cache.
