```go
	return wm.min, wm.max, true
}

mn, mx, err := e.fetchTimestamps(ctx, rollup.Database, rollup.DatabaseSchema, rollup.Table)
```
Instead of directly querying for watermarks here, I explored using metrics_time_range resolver approach so that we can rely on user defined cache_key_ttl and cache_key_sql to not fetch watermark until needed. It also helps with automatically invalidating cache if rollups are Rill managed as the resolver cache key will rely on metrics view status updated on.
However, the only issue I see is for external olap (majority of cases), mv cache is disabled by default so resolver ends up querying watermarks for all eligible rollups for every single query (contrast with current behaviour where time range is only queried once for time picker). In this case I was thinking of an L1 cache having simple time based ttl of lets say 1 or 5 minutes in this file and only if that expires then call metrics_time_range resolver. I already have the changes locally if needed. Thoughts?
This sounds alright to me. I'm eager that we get the time range caching as simple/standalone/re-usable as possible (and I'm worried about the implementation diverging too much from that in Timestamps / BindQuery). See my comments below and also in Slack.
Actually, I already pushed this change.
```proto
// IANA timezone the rollup was aggregated in; defaults to UTC
string timezone = 9;
```
nits:
- in most other places, we call it `time_zone`, not `timezone`
- move it up before/after the `time_grain` field to group time-related fields
```proto
syntax = "proto3";
package rill.runtime.v1;

// note - if adding new grain, also update it in executor_rewrite_rollup.go and rollup.go
```
There are many other places in the code besides these that also need to be updated if a new grain is added. I'm not sure it's worth calling out all the exact files; I think it's implicit that if you refactor an enum, you have to check all the code that uses it.
runtime/parser/parse_metrics_view.go
```go
Database       string `yaml:"database"`
DatabaseSchema string `yaml:"database_schema"`
TimeGrain      string `yaml:"time_grain"`
Timezone       string `yaml:"timezone"`
```
nit: `time_zone` instead of `timezone` for consistency with other props
```go
	Dimensions *FieldSelectorYAML `yaml:"dimensions"`
	Measures   *FieldSelectorYAML `yaml:"measures"`
} `yaml:"rollups"`
WatermarkCacheTTL string `yaml:"watermark_cache_ttl"`
```
Since we have a `cache:` key, it feels a little odd for this property not to be part of it. I understand it's different; it just feels a little inconsistent. Let me know if you have any better ideas.
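One option (purely illustrative; the nested key name is hypothetical) would be to group the TTL under the existing `cache:` key instead of keeping a top-level `watermark_cache_ttl`:

```yaml
cache:
  enabled: true
  # hypothetical placement of the current top-level watermark_cache_ttl
  watermark_ttl: 5m
```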
```go
// Check time dimension column exists
if mv.TimeDimension != "" {
	if !cols[strings.ToLower(mv.TimeDimension)] {
		res.OtherErrs = append(res.OtherErrs, fmt.Errorf("rollup[%d]: time dimension column %q not found in table %q", i, mv.TimeDimension, rollup.Table))
	}
}

// Check dimension columns exist
for _, dim := range rollup.Dimensions {
```
- Since we add the default time dimension to the list of dimensions in the spec, doesn't the normal dimension check cover the time dimension as well?
- What about time dimensions that use a custom expression?
```go
// Check dimension columns exist
for _, dim := range rollup.Dimensions {
	colName := dim
	for _, d := range mv.Dimensions {
		if strings.EqualFold(d.Name, dim) {
			if d.Column != "" {
				colName = d.Column
			}
			break
		}
	}
	if !cols[strings.ToLower(colName)] {
		res.OtherErrs = append(res.OtherErrs, fmt.Errorf("rollup[%d]: dimension column %q not found in table %q", i, colName, rollup.Table))
	}
}
```
What about dimensions that use expressions instead of a fixed column name?
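For expression-based dimensions, a name lookup like the one above cannot work. One way to handle them (a sketch, assuming the dimension spec carries both a `Column` and an `Expression` field as elsewhere in the parser) is to skip the column-existence check and collect those dimensions for a separate probe query:

```go
package main

import (
	"fmt"
	"strings"
)

// Dimension mirrors the relevant fields of the metrics-view spec.
// (Assumption: a dimension has either a fixed Column or an Expression.)
type Dimension struct {
	Name       string
	Column     string
	Expression string
}

// checkDimensionColumns validates fixed-column dimensions against the
// rollup table's column set and collects expression dimensions for a
// separate probe query, since an expression can't be checked by name.
func checkDimensionColumns(dims []Dimension, cols map[string]bool, table string) (errs []error, exprDims []Dimension) {
	for _, d := range dims {
		if d.Expression != "" {
			// Can't resolve an expression to a single column; defer to a
			// "SELECT <expr> FROM <table> LIMIT 0"-style probe instead.
			exprDims = append(exprDims, d)
			continue
		}
		col := d.Column
		if col == "" {
			col = d.Name
		}
		if !cols[strings.ToLower(col)] {
			errs = append(errs, fmt.Errorf("dimension column %q not found in table %q", col, table))
		}
	}
	return errs, exprDims
}
```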
```go
if len(measureExprs) > 0 {
	query := fmt.Sprintf(
		"SELECT 1, %s FROM %s GROUP BY 1",
```
Did you consider refactoring/re-using `validateAllDimensionsAndMeasures` and `validateIndividualDimensionsAndMeasures` to enable checking rollup tables as well?
Seems like it might be possible by passing a different table name to those functions, along with an optional dimension/measure selector.
It would be nice for the validation logic not to be duplicated or to diverge.
```go
type watermarkEntry struct {
	min       time.Time
	max       time.Time
	fetchedAt time.Time
}
```
I find the word "watermark" confusing here, since it's actually a time range / timestamp set. In other places, the watermark is a single timestamp that defaults to `MAX(<time dimension>)`, not a range.
In other places in this package, we call it "timestamps" (see `Executor.Timestamps` and `metricsview.TimestampsResult`).
```go
var watermarkCache = struct {
	mu    sync.Mutex
	items map[string]watermarkEntry
}{items: make(map[string]watermarkEntry)}
```
I would prefer we avoid having a global variable that caches data like this. The entire executor package is currently stateless, which is a very nice guarantee.
We have had a similar problem of needing to cache time ranges previously, which we solved with caching outside the package and optional binding; see calls to `BindQuery` for an example. Maybe something similar can be applied here?
It's also worth considering if/how this could be leveraged in the `Timestamps` function to ensure a consistent treatment of time ranges across the package.
Agreed, and this is now entirely removed in favor of using the resolver, so it relies on the global resolver cache.