-
Notifications
You must be signed in to change notification settings - Fork 2k
Description
Describe the bug:
We are observing an intermittent issue where pre-aggregated data for a specific date (consistently 2 days prior to the current date) is missing when queried as part of a date range (e.g., "Last 30 Days"). This behavior is sporadic and does not occur every day. However, when the issue is present, querying that specific date individually returns the correct data, confirming that the partition exists and is populated.
The issue appears to be specific to the current "T-2" window. Once the calendar day changes, the data for the previously problematic date returns correctly in range queries. The problem does not immediately shift to the new T-2 date; rather, the system returns to a consistent state for all days.
To Reproduce
Steps to reproduce the behavior:
-
Configure Database & Timezone:
- Connect Cube to a Postgres database.
- Set
CUBEJS_SCHEDULED_REFRESH_TIMEZONEStoUTC. - Define a time dimension in the Cube schema that explicitly casts the timestamp to 'Asia/Kolkata' (IST) similar to the following SQL:
e.created_at AT TIME ZONE 'UTC' AT TIME ZONE 'Asia/Kolkata' as created_at
-
Define Pre-aggregation:
- Add a
rollup_lambdathat unions with arolluppre-aggregation to your schema (using the configuration provided below). - Ensure the
rollupusespartitionGranularity: 'day',incremental: true, andupdateWindow: '7 days'. - Set
buildRangeEndtoSELECT CURRENT_DATE.
- Add a
-
Wait for Data Accumulation:
- Allow the pre-aggregations to build and accumulate data over several days.
-
Wait 2 Days:
- Wait until the current date is Date X + 2 (e.g., if you generated data for Feb 12th, wait until Feb 14th).
-
Run Range Query (Check Main Issue):
- Run a query with a date range that includes the date from 2 days ago (T-2).
- Example: Query "Last 7 Days" or "Last 30 Days".
- Observation: Note that data for the T-2 date (e.g., Feb 12th) is intermittently missing from the results.
-
Run Single-Date Query:
- Immediately execute a query specifically for that T-2 date alone (e.g.,
dateRange: ['2026-02-12', '2026-02-12']). - Observation: Note that the data for that date returns correctly, confirming the partition exists and is populated.
- Immediately execute a query specifically for that T-2 date alone (e.g.,
Environment:
- Database: Postgres
- Timezone Configuration: CUBEJS_SCHEDULED_REFRESH_TIMEZONES=UTC
(Note: We initially tried using Asia/Kolkata but reverted to UTC for the refresh timezone, handling conversion in the SQL definition.) - Dimension Definition: The time dimension is cast to the target timezone in SQL:
e.created_at AT TIME ZONE 'UTC' AT TIME ZONE 'Asia/Kolkata' as created_at
Schema Configuration:
We are using a rollup_lambda that unions with a rollup.
preAggregations: {
dailyByStationLambda: {
type: rollup_lambda,
unionWithSourceData: true,
rollups: [CUBE.dailyByStationBatch]
},
dailyByStationBatch: {
type: rollup,
measures: [
CUBE.count,
CUBE.totalAmount,
],
dimensions: [
CUBE.stationId,
CUBE.cityId,
CUBE.clusterId
],
timeDimension: CUBE.createdAt,
granularity: day,
partitionGranularity: day,
buildRangeEnd: {
// Intended to align with Timezone
sql: SELECT CURRENT_DATE
},
refreshKey: {
every: 30 20 * * *, // 8:30 PM UTC daily (corresponds to night in Asia/Kolkata)
updateWindow: 7 days,
incremental: true
}
}
}Observed Behavior:
Query Range (Intermittent Failure): When querying a time range that includes the date T-2 days ago via v1/load, the data for that specific day is sometimes missing.
- Example: On Feb 14th, when querying the last 30 days, the data for Feb 12th (T-2) is missing.
Query Single Date (Success): When querying the problematic date (T-2) directly, the correct value is always returned.
- Example: On Feb 14th, when we pass the date range specifically as Feb 12th to Feb 12th, the data comes through correctly.
Self-Correction / Resolution:
- Calendar Change: Once the calendar date moves forward, the issue resolves completely for all observed dates.
- Example: On Feb 15th, when querying the past 30 days, various data points (including Feb 12th) appear correctly. The system fully recovers.
2.Forced Rebuild: If we force the pre-aggregation partitions to rebuild while the issue is occurring, the missing data reappears immediately and the issue is resolved.
Expected Behavior: The pre-aggregation should consistently return results for a specific date regardless of whether it is queried as part of a range or individually.