Skip to content

Intermittent Data Inconsistency: T-2 Day Data Missing in Range Queries but Available in Single-Date Lookups #10398

@evsrujan2002

Description

@evsrujan2002

Describe the bug:
We are observing an intermittent issue where pre-aggregated data for a specific date (consistently 2 days prior to the current date) is missing when queried as part of a date range (e.g., "Last 30 Days"). This behavior is sporadic and does not occur every day. However, when the issue is present, querying that specific date individually returns the correct data, confirming that the partition exists and is populated.

The issue appears to be specific to the current "T-2" window. Once the calendar day changes, the data for the previously problematic date returns correctly in range queries. The problem does not immediately shift to the new T-2 date; rather, the system returns to a consistent state for all days.

To Reproduce
Steps to reproduce the behavior:

  1. Configure Database & Timezone:

    • Connect Cube to a Postgres database.
    • Set CUBEJS_SCHEDULED_REFRESH_TIMEZONES to UTC.
    • Define a time dimension in the Cube schema that explicitly casts the timestamp to 'Asia/Kolkata' (IST) similar to the following SQL: e.created_at AT TIME ZONE 'UTC' AT TIME ZONE 'Asia/Kolkata' as created_at
  2. Define Pre-aggregation:

    • Add a rollup_lambda that unions with a rollup pre-aggregation to your schema (using the configuration provided below).
    • Ensure the rollup uses partitionGranularity: 'day', incremental: true, and updateWindow: '7 days'.
    • Set buildRangeEnd to SELECT CURRENT_DATE.
  3. Wait for Data Accumulation:

    • Allow the pre-aggregations to build and accumulate data over several days.
  4. Wait 2 Days:

    • Wait until the current date is Date X + 2 (e.g., if you generated data for Feb 12th, wait until Feb 14th).
  5. Run Range Query (Check Main Issue):

    • Run a query with a date range that includes the date from 2 days ago (T-2).
    • Example: Query "Last 7 Days" or "Last 30 Days".
    • Observation: Note that data for the T-2 date (e.g., Feb 12th) is intermittently missing from the results.
  6. Run Single-Date Query:

    • Immediately execute a query specifically for that T-2 date alone (e.g., dateRange: ['2026-02-12', '2026-02-12']).
    • Observation: Note that the data for that date returns correctly, confirming the partition exists and is populated.

Environment:

  • Database: Postgres
  • Timezone Configuration: CUBEJS_SCHEDULED_REFRESH_TIMEZONES=UTC
    (Note: We initially tried using Asia/Kolkata but reverted to UTC for the refresh timezone, handling conversion in the SQL definition.)
  • Dimension Definition: The time dimension is cast to the target timezone in SQL:
    e.created_at AT TIME ZONE 'UTC' AT TIME ZONE 'Asia/Kolkata' as created_at

Schema Configuration:
We are using a rollup_lambda that unions with a rollup.

preAggregations: {
  dailyByStationLambda: {
    type: rollup_lambda,
    unionWithSourceData: true,
    rollups: [CUBE.dailyByStationBatch]
  },
  dailyByStationBatch: {
    type: rollup,
    measures: [
      CUBE.count,
      CUBE.totalAmount,
    ],
    dimensions: [
      CUBE.stationId,
      CUBE.cityId,
      CUBE.clusterId
    ],
    timeDimension: CUBE.createdAt,
    granularity: day,
    partitionGranularity: day,
    buildRangeEnd: {
      // Intended to align with Timezone
      sql: SELECT CURRENT_DATE
    },
    refreshKey: {
      every: 30 20 * * *, // 8:30 PM UTC daily (corresponds to night in Asia/Kolkata)
      updateWindow: 7 days,
      incremental: true
    }
  }
}

Observed Behavior:

Query Range (Intermittent Failure): When querying a time range that includes the date T-2 days ago via v1/load, the data for that specific day is sometimes missing.

  • Example: On Feb 14th, when querying the last 30 days, the data for Feb 12th (T-2) is missing.

Query Single Date (Success): When querying the problematic date (T-2) directly, the correct value is always returned.

  • Example: On Feb 14th, when we pass the date range specifically as Feb 12th to Feb 12th, the data comes through correctly.

Self-Correction / Resolution:

  1. Calendar Change: Once the calendar date moves forward, the issue resolves completely for all observed dates.
  • Example: On Feb 15th, when querying the past 30 days, various data points (including Feb 12th) appear correctly. The system fully recovers.

2.Forced Rebuild: If we force the pre-aggregation partitions to rebuild while the issue is occurring, the missing data reappears immediately and the issue is resolved.

Expected Behavior: The pre-aggregation should consistently return results for a specific date regardless of whether it is queried as part of a range or individually.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions