
Bug: Migrations for Vector break when using it as the xAPI source #1126

@bmtcril

Description


For testing purposes, we've maintained a separate set of migrations for the xapi_events_all table in Vector's openedx ClickHouse database. This lets us run the Celery / event bus pipeline and the Vector pipeline at the same time and compare their data, and it lets us limit Vector's permissions to just that database. However, to use the Vector table as the source of truth for the rest of Aspects, we need to change the ASPECTS_XAPI_DATABASE setting from its default of xapi to openedx.

When we do this, both sets of xapi_events_all migrations try to operate on the same table in openedx, causing collisions and errors (specifically in migration 0011). This has left the Vector configuration unusable for some time.

When migrations run to completion using the default settings, the xapi_events_all tables in openedx and xapi are structurally identical.

I can see three potential ways forward:

1. Consolidate xAPI in one table

  • Drop the openedx version of the tables and have all paths write to a single table in xapi.
  • This simplifies migrations and configuration at the cost of not being able to run both pipelines at the same time. It also means xAPI data is always written to the database whose name matches it, and we won't have extra, probably empty, copies of the table lying around.
  • We would also need to grant Vector write access to the xapi database.

2. Separate config variable for dbt

  • Right now dbt's use of the ASPECTS_XAPI_DATABASE setting to determine where to look for events is what's preventing us from simply creating the table in the right place.
  • Creating a new variable that dbt can use to choose between the source tables here would allow us to continue to create both tables and therefore load the same data into both and test, while untangling the migrations... at the cost of even more configuration.
  • Separately, dbt always uses the ASPECTS_VECTOR_RAW_XAPI_TABLE setting to find the table name, which may not always be correct; we could try to fix that as part of this change as well.
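A minimal sketch of how option 2 could resolve dbt's source table, assuming a new setting named ASPECTS_DBT_XAPI_SOURCE (a hypothetical name, not an existing Aspects option) that selects between the two pipelines. ASPECTS_XAPI_DATABASE keeps its default, so migrations can still create both tables side by side. For illustration the table name is fixed to xapi_events_all rather than read from ASPECTS_VECTOR_RAW_XAPI_TABLE.

```python
from typing import Mapping

# Hypothetical new setting; not an existing Aspects configuration option.
DBT_SOURCE_SETTING = "ASPECTS_DBT_XAPI_SOURCE"

# Map each pipeline to the existing setting that names its ClickHouse database.
SOURCE_DATABASE_SETTINGS = {
    "xapi": "ASPECTS_XAPI_DATABASE",      # Celery / event bus pipeline
    "vector": "ASPECTS_VECTOR_DATABASE",  # Vector pipeline
}


def resolve_dbt_xapi_source(config: Mapping[str, str]) -> str:
    """Return the fully qualified table dbt should read xAPI events from."""
    # Default to the Celery / event bus pipeline when the setting is absent.
    source = config.get(DBT_SOURCE_SETTING, "xapi")
    if source not in SOURCE_DATABASE_SETTINGS:
        raise ValueError(f"Unknown {DBT_SOURCE_SETTING} value: {source!r}")
    database = config[SOURCE_DATABASE_SETTINGS[source]]
    # Illustrative simplification: both pipelines use a table named xapi_events_all.
    return f"{database}.xapi_events_all"


if __name__ == "__main__":
    config = {"ASPECTS_XAPI_DATABASE": "xapi", "ASPECTS_VECTOR_DATABASE": "openedx"}
    print(resolve_dbt_xapi_source(config))  # xapi.xapi_events_all
    print(resolve_dbt_xapi_source({**config, DBT_SOURCE_SETTING: "vector"}))  # openedx.xapi_events_all
```

With these values, dbt keeps reading xapi.xapi_events_all by default, and setting the hypothetical option to "vector" points it at openedx.xapi_events_all without touching any migrations.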

3. Branch in migrations

  • We could skip the Vector migrations if ASPECTS_XAPI_DATABASE == ASPECTS_VECTOR_DATABASE, which would have the effect of creating xapi_events_all only in the openedx database.
  • This would allow migrations to run at the cost of not being able to run both pipelines simultaneously.
  • Additionally, switching configurations to even try to use the Celery / event bus pipeline would result in a broken schema with no recourse.
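Option 3's guard could look roughly like the following. This is an illustrative sketch only: the function names and the "xapi" / "vector" labels for the two migration sets are hypothetical, and it is independent of how Aspects actually wires up its migrations. The setting values mirror the ones discussed in this issue.

```python
from typing import List, Mapping

# Hypothetical labels for the two migration sets described in this issue.
XAPI_MIGRATIONS = "xapi"      # xapi_events_all in ASPECTS_XAPI_DATABASE
VECTOR_MIGRATIONS = "vector"  # xapi_events_all in ASPECTS_VECTOR_DATABASE


def should_run_vector_migrations(config: Mapping[str, str]) -> bool:
    """Return False when both migration sets would target the same database."""
    return config["ASPECTS_XAPI_DATABASE"] != config["ASPECTS_VECTOR_DATABASE"]


def migration_sets_to_run(config: Mapping[str, str]) -> List[str]:
    """List the migration sets to apply; the xAPI set always runs."""
    sets = [XAPI_MIGRATIONS]
    if should_run_vector_migrations(config):
        sets.append(VECTOR_MIGRATIONS)
    return sets


if __name__ == "__main__":
    # Side-by-side testing setup: separate databases, both sets run.
    side_by_side = {"ASPECTS_XAPI_DATABASE": "xapi", "ASPECTS_VECTOR_DATABASE": "openedx"}
    # Vector as the source of truth: same database, Vector set is skipped.
    vector_as_source = {"ASPECTS_XAPI_DATABASE": "openedx", "ASPECTS_VECTOR_DATABASE": "openedx"}
    print(migration_sets_to_run(side_by_side))      # ['xapi', 'vector']
    print(migration_sets_to_run(vector_as_source))  # ['xapi']
```

Because the check only sees the settings in effect when migrations run, the decision is baked in at that point, which is the source of the "no recourse" problem in the last bullet.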

I'm leaning towards option 2, since it has the fewest moving pieces and simplifies the testing story without having to muck with migrations, but I think option 1 is viable and more intuitive. @saraburns1 what do you think?

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
