Description
For testing purposes we've maintained a separate set of migrations for the xapi_events_all table in Vector's openedx ClickHouse database. This lets us run the Celery / event bus pipeline and the Vector pipeline at the same time and compare the data, and it lets Vector's permissions be limited to just that database. However, to use the Vector table as the source of truth for the rest of Aspects, we need to change the ASPECTS_XAPI_DATABASE setting from its default of xapi to openedx.
When we do this, both sets of xapi_events_all migrations try to operate on the same table in openedx, causing collisions and errors (specifically in migration 0011). This has rendered the Vector configuration unusable for some time.
When migrations run to completion using the default settings, the xapi_events_all tables in openedx and xapi are structurally identical.
I can see 3 potential ways forward:
1. Consolidate xAPI in one table
- Drop the openedx version of the tables and have all paths write to a single table in xapi.
- This simplifies migrations and configuration at the cost of not being able to run both pipelines at the same time. It also means that we're always writing xAPI to the database with the name that makes sense for that, and we won't have extra, probably empty, versions of that table lying around.
- We would also need to grant write access to Vector there.
2. Separate config variable for dbt
- Right now dbt's use of the ASPECTS_XAPI_DATABASE setting to determine where to look for events is what's preventing us from simply creating the table in the right place.
- Creating a new variable that dbt can use to choose between the source tables here would allow us to continue to create both tables and therefore load the same data into both and test, while untangling the migrations... at the cost of even more configuration.
- There is also an issue with dbt always using the ASPECTS_VECTOR_RAW_XAPI_TABLE setting to find the table name, which may not always be correct, that we could try to fix as part of this as well.
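As a rough illustration of option 2, the lookup dbt would need could be sketched as below. This is only a sketch: the variable name ASPECTS_DBT_XAPI_SOURCE_DATABASE and the helper function are hypothetical and do not exist in Aspects today, and the settings are modeled as a plain dict rather than the real configuration mechanism. The only default taken from the description above is ASPECTS_XAPI_DATABASE = xapi.

```python
from typing import Mapping


def resolve_dbt_xapi_source_database(settings: Mapping[str, str]) -> str:
    """Pick the database dbt should read raw xAPI events from.

    ASPECTS_DBT_XAPI_SOURCE_DATABASE is a hypothetical new setting. When it
    is unset or empty, fall back to ASPECTS_XAPI_DATABASE so that existing
    deployments keep their current behavior.
    """
    # Default of "xapi" is the documented default for ASPECTS_XAPI_DATABASE.
    xapi_db = settings.get("ASPECTS_XAPI_DATABASE", "xapi")
    # Hypothetical override used only by dbt, not by the migrations.
    override = settings.get("ASPECTS_DBT_XAPI_SOURCE_DATABASE", "")
    return override or xapi_db
```

With something like this, migrations could keep creating xapi_events_all in both databases using the unchanged ASPECTS_XAPI_DATABASE, and only dbt's source would move to openedx when the override is set.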
3. Branch in migrations
- We could skip the Vector migrations if ASPECTS_XAPI_DATABASE == ASPECTS_VECTOR_DATABASE, which would have the effect of creating xapi_events_all only in the openedx database.
- This would allow migrations to run at the cost of not being able to run both pipelines simultaneously.
- Additionally, switching configurations to even try to use the Celery/event bus pipelines would result in a broken schema with no recourse.
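For comparison, the guard that option 3 describes might look roughly like the following. It is a minimal sketch, not the actual migration code: the helper name is hypothetical, settings are modeled as a plain dict, and the openedx default for ASPECTS_VECTOR_DATABASE is an assumption based on the description of Vector's database above.

```python
from typing import Mapping


def should_run_vector_migrations(settings: Mapping[str, str]) -> bool:
    """Return False when both pipelines would target the same database.

    Hypothetical helper: a Vector-specific migration would call this and
    return early (a no-op) when it yields False.
    """
    # "xapi" is the documented default; "openedx" is an assumed default.
    xapi_db = settings.get("ASPECTS_XAPI_DATABASE", "xapi")
    vector_db = settings.get("ASPECTS_VECTOR_DATABASE", "openedx")
    return xapi_db != vector_db
```

Note that the result depends on the settings at the time migrations run, which is why changing the configuration later could leave the schema in the broken state mentioned above.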
I'm leaning towards 2 since it has the fewest moving pieces and simplifies the testing story without having to muck with migrations, but I think 1 is viable and more intuitive. @saraburns1 what do you think?