Commit
Merge popularity calculations and data refresh into a single DAG (#496)
* First pass at combining popularity calc, matview refresh, and data refresh DAGs
* Remove the old popularity refresh DAGs
* Use the data_refresh timeout from the configuration in the DAG
* Only force a single pool slot for the wait_for_data_refresh task
* Refactor TaskGroups out into factories and update docs
* Update docs, fix conditional
* Add config option to force metrics refresh when not the first of the month
* Fix log message
* Remove deleted DAG tests
* Use timeout from data refresh config
* Document the force_refresh_metrics option
* Add link to this issue in the docs for reference
* Safely get options from config, fix timeout
* Consider dagrun first of the month if previous runs failed
* Don't refresh metrics if option is explicitly configured to False

  If `force_refresh_metrics` is explicitly configured to False (rather than
  omitted), then do not refresh the popularity metrics even if this is the
  first successful run of the month. This option could be helpful if, for
  example, the first dagrun of the month succeeds during the popularity steps
  but fails during the data refresh. When we manually re-run the DAG, we can
  save time by skipping this step.

* Use current dagrun start_date instead of datetime.now()
* Test the month_check operator
* Fix dates in tests, only clean up DagRuns associated to the test Dag
* Handle case where there isn't a successful previous run
* Make task names more explicit
* Clean up unused param
* Remove unused param in tests as well
* Fix type, pull out constant
* Clean up queries and operators
* Add docs to tasks
* Add type for media_type
* Inline docs and CamelCase type
* Remove MediaType type for now
* Clarify flow of data through the data refresh in comments
* Fix type string
* Update timeouts for popularity refresh tasks
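
The `force_refresh_metrics` rules described in the message above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function name `should_refresh_metrics` and its parameters are hypothetical, and it assumes that "first successful run of the month" means the most recent successful dagrun started in a different calendar month than the current one (or that no successful run exists yet).

```python
from datetime import datetime
from typing import Optional


def should_refresh_metrics(
    current_start: datetime,
    last_success_start: Optional[datetime],
    force_refresh_metrics: Optional[bool] = None,
) -> bool:
    """Decide whether the popularity metrics should be refreshed.

    Hypothetical helper; names and signature are assumptions for illustration.
    """
    # An explicitly configured option wins in both directions:
    # True forces a refresh, False skips it even on the first run of the month.
    if force_refresh_metrics is not None:
        return force_refresh_metrics

    # No successful previous run: treat this run as the first of the month.
    if last_success_start is None:
        return True

    # Failed runs are ignored by the caller, so only the last *successful*
    # run is compared against the current dagrun's start_date.
    return (last_success_start.year, last_success_start.month) != (
        current_start.year,
        current_start.month,
    )
```

Passing the current dagrun's `start_date` in as an argument, rather than calling `datetime.now()` inside the function, keeps the check deterministic and easy to test with fixed dates.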