Use the batched_update DAG with stored CSVs to update Catalog URLs
#3415
Labels
💻 aspect: code
Concerns the software code in the repository
🌟 goal: addition
Addition of new feature
🟧 priority: high
Stalls work on the project or its dependents
🧱 stack: catalog
Related to the catalog and Airflow DAGs
Problem
We have generated some CSVs with identifier and another column that we need to use to update the Catalog media table, but we don't have a way to efficiently run the media table updates.
Description
The batched update DAG is a reusable DAG that can be used to perform an arbitrary batched update on a Catalog media table, while handling deadlocking and timeout concerns.
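To illustrate the batching idea only, here is a minimal sketch that splits a known number of rows into consecutive ranges, so that each range can be updated in its own short statement. The function name iter_batches and the row-range approach are assumptions made for this example; they are not taken from the batched update DAG itself.

```python
from collections.abc import Iterator


def iter_batches(row_count: int, batch_size: int) -> Iterator[tuple[int, int]]:
    """Yield (start, end) row ranges, end exclusive, that cover row_count rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, row_count, batch_size):
        # The last range is clipped so it never runs past row_count.
        yield start, min(start + batch_size, row_count)


if __name__ == "__main__":
    for start, end in iter_batches(10, 4):
        print(f"update rows {start}..{end - 1}")
```

Keeping each batch small limits how long any single update statement holds its locks, which is the general reason batching helps with deadlocks and timeouts.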
During the cleanup process in data refresh, we generate the CSVs that contain the item identifier and the cleaned up version of another column (title, url, foreign_landing_url, creator_url and tags). We need a DAG that is similar to the batched update DAG, but can use a CSV table for selecting the items that need to be updated.
It is important that this work does not delete any tags. The tag column, while present in the CSVs, should not be used.
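As a rough sketch of the requirement above, the update query could be built from an allowlist of columns that leaves out tags. This assumes the CSV has already been loaded into a table with identifier and the cleaned column, and that the database accepts a PostgreSQL-style UPDATE ... FROM. The function name build_update_query and the table names in the usage example are hypothetical.

```python
# Illustrative only: function and table names here are hypothetical.
# Columns that may be overwritten from a CSV. "tags" is deliberately absent.
UPDATABLE_COLUMNS = frozenset({"title", "url", "foreign_landing_url", "creator_url"})


def build_update_query(media_table: str, csv_table: str, column: str) -> str:
    """Build an UPDATE that copies one cleaned column from a CSV-backed table."""
    if column not in UPDATABLE_COLUMNS:
        raise ValueError(f"Refusing to update column {column!r}")
    for name in (media_table, csv_table):
        # Names are interpolated into SQL, so accept plain identifiers only.
        if not name.isidentifier():
            raise ValueError(f"Invalid table name {name!r}")
    return (
        f"UPDATE {media_table} AS m "
        f"SET {column} = c.{column} "
        f"FROM {csv_table} AS c "
        f"WHERE m.identifier = c.identifier"
    )


if __name__ == "__main__":
    print(build_update_query("image", "url_cleanup", "url"))
```

Rejecting tags at query-building time means a CSV that happens to include that column cannot trigger a tag update by accident.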
Additional context
The CSV files are saved in the docker container of the ingestion server when we run data refresh.