Fetcher DAG development #496
Conversation
for local development use file://<path to rikolti data>
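A minimal sketch of what consuming such a `file://` value can look like; the function name `split_data_uri` and the example path are hypothetical, not code from this PR:

```python
from urllib.parse import urlparse

# Hypothetical illustration: split a file://<path> data destination
# into its scheme and local filesystem path using the stdlib urlparse.
def split_data_uri(uri):
    parsed = urlparse(uri)
    return parsed.scheme, parsed.path

scheme, path = split_data_uri("file:///usr/local/airflow/rikolti_data")
```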
As noted in Slack, we have to change lines 11-12 of metadata_fetcher/lambda_function.py to be:

```python
fetcher_module = importlib.import_module(
    f".fetchers.{harvest_type}_fetcher", package="rikolti.metadata_fetcher")
```

in order for this to run in aws-mwaa-local-runner.
However, this causes a command line run such as the following to fail:

```
python3 -m metadata_fetcher.fetch_registry_collections "https://registry.cdlib.org/api/v1/rikoltifetcher/?format=json&mapper_type=tv_academy_oai_dc&ready_for_publication=true"
```
The same applies to the `importlib.import_module` call in metadata_mapper/lambda_function.py.
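The relative-import mechanism being discussed can be sketched with a stdlib package standing in for `rikolti.metadata_fetcher` (the real anchor package is not importable outside the repo):

```python
import importlib

# A leading dot makes the module path relative; package= supplies the
# anchor package. Here urllib stands in for rikolti.metadata_fetcher.
parse_mod = importlib.import_module(".parse", package="urllib")
# equivalent to: import urllib.parse
```

This only works when the anchor package itself is importable, which is why a bare command-line invocation from the repository root can fail after the change.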
Nice solution to the importlib.import_module package name problem!
Argh sorry, my bad -- please disregard everything below!
A problem I ran across when running the fetcher_dag in aws-mwaa-local-runner is that the Docker container needs to write to a location inside the container, rather than to the mounted location on the host disk. So in metadata_fetcher/Fetcher.py on lines 48-52, it needs to be:

```python
def get_local_path(self):
    local_path = os.sep.join([
        '/usr/local/airflow/rikolti_data/vernacular_metadata',
        str(self.collection_id),
    ])
```
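A minimal standalone sketch of the path construction above; the `collection_id` value is a hypothetical stand-in:

```python
import os

# On a POSIX container os.sep is '/', so joining on it builds the
# in-container vernacular_metadata path for a given collection.
collection_id = 12345  # hypothetical example value
local_path = os.sep.join([
    '/usr/local/airflow/rikolti_data/vernacular_metadata',
    str(collection_id),
])
```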
breaking: update local env variables
The merge-base changed after approval.
This is a PR representing the work @barbarahui and I did in writing a first fetcher DAG. Since writing to s3 is not fully supported by the codebase, this work is coordinated with some updates to ucldc/aws-mwaa-local-runner described here: ucldc/aws-mwaa-local-runner#3, and is built on top of the DATA_DEST environment variable work here: #495. This PR, in contrast to the DATA_DEST environment variable work, is exclusively DAG-specific work.

- dags/operate_sample_dag.py is a DAG written using the classic airflow PythonOperator.
- dags/taskflow_sample_dag.py is a DAG written using the airflow taskflow model. This file also includes a variety of small tasks demonstrating specific, but relevant, airflow features.
- dags/fetcher_dag.py is a functional fetcher DAG.

In order to run the fetcher DAG, you must have:

- the ucldc/aws-mwaa-local-runner:dags-development branch
- an aws-mwaa-local-runner/docker/.env file
- ./mwaa-local-env build-image
- the dags folder:

```
# startup.sh
FETCHER_DATA_DEST=file:///usr/local/airflow/rikolti_data
MAPPER_DATA_DEST=file:///usr/local/airflow/rikolti_data
MAPPER_DATA_SRC=file:///usr/local/airflow/rikolti_data
```

```
./mwaa-local-env start
```
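An illustrative sketch (not code from this PR) of how a task might resolve the FETCHER_DATA_DEST variable set above into a local path; the fallback default is an assumption for demonstration:

```python
import os
from urllib.parse import urlparse

# Read the data-destination URI from the environment, falling back to
# the in-container path used in this PR (assumed default).
data_dest = os.environ.get(
    "FETCHER_DATA_DEST",
    "file:///usr/local/airflow/rikolti_data",
)
parsed = urlparse(data_dest)
# Only file:// destinations map to a local filesystem path.
dest_path = parsed.path if parsed.scheme == "file" else None
```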