Add instructions how to run ingestion script from command line #4083

Merged
13 changes: 13 additions & 0 deletions documentation/catalog/guides/adding_a_new_provider.md
@@ -97,6 +97,19 @@ in the
as well as a corresponding test file. Complete the TODOs detailed in the
generated files to implement behavior specific to your API.

To run ingestion outside of the workflow, you can execute the provider script
directly from the command line. The resulting TSV file will be saved in your
`/tmp/tmp` folder.

- First, make sure that the directory with the scripts is in your `PYTHONPATH`:
```
export PYTHONPATH=<path_to_openverse>/catalog/dags
```
- Then `cd` into the Openverse directory and run the script like so:
```
python catalog/dags/providers/provider_api_scripts/<your_script>.py
```
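
The two steps above can also be combined in a small launcher. The following is only a sketch, not part of the Openverse codebase: `build_env` and `run_provider_script` are hypothetical helper names, and the sketch assumes the repository layout shown in the commands above. It prepends `catalog/dags` to any existing `PYTHONPATH` rather than overwriting it, then runs the script from the Openverse root.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_env(openverse_root, base_env=None):
    """Return a copy of the environment with catalog/dags on PYTHONPATH.

    Hypothetical helper: an existing PYTHONPATH value is kept and the
    dags directory is placed in front of it.
    """
    env = dict(os.environ if base_env is None else base_env)
    dags_dir = str(Path(openverse_root) / "catalog" / "dags")
    existing = env.get("PYTHONPATH")
    if existing:
        env["PYTHONPATH"] = dags_dir + os.pathsep + existing
    else:
        env["PYTHONPATH"] = dags_dir
    return env


def run_provider_script(openverse_root, script_name):
    """Run one provider script from the Openverse root and return its exit code.

    Hypothetical helper: script_name is the file name of your script inside
    catalog/dags/providers/provider_api_scripts.
    """
    script = Path("catalog/dags/providers/provider_api_scripts") / script_name
    result = subprocess.run(
        [sys.executable, str(script)],
        cwd=openverse_root,  # same effect as cd-ing into the Openverse directory
        env=build_env(openverse_root),
    )
    return result.returncode
```

Calling `run_provider_script("<path_to_openverse>", "<your_script>.py")` would then be equivalent to the two shell steps, without changing the `PYTHONPATH` of your current shell session.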

Some APIs may not fit perfectly into the established `ProviderDataIngester`
pattern. For advanced use cases and examples of how to modify the ingestion
flow, see the