Merge pull request #7 from georgetown-cset/fix-multi-json
Exclude schema
jmelot authored Apr 25, 2024
2 parents ea3cc70 + a9d2f7f commit 07e4819
Showing 2 changed files with 4 additions and 2 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -1,4 +1,4 @@
# ror-etl
-Automates ETL of [ROR](https://ror.org) via `ror_dag.py`.
+Automates ETL of [ROR](https://ror.org) via `ror_dag.py`.

(CSET users) To update Airflow artifacts, run `bash push_to_airflow.sh`.
4 changes: 3 additions & 1 deletion ror_scripts/fetch.py
@@ -29,7 +29,9 @@ def fetch(output_bucket: str, output_loc: str) -> None:
f.write(zip_resp.content)
ZipFile(zip_f).extractall(td)
print(f"Downloaded content: {os.listdir(td)}")
-json_files = [js for js in os.listdir(td) if js.endswith(".json")]
+json_files = [
+    js for js in os.listdir(td) if js.endswith(".json") and ("schema" not in js)
+]
assert len(json_files) == 1
output_file = os.path.join(td, output_loc.split("/")[-1])
with open(os.path.join(td, json_files[0])) as f:
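
The filter changed in this hunk can be exercised in isolation. The following is a minimal sketch, not code from the repository: `select_data_json` is a hypothetical helper name, and the file names in `listing` are made up for illustration. It assumes the archive contains exactly one data JSON file plus, possibly, a schema file whose name contains the substring `schema`.

```python
from typing import Iterable


def select_data_json(filenames: Iterable[str]) -> str:
    """Return the single non-schema .json filename, or raise if ambiguous.

    Hypothetical helper mirroring the list comprehension in fetch().
    """
    candidates = [
        name
        for name in filenames
        if name.endswith(".json") and "schema" not in name
    ]
    # fetch() uses a bare assert here; an explicit error reports what was found.
    if len(candidates) != 1:
        raise ValueError(f"Expected exactly one data JSON file, found {candidates}")
    return candidates[0]


# Illustrative listing (made-up names): one data file, one schema file, one CSV.
listing = ["ror-data.json", "ror-data_schema.json", "ror-data.csv"]
print(select_data_json(listing))  # prints "ror-data.json"
```

Note that `"schema" not in name` is a case-sensitive substring test, so a file named `Schema.json` would not be excluded, and a data file whose name happens to contain `schema` would be.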
