Ensure done files cleared out in case of retry
jmelot committed Jan 23, 2024
1 parent a4436c8 commit ec804ef
Showing 2 changed files with 2 additions and 0 deletions.
utils/run_ids_scripts.sh (1 addition, 0 deletions)
@@ -1,4 +1,5 @@
 cd /mnt/disks/data/run
+gsutil rm gs://airflow-data-exchange/article_linkage/tmp/done_files/ids_are_done
 python3 create_merge_ids.py --match_dir usable_ids --prev_id_mapping_dir prev_id_mapping --merge_file id_mapping.jsonl --current_ids_dir article_pairs
 /snap/bin/gsutil -m cp id_mapping.jsonl gs://airflow-data-exchange/article_linkage/tmp/
 /snap/bin/gsutil -m cp simhash_results/* gs://airflow-data-exchange/article_linkage/simhash_results/
utils/run_simhash_scripts.sh (1 addition, 0 deletions)
@@ -1,4 +1,5 @@
 cd /mnt/disks/data/run
+gsutil rm gs://airflow-data-exchange/article_linkage/tmp/done_files/simhash_is_done
 python3 run_simhash.py simhash_input simhash_results --simhash_indexes simhash_indexes --new_simhash_indexes new_simhash_indexes
 cp -r article_pairs usable_ids
 cp simhash_results/* article_pairs/
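Both scripts get the same one-line change: delete the job's "done" marker in GCS before doing any work, so that a marker left over from an earlier run cannot be mistaken for completion of the current attempt. The part of each script that writes the marker again is not visible in these hunks, and whatever waits on the marker is not part of this diff, so the following is only a local simulation under those assumptions. A temporary directory stands in for gs://airflow-data-exchange/article_linkage/tmp/done_files/, `rm -f` and `touch` stand in for `gsutil rm` and an upload, and `DONE_FILE` and `run_job` are hypothetical names.

```shell
#!/usr/bin/env bash
# Local sketch of "clear the done marker before running".
# Assumption: DONE_DIR stands in for the GCS done_files/ prefix;
# DONE_FILE and run_job are hypothetical names for illustration only.
set -euo pipefail

DONE_DIR="$(mktemp -d)"
DONE_FILE="$DONE_DIR/ids_are_done"
trap 'rm -rf "$DONE_DIR"' EXIT

run_job() {
  # Remove any marker left by an earlier run before doing any work,
  # so a watcher cannot treat a stale marker as this run's completion.
  rm -f "$DONE_FILE"
  # ... the real work (e.g. create_merge_ids.py) would run here ...
  if [ "${1:-ok}" = "fail" ]; then
    return 1  # simulate a failure partway through
  fi
  # Write the marker only after all of the work has succeeded.
  touch "$DONE_FILE"
}

# A previous successful run left its marker behind.
touch "$DONE_FILE"

# This attempt fails: the stale marker is already gone, so nothing
# downstream sees the failed attempt as finished.
run_job fail || echo "attempt failed; marker present: $([ -e "$DONE_FILE" ] && echo yes || echo no)"

# The retry succeeds and writes a fresh marker.
run_job ok
echo "retry finished; marker present: $([ -e "$DONE_FILE" ] && echo yes || echo no)"
```

Running it prints `attempt failed; marker present: no` followed by `retry finished; marker present: yes`. One difference from the local `rm -f`: `gsutil rm` reports an error when no object matches the URL, for example on the very first run. The lines shown in these hunks do not enable `set -e`, so the scripts carry on; a script that did use `set -e` would need something like `|| true` after the `gsutil rm` line.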
