BUG: catch another situation where invalid manifest lines were not handled

added test to address this
fedorov committed Nov 25, 2024
1 parent 8b9d192 commit 7968c36
Showing 2 changed files with 4 additions and 1 deletion.
4 changes: 3 additions & 1 deletion idc_index/index.py
@@ -916,7 +916,9 @@ def _validate_update_manifest_and_get_download_size(
 REGEXP_EXTRACT(manifest_cp_cmd, '(?:.*?\\/){{3}}([^\\/?#]+)', 1) AS manifest_crdc_series_uuid,
 REGEXP_REPLACE(regexp_replace(manifest_cp_cmd, 'cp ', ''), '\\s[^\\s]*$', '') AS s3_url,
 FROM
-manifest_df )
+manifest_df
+WHERE
+REGEXP_REPLACE(regexp_replace(manifest_cp_cmd, 'cp ', ''), '\\s[^\\s]*$', '') IS NOT NULL)
 SELECT
 seriesInstanceuid,
 index_crdc_series_uuid,
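
As a rough illustration of what the two SQL expressions in this hunk compute, the following Python sketch reproduces them with the standard `re` module. The helper names (`extract_series_uuid`, `extract_s3_url`, `drop_null_urls`) are hypothetical and not part of idc_index. Modeling SQL NULL as `None`, and returning `None` for a line that does not match, are assumptions of the sketch; the SQL engine's regex semantics may differ in detail from Python's.

```python
import re

# Python `re` stand-ins for the two SQL expressions in the hunk above.
# Assumption: SQL NULL is modeled as None and propagates through both helpers.


def extract_series_uuid(cp_cmd):
    """Return the path segment after the third '/', or None if absent."""
    if cp_cmd is None:
        return None
    match = re.search(r"(?:.*?/){3}([^/?#]+)", cp_cmd)
    return match.group(1) if match else None


def extract_s3_url(cp_cmd):
    """Drop the first 'cp ' and the trailing whitespace-delimited token."""
    if cp_cmd is None:
        return None
    without_cp = re.sub(r"cp ", "", cp_cmd, count=1)
    return re.sub(r"\s[^\s]*$", "", without_cp)


def drop_null_urls(cp_cmds):
    """Mimic the added WHERE ... IS NOT NULL filter on a list of rows."""
    rows = [(extract_series_uuid(c), extract_s3_url(c)) for c in cp_cmds]
    return [row for row in rows if row[1] is not None]


rows = drop_null_urls([
    "cp s3://idc-open-data/28621ba9-1aca-4aab-a2a1-f6d2c3e2ab19/* .",
    None,  # hypothetical row whose command could not be read
])
print(rows)
```

Only the first row survives the filter in this sketch; the `None` row is dropped before the outer SELECT would see it, which is the behavior the added WHERE clause is meant to provide.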
1 change: 1 addition & 0 deletions tests/study_manifest_aws.s5cmd
@@ -1,6 +1,7 @@
 # To download the files in this manifest, first install s5cmd (https://github.com/peak/s5cmd),
 # then run the following command:
 # s5cmd --no-sign-request --endpoint-url https://s3.amazonaws.com run study_manifest_aws.s5cmd
+study_manifest_cp_command
 cp s3://idc-open-data/28621ba9-1aca-4aab-a2a1-f6d2c3e2ab19/* .
 cp s3://idc-open-data/f0b76401-c6d1-4b61-a5fd-3fa596e6cc41/* .
 cp s3://idc-open-data/4ea3bbe6-98da-4b92-abe6-2ee18927e3c9/* .
