-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement DRS URI localization. #148
Conversation
…racting with APIs that authenticate with Google credentials.
…e GDC localization command
@@ -376,13 +395,14 @@ def localization_command(self, dest): | |||
## GDC HTTPS URLs {{{ | |||
class HandleGDCHTTPURL(FileType): | |||
localization_mode = "url" | |||
gdc_drs_root = "drs://dg.4dfc:" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will DRS URLs always have this root?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GDC ones will, yes, that is the one for NCI CRDC.
self._size = int(headers["Content-Length"]) | ||
self._hash = headers["Content-MD5"] | ||
self.uri = type(self).gdc_drs_root + self.uuid | ||
self.drs_obj = HandleDRSURI(self.uri, **self.extra_args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better way to handle the logic of checking if a drs:// URL exists for a GDC HTTP URL besides try
/catch
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the Python way! It takes advantage of the fact that a straight call to DRS will fail if it doesn't exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be a little smarter and look for the 404 or 500 error return code, but 404 means the DRS isn't there, and 500 would be a server error, in both of which cases I'd think we'd want to fall back to using the API instead.
Maybe the 500 might be worth several retries, but they just tested the API recently to show it can handle 30K requests/s without falling over.
signed_url = f'$(curl -S -X POST --url "{type(self).drs_resolver}" ' + \ | ||
'-H "authorization: Bearer $(gcloud auth print-access-token)" ' + \ | ||
f'-H "content-type: application/json" --data \'{data_str}\' | ' + \ | ||
'python3 -c \'import json,sys; print(json.load(sys.stdin)["accessUrl"]["url"])\')' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lol nice hack
Update to use DRS URIs, and attempt to use them on detecting GDC API URLs.
This feature takes advantage of Terra's DRSHub. So long as the user is using the same account they registered with to Terra, and has their external identities properly linked in their profile, this will enable acquiring signed URLs for data download, which is several times quicker than downloading via the GDC API.
This also improves compatibility with Terra workspaces, which may use DRS URIs in place of bucket paths.