Configure generic_https_download for non-preemptible vms #742
amyminiter merged 5 commits into main
@EddieLF especially want feedback on the name of the new option. My question is whether the naming of this option will make sense to us in a week's time.
scripts/generic_https_transfer.py
Outdated
for idx, url in enumerate(presigned_urls):
    filename = names[idx] if names else os.path.basename(url).split('?')[0]
    j = batch.new_job(f'URL {idx} ({filename})')
    j.spot(is_spot=non_preemptible_vm)
“Spot” means the same as “preemptible”… so with the variable being true for a non-preemptible VM this would need to be is_spot=not non_preemptible_vm.
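A minimal sketch of the inversion described in this comment, assuming a boolean named `non_preemptible_vm` as in the diff. The helper `spot_setting` is hypothetical and only isolates the value that would be passed to `j.spot(...)`.

```python
def spot_setting(non_preemptible_vm: bool) -> bool:
    """Return the value to pass as is_spot (hypothetical helper).

    "Spot" and "preemptible" mean the same thing, so asking for a
    non-preemptible VM must turn spot off.
    """
    return not non_preemptible_vm


# Requesting a non-preemptible VM means is_spot must be False.
print(spot_setting(True))   # False
print(spot_setting(False))  # True
```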
FYI analysis-runner has a similar option, like this: and is the same as This is implemented as a flag option named
Drive-by: I also think that it's good style to frame options in the positive, which would avoid --no-non-preemptible-vm being a thing.
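To illustrate the positive framing, here is a sketch using the standard library's `argparse` as a stand-in for click (click supports a similar paired `--flag/--no-flag` form). The option name `--preemptible-vm` and its default are assumptions for illustration, not what the script ended up using.

```python
import argparse

parser = argparse.ArgumentParser()
# Positive framing: the flag names the behaviour that is on by default.
# BooleanOptionalAction (Python 3.9+) generates both
# --preemptible-vm and --no-preemptible-vm from one declaration.
parser.add_argument(
    '--preemptible-vm',
    action=argparse.BooleanOptionalAction,
    default=True,
)

print(parser.parse_args([]).preemptible_vm)                       # True
print(parser.parse_args(['--no-preemptible-vm']).preemptible_vm)  # False
```

With this shape the negated form reads as `--no-preemptible-vm`, with no double negative.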
@jmarshall Fixed to use env_config and removed @folded agreed!
scripts/generic_https_transfer.py
Outdated
output_prefix = env_config['workflow']['output_prefix']
preemptible_vm = env_config['workflow'].get('preemptible_vm', False)

assert all({billing_project, cpg_driver_image, dataset, output_prefix})
Now open question @jmarshall and @folded
How do we evaluate the truthiness of a variable which is likely to be False the majority of the time? Is it a good alternative to use the get method (like I have done) with False as the default value to ensure that this variable is populated?
Yes, get like that with the default is a good approach, and you'll see it used in a few places in e.g. server/ar.py.
Do you want the default when preemptible_vm is absent from the config to be preemptible spot VMs like this script has previously used, or do you want to change the default to be non-preemptible?
We want the default to be preemptible (i.e. preemptible_vm = true) because this is the common use case across the CPG data team.
Makes sense. This means you want the default to be ….get('preemptible_vm', True).
(The natural state of a job is job.spot(True). On a newly created job, only job.spot(False) actually has an effect in changing the state, to non-preemptible. Me, I find this very confusing and have to look it up every time!)
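A small sketch of how the config default and `spot` fit together, assuming `env_config` is a plain nested dict. `FakeJob` and `configure_job` are hypothetical stand-ins so the example runs without a batch backend; only the `.get('preemptible_vm', True)` lookup and the `spot(is_spot=...)` call mirror the discussion.

```python
class FakeJob:
    """Hypothetical stand-in for a batch job; starts out preemptible (spot)."""

    def __init__(self) -> None:
        self.is_spot = True

    def spot(self, is_spot: bool) -> 'FakeJob':
        self.is_spot = is_spot
        return self


def configure_job(env_config: dict, job: FakeJob) -> FakeJob:
    # An absent key falls back to True, i.e. preemptible spot VMs.
    preemptible_vm = env_config['workflow'].get('preemptible_vm', True)
    return job.spot(is_spot=preemptible_vm)


default_job = configure_job({'workflow': {}}, FakeJob())
pinned_job = configure_job({'workflow': {'preemptible_vm': False}}, FakeJob())
print(default_job.is_spot, pinned_job.is_spot)  # True False
```

Only an explicit `preemptible_vm = false` in the config changes the job's state; leaving the key out keeps the previous spot behaviour.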
@EddieLF this is good to go (pending my question on Slack RE the security checks). With @jmarshall, changed the default variable for the preemptible machines to make this
This PR adds a non-preemptible-vm flag for generic_https_upload.py to mitigate against jobs being preempted in Hail Batch. This change is required to support download of 10x long read uBAMs (~400GB each), which in a previous attempt to download were preempted several times before the job was cancelled.
Changes
click.option --non-preemptible-vm to allow the user to toggle between preemptible and non-preemptible machines. This is automatically set to False.