
Add ability to overwrite file retention level for specific executables from config #4435

Merged

Conversation

@GarethCabournDavies (Contributor) commented Jul 13, 2023

Use case:

  • We are reusing large bank and phase-time-amp files across many analyses.

Problem:

  • We don't want to copy the file into every run directory: it wastes disk space and breaks the "single source of truth" by creating multiple copies of the same information.

Solution:

  • Add an option in the pegasus_profile section for an executable that overrides its retention level via the pycbc|retention_level flag, e.g.:
[pegasus_profile-bank2hdf]
pycbc|retention_level = do_not_keep
  • Add an extra retention level, 0, to the available choices; it is below the global_retention_threshold, so the comparison evaluates False and the file is not retained.
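The override described above could be sketched roughly as follows. This is a minimal illustration using Python's stdlib configparser rather than pycbc's WorkflowConfigParser, and the level names in _retention_choices (other than do_not_keep, which appears in the PR) are placeholders:

```python
import configparser

# Hypothetical mapping of config-file names to numeric retention levels;
# only 'do_not_keep' = 0 is taken from the PR description.
_retention_choices = {
    'do_not_keep': 0,
    'intermediate_product': 1,
    'merged_triggers': 2,
    'all_triggers': 3,
    'key_result': 4,
}

def retention_level_for(cp, exe_name, default_level):
    """Return the retention level for exe_name, honoring a config override."""
    section = f'pegasus_profile-{exe_name}'
    if cp.has_section(section) and cp.has_option(section,
                                                 'pycbc|retention_level'):
        choice = cp.get(section, 'pycbc|retention_level')
        return _retention_choices[choice]
    return default_level

cp = configparser.ConfigParser()
cp.read_string("""\
[pegasus_profile-bank2hdf]
pycbc|retention_level = do_not_keep
""")

print(retention_level_for(cp, 'bank2hdf', 2))   # 0: overridden, not retained
print(retention_level_for(cp, 'inspiral', 2))   # 2: falls back to the default
```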

Testing:

  • File is not retained in the standard offline workflow
  • File is not retained when supplied via a --cache-file during workflow generation

@GarethCabournDavies GarethCabournDavies changed the title Draft: Add ability to overwrite file retention level for specific executables from config Add ability to overwrite file retention level for specific executables from config Jul 14, 2023
@@ -47,6 +47,15 @@
from .configuration import WorkflowConfigParser, resolve_url
from .pegasus_sites import make_catalog

# FIXME: Are these names suitably descriptive?
@spxiwh (Contributor) commented:
Note that there are two distinct "values" here, which are somewhat confused and confusing. See also the declarations on lines 73-76.

There's the global retention level: "This workflow will keep jobs if they are level X or higher". For the global retention level (which these options are), lower numbers mean "keep more", so the level 0 choice here doesn't really make much sense, as level 0 should be "keep everything" at the global level.

Then there's the local level (lines 73-76), which is "How important are the files produced by this Job?". In that case, higher means more likely to be stored (from 4 = always store me, to 0 = never store me). I think a DO_NOT_KEEP level 0 option might belong at line 72, but not here.

Then I just need one more comment on a later line...

cfg_ret_level = cp.get_opt_tags('pegasus_profile-%s' % name,
                                'pycbc|retention_level', tags)
self.update_current_retention_level(
    _retention_choices[cfg_ret_level])
@spxiwh (Contributor) commented:
This is the local, not the global, retention level. It should therefore refer to the stuff in lines 73-76. Be aware that there is a "level 5" for the local retention level, which is "user needs to set this, warn them and set to 1" (see lines 497, 95 and 100). This should be moved to 6 if you wanted a "NEVER_KEEP" possibility.


... Rethinking that, as level 4 is "ALWAYS_KEEP", so "NEVER_KEEP" would be 0 and the default remains the special case at 5.
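The local retention scale being discussed could be sketched as below. The constant names and the handling of the sentinel are assumptions reconstructed from the review comments, not the actual pycbc source:

```python
import warnings

# Local ("how important are this job's files?") retention levels.
# Names are illustrative; only the numeric meanings come from the review:
# 4 = ALWAYS_KEEP, 0 = proposed NEVER_KEEP, 5 = "unset" sentinel.
NEVER_KEEP = 0    # proposed addition: never store these files
ALWAYS_KEEP = 4   # always store these files
UNSET = 5         # special case: user never set a level; warn and use 1

def effective_retention_level(level):
    """Resolve a job's local retention level, handling the unset sentinel."""
    if level == UNSET:
        warnings.warn("No retention level set; defaulting to 1")
        return 1
    return level
```

Keeping the default as the special case at 5 means NEVER_KEEP = 0 slots in below the existing range without renumbering anything.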

@GarethCabournDavies

I have updated this to use the local level rather than the global one (this actually seems a lot neater)

@GarethCabournDavies GarethCabournDavies enabled auto-merge (squash) July 19, 2023 08:45
@titodalcanton (Contributor) left a comment:


Seems functionally ok to me. I am assuming @spxiwh's comments above have been addressed/answered. You could remove some duplication by introducing a variable

retention_opt_args = (
    f'pegasus_profile-{name}',
    'pycbc|retention_level',
    tags
)

and then passing *retention_opt_args to the two functions that need them.
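The de-duplication suggested here can be illustrated as follows. StubConfig is a hypothetical stand-in for pycbc's WorkflowConfigParser, used only so the example runs; get_opt_tags appears in the diff, while has_option_tags and its signature are assumed from context:

```python
class StubConfig:
    """Minimal stand-in for WorkflowConfigParser (illustration only)."""
    def __init__(self, data):
        self._data = data

    def has_option_tags(self, section, option, tags):
        return (section, option) in self._data

    def get_opt_tags(self, section, option, tags):
        return self._data[(section, option)]

name = 'bank2hdf'
tags = []

# Bundle the three repeated arguments once...
retention_opt_args = (
    f'pegasus_profile-{name}',
    'pycbc|retention_level',
    tags,
)

cp = StubConfig({('pegasus_profile-bank2hdf', 'pycbc|retention_level'):
                 'do_not_keep'})

# ...and unpack them at each call site.
if cp.has_option_tags(*retention_opt_args):
    cfg_ret_level = cp.get_opt_tags(*retention_opt_args)

print(cfg_ret_level)  # do_not_keep
```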

@GarethCabournDavies GarethCabournDavies merged commit ba7273d into gwastro:master Aug 4, 2023
36 checks passed
@GarethCabournDavies GarethCabournDavies deleted the reuse_nocopy branch August 4, 2023 09:53
@maxtrevor (Contributor) left a comment:


All looks good to me

@maxtrevor (Contributor) commented:

Never mind, I see this was already reviewed and merged by Tito lol

PRAVEEN-mnl pushed a commit to PRAVEEN-mnl/pycbc that referenced this pull request Nov 3, 2023
…s from config (gwastro#4435)

* Add ability to overwrite file retention level for specific instances

* Allow different retention levels rather than just not keeping

* CC

* Global and Executable retention levels are not the same

* Fix incorrect indent
bhooshan-gadre pushed a commit to bhooshan-gadre/pycbc that referenced this pull request Mar 4, 2024
acorreia61201 pushed a commit to acorreia61201/pycbc that referenced this pull request Apr 4, 2024
4 participants