automatically remove parallelization files #285

Closed
callistosp opened this issue Aug 11, 2022 · 8 comments

@callistosp

When running a NONMEM model in parallel, many folders with the prefix WK_ are produced. It would be nice if bbi automatically removed these files. There are also files produced after every run with the extension .o# and .po# where the # is the number assigned by the grid. I think these could also be automatically handled by bbi to prevent buildup of junk files in version control when saving a whole model directory.

@shairozan
Contributor

It would probably also make sense to update the .gitignore file so that, even if the files get left around, they wouldn't be committed in a git environment.

seth127 added this to the next milestone Aug 16, 2022
@jacobleander

Picking up this thread again. I would also prefer having these files deleted as part of the cleanup of the model directory. If there are users who would really like these files to not be deleted, could there be a global option in bbi where this can be turned on or off (default setting should then be that the files are removed)? Thoughts, @seth127?

@seth127
Collaborator

seth127 commented Sep 22, 2023

Thanks for getting this going again @jacobleander. I think this is the right idea:

> could there be a global option in bbi where this can be turned on or off (default setting should then be that the files are removed)

We currently have a --clean_lvl flag that controls our deletion of other NONMEM temp files. I would think we could use that to control this behavior as well. Any thoughts on that @kyleam ?

@seth127
Collaborator

seth127 commented Sep 22, 2023

Also, @kyleam if we decide to do this through --clean_lvl, we should consider revisiting this old issue while we're in there: #194

@jacobleander, if you or anyone else has any thoughts on that, feel free to comment on that issue as well. The original intention of the --clean_lvl and --copy_lvl flags was related to legacy PsN behavior that we were trying to either mimic or intentionally change. However, we've gotten a bit away from that in the past few years (primarily because most bbi users actually use it via bbr instead of using the CLI directly). If we do revisit this, it would be useful to have user commentary on when these flags might be useful and what the most useful behavior would be.

kyleam added a commit that referenced this issue Sep 22, 2023
Under --parallel, NONMEM creates several WK_* files.  These are worker
files created for parallelization problems and are used as file
buffers.  The NONMEM Users Guide Introduction to NONMEM 7.5.0 (pg 73)
says

  One can alternatively assess empirically whether file buffers are
  used, by beginning the run, allowing perhaps one iteration to
  transpire, then from another command window do a directory search
  for FILE*, (or WK* for worker files in parallelization problems,
  section I.72 Parallel Computing (NM72)

  If any of the FILExx do not have 0 size, then they are being
  used. Interrupt the analysis, then increase the appropriate LIM
  value with the $SIZES record [...]

So, a non-empty file is a signal that the user probably wants to make
adjustments so that everything fits into memory.  An empty file, on
the other hand, is safe for us to clean up [*].

Due to the regexp match and the extra size condition, this doesn't
work nicely as part of getCleanableFileList().  filesToCleanup()
already has special handling for some parallelization, so add the WK
handling there.

[*] NONMEM cleans up FILE* buffer files after a run.  I'm not sure why
    it doesn't also clean up WK*.

Re: #285
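
The selection rule described in the commit message above (name matches a pattern, and the file is empty) can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the code from the commit: the pattern `^WK_` and the helper names `shouldRemove` and `emptyWorkerFiles` are hypothetical, and the sketch only lists candidates rather than deleting them.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
)

// workerFileRe is an assumed pattern for NONMEM worker files (names that
// start with "WK_"). The regexp bbi actually uses may differ.
var workerFileRe = regexp.MustCompile(`^WK_`)

// shouldRemove (hypothetical helper) reports whether a directory entry looks
// like an empty worker file: the name matches the pattern and the size is 0.
// A non-empty file is kept because it signals that file buffers were used.
func shouldRemove(name string, size int64) bool {
	return workerFileRe.MatchString(name) && size == 0
}

// emptyWorkerFiles (hypothetical helper) returns the paths of empty WK_*
// regular files directly inside dir.
func emptyWorkerFiles(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var paths []string
	for _, e := range entries {
		if !e.Type().IsRegular() {
			continue // skip directories, symlinks, and other non-regular entries
		}
		info, err := e.Info()
		if err != nil {
			return nil, err
		}
		if shouldRemove(e.Name(), info.Size()) {
			paths = append(paths, filepath.Join(dir, e.Name()))
		}
	}
	return paths, nil
}

func main() {
	paths, err := emptyWorkerFiles(".")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range paths {
		fmt.Println("would remove:", p)
	}
}
```

Keeping the name-and-size check in a small pure function makes the rule easy to test separately from the directory walk.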
@kyleam
Collaborator

kyleam commented Sep 22, 2023

I've put up a draft at gh-306. There's an issue with the CI that needs to be sorted out before considering merging that in.

@seth127:

> We currently have a --clean_lvl flag that controls our deletion of other NONMEM temp files. I would think we could use that to control this behavior as well. Any thoughts on that @kyleam ?

Conceptually I think that makes sense, but there's already a spot in filesToCleanup that takes care of cleaning up some parallelization-related files. It doesn't consider the clean level, so in gh-306 I decided to just follow that. I suppose we could update that whole block to be skipped when clean_level=0, but I think in practice unconditionally cleaning empty WK_ files is probably fine, at least until we take on something like the more comprehensive rework proposed in gh-194.
Let me know if you think the clean_level guard is worth doing at this point.
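
For reference, the guard being discussed could look something like the sketch below. It is only an illustration of the idea: `parallelCleanupTargets` is a hypothetical function, not part of bbi, and the only behavior assumed is the one mentioned above (skip the parallel cleanup block when the clean level is 0).

```go
package main

import "fmt"

// parallelCleanupTargets is a hypothetical stand-in for the parallel-related
// block in filesToCleanup. With the guard, nothing from that block is
// scheduled for removal when cleanLvl is 0; otherwise the candidates are
// returned unchanged.
func parallelCleanupTargets(cleanLvl int, candidates []string) []string {
	if cleanLvl == 0 {
		return nil // clean level 0: leave parallelization files in place
	}
	return candidates
}

func main() {
	candidates := []string{"WK_1", "WK_2"}
	fmt.Println(len(parallelCleanupTargets(0, candidates)), "files at clean level 0")
	fmt.Println(len(parallelCleanupTargets(1, candidates)), "files at clean level 1")
}
```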


@callistosp:

> There are also files produced after every run with the extension .o# and .po# where the # is the number assigned by the grid

With gh-306, I focused just on the WK_* files. What to do in terms of cleaning the .o# (should always have output, I think) and .po# (in bbi context, perhaps just has output if parallel environment setup fails, not sure) files seems less clear-cut to me. Either way, I think it makes sense to do separately from gh-306.
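
To make the .o# and .po# naming concrete, here is a small Go sketch that only identifies such files. The pattern is an assumption based on the description in this issue (an extension of `.o` or `.po` followed by a grid-assigned number); actual names depend on the grid scheduler, and `isGridOutput` is a hypothetical helper, not a bbi function. Nothing is deleted, since what to do with these files is still undecided.

```go
package main

import (
	"fmt"
	"regexp"
)

// gridOutputRe is an assumed pattern: a name ending in ".o" or ".po"
// followed by one or more digits (the job number assigned by the grid).
var gridOutputRe = regexp.MustCompile(`\.p?o[0-9]+$`)

// isGridOutput (hypothetical helper) reports whether name looks like a
// .o# or .po# file as described in this issue.
func isGridOutput(name string) bool {
	return gridOutputRe.MatchString(name)
}

func main() {
	for _, name := range []string{"run.o12345", "run.po12345", "run.lst"} {
		fmt.Printf("%s: %v\n", name, isGridOutput(name))
	}
}
```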

kyleam added a commit that referenced this issue Oct 6, 2023
Re: #285
@kyleam
Collaborator

kyleam commented Oct 6, 2023

With the merge of gh-306, WK_* files are automatically cleaned up. I've left this issue open due to this part, which has not been addressed:

> There are also files produced after every run with the extension .o# and .po# where the # is the number assigned by the grid

As I said above, I'm not sure about what, if anything, to do with those.

I'll wait a bit for further feedback and thoughts on that, but my preference would be to close this issue and either 1) leave the .o# and .po# files be or 2) open a dedicated issue for how to handle them.

@seth127
Collaborator

seth127 commented Oct 10, 2023

Thanks @kyleam

> my preference would be to close this issue and either 1) leave the .o# and .po# files be or 2) open a dedicated issue for how to handle them.

I agree with closing this issue now, and I would lean towards just opening a fresh one to consider what to do with the .o# and .po# files. Those contain output from the process running on the grid, and deserve some more investigation and probably discussion before making any moves on cleaning them up.

@kyleam
Collaborator

kyleam commented Oct 10, 2023

Thanks @seth127. Follow-up issue posted at gh-312.

kyleam closed this as completed Oct 10, 2023