Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mako Batch/Interactive Templates & Build Fixes #307

Merged
merged 8 commits into from
Jan 14, 2024

Conversation

henryleberre
Copy link
Member

@henryleberre henryleberre commented Jan 14, 2024

The last PR before the Benchmarking one.

Closes #292, #240, and #287.

@henryleberre henryleberre force-pushed the handshake branch 2 times, most recently from f12bd33 to b395b86 Compare January 14, 2024 01:06
docs/documentation/running.md Outdated Show resolved Hide resolved
misc/run-phoenix-release-cpu.sh Outdated Show resolved Hide resolved
toolchain/mfc/args.py Show resolved Hide resolved
toolchain/mfc/run/run.py Show resolved Hide resolved
toolchain/templates/default.mako Outdated Show resolved Hide resolved
@sbryngelson
Copy link
Member

I think this PR was going to fix the issue of not using multiple ranks with ./mfc.sh test -j X on the Phoenix CPU runner. does it?

@henryleberre
Copy link
Member Author

I think this PR was going to fix the issue of not using multiple ranks with ./mfc.sh test -j X on the Phoenix CPU runner. does it?

I had forgotten to include it. Just did. I added:

               --bind-to none                             \

So when using GT Phoenix, you just need to use -c phoenix while testing and/or running (interactive & batch) MFC.

@henryleberre henryleberre force-pushed the handshake branch 4 times, most recently from 5e71cfe to aa1db8f Compare January 14, 2024 02:09
@sbryngelson
Copy link
Member

sbryngelson commented Jan 14, 2024

so the new command to test on an mpirun system is ./mfc.sh test -j X -- --binary mpirun --gpu? this seems to be what I need when testing your PR on Phoenix.

it is also re-building even though the GLOBs are the same?

@henryleberre
Copy link
Member Author

henryleberre commented Jan 14, 2024

so the new command to test on an mpirun system is ./mfc.sh test -j X -- --binary mpirun --gpu? this seems to be what I need when testing your PR on Phoenix.

it is also re-building even though the GLOBs are the same?

For Phoenix, I would recommend just:

./mfc.sh test -j X --gpu -- -c phoenix

But this also works now:

./mfc.sh test -j X --gpu -- -b mpirun -f --bind-to none

I looked into #288 and it's not quite fixed yet. It is really quite strange. I think it has to do with CMake expecting its binary directory to be empty upon creation and thus skipping GLOB checks within that directory.

@sbryngelson
Copy link
Member

so the new command to test on an mpirun system is ./mfc.sh test -j X -- --binary mpirun --gpu? this seems to be what I need when testing your PR on Phoenix.
it is also re-building even though the GLOBs are the same?

For Phoenix, I would recommend just:

./mfc.sh test -j X --gpu -- -c phoenix

But this also works now:

./mfc.sh test -j X --gpu -- -b mpirun -f --bind-to none

I looked into #288 and it's not quite fixed yet. It is really quite strange. I think it has to do with CMake expecting its binary directory to be empty upon creation and thus skipping GLOB checks within that directory.

got it

@henryleberre
Copy link
Member Author

Part of the appeal of this PR (other than there being a template per system) is that:

  • Templates are now used for both Batch & Interactive execution. This gives us more flexibility and makes it a lot easier to understand and modify the behavior of these template files. You no longer need to parse through toolchain/mfc/run/engines.py. It doesn't exist anymore!
  • Instead of my handcrafted and regex-powered template engine, I switched to an all-around better solution, Mako.
  • A lot of indirection was removed (through the support of -- in argument parsing in many places).
  • Refactored system() in toolchain/mfc/ so that whenever we execute a command, we no longer depend on the shell. We provide a list of arguments rather a command-string. It's a lot more difficult to create a bug by introducing spaces in a command or variable name.

@sbryngelson
Copy link
Member

This looks great - glad to merge once tests pass. We can also now run CPU tests much faster.

@henryleberre henryleberre force-pushed the handshake branch 2 times, most recently from aa2eac8 to ed8dd7a Compare January 14, 2024 05:36
@henryleberre
Copy link
Member Author

@sbryngelson I just fixed the rebuild issue! Closes #288.

@henryleberre henryleberre marked this pull request as ready for review January 14, 2024 05:39
@henryleberre henryleberre changed the title Batch files per computer Mako Batch/Interactive Templates & Build Fixes Jan 14, 2024
@sbryngelson
Copy link
Member

@sbryngelson I just fixed the rebuild issue! Closes #288.

Thank goodness! It was very strange. What was it?

@sbryngelson
Copy link
Member

@henryleberre you have access to NCSA Delta and Purdue Anvil as well. Can you add appropriate batch templates for those computers?

@henryleberre
Copy link
Member Author

@sbryngelson I just fixed the rebuild issue! Closes #288.

Thank goodness! It was very strange. What was it?

There were two issues:

  • We only generated the case-optimization FPP files while building, not also while configuring. This causes a GLOB mismatch.
  • When I added their generation to the configure step, I had forgotten we also delete the binary directory before running CMake (we have good reasons for doing this). So the files were actually never there while CMake was configuring the MFC targets.

@henryleberre
Copy link
Member Author

@henryleberre you have access to NCSA Delta and Purdue Anvil as well. Can you add appropriate batch templates for those computers?

Could these be added later? It might be a good idea for users to get acquainted with this way of doing things, adding Mako files as they need them.

@sbryngelson
Copy link
Member

@henryleberre you have access to NCSA Delta and Purdue Anvil as well. Can you add appropriate batch templates for those computers?

Could these be added later? It might be a good idea for users to get acquainted with this way of doing things, adding Mako files as they need them.

Yes that's fine. Merge when appropriate.

@henryleberre henryleberre merged commit 3a36844 into MFlowCode:master Jan 14, 2024
17 checks passed
@henryleberre henryleberre deleted the handshake branch January 14, 2024 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment