@scheibelp
Collaborator

Closes #485

Say you've created several experiments in several Benchpark/Ramble workspaces:

benchpark system init --dest elcap llnl-elcapitan cluster=elcapitan
benchpark experiment init --dest amg2023 amg2023+rocm
benchpark experiment init --dest kripke kripke+rocm
...
bin/benchpark setup amg2023/ elcap/ workspace1/
bin/benchpark setup kripke/ elcap/ workspace2/
...
ramble --workspace-dir `pwd`/workspace1/amg2023/elcap/workspace workspace setup

(i.e. you've run ramble workspace setup several times).

This adds a script called aggregate.py, which you run like:

$ python aggregate.py groups workspace1/ workspace2/
# workspace1 and workspace2 are benchpark workspaces
# "groups" is the path to a new directory that this command will create

This finds all execute_experiment scripts in each workspace and partitions them by their batch requests. All execute_experiment scripts with the same batch allocation are placed together in a single script inside the specified groups directory, like:

$ head groups/*
==> groups/0.sh <==
# flux: -N 2
.../workspace/amg2023/elcap/workspace1/experiments/amg2023/problem1/amg2023_problem1_single_node_rocm_caliper_none_2_2_2_80_80_80_8/execute_experiment

==> groups/1.sh <==
# flux: -N 1
.../workspace/kripke/elcap/workspace2/experiments/kripke/kripke/kripke_kripke_single_node_rocm_caliper_none_64_1_128_128_4_2_2_1_64_64_32_4/execute_experiment
/a/path/to/some/other/execute_experiment

==> groups/2.sh <==
# flux: -N 8
.../workspace/lammps/elcap/workspace1/experiments/lammps/hns-reaxff/lammps_hns-reaxff_single_node_rocm_20_40_32_8_8_64/execute_experiment
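
To make the grouping step concrete, here is a minimal sketch of how scripts could be partitioned by batch request. It is not the code from this PR: the directive prefixes in DIRECTIVE_PREFIXES, the function names, and the choice to treat the sorted set of directive lines as the grouping key are all assumptions made for illustration.

```python
import os
from collections import defaultdict

# Assumption: batch requests appear as comment lines with one of these prefixes.
DIRECTIVE_PREFIXES = ("# flux:", "#flux:", "#SBATCH", "#BSUB")


def batch_directives(script_path):
    """Return the batch-directive lines of one script as a sorted tuple."""
    with open(script_path) as f:
        stripped = (line.strip() for line in f)
        # Sorting makes the key independent of the order of the directives.
        return tuple(sorted(s for s in stripped if s.startswith(DIRECTIVE_PREFIXES)))


def find_scripts(workspaces):
    """Yield the absolute path of every execute_experiment file found."""
    for ws in workspaces:
        for root, dirs, files in os.walk(ws):
            dirs.sort()  # deterministic traversal order
            if "execute_experiment" in files:
                yield os.path.abspath(os.path.join(root, "execute_experiment"))


def group_by_allocation(workspaces):
    """Map each distinct set of batch directives to the scripts that use it."""
    groups = defaultdict(list)
    for script in find_scripts(workspaces):
        groups[batch_directives(script)].append(script)
    return groups


def write_groups(groups, dest):
    """Write one <index>.sh per group: directives first, then script paths."""
    os.makedirs(dest)  # fails if dest already exists; it must be a new directory
    for index, (directives, scripts) in enumerate(groups.items()):
        with open(os.path.join(dest, f"{index}.sh"), "w") as f:
            f.write("\n".join(directives + tuple(scripts)) + "\n")


def main(argv):
    """argv mirrors the command line: [dest, workspace, workspace, ...]."""
    dest, *workspaces = argv
    write_groups(group_by_allocation(workspaces), dest)
```

Under these assumptions, calling main(["groups", "workspace1/", "workspace2/"]) would mirror the command shown above. Because a workspace listed twice is walked twice, its scripts would also appear twice in the resulting group.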

@scheibelp scheibelp marked this pull request as draft June 30, 2025 20:22
@scheibelp scheibelp requested a review from pearce8 June 30, 2025 20:37
@michaelmckinsey1
Collaborator

@scheibelp Could you do python aggregate.py groups workspace1/ workspace1/ to run the same experiment in the same allocation like 2 trials?

@scheibelp
Collaborator Author

> Could you do python aggregate.py groups workspace1/ workspace1/ to run the same experiment in the same allocation like 2 trials?

It will do that (although that wasn't my original intent). However, if you rerun the exact same experiment, I think it keeps writing its output (e.g. data including the FOM) to the same file, so results from all but the last run would be lost. The experiment template could potentially be rewritten to distinguish the output of successive runs.
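
As a rough illustration of distinguishing output for successive runs, the sketch below picks the first unused run-numbered file name. The helper name next_run_path and the ".runN" naming scheme are hypothetical. Nothing like this exists in the experiment templates as far as this thread shows, and a real fix would likely live in the template itself rather than in Python.

```python
from pathlib import Path


def next_run_path(base):
    """Return the first path of the form <stem>.run<N><suffix> that does not exist yet."""
    base = Path(base)
    n = 1
    while True:
        candidate = base.with_name(f"{base.stem}.run{n}{base.suffix}")
        if not candidate.exists():
            return candidate
        n += 1
```

Each run would write to the path returned by next_run_path, so a second trial of the same experiment would not overwrite the first.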

If workspace1 and workspace2 contain the same experiment, then aggregate.py would run both instances and the results would be distinct (e.g. if you had run benchpark setup experimenty systemx workspace1; benchpark setup experimenty systemx workspace2).

@pearce8
Collaborator

pearce8 commented Jul 28, 2025

@scheibelp we also need this described in the docs.

@pearce8 pearce8 added the changes requested label Jul 28, 2025
@michaelmckinsey1
Collaborator

superseded by #1036

Development

Successfully merging this pull request may close these issues.

Multi-experiment jobs
