Merged
33 commits
6e74f8d
Update schema to support new exclusive syntax for separate launcher/a…
jwhite242 Aug 13, 2025
4bee81e
Initial prototype of optional launcher/allocation args pass through h…
jwhite242 Aug 13, 2025
4ccc470
Properly apply step exclusive key as override to batch/adapter settin…
jwhite242 Aug 14, 2025
ea2d3b4
update method type on exclusive sanitizer
jwhite242 Aug 14, 2025
cd632b9
Initial documentation for new optional argument handling in batch block
jwhite242 Aug 15, 2025
1e824e1
Fix stray refactoring debris
jwhite242 Aug 26, 2025
0a79699
Fixtures and tests for verifying flux script writing correctly passes…
jwhite242 Sep 10, 2025
dfb0aea
Guard flux script writer to only run when flux is present
jwhite242 Sep 11, 2025
4e70971
Add helper for flattening dictionaries into dot syntax strings
jwhite242 Sep 19, 2025
d450bbd
Update flux script test and test spec to use new dot string util for …
jwhite242 Sep 19, 2025
af62de2
Add test for verifying live jobspecs get correct allocation and launc…
jwhite242 Sep 19, 2025
567e51c
Add funcs for working with dotpath dicts/tuples and tests
jwhite242 Oct 7, 2025
cebd217
Add tests of non scalar leaf values in dotpath machinery
jwhite242 Oct 7, 2025
20d3409
Add value coercion for dotpath utils, recursive update
jwhite242 Oct 10, 2025
c923e74
Refactor additional arg handling to deal with python api end points n…
jwhite242 Oct 10, 2025
5236603
Replace fail with assertionerror to avoid stopping entire test sessio…
jwhite242 Oct 10, 2025
def5b65
Add extra info to failed jobspec tests to show full spec in case thin…
jwhite242 Oct 10, 2025
9bcc059
Finish wiring up queue/bank and update test specs/expected outputs
jwhite242 Oct 15, 2025
93c05f0
Move INFO lines to allow in step directives, fix test data to reflect…
jwhite242 Oct 16, 2025
fb11228
Add docstrings/tests for dict utils
jwhite242 Oct 16, 2025
4e50a92
Remove deepcopy comments
jwhite242 Oct 16, 2025
d4079bd
Misc tweaks and fixes, and update batch script outputs to reflect nor…
jwhite242 Oct 16, 2025
25d7482
Rework exclusive handling for proper layering of steps ontop of batch…
jwhite242 Oct 22, 2025
74c4361
Address review feedback to clarify examples
jwhite242 Oct 23, 2025
85864b6
Wire up proper filtering and logging of unhandled allocation args in …
jwhite242 Oct 24, 2025
69aa1cb
Cleanup comments/misc formatting fixes
jwhite242 Oct 24, 2025
5869383
Fix coverage to work properly when not at project root
jwhite242 Oct 24, 2025
8673a42
Add emacs tmp/backup files to gitignore
jwhite242 Oct 24, 2025
b6dd8f7
Tick dev version
jwhite242 Oct 27, 2025
122a97b
Rewrite flux jobspec diff fixture for improved flexibility on filteri…
jwhite242 Oct 30, 2025
1742833
Add target dependent coercion/normalization such that more user frien…
jwhite242 Oct 30, 2025
a3f2a97
Add jobname and output file templates to batch script target to match…
jwhite242 Oct 30, 2025
af938fb
Add job name, output files, and procs to header. Update INFO directi…
jwhite242 Nov 24, 2025
6 changes: 6 additions & 0 deletions .gitignore
@@ -9,6 +9,12 @@ sample_output/
# VSCode
.vscode

# Emacs backup files
*~
# Emacs auto-save files
\#*#
.#*

# Doxygen output
docs/html/

86 changes: 85 additions & 1 deletion docs/Maestro/scheduling.md
@@ -23,9 +23,13 @@ steps:
| `procs` | No | int | Optional number of tasks in batch allocations: note this is also a per step key |
| `flux_uri` | Yes* | str | URI of the Flux instance to schedule jobs to. * Only used with `type`=`flux`. NOTE: It is recommended to rely on environment variables instead, as URIs are very ephemeral and may change frequently.|
| `version` | No | str | Optional version of flux scheduler; for accommodating api changes |
| `args` | No | dict | Optional additional args to pass to scheduler; keys are arg names, values are arg values |
| :warning: `args` | No | dict | Optional additional args to pass to scheduler; keys are arg names, values are arg values |
| `allocation_args` | No | dict | Optional scheduler specific options/flags to add to allocations :material-information-slab-circle: flux only in :material-tag:`1.1.12` |
| `launcher_args` | No | dict | Optional scheduler specific options/flags to add to launcher commands :material-information-slab-circle: flux only in :material-tag:`1.1.12` |

!!! warning "`args` deprecated"

    `args` has been marked deprecated in :material-tag:`1.1.12` in favor of the more flexible `allocation_args` and `launcher_args`.

The information in this block is used to populate the step specific batch scripts with the appropriate
header comment blocks (e.g. '#SBATCH --partition' for slurm). Additional keys such as step specific
@@ -40,6 +44,28 @@ run locally unless at least the ``nodes`` or ``procs`` key in the step is popula

See [queues and banks](how_to_guides/running_with_flux.md#queues-and-banks) section in the how-to guide on running with flux for more discussion.

### Extra Arguments
---

Version :material-tag:`1.1.12` adds two dictionaries to the batch block for passing custom options to allocations and to `$(LAUNCHER)` invocations independently. They are meant for options that Maestro cannot abstract across schedulers more generally:

* `allocation_args` for the allocation target (batch directives such as `#Flux: --setopt=foo=bar`)
* `launcher_args` for the `$(LAUNCHER)` target (`flux run --setopt=foo=bar`)

These are loosely structured mappings designed to mimic CLI and batch script directive syntax, so that raw scheduler usage maps intuitively onto Maestro. Each key in these dictionaries corresponds to a scheduler-specific CLI argument, option, or flag. The serialization rules are as follows; the examples show the initial implementation in the flux adapter, and other schedulers will have prefix/separator rules specific to their implementation:

| **Key Type** | **Prefix** | **Separator** | **Example YAML** | **Example CLI Input/Directive** |
| :- | :- | :- | :- | :- |
| Single letter | `-` | `" "` (space) | <pre><code><span>o:</span></br><span> bar: 42</span></code></pre> | `-o bar=42` |
| Multi-letter | `--` | `=` | <pre><code><span>setopt:</span></br><span> foo: bar</span></code></pre> | `--setopt=foo=bar` |
| Boolean flag w/key | as above | as above | <pre><code><span>setopt:</span></br><span> foobar: #</span></code></pre> | `--setopt=foobar` |
| Boolean flag w/o key | as above | as above | `exclusive: #` | `--exclusive` |

!!! note "Boolean/Flag type arguments"

    In the boolean flag strategies, a space is required after the `:` in `foobar: ` or `exclusive: `; otherwise YAML will fail to parse the key and assign the null value used to tag it as a boolean flag. The examples above add a `#` after the space to ensure the space is present. See [flux](#Flux) for special considerations for the `allocation_args`.
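
The serialization rules in the table above can be sketched in a few lines of Python. This is only an illustration of the prefix/separator convention, not Maestro's actual implementation: the function name `render_args` is hypothetical, and the handling of plain scalar values (the final `else` branch) is an assumption that the table does not cover.

```python
def render_args(args):
    """Yield CLI-style option strings from a nested args mapping (illustrative)."""
    for key, value in args.items():
        # Single-letter keys use '-' and a space; longer keys use '--' and '='
        prefix, sep = ("-", " ") if len(key) == 1 else ("--", "=")
        if value is None:
            # Boolean flag without a key, e.g. `exclusive: #`
            yield f"{prefix}{key}"
        elif isinstance(value, dict):
            for name, val in value.items():
                if val is None:
                    # Boolean flag with a key, e.g. `setopt: {foobar: None}`
                    yield f"{prefix}{key}{sep}{name}"
                else:
                    yield f"{prefix}{key}{sep}{name}={val}"
        else:
            # Assumption: a plain scalar value attaches directly to the option
            yield f"{prefix}{key}{sep}{value}"


example = {
    "o": {"bar": 42},
    "setopt": {"foo": "bar", "foobar": None},
    "exclusive": None,
}
rendered = list(render_args(example))
print(rendered)
# ['-o bar=42', '--setopt=foo=bar', '--setopt=foobar', '--exclusive']
```

A YAML null (what `exclusive: #` parses to) arrives in Python as `None`, which is why `None` marks the flag-style entries here.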


## LAUNCHER Token
---

@@ -187,6 +213,64 @@ See the [flux framework](https://flux-framework.readthedocs.io/en/latest/index.h
The Flux scheduler itself and Maestro's flux adapter are still in a state of flux and may go through breaking changes more frequently than the Slurm and LSF scheduler adapters.


### Extra Flux Args
----

As of :material-tag:`1.1.12`, the flux adapter takes advantage of the new argument pass through for scheduler options that Maestro cannot abstract away. This is done via `allocation_args` and `launcher_args` in the batch block, which expand upon the previous `args` input that applied only to `$(LAUNCHER)`. There are some caveats here due to the way Maestro talks to flux. The current flux adapters all use Flux's Python APIs to build the batch jobs, and the batch script is serialized separately rather than submitted directly as with the other schedulers. As a consequence, the `allocation_args` map to specific call points on that Python API, so the option pass through is not quite arbitrary. Four options are currently supported for allocations, which cover a majority of use cases (open an issue and let us know if there is a use case you need that is not covered!):

* shell options: `-o/--setopt` prefixed arguments
* attributes: `-S/--setattr` prefixed arguments
* conf: `--conf` prefixed arguments
* exclusive flags: `-x/--exclusive` set the defaults, which the step `exclusive` keys override

!!! warning

    All other flags will be allowed in `allocation_args`, but they will essentially be ignored when serializing the step scripts and submitting jobs.

The `launcher_args` (applied to `$(LAUNCHER)`) pass through anything, since the launcher is a string generator just like in the other script adapters. :warning: These are not validated! Passing arguments that flux doesn't know what to do with may result in errors.
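
To illustrate the difference between the two groups, the sketch below splits a set of allocation args into those in the supported list above and those that would be ignored, logging the latter. It is a hedged illustration rather than the adapter's code: the names `SUPPORTED_ALLOC_ARGS` and `split_allocation_args` are hypothetical, and `made_up_flag` is an invented key used only for demonstration.

```python
import logging

LOGGER = logging.getLogger(__name__)

# Short and long forms of the four supported allocation option groups
SUPPORTED_ALLOC_ARGS = {"o", "setopt", "S", "setattr", "conf", "x", "exclusive"}


def split_allocation_args(allocation_args):
    """Separate supported allocation args from ones that would be ignored."""
    handled, ignored = {}, {}
    for key, value in allocation_args.items():
        target = handled if key in SUPPORTED_ALLOC_ARGS else ignored
        target[key] = value
    for key in ignored:
        # Unsupported keys are reported rather than raising an error
        LOGGER.warning("Ignoring unsupported allocation arg '%s'", key)
    return handled, ignored


handled, ignored = split_allocation_args(
    {"setopt": {"foo": "bar"}, "x": None, "made_up_flag": 1}
)
print(handled)  # {'setopt': {'foo': 'bar'}, 'x': None}
print(ignored)  # {'made_up_flag': 1}
```

No such split is needed for `launcher_args`, since everything there is rendered into the launcher string as given.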

#### Example Batch Block
---

``` yaml
batch:
  type: flux
  host: machineA
  bank: guests
  queue: debug
  allocation_args:
    setopt:
      foo: bar
    o:
      bar: 42
    setattr:
      foobar: "whoops"
    conf:
      resource.rediscover: "true"  # Use string "true" for Flux compatibility, not "True" or bool True
  launcher_args:
    setopt:
      optiona:  # Boolean flag, no value needed. NOTE: This is a made up key for demonstration
```

#### Example Batch Script
---
Assuming the step has keys `{procs: 1, nodes: 1, cores per task: 1, walltime: "5:00"}`:

``` console
#flux: -q debug
#flux: --bank=guests
#flux: -t 300s
#flux: --setopt=foo=bar
#flux: --setopt=bar=42
#flux: --setattr=foobar=whoops
#flux: --conf=resource.rediscover=true

flux run -n 1 -N 1 -c 1 --setopt=optiona myapplication
```

> **Review comment (Member):** same question, "option a" or "optional"? From this context I'm assuming the former in which case we may want to change the naming convention for clarity?

!!! note

    Flux directives are used here for illustration even though the Python API is used for submission. These directives appear in the step scripts, which retains repeatability and a record of what was submitted, and they can be viewed with the dry run feature. The batch/allocation arguments are normalized to the long form (`--setattr` instead of `-S`) and will show up that way in the serialized batch scripts.
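
The long-form normalization mentioned in the note can be pictured as follows. This is a minimal sketch under stated assumptions: the mapping `SHORT_TO_LONG` and the function `normalize_keys` are hypothetical names, and merging short and long spellings of the same option into one group is an assumption based on the example batch script, where `o: {bar: 42}` is emitted as `--setopt=bar=42`.

```python
# Short flux option letters and their long-form names, as listed in the docs above
SHORT_TO_LONG = {"o": "setopt", "S": "setattr", "x": "exclusive"}


def normalize_keys(allocation_args):
    """Return a copy of the args with short option keys mapped to long form."""
    normalized = {}
    for key, value in allocation_args.items():
        long_key = SHORT_TO_LONG.get(key, key)
        existing = normalized.get(long_key)
        if isinstance(value, dict) and isinstance(existing, dict):
            # Both spellings were given: merge into one group
            existing.update(value)
        elif isinstance(value, dict):
            # Copy so the caller's mapping is not modified
            normalized[long_key] = dict(value)
        else:
            normalized[long_key] = value
    return normalized


example = {
    "setopt": {"foo": "bar"},
    "o": {"bar": 42},
    "S": {"foobar": "whoops"},
}
normalized = normalize_keys(example)
print(normalized)
# {'setopt': {'foo': 'bar', 'bar': 42}, 'setattr': {'foobar': 'whoops'}}
```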

## LSF: a Tale of Two Launchers
----
105 changes: 105 additions & 0 deletions maestrowf/abstracts/interfaces/flux.py
@@ -248,3 +248,108 @@ def key(self):
        :return: A string of the name of a FluxInterface class.
        """
        ...

    @classmethod
    def addtl_alloc_arg_types(cls):
        """
        Return set of additional allocation args that this adapter knows how
        to wire up to the jobspec python apis, e.g. 'attributes',
        'shell_options', ...  This is aimed specifically at the repeated types,
        which collect many flags/key=value pairs which go through a specific
        jobspec call.  Everything not here gets dumped into a 'misc' group
        for individual handling.

        :return: List of strings

        .. note::

            Should we have an enum for these or something vs random strings?
        """
        # [Review comment, Member, on lines +264 to +266 (the note above)]:
        # I like this idea. Could probably help with safeguarding against
        # unsupported options
        # default nothing, overriding in implementation
        return []

    @classmethod
    def render_additional_args(cls, args_dict):
        """
        Helper to render additional argument sets to flux cli format for
        use in constructing $(LAUNCHER) line and flux batch directives.
        This default implementation yields a single empty string.

        :param args_dict: Dictionary of flux arg keys and name: value pairs
        :yield: formatted strings of cli options/values

        .. note::

            Promote this to the general/base adapters to handle non-normalizable
            scheduler/machine specific options
        """
        yield ""

    @classmethod
    def addtl_alloc_arg_type_map(cls, option):
        """
        Map verbose/brief cli arg option name (o from -o, setopt from --setopt)
        onto known alloc arg types this interface implements

        :param option: option string corresponding to flux cli input
        :return: string, one of known_alloc_arg_types
        """
        # Default to pass through, override in implementation
        return option

    @classmethod
    def get_addtl_arg_cli_key(cls, arg_type):
        """
        Return expected cli key associated with each normalized arg type.
        `arg_type` not in known_arg_types are assumed to be the key already
        to facilitate flexible pass through to launcher

        :param arg_type: string noting arg group or cli key
        :returns: cli key used for this arg

        .. note::

            Can we find a reasonable default prefix (where are things put
            by default in flux, attributes.system?)
        """
        # Default to pass through, handling known types/mapping in implementation
        return arg_type

    @staticmethod
    def get_cli_arg_prefix_sep(cli_key):
        """
        Helper for rendering extra options on cli/batch directives.  Sets prefix
        and value separator based on length of cli key.  Flux has two conventions:
        single letter cli_key has prefix of '-' and separator of ' ' while
        multiletter cli_key has prefix of '--' and separator of '='.  Examples
        '-o foo=2' or '--setopt=foo=2' for single letter cli_key (o) and
        multiletter (setopt) forms to set the same option.

        :param cli_key: the key to use on the cli form of an argument
        :type cli_key: str
        :returns: dict containing 'prefix' and 'sep' for use in rendering
        """
        if len(cli_key) == 1:
            return {"prefix": "-", "sep": " "}
        else:
            return {"prefix": "--", "sep": "="}

    @classmethod
    def normalize_additional_args(cls, args_dict, group_name=None, filter_unknown=False):
        """
        Helper to normalize additional arguments to known types and an
        unflattened nested dictionary structure.  This unflattens any
        dotpath encoded nested dictionary keys.

        :param args_dict: Dictionary of flux arg keys and name: value pairs
        :type args_dict: dict
        :param group_name: Optional name of group/tag to use in log messages
                           when filter_unknown is on
        :type group_name: str
        :param filter_unknown: flag to block pass through of unknown args, e.g.
                               for allocation where we can't handle arbitrary
        :type filter_unknown: bool
        :return: dict of packed args with top level keys being the adapter
                 version specific addtl_alloc_arg_types
        """
        return args_dict
69 changes: 69 additions & 0 deletions maestrowf/abstracts/interfaces/schedulerscriptadapter.py
@@ -92,6 +92,75 @@ def add_batch_parameter(self, name, value):
        """
        self._batch[name] = value

    def get_exclusive(self, step_exclusive):
        """
        Helper for normalizing new/legacy exclusive syntax in the step keys.
        None used as sentinel for 'not provided' to facilitate layering
        on top of default values.

        :param step_exclusive: value of 'exclusive' key in a StudyStep.run
        :return dict: normalized dict with 'allocation' and 'launcher' keys
                      and bool or None values for each

        .. note::

            Move this upstream into a future studystep validator?  also add
            hooks for per scheduler normalizing of the extra args from batch
            blocks
        """
        # Handle old scalar syntax which applied to allocations only
        if not isinstance(step_exclusive, dict):
            if step_exclusive is not None:
                return {
                    "allocation": step_exclusive,
                    "launcher": None,
                }
            else:
                return {
                    "allocation": None,
                    "launcher": None,
                }
        else:
            # Yaml schema limits keys already
            exclusive_dict = step_exclusive
            if 'allocation' not in step_exclusive:
                exclusive_dict['allocation'] = None
            if 'launcher' not in step_exclusive:
                exclusive_dict['launcher'] = None
            return exclusive_dict

    def resolve_exclusive(self, adapter_exclusive, step_exclusive):
        """
        Helper layering step exclusive config on top of adapter, treating
        adapter as default values to override.

        :param adapter_exclusive: 'default' exclusive settings from batch
                                  config.  This is a per queue/machine constant.
        :type adapter_exclusive: dict
        :param step_exclusive: value of 'exclusive' key in StudyStep.run
        :type step_exclusive: bool, dict, or None
        :return dict: normalized dict with 'allocation' and 'launcher' keys
                      and bool values for each

        .. note::

            Move this upstream into a future studystep validator?  also add
            hooks for per scheduler normalizing of the extra args from batch
            blocks
        """
        # Normalize the step's exclusive to gracefully handle old behavior
        exclusive_updates = self.get_exclusive(step_exclusive)

        # Apply update such that step wins if not None
        exclusive_dict = {}
        for key, default_val in adapter_exclusive.items():
            if key in exclusive_updates and exclusive_updates[key] is not None:
                exclusive_dict[key] = exclusive_updates[key]
            else:
                exclusive_dict[key] = default_val

        return exclusive_dict

    @abstractmethod
    def get_header(self, step):
        """