@hppritcha hppritcha commented Nov 25, 2025

I can't push directly to the v3.0 branch (so I can't sync via the GitHub web API), so here's the PR to do so.
This is the second step of what we discussed yesterday.

rhc54 and others added 30 commits April 3, 2024 07:00
Correct the protection to use static versions of
pmix_getline if PMIx version is less than v4.2.5

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 5cde35d)
Always default the number of slots to the available cpus
in the topology. Ensure that we always display some form
of the resulting process map, or else we will silently
exit.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit f01e2a2)
It should be `help-hostfile.txt`, not `help-hostfiles.txt`

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit c34d91a)
If we use one cpu from an object, then we will get a NULL
response if we ask for the next object of that type within
the remaining cpuset since not all of the cpus in the object
are still available. This problem resulted from the recent
change to only use available cpus in PRRTE topologies.

So instead scan across the cpus and check whether each one is
inside the object of interest. If so, then we can bind
to that cpu; if not, then we keep searching.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 2d0a840)
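
The scan described in the commit message above can be sketched roughly as follows. This is only an illustration, not the PRRTE code: it assumes a cpuset fits in a single 64-bit mask, and `cpuset_t` and `find_cpu_in_object` are hypothetical names standing in for the HWLOC bitmap operations the real code would use.

```c
#include <stdint.h>

/* Hypothetical stand-in for a cpuset: one bit per cpu, 64 cpus at most. */
typedef uint64_t cpuset_t;

/* Walk the cpus in order and return the first one that is both still
 * available and inside the object of interest, or -1 if none is left. */
static int find_cpu_in_object(cpuset_t available, cpuset_t object_cpus)
{
    for (int cpu = 0; cpu < 64; cpu++) {
        cpuset_t bit = (cpuset_t)1 << cpu;
        if (!(available & bit)) {
            continue;   /* cpu is not available: skip it */
        }
        if (object_cpus & bit) {
            return cpu; /* inside the object: we can bind here */
        }
        /* available but outside the object: keep searching */
    }
    return -1;
}
```

With an object covering cpus 4-7 and cpu 4 already consumed, the scan still finds cpu 5, which is the case where asking for the "next object" within the remaining cpuset would have come back empty.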
Only automatically set the display map flag if we are not
launching the job.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 7c3ae98)
Attempt to make it clearer that the binding failed
due to a lack of cpus for the given map/bind
policies.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit dfcc9a7)
PRRTE itself no longer requires specific resilience settings.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 8b95fe2)
Add a new cmd line option that corresponds to this
attribute. Add the attribute to the prun payload.
When received, it will default to including in the
job info for the spawned job. Add query support for it.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 3957789)
Homebrew has broken something and I cannot figure
out how to fix it.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 2ac45f3)
Changes will need to be made to Open MPI to parse the contents of
the OMPI_MCA_mpi_memory_alloc_kinds environment variable to
determine how to use the user supplied memory-alloc-kinds information.

See section 11.4.3 of the MPI 4.1 standard.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit c5953e1)
Get takes a (pmix_value_t**), so don't cast it to (void**)

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 475df02)
If we haven't requested LSF support, then don't warn
about not finding yp_all - we didn't ask for LSF,
so no need to warn us if support cannot be built.
It will show in the summary at end of configure.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit fcaa417)
Now that we have a broader group of contributors starting
to show up, we probably need to start paying more attention
to code quality of contributions. Enable devel-check by
default in Git clones that are configured with enable-debug.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 5ffc3d4)
Try adding a build using latest Clang

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 1a3dc29)
When building against older PMIx

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 940a474)
Signed-off-by: Ralph Castain <rhc@pmix.org>
It has been reported (and confirmed) that building against
one version of PMIx and then running with another version
will cause PRRTE to segfault. This isn't a universal rule.
For example, one can switch v5.0 and master without a
problem. However, switching v5.0 and v4.2 is a definite
segfault.

The root cause of the problem is a change in the layout
of the base pmix_object_t definition. This renders all
PMIx objects binary incompatible when crossing between
the v5 and v4 (and below) series.

Changing the v5 definition back to match v4 is an
overly complex task. The changes were required to
accommodate the new shared memory support that
was introduced in v5.

So instead, we check the runtime version of PMIx against
the build version. If the runtime version is incompatible
with the build version, then we print an explanatory
error message and error out.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit d02ad07)
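
A minimal sketch of the kind of build-versus-runtime check described above. It assumes the only incompatibility is the object layout change between the v4 (and below) and v5 series, and it takes the major versions as plain integers; `LAYOUT_CHANGE_MAJOR`, `pmix_versions_compatible`, and `check_pmix_runtime` are hypothetical names, and how PRRTE actually obtains and compares the versions is not shown in this PR.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumption: the pmix_object_t layout changed when v5 was introduced. */
#define LAYOUT_CHANGE_MAJOR 5

/* Two versions are treated as compatible when they sit on the same side
 * of the layout change. */
static bool pmix_versions_compatible(int build_major, int runtime_major)
{
    bool build_is_new = build_major >= LAYOUT_CHANGE_MAJOR;
    bool runtime_is_new = runtime_major >= LAYOUT_CHANGE_MAJOR;
    return build_is_new == runtime_is_new;
}

/* Return 0 if it is safe to continue, -1 after printing an explanation. */
static int check_pmix_runtime(int build_major, int runtime_major)
{
    if (!pmix_versions_compatible(build_major, runtime_major)) {
        fprintf(stderr,
                "Built against PMIx v%d but running with PMIx v%d: "
                "these series are not binary compatible.\n",
                build_major, runtime_major);
        return -1;
    }
    return 0;
}
```

Failing early with a message replaces what would otherwise be a segfault deep inside object handling.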
Refs open-mpi/ompi#12540

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 7e0ff9b)
We had problems in the past with quoted params, but stripping
quotes also has consequences, and the best solution is not clear.
For now, let's try going the other way and see how many
problems we encounter.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit be840ab)
Take only the piece that is applicable to v3.0.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry-pick of openpmix@891bad8)
Fix the issues with the macOS builds so that they work again in GitHub
Actions environments.

Signed-off-by: Jeff Squyres <jeff@squyres.com>
(cherry picked from commit 4a682ef)
Enables build against v1.11.8 and above.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit ac80553)
If we are trying to bind to an HWLOC object type that is not
defined on a given node, then (a) if the binding policy was
specified by user, then error out; and (b) if we are using
a default binding policy, then simply do not bind.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 5d21059)
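
The two cases in the commit message above amount to a small decision table. The sketch below is illustrative only; `bind_action_t` and `resolve_binding` are hypothetical names, not PRRTE symbols.

```c
#include <stdbool.h>

typedef enum {
    BIND_PROCEED, /* the object type exists on the node: bind as requested */
    BIND_SKIP,    /* default policy, type missing: leave the proc unbound */
    BIND_ERROR    /* user-specified policy, type missing: error out */
} bind_action_t;

/* Decide what to do when binding to a given HWLOC object type. */
static bind_action_t resolve_binding(bool type_defined_on_node,
                                     bool policy_specified_by_user)
{
    if (type_defined_on_node) {
        return BIND_PROCEED;
    }
    return policy_specified_by_user ? BIND_ERROR : BIND_SKIP;
}
```

The asymmetry is deliberate: an explicit user request that cannot be honored is an error, while a default that cannot be honored is silently relaxed.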
Signed-off-by: Ralph Castain <rhc@pmix.org>
In some recent Slurm versions, the Slurm runtime silently inserts
custom arguments into the PRRTE launcher's `srun` cmd line. In
many cases, this may not be a problem - but in some cases (where
the user or the system admin needs/wants particular cmd line
arguments used) this can cause problems precisely because the
user is not aware that it happened.

Make this visible when it happens, and provide a mechanism by
which the user/admin can override it. Provide a fairly long
help message explaining what happened and offering advice on
resolution, along with a param for disabling the warning. Add
a param for overriding the "args" param if necessary, along
with a caution as to possible consequences.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 092cd7c)
RTD is rolling out some changes. Per
https://about.readthedocs.com/blog/2024/07/addons-by-default/, these are the changes we need to make.

Port of open-mpi/ompi#12687

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 584845f)
We currently do not support the LTO optimizer
as it is incompatible with our plugin component
architecture. So detect when it has been specified
in configure and error out with an explanation.

Includes suggestions from @jsquyres

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit dd7706c)
Break the multi-loop thru loading of param files
that caused us to overwrite values. Defer to the
PMIx pmdl components for obtaining envars and for
checking MCA param overlaps across projects.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit a68d647)
Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit ce25672)
Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit e204c73)
rhc54 and others added 27 commits September 4, 2025 12:22
Make the remote connection and foreign tool settings be via
MCA param so they can be globally set. Don't set the remote
connection option unless someone specified it so that PMIx
can use the default behavior if necessary.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit ce2f0c2)
Signed-off-by: Ralph Castain <rhc@pmix.org>
When fixing a merge conflict, some code was inadvertently
removed, so replace it

Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Ralph Castain <rhc@pmix.org>
Provide MCA params to control the ability for a client to connect
even if it has a different pid than what we started. This happens
when an intermediate script or executable is being used to fork
the client - e.g., in the case of a debugger. Set this to not
require pid match by default.

Also provide a switch to enable/disable client clones - i.e., for
a client process to fork a child that also connects back to the
PMIx server since it will use the same nspace/rank as its parent.
This is currently an unusual use-case, but allowed by the Standard.
Set this to not allow clones by default.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 52c8851)
We captured the HNP's aliases in prte_process_info, but that
happened _after_ we had already copied them to the HNP's
node object. So when we then checked the node aliases, they
were missing from that node.

Ensure we capture the HNP's aliases on the node object. Simplify
the check for local node by including the "localhost" and
"127.0.0.1" aliases, being sure not to include them in the
nidmap. Correct the check in dash-host for matching node
names.

Thanks to Alexey Novikov for the report

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 8070277)
Signed-off-by: Ralph Castain <rhc@pmix.org>
If someone specifies that child jobs inherit from their
parents, then have them inherit any env directives as
well as job-level directives.

Have children inherit their parent's inheritance directive,
unless directed not to do so.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit eb577d4)
If we are inheriting envar directives from our parent job, then
extend that to inheriting envar directives for the application
of the proc that spawned us. Shift processing of inheritance
directives to the mapper, and ensure that the child inherits
the inheritance directive so that the grandchildren will also
inherit.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit a63791f)
Check RAS components for compile errors by shimming
the environment-specific functions

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 17399cd)
There were two compensating errors that wound up yielding the
correct map, but the result would be wrong should a certain
condition exist. So rework the code to fix the errors and
remove the flaw.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit bdbf4db)
Work from left-to-right across the cmd line, applying env-related
options as we go. When one operation affects the result of another,
this preserves a user's common expectation.

Add a "--set-env" option if the corresponding PMIx CLI is defined.
Seemed a little weird that we had "prepend-env", "append-env", etc.,
but no "set-env". It's the equivalent of "-x foo=val".

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 805e130)
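
To illustrate why the left-to-right ordering matters, here is a small sketch that applies a sequence of set/prepend/append directives to a single variable's value. It is an assumption-laden illustration rather than the PRRTE implementation: the `:` separator, the 256-byte scratch buffer, and the names `env_op_t`, `env_directive_t`, and `apply_env_directives` are all hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

typedef enum { ENV_SET, ENV_PREPEND, ENV_APPEND } env_op_t;

typedef struct {
    env_op_t op;
    const char *value;
} env_directive_t;

/* Apply the directives in cmd line order to the value held in cur.
 * Overlong results are silently truncated in this sketch. */
static void apply_env_directives(char *cur, size_t len,
                                 const env_directive_t *dirs, size_t ndirs)
{
    char tmp[256];
    for (size_t i = 0; i < ndirs; i++) {
        const char *v = dirs[i].value;
        if (dirs[i].op == ENV_SET || cur[0] == '\0') {
            /* set, or prepend/append to an empty value: just take v */
            snprintf(tmp, sizeof(tmp), "%s", v);
        } else if (dirs[i].op == ENV_PREPEND) {
            snprintf(tmp, sizeof(tmp), "%s:%s", v, cur);
        } else {
            snprintf(tmp, sizeof(tmp), "%s:%s", cur, v);
        }
        snprintf(cur, len, "%s", tmp);
    }
}
```

Starting from an empty value, the sequence set `b`, prepend `a`, append `c` yields `a:b:c`, whereas the same three directives in the reverse order yield just `b`, because the final set overwrites everything before it.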
Signed-off-by: Matthew Whitlock <mwhitlo@sandia.gov>
(cherry picked from commit 0b1ada9)
This error is also displayed in cases where files or directories do not
exist and is not only caused by missing permissions.

Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>
(cherry picked from commit ac77387)
Allow the target node list to follow the ordering inside a provided hostfile
and dash-host specification by not assigning a bookmark based on the DVM job.

Add support for the missing default-hostfile cmd line option. We have the support
for the user to specify it via MCA param, but somehow we lost the integration
to pick it up off of the prte and prterun cmd lines.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 16d8412)
PPR placement policy requests are uniform - i.e., the specified
number of procs must be placed on every object of the directed
type. When the request includes a cpu/proc directive, then there
must also be enough CPUs to meet the request on every object.

When that isn't the case, then we need to error out and not
just place the proc without binding it.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 665c38e)
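
The uniformity rule above can be sketched as a simple feasibility check. This is a hypothetical illustration: `ppr_request_fits` is not a PRRTE function, and the per-object cpu counts are passed in as a plain array rather than read from a topology.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true only if every object of the target type has enough cpus
 * to host procs_per_object procs at cpus_per_proc cpus each. */
static bool ppr_request_fits(const int *cpus_per_object, size_t nobjects,
                             int procs_per_object, int cpus_per_proc)
{
    int needed = procs_per_object * cpus_per_proc;
    for (size_t i = 0; i < nobjects; i++) {
        if (cpus_per_object[i] < needed) {
            return false; /* one short object fails the whole request */
        }
    }
    return true;
}
```

A caller would error out when this returns false instead of placing the procs unbound.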
If we are using the seq or rankfile mapper and have multiple
apps on the cmd line, then allow the mappers to compute
their own num procs if one or more are not given.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit cb17cce)
The empty nodes were not properly being added to the list
of names to be used by the mapper.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 58130c6)
Per note in the OMPI project, at least one compiler family is removing the "sprintf" function. Replace all uses of that function with the safer "snprintf" version.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 2ff7d6b)
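
As a reminder of what the replacement looks like, here is a minimal sketch of a bounded format call that also checks for truncation. The helper `format_rank_label` and its output format are made up for illustration and do not come from the PRRTE sources.

```c
#include <stddef.h>
#include <stdio.h>

/* Write "<nspace>.<rank>" into buf without ever writing past len bytes.
 * Returns 0 on success, -1 if the output was truncated or an error occurred. */
static int format_rank_label(char *buf, size_t len,
                             const char *nspace, unsigned rank)
{
    int n = snprintf(buf, len, "%s.%u", nspace, rank);
    if (n < 0 || (size_t)n >= len) {
        return -1; /* snprintf reports the length it wanted to write */
    }
    return 0;
}
```

Unlike `sprintf`, the call cannot overrun the destination, and the return value tells the caller when the buffer was too small.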
When a timeout is specified and the primary job is timed-out,
then we need to ensure we also report and kill any child jobs
it started. This includes reporting any requested stack
traces.

Also allow inheritance of output directives like tag and timestamp.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit d072f27)
Port the "launching-apps" section from the OMPI docs over
to PRRTE since it specifically deals with prterun usage.
Add some updates about gridengine support courtesy of
open-mpi/ompi#13450.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 424480d)
Use the hwloc synthetic topology string as the signature
instead of our custom attempt at counting number of types
of objects - the synthetic retains some hierarchical info
and hopefully does a little better job of detecting when
hetero nodes are in use.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 7e5d030)
Update the MCA param help message to clarify what the param
does and what values it supports. Clean up an error where we
would overwrite the resulting list of signals to forward.
Clean up the return value so we don't generate spurious
error log output. Provide verbose output showing the
signals being forwarded.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 2845dcd)
Further improve automatic handling of hetero nodes
by making the non-symmetric signature unique, thereby
forcing collection of the full topology from each
such node. Fix an error in the topology retrieval
procedure whereby we double-counted cached nodes,
thereby causing us to quit collecting topologies early.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 4671290)
Need to init the ess framework to have the signal forwarding list initialized

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit bff13fb)
Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Matthew Whitlock <mwhitlo@sandia.gov>
(cherry picked from commit da3ca98)
@hppritcha hppritcha requested a review from janjust November 25, 2025 18:18
@github-actions

Hello! The Git Commit Checker CI bot found a few problems with this PR:

5b8889e: Update NEWS and VERSION

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

2e89339: Final update of NEWS and VERSION for release

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

0267dfe: Update NEWS

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

2cb071d: Replace some incorrectly removed code

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

25d98c8: Update NEWS

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

b33a522: Update NEWS

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

b6c3a01: Protect against running with PMIx versions too hig...

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

f86c858: Check for PMIx version too high

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

222f03f: Update VERSION and NEWS for release

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

0ff51bd: Update NEWS for release

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

c4f6f78: Roll version to 3.0.10

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

d20e10c: Update NEWS for release

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

20f2c2a: Constrain PMIx versions

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

7b9b2aa: Protect against stone age HWLOC

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

648aa78: Update NEWS

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

253f60a: Minor cleanups

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

1141770: Add mpi4py CI

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

5a42463: Add build against older PMIx CI

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

be9ad17: Minor cleanups

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

ac40d3f: Remove the group CI as this release branch doesn't...

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

611f87b: Roll VERSION for end of release branch

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

f6f5c18: Final update for release

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

35270a9: Update NEWS and VERSION

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

2828a49: Revert "configure.ac: generate prte_version.h prop...

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

37e0525: Revert "configure.ac: generate prte_version.h prop...

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

b2f4163: Update NEWS and VERSION for final release

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

1b6e6d7: Update NEWS and VERSION for release

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

e9507eb: Protect against old PMIx versions

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

50147d8: Pull a couple of fixes from master branch

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

289e6ab: 3.0: fix support for MPIEXEC_TIMEOUT

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

b68a0ac: Update NEWS and VERSION for release

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

c6c9d12: Tailored backport of "various fixes for singleton ...

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

fce79e9: Cleanup issues surfaced by devel-check

  • check_cherry_pick: does not include a cherry pick message (did you need to bot:notacherrypick?)

Please fix these problems and, if necessary, force-push new commits back up to the PR branch. Thanks!

