I recently had some issues running on a Slurm-based InfiniBand (IB) system. The problem was that I was using the default launcher for IB when `CHPL_COMM=gasnet`, which is `gasnetrun_ibv`. However, that launcher requires you to make your own Slurm allocation using `salloc` (see https://chapel-lang.org/docs/main/usingchapel/launcher.html#using-any-ssh-based-launcher-with-slurm). The solution was either to make the `salloc` calls manually or to use `slurm-gasnetrun_ibv`, which handles the allocation for you.
This is a simple solution, but why is it necessary? If we can detect a Slurm-based system, it seems like we should default to a Slurm-based launcher. This led me to investigate `util/chplenv/chpl_launcher.py`, where we do have that detection, but only on `cray-cs` and `hpe-apollo`.
I went looking through the history for this and found two PRs making this change: #17314 for gasnet and #17305 for other comm layers. Based on the PR messages, we only default to Slurm-based launchers on Cray/HPE systems because defaulting everywhere was interfering with internal testing systems that have Slurm but want to use a different launcher.
This feels like optimizing for the wrong case. We should default to what is common for users.
In my opinion this is a simple change: remove the checks for the target platform and adjust the automated testing systems as needed. However, there may be other cases I am not thinking of where we would not want to default to a Slurm-based launcher.
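To make the proposal concrete, here is a rough sketch of the selection logic. This is not the actual code in `util/chplenv/chpl_launcher.py`. The function names, the `only_on_cray_hpe` flag, and the use of `srun` on `PATH` as the Slurm check are all assumptions made for illustration.

```python
import shutil


def slurm_available(which=shutil.which):
    """Assumption: finding `srun` on PATH means this is a Slurm system."""
    return which("srun") is not None


def default_ibv_launcher(platform, has_slurm, only_on_cray_hpe=True):
    """Hypothetical helper: pick a default launcher for gasnet over IB.

    only_on_cray_hpe=True mimics the current behavior described above,
    where the Slurm-based launcher is chosen only on cray-cs/hpe-apollo.
    only_on_cray_hpe=False is the proposed behavior: prefer the
    Slurm-based launcher whenever Slurm is detected.
    """
    slurm_platforms = ("cray-cs", "hpe-apollo")
    if has_slurm and (not only_on_cray_hpe or platform in slurm_platforms):
        return "slurm-gasnetrun_ibv"
    return "gasnetrun_ibv"


if __name__ == "__main__":
    has_slurm = slurm_available()
    print("current :", default_ibv_launcher("linux64", has_slurm))
    print("proposed:", default_ibv_launcher("linux64", has_slurm,
                                            only_on_cray_hpe=False))
```

With the platform check dropped, a generic Slurm cluster would get `slurm-gasnetrun_ibv` by default, and testing systems that have Slurm but need a different launcher would set `CHPL_LAUNCHER` explicitly.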