
Default to slurm based launchers when salloc/srun is available #26170

Open
jabraham17 opened this issue Oct 30, 2024 · 1 comment

Comments

@jabraham17
Member

I recently had some issues running on a Slurm-based InfiniBand (IB) system. The problem was that I was using gasnetrun_ibv, which is the default launcher for IB when CHPL_COMM=gasnet. However, that launcher requires you to make your own Slurm allocation with salloc (see https://chapel-lang.org/docs/main/usingchapel/launcher.html#using-any-ssh-based-launcher-with-slurm). The solution was either to make the salloc calls manually or to use slurm-gasnetrun_ibv, which handles that for you.

This is a simple solution, but why is it necessary? If we can detect a Slurm-based system, it seems like we should default to a Slurm-based launcher. This led me to investigate util/chplenv/chpl_launcher.py, which does have that detection, but only applies it on the cray-cs and hpe-apollo platforms.
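As a rough illustration of the detection the issue title has in mind ("when salloc/srun is available"), here is a minimal sketch that treats Slurm as present when both `salloc` and `srun` are found on `PATH`. The helper name `slurm_available` is hypothetical and is not the function used in chpl_launcher.py; whether finding both commands is a sufficient signal is an assumption.

```python
import shutil
from typing import Callable, Optional


def slurm_available(which: Callable[[str], Optional[str]] = shutil.which) -> bool:
    """Hypothetical helper: True when both salloc and srun are on PATH.

    `which` defaults to shutil.which but can be swapped out so the check
    can be exercised on a machine without Slurm installed.
    """
    return all(which(cmd) is not None for cmd in ("salloc", "srun"))
```

Injecting `which` keeps the check testable without a real Slurm install; in normal use it would simply be called as `slurm_available()`.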

I went looking through the history for this and found two PRs that made this change: #17314 for gasnet and #17305 for the other comm layers. Based on those PR messages, we only default to Slurm-based launchers on Cray/HPE systems because doing so elsewhere interfered with internal testing systems that have Slurm but want to use a different launcher.

This feels like optimizing for the wrong case; we should default to what is common for users.

In my opinion, this is a simple change: remove the target-platform checks and adjust the automated testing systems as needed. However, there may be other cases I am not thinking of where we would not want to default to a Slurm-based launcher.
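A minimal sketch of the proposed default, assuming the Slurm check has already been done and its result is passed in as a boolean. There is no cray-cs/hpe-apollo check, and an explicit `CHPL_LAUNCHER` setting always takes precedence, which is one way a testing system with Slurm could keep a different launcher. The function name `choose_default_launcher` and the `SLURM_VARIANTS` table are hypothetical. Only the `gasnetrun_ibv` → `slurm-gasnetrun_ibv` pair comes from this issue; other comm layers' pairs would need to be checked against Chapel's launcher documentation.

```python
from typing import Mapping

# Hypothetical table: a non-Slurm default launcher -> its Slurm-aware variant.
# Only the gasnetrun_ibv entry is taken from this issue.
SLURM_VARIANTS = {
    "gasnetrun_ibv": "slurm-gasnetrun_ibv",
}


def choose_default_launcher(base_default: str, slurm_detected: bool,
                            env: Mapping[str, str]) -> str:
    """Pick a launcher with no target-platform (cray-cs/hpe-apollo) check."""
    # An explicit CHPL_LAUNCHER setting always wins, so a system that has
    # Slurm but wants another launcher can still opt out.
    explicit = env.get("CHPL_LAUNCHER")
    if explicit:
        return explicit
    # Otherwise prefer the Slurm-based variant whenever Slurm was detected.
    if slurm_detected and base_default in SLURM_VARIANTS:
        return SLURM_VARIANTS[base_default]
    return base_default
```

For example, `choose_default_launcher("gasnetrun_ibv", True, os.environ)` would pick `slurm-gasnetrun_ibv` unless `CHPL_LAUNCHER` is already set in the environment.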

@bradcray
Member

This sounds great to me, thanks for investigating, Jade!
