WIP: schedulers #47

Open: wikfeldt wants to merge 5 commits into swcarpentry:gh-pages from wikfeldt:carpcon-schedulers

Commits:
- 91474fa add what and why section (wikfeldt)
- 1a5d635 add first section of interactive work (wikfeldt)
- b6cf038 add questions to lead up to batch jobs (wikfeldt)
- a9bf827 add reservation on interactive jobs (wikfeldt)
- a61a953 transition to batch jobs (wikfeldt)

@@ -1,7 +1,83 @@

---
title: "Using the Batch System 101"
teaching: 25
exercises: 20
questions:
- "To be filled."
keypoints:
- "To be filled."
---

# What is a scheduler and why do we need it?

An HPC system might have thousands of nodes and thousands of users.
How do we decide who gets what and when?
How do we ensure that a task is run with the resources it needs?
This is the job of a special piece of software called the scheduler.
On an HPC system, the scheduler manages which jobs run where and when.

**here we need a simple schematic image showing what a scheduler does**

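To make "who gets what and when" concrete, here is a toy first-come, first-served scheduler in Bash. It is a teaching sketch only: the node names, job names and runtimes are made up, and real schedulers also weigh priorities, fair share and time limits, none of which is modelled here. Each job, in submission order, goes to whichever node becomes free first.

```shell
#!/usr/bin/env bash
# Toy first-come, first-served scheduler (illustration only, not how a
# real scheduler works internally). All names and runtimes are hypothetical.

nodes=(node01 node02)   # hypothetical compute nodes
free_at=(0 0)           # time (minutes) at which each node becomes free

# Hypothetical jobs in submission order, written as name:runtime_in_minutes
queue=("hello:1" "analysis:30" "render:10" "plot:5")

for entry in "${queue[@]}"; do
  name=${entry%%:*}      # text before the colon
  runtime=${entry##*:}   # text after the colon

  # Pick the node that becomes free earliest (ties go to the lower index).
  best=0
  for i in "${!nodes[@]}"; do
    if (( free_at[i] < free_at[best] )); then
      best=$i
    fi
  done

  # The job starts when that node is free and keeps it busy for its runtime.
  start=${free_at[best]}
  free_at[best]=$(( start + runtime ))
  echo "$name -> ${nodes[best]} at t=${start}"
done

# Expected output:
# hello -> node01 at t=0
# analysis -> node02 at t=0
# render -> node01 at t=1
# plot -> node01 at t=11
```

Note that `render` cannot start at t=0: both nodes are busy, so it waits in the queue until node01 frees up at t=1.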
# Working interactively

A first exercise is to submit a job that does nothing but print "Hello World!".

~~~
{% include /snippets/02/submit_hello_world_to_void.{{ site.workshop_scheduler }} %}
~~~
{: .bash}

~~~
{% include /snippets/02/output_hello_world_to_void.{{ site.workshop_scheduler }} %}
~~~
{: .output}

That worked well, but it is not very useful: a job that only prints a greeting doesn't help Lola, or anyone else, get real work done. Lola does wonder, though, whether the job really ran on another node, so she devises a small experiment to explore the scheduler.

~~~
{% include /snippets/02/submit_hostname_experiment.{{ site.workshop_scheduler }} %}
~~~
{: .bash}

~~~
{% include /snippets/02/output_hostname_experiment.{{ site.workshop_scheduler }} %}
~~~
{: .output}

If she repeats this command several times, the output can change from one run to the next.
So these commands must be running on nodes other than the one she is logged in to.

The commands above may not work at every site: interactive jobs are configured differently for each scheduler and may be disabled on your cluster.

### Limitations of interactive work
- What if we have a very long job and don't have time to wait for it to finish?
- What if we need to run a lot of different commands?

# Batch jobs

We have seen how to run jobs interactively, but this is often not enough for more complex tasks,
for example executing several commands one after another,
or running a job that takes too long to wait for.
To solve this, we write a small script that the scheduler can run on a compute node.

~~~
{% include /snippets/02/submit_hostname_date.{{ site.workshop_scheduler }} %}
~~~
{: .bash}

~~~
{% include /snippets/02/output_hostname_date.{{ site.workshop_scheduler }} %}
~~~
{: .output}

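As a concrete sketch, assuming the cluster runs Slurm (the lesson's own snippets are selected per scheduler through `site.workshop_scheduler`, so yours may differ), a batch script that prints the hostname and the date could look like this. The job name, output file name and one-minute time limit are illustrative values, not site defaults.

```shell
#!/bin/bash
# Resource requests, assuming Slurm. To bash these lines are ordinary
# comments, but sbatch reads them as options. All values are illustrative.
#SBATCH --job-name=hostname-date
#SBATCH --ntasks=1
#SBATCH --time=00:01:00
#SBATCH --output=hostname-date-%j.out

# Everything below runs on the compute node that the scheduler assigns.
# (%j in the output file name above is replaced by the job ID.)
hostname
date
```

Saved as, say, `hostname_date.sh` (a hypothetical file name), it would be submitted with `sbatch hostname_date.sh`; Slurm replies with the job ID and writes the output to `hostname-date-<jobid>.out`. Because the `#SBATCH` lines are comments to the shell, `bash hostname_date.sh` is a quick way to check the script for typos, although it then runs where you type it rather than on a compute node.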
# Managing jobs

- errors, cancelling
- monitoring

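For the topics listed above, Slurm provides `squeue` (monitoring), `scancel` (cancelling) and `sacct` (inspecting finished jobs, for example to find out why one failed); other schedulers have equivalents, such as `qstat` and `qdel` in PBS. The sketch below wraps the Slurm commands in small shell functions. The function names are hypothetical helpers, not Slurm commands, and `sacct` only returns data where the site has job accounting enabled.

```shell
#!/bin/bash
# Hypothetical helper functions, assuming a Slurm cluster.

# Submit a batch script and print just the job ID
# (--parsable drops the "Submitted batch job" sentence).
submit_job() {
  sbatch --parsable "$1"
}

# Monitoring: list your own pending and running jobs.
my_jobs() {
  squeue -u "$USER"
}

# Cancelling: remove a pending job from the queue, or stop a running one.
cancel_job() {
  scancel "$1"
}

# Errors: show the final state and exit code of a job after it has ended.
job_report() {
  sacct -j "$1" --format=JobID,JobName,State,ExitCode,Elapsed
}
```

A typical session would then be `jobid=$(submit_job hostname_date.sh)`, followed by `my_jobs` to watch it, `cancel_job "$jobid"` if something looks wrong, and `job_report "$jobid"` once it has finished.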
By "interactively" here, are we referring to those special configurations many HPC systems have, where users are allowed to acquire a tiny allocation of the machine and run jobs there in the same way they might run jobs at the shell prompt? Or are we only using "interactively" as an adjective for batch job submissions where the job is extremely short and the queue is empty enough that the batch system schedules and runs it so quickly that it feels interactive to the user?

We are referring to the former case, where the user can book an interactive session and run jobs on a compute node from the command line. We know that this isn't an option at many sites, and the plan was to make this section easy to leave out. But we wanted to keep it for those sites where it's possible, since going from one's own laptop to an interactive node on the cluster is a smaller conceptual step than going directly to a batch script.

Or what do you think? I don't know the statistics on this: Slurm is configured with this interactive option at most sites in Sweden, but I don't know how it looks globally.