WIP: schedulers #47

Open
wants to merge 5 commits into
base: gh-pages
76 changes: 76 additions & 0 deletions _episodes/30-batch-system.md
@@ -1,7 +1,83 @@
---
title: "Using the Batch System 101"
teaching: 25
exercises: 20
questions:
- "To be filled."
keypoints:
- "To be filled."
---

# What is a scheduler and why do we need it?

An HPC system might have thousands of nodes and thousands of users.
How do we decide who gets what and when?
How do we ensure that a task is run with the resources it needs?
This job is handled by a special piece of software called the scheduler.
On an HPC system, the scheduler manages which jobs run where and when.
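As a rough illustration of the decision a scheduler makes, the toy loop below hands out a fixed pool of nodes to a queue of requests. The user names, node counts, and the simple "run it if it fits" rule are assumptions made up for this sketch; real schedulers also weigh priorities, time limits, and fair-share policies.

```shell
# Toy model only, not how any real scheduler is implemented.
free_nodes=4                        # hypothetical cluster size
queue="alice:2 bob:3 carol:1"       # hypothetical "user:nodes_requested" entries
decisions=""

for job in $queue; do
    user=${job%%:*}                 # text before the colon
    want=${job##*:}                 # text after the colon
    if [ "$want" -le "$free_nodes" ]; then
        free_nodes=$((free_nodes - want))
        decisions="$decisions $user:run"
    else
        decisions="$decisions $user:wait"
    fi
done

echo "decisions:$decisions"
echo "free nodes left: $free_nodes"
```

In this sketch a small request can start while a larger one is still waiting, which is one reason job start order on a shared cluster does not always match submission order.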

**here we need a simple schematic image showing what a scheduler does**

-

# Working interactively

@markcmiller86, Jun 1, 2018

By "interactively" here, are we referring to those special configurations many HPC systems have, where users are allowed to acquire a tiny allocation of the machine and run jobs there the same way they might run them at the shell prompt? Or are we really only using "interactively" as an adjective for batch job submissions where the job is extremely short and the queue is empty enough that the batch system schedules and runs the job so quickly it feels interactive to the user?

Author

We are referring to the former case: the situation where the user can book an interactive session and run jobs on a compute node from the command line. We know that this isn't an option at many sites, and the plan was to make this section straightforward to leave out. But we wanted to keep it for those sites where it's possible, since it's a smaller conceptual step to go from your own laptop to an interactive node on a cluster than directly to a batch script.

Author

Or what do you think? I'm unaware of the statistics on this. Slurm is configured with this interactive option at most sites in Sweden, but I don't know how it looks globally.


A first exercise is to submit a job that does nothing but print "Hello World!".

~~~
{% include /snippets/02/submit_hello_world_to_void.{{ site.workshop_scheduler }} %}
~~~
{: .bash}

~~~
{% include /snippets/02/output_hello_world_to_void.{{ site.workshop_scheduler }} %}
~~~
{: .output}


That worked out pretty well. However, printing a greeting is not very useful and does not help Lola, or anyone else, get her work done. Lola also wonders whether the job really was executed on another node, so she thinks of a little experiment to explore the scheduler a bit.

~~~
{% include /snippets/02/submit_hostname_experiment.{{ site.workshop_scheduler }} %}
~~~
{: .bash}

~~~
{% include /snippets/02/output_hostname_experiment.{{ site.workshop_scheduler }} %}
~~~
{: .output}

If she repeats this command several times, the output changes from run to run.
So these commands must be running on other nodes, not on the one she is logged in to.
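Lola's experiment can be sketched as a side-by-side comparison. This is a minimal sketch that assumes a Slurm cluster, where `srun` launches a command through the scheduler; on other schedulers the launch command differs, and the variable names here are only illustrative.

```shell
# Hostname of the machine we are logged in to.
login_host=$(hostname)

# Assumption: Slurm is installed. Elsewhere we only report that it is missing.
if command -v srun >/dev/null 2>&1; then
    job_host=$(srun hostname)       # hostname as seen by the job itself
else
    job_host="(srun not available on this machine)"
fi

echo "login node: $login_host"
echo "job ran on: $job_host"
```

If the two names differ, the command was executed somewhere other than the login node.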

The above instructions may not work on all sites: the commands differ between
schedulers, and interactive use may be disabled on your cluster.

### Limitations of interactive work
- What if we have a very long job and don't have time to wait for it to finish?
- What if we need to run a lot of different commands?


# Batch jobs

We have seen how to run commands interactively, but this is often not enough for more complex tasks,
e.g. executing several commands one after another,
or running a job that takes a long time.
To solve this, we write a small script that the scheduler runs on a compute node for us.

~~~
{% include /snippets/02/submit_hostname_date.{{ site.workshop_scheduler }} %}
~~~
{: .bash}

~~~
{% include /snippets/02/output_hostname_date.{{ site.workshop_scheduler }} %}
~~~
{: .output}
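For sites that use Slurm, such a script might look like the sketch below. The file name and the `#SBATCH` values are assumptions chosen for illustration; other schedulers use different directive prefixes and submission commands. The sketch writes the script to a temporary directory and runs it locally first, which is a cheap way to catch typos before submitting.

```shell
# Write a minimal job script. To bash the "#SBATCH" lines are plain comments;
# only Slurm's sbatch reads them. All values are illustrative.
workdir=$(mktemp -d)
cat > "$workdir/hostname_date.sh" <<'EOF'
#!/bin/bash
#SBATCH --job-name=hostname_date
#SBATCH --time=00:01:00
#SBATCH --output=hostname_date.out

hostname   # name of the machine the script runs on
date       # time at which the script ran
EOF

# Dry run on the current machine to check the script before submitting it.
job_output=$(bash "$workdir/hostname_date.sh")
echo "$job_output"

# On a Slurm cluster the script would then be submitted with:
#   sbatch "$workdir/hostname_date.sh"
```

When the script runs through the scheduler, the hostname line should report a compute node rather than the login node.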


# Managing jobs

- errors, cancelling
- monitoring
- email
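The topics above could be illustrated along the following lines. This is a hedged sketch that assumes Slurm; `job.sh` is a hypothetical job script name, and other schedulers provide different commands for the same tasks.

```shell
# Email notification is requested inside the job script itself, e.g.:
#   #SBATCH --mail-type=END,FAIL
#   #SBATCH --mail-user=<your address>

if command -v sbatch >/dev/null 2>&1; then
    job_id=$(sbatch --parsable job.sh)   # --parsable prints the job ID without extra text
    job_id=${job_id%%;*}                 # drop a trailing ";cluster" part if present
    squeue -u "$USER"                    # monitoring: list your queued and running jobs
    scancel "$job_id"                    # cancelling: remove the job again
    status="submitted and cancelled job $job_id"
else
    status="Slurm commands not found; nothing submitted"
fi
echo "$status"
```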