From c3012c69ce33a7105c70902a879888e35a92b25c Mon Sep 17 00:00:00 2001 From: vsoch Date: Tue, 16 Jul 2024 18:46:07 -0600 Subject: [PATCH] radiuss 2024: flatten structure The notebook structure should primarily be organized by command, since this is what the new user will interact with. This change set better does that, and flattens / organizes things a bit better overall. Signed-off-by: vsoch --- .../tutorial/01_flux_tutorial.ipynb | 336 +++++++----------- .../tutorial/02_flux_framework.ipynb | 2 +- 2 files changed, 138 insertions(+), 200 deletions(-) diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/01_flux_tutorial.ipynb b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/01_flux_tutorial.ipynb index c91aee7e..36370ad4 100644 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/01_flux_tutorial.ipynb +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/01_flux_tutorial.ipynb @@ -279,6 +279,67 @@ "! flux uptime" ] }, + { + "cell_type": "markdown", + "id": "ec052119", + "metadata": {}, + "source": [ + "## Flux Resources\n", + "\n", + "When you are interacting with Flux, you will commonly want to know what resources are available to you. Flux uses [hwloc](https://github.com/open-mpi/hwloc) to detect the resources on each node and then to populate its resource graph.\n", + "\n", + "You can access the topology information that Flux collects with the `flux resource` subcommand. 
Let's run `flux resource list` to see the resources available to us in this notebook:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "scenic-chassis", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " STATE NNODES NCORES NGPUS NODELIST\n", + " free 4 40 0 f5af[12550686,12550686,12550686,12550686]\n", + " allocated 0 0 0 \n", + " down 0 0 0 \n" + ] + } + ], + "source": [ + "!flux resource list" + ] + }, + { + "cell_type": "markdown", + "id": "0086e47e", + "metadata": {}, + "source": [ + "Flux can also bootstrap its resource graph based on static input files, like in the case of a multi-user system instance setup by site administrators. [More information on Flux's static resource configuration files](https://flux-framework.readthedocs.io/en/latest/adminguide.html#resource-configuration). Flux provides a more standard interface to listing available resources that works regardless of the resource input source: `flux resource`." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "prime-equilibrium", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " STATE UP NNODES NODELIST\n", + " avail \u001b[01;32m ✔\u001b[0;0m 4 f5af[12550686,12550686,12550686,12550686]\n" + ] + } + ], + "source": [ + "# To view status of resources\n", + "!flux resource status" + ] + }, { "cell_type": "markdown", "id": "dee2d6af-43fa-490e-88e9-10f13e660125", @@ -288,7 +349,7 @@ "source": [ "# Submitting Jobs to Flux 💼️\n", "\n", - "How to submit jobs to Flux? Let us count the ways! Here are how Flux commands map to other schedulers you are familiar with.\n", + "How to submit jobs to Flux? Let us count the ways! Here are how Flux commands map to other schedulers you are familiar with. You can use the `flux` `submit`, `run`, `bulksubmit`, `batch`, and `alloc` commands.\n", "\n", "\n", " \n", @@ -328,10 +389,9 @@ " \n", "
\n", "\n", - "## Submission Client\n", - "### `flux`: the Job Submission Tool\n", + "## flux submit\n", "\n", - "To submit jobs to Flux, you can use the `flux` `submit`, `run`, `bulksubmit`, `batch`, and `alloc` commands. The `flux submit` command submits a job to Flux and prints out the jobid. " + "The `flux submit` command submits a job to Flux and prints out the jobid. " ] }, { @@ -416,7 +476,9 @@ "id": "ac798095", "metadata": {}, "source": [ - "The `flux run` command submits a job to Flux (similar to `flux submit`) but then attaches to the job with `flux job attach`, printing the job's stdout/stderr to the terminal and exiting with the same exit code as the job:" + "## flux run\n", + "\n", + "The `flux run` command submits a job to Flux (similar to `flux submit`) but then attaches to the job with `flux job attach`, printing the job's stdout/stderr to the terminal and exiting with the same exit code as the job. It's basically doing an interactive submit, because you will be able to watch the output in your terminal, and it will block your terminal until the job completes." ] }, { @@ -516,6 +578,8 @@ "id": "91e9ed6c", "metadata": {}, "source": [ + "## flux bulksubmit\n", + "\n", "The `flux bulksubmit` command enqueues jobs based on a set of inputs which are substituted on the command line, similar to `xargs` and the GNU `parallel` utility, except the jobs have access to the resources of an entire Flux instance instead of only the local system." 
] }, @@ -547,7 +611,7 @@ "id": "392a8056-1661-4b76-9ca3-5e536c687e82", "metadata": {}, "source": [ - "The `--cc` option to `submit` makes repeated submission even easier via, `flux submit --cc=IDSET`:" + "The `--cc` option (akin to \"carbon copy\") to `submit` makes repeated submission even easier via, `flux submit --cc=IDSET`:" ] }, { @@ -594,7 +658,7 @@ "source": [ "Try it in the JupyterLab terminal with a progress bar and jobs/s rate report: `flux submit --cc=1-100 --watch --progress --jps hostname`\n", "\n", - "Note that `--wait` is implied by `--watch`." + "Note that `--wait` is implied by `--watch`, meaning that when you are watching jobs, you are also waiting for them to finish." ] }, { @@ -632,7 +696,7 @@ "id": "641f446c-b2e8-40d8-b6bd-eb6b9dba3c71", "metadata": {}, "source": [ - "### `flux watch` to watch jobs\n", + "## flux watch\n", "\n", "Wouldn't it be cool to submit a job and then watch it? Well, yeah! We can do this now with flux watch. Let's run a fun example, and then watch the output. We have sleeps in here interspersed with echos only to show you the live action! 🥞️\n", "Also note a nice trick - you can always use `flux job last` to get the last JOBID.\n", @@ -675,7 +739,9 @@ "id": "3f8c2af2", "metadata": {}, "source": [ - "### Listing job properties with `flux jobs`\n", + "## flux jobs\n", + "\n", + "> Used for listing job properties\n", "\n", "We can now list the jobs in the queue with `flux jobs` and we should see both jobs that we just submitted. Jobs that are instances are colored blue in output, red jobs are failed jobs, and green jobs are those that completed successfully. Note that the JupyterLab notebook may not display these colors. You will be able to see them in the terminal." ] @@ -701,12 +767,32 @@ "!flux jobs" ] }, + { + "cell_type": "markdown", + "id": "f7228e0e-557c-455c-9903-073ef40a56a5", + "metadata": {}, + "source": [ + "You might also want to see \"all\" jobs with `-a`." 
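The `IDSET` argument to `--cc` (e.g. `1-3,5`) is a compact notation for a set of non-negative integers. As a rough illustration in plain Python (a hand-rolled sketch, not Flux's own `flux.idset` parser), expanding such a string might look like:

```python
def expand_idset(idset):
    """Expand an IDSET-style string such as "0-3,7" into a sorted list of ints.

    Illustrative sketch only -- Flux's real parser handles more syntax.
    """
    ids = []
    for part in idset.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            ids.extend(range(int(lo), int(hi) + 1))
        else:
            ids.append(int(part))
    return sorted(ids)

print(expand_idset("1-3,5"))  # → [1, 2, 3, 5]
```

So `flux submit --cc=1-3,5 hostname` would enqueue one copy of the job per id in that set.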
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "70dd1459-e21f-46b5-84a4-bd165cf97f4b", + "metadata": {}, + "outputs": [], + "source": [ + "!flux jobs -a" + ] + }, { "cell_type": "markdown", "id": "77ca4277", "metadata": {}, "source": [ - "Since those jobs won't ever exit (and we didn't specify a timelimit), let's cancel them all now and free up the resources." + "## flux cancel\n", + "\n", + "Since some of the jobs we see in the table above won't ever exit (and we didn't specify a timelimit), let's cancel them all now and free up the resources." ] }, { @@ -735,6 +821,8 @@ "id": "544aa0a9", "metadata": {}, "source": [ + "## flux batch\n", + "\n", "We can use the `flux batch` command to easily created nested flux instances. When `flux batch` is invoked, Flux will automatically create a nested instance that spans the resources allocated to the job, and then Flux runs the batch script passed to `flux batch` on rank 0 of the nested instance. \"Rank\" refers to the rank of the Tree-Based Overlay Network (TBON) used by the [Flux brokers](https://flux-framework.readthedocs.io/projects/flux-core/en/latest/man1/flux-broker.html).\n", "\n", "While a batch script is expected to launch parallel jobs using `flux run` or `flux submit` at this level, nothing prevents the script from further batching other sub-batch-jobs using the `flux batch` interface, if desired.\n", @@ -1067,7 +1155,7 @@ "id": "f4e525e2-6c89-4c14-9fae-d87a0d4fc574", "metadata": {}, "source": [ - "To list all completed jobs, run `flux jobs -a`:" + "We can again see a list all completed jobs with `flux jobs -a`:" ] }, { @@ -1165,7 +1253,7 @@ "\n", "And for an interesting detail, you can vary the scheduler algorithm or topology within each sub-instance, meaning that you can do some fairly interesting things with scheduling work, and all without stressing the top level system instance. 
Next, let's look at a prototype tool called `flux-tree` that you can use to see how this works.\n", "\n", - "## Flux tree\n", + "## flux tree\n", "\n", "Flux tree is a prototype tool that allows you to easily submit work to different levels of your flux instance, or more specifically, creating a nested hierarchy of jobs that scale out. Let's run the command, look at the output, and talk about it." ] @@ -1220,11 +1308,11 @@ "or more likely, you would want to use `flux batch` to submit multiple commands within a single flux instance to take advantage of the same\n", "hierarchy. \n", "\n", - "## Flux batch\n", + "## flux batch\n", "\n", - "Next, let's look at an example that doesn't use `flux tree` but instead uses `flux batch`, which is how you will likely interact with your nested instances. Let's start with a batch script `hello-batch.sh`.\n", + "Let's return to flux batch, but now with our new knowledge about flux instances! Flux tree is actually an experimental command that you won't encounter in the wild. Instead, you will likely interact with your nested flux instances with `flux batch`. Let's start with a batch script `hello-batch.sh`.\n", "\n", - "##### hello-batch.sh\n" + "### hello-batch.sh\n" ] }, { @@ -1701,7 +1789,7 @@ "id": "03e2ae62-3e3b-4c82-a0c7-4c97ff1376d2", "metadata": {}, "source": [ - "# Flux Process and Job Utilities ⚙️\n", + "# Process and Job Utilities ⚙️\n", "## Flux top \n", "Flux provides a feature-full version of `top` for nested Flux instances and jobs. In the JupyterLab terminal, invoke `flux top` to see the \"sleep\" jobs. If they have already completed you can resubmit them. \n", "\n", @@ -1712,9 +1800,11 @@ "\n", "## Flux proxy\n", "\n", - "### Interacting with a job hierarchy with `flux proxy`\n", + "### flux proxy\n", + "\n", + "> To interact with a job hierarchy\n", "\n", - "Flux proxy is used to route messages to and from a Flux instance. 
We can use `flux proxy` to connect to a running Flux instance and then submit more nested jobs inside it. You may want to edit `sleep_batch.sh` with the JupyterLab text editor (double click the file in the window on the left) to sleep for `60` or `120` seconds. Then from the JupyterLab terminal, run, you'll want to run the below. Yes, we really want you to open a terminal in the Jupyter launcher FILE-> NEW -> TERMINAL and run the commands below!" + "Flux proxy is used to route messages to and from a Flux instance. We can use `flux proxy` to connect to a running Flux instance and then submit more nested jobs inside it. You may want to edit `sleep_batch.sh` with the JupyterLab text editor (double click the file in the window on the left) to sleep for `60` or `120` seconds. Then from the run the commands below!" ] }, { @@ -1960,7 +2050,9 @@ "id": "73bbc90e", "metadata": {}, "source": [ - "We can then replicate our previous example of submitting multiple heterogeneous jobs and testing that Flux co-schedules them." + "### `flux.job.JobspecV1` to create job specifications\n", + "\n", + "Flux represents work as a standard called the [Jobspec](https://flux-framework.readthedocs.io/projects/flux-rfc/en/latest/spec_25.html). While you could write YAML or JSON, it's much easier to use provided Python functions that take high level metadata (command, resources, etc) to generate them. We can then replicate our previous example of submitting multiple heterogeneous jobs using these Python helpers, and testing that Flux co-schedules them." 
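To make the jobspec idea concrete, here is a hedged sketch, using only plain Python dictionaries rather than the actual `JobspecV1` class, of the general shape of document that RFC 25 describes (field names here are illustrative; `JobspecV1.from_command` generates the real, complete structure for you):

```python
import json

# Hypothetical, hand-assembled jobspec-like document for a 2-node, 4-task job.
# This is a sketch of the RFC 25 shape, NOT output of the real JobspecV1 class.
jobspec = {
    "version": 1,
    "resources": [
        {"type": "node", "count": 2, "with": [
            {"type": "slot", "label": "task", "count": 2, "with": [
                {"type": "core", "count": 2},
            ]},
        ]},
    ],
    "tasks": [
        {"command": ["./compute.py", "120"], "slot": "task",
         "count": {"per_slot": 1}},
    ],
    "attributes": {"system": {"duration": 0}},
}

# The scheduler consumes this as JSON (or YAML)
print(json.dumps(jobspec, indent=2))
```

The point is simply that a jobspec pairs a resource request (the nested `resources` tree) with the work to run on it (the `tasks` list); the Python helpers spare you from writing this by hand.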
] }, { @@ -1979,12 +2071,16 @@ } ], "source": [ + "# Here we create our job specification from a command\n", "compute_jobreq = JobspecV1.from_command(\n", " command=[\"./compute.py\", \"120\"], num_tasks=4, num_nodes=2, cores_per_task=2\n", ")\n", + "\n", + "# This is the \"current working directory\" (cwd)\n", "compute_jobreq.cwd = os.path.expanduser(\"~/flux-workflow-examples/job-submit-api/\")\n", "print(JobID(flux.job.submit(f, compute_jobreq)))\n", "\n", + "# Here is a second I/O job\n", "io_jobreq = JobspecV1.from_command(\n", " command=[\"./io-forwarding.py\", \"120\"], num_tasks=1, num_nodes=1, cores_per_task=1\n", ")\n", @@ -2018,7 +2114,9 @@ "id": "a8051640", "metadata": {}, "source": [ - "We can use the FluxExecutor class to submit large numbers of jobs to Flux. This method uses python's `concurrent.futures` interface. Example snippet from `~/flux-workflow-examples/async-bulk-job-submit/bulksubmit_executor.py`:" + "### `FluxExecutor` for bulk submission\n", + "\n", + "We can use the FluxExecutor class to submit large numbers of jobs to Flux. This method uses python's `concurrent.futures` interface. 
Here is an example snippet from `~/flux-workflow-examples/async-bulk-job-submit/bulksubmit_executor.py`:" ] }, { @@ -2061,70 +2159,13 @@ }, { "cell_type": "markdown", - "id": "ec052119", + "id": "5ee1c49d", "metadata": {}, "source": [ - "# Diving Deeper Into Flux's Internals\n", + "# Deeper Dive into Flux Internals 🧐️\n", "\n", - "Flux uses [hwloc](https://github.com/open-mpi/hwloc) to detect the resources on each node and then to populate its resource graph.\n", + "## flux queue\n", "\n", - "You can access the topology information that Flux collects with the `flux resource` subcommand:" - ] - }, - { - "cell_type": "code", - "execution_count": 44, - "id": "scenic-chassis", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " STATE NNODES NCORES NGPUS NODELIST\n", - " free 3 30 0 993a4f[746854,746854,746854]\n", - " allocated 1 10 0 993a4f746854\n", - " down 0 0 0 \n" - ] - } - ], - "source": [ - "!flux resource list" - ] - }, - { - "cell_type": "markdown", - "id": "0086e47e", - "metadata": {}, - "source": [ - "Flux can also bootstrap its resource graph based on static input files, like in the case of a multi-user system instance setup by site administrators. [More information on Flux's static resource configuration files](https://flux-framework.readthedocs.io/en/latest/adminguide.html#resource-configuration). Flux provides a more standard interface to listing available resources that works regardless of the resource input source: `flux resource`." 
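Because `FluxExecutor` implements Python's standard `concurrent.futures` executor interface, the submit-and-collect pattern in the snippet above is the same one used with the standard library. As a Flux-free sketch of that pattern, here is the equivalent flow with `ThreadPoolExecutor` standing in for `FluxExecutor` (which takes jobspecs rather than Python callables):

```python
import concurrent.futures

# Stand-in work function; with FluxExecutor you would pass a jobspec
# to executor.submit() instead of a callable.
def fake_job(n):
    return n * n

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # Submit everything up front; each submit returns a Future immediately
    futures = [executor.submit(fake_job, n) for n in range(5)]
    # .result() blocks until that future's work has completed
    results = sorted(f.result() for f in futures)

print(results)  # → [0, 1, 4, 9, 16]
```

The win in both cases is that submission is asynchronous: you enqueue many jobs quickly and only block when you actually need each result.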
- ] - }, - { - "cell_type": "code", - "execution_count": 45, - "id": "prime-equilibrium", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " STATE UP NNODES NODELIST\n", - " avail \u001b[01;32m ✔\u001b[0;0m 4 993a4f[746854,746854,746854,746854]\n" - ] - } - ], - "source": [ - "# To view status of resources\n", - "!flux resource status" - ] - }, - { - "cell_type": "markdown", - "id": "5ee1c49d", - "metadata": {}, - "source": [ "Flux has a command for controlling the queue within the `job-manager`: `flux queue`. This includes disabling job submission, re-enabling it, waiting for the queue to become idle or empty, and checking the queue status:" ] }, @@ -2162,7 +2203,11 @@ "id": "67aa7559", "metadata": {}, "source": [ - "Each Flux instance has a set of attributes that are set at startup that affect the operation of Flux, such as `rank`, `size`, and `local-uri` (the Unix socket usable for communicating with Flux). Many of these attributes can be modified at runtime, such as `log-stderr-level` (1 logs only critical messages to stderr while 7 logs everything, including debug messages)." + "## flux getattr\n", + "\n", + "> Get attributes about your system and environment\n", + "\n", + "Each Flux instance has a set of attributes that are set at startup that affect the operation of Flux, such as `rank`, `size`, and `local-uri` (the Unix socket usable for communicating with Flux). Many of these attributes can be modified at runtime, such as `log-stderr-level` (1 logs only critical messages to stderr while 7 logs everything, including debug messages). Here is an example set that you might be interested in looking at:" ] }, { @@ -2239,6 +2284,8 @@ "id": "d74fdfcf", "metadata": {}, "source": [ + "## flux module\n", + "\n", "Services within a Flux instance are implemented by modules. To query and manage broker modules, use `flux module`. 
Modules that we have already directly interacted with in this tutorial include `resource` (via `flux resource`), `job-ingest` (via `flux` and the Python API) `job-list` (via `flux jobs`) and `job-manager` (via `flux queue`), and we will interact with the `kvs` module in a few cells. For the most part, services are implemented by modules of the same name (e.g., `kvs` implements the `kvs` service and thus the `kvs.lookup` RPC). In some circumstances, where multiple implementations for a service exist, a module of a different name implements a given service (e.g., in this instance, `sched-fluxion-qmanager` provides the `sched` service and thus `sched.alloc`, but in another instance `sched-simple` might provide the `sched` service)." ] }, @@ -2281,43 +2328,7 @@ "id": "ad7090eb", "metadata": {}, "source": [ - "We can actually unload the Fluxion modules (the scheduler modules from flux-sched) and replace them with `sched-simple` (the scheduler that comes built-into flux-core) as a demonstration of this functionality:" - ] - }, - { - "cell_type": "code", - "execution_count": 49, - "id": "df4bc2d5", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Module Idle S Service\n", - "job-info 4 R \n", - "heartbeat 0 R \n", - "job-manager 0 R \n", - "connector-local 0 R \n", - "content-sqlite 3 R content-backing\n", - "kvs 0 R \n", - "resource 0 R \n", - "kvs-watch 4 R \n", - "job-exec 4 R \n", - "sched-simple 0 R sched\n", - "barrier idle R \n", - "job-list 5 R \n", - "content 3 R \n", - "cron idle R \n", - "job-ingest 0 R \n" - ] - } - ], - "source": [ - "!flux module unload sched-fluxion-qmanager\n", - "!flux module unload sched-fluxion-resource\n", - "!flux module load sched-simple\n", - "!flux module list" + "See the [Flux Management Notebook](02_flux_framework.ipynb) for a small tutorial of unloading and reloading the Fluxion (flux scheduler) modules." 
] }, { @@ -2325,94 +2336,19 @@ "id": "722c4ecf", "metadata": {}, "source": [ - "We can now reload the Fluxion scheduler, but this time, let's pass some extra arguments to specialize our Flux instance. In particular, let's populate our resource graph with nodes, sockets, and cores and limit the scheduling depth to 4." + "## flux dmesg\n", + "\n", + "If you need some additional help debugging your Flux setup, you might be interested in `flux dmesg`, which is akin to the [Linux dmesg](https://man7.org/linux/man-pages/man1/dmesg.1.html) but delivers messages for Flux." ] }, { "cell_type": "code", - "execution_count": 50, + "execution_count": 7, "id": "c34899ba", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Module Idle S Service\n", - "job-info 6 R \n", - "heartbeat 1 R \n", - "job-manager 0 R \n", - "connector-local 0 R \n", - "content-sqlite 0 R content-backing\n", - "kvs 0 R \n", - "resource 0 R \n", - "kvs-watch 6 R \n", - "job-exec 0 R \n", - "barrier idle R \n", - "job-list 6 R \n", - "sched-fluxion-resource 0 R \n", - "sched-fluxion-qmanager 0 R sched\n", - "content 0 R \n", - "cron idle R \n", - "job-ingest 1 R \n", - "2024-04-12T05:02:45.207563Z sched-fluxion-qmanager.debug[0]: effective queue params (queue=default): queue-depth=4\n" - ] - } - ], - "source": [ - "!flux dmesg -C\n", - "!flux module unload sched-simple\n", - "!flux module load sched-fluxion-resource load-allowlist=node,socket,core\n", - "!flux module load sched-fluxion-qmanager queue-params=queue-depth=4\n", - "!flux module list\n", - "!flux dmesg | grep queue-depth" - ] - }, - { - "cell_type": "markdown", - "id": "ed4b0e04", - "metadata": {}, - "source": [ - "The key-value store (KVS) is a core component of a Flux instance. The `flux kvs` command provides a utility to list and manipulate values of the KVS. Modules of Flux use the KVS to persistently store information and retrieve it later on (potentially after a restart of Flux). 
One example of KVS use by Flux is the `resource` module, which stores the resource set `R` of the current Flux instance:" - ] - }, - { - "cell_type": "code", - "execution_count": 51, - "id": "nervous-broadcast", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "admin archive job resource\n", - "R eventlog\n", - "\u001b[1;39m{\n", - " \u001b[0m\u001b[34;1m\"version\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m1\u001b[0m\u001b[1;39m,\n", - " \u001b[0m\u001b[34;1m\"execution\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", - " \u001b[0m\u001b[34;1m\"R_lite\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", - " \u001b[1;39m{\n", - " \u001b[0m\u001b[34;1m\"rank\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"0-3\"\u001b[0m\u001b[1;39m,\n", - " \u001b[0m\u001b[34;1m\"children\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m{\n", - " \u001b[0m\u001b[34;1m\"core\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"0-9\"\u001b[0m\u001b[1;39m\n", - " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", - " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", - " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", - " \u001b[0m\u001b[34;1m\"starttime\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m0\u001b[0m\u001b[1;39m,\n", - " \u001b[0m\u001b[34;1m\"expiration\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;39m0\u001b[0m\u001b[1;39m,\n", - " \u001b[0m\u001b[34;1m\"nodelist\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", - " \u001b[0;32m\"993a4f[746854,746854,746854,746854]\"\u001b[0m\u001b[1;39m\n", - " \u001b[1;39m]\u001b[0m\u001b[1;39m\n", - " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", - "\u001b[1;39m}\u001b[0m\n" - ] - } - ], + "outputs": [], "source": [ - "!flux kvs ls \n", - "!flux kvs ls resource\n", - "!flux kvs get resource.R | jq" + "!flux dmesg" ] }, { @@ -2420,6 +2356,8 @@ "id": "c3920f9e", "metadata": {}, "source": [ + "## flux exec\n", + "\n", "Flux provides a built-in mechanism for executing commands on nodes without requiring a job or resource allocation: `flux exec`. 
`flux exec` is typically used by sys admins to execute administrative commands and load/unload modules across multiple ranks simultaneously." ] }, diff --git a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/02_flux_framework.ipynb b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/02_flux_framework.ipynb index f540653e..6d6305f8 100644 --- a/2024-RADIUSS-AWS/JupyterNotebook/tutorial/02_flux_framework.ipynb +++ b/2024-RADIUSS-AWS/JupyterNotebook/tutorial/02_flux_framework.ipynb @@ -22,7 +22,7 @@ "
\n", "\n", "
\n", - "Image created by Ian Lumsden for this tutorial
\n", + "Image created by Ian Lumsden for the Flux tutorials\n", "
\n", "\n", "Each broker is a program built on top of the ∅MQ networking library. The broker contains two main components. First, the broker implements Flux-specific networking abstractions over ∅MQ, such as remote-proceedure call (RPC) and publication-subscription (pub-sub). Second, the broker contains several core services, such as PMI (for MPI support), run control support (for enabling automatic startup of other services), and, most importantly, broker module management. The remainder of a Flux broker's functionality comes from broker modules: specially designed services that the broker can deploy in independent OS threads. Some examples of broker modules provided by Flux include:\n",