Merge branch 'main' of github.com:cloudmesh/cloudmesh-ee
laszewsk committed Oct 5, 2023
2 parents 83f5cb4 + 98e829a commit 9651ee3
Showing 32 changed files with 175 additions and 403 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 4.3.20
current_version = 4.3.22
commit = True
tag = False

2 changes: 1 addition & 1 deletion .github/workflows/main.yml
@@ -16,7 +16,7 @@ jobs:
with:
python-version: 3.10.4

- name: Install cloudmesh-sbatch
- name: Install cloudmesh-ee
run: |
python -m pip install -r requirements.txt
python -m pip install -r requirements-dev.txt
2 changes: 1 addition & 1 deletion Makefile
@@ -1,4 +1,4 @@
package=sbatch
package=ee
UNAME=$(shell uname)
VERSION=`head -1 VERSION`

57 changes: 28 additions & 29 deletions README.md
@@ -1,4 +1,4 @@
# Cloudmesh Sbatch
# Cloudmesh ee

A general purpose HPC Template and Experiment management system

@@ -25,17 +25,17 @@ number of parallel accessible resources. In some cases these
restrictions are soo established that removing them is impractical and
takes weks to implement on temporary basis.

Cloudmesh Sbatch is a framework that wraps the SLURM batch processor
into a templated framework such that experiments can be generated
based on configuration files focusing on the livecycle of generating
many permutations of experiments with standard tooling, so that you
can focus more on modeling your experiments than how to orchestrate
them with tools. A number of batch scripts can be generated that than
can be executed according to center policies.
Cloudmesh Experiment Executor (ee) is a framework that wraps the SLURM
batch processor into a templated framework such that experiments can
be generated based on configuration files focusing on the livecycle
of generating many permutations of experiments with standard tooling,
so that you can focus more on modeling your experiments than how to
orchestrate them with tools. A number of batch scripts can be
generated that than can be executed according to center policies.

## Dependencies

When you install cloudmesh-sbatch, you will also be installing a
When you install cloudmesh-ee, you will also be installing a
minimum baseline of the `cms` command (as part of the Cloudmesh
ecosystem). For more details on Cloudmesh, see its documentation on
[read the docs](https://cloudmesh.github.io/cloudmesh-manual/). However
@@ -46,18 +46,18 @@ need to initialize cloudmesh with the command
$ cms help
```

While SLURM is not needed to run the `cloudmesh sbatch` command, the
generated output will not exectue unless your system has slurm installed
While SLURM is not needed to run the `cloudmesh ee` command, the
generated output will not execute unless your system has slurm installed
and you are able to run jobs via the `slurm sbatch` command.

## Documentation

### Running Cloudmesh SBatch
### Running Cloudmesh ee

The `cloudmesh sbatch` command takes one of two forms of execution. It is started with
The `cloudmesh ee` command takes one of two forms of execution. It is started with

```bash
$ cms sbatch <command> <parameters>
$ cms ee <command> <parameters>
```

Where the command invokes a partiuclar action and parameters include a
@@ -68,7 +68,7 @@ functions as expected and as intended.
In general, configuration arguments that appear in multiple locations are
prioritized in the following order (highest priority first)

1. CLI Arguments with `cms sbatch`
1. CLI Arguments with `cms ee`
2. Configuration Files
3. Preset values
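
The priority order listed above can be pictured as a layered lookup. The following is only a minimal sketch of that idea and not the cloudmesh-ee implementation: the dictionaries and every key and value in them are hypothetical and chosen purely for illustration.

```python
from collections import ChainMap

# Hypothetical settings for illustration only; these keys and values are
# assumptions and are not taken from cloudmesh-ee.
presets = {"mode": "flat", "dir": ".", "name": "experiment"}
config_file = {"mode": "hierarchical", "dir": "project"}
cli_args = {"dir": "example"}

# ChainMap searches its maps from left to right, so the leftmost mapping
# (the CLI arguments) has the highest priority and the presets the lowest.
resolved = dict(ChainMap(cli_args, config_file, presets))

for key in ("dir", "mode", "name"):
    print(key, "=", resolved[key])
```

In this sketch `dir` comes from the CLI arguments, `mode` from the configuration file, and `name` falls back to the preset value.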

@@ -79,8 +79,8 @@ configuration file, or via CLI arguments. You can issue the command using
either of the below forms:

```text
cms sbatch generate SOURCE --name=NAME [--verbose] [--mode=MODE] [--config=CONFIG] [--attributes=PARAMS] [--out=DESTINATION] [--dryrun] [--noos] [--nocm] [--dir=DIR] [--experiment=EXPERIMENT]
cms sbatch generate --setup=FILE [SOURCE] [--verbose] [--mode=MODE] [--config=CONFIG] [--attributes=PARAMS] [--out=DESTINATION] [--dryrun] [--noos] [--nocm] [--dir=DIR] [--experiment=EXPERIMENT] [--name=NAME]
cms ee generate SOURCE --name=NAME [--verbose] [--mode=MODE] [--config=CONFIG] [--attributes=PARAMS] [--out=DESTINATION] [--dryrun] [--noos] [--nocm] [--dir=DIR] [--experiment=EXPERIMENT]
cms ee generate --setup=FILE [SOURCE] [--verbose] [--mode=MODE] [--config=CONFIG] [--attributes=PARAMS] [--out=DESTINATION] [--dryrun] [--noos] [--nocm] [--dir=DIR] [--experiment=EXPERIMENT] [--name=NAME]
```

If you have prepared a configuration file that conforms to the schema
@@ -110,7 +110,7 @@ form which overrides the default values.
### Form 2 - Generating Submission Scripts

```text
sbatch generate submit --name=NAME [--verbose]
ee generate submit --name=NAME [--verbose]
```

This command uses the output of the
@@ -121,7 +121,7 @@ outputs to SLURM as a sequence of sbatch commands.
* `--name=NAME` - specifies the name used in the
[generate command](#command-1---generating-experiments).
The generate command will inspect the `<NAME>.json` file and build the
necessary commands to run all permutations that the cloudmesh sbatch
necessary commands to run all permutations that the cloudmesh ee
command generated.

Note that this command only generates the script, and you must run the
@@ -134,10 +134,10 @@ run your jobs.
This command requires a YAML file which is configured for the host and gpu.
The YAML file also points to the desired slurm template.

```python
```yaml
slurm_template: 'slurm_template.slurm'

sbatch_setup:
ee_setup:
<hostname>-<gpu>:
- card_name: "a100"
- time: "05:00:00"
@@ -150,14 +150,13 @@ sbatch_setup:
- num_cpus: 6
- num_gpus: 1


```

example:

```
cms sbatch slurm.in.sh --config=a.py,b.json,c.yaml --attributes=a=1,b=4 --noos --dir=example --experiment=\"epoch=[1-3] x=[1,4] y=[10,11]\"
sbatch slurm.in.sh --config=a.py,b.json,c.yaml --attributes=a=1,b=4 --noos --dir=example --experiment="epoch=[1-3] x=[1,4] y=[10,11]"
cms ee slurm.in.sh --config=a.py,b.json,c.yaml --attributes=a=1,b=4 --noos --dir=example --experiment=\"epoch=[1-3] x=[1,4] y=[10,11]\"
ee slurm.in.sh --config=a.py,b.json,c.yaml --attributes=a=1,b=4 --noos --dir=example --experiment="epoch=[1-3] x=[1,4] y=[10,11]"
# ERROR: Importing python not yet implemented
epoch=1 x=1 y=10 sbatch example/slurm.sh
epoch=1 x=1 y=11 sbatch example/slurm.sh
@@ -171,7 +170,7 @@ epoch=3 x=1 y=10 sbatch example/slurm.sh
epoch=3 x=1 y=11 sbatch example/slurm.sh
epoch=3 x=4 y=10 sbatch example/slurm.sh
epoch=3 x=4 y=11 sbatch example/slurm.sh
Timer: 0.0022s Load: 0.0013s sbatch slurm.in.sh --config=a.py,b.json,c.yaml --attributes=a=1,b=4 --noos --dir=example --experiment="epoch=[1-3] x=[1,4] y=[10,11]"
Timer: 0.0022s Load: 0.0013s ee slurm.in.sh --config=a.py,b.json,c.yaml --attributes=a=1,b=4 --noos --dir=example --experiment="epoch=[1-3] x=[1,4] y=[10,11]"
```
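
The example output above lists one `sbatch` line per combination of `epoch`, `x`, and `y`, which is a Cartesian product of the three value lists (3 x 2 x 2 = 12 jobs). The sketch below reproduces that expansion with `itertools.product`. It is an illustration under assumptions: the helper names `expand_values` and `permutations` are hypothetical, and the parsing rules (`[1-3]` as an inclusive range, `[1,4]` as an explicit list) are inferred from the example rather than taken from the cloudmesh-ee parser.

```python
import itertools
import re


def expand_values(spec):
    """Expand '[1-3]' to ['1', '2', '3'] and '[1,4]' to ['1', '4'].

    Hypothetical helper; the range and list syntax is inferred from the example.
    """
    values = []
    for part in spec.strip("[]").split(","):
        match = re.fullmatch(r"(\d+)-(\d+)", part)
        if match:
            start, stop = int(match.group(1)), int(match.group(2))
            values.extend(str(v) for v in range(start, stop + 1))
        else:
            values.append(part)
    return values


def permutations(experiment):
    """Return one dict per combination of the parameters in the experiment string."""
    names, choices = [], []
    for token in experiment.split():
        name, spec = token.split("=", 1)
        names.append(name)
        choices.append(expand_values(spec))
    # product() varies the last parameter fastest, matching the order shown above.
    return [dict(zip(names, combo)) for combo in itertools.product(*choices)]


jobs = permutations("epoch=[1-3] x=[1,4] y=[10,11]")
for job in jobs:
    env = " ".join(f"{key}={value}" for key, value in job.items())
    print(f"{env} sbatch example/slurm.sh")
```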

## Slurm on a single computer ubuntu 20.04
@@ -301,18 +300,18 @@ JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
```

### sbatch slurm manageement commands for localhost
### sbatch slurm management commands for localhost

start slurm deamons
start slurm daemons

```bash
cms sbatch slurm start
cms ee slurm start
```

stop surm deamons

```bash
cms sbatch slurm stop
cms ee slurm stop
```

BUG:
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
4.3.20
4.3.22
2 changes: 1 addition & 1 deletion cloudmesh/ee/__init__.py
@@ -1,2 +1,2 @@
"""Management of custom batch script for experiments while exploring parameter sets."""
__version__ = "4.3.20"
__version__ = "4.3.22"
2 changes: 1 addition & 1 deletion cloudmesh/ee/__version__.py
@@ -1 +1 @@
version = "4.3.20"
version = "4.3.22"
2 changes: 1 addition & 1 deletion cloudmesh/ee/experimentexecutor.py
@@ -611,7 +611,7 @@ def generate_submit(self, name=None, job_type='slurm'):
name = f"{name}.json"

if job_type == 'slurm':
cmd = 'ee'
cmd = 'sbatch'
elif job_type == 'lsf':
cmd = 'bsub'
else:
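
The hunk above restores `sbatch` as the SLURM submit command, apparently after the package rename had changed it to `ee`. The mapping from job type to scheduler command can also be written as a lookup table, which keeps command names out of the reach of a package-wide search and replace on identifiers. This is a minimal standalone sketch and not the code in `experimentexecutor.py`: the names `SUBMIT_COMMANDS` and `submit_command` are hypothetical, and raising `ValueError` for an unknown job type is an assumption, because the body of the `else` branch is not shown in the diff.

```python
# Hypothetical lookup table; only the two pairs visible in the diff are listed.
SUBMIT_COMMANDS = {
    "slurm": "sbatch",  # SLURM submits batch scripts with sbatch
    "lsf": "bsub",      # LSF submits batch scripts with bsub
}


def submit_command(job_type="slurm"):
    """Return the scheduler submit command for a job type.

    Raising ValueError for unknown types is an assumption made for this sketch.
    """
    try:
        return SUBMIT_COMMANDS[job_type]
    except KeyError:
        raise ValueError(f"unsupported job type: {job_type!r}") from None


print(submit_command("slurm"))
print(submit_command("lsf"))
```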
1 change: 1 addition & 0 deletions cloudmesh/ee/tools/parallel_executor.py
@@ -11,6 +11,7 @@

import subprocess
import yaml
import json
import concurrent.futures
import logging
from docopt import docopt
4 changes: 2 additions & 2 deletions docs/Makefile
@@ -10,7 +10,7 @@ BUILDDIR = build

man:
mkdir -p source/manual
cms help sbatch | grep -v "# Timer" | grep -v "patch enabled so applying the patch" | grep -v "Alpha Channel fix" > source/manual/sbatch.rst
cms help ee | grep -v "# Timer" | grep -v "patch enabled so applying the patch" | grep -v "Alpha Channel fix" > source/manual/ee.rst


# Put it first so that "make" without argument is like "make help".
@@ -28,5 +28,5 @@ help:

#man:
# mkdir -p source/manual
# cms help sbatch | grep -v "# Timer" > source/manual/sbatch.rst
# cms help ee | grep -v "# Timer" > source/manual/ee.rst

6 changes: 3 additions & 3 deletions docs/source/api.rst
@@ -1,11 +1,11 @@
Cloudmesh sbatch
Cloudmesh ee
================

.. autosummary::
:toctree: generated

cloudmesh.sbatch
manual/sbatch
cloudmesh.ee
manual/ee

.. rubric:: Modules

@@ -1,21 +1,21 @@
cloudmesh.sbatch.command package
cloudmesh.ee.command package
================================

Submodules
----------

cloudmesh.sbatch.command.sbatch module
cloudmesh.ee.command.ee module
--------------------------------------

.. automodule:: cloudmesh.sbatch.command.sbatch
.. automodule:: cloudmesh.ee.command.ee
:members:
:undoc-members:
:show-inheritance:

Module contents
---------------

.. automodule:: cloudmesh.sbatch.command
.. automodule:: cloudmesh.ee.command
:members:
:undoc-members:
:show-inheritance:
@@ -1,4 +1,4 @@
cloudmesh.sbatch package
cloudmesh.ee package
========================

Subpackages
@@ -7,31 +7,31 @@ Subpackages
.. toctree::
:maxdepth: 4

cloudmesh.sbatch.command
cloudmesh.ee.command

Submodules
----------

cloudmesh.sbatch.sbatch module
cloudmesh.ee.experimentexecutor module
------------------------------

.. automodule:: cloudmesh.sbatch.sbatch
.. automodule:: cloudmesh.ee.experimentexecutor
:members:
:undoc-members:
:show-inheritance:

cloudmesh.sbatch.slurm module
cloudmesh.ee.slurm module
-----------------------------

.. automodule:: cloudmesh.sbatch.slurm
.. automodule:: cloudmesh.ee.slurm
:members:
:undoc-members:
:show-inheritance:

Module contents
---------------

.. automodule:: cloudmesh.sbatch
.. automodule:: cloudmesh.ee
:members:
:undoc-members:
:show-inheritance:
2 changes: 1 addition & 1 deletion docs/source/cloudmesh.rst
@@ -7,7 +7,7 @@ Subpackages
.. toctree::
:maxdepth: 4

cloudmesh.sbatch
cloudmesh.ee

Module contents
---------------
6 changes: 3 additions & 3 deletions docs/source/conf.py
@@ -9,8 +9,8 @@
import sphinx_rtd_theme
import os
import sys
import cloudmesh.sbatch
from cloudmesh.sbatch.__version__ import version as cc_version
import cloudmesh.ee
from cloudmesh.ee.__version__ import version as cc_version

rtd = True
# rtd = False
@@ -25,7 +25,7 @@
html_theme = "sphinx_rtd_theme"


project = 'cloudmesh-sbatch'
project = 'cloudmesh-ee'
copyright = '2022, Gregor von Laszewski'
author = 'Gregor von Laszewski'
release = cc_version
6 changes: 3 additions & 3 deletions docs/source/generated/cloudmesh.sbatch.rst
@@ -1,7 +1,7 @@
cloudmesh.sbatch
================
cloudmesh.ee
============

.. automodule:: cloudmesh.sbatch
.. automodule:: cloudmesh.ee



4 changes: 2 additions & 2 deletions docs/source/index.rst
@@ -1,4 +1,4 @@
cloudmesh-sbatch
cloudmesh-ee
================

Generation of experiment submission scripts based on parameter permutations.
@@ -9,7 +9,7 @@ Generation of experiment submission scripts based on parameter permutations.
:caption: Documentation

readme
manual/sbatch
manual/ee
slurm

.. toctree::