| title | author | date |
|---|---|---|
| Quick start guide to `hpcflow` | Adam J. Plowman | April 2019 |

- The package is to be installed on the CSF with `pip install --user hpcflow`.
- A console entry point, `submit`, will be installed. It can be used within a directory that contains at least one file named like `<submission_filename_fmt>`, as specified in the configuration file.
- Profiles are stored by default in `~/.hpcflow/profiles` (maybe allow changing the `.hpcflow` directory with an environment variable).
- The configuration file is stored by default in `~/.hpcflow/config.yml`.
- If a `<directory_list_filename_fmt>` file is found in the submission spec directory, job arrays are submitted.
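
The default locations described above could be resolved along the following lines. This is a minimal sketch and not `hpcflow`'s actual code: the environment variable name `HPCFLOW_DIR` and the function `get_hpcflow_paths` are hypothetical, since the notes only say that an environment variable might be allowed.

```python
import os
from pathlib import Path


def get_hpcflow_paths(environ=None):
    """Return the data directory, profiles directory and config file paths."""
    environ = os.environ if environ is None else environ
    # HPCFLOW_DIR is an assumed name; fall back to ~/.hpcflow when it is unset.
    base = Path(environ.get("HPCFLOW_DIR", Path.home() / ".hpcflow")).expanduser()
    return {
        "base": base,
        "profiles": base / "profiles",
        "config": base / "config.yml",
    }
```

Passing the environment as an argument keeps the function easy to test without modifying `os.environ`.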
- `hpcflow --version`
- `hpcflow make`
  - Generate a Workflow.
- `hpcflow submit` OR `submit`
  - Generate a Workflow if it doesn't exist and then submit (write jobscripts and execute) all job specs in the current working directory.
- `hpcflow submit <spec>` OR `submit <spec>`
  - Submit a named job spec.
- `hpcflow install-example <name>`
  - Install an example set of profiles from the `examples` directory (files with the same name will be overwritten).
- `hpcflow add-inputs <name> <dirname>`
  - Add an example set of input files to be used with the example profile from the `examples` directory.
  - This involves merging `_variable_lookup.yml` and `_archive_locations.yml` from the example into the user's profile directory.
- `hpcflow write-cmd <job_name>`
  - Write the command file for a given jobscript. This script is invoked within jobscripts at execution-time and is not expected to be invoked by the user. The `write-cmd` process involves opening the JSON representation of the profile set and resolving variables for the set of commands within a given command group.
- `hpcflow show-stats`
  - Show statistics like CPU walltime for the profile set in the current directory.
- `hpcflow clean`
  - Remove all `hpcflow`-generated files from the current directory (use confirmation?).
- `hpcflow hfstat` OR `hfstat`
  - Show the status of running tasks and the number of completed tasks within this directory.
- Of the above commands, the following interact with the local database:
  - `submit`
  - `write-cmd`
  - `show-stats`
  - `hfstat`
- Invoking any of these commands should therefore set up the relevant database connection.
- Only `submit` should invoke the `create_all(engine)` method; all other commands should fail if no database exists.
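
The rule that only `submit` may create the database can be sketched as below. This is an illustration only: it uses the standard-library `sqlite3` module as a stand-in for whatever provides `create_all(engine)`, and the function name `connect_for_command` and the `workflow` table are hypothetical.

```python
import sqlite3
from pathlib import Path

# Commands that interact with the local database, as listed above.
DB_COMMANDS = {"submit", "write-cmd", "show-stats", "hfstat"}


def connect_for_command(command, db_path):
    """Open the database for `command`; only `submit` may create it."""
    if command not in DB_COMMANDS:
        raise ValueError(f"{command!r} does not use the database")
    db_path = Path(db_path)
    if command != "submit" and not db_path.exists():
        # Every command other than `submit` fails if no database exists.
        raise FileNotFoundError(f"No database at {db_path}; run `submit` first")
    conn = sqlite3.connect(str(db_path))
    if command == "submit":
        # Stand-in for create_all(engine): create tables only if missing.
        conn.execute("CREATE TABLE IF NOT EXISTS workflow (id INTEGER PRIMARY KEY)")
        conn.commit()
    return conn
```

Checking for the file before connecting matters here because `sqlite3.connect` would otherwise silently create an empty database.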
- `profile`:
  - The name of a profile file (without its extension, if it exists) in the profiles directory from which this specification should inherit.
- `options`:
  - Options to be passed directly to the jobscript.
- `command_groups`:
  - Groupings of commands to be executed within the jobscripts.
- `commands`:
  - List of commands to execute within one jobscript.
- `parallel`:
  - `variables`:
    - If `True`, and variables have more than one value, the command set will be executed within a job array (i.e. roughly in parallel). If `False`, and variables have more than one value, the command set will be executed within a `for` loop in the jobscript. This can only be set to `True` if the number of variable values is known at submission time.
- `variables`:
  - Definitions of variables that can appear in the commands of `command_groups`.
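
The effect of the `parallel` setting on how a command set runs can be summarised in a small function. This is a sketch under stated assumptions: the function name `execution_mode` and the return labels are hypothetical, and the handling of a single value (`"single"`) and of an unknown number of values with `parallel` set to `False` (`"loop"`) is a guess that the notes do not confirm.

```python
def execution_mode(parallel, num_values):
    """Return 'single', 'job_array' or 'loop' for one command set.

    `num_values` is the number of variable values, or None if it is not
    known at submission time.
    """
    if num_values is None:
        if parallel:
            # A job array needs its size at submission time.
            raise ValueError("parallel=True requires a known number of values")
        return "loop"  # assumption: fall back to a loop in the jobscript
    if num_values <= 1:
        return "single"  # assumption: nothing to iterate over
    return "job_array" if parallel else "loop"
```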
- Scheduler using e.g. SGE or directly executed
- [If scheduled] Job array versus single job with some sort of loop
- Parallel versus serial job execution
- There are two parameters that dictate when job arrays are used versus a loop:
  - `job_array`
  - `job_array_variables`
- Both parameters can be set at the profile level.
- For a given profile, if `job_array_variables` is set to `True`, then `job_array` must also be set to `True`.
- For a given profile, if `job_array` is set to `True`, then `job_array` must be set to `True` for all subsequent profiles. (May be relaxed in future.)
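
The two constraints above can be checked over an ordered list of profiles as in the following sketch. The function name `validate_job_array_settings` is hypothetical, profiles are represented as plain dicts, and a missing parameter is assumed to default to `False`; none of this is confirmed by the notes.

```python
def validate_job_array_settings(profiles):
    """Raise ValueError if an ordered list of profile dicts breaks the rules."""
    seen_job_array = False
    for idx, profile in enumerate(profiles):
        # Assumption: both parameters default to False when omitted.
        job_array = bool(profile.get("job_array", False))
        job_array_variables = bool(profile.get("job_array_variables", False))
        if job_array_variables and not job_array:
            raise ValueError(
                f"profile {idx}: job_array_variables=True requires job_array=True"
            )
        if seen_job_array and not job_array:
            raise ValueError(
                f"profile {idx}: job_array must be True after an earlier profile set it"
            )
        seen_job_array = seen_job_array or job_array
```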
- Variables are defined in the `variables` dict of the spec file.
- `file_regex`:
  - `pattern`:
    - A valid regex pattern that matches one or more file names.
  - `group`:
    - The regex match group index to extract from the regex pattern. Default is `0`.
  - `type`:
    - Determines what data type the regex match group should be cast to; one of `str`, `int`, `float` or `bool`. Default is `str`.
  - `subset`:
    - A list of values of type `file_regex --> type` to include.
- `data`:
  - A list of data to be formatted in `value`. Specify `data` or `file_regex` (or neither), but not both.
- `value`:
  - The formatted values of the variable. This is a string that may include other variables (using the `<<variable_name>>` syntax) and, if either `file_regex` or `data` is specified, at least one Python-like format specifier must be included. If multiple Python-like format specifiers are included, the same value will appear in all instances of that format specifier. Default is `{:s}`.
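
How a single variable definition might be turned into its list of formatted values is sketched below. This is not `hpcflow`'s implementation: the function name `resolve_variable`, the use of `re.search` for matching, the string-to-`bool` conversion and the behaviour when neither `file_regex` nor `data` is given are all assumptions. Substitution of `<<variable_name>>` references is left out.

```python
import re
from string import Formatter

# Casts for the `type` key; the bool conversion is an assumed convention.
CASTS = {
    "str": str,
    "int": int,
    "float": float,
    "bool": lambda s: s.lower() in ("true", "1"),
}


def resolve_variable(definition, file_names=()):
    """Return the formatted values for one variable definition (a dict)."""
    file_regex = definition.get("file_regex")
    data = definition.get("data")
    if file_regex is not None and data is not None:
        raise ValueError("Specify `data` or `file_regex` (or neither), not both")
    if file_regex is None and data is None:
        # Assumption: with nothing to format, `value` is used verbatim.
        return [definition["value"]]

    if file_regex is not None:
        pattern = re.compile(file_regex["pattern"])
        group = file_regex.get("group", 0)
        cast = CASTS[file_regex.get("type", "str")]
        matches = (pattern.search(name) for name in file_names)
        raw = [cast(m.group(group)) for m in matches if m]
        subset = file_regex.get("subset")
        if subset is not None:
            raw = [v for v in raw if v in subset]
    else:
        raw = list(data)

    value_fmt = definition.get("value", "{:s}")
    # Count the format fields so the same value fills every specifier.
    n_fields = sum(
        1 for _, field, _, _ in Formatter().parse(value_fmt) if field is not None
    )
    if n_fields == 0:
        raise ValueError("`value` needs at least one format specifier")
    return [value_fmt.format(*([v] * n_fields)) for v in raw]
```

For example, a definition with `pattern` set to `run_(\d+)\.inp`, `group` set to `1`, `type` set to `int` and `value` set to `job_{:d}` would pick up the numeric part of each matching file name and format it into a job label.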
- The `_variable_lookup.yml` file provides a place to store frequently used variables.
- The `_variable_lookup.yml` file also includes variable templates, which allow a simple parametrisation of variable generation.