- Introduction
- Installing Enzyme
- Getting Started with Enzyme
- Enzyme User Guide
- Additional Examples
- Cloud Provider Quick Reference
- This is a relatively new project and should be considered Alpha-level software.
Enzyme is a software tool that provides an accelerated path to spinning up and using high-performance compute clusters in the cloud. Enzyme offers a simple command-line mechanism for launching workloads against templates that abstract the orchestration and operating system image generation of the HPC cluster. It creates operating system images that follow the Intel® HPC Platform Specification, providing a standard base solution that supports a wide range of popular HPC applications. Enzyme aims to shorten the path for users migrating to a public cloud by abstracting away the learning curve of a supported cloud provider. This gives users a rapid path to start using cloud resources, and it enables the Enzyme community to collaborate on providing optimal environments for the underlying HPC solutions.
There are many reasons for running HPC and compute-intensive workloads in a cloud environment. The following are some of the top motivators behind Enzyme, but the list is not exhaustive.
- Local HPC cluster resource capacity is typically fixed while demand is variable. Cloud resources provide augmentation to local resources that help meet spikes in resource needs on-demand.
- Cloud-based HPC clusters can simplify and accelerate access for new HPC users and new businesses, resulting in faster time to results.
- Cloud provides a means to access massive resources or specialized resources for short periods, to address temporary or intermittent business needs.
- Cloud provides access to the newest technologies, allowing evaluation and use ahead of long-term ownership.
- Datasets may already exist in the cloud, and utilizing cloud resources may be the best option for performance and/or cost.
The Intel HPC Platform Specification captures industry best practices, optimized Intel runtime requirements, and broad application compatibility needs. These requirements form a foundation for high-performance computing solutions to provide enhanced compatibility and performance across a range of popular HPC workloads. Intel developed this specification by collaborating with many industry partners, incorporating feedback, and curating the specification since 2007.
You need to install:
- Go
- make (available for both Windows and Linux)
Clone the Enzyme repository from GitHub.
Enzyme uses open source tools from HashiCorp, and those tools are included as sub-modules in the Enzyme git repository. To ensure all the required source is cloned, use the following:
git clone --recurse-submodules https://github.com/intel-go/Enzyme
If needed, the sub-modules can be downloaded after cloning by using:
git submodule init
git submodule update
Note: some firewall configurations can impact access to git repositories.
Enzyme uses make to build the binary from Go source code. Build Enzyme by running the make command, optionally specifying the target OS platform with the GOOS command-line option. Options are currently windows or linux. If no OS is specified, the default build assumes linux.
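For example, a complete fresh build and smoke test on Linux might look like this (assuming Go and make are already installed and on the PATH):
git clone --recurse-submodules https://github.com/intel-go/Enzyme
cd Enzyme
make GOOS=linux
cd package-linux-amd64
./Enzyme help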
Note: the Makefile does not currently support building for Windows under a Windows cmd shell. To build Enzyme to run from a Windows platform, use a Windows Bash implementation and run the following make command:
make GOOS=windows
If the build completes successfully, the Enzyme build will create a sub-directory called package-{GOOS}-amd64 that includes the binaries and supporting directory structure for executing Enzyme. In addition, the package sub-directory is archived into package-{GOOS}-amd64-{version}-{hash}.tar.gz for easy distribution.
The binary name for Linux is Enzyme, and the binary name for Windows is Enzyme.exe. The command-line examples in this guide all use the Linux binary name. For use from a Windows system, substitute the Enzyme command with Enzyme.exe.
Enzyme takes several input parameters that provide user credentials for a target cloud account, templates for the cloud provider, and templates for the desired image to run on in the cloud. These JSON inputs are combined into a single structure that drives the HashiCorp tools, Terraform and Packer, to create machine images and spin up the cluster. The combined structure is saved in the .Enzyme/ sub-directory of the Enzyme package directory.
Enzyme requires an active account for the desired cloud provider. Access to that user account is provided through access keys and account information in a credentials JSON file. Cloud providers typically offer mechanisms to create this credentials file. See the appropriate Cloud Provider Quick Reference section for how to create a user credentials file for a specific provider. Please note that these provider-specific mechanisms may change.
The user credentials file needs to be copied to the user's host system, where Enzyme will execute. To use Enzyme without specifying a full path to the desired user credentials file, copy the cloud provider credentials file to ./user_credentials/credentials.json in the Enzyme binary directory. Enzyme uses this file as the default to access the desired cloud provider account. Enzyme also provides a command-line option to use a different path and filename for credentials if desired. For example, a user may have more than one account and thus maintain multiple user credentials files, specifying the appropriate one on the command line with each run.
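For example, a user holding credentials for a second account might point Enzyme at them explicitly (the file path below is purely illustrative):
Enzyme run task.sh --parameters path/to/parameters.json --credentials path/to/second_account_credentials.json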
Enzyme uses template files to direct how to build a cluster and how to build the compute node operating system to run workloads on the desired type of instance within the desired cloud provider. These templates are JSON files that provide variables that control how Enzyme uses the Hashicorp tools. These templates may be curated and expanded to provide additional user options and customizations.
Cloud provider templates are provided under the ./templates directory and are typically named after the cloud service provider. The templates cluster_template.json and image_template.json under a given cloud provider directory control instance and image creation, respectively.
For example, a hypothetical cloud provider called MyCloud would have:
./templates/mycloud/cluster_template.json
./templates/mycloud/image_template.json
A user specifies which cloud provider templates to use with the -p or --provider command-line parameter. Enzyme currently defaults to using the Google Cloud Platform templates. To use the hypothetical MyCloud templates, a user includes -p mycloud or --provider mycloud on the command line.
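For example, to run a task against Amazon Web Services instead of the default provider:
Enzyme run task.sh --parameters path/to/parameters.json --provider aws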
Now let's execute a real job as a "Hello, World" test to confirm that Enzyme is working. To do this, we'll use the well-known High-Performance LINPACK benchmark that is included in the ./examples folder. This example uses the default cloud provider. To test a different cloud provider (or more than one), insert the -p option with the name of the desired provider's directory in the example below.
To execute a workload through Enzyme, the user specifies the job to launch (typically a launch script) and points to the Enzyme parameter file and any input data files. The Enzyme parameter file overrides default values used in the execution. This allows a user to modify some aspects of execution without needing to modify the cloud provider or image templates themselves. An important parameter is the project name associated with the user account. This must be set correctly in the project parameter file.
With this in mind, three steps are all that is required to test execution using HP-LINPACK.
1. Copy the user credentials file to ./user_credentials/credentials.json. This is the default credentials file Enzyme uses.
2. Modify the ./examples/linpack/linpack-cluster.json file to set the project_name value to the actual name of the cloud project. For example, if the cloud project name is My-Hpc-Cloud-Cluster, modify the key-value pair in the JSON file to be project_name: "My-Hpc-Cloud-Cluster" (see the fragment after this list).
3. Execute the command to run HP-LINPACK through Enzyme on the default cloud provider. The following command uses both the default cloud provider and the default user credentials file (from step 1):
Enzyme run examples/linpack/linpack-cluster.sh --parameters examples/linpack/linpack-cluster.json --upload-files examples/linpack/HPL.dat
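For reference, the relevant fragment of ./examples/linpack/linpack-cluster.json after the edit in step 2 holds just your project name; all other keys in the file stay unchanged:
{
    "project_name": "My-Hpc-Cloud-Cluster"
}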
After the run command is issued, Enzyme will begin building a compute node operating system image and installing it on the desired instance types in the cloud provider. If that is successful, Enzyme will launch HP-LINPACK on the cluster. Enzyme reports progress along the way, so there should be periodic output displayed on the console.
If the end of the output looks like this:
*Finished 1 tests with the following results:*
*1 tests completed and passed residual checks,*
*0 tests completed and failed residual checks,*
*0 tests skipped because of illegal input values.*
--------------------------------------------------------------------------------
*End of Tests.*
then HP-LINPACK successfully executed in the cloud. Congratulations!
Unfortunately, if there is an issue, Enzyme does not have a well-documented debug section yet. That is a work in progress! Stay tuned. Troubleshooting areas to check:
- The Terraform and Packer executables should exist under the ./tools directory. If they do not, there was a problem building those tools during the Enzyme build.
- If a cluster does not appear in the cloud provider dashboard while running Enzyme, potential causes include a problem with the user account permissions, an incorrect user credentials file, or an incorrect project name in the ./examples/linpack/linpack-cluster.json file.
Enzyme run task.sh --parameters path/to/parameters.json
This command will instantiate a cloud-based cluster and run the specified task. On first use, the machine image will be automatically created. After the task is completed, the cluster will be destroyed, but the machine image will be left intact for future use.
Enzyme run task.sh --parameters path/to/parameters.json --keep-cluster
This command will instantiate the requested cluster and run the specified task. As before, the required images will be created on first use. Because the --keep-cluster option is specified, the cluster will not be destroyed after the task completes.
You can create a persistent cluster without running a task. For this, just use the create cluster command.
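For example, using the same parameters file:
Enzyme create cluster --parameters path/to/parameters.json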
Enzyme run task.sh --parameters path/to/parameters.json --use-storage
This command will instantiate the requested cluster and storage and then run the specified task. As before, the required images will be created on first use. The --use-storage option allows the task to access data living on the storage node. NOTICE: make sure you don't change any parameters in the configuration except storage_disk_size; otherwise, a new storage will be created once the parameters change. Currently, changing storage_disk_size has no effect and the disk keeps its previous size; to force a resize, destroy the storage node and delete the disk in the cloud provider interface.
You can create storage without running a task. For this, just use the create storage command.
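For example:
Enzyme create storage --parameters path/to/parameters.json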
Enzyme destroy destroyObjectID
You can destroy a cluster or storage by its destroyObjectID, which can be found by checking state.
NOTICE: The disk is kept when the storage is destroyed. Only the VM instances will be removed, and the "storage" Enzyme entity will change its status from XXXX to configured. You can delete the disk manually through the selected provider if you want to.
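A typical destroy flow might look like this, where the object ID is illustrative; use the ID reported for your cluster or storage by Enzyme state:
Enzyme state
Enzyme destroy my-storage-object-id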
Enzyme create image --parameters path/to/parameters.json
This command tells Enzyme to create a VM image from a single configuration file. You can check for created images in the cloud provider interface if you want to.
Enzyme create cluster --parameters path/to/parameters.json
This command tells Enzyme to spawn VM instances and form a cluster. It also creates the needed image if it doesn't yet exist.
Enzyme create storage --parameters path/to/parameters.json
This command tells Enzyme to create a VM instance backed by a disk that holds your data. You can use storage to organize your data and control access to it. The storage is located in the /storage folder on the VM instance. This command also creates the needed image if it doesn't exist yet.
Uploading data into the storage is outside the scope of Enzyme. Enzyme only provides the connection information that allows you to reach the storage, via the Enzyme state command.
Enzyme state
This command enumerates all manageable entities (images, clusters, storage, etc.) and their respective status. For cluster and storage entities, additional information about SSH/SCP connection (user name, address, and security keys) is provided in order to facilitate access to these resources.
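For example, if state reports the default user name and key pair for a cluster, an SSH session could be opened like this (the address and key file name below are placeholders, not values Enzyme is guaranteed to emit):
ssh -i private_keys/hello.pem ec2-user@203.0.113.10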
Enzyme version
This command prints the Enzyme version.
Enzyme print-vars
Use this command with one of the additional arguments: image, cluster, or task. For example:
Enzyme print-vars image
You can use the --provider flag to check parameters specific to a certain provider (default: GCP).
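For instance, a hypothetical invocation to inspect cluster parameters as they would apply to AWS (flag placement assumed):
Enzyme print-vars cluster --provider aws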
Enzyme help
This command prints a short help summary. Also, each Enzyme command has a --help switch that provides command-related help.
Use the -v or --verbose flag with any command to get extended info.
Use the -s or --simulate flag with any command to simulate the execution without actually running any commands that could modify anything in the cloud or locally. This is useful for checking what Enzyme would do without actually doing it.
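For example, to preview what a run would do without modifying anything in the cloud or locally:
Enzyme run task.sh --parameters path/to/parameters.json --simulate --verbose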
- -p, --provider: select provider (default: gcp): gcp for Google Cloud Platform, aws for Amazon Web Services
- -c, --credentials: path to the credentials file (default: user_credentials/credentials.json)
- -r, --region: location of your cluster for the selected provider (default: us-central1)
- -z, --zone: zone of your cluster for the selected provider (default: a)
- --parameters: path to a file with user parameters
You can define the above parameters only via the command line.
Parameters presented below can be used in the configuration file and command line. When specified in the command line, they override parameters from the configuration file.
To apply them via the command line, use:
- --vars: a list of user variables (example: "image_name=Enzyme,disk_size=30")
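For example, creating an image with two parameters overridden from the command line (the values are illustrative):
Enzyme create image --parameters path/to/parameters.json --vars "image_name=Enzyme,disk_size=30"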
A task combines parameters from all entities it might need to create. For individual entities see:
- --keep-cluster: keep the cluster running after the script is done
- --use-storage: allow access to storage data
- --newline-conversion: enable conversion of DOS/Windows newlines to UNIX newlines for the uploaded script (useful if you're running Enzyme on Windows)
- --overwrite: overwrite the content of the remote file with the content of the local file
- --remote-path: name for the uploaded script on the remote machine (default: "./Enzyme-script")
- --upload-files: files to copy into the cluster (into the ~/Enzyme-upload folder, with the same names)
- --download-files: files to copy from the cluster (into the ./Enzyme-download folder, with the same names)
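Putting several of these together, a run that keeps the cluster alive and moves files in both directions might look like this (the file names are illustrative):
Enzyme run task.sh --parameters path/to/parameters.json --keep-cluster --upload-files input.dat --download-files results.log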
- project_name (default: "zyme-cluster")
- user_name: user name for SSH access (default: "ec2-user")
- image_name: name of the image of the machine being created (default: "zyme-worker-node")
- disk_size: size of the image boot disk, in GB (default: "20")
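As a minimal sketch, a flat JSON parameters file for image creation might look like this (the project name is a placeholder; the remaining values are the documented defaults):
{
    "project_name": "my-hpc-project",
    "user_name": "ec2-user",
    "image_name": "zyme-worker-node",
    "disk_size": "20"
}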
- project_name (default: "zyme-cluster")
- user_name: user name for SSH access (default: "ec2-user")
- cluster_name: name of the cluster being created (default: "sample-cloud-cluster")
- image_name: name of the image which will be used (default: "zyme-worker-node")
- worker_count: count of worker nodes (default: "2"). **NOTICE**: *Must be greater than 1*
- login_node_root_size: boot disk size for the login node, in GB (default: "20"). **NOTICE**: *Must be no less than `disk_size`*
- instance_type_login_node: machine type of the login node (default: "f1-micro" for GCP)
- instance_type_worker_node: machine type of worker nodes (default: "f1-micro" for GCP)
- ssh_key_pair_path (default: "private_keys")
- key_name (default: "hello")
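Similarly, a sketch of a cluster parameters file that overrides only the cluster size and machine types (all values illustrative; the machine types are ordinary GCP types, not Enzyme requirements):
{
    "project_name": "my-hpc-project",
    "cluster_name": "sample-cloud-cluster",
    "worker_count": "4",
    "instance_type_login_node": "n1-standard-2",
    "instance_type_worker_node": "n1-standard-4"
}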
- project_name (default: "zyme-cluster")
- user_name: user name for SSH access (default: "ec2-user")
- storage_name: name of the storage being created (default: "zyme-storage")
- image_name: name of the image which will be used (default: "zyme-worker-node")
- storage_disk_size: size of the permanent disk, in GB (default: "50")
- storage_instance_type: machine type of the storage node (default: "f1-micro" for GCP)
- ssh_key_pair_path (default: "private_keys")
- storage_key_name (default: "hello-storage")
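Storage parameters can also be overridden from the command line via --vars, for example (the storage name is illustrative):
Enzyme create storage --parameters path/to/parameters.json --vars "storage_name=my-dataset"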
The examples in this section all assume that Enzyme has been built correctly and that user credentials are set up correctly. The examples use the default cloud provider and the default user credentials file.
LAMMPS is a molecular dynamics simulation application. The included workload will launch a container to execute LAMMPS on a single compute node. This requires the use of the storage capabilities of Enzyme.
1. Create storage for the LAMMPS workload:
./Enzyme create storage --parameters=examples/lammps/lammps-single-node.json
2. Use information from ./Enzyme state to get connection details for the storage node created in step 1. SSH into the storage node using the provided private key and IP address, and execute the following commands:
sudo mkdir /storage/lammps/
sudo chown lammps-user /storage/lammps/
Then log out of the storage node.
3. Upload the lammps.avx512.simg container into /storage/lammps/, e.g. by:
scp -i path/to/private_key.pem path/to/lammps.avx512.simg lammps-user@storage-address:/storage/lammps/
4. Execute the LAMMPS benchmark through Enzyme:
Enzyme run examples/lammps/lammps-single-node.sh --parameters=examples/lammps/lammps-single-node.json --use-storage --download-files=lammps.log
If successful, the content of the Enzyme-download/lammps.log file should look like this (note: this output was obtained by running on 4 cores):
args: 2
OMP_NUM_THREADS=1
NUMCORES=4
mpiexec.hydra -np 4 ./lmp_intel_cpu_intelmpi -in WORKLOAD -log none -pk intel 0 omp 1 -sf intel -v m 0.2 -screen
Running: airebo Performance: 1.208 timesteps/sec
Running: dpd Performance: 9.963 timesteps/sec
Running: eam Performance: 9.378 timesteps/sec
Running: lc Performance: 1.678 timesteps/sec
Running: lj Performance: 19.073 timesteps/sec
Running: rhodo Performance: 1.559 timesteps/sec
Running: sw Performance: 14.928 timesteps/sec
Running: tersoff Performance: 7.026 timesteps/sec
Running: water Performance: 7.432 timesteps/sec
Output file lammps-cluster-login_lammps_2019_11_17.results and all the logs for each workload lammps-cluster-login_lammps_2019_11_17 ... are located at /home/lammps-user/lammps
- Important: Destroy the storage using the ./Enzyme destroy command with the storage ID to avoid unintended storage fees with the cloud provider.
OpenFOAM is a computational fluid dynamics application.
- Run the OpenFOAM benchmark, where 7 is the endTime of the benchmark computation:
Enzyme run -r us-east1 -z b --parameters examples/openfoam/openfoam-single-node.json --download-files DrivAer/log.simpleFoam --overwrite examples/openfoam/openfoam-single-node.sh 7
The full log of the OpenFOAM run should be available as Enzyme-download/log.simpleFoam.
This section is intended to provide easy references to cloud providers as they relate to Enzyme setup.
Amazon Web Services Account Information
Help generating the user credentials for Amazon Web Services
Google Cloud Platform Account Information
Help generating the user credentials for Google Cloud Platform