Install enroot + pyxis in AMIs during build process #2793

Merged 2 commits on Aug 27, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -10,6 +10,7 @@ This file is used to list changes made in each version of the AWS ParallelCluste
- Allow custom actions on login nodes.
- Allow DCV connection on login nodes.
- Add new attribute `efs_access_point_ids` to specify optional EFS access points for the mounts
- Install enroot and pyxis in official pcluster AMIs

**BUG FIXES**
- Fix EFA kmod installation with RHEL 8.10 or newer.

@@ -9,6 +9,10 @@
# ArmPL
default['conditions']['arm_pl_supported'] = arm_instance?

# Enroot + Pyxis
default['cluster']['enroot']['version'] = '3.4.1'
default['cluster']['pyxis']['version'] = '0.19.0'

Review comment (Contributor): Please add Kitchen and unit tests for these changes.

# NVidia
default['cluster']['nvidia']['enabled'] = 'no'
default['cluster']['nvidia']['driver_version'] = '535.183.01'
3 changes: 3 additions & 0 deletions cookbooks/aws-parallelcluster-platform/recipes/config.rb
@@ -26,3 +26,6 @@
include_recipe 'aws-parallelcluster-platform::supervisord_config'
fetch_config 'Fetch and load cluster configs'
include_recipe 'aws-parallelcluster-platform::config_login' if node['cluster']['node_type'] == 'LoginNode'
enroot 'Configure Enroot' do
  action :configure
end
@@ -36,3 +36,4 @@
include_recipe "aws-parallelcluster-platform::intel_mpi"
arm_pl 'Install ARM Performance Library'
intel_hpc 'Setup Intel HPC'
enroot 'Setup Enroot'
@@ -0,0 +1,24 @@
# frozen_string_literal: true

# Copyright:: 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://aws.amazon.com/apache2.0/
#
# or in the "LICENSE.txt" file accompanying this file.
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied.
# See the License for the specific language governing permissions and limitations under the License.

provides :enroot, platform: 'amazon' do |node|
  node['platform_version'].to_i == 2023
end

use 'partial/_enroot_common.rb'
use 'partial/_enroot_rhel.rb'

def prerequisites
  %w(jq squashfs-tools parallel pigz zstd)
end
@@ -0,0 +1,22 @@
# frozen_string_literal: true

# Copyright:: 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://aws.amazon.com/apache2.0/
#
# or in the "LICENSE.txt" file accompanying this file.
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied.
# See the License for the specific language governing permissions and limitations under the License.

provides :enroot, platform: 'amazon', platform_version: '2'

use 'partial/_enroot_common.rb'
use 'partial/_enroot_rhel.rb'

def prerequisites
  %w(jq squashfs-tools parallel pigz squashfuse zstd)
end
@@ -0,0 +1,24 @@
# frozen_string_literal: true

# Copyright:: 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://aws.amazon.com/apache2.0/
#
# or in the "LICENSE.txt" file accompanying this file.
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied.
# See the License for the specific language governing permissions and limitations under the License.

provides :enroot, platform: 'redhat' do |node|
  node['platform_version'].to_i >= 8
end

use 'partial/_enroot_common.rb'
use 'partial/_enroot_rhel.rb'

def prerequisites
  %w(jq fuse-overlayfs squashfs-tools parallel pigz squashfuse zstd)
end
@@ -0,0 +1,24 @@
# frozen_string_literal: true

# Copyright:: 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://aws.amazon.com/apache2.0/
#
# or in the "LICENSE.txt" file accompanying this file.
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied.
# See the License for the specific language governing permissions and limitations under the License.

provides :enroot, platform: 'rocky' do |node|
  node['platform_version'].to_i >= 8
end

use 'partial/_enroot_common.rb'
use 'partial/_enroot_rhel.rb'

def prerequisites
  %w(jq fuse-overlayfs squashfs-tools parallel pigz squashfuse zstd)
end
@@ -0,0 +1,19 @@
# frozen_string_literal: true

# Copyright:: 2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://aws.amazon.com/apache2.0/
#
# or in the "LICENSE.txt" file accompanying this file.
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied.
# See the License for the specific language governing permissions and limitations under the License.

provides :enroot, platform: 'ubuntu' do |node|
  node['platform_version'].to_i >= 20
end
use 'partial/_enroot_common.rb'
use 'partial/_enroot_debian.rb'
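
The five resource files above differ mainly in their `provides` guard and in the packages they install before enroot. The plain-Ruby sketch below restates that selection logic outside Chef so the differences are easier to compare. `PLATFORM_RULES` and `prerequisites_for` are hypothetical names introduced only for illustration, the version checks approximate Chef's own matching, and the Ubuntu entry reuses the package list hard-coded in the `apt-get install` line of the Debian partial rather than a `prerequisites` method.

```ruby
# Hypothetical summary table. In the cookbook this logic lives in the
# `provides` guards and `prerequisites` methods of each resource file.
PLATFORM_RULES = [
  { platform: 'amazon', match: ->(v) { v.to_i == 2023 },
    packages: %w(jq squashfs-tools parallel pigz zstd) },
  { platform: 'amazon', match: ->(v) { v.to_i == 2 },
    packages: %w(jq squashfs-tools parallel pigz squashfuse zstd) },
  { platform: 'redhat', match: ->(v) { v.to_i >= 8 },
    packages: %w(jq fuse-overlayfs squashfs-tools parallel pigz squashfuse zstd) },
  { platform: 'rocky', match: ->(v) { v.to_i >= 8 },
    packages: %w(jq fuse-overlayfs squashfs-tools parallel pigz squashfuse zstd) },
  { platform: 'ubuntu', match: ->(v) { v.to_i >= 20 },
    packages: %w(jq squashfs-tools parallel fuse-overlayfs pigz squashfuse zstd) },
].freeze

# Returns the prerequisite packages for a platform/version pair,
# or nil when no rule matches (no enroot resource is provided).
def prerequisites_for(platform, version)
  rule = PLATFORM_RULES.find { |r| r[:platform] == platform && r[:match].call(version) }
  rule && rule[:packages]
end

puts prerequisites_for('amazon', '2023').join(' ')
puts prerequisites_for('rocky', '9.3').join(' ')
```

Read side by side, the Amazon Linux 2023 list omits both `squashfuse` and `fuse-overlayfs`, and the Amazon Linux 2 list omits `fuse-overlayfs`.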
@@ -0,0 +1,66 @@
# frozen_string_literal: true
#
# Copyright:: 2013-2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://aws.amazon.com/apache2.0/
#
# or in the "LICENSE.txt" file accompanying this file.
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied.
# See the License for the specific language governing permissions and limitations under the License.

unified_mode true
default_action :setup

action :setup do
  return if on_docker?
  action_install_package
end

action :configure do
  return if on_docker?
  return unless enroot_installed

  bash "Configure enroot" do
    user 'root'
    code <<-ENROOT_CONFIGURE
      set -e
      ENROOT_CONFIG_RELEASE=pyxis
      SHARED_DIR=#{node['cluster']['shared_dir']}
      NONROOT_USER=#{node['cluster']['cluster_user']}
      wget -O /tmp/enroot.template.conf https://raw.githubusercontent.com/aws-samples/aws-parallelcluster-post-install-scripts/${ENROOT_CONFIG_RELEASE}/pyxis/enroot.template.conf
      mkdir -p ${SHARED_DIR}/enroot
      chown ${NONROOT_USER} ${SHARED_DIR}/enroot
      ENROOT_CACHE_PATH=${SHARED_DIR}/enroot envsubst < /tmp/enroot.template.conf > /tmp/enroot.conf
      mv /tmp/enroot.conf /etc/enroot/enroot.conf
      chmod 0644 /etc/enroot/enroot.conf

      mkdir -p /tmp/enroot
      chmod 1777 /tmp/enroot
      mkdir -p /tmp/enroot/data
      chmod 1777 /tmp/enroot/data

      chmod 1777 ${SHARED_DIR}/enroot

      mkdir -p ${SHARED_DIR}/pyxis/
      chown ${NONROOT_USER} ${SHARED_DIR}/pyxis/
      sed -i '${s/$/ runtime_path=${SHARED_DIR}\\/pyxis/}' /opt/slurm/etc/plugstack.conf.d/pyxis.conf
      SHARED_DIR=${SHARED_DIR} envsubst < /opt/slurm/etc/plugstack.conf.d/pyxis.conf > /opt/slurm/etc/plugstack.conf.d/pyxis.tmp.conf
      mv /opt/slurm/etc/plugstack.conf.d/pyxis.tmp.conf /opt/slurm/etc/plugstack.conf.d/pyxis.conf

    ENROOT_CONFIGURE
    retries 3
    retry_delay 5
  end
end

def package_version
  node['cluster']['enroot']['version']
end

def enroot_installed
  ::File.exist?('/usr/bin/enroot')
end
@@ -0,0 +1,49 @@
# frozen_string_literal: true
#
# Copyright:: 2013-2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://aws.amazon.com/apache2.0/
#
# or in the "LICENSE.txt" file accompanying this file.
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied.
# See the License for the specific language governing permissions and limitations under the License.

action :install_package do
  return unless nvidia_enabled?

  bash "Install enroot" do
    user 'root'
    cwd node['cluster']['sources_dir']
    code <<-ENROOT_INSTALL
      set -e
      apt-get install -y jq squashfs-tools parallel fuse-overlayfs pigz squashfuse zstd
      curl -fSsL -O #{enroot_url}
      curl -fSsL -O #{enroot_caps_url}
      apt install -y ./*.deb

      ln -s /usr/share/enroot/hooks.d/50-slurm-pmi.sh /etc/enroot/hooks.d/
      ln -s /usr/share/enroot/hooks.d/50-slurm-pytorch.sh /etc/enroot/hooks.d/
      mkdir -p /etc/sysconfig
      echo "PATH=/opt/slurm/sbin:/opt/slurm/bin:$(bash -c 'source /etc/environment ; echo $PATH')" >> /etc/sysconfig/slurmd

    ENROOT_INSTALL
    retries 3
    retry_delay 5
  end
end

def enroot_url
  "https://github.com/NVIDIA/enroot/releases/download/v#{package_version}/enroot_#{package_version}-1_#{arch_suffix}.deb"
end

def enroot_caps_url
  "https://github.com/NVIDIA/enroot/releases/download/v#{package_version}/enroot+caps_#{package_version}-1_#{arch_suffix}.deb"
end

def arch_suffix
  arm_instance? ? 'arm64' : 'amd64'
end
@@ -0,0 +1,46 @@
# frozen_string_literal: true
#
# Copyright:: 2013-2023 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License").
# You may not use this file except in compliance with the License.
# A copy of the License is located at
#
# http://aws.amazon.com/apache2.0/
#
# or in the "LICENSE.txt" file accompanying this file.
# This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied.
# See the License for the specific language governing permissions and limitations under the License.

action :install_package do
  return unless nvidia_enabled?

  package prerequisites do
    retries 3
    retry_delay 5
  end

  bash "Install enroot" do
    user 'root'
    cwd node['cluster']['sources_dir']
    code <<-ENROOT_INSTALL
      set -e
      yum install -y #{enroot_url}
      yum install -y #{enroot_caps_url}
    ENROOT_INSTALL
    retries 3
    retry_delay 5
  end
end

def enroot_url
  "https://github.com/NVIDIA/enroot/releases/download/v#{package_version}/enroot-#{package_version}-1.el8.#{arch_suffix}.rpm"
end

def enroot_caps_url
  "https://github.com/NVIDIA/enroot/releases/download/v#{package_version}/enroot+caps-#{package_version}-1.el8.#{arch_suffix}.rpm"
end

def arch_suffix
  arm_instance? ? 'aarch64' : 'x86_64'
end