Support Amazon Linux 2023 #2692

Merged
merged 44 commits into from
Jun 12, 2024

Conversation

himani2411
Contributor

@himani2411 himani2411 commented Apr 8, 2024

Description of changes

Tests

AMI builds are successful. All integration tests in develop.yaml have been run.
Only the following 6 tests are failing:

test-suites:
  basic:
    test_essential_features.py::test_essential_features:
      dimensions:
      - instances:
        - c5.xlarge
        oss:
        - alinux2023
        regions:
        - us-east-1
        schedulers:
        - slurm
  createami:
    test_createami.py::test_build_image_custom_components:
      dimensions:
      - instances:
        - m6g.xlarge
        oss:
        - alinux2023
        regions:
        - us-west-1
      - instances:
        - c5.xlarge
        oss:
        - alinux2023
        regions:
        - eu-north-1
  iam:
    test_iam_image.py::test_iam_roles:
      dimensions:
      - instances:
        - c5.xlarge
        oss:
        - alinux2023
        regions:
        - us-west-2
  networking:
    test_cluster_networking.py::test_cluster_in_no_internet_subnet:
      dimensions:
      - instances:
        - c5.xlarge
        oss:
        - alinux2023
        regions:
        - us-east-1
        schedulers:
        - slurm
  trainium:
    test_trainium.py::test_trainium:
      dimensions:
      - oss:
        - alinux2023
        regions:
        - us-west-2
        schedulers:
        - slurm

References

Checklist

  • Make sure you are pointing to the right branch.
  • If you're creating a patch for a branch other than develop add the branch name as prefix in the PR title (e.g. [release-3.6]).
  • Check all commits' messages are clear, describing what and why vs how.
  • Make sure to have added unit tests or integration tests to cover the new/modified code.
  • Check if documentation is impacted by this change.

Please review the guidelines for contributing and Pull Request Instructions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

codecov bot commented Apr 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.48%. Comparing base (7d414fe) to head (ecbd953).
Report is 76 commits behind head on develop.

Current head ecbd953 differs from pull request most recent head 5552b26

Please upload reports for the commit 5552b26 to get more accurate results.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #2692   +/-   ##
========================================
  Coverage    76.48%   76.48%           
========================================
  Files           22       22           
  Lines         2220     2220           
========================================
  Hits          1698     1698           
  Misses         522      522           
Flag Coverage Δ
unittests 76.48% <ø> (ø)

Contributor Author

@himani2411 himani2411 left a comment

I am not sure whether we need this specific commit: the [TEMP]-tagged commits were ones I added while looking for a solution to a few errors I was getting.

84e0d19

7cf55fd

cookbooks/aws-parallelcluster-computefleet/metadata.rb (outdated, resolved)
@@ -49,6 +49,8 @@ def package_platform
platform_version = node['platform_version'].to_i
if platform_version == 2
platform_version = 7
elsif platform_version == 2023
Contributor

Bad practice: redefinition of the same variable with different meaning.
Here we should have platform_version and mysql_platform_version

Contributor Author

@himani2411 himani2411 Jun 10, 2024

Here platform_version is just the location in our S3 bucket where we archive the packages we want, similar to what we do for Cinc-client.

archives/mysql/el/9/x86_64/mysql-community-client-*tar.gz

Contributor

Sure, the bad practice is doing:

platform_version = node['platform_version'].to_i  # meaning 1: OS platform version
if platform_version == 2
  platform_version = 7  # meaning 2: package archive directory
end
# ... platform_version is then used with meaning 2

Not a blocking comment, because the practice was not introduced in this PR, but it would be better to address it at least in a follow-up PR.
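
The naming concern raised in this thread can be sketched in plain Ruby as follows. This is a minimal illustration, not the cookbook's actual code: the helper name `mysql_archive_version` and the variable `mysql_platform_version` are hypothetical, and mapping Amazon Linux 2023 to `9` is an assumption inferred from the `el/9` archive path quoted above. Only the `2 -> 7` mapping appears in the diff.

```ruby
# Hypothetical helper (not from the cookbook): maps the OS platform version
# to the directory name used for the archived MySQL packages.
def mysql_archive_version(platform_version)
  case platform_version
  when 2    then 7 # Amazon Linux 2 uses the el/7 archive (shown in the diff)
  when 2023 then 9 # assumption: inferred from the el/9 path quoted above
  else platform_version
  end
end

# Stand-in for node['platform_version'] in a Chef run.
platform_version = '2023'.to_i

# A separate name keeps the OS version and the archive directory distinct.
mysql_platform_version = mysql_archive_version(platform_version)

puts "archives/mysql/el/#{mysql_platform_version}/x86_64/"
```

With two names, `platform_version` keeps a single meaning for the rest of the method, which seems to be what the reviewer is asking for.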

@@ -37,6 +37,10 @@ if [ -e "${SCRIPT}" ]; then
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600") \
&& curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/instance-id
')
echo "Install libxcrypt-compat dmidecode package by using SSH key: ${KITCHEN_SSH_KEY_PATH}"
Contributor

If this is something required only by AL2023, let's add a comment saying that.
If it is not, why add it now?

Contributor

Since this is in test code, can we address this later?

util/cinc-install.sh (outdated, resolved)
configured_ip=`nmcli -t -f IP4.ADDRESS device show ${DEVICE_NAME} | cut -f2 -d':'`
if [ -z "${configured_ip}" ]; then
# Setup connection method to "manual", configure ip address and gateway, only if not already configured.
sudo nmcli connection modify "${con_name}" ipv4.method manual ipv4.addresses ${DEVICE_IP_ADDRESS}/${CIDR_PREFIX_LENGTH} ipv4.gateway ${GW_IP_ADDRESS}
Contributor

According to this comment nmcli is not supported: https://github.com/aws/aws-parallelcluster/pull/6223/files#r1630201744

Contributor

After testing, Amazon Linux 2023 does not require this script, because amazon-ec2-net-utils is pre-installed in AL2023 and handles multi-NIC instances properly.

test_efa has passed on multi-NIC instances.

Contributor

> Amazon Linux 2023 does not require this script

So we are removing the whole file, right?
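
One way to express the finding from this thread is a small platform check that skips the nmcli-based script on Amazon Linux 2023. The sketch below is plain Ruby under stated assumptions: the predicate name `nic_config_script_required?` is hypothetical, `'amazon'` is the platform name Chef/Ohai reports for Amazon Linux, and `'rhel'` is used only as an example of another platform. Whether the cookbook guards the script this way or removes the file outright is not settled in the conversation.

```ruby
# Hypothetical predicate (not from the cookbook): returns false when the OS
# already manages secondary network interfaces on its own.
def nic_config_script_required?(platform, platform_version)
  # Per the thread, AL2023 ships amazon-ec2-net-utils, which handles
  # multi-NIC instances, so the nmcli-based script is unnecessary there.
  !(platform == 'amazon' && platform_version.to_i == 2023)
end

puts nic_config_script_required?('amazon', '2023')
puts nic_config_script_required?('rhel', '9')
```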

Comment on lines 22 to 39
action :cloudwatch_prerequisite do
package "gnupg2-full" do
options '--allowerasing'
retries 3
retry_delay 5
end
end
Contributor

Himani Deshpande and others added 25 commits June 11, 2024 14:14
* Changing Package repos test to skip checking Repo name for epel
InSpec does not support Amazon Linux 2023 (https://docs.chef.io/inspec/platforms/). Therefore, this commit disables the check, which was generating false errors.
Running this recipe in an Amazon Linux 2023 Docker container generates a false failure: https://github.com/aws/aws-parallelcluster-cookbook/actions/runs/9373643185/job/25807894209?pr=2692

Signed-off-by: Hanwen <hanwenli@amazon.com>
Signed-off-by: Hanwen <hanwenli@amazon.com>
Signed-off-by: Hanwen <hanwenli@amazon.com>
Signed-off-by: Hanwen <hanwenli@amazon.com>
rsyslog is required for Amazon Linux 2023 to write to the messages log. The messages log is uploaded to CloudWatch. Therefore, this commit moves the installation and start of rsyslog to the CloudWatch recipe and removes the standalone recipe that enabled rsyslog.

Signed-off-by: Hanwen <hanwenli@amazon.com>
amazon-ec2-net-utils is pre-installed in AL2023 and handles multi-NIC instances properly

Signed-off-by: Hanwen <hanwenli@amazon.com>
Signed-off-by: Hanwen <hanwenli@amazon.com>
@@ -50,7 +50,9 @@
cwd '/tmp'
code <<-CUDA
set -e
-./cuda.run --silent --toolkit --samples
+mkdir /cuda-install
+./cuda.run --silent --toolkit --samples --tmpdir=/cuda-install
Contributor

/cuda-install is meant to be a temporary directory, so we should create it under /tmp.
That said, it's fine to use recursive deletion of that folder, since it's a temp directory created by us and not meant to be used by the user.

Contributor

/tmp has a size limit on some OSes. That's why we had to change the directory.

Contributor

Then please add a comment above it explaining why we are not using the tmp dir.
However, we should be in control of the tmp dir size and adjust it to our needs.
Could you please track in the backlog the possibility of controlling the /tmp size as a separate partition (which is also a storage best practice)?

Contributor

Done, added a comment
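
The pattern discussed in this thread (a dedicated installer scratch directory outside /tmp, removed recursively afterwards) could look roughly like this in plain Ruby. It is a sketch under assumptions, not the recipe itself: the helper name `with_scratch_dir` is hypothetical, the demo uses a throwaway path instead of `/cuda-install` so it runs without root, and writing a marker file stands in for the installer call that the recipe makes from its bash block.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: creates a dedicated scratch directory, yields it,
# and always removes it afterwards, even if the block raises.
def with_scratch_dir(path)
  FileUtils.mkdir_p(path)
  yield path
ensure
  FileUtils.rm_rf(path)
end

# Demo only: a throwaway base directory stands in for the real location.
base = Dir.mktmpdir
demo_dir = File.join(base, 'cuda-install')

with_scratch_dir(demo_dir) do |dir|
  # Placeholder for the installer run that would use this directory.
  File.write(File.join(dir, 'marker'), 'installer scratch data')
  puts "scratch directory in use: #{dir}"
end

puts "removed afterwards: #{!Dir.exist?(demo_dir)}"
FileUtils.rm_rf(base) # clean up the demo's base directory
```

Recursive deletion is safe here for the reason given in the review: the directory is created by the code itself and is not meant to hold user data.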

@hanwen-pcluste hanwen-pcluste merged commit 095f581 into aws:develop Jun 12, 2024
29 of 31 checks passed