
MCO-1100: enable RHEL entitlements in on-cluster layering #4312

Conversation

cheesesashimi
Member

@cheesesashimi cheesesashimi commented Apr 9, 2024

- What I did

This adds the capability for BuildController to use the RHEL entitlement secrets, allowing cluster admins to inject RHEL content that they are entitled to receive into their builds. It also allows content to be injected into and consumed from /etc/yum.repos.d as well as /etc/pki/rpm-gpg. A few higher-level notes about the implementation:

  • Because we run rootless Buildah, we're more prone to running into SELinux complications. This makes it difficult to mount the contents of /etc/yum.repos.d, /etc/pki/entitlement, and /etc/pki/rpm-gpg directly into the build context. With that in mind, we copy everything into a series of temp directories first, and then mount those temp directories into the build context as volumes.
  • We also create an emptyDir which is mounted into the build pod at /home/build/.local/share/containers. It is unclear why this is necessary, but as mentioned before, I suspect that this is due to SELinux issues.
  • The e2e test suite can now stream the container logs from the build pod to the filesystem, since those logs contain useful information when an e2e test fails. In OpenShift CI, the destination is determined by the ARTIFACT_DIR env var. If that env var is not present, it defaults to the current directory. (A sketch of this log-streaming pattern follows this list.)
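For illustration only (this is not the PR's code): a minimal client-go sketch of the log-streaming pattern described above. The function name streamPodLogs and the log file naming are hypothetical.

import (
	"context"
	"fmt"
	"io"
	"os"
	"path/filepath"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
)

// streamPodLogs follows one container's logs and writes them under
// ARTIFACT_DIR, falling back to the current directory when it is unset.
func streamPodLogs(ctx context.Context, client kubernetes.Interface, namespace, pod, container string) error {
	dir := os.Getenv("ARTIFACT_DIR")
	if dir == "" {
		dir = "."
	}

	// Follow=true streams the logs for as long as the build pod runs.
	req := client.CoreV1().Pods(namespace).GetLogs(pod, &corev1.PodLogOptions{Container: container, Follow: true})
	logs, err := req.Stream(ctx)
	if err != nil {
		return err
	}
	defer logs.Close()

	out, err := os.Create(filepath.Join(dir, fmt.Sprintf("%s-%s.log", pod, container)))
	if err != nil {
		return err
	}
	defer out.Close()

	// Copy until the container exits or the stream is closed.
	_, err = io.Copy(out, logs)
	return err
}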

- How to verify it

Automated verification:

  1. Bring up a cluster where the secret etc-pki-entitlement exists in the openshift-config-managed namespace. If this secret is not present, TestEntitledBuilds and TestEntitledBuildsRollsOutImage will be skipped.
  2. Ensure that the OnClusterBuild feature-gate is enabled. The test suite will fail immediately if the feature-gate is not enabled.
  3. Run the tech preview e2e test suite: $ go test -count=1 -v ./test/e2e-techpreview/...

(Note: Because we have not landed #4284, the cleanup / teardown will delete the node and its underlying machine, causing the Machine API to provision a replacement node.)
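As a quick pre-flight check before running the suite, both preconditions can be verified with standard oc commands (checking spec.featureSet is one way to confirm the gate; the test suite performs the authoritative check itself):

# The entitlement secret must exist, or the entitlement tests are skipped.
oc get secret etc-pki-entitlement -n openshift-config-managed

# A feature set that includes OnClusterBuild (e.g. TechPreviewNoUpgrade) must be enabled.
oc get featuregate cluster -o jsonpath='{.spec.featureSet}'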

Semi-manual verification:

  1. Download / install v0.0.14 (or newer) of my OpenShift helpers on your local machine.
  2. Bring up a cluster where the secret etc-pki-entitlement exists in the openshift-config-managed namespace.
  3. Create a Dockerfile on your local machine that contains the following content:
FROM configs AS final

RUN rm -rf /etc/rhsm-host && \
  rpm-ostree install buildah && \
  ln -s /run/secrets/rhsm /etc/rhsm-host && \
  ostree container commit
  4. With my onclustertesting helper in your $PATH, run the following: $ onclustertesting setup in-cluster-registry --enable-featuregate --pool=layered --custom-dockerfile=./path/to/the/Dockerfile
  5. If you have not previously enabled the featuregate, my helper will enable it for you. This will cause a new MachineConfig to be created and rolled out to all of the nodes, so the build might not begin immediately. Using this flag is idempotent.
  6. Watch for the machine-os-builder pod to start. Shortly afterward, the build pod should start; it should complete without any errors (see the watch command below).
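One way to follow step 6, assuming the standard MCO namespace:

# machine-os-builder should appear first, then the build pod.
oc get pods -n openshift-machine-config-operator -w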

- Description for the changelog
Enables RHEL entitlements in on-cluster layering

@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 9, 2024

@cheesesashimi: This pull request references MCO-1100 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

  • use caching to speed up incremental builds
  • kubelet: restorecon necessary files on kubelet's prestart
  • Revert "Merge pull request MCO-1092: Adapt the MCO's featuregate usage to new API #4275 from dkhater-redhat/mco-1092"
  • fix: resources were in the wrong indentation level
  • first pass of entitlements stuff
  • adds more RHEL entitlement stuff
  • fixes RHEL entitlements
  • adds additional e2e tests for RHEL entitlement support

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 9, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 9, 2024
Contributor

openshift-ci bot commented Apr 9, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 9, 2024
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 9, 2024
@cheesesashimi cheesesashimi force-pushed the zzlotnik/rhel-entitlements branch from a2e8d5e to 77b9422 Compare April 9, 2024 21:40
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 9, 2024
@cheesesashimi cheesesashimi force-pushed the zzlotnik/rhel-entitlements branch 3 times, most recently from a38312d to c16775a Compare April 9, 2024 23:16
@cheesesashimi
Member Author

Checking to see if my Dockerfile changes had anything to do with this:

/test e2e-gcp-op-techpreview

@cheesesashimi cheesesashimi force-pushed the zzlotnik/rhel-entitlements branch from 9c3aab3 to e93591a Compare April 10, 2024 15:54
@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 10, 2024

@cheesesashimi: This pull request references MCO-1100 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@cheesesashimi cheesesashimi force-pushed the zzlotnik/rhel-entitlements branch 4 times, most recently from b18108f to 7fd8a8b Compare April 15, 2024 16:29
@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 15, 2024

@cheesesashimi: This pull request references MCO-1100 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@cheesesashimi cheesesashimi force-pushed the zzlotnik/rhel-entitlements branch 3 times, most recently from 1739853 to 274e37f Compare April 16, 2024 14:38
@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 16, 2024

@cheesesashimi: This pull request references MCO-1100 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@cheesesashimi cheesesashimi force-pushed the zzlotnik/rhel-entitlements branch 2 times, most recently from 5614c7e to 10e78aa Compare April 18, 2024 14:11
@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 18, 2024

@cheesesashimi: This pull request references MCO-1100 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@cheesesashimi cheesesashimi force-pushed the zzlotnik/rhel-entitlements branch from 10e78aa to b52f3f3 Compare April 18, 2024 14:15
@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 18, 2024

@cheesesashimi: This pull request references MCO-1100 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@cheesesashimi cheesesashimi force-pushed the zzlotnik/rhel-entitlements branch from b52f3f3 to ee44666 Compare April 18, 2024 14:17
@cheesesashimi cheesesashimi marked this pull request as ready for review April 19, 2024 19:51
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 19, 2024
@openshift-ci openshift-ci bot requested review from dkhater-redhat and jkyros April 19, 2024 19:52
@cheesesashimi
Member Author

/refresh-jira

@cheesesashimi
Member Author

/jira refresh

@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 19, 2024

@cheesesashimi: This pull request references MCO-1100 which is a valid jira issue.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

openshift-ci bot commented Apr 19, 2024

@cheesesashimi: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                    Commit   Details  Required  Rerun command
ci/prow/e2e-azure-ovn-upgrade-out-of-change  ee44666  link     false     /test e2e-azure-ovn-upgrade-out-of-change
ci/prow/okd-scos-e2e-aws-ovn                 ee44666  link     false     /test okd-scos-e2e-aws-ovn

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@cheesesashimi
Member Author

/test e2e-aws-ovn

This was a transient failure due to AWS capacity constraints:

level=error msg=Error: creating EC2 Instance: InsufficientInstanceCapacity: We currently do not have sufficient m6a.xlarge capacity in the Availability Zone you requested (us-east-2a). Our system will be working on provisioning additional capacity. You can currently get m6a.xlarge capacity by not specifying an Availability Zone in your request or choosing us-east-2b, us-east-2c.

@sinnykumari
Contributor

Tested locally on an AWS cluster, and RHEL entitlement worked fine with this PR. Nodes opted in to the layered pool got updated with buildah-1.33.5-1.el9.x86_64 installed!

Contributor

@dkhater-redhat dkhater-redhat left a comment


Everything looks great, Zack! Just a few suggestions, but it looks really clean. Nice work! Going to approve because none of the changes are mandatory.

Comment on lines +41 to +69
# If we have /etc/pki/entitlement certificates, commonly used with RHEL
# entitlements, copy them into a tempdir to avoid SELinux issues, and tell
# Buildah about them.
if [[ -n "$ETC_PKI_ENTITLEMENT_MOUNTPOINT" ]] && [[ -d "$ETC_PKI_ENTITLEMENT_MOUNTPOINT" ]]; then
configs="$(mktemp -d)"
cp -r -v "$ETC_PKI_ENTITLEMENT_MOUNTPOINT/." "$configs"
chmod -R 0755 "$configs"
build_args+=("--volume=$configs:$ETC_PKI_ENTITLEMENT_MOUNTPOINT:$mount_opts")
fi

# If we have /etc/yum.repos.d configs, commonly used with Red Hat Satellite
# subscriptions, copy them into a tempdir to avoid SELinux issues, and tell
# Buildah about them.
if [[ -n "$ETC_YUM_REPOS_D_MOUNTPOINT" ]] && [[ -d "$ETC_YUM_REPOS_D_MOUNTPOINT" ]]; then
configs="$(mktemp -d)"
cp -r -v "$ETC_YUM_REPOS_D_MOUNTPOINT/." "$configs"
chmod -R 0755 "$configs"
build_args+=("--volume=$configs:$ETC_YUM_REPOS_D_MOUNTPOINT:$mount_opts")
fi

# If we have /etc/pki/rpm-gpg configs, commonly used with Red Hat Satellite
# subscriptions, copy them into a tempdir to avoid SELinux issues, and tell
# Buildah about them.
if [[ -n "$ETC_PKI_RPM_GPG_MOUNTPOINT" ]] && [[ -d "$ETC_PKI_RPM_GPG_MOUNTPOINT" ]]; then
configs="$(mktemp -d)"
cp -r -v "$ETC_PKI_RPM_GPG_MOUNTPOINT/." "$configs"
chmod -R 0755 "$configs"
build_args+=("--volume=$configs:$ETC_PKI_RPM_GPG_MOUNTPOINT:$mount_opts")
fi
Contributor


For modularity, you could make a function that encapsulates this, if you wanted to. This isn't really mandatory, as your code looks good! Just a suggestion:

function prepare_and_mount_dir {
	local mount_point="$1"

	if [[ -n "$mount_point" ]] && [[ -d "$mount_point" ]]; then
		configs=$(mktemp -d)
		cp -r -v "$mount_point/." "$configs"
		chmod -R 0755 "$configs"
		build_args+=("--volume=$configs:$mount_point:$mount_opts")
	fi
}

prepare_and_mount_dir "$ETC_PKI_ENTITLEMENT_MOUNTPOINT"
prepare_and_mount_dir "$ETC_YUM_REPOS_D_MOUNTPOINT"
prepare_and_mount_dir "$ETC_PKI_RPM_GPG_MOUNTPOINT"

Member Author


I tried to do that, but I couldn't get it to work quite the way that I wanted it to. Admittedly, my Bash is a little rusty. But I did think of two interesting paths forward for the future:

  1. There is a Python 3 interpreter available in the official Buildah image, the MCO image, and the RHCOS image. So if I were so inclined, I could rewrite this in Python, which would open the door to writing unit tests around the script. Although one doesn't strictly need Python for unit tests, since Bats exists (see the sketch after this list).
  2. Use Go instead of Bash to orchestrate things. Instead of this Bash script, we would add another binary to the MCO container which would get called instead. As a starting point, this binary could do what this Bash script does, but it could eventually do so much more.
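To make the Bats idea concrete, here is a hedged sketch of what a test for the copy-and-mount logic could look like. It assumes that logic has been extracted into a sourceable file; buildah-helpers.sh and the one-argument prepare_and_mount_dir from the suggestion above are hypothetical names.

#!/usr/bin/env bats

setup() {
  # Hypothetical: the copy-and-mount logic extracted from the build script.
  source "$BATS_TEST_DIRNAME/buildah-helpers.sh"
}

@test "prepare_and_mount_dir copies a mountpoint and registers a --volume arg" {
  src="$(mktemp -d)"
  touch "$src/entitlement.pem"

  build_args=()
  mount_opts="z,rw"
  prepare_and_mount_dir "$src"

  # Exactly one volume argument, targeting the original mountpoint path.
  [ "${#build_args[@]}" -eq 1 ]
  [[ "${build_args[0]}" == --volume=*":$src:$mount_opts" ]]
}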

Comment on lines 1022 to +1055

// Fetches an optional secret to inject into the build. Returns a nil error if
// the secret is not found.
func (ctrl *Controller) getOptionalSecret(secretName string) (*corev1.Secret, error) {
optionalSecret, err := ctrl.kubeclient.CoreV1().Secrets(ctrlcommon.MCONamespace).Get(context.TODO(), secretName, metav1.GetOptions{})
if err == nil {
klog.Infof("Optional build secret %q found, will include in build", secretName)
return optionalSecret, nil
}

if k8serrors.IsNotFound(err) {
klog.Infof("Could not find optional secret %q, will not include in build", secretName)
return nil, nil
}

return nil, fmt.Errorf("could not retrieve optional secret: %s: %w", secretName, err)
}

// Fetches an optional ConfigMap to inject into the build. Returns a nil error if
// the ConfigMap is not found.
func (ctrl *Controller) getOptionalConfigMap(configmapName string) (*corev1.ConfigMap, error) {
optionalConfigMap, err := ctrl.kubeclient.CoreV1().ConfigMaps(ctrlcommon.MCONamespace).Get(context.TODO(), configmapName, metav1.GetOptions{})
if err == nil {
klog.Infof("Optional build ConfigMap %q found, will include in build", configmapName)
return optionalConfigMap, nil
}

if k8serrors.IsNotFound(err) {
klog.Infof("Could not find ConfigMap %q, will not include in build", configmapName)
return nil, nil
}

return nil, fmt.Errorf("could not retrieve optional ConfigMap: %s: %w", configmapName, err)
}
Contributor


Another non-mandatory suggestion: these could be combined into a single getOptionalResource that takes a resource type and name, doing something like the following, with the rest of the logic shared:

var resource interface{}
var err error

if resourceType == "Secret" {
	resource, err = ctrl.kubeclient.CoreV1().Secrets(ctrlcommon.MCONamespace).Get(context.TODO(), resourceName, metav1.GetOptions{})
} else if resourceType == "ConfigMap" {
	resource, err = ctrl.kubeclient.CoreV1().ConfigMaps(ctrlcommon.MCONamespace).Get(context.TODO(), resourceName, metav1.GetOptions{})
}

Member Author


This is a good idea for the future, especially since both of these helpers only really concern themselves about the existence of the resource. That would blend nicely with some future refactoring ideas that I have.
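As a rough sketch of that future direction, and assuming the codebase can use Go generics, the two helpers could collapse into one; getOptionalResource is a hypothetical name:

// getOptionalResource wraps a typed Get call and treats NotFound as
// "absent, not an error", mirroring the two helpers above.
func getOptionalResource[T any](kind, name string, get func(context.Context, string, metav1.GetOptions) (*T, error)) (*T, error) {
	resource, err := get(context.TODO(), name, metav1.GetOptions{})
	if err == nil {
		klog.Infof("Optional build %s %q found, will include in build", kind, name)
		return resource, nil
	}

	if k8serrors.IsNotFound(err) {
		klog.Infof("Could not find optional %s %q, will not include in build", kind, name)
		return nil, nil
	}

	return nil, fmt.Errorf("could not retrieve optional %s %q: %w", kind, name, err)
}

It would be called with method values, e.g.:

secret, err := getOptionalResource("Secret", secretName, ctrl.kubeclient.CoreV1().Secrets(ctrlcommon.MCONamespace).Get)
cm, err := getOptionalResource("ConfigMap", configmapName, ctrl.kubeclient.CoreV1().ConfigMaps(ctrlcommon.MCONamespace).Get)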

Contributor

openshift-ci bot commented Apr 22, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheesesashimi, dkhater-redhat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [cheesesashimi,dkhater-redhat]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 24, 2024

@cheesesashimi: This pull request references MCO-1100 which is a valid jira issue.

In response to this:


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@sergiordlr

Verification steps:

  1. Enable techpreview

oc patch featuregate cluster --type=merge -p '{"spec":{"featureSet": "TechPreviewNoUpgrade"}}'
  2. Create the OCB configuration configmap


oc create -f - << EOF
apiVersion: v1
data:
  baseImagePullSecretName: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
  finalImagePushSecretName: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
  finalImagePullspec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image"
  imageBuilderType: "custom-pod-builder"
kind: ConfigMap
metadata:
  name: on-cluster-build-config
  namespace: openshift-machine-config-operator
EOF


  3. Create the custom Containerfile configmap
oc create -f - << EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: on-cluster-build-custom-dockerfile
  namespace: openshift-machine-config-operator
data:
  # This reflects a 1:1 mapping of MachineConfigPool name to custom Dockerfile.
  master: ""
  worker: ""
  infra: |-
    FROM configs AS final

    RUN rm -rf /etc/rhsm-host && \
      rpm-ostree install buildah && \
      ln -s /run/secrets/rhsm /etc/rhsm-host && \
      ostree container commit
EOF

  4. Copy the etc-pki-entitlement secret
oc create secret generic etc-pki-entitlement \
  --namespace "openshift-machine-config-operator" \
  --from-file=entitlement.pem=<(oc get secret/etc-pki-entitlement -n openshift-config-managed -o go-template='{{index .data "entitlement.pem" | base64decode }}') \
  --from-file=entitlement-key.pem=<(oc get secret/etc-pki-entitlement -n openshift-config-managed -o go-template='{{index .data "entitlement-key.pem" | base64decode }}')
  5. Create an infra pool
oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""
EOF

oc label node $(oc get nodes -l node-role.kubernetes.io/worker -ojsonpath="{.items[0].metadata.name}") node-role.kubernetes.io/infra=
  6. Enable the OCB functionality in the infra pool:
    $ oc label mcp/infra machineconfiguration.openshift.io/layering-enabled=

  7. Check that the image is correctly built and deployed

The build was successfully executed; in the logs we can see the buildah installation:

time="2024-04-24T20:46:00Z" level=debug msg="Running &exec.Cmd{Path:\"/bin/sh\", Args:[]string{\"/bin/sh\", \"-c\", \"rm -rf /etc/rhsm-host &&       rpm-ostree install buildah &&       ln -s /run/secrets/rhsm /etc/rhsm-host &&       ostree container commit\"}, Env:[]string{\"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\", \"HOSTNAME=742c97be8272\", \"HOME=/root\"}, Dir:\"/\", Stdin:(*os.File)(0xc000068058), Stdout:(*os.File)(0xc000068060), Stderr:(*os.File)(0xc000068068), ExtraFiles:[]*os.File(nil), SysProcAttr:(*syscall.SysProcAttr)(0xc000000000), Process:(*os.Process)(nil), ProcessState:(*os.ProcessState)(nil), ctx:context.Context(nil), Err:error(nil), Cancel:(func() error)(nil), WaitDelay:0, childIOFiles:[]io.Closer(nil), parentIOPipes:[]io.Closer(nil), goroutine:[]func() error(nil), goroutineErr:(<-chan error)(nil), ctxResult:(<-chan exec.ctxResult)(nil), createdByStack:[]uint8(nil), lookPathErr:error(nil)} (PATH = \"\")"
Enabled rpm-md repositories: rhel-9-for-x86_64-baseos-beta-rpms rhel-9-for-x86_64-appstream-beta-rpms
Updating metadata for 'rhel-9-for-x86_64-baseos-beta-rpms'...done
Updating metadata for 'rhel-9-for-x86_64-appstream-beta-rpms'...done
Importing rpm-md...done
rpm-md repo 'rhel-9-for-x86_64-baseos-beta-rpms'; generated: 2024-03-25T12:33:56Z solvables: 1816
rpm-md repo 'rhel-9-for-x86_64-appstream-beta-rpms'; generated: 2024-03-25T12:34:53Z solvables: 6972
Resolving dependencies...done
Will download: 1 package (9.9 MB)
Downloading from 'rhel-9-for-x86_64-appstream-beta-rpms'...done
Installing 1 packages:
  buildah-2:1.33.5-1.el9.x86_64 (rhel-9-for-x86_64-appstream-beta-rpms)
Installing: buildah-2:1.33.5-1.el9.x86_64 (rhel-9-for-x86_64-appstream-beta-rpms)

On the node, we have access to buildah:

$ oc debug -q  node/$(oc get nodes -l node-role.kubernetes.io/infra -ojsonpath="{.items[0].metadata.name}") -- chroot /host rpm-ostree status
State: idle
Deployments:
* ostree-unverified-registry:image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image@sha256:ddffa388d2065a1ae2446fa8d02db1ca1d00e538b6c1fd87a258e93c9a966709
                   Digest: sha256:ddffa388d2065a1ae2446fa8d02db1ca1d00e538b6c1fd87a258e93c9a966709
                  Version: 416.94.202404221029-0 (2024-04-24T20:46:05Z)

$ oc debug -q  node/$(oc get nodes -l node-role.kubernetes.io/infra -ojsonpath="{.items[0].metadata.name}") -- chroot /host rpm -qa |grep buildah
buildah-1.33.5-1.el9.x86_64

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Apr 24, 2024
@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 24, 2024

@cheesesashimi: This pull request references MCO-1100 which is a valid jira issue.

In response to this:


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@cheesesashimi
Member Author

These changes were incorporated into #4327, so this PR can be closed.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 13, 2024
@openshift-merge-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
