OTA-1294: enhancements/security/openshift-image-policy: Propose new enhancement #1633
Conversation
Skipping CI for Draft Pull Request.
@wking: This pull request references OTA-1294, which is a valid Jira issue.
* Suggest changes to mirroring tools, although they will need to learn to mirror Sigstore signatures for use in disconnected/restricted-network environments.
* Suggest changes to the ImageStream controller, although it will need to learn to mirror Sigstore signatures for verification when running pods from ImageStream sources, e.g. [must-gather][].
* Suggest policies for `quay.io/openshift-release-dev/ocp-v4.0-art-dev`, although those would help tighten down security for other `openshift-*` namespaces.
* Support testing unsigned releases as HyperShift hosted clusters on a management cluster that requires signed releases.
How/when does a hosted control plane deployment verify the release payload signature today (during install / during update)? Do we have the same luxury as we do in standalone of a Pod needing to run with the new payload before it has any impact on the hosted cluster?
As far as I'm aware, there is no signature verification today in the HyperShift stack. In managed HyperShift (currently just hosted ROSA), ClusterImageSets define allowed update targets, and customers select from those presumably pre-verified targets. HyperShift rolls out whichever release pullspec is requested without performing any signature verification of its own, although it does run some pre-checks like looking at `Upgradeable`.
If someone can patch `HostedCluster` and roll out an untrusted payload, I think we must close this gap.
Makes sense to me, but:
- Do you want to close that gap via OpenPGP / GPG signatures? Because if you do, that's a completely separate effort from this enhancement's Sigstore proposal.
- If you are comfortable closing that gap via management clusters having a ClusterImagePolicy per this proposal, the gap will close, although without some HyperShift-side investigation, the user experience of failing to launch Sigstore-troubled release-image pods is unclear. But closing that gap will take time, as this enhancement percolates through tech-preview to GA, and management clusters eventually update to pick up the GA version.
> closing that gap via management clusters having a ClusterImagePolicy per this proposal
A corner case might be the management cluster enforcing a production-key-signed policy, while trying to roll out a beta-key-signed prerelease of the managed cluster. Can that happen at all?
## Proposal

This enhancement proposes a new `openshift` ClusterImagePolicy that requires Red Hat signatures for `quay.io/openshift-release-dev/ocp-release` images.
Suggested change:
- This enhancement proposes a new `openshift` ClusterImagePolicy that requires Red Hat signatures for `quay.io/openshift-release-dev/ocp-release` images.
+ This enhancement proposes a new `openshift-release` ClusterImagePolicy that requires Red Hat signatures for `quay.io/openshift-release-dev/ocp-release` images.
Trying not to step on a likely future name.
I was hoping this instance would get expanded later to cover more of OpenShift. In the event that it does not (e.g. if we end up using different signing keys/policies for `quay.io/openshift-release-dev/ocp-v4.0-art-dev`), can we shard names then? Or can we use `openshift-container-platform` or something to match the product linked to the release we're verifying?
Covering more of OpenShift requires careful collaboration that I don't see any value in, given that the ClusterImagePolicies are merged automatically. I actually hope this `ClusterImagePolicy` becomes just `ImagePolicy` in the future, so having a unique instance that we can manage and remove at will has benefits.
> I actually hope this ClusterImagePolicy becomes just ImagePolicy in the future...

If that happens, then the naming of this temporary policy is largely irrelevant.

> ... so having a unique instance that we can manage and remove at will has benefits.
The CVO is going to stomp anyone else making any kind of `spec` edits, so it's going to be something we can manage and remove at will regardless of what we name this ClusterImagePolicy.
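For concreteness, here is a minimal sketch of what the proposed policy might look like, assuming the tech-preview `config.openshift.io/v1alpha1` ClusterImagePolicy API; the metadata name and the key placeholder are illustrative, not the final manifest:

```yaml
apiVersion: config.openshift.io/v1alpha1
kind: ClusterImagePolicy
metadata:
  name: openshift   # placeholder; the final name is what this thread is debating
spec:
  scopes:
  - quay.io/openshift-release-dev/ocp-release
  policy:
    rootOfTrust:
      policyType: PublicKey
      publicKey:
        keyData: <base64-encoded Red Hat release-signing public key>   # placeholder
    signedIdentity:
      matchPolicy: MatchRepoDigestOrExact
```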
This enhancement will not:

* Figure out how to configure a ClusterImagePolicy for verifying a mirrored release image, although this will need to happen before we can use Sigstore signatures in disconnected/restricted-network environments.
* Suggest changes to mirroring tools, although they will need to learn to mirror Sigstore signatures for use in disconnected/restricted-network environments.
Similarly, I'd like to state how we think this should work, avoiding implementation detail, and have the mirror team sign off.
More on mirroring in this thread.
There are a number of possible next-steps that I'm punting on for now, as I try to nibble off the smallest possible slice of progress.

This enhancement will not:

* Figure out how to configure a ClusterImagePolicy for verifying a mirrored release image, although this will need to happen before we can use Sigstore signatures in disconnected/restricted-network environments.
Can we state how we think this should work (transparently) and have the mirror / container runtime teams review & sign off on the enhancement? I think a fair non-goal is describing the technical details of how it will be implemented.
My grasp of Sigstore is light, so I'd guess it's "do whatever CRI-O does to find relevant signatures, and then copy them over in a way that's discoverable on the mirror". I don't know more details. Just look for `*.sig` sibling tags? Eventually moving over to the referrers API, as tracked in RUN-1880 and containers/image#1848? With Sigstore as "the generic way to sign images", isn't there some prior art on "the generic way to mirror these things" we can draft on, without me having to guess from a cold start?
The `ClusterImagePolicy` part should be a no-op here: ClusterImagePolicy would only have a policy for `quay.io/openshift-release-dev/ocp-release`, and image references (in `Pod`s and elsewhere) should refer to `quay.io/openshift-release-dev/ocp-release`.

CRI-O (and other image users) should then interpret IDMS to redirect the actual registry accesses from `quay.io/openshift-release-dev/ocp-release` to the mirrors. If that works in current disconnected environments, no other IDMS / policy should have to be configured specifically for mirrors.
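As a rough illustration of the IDMS piece (the mirror hostname below is a placeholder, not part of this proposal), a minimal sketch:

```yaml
apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  name: release-mirror   # illustrative name
spec:
  imageDigestMirrors:
  - source: quay.io/openshift-release-dev/ocp-release
    mirrors:
    - mirror.example.com/openshift-release-dev/ocp-release   # placeholder mirror
```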
(mirroring tools do need to be taught to mirror the signature images/artifacts, and the related tags, if any.)
But while IDMS will redirect the by-digest `quay.io/openshift-release-dev/ocp-release@sha256:d93db95ec59b2f51ff45fdecd54b4dd9144521e3d099fef151c189598200cf24` image request, we might need an `ImageTagMirrorSet` to get the `quay.io/openshift-release-dev/ocp-release:sha256-d93db95ec59b2f51ff45fdecd54b4dd9144521e3d099fef151c189598200cf24.sig` tag redirected. Until CRI-O pivots to the referrers API for signature lookup?
But regardless, out of scope initially, and we can test all of this once the current scope restriction is dropped from the API and MCO implementation.
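If an ImageTagMirrorSet did turn out to be necessary for those `.sig` tag lookups, a hedged sketch might look like the following (placeholder mirror; the follow-up comment below suggests c/image already reuses the image's mirror for signatures, which would make this unnecessary):

```yaml
apiVersion: config.openshift.io/v1
kind: ImageTagMirrorSet
metadata:
  name: release-signature-mirror   # illustrative name
spec:
  imageTagMirrors:
  - source: quay.io/openshift-release-dev/ocp-release
    mirrors:
    - mirror.example.com/openshift-release-dev/ocp-release   # placeholder mirror
# The tag being looked up follows the Sigstore convention sha256-<manifest digest>.sig,
# e.g. sha256-d93db95ec59b2f51ff45fdecd54b4dd9144521e3d099fef151c189598200cf24.sig
```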
Not as implemented in c/image (and called by CRI-O): the “abstract image” as a whole has a mirror identified using IDMS, and then that mirror is used to fetch the manifest, layers, and signatures — although the way signatures are stored is structurally similar to an independent image, the implementation does not do another mirror lookup.
(If something else is fetching signatures without using c/image, then, yes, that something else is responsible for interpreting IDMS/ITMS, and I selfishly think it should adjust to behave the same way as c/image — or to call it.)
For Red-Hat-internal CI and QE, the agent configuring the install-config needs to set `internalTestingImagePolicy: Disable`.
The new property can be set regardless of version or whether the `ImagePolicy` feature gate is enabled, because [the installer does not fail on unrecognized properties][install-config-unrecognized-properties].

### API Extensions
Status information for ClusterVersion / HostedCluster / NodePool when payload is not signed / Pod can't be deployed?
The CVO launches a Job to pull in the new target's manifests, and that Job can have problems already; OCPBUGS-31462 is a recent example. It's reported via the `ReleaseAccepted` condition. This likely has no analog for HostedCluster or NodePool, because I doubt they do the warm manifest handoff that the CVO needs to do to handle implicitly-enabled capabilities. But they will likely need something similar if they ever get around to supporting configurable cluster capabilities. And they should already have something similar, because "the pod I tried to launch came up healthy" is not something you want to rely on happening every time.
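A hedged sketch of how that might surface on ClusterVersion (the `ReleaseAccepted` condition type is real; the reason and message strings here are illustrative):

```yaml
# illustrative excerpt of ClusterVersion status when the version Job cannot retrieve the payload
status:
  conditions:
  - type: ReleaseAccepted
    status: "False"
    reason: RetrievePayload        # illustrative reason string
    message: "Retrieving payload failed for quay.io/openshift-release-dev/ocp-release@sha256:..."   # illustrative
```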
## Proposal

This enhancement proposes a new `openshift` ClusterImagePolicy that requires Red Hat signatures for `quay.io/openshift-release-dev/ocp-release` images.
The new ClusterImagePolicy will be managed by the ClusterVersion operator and applied to all clusters by default (initially tech-preview, gated by the `ImagePolicy` feature gate, [currently `TechPreviewNoUpgrade`][feature-gate]).
Does that mean we ship the policy by default if image policies are enabled? Or do we want to have an additional toggle switch?
Yeah, that's the proposal. If you think there's a case for a separate feature gate, can you give some pros and cons for an Alternatives section? Then folks weighing the pros and cons could decide to flip that option in as the core proposal and shift "use the `ImagePolicy` feature gate" out into the Alternatives section if they want. Personally, my initial take is that it makes sense to at least start by expecting one gate, and possibly to pivot to a separate gate if the implementation-order plan hits trouble during the step (3) testing, before we've done anything to the payload except (1)'s lifting of the current scope restrictions.
One more thing to clarify in this section is how we will continue handling GPG verification. In early discussions we had discussed having both side by side initially. We need to decide if we will keep it while this is in tech preview and then disable GPG entirely when we GA Sigstore support.
Currently this enhancement doesn't suggest changes to the OpenPGP/GPG handling, because:

- We'll need to keep it for the GA 4.17, which won't have the `TechPreviewNoUpgrade` Sigstore stuff.
- It's pretty orthogonal to the Sigstore proposal, so the CVO continuing to do its GPG thing while the `TechPreviewNoUpgrade` Sigstore stuff happens too is just belt-and-suspenders. And it also helps demo how the transitional period will work if this thing goes GA.
Do you think it's worth calling that out explicitly in Markdown?
I do. A viable plan to migrate over time, securely and transparently for the end user, is an important consideration for the overall proposal.
I've pushed 4a5ca6b -> 7137261, adding an Overlap with OpenPGP signing section explaining how the two systems are orthogonal and working through some details of a transitional update to make that more clear.
I'm leaving this thread unresolved, because @saschagrunert's initial ask about whether this deserves a feature gate separate from the existing `ImagePolicy` feature gate is not yet settled. Do folks have any pros or cons to feed an Alternatives section about that? Or can we mark this thread resolved based on reasoning like:

> No real pros/cons yet, but if we bump into any later, like "wow, testing shows this is going to take a long time to get working", we can come split off a new feature gate then.
/cc
#### Installer

As discussed in [the *Proposal* section](#proposal), the installer will grow a new, feature-gated `OPENSHIFT_INSTALL_EXPERIMENTAL_DISABLE_IMAGE_POLICY` environment variable driving ClusterVersion `overrides` extension.
Why an env var rather than just configuring the appropriate feature gate in the install configuration via TechPreviewNoUpgrade and CustomNoUpgrade?
Environment variables are how the installer implements knobs used for testing which aren't customer-facing. See the Have the installer use an install-config property alternative section for more context, although I don't get at feature-gates directly there. But the idea with this knob is that "disable security" is expected to be internal forever, and isn't something we expect to eventually expose to customers.
Suggested change:
- Setting a policy to require Red Hat signatures on OpenShift release images from the canonical quay.io/openshift-release-dev/ocp-release repository, as described in [1]. [1]: openshift/enhancements#1633
+ Setting a TechPreviewNoUpgrade policy to require Red Hat signatures on OpenShift release images from the canonical quay.io/openshift-release-dev/ocp-release repository, as described in [1]. [1]: openshift/enhancements#1633
I've pushed ea17075 -> c089c6f to pivot from the …
Setting a TechPreviewNoUpgrade policy to require Red Hat signatures on OpenShift release images from the canonical quay.io/openshift-release-dev/ocp-release repository, as described in [1]. [1]: openshift/enhancements#1633
Increasing the security of OpenShift release images from the current "check GPG signatures before initiating an update" to "check Sigstore signatures on every Pod launch".

Like the (Cluster)ImagePolicy enhancement I'm building on, this enhancement mostly belongs to the Node component. But the node component doesn't seem to have its own subdirectory, so I'm dropping this into the enhancements/security directory for now.

I've tried to lay out context for most things in the enhancement text itself, but the `--sort-by='{.lastTimestamp}{.metadata.creationTimestamp}'` event filtering seems peripheral enough to be worth punting to here in the commit message. The logic Kubernetes uses to populate the LAST SEEN column is complicated [1]. While lastTimestamp seems to be what these particular Pod events most commonly use, that property is optional [2]. By preferring lastTimestamp, and falling back to creationTimestamp if lastTimestamp is unset, I'll hopefully fairly reliably deliver a descending LAST SEEN column.

[1]: https://github.com/kubernetes/kubernetes/blob/9c8c61aee4966d153fba0b9c365c7d03c602b4fc/staging/src/k8s.io/kubectl/pkg/cmd/events/event_printer.go#L66-L99
[2]: https://github.com/kubernetes/kubernetes/blame/9c8c61aee4966d153fba0b9c365c7d03c602b4fc/pkg/apis/core/types.go#L5607-L5609
source: quay.io/openshift-release-dev/ocp-release
```

In any case, it's out of scope for a reason, but until it gets picked up, it might make things more difficult for `TechPreviewNoUpgrade` disconnected/restricted-network clusters.
Is the main blocker here support in mirror tooling, or do we need any changes in c/image? @QiWang19 @saschagrunert @mtrmac
Also, want to ensure that we capture this as a GA requirement.
IIRC this is entirely mirror tooling.
I have no idea; suggestions welcome :).

There will be a gap during install before the ClusterImagePolicy has been created and rolled out to nodes, during which unsigned `quay.io/openshift-release-dev/ocp-release` images might run.
But we do not perform any in-cluster signature checking during installs today; they are secured by a trusted installer using a baked-in release image digest.
We would want this documented.
Alternatively, could we actually fix that, by (vaguely guessing) applying the `ClusterImagePolicy` earlier, when the bootstrap node/etcd is still being used, and there's no possibility of triggering an upgrade?
It's hard to fix, because we get the ClusterImagePolicy from the release image, so you need that first release image pull to go through before you know what the policy should be. You could pull the policy out of the release image and inject it into the installer binary at extraction time, like we currently do for other release-image metadata. But Sigstore signatures on release images are still too late to establish trust in the installer. And once you have established trust in the installer, the installer's injected-at-extraction-time by-digest release image pullspec establishes trust in the release image. So it's not clear to me what gap earlier Sigstore signature coverage would close.
I’m sorry, that’s my mistake; it says “there will be a gap during install”, I was reading that as “after install there is a gap where an upgrade could be triggered before a policy has finished rolling out”.
I agree that the digest references flowing from the installer through the release image to the various product images is sufficient; and I don’t think that particularly needs documentation (end-user-focused documentation, at least).
We can follow up on HCP and mirroring for the next phase.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: mrunalp.

@wking: all tests passed!