-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Single layer OCI Artifact ADR #92
base: main
Are you sure you want to change the base?
Conversation
7fa56ce
to
3797630
Compare
f353463
to
6615438
Compare
docs/adr/artifacts.md
Outdated
This `artifact` type was [defined][artifact-definition] to point to a URL and hold a human-readable identifier | ||
`Revision` of a blob stored in a http-server inside the controller. | ||
|
||
The `artifact` idea was part of a bigger [RFC][fluxcd-rfc] for `FluxCD` which was unfortunately rejected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which was unfortunately rejected.
Technically, it wasn't rejected. The timeline was a lot longer than we would have liked. But it wasn't rejected. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. I'll adjust this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, I thought the idea of a generic artifact-source was rejected, since this would mean new security audits for customers etc.pp
108ac97
to
ae3941f
Compare
[`SnapshotStatus`][snapshot-status] | ||
- (Conditions) | ||
- LastReconciledDigest: | ||
- Purpose? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
determine whether a new reconcilation is necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't a "normal" digest
field be enough? I don't understand the prefix LastReconciled...
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If digest is different than LastReconciledDigest
it means that there was an error in between. Digest not always == LastReconciledDigest. LastReconciledDigest marks the last SUCCESSFULLY reconciled digest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
however, this field is part of the status. I would only expect successfully reconciled digests in the status. Honestly, I don't think this field is worth a discussion. We should store the digest
in the status. The fieldname is irrelevant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job here on the proposal overall. Im mostly concerned that some of the options seem to be presented with points I couldnt follow up on. Nevertheless Im assuming that taking over the snapshot implementation (with heavy adjustments) is an acceptable solution
3e2596e
to
96033a0
Compare
96033a0
to
530cad8
Compare
transformation, they also can be used as a caching mechanism to reduce unnecessary calls to the source OCM registry._ | ||
|
||
[`SnapshotSpec`][snapshot-spec] | ||
- Identity: OCM Identity (map[string]string) (Created by [constructIdentity()][snapshot-create-identity]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to store an OCM identity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understood it, the identity
is used to create the OCI repository (==name) for the resource. Then, e.g. IsCached()
is used to check if an OCI repository (with manifest) is already created or not (see code). However, I think that IsCached()
only checks if the OCI repository exists using the identity. It does not validate if it is still the same. For that, we must compare the digest
(which would mean that we have to download the resource again to calculate and compare the digest). I am not so happy about this^^.
@Skarlso did I get that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait a second.. this does not explain why it is stored in the SnapshotSpec
. Will take a look again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In most cases the identity
is used to generate the OCI repository name (again). However, there are some other cases, e.g.:
- Get the
ComponentResourceVersion
andComponentVersion
from thesnapshot
(the functions have 0 calls) - The
Mutation
reconciler needs the identity. Speculation - I guess to get thesnapshot
object byidentity
for mutations (=localization/configuration) - The
FluxDeployer
uses the identity to check theHelmChartVersion
I will take a look while implementing the snapshot
for v2 and try to omit it if possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though I have some trouble with the argumentation in places this Design looks mostly good formally. The decision here can be accepted.
decided to not use a plain http-server but an internal OCI registry to store and publish its blobs that are produced by | ||
the OCM controllers as single layer OCI artifacts. | ||
|
||
Arguments (meeting notes from 26.11.2024): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arguments (meeting notes from 26.11.2024): | |
Arguments: |
the OCM controllers as single layer OCI artifacts. | ||
|
||
Arguments (meeting notes from 26.11.2024): | ||
- An OCI registry is a responsibility less to maintain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only true for the development responsibility. One can argue that running an OCI registry in production is tricky too.
- (Conditions) | ||
- LastReconciledDigest: | ||
- Purpose? | ||
- LastReconciledTag: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this field necessary (maybe similar discussion as above on LastReconciledDigest?)
|
||
#### Negative Consequences | ||
|
||
- Requires a transformer that transforms the `snapshot` resource in something that, for example, FluxCDs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This transformation was the main reason the new controller architecture was developed in the first place. IMO you will need to further explain how a user is now expected to interact with the Artifact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, I am now wondering how the e2e flow will look like. Having the OCIRepository indirection in there, how do we get users to look into the right CRs for debugging? How would a deployment look like (roughly)?
- Orientation on `snapshot` and `artifact` ease the implementation. | ||
|
||
Cons: | ||
- New implementation is required. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a real con because we have to change code anyways. IMO this should be the goto if the only disadvantage here is that we have to develop something, after all thats why we are here. Im looking for Design Cons here, not Effort Cons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the true Con here is that it offers no benefit over the existing implementation
Cons: | ||
- Potentially longer implementation time, as it involves learing how to deploy, configure and operate a new registry | ||
- To support Docker images, the ergistry must be run in compartibility mode | ||
- Bigger image size: 69 MB the minimal version and 208 MB the full version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Image size on a registry that is optionally deployed and used for development is not a real concern imo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is not a real concern
Sure. That is why the decision is still for zot. I'd keep the point in the document, because that is a fact. But in case you insist, I can also remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Image size has nothing to do with zot or not. A fact that is irrelevant for a decision does not need to be tracked.
Its like saying "zot offers docker compatibility, but only with a flag", sure its a fact, but its not relevant to the decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your judgement about the relevance of certain pieces of information is based on assumptions, which are only known to you. Would be good, if you could share, what you think the decision criteria are, for choosing between zot
and registry:2
. Btw., in different talks with team members both the size and the compatibility flag were mentioned at least as potentially relevant.
docs/adr/artifacts.md
Outdated
|
||
Cons: | ||
- We do not "control" the resource and issues caused by another OCI registry could be hard to fix/support | ||
- Most people need to operate a registry than and the majority would not have experience maintaining a production grade stable oci registry as a service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not true, most people will look at integrating a registry of their choice (e.g. from a cloud provider) into their workflows instead of maintaining their own if given the choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:) Please see this comment by @jakobmoellerdev :
#92 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering that most people need to operate a registry than and the majority would not have experience maintaining a production grade stable oci registry as a service, my assumption would be that this should not be the default and not advertised until we feel confident.
This is exactly the same point. If you tell people they need an OCI registry, they will look at a service and integrate that instead of maintaining their own...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me, this is not about people not being able to run their own registry. Its the fact that they can just integrate any external registry of their choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we talking past each other? The question in the document is, if we want to force the users to bring their own registry. Your initial argument (I btw. agree with) is that this is not convenient for most people, and we should offer a default registry. Now you are saying that most people wouldn't need a default, but "will look at integrating a registry of their choice".
Arguments (meeting notes from 26.11.2024): | ||
- An OCI registry is a responsibility less to maintain | ||
- Stop support at the level of the distribution spec of OCI | ||
- OCI registries could provide GC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do they provide GC? AFAIK, the OCI GC is only for dangling blobs which would only be relevant for failed operations, right?
- An OCI registry is a responsibility less to maintain | ||
- Stop support at the level of the distribution spec of OCI | ||
- OCI registries could provide GC | ||
- We will need an abstraction that handles OCI registries anyway |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We will need an abstraction that handles OCI registries anyway | |
- We will need an abstraction that handles OCI registries anyway to convert resources into a flux consumable format (= single layer oci artifact) |
- Since there is a "real" blob in the storage, it should have a respective entity to represent it, e.g. | ||
`artifact`/`snapshot` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that statement alone can be used as an argument. What are the implications of not having the entity represented?
Pros: | ||
- No additional custom resource needed. | ||
|
||
Cons: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the loss of the extensibility of the the architecture provided by a common interface (artifact / snapshot) be listed as a con? I mean, we essentially tried to convince flux create separate cr's deliberately to allow this extensibility.
Cons: | ||
- Implemented for a plain http-server and not for OCI registry (check | ||
[storage implementation][controller-manager-storage]). Thus, missing dedicated control-loop. | ||
- Would require a custom deployment of controller of FluxCD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this is not really true. Conceptually, you also only require a transformer (just as in option 2). The actual difference is that in option 2, your transformer does not have to do that much anymore, because your resource is already available in a flux compatible format (as a single layer oci artifact). So, your transformer is only on kubernetes cr level.
The actual con is that we'd have to maintain the storage server (copied from flux) AND additionally, the oci implementation from option 2, because that's what the transformer would have to do, still.
- Using the `OCIRepository` control-loop would basically "clone" every blob from the OCI registry in FluxCD | ||
local storage (plain http server). | ||
|
||
Using `OCIRepository` as intermediate `storage`-pointer CR is not an option as the control-loop of that resource would | ||
"clone" any OCI Registry blob to its own local storage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
twice the same point?
docs/adr/artifacts.md
Outdated
|
||
Cons: | ||
- Potentially longer implementation time, as it involves learing how to deploy, configure and operate a new registry | ||
- To support Docker images, the ergistry must be run in compartibility mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The registry is supposed to store deployment descriptions (helm/kustomize/k8s manifests) and related files (for localization and configuration), so docker images should not really be an issue, right?
Or do we also plan that the users can use the registry to store the images for this environment (e.g. in combination with the replication controller)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No such plans at the moment. So, not an issue. Just taking a note of a potential limitation for the future.
Co-authored-by: Fabian Burth <fabian.burth@sap.com>
Co-authored-by: Fabian Burth <fabian.burth@sap.com>
Closes #90