Single layer OCI Artifact ADR #92

frewilhelm · 2025-01-09T14:00:20Z

Closes #90

docs/adr/artifacts.md

Skarlso · 2025-01-22T07:00:17Z

docs/adr/artifacts.md

+This `artifact` type was [defined][artifact-definition] to point to a URL and hold a human-readable identifier
+`Revision` of a blob stored in a http-server inside the controller.
+
+The `artifact` idea was part of a bigger [RFC][fluxcd-rfc] for `FluxCD` which was unfortunately rejected.


> which was unfortunately rejected.

Technically, it wasn't rejected. The timeline was a lot longer than we would have liked. But it wasn't rejected. :)

Fair point. I'll adjust this.

However, I thought the idea of a generic artifact source was rejected, since this would mean new security audits for customers, etc.

fabianburth · 2025-01-23T10:24:15Z

docs/adr/artifacts.md

+[`SnapshotStatus`][snapshot-status]
+- (Conditions)
+- LastReconciledDigest:
+  - Purpose?


To determine whether a new reconciliation is necessary?

Wouldn't a "normal" digest field be enough? I don't understand the prefix `LastReconciled`.

If `Digest` differs from `LastReconciledDigest`, it means that there was an error in between. `Digest` is not always equal to `LastReconciledDigest`: `LastReconciledDigest` marks the last SUCCESSFULLY reconciled digest.
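A minimal sketch of that distinction, assuming a simplified status type. The struct layout and the `needsReconcile` helper are hypothetical and only illustrate the semantics described above; they are not the controller's actual API.

```go
package main

import "fmt"

// SnapshotStatus is a hypothetical, simplified status used for illustration only.
type SnapshotStatus struct {
	// Digest is the digest observed most recently.
	Digest string
	// LastReconciledDigest is the digest of the last successful reconciliation.
	LastReconciledDigest string
}

// needsReconcile reports whether the observed digest differs from the
// last successfully reconciled one.
func needsReconcile(s SnapshotStatus) bool {
	return s.Digest != s.LastReconciledDigest
}

func main() {
	// A failed reconciliation leaves LastReconciledDigest at its old value.
	s := SnapshotStatus{Digest: "sha256:bbb", LastReconciledDigest: "sha256:aaa"}
	fmt.Println("needs reconcile:", needsReconcile(s))

	// After a successful reconciliation both fields agree.
	s.LastReconciledDigest = s.Digest
	fmt.Println("needs reconcile:", needsReconcile(s))
}
```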

I see, thank you!

However, this field is part of the status. I would only expect successfully reconciled digests in the status. Honestly, I don't think this field is worth a discussion. We should store the digest in the status. The field name is irrelevant.

jakobmoellerdev

Good job here on the proposal overall. I'm mostly concerned that some of the options seem to be presented with points I couldn't follow up on. Nevertheless, I'm assuming that taking over the snapshot implementation (with heavy adjustments) is an acceptable solution.

jakobmoellerdev · 2025-01-24T09:40:36Z

docs/adr/artifacts.md

+transformation, they also can be used as a caching mechanism to reduce unnecessary calls to the source OCM registry._
+
+[`SnapshotSpec`][snapshot-spec]
+- Identity: OCM Identity (map[string]string) (Created by [constructIdentity()][snapshot-create-identity])


Why do we need to store an OCM identity?

As far as I understood it, the identity is used to create the OCI repository (==name) for the resource. Then, e.g. IsCached() is used to check if an OCI repository (with manifest) is already created or not (see code). However, I think that IsCached() only checks if the OCI repository exists using the identity. It does not validate if it is still the same. For that, we must compare the digest (which would mean that we have to download the resource again to calculate and compare the digest). I am not so happy about this^^.

@Skarlso did I get that right?
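The concern above can be sketched as follows. This is an illustration under stated assumptions, not the existing implementation: the naming scheme in `repositoryName`, the in-memory `cache` map standing in for the registry, and the `isUpToDate` helper are all hypothetical. It shows why an existence check keyed on the identity alone cannot tell whether the stored content is still the same, and why a digest comparison is needed for that.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// repositoryName derives a deterministic name from an identity map by hashing
// the sorted key/value pairs (hypothetical scheme).
func repositoryName(identity map[string]string) string {
	keys := make([]string, 0, len(identity))
	for k := range identity {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random, so sort first

	var b strings.Builder
	for _, k := range keys {
		b.WriteString(k)
		b.WriteByte('=')
		b.WriteString(identity[k])
		b.WriteByte(';')
	}
	sum := sha256.Sum256([]byte(b.String()))
	return "sha-" + hex.EncodeToString(sum[:])
}

// contentDigest returns the sha256 digest of a blob in "sha256:<hex>" form.
func contentDigest(blob []byte) string {
	sum := sha256.Sum256(blob)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// isUpToDate checks existence AND content: the repository must exist and the
// stored digest must match the digest of the given blob. The cache map
// (repository name -> stored digest) is a stand-in for the registry.
func isUpToDate(cache map[string]string, identity map[string]string, blob []byte) bool {
	stored, ok := cache[repositoryName(identity)]
	return ok && stored == contentDigest(blob)
}

func main() {
	identity := map[string]string{"component": "example", "resource": "chart", "version": "1.0.0"}
	cache := map[string]string{repositoryName(identity): contentDigest([]byte("old content"))}

	// The repository exists, but the content has changed.
	fmt.Println("up to date:", isUpToDate(cache, identity, []byte("new content")))
}
```

Note that computing `contentDigest` requires the blob itself, which is exactly the cost mentioned above: the resource has to be downloaded again before it can be compared.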

Wait a second, this does not explain why it is stored in the `SnapshotSpec`. I will take another look.

In most cases the identity is used to generate the OCI repository name (again). However, there are some other cases, e.g.:

- Get the `ComponentResourceVersion` and `ComponentVersion` from the snapshot (the functions have 0 calls)
- The Mutation reconciler needs the identity. Speculation: I guess to get the snapshot object by identity for mutations (= localization/configuration)
- The FluxDeployer uses the identity to check the `HelmChartVersion`

I will take a look while implementing the snapshot for v2 and try to omit it if possible.

jakobmoellerdev

Even though I have some trouble with the argumentation in places, this design looks mostly good formally. The decision here can be accepted.

jakobmoellerdev · 2025-02-11T08:02:04Z

docs/adr/artifacts.md

+decided to not use a plain http-server but an internal OCI registry to store and publish its blobs that are produced by
+the OCM controllers as single layer OCI artifacts.
+
+Arguments (meeting notes from 26.11.2024):


Suggested change

-Arguments (meeting notes from 26.11.2024):
+Arguments:

jakobmoellerdev · 2025-02-11T08:04:02Z

docs/adr/artifacts.md

+the OCM controllers as single layer OCI artifacts.
+
+Arguments (meeting notes from 26.11.2024):
+- An OCI registry is a responsibility less to maintain 


This is only true for the development responsibility. One can argue that running an OCI registry in production is tricky too.

jakobmoellerdev · 2025-02-11T08:05:22Z

docs/adr/artifacts.md

+- (Conditions)
+- LastReconciledDigest:
+  - Purpose?
+- LastReconciledTag:


Why is this field necessary? (Maybe a similar discussion as above on `LastReconciledDigest`?)

jakobmoellerdev · 2025-02-11T08:07:01Z

docs/adr/artifacts.md

+
+#### Negative Consequences
+
+- Requires a transformer that transforms the `snapshot` resource in something that, for example, FluxCDs


This transformation was the main reason the new controller architecture was developed in the first place. IMO you will need to further explain how a user is now expected to interact with the Artifact.

For example, I am now wondering what the e2e flow will look like. With the `OCIRepository` indirection in there, how do we get users to look into the right CRs for debugging? What would a deployment look like (roughly)?

jakobmoellerdev · 2025-02-11T08:09:00Z

docs/adr/artifacts.md

+- Orientation on `snapshot` and `artifact` ease the implementation.
+
+Cons:
+- New implementation is required.


This is not a real con because we have to change code anyway. IMO this should be the go-to if the only disadvantage here is that we have to develop something; after all, that's why we are here. I'm looking for design cons here, not effort cons.

I think the true con here is that it offers no benefit over the existing implementation.

jakobmoellerdev · 2025-02-11T08:11:22Z

docs/adr/artifacts.md

+Cons:
+- Potentially longer implementation time, as it involves learning how to deploy, configure and operate a new registry
+- To support Docker images, the registry must be run in compatibility mode
+- Bigger image size: 69 MB for the minimal version and 208 MB for the full version


Image size on a registry that is optionally deployed and used for development is not a real concern IMO.

> is not a real concern

Sure. That is why the decision is still for zot. I'd keep the point in the document, because that is a fact. But in case you insist, I can also remove it.

Image size has nothing to do with zot or not. A fact that is irrelevant for a decision does not need to be tracked.

It's like saying "zot offers docker compatibility, but only with a flag". Sure, it's a fact, but it's not relevant to the decision.

Your judgement about the relevance of certain pieces of information is based on assumptions which are only known to you. It would be good if you could share what you think the decision criteria are for choosing between zot and registry:2. By the way, in different talks with team members, both the size and the compatibility flag were mentioned as at least potentially relevant.

jakobmoellerdev · 2025-02-11T08:13:00Z

docs/adr/artifacts.md

+
+Cons:
+- We do not "control" the resource and issues caused by another OCI registry could be hard to fix/support
+- Most people need to operate a registry then and the majority would not have experience maintaining a production grade stable OCI registry as a service


This is not true, most people will look at integrating a registry of their choice (e.g. from a cloud provider) into their workflows instead of maintaining their own if given the choice.

:) Please see this comment by @jakobmoellerdev :
#92 (comment)

Considering that most people need to operate a registry then and the majority would not have experience maintaining a production grade stable OCI registry as a service, my assumption would be that this should not be the default and not advertised until we feel confident.

This is exactly the same point. If you tell people they need an OCI registry, they will look at a service and integrate that instead of maintaining their own...

For me, this is not about people not being able to run their own registry. It's the fact that they can just integrate any external registry of their choice.

Are we talking past each other? The question in the document is whether we want to force the users to bring their own registry. Your initial argument (which I agree with, by the way) is that this is not convenient for most people, and we should offer a default registry. Now you are saying that most people wouldn't need a default, but "will look at integrating a registry of their choice".

fabianburth · 2025-02-11T10:11:01Z

docs/adr/artifacts.md

+Arguments (meeting notes from 26.11.2024):
+- An OCI registry is a responsibility less to maintain 
+- Stop support at the level of the distribution spec of OCI
+- OCI registries could provide GC


How do they provide GC? AFAIK, the OCI GC is only for dangling blobs which would only be relevant for failed operations, right?

fabianburth · 2025-02-11T10:12:16Z

docs/adr/artifacts.md

+- An OCI registry is a responsibility less to maintain 
+- Stop support at the level of the distribution spec of OCI
+- OCI registries could provide GC
+- We will need an abstraction that handles OCI registries anyway


Suggested change

-- We will need an abstraction that handles OCI registries anyway
+- We will need an abstraction that handles OCI registries anyway to convert resources into a flux consumable format (= single layer oci artifact)
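To make "single layer OCI artifact" concrete, here is a rough sketch of what such a manifest could look like when built in Go. The `descriptor` and `manifest` types and the helper names are hypothetical and defined locally; the manifest and empty-config media types come from the OCI image spec, while the layer media type passed in `main` is only an illustrative choice, not necessarily what the controllers or Flux use.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// descriptor is a minimal, locally defined OCI content descriptor.
type descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

// manifest is a minimal, locally defined OCI image manifest.
type manifest struct {
	SchemaVersion int          `json:"schemaVersion"`
	MediaType     string       `json:"mediaType"`
	Config        descriptor   `json:"config"`
	Layers        []descriptor `json:"layers"`
}

// describe computes digest and size for a piece of content.
func describe(mediaType string, content []byte) descriptor {
	sum := sha256.Sum256(content)
	return descriptor{
		MediaType: mediaType,
		Digest:    "sha256:" + hex.EncodeToString(sum[:]),
		Size:      int64(len(content)),
	}
}

// singleLayerManifest wraps one blob as the only layer of a manifest,
// using an empty JSON object as the config.
func singleLayerManifest(blob []byte, layerMediaType string) manifest {
	return manifest{
		SchemaVersion: 2,
		MediaType:     "application/vnd.oci.image.manifest.v1+json",
		Config:        describe("application/vnd.oci.empty.v1+json", []byte("{}")),
		Layers:        []descriptor{describe(layerMediaType, blob)},
	}
}

func main() {
	// The blob is treated as opaque bytes here; the media type is an assumption.
	m := singleLayerManifest([]byte("example blob"), "application/vnd.oci.image.layer.v1.tar+gzip")
	out, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```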

fabianburth · 2025-02-11T10:46:33Z

docs/adr/artifacts.md

+- Since there is a "real" blob in the storage, it should have a respective entity to represent it, e.g.
+`artifact`/`snapshot`


I don't think that statement alone can be used as an argument. What are the implications of not having the entity represented?

fabianburth · 2025-02-11T10:50:53Z

docs/adr/artifacts.md

+Pros:
+- No additional custom resource needed.
+
+Cons:


Shouldn't the loss of the extensibility of the architecture provided by a common interface (artifact / snapshot) be listed as a con? I mean, we essentially tried to convince Flux to create separate CRs deliberately to allow this extensibility.

fabianburth · 2025-02-11T14:38:44Z

docs/adr/artifacts.md

+Cons:
+- Implemented for a plain http-server and not for OCI registry (check 
+[storage implementation][controller-manager-storage]). Thus, missing dedicated control-loop.
+- Would require a custom deployment of controller of FluxCD


Well, this is not really true. Conceptually, you also only require a transformer (just as in option 2). The actual difference is that in option 2 your transformer does not have to do much anymore, because your resource is already available in a Flux-compatible format (as a single layer OCI artifact). So your transformer only operates at the Kubernetes CR level.
The actual con is that we'd have to maintain the storage server (copied from Flux) AND, additionally, the OCI implementation from option 2, because that's what the transformer would still have to do.

fabianburth · 2025-02-11T14:45:21Z

docs/adr/artifacts.md

+- Using the `OCIRepository` control-loop would basically "clone" every blob from the OCI registry in FluxCD
+local storage (plain http server). 
+
+Using `OCIRepository` as intermediate `storage`-pointer CR is not an option as the control-loop of that resource would
+"clone" any OCI Registry blob to its own local storage.


The same point twice?

fabianburth · 2025-02-11T14:57:17Z

docs/adr/artifacts.md

+
+Cons:
+- Potentially longer implementation time, as it involves learning how to deploy, configure and operate a new registry
+- To support Docker images, the registry must be run in compatibility mode


The registry is supposed to store deployment descriptions (helm/kustomize/k8s manifests) and related files (for localization and configuration), so docker images should not really be an issue, right?
Or do we also plan that the users can use the registry to store the images for this environment (e.g. in combination with the replication controller)?

No such plans at the moment. So, not an issue. Just taking a note of a potential limitation for the future.

Co-authored-by: Fabian Burth <fabian.burth@sap.com>

frewilhelm force-pushed the adr_artifacts branch 2 times, most recently from 7fa56ce to 3797630 Compare January 10, 2025 09:10

frewilhelm force-pushed the adr_artifacts branch 2 times, most recently from f353463 to 6615438 Compare January 20, 2025 10:24

ikhandamirov mentioned this pull request Jan 8, 2025

Evaluate the existing ocm controller oci cache implementation #76

Closed

2 tasks

Skarlso reviewed Jan 21, 2025
