Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Single layer OCI Artifact ADR #92

Open
wants to merge 27 commits into
base: main
Choose a base branch
from
Open

Single layer OCI Artifact ADR #92

wants to merge 27 commits into from

Conversation

frewilhelm
Copy link
Contributor

Closes #90

@frewilhelm frewilhelm force-pushed the adr_artifacts branch 2 times, most recently from 7fa56ce to 3797630 Compare January 10, 2025 09:10
@frewilhelm frewilhelm force-pushed the adr_artifacts branch 2 times, most recently from f353463 to 6615438 Compare January 20, 2025 10:24
This `artifact` type was [defined][artifact-definition] to point to a URL and hold a human-readable identifier
`Revision` of a blob stored in a http-server inside the controller.

The `artifact` idea was part of a bigger [RFC][fluxcd-rfc] for `FluxCD` which was unfortunately rejected.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which was unfortunately rejected.

Technically, it wasn't rejected. The timeline was a lot longer than we would have liked. But it wasn't rejected. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. I'll adjust this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I thought the idea of a generic artifact-source was rejected, since this would mean new security audits for customers etc.pp

[`SnapshotStatus`][snapshot-status]
- (Conditions)
- LastReconciledDigest:
- Purpose?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

determine whether a new reconcilation is necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't a "normal" digest field be enough? I don't understand the prefix LastReconciled....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If digest is different than LastReconciledDigest it means that there was an error in between. Digest not always == LastReconciledDigest. LastReconciledDigest marks the last SUCCESSFULLY reconciled digest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see, thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

however, this field is part of the status. I would only expect successfully reconciled digests in the status. Honestly, I don't think this field is worth a discussion. We should store the digest in the status. The fieldname is irrelevant

Copy link
Contributor

@jakobmoellerdev jakobmoellerdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job here on the proposal overall. Im mostly concerned that some of the options seem to be presented with points I couldnt follow up on. Nevertheless Im assuming that taking over the snapshot implementation (with heavy adjustments) is an acceptable solution

hilmarf
hilmarf previously approved these changes Jan 28, 2025
transformation, they also can be used as a caching mechanism to reduce unnecessary calls to the source OCM registry._

[`SnapshotSpec`][snapshot-spec]
- Identity: OCM Identity (map[string]string) (Created by [constructIdentity()][snapshot-create-identity])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to store an OCM identity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understood it, the identity is used to create the OCI repository (==name) for the resource. Then, e.g. IsCached() is used to check if an OCI repository (with manifest) is already created or not (see code). However, I think that IsCached() only checks if the OCI repository exists using the identity. It does not validate if it is still the same. For that, we must compare the digest (which would mean that we have to download the resource again to calculate and compare the digest). I am not so happy about this^^.

@Skarlso did I get that right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait a second.. this does not explain why it is stored in the SnapshotSpec. Will take a look again

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In most cases the identity is used to generate the OCI repository name (again). However, there are some other cases, e.g.:

  • Get the ComponentResourceVersion and ComponentVersion from the snapshot (the functions have 0 calls)
  • The Mutation reconciler needs the identity. Speculation - I guess to get the snapshot object by identity for mutations (=localization/configuration)
  • The FluxDeployer uses the identity to check the HelmChartVersion

I will take a look while implementing the snapshot for v2 and try to omit it if possible

@ikhandamirov ikhandamirov marked this pull request as ready for review January 31, 2025 07:59
@ikhandamirov ikhandamirov requested a review from a team as a code owner January 31, 2025 07:59
Copy link
Contributor

@jakobmoellerdev jakobmoellerdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though I have some trouble with the argumentation in places this Design looks mostly good formally. The decision here can be accepted.

decided to not use a plain http-server but an internal OCI registry to store and publish its blobs that are produced by
the OCM controllers as single layer OCI artifacts.

Arguments (meeting notes from 26.11.2024):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Arguments (meeting notes from 26.11.2024):
Arguments:

the OCM controllers as single layer OCI artifacts.

Arguments (meeting notes from 26.11.2024):
- An OCI registry is a responsibility less to maintain
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only true for the development responsibility. One can argue that running an OCI registry in production is tricky too.

- (Conditions)
- LastReconciledDigest:
- Purpose?
- LastReconciledTag:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this field necessary (maybe similar discussion as above on LastReconciledDigest?)


#### Negative Consequences

- Requires a transformer that transforms the `snapshot` resource in something that, for example, FluxCDs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This transformation was the main reason the new controller architecture was developed in the first place. IMO you will need to further explain how a user is now expected to interact with the Artifact.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, I am now wondering how the e2e flow will look like. Having the OCIRepository indirection in there, how do we get users to look into the right CRs for debugging? How would a deployment look like (roughly)?

- Orientation on `snapshot` and `artifact` ease the implementation.

Cons:
- New implementation is required.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a real con because we have to change code anyways. IMO this should be the goto if the only disadvantage here is that we have to develop something, after all thats why we are here. Im looking for Design Cons here, not Effort Cons.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the true Con here is that it offers no benefit over the existing implementation

Cons:
- Potentially longer implementation time, as it involves learing how to deploy, configure and operate a new registry
- To support Docker images, the ergistry must be run in compartibility mode
- Bigger image size: 69 MB the minimal version and 208 MB the full version
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image size on a registry that is optionally deployed and used for development is not a real concern imo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is not a real concern

Sure. That is why the decision is still for zot. I'd keep the point in the document, because that is a fact. But in case you insist, I can also remove it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image size has nothing to do with zot or not. A fact that is irrelevant for a decision does not need to be tracked.

Its like saying "zot offers docker compatibility, but only with a flag", sure its a fact, but its not relevant to the decision.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your judgement about the relevance of certain pieces of information is based on assumptions, which are only known to you. Would be good, if you could share, what you think the decision criteria are, for choosing between zot and registry:2. Btw., in different talks with team members both the size and the compatibility flag were mentioned at least as potentially relevant.


Cons:
- We do not "control" the resource and issues caused by another OCI registry could be hard to fix/support
- Most people need to operate a registry than and the majority would not have experience maintaining a production grade stable oci registry as a service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not true, most people will look at integrating a registry of their choice (e.g. from a cloud provider) into their workflows instead of maintaining their own if given the choice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) Please see this comment by @jakobmoellerdev :
#92 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that most people need to operate a registry than and the majority would not have experience maintaining a production grade stable oci registry as a service, my assumption would be that this should not be the default and not advertised until we feel confident.

This is exactly the same point. If you tell people they need an OCI registry, they will look at a service and integrate that instead of maintaining their own...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, this is not about people not being able to run their own registry. Its the fact that they can just integrate any external registry of their choice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we talking past each other? The question in the document is, if we want to force the users to bring their own registry. Your initial argument (I btw. agree with) is that this is not convenient for most people, and we should offer a default registry. Now you are saying that most people wouldn't need a default, but "will look at integrating a registry of their choice".

Arguments (meeting notes from 26.11.2024):
- An OCI registry is a responsibility less to maintain
- Stop support at the level of the distribution spec of OCI
- OCI registries could provide GC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do they provide GC? AFAIK, the OCI GC is only for dangling blobs which would only be relevant for failed operations, right?

- An OCI registry is a responsibility less to maintain
- Stop support at the level of the distribution spec of OCI
- OCI registries could provide GC
- We will need an abstraction that handles OCI registries anyway
Copy link
Contributor

@fabianburth fabianburth Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- We will need an abstraction that handles OCI registries anyway
- We will need an abstraction that handles OCI registries anyway to convert resources into a flux consumable format (= single layer oci artifact)

Comment on lines +127 to +128
- Since there is a "real" blob in the storage, it should have a respective entity to represent it, e.g.
`artifact`/`snapshot`
Copy link
Contributor

@fabianburth fabianburth Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that statement alone can be used as an argument. What are the implications of not having the entity represented?

Pros:
- No additional custom resource needed.

Cons:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the loss of the extensibility of the the architecture provided by a common interface (artifact / snapshot) be listed as a con? I mean, we essentially tried to convince flux create separate cr's deliberately to allow this extensibility.

Cons:
- Implemented for a plain http-server and not for OCI registry (check
[storage implementation][controller-manager-storage]). Thus, missing dedicated control-loop.
- Would require a custom deployment of controller of FluxCD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this is not really true. Conceptually, you also only require a transformer (just as in option 2). The actual difference is that in option 2, your transformer does not have to do that much anymore, because your resource is already available in a flux compatible format (as a single layer oci artifact). So, your transformer is only on kubernetes cr level.
The actual con is that we'd have to maintain the storage server (copied from flux) AND additionally, the oci implementation from option 2, because that's what the transformer would have to do, still.

Comment on lines +171 to +175
- Using the `OCIRepository` control-loop would basically "clone" every blob from the OCI registry in FluxCD
local storage (plain http server).

Using `OCIRepository` as intermediate `storage`-pointer CR is not an option as the control-loop of that resource would
"clone" any OCI Registry blob to its own local storage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

twice the same point?


Cons:
- Potentially longer implementation time, as it involves learing how to deploy, configure and operate a new registry
- To support Docker images, the ergistry must be run in compartibility mode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The registry is supposed to store deployment descriptions (helm/kustomize/k8s manifests) and related files (for localization and configuration), so docker images should not really be an issue, right?
Or do we also plan that the users can use the registry to store the images for this environment (e.g. in combination with the replication controller)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No such plans at the moment. So, not an issue. Just taking a note of a potential limitation for the future.

ikhandamirov and others added 3 commits February 12, 2025 09:47
Co-authored-by: Fabian Burth <fabian.burth@sap.com>
Co-authored-by: Fabian Burth <fabian.burth@sap.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create ADR for Single Layer OCI Artifacts
6 participants