Single layer OCI Artifact ADR #92

Open
wants to merge 27 commits into
base: main
Commits
a0795d6
draft artifacts adr
frewilhelm Jan 9, 2025
80c24d0
update artifact discussion
frewilhelm Jan 10, 2025
20f18d1
add further options
frewilhelm Jan 10, 2025
042c8d0
clarify background for artifact ADR
frewilhelm Jan 13, 2025
c6a4915
Discuss pros and cons of FluxCDs OCIRepository
frewilhelm Jan 16, 2025
99248ae
add points to OCI registry discussion
frewilhelm Jan 17, 2025
db9f75c
another disadvantage for OCIRepository as artifact replacement
frewilhelm Jan 17, 2025
afc4d25
draft picture of controller process
frewilhelm Jan 17, 2025
f4c5174
discuss options for OCI Artifact
frewilhelm Jan 20, 2025
8f8ce35
replace png image with svg
frewilhelm Jan 20, 2025
97778e8
add new architecture diagram for ocm-controllers and fluxcd
frewilhelm Jan 20, 2025
f0539ff
update artifact ADR
frewilhelm Jan 22, 2025
66ca1dc
remove architecture picture since it is not scope of this PR
frewilhelm Jan 22, 2025
9c942f0
state preliminary decision on artifact resource
frewilhelm Jan 22, 2025
86ad703
add some context to the discussion
frewilhelm Jan 22, 2025
639d6b5
incorporate feedback
frewilhelm Jan 23, 2025
f27e785
add 'list of approvers' to the template
frewilhelm Jan 23, 2025
530cad8
adjust ADR to our template
frewilhelm Jan 24, 2025
3587e1e
more on registry options
ikhandamirov Jan 24, 2025
27c19ad
resolve discussion about the artifact RFC
frewilhelm Jan 24, 2025
60f1d27
finalize registry chapter
ikhandamirov Jan 30, 2025
0f184af
typo
ikhandamirov Jan 30, 2025
609b612
typo
ikhandamirov Jan 30, 2025
5c008db
Merge branch 'main' into adr_artifacts
ikhandamirov Jan 30, 2025
4e8bc54
Update docs/adr/artifacts.md
ikhandamirov Feb 12, 2025
2e4e0e6
Update docs/adr/artifacts.md
ikhandamirov Feb 12, 2025
0246ac2
Update artifacts.md
ikhandamirov Feb 12, 2025
271 changes: 271 additions & 0 deletions docs/adr/artifacts.md
@@ -0,0 +1,271 @@
# Use Single Layer OCI Artifacts for (intermediate) blobs

* Status: proposed
* Deciders: @frewilhelm @ikhandamirov
* Approvers:

Technical Story: https://github.com/open-component-model/ocm-project/issues/333

## Context and Problem Statement

The controllers in this repository create artifacts/blobs that are used by one another. For example, the
component-controller creates an artifact containing the component descriptors from the specified component version.
Finally, the resource controller, or if specified the configuration controller, creates a blob as an artifact that holds
the resource that is consumed by the deployers.

Initially, it was planned to use a Custom Resource `artifact` type to represent these artifacts.
This `artifact` type was [defined][artifact-definition] to point to a URL and hold a "human-readable" identifier
`Revision` of a blob stored in an HTTP server inside the controller.

The `artifact` idea was part of a bigger [RFC][fluxcd-rfc] for `FluxCD`. Unfortunately, the change would be difficult
to communicate and could prompt security audits on the side of FluxCD's customers. Thus, the proposal was not acceptable
in the given format due to the differences in how the `artifact` resource would be watched. This was tantamount to a rejection.

Therefore, the original purpose of the Custom Resource `artifact` no longer exists. Additionally, the team
decided not to use a plain HTTP server but an internal OCI registry to store and publish the blobs produced by
the OCM controllers as single layer OCI artifacts.

Arguments (meeting notes from 26.11.2024):
Contributor

Suggested change
Arguments (meeting notes from 26.11.2024):
Arguments:

- An OCI registry is one less responsibility to maintain
Contributor

This is only true for the development responsibility. One can argue that running an OCI registry in production is tricky too.

- Stop support at the level of the distribution spec of OCI
- OCI registries could provide GC
Contributor

How do they provide GC? AFAIK, the OCI GC is only for dangling blobs which would only be relevant for failed operations, right?

- We will need an abstraction that handles OCI registries anyway
Contributor

@fabianburth, Feb 11, 2025

Suggested change
- We will need an abstraction that handles OCI registries anyway
- We will need an abstraction that handles OCI registries anyway to convert resources into a flux consumable format (= single layer oci artifact)


The following discussion concerns two major topics:
- How to store and reference the single layer OCI artifacts.
- How to set up the internal OCI registry and which one to use.

## Decision Drivers

- Reduce maintenance effort
- Fit into our use-cases (especially with FluxCD)

## Artifact (How to store and reference the single layer OCI artifacts)

An artifact in the current context describes a resource that holds the identity of a blob and a pointer to where the
blob can be found (currently a URL). In that sense, a producer can create an artifact to store this information, and a
consumer can search for artifacts with a specific identity to find out the blob's location.
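
The producer/consumer relationship can be sketched roughly as follows. This is only an illustration: the `Artifact`, `Index`, `Publish`, and `Locate` names, the identity keys, and the example URL are hypothetical and not taken from any of the referenced repositories. In a cluster, the index would be the Kubernetes API server rather than an in-memory map.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Artifact is a hypothetical, simplified stand-in for the resource described
// above: an identity for a blob plus a pointer to where the blob lives.
type Artifact struct {
	Identity map[string]string
	URL      string
}

// identityKey turns an identity map into a stable string, so that two maps
// with the same entries yield the same key regardless of iteration order.
// Note: this naive encoding assumes keys and values contain no "=" or ",".
func identityKey(identity map[string]string) string {
	keys := make([]string, 0, len(identity))
	for k := range identity {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+identity[k])
	}
	return strings.Join(parts, ",")
}

// Index is an in-memory stand-in for whatever stores the artifacts.
type Index map[string]Artifact

// Publish is what a producer does: record where the blob can be found.
func (i Index) Publish(a Artifact) {
	i[identityKey(a.Identity)] = a
}

// Locate is what a consumer does: look up the blob location by identity.
func (i Index) Locate(identity map[string]string) (string, bool) {
	a, ok := i[identityKey(identity)]
	return a.URL, ok
}

func main() {
	idx := Index{}
	idx.Publish(Artifact{
		Identity: map[string]string{"component": "example.com/app", "version": "1.0.0"},
		URL:      "http://storage.example.internal/blobs/app-1.0.0.tar.gz",
	})

	// The consumer only knows the identity, not the location.
	url, ok := idx.Locate(map[string]string{"version": "1.0.0", "component": "example.com/app"})
	fmt.Println(url, ok)
}
```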

In the current implementation the artifact is defined in the [openfluxcd/artifact repository][artifact-definition].

The ocm-controller `v1` implementation defined a `snapshot` type that serves similar purposes.
Its definition can be found in [open-component-model/ocm-controller][snapshot-definition].

### Comparison `artifact` vs `snapshot`

To enable the following option discussion, the fields of the CRs `artifact` and `snapshot` are compared:

#### Snapshot

From [ocm-controller v1 Architecture][ocm-controller-v1-architecture]:
_snapshots are immutable, Flux-compatible, single layer OCI images containing a single OCM resource.
Snapshots are stored in an in-cluster registry and in addition to making component resources accessible for
transformation, they also can be used as a caching mechanism to reduce unnecessary calls to the source OCM registry._
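
For orientation, a "single layer OCI image" can be pictured as an OCI image manifest whose `layers` list has exactly one entry. The following sketch builds such a manifest with the Go standard library only. The field names follow the OCI image manifest format; the helper names, the choice of an empty config blob, and the layer media type are assumptions for illustration and do not claim to match what the ocm-controller actually writes.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Descriptor and Manifest use the JSON field names of the OCI image manifest.
type Descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

type Manifest struct {
	SchemaVersion int          `json:"schemaVersion"`
	MediaType     string       `json:"mediaType"`
	Config        Descriptor   `json:"config"`
	Layers        []Descriptor `json:"layers"`
}

// describe computes the content-addressed descriptor for a blob.
func describe(mediaType string, content []byte) Descriptor {
	sum := sha256.Sum256(content)
	return Descriptor{
		MediaType: mediaType,
		Digest:    "sha256:" + hex.EncodeToString(sum[:]),
		Size:      int64(len(content)),
	}
}

// singleLayerManifest wraps one blob as the only layer of a manifest.
// Assumption: the config is the empty JSON object "{}".
func singleLayerManifest(layerMediaType string, blob []byte) Manifest {
	return Manifest{
		SchemaVersion: 2,
		MediaType:     "application/vnd.oci.image.manifest.v1+json",
		Config:        describe("application/vnd.oci.empty.v1+json", []byte("{}")),
		Layers:        []Descriptor{describe(layerMediaType, blob)},
	}
}

func main() {
	// The blob here is placeholder content, not a real gzipped tarball.
	m := singleLayerManifest("application/vnd.oci.image.layer.v1.tar+gzip", []byte("resource content"))
	out, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because the layer descriptor carries the digest of the blob, a consumer that holds the manifest can verify the single layer without any extra metadata.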

[`SnapshotSpec`][snapshot-spec]
- Identity: OCM Identity (map[string]string) (Created by [constructIdentity()][snapshot-create-identity])
Contributor

Why do we need to store an OCM identity?

Contributor Author

As far as I understood it, the identity is used to create the OCI repository (==name) for the resource. Then, e.g. IsCached() is used to check if an OCI repository (with manifest) is already created or not (see code). However, I think that IsCached() only checks if the OCI repository exists using the identity. It does not validate if it is still the same. For that, we must compare the digest (which would mean that we have to download the resource again to calculate and compare the digest). I am not so happy about this^^.

@Skarlso did I get that right?

Contributor Author

wait a second.. this does not explain why it is stored in the SnapshotSpec. Will take a look again

Contributor Author

In most cases the identity is used to generate the OCI repository name (again). However, there are some other cases, e.g.:

  • Get the ComponentResourceVersion and ComponentVersion from the snapshot (the functions have 0 calls)
  • The Mutation reconciler needs the identity. Speculation - I guess to get the snapshot object by identity for mutations (=localization/configuration)
  • The FluxDeployer uses the identity to check the HelmChartVersion

I will take a look while implementing the snapshot for v2 and try to omit it if possible

- Digest: OCI Layer Digest (Based on [go-containerregistry OCI implementation][go-containerregistry-digest])
- Tag: The version (e.g. `latest`, `v1.0.0`, ..., see [reference][snapshot-version-ref])
- (Suspend)

[`SnapshotStatus`][snapshot-status]
- (Conditions)
- LastReconciledDigest:
- Purpose?
Contributor

determine whether a new reconciliation is necessary?

Contributor Author

Wouldn't a "normal" digest field be enough? I don't understand the prefix LastReconciled....

Contributor

If digest is different than LastReconciledDigest it means that there was an error in between. Digest not always == LastReconciledDigest. LastReconciledDigest marks the last SUCCESSFULLY reconciled digest.

Contributor Author

i see, thank you!

Contributor Author

however, this field is part of the status. I would only expect successfully reconciled digests in the status. Honestly, I don't think this field is worth a discussion. We should store the digest in the status. The fieldname is irrelevant

- LastReconciledTag:
Contributor

Why is this field necessary (maybe similar discussion as above on LastReconciledDigest?)

- Purpose?
- RepositoryURL: Concrete URL pointing to the local registry including the service name
- (ObservedGeneration)
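
The snapshot fields listed above can be summarized in a simplified Go sketch. The struct definitions are reduced mirrors of the listed fields (the real types in ocm-controller use Kubernetes API types and also carry conditions); `upToDate` is a hypothetical helper that reflects one reading from the review discussion, namely that `LastReconciledDigest` records the last successfully reconciled digest.

```go
package main

import "fmt"

// SnapshotSpec is a simplified mirror of the spec fields listed above.
type SnapshotSpec struct {
	Identity map[string]string // OCM identity
	Digest   string            // OCI layer digest
	Tag      string            // version, e.g. "latest" or "v1.0.0"
	Suspend  bool
}

// SnapshotStatus is a simplified mirror of the status fields listed above
// (conditions omitted).
type SnapshotStatus struct {
	LastReconciledDigest string
	LastReconciledTag    string
	RepositoryURL        string
	ObservedGeneration   int64
}

type Snapshot struct {
	Spec   SnapshotSpec
	Status SnapshotStatus
}

// upToDate is a hypothetical helper: it reports whether the digest and tag in
// the spec match what was last successfully reconciled.
func upToDate(s Snapshot) bool {
	return s.Spec.Digest != "" &&
		s.Spec.Digest == s.Status.LastReconciledDigest &&
		s.Spec.Tag == s.Status.LastReconciledTag
}

func main() {
	s := Snapshot{Spec: SnapshotSpec{
		Identity: map[string]string{"name": "example-resource"},
		Digest:   "sha256:aaa",
		Tag:      "v1.0.0",
	}}
	fmt.Println(upToDate(s)) // nothing reconciled yet

	// After a successful reconciliation the status catches up with the spec.
	s.Status.LastReconciledDigest = s.Spec.Digest
	s.Status.LastReconciledTag = s.Spec.Tag
	fmt.Println(upToDate(s))
}
```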

#### Artifact

[`ArtifactSpec`][artifact-spec]
- URL: HTTP address of the artifact as exposed by the controller managing the source
- Revision: "Human-readable" identifier traceable in the origin source system (commit SHA, tag, version, ...)
- Digest: Digest of the file that is stored (algo:checksum)
- Used to verify the artifact (see [artifact-digest-verify-ref][artifact-digest-verify-ref])
- LastUpdateTime: Timestamp of the last update of the artifact
- Size: Number of bytes in the file (decide beforehand on how to download the files)
- Metadata: Holds upstream information, e.g. OCI annotations (as map[string]string)

[`ArtifactStatus`][artifact-status]
- No fields
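
To illustrate how a digest in `algo:checksum` form can be used to verify an artifact, here is a minimal sketch. It is not the implementation linked above; the `verifyDigest` name is hypothetical, and supporting only `sha256` is an assumption made to keep the example short.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// verifyDigest checks content against a digest of the form "algo:checksum".
// Assumption for this sketch: only sha256 is supported.
func verifyDigest(digest string, content []byte) error {
	algo, want, ok := strings.Cut(digest, ":")
	if !ok {
		return fmt.Errorf("digest %q is not in algo:checksum form", digest)
	}
	if algo != "sha256" {
		return fmt.Errorf("unsupported digest algorithm %q", algo)
	}
	sum := sha256.Sum256(content)
	if got := hex.EncodeToString(sum[:]); got != want {
		return fmt.Errorf("digest mismatch: got %s, want %s", got, want)
	}
	return nil
}

func main() {
	content := []byte("abc")
	digest := "sha256:ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"

	// Matching content verifies; tampered content does not.
	fmt.Println(verifyDigest(digest, content))
	fmt.Println(verifyDigest(digest, []byte("abd")) != nil)
}
```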

### Considered Options

* Option 1: Omit the `artifact`/`snapshot` concept
* Option 2: Use the `snapshot` implementation
* Option 3: Use the `artifact` implementation
* Option 4: Use `OCIRepository` implementation from `FluxCD`
* Option 5: Create a new custom resource

### Decision Outcome

Chosen option: "Option 2: Use the `snapshot` implementation", because it is already implemented in the
`ocm-controllers` v1 and fits our use-cases most.

#### Positive Consequences

- Most of the functionality is already implemented, can be copied, and adjusted/refactored to our design.

#### Negative Consequences

- Requires a transformer that transforms the `snapshot` resource into something that, for example, FluxCD's
Contributor

This transformation was the main reason the new controller architecture was developed in the first place. IMO you will need to further explain how a user is now expected to interact with the Artifact.

Contributor

For example, I am now wondering how the e2e flow will look like. Having the OCIRepository indirection in there, how do we get users to look into the right CRs for debugging? How would a deployment look like (roughly)?

`source-controller` can consume. FluxCD's `OCIRepository` resource seems well suited for this.

### Pros and Cons of the Options

#### Option 1: Omit the `artifact`/`snapshot` concept

Instead of using an intermediate Custom Resource such as `artifact` or `snapshot`, the status of the source
resource that creates the blob could itself point to the location of that blob.

Pros:
- No additional custom resource needed.

Cons:
Contributor

Shouldn't the loss of the extensibility of the architecture provided by a common interface (artifact / snapshot) be listed as a con? I mean, we essentially tried to convince flux to create separate CRs deliberately to allow this extensibility.

- Since there is a "real" blob in the storage, it should have a respective entity to represent it, e.g.
`artifact`/`snapshot`
Comment on lines +127 to +128
Contributor

@fabianburth, Feb 11, 2025

I don't think that statement alone can be used as an argument. What are the implications of not having the entity represented?


#### Option 2: Use the `snapshot` implementation

Pros:
- Already implemented (and probably tested).
- Implemented for an OCI registry

Cons:
- Requires a transformer to make the artifacts consumable by FluxCD's Helm- and Kustomize-Controller, e.g. by using
FluxCD's `source-controller` and its CR `OCIRepository`.
- Implemented in `open-component-model/ocm-controller`, which will be archived when the `ocm-controller` v2 goes into
production. Thus, the `snapshot` implementation must be copied into this repository.

#### Option 3: Use the `artifact` implementation

Pros:
- Already implemented (and a bit tested)
- Rather easy and simple

Cons:
- Implemented for a plain HTTP server and not for an OCI registry (check
[storage implementation][controller-manager-storage]). Thus, a dedicated control-loop is missing.
- Would require a custom deployment of the FluxCD controllers
Contributor

Well, this is not really true. Conceptually, you also only require a transformer (just as in option 2). The actual difference is that in option 2, your transformer does not have to do that much anymore, because your resource is already available in a flux compatible format (as a single layer oci artifact). So, your transformer is only on kubernetes cr level.
The actual con is that we'd have to maintain the storage server (copied from flux) AND additionally, the oci implementation from option 2, because that's what the transformer would have to do, still.


Basically rejected because we could only use the `Artifact` type definition and not the implementation for the storage.

#### Option 4: Use `OCIRepository` implementation from `FluxCD`

See [definition][oci-repository-type]. The type is part of FluxCD's `source-controller`, which also
provides a control-loop for that resource.

Pros:
- No transformer needed for `FluxCD`'s consumers, the Helm- and Kustomize-Controller
- Control-loop for `OCIRepository` is already implemented
- `OCIRepository` is an integration point with Flux and Argo

Cons:
- Integrating FluxCD's `source-controller` would create a hard dependency on that repository. It would be mandatory
to deploy the `source-controller`
- It is not possible to start the `source-controller` and only watch the `OCIRepository` type. It would
start all other control-loops for `kustomize`, `helm`, `git`, and more objects. This seems like
overkill.
- Using the `OCIRepository` control-loop would basically "clone" every blob from the OCI registry into FluxCD's
local storage (plain HTTP server).

Using `OCIRepository` as intermediate `storage`-pointer CR is not an option as the control-loop of that resource would
"clone" any OCI Registry blob to its own local storage.
Comment on lines +171 to +175
Contributor

twice the same point?


#### Option 5: Create a new custom resource

Pros:
- Greenfield approach.
- Orienting on `snapshot` and `artifact` eases the implementation.

Cons:
- New implementation is required.
Contributor

This is not a real con because we have to change code anyway. IMO this should be the go-to if the only disadvantage here is that we have to develop something; after all, that's why we are here. I'm looking for Design Cons here, not Effort Cons.

Contributor

I think the true Con here is that it offers no benefit over the existing implementation


Creating a new custom resource seems like overkill, considering that the `snapshot` implementation covers many of
our use-cases. Thus, it seems more reasonable to go with the `snapshot` implementation and adjust/refactor it.


## (Internal) OCI Registry

This in-cluster HTTPS-based registry is used by the OCM controllers to store resources locally. It should never be accessible from outside and is thus transparent to the users of the OCM controllers. At the same time, the registry is accessible to Flux, running in the same cluster.


### Considered Options

* Option 1: Let the user provide a registry that is OCI compliant
* Option 2: Deploy an OCI image registry with our controllers
* Option 2.1: Use implementation from ocm-controllers v1
* Option 2.2: Use [`zot`](https://github.com/project-zot/zot)


### Decision Outcome

Chosen option:
- Option 2.2, i.e. the decision is to use `zot` as the in-cluster OCI registry for OCM controllers
- Once there is an installer for the OCM controllers, it should allow users to configure their own registry, either in-cluster or external, instead of the embedded `zot`

To select the registry, no comprehensive benchmarking tests have been performed. The decision is based on the impression that `zot` is currently more actively maintained and will incorporate innovation faster. The registry comes with an [extensive feature set](https://zotregistry.dev/v2.1.2/general/features/), sufficient for the OCM controllers' use case. Initial tests have shown that the OCM controllers are able to work with `zot`.


### Pros and Cons of the Options

#### Option 1: Let the user provide a registry that is OCI compliant

Pros:
- Not our responsibility
- Users can customize their OCI registry like they want

Cons:
- We offer full support for `zot` only. Using a different OCI registry would be at the OCM controller user's own risk.
- Most users would then need to operate a registry, and the majority would not have experience maintaining a production-grade, stable OCI registry as a service
- Giving users the possibility to provide/configure their own registry does not eliminate the need to provide a default registry (option 2), especially for those users who do not want to customize their own registry.

#### Option 2: Deploy an OCI image registry with our controllers

Pros:
- Simplifies deployment choices and stability guarantees for us.

##### Option 2.1: Use implementation from ocm-controllers v1 ([distribution registry](https://github.com/distribution/distribution))

Pros:
- Faster implementation time, as deployment can be copied from v1 implementation
- Mature technology (almost legacy)
- Smaller image size (25 MB)

Cons:
- Seldom releases (latest stable from October 2, 2023)

##### Option 2.2: Use [`zot`](https://github.com/project-zot/zot)

Pros:
- Newer technology, focusing on embedding into other products, inline garbage collection and storage deduplication
- Nice documentation
- FluxCD team mentioned (verbally) that they want to use a `zot` OCI registry in the future (though no 100% guarantee or any evidence that they started working on this so far)
- Being actively maintained (several stable releases per year)

Cons:
- Potentially longer implementation time, as it involves learning how to deploy, configure and operate a new registry
- To support Docker images, the registry must be run in compatibility mode
- Bigger image size: 69 MB for the minimal version and 208 MB for the full version
Contributor

Image size on a registry that is optionally deployed and used for development is not a real concern imo

Contributor

> is not a real concern

Sure. That is why the decision is still for zot. I'd keep the point in the document, because that is a fact. But in case you insist, I can also remove it.

Contributor

Image size has nothing to do with zot or not. A fact that is irrelevant for a decision does not need to be tracked.

It's like saying "zot offers docker compatibility, but only with a flag": sure, it's a fact, but it's not relevant to the decision.

Contributor

Your judgement about the relevance of certain pieces of information is based on assumptions, which are only known to you. Would be good, if you could share, what you think the decision criteria are, for choosing between zot and registry:2. Btw., in different talks with team members both the size and the compatibility flag were mentioned at least as potentially relevant.


## Links
- Epic [#75](https://github.com/open-component-model/ocm-k8s-toolkit/issues/75)
- Issue [#90](https://github.com/open-component-model/ocm-k8s-toolkit/issues/90)

[artifact-definition]: https://github.com/openfluxcd/artifact/blob/d9db932260eb5f847737bcae3589b653398780ae/api/v1alpha1/artifact_types.go#L30
[fluxcd-rfc]: https://github.com/fluxcd/flux2/discussions/5058
[snapshot-definition]: https://github.com/open-component-model/ocm-controller/blob/8588071a05532abd28916931963f88b16622e44d/api/v1alpha1/snapshot_types.go#L22
[ocm-controller-v1-architecture]: https://github.com/open-component-model/ocm-controller/blob/8588071a05532abd28916931963f88b16622e44d/docs/architecture.md
[snapshot-spec]: https://github.com/open-component-model/ocm-controller/blob/8588071a05532abd28916931963f88b16622e44d/api/v1alpha1/snapshot_types.go#L22
[snapshot-status]: https://github.com/open-component-model/ocm-controller/blob/8588071a05532abd28916931963f88b16622e44d/api/v1alpha1/snapshot_types.go#L35
[artifact-spec]: https://github.com/openfluxcd/artifact/blob/d9db932260eb5f847737bcae3589b653398780ae/api/v1alpha1/artifact_types.go#L30
[artifact-status]: https://github.com/openfluxcd/artifact/blob/d9db932260eb5f847737bcae3589b653398780ae/api/v1alpha1/artifact_types.go#L62
[go-containerregistry-digest]: https://github.com/google/go-containerregistry/blob/6bce25ecf0297c1aa9072bc665b5cf58d53e1c54/pkg/v1/manifest.go#L47
[snapshot-version-ref]: https://github.com/open-component-model/ocm-controller/blob/8588071a05532abd28916931963f88b16622e44d/controllers/resource_controller.go#L212
[snapshot-create-identity]: https://github.com/open-component-model/ocm-controller/blob/8588071a05532abd28916931963f88b16622e44d/controllers/resource_controller.go#L287
[artifact-digest-verify-ref]: https://github.com/openfluxcd/controller-manager/blob/d83030b764ab4f143d4b9a815227ad3cdfd9433f/storage/storage.go#L478
[oci-repository-type]: https://github.com/fluxcd/source-controller/blob/529eee0ed1afc6063acd9750aa598d90ae3399ed/api/v1beta2/ocirepository_types.go#L296
[controller-manager-storage]: https://github.com/openfluxcd/controller-manager/blob/d83030b764ab4f143d4b9a815227ad3cdfd9433f/storage/storage.go
[watch-resource-controller]: https://github.com/open-component-model/ocm-k8s-toolkit/blob/108ac97815258cef41cf8f340c99b45f7bdd5023/internal/controller/resource/resource_controller.go#L86
1 change: 1 addition & 0 deletions docs/adr/template.md
@@ -2,6 +2,7 @@

* Status: [proposed | rejected | accepted | deprecated | … | superseded by [ADR-0005](0005-example.md)] <!-- optional -->
* Deciders: [list everyone involved in the decision] <!-- optional -->
* Approvers: [list everyone that approved the decision] <!-- optional -->
* Date: [YYYY-MM-DD when the decision was last updated] <!-- optional -->

Technical Story: [description | ticket/issue URL] <!-- optional -->