refactor(RELEASE-1560): update 'publish-to-cgw' script #437

seanconroy2021 · 2025-05-22T11:38:50Z

Update the script to be used in the push-artifacts-to-cdn-task.

Now supports processing multiple components in a single run.
Data is now passed via the --data_json argument instead of file input.
Added support for --dry_run to simulate execution without making API calls.
Updated to support the new components.files and components.contentGateway structure.
Metadata generation is now based on filenames defined in the component data.
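
A minimal sketch of how the command-line interface described above might be wired up, assuming `argparse`. The three flags (`--cgw_host`, `--data_json`, `--dry_run`) appear in this conversation; the function name `parse_args`, the help strings, and the choice to parse the JSON string immediately are assumptions for illustration, not the script's actual code.

```python
import argparse
import json


def parse_args(argv=None):
    """Parse the flags mentioned in this PR (illustrative only)."""
    parser = argparse.ArgumentParser(description="publish-to-cgw wrapper (sketch)")
    parser.add_argument("--cgw_host", required=True, help="Content Gateway admin API URL")
    parser.add_argument("--data_json", required=True, help="JSON string with the component data")
    parser.add_argument("--dry_run", action="store_true", help="Simulate execution without API calls")
    args = parser.parse_args(argv)
    # Assumption: the JSON is passed inline as a string, not as a file path.
    args.data = json.loads(args.data_json)
    return args


if __name__ == "__main__":
    demo = parse_args([
        "--cgw_host", "https://cgw.example.invalid",  # placeholder host
        "--data_json", '{"components": [{"name": "a"}, {"name": "b"}]}',
        "--dry_run",
    ])
    print(len(demo.data["components"]), demo.dry_run)
```

Passing the data inline keeps a single invocation able to carry several components at once, which is the main behavioural change listed above.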

publish-to-cgw-wrapper/publish_to_cgw_wrapper.py

johnbieren

Seems sane to me, maybe we can get @swickersh or @pkhander to take a look before merging?

swickersh · 2025-06-03T12:54:44Z

publish-to-cgw-wrapper/test_publish_to_cgw_integration.py

+                        "productName": "product_name_1",
+                        "productCode": "product_code_1",
+                        "productVersionName": "1.1",
+                        "components": [


Hi,
Same comment as your PR in the catalog:
konflux-ci/release-service-catalog#975 (review)

@swickersh can I get your review here again ? :)

swickersh · 2025-06-16T13:58:14Z

publish-to-cgw-wrapper/publish_to_cgw_wrapper.py

-8. Writes the final result, including processed, created, and skipped files, to a JSON file.
-9. Outputs the path of the generated result.json file to an output file.
+1. Reads a JSON snapshot containing data that has been injected with contentGateway,
+   files and contentDir.


Where/how is contentDir injected into the data that this script takes as input? Is the plan for the release-service-catalog task to do that before invoking this?

Yeah, that's the plan: the contentDir is injected before calling publish-to-cgw.
https://github.com/konflux-ci/release-service-catalog/pull/975/files#diff-5b8756a69e7d7bf343436372e3a297add65fb59fe212ee781a3b91993fd62f14R1006

ack, thanks

Oh actually I have more follow up questions about this. Looking at your other PR, you only inject the contentDir to the snapshot. Why use the snapshot at all?
Doesn't the data_json have everything you need? This script looks like it expects data from the snapshot that doesn't exist at the moment (like the contentGateway section and files). Or am I misunderstanding this?

So it's a bit confusing, but we have the managed pipeline here, which then calls the internal pipeline to run here

So, in the managed pipeline, we have a task called apply-mapping, which injects all of the mapping into the snapshot.json here before calling the push-artifacts-to-cdn task. The reason for this is because we only pass in a snapshot.json into the internal pipeline here

That is the reason for using snapshot.json over data.json. I was confused myself when I first started the ticket.

Ah, now that you say that I recall seeing this before. Thanks for the clarification.

swickersh · 2025-06-16T14:02:19Z

publish-to-cgw-wrapper/publish_to_cgw_wrapper.py

    """
-    shortURL_base = "/pub/"
-    if mirror_openshift_Push:


This logic needs to stay.
If the RPA contains contentGateway.mirrorOpenshiftPush = true then the shortURL needs to have the /pub/cgw like shown here.
The documentation is a little out of date because of our current efforts to update the structure, but the description should be accurate if you're curious. It defaults to false:

https://konflux.pages.redhat.com/docs/users/releasing/releasing-to-developer-portal.html#:~:text=customer%20facing%20description-,mirrorOpenshiftPush,-%3A%20Optional%20Default

I added the line to our google doc with CGW RPA data just now to help avoid confusion. Sorry it wasn't already present in that doc.

Will update it no problem.

great, thanks!
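
For readers following along, here is a rough sketch of the shortURL rule discussed in this thread. It assumes the default base is `/cgw` (as seen in the dry-run output later in this conversation) and `/pub/cgw` when `mirrorOpenshiftPush` is true. The function name `build_short_url` and the way spaces are stripped from the product name are assumptions inferred from that output, not the script's confirmed implementation.

```python
def build_short_url(product_name, version_name, file_name, mirror_openshift_push=False):
    """Build a shortURL for one file (illustrative sketch)."""
    # Assumption: "/pub/cgw" when mirrorOpenshiftPush is true, "/cgw" otherwise.
    base = "/pub/cgw" if mirror_openshift_push else "/cgw"
    # Assumption: spaces are dropped from the product name,
    # e.g. "Releng Test Product" -> "RelengTestProduct".
    product_path = product_name.replace(" ", "")
    return f"{base}/{product_path}/{version_name}/{file_name}"


if __name__ == "__main__":
    print(build_short_url("Releng Test Product", "1.6.0", "example.gz"))
    print(build_short_url("Releng Test Product", "1.6.0", "example.gz", mirror_openshift_push=True))
```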

swickersh · 2025-06-17T15:02:29Z

publish-to-cgw-wrapper/publish_to_cgw_wrapper.py

+    errors = []
+    valid_components = []
+
+    components = data.get("components")


Aren't components nested under spec?
should this be: components = data.get("spec", {}).get("components")

For the actual snapshot, yes, but release runs a task called reduce-snapshot (example PipelineRun: https://console-openshift-console.apps.stone-prd-rh01.pg1f.p1.openshiftapps.com/k8s/ns/rhtap-releng-tenant/tekton.dev~v1~PipelineRun/managed-pkg6v/logs?taskName=reduce-snapshot), so at the end we just have application & components.

Ah, I didn't realize that! Makes sense, thanks.
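
To make the two snapshot shapes concrete, here is a small sketch. Per the thread, the script only needs the top-level `components` key because reduce-snapshot has already flattened the data; the fallback to `spec.components` below is purely illustrative, and `get_components` is a hypothetical helper name.

```python
def get_components(data):
    """Return the component list from a reduced or a full snapshot (sketch)."""
    # Reduced snapshot (after reduce-snapshot): components at the top level.
    components = data.get("components")
    if components is None:
        # Full snapshot: components nested under spec (illustrative fallback).
        components = data.get("spec", {}).get("components")
    return components or []


if __name__ == "__main__":
    reduced = {"application": "app", "components": [{"name": "a"}]}
    full = {"spec": {"application": "app", "components": [{"name": "a"}]}}
    print(get_components(reduced) == get_components(full))
```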

swickersh

I ran the script twice against a snapshot I have. First time the files were already present and the rollback functionality worked as expected.

2025-06-17 12:34:30,864 - WARNING - Rolling back created files due to failure (productId: 4010373, productVersionId: 4156167)
2025-06-17 12:34:40,357 - ERROR - Error processing component 1 (productName: Releng Test Product, productVersionName: 1.6.0): Failed to create file: API call failed: File is already present in the system!

I removed the files manually from CGW and ran the script again and it re-added them no problem. Verified they exist in dev portal

#omitted output
2025-06-17 12:51:25,444 - INFO - All files processed successfully.

This PR LGTM.
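
The rollback behaviour seen in the first run can be sketched roughly as follows. Everything here is hypothetical: `FakeCgwClient` is an in-memory stand-in rather than the real Content Gateway API, and `create_files_with_rollback` is an illustrative name. Only the error text is taken from the log above.

```python
class FakeCgwClient:
    """In-memory stand-in for the Content Gateway API (hypothetical)."""

    def __init__(self, existing=()):
        self.files = {}  # file id -> label
        self._next_id = 1
        for label in existing:
            self.create_file(label)

    def create_file(self, label):
        if label in self.files.values():
            raise RuntimeError("File is already present in the system!")
        file_id = self._next_id
        self._next_id += 1
        self.files[file_id] = label
        return file_id

    def delete_file(self, file_id):
        self.files.pop(file_id, None)


def create_files_with_rollback(client, labels):
    """Create every file; if one fails, delete those created in this run."""
    created = []
    try:
        for label in labels:
            created.append(client.create_file(label))
    except RuntimeError:
        # Undo in reverse order so the system ends up as it started.
        for file_id in reversed(created):
            client.delete_file(file_id)
        raise
    return created
```

With a client that already holds one of the files, the call raises and leaves only the pre-existing file behind, which matches the "rolling back created files due to failure" warning in spirit.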

swickersh · 2025-06-17T16:04:36Z

publish-to-cgw-wrapper/publish_to_cgw_wrapper.py

+    logging.info(
+        f"Validation summary: {len(components)} total components {len(valid_components)}"
+        f"valid components, {len(components) - len(valid_components)} skipped components"
+    )


Your f-strings are getting concatenated without a separator, so the output looks like this: INFO - Validation summary: 1 total components 1valid components, 0 skipped components

Suggested change

-    logging.info(
-        f"Validation summary: {len(components)} total components {len(valid_components)}"
-        f"valid components, {len(components) - len(valid_components)} skipped components"
-    )
+    logging.info(
+        f"Validation summary: {len(components)} total components, "
+        f"{len(valid_components)} valid components, "
+        f"{len(components) - len(valid_components)} skipped components"
+    )
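
A tiny standalone reproduction of the issue, in case it helps: Python joins adjacent string literals with nothing in between, so any separator has to live inside one of the literals. The variable names `total` and `valid` are placeholders for the `len(...)` calls in the diff.

```python
total, valid = 1, 1

# Adjacent literals are joined as-is: no space or comma is inserted.
broken = (
    f"Validation summary: {total} total components {valid}"
    f"valid components, {total - valid} skipped components"
)

# Ending each literal with ", " gives the intended message.
fixed = (
    f"Validation summary: {total} total components, "
    f"{valid} valid components, "
    f"{total - valid} skipped components"
)

print(broken)
print(fixed)
```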

swickersh · 2025-06-17T16:30:16Z

publish-to-cgw-wrapper/publish_to_cgw_wrapper.py

+    componentName = component["name"]
+    files = component["files"]
+
+    if dry_run:


Could dry-runs still provide real product and version IDs?

I just did a dry run to test this script locally and it's a little confusing to have random ids.
If it's not practical to query the API, then I'd almost rather see an obvious fake/sample ID like 9999999 or something.

Here is an actual script output:

./publish_to_cgw_wrapper --cgw_host https://developers.redhat.com/content-gateway/rest/admin --data_json "$NEW_SNAPSHOT" --dry_run
2025-06-17 12:28:57,908 - INFO - Validation summary: 1 total components 1valid components, 0 skipped components
2025-06-17 12:28:57,908 - INFO - Processing component: 1/1 (productName: Releng Test Product productVersionName: 1.6.0)
2025-06-17 12:28:57,908 - INFO - Generating metadata for files in /tmp/releases
2025-06-17 12:28:57,908 - INFO - Processing file: releng-test-product-binaries-darwin-arm64.gz
2025-06-17 12:28:57,908 - INFO - Processing file: releng-test-product-binaries-linux-arm64.gz
2025-06-17 12:28:57,908 - INFO - Processing file: releng-test-product-binaries-darwin-amd64.gz
2025-06-17 12:28:57,908 - INFO - Processing file: releng-test-product-binaries-linux-amd64.gz
2025-06-17 12:28:57,908 - INFO - Processing file: releng-test-product-binaries-windows-amd64.gz
2025-06-17 12:28:57,908 - INFO - Created 5 files, Skipped 0 files.
2025-06-17 12:28:57,909 - INFO - Processed result: [
  {
    "product_id": 97026,
    "product_version_id": 84796,
    "created_file_ids": [574240, 775368, 225278, 837714, 769809],
    "no_of_files_processed": 5,
    "no_of_files_created": 5,
    "no_of_files_skipped": 0,
    "metadata": [
      {
        "type": "FILE",
        "hidden": false,
        "invisible": false,
        "shortURL": "/cgw/RelengTestProduct/1.6.0/releng-test-product-binaries-darwin-arm64.gz",
        "productVersionId": 84796,
        "downloadURL": "/content/origin/files/sha256/69/691d5610f9f7c327facbf8856c5293c7a741b8ad2c4fa31775f3cca51c62e9dd/releng-test-product-binaries-darwin-arm64.gz",
        "label": "releng-test-product-binaries-darwin-arm64.gz"
      },
      {
        "type": "FILE",
        "hidden": false,
        "invisible": false,
        "shortURL": "/cgw/RelengTestProduct/1.6.0/releng-test-product-binaries-linux-arm64.gz",
        "productVersionId": 84796,
        "downloadURL": "/content/origin/files/sha256/69/691d5610f9f7c327facbf8856c5293c7a741b8ad2c4fa31775f3cca51c62e9dd/releng-test-product-binaries-linux-arm64.gz",
        "label": "releng-test-product-binaries-linux-arm64.gz"
      },
      {
        "type": "FILE",
        "hidden": false,
        "invisible": false,
        "shortURL": "/cgw/RelengTestProduct/1.6.0/releng-test-product-binaries-darwin-amd64.gz",
        "productVersionId": 84796,
        "downloadURL": "/content/origin/files/sha256/69/691d5610f9f7c327facbf8856c5293c7a741b8ad2c4fa31775f3cca51c62e9dd/releng-test-product-binaries-darwin-amd64.gz",
        "label": "releng-test-product-binaries-darwin-amd64.gz"
      },
      {
        "type": "FILE",
        "hidden": false,
        "invisible": false,
        "shortURL": "/cgw/RelengTestProduct/1.6.0/releng-test-product-binaries-linux-amd64.gz",
        "productVersionId": 84796,
        "downloadURL": "/content/origin/files/sha256/69/691d5610f9f7c327facbf8856c5293c7a741b8ad2c4fa31775f3cca51c62e9dd/releng-test-product-binaries-linux-amd64.gz",
        "label": "releng-test-product-binaries-linux-amd64.gz"
      },
      {
        "type": "FILE",
        "hidden": false,
        "invisible": false,
        "shortURL": "/cgw/RelengTestProduct/1.6.0/releng-test-product-binaries-windows-amd64.gz",
        "productVersionId": 84796,
        "downloadURL": "/content/origin/files/sha256/69/691d5610f9f7c327facbf8856c5293c7a741b8ad2c4fa31775f3cca51c62e9dd/releng-test-product-binaries-windows-amd64.gz",
        "label": "releng-test-product-binaries-windows-amd64.gz"
      }
    ]
  }
]
2025-06-17 12:28:57,909 - INFO - All files processed successfully.

Yeah, I can make it more obvious. I don't think it's possible to actually call the API, at least for the RSC, since it's behind the VPN. I do have a mock script here: https://github.com/konflux-ci/release-service-catalog/blob/production/tasks/managed/publish-to-cgw/tests/mocks.sh, which could be added to the dry_run to make it more real.

I figured dry-run was also for local testing. But if it's simpler to just use all 9s or 0s or something, that's fine. I just think the random numbers might confuse someone.
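
One way the "obvious fake ID" idea could look, as a sketch only. The result keys mirror the dry-run output pasted above; the constant `DRY_RUN_ID`, the function name `dry_run_result`, and reusing one sentinel for every ID are assumptions, not what the script ended up doing.

```python
DRY_RUN_ID = 9999999  # obviously fake sentinel, as suggested in the review


def dry_run_result(file_names):
    """Build a result record for a dry run without calling the API (sketch)."""
    count = len(file_names)
    return {
        "product_id": DRY_RUN_ID,
        "product_version_id": DRY_RUN_ID,
        # Every "created" file gets the same placeholder instead of a random ID.
        "created_file_ids": [DRY_RUN_ID] * count,
        "no_of_files_processed": count,
        "no_of_files_created": count,
        "no_of_files_skipped": 0,
    }


if __name__ == "__main__":
    print(dry_run_result(["a.gz", "b.gz"]))
```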

seanconroy2021 · 2025-06-17T17:34:23Z

publish-to-cgw-wrapper/publish_to_cgw_wrapper.py

-                elif file.endswith(".txt"):
-                    label = "Checksum"
-
+        elif file_name.startswith("sha256") or file_name.startswith(component_name):


@swickersh I do have a question. So each contentDir is always separate, right? If that's not the case, we will need to have a check here to ensure the same checksum is not added twice, which would cause failure later on.

The content directory early on in the pipeline I believe is the location of the unsigned binaries where the product teams place them in their built image.
But as the pipeline progresses, the binaries are sorted by windows, mac and linux and then signed on the signing hosts. Afterwards, all the signed binaries are moved to a 'signed' directory.
That directory should contain 3 checksum files (sha256sum.txt, sha256sum.txt.gpg and sha256sum.txt.sig) as well as all the signed binaries.
It's this signed directory that should be used to push the files to cdn and then later publish those files to CGW with this script.
This is how it was originally with the push-binaries-to-dev portal. But now that push-artifacts-to-cdn has been created and some things were tweaked, I'm not 100% positive this is still the case.

I don't know of a scenario where there could be duplicate checksum files.

The way the checksum generation and signing step works is as follows:

1. Binaries are signed.
2. Binaries are moved to the signed directory.
3. The checksum script generates a sha256sum of every binary in the directory and outputs the list of checksums to a single sha256sum.txt file.
4. That file is then signed by the checksum signing server, which results in two additional files (sha256sum.txt.gpg and sha256sum.txt.sig).
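
The checksum-generation step described above could be approximated like this. It is a sketch using only the standard library, not the pipeline's actual checksum script; the function name `write_sha256sum` and the choice to skip any existing `sha256sum.txt*` files are assumptions. The output lines follow the usual `sha256sum` layout (digest, two spaces, file name).

```python
import hashlib
from pathlib import Path


def write_sha256sum(directory, output_name="sha256sum.txt"):
    """Write one sha256sum.txt covering every binary in the directory (sketch)."""
    directory = Path(directory)
    lines = []
    for path in sorted(directory.iterdir()):
        # Skip sub-directories and the checksum files themselves
        # (sha256sum.txt, sha256sum.txt.gpg, sha256sum.txt.sig).
        if not path.is_file() or path.name.startswith(output_name):
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.name}")
    out = directory / output_name
    out.write_text("\n".join(lines) + "\n")
    return out
```

Because a single file is produced per directory, a duplicate checksum entry would only arise if two components shared the same signed directory, which is the scenario asked about above.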

johnbieren · 2025-06-17T18:54:04Z

Thanks @swickersh for going in depth on the review of this PR! @seanconroy2021 if you need an approval from the team once Scott gives the all clear, I can give it

swickersh · 2025-06-17T19:03:24Z

No problem!
This LGTM.
@pkhander gets back from PTO tomorrow and it wouldn't hurt for him to have a quick once-over since he wrote the original script. But I think it's good.

seanconroy2021 · 2025-06-17T19:38:17Z

Thank you guys :)
Also, in the RSC, I have updated the two pipelines. It made more sense since I had to change apply-mapping to support .files instead of staged.files, which would somewhat break push-disk-images-to-cdn if it were left the old way.

I currently have one draft PR open, but it might be better to split it into two PRs to make it easier to review:

push-disk-images-to-cdn
push-artifacts-to-cdn

swickersh · 2025-06-17T19:51:58Z

I'm not sure disk-images is switching away from staged.files just yet. I definitely think they should at some point, but that might require approval from Scott H.
Rhel-ai has several RPAs that would need a little tweak, plus their schema in konflux-release-data gitlab.
It might make more sense to do that now rather than having more conditionals in apply-mapping though. That's up to you guys

johnbieren · 2025-06-18T11:52:32Z

Yeah, the sooner the better IMO

seanconroy2021 · 2025-06-18T15:00:58Z

I will open a ticket to follow up on it, as this ticket is already somewhat out of scope.

pkhander · 2025-06-19T13:52:18Z

LGTM!

seanconroy2021 · 2025-06-19T15:00:50Z

Thank you @swickersh @pkhander! Whenever you get a chance, @johnbieren, can you give me approval? :)

Update the script to be used in the push-artifacts-to-cdn-task.

* Now supports processing multiple components in a single run.
* Data is now passed via the --data_json argument instead of file input.
* Added support for `--dry_run` to simulate execution without making API calls.
* Updated to support the new `components.files` and `components.contentGateway` structure.
* Metadata generation is now based on `filenames` defined in the component data.

Signed-off-by: Sean Conroy <sconroy@redhat.com>

seanconroy2021 · 2025-06-20T14:11:57Z

Had to rebase

seanconroy2021 requested a review from a team as a code owner May 22, 2025 11:38

seanconroy2021 mentioned this pull request May 22, 2025

refactor(RELEASE-1560): update to remove .contentGateway konflux-ci/release-service-catalog#973

Merged

4 tasks

seanconroy2021 force-pushed the RELEASE-1560 branch from efd71d9 to 23e851e Compare May 22, 2025 11:45

mmalina reviewed May 22, 2025

publish-to-cgw-wrapper/publish_to_cgw_wrapper.py

mmalina previously approved these changes May 22, 2025

seanconroy2021 mentioned this pull request May 22, 2025

refactor(RELEASE-1560): update to remove .contentGateway konflux-ci/release-service-catalog#975

Closed

4 tasks

seanconroy2021 dismissed mmalina’s stale review via c837b97 May 28, 2025 16:12

seanconroy2021 force-pushed the RELEASE-1560 branch 4 times, most recently from 3d185d1 to a452f17 Compare May 28, 2025 16:16

johnbieren reviewed May 28, 2025

publish-to-cgw-wrapper/publish_to_cgw_wrapper.py (outdated)

seanconroy2021 force-pushed the RELEASE-1560 branch 3 times, most recently from e9c7ba1 to 4071eef Compare June 3, 2025 12:03

johnbieren reviewed Jun 3, 2025

swickersh reviewed Jun 3, 2025

seanconroy2021 force-pushed the RELEASE-1560 branch 2 times, most recently from 6615ddc to b43a095 Compare June 16, 2025 10:38

seanconroy2021 changed the title ~~refactor(RELEASE-1560): expect contentGateway under mapping.components~~ refactor(RELEASE-1560): update 'publish-to-cgw' script Jun 16, 2025

seanconroy2021 force-pushed the RELEASE-1560 branch from b43a095 to 22fb455 Compare June 16, 2025 10:50

seanconroy2021 requested a review from swickersh June 16, 2025 10:57

swickersh reviewed Jun 16, 2025

seanconroy2021 force-pushed the RELEASE-1560 branch 2 times, most recently from 8e76f9a to 1d05d24 Compare June 16, 2025 15:06

seanconroy2021 requested a review from a team June 17, 2025 08:30

seanconroy2021 force-pushed the RELEASE-1560 branch from 1d05d24 to e7764f6 Compare June 17, 2025 08:31

swickersh reviewed Jun 17, 2025

seanconroy2021 commented Jun 17, 2025

seanconroy2021 force-pushed the RELEASE-1560 branch 2 times, most recently from 3cb1f51 to 8696ecc Compare June 17, 2025 17:41

seanconroy2021 force-pushed the RELEASE-1560 branch 2 times, most recently from aa00dfd to def2277 Compare June 18, 2025 11:24

seanconroy2021 force-pushed the RELEASE-1560 branch from def2277 to ec9b98b Compare June 19, 2025 13:54

johnbieren approved these changes Jun 20, 2025

seanconroy2021 force-pushed the RELEASE-1560 branch from ec9b98b to 7a79289 Compare June 20, 2025 14:11

seanconroy2021 merged commit 937662a into konflux-ci:main Jun 20, 2025
3 checks passed

seanconroy2021 deleted the RELEASE-1560 branch June 20, 2025 15:52
