Skip to content

refactor(RELEASE-1560): update 'publish-to-cgw' script #437

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 20, 2025

Conversation

seanconroy2021
Copy link
Member

@seanconroy2021 seanconroy2021 commented May 22, 2025

Update the script to be used in the push-artifacts-to-cdn-task.

  • Now supports processing multiple components in a single run.
  • Data is now passed via the --data_json argument instead of file input.
  • Added support for --dry_run to simulate execution without making API calls.
  • Updated to support the new components.files and components.contentGateway structure.
  • Metadata generation is now based on filenames defined in the component data.

mmalina
mmalina previously approved these changes May 22, 2025
@seanconroy2021 seanconroy2021 force-pushed the RELEASE-1560 branch 3 times, most recently from e9c7ba1 to 4071eef Compare June 3, 2025 12:03
Copy link
Collaborator

@johnbieren johnbieren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems sane to me, maybe we can get @swickersh or @pkhander to take a look before merging?

"productName": "product_name_1",
"productCode": "product_code_1",
"productVersionName": "1.1",
"components": [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,
Same comment as your PR in the catalog:
konflux-ci/release-service-catalog#975 (review)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@swickersh can I get your review here again ? :)

@seanconroy2021 seanconroy2021 force-pushed the RELEASE-1560 branch 2 times, most recently from 6615ddc to b43a095 Compare June 16, 2025 10:38
@seanconroy2021 seanconroy2021 changed the title refactor(RELEASE-1560): expect contentGateway under mapping.components refactor(RELEASE-1560): update 'publish-to-cgw' script Jun 16, 2025
@seanconroy2021 seanconroy2021 requested a review from swickersh June 16, 2025 10:57
8. Writes the final result, including processed, created, and skipped files, to a JSON file.
9. Outputs the path of the generated result.json file to an output file.
1. Reads a JSON snapshot containing data that has been injected with contentGateway,
files and contentDir.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where/how is contentDir injected into the data that this script takes as input? Is the plan for the release-service-catalog task to do that before invoking this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh actually I have more follow up questions about this. Looking at your other PR, you only inject the contentDir to the snapshot. Why use the snapshot at all?
Doesn't the data_json have everything you need? This script looks like it expects data from the snapshot that doesn't exist at the moment (like the contentGateway section and files). Or am I misunderstanding this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's a bit confusing, but we have the managed pipeline here, which then calls the internal pipeline to run here

So, in the managed pipeline, we have a task called apply-mapping, which injects all of the mapping into the snapshot.json here before calling the push-artifacts-to-cdn task. The reason for this is because we only pass in a snapshot.json into the internal pipeline here

That is the reason for using snapshot.json over data.json. I was confused myself when I first started the ticket.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, now that you say that I recall seeing this before. Thanks for the clarification.

"""
shortURL_base = "/pub/"
if mirror_openshift_Push:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic needs to stay.
If the RPA contains contentGateway.mirrorOpenshiftPush = true then the shortURL needs to have the /pub/cgw like shown here.
The documentation is a little out of date because of our current efforts to update the structure, but the description should be accurate if your curious. It defaults to false:

https://konflux.pages.redhat.com/docs/users/releasing/releasing-to-developer-portal.html#:~:text=customer%20facing%20description-,mirrorOpenshiftPush,-%3A%20Optional%20Default

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the line to our google doc with CGW RPA data just now to help avoid confusion. Sorry it wasn't already present in that doc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update it no problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, thanks!

@seanconroy2021 seanconroy2021 force-pushed the RELEASE-1560 branch 2 times, most recently from 8e76f9a to 1d05d24 Compare June 16, 2025 15:06
@seanconroy2021 seanconroy2021 requested a review from a team June 17, 2025 08:30
errors = []
valid_components = []

components = data.get("components")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't components nested under spec?
should this be: components = data.get("spec", {}).get("components")

Copy link
Member Author

@seanconroy2021 seanconroy2021 Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the actual snapshot, yes, but release runs a task called reduce-snapshot example pr https://console-openshift-console.apps.stone-prd-rh01.pg1f.p1.openshiftapps.com/k8s/ns/rhtap-releng-tenant/tekton.dev~v1~PipelineRun/managed-pkg6v/logs?taskName=reduce-snapshot, so at the end we just have application & components

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't realize that! Makes sense, thanks.

Copy link
Contributor

@swickersh swickersh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the script twice against a snapshot I have. First time the files were already present and the rollback functionality worked as expected.

2025-06-17 12:34:30,864 - WARNING - Rolling back created files due to failure (productId: 4010373, productVersionId: 4156167)
2025-06-17 12:34:40,357 - ERROR - Error processing component 1 (productName: Releng Test Product, productVersionName: 1.6.0): Failed to create file: API call failed: File is already present in the system!

I removed the files manually from CGW and ran the script again and it re-added them no problem. Verified they exist in dev portal

#omitted output
2025-06-17 12:51:25,444 - INFO - All files processed successfully.

This PR LGTM.

Comment on lines 110 to 118
logging.info(
f"Validation summary: {len(components)} total components {len(valid_components)}"
f"valid components, {len(components) - len(valid_components)} skipped components"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your fstrings are getting concatenated. So output looks like this: INFO - Validation summary: 1 total components, 1valid components, 0 skipped components

Suggested change
logging.info(
f"Validation summary: {len(components)} total components {len(valid_components)}"
f"valid components, {len(components) - len(valid_components)} skipped components"
)
logging.info(
f"Validation summary: {len(components)} total components, "
f"{len(valid_components)} valid components, "
f"{len(components) - len(valid_components)} skipped components"
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed ty

componentName = component["name"]
files = component["files"]

if dry_run:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could dry-runs still provide real product and version IDs?

I just did a dry run to test this script locally and it's a little confusing to have random ids.
If not practical to query the API, then i'd almost rather see an obvious fake/sample id like 9999999 or something.

Here is an actual script output:

./publish_to_cgw_wrapper --cgw_host https://developers.redhat.com/content-gateway/rest/admin --data_json "$NEW_SNAPSHOT" --dry_run
2025-06-17 12:28:57,908 - INFO - Validation summary: 1 total components 1valid components, 0 skipped components
2025-06-17 12:28:57,908 - INFO - Processing component: 1/1 (productName: Releng Test Product productVersionName: 1.6.0)
2025-06-17 12:28:57,908 - INFO - Generating metadata for files in /tmp/releases
2025-06-17 12:28:57,908 - INFO - Processing file: releng-test-product-binaries-darwin-arm64.gz
2025-06-17 12:28:57,908 - INFO - Processing file: releng-test-product-binaries-linux-arm64.gz
2025-06-17 12:28:57,908 - INFO - Processing file: releng-test-product-binaries-darwin-amd64.gz
2025-06-17 12:28:57,908 - INFO - Processing file: releng-test-product-binaries-linux-amd64.gz
2025-06-17 12:28:57,908 - INFO - Processing file: releng-test-product-binaries-windows-amd64.gz
2025-06-17 12:28:57,908 - INFO - Created 5 files, Skipped 0 files.
2025-06-17 12:28:57,909 - INFO - Processed result:
[
  {
    "product_id": 97026,
    "product_version_id": 84796,
    "created_file_ids": [
      574240,
      775368,
      225278,
      837714,
      769809
    ],
    "no_of_files_processed": 5,
    "no_of_files_created": 5,
    "no_of_files_skipped": 0,
    "metadata": [
      {
        "type": "FILE",
        "hidden": false,
        "invisible": false,
        "shortURL": "/cgw/RelengTestProduct/1.6.0/releng-test-product-binaries-darwin-arm64.gz",
        "productVersionId": 84796,
        "downloadURL": "/content/origin/files/sha256/69/691d5610f9f7c327facbf8856c5293c7a741b8ad2c4fa31775f3cca51c62e9dd/releng-test-product-binaries-darwin-arm64.gz",
        "label": "releng-test-product-binaries-darwin-arm64.gz"
      },
      {
        "type": "FILE",
        "hidden": false,
        "invisible": false,
        "shortURL": "/cgw/RelengTestProduct/1.6.0/releng-test-product-binaries-linux-arm64.gz",
        "productVersionId": 84796,
        "downloadURL": "/content/origin/files/sha256/69/691d5610f9f7c327facbf8856c5293c7a741b8ad2c4fa31775f3cca51c62e9dd/releng-test-product-binaries-linux-arm64.gz",
        "label": "releng-test-product-binaries-linux-arm64.gz"
      },
      {
        "type": "FILE",
        "hidden": false,
        "invisible": false,
        "shortURL": "/cgw/RelengTestProduct/1.6.0/releng-test-product-binaries-darwin-amd64.gz",
        "productVersionId": 84796,
        "downloadURL": "/content/origin/files/sha256/69/691d5610f9f7c327facbf8856c5293c7a741b8ad2c4fa31775f3cca51c62e9dd/releng-test-product-binaries-darwin-amd64.gz",
        "label": "releng-test-product-binaries-darwin-amd64.gz"
      },
      {
        "type": "FILE",
        "hidden": false,
        "invisible": false,
        "shortURL": "/cgw/RelengTestProduct/1.6.0/releng-test-product-binaries-linux-amd64.gz",
        "productVersionId": 84796,
        "downloadURL": "/content/origin/files/sha256/69/691d5610f9f7c327facbf8856c5293c7a741b8ad2c4fa31775f3cca51c62e9dd/releng-test-product-binaries-linux-amd64.gz",
        "label": "releng-test-product-binaries-linux-amd64.gz"
      },
      {
        "type": "FILE",
        "hidden": false,
        "invisible": false,
        "shortURL": "/cgw/RelengTestProduct/1.6.0/releng-test-product-binaries-windows-amd64.gz",
        "productVersionId": 84796,
        "downloadURL": "/content/origin/files/sha256/69/691d5610f9f7c327facbf8856c5293c7a741b8ad2c4fa31775f3cca51c62e9dd/releng-test-product-binaries-windows-amd64.gz",
        "label": "releng-test-product-binaries-windows-amd64.gz"
      }
    ]
  }
]
2025-06-17 12:28:57,909 - INFO - All files processed successfully.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can make it more obvious. I don't think it's possible to actually call the API, at least for the RSC, since it's behind the VPN. I do have a mock script here: https://github.com/konflux-ci/release-service-catalog/blob/production/tasks/managed/publish-to-cgw/tests/mocks.sh, which could be added to the dry_run to make it more real.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured dry-run was also for local testing. But if it's simpler to just use all 9s or 0s or something, that's fine. I just think the random numbers might confuse someone.

elif file.endswith(".txt"):
label = "Checksum"

elif file_name.startswith("sha256") or file_name.startswith(component_name):
Copy link
Member Author

@seanconroy2021 seanconroy2021 Jun 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@swickersh I do have a question. So each contentDir is always separate, right? If that's not the case, we will need to have a check here to ensure the same checksum is not added twice, which would cause failure later on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The content directory early on in the pipeline I believe is the location of the unsigned binaries where the product teams place them in their built image.
But as the pipeline progresses, the binaries are sorted by windows, mac and linux and then signed on the signing hosts. Afterwards, all the signed binaries are moved to a 'signed' directory.
That directory should contain 3 checksum files (sha245sum.txt, sha256sum.txt.gpg and sha256sum.txt.sig) as well as all the signed binaries.
It's this signed directory that should be used to push the files to cdn and then later publish those files to CGW with this script.
This is how it was originally with the push-binaries-to-dev portal. But now that push-artifacts-to-cdn has been created and some things were tweaked, I'm not 100% positive this is still the case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know of a scenario where there could be duplicate checksum files.

The way the checksum generation and signing step works is as follows:

  1. binaries are signed
  2. binaries are moved to signed directory
  3. checksum script generates a sha256sum of every binary in the directory and outputs the list of checksums to a single sha256sum.txt file.
  4. that file is then signed by the checksum signing server and results in two additional files (sha256sum.txt.gpg and sha256sum.txt.sig)

@seanconroy2021 seanconroy2021 force-pushed the RELEASE-1560 branch 2 times, most recently from 3cb1f51 to 8696ecc Compare June 17, 2025 17:41
@johnbieren
Copy link
Collaborator

Thanks @swickersh for going in depth on the review of this PR! @seanconroy2021 if you need an approval from the team once Scott gives the all clear, I can give it

@swickersh
Copy link
Contributor

Thanks @swickersh for going in depth on the review of this PR! @seanconroy2021 if you need an approval from the team once Scott gives the all clear, I can give it

No problem!
This LGTM.
@pkhander gets back from PTO tomorrow and it wouldn't hurt for him to have a quick once-over since he wrote the original script. But I think it's good.

@seanconroy2021
Copy link
Member Author

seanconroy2021 commented Jun 17, 2025

Thank you guys :)
Also, in the RSC, I have updated the two pipelines. It made more sense since I had to change apply-mapping to support. .files instead of stage.files. which would somewhat break push-disk-images-to-cdn if it were left the old way.

I currently have one draft PR open, but it might be better to split it into two PRs to make it easier to review:

  • push-disk-images-to-cdn
  • push-artifacts-to-cdn

@swickersh
Copy link
Contributor

Thank you guys :) Also, in the RSC, I have updated the two pipelines. It made more sense since I had to change apply-mapping to support. .files instead of stage.files. which would somewhat break push-disk-images-to-cdn if it were left the old way.

I currently have one draft PR open, but it might be better to split it into two PRs to make it easier to review:

  • push-disk-images-to-cdn
  • push-artifacts-to-cdn

I'm not sure disk-images is switching away from staged.files just yet. I definitely think they should at some point, but that might require approval from Scott H.
Rhel-ai has several RPAs that would need a little tweak, plus their schema in konflux-release-data gitlab.
It might make more sense to do that now rather than having more conditionals in apply-mapping though. That's up to you guys

@seanconroy2021 seanconroy2021 force-pushed the RELEASE-1560 branch 2 times, most recently from aa00dfd to def2277 Compare June 18, 2025 11:24
@johnbieren
Copy link
Collaborator

Thank you guys :) Also, in the RSC, I have updated the two pipelines. It made more sense since I had to change apply-mapping to support. .files instead of stage.files. which would somewhat break push-disk-images-to-cdn if it were left the old way.
I currently have one draft PR open, but it might be better to split it into two PRs to make it easier to review:

  • push-disk-images-to-cdn
  • push-artifacts-to-cdn

I'm not sure disk-images is switching away from staged.files just yet. I definitely think they should at some point, but that might require approval from Scott H. Rhel-ai has several RPAs that would need a little tweak, plus their schema in konflux-release-data gitlab. It might make more sense to do that now rather than having more conditionals in apply-mapping though. That's up to you guys

Yeah, the sooner the better IMO

@seanconroy2021
Copy link
Member Author

Thank you guys :) Also, in the RSC, I have updated the two pipelines. It made more sense since I had to change apply-mapping to support. .files instead of stage.files. which would somewhat break push-disk-images-to-cdn if it were left the old way.
I currently have one draft PR open, but it might be better to split it into two PRs to make it easier to review:

  • push-disk-images-to-cdn
  • push-artifacts-to-cdn

I'm not sure disk-images is switching away from staged.files just yet. I definitely think they should at some point, but that might require approval from Scott H. Rhel-ai has several RPAs that would need a little tweak, plus their schema in konflux-release-data gitlab. It might make more sense to do that now rather than having more conditionals in apply-mapping though. That's up to you guys

Yeah, the sooner the better IMO

Thank you guys :) Also, in the RSC, I have updated the two pipelines. It made more sense since I had to change apply-mapping to support. .files instead of stage.files. which would somewhat break push-disk-images-to-cdn if it were left the old way.
I currently have one draft PR open, but it might be better to split it into two PRs to make it easier to review:

  • push-disk-images-to-cdn
  • push-artifacts-to-cdn

I'm not sure disk-images is switching away from staged.files just yet. I definitely think they should at some point, but that might require approval from Scott H. Rhel-ai has several RPAs that would need a little tweak, plus their schema in konflux-release-data gitlab. It might make more sense to do that now rather than having more conditionals in apply-mapping though. That's up to you guys

Yeah, the sooner the better IMO

Thank you guys :) Also, in the RSC, I have updated the two pipelines. It made more sense since I had to change apply-mapping to support. .files instead of stage.files. which would somewhat break push-disk-images-to-cdn if it were left the old way.
I currently have one draft PR open, but it might be better to split it into two PRs to make it easier to review:

  • push-disk-images-to-cdn
  • push-artifacts-to-cdn

I'm not sure disk-images is switching away from staged.files just yet. I definitely think they should at some point, but that might require approval from Scott H. Rhel-ai has several RPAs that would need a little tweak, plus their schema in konflux-release-data gitlab. It might make more sense to do that now rather than having more conditionals in apply-mapping though. That's up to you guys

Yeah, the sooner the better IMO

I will open a ticket to follow up on it, as this ticket is already somewhat out of scope.

@pkhander
Copy link
Contributor

LGTM!

@seanconroy2021
Copy link
Member Author

Thank you @swickersh @pkhander whenever you get a chance @johnbieren can you give me approval :)

Update the script to be used in the push-artifacts-to-cdn-task.

  * Now supports processing multiple components
    in a single run.
  * Data is now passed via the --data_json
    argument instead of file input.
  * Added support for `--dry_run` to simulate execution
     without making API calls.
  * Updated to support the new `components.files`
     and `components.contentGateway` structure.
  * Metadata generation is now based on `filenames`
     defined in the component data.

Signed-off-by: Sean Conroy <sconroy@redhat.com>
@seanconroy2021
Copy link
Member Author

Had to rebase

@seanconroy2021 seanconroy2021 merged commit 937662a into konflux-ci:main Jun 20, 2025
3 checks passed
@seanconroy2021 seanconroy2021 deleted the RELEASE-1560 branch June 20, 2025 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants