refactor(RELEASE-1560): update 'publish-to-cgw' script #437
Conversation
Seems sane to me, maybe we can get @swickersh or @pkhander to take a look before merging?
"productName": "product_name_1", | ||
"productCode": "product_code_1", | ||
"productVersionName": "1.1", | ||
"components": [ |
Hi,
Same comment as your PR in the catalog:
konflux-ci/release-service-catalog#975 (review)
@swickersh can I get your review here again? :)
8. Writes the final result, including processed, created, and skipped files, to a JSON file.
9. Outputs the path of the generated result.json file to an output file.
1. Reads a JSON snapshot containing data that has been injected with contentGateway,
files and contentDir.
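Steps 8 and 9 above can be sketched roughly as follows; the exact result fields and output-file name are assumptions based on the dry-run output shown later in this thread, not the script's actual code:

```python
import json
import os
import tempfile

# Hypothetical sketch of steps 8-9: write the processed/created/skipped
# summary to result.json, then record that file's path in an output file
# so the calling task can pick it up.
out_dir = tempfile.mkdtemp()
result = [{
    "no_of_files_processed": 5,
    "no_of_files_created": 5,
    "no_of_files_skipped": 0,
}]
result_path = os.path.join(out_dir, "result.json")
with open(result_path, "w") as f:
    json.dump(result, f, indent=2)

# Step 9: emit the path of the generated result.json.
output_file = os.path.join(out_dir, "output.txt")
with open(output_file, "w") as f:
    f.write(result_path)
```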
Where/how is contentDir injected into the data that this script takes as input? Is the plan for the release-service-catalog task to do that before invoking this?
Yeah, that's the plan; the contentDir is injected before calling publish-to-cgw:
https://github.com/konflux-ci/release-service-catalog/pull/975/files#diff-5b8756a69e7d7bf343436372e3a297add65fb59fe212ee781a3b91993fd62f14R1006
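For illustration, the injected snapshot might look roughly like this; every field name and value here is an assumption pieced together from this thread (contentGateway, files, contentDir) and the dry-run output below, not the catalog task's actual output:

```python
# Hypothetical shape of the snapshot after the catalog task injects
# contentGateway, files, and contentDir into each component.
snapshot = {
    "application": "releng-test-product",
    "components": [
        {
            "name": "releng-test-product",
            "contentGateway": {
                "productName": "Releng Test Product",
                "productCode": "releng-test-product",
                "productVersionName": "1.6.0",
                "mirrorOpenshiftPush": False,  # defaults to false per the review
                "contentDir": "/tmp/releases",
            },
            "files": [
                {"filename": "releng-test-product-binaries-linux-amd64.gz"},
            ],
        }
    ],
}
```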
ack, thanks
Oh, actually I have more follow-up questions about this. Looking at your other PR, you only inject the contentDir into the snapshot. Why use the snapshot at all?
Doesn't the data_json have everything you need? This script looks like it expects data from the snapshot that doesn't exist at the moment (like the contentGateway section and files). Or am I misunderstanding this?
So it's a bit confusing, but we have the managed pipeline here, which then calls the internal pipeline to run here.
In the managed pipeline, we have a task called apply-mapping, which injects all of the mapping into the snapshot.json here before calling the push-artifacts-to-cdn task. The reason is that we only pass a snapshot.json into the internal pipeline here.
That is the reason for using snapshot.json over data.json. I was confused myself when I first started the ticket.
Ah, now that you say that I recall seeing this before. Thanks for the clarification.
"""
shortURL_base = "/pub/"
if mirror_openshift_Push:
This logic needs to stay.
If the RPA contains contentGateway.mirrorOpenshiftPush = true,
then the shortURL needs to have the /pub/cgw prefix, as shown here.
The documentation is a little out of date because of our current efforts to update the structure, but the description should be accurate if you're curious. It defaults to false:
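A minimal sketch of the branch being discussed, assuming the default prefix is /cgw (matching the shortURL values in the dry-run output later in this thread) and /pub/cgw only when mirrorOpenshiftPush is true; the actual variable names in the script differ:

```python
def short_url_base(mirror_openshift_push: bool) -> str:
    # mirrorOpenshiftPush defaults to false in the RPA; when true,
    # the shortURL gets the /pub/cgw prefix instead of /cgw.
    return "/pub/cgw" if mirror_openshift_push else "/cgw"
```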
I added the line to our google doc with CGW RPA data just now to help avoid confusion. Sorry it wasn't already present in that doc.
Will update it, no problem.
fixed
great, thanks!
errors = []
valid_components = []

components = data.get("components")
Aren't components nested under spec? Should this be: components = data.get("spec", {}).get("components")
So for the actual snapshot, yes, but release runs a task called reduce-snapshot (example PipelineRun: https://console-openshift-console.apps.stone-prd-rh01.pg1f.p1.openshiftapps.com/k8s/ns/rhtap-releng-tenant/tekton.dev~v1~PipelineRun/managed-pkg6v/logs?taskName=reduce-snapshot), so at the end we just have application & components.
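The two shapes being contrasted can be sketched like this; both dictionaries are illustrative stand-ins, not real snapshot data:

```python
# A full Snapshot CR nests components under spec, but after the
# reduce-snapshot task only application and components remain at the
# top level, so data.get("components") is the right access pattern.
full_snapshot = {"spec": {"components": [{"name": "comp-1"}]}}
reduced_snapshot = {"application": "my-app", "components": [{"name": "comp-1"}]}

components_full = full_snapshot.get("spec", {}).get("components")
components_reduced = reduced_snapshot.get("components")
```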
Ah, I didn't realize that! Makes sense, thanks.
I ran the script twice against a snapshot I have. The first time, the files were already present and the rollback functionality worked as expected.
2025-06-17 12:34:30,864 - WARNING - Rolling back created files due to failure (productId: 4010373, productVersionId: 4156167)
2025-06-17 12:34:40,357 - ERROR - Error processing component 1 (productName: Releng Test Product, productVersionName: 1.6.0): Failed to create file: API call failed: File is already present in the system!
I removed the files manually from CGW and ran the script again, and it re-added them no problem. Verified they exist in the dev portal:
# omitted output
2025-06-17 12:51:25,444 - INFO - All files processed successfully.
This PR LGTM.
logging.info(
    f"Validation summary: {len(components)} total components {len(valid_components)}"
    f"valid components, {len(components) - len(valid_components)} skipped components"
)
Your f-strings are getting concatenated, so the output looks like this: INFO - Validation summary: 1 total components, 1valid components, 0 skipped components
Suggested change:
logging.info(
    f"Validation summary: {len(components)} total components, "
    f"{len(valid_components)} valid components, "
    f"{len(components) - len(valid_components)} skipped components"
)
fixed ty
componentName = component["name"]
files = component["files"]

if dry_run:
Could dry runs still provide real product and version IDs?
I just did a dry run to test this script locally, and it's a little confusing to have random IDs.
If it's not practical to query the API, then I'd almost rather see an obvious fake/sample ID like 9999999 or something.
Here is an actual script output:
./publish_to_cgw_wrapper --cgw_host https://developers.redhat.com/content-gateway/rest/admin --data_json "$NEW_SNAPSHOT" --dry_run
2025-06-17 12:28:57,908 - INFO - Validation summary: 1 total components 1valid components, 0 skipped components
2025-06-17 12:28:57,908 - INFO - Processing component: 1/1 (productName: Releng Test Product productVersionName: 1.6.0)
2025-06-17 12:28:57,908 - INFO - Generating metadata for files in /tmp/releases
2025-06-17 12:28:57,908 - INFO - Processing file: releng-test-product-binaries-darwin-arm64.gz
2025-06-17 12:28:57,908 - INFO - Processing file: releng-test-product-binaries-linux-arm64.gz
2025-06-17 12:28:57,908 - INFO - Processing file: releng-test-product-binaries-darwin-amd64.gz
2025-06-17 12:28:57,908 - INFO - Processing file: releng-test-product-binaries-linux-amd64.gz
2025-06-17 12:28:57,908 - INFO - Processing file: releng-test-product-binaries-windows-amd64.gz
2025-06-17 12:28:57,908 - INFO - Created 5 files, Skipped 0 files.
2025-06-17 12:28:57,909 - INFO - Processed result:
[
{
"product_id": 97026,
"product_version_id": 84796,
"created_file_ids": [
574240,
775368,
225278,
837714,
769809
],
"no_of_files_processed": 5,
"no_of_files_created": 5,
"no_of_files_skipped": 0,
"metadata": [
{
"type": "FILE",
"hidden": false,
"invisible": false,
"shortURL": "/cgw/RelengTestProduct/1.6.0/releng-test-product-binaries-darwin-arm64.gz",
"productVersionId": 84796,
"downloadURL": "/content/origin/files/sha256/69/691d5610f9f7c327facbf8856c5293c7a741b8ad2c4fa31775f3cca51c62e9dd/releng-test-product-binaries-darwin-arm64.gz",
"label": "releng-test-product-binaries-darwin-arm64.gz"
},
{
"type": "FILE",
"hidden": false,
"invisible": false,
"shortURL": "/cgw/RelengTestProduct/1.6.0/releng-test-product-binaries-linux-arm64.gz",
"productVersionId": 84796,
"downloadURL": "/content/origin/files/sha256/69/691d5610f9f7c327facbf8856c5293c7a741b8ad2c4fa31775f3cca51c62e9dd/releng-test-product-binaries-linux-arm64.gz",
"label": "releng-test-product-binaries-linux-arm64.gz"
},
{
"type": "FILE",
"hidden": false,
"invisible": false,
"shortURL": "/cgw/RelengTestProduct/1.6.0/releng-test-product-binaries-darwin-amd64.gz",
"productVersionId": 84796,
"downloadURL": "/content/origin/files/sha256/69/691d5610f9f7c327facbf8856c5293c7a741b8ad2c4fa31775f3cca51c62e9dd/releng-test-product-binaries-darwin-amd64.gz",
"label": "releng-test-product-binaries-darwin-amd64.gz"
},
{
"type": "FILE",
"hidden": false,
"invisible": false,
"shortURL": "/cgw/RelengTestProduct/1.6.0/releng-test-product-binaries-linux-amd64.gz",
"productVersionId": 84796,
"downloadURL": "/content/origin/files/sha256/69/691d5610f9f7c327facbf8856c5293c7a741b8ad2c4fa31775f3cca51c62e9dd/releng-test-product-binaries-linux-amd64.gz",
"label": "releng-test-product-binaries-linux-amd64.gz"
},
{
"type": "FILE",
"hidden": false,
"invisible": false,
"shortURL": "/cgw/RelengTestProduct/1.6.0/releng-test-product-binaries-windows-amd64.gz",
"productVersionId": 84796,
"downloadURL": "/content/origin/files/sha256/69/691d5610f9f7c327facbf8856c5293c7a741b8ad2c4fa31775f3cca51c62e9dd/releng-test-product-binaries-windows-amd64.gz",
"label": "releng-test-product-binaries-windows-amd64.gz"
}
]
}
]
2025-06-17 12:28:57,909 - INFO - All files processed successfully.
Yeah, I can make it more obvious. I don't think it's possible to actually call the API, at least from the RSC, since it's behind the VPN. I do have a mock script here: https://github.com/konflux-ci/release-service-catalog/blob/production/tasks/managed/publish-to-cgw/tests/mocks.sh, which could be added to the dry_run to make it more realistic.
I figured dry-run was also for local testing. But if it's simpler to just use all 9s or 0s or something, that's fine. I just think the random numbers might confuse someone.
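The suggestion boils down to something like the sketch below; the sentinel value and function name are made up for illustration, not taken from the script:

```python
DRY_RUN_ID = 9999999  # assumed sentinel; any obviously fake value works

def resolve_product_id(dry_run: bool, lookup=None) -> int:
    """Return an obviously fake ID in dry-run mode instead of a random
    number; otherwise defer to the real lookup (the actual API call)."""
    if dry_run:
        return DRY_RUN_ID
    return lookup()
```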
elif file.endswith(".txt"):
    label = "Checksum"

elif file_name.startswith("sha256") or file_name.startswith(component_name):
@swickersh I do have a question. Each contentDir is always separate, right? If that's not the case, we will need a check here to ensure the same checksum isn't added twice, which would cause a failure later on.
The content directory early in the pipeline is, I believe, the location of the unsigned binaries, where the product teams place them in their built image.
But as the pipeline progresses, the binaries are sorted by Windows, Mac, and Linux and then signed on the signing hosts. Afterwards, all the signed binaries are moved to a 'signed' directory.
That directory should contain 3 checksum files (sha256sum.txt, sha256sum.txt.gpg, and sha256sum.txt.sig) as well as all the signed binaries.
It's this signed directory that should be used to push the files to the CDN and then later publish those files to CGW with this script.
This is how it was originally with push-binaries-to-dev-portal. But now that push-artifacts-to-cdn has been created and some things were tweaked, I'm not 100% positive this is still the case.
I don't know of a scenario where there could be duplicate checksum files.
The way the checksum generation and signing step works is as follows:
- binaries are signed
- binaries are moved to signed directory
- checksum script generates a sha256sum of every binary in the directory and outputs the list of checksums to a single sha256sum.txt file.
- that file is then signed by the checksum signing server and results in two additional files (sha256sum.txt.gpg and sha256sum.txt.sig)
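The checksum-generation step described above can be sketched as follows; this is a rough stand-in for the real checksum script (the signing of sha256sum.txt happens on a separate server and is not shown):

```python
import hashlib
from pathlib import Path

def write_sha256sums(signed_dir: str) -> str:
    """Hash every file in the signed directory and write one combined
    sha256sum.txt, mirroring the described checksum step."""
    lines = []
    for path in sorted(Path(signed_dir).iterdir()):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.name}")
    out = Path(signed_dir) / "sha256sum.txt"
    out.write_text("\n".join(lines) + "\n")
    return str(out)
```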
Thanks @swickersh for going in depth on the review of this PR! @seanconroy2021 if you need an approval from the team once Scott gives the all clear, I can give it.
No problem!
Thank you guys :) I currently have one draft PR open, but it might be better to split it into two PRs to make it easier to review:
I'm not sure disk-images is switching away from staged.files just yet. I definitely think they should at some point, but that might require approval from Scott H.
Yeah, the sooner the better IMO.
I will open a ticket to follow up on it, as this ticket is already somewhat out of scope.
LGTM!
Thank you @swickersh @pkhander! Whenever you get a chance, @johnbieren, can you give me approval? :)
Update the script to be used in the push-artifacts-to-cdn-task.
* Now supports processing multiple components in a single run.
* Data is now passed via the --data_json argument instead of file input.
* Added support for `--dry_run` to simulate execution without making API calls.
* Updated to support the new `components.files` and `components.contentGateway` structure.
* Metadata generation is now based on `filenames` defined in the component data.
Signed-off-by: Sean Conroy <sconroy@redhat.com>
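The new CLI surface from the commit message can be sketched with argparse; the flag names come from this thread, but the help text and structure here are assumptions rather than the script's actual argument handling:

```python
import argparse
import json

# Hypothetical sketch of the --data_json / --dry_run interface.
parser = argparse.ArgumentParser(description="Publish component files to CGW")
parser.add_argument("--cgw_host", required=True)
parser.add_argument("--data_json", required=True,
                    help="Snapshot JSON passed inline instead of a file path")
parser.add_argument("--dry_run", action="store_true",
                    help="Simulate execution without making API calls")

# Parse an example invocation (hostname is a placeholder).
args = parser.parse_args([
    "--cgw_host", "https://example.invalid/content-gateway/rest/admin",
    "--data_json", '{"components": []}',
    "--dry_run",
])
data = json.loads(args.data_json)
```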
Had to rebase.