STF 1.5.4 release ops #574

Merged
merged 41 commits into from
Feb 14, 2024
vkmc
Collaborator

@vkmc vkmc commented Feb 8, 2024

leifmadsen and others added 30 commits October 26, 2023 10:04
Add a .gitleaks.toml file to avoid the false positive leak for the
example certificate when deploying for Elasticsearch.
Update the check to use the bool filter instead of a bare var.
By default, Ansible parses vars as strings. Without the | bool filter
this check is invalid: the var is a non-empty string, so the check
always resolves to true. Other instances of the same check already did
this, but this one was missed.
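
The pitfall described above can be sketched in Python, whose truthiness rules for strings behave the same way here. This is an illustration only: `to_bool` is a hypothetical helper that roughly approximates what the `| bool` filter does, not Ansible's actual implementation.

```python
# Hypothetical approximation of Ansible's `| bool` filter, for illustration only.
TRUTHY_STRINGS = {"yes", "on", "1", "true"}


def to_bool(value):
    """Interpret a value as a boolean, roughly as `| bool` would."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        return value.lower() in TRUTHY_STRINGS
    return value == 1


flag = "false"  # a var that arrived as a string

# Bare check: any non-empty string is truthy, so the condition is always met.
bare_result = bool(flag)

# Filtered check: the content of the string is actually interpreted.
filtered_result = to_bool(flag)
```

With the bare check the string "false" still counts as true, which is the bug the commit fixes.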
* [allow_skip_clone] Use <repo>_dir instead of hardcoding all directories relative to base_dir

This will allow configuration of the repo clone destination, so we can
use pre-cloned dirs instead of explicitly cloning the dirs each time.

This is essential for CI systems like zuul, that set-up the repos with
particular versions/branches prior to running the test scripts.

* [zuul] List the other infrawatch repos as required for the job

* [zuul] Set the {sgo,sg-bridge,sg-core,prometheus-webhook-snmp}_dir vars

Add in the repo dir locations where the repos should be pre-cloned by
zuul

* Replace base_dir with sto_dir

* set sto_dir relative to base_dir if it isn't already set

* [ci] use absolute dir for requirements.txt

* [ci] Update sto_dir using explicit reference

zuul.project.src_dir refers to the current project dir. When using the jobs
in another infrawatch project, this becomes invalid.
Instead, sto_dir is explicitly set using
zuul.projects[<project_name>].src_dir, the same way that the other repo dirs
are set in vars-zuul-common

---------

Co-authored-by: Chris Sibbitt <csibbitt@redhat.com>
* Fix qdr auth one_time_upgrade label check

* Fix incorrect variable naming on one_time_upgrade label check

* Adjust QDR authentication password generation (#520)

Adjust the passwords being generated for QDR authentication since
certain characters (such as colon) will cause a failure in the parsing
routine within qpid-dispatch. Updates the lookup function to only use
ascii_letters and digits and increases the length to 32 characters.
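
As a rough sketch of the constraint described above, the following Python generates a 32-character password drawn only from ASCII letters and digits. The commit itself changes an Ansible lookup, so this is a stand-in for illustration; the function name `generate_qdr_password` is hypothetical.

```python
import secrets
import string

# Alphabet restricted to ASCII letters and digits, so characters such as
# colon can never appear in the generated password.
ALPHABET = string.ascii_letters + string.digits


def generate_qdr_password(length=32):
    """Return a random password using only characters from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


password = generate_qdr_password()
```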

---------

Co-authored-by: Leif Madsen <lmadsen@redhat.com>
* [allow_skip_clone] Add docs for clone_repos and *_dir vars

* Align README table column spacing (#516)

* Align README table column spacing

* Update build/stf-run-ci/README.md

---------

Co-authored-by: Emma Foley <elfiesmelfie@users.noreply.github.com>

---------

Co-authored-by: Leif Madsen <lmadsen@redhat.com>
It appears that STO is not included explicitly when running jobs from
SGO [1]. This will be the case in all the other repos.
This change explicitly adds it, in case it's not already included by
zuul.

[1] https://review.rdoproject.org/zuul/build/edd8f17bfdac4360a94186b46c4cea3f
* QDR Auth in smoketest

* Added qdr-test as a mock of the OSP-side QDR
* Connection from qdr-test -> default-interconnect is TLS+Auth
* Collectors point at qdr-test instead of default-interconnect directly
* Much more realistic than the existing setup
* Eliminated a substitution in sensubility config
* Used default QDR basic auth in Jenkinsfile
* QDR Auth for infrared 17.1 script

* Fix missing substitution for AMQP_PASS in infrared script
* [allow_skip_clone] Use <repo>_dir instead of hardcoding all directories relative to base_dir

This will allow configuration of the repo clone destination, so we can
use pre-cloned dirs instead of explicitly cloning the dirs each time.

This is essential for CI systems like zuul, that set-up the repos with
particular versions/branches prior to running the test scripts.

* [zuul] List the other infrawatch repos as required for the job

* [zuul] Set the {sgo,sg-bridge,sg-core,prometheus-webhook-snmp}_dir vars

Add in the repo dir locations where the repos should be pre-cloned by
zuul

* Replace base_dir with sto_dir

* set sto_dir relative to base_dir if it isn't already set

* [ci] use absolute dir for requirements.txt

* [ci] Update sto_dir using explicit reference

zuul.project.src_dir refers to the current project dir. When using the jobs
in another infrawatch project, this becomes invalid.
Instead, sto_dir is explicitly set using
zuul.projects[<project_name>].src_dir, the same way that the other repo dirs
are set in vars-zuul-common

* [zuul] Define a project template for stf-crc-jobs

Instead of listing all the jobs for each project in-repo, and needing to update the list every time
that a new job is added, the project template can be updated and the changes propagated to the
other infrawatch projects

* [zuul] don't enable using the template

* Revert "[zuul] don't enable using the template"

This reverts commit 56e2009.

---------

Co-authored-by: Chris Sibbitt <csibbitt@redhat.com>
* Restart QDR after changing the password

* Fixes bug reported here: #517 (comment)
* Avoids an extra manual step when changing password
* Would affect users who upgrade from earlier STF and subsequently enable basic auth
* Also users who need to change their passwords

* Fixing ansible lint

* Update roles/servicetelemetry/tasks/component_qdr.yml

* Adjust QDR restarts to account for HA

* [smoketest] Wait for qdr-test to be Running

* [smoketest] Wait for QDR password upgrade

* Remove zuul QDR auth override
* Add crc_ocp_bundle value to select OCP version
* zuul: add log collection post-task to get crc logs
* Add ocp v13 and a timeout to the job
* Update README for 17.1 IR test

Update the 17.1 infrared test script README to show how to deploy a
virtualized workload on the deployed overcloud infrastructure. Helps
with testing by providing additional telemetry to STF required in
certain dashboards.

* Update tests/infrared/17.1/README.md

Co-authored-by: Chris Sibbitt <csibbitt@redhat.com>

* Update tests/infrared/17.1/README.md

---------

Co-authored-by: Chris Sibbitt <csibbitt@redhat.com>
Support STF 1.5.3 starting at OpenShift version 4.12, because
dependency requirements make it incompatible with 4.11. Our primary
target is support of OCP EUS releases.

Closes: STF-1632
The "Question the deployment" task didn't have
ignore_errors: true set, so when the task failed, the play
ended. This meant that we didn't reach the
"copy logs" task and couldn't see the job logs in zuul.

ignore_errors is set to true to be consistent with other tasks
* update stf-collect-logs tasks
* Update log path
* solve log bugs in stf-run-ci tasks
* create log directory
Adjust the operator package dependency requirements to align to known
required versions. Primarily reduce the version of
openshift-cert-manager from 1.10 to 1.7 in order to support the
tech-preview channel which was previously used.

Lowering the version requirement allows for the
openshift-cert-manager-operator installed previously to be used during
the STF 1.5.2 to 1.5.3 update, removing the update from being blocked.

Related: STF-1636
Update the stf-run-ci base setup to no longer need testing against OCP
4.10 and earlier, meaning we can rely on a single workflow for
installation. Also update the deployment to use
cluster-observability-operator via the redhat-operators CatalogSource
for installation via use_redhat and use_hybrid strategies.
* [zuul] Add job to build locally and do an index-based deployment
* Only require Interconnect and Smart Gateway

Update the dependency management within Service Telemetry Operator to
only require AMQ Interconnect and Smart Gateway Operator, which is
enough to deploy STF with observabilityStrategy: none. Other Operators
can be installed in order to satisfy data storage of telemetry and
events.

Installation of cert-manager is also required, but needs to be
pre-installed similar to Cluster Observability Operator, either as a
cluster-scoped operator with the tech-preview channel, or a single time
on the cluster as a namespace scoped operator, which is how the
stable-v1 channel installs.

Documentation will be updated to adjust for this change.

Related: STF-1636

* Perform CI update to match docs install changes (#542)

* Perform CI update to match docs install changes

Update the stf-run-ci scripting to match the documented installation
procedures which landed in
infrawatch/documentation#513. These changes are
also reflected in #541.

* Update build/stf-run-ci/tasks/setup_base.yml

Co-authored-by: Emma Foley <elfiesmelfie@users.noreply.github.com>

---------

Co-authored-by: Emma Foley <elfiesmelfie@users.noreply.github.com>

* Also drop cert-manager project

The cert-manager project gets created with workload items when deploying
the cert-manager from the cert-manager-operator project. When removing
cert-manager this project is not cleaned up, so we need to delete it as
well.

---------

Co-authored-by: Emma Foley <elfiesmelfie@users.noreply.github.com>
…545)

In [1], the validate_deployment step is successful, despite the
deployment not being successful.
This causes the job to timeout because the following steps continue to
run despite an invalid state.

To get the expected behaviour, the output should be checked for a string
indicating success.
i.e. * [info] CI Build complete. You can now run tests.
[2] shows the output for a successful run.

[1] https://review.rdoproject.org/zuul/build/245ae63e41884dc09353d938ec9058d7/console#5/0/144/controller
[2] https://review.rdoproject.org/zuul/build/802432b23da24649b818985b7b1633bb/console#5/0/82/controller
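
A minimal sketch of the check described above, written in Python for illustration: success is decided by looking for the marker string in the captured output rather than by trusting the step's own status. The function name and the sample outputs are made up; only the marker text comes from the commit message.

```python
# Marker text quoted in the commit message above.
SUCCESS_MARKER = "* [info] CI Build complete. You can now run tests."


def deployment_succeeded(output: str) -> bool:
    """Return True only if the success marker appears in the output."""
    return SUCCESS_MARKER in output


# Made-up sample outputs, for illustration only.
good_output = "deploying...\n* [info] CI Build complete. You can now run tests.\n"
bad_output = "deploying...\ntimed out waiting for resources\n"

good_result = deployment_succeeded(good_output)
bad_result = deployment_succeeded(bad_output)
```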
* Implement dashboard management

Implement a new configuration option graphing.grafana.dashboards.enabled
which results in dashboards objects being created for the Grafana
Operator. Previously loading dashboards would be done manually via 'oc
apply' using instructions from documentation.

The new CRD parameters on the ServiceTelemetry object allow the Service
Telemetry Operator to create the GrafanaDashboard objects directly.

Related: OSPRH-825

* Drop unnecessary cluster roles

* Update CSV for owned parameter
* Only openshift auth will be allowed
* Auth to prometheus using token instead of basicauth

* Add present/absent logic to prometheus-reader resources

* s/password/token in smoketest output

* [zuul] Make nightly_bundles jobs non-voting (#551)

---------

Co-authored-by: Emma Foley <elfiesmelfie@users.noreply.github.com>
The way we generate our CSVs uses OLM's skipRange functionality. This is fine,
but using only this leads to older versions becoming unavailable after the
fact -- see the warning at [1].

By adding an optional spec.replaces to our CSV we allow update testing as
well as actual production updates for downstream builds that leverage it.

Populating the field requires knowledge of the latest-released bundle,
so we take it from an environment variable to be provided by the
builder. If this is unset we don't include the spec.replaces field at
all -- leaving previous behavior unchanged.
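
The optional behavior described above could look roughly like the following Python sketch. It is an assumption-laden illustration: the environment variable name `CSV_REPLACES`, the function name, and the example bundle name are all hypothetical, and the real CSV generation in this repository may be structured differently.

```python
import os


def build_csv_spec(version, environ=None):
    """Build a minimal CSV spec dict, adding 'replaces' only when provided."""
    environ = os.environ if environ is None else environ
    spec = {"version": version}
    # CSV_REPLACES is a hypothetical variable name standing in for the one
    # supplied by the builder.
    replaces = environ.get("CSV_REPLACES", "")
    if replaces:
        spec["replaces"] = replaces
    return spec


# Unset: the field is omitted and previous behavior is unchanged.
without_replaces = build_csv_spec("1.5.4", environ={})

# Set: the field is populated (the bundle name here is a made-up example).
with_replaces = build_csv_spec(
    "1.5.4", environ={"CSV_REPLACES": "example-operator.v1.5.3"}
)
```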

Resolves #559
Related: STF-1658

[1] https://olm.operatorframework.io/docs/concepts/olm-architecture/operator-catalog/creating-an-update-graph/#skiprange
Add optional spec.replaces field to CSV for update graph compliance
mgirgisf and others added 5 commits January 17, 2024 18:35
The nightly_bundle jobs will run once a day
Remove the hard-coded Prometheus version in the Prometheus template when
using observabilityStrategy use_redhat, which uses Cluster Observability
Operator to manage the Prometheus instance requests.

Previously this value was hard-coded to prevent a potential rollback
when moving from Community Prometheus Operator to Cluster Observability
Operator.

Resolves: JIRA#OSPRH-2140
STF can now be deployed in disconnected mode. This change updates
the features.operators.openshift.io/disconnected annotation to
reflect this.
* [stf-run-ci] Update validation check for bundle URLs

An empty string passed as the bundle URL will pass the existing test
of "is defined" and "is not None" and still be invalid.

The validation for the bundle URL can be done in one check per var:

* If the var is undefined, it becomes "", and the check fails, because of length
* If the var is None, there's an error because None does not have a length
* If the var is an empty string, the check fails because of the length

This simplifies the check and improves readability
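
The three cases listed above can be mirrored in a short Python sketch. This is an analogue of the single length-based check, not the actual Ansible condition; the sentinel `_UNDEFINED`, the function name, and the example URL are hypothetical.

```python
# Sentinel standing in for an undefined variable (hypothetical).
_UNDEFINED = object()


def bundle_url_is_valid(url=_UNDEFINED):
    """Single length-based check covering undefined, None, and empty values."""
    if url is _UNDEFINED:
        url = ""  # an undefined var becomes an empty string
    # len(None) raises TypeError, so a None value surfaces as an error.
    return len(url) > 0


undefined_result = bundle_url_is_valid()
empty_result = bundle_url_is_valid("")
valid_result = bundle_url_is_valid("registry.example.com/bundle:latest")
```

Passing `None` raises `TypeError`, matching the "None does not have a length" case.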
@vkmc changed the title from "Release prep 1.5.4" to "STF 1.5.4 release ops" Feb 8, 2024
@vkmc requested review from leifmadsen, elfiesmelfie, csibbitt and mgirgisf and removed request for leifmadsen and elfiesmelfie Feb 8, 2024 17:59
@vkmc changed the base branch from master to stable-1.5 Feb 8, 2024 18:09
Comment on lines +90 to +94
- name: github.com/infrawatch/service-telemetry-operator
- name: github.com/infrawatch/smart-gateway-operator
- name: github.com/infrawatch/sg-bridge
- name: github.com/infrawatch/sg-core
- name: github.com/infrawatch/prometheus-webhook-snmp
Member

@elfiesmelfie these don't need overrides for stable-1.5 do they?

CC: @mgirgisf

Contributor

I think we need to add override-checkout to point at the proposed-release branches for the repos in infrawatch.

Note: the exception is - name: github.com/infrawatch/service-telemetry-operator, which doesn't need an override in order to test the changes in this branch.

Collaborator

I am not 100% sure. They may need an override for a while, until zuul is merged into stable-1.5.

Collaborator

There are additional comments on this conversation below on Chris's proposed patch (which I unnecessarily updated).

I was overthinking this earlier, sorry for all the confusion.

TL;DR: this can stay as-is, but we need to use Depends-On to point at the open PRs until they merge.

leifmadsen and others added 2 commits February 11, 2024 05:35
Prefer the Grafana 9 container image from RHCC. Grafana 7 is EOL
upstream and receives no security support, whereas Grafana 9 is
still supported.
@vkmc vkmc requested a review from leifmadsen February 12, 2024 15:06

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/e95f906951de4c16998760160b0c36e8

stf-crc-ocp_412-local_build FAILURE in 39m 02s
stf-crc-ocp_413-local_build FAILURE in 37m 48s
stf-crc-ocp_414-local_build FAILURE in 36m 20s
stf-crc-ocp_412-local_build-index_deploy FAILURE in 47m 21s
stf-crc-ocp_413-local_build-index_deploy FAILURE in 39m 14s
stf-crc-ocp_414-local_build-index_deploy FAILURE in 46m 19s

Collaborator

@csibbitt csibbitt left a comment

I proposed a change to the zuul config based on the conversation. Rest LGTM

@elfiesmelfie
Collaborator

check-rdo

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/eb0c042f666b44309c7b752bd0bc0a92

✔️ stf-crc-ocp_412-local_build SUCCESS in 34m 20s
✔️ stf-crc-ocp_413-local_build SUCCESS in 31m 47s
✔️ stf-crc-ocp_414-local_build SUCCESS in 32m 26s
✔️ stf-crc-ocp_412-local_build-index_deploy SUCCESS in 42m 38s
stf-crc-ocp_413-local_build-index_deploy FAILURE in 21m 40s
✔️ stf-crc-ocp_414-local_build-index_deploy SUCCESS in 41m 45s

@elfiesmelfie
Collaborator

check-rdo

@elfiesmelfie
Collaborator

The PRs to the other repos need to merge first. Zuul does not handle merging to this repo, so having the Depends-On PR merged is not enforced.

@vkmc vkmc merged commit ad468f2 into stable-1.5 Feb 14, 2024
@vkmc vkmc deleted the release-prep-1.5.4 branch February 14, 2024 18:24
6 participants