
Add federation to skmo#3766

Open
vakwetu wants to merge 10 commits into openstack-k8s-operators:main from vakwetu:add-federation-to-skmo


@vakwetu
Contributor

@vakwetu vakwetu commented Mar 13, 2026

This adds cinder-volume and federation support to the SKMO scenario.

@openshift-ci
Contributor

openshift-ci bot commented Mar 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign michburk for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/3f0763b046e041c18b43ec998692e6d3

openstack-k8s-operators-content-provider FAILURE in 10m 52s
⚠️ podified-multinode-edpm-deployment-crc SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
⚠️ cifmw-crc-podified-edpm-baremetal-minor-update SKIPPED Skipped due to failed job openstack-k8s-operators-content-provider
✔️ cifmw-pod-zuul-files SUCCESS in 4m 34s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 05s
✔️ cifmw-pod-k8s-snippets-source SUCCESS in 4m 53s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 55s
✔️ cifmw-architecture-validate-hci SUCCESS in 3m 46s
✔️ cifmw-molecule-ci_gen_kustomize_values SUCCESS in 5m 21s
✔️ cifmw-molecule-federation SUCCESS in 1m 59s

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/735d0c0530b44e039353be5e0993611a

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 46m 16s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 21m 45s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 33m 32s
cifmw-crc-podified-edpm-baremetal-minor-update RETRY_LIMIT in 24m 48s
✔️ cifmw-pod-zuul-files SUCCESS in 4m 46s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 01s
✔️ cifmw-pod-k8s-snippets-source SUCCESS in 5m 26s
✔️ cifmw-pod-pre-commit SUCCESS in 9m 53s
✔️ cifmw-architecture-validate-hci SUCCESS in 3m 51s
✔️ cifmw-molecule-ci_gen_kustomize_values SUCCESS in 5m 19s
✔️ cifmw-molecule-federation SUCCESS in 1m 33s

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/f424a1444f9247a78d0afc7cb1f4660f

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 11m 03s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 24m 05s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 23m 46s
✔️ cifmw-crc-podified-edpm-baremetal-minor-update SUCCESS in 1h 55m 56s
✔️ cifmw-pod-zuul-files SUCCESS in 4m 54s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 9m 37s
✔️ cifmw-pod-k8s-snippets-source SUCCESS in 4m 52s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 18s
cifmw-architecture-validate-hci FAILURE in 3m 34s
✔️ cifmw-molecule-ci_gen_kustomize_values SUCCESS in 5m 52s
✔️ cifmw-molecule-federation SUCCESS in 2m 04s
✔️ cifmw-molecule-kustomize_deploy SUCCESS in 4m 30s

@vakwetu vakwetu force-pushed the add-federation-to-skmo branch from f588376 to 6a74e12 Compare March 20, 2026 18:00
…ooks

Add support for Shared Keystone Multi-region OpenStack (SKMO)
deployments with cross-region Barbican keystone listener:

Playbooks:
- prepare-leaf.yaml: Pre-stage hook that creates a TransportURL CR
  in the central region for the leaf's barbican-keystone-listener,
  copies the generated secret to the leaf namespace, extracts
  rootca-internal CA cert from central and adds it to the leaf's
  custom-ca-certs bundle, and waits for central Keystone and
  openstackclient readiness with retry logic
- configure-leaf-listener.yaml: Post-stage hook that patches the
  leaf OpenStackControlPlane with the cross-region transport_url
  for the barbican-keystone-listener
- trust-leaf-ca.yaml: Post-stage hook that extracts the leaf
  region's rootca-public and rootca-internal CA certs and adds
  them to the central region's custom-ca-certs bundle
- ensure-central-ca-bundle.yaml: Ensures the central CA bundle
  secret exists before the leaf control plane deployment

Scenario:
- va-multi-skmo.yml reproducer scenario configuration
- multi-namespace-skmo architecture scenario symlink

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Ade Lee <alee@redhat.com>
@vakwetu vakwetu force-pushed the add-federation-to-skmo branch 3 times, most recently from 00dba0e to 7b69e43 Compare March 20, 2026 19:31
vakwetu and others added 9 commits March 20, 2026 15:37
…mespace SKMO scenario

Add a 4th extra disk to OCP VMs in the SKMO reproducer and enable the
devscripts MachineConfig-based cinder-volumes LVM VG setup:

- extra_disks_num: 3 -> 4 to provide a dedicated disk (/dev/vdd) for Cinder
- cifmw_devscripts_create_logical_volume: true to generate the MachineConfig
  that creates the cinder-volumes VG via a systemd unit at boot time
- cifmw_devscripts_cinder_volume_pvs: [/dev/vdd] to target the 4th disk
- cifmw_devscripts_enable_iscsi_on_ocp_nodes: true to enable iscsid on
  OCP nodes (required for the iSCSI target created by cinder-volume)

LVMS continues to use the original three disks (/dev/vda, /dev/vdb, /dev/vdc).

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Ade Lee <alee@redhat.com>
…ecret

Add a new variable cifmw_federation_ca_bundle_secret_name (default: "")
to the federation role.  When set, hook_controlplane_config.yml merges the
Keycloak CA certificate as a new key (keycloak-ca.crt) into the named
secret rather than creating a separate 'keycloakca' secret.  If the named
secret does not yet exist it is created automatically.

In merge mode the kustomization patch omits the spec.tls.caBundleSecretName
op-add, since the OpenStackControlPlane CR is assumed to already reference
the correct secret (e.g. custom-ca-certs in SKMO deployments).

When cifmw_federation_ca_bundle_secret_name is empty the original behaviour
is preserved for backward compatibility: a dedicated 'keycloakca' secret is
created and the kustomization patches spec.tls.caBundleSecretName to point
at it.
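
The two modes described above can be sketched in Python as a small decision function. This is only an illustration of the behaviour the commit message describes, not the role's actual Ansible tasks; the function name `plan_ca_secret` and the shape of the returned dictionary are hypothetical.

```python
def plan_ca_secret(ca_bundle_secret_name=""):
    """Decide where the Keycloak CA goes and whether the OSCP needs patching.

    Hypothetical helper: mirrors the merge-mode / dedicated-secret split
    described in the commit message, nothing more.
    """
    merge_mode = bool(ca_bundle_secret_name)
    secret_name = ca_bundle_secret_name if merge_mode else "keycloakca"

    patch_ops = []
    if not merge_mode:
        # Dedicated-secret mode: point the control plane at the new secret.
        patch_ops.append(
            {"op": "add", "path": "/spec/tls/caBundleSecretName", "value": secret_name}
        )
    # Merge mode: the CR is assumed to reference the named secret already,
    # so no caBundleSecretName patch is emitted.
    return {"secret_name": secret_name, "merge": merge_mode, "patch_ops": patch_ops}
```

With an empty variable the sketch yields one patch operation targeting `keycloakca`; with `custom-ca-certs` it yields none, which matches the backward-compatibility claim in the message.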

Signed-off-by: Ade Lee <alee@redhat.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Two bugs in run_keycloak_setup.yml:
1. The 'until' condition wrapped its expression in {{ }} delimiters,
   which Ansible forbids in conditionals (causes a parse error).
2. map(attribute='metadata.labels') returns a dict per resource;
   select('match', ...) cannot regex-match a dict, causing
   'dict object has no attribute labels' at runtime.

Fix by removing the {{ }} and using dict2items + flatten to extract
label keys before applying the regex selector.
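
A rough Python analogue of the second fix, assuming each resource is a plain dictionary: the label keys are flattened into one list of strings first, and only then matched against the regex. The sample resources and the label prefix are made up for illustration.

```python
import re


def label_keys(resources):
    """Collect the label keys of every resource into one flat list of strings."""
    keys = []
    for resource in resources:
        labels = resource.get("metadata", {}).get("labels") or {}
        keys.extend(labels.keys())
    return keys


def has_label_matching(resources, pattern):
    """True if any label key matches the pattern (anchored at the start)."""
    return any(re.match(pattern, key) for key in label_keys(resources))


# Hypothetical resources; the second one has no labels at all.
resources = [
    {"metadata": {"labels": {"operators.coreos.com/example-operator.ns": ""}}},
    {"metadata": {}},
]
```

Matching against the flattened keys avoids handing a whole dictionary to the regex test, which is the failure the commit message describes.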

Signed-off-by: Ade Lee <alee@redhat.com>
Co-Authored-By: Claude <noreply@anthropic.com>
…re writing

The ansible.builtin.copy task that writes keystone_federation.yaml fails
if the destination directory does not yet exist. Add an explicit
ansible.builtin.file task (state: directory) immediately before the two
copy tasks so the directory is created on demand.

Signed-off-by: Ade Lee <alee@redhat.com>
Co-Authored-By: Claude <noreply@anthropic.com>
…test

The customServiceConfig patch that adds 'openid' to Keystone's
[auth] methods is applied during the control-plane kustomize deploy
(stage 5). By the time the leaf control-plane post_stage_run hooks
execute (including federation-post-deploy.yml), Keystone may not
have finished reconciling with the new config.

Domain/IdP/mapping/protocol creation succeed because they use the
existing password auth path; only get-token.sh (which authenticates
via openid) fails with HTTP 401 'unsupported method'.

Add a wait-for-Ready loop on the KeystoneAPI CR at the start of
hook_post_deploy.yml (retries=30, delay=20s = up to 10 min) so
the auth test only runs once Keystone has restarted with federation
configuration active.
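
The wait loop can be pictured with the following Python sketch. It assumes the CR exposes a `Ready` entry under `status.conditions` and that a caller supplies `fetch_cr`, a hypothetical function returning the current CR as a dictionary; the real hook uses Ansible retries rather than this code.

```python
import time


def is_ready(cr):
    """True if the CR carries a Ready condition with status "True"."""
    conditions = cr.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
    )


def wait_until_ready(fetch_cr, retries=30, delay=20.0, sleep=time.sleep):
    """Poll fetch_cr() until it reports Ready; return the attempt number.

    Worst-case wait is bounded by retries * delay seconds
    (30 * 20 s = 10 min with the defaults above).
    """
    for attempt in range(1, retries + 1):
        if is_ready(fetch_cr()):
            return attempt
        if attempt < retries:
            sleep(delay)
    raise TimeoutError(f"CR not Ready after {retries} attempts")
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.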

Signed-off-by: Ade Lee <alee@redhat.com>
Co-Authored-By: Claude <noreply@anthropic.com>
The kustomizations/controlplane/ directory is only consumed by the
edpm_prepare / ci_kustomize flow (CRC/devscripts deployments). In the
kustomize_deploy flow used by SKMO (deploy-architecture.sh), nothing
reads that directory, so the keystone_federation.yaml file was written
but never applied - leaving the OSCP unmodified.

Add Step 6 to hook_controlplane_config.yml that:
1. Checks whether the OpenStackControlPlane CR already exists.
2. If so, patches it directly via kubernetes.core.k8s (state: patched)
   with the httpdCustomization, customServiceConfig (openid methods),
   and (in dedicated-secret mode) spec.tls.caBundleSecretName.

The kustomization file is still written for backward compatibility with
deployments that use edpm_prepare (CRC/devscripts flow). The direct
patch is a no-op when the OSCP does not yet exist (fresh install with
CRC flow), making both paths safe.

Signed-off-by: Ade Lee <alee@redhat.com>
Co-Authored-By: Claude <noreply@anthropic.com>
When deploy-architecture.sh is re-run against an existing deployment,
the federation domain, identity provider, mapping, group, project and
protocol may already exist in Keystone. The plain 'openstack X create'
commands fail with HTTP 409 Conflict in that case.

Fix by checking for the existence of each resource with 'openstack X show'
(failed_when: false, changed_when: false) before attempting to create it.
The create task is only run when the show returned rc != 0 (i.e. the
resource was not found).

Role-add is repeated unconditionally with failed_when: false because
the Keystone API makes it idempotent already.
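
The show-before-create pattern can be sketched in Python against an in-memory stand-in for Keystone. `FakeKeystone`, `ConflictError` and `ensure` are hypothetical names used only to show the control flow; the actual change wraps `openstack` CLI calls in Ansible tasks.

```python
class ConflictError(Exception):
    """Stand-in for the HTTP 409 returned when a resource already exists."""


class FakeKeystone:
    """Minimal in-memory model of 'openstack X show' / 'openstack X create'."""

    def __init__(self):
        self.resources = set()

    def show(self, kind, name):
        # Return code 0 when found, non-zero otherwise, like the CLI.
        return 0 if (kind, name) in self.resources else 1

    def create(self, kind, name):
        if (kind, name) in self.resources:
            raise ConflictError(f"{kind} {name} already exists")
        self.resources.add((kind, name))


def ensure(client, kind, name):
    """Create the resource only if 'show' did not find it; report whether it changed."""
    if client.show(kind, name) != 0:
        client.create(kind, name)
        return True
    return False
```

Calling `ensure` twice for the same resource creates it once and then does nothing, so a re-run no longer hits the conflict.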

Signed-off-by: Ade Lee <alee@redhat.com>
Co-Authored-By: Claude <noreply@anthropic.com>
…ues template

The edpm-nodeset2-values template derives _vm_type by splitting the first
node name from the existing values.yaml (e.g. edpm-compute-0 -> compute).
It then uses _vm_type to find matching instances (startswith compute2-).

This creates a self-poisoning 3-run death spiral:

Run 1: nodes have git placeholder names (edpm-compute-0)
       -> _vm_type=compute -> finds compute2-* instances -> writes real
          hostnames (edpm-compute2-XXXXX-0) back to values.yaml

Run 2: nodes now have real CI hostnames (edpm-compute2-XXXXX-0)
       -> _vm_type=compute2 -> searches for compute22-* (does not exist)
       -> instances_names=[] -> writes nodes: null back to values.yaml

Run 3: nodes is null (Python None)
       -> None | default({}) returns None (default only fires for Undefined)
       -> None.keys() -> CRASH: None has no attribute keys

Fix with two changes:
1. Replace | default({}) with explicit None-safe conditional so that
   an explicit YAML null does not sneak through as Python None.
2. Strip trailing digits from the derived _vm_type so that after run 1
   rewrites node names, compute2 strips back to compute and the instance
   lookup continues to find compute2-* entries correctly on all subsequent
   runs.
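
The three runs and the two fixes can be replayed with a short Python sketch. The split on `-` and the helper names are assumptions made for illustration; the real logic lives in a Jinja template.

```python
import re


def derive_vm_type(node_name, strip_digits=True):
    """'edpm-compute-0' -> 'compute'; optionally strip trailing digits."""
    vm_type = node_name.split("-")[1]  # assumed naming scheme: edpm-<type>-...
    return re.sub(r"\d+$", "", vm_type) if strip_digits else vm_type


def matching_instances(vm_type, instance_names):
    """Instances of the second nodeset are expected to start with '<vm_type>2-'."""
    return [name for name in instance_names if name.startswith(f"{vm_type}2-")]


def safe_nodes(values):
    """Treat both a missing key and an explicit null as an empty mapping."""
    nodes = values.get("nodes")
    return nodes if nodes is not None else {}


instances = ["compute2-abcde-0", "compute-abcde-0"]  # hypothetical instance names
```

Without digit stripping, the run-2 name `edpm-compute2-abcde-0` yields `compute2` and the lookup for `compute22-` comes back empty; with stripping it yields `compute` again, and `safe_nodes` covers the case where `nodes` was already written as null.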

Signed-off-by: Ade Lee <alee@redhat.com>
Co-Authored-By: Claude <noreply@anthropic.com>
An OpenStackDataPlaneDeployment (OSDPD) is an immutable record of a single
deployment run. Once its Status.Deployed is true, the operator short-circuits
reconciliation with "Already deployed" and will never re-run jobs, even if
the referenced nodesets have since been updated with new content (e.g. new
SSH keys, new node config).

When ci-framework re-applies a deployment stage with oc apply and the OSDPD
already exists from a previous run, the operator ignores it. Meanwhile the
nodeset operator resets DeploymentReady=False because it detects that the
nodeset's generation has advanced since the last deployment. This produces a
permanent deadlock: the nodeset waits for a deployment that will never run,
and the wait condition times out after 60 minutes.

The correct model is: one OSDPD per deployment *run*, not per nodeset.

Fix by auto-generating a timestamp suffix (YYYYMMDDHHMMSS) once at the
start of the first deployment stage and appending it to the name of every
OpenStackDataPlaneDeployment resource found in the kustomize build output
before applying it. The suffix is stable within a single ansible run (so
both edpm-deployment and edpm-deployment2 share the same suffix) but
differs across runs, producing names like:

  edpm-deployment-20260313215236
  edpm-deployment-20260314093012

Old OSDPDs are left in place as an audit trail of past runs. The operator
only acts on the new CR, so the deadlock cannot occur.

The suffix can be pinned by setting cifmw_kustomize_deploy_osdpd_suffix
explicitly (useful for idempotent re-runs of the same logical deployment).
Leave it empty (the default) for automatic timestamp generation.
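
A minimal Python sketch of the renaming step, assuming the kustomize build output has already been parsed into a list of dictionaries. The function names are hypothetical; only the `YYYYMMDDHHMMSS` format, the resource kind and the pinning behaviour come from the description above.

```python
from datetime import datetime


def resolve_suffix(pinned="", now=None):
    """Use the pinned suffix if set, otherwise a YYYYMMDDHHMMSS timestamp."""
    if pinned:
        return pinned
    return (now or datetime.now()).strftime("%Y%m%d%H%M%S")


def suffix_osdpd_names(manifests, suffix):
    """Return copies of the manifests with the suffix appended to OSDPD names only."""
    renamed = []
    for doc in manifests:
        if doc.get("kind") == "OpenStackDataPlaneDeployment":
            metadata = {**doc["metadata"], "name": f"{doc['metadata']['name']}-{suffix}"}
            doc = {**doc, "metadata": metadata}
        renamed.append(doc)
    return renamed
```

Computing the suffix once and passing it to every stage is what keeps `edpm-deployment` and `edpm-deployment2` on the same suffix within a run.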

Signed-off-by: Ade Lee <alee@redhat.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Made-with: Cursor
@vakwetu vakwetu force-pushed the add-federation-to-skmo branch from 7b69e43 to b0ed8a7 Compare March 20, 2026 19:37
@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/9a5b3bdb290346f4afb91921e37419c7

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 13m 36s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 26m 28s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 35m 50s
✔️ cifmw-crc-podified-edpm-baremetal-minor-update SUCCESS in 2h 00m 16s
✔️ cifmw-pod-zuul-files SUCCESS in 29m 17s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 59s
✔️ cifmw-pod-k8s-snippets-source SUCCESS in 4m 25s
✔️ cifmw-pod-pre-commit SUCCESS in 9m 09s
cifmw-architecture-validate-hci FAILURE in 3m 54s
✔️ cifmw-molecule-ci_gen_kustomize_values SUCCESS in 5m 22s
✔️ cifmw-molecule-federation SUCCESS in 2m 03s
✔️ cifmw-molecule-kustomize_deploy SUCCESS in 4m 04s

@vakwetu
Contributor Author

vakwetu commented Mar 20, 2026

recheck

@softwarefactory-project-zuul

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/d1f9efba90624c1595998f89fea46d3e

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 13m 34s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 19m 04s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 29m 43s
✔️ cifmw-crc-podified-edpm-baremetal-minor-update SUCCESS in 1h 58m 41s
✔️ cifmw-pod-zuul-files SUCCESS in 4m 45s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 50s
✔️ cifmw-pod-k8s-snippets-source SUCCESS in 4m 32s
✔️ cifmw-pod-pre-commit SUCCESS in 7m 45s
cifmw-architecture-validate-hci FAILURE in 4m 02s
✔️ cifmw-molecule-ci_gen_kustomize_values SUCCESS in 5m 13s
✔️ cifmw-molecule-federation SUCCESS in 2m 00s
✔️ cifmw-molecule-kustomize_deploy SUCCESS in 4m 09s
