Conversation
…mber - when serviceAccountAnnotations contains iam.gke.io/gcp-service-account, the operator creates an IAMPolicyMember CR to bind the K8s service account to the GCP service account via roles/iam.workloadIdentityUser - GCP project ID is extracted from the SA email - if the IAMPolicyMember CRD is absent (Config Connector not installed), the operator sets a clear NotReady status rather than crashing - canary Job validates Workload Identity token from the GKE metadata server before allowing the StatefulSet to proceed - extracted shared CanaryConfig, canary_job_spec, and check_canary_conditions helpers to deduplicate canary logic between AWS and GCP - added IAMPolicyMember CRD types (hand-written minimal subset) - added conditional IAMPolicyMember and canary Job watchers in the controller - added unconditional RBAC rules for iampolicymembers.iam.cnrm.cloud.google.com - new unit tests covering GCP helpers and shared canary logic - added release notes
- run_canary_job now returns Result<CanaryResult, Error> instead of Result<(), Error> so callers can tell if the job succeeded or is still running - fixes a bug where a completed canary was treated as pending, causing the operator to loop indefinitely on NotReady - also fixes Config Connector IAMServiceAccount external ref format to use projects/PROJECT/serviceAccounts/EMAIL this appears to touch AWS code but it's only due to them sharing some common code, it's just a return type change.
| - resources: | ||
| - iampolicymembers | ||
| verbs: | ||
| - get | ||
| - list | ||
| - watch | ||
| - create | ||
| - patch | ||
| - delete | ||
| apiGroups: | ||
| - iam.cnrm.cloud.google.com |
There was a problem hiding this comment.
PodIdentityAssociation RBAC is conditional on .Values.awsPodIdentityAssociationCluster. Is it worth introducing a similar flag (e.g., .Values.gcpConfigConnector) for GCP? I believe the rules are harmless when the CRD doesn't exist so this is largely cosmetic.
There was a problem hiding this comment.
i did test that it was harmless and therefore left it in. but you're right we should do the same.
| .and_then(|s| s.service_account_annotations.as_ref()) | ||
| .and_then(|a| a.get("iam.gke.io/gcp-service-account")); | ||
|
|
||
| if let Some(gcp_sa_email) = gcp_sa_email { |
There was a problem hiding this comment.
We don't clean up removed iam.gke.io/gcp-service-account annotations. The AWS PIA block explicitly deletes the PodIdentityAssociation and canary job. As is, I think removing the annotation would leave IAMPolicyMember and the restate-wi-canary job orphaned. Does Config Connector handle IAMPolicyMember removal if not referenced? Either way, we should probably delete the job if sa is unset.
} else {
delete_job(name, &job_api, "restate-wi-canary").await?;
// also IAMPolicyMember cleanup if necessary
}There was a problem hiding this comment.
the operator owns the job and IAMPolicyMember via k8s owner references, so would be cleaned up when the parent CR is removed. i figured that was enough, wdyt?
There was a problem hiding this comment.
actually i think you're right. i'll clean it up
| // The eks-pod-identity-token volume is visible immediately at pod creation. | ||
| let name = config.name; | ||
| if let Ok(pods) = pod_api | ||
| .list(&ListParams::default().labels(&format!("batch.kubernetes.io/job-name={name}"))) |
There was a problem hiding this comment.
I don't quite understand the implications of removing the UID filter, but the code originally included batch.kubernetes.io/controller-uid={} - was it intentional? Is there a difference in behavior?
There was a problem hiding this comment.
this was a casualty of the refactor to reuse most of the existing job code so it could be shared, mostly, across providers. the UID wasn't available in the shared code path post-refactor. if there were somehow two jobs with the same name it might grab the wrong one but since no two in the same namespace can have the same name, i think it's fine. might be worth a clarifying comment though perhaps?
| if let Some(status) = &ipm.status | ||
| && let Some(conditions) = &status.conditions | ||
| && let Some(ready) = conditions.iter().find(|cond| cond.r#type == "Ready") | ||
| && ready.status == "True" | ||
| { | ||
| true | ||
| } else { | ||
| false | ||
| } |
There was a problem hiding this comment.
Nit: I think this can just be:
| if let Some(status) = &ipm.status | |
| && let Some(conditions) = &status.conditions | |
| && let Some(ready) = conditions.iter().find(|cond| cond.r#type == "Ready") | |
| && ready.status == "True" | |
| { | |
| true | |
| } else { | |
| false | |
| } | |
| ipm.status | |
| .as_ref() | |
| .and_then(|s| s.conditions.as_ref()) | |
| .and_then(|cs| cs.iter().find(|c| c.r#type == "Ready")) | |
| .is_some_and(|c| c.status == "True") |
There was a problem hiding this comment.
nice, thanks! i'll do this
- added cleanup of IAMPolicyMember and canary Job when annotation is removed - added comment explaining why UID filter isn't needed on pod lookup - simplified is_iam_policy_member_ready to functional chain style - made IAMPolicyMember RBAC conditional on gcpConfigConnector Helm value
this is basically the GCP equivalent of what we're doing for AWS with PodIdentityAssociations.
serviceAccountAnnotationscontainsiam.gke.io/gcp-service-account,the operator creates an
IAMPolicyMemberCR to bind the K8s service accountto the GCP service account via
roles/iam.workloadIdentityUserIAMPolicyMemberCRD is absent (Config Connector not installed), theoperator sets a clear
NotReadystatus rather than crashingbefore allowing the
StatefulSetto proceedCanaryConfig,canary_job_spec, andcheck_canary_conditionshelpers to deduplicate canary logic between AWS and GCP
IAMPolicyMemberCRD types (hand-written minimal subset)IAMPolicyMemberand canary Job watchers in the controlleriampolicymembers.iam.cnrm.cloud.google.comi tested this by:
iam.gke.io/gcp-service-account in serviceAccountAnnotationsand creates anIAMPolicyMemberCR with correct member format (serviceAccount:PROJECT.svc.id.goog[NS/restate]), role(
roles/iam.workloadIdentityUser), and resource ref (projects/PROJECT/serviceAccounts/EMAIL)IAMPolicyMemberand creates the IAM binding in GCPStatefulSetcreation on canary success, then marksRestateClusterasReadyNotReadywith message "IAMPolicyMember CRD not found - is Config Connector installed?" rather than crashingRestateClustertoReadyand removes the annotation from theServiceAccount