Delete all PodMonitor for instances that are already paused/hibernated #1090

Merged 1 commit on Dec 19, 2024
tembo-operator/src/cloudnativepg/hibernate.rs (57 changes: 56 additions & 1 deletion)
@@ -193,7 +193,7 @@ pub async fn reconcile_cluster_hibernation(cdb: &CoreDB, ctx: &Arc<Context>) ->
         "enabled"
     };
 
-    let cluster_annotations = cluster.metadata.annotations.unwrap_or_default();
+    let cluster_annotations = cluster.metadata.annotations.clone().unwrap_or_default();
     let hibernation_value = if cdb.spec.stop { "on" } else { "off" };
 
     // Build the hibernation patch we want to apply to disable the CNPG cluster.
@@ -229,6 +229,10 @@ pub async fn reconcile_cluster_hibernation(cdb: &CoreDB, ctx: &Arc<Context>) ->
         return Err(action);
     }
 
+    // If the CNPG cluster is already hibernated, a dangling PodMonitor may still be present.
+    // It will not be cleaned up automatically in that state, so we remove it manually.
+    cleanup_hibernated_podmonitor(ctx, namespace, name.clone(), cdb, &cluster).await?;
+
    patch_cluster_merge(cdb, ctx, patch_hibernation_annotation).await?;
    info!(
        "Toggled hibernation annotation of {} to '{}'",
@@ -402,6 +406,57 @@ async fn patch_appservice_deployments(
     Ok(())
 }
 
+/// Cleans up any dangling PodMonitor resources for a hibernated CloudNativePG (CNPG) cluster.
+///
+/// When a CNPG cluster is hibernated, a leftover PodMonitor may remain that
+/// needs manual cleanup. This function handles that cleanup.
+///
+/// # Arguments
+///
+/// * `ctx` - Operator context holding the Kubernetes client used for API operations
+/// * `namespace` - Namespace where the cluster and PodMonitor reside
+/// * `name` - Name of the cluster and its associated PodMonitor
+/// * `cdb` - Reference to the CoreDB resource
+/// * `cluster` - Reference to the CNPG Cluster resource
+///
+/// # Returns
+///
+/// * `Ok(())` if the cleanup succeeded or no action was needed
+/// * `Err(Action)` if there was an error during cleanup that requires requeuing
+///
+/// # Errors
+///
+/// Returns an error if the PodMonitor deletion fails for reasons other than the resource not existing.
+async fn cleanup_hibernated_podmonitor(
+    ctx: &Arc<Context>,
+    namespace: String,
+    name: String,
+    cdb: &CoreDB,
+    cluster: &Cluster,
+) -> Result<(), Action> {
+    if cdb.spec.stop && is_cluster_hibernated(cluster) {
+        let client = ctx.client.clone();
+        let podmonitor_api: Api<podmon::PodMonitor> = Api::namespaced(client, &namespace);
+        match podmonitor_api.delete(&name, &DeleteParams::default()).await {
+            Ok(_) => {
+                info!("Deleted PodMonitor for hibernated cluster {}", name);
+            }
+            Err(kube::Error::Api(api_err)) if api_err.code == 404 => {
+                debug!("No PodMonitor found for hibernated cluster {}", name);
+            }
+            Err(e) => {
+                warn!(
+                    "Could not delete PodMonitor for hibernated cluster {}; retrying",
+                    name
+                );
+                debug!("Caught error {}", e);
+                return Err(requeue_normal_with_jitter());
+            }
+        }
+    }
+    Ok(())
+}
+
 async fn update_pooler_instances(
     pooler: &Option<Pooler>,
     cdb: &CoreDB,
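Note on the hibernation check: `is_cluster_hibernated` is an existing helper in the operator and is not part of this diff. As a rough, self-contained sketch of what such a check could look like, assuming CNPG reports declarative hibernation through a status condition of type `cnpg.io/hibernation` (the struct shapes below are simplified stand-ins, not the operator's generated CNPG types):

    // Hypothetical sketch with simplified stand-in types; the real operator
    // works against its generated CNPG `Cluster` structs.
    struct ClusterCondition {
        r#type: String,
        status: String,
    }

    struct ClusterStatus {
        conditions: Option<Vec<ClusterCondition>>,
    }

    struct Cluster {
        status: Option<ClusterStatus>,
    }

    // Treat the cluster as hibernated once the "cnpg.io/hibernation"
    // condition is reported as "True" in the cluster status.
    fn is_cluster_hibernated(cluster: &Cluster) -> bool {
        cluster
            .status
            .as_ref()
            .and_then(|s| s.conditions.as_ref())
            .map(|conds| {
                conds
                    .iter()
                    .any(|c| c.r#type == "cnpg.io/hibernation" && c.status == "True")
            })
            .unwrap_or(false)
    }

Whatever the helper actually inspects, the cleanup in this PR only runs once both `cdb.spec.stop` is set and the cluster reports itself hibernated, so a PodMonitor is never deleted from a running instance.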