Fixed DefaultWaitTimeoutForHCPControlPlaneInMinutes and timeouts while still installing #723

chrisahl · 2024-07-18T17:44:39Z

We are attempting to bulk-create 20 ROSA clusters at a time in the same AWS account and region. There appears to be a throttle of 13 concurrent ROSA creates, so additional ROSA deploys do not start until one of those 13 completes. As a result, we are seeing timeouts.

Is there a dynamic way to change the hard coded value of:

terraform-provider-rhcs/provider/clusterrosa/common/consts.go

Line 18 in 58b45a1

DefaultWaitTimeoutForHCPControlPlaneInMinutes = int64(20)

?

Any reason for the 20 min vs something larger? Any other suggestions for achieving higher success rates?

Thanks.

willgarcia · 2024-09-16T02:59:22Z

Hi @chrisahl

I am a user of this provider and suspect I am hitting the same issue when deploying clusters in bulk.

My clusters show in ready state but Terraform fails with the following error: "Waiting for cluster creation finished with the error".

Is that the error you see?

Judging by the places in the code that emit this message, the actual error should be appended to the end of it, but that does not seem to be the case for me. I would like to confirm that it is timeout-related.

When I re-run Terraform, the clusters also get deleted because the Terraform state is in an errored condition, so it takes a long time to get a lucky run.

chrisahl · 2024-09-16T12:11:52Z

@willgarcia In my case the error reports that the cluster is in state "installing", because the wait is timing out. I think it would be good if DefaultWaitTimeoutForHCPControlPlaneInMinutes were parameterized, similar to how DefaultWaitTimeoutInMinutes can be overridden with:

resource "rhcs_cluster_wait" "rosa_cluster" {
  cluster = rhcs_cluster_rosa_classic.rosa_sts_cluster.id
  # timeout in minutes
  timeout = 60
}

because different AWS regions take longer than others to provision, depending on your geographic location and the time of day/load.

chrisahl · 2024-10-22T18:28:53Z

https://issues.redhat.com/browse/OCM-12006 was recently opened and may help get this addressed

willgarcia · 2025-01-30T08:18:32Z

@chrisahl any chance you could share some details from the OCM issue? (I cannot access it.) This issue is still present for me today:

│ Waiting for cluster creation finished with the error cluster
| 2025-01-30T08:01:13.419Z | │ '2gkdnejl6bp56kvp311q5ikkaal54sm9' is in state 'installing'

chrisahl · 2025-01-30T14:51:29Z

terraform-redhat/rhcs made some changes in 1.6.8 that make a parameter available, but we found out the hard way that if you do not define the new variable, the Terraform module fails (they have made a fix for this, but it is not yet in a new release). Since we were close to our release, we pinned to 1.6.7 because that is what we had tested with.

max_cluster_wait_timeout_in_minutes (Number) This value sets the maximum duration in minutes to wait for the cluster to be in a ready state.

See #858
