Fixed DefaultWaitTimeoutForHCPControlPlaneInMinutes and timeouts while still installing #723
Comments
Hi @chrisahl, I am a user of this provider and suspect I have the same issue when deploying clusters in bulk. My clusters show as ready, but Terraform fails with the following error: "Waiting for cluster creation finished with the error". Is that the error you see? Judging by the places in the code that emit this message, the actual error should be appended to it, but that does not seem to be the case for me. I would like to confirm that it is timeout related. On a Terraform re-run the clusters also get deleted because of the errored Terraform state, so it takes a long time to get lucky.
@willgarcia In my case the error reports the cluster state as "installing" because the wait is timing out. I think it would be good if DefaultWaitTimeoutForHCPControlPlaneInMinutes were parameterized, similar to how DefaultWaitTimeoutInMinutes can be overridden by setting a timeout in minutes (timeout = 60), because some AWS regions take longer than others to provision depending on your geographic location and the time of day or load.
https://issues.redhat.com/browse/OCM-12006 was recently opened and may help get this addressed.
@chrisahl any chance you could extract some info from the OCM issue? (I cannot access it.) This issue is still present for me today:
2025-01-30T08:01:13.419Z | Waiting for cluster creation finished with the error cluster '2gkdnejl6bp56kvp311q5ikkaal54sm9' is in state 'installing'
So terraform-redhat/rhcs made some changes in 1.6.8 that make the parameter available, but we found out the hard way that if you do not define the new variable, the Terraform module dies (they have made a fix for this, but it is not yet in a release). Since we were close to release, we pinned to 1.6.7 because that is what we tested with.
max_cluster_wait_timeout_in_minutes (Number) This value sets the maximum duration in minutes to wait for the cluster to be in a ready state.
See #858
We are attempting a bulk create of 20 ROSA clusters at a time in the same AWS account and region. Creation appears to be throttled to 13 ROSA clusters at a time, so no additional ROSA deploys start running until one of those 13 completes. This is leading to timeouts.
Is there a dynamic way to change the hard coded value of:
terraform-provider-rhcs/provider/clusterrosa/common/consts.go
Line 18 in 58b45a1
Any reason for the 20 min vs something larger? Any other suggestions for achieving higher success rates?
Thanks.