Add Troubleshooting Guide and Improve Installer Script Variable Validation #234
base: devel
Changes from all commits
4cbd02c
ca1e981
9c128e1
923961a
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,183 @@ | ||
|
|
||
| # OpenShift on IBM PowerVS: Common Issues and Resolutions | ||
|
|
||
| This document lists common issues encountered when deploying OpenShift on IBM PowerVS using the `openshift-install-powervs` wrapper, along with their causes and resolutions. | ||
|
|
||
| --- | ||
|
|
||
| ## Terraform Stored Resource IDs | ||
|
|
||
| **Error:** | ||
|
|
||
| Error: cannot find resource with id <resource-id> | ||
|
|
||
| **Cause:** | ||
| Terraform retains deleted PowerVS resource IDs in its state or backup files. This often occurs after a Terraform rerun when instances or resources have changed in PowerVS. | ||
|
|
||
|
|
||
| **Resolution:** | ||
|
|
||
| Search for the stale ID in Terraform state or backup files: | ||
|
|
||
| ```bash | ||
| grep -R "<resource-id>" . | ||
| ``` | ||
|
|
||
| Remove stale state entries: | ||
|
|
||
| ```bash | ||
|
|
||
| terraform state rm <resource-name> | ||
| ``` | ||
|
|
||
| Re-run the apply: | ||
|
|
||
| ```bash | ||
| terraform apply | ||
| ``` | ||
|
|
||
| To rebuild specific worker or master nodes: | ||
|
|
||
| ```bash | ||
| terraform taint module.nodes.ibm_pi_instance.worker[0] | ||
| terraform apply | ||
| ``` | ||
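Suggestion (a minimal sketch, not part of the PR): the search-then-remove steps above could be previewed with a small helper that prints the `terraform state rm` commands instead of running them. It relies on `terraform state list -id=<id>`, which filters on an exact match of the resource `id` attribute, so a partial ID may not match and the `grep` step may still be needed. The function name `print_stale_state_rm` and the `TF` variable default are assumptions made for illustration.

```shell
# Sketch: print (do not run) the `state rm` commands for every state address
# whose "id" attribute equals the stale ID. TF is an assumed override for the
# terraform binary; it defaults to "terraform".
TF="${TF:-terraform}"

print_stale_state_rm() {
  stale_id="$1"
  # `state list -id=...` only returns resources whose id matches exactly.
  "$TF" state list -id="$stale_id" | while IFS= read -r addr; do
    if [ -n "$addr" ]; then
      printf "%s state rm '%s'\n" "$TF" "$addr"
    fi
  done
}

# Usage (review the printed commands before running any of them):
#   print_stale_state_rm "<resource-id>"
```

Printing rather than executing keeps the state untouched until the addresses have been reviewed.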
|
|
||
| ## Bastion Node OS Compatibility | ||
|
|
||
| If you get errors about missing packages or an incorrect storage type while using CentOS 10 on the bastion node, switch to CentOS Stream 9. | ||
|
|
||
| ### Common Issues and Fixes | ||
|
|
||
| #### Missing Required Packages (e.g. Ansible) | ||
|
|
||
| **Error**: | ||
| Missing ansible or dependency packages during setup. | ||
|
|
||
| **Resolution**: | ||
| SSH into the bastion node using the generated key, then install Ansible: | ||
| `ssh -i id_rsa root@<bastion-external-ip>` | ||
| `sudo dnf install ansible` | ||
|
|
||
| - Note: if the above does not work, you can also install Ansible using Python and pip. | ||
|
|
||
| **Error** | ||
| Incorrect Storage Type (e.g. "nfs" not recognized) | ||
|
|
||
| Error: "pi_volume_type" must contain a value from ["ssd", "standard", "tier1", "tier3"], got "" | ||
|
Comment on lines +65 to +67
Comment: Can you share where you hit this? It's probably a bug in the tfvars.
Reply: This occurred on the pi_volume_type for the bastion storage type; please see the code below for the full error:
||
|
|
||
|
|
||
| **Resolution**: | ||
| Edit your variables.tf or corresponding .tfvars file: | ||
| bastion_storage_type = "tier3" | ||
|
|
||
| - If needed, change the default `bastion_storage_type` in variables.tf to the storage type you want. | ||
| - Note: you can find it quickly by pressing CTRL + W (search in nano) and searching for `bastion_storage_type`. | ||
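Suggestion (a hedged sketch): since the PR also touches variable validation, the storage type could be checked before Terraform runs. The accepted values below are copied from the provider error message quoted above and may differ across provider versions. The function names are hypothetical and do not exist in the wrapper.

```shell
# Sketch: validate a storage type against the values listed in the
# provider error message ("ssd", "standard", "tier1", "tier3").
is_valid_volume_type() {
  case "$1" in
    ssd|standard|tier1|tier3) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper that reports the result in a readable form.
check_bastion_storage_type() {
  if is_valid_volume_type "$1"; then
    echo "bastion_storage_type '$1' looks valid"
  else
    echo "bastion_storage_type '$1' is not one of: ssd standard tier1 tier3" >&2
    return 1
  fi
}

# Usage:
#   check_bastion_storage_type "tier3"
```

An empty value, as in the error above (`got ""`), fails the check, which would surface the problem before the provider does.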
|
|
||
|
|
||
| ## Re-installation / Network Name Conflict | ||
|
|
||
| **Error:** | ||
|
|
||
| Error: Network with name "ocp-net" already exists. | ||
|
|
||
|
|
||
| **Cause:** | ||
| On a subsequent UPI install attempt, Terraform tries to create a network with the same name that already exists. | ||
| PowerVS does not allow duplicate network names—even if the old network is inactive. | ||
|
|
||
| **Resolution:** | ||
|
|
||
| - Log into your PowerVS workspace. | ||
|
|
||
| - Delete or rename the existing ocp-net network or subnet. | ||
|
|
||
| - Re-run the installer: | ||
| ```bash | ||
| ./openshift-install-powervs create | ||
| ``` | ||
|
|
||
| ## Remote-Exec Provisioning Errors | ||
|
|
||
| **Error:** | ||
|
|
||
| Terraform remote-exec provisioner failures | ||
|
|
||
|
|
||
| **Cause:** | ||
| These are transient SSH or remote-execution issues that occur during provisioning. | ||
|
|
||
| **Resolution:** | ||
| Re-run Terraform: | ||
|
|
||
| terraform apply | ||
|
|
||
|
|
||
| This typically resolves the issue automatically. | ||
| See the ocp4-upi-powervs [known issues](https://github.com/ocp-power-automation/ocp4-upi-powervs/blob/release-4.6/docs/known_issues.md) for more details. | ||
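Suggestion (a minimal sketch under stated assumptions): because these failures are described as transient, the re-run could be wrapped in a bounded retry loop. `retry_cmd` is a hypothetical helper, not the wrapper's own retry utility. The retry count and delay are illustrative; `NO_OF_RETRY` is reused here only because the script already reads it.

```shell
# Sketch: run a command up to max_attempts times, sleeping between tries.
# Arguments: max_attempts delay_seconds command [args...]
retry_cmd() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Usage (assumed values):
#   retry_cmd "${NO_OF_RETRY:-5}" 30 terraform apply -auto-approve
```

A bounded loop avoids retrying forever when the failure is not actually transient.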
|
|
||
| ## LPAR in WARNING State | ||
|
|
||
| **Error:** | ||
|
|
||
| Error: the operation cannot be performed when the lpar health in the WARNING State | ||
|
|
||
|
|
||
| **Cause:** | ||
| Terraform cannot modify instances whose PowerVS LPAR health is in WARNING state. | ||
| This often occurs after partial provisioning, failed networking setup, or API timeouts. | ||
|
|
||
| **Resolution:** | ||
|
|
||
| Check instance health: | ||
| ```bash | ||
| ibmcloud pi instance get <INSTANCE_ID> | ||
| ``` | ||
| **Note**: Because the RSCT daemon is not available for RHCOS, RHCOS instances can show a "Warning" status in the dashboard. This is expected and can be ignored. | ||
|
|
||
| In the console, reboot the affected instance by performing an OS shutdown and then restarting it. | ||
|
|
||
| To rebuild only specific nodes: | ||
| ```bash | ||
|
|
||
| terraform taint module.nodes.ibm_pi_instance.master[1] | ||
| terraform taint module.nodes.ibm_pi_instance.worker[0] | ||
| terraform apply | ||
| ``` | ||
|
|
||
| ## Missing or Outdated Images (RHEL / RHCOS) | ||
|
|
||
| **Error:** | ||
|
|
||
| Error: failed to perform Get Image Operation for image rhcos-4.15 | ||
| [pcloudCloudinstancesImagesGetNotFound] image does not exist. ID: rhcos-4.15 | ||
|
|
||
| **Cause:** | ||
| Terraform and the PowerVS provider reference image names (e.g. rhcos-4.15, rhel-8.3) that may not exist in your workspace. | ||
| The wrapper may also use the RHEL version for RHCOS images by mistake. | ||
|
|
||
| **Resolution:** | ||
|
|
||
| Option 1 — Import Pre-built Images | ||
|
|
||
| Use pre-built RHCOS and RHEL OVA images from IBM’s public repository. | ||
| See [Christy Norman’s blog](https://community.ibm.com/community/user/blogs/christy-norman/2024/08/06/import-pre-built-red-hat-coreos-ovas-into-powervs) for steps. | ||
|
|
||
| Option 2 — Update variables.tf | ||
|
|
||
| Set available image names manually: | ||
| ```hcl | ||
|
|
||
| variable "rhel_image_name" { | ||
| default = "rhel-9.6" | ||
| } | ||
|
|
||
| variable "rhcos_image_name" { | ||
| default = "rhcos-4.19" | ||
| } | ||
| ``` | ||
| Option 3 — Export Versions Before Running | ||
| `export RELEASE_VER=4.9` | ||
|
|
||
| Ensure RHEL and RHCOS versions are aligned and available. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -159,6 +159,67 @@ function output { | |
| $TF output "$output_var" | ||
| } | ||
|
|
||
| #------------------------------------------------------------------------- | ||
| # Check for required environment variables and display helpful information | ||
| #------------------------------------------------------------------------- | ||
| function check_required_env_vars { | ||
| local missing_vars=0 | ||
|
|
||
| log "Checking required environment variables..." | ||
|
|
||
| # Check IBMCLOUD_API_KEY | ||
| if [[ -z "${IBMCLOUD_API_KEY}" ]]; then | ||
| warn "IBMCLOUD_API_KEY is not set" | ||
| echo " Description: IBM Cloud API key for authentication" | ||
| echo " How to set: export IBMCLOUD_API_KEY='your-api-key-here'" | ||
| echo "" | ||
| missing_vars=1 | ||
| fi | ||
|
|
||
| # Check RELEASE_VER (optional since we have default) | ||
| if [[ -z "${RELEASE_VER}" ]]; then | ||
| warn "RELEASE_VER is not set (will use default: 4.15; export RELEASE_VER to use a different version)" | ||
| echo " Description: OpenShift release version to install" | ||
| echo " Default: 4.15" | ||
| echo " How to set: export RELEASE_VER='4.16'" | ||
| echo "" | ||
| else | ||
| log "Using release version: ${RELEASE_VER} (to change, run: export RELEASE_VER='<version>')" | ||
| fi | ||
|
|
||
| # Check RHEL_SUBS_PASSWORD (optional) | ||
| if [[ -z "${RHEL_SUBS_PASSWORD}" ]]; then | ||
| warn "RHEL_SUBS_PASSWORD is not set" | ||
| echo " Description: RHEL subscription password for bastion nodes" | ||
| echo " Note: You can provide this during the 'variables' prompt or set it now" | ||
| echo " How to set: export RHEL_SUBS_PASSWORD='your-password-here'" | ||
| echo "" | ||
| fi | ||
|
|
||
| # Check NO_OF_RETRY (optional) | ||
| if [[ -z "${NO_OF_RETRY}" ]]; then | ||
| log "NO_OF_RETRY not set (using default: 5)" | ||
| else | ||
| log "Using retry count: ${NO_OF_RETRY}" | ||
| fi | ||
|
|
||
| # Check ARTIFACTS_VERSION (optional) | ||
| if [[ -z "${ARTIFACTS_VERSION}" ]]; then | ||
| log "ARTIFACTS_VERSION not set (using default: main)" | ||
| else | ||
| log "Using artifacts version: ${ARTIFACTS_VERSION}" | ||
| fi | ||
|
|
||
| echo "" | ||
|
|
||
| if [[ $missing_vars -eq 1 ]]; then | ||
| error "Required environment variables are missing. Please set them and try again." | ||
| fi | ||
|
|
||
| success "Environment variable check completed" | ||
| } | ||
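Suggestion (a hedged sketch, not the PR's implementation): if this check is kept, the required-variable part could be reduced to a loop over names, which makes adding a variable a one-word change. `check_required_vars` is a hypothetical name, and the message format is illustrative and does not use the script's `warn` or `error` helpers.

```shell
# Sketch: return non-zero if any named environment variable is unset or empty.
# Names must be trusted literals, since they are expanded through eval.
check_required_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "WARN: $name is not set (export $name='...')" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_required_vars IBMCLOUD_API_KEY || exit 1
```

Optional variables with defaults would stay outside this loop, since they only need an informational log line.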
|
Comment on lines +162 to +220
Comment: Same comment as above.
||
|
|
||
|
|
||
| #------------------------------------------------------------------------- | ||
| # Util for retrying any command, special case for curl downloads | ||
| #------------------------------------------------------------------------- | ||
|
|
@@ -1694,6 +1755,9 @@ function main { | |
|
|
||
| [[ -z "$ACTION" ]] && help | ||
| platform_checks | ||
| if [[ "$ACTION" != "help" ]]; then | ||
| check_required_env_vars | ||
| fi | ||
|
Comment on lines +1758 to +1760
Comment: IMO this function is not required,
||
| setup_tools | ||
|
|
||
| case "$ACTION" in | ||
|
|
||
I don't think we should recommend this to customers.