2 changes: 2 additions & 0 deletions README.md
@@ -184,6 +184,8 @@ You'll need to place the file in the install directory and name it as **pull-sec

```

**Note**: If you encounter terraform-related errors during the create command, see ["Known Issues & Troubleshooting"](https://github.com/ocp-power-automation/ocp4-upi-powervs/blob/release-4.6/docs/known_issues.md) and ["TroubleShooting Document"](docs/troubleShooting.md)

## Advanced Usage

Before running the script, you may choose to override some environment variables as per your requirement.
183 changes: 183 additions & 0 deletions docs/troubleShooting.md
@@ -0,0 +1,183 @@

# OpenShift on IBM PowerVS: Common Issues and Resolutions

This document lists common issues encountered when deploying OpenShift on IBM PowerVS using the `openshift-install-powervs` wrapper, along with their causes and resolutions.

---

## Terraform Stored Resource IDs

**Error:**

Error: cannot find resource with id <resource-id>

**Cause:**
Terraform retains deleted PowerVS resource IDs in its state or backup files. This often occurs after a Terraform rerun when instances or resources have changed in PowerVS.


**Resolution:**

Search for the stale ID in Terraform state or backup files:

```bash
grep -R "<resource-id>" .
```
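A plain `grep -R` also matches logs and unrelated files. As a minimal sketch, the search can be narrowed to Terraform state and backup files. The helper name `find_stale_id` is hypothetical, and the file patterns assume the default `*.tfstate` and `*.tfstate.backup` naming:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the wrapper): list the Terraform state and
# backup files under a directory that mention a given resource ID.
find_stale_id() {
  local id="$1"
  local dir="${2:-.}"
  # -r: recurse, -l: print only file names, -F: treat the ID as a fixed string.
  # The --include patterns assume default Terraform state file naming.
  grep -rlF --include='*.tfstate' --include='*.tfstate.backup' -- "$id" "$dir"
}

# Example: find_stale_id "<resource-id>" .
```

The function returns a non-zero status when no state file mentions the ID, so it can be used directly in an `if` condition.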

Remove stale state entries:

```bash
terraform state rm <resource-name>
```

Comment on lines +29 to +30
Contributor

Suggested change
terraform state rm <resource-name>

Contributor

I don't think we should recommend this to customers.

Re-run the apply:

```bash
terraform apply
```

To rebuild specific worker or master nodes:

```bash
# Quote the address so the shell does not treat [0] as a glob pattern.
terraform taint 'module.nodes.ibm_pi_instance.worker[0]'
terraform apply
```
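Recent Terraform releases (v0.15.2 and later) deprecate `terraform taint` in favor of `terraform apply -replace=<address>`. The sketch below only builds the `-replace` arguments for a list of resource addresses; the helper name `replace_args` is hypothetical, and the `terraform` call is left as a comment so you can review the arguments first:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one "-replace=<address>" argument per line
# for each resource address passed in.
replace_args() {
  local addr
  for addr in "$@"; do
    printf -- '-replace=%s\n' "$addr"
  done
}

# Collect the arguments into an array and show the resulting command line.
mapfile -t args < <(replace_args 'module.nodes.ibm_pi_instance.worker[0]')
echo "terraform apply ${args[*]}"
# To run it for real (assumes Terraform v0.15.2 or later):
#   terraform apply "${args[@]}"
```

Check which Terraform version the wrapper installed before relying on `-replace`; on older versions the `taint` commands above still apply.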

## Bastion Node OS Compatibility

If you see missing-package or incorrect-storage-type errors when the bastion node runs CentOS 10, switch to CentOS Stream 9.

### Common Issues and Fixes

#### Missing Required Packages (e.g. Ansible)

**Error**:
Missing ansible or dependency packages during setup.

**Resolution**:
SSH into the bastion node using the generated key, then install Ansible:

- `ssh -i id_rsa root@<bastion-external-ip>`
- `sudo dnf install ansible`

**Note**: If the `dnf` install does not work, you can install Ansible with Python and `pip` instead.

#### Incorrect Storage Type (e.g. "nfs" not recognized)

**Error**:

Error: "pi_volume_type" must contain a value from ["ssd", "standard", "tier1", "tier3"], got ""
Comment on lines +65 to +67
Contributor

Can you share where you hit this? It's probably a bug in the tfvars.

Author

This error occurred on `pi_volume_type` for the bastion storage type. Please see the full error below:

Error: "pi_volume_type" must contain a value from []string{"ssd", "standard", "tier1", "tier3"}, got ""
│   with module.prepare.ibm_pi_volume.volume[0],
│   on modules/1_prepare/prepare.tf line 87, in resource "ibm_pi_volume" "volume":
│   87:   pi_volume_type       = local.bastion_storage_type



**Resolution**:
Set the storage type in your `variables.tf` or the corresponding `.tfvars` file:
`bastion_storage_type = "tier3"`

- If needed, change the default `bastion_storage_type` in `variables.tf` to the storage type you want.
- Note: in an editor such as `nano`, you can find the setting quickly by pressing CTRL + W and searching for `bastion_storage_type`.
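
The error above fires when the value is empty or not one of the four accepted types. As a minimal sketch, a value can be checked before running the installer. The function name `is_valid_volume_type` is hypothetical, and the accepted list is copied from the error message, so it may differ across provider versions:

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: succeed only for the volume types listed in the
# provider error message ("ssd", "standard", "tier1", "tier3").
is_valid_volume_type() {
  case "$1" in
    ssd|standard|tier1|tier3) return 0 ;;
    *) return 1 ;;
  esac
}

# Example check for a candidate value; an empty string or "nfs" would fail.
candidate="tier3"
if is_valid_volume_type "$candidate"; then
  echo "bastion_storage_type '$candidate' is accepted"
else
  echo "bastion_storage_type '$candidate' is not accepted" >&2
fi
```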


## Re-installation / Network Name Conflict

**Error:**

Error: Network with name "ocp-net" already exists.


**Cause:**
On a subsequent UPI install attempt, Terraform tries to create a network with the same name that already exists.
PowerVS does not allow duplicate network names—even if the old network is inactive.

**Resolution:**

- Log into your PowerVS workspace.

- Delete or rename the existing ocp-net network or subnet.

- Re-run the installer:
```bash
./openshift-install-powervs create
```

## Remote-Exec Provisioning Errors

**Error:**

Terraform remote-exec provisioner failures


**Cause:**
These are transient SSH or remote-execution issues that occur during provisioning.

**Resolution:**
Re-run `terraform apply`.

This typically resolves the issue. See the ["OCP Known issues"](https://github.com/ocp-power-automation/ocp4-upi-powervs/blob/release-4.6/docs/known_issues.md) document in ocp4-upi-powervs for more details.
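
Because these failures are transient, the re-run can be automated with a small retry loop. This is a minimal sketch, not the wrapper's own retry utility: the function name `retry` and the `RETRY_DELAY` variable are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical retry helper: run a command up to <max> times, pausing
# RETRY_DELAY seconds (default 5) between failed attempts.
retry() {
  local max="$1"
  shift
  local attempt=1
  until "$@"; do
    if (( attempt >= max )); then
      echo "Command failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "Attempt ${attempt}/${max} failed; retrying..." >&2
    attempt=$(( attempt + 1 ))
    sleep "${RETRY_DELAY:-5}"
  done
}

# Example: retry 5 terraform apply
```

A fixed delay keeps the sketch short. Longer outages may call for a growing delay between attempts.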

## LPAR in WARNING State

**Error:**

Error: the operation cannot be performed when the lpar health in the WARNING State


**Cause:**
Terraform cannot modify instances whose PowerVS LPAR health is in WARNING state.
This often occurs after partial provisioning, failed networking setup, or API timeouts.

**Resolution:**

Check instance health:
```bash
ibmcloud pi instance get <INSTANCE_ID>
```
**Note**: Because the RSCT daemon is not available for RHCOS, RHCOS instances can show a "Warning" status in the dashboard. This is expected and can be ignored.

In the console, reboot the affected instances by performing an OS shutdown and then restarting them.

To rebuild only specific nodes:
```bash
# Quote the addresses so the shell does not treat [1] and [0] as glob patterns.
terraform taint 'module.nodes.ibm_pi_instance.master[1]'
terraform taint 'module.nodes.ibm_pi_instance.worker[0]'
terraform apply
```

## Missing or Outdated Images (RHEL / RHCOS)

**Error:**

Error: failed to perform Get Image Operation for image rhcos-4.15
[pcloudCloudinstancesImagesGetNotFound] image does not exist. ID: rhcos-4.12

**Cause:**
Terraform and the PowerVS provider reference image names (e.g. rhcos-4.15, rhel-8.3) that may not exist in your workspace.
The wrapper may also use the RHEL version for RHCOS images by mistake.

**Resolution:**

Option 1: Import Pre-built Images

Use pre-built RHCOS and RHEL OVA images from IBM’s public repository.
See [Christy Norman’s blog](https://community.ibm.com/community/user/blogs/christy-norman/2024/08/06/import-pre-built-red-hat-coreos-ovas-into-powervs) for steps.

Option 2: Update variables.tf

Set available image names manually:
```hcl
variable "rhel_image_name" {
  default = "rhel-9.6"
}

variable "rhcos_image_name" {
  default = "rhcos-4.19"
}
```
Option 3: Export Versions Before Running

`export RELEASE_VER=4.9`

Ensure RHEL and RHCOS versions are aligned and available.
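
One way to keep the versions aligned is to derive the RHCOS image name from the release version rather than typing both by hand. The sketch below assumes images are named `rhcos-<major.minor>`, as in the examples above. Your workspace may use different names, and the function name `rhcos_image_for_release` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an RHCOS image name from a release version,
# assuming the "rhcos-<major.minor>" naming used in the examples above.
rhcos_image_for_release() {
  local ver="$1"
  # Accept only versions of the form <major>.<minor>, such as 4.15.
  if [[ ! "$ver" =~ ^[0-9]+\.[0-9]+$ ]]; then
    echo "Unexpected release version: '$ver'" >&2
    return 1
  fi
  echo "rhcos-${ver}"
}

# Example: rhcos_image_for_release "$RELEASE_VER"
```

Confirm that the resulting name exists in your PowerVS workspace before starting the install.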
64 changes: 64 additions & 0 deletions openshift-install-powervs
@@ -159,6 +159,67 @@ function output {
$TF output "$output_var"
}

#-------------------------------------------------------------------------
# Check for required environment variables and display helpful information
#-------------------------------------------------------------------------
function check_required_env_vars {
  missing_vars=0

  log "Checking required environment variables..."

  # Check IBMCLOUD_API_KEY (required)
  if [[ -z "${IBMCLOUD_API_KEY}" ]]; then
    warn "IBMCLOUD_API_KEY is not set"
    echo " Description: IBM Cloud API key for authentication"
    echo " How to set: export IBMCLOUD_API_KEY='your-api-key-here'"
    echo ""
    missing_vars=1
  fi

  # Check RELEASE_VER (optional since we have default)
  if [[ -z "${RELEASE_VER}" ]]; then
    warn "RELEASE_VER is not set (will use default: 4.15; export the correct RHCOS version to override)"
    echo " Description: OpenShift release version to install"
    echo " Default: 4.15"
    echo " How to set: export RELEASE_VER='4.16'"
    echo ""
  else
    log "Using RHCOS release version: ${RELEASE_VER} (to change it, run: export RELEASE_VER='<version>')"
  fi

  # Check RHEL_SUBS_PASSWORD (optional)
  if [[ -z "${RHEL_SUBS_PASSWORD}" ]]; then
    warn "RHEL_SUBS_PASSWORD is not set"
    echo " Description: RHEL subscription password for bastion nodes"
    echo " Note: You can provide this during the 'variables' prompt or set it now"
    echo " How to set: export RHEL_SUBS_PASSWORD='your-password-here'"
    echo ""
  fi

  # Check NO_OF_RETRY (optional)
  if [[ -z "${NO_OF_RETRY}" ]]; then
    log "NO_OF_RETRY not set (using default: 5)"
  else
    log "Using retry count: ${NO_OF_RETRY}"
  fi

  # Check ARTIFACTS_VERSION (optional)
  if [[ -z "${ARTIFACTS_VERSION}" ]]; then
    log "ARTIFACTS_VERSION not set (using default: main)"
  else
    log "Using artifacts version: ${ARTIFACTS_VERSION}"
  fi

  echo ""

  # Report all missing required variables at once, then stop.
  if [[ $missing_vars -eq 1 ]]; then
    error "Required environment variables are missing. Please set them and try again."
  fi

  success "Environment variable check completed"
}
Comment on lines +162 to +220
Collaborator
@Prajyot-Parab Prajyot-Parab Oct 27, 2025

same comment as above.


#-------------------------------------------------------------------------
# Util for retrying any command, special case for curl downloads
#-------------------------------------------------------------------------
@@ -1694,6 +1755,9 @@ function main {

[[ -z "$ACTION" ]] && help
platform_checks
if [[ "$ACTION" != "help" ]]; then
check_required_env_vars
fi
Comment on lines +1758 to +1760
Collaborator

IMO this function is not required; the precheck_input function already exists. (Also, almost all ENV vars have default values, and appropriate guiding doc instructions are available.)

setup_tools

case "$ACTION" in