ARO validation automation #50
base: main
Conversation
- name: publicDNS
  value: "false"
- name: jiraSecretName
  value: "jira-secret-mj"
- name: jiraIssueKey
  value: "SAPOCP-1587"
This I don't quite understand: do we plan to update this for each new JIRA ticket? Maybe create a copy, so that each JIRA ticket has a separate workflow we can run later and independently?
We certainly can and should delete workflows for outdated or unsupported versions/configurations.
Thank you for the great question! Let me clarify our workflow strategy for JIRA ticket management.
Our Current Approach:
We create a new JIRA ticket for each validation task, with approximately 3-4 validation tasks per month on average. Since our clusters (especially ARO and ROSA) are quite dynamic, we typically create new clusters for each validation cycle - particularly for cloud-based ones where we focus on testing major releases.
Workflow Management Strategy:
- New tickets = new workflows: Each JIRA ticket gets its own workflow file (like the current `aro-endpoint-test-run.yaml` for SAPOCP-1590)
- No parallel execution: We don't run old workflows alongside new ones; each validation task is independent
- Proactive cleanup: We clean up workflow files once the cluster is deleted or the validation task is completed
Why This Works for Us:
This approach aligns well with our validation cycle where we're constantly testing new configurations and major releases. The workflow files serve as a snapshot of what was tested for each specific ticket, and we maintain a clean repository by removing outdated workflows.
Your suggestion about creating separate workflow files for each ticket is exactly what we're doing! The current file structure will evolve as we create new validation tasks, and we'll maintain a clean slate by removing completed workflows.
* Kubeconfig and service access configured ✅
* All connectivity tests passed ✅

Ready for manual teardown approval. The pipeline will proceed with infrastructure cleanup once approved.
Why manual teardown? Are we expected to perform some manual steps before tearing down?
The manual approval before teardown serves two purposes in our validation workflow.
1. EIC Uninstallation Testing: Similar to the installation process, we need to test the uninstallation of EIC through the web interface. Since EIC doesn't support API-based uninstallation yet, these manual steps must be performed before the cluster is torn down.
2. Demo/Reference Scenarios: Sometimes we may want to keep the cluster running for demo purposes or as a reference environment. The manual approval step gives us the flexibility to decide whether to proceed with teardown immediately or keep the infrastructure running for a longer period.
Once the manual steps are completed and approved, the pipeline proceeds with the automated cluster teardown.
Configure explicit one-week timeout to override default 1-hour limit
and prevent pipeline failures during long-running operations.
This gives sufficient time for ARO deployment, manual approvals,
endpoint testing, and teardown operations.
Signed-off-by: mjiao <manjun.jiao@gmail.com>
Fix PostgreSQL and Redis deletion commands in ARO teardown task
Remove unsupported --no-wait flag from az postgres flexible-server delete
and az redis delete commands to prevent teardown failures.
The --no-wait flag is not supported by these specific Azure CLI commands
and was causing 'unrecognized arguments: --no-wait' errors during cleanup.
Signed-off-by: mjiao <manjun.jiao@gmail.com>
enhance: add explicit PostgreSQL and Redis cleanup to teardown
- Add explicit PostgreSQL flexible server deletion
- Add explicit Redis cache deletion with proper name matching
- Keep existing generic resource cleanup as fallback
- Ensure all Azure services are properly cleaned up
This makes the teardown more robust and explicit about
cleaning up PostgreSQL and Redis services, while maintaining
the existing cleanup logic for other ARO-related resources.
Signed-off-by: mjiao <manjun.jiao@gmail.com>
fix: handle null tags in Azure resource cleanup query
- Add null checks for tags and tags.cluster before using contains()
- Fixes 'Invalid jmespath query' error in teardown task
- Query now safely handles resources without tags or cluster tags
The issue was that some Azure resources don't have tags or
have null tags.cluster values, causing the contains() function
to fail. Now we check for existence before using contains().
Signed-off-by: mjiao <manjun.jiao@gmail.com>
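The actual fix lives in the Azure CLI JMESPath `--query` string, but the same null-safe pattern can be illustrated with jq (an analogous sketch with hypothetical resource data, not the task's real query):

```shell
#!/bin/sh
# Hypothetical resource list: one resource tagged with a cluster name,
# one with null tags (the case that broke the original query).
resources='[
  {"name": "vm1",   "tags": {"cluster": "sapeic-cluster"}},
  {"name": "disk1", "tags": null}
]'

# Checking tags and tags.cluster for null before calling contains() avoids
# the error that untagged resources triggered in the teardown task.
MATCHES=$(echo "$resources" | jq -r \
  '.[] | select(.tags != null and .tags.cluster != null
                and (.tags.cluster | contains("sapeic"))) | .name')
echo "$MATCHES"
```

Because jq's `and` short-circuits, `contains()` is never evaluated against a null value, mirroring the existence checks added to the JMESPath query.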
fix: use direct az command with --file parameter for kubeconfig generation
- Replace 'make aro-kubeconfig > kubeconfig' with direct az command
- Use 'az aro get-admin-kubeconfig --file kubeconfig' to avoid file conflicts
- Fixes 'File kubeconfig already exists' error
The issue was that az aro get-admin-kubeconfig creates a kubeconfig
file by default, and redirecting output to the same filename caused
a conflict. Using --file parameter directly avoids this issue.
Signed-off-by: mjiao <manjun.jiao@gmail.com>
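The failure mode can be sketched with a stand-in function (a hypothetical mock, since the real behavior belongs to `az aro get-admin-kubeconfig`, which writes to `--file` — default name `kubeconfig` — and refuses to overwrite an existing file):

```shell
#!/bin/sh
# Stand-in for 'az aro get-admin-kubeconfig': writes to the given file
# (default: kubeconfig) and refuses to overwrite an existing one.
mock_get_admin_kubeconfig() {
  file="${1:-kubeconfig}"
  if [ -e "$file" ]; then
    echo "File $file already exists." >&2
    return 1
  fi
  echo "apiVersion: v1" > "$file"
}

cd "$(mktemp -d)" || exit 1

# Broken pattern: the shell creates 'kubeconfig' for the redirect *before*
# the command runs, so the command sees an existing file and fails.
if mock_get_admin_kubeconfig > kubeconfig; then FIRST=succeeded; else FIRST=failed; fi

rm -f kubeconfig
# Fixed pattern: pass the filename and let the command write the file itself.
if mock_get_admin_kubeconfig kubeconfig; then SECOND=succeeded; else SECOND=failed; fi

echo "redirect: $FIRST, --file: $SECOND"
```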
fix: correct kubeconfig generation command syntax
- Replace 'make aro-kubeconfig --file kubeconfig' with 'make aro-kubeconfig > kubeconfig'
- Fixes 'No rule to make target kubeconfig' error
- Use proper output redirection instead of invalid --file parameter
The aro-kubeconfig makefile target doesn't accept --file parameter,
it just runs the az aro get-admin-kubeconfig command and outputs
to stdout, which we redirect to the kubeconfig file.
Signed-off-by: mjiao <manjun.jiao@gmail.com>
fix: filter out az aro list-credentials command line from JSON output
- Add 'grep -v "az aro list-credentials"' to filter out the command line
- Fixes 'Invalid numeric literal at line 2, column 3' error
- Now only the actual JSON object will be passed to jq
The issue was that make aro-credentials was outputting both the
command line and the JSON result, causing jq to try to parse
the command line as JSON.
Signed-off-by: mjiao <manjun.jiao@gmail.com>
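A minimal sketch of the filter, using simulated `make aro-credentials` output (placeholder values, not real credentials):

```shell
#!/bin/sh
# Simulated 'make aro-credentials' output: make echoes the az command line,
# then prints the JSON result.
output='az aro list-credentials --name "sapeic" --resource-group "manjun"
{
  "kubeadminPassword": "example-password",
  "kubeadminUsername": "kubeadmin"
}'

# Feeding the raw output to jq fails ("Invalid numeric literal") because the
# first line is not JSON. Dropping the echoed command line fixes the parse:
USERNAME=$(echo "$output" | grep -v 'az aro list-credentials' | jq -r '.kubeadminUsername')
echo "$USERNAME"
```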
debug: add detailed logging to see raw credentials output
- Add debug output to see what make aro-credentials actually returns
- Show both raw output and filtered JSON before jq parsing
- Help identify what's causing 'Invalid numeric literal' error
- Will help determine the exact content being passed to jq
This will show us the actual output structure and help
identify why the JSON parsing is still failing.
Signed-off-by: mjiao <manjun.jiao@gmail.com>
fix: use grep to filter out info messages from aro-credentials output
- Replace 'tail -n +2' with 'grep -v' to filter out specific info messages
- Filter out 'Variable is not defined' and 'Not all required variables are defined'
- More robust approach to handle variable output from required-environment-variables
- Fixes 'Invalid numeric literal' jq parsing error
This approach is more reliable than line-based filtering since
the number of info messages can vary.
Signed-off-by: mjiao <manjun.jiao@gmail.com>
fix: handle JSON output from aro-credentials properly
- Use 'tail -n +2' to skip the first line (info message) but keep all JSON lines
- Store full JSON in CREDENTIALS_JSON variable before parsing with jq
- Fixes 'parse error: Unmatched }' when trying to parse incomplete JSON
The issue was that 'tail -1' only kept the last line of the JSON,
breaking the JSON structure. Now we skip the first line but keep
the complete JSON object for proper jq parsing.
Signed-off-by: mjiao <manjun.jiao@gmail.com>
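The difference between the two `tail` invocations can be sketched with simulated output (placeholder values):

```shell
#!/bin/sh
# Simulated make output: one info line followed by a multi-line JSON object.
output='Checking required environment variables...
{
  "kubeadminPassword": "example-password",
  "kubeadminUsername": "kubeadmin"
}'

# 'tail -1' keeps only the closing brace, which is not valid JSON on its own:
BROKEN=$(echo "$output" | tail -1)

# 'tail -n +2' skips the first line but keeps the complete object:
CREDENTIALS_JSON=$(echo "$output" | tail -n +2)
USERNAME=$(echo "$CREDENTIALS_JSON" | jq -r '.kubeadminUsername')
echo "broken fragment: $BROKEN; username: $USERNAME"
```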
fix: apply tail -1 fix to all make commands with required-environment-variables
- Fix aro-deploy-task.yaml: make aro-cluster-exists, make aro-cluster-status,
make postgres-exists, make redis-exists
- Fix aro-teardown-task.yaml: make aro-cluster-exists (2 instances)
- Fix aro-validate-task.yaml: make aro-cluster-url, make aro-credentials,
make postgres-exists, make redis-exists
This resolves the issue where required-environment-variables function
was printing info messages that got captured in command substitution,
causing string comparisons to fail and infinite loops to occur.
All make commands that use required-environment-variables now use
'tail -1' to extract only the actual output, not the info messages.
Signed-off-by: mjiao <manjun.jiao@gmail.com>
fix: handle extra output from required-environment-variables in cluster status
- Use 'tail -1' to get only the last line from make aro-cluster-status
- Fixes issue where required-environment-variables function was printing
info messages that got captured in CLUSTER_STATUS variable
- Removes hex dump debugging since xxd command is not available
- Resolves infinite loop where 'Succeeded' status was not being detected
The issue was that CLUSTER_STATUS contained:
'az aro show --name "sapeic" --resource-group "manjun" --query "provisioningState" -o tsv\nSucceeded'
instead of just 'Succeeded'
Signed-off-by: mjiao <manjun.jiao@gmail.com>
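The symptom and fix can be reproduced with the two-line value the commit describes (simulated here; the real value comes from `make aro-cluster-status`):

```shell
#!/bin/sh
# Simulated 'make aro-cluster-status' output: the echoed az command line
# followed by the actual provisioning state on the last line.
RAW=$(printf '%s\n%s\n' \
  'az aro show --name "sapeic" --resource-group "manjun" --query "provisioningState" -o tsv' \
  'Succeeded')

# Comparing the raw two-line value against "Succeeded" never matches, which
# is what kept the wait loop spinning. 'tail -1' extracts just the status:
CLUSTER_STATUS=$(echo "$RAW" | tail -1)
if [ "$CLUSTER_STATUS" = "Succeeded" ]; then
  echo "cluster ready"
fi
```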
debug: add detailed status debugging to validate task
- Add quotes around status output to see exact string
- Add status length and hex dump for debugging
- Help identify why 'Succeeded' status comparison is failing
This will help debug the infinite loop issue where
cluster shows 'Succeeded' but script continues waiting.
Signed-off-by: mjiao <manjun.jiao@gmail.com>
fix: use runtime evaluation for azure-set-subscription target
- Change azure-set-subscription to evaluate the subscription ID at runtime (via `az account show`) instead of at makefile parse time
- Fixes issue where subscription ID was empty due to makefile parse-time evaluation
- Ensures subscription is set after Azure login, not before
This resolves the 'subscription of '' doesn't exist' error
in the validate-and-get-access step.
Signed-off-by: mjiao <manjun.jiao@gmail.com>
refactor: remove duplicate checks and use makefile targets consistently
- Remove duplicate Azure CLI checks and use makefile targets as primary solution
- Remove redundant service checks in waiting loop
- Simplify cluster status checking logic
- Use make aro-cluster-exists, make aro-cluster-status, make postgres-exists, make redis-exists
- Clean up debugging output while maintaining essential logging
This eliminates redundant API calls and maintains the clean
makefile-based approach we established earlier.
Signed-off-by: mjiao <manjun.jiao@gmail.com>
fix: correct cluster status checking logic to prevent infinite loops
- Fix logic bug where Succeeded status was entering waiting loop
- Add explicit break statements when cluster is ready
- Only enter waiting loop for Creating/Updating states
- Prevent infinite waiting when cluster is already Succeeded
This fixes the issue where the pipeline was stuck waiting
for a cluster that was already in Succeeded state.
Signed-off-by: mjiao <manjun.jiao@gmail.com>
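The corrected loop shape can be sketched as follows (the status progression is simulated; the real task reads it via `make aro-cluster-status`):

```shell
#!/bin/sh
# Sketch of the corrected wait loop with a simulated status progression:
# two polls return Creating, then the cluster reaches Succeeded.
attempt=0
while true; do
  attempt=$((attempt + 1))
  if [ "$attempt" -lt 3 ]; then STATUS="Creating"; else STATUS="Succeeded"; fi

  case "$STATUS" in
    Succeeded)
      echo "cluster ready"
      break ;;                          # explicit break: stop waiting once ready
    Creating|Updating)
      echo "still waiting ($STATUS)" ;; # only transient states loop again
    *)
      echo "unexpected state: $STATUS"
      exit 1 ;;
  esac
done
```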
fix: add comprehensive debugging and safety checks for cluster existence
- Add direct Azure CLI cluster existence check for debugging
- Add makefile target result comparison
- Add final safety check before deployment to prevent conflicts
- Log cluster name and resource group for debugging
- Compare direct Azure CLI vs makefile target results
This will help identify why cluster existence detection is failing
and prevent PropertyChangeNotAllowed errors by double-checking
before attempting deployment.
Signed-off-by: mjiao <manjun.jiao@gmail.com>
fix: improve cluster existence detection and prevent duplicate deployments
- Add debug logging for cluster existence check results
- Restructure deployment logic to be more explicit about cluster existence
- Add deployment decision logging to help troubleshoot issues
- Prevent full ARO deployment when cluster already exists
- Fixes PropertyChangeNotAllowed error when trying to modify existing cluster
This ensures that if a cluster is already running, we only deploy
missing services instead of trying to recreate the entire cluster.
Signed-off-by: mjiao <manjun.jiao@gmail.com>
fix: disable private link service network policies on subnets
- Add privateLinkServiceNetworkPolicies: 'Disabled' to both master and worker subnets
- Fixes Azure deployment error: PrivateLinkServiceNetworkPoliciesCannotBeEnabledOnPrivateLinkServiceSubnet
- Prevents conflicts when Azure automatically configures subnets as private link service subnets
This resolves the network deployment failure in ARO cluster creation.
Signed-off-by: mjiao <manjun.jiao@gmail.com>
fix: add missing export keywords and remove redundant makefile references
- Add export keywords to environment variables in aro-teardown-task.yaml
- Remove redundant -f bicep.makefile references in aro-deploy-task.yaml
- Ensure consistent environment variable export pattern across all tasks
- Fixes potential issues where environment variables might not be available to makefile targets
This ensures all Tekton tasks properly export environment variables
and use the correct makefile syntax for consistency.
Signed-off-by: mjiao <manjun.jiao@gmail.com>
standardize environment variables

- Standardize secret key names to use UPPER_CASE environment variables: CLIENT_ID, CLIENT_SECRET, TENANT_ID, ARO_RESOURCE_GROUP, ARO_DOMAIN, PULL_SECRET
- Remove manual export conversions from Tekton tasks (aro-deploy, aro-validate, aro-teardown)
- Add 11 new Makefile targets for Azure operations:
  - aro-get-kubeconfig: get ARO kubeconfig with insecure TLS settings
  - redis-get-info: get Redis cache connection information
  - postgres-delete, redis-delete: individual service cleanup
  - aro-resources-cleanup: clean up ARO-related resources
  - aro-cleanup-all-services: comprehensive service cleanup
  - aro-resource-group-create/exists: resource group management
  - aro-services-deploy-only/with-retry: granular service deployment
  - aro-final-safety-check: pre-deployment validation
- Update Tekton tasks to use centralized Makefile targets, replacing ~70 lines of inline Azure CLI commands with make calls to improve maintainability and enable local development workflows
- Fix Tekton timeout configurations: update the PipelineRun to the new timeouts syntax (pipeline: 168h, tasks: 120m) and add explicit step timeouts (aro-deploy: 120m, aro-validate: 30m, aro-teardown: 60m)
- Update documentation (README.md, CLAUDE.md) with the new secret format and Makefile targets

This enables developers to run the same Azure operations locally that are used in CI/CD pipelines, following infrastructure-as-code best practices with centralized command definitions.
Signed-off-by: mjiao <manjun.jiao@gmail.com>
test further
Signed-off-by: mjiao <manjun.jiao@gmail.com>
test further the auto-approval
Signed-off-by: mjiao <manjun.jiao@gmail.com>
Clean up duplicate and redundant Makefile targets while enhancing Bicep templates with cost-optimized testing configurations.

Changes:
- Remove duplicate targets from bicep.makefile:
  - aro-deploy → consolidated into aro-deploy-test
  - azure-services-deploy & aro-services-deploy-only → aro-services-deploy-test
  - aro-url → use existing aro-cluster-url
- Move bicep-related targets from main Makefile to bicep.makefile
- Enhance Bicep templates with testing-focused improvements:
  - Add cost-optimized parameter validation and defaults
  - Create test-specific parameter files
  - Add comprehensive testing outputs and metadata
  - Include testing tags for resource management
- Update documentation and references to use simplified targets
- Add consistent deployment names for better tracking

This aligns the tooling with the testing-only use case while eliminating redundancy and improving cost optimization.

Signed-off-by: mjiao <manjun.jiao@gmail.com>