Enable scale tests with 20 pods #4179
base: master
Conversation
Pull request overview
This PR increases the scale test capacity from 15 to 20 pods while improving test reliability through better timeout handling, retry logic, and error reporting. The changes also include infrastructure updates to support the increased pod density and fixes to RBAC propagation issues in storage account setup.
Key Changes
- Scale test now creates 20 pods (10 per cluster) instead of 15 (8+7 split)
- Reduced deletion retries from 30-60 attempts to 10 attempts, each with a consistent 20-second command timeout
- Added RBAC propagation verification for storage accounts with retry logic
- Changed default deployment region from centraluseuap to eastus2
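The tighter deletion loop described above can be sketched in shell. This is illustrative only: `delete_with_retries` is a hypothetical helper mirroring the 10-attempt / 20-second values from the key changes, not the actual Go code in `az_helpers.go`.

```shell
# Illustrative sketch: bounded deletion retries with a per-command timeout.
# delete_with_retries is hypothetical; the real logic lives in
# test/integration/swiftv2/helpers/az_helpers.go.
delete_with_retries() {
  local attempts=10 cmd_timeout=20 i
  for i in $(seq 1 "$attempts"); do
    # Each attempt is capped at cmd_timeout seconds, so a hung CLI call
    # cannot stall the whole test run.
    if timeout "$cmd_timeout" "$@"; then
      return 0
    fi
    echo "delete attempt $i/$attempts failed, retrying" >&2
  done
  return 1
}

# Example (hypothetical): delete_with_retries az network nic delete --ids "$NIC_ID"
```

Capping every attempt individually keeps the worst case bounded at roughly attempts × timeout, rather than letting a single stuck command consume the whole budget.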
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| test/integration/swiftv2/longRunningCluster/datapath_scale_test.go | Increases pod count to 20 (10 per cluster), updates timeouts to 50 minutes, and improves error handling with deferred verification |
| test/integration/swiftv2/longRunningCluster/datapath.go | Adds Reservations and Namespace fields to the TestResources struct with default values |
| test/integration/swiftv2/helpers/az_helpers.go | Reduces deletion retry attempts from 30-60 to 10, adds consistent command timeouts, and introduces new deletion error types |
| test/integration/manifests/swiftv2/long-running-cluster/pod-with-device-plugin.yaml | New manifest defining a pod template with device plugin resource requests for vnet-nic |
| hack/aks/Makefile | Adds a PODS_PER_NODE variable to the aks-nic-secondary-count tag for node pool configuration |
| .pipelines/swiftv2-long-running/template/long-running-pipeline-template.yaml | Adds a ScaleTest job, updates the storage account RBAC assignment to a single account, and adds ScaleTest to the DeleteTestResources dependencies |
| .pipelines/swiftv2-long-running/scripts/manage_storage_rbac.sh | Adds RBAC propagation verification with SAS token generation retry logic |
| .pipelines/swiftv2-long-running/scripts/create_storage.sh | Implements retry logic with exponential backoff for blob uploads |
| .pipelines/swiftv2-long-running/scripts/create_aks.sh | Defines a PODS_PER_NODE=7 constant and passes it to the Makefile |
| .pipelines/swiftv2-long-running/pipeline.yaml | Changes the default region from centraluseuap to eastus2 |
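The blob-upload retry described for create_storage.sh can be sketched as a generic backoff wrapper. This is a hypothetical simplification: `retry_with_backoff` and the delay values are illustrative, the real az CLI call is stubbed out, and `BACKOFF_BASE` is an override added here purely to make the sketch testable.

```shell
# Illustrative sketch: retry with exponential backoff, as described for blob
# uploads in create_storage.sh. retry_with_backoff is hypothetical; the base
# delay can be overridden via BACKOFF_BASE (defaults to 2 seconds).
retry_with_backoff() {
  local max_attempts=5 attempt=1
  local delay="${BACKOFF_BASE:-2}"
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed, sleeping ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))      # exponential backoff: 2s, 4s, 8s, ...
    attempt=$((attempt + 1))
  done
}

# Example (hypothetical): retry_with_backoff az storage blob upload \
#   --file "$FILE" --container-name "$CONTAINER" --name "$BLOB"
```

Doubling the delay between attempts gives transient storage errors time to clear without hammering the service, while the attempt cap keeps pipeline failures from hanging indefinitely.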
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: sivakami-projects <126191544+sivakami-projects@users.noreply.github.com>
/azp run Azure Container Networking PR

Azure Pipelines successfully started running 1 pipeline(s).
paulyufan2 left a comment:
lgtm
Reason for Change:
This PR increases the scale test capacity from 15 to 20 pods while improving test reliability through better timeout handling, retry logic, and error reporting. The changes also include infrastructure updates to support the increased pod density and fixes to RBAC propagation issues in storage account setup.

Key Changes:
- Scale test now creates 20 pods (10 per cluster) instead of 15 (8+7 split)
- Reduced deletion retries from 30-60 attempts to 10 attempts, each with a consistent 20-second command timeout
- Added RBAC propagation verification for storage accounts with retry logic
- Changed the scheduled test region from centraluseuap to eastus2
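The RBAC propagation verification mentioned in the key changes can be sketched as a polling loop. This is illustrative only: `wait_for_rbac` is a hypothetical helper (the real manage_storage_rbac.sh retries SAS token generation via the az CLI), and `RBAC_RETRY_DELAY` is an override added here to make the sketch testable.

```shell
# Illustrative sketch: poll until an RBAC-dependent command succeeds, to absorb
# Azure role-assignment propagation delay. wait_for_rbac is hypothetical; the
# retry delay can be overridden via RBAC_RETRY_DELAY (defaults to 10 seconds).
wait_for_rbac() {
  local max_attempts=12 attempt
  local delay="${RBAC_RETRY_DELAY:-10}"
  for attempt in $(seq 1 "$max_attempts"); do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt/$max_attempts failed; waiting ${delay}s for RBAC propagation" >&2
    sleep "$delay"
  done
  return 1
}

# Example (hypothetical): wait_for_rbac az storage container generate-sas \
#   --account-name "$SA" --name "$CONTAINER" --permissions r
```

Polling the operation that actually needs the role assignment, rather than sleeping a fixed amount after creating it, makes the pipeline robust to the variable propagation times Azure RBAC is known for.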