-
Notifications
You must be signed in to change notification settings - Fork 4.8k
WIP: Core libraries for two node disruptive tests. #30332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: jaypoulz The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
||
// VerifyHypervisorConnectivity verifies SSH connectivity to the hypervisor and checks | ||
// that virsh and libvirt are available. | ||
func VerifyHypervisorConnectivity(sshConfig *SSHConfig, knownHostsPath string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would rename to VerifyHypervisor or VerifyHypervisorAvailability (as you're checking for more than connectivity and there is already a VerifyConnectivity function)
|
||
// SSH to hypervisor, then to surviving node to run pcs debug-start | ||
// We need to chain the SSH commands: host -> hypervisor -> surviving node | ||
output, stderr, err := PcsCommand(fmt.Sprintf("%s && %s", pcsResourceDebugStop, formatPcsCommandString(pcsResourceDebugStart, pcsResourceDebugStartEnvVars)), sshConfig, localKnownHostsPath, remoteKnownHostsPath, nodeIP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the inconsistency here between having a "formatPcsCommandString" for the second command and directly calling fmt.Sprintf for the first one (in the first parameter) makes this a little harder to read than it should. Is it worth also encapsulating the first one in a function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the code further, why aren't we doing this like the PcsDebugStart below, that uses ExecuteRemoteSSHCommand? That's much easier to parse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just removed the utility function and replaced all of the pcs commands with the formatPcsCommand string option for simplicity :)
test/extended/two_node/utils/ssh.go
Outdated
|
||
// ExecuteRemoteSSHCommand executes a command on an OpenShift node via two-hop SSH (local → hypervisor → node). | ||
// Uses 'core' user for the node connection. | ||
func ExecuteRemoteSSHCommand(nodeIP, command string, sshConfig *SSHConfig, localKnownHostsPath, remoteKnownHostsPath string) (string, string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rename nodeIP to remoteNodeIP, to make it more explicit. Also sshConfig to hypervisorSSHConfig. This way it's easier to know what info each parameter is providing to the function
323411e
to
ef0ff06
Compare
Job Failure Risk Analysis for sha: ef0ff06
|
ef0ff06
to
74d8539
Compare
Job Failure Risk Analysis for sha: 74d8539
|
74d8539
to
a97ec55
Compare
This commit introduces infrastructure for testing two-node OpenShift cluster disruption scenarios via hypervisor operations, enabling tests that replace or recover control plane nodes through VM management. New utility libraries (test/extended/two_node/utils/): Core utilities: - file.go: Temp file creation, template processing, resource backup - retry.go: Configurable retry logic and polling with timeouts - ssh.go: Direct and two-hop SSH (local→hypervisor→node) - validation.go: Input validation and security checks - etcd.go: Error classification, job management, polling - hypervisor.go: Connectivity verification and config helpers - libvirt.go: VM lifecycle via virsh (define/start/stop/destroy) - pacemaker.go: Cluster operations (node add/remove, status) Framework integration: - Added HypervisorConfig to cluster discovery - New --with-hypervisor-json flag for SSH configuration - Added [Requires:HypervisorSSHConfig] skip annotation - Test helpers: GetHypervisorConfig(), HasHypervisorConfig() Key features: Two-hop SSH support, VM management, etcd/Pacemaker control, security-focused validation, intelligent retry logic.
a97ec55
to
64d4f0e
Compare
@jaypoulz: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
No description provided.