test(robot): add drain cases from manual test cases #2116

Open · wants to merge 1 commit into master from node_drain_and_delete
Conversation

chriscchien (Contributor) commented Sep 24, 2024

Which issue(s) this PR fixes:

Issue #9292

What this PR does / why we need it:

Implements the manual test cases in https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-down/node-drain-deletion/ as automated test cases.

Special notes for your reviewer:

Verified locally with v1.7.1 (test result screenshot attached).

Additional documentation or context

N/A

@chriscchien requested a review from a team as a code owner on September 24, 2024 12:35
@chriscchien self-assigned this on Sep 24, 2024
@chriscchien force-pushed the node_drain_and_delete branch 4 times, most recently from bfed9a8 to e0353ac on September 25, 2024 02:30
And Wait for volume 0 detached
And Power off node 1

When Force drain node 2 and wait for 90 second dataEngine=${DATA_ENGINE}
Member commented:

Test steps of a robot test case should be self-explanatory, but the following steps are unclear in terms of what they are checking:

When Force drain node 2 and wait for 90 second dataEngine=${DATA_ENGINE}
And The drain process not completed
And Check instance-manager pod is running on node 2    dataEngine=${DATA_ENGINE}

Could these steps be made more readable to clarify what exactly is being validated?

chriscchien (Contributor, Author) replied:

Updated: removed dataEngine=${DATA_ENGINE} from When Force drain node 2 and wait for 90 second, and instead use data_engine = BuiltIn().get_variable_value("${DATA_ENGINE}") to read the data engine setting:

from robot.libraries.BuiltIn import BuiltIn

def get_instance_manager_on_node(node_name):
    # Read the data engine setting from the Robot Framework variable scope.
    data_engine = BuiltIn().get_variable_value("${DATA_ENGINE}")
    pods = get_all_pods_on_node(node_name)
    # Return the instance-manager pod that matches the current data engine.
    for pod in pods:
        labels = pod.metadata.labels
        if labels.get("longhorn.io/data-engine") == data_engine and \
           labels.get("longhorn.io/component") == "instance-manager":
            return pod.metadata.name
    return None
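For illustration, a hypothetical call site for this helper (the node name here is an assumption, not from the PR):

    # Hypothetical usage: assert an instance-manager pod for the current
    # data engine is running on the target node.
    pod_name = get_instance_manager_on_node("node-2")
    assert pod_name is not None, "no instance-manager pod found on node-2"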

Test result: (screenshot attached)

@@ -285,3 +285,25 @@ def get_name_suffix(*args):
         if arg:
             suffix += f"-{arg}"
     return suffix
+
+
+def check_popen_process_not_completed(process):
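For context, a minimal sketch of what such a Popen-based non-completion check could look like; this is an assumption for illustration, not necessarily the PR's exact implementation:

import subprocess

def check_popen_process_not_completed(process):
    # poll() returns None while the child process is still running;
    # a non-None return code means the drain finished unexpectedly.
    assert process.poll() is None, \
        f"process completed unexpectedly with return code {process.returncode}"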
Member commented:

Is it necessary to introduce popen, process.poll(), process.terminate() and process.communicate()?

Can subprocess.check_output() with the timeout parameter accomplish the same goal?

ref: https://docs.python.org/3/library/subprocess.html#using-the-subprocess-module

chriscchien (Contributor, Author) replied Sep 25, 2024:

Yes. In the test case Setting Allow Node Drain with the Last Healthy Replica protects the last healthy replica with Pod Disruption Budget (PDB), after changing node-drain-policy from block-if-contains-last-replica to always-allow, the test needs to validate whether the drain process has completed or not. Currently I cannot find a way to make subprocess.check_output() execute in the background. Thank you.
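To illustrate the point, a minimal sketch assuming a kubectl-based force drain (the command, flags, and node name are assumptions for illustration, not the PR's actual helper):

import subprocess

# Start the drain in the background so the test can keep asserting cluster
# state while the drain is blocked (e.g. by the PDB protecting the last replica).
drain = subprocess.Popen(
    ["kubectl", "drain", "node-2", "--force",
     "--ignore-daemonsets", "--delete-emptydir-data"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# ... perform checks while the drain is still in progress ...
assert drain.poll() is None  # drain has not completed yet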

yangchiu (Member) replied Sep 25, 2024:

Is the following approach feasible?

def force_drain_node():
    subprocess.check_output(timeout=90)

Force drain node ${node_id} and expect failure
    Run Keyword And Expect Failure    force drain node

Force drain node ${node_id} and expect success
    force drain node

And Set setting node-drain-policy to block-if-contains-last-replica
...
...
When Force drain node 2 and expect failure
And Check instance-manager pod is running on node 2

When Set setting node-drain-policy to always-allow
And Force drain node 2 and expect success
And Check PDB not exist
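A sketch of how this suggestion could work, assuming a hypothetical kubectl-based drain command: check_output() raises subprocess.TimeoutExpired when the drain blocks past the timeout, and Robot Framework's Run Keyword And Expect Failure treats that exception as the expected failure.

import subprocess

def force_drain_node(node_name):
    # Raises subprocess.TimeoutExpired (a keyword failure in Robot Framework)
    # if the drain does not finish within 90 seconds; the child process is
    # killed and waited for before the exception is raised.
    subprocess.check_output(
        ["kubectl", "drain", node_name, "--force",
         "--ignore-daemonsets", "--delete-emptydir-data"],
        timeout=90)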

chriscchien (Contributor, Author) replied:

Hi @yangchiu, updated. Test result: (screenshot attached)

    ${node_name} =    get_node_by_index    ${node_id}
    check_node_cordoned    ${node_name}

Force drain node ${node_id} and wait for ${duration} second
Member commented:

Is it possible to use subprocess.check_output() with the timeout parameter to eliminate the need for the explicit wait for ${duration} second?

chriscchien (Contributor, Author) replied:

In some scenarios, the drain process needs to keep running in the background; using the timeout parameter appears to force-terminate the process.
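A minimal illustration of the distinction (the sleep command stands in for a blocked drain, purely as an assumption for demonstration):

import subprocess
import time

# check_output: on timeout the child is killed and waited for, so a
# blocked drain cannot keep running for later assertions.
try:
    subprocess.check_output(["sleep", "300"], timeout=5)
except subprocess.TimeoutExpired:
    pass  # the child process has already been terminated

# Popen: the child keeps running in the background after the wait.
proc = subprocess.Popen(["sleep", "300"])
time.sleep(5)
assert proc.poll() is None  # still running
proc.terminate()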

Member replied:

Isn't wait for ${duration} second also an explicit timeout?

@chriscchien force-pushed the node_drain_and_delete branch 2 times, most recently from d427030 to 6a59971 on September 25, 2024 11:45
longhorn/longhorn-9292

Signed-off-by: Chris <chris.chien@suse.com>