Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(support-bundle): should not timeout #1456

Conversation

c3y1huang
Copy link
Collaborator

@c3y1huang c3y1huang self-assigned this Jul 11, 2023
@c3y1huang c3y1huang force-pushed the test-support-bundle-download-timeout branch from 5e0c967 to 261462e Compare July 11, 2023 04:53
@c3y1huang c3y1huang force-pushed the test-support-bundle-download-timeout branch from 261462e to fc4c774 Compare July 11, 2023 05:26
@c3y1huang c3y1huang marked this pull request as ready for review July 11, 2023 05:29
@c3y1huang c3y1huang requested a review from a team as a code owner July 11, 2023 05:29
@innobead
Copy link
Member

@yangchiu @khushboo-rancher review this.

cc @longhorn/qa

ref: 6256

Signed-off-by: Chin-Ya Huang <chin-ya.huang@suse.com>
@c3y1huang c3y1huang force-pushed the test-support-bundle-download-timeout branch from fc4c774 to a680283 Compare July 11, 2023 05:52
Copy link
Contributor

@chriscchien chriscchien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yangchiu
Copy link
Member

I tried to run the test case, but it failed with the following error:

        download_support_bundle(node_id, support_bundle_name, client)
>       wait_for_support_bundle_cleanup(client)

test_support_bundle.py:402: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

client = <longhorn.Client object at 0x7f5378b703d0>

    def wait_for_support_bundle_cleanup(client):  # NOQA
        ok = False
        for _ in range(RETRY_COUNTS):
            support_bundles = client.list_support_bundle()
            if len(support_bundles) == 0:
                ok = True
                break
    
            time.sleep(RETRY_INTERVAL)
>       assert ok
E       AssertionError

common.py:5547: AssertionError

https://ci.longhorn.io/job/public/job/v1.1.x/job/v1.1.x-longhorn-upgrade-tests-sles-arm64/169/

Do we need other modification to make it pass?

@c3y1huang
Copy link
Collaborator Author

c3y1huang commented Jul 11, 2023

I tried to run the test case, but it failed with the following error:

        download_support_bundle(node_id, support_bundle_name, client)
>       wait_for_support_bundle_cleanup(client)

test_support_bundle.py:402: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

client = <longhorn.Client object at 0x7f5378b703d0>

    def wait_for_support_bundle_cleanup(client):  # NOQA
        ok = False
        for _ in range(RETRY_COUNTS):
            support_bundles = client.list_support_bundle()
            if len(support_bundles) == 0:
                ok = True
                break
    
            time.sleep(RETRY_INTERVAL)
>       assert ok
E       AssertionError

common.py:5547: AssertionError

https://ci.longhorn.io/job/public/job/v1.1.x/job/v1.1.x-longhorn-upgrade-tests-sles-arm64/169/

Do we need other modification to make it pass?

Failing at wait_for_support_bundle_cleanup is expected at the current code bases. This happens because the support bundle download timed out, the lifecycle is not completed and the CR doesn't get cleaned up. This is a reproduction of the issue described in longhorn/longhorn#6256.

To have a PASS run requires the fixes:

Ref: test results

Copy link
Member

@yangchiu yangchiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@khushboo-rancher khushboo-rancher merged commit 9a437d5 into longhorn:master Jul 12, 2023
@khushboo-rancher
Copy link
Collaborator

@mergify backport v1.5.x v1.4.x

@mergify
Copy link

mergify bot commented Jul 12, 2023

backport v1.5.x v1.4.x

✅ Backports have been created

khushboo-rancher added a commit that referenced this pull request Jul 12, 2023
test(support-bundle): should not timeout (backport #1456)
khushboo-rancher added a commit that referenced this pull request Jul 12, 2023
test(support-bundle): should not timeout (backport #1456)
@c3y1huang c3y1huang deleted the test-support-bundle-download-timeout branch July 12, 2023 01:29
@c3y1huang
Copy link
Collaborator Author

c3y1huang commented Jul 13, 2023

Weird, so it's only happening in arm64. I am not sure if it could be related to running out of disk space. Checking...

Not related to the disk space. The dummy file was created successfully.

@c3y1huang
Copy link
Collaborator Author

c3y1huang commented Jul 13, 2023

Weird, so it's only happening in arm64. I am not sure if it could be related to running out of disk space. Checking...

Observed in local cluster:

  1. The dummy file was created successfully. Download started.
  2. pytest got killed.
    test_support_bundle.py::test_support_bundle_should_not_timeout ^[[A./run.sh: line 3:    42 Killed                  pytest -v "$@"
    
  3. longhorn-test pod did not crash.

@c3y1huang
Copy link
Collaborator Author

c3y1huang commented Jul 13, 2023

Possibly related to the download timeout. Checking...

Not related. Pytest was killed before being timed out.

@c3y1huang
Copy link
Collaborator Author

c3y1huang commented Jul 13, 2023

The memory usage just before the pytest got killed:

MiB Mem : 7644.219 total,   93.559 free, 7395.656 used,  346.988 buff/cache

Comparing to the instance type we've used in amd64 vs arm64
amd64: t2.xlarge memory(16 GiB)
arm64: a1.xlarge memory(8 GiB)

The issue appears to be resolved when tested using an a1.2xlarge instance with 16 GiB of memory.

@yangchiu can you help to adjust the instance type for arm64 pipelines?

@yangchiu
Copy link
Member

@yangchiu can you help to adjust the instance type for arm64 pipelines?

Created PR longhorn/infra#130 for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants