Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete next cell irrespective of last deletion #901

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

auniyal61
Copy link
Contributor

@auniyal61 auniyal61 commented Nov 27, 2024

Right now when multiple cells gets deleted if any one the cell deletion fails, the control exits with error msg.
This change stores the errors in a list and let next cells deleted.

Closes #OSPRH-10550

@auniyal61
Copy link
Contributor Author

The jira report says, we need to do this for cellCreation as well, but I think its already taken care of at here https://github.com/openstack-k8s-operators/nova-operator/blob/main/controllers/nova_controller.go#L328-L356

Copy link
Contributor

@mrkisaolamb mrkisaolamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some func tests for cell deletion to assert if deletion is working and returning errors for all cells. Also it will be nice to assert what are conditions of nova instance

Copy link
Contributor

openshift-ci bot commented Dec 17, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: auniyal61
Once this PR has been reviewed and has the lgtm label, please ask for approval from mrkisaolamb. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/55d93ec55a534c64a74ed0e2e7f7761a

✔️ openstack-meta-content-provider SUCCESS in 47m 42s
nova-operator-kuttl RETRY_LIMIT in 17m 30s
nova-operator-tempest-multinode FAILURE in 22m 11s
nova-operator-tempest-multinode-ceph FAILURE in 23m 05s

@auniyal61 auniyal61 force-pushed the cell-deletion branch 4 times, most recently from ea9aefb to fe73ce1 Compare December 19, 2024 09:56
Copy link
Contributor

@gibizer gibizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deletion logic looks good with a small request about ordering. I have couple of comments on the testing side.

controllers/nova_controller.go Outdated Show resolved Hide resolved
test/functional/nova_multicell_test.go Outdated Show resolved Hide resolved
test/functional/nova_multicell_test.go Outdated Show resolved Hide resolved
Comment on lines 206 to 207
delete(nova.Spec.CellTemplates, "cell2")
delete(nova.Spec.CellTemplates, "cell3")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know that the deletion of cell2 is always before cell3 in novaCellList.Items at L605 in the controller? If cell3 can be before cell2 then the test below does not prove that the error in cell2 was just collected but not stopped the loop there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the way to make it testable is to sort the cells be cell name when iterated in the controller. We do that for cell creation anyhow in #903 to avoid the unstable ordering to cause unstable condition messages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes novaCellList is already sorted and cell2 always comes before cell3 in reconciler.

g.Expect(nova.Status.RegisteredCells).NotTo(HaveKey(cell3.CellCRName.Name))
}, timeout, interval).Should(Succeed())

// NovaCellNotExists(cell3.CellCRName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yepp you can check if k8s returns not found for cell3 to prove that it is deleted. You can also check that the RegisteredCells list does not contain cell3 any more but still contains cell2.

test/functional/nova_multicell_test.go Outdated Show resolved Hide resolved
@@ -600,18 +600,26 @@ func (r *NovaReconciler) Reconcile(ctx context.Context, req ctrl.Request) (resul
return ctrl.Result{}, err
}

var deleteErrs []error

for _, cr := range novaCellList.Items {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would sort this list by cell name for two reasons:

  • Sort cell-names in nova_controller #903 does it for cell creation already to make the error messages stable even if the order of the list of cells in the api is not stable.
  • this makes testing possible as we can rely on the ordering to prove that even if cell2 failed to delete cell3 deletion is tried.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

novaCellList is already sorted. so deleteErrs will be sorted automatically.

@gibizer
Copy link
Contributor

gibizer commented Dec 20, 2024

Running the test locally shows that cell2 is also successfully deleted so the current test does not recreate the necessary scenario to prove that failure in one cell does not prevent the deletion of the other.

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/5839fe00e7f1456ba553d6adfcd1e5ee

✔️ openstack-meta-content-provider SUCCESS in 2h 43m 17s
nova-operator-kuttl RETRY_LIMIT in 29m 32s
✔️ nova-operator-tempest-multinode SUCCESS in 2h 06m 12s
✔️ nova-operator-tempest-multinode-ceph SUCCESS in 2h 26m 19s

@auniyal61 auniyal61 force-pushed the cell-deletion branch 2 times, most recently from ddcb2bf to d046279 Compare January 23, 2025 05:54
Right now when multiple cells gets deleted if any one the cell deletion
fails, the control exits with error msg.
This change stores the errors in a list and let next cells deleted.

Closes #OSPRH-10550
@auniyal61 auniyal61 force-pushed the cell-deletion branch 3 times, most recently from 4a961b5 to 20729e2 Compare January 23, 2025 10:07
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/6472550c14a1482fae1096456327090c

✔️ openstack-meta-content-provider SUCCESS in 3h 02m 18s
nova-operator-kuttl FAILURE in 36m 33s
✔️ nova-operator-tempest-multinode SUCCESS in 2h 02m 18s
✔️ nova-operator-tempest-multinode-ceph SUCCESS in 2h 30m 11s

@gibizer
Copy link
Contributor

gibizer commented Jan 23, 2025

@auniyal61 so you pinged me on slack about the DeferCleanup issue.

so I am deleting a cell, but cell-resource (like secret ) is defined in setup (Before each or another funciton).
now once tests is finished deferecleanup runs and look for secret, but as we already deleted cell, its not present in DB, so defercleanup complains,
how to resolve this

The DeferCleanup doc says:
https://onsi.github.io/ginkgo/#cleaning-up-our-cleanup-code-defercleanup

DeferCleanup can be passed a function that takes no arguments and returns no value. You can also pass a function that returns values. DeferCleanup ignores all these return value except for the last. If the last return value is a non-nil error - a common go pattern - DeferCleanup will fail the spec.

So I guess the situation is that

  1. we create cell3 during the BeforeEach() section and put a DeferCleanup() call to the all the pieces we created
  2. then delete the cell during the normal It() section
  3. then the test runner executes the deferred cleanups where the client.Delete call returns a non nil error as the last return value because the resource is not found and that causes the test cleanup to fail as defined by the doc.

This means we have multiple options:

  • a) we wrap the client.Delete call and ignore the NotFound error. So cleanup still tries to delete the resources but if they do not exists then the cleanup still succeeds.
  • b) we don't create the cell3 in the shared CreateNovaWith3CellsAndEnsureReady() helper. But create it just for our test case (in our BeforeEach or in the It ) and we don't put DeferCleanup for the pieces of cell3 as we expect that the our test will delete them anyhow.

I would do option b) as it is more explicit about why we do thing differently.

(I need to get back to your other comments above a bit later)

@auniyal61
Copy link
Contributor Author

So I guess the situation is that

  1. we create cell3 during the BeforeEach() section and put a DeferCleanup() call to the all the pieces we created
  2. then delete the cell during the normal It() section
  3. then the test runner executes the deferred cleanups where the client.Delete call returns a non nil error as the last return value because the resource is not found and that causes the test cleanup to fail as defined by the doc.

Yes

This means we have multiple options:

  • a) we wrap the client.Delete call and ignore the NotFound error. So cleanup still tries to delete the resources but if they do not exists then the cleanup still succeeds.
  • b) we don't create the cell3 in the shared CreateNovaWith3CellsAndEnsureReady() helper. But create it just for our test case (in our BeforeEach or in the It ) and we don't put DeferCleanup for the pieces of cell3 as we expect that the our test will delete them anyhow.

I would do option b) as it is more explicit about why we do thing differently.

we have to create new-cell in CreateNovaWith3CellsAndEnsureReady, because that's where CR-spec is defined and saved, we can't pass and update spec later in our Describe("Nova multi cell deletion")

the oher way would be to redefine all cells again, which is just a duplicate code of CreateNovaWith3CellsAndEnsureReady.

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/b81fac0e1fef40fe95499e5f784d313c

✔️ openstack-meta-content-provider SUCCESS in 3h 22m 00s
nova-operator-kuttl FAILURE in 33m 33s
nova-operator-tempest-multinode FAILURE in 2h 40m 04s
✔️ nova-operator-tempest-multinode-ceph SUCCESS in 2h 32m 21s

Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4b5237bf070745c18ac143e6d94bb68e

✔️ openstack-meta-content-provider SUCCESS in 2h 57m 35s
nova-operator-kuttl FAILURE in 35m 03s
nova-operator-tempest-multinode FAILURE in 2h 24m 44s
✔️ nova-operator-tempest-multinode-ceph SUCCESS in 2h 34m 21s

@auniyal61
Copy link
Contributor Author

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants