feat(zbchaos): query current cluster topology #459

Merged
lenaschoenburg merged 7 commits into main from os/support-dynamic-scaling on Dec 12, 2023

Conversation

lenaschoenburg
Member

Adds a new command zbchaos cluster status that uses GET /actuator/cluster under the hood and pretty-prints the deserialized response.

Part 1 of X for #458
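A minimal sketch of the idea behind the command, under stated assumptions: it sends GET /actuator/cluster to a base URL and re-indents the JSON body. Unlike the PR, it does not deserialize into a typed struct, so that no response fields have to be guessed. The names fetchPretty and newStubServer, and the placeholder payload served by the stub, are hypothetical and not taken from zbchaos or Zeebe.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchPretty sends GET <baseURL>/actuator/cluster and returns the JSON
// response body re-indented for reading. Hypothetical helper, not zbchaos code.
func fetchPretty(baseURL string) (string, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(baseURL + "/actuator/cluster")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}

	var out bytes.Buffer
	if err := json.Indent(&out, body, "", "  "); err != nil {
		return "", fmt.Errorf("response is not valid JSON: %w", err)
	}
	return out.String(), nil
}

// newStubServer stands in for a broker's management port and serves a fixed
// payload on /actuator/cluster. The payload is a placeholder, not the real
// response shape.
func newStubServer(payload string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet || r.URL.Path != "/actuator/cluster" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		io.WriteString(w, payload)
	}))
}

func main() {
	srv := newStubServer(`{"placeholder":"not the real response shape"}`)
	defer srv.Close()

	pretty, err := fetchPretty(srv.URL)
	if err != nil {
		fmt.Println("query failed:", err)
		return
	}
	fmt.Println(pretty)
}
```

In the actual command the base URL would point at the port passed to queryTopology; how that port is obtained is not shown in this conversation.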

@lenaschoenburg lenaschoenburg force-pushed the os/support-dynamic-scaling branch from 9ee2b11 to d01034d Compare December 11, 2023 16:10
Member

@npepinpe npepinpe left a comment

LGTM, though I haven't tested it yet. I'll try it out tomorrow (though I assume you already tested it ;))

go-chaos/cmd/cluster.go
internal.LogInfo("Change %d not yet started", flags.changeId)
}

time.Sleep(5 * time.Second)
Member

🔧 I would add some maximum time out or something, this looks to me like it could potentially loop forever? I know we assume the change will eventually complete, but still... 😅

Member

Yes we should definitely fail at some point.

Member Author

Yeah, good point 👍 Picking a good value is tricky though; I'd expect scaling to take anywhere from 5 minutes to more than 30 minutes.

Using that in an automated chaos test is problematic anyway because we'll run into job timeouts... 🤔

Member

W00t 30 minutes 😆 😅

Member Author

I've set a timeout of 25 minutes now

Member

@ChrisKujawa ChrisKujawa left a comment

Thanks @oleschoenburg 👍🏼 I had some comments, see below :)

go-chaos/cmd/cluster.go
}
}

func queryTopology(port int) (*CurrentTopology, error) {
Member

🔧 Would be great if you could add an integration test (IT) for it

Member Author

I've added a minimal one now. None of the other commands have integration tests so I hope I did it in a way that is acceptable: ffd0a20

Member

No integration tests for other commands you say?

shock

@lenaschoenburg lenaschoenburg force-pushed the os/support-dynamic-scaling branch from 6bdb486 to ebd49d5 Compare December 12, 2023 12:00
Co-authored-by: Christopher Kujawa (Zell) <zelldon91@googlemail.com>
Member

@ChrisKujawa ChrisKujawa left a comment

Thanks @oleschoenburg 👍🏼

ChangeStatusUnknown ChangeStatus = "UNKNOWN"
)

func describeChangeStatus(topology *CurrentTopology, changeId int64) ChangeStatus {
Member

🤔 Feels like it makes sense to have this type of status enum inside the returned topology? 🤔

Member Author

@lenaschoenburg lenaschoenburg Dec 12, 2023

Our API doesn't offer an endpoint to query "what's the status of change X". If we offered that, the endpoint would return an enum similar to the one we have defined here, yeah.
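To illustrate the point about deriving a status on the client side, here is a hedged sketch, not the PR's implementation. Only ChangeStatusUnknown, the describeChangeStatus signature, and the field names LastChange and PendingChange appear in this conversation. The Change type, its Id field, the PENDING and COMPLETED values, and the matching rules are assumptions made for illustration.

```go
package main

import "fmt"

// ChangeStatus is a client-side description of a topology change.
type ChangeStatus string

const (
	ChangeStatusPending   ChangeStatus = "PENDING"   // assumed value, not from the PR
	ChangeStatusCompleted ChangeStatus = "COMPLETED" // assumed value, not from the PR
	ChangeStatusUnknown   ChangeStatus = "UNKNOWN"   // appears in the PR diff
)

// Change is a hypothetical stand-in for whatever the real response carries.
type Change struct {
	Id int64
}

// CurrentTopology mirrors only the two fields referenced in the review.
type CurrentTopology struct {
	LastChange    *Change
	PendingChange *Change
}

// describeChangeStatus maps a change id to a status by looking at the pending
// and last change of the topology. The matching rules are assumptions.
func describeChangeStatus(topology *CurrentTopology, changeId int64) ChangeStatus {
	if topology == nil {
		return ChangeStatusUnknown
	}
	if topology.PendingChange != nil && topology.PendingChange.Id == changeId {
		return ChangeStatusPending
	}
	if topology.LastChange != nil && topology.LastChange.Id == changeId {
		return ChangeStatusCompleted
	}
	return ChangeStatusUnknown
}

func main() {
	topology := &CurrentTopology{
		LastChange:    &Change{Id: 1},
		PendingChange: &Change{Id: 2},
	}
	for _, id := range []int64{1, 2, 3} {
		fmt.Printf("change %d: %s\n", id, describeChangeStatus(topology, id))
	}
}
```

If the API later exposed a per-change status endpoint, as discussed above, a function like this could be replaced by the server-provided enum.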


func CreateZeebeContainer(t *testing.T, ctx context.Context) testcontainers.Container {
	req := testcontainers.ContainerRequest{
		Image: "camunda/zeebe:SNAPSHOT",
Member

🔧 Let's pin it to a version where we know this has it. Or is it not released yet?

Member Author

I can pin it to 8.4.0-alpha2, but that way we won't notice when zbchaos stops working with newer Zeebe versions, for example if we change the API.

Comment on lines +54 to +56
require.Nil(t, topology.LastChange)
require.Nil(t, topology.PendingChange)
}
Member

Yeah, I guess this doesn't help much, so we will only see whether this works if you run an experiment for it.

@lenaschoenburg lenaschoenburg merged commit 5cb7422 into main Dec 12, 2023
2 checks passed
@lenaschoenburg lenaschoenburg deleted the os/support-dynamic-scaling branch December 12, 2023 15:58