feat(zbchaos): query current cluster topology #459
LGTM, though I haven't tested it yet. I'll try it out tomorrow (though I assume you already tested it ;))
go-chaos/cmd/cluster.go
Outdated
		internal.LogInfo("Change %d not yet started", flags.changeId)
	}

	time.Sleep(5 * time.Second)
🔧 I would add some maximum time out or something, this looks to me like it could potentially loop forever? I know we assume the change will eventually complete, but still... 😅
Yes we should definitely fail at some point.
Yeah, good point 👍 Picking a good value is tricky though, I'd expect scaling to take anywhere from 5 minutes to >30 minutes..
Using that in an automated chaos test is problematic anyway because we'll run into job timeouts... 🤔
W00t 30 minutes 😆 😅
I've set a timeout of 25 minutes now
Thanks @oleschoenburg 👍🏼 I had some comments, see below :)
go-chaos/cmd/cluster.go
Outdated
	}
}

func queryTopology(port int) (*CurrentTopology, error) {
🔧 Would be great if you could add an IT (integration test) for it
I've added a minimal one now. None of the other commands have integration tests so I hope I did it in a way that is acceptable: ffd0a20
Co-authored-by: Christopher Kujawa (Zell) <zelldon91@googlemail.com>
Thanks @oleschoenburg 👍🏼
	ChangeStatusUnknown ChangeStatus = "UNKNOWN"
)

func describeChangeStatus(topology *CurrentTopology, changeId int64) ChangeStatus {
🤔 Feels like it makes sense to have this type of status enum inside the returned topology? 🤔
Our API doesn't offer an endpoint to query "what's the status of change X". If we offered that, the endpoint would return an enum similar to the one we have defined here, yeah.
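To illustrate what deriving the status on the client side might involve, here is a simplified sketch. Only `ChangeStatusUnknown`, the `describeChangeStatus` signature, and the `LastChange`/`PendingChange` field names come from the diff; the `Change` struct, its `Id` field, and the `PENDING`/`COMPLETED` values are assumptions made for the example, and the real function likely distinguishes more cases (the "not yet started" log message above suggests at least one more).

```go
package main

import "fmt"

// ChangeStatus mirrors the enum from the diff. Only UNKNOWN appears in the
// excerpt; the other two values are assumptions for illustration.
type ChangeStatus string

const (
	ChangeStatusPending   ChangeStatus = "PENDING"   // assumed
	ChangeStatusCompleted ChangeStatus = "COMPLETED" // assumed
	ChangeStatusUnknown   ChangeStatus = "UNKNOWN"
)

// Change is a hypothetical, stripped-down shape of a topology change.
type Change struct {
	Id int64
}

// CurrentTopology keeps only the two fields referenced in the test excerpt.
type CurrentTopology struct {
	LastChange    *Change
	PendingChange *Change
}

// describeChangeStatus derives a status for changeId from what the topology
// reports, since there is no endpoint that answers the question directly.
func describeChangeStatus(topology *CurrentTopology, changeId int64) ChangeStatus {
	if topology.PendingChange != nil && topology.PendingChange.Id == changeId {
		return ChangeStatusPending
	}
	if topology.LastChange != nil && topology.LastChange.Id == changeId {
		return ChangeStatusCompleted
	}
	return ChangeStatusUnknown
}

func main() {
	topology := &CurrentTopology{
		LastChange:    &Change{Id: 1},
		PendingChange: &Change{Id: 2},
	}
	for _, id := range []int64{1, 2, 3} {
		fmt.Printf("change %d: %s\n", id, describeChangeStatus(topology, id))
	}
}
```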
func CreateZeebeContainer(t *testing.T, ctx context.Context) testcontainers.Container {
	req := testcontainers.ContainerRequest{
		Image: "camunda/zeebe:SNAPSHOT",
🔧 Let's pin it to a version that we know has this. Or is it not released yet?
I can pin it to 8.4.0-alpha2 but that way we won't notice when zbchaos stops working with newer Zeebe versions, for example if we change the API
	require.Nil(t, topology.LastChange)
	require.Nil(t, topology.PendingChange)
}
Yeah, I guess this doesn't help much, so we will only see whether this works if you run an experiment for it.
Adds a new command `zbchaos cluster status` that uses `GET /actuator/cluster` under the hood and pretty-prints the deserialized response.

Part 1 of X for #458