Prometheus add nodes gauge for SQS mode #1083
base: main
6e910d0 to 91e34ac
@stevehipwell can you please give me a hand with this? I am now only stuck on writing the unit test for Do you think I should refactor this file as part of this PR, or should I open a separate PR (e.g. make opentelemetry.go testable)?
@phuhung273 I'm not a maintainer here, but I like to see untestable code refactored to be testable, following the boy scout rule. If you do the work in this PR you can always split the refactoring into a separate PR before it's merged. @LikithaVemulapalli what do you think?
Yes, I agree with @stevehipwell here; let's do the refactoring in a separate PR so the changes in each are clear. @phuhung273, if you want to test your changes for this PR, let me know and I will approve and run the workflow, so that for future commits you can verify that the existing tests still pass and whether there are any conflicts. Once you change the PR status to ready I will run the workflow. I appreciate your contribution. Thanks!
This PR has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want this PR to never become stale, please ask a maintainer to apply the "stalebot-ignore" label.
/remove-lifecycle stale
Hi @phuhung273, could you resolve the conflicts in
b9e1617 to 95fc5a9
95fc5a9 to f4ae6de
Thanks and Happy New Year @tiationg-kho. Conflicts resolved.
Hi @phuhung273,
Thanks for resolving the conflicts. I have left some comments.
Also, we can run the e2e tests locally (Docker Desktop) to make sure our changes are valid.
- Run all e2e test cases:
make e2e-test
- Run certain e2e test case:
./test/k8s-local-cluster-test/run-test -a ./test/e2e/<test-case> -d -b e2e-test
for {
	result, err := h.ec2ServiceClient.DescribeInstances(&ec2.DescribeInstancesInput{
		Filters: []*ec2.Filter{
Could we add a filter for instance state here?
Of course we can, but I wonder if we should. I have seen cases where instances enter the Stopped state instead of Terminated. Without filtering, users can discover such cases.
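To picture the trade-off raised here, the following is a minimal sketch, not code from this PR. It assumes a hypothetical `instance` record and a hypothetical `countByState` helper (neither is part of the AWS SDK or of NTH) and shows that leaving the state unfiltered keeps a Stopped instance visible in the tally.

```go
package main

// instance is a hypothetical, simplified record used only for this sketch;
// it is not a type from the AWS SDK.
type instance struct {
	ID    string
	State string // for example "running", "stopped", or "terminated"
}

// countByState tallies instances per state without discarding any of them,
// so an unexpected state such as "stopped" stays visible in the result.
func countByState(instances []instance) map[string]int {
	counts := make(map[string]int)
	for _, inst := range instances {
		counts[inst.State]++
	}
	return counts
}
```

With a state filter applied before counting, the "stopped" entry would simply be absent, which is the case the author wants to keep observable.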
Thanks for addressing the comments.
Maybe we could rename nodesGauge and instancesGauge to make these two metrics more self-explanatory?
Then we could decide whether we need a filter here, and whether we should keep these two metrics separate or filter one based on the other (opentelemetry.go).
}
}

func (m Metrics) serveNodeMetrics() {
We could add nil checks for both results (instanceIdsMap, nodeInstanceIds).
We should not use instanceIdsMap in the second block if we got an error from GetInstanceIdsMapByTagKey.
Given that we filter the nodes result based on the instances result, we would never get a chance to record the case where nodes > instances. Do you think this is a potential issue?
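The guard being requested could look roughly like the sketch below. The function name `checkLookups`, its signature, and the `map[string]bool` type are assumptions made for illustration; the actual types in the PR are not shown in this thread. The guard matters because in Go, indexing a nil map does not panic and returns the zero value, so an unchecked failed lookup would silently report zero matching nodes.

```go
package main

import (
	"errors"
	"fmt"
)

// checkLookups reports whether it is safe to compare nodes against instances.
// The parameter names mirror the review comment; the signature is hypothetical.
func checkLookups(instanceIdsMap map[string]bool, lookupErr error, nodeInstanceIds []string) error {
	// Do not touch instanceIdsMap at all if the instance lookup failed.
	if lookupErr != nil {
		return fmt.Errorf("instance lookup failed, skipping node metrics: %w", lookupErr)
	}
	if instanceIdsMap == nil {
		return errors.New("instance lookup returned a nil map")
	}
	if nodeInstanceIds == nil {
		return errors.New("node lookup returned a nil slice")
	}
	return nil
}
```

A caller would return early, or skip recording the nodes gauge, whenever this function returns a non-nil error.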
Thanks, I have addressed the 2 nil checks.
About the edge case, I did think of it but could not find any way to compute the 2 metrics separately:
- For instances we already have an NTH managed tag, so we can filter based on it
- For nodes I cannot see any label/annotation that we can filter on
Therefore, I decided to filter nodes based on the instances result. Do you know of any info we can use to filter nodes independently?
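The filtering decision described above can be sketched as follows. This is an illustration under assumptions, not the PR's code: `partitionNodes` and the `map[string]bool` set of tagged instance IDs are hypothetical names and types.

```go
package main

// partitionNodes splits node instance IDs into those found in the set of
// NTH-tagged instances and those that are not. Both names are hypothetical.
func partitionNodes(nodeInstanceIDs []string, managed map[string]bool) (inSet, notInSet []string) {
	for _, id := range nodeInstanceIDs {
		if managed[id] {
			inSet = append(inSet, id)
		} else {
			notInSet = append(notInSet, id)
		}
	}
	return inSet, notInSet
}
```

Because the first result is always a subset of the tagged instances, a nodes gauge built from it can never exceed the instances gauge (assuming unique IDs), which is the edge case the reviewer raised. The second result is what would surface nodes that are not under NTH control.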
Issue #, if available:
Close #785
Description of changes:
- k get node returns 5
- aws ec2 describe-instances returns 2
- Identify 3 nodes no longer under NTH control
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.