Skip to content

[processor/resourcedetection] ec2 detector start returns an error if network call fails #37426

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

atoulme
Copy link
Contributor

@atoulme atoulme commented Jan 22, 2025

Description

The resourcedetection processor fails to start the processor if the ec2 detector fails to start because network is unreachable

Link to tracking issue

Fixes #35936

Testing

Added a unit test.

@atoulme atoulme requested review from dashpole and a team as code owners January 22, 2025 23:51
@github-actions github-actions bot added the processor/resourcedetection Resource detection processor label Jan 22, 2025
@github-actions github-actions bot requested a review from Aneurysm9 January 22, 2025 23:52
@atoulme
Copy link
Contributor Author

atoulme commented Jan 22, 2025

I am not very happy with this fix but this is the simplest change. It allows to make sure the collector only starts if it is initialized properly rather than reporting invalid information. It supposes that the collector will be restarted by the operating system after this failure.

…detector fails to start because network is unreachable
@atoulme atoulme force-pushed the detect_network_error branch from ec28237 to 452dad4 Compare January 22, 2025 23:58
var requestSendError *smithyhttp.RequestSendError
if errors.As(err, &requestSendError) {
d.logger.Warn("EC2 metadata endpoint unreachable", zap.Error(err))
return pcommon.NewResource(), "", err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a comment here saying why we return an error for this particular use case?

Also, are we sure that smithyhttp.RequestSendError is fired only when ec2 is in not ready state? Will it be a different error if it's not an ec2 instance?

Also, smithyhttp.RequestSendError might be unreliable. If the SDK is changed to use something else, we might miss it.

I'm thinking if we can have a configuration option instead. Happy to here other opinions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me as the initial solution, if the first 2 Qs above are addressed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I can work on adding an integration test to make sure this works as intended.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a comment here saying why we return an error for this particular use case?

I put a log warning above, can I maybe make that warning better, maybe more actionable? I will do that first. Comments tend to rot...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put a log warning above, can I maybe make that warning better, maybe more actionable? I will do that first. Comments tend to rot...

The warning is for the end user. When someone reads the code, it's unclear why this particular error should be propagated while others are not.

@dmitryax dmitryax self-requested a review January 23, 2025 00:22
@atoulme
Copy link
Contributor Author

atoulme commented Jan 23, 2025

On further investigation, I think this approach is not valid and I will explain myself on #35936. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
processor/resourcedetection Resource detection processor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[processor/resourcedetection] The ec2 detector does not retry on initial failure
3 participants