Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding AgentHealth Middleware to Exporters #266

Merged
merged 6 commits into from
Dec 16, 2024

Conversation

Paramadon
Copy link


Description of the Issue

The CloudWatch Agent lacked an agenthealth configuration to monitor API health by tracking HTTP status codes for specific responses. Monitoring the status codes (200, 400, 408, 419, and 429) across all APIs is critical for diagnosing issues and ensuring comprehensive observability of API behaviors. Without this configuration, users were unable to quickly identify trends in API health metrics or correlate specific status codes with performance issues.


Changes Made

This change adds agent health to the emf_exporter and awscloudwatchlogs exporter. AwsXrayExporter already has handlers being added.

  • Added agenthealth Configuration:

    • Configured the CloudWatch Agent to track the following status codes across all APIs:
      • 200: Success
      • 400: Bad Request
      • 408: Request Timeout
      • 419: Authentication Timeout
      • 429: Too Many Requests
    • This ensures that health metrics for these common response codes are captured and reported.
  • Updated CloudWatch Agent Configuration JSON:

    • Introduced a new section under the metrics collection configuration to specify agenthealth metrics.
    • Ensured compatibility with existing metrics and logs collection by integrating the new configuration seamlessly.
  • Validated the Configuration:

    • Tested the updated configuration with the CloudWatch Agent to ensure the new metrics are collected and reported correctly.
    • Confirmed that the metrics appear as expected in CloudWatch for all specified APIs.

Impact

  • Provides real-time visibility into API status codes, helping identify anomalies or patterns that indicate issues.
  • Enhances observability for critical services by adding detailed health metrics.

Testing

  • Deployed the updated CloudWatch Agent health configuraion
  • Verified that metrics for status codes 200, 400, 408, 419, and 429 are captured in the header as you can see from the image below:
Screenshot 2024-11-15 at 4 36 21 PM

@Paramadon Paramadon requested a review from mxiamxia as a code owner December 10, 2024 21:34
@Paramadon Paramadon requested a review from wangzlei as a code owner December 10, 2024 21:36
exporter/awsemfexporter/emf_exporter.go Outdated Show resolved Hide resolved
retryCnt: *awsConfig.MaxRetries,
collectorID: collectorIdentifier.String(),
retryCnt: config.AWSSessionSettings.MaxRetries,
collectorID: uuid.New().String(),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change from uuid.NewRandom()?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it back to uuid.NewRandom().

Comment on lines +161 to +162
emf.pusherMapLock.Lock()
defer emf.pusherMapLock.Unlock()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good find

exporter/awsemfexporter/emf_exporter.go Outdated Show resolved Hide resolved
sender = registry.Register(set.ID, cfg.TelemetryConfig, xrayClient, opts...)
}

var xrayClient awsxray.XRayClient

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super familiar with this exporter, but is it possible that that the reference to xrayClient on line 78 could happen before the initialization in the exporthelper.WithStart?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So based off the BaseExporter struct and the Start funciton of the Base exporter, the Start function is called before the exporter begins processing data. I believe this is a standard practice in the OpenTelemetry Collector's component lifecycle.

// baseExporter contains common fields between different exporter types.
type baseExporter struct {
	component.StartFunc
	component.ShutdownFunc

	signal component.DataType

	batchMergeFunc      exporterbatcher.BatchMergeFunc[Request]
	batchMergeSplitfunc exporterbatcher.BatchMergeSplitFunc[Request]

	marshaler   exporterqueue.Marshaler[Request]
	unmarshaler exporterqueue.Unmarshaler[Request]

	set    exporter.Settings
	obsrep *ObsReport

	// Message for the user to be added with an export failure message.
	exportFailureMessage string

	// Chain of senders that the exporter helper applies before passing the data to the actual exporter.
	// The data is handled by each sender in the respective order starting from the queueSender.
	// Most of the senders are optional, and initialized with a no-op path-through sender.
	batchSender   requestSender
	queueSender   requestSender
	obsrepSender  requestSender
	retrySender   requestSender
	timeoutSender *timeoutSender // timeoutSender is always initialized.

	consumerOptions []consumer.Option
}
func (be *baseExporter) Start(ctx context.Context, host component.Host) error {
	// First start the wrapped exporter.
	if err := be.StartFunc.Start(ctx, host); err != nil {
		return err
	}

	// If no error then start the batchSender.
	if err := be.batchSender.Start(ctx, host); err != nil {
		return err
	}

	// Last start the queueSender.
	return be.queueSender.Start(ctx, host)
}

@@ -34,6 +34,7 @@ type cwlExporter struct {
collectorID string
svcStructuredLog *cwlogs.Client
pusherFactory cwlogs.MultiStreamPusherFactory
params exp.Settings

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we name this something like exporter settings?

@@ -39,6 +39,7 @@ type emfExporter struct {
pusherMap map[cwlogs.StreamKey]cwlogs.Pusher
svcStructuredLog *cwlogs.Client
config *Config
set exporter.Settings

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we name this settings

exporter/awsemfexporter/emf_exporter.go Show resolved Hide resolved
Copy link

@okankoAMZ okankoAMZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking comments

@Paramadon Paramadon merged commit 8e059f1 into aws-cwa-dev Dec 16, 2024
137 of 146 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants