Skip to content

Conversation

j7m4
Copy link
Collaborator

@j7m4 j7m4 commented Sep 30, 2025

Summary by CodeRabbit

  • New Features

    • Added end-to-end OpenTelemetry tracing across the CLI and sync commands with automatic context propagation.
    • Instrumented AWS, Azure, Google Cloud, Kubernetes, GitHub, Salesforce, Tailscale, and Terraform operations for detailed spans and counts.
    • Telemetry configurable via environment variables; supports standard OTLP collectors and Datadog.
  • Documentation

    • Added a comprehensive telemetry setup and troubleshooting guide.
  • Chores

    • Added telemetry dependencies.
    • Updated .gitignore to ignore bin/ctrlc.

Copy link
Contributor

coderabbitai bot commented Sep 30, 2025

Walkthrough

This PR integrates OpenTelemetry tracing across the CLI: adds a telemetry module, wraps command execution with root spans, instruments multiple cloud/service sync paths, injects trace headers in API requests, updates go.mod to depend on OTEL, adds telemetry docs, and ignores bin/ctrlc. Some Google sync functions gain a real context parameter.

Changes

Cohort / File(s) Summary
Telemetry core
internal/telemetry/telemetry.go
New OTEL setup (init, exporter, resource, propagator), span helpers (root, API, error/success, attributes), and wrapper utility.
CLI entry
cmd/ctrlc/ctrlc.go
Wraps execution with telemetry init, root span, context propagation, error/success tagging.
API propagation
internal/api/client.go
Injects W3C trace context into HTTP headers via OTEL propagator.
Docs
docs/TELEMETRY.md
Adds telemetry configuration and usage guide.
Dependencies
go.mod
Promotes OTEL libs to direct deps; aligns versions for otel/sdk/trace/exporter.
Git ignore
.gitignore
Adds /bin/ctrlc.
AWS sync instrumentation
cmd/ctrlc/root/sync/aws/ec2/ec2.go, .../eks/eks.go, .../networks/networks.go, .../rds/rds.go
Adds spans around list/describe calls, error tagging, attributes for counts, success marking; per-call contexts.
Azure sync instrumentation
cmd/ctrlc/root/sync/azure/aks/aks.go, .../networks/networks.go
Adds root/list spans, subscription/group attributes, error tagging, resource count attributes.
Google sync instrumentation (context updates)
cmd/ctrlc/root/sync/google/cloudsql/cloudsql.go, .../networks/networks.go
Adds spans and attributes; updates function signatures to accept/use ctx.
Google sync instrumentation (others)
cmd/ctrlc/root/sync/google/bigtable/bigtable.go, .../buckets/buckets.go, .../cloudrun/cloudrun.go, .../gke/gke.go, .../projects/projects.go, .../redis/redis.go, .../secrets/secrets.go, .../vms/vms.go
Adds spans for list operations, error tagging, counts, success marking; logs counts.
GitHub sync
cmd/ctrlc/root/sync/github/pullrequests.go
Adds function and per-page spans for PRs/commits; attributes for repo, state, page, counts; error tagging.
Kubernetes sync
cmd/ctrlc/root/sync/kubernetes/kubernetes.go
Adds spans for namespaces/deployments listing; error tagging and count attributes.
Salesforce sync
cmd/ctrlc/root/sync/salesforce/accounts/accounts.go, .../opportunities/opportunities.go, .../common/util.go
Adds spans for processing, describe, paginate, per-batch; attributes for limits/counts; error tagging; context-aware utilities.
Tailscale sync
cmd/ctrlc/root/sync/tailscale/tailscale.go
Adds spans for listing devices, processing, and upsert; attributes and error/success tagging.
Terraform workspaces
cmd/ctrlc/root/sync/terraform/terraform_workspaces.go
Adds spans for listing, per-variable reads, retries; org/workspace attributes; count metrics and error tagging.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI as ctrlc
  participant Tel as telemetry.Init
  participant Tr as telemetry.RootSpan
  participant Cmd as Root Command
  participant API as Ctrlplane API Client

  User->>CLI: Start process
  CLI->>Tel: InitTelemetry(ctx)
  Tel-->>CLI: shutdown()
  CLI->>Tr: StartRootSpan(command, args)
  activate Tr
  CLI->>Cmd: Execute(ctx with span)
  Cmd->>API: NewAPIKeyClientWithResponses(req)
  API->>API: Inject trace headers (traceparent)
  API-->>Cmd: Response
  Cmd-->>CLI: Result (ok/error)
  alt error
    CLI->>Tr: SetSpanError
  else success
    CLI->>Tr: SetSpanSuccess
  end
  Tr-->>CLI: End
  CLI->>Tel: shutdown(timeout)
Loading
sequenceDiagram
  autonumber
  participant Sync as sync: processX
  participant Prov as Cloud Provider API
  participant Ot as telemetry

  Sync->>Ot: Start span (process_x)
  Sync->>Ot: Start span (list_x)
  Sync->>Prov: List X (ctx)
  alt failure
    Sync->>Ot: SetSpanError(list/process)
    Sync-->>Caller: error
  else success
    Sync->>Ot: Add attributes (found count)
    Sync->>Ot: SetSpanSuccess(list)
    Sync->>Ot: End list span
    Sync-->>Caller: resources
    Sync->>Ot: Add attributes (processed count)
    Sync->>Ot: SetSpanSuccess(process)
  end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • aws/az network scans #26 — Touches AWS/Azure networks sync code that this PR instruments with telemetry; likely a follow-up adding observability to recently changed networking sync flows.

Poem

A rabbit taps keys with tracing delight,
Threads of spans shimmer, emerald and white.
From clouds to tails, each hop leaves a clue,
traceparent whispers, “I’m following you.”
Carrots of counts, successes in view—
OTEL-lit burrows, and metrics anew. 🥕✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title concisely references the addition of OTEL hooks and the Datadog OTEL option, which are key components of these telemetry integrations, although it may not capture the full breadth of tracing additions across the codebase.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch j7m4/otel-integration

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🧪 Early access (Sonnet 4.5): enabled

We are currently testing the Sonnet 4.5 model, which is expected to improve code review quality. However, this model may lead to increased noise levels in the review comments. Please disable the early access features if the noise level causes any inconvenience.

Note:

  • Public repositories are always opted into early access features.
  • You can enable or disable early access features from the CodeRabbit UI or by updating the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (12)
cmd/ctrlc/root/sync/google/gke/gke.go (1)

103-108: Consider using the returned context for proper context propagation.

The child span context is discarded (using _). While this works for the immediate API call, best practice is to use the returned context to ensure proper trace context propagation throughout the call chain.

Apply this diff to use the returned context:

-	_, listSpan := telemetry.StartSpan(ctx, "google.gke.list_clusters",
+	listCtx, listSpan := telemetry.StartSpan(ctx, "google.gke.list_clusters",
 		trace.WithSpanKind(trace.SpanKindClient),
 		trace.WithAttributes(
 			attribute.String("google.project_id", project),
 		),
 	)
 
 	parent := fmt.Sprintf("projects/%s/locations/-", project)
-	resp, err := gkeClient.Projects.Locations.Clusters.List(parent).Do()
+	resp, err := gkeClient.Projects.Locations.Clusters.List(parent).Context(listCtx).Do()
cmd/ctrlc/root/sync/google/cloudsql/cloudsql.go (1)

250-255: Consider capturing the child context.

The child context returned from StartSpan is discarded with _. While the span itself is correctly captured, best practice is to use the returned context to maintain proper parent-child span relationships in the trace hierarchy.

Apply this diff if you want to follow OpenTelemetry best practices:

-	_, listSpan := telemetry.StartSpan(ctx, "google.cloudsql.list_instances",
+	listCtx, listSpan := telemetry.StartSpan(ctx, "google.cloudsql.list_instances",
 		trace.WithSpanKind(trace.SpanKindClient),
 		trace.WithAttributes(
 			attribute.String("google.project_id", project),
 		),
 	)

Note: This change is optional as the current implementation will still work, but using the child context can improve trace visualization and context propagation if additional spans are added within this scope later.

cmd/ctrlc/root/sync/terraform/terraform_workspaces.go (1)

247-276: Consider recording conversion failures in span attributes.

While the overall tracing structure is correct, individual workspace conversion failures (lines 264-267) are logged but not reflected in span attributes. For enhanced observability, consider tracking failed conversions.

You could add a counter for failed conversions and record it as an attribute:

 	workspaceResources := []WorkspaceResource{}
+	failedConversions := 0
 	for _, workspace := range workspaces {
 		workspaceResource, err := convertWorkspaceToResource(ctx, workspace, client)
 		if err != nil {
 			log.Error("Failed to convert workspace to resource", "error", err, "workspace", workspace.Name)
+			failedConversions++
 			continue
 		}
 		workspaceResources = append(workspaceResources, workspaceResource)
 		time.Sleep(50 * time.Millisecond)
 	}
 
+	if failedConversions > 0 {
+		telemetry.AddSpanAttribute(span, "terraform.workspaces_failed", failedConversions)
+	}
 	telemetry.AddSpanAttribute(span, "terraform.workspaces_processed", len(workspaceResources))
 	telemetry.SetSpanSuccess(span)
docs/TELEMETRY.md (1)

233-237: Optional: Add language identifier to fenced code block.

The static analysis tool flagged this fenced code block for missing a language specifier. While not critical for error message examples, adding text or plaintext would satisfy linting rules.

Apply this diff:

-```
+```text
 grpc: addrConn.createTransport failed to connect to {...}: connection refused

</blockquote></details>
<details>
<summary>internal/telemetry/telemetry.go (4)</summary><blockquote>

`15-15`: **Consider upgrading to the latest semantic conventions version.**

The code uses `semconv v1.20.0`, but OpenTelemetry Go v1.35.0 (released March 2025) is available. While v1.20.0 remains functional, upgrading to a more recent semconv version (e.g., v1.26.0 or later) would provide updated attribute definitions and align with current OTel standards.

---

`86-92`: **Direct array index mutation is fragile.**

Lines 88 and 91 directly mutate `attrs[0]` and `attrs[1]` to override service name and version when `DD_SERVICE` or `DD_VERSION` are set. This assumes the first two elements are always `ServiceNameKey` and `ServiceVersionKey`, which is brittle if the initialization order changes.



Refactor to use a map or post-process the slice:

```diff
-	attrs := []attribute.KeyValue{
-		semconv.ServiceNameKey.String(serviceName),
-		semconv.ServiceVersionKey.String(serviceVersion),
-	}
+	serviceName := serviceName
+	serviceVersion := serviceVersion

 	// Add Datadog-specific attributes if enabled
 	if os.Getenv("DATADOG_ENABLED") == "true" {
 		if env := os.Getenv("DD_ENV"); env != "" {
-			attrs = append(attrs, attribute.String("deployment.environment", env))
+			// Add deployment environment
 		}
 		if service := os.Getenv("DD_SERVICE"); service != "" {
-			// Override service name if DD_SERVICE is set
-			attrs[0] = semconv.ServiceNameKey.String(service)
+			serviceName = service
 		}
 		if version := os.Getenv("DD_VERSION"); version != "" {
-			attrs[1] = semconv.ServiceVersionKey.String(version)
+			serviceVersion = version
 		}
+		// ... DD_TAGS parsing
+	}
+
+	attrs := []attribute.KeyValue{
+		semconv.ServiceNameKey.String(serviceName),
+		semconv.ServiceVersionKey.String(serviceVersion),
 	}
+	// Append Datadog env and tags after

209-230: Type switch missing []int and []int64 cases.

The AddSpanAttribute function handles most common types but omits []int and []int64 slices. While less common, these types could appear in practice. Consider adding cases or falling back to converting them to []string for completeness.

Apply this diff to add support for integer slices:

 	case []string:
 		span.SetAttributes(attribute.StringSlice(key, v))
+	case []int:
+		span.SetAttributes(attribute.IntSlice(key, v))
+	case []int64:
+		span.SetAttributes(attribute.Int64Slice(key, v))
 	default:
 		span.SetAttributes(attribute.String(key, fmt.Sprintf("%v", v)))

233-238: Lazy tracer initialization is safe but may cause confusion.

GetTracer and other functions initialize the global tracer if nil, but since InitTelemetry should be called first, this fallback is defensive. Consider documenting that InitTelemetry must be called before using span functions, or always use the fallback to avoid confusion.

cmd/ctrlc/root/sync/tailscale/tailscale.go (1)

96-112: Consider preserving span context for proper trace hierarchy.

The list_devices span is started from ctx and immediately ended after the List call completes. However, the context returned from StartSpan (line 96) is shadowing the original ctx but isn't used for subsequent operations. This breaks the trace parent-child relationship: the process_devices span (line 116) will also start from the original ctx, making it a sibling rather than a child of list_devices.

Apply this diff to preserve the span context and establish the correct hierarchy:

-	ctx, span := telemetry.StartSpan(ctx, "tailscale.list_devices",
+	listCtx, listSpan := telemetry.StartSpan(ctx, "tailscale.list_devices",
 		trace.WithSpanKind(trace.SpanKindClient),
 		trace.WithAttributes(
 			attribute.String("tailscale.tailnet", tailnet),
 		),
 	)
 
-	devices, err := tsc.Devices().List(ctx)
+	devices, err := tsc.Devices().List(listCtx)
 	if err != nil {
-		telemetry.SetSpanError(span, err)
-		span.End()
+		telemetry.SetSpanError(listSpan, err)
+		listSpan.End()
 		return fmt.Errorf("failed to list devices: %w", err)
 	}
 
-	telemetry.AddSpanAttribute(span, "tailscale.devices_found", len(devices))
-	telemetry.SetSpanSuccess(span)
-	span.End()
+	telemetry.AddSpanAttribute(listSpan, "tailscale.devices_found", len(devices))
+	telemetry.SetSpanSuccess(listSpan)
+	listSpan.End()
 
 	log.Info("Found Tailscale devices", "count", len(devices))
 
-	processCtx, processSpan := telemetry.StartSpan(ctx, "tailscale.process_devices",
+	processCtx, processSpan := telemetry.StartSpan(listCtx, "tailscale.process_devices",
cmd/ctrlc/root/sync/google/projects/projects.go (1)

55-60: Consider renaming the span attribute for consistency.

The attribute key google.projects.projects_found contains redundant "projects". Consider using google.cloudresourcemanager.projects_found to align with the span name, or simply google.projects.found for brevity.

Example:

-telemetry.AddSpanAttribute(listSpan, "google.projects.projects_found", len(resp.Projects))
+telemetry.AddSpanAttribute(listSpan, "google.cloudresourcemanager.projects_found", len(resp.Projects))
cmd/ctrlc/root/sync/google/networks/networks.go (1)

124-129: Unused context variable in child span.

Line 124 creates a child span but discards the returned context with _. This is acceptable when the context isn't needed for subsequent calls, but consider using the returned context for consistency if there are nested operations.

If you plan to add nested telemetry in the future, consider:

-	_, listSpan := telemetry.StartSpan(ctx, "google.compute.list_networks",
+	listCtx, listSpan := telemetry.StartSpan(ctx, "google.compute.list_networks",

Then use listCtx for any nested operations. For now, this pattern is acceptable since List is a leaf operation.

cmd/ctrlc/ctrlc.go (1)

65-68: Command name extraction is functional but could be more precise.

The current approach joins all arguments, which works for observability but may not reflect the actual command structure (e.g., flags will be included). Consider using the actual executed command name after execution for more accurate spans.

However, this is acceptable for the current use case.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c22bab5 and 7a21d69.

📒 Files selected for processing (30)
  • .gitignore (1 hunks)
  • cmd/ctrlc/ctrlc.go (2 hunks)
  • cmd/ctrlc/root/sync/aws/ec2/ec2.go (3 hunks)
  • cmd/ctrlc/root/sync/aws/eks/eks.go (3 hunks)
  • cmd/ctrlc/root/sync/aws/networks/networks.go (8 hunks)
  • cmd/ctrlc/root/sync/aws/rds/rds.go (6 hunks)
  • cmd/ctrlc/root/sync/azure/aks/aks.go (4 hunks)
  • cmd/ctrlc/root/sync/azure/networks/networks.go (4 hunks)
  • cmd/ctrlc/root/sync/github/pullrequests.go (5 hunks)
  • cmd/ctrlc/root/sync/google/bigtable/bigtable.go (3 hunks)
  • cmd/ctrlc/root/sync/google/buckets/buckets.go (3 hunks)
  • cmd/ctrlc/root/sync/google/cloudrun/cloudrun.go (2 hunks)
  • cmd/ctrlc/root/sync/google/cloudsql/cloudsql.go (3 hunks)
  • cmd/ctrlc/root/sync/google/gke/gke.go (3 hunks)
  • cmd/ctrlc/root/sync/google/networks/networks.go (9 hunks)
  • cmd/ctrlc/root/sync/google/projects/projects.go (4 hunks)
  • cmd/ctrlc/root/sync/google/redis/redis.go (3 hunks)
  • cmd/ctrlc/root/sync/google/secrets/secrets.go (5 hunks)
  • cmd/ctrlc/root/sync/google/vms/vms.go (3 hunks)
  • cmd/ctrlc/root/sync/kubernetes/kubernetes.go (6 hunks)
  • cmd/ctrlc/root/sync/salesforce/accounts/accounts.go (3 hunks)
  • cmd/ctrlc/root/sync/salesforce/common/util.go (8 hunks)
  • cmd/ctrlc/root/sync/salesforce/opportunities/opportunities.go (3 hunks)
  • cmd/ctrlc/root/sync/tailscale/tailscale.go (3 hunks)
  • cmd/ctrlc/root/sync/terraform/terraform_workspaces.go (6 hunks)
  • docs/API_TELEMETRY_GUIDE.md (1 hunks)
  • docs/TELEMETRY.md (1 hunks)
  • go.mod (1 hunks)
  • internal/api/client.go (1 hunks)
  • internal/telemetry/telemetry.go (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-01-15T17:46:32.343Z
Learnt from: adityachoudhari26
PR: ctrlplanedev/cli#4
File: cmd/ctrlc/root/sync/terraform/terraform.go:35-45
Timestamp: 2025-01-15T17:46:32.343Z
Learning: In the Ctrlplane CLI, the API client (`api.NewAPIKeyClientWithResponses`) handles invalid credentials internally by returning an error, making explicit validation of apiURL and apiKey unnecessary.

Applied to files:

  • cmd/ctrlc/root/sync/kubernetes/kubernetes.go
🧬 Code graph analysis (25)
cmd/ctrlc/root/sync/google/buckets/buckets.go (1)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • SetSpanError (194-199)
  • AddSpanAttribute (209-230)
  • SetSpanSuccess (202-206)
cmd/ctrlc/root/sync/salesforce/accounts/accounts.go (2)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • AddSpanAttribute (209-230)
  • SetSpanError (194-199)
  • SetSpanSuccess (202-206)
internal/api/client.gen.go (1)
  • CreateResource (168-176)
cmd/ctrlc/root/sync/google/cloudsql/cloudsql.go (2)
internal/api/client.gen.go (1)
  • CreateResource (168-176)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • SetSpanError (194-199)
  • AddSpanAttribute (209-230)
  • SetSpanSuccess (202-206)
cmd/ctrlc/root/sync/aws/ec2/ec2.go (1)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • SetSpanError (194-199)
  • AddSpanAttribute (209-230)
  • SetSpanSuccess (202-206)
cmd/ctrlc/root/sync/google/redis/redis.go (1)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • SetSpanError (194-199)
  • AddSpanAttribute (209-230)
  • SetSpanSuccess (202-206)
cmd/ctrlc/root/sync/github/pullrequests.go (1)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • SetSpanError (194-199)
  • AddSpanAttribute (209-230)
  • SetSpanSuccess (202-206)
cmd/ctrlc/root/sync/salesforce/opportunities/opportunities.go (2)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • AddSpanAttribute (209-230)
  • SetSpanError (194-199)
  • SetSpanSuccess (202-206)
internal/api/client.gen.go (1)
  • CreateResource (168-176)
cmd/ctrlc/root/sync/aws/eks/eks.go (3)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • SetSpanError (194-199)
  • AddSpanAttribute (209-230)
  • SetSpanSuccess (202-206)
internal/api/client.gen.go (1)
  • CreateResource (168-176)
cmd/ctrlc/root/sync/aws/common/regions.go (1)
  • GetAccountID (71-78)
cmd/ctrlc/root/sync/tailscale/tailscale.go (2)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • SetSpanError (194-199)
  • AddSpanAttribute (209-230)
  • SetSpanSuccess (202-206)
internal/api/resource_provider.go (1)
  • NewResourceProvider (11-50)
cmd/ctrlc/root/sync/google/cloudrun/cloudrun.go (2)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • SetSpanError (194-199)
  • AddSpanAttribute (209-230)
  • SetSpanSuccess (202-206)
internal/api/client.gen.go (1)
  • CreateResource (168-176)
cmd/ctrlc/root/sync/aws/rds/rds.go (2)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • SetSpanError (194-199)
  • AddSpanAttribute (209-230)
  • SetSpanSuccess (202-206)
internal/api/client.gen.go (1)
  • CreateResource (168-176)
cmd/ctrlc/root/sync/google/networks/networks.go (2)
internal/api/client.gen.go (1)
  • CreateResource (168-176)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • SetSpanError (194-199)
  • AddSpanAttribute (209-230)
  • SetSpanSuccess (202-206)
cmd/ctrlc/root/sync/salesforce/common/util.go (1)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • AddSpanAttribute (209-230)
  • SetSpanError (194-199)
  • SetSpanSuccess (202-206)
cmd/ctrlc/root/sync/google/bigtable/bigtable.go (1)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • SetSpanError (194-199)
  • AddSpanAttribute (209-230)
  • SetSpanSuccess (202-206)
internal/api/client.go (1)
internal/api/client.gen.go (3)
  • ClientWithResponses (6654-6656)
  • NewClientWithResponses (6660-6666)
  • WithRequestEditorFn (2442-2447)
cmd/ctrlc/root/sync/google/projects/projects.go (2)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • SetSpanError (194-199)
  • AddSpanAttribute (209-230)
  • SetSpanSuccess (202-206)
internal/api/client.gen.go (1)
  • CreateResource (168-176)
cmd/ctrlc/root/sync/google/secrets/secrets.go (1)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • SetSpanError (194-199)
  • AddSpanAttribute (209-230)
  • SetSpanSuccess (202-206)
cmd/ctrlc/ctrlc.go (1)
internal/telemetry/telemetry.go (4)
  • InitTelemetry (28-72)
  • StartRootSpan (174-191)
  • SetSpanError (194-199)
  • SetSpanSuccess (202-206)
cmd/ctrlc/root/sync/kubernetes/kubernetes.go (3)
internal/api/client.go (1)
  • NewAPIKeyClientWithResponses (12-28)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • SetSpanError (194-199)
  • AddSpanAttribute (209-230)
  • SetSpanSuccess (202-206)
internal/api/client.gen.go (1)
  • CreateResource (168-176)
cmd/ctrlc/root/sync/azure/networks/networks.go (3)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • SetSpanError (194-199)
  • AddSpanAttribute (209-230)
  • SetSpanSuccess (202-206)
internal/api/client.gen.go (1)
  • CreateResource (168-176)
cmd/ctrlc/root/sync/azure/common/resourceGroup.go (2)
  • ResourceGroupInfo (10-13)
  • GetResourceGroupInfo (15-42)
cmd/ctrlc/root/sync/google/vms/vms.go (1)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • SetSpanError (194-199)
  • SetSpanSuccess (202-206)
  • AddSpanAttribute (209-230)
cmd/ctrlc/root/sync/azure/aks/aks.go (1)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • SetSpanError (194-199)
  • AddSpanAttribute (209-230)
  • SetSpanSuccess (202-206)
cmd/ctrlc/root/sync/google/gke/gke.go (1)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • SetSpanError (194-199)
  • AddSpanAttribute (209-230)
  • SetSpanSuccess (202-206)
cmd/ctrlc/root/sync/terraform/terraform_workspaces.go (1)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • SetSpanError (194-199)
  • AddSpanAttribute (209-230)
  • SetSpanSuccess (202-206)
cmd/ctrlc/root/sync/aws/networks/networks.go (1)
internal/telemetry/telemetry.go (4)
  • StartSpan (241-246)
  • SetSpanError (194-199)
  • AddSpanAttribute (209-230)
  • SetSpanSuccess (202-206)
🪛 markdownlint-cli2 (0.18.1)
docs/API_TELEMETRY_GUIDE.md

314-314: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/TELEMETRY.md

234-234: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: docker (linux/amd64)
🔇 Additional comments (124)
cmd/ctrlc/root/sync/google/gke/gke.go (5)

15-15: LGTM! Appropriate imports for OpenTelemetry integration.

The telemetry and OpenTelemetry imports are correctly added to support the tracing functionality.

Also applies to: 18-19


94-100: LGTM! Parent span properly initialized with deferred cleanup.

The parent span is correctly created with appropriate span kind, attributes, and a deferred End() call ensuring proper cleanup.


113-118: LGTM! Error handling correctly marks both spans.

Both the child span (listSpan) and parent span are appropriately marked with errors before returning, ensuring observability of failures at both levels of the trace hierarchy.


120-122: LGTM! Success path properly records metrics and closes child span.

The child span is correctly marked with the number of clusters found, marked as successful, and explicitly ended before continuing with cluster processing.


136-137: LGTM! Parent span records final metrics and success status.

The parent span appropriately records the total number of processed clusters and is marked as successful. The deferred End() call at line 100 will properly close this span.

cmd/ctrlc/root/sync/google/cloudsql/cloudsql.go (5)

14-18: LGTM! Telemetry imports added correctly.

The new imports for telemetry, OpenTelemetry attributes, and tracing are necessary for the instrumentation added to processInstances.


240-240: Good change! Context now properly utilized.

The function signature correctly changes from discarding the context (_) to using it (ctx) for span propagation and telemetry.


241-247: LGTM! Root span properly initialized.

The root span is correctly created with:

  • Appropriate span kind (SpanKindClient)
  • Descriptive name following consistent naming pattern
  • Project ID attribute for filtering/grouping
  • Deferred End() ensuring cleanup

257-264: LGTM! Error handling properly instruments spans.

The error path correctly:

  • Records the error on the child span
  • Ends the child span explicitly
  • Records the error on the root span (which will be ended by defer)
  • Propagates the error to the caller

266-282: LGTM! Success path properly instrumented.

The success path correctly:

  • Records meaningful metrics (instances_found, total_instances) with namespaced attribute names
  • Marks both spans as successful using the telemetry helper
  • Explicitly ends the child span after recording its metrics
  • Allows the root span to be ended by the deferred call

The instrumentation provides good observability into the Cloud SQL sync process.

cmd/ctrlc/root/sync/terraform/terraform_workspaces.go (4)

14-17: LGTM! Telemetry imports are appropriate.

The added imports for OpenTelemetry tracing and the internal telemetry package align with the PR objectives and are used consistently throughout the file.


50-103: LGTM! Well-structured tracing implementation.

The function correctly implements nested span patterns with proper lifecycle management:

  • Root span captures workspace-level metrics
  • Child spans track individual variable operations
  • All code paths properly end spans
  • Context propagation flows correctly through the child spans to the API call
  • Error and success states are recorded appropriately

175-213: LGTM! Retry logic properly instrumented.

The span correctly encompasses the entire retry operation, providing visibility into the full request lifecycle including all retry attempts. The conditional attribute recording for workspace count is appropriate and avoids potential nil pointer issues.


215-245: LGTM! Pagination orchestration properly traced.

The function correctly creates a parent span that encompasses all paginated calls, with proper context propagation to child operations. The total workspace count provides valuable high-level metrics.

cmd/ctrlc/root/sync/google/vms/vms.go (6)

13-13: LGTM! Clean telemetry imports.

The imports align with the project's telemetry module and OpenTelemetry conventions.

Also applies to: 16-17


92-98: LGTM! Proper parent span initialization.

The span correctly:

  • Uses SpanKindClient for the GCP API interaction
  • Attaches the project ID as an attribute for traceability
  • Ensures cleanup via defer span.End()
  • Propagates context properly

111-116: Error handling is thorough.

The error path correctly:

  • Records the error on the list span
  • Ends the list span before returning
  • Propagates the error to the parent span
  • Returns a wrapped error

Note: If you adopt the defer listSpan.End() pattern from the previous comment, remove the explicit listSpan.End() on line 113 to avoid double-ending the span.


118-119: Success path properly marked.

The list span is correctly marked successful and ended.

Note: If you adopt the defer listSpan.End() pattern, remove the explicit listSpan.End() on line 119 to avoid double-ending the span.


142-143: LGTM! Proper span finalization with metrics.

Adding the VM count as a span attribute provides valuable observability, and the success marker clearly indicates the operation completed without errors.


100-106: Child span won't nest properly under parent.

The context returned by StartSpan is discarded (line 101), which breaks the parent-child relationship in the trace. Child spans must use the context returned from their StartSpan call to properly nest under the parent.

Apply this diff to fix the span hierarchy:

-	// Create span for AggregatedList call
-	_, listSpan := telemetry.StartSpan(ctx, "google.compute.list_instances",
+	// Create child span for AggregatedList call
+	listCtx, listSpan := telemetry.StartSpan(ctx, "google.compute.list_instances",
 		trace.WithSpanKind(trace.SpanKindClient),
 		trace.WithAttributes(
 			attribute.String("google.project_id", project),
 		),
 	)
+	defer listSpan.End()

Then update line 109 to use listCtx if the API call accepts context (though the current AggregatedList call doesn't use context, so this is primarily for proper trace hierarchy).

Likely an incorrect or invalid review comment.

docs/API_TELEMETRY_GUIDE.md (8)

1-17: LGTM! Clear and comprehensive summary.

The opening section effectively communicates the guide's purpose and provides a complete inventory of implemented integrations across major cloud providers and services.


70-86: LGTM! Excellent documentation of automatic trace propagation.

The W3C Trace Context standard with traceparent header is the correct approach for distributed tracing. The automatic, transparent propagation via the API client is well-designed and clearly documented.


87-181: LGTM! Well-structured implementation patterns.

All three telemetry patterns demonstrate correct usage:

  • Proper span lifecycle management with defer
  • Appropriate error handling and span status
  • Correct context propagation
  • Good practices for pagination with child spans

These examples provide clear, idiomatic Go patterns for developers to follow.


182-207: LGTM! Helpful reference section.

Providing concrete file paths with line numbers gives developers clear examples to follow when implementing telemetry in similar services.


208-256: LGTM! Well-organized attribute taxonomy.

The attribute naming is consistent and follows semantic convention patterns. Service-specific prefixes (e.g., terraform., github., aws.) provide clear namespacing while common attributes enable cross-service analysis.


257-289: LGTM! Practical checklist and testing guidance.

The implementation checklist is comprehensive, and the testing section covers the essential scenarios: build validation, graceful degradation with telemetry disabled, and successful telemetry export.


290-312: LGTM! Clear articulation of benefits and Datadog integration.

The benefits section effectively communicates the value of the telemetry implementation, and the Datadog-specific features accurately describe APM capabilities including unified service tagging and infrastructure correlation.


18-69: Datadog OTLP endpoint addresses are correct.
The default local receiver port (0.0.0.0:4317) and the agentless intake endpoints (api.datadoghq.com:4317 for US, api.datadoghq.eu:4317 for EU) match Datadog’s current documentation.

cmd/ctrlc/root/sync/google/cloudrun/cloudrun.go (3)

157-161: LGTM: Error handling is correct.

The error path properly records the error on the span, ends it, and returns. The span lifecycle is managed correctly.


163-165: LGTM: Success path correctly annotates and closes the span.

The success path adds a useful count attribute, marks the span as successful, and properly ends it. The attribute naming follows good conventions.


173-173: LGTM: Useful logging addition.

The log statement provides good runtime visibility into the number of services found.

docs/TELEMETRY.md (9)

1-11: LGTM: Clear introduction to telemetry features.

The feature list accurately describes the telemetry capabilities added in this PR.


12-46: LGTM: Comprehensive configuration documentation.

The environment variable documentation covers both standard OpenTelemetry variables and Datadog-specific configuration. The dual-mode support (Agent vs. direct intake) is well explained.


47-131: LGTM: Practical and accurate usage examples.

The examples cover common telemetry backends (Jaeger, Datadog, custom OTLP) with correct configuration. The Datadog Agent YAML snippet is particularly helpful.


132-150: LGTM: Span attributes are well-documented.

The documented attributes follow OpenTelemetry semantic conventions and provide good visibility into CLI operations.


151-176: LGTM: External API tracing coverage is comprehensive.

The documentation covers a wide range of cloud providers and services. The mention of W3C Trace Context propagation via traceparent header is technically accurate.


177-227: LGTM: Integration examples follow correct patterns.

The Go code examples demonstrate proper span lifecycle management with deferred cleanup, error handling, and attribute setting. The helper function examples (WithTelemetry, StartAPISpan) provide clear guidance for developers.


228-244: LGTM: Practical troubleshooting guidance.

The troubleshooting section addresses the most common issue (connection refused) with clear, actionable solutions.


245-294: LGTM: Thorough Datadog troubleshooting guide.

The Datadog-specific troubleshooting covers common issues comprehensively, including Agent verification, connectivity checks, and regional endpoint configuration. The diagnostic commands are accurate and practical.


295-308: LGTM: Debugging and performance notes are helpful.

The debug logging example is correct, and the performance notes accurately describe the telemetry implementation (async export, graceful shutdown, non-blocking initialization).

internal/telemetry/telemetry.go (6)

28-40: LGTM: Telemetry initialization with graceful degradation.

The initialization correctly handles both explicit disablement and missing configuration, returning no-op shutdown functions. This prevents telemetry from blocking CLI execution when not configured.


174-191: LGTM: Root span creation with appropriate attributes.

The StartRootSpan function correctly initializes the tracer if nil, uses SpanKindInternal for CLI commands, and attaches relevant command metadata. Good use of semantic attributes.


194-206: LGTM: Span error and success helpers are safe.

Both SetSpanError and SetSpanSuccess correctly check for nil spans before setting attributes. The error recording uses the standard RecordError API.


259-274: LGTM: Generic wrapper with proper span lifecycle.

The WithTelemetry function correctly defers span.End(), propagates the context, and marks success/error states. The use of generics makes this reusable across different return types.


142-147: Require resolvable hostname for DD_OTLP_GRPC_ENDPOINT
credentials.NewClientTLSFromCert(nil, "") uses the endpoint’s hostname for SNI and certificate validation. If DD_OTLP_GRPC_ENDPOINT is set to an IP address (e.g. 127.0.0.1:4317), the TLS handshake will fail—use a DNS name (e.g. api.datadoghq.com:4317, api.datadoghq.eu:4317 or your Agent’s host) instead.


104-116: Verify environment merge order and rerun shell test
Line 106’s resource.WithFromEnv() is applied after WithAttributes(...), so any OTEL_RESOURCE_ATTRIBUTES or OTEL_SERVICE_NAME in the environment will overwrite explicit attributes. The grep test didn’t search files due to filter globs. Rerun without filters, for example:

rg -n 'OTEL_SERVICE_NAME|OTEL_RESOURCE_ATTRIBUTES' -g '*.go'

to confirm no direct references, and consult OTel v1.30.0 docs to validate that WithFromEnv() merges last. If explicit attrs must take precedence, call WithFromEnv() before WithAttributes(...).

cmd/ctrlc/root/sync/google/bigtable/bigtable.go (5)

108-114: LGTM: Parent span creation with appropriate attributes.

The parent span google.bigtable.process_instances is correctly created with SpanKindClient and the project ID attribute. Good use of context propagation.


117-122: Child span context should derive from parent span context.

Line 117 creates the list_instances span using the same ctx returned from the parent span creation (line 108). However, the child span context returned on line 117 is discarded (assigned to _). This is intentional if you don't need to propagate the child span context further, but ensure this is the desired behavior. Typically, child operations would use the parent span's context.

The pattern is correct: the parent context (line 108's ctx) is reused for the child span, and the child span is manually ended. No issue here.


127-132: LGTM: Dual error reporting on parent and child spans.

On error, both the list span and parent span are marked with errors and the list span is explicitly ended before returning. This ensures proper span lifecycle even on error paths.


134-136: LGTM: List span attributes and success marking.

Attributes are recorded for the count of instances found, and the span is marked successful before being ended. Good telemetry hygiene.


150-151: LGTM: Parent span attributes and success marking.

The parent span records the total processed instances and is marked successful. The deferred span.End() on line 114 will close the span on function return.

cmd/ctrlc/root/sync/google/buckets/buckets.go (5)

99-105: LGTM: Parent span creation for bucket processing.

The parent span google.storage.process_buckets is correctly created with SpanKindClient and the project ID attribute.


108-113: LGTM: Child span creation for listing buckets.

The child span google.storage.list_buckets reuses the parent context correctly, and the returned context is discarded as intended.


118-123: LGTM: Dual error reporting on parent and child spans.

Both spans are marked with errors, and the list span is explicitly ended before returning. Proper error path handling.


125-127: LGTM: List span attributes and success marking.

The buckets found count is recorded, and the span is marked successful and ended.


141-142: LGTM: Parent span attributes and success marking.

The total buckets processed is recorded, and the parent span is marked successful before the deferred End() completes.

cmd/ctrlc/root/sync/tailscale/tailscale.go (2)

176-182: LGTM: Proper span nesting for upsert operation.

The upsert_resources span correctly derives from processCtx, establishing the proper parent-child relationship with the process_devices span. The deferred End() ensures cleanup even on early returns.


191-191: LGTM: Context propagation through UpsertResource call.

Using upsertCtx here correctly propagates the trace context into the downstream UpsertResource operation, maintaining the span hierarchy.

cmd/ctrlc/root/sync/google/redis/redis.go (1)

135-137: LGTM: Proper success marking and attribute recording.

The parent span correctly records the total instances count and marks success before the deferred End() executes.

cmd/ctrlc/root/sync/google/secrets/secrets.go (2)

155-157: LGTM: Proper aggregation and success marking.

The parent span correctly records the total secret count across all pages and marks success before the deferred End() executes.


109-135: Fix: propagate the list span context for distributed tracing

Use the returned listCtx from telemetry.StartSpan and pass it to the Secret Manager call:

-   _, listSpan := telemetry.StartSpan(ctx, "google.secretmanager.list_secrets",
+   listCtx, listSpan := telemetry.StartSpan(ctx, "google.secretmanager.list_secrets",
...
-   response, err := call.Do()
+   response, err := call.Context(listCtx).Do()

Verify that the generated Secret Manager client (*ProjectsSecretsListCall) supports .Context(context.Context)—if not, this will remain a client library limitation.

cmd/ctrlc/root/sync/google/projects/projects.go (2)

10-10: LGTM!

The telemetry imports are correctly added and align with the PR's OpenTelemetry integration objectives.

Also applies to: 13-13


49-53: LGTM!

Error handling correctly records the error on the span and ends it before returning. This ensures traces properly capture failure states.

cmd/ctrlc/root/sync/azure/aks/aks.go (5)

22-22: LGTM!

Telemetry imports are correctly added.

Also applies to: 25-26


166-172: LGTM!

The root span is correctly created with proper attributes and deferred cleanup. The context is captured for use in child operations.


186-199: LGTM!

The list span is correctly created and the span-aware context (listCtx) is properly used in the pager.NextPage() call, ensuring trace propagation.


201-204: LGTM!

Error handling correctly marks both the list span and parent span with errors, and properly ends the list span before returning.


230-232: LGTM!

The telemetry attributes are well-structured: the list span captures clusters_found and the parent span captures clusters_processed. Both spans are properly marked as successful.

Also applies to: 241-242

cmd/ctrlc/root/sync/kubernetes/kubernetes.go (4)

11-11: LGTM!

Telemetry imports are correctly added.

Also applies to: 14-15


119-136: LGTM!

The namespace listing span is correctly implemented with proper context propagation, error handling, and success markers.


144-161: LGTM!

The deployment listing span is correctly implemented with proper context propagation, error handling, and success markers.


182-182: LGTM!

The log line provides good visibility into the resources found, including individual counts and totals.

cmd/ctrlc/root/sync/azure/networks/networks.go (5)

6-8: LGTM!

Imports are correctly added for telemetry and supporting functionality.

Also applies to: 20-20, 23-24


164-172: LGTM!

The root span is correctly created with proper attributes and deferred cleanup.


199-216: LGTM!

The list span is correctly created within the goroutine with proper context usage for the pager operation and appropriate error handling.


209-210: LGTM!

The telemetry attributes are well-structured with clear separation:

  • Per-resource-group list spans track local networks_found and subnets_found
  • The root span aggregates global networks_found, networks_processed, subnets_found, and subnets_processed

The counting logic correctly distinguishes between networks and subnets by checking the Kind field.

Also applies to: 223-223, 230-254, 265-280


261-261: LGTM!

The log message is correctly updated from "clusters" to "networks", fixing a copy-paste error from similar sync commands.

cmd/ctrlc/root/sync/github/pullrequests.go (3)

294-302: LGTM! Well-structured root span initialization.

The span correctly wraps the entire fetch operation with appropriate span kind and attributes for the GitHub API call.


314-357: LGTM! Proper per-page span lifecycle and error handling.

The per-page child spans correctly:

  • Use the parent context for nesting
  • Pass pageCtx to the API call for trace propagation
  • Record errors on both page and parent spans
  • Track per-page metrics and mark success
  • End properly in all paths

365-428: LGTM! Consistent telemetry pattern for commit fetching.

The instrumentation mirrors the fetchPRs pattern with proper span lifecycle, per-page child spans, error propagation, and success marking.

cmd/ctrlc/root/sync/google/networks/networks.go (4)

114-171: LGTM! Context propagation and span lifecycle are correct.

The function signature now accepts context, creates proper parent/child spans, and tracks both networks_found and total_networks metrics. Error handling correctly marks both spans.


270-326: LGTM! Proper telemetry for subnet processing.

Context parameter added, spans properly nested, and aggregated list results tracked with total_subnets attribute.


449-495: LGTM! Firewall processing telemetry is consistent.

Follows the established pattern with proper error handling and success marking.


644-700: LGTM! Forwarding rules telemetry matches the pattern.

Consistent implementation with proper span lifecycle and metrics tracking.

internal/api/client.go (1)

17-24: LGTM! Standard OpenTelemetry trace propagation.

The implementation correctly:

  • Uses the global propagator (configured elsewhere, likely W3C Trace Context format)
  • Injects span context into HTTP headers via HeaderCarrier
  • Enables distributed tracing to the Ctrlplane API

This allows spans created in the CLI to be linked with spans in the downstream service.

cmd/ctrlc/root/sync/salesforce/opportunities/opportunities.go (5)

12-12: LGTM!

The telemetry imports are appropriate for the tracing instrumentation added to this file.

Also applies to: 16-17


103-111: LGTM!

The span initialization is correct with appropriate span kind (Client) for external API calls and descriptive attributes. The deferred span.End() ensures proper span lifecycle management.


113-115: LGTM!

Conditionally adding the where clause as a span attribute only when present is a good practice.


124-128: LGTM!

Error handling correctly records the error on the span before returning. The deferred span.End() will properly close the span.


130-141: LGTM!

The success path correctly adds observability attributes (records found and processed) and marks the span as successful.

cmd/ctrlc/root/sync/aws/eks/eks.go (6)

20-20: LGTM!

The telemetry imports are appropriate for the instrumentation added.

Also applies to: 23-24


137-143: LGTM!

The outer span initialization is correct with appropriate span kind and region attribute. The deferred span.End() ensures the span closes after all nested operations complete.


148-152: LGTM!

Error handling correctly records the error on the outer span before returning.


156-175: LGTM!

The ListClusters span is properly managed with explicit End() calls in both error and success paths. The use of a child context (listCtx) ensures proper trace propagation.


179-199: LGTM!

The DescribeCluster span is correctly managed with explicit End() calls. Using continue on error is appropriate since this is processing multiple clusters in a loop.


215-216: LGTM!

The outer span is correctly marked as successful with a useful total clusters metric.

cmd/ctrlc/root/sync/salesforce/accounts/accounts.go (5)

11-11: LGTM!

The telemetry imports are appropriate for the instrumentation added.

Also applies to: 15-16


103-111: LGTM!

The span initialization follows the correct pattern with appropriate span kind and attributes.


113-115: LGTM!

Conditionally adding the where clause attribute is a good practice.


124-128: LGTM!

Error handling correctly records the error on the span before returning.


130-141: LGTM!

The success path correctly adds observability metrics and marks the span as successful.

cmd/ctrlc/ctrlc.go (4)

4-4: LGTM!

The new imports are appropriate for the telemetry integration.

Also applies to: 6-7, 11-11


43-62: LGTM!

The telemetry initialization is well-designed:

  • Gracefully handles initialization failures with a warning
  • Properly defers shutdown with a reasonable timeout
  • Ensures telemetry doesn't block CLI exit

70-72: LGTM!

The root span is correctly created and deferred to ensure it closes after command execution.


75-88: LGTM!

The command execution correctly:

  • Propagates the telemetry context to subcommands via SetContext
  • Records errors and success on the root span
  • Ensures the span lifecycle completes properly
cmd/ctrlc/root/sync/aws/ec2/ec2.go (3)

19-24: LGTM!

The OpenTelemetry imports are correctly added to support span creation and attribute management.


98-112: LGTM!

The span lifecycle is correctly managed:

  • Client span kind is appropriate for the AWS SDK call
  • Region is captured as an attribute
  • Error handling properly records the error and returns early
  • Reservations count is recorded after successful describe

259-260: LGTM!

The span is correctly marked successful after all instances are processed, with the count recorded as an attribute for observability.

cmd/ctrlc/root/sync/aws/rds/rds.go (4)

18-23: LGTM!

The OpenTelemetry imports are correctly added, consistent with the telemetry pattern used across the codebase.


158-189: LGTM!

The nested span pattern correctly instruments the paginated DescribeDBInstances call:

  • Parent span tracks the overall process_instances operation
  • Child spans track individual describe calls
  • Error handling propagates to both spans appropriately
  • Child spans are explicitly ended in both success and error paths

206-207: LGTM!

The parent span is correctly finalized with the total instances processed and marked successful only after all paginated results are processed.


507-552: LGTM!

The parameter group fetch operation is correctly instrumented:

  • Span captures the parameter group name as an attribute
  • Error handling properly records errors before early return
  • Success is marked only after all paginated parameters are fetched
  • The parameters_fetched count provides useful observability
cmd/ctrlc/root/sync/aws/networks/networks.go (6)

14-18: LGTM! Telemetry imports added correctly.

The new imports for OpenTelemetry tracing and the internal telemetry package are appropriate for instrumenting this file.


180-186: LGTM! Root span properly initialized.

The parent span for processNetworks is correctly created with appropriate span kind and region attribute. The deferred span.End() ensures cleanup.


203-222: LGTM! Nested span lifecycle managed correctly.

The describe_vpcs span for each pagination iteration is properly:

  • Started with context propagation to the AWS SDK call (line 210)
  • Annotated with error on failure (line 214-215)
  • Enriched with VPC count attribute on success (line 220)
  • Closed after each iteration (line 222)

246-247: LGTM! Parent span finalized correctly.

The processed VPC count and success status are recorded on the parent span after all processing completes.


317-349: LGTM! Subnet retrieval properly instrumented.

The span lifecycle in getAwsSubnets is correct:

  • Context passed to AWS SDK call (line 334)
  • Errors recorded on span (line 336)
  • Success attributes added before return (lines 347-348)

353-381: LGTM! Subnet processing span correctly implemented.

The process_subnets span properly wraps the subnet processing loop with appropriate attributes for total and processed counts.

cmd/ctrlc/root/sync/salesforce/common/util.go (6)

14-18: LGTM! Telemetry imports added correctly.

The OpenTelemetry imports are appropriate for instrumenting Salesforce query operations.


37-56: LGTM! Query span initialization and validation are correct.

The span is properly initialized with relevant attributes, and the target type validation error is correctly recorded on the span before returning.


96-113: LGTM! Context propagation and span finalization are correct.

The context is properly passed to logAvailableFields (line 99) and paginateQuery (line 105), enabling nested span creation. Error and success paths correctly update the span.


116-158: LGTM! Field description properly instrumented.

The logAvailableFields function now accepts context and properly instruments the Salesforce describe operation with comprehensive error handling and success attributes.


198-277: LGTM! Pagination with per-batch spans is excellently instrumented.

The nested span pattern is correctly implemented:

  • Parent span tracks overall pagination
  • Each batch gets its own span with proper attributes
  • Batch spans are closed in all paths (error, empty, success)
  • Context is propagated to executeQuery (line 231)
  • Summary attributes recorded on parent span before completion

280-316: LGTM! Query execution properly instrumented.

The executeQuery function now accepts context and correctly instruments the query execution with comprehensive error handling at each stage (request, read, unmarshal).

test.bash

/bin
/ctrlc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Verify the ignore pattern is correct.

The pattern /ctrlc ignores a file at the repository root, but based on the AI summary and the existing /bin entry, you likely intended /bin/ctrlc to ignore the built binary under the bin directory.

Apply this diff if the binary is built to bin/ctrlc:

 /bin
-/ctrlc
+/bin/ctrlc
 .idea
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/ctrlc
/bin
/bin/ctrlc
.idea
🤖 Prompt for AI Agents
In .gitignore around line 5, the pattern "/ctrlc" ignores a file at repo root
but you intended to ignore the built binary under bin; update the pattern to
"/bin/ctrlc" (or the actual build path if different) so the built binary in the
bin directory is ignored, and verify there are no duplicate or conflicting
entries for /bin elsewhere in the file.

Comment on lines +147 to +153
// Create span for listing Cloud Run services
_, listSpan := telemetry.StartSpan(ctx, "google.cloudrun.list_services",
trace.WithSpanKind(trace.SpanKindClient),
trace.WithAttributes(
attribute.String("google.project_id", *project),
),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Context from StartSpan is discarded.

The span context returned from telemetry.StartSpan is being discarded (assigned to _). While the Google Cloud Run API call on line 155 may not propagate trace context, discarding the context breaks the telemetry chain and prevents child spans from being properly linked.

Apply this diff to propagate the context:

-		// Create span for listing Cloud Run services
-		_, listSpan := telemetry.StartSpan(ctx, "google.cloudrun.list_services",
+		// Create span for listing Cloud Run services
+		ctx, listSpan := telemetry.StartSpan(ctx, "google.cloudrun.list_services",
 			trace.WithSpanKind(trace.SpanKindClient),
 			trace.WithAttributes(
 				attribute.String("google.project_id", *project),
 			),
 		)

Then pass the updated ctx to subsequent operations that support context propagation (lines 175, 127).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Create span for listing Cloud Run services
_, listSpan := telemetry.StartSpan(ctx, "google.cloudrun.list_services",
trace.WithSpanKind(trace.SpanKindClient),
trace.WithAttributes(
attribute.String("google.project_id", *project),
),
)
// Create span for listing Cloud Run services
ctx, listSpan := telemetry.StartSpan(ctx, "google.cloudrun.list_services",
trace.WithSpanKind(trace.SpanKindClient),
trace.WithAttributes(
attribute.String("google.project_id", *project),
),
)
🤖 Prompt for AI Agents
In cmd/ctrlc/root/sync/google/cloudrun/cloudrun.go around lines 147 to 153, the
code discards the context returned from telemetry.StartSpan (assigning it to
`_`), breaking the telemetry chain; capture the returned ctx (e.g., ctx,
listSpan := telemetry.StartSpan(...)) and defer listSpan.End(), then use that
ctx for subsequent operations that accept context (replace uses on lines 175 and
127 to pass the updated ctx) so child spans are properly linked and context
propagates.

Comment on lines +41 to +48
// Create span for listing projects
_, listSpan := telemetry.StartSpan(ctx, "google.cloudresourcemanager.list_projects",
trace.WithSpanKind(trace.SpanKindClient),
)

// List all projects
resp, err := crm.Projects.List().Do()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Use the span-aware context for the List() call.

The StartSpan returns a context that carries the active span, but it's discarded with _. The subsequent crm.Projects.List().Do() call should use this context to properly propagate the trace and establish parent-child span relationships.

Apply this diff:

-// Create span for listing projects
-_, listSpan := telemetry.StartSpan(ctx, "google.cloudresourcemanager.list_projects",
-	trace.WithSpanKind(trace.SpanKindClient),
-)
-
-// List all projects
-resp, err := crm.Projects.List().Do()
+// Create span for listing projects
+listCtx, listSpan := telemetry.StartSpan(ctx, "google.cloudresourcemanager.list_projects",
+	trace.WithSpanKind(trace.SpanKindClient),
+)
+
+// List all projects
+resp, err := crm.Projects.List().Context(listCtx).Do()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Create span for listing projects
_, listSpan := telemetry.StartSpan(ctx, "google.cloudresourcemanager.list_projects",
trace.WithSpanKind(trace.SpanKindClient),
)
// List all projects
resp, err := crm.Projects.List().Do()
// Create span for listing projects
listCtx, listSpan := telemetry.StartSpan(ctx, "google.cloudresourcemanager.list_projects",
trace.WithSpanKind(trace.SpanKindClient),
)
// List all projects
resp, err := crm.Projects.List().Context(listCtx).Do()
🤖 Prompt for AI Agents
In cmd/ctrlc/root/sync/google/projects/projects.go around lines 41 to 48, the
context returned by StartSpan was discarded (`_`) so the List call does not
receive the span-aware context; replace the discard with a named span context
(e.g., spanCtx, listSpan := telemetry.StartSpan(...)), use that span-aware
context when invoking crm.Projects.List().Context(spanCtx).Do() (or otherwise
pass spanCtx into the List call) so the trace is propagated, and ensure you call
listSpan.End() (typically via defer) after creating the span.

Comment on lines +93 to +121
ctx, span := telemetry.StartSpan(ctx, "google.redis.process_instances",
trace.WithSpanKind(trace.SpanKindClient),
trace.WithAttributes(
attribute.String("google.project_id", project),
),
)
defer span.End()

// Create span for ListInstances call
_, listSpan := telemetry.StartSpan(ctx, "google.redis.list_instances",
trace.WithSpanKind(trace.SpanKindClient),
trace.WithAttributes(
attribute.String("google.project_id", project),
),
)

parent := fmt.Sprintf("projects/%s/locations/-", project)
instances, err := redisClient.Projects.Locations.Instances.List(parent).Do()

if err != nil {
telemetry.SetSpanError(listSpan, err)
listSpan.End()
telemetry.SetSpanError(span, err)
return nil, fmt.Errorf("failed to list Redis instances: %w", err)
}

telemetry.AddSpanAttribute(listSpan, "google.redis.instances_found", len(instances.Instances))
telemetry.SetSpanSuccess(listSpan)
listSpan.End()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Fix: List call doesn't use trace context.

The trace instrumentation has a context propagation issue:

  • Line 93 creates a new context from the parent span, shadowing the ctx parameter
  • Line 102 creates a child list span from this updated context (correct)
  • However, line 110's List(parent).Do() doesn't pass any context, so the trace context is not propagated to the Google Cloud API client

This means the Redis List API call won't be associated with the list_instances span in the distributed trace.

Apply this diff to propagate the list span context:

 	// Create span for ListInstances call
-	_, listSpan := telemetry.StartSpan(ctx, "google.redis.list_instances",
+	listCtx, listSpan := telemetry.StartSpan(ctx, "google.redis.list_instances",
 		trace.WithSpanKind(trace.SpanKindClient),
 		trace.WithAttributes(
 			attribute.String("google.project_id", project),
 		),
 	)
 
 	parent := fmt.Sprintf("projects/%s/locations/-", project)
-	instances, err := redisClient.Projects.Locations.Instances.List(parent).Do()
+	instances, err := redisClient.Projects.Locations.Instances.List(parent).Context(listCtx).Do()

Note: Verify that the Google Cloud Redis client supports .Context() for trace propagation. If not, this is a limitation of the client library.


🏁 Script executed:

#!/bin/bash
# Check if the Google Redis client's List call supports Context method
ast-grep --pattern $'$_.Projects.Locations.Instances.List($$$).Context($$$)'

Length of output: 0


🏁 Script executed:

#!/bin/bash
set -eux
sed -n '1,200p' cmd/ctrlc/root/sync/google/redis/redis.go

Length of output: 6338


Fix: propagate list span context into Redis List call

The list span is started but its context isn't passed to the API call; capture the child context and call .Context(listCtx) so the List RPC is associated with the span. (pkg.go.dev)

 	// Create span for ListInstances call
-	_, listSpan := telemetry.StartSpan(ctx, "google.redis.list_instances",
+	listCtx, listSpan := telemetry.StartSpan(ctx, "google.redis.list_instances",
 		trace.WithSpanKind(trace.SpanKindClient),
 		trace.WithAttributes(
 			attribute.String("google.project_id", project),
 		),
 	)
 
 	parent := fmt.Sprintf("projects/%s/locations/-", project)
-	instances, err := redisClient.Projects.Locations.Instances.List(parent).Do()
+	instances, err := redisClient.Projects.Locations.Instances.List(parent).Context(listCtx).Do()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ctx, span := telemetry.StartSpan(ctx, "google.redis.process_instances",
trace.WithSpanKind(trace.SpanKindClient),
trace.WithAttributes(
attribute.String("google.project_id", project),
),
)
defer span.End()
// Create span for ListInstances call
_, listSpan := telemetry.StartSpan(ctx, "google.redis.list_instances",
trace.WithSpanKind(trace.SpanKindClient),
trace.WithAttributes(
attribute.String("google.project_id", project),
),
)
parent := fmt.Sprintf("projects/%s/locations/-", project)
instances, err := redisClient.Projects.Locations.Instances.List(parent).Do()
if err != nil {
telemetry.SetSpanError(listSpan, err)
listSpan.End()
telemetry.SetSpanError(span, err)
return nil, fmt.Errorf("failed to list Redis instances: %w", err)
}
telemetry.AddSpanAttribute(listSpan, "google.redis.instances_found", len(instances.Instances))
telemetry.SetSpanSuccess(listSpan)
listSpan.End()
ctx, span := telemetry.StartSpan(ctx, "google.redis.process_instances",
trace.WithSpanKind(trace.SpanKindClient),
trace.WithAttributes(
attribute.String("google.project_id", project),
),
)
defer span.End()
// Create span for ListInstances call
listCtx, listSpan := telemetry.StartSpan(ctx, "google.redis.list_instances",
trace.WithSpanKind(trace.SpanKindClient),
trace.WithAttributes(
attribute.String("google.project_id", project),
),
)
parent := fmt.Sprintf("projects/%s/locations/-", project)
instances, err := redisClient.Projects.Locations.Instances.
List(parent).
Context(listCtx).
Do()
if err != nil {
telemetry.SetSpanError(listSpan, err)
listSpan.End()
telemetry.SetSpanError(span, err)
return nil, fmt.Errorf("failed to list Redis instances: %w", err)
}
telemetry.AddSpanAttribute(listSpan, "google.redis.instances_found", len(instances.Instances))
telemetry.SetSpanSuccess(listSpan)
listSpan.End()
🤖 Prompt for AI Agents
In cmd/ctrlc/root/sync/google/redis/redis.go around lines 93 to 121, the code
starts a list span but discards the child context, so the Redis List RPC is not
associated with that span; change the span creation to capture the child context
(e.g. listCtx, listSpan := telemetry.StartSpan(ctx, ...)) and pass that context
into the API call by calling .Context(listCtx) on the List request before .Do(),
then keep the existing error handling and span finish calls.

Comment on lines 314 to 321
```
ctrlc sync aws eks
├─ aws.eks.process_clusters (200ms)
│ ├─ aws.eks.list_clusters (150ms)
│ └─ aws.eks.describe_cluster (50ms)
└─ POST /api/v1/workspaces/{id}/resource-providers/{id}/resources (300ms)
└─ [Ctrlplane backend spans continue the trace...]
``` No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add language specifier to fenced code block.

The trace visualization example is clear and helpful. However, add a language specifier to the fenced code block for consistency with markdown linting rules.

Apply this diff:

-```
+```text
 ctrlc sync aws eks
 ├─ aws.eks.process_clusters (200ms)

Based on static analysis hints.

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

314-314: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
In docs/API_TELEMETRY_GUIDE.md around lines 314 to 321 the fenced code block
lacks a language specifier; update the opening fence from ``` to ```text so the
block starts with ```text followed by the existing trace lines (leave the
closing ``` intact) to satisfy markdown linting and static analysis.

Comment on lines +38 to +41
go.opentelemetry.io/otel v1.35.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.30.0
go.opentelemetry.io/otel/sdk v1.35.0
go.opentelemetry.io/otel/trace v1.35.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify version consistency across OpenTelemetry packages.

The OpenTelemetry dependencies show a version mismatch:

  • Core packages (otel, otel/sdk, otel/trace) are at v1.35.0
  • The OTLP trace exporter (otlptrace/otlptracegrpc) is at v1.30.0

Please confirm this version difference is intentional. Typically, OpenTelemetry packages are released together and should use consistent versions to avoid compatibility issues.

Run the following script to check the latest available version of the OTLP trace gRPC exporter:


🏁 Script executed:

#!/bin/bash
# Check the latest version of the OTLP trace gRPC exporter package

# Query the Go module proxy for available versions
curl -s 'https://proxy.golang.org/go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc/@v/list' | tail -5

echo ""
echo "Currently using version: v1.30.0"
echo "Other OTEL packages are at: v1.35.0"

Length of output: 111


🏁 Script executed:

#!/bin/bash
# Fetch all available versions and determine the highest semver version
VERSIONS=$(curl -s 'https://proxy.golang.org/go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc/@v/list')
HIGHEST=$(echo "$VERSIONS" | sort -V | tail -n1)
echo "Highest available version: $HIGHEST"

# Check if the code’s specified versions actually exist
for VER in v1.30.0 v1.35.0; do
  HTTP=$(curl -s -o /dev/null -w "%{http_code}" "https://proxy.golang.org/go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc/@v/${VER}.info")
  echo "Version ${VER} exists HTTP status: ${HTTP}"
done

Length of output: 114


Upgrade OTLP exporter version to match core OTEL packages
In go.mod, bump go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc from v1.30.0 to at least v1.35.0 (exporter v1.35.0 and v1.38.0 are available; v1.38.0 is the latest).

🤖 Prompt for AI Agents
In go.mod around lines 38 to 41, the otlptracegrpc exporter is pinned to v1.30.0
which mismatches the core go.opentelemetry.io/otel packages; update the module
version for go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc to
at least v1.35.0 (preferably v1.38.0) so all OTEL packages use compatible
versions, then run go get ./... or go mod tidy to update go.sum and ensure the
build uses the new exporter version.

Comment on lines 82 to 102
if os.Getenv("DATADOG_ENABLED") == "true" {
if env := os.Getenv("DD_ENV"); env != "" {
attrs = append(attrs, attribute.String("deployment.environment", env))
}
if service := os.Getenv("DD_SERVICE"); service != "" {
// Override service name if DD_SERVICE is set
attrs[0] = semconv.ServiceNameKey.String(service)
}
if version := os.Getenv("DD_VERSION"); version != "" {
attrs[1] = semconv.ServiceVersionKey.String(version)
}
if tags := os.Getenv("DD_TAGS"); tags != "" {
// Parse DD_TAGS format: key1:value1,key2:value2
for _, tag := range strings.Split(tags, ",") {
parts := strings.SplitN(strings.TrimSpace(tag), ":", 2)
if len(parts) == 2 {
attrs = append(attrs, attribute.String(parts[0], parts[1]))
}
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Verify DD_ENV → deployment.environment mapping.

Line 84 maps DD_ENV to deployment.environment, which is not a standard OTel semantic convention attribute. Datadog typically expects deployment.environment or the standard deployment.environment.name. Confirm this attribute name aligns with your Datadog intake expectations or use the standard semconv attribute if available.


🌐 Web query:

Datadog OpenTelemetry semantic conventions for deployment environment attribute name

💡 Result:

Datadog maps the OpenTelemetry resource attribute deployment.environment.name (preferred) to the Datadog env tag. If you run an older Agent/collector, use deployment.environment instead. [1][2]

Sources:

  • Datadog OpenTelemetry semantic mapping (deployment.environment.name → env). [1]
  • Datadog correlate / unified service tagging docs (notes about using deployment.environment for older Agent / exporter). [2]

Use the standard OTel semconv deployment.environment.name for DD_ENV
Replace the literal "deployment.environment" with semconv.DeploymentEnvironmentNameKey.String(env) (Datadog maps deployment.environment.nameenv); only fall back to the legacy deployment.environment for older agent/collector support.

🤖 Prompt for AI Agents
internal/telemetry/telemetry.go around lines 82 to 102: the code currently sets
the environment attribute using the literal key "deployment.environment";
replace that with the OTel semantic convention by using
semconv.DeploymentEnvironmentNameKey.String(env) when DD_ENV is set, and keep a
fallback to the legacy attribute.String("deployment.environment", env) for older
agents/collectors if necessary (e.g., append the semconv key first, and only
append the legacy key when a compatibility flag or runtime check indicates older
support). Ensure the rest of the DD_* handling remains unchanged.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
internal/telemetry/telemetry.go (1)

81-97: Use standard OTel semantic convention for deployment environment.

Line 83 uses the literal "deployment.environment", but Datadog maps the standard OTel semantic convention deployment.environment.name to its env tag. Consider using semconv.DeploymentEnvironmentKey.String(env) or the appropriate semconv constant for better compatibility.

Based on past review comments and learnings about Datadog OTEL semantic conventions.

Apply this diff:

 	if os.Getenv("DATADOG_ENABLED") == "true" {
 		if env := os.Getenv("DD_ENV"); env != "" {
-			attrs = append(attrs, attribute.String("deployment.environment", env))
+			attrs = append(attrs, semconv.DeploymentEnvironmentKey.String(env))
 		}

Note: Verify that semconv.DeploymentEnvironmentKey exists in semconv/v1.20.0. If not available, you may need to upgrade to a newer semconv version or use the string literal "deployment.environment.name".

🧹 Nitpick comments (1)
internal/telemetry/telemetry.go (1)

136-141: Simplify TLS configuration logic.

The TLS/insecure decision uses string prefix matching for localhost detection. Consider using a more robust approach or document the assumption that endpoint format is always host:port without scheme.

Current logic works but could be more explicit:

// Current approach works for "localhost:4317" and "127.0.0.1:4317"
// but would fail for "http://localhost:4317" (though that's invalid for gRPC endpoint)
if strings.HasPrefix(endpoint, "localhost") || strings.HasPrefix(endpoint, "127.0.0.1") {
    opts = append(opts, otlptracegrpc.WithInsecure())
} else {
    opts = append(opts, otlptracegrpc.WithTLSCredentials(credentials.NewClientTLSFromCert(nil, "")))
}

Optional: Add a comment clarifying that the endpoint is expected to be in host:port format without a scheme.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7a21d69 and 7a89a06.

📒 Files selected for processing (2)
  • docs/TELEMETRY.md (1 hunks)
  • internal/telemetry/telemetry.go (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/TELEMETRY.md

203-203: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: docker (linux/amd64)
🔇 Additional comments (3)
internal/telemetry/telemetry.go (3)

28-71: LGTM! Solid telemetry initialization with proper no-op fallback.

The initialization logic correctly handles disabled telemetry and missing endpoints with no-op shutdown functions, preventing CLI operation failures. The setup includes proper resource creation, exporter configuration, batch processing, and global provider registration.


203-224: LGTM! Comprehensive type handling for span attributes.

The AddSpanAttribute function provides excellent type safety and covers all common attribute types, with a sensible fallback for unsupported types.


252-268: LGTM! Elegant telemetry wrapper with automatic state management.

The WithTelemetry generic function provides a clean API for wrapping operations with telemetry, automatically handling span lifecycle and success/error states. The use of generics makes it type-safe and reusable.

# Send traces directly to Datadog without local Agent
export DATADOG_ENABLED=true
export DD_API_KEY=your_datadog_api_key_here
export OTLP_EXPORTER_GRPC_ENDPOINT=api.datadoghq.com:4317 # US region
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix environment variable name.

Line 87 uses OTLP_EXPORTER_GRPC_ENDPOINT, but this is inconsistent with the documented DD_OTLP_GRPC_ENDPOINT variable on line 30. Based on the configuration section, this should be either DD_OTLP_GRPC_ENDPOINT (Datadog-specific) or OTEL_EXPORTER_OTLP_ENDPOINT (standard OTLP).

Apply this diff:

-export OTLP_EXPORTER_GRPC_ENDPOINT=api.datadoghq.com:4317  # US region
+export DD_OTLP_GRPC_ENDPOINT=api.datadoghq.com:4317  # US region

Or use the standard variable:

-export OTLP_EXPORTER_GRPC_ENDPOINT=api.datadoghq.com:4317  # US region
+export OTEL_EXPORTER_OTLP_ENDPOINT=api.datadoghq.com:4317  # US region
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export OTLP_EXPORTER_GRPC_ENDPOINT=api.datadoghq.com:4317 # US region
export DD_OTLP_GRPC_ENDPOINT=api.datadoghq.com:4317 # US region
🤖 Prompt for AI Agents
In docs/TELEMETRY.md around line 87, the environment variable is incorrectly
named OTLP_EXPORTER_GRPC_ENDPOINT; replace it with the documented Datadog name
DD_OTLP_GRPC_ENDPOINT=api.datadoghq.com:4317 to match line 30 (or alternatively
use the standard OTEL_EXPORTER_OTLP_ENDPOINT if you prefer the OTLP standard) so
the example and docs are consistent.

Comment on lines +203 to +205
```
grpc: addrConn.createTransport failed to connect to {...}: connection refused
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add language identifier to code block.

The empty code block at line 203 should either specify a language (e.g., text or console) or be removed if it serves no purpose.

Based on static analysis hint.

Apply this diff:

-```
+```text
 grpc: addrConn.createTransport failed to connect to {...}: connection refused

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.18.1)</summary>

203-203: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

In docs/TELEMETRY.md around lines 203 to 205, the fenced code block is missing a
language identifier; update the block to include a language (e.g., text) so it
reads as a text/console snippet, or remove the block if unnecessary—specifically
replace the existing empty-fenced block with a fenced block like text followed by the line "grpc: addrConn.createTransport failed to connect to {...}: connection refused" and the closing to ensure correct syntax highlighting.


</details>

<!-- This is an auto-generated comment by CodeRabbit -->

Comment on lines +115 to +151
func createOTLPExporter(ctx context.Context) (sdktrace.SpanExporter, error) {
var opts []otlptracegrpc.Option

// Check if Datadog is explicitly enabled
if os.Getenv("DATADOG_ENABLED") == "true" {
endpoint := os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT")
if endpoint == "" {
endpoint = "localhost:4317"
}
opts = append(opts, otlptracegrpc.WithEndpoint(endpoint))

// Add Datadog API key as header if provided
// This is required when sending directly to Datadog intake (not via Agent)
if apiKey := os.Getenv("DD_API_KEY"); apiKey != "" {
opts = append(opts, otlptracegrpc.WithHeaders(map[string]string{
"dd-api-key": apiKey,
}))
}

// Datadog Agent typically doesn't require TLS for localhost
// But Datadog intake endpoints always require TLS
if strings.HasPrefix(endpoint, "localhost") || strings.HasPrefix(endpoint, "127.0.0.1") {
opts = append(opts, otlptracegrpc.WithInsecure())
} else {
// Use TLS for remote endpoints (Datadog intake or remote Agent)
opts = append(opts, otlptracegrpc.WithTLSCredentials(credentials.NewClientTLSFromCert(nil, "")))
}
} else {
// Standard OTLP configuration
// Check if insecure connection is requested
if os.Getenv("OTEL_EXPORTER_OTLP_INSECURE") == "true" {
opts = append(opts, otlptracegrpc.WithInsecure())
}
}

return otlptracegrpc.New(ctx, opts...)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Verify Datadog endpoint configuration precedence.

The function checks DATADOG_ENABLED first, then reads OTEL_EXPORTER_OTLP_ENDPOINT. However, the documentation (TELEMETRY.md line 30) mentions DD_OTLP_GRPC_ENDPOINT as a Datadog-specific variable. Ensure the implementation aligns with the documented configuration.

Run the following script to check if DD_OTLP_GRPC_ENDPOINT is referenced elsewhere:

If the variable is documented but not used, either:

  1. Update the code to check DD_OTLP_GRPC_ENDPOINT before OTEL_EXPORTER_OTLP_ENDPOINT when Datadog is enabled, or
  2. Update the documentation to remove the reference to DD_OTLP_GRPC_ENDPOINT

🏁 Script executed:

#!/bin/bash
# Description: Search for references to DD_OTLP_GRPC_ENDPOINT in the codebase

rg -n "DD_OTLP_GRPC_ENDPOINT" --type go

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Search all files for references to DD_OTLP_GRPC_ENDPOINT
rg -n "DD_OTLP_GRPC_ENDPOINT"

Length of output: 166


Honor DD_OTLP_GRPC_ENDPOINT in Datadog exporter
TELEMETRY.md defines DD_OTLP_GRPC_ENDPOINT as the Datadog-specific OTLP gRPC endpoint, but createOTLPExporter only reads OTEL_EXPORTER_OTLP_ENDPOINT. Add a check for DD_OTLP_GRPC_ENDPOINT (when DATADOG_ENABLED is true) before falling back to OTEL_EXPORTER_OTLP_ENDPOINT, or remove it from the docs.

🤖 Prompt for AI Agents
In internal/telemetry/telemetry.go around lines 115 to 151, the Datadog branch
currently only reads OTEL_EXPORTER_OTLP_ENDPOINT; update it to first check
DD_OTLP_GRPC_ENDPOINT when DATADOG_ENABLED is true, use that if present,
otherwise fall back to OTEL_EXPORTER_OTLP_ENDPOINT, and only then default to
"localhost:4317". Preserve the existing logic that sets headers for DD_API_KEY
and chooses WithInsecure vs WithTLSCredentials based on localhost/127.0.0.1.
Ensure environment variable names and comments are updated to reflect the new
precedence.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant