feat: add telemetry instrumentation for Copilot agent flows #7199
wbreza wants to merge 12 commits into Azure:main
Add a new CopilotService gRPC service that exposes Copilot agent capabilities to extensions through 8 RPCs:
- CreateSession: Create agent sessions with full configuration (model, reasoning effort, system message, mode, headless)
- ResumeSession: Resume existing sessions by ID
- ListSessions: List available sessions for a working directory
- Initialize: Non-interactive first-run configuration
- SendMessage: Send prompts and block until completion
- GetUsageMetrics: Cumulative usage metrics across calls
- GetFileChanges: Track files created/modified/deleted
- StopSession: Clean up session resources

Core agent changes:
- Add WithHeadless option for silent operation via gRPC
- Add HeadlessCollector as display replacement for headless mode
- Add cumulative usage metrics tracking (GetCumulativeUsage)
- Auto-approve permissions in headless mode
- Add GetFileChanges() to watch.Watcher interface

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…C API

Add azd demo copilot command that implements an interactive chat loop showcasing the full CopilotService gRPC integration:
- CreateSession with configurable model, reasoning, system message, mode
- Initialize to display resolved model/reasoning configuration
- SendMessage in an interactive chat loop with per-turn usage metrics
- GetUsageMetrics for cumulative session stats on exit
- GetFileChanges for file change summary (created/modified/deleted)
- StopSession for cleanup
- ListSessions + ResumeSession via --resume flag

Flags: --model, --reasoning-effort, --system-message, --mode, --resume

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…early

The CopilotService.CreateSession gRPC handler was only creating the CopilotAgent struct without starting the underlying Copilot SDK client. This caused startup errors (auth, binary download, runtime crashes) to surface only on the first SendMessage call, making debugging confusing.

Changes:
- Add CopilotAgent.EnsureStarted() public method for eager client startup
- Call EnsureStarted() in both CreateSession and ResumeSession handlers
- Fix double 'failed to create session' error wrapping in ensureSession

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ssion

Rework the CopilotService gRPC API based on design review:

API changes (8 RPCs → 6):
- Remove CreateSession, ResumeSession: session lifecycle is now implicit
- SendMessage handles session creation/resumption inline via optional session_id field. First call creates a session, subsequent calls reuse it. Passing an SDK session_id resumes that session.
- SendMessage response now includes per-turn file changes
- Initialize is session-independent: it just warms up the client and resolves config

Architecture changes:
- CopilotAgent now owns file change accumulation (watcher per SendMessage, changes appended to agent cache)
- New PrintSessionMetrics() prints usage + file changes (replaces auto-print)
- Remove auto-printing of usage/file changes from SendMessage; callers decide when to display (init calls PrintSessionMetrics at the end)
- gRPC service is a thin routing layer that maps session IDs to agents

Demo extension updated to use new API: SendMessage with inline config, Initialize for warmup, GetUsageMetrics/GetFileChanges at end.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
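The implicit session lifecycle described in this commit can be illustrated with a minimal sketch. The `sessionRouter` and `agent` types, the `session-N` ID scheme, and the choice to register an unknown non-empty ID as a resumed session are all assumptions made for illustration; they are not taken from copilot_service.go.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// agent is a hypothetical stand-in for the real agent type.
type agent struct {
	id    string
	turns int
}

// sessionRouter is a hypothetical thin routing layer: it only maps
// session IDs to agents and holds no other state.
type sessionRouter struct {
	mu     sync.Mutex
	agents map[string]*agent
	nextID int
}

func newSessionRouter() *sessionRouter {
	return &sessionRouter{agents: make(map[string]*agent)}
}

// sendMessage creates a session when sessionID is empty, reuses a known
// session otherwise, and registers an unknown non-empty ID as a resumed
// session. It returns the session ID the caller should pass next time.
func (r *sessionRouter) sendMessage(sessionID, prompt string) (string, error) {
	if prompt == "" {
		return "", errors.New("prompt is required")
	}
	r.mu.Lock()
	defer r.mu.Unlock()

	a, ok := r.agents[sessionID]
	if !ok {
		if sessionID == "" {
			// First call: mint a new (illustrative) session ID.
			r.nextID++
			sessionID = fmt.Sprintf("session-%d", r.nextID)
		}
		a = &agent{id: sessionID}
		r.agents[sessionID] = a
	}
	a.turns++
	return a.id, nil
}

func main() {
	r := newSessionRouter()
	id, _ := r.sendMessage("", "hello") // creates a session
	_, _ = r.sendMessage(id, "again")   // reuses it
	fmt.Println(id, r.agents[id].turns)
}
```

In this shape the caller never issues an explicit create or resume call; it only carries the returned ID forward.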
- Remove per-turn usage metrics display; shown only at end via GetUsageMetrics
- Remove headless=true; agent output (thinking, tools, responses) should be visible in the demo chat experience

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agent changes:
- Remove stored sessionCtx from CopilotAgent; use atomic.Pointer for active context, updated per SendMessage call (SDK callbacks read it)
- Use context.WithoutCancel in ensureSession so SDK client/session outlive individual gRPC request contexts (fixes CLI process crash on 2nd message)
- Improve Stop(): clear cleanup tasks (idempotent), return first error
- Skip plugin management in headless mode
- Plugin detection fallback: when 'copilot plugin list' reports no plugins but they exist on disk, scan directory directly (CLI version mismatch workaround)
- PrintSessionMetrics now prints file changes before usage metrics

Demo extension:
- Remove per-turn usage display and headless mode
- Remove turn number prefixes from prompt
- Remove 'Sending to agent' message

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…e tests

Extract Agent and AgentFactory interfaces from the concrete CopilotAgent and CopilotAgentFactory types, enabling unit testing of the gRPC service layer without a real Copilot SDK runtime.

Interface changes:
- Agent interface: Initialize, SendMessage, SendMessageWithRetry, ListSessions, GetUsage, GetFileChanges, SessionID, Stop
- AgentFactory interface: Create(ctx, opts...) (Agent, error)
- Rename GetCumulativeUsage → GetUsage, GetAccumulatedFileChanges → GetFileChanges

Tests (12 test cases):
- SendMessage: new session, reuse session, resume SDK session, empty prompt
- GetUsageMetrics: valid session, unknown session, empty session_id
- GetFileChanges: valid session with changes
- StopSession: valid cleanup, unknown session
- Initialize: delegates to agent
- SendMessage with file changes in response

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace separate GetUsage/GetFileChanges/PrintSessionMetrics with a unified AgentMetrics type:
- AgentMetrics.String() prints file changes then usage metrics
- UsageMetrics.Format() renamed to String() (implements fmt.Stringer)
- watch.FileChanges named slice type with String() for formatted display
- watch.FileChange.String() for individual file change display
- Agent interface: GetMetrics() replaces GetUsage/GetFileChanges/PrintSessionMetrics
- AgentFactory returns AgentFactory interface, not concrete type
- All callers updated: init.go, error middleware, gRPC service, tests
- watch.PrintChangedFiles deprecated in favor of GetFileChanges().String()

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… safety

Watch tests:
- Replace time.Sleep with require.Eventually for deterministic polling
- Sort FileChanges by path for stable output

Agent (copilot_agent.go):
- Fix watcher leak: use per-turn context.WithCancel, cancel in defer
- Log NewWatcher errors instead of ignoring
- Add sync.Mutex to guard cumulative metrics and file changes
- GetMetrics() acquires lock for thread-safe reads

gRPC service (copilot_service.go):
- Stop newly-created agent if SendMessage fails (prevents process leak)
- Log Stop() errors in StopSession instead of silently dropping

Proto:
- Update headless field comment to not imply a proto3 default

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closing to re-open with correct base. This PR depends on #7172 and should be reviewed/merged after it.
Pull request overview
This PR adds OpenTelemetry instrumentation and new usage attributes around Copilot agent lifecycle and per-message processing, and extends the gRPC surface area to expose Copilot session/message/metrics/file-change APIs to extension consumers.
Changes:
- Add Copilot tracing fields + spans/usage attributes for initialization, session lifecycle, per-message usage, and consent counts.
- Add gRPC CopilotService protobuf + generated clients/servers and wire it into the azd gRPC server/client.
- Add headless-mode support (collector + permission auto-approval), plus file-change tracking surfaced via agent metrics and gRPC.
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| cli/azd/pkg/watch/watch.go | Adds GetFileChanges() + typed FileChange/FileChanges formatting used by agent metrics and gRPC responses. |
| cli/azd/pkg/watch/watch_test.go | Adds tests for file change tracking via watcher. |
| cli/azd/pkg/azdext/copilot.pb.go | Generated protobuf types for CopilotService messages (sessions, message, usage, file changes). |
| cli/azd/pkg/azdext/copilot_grpc.pb.go | Generated gRPC stubs for CopilotService. |
| cli/azd/pkg/azdext/azd_client.go | Adds AzdClient.Copilot() accessor for the new gRPC service. |
| cli/azd/internal/tracing/fields/fields.go | Adds new AttributeKeys for copilot session/init/message/mode/consent usage fields. |
| cli/azd/internal/grpcserver/server.go | Registers CopilotService with the gRPC server. |
| cli/azd/internal/grpcserver/server_test.go | Updates server tests to include CopilotService server wiring. |
| cli/azd/internal/grpcserver/prompt_service_test.go | Updates prompt service test server setup to include CopilotService. |
| cli/azd/internal/grpcserver/copilot_service.go | Implements CopilotService gRPC server routing to agents, plus usage/file-change conversion. |
| cli/azd/internal/grpcserver/copilot_service_test.go | Adds unit tests for CopilotService session/message/metrics/file-change/stop behaviors. |
| cli/azd/internal/agent/types.go | Adds Agent/AgentFactory interfaces, AgentMetrics, usage formatting via String(), and per-turn file changes. |
| cli/azd/internal/agent/types_test.go | Updates tests for renamed usage formatting (String() instead of Format()). |
| cli/azd/internal/agent/headless_collector.go | Adds headless collector for SDK events and usage metrics aggregation. |
| cli/azd/internal/agent/headless_collector_test.go | Adds unit tests for headless collector usage accumulation and idle signaling. |
| cli/azd/internal/agent/copilot_agent_factory.go | Returns AgentFactory interface and Create() now returns Agent. |
| cli/azd/internal/agent/copilot_agent.go | Adds spans/attributes, headless send path, cumulative metrics + file changes, consent counters, and session-cancellation detachment. |
| cli/azd/internal/agent/copilot/cli.go | Improves plugin detection with a CLI-output fallback to scanning plugin directories. |
| cli/azd/internal/agent/consent/workflow_consent.go | Records consent-scope selection as usage attributes. |
| cli/azd/grpc/proto/copilot.proto | Adds CopilotService protobuf definition (Initialize, ListSessions, SendMessage, metrics, file changes, StopSession). |
| cli/azd/extensions/microsoft.azd.demo/internal/cmd/root.go | Registers new demo copilot command. |
| cli/azd/extensions/microsoft.azd.demo/internal/cmd/copilot.go | Adds demo extension command that exercises CopilotService gRPC API in a chat loop. |
| cli/azd/extensions/microsoft.azd.demo/extension.yaml | Documents the new demo copilot command usage. |
| cli/azd/extensions/azure.ai.agents/version.txt | Bumps extension version to 0.1.16-preview. |
| cli/azd/extensions/azure.ai.agents/extension.yaml | Syncs extension.yaml version to 0.1.16-preview. |
| cli/azd/extensions/azure.ai.agents/CHANGELOG.md | Adds 0.1.16-preview changelog entry. |
| cli/azd/cmd/middleware/error.go | Uses UsageMetrics.String() when displaying agent usage in error middleware; updates factory type to interface. |
| cli/azd/cmd/init.go | Renames InitMethod to copilot, adds environment init method, and records aggregate copilot metrics after session completion. |
| cli/azd/cmd/container.go | Wires CopilotService into IoC container for gRPC server. |
* Initial plan
* Prepare azure.ai.agents 0.1.16-preview patch release (Azure#7141, Azure#7175, Azure#7181)
  Co-authored-by: rajeshkamal5050 <11532743+rajeshkamal5050@users.noreply.github.com>
* Remove registry.json changes - to be updated after builds are generated
  Co-authored-by: rajeshkamal5050 <11532743+rajeshkamal5050@users.noreply.github.com>
* Add Breaking Changes section in CHANGELOG.md and move Azure#7181 under it
  Co-authored-by: rajeshkamal5050 <11532743+rajeshkamal5050@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rajeshkamal5050 <11532743+rajeshkamal5050@users.noreply.github.com>
d1dd40a to 75a1163
Add OpenTelemetry spans and usage attributes to track Copilot agent session lifecycle, initialization prompts, message usage, and consent decisions. All instrumentation is in the core agent packages so it works for both direct CLI and gRPC extension framework consumers.

Changes:
- Define 16 new AttributeKey fields in internal/tracing/fields/fields.go covering session (id, isNew, messageCount), init prompts (isFirstRun, reasoningEffort, model, consentScope), per-message metrics (model, tokens, billingRate, premiumRequests, durationMs), and consent counts (approvedCount, deniedCount)
- Add copilot.initialize span in CopilotAgent.Initialize() tracking reasoning level, model selection, and isFirstRun
- Add copilot.message span in CopilotAgent.SendMessage() tracking per-message usage metrics and cumulative message count
- Add copilot.session span in ensureSession() tracking session creation vs resumption with hashed session ID
- Track consent approved/denied counts in permission handler and record as usage attributes on agent Stop()
- Track workflow consent scope selection in PromptWorkflowConsent()
- Rename InitMethod from 'agent' to 'copilot' in cmd/init.go
- Add InitMethod='environment' for previously untracked init branch
- Record aggregate copilot metrics as usage attributes in initAppWithAgent

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
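Two of the ideas in this commit, recording a hashed session ID and counting consent decisions until Stop(), can be sketched with the standard library alone. The use of SHA-256, the attribute key strings, and the `consentCounts` type are assumptions for illustration; the PR does not state which hash or key names the tracing package uses, and no OpenTelemetry API is called here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync/atomic"
)

// hashSessionID returns a hex digest so the raw session ID never appears in
// telemetry. SHA-256 is an assumption; the real code may hash differently.
func hashSessionID(id string) string {
	sum := sha256.Sum256([]byte(id))
	return hex.EncodeToString(sum[:])
}

// consentCounts tallies permission decisions made during a session.
type consentCounts struct {
	approved atomic.Int64
	denied   atomic.Int64
}

// record increments the matching counter; safe for concurrent callers.
func (c *consentCounts) record(approved bool) {
	if approved {
		c.approved.Add(1)
		return
	}
	c.denied.Add(1)
}

// attributes returns the totals under hypothetical attribute key names,
// as they might be flushed once when the agent stops.
func (c *consentCounts) attributes() map[string]int64 {
	return map[string]int64{
		"copilot.consent.approvedCount": c.approved.Load(),
		"copilot.consent.deniedCount":   c.denied.Load(),
	}
}

func main() {
	var counts consentCounts
	counts.record(true)
	counts.record(true)
	counts.record(false)

	fmt.Println(len(hashSessionID("example-session")))
	fmt.Println(counts.attributes())
}
```

Accumulating counts and emitting them once at shutdown keeps the telemetry lightweight compared with one event per permission prompt.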
75a1163 to d86b808
Summary
Adds OpenTelemetry spans and usage attributes to track Copilot agent session lifecycle, initialization prompts, message usage, and consent decisions. All instrumentation lives in the core agent packages (internal/agent/) so telemetry works for both direct CLI integration and gRPC extension framework consumers.
Built on top of #7172.
Telemetry Fields Added (16 new AttributeKeys)
Session
Initialization Prompts
Per-Message Metrics
Consent (lightweight)
Spans Added
Init Telemetry Fixes
Testing