fix: rebuild cobra command tree on each workflow re-execution#7171
fix: rebuild cobra command tree on each workflow re-execution#7171wbreza wants to merge 5 commits intoAzure:mainfrom
Conversation
The workflowCmdAdapter previously cached a single rootCmd cobra command tree as a singleton. When the ErrorMiddleware retry loop or gRPC WorkflowService re-executed commands (e.g. after Copilot agent fixes a provision failure), the stale tree with cancelled contexts caused 'context cancelled' exceptions. Changed workflowCmdAdapter to use a command factory that calls NewRootCmd() on each ExecuteContext() call, producing a fresh cobra command tree with no stale state. This follows the same proven pattern used in auto_install.go. Fixes Azure#6530 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds NewRootCmdPreservesMiddlewareChain test that creates two real command trees via NewRootCmd() with the same container, verifying both have the full command set (version, provision, deploy) and are distinct instances. This confirms the workflowCmdAdapter factory approach preserves all middleware (debug, ux, telemetry, error, loginGuard, hooks, extensions) since they are registered inside NewRootCmd() itself, not passed externally. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The AzdCommandRunner interface previously split arg-setting and execution into separate SetArgs/ExecuteContext calls. Since the workflowCmdAdapter is a singleton, concurrent gRPC workflow requests could race on the shared args field. Merged both into a single ExecuteContext(ctx, args) call, eliminating mutable state on the adapter and making it inherently thread-safe. Updated all callers: workflow runner, init action, add action, and workflow service tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR addresses workflow retries re-executing through a cached Cobra rootCmd tree (via workflowCmdAdapter), which can retain stale state and lead to context canceled failures on subsequent workflow executions (notably in retry loops and gRPC workflow execution paths).
Changes:
- Updates the workflow command runner interface to accept args per execution (
ExecuteContext(ctx, args)), eliminating the mutableSetArgsstep. - Refactors
workflowCmdAdapterto build a fresh Cobra command tree for every execution via a factory (NewRootCmdper run). - Updates/extends unit tests to validate per-execution command-tree freshness and correct arg propagation.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| cli/azd/pkg/workflow/runner.go | Switches workflow step execution to pass args directly into the runner per step. |
| cli/azd/internal/grpcserver/workflow_service_test.go | Updates mocks/assertions for the new runner interface shape (ExecuteContext(ctx, args)). |
| cli/azd/internal/cmd/add/add.go | Updates follow-up command execution to use the new runner method signature. |
| cli/azd/cmd/init.go | Updates azd up invocation during init to use the new runner method signature. |
| cli/azd/cmd/container_test.go | Refactors adapter tests for factory-based command creation and adds freshness/middleware-preservation coverage. |
| cli/azd/cmd/container.go | Implements factory-based workflowCmdAdapter and registers it in the container. |
When rebuilding the cobra command tree per workflow step, global flags (like --trace-log-file, --debug, --cwd) from the original process invocation were lost because the fresh tree had never parsed them. This caused the Test_CLI_Telemetry_NestedCommands functional test to fail — it expects nested commands to report the parent's global flags in telemetry spans. Fix: extractGlobalArgs() parses os.Args against the global flag set at adapter construction time and captures explicitly-set flags. These are merged with step args on each ExecuteContext call, ensuring the fresh command tree sees and parses all global flags. Also addresses PR review feedback: - Always call SetArgs (never fall back to os.Args parsing) - Use correct Go spelling 'context canceled' in comments Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Addressed both review comments in 42bb45a:
|
When both step args and globalArgs are nil, append returns nil. Cobra treats SetArgs(nil) as 'use os.Args[1:]', which would cause workflow steps to re-execute the parent process arguments instead of the intended step command. Ensure mergedArgs is always a non-nil slice. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Azure Dev CLI Install InstructionsInstall scriptsMacOS/Linux
bash: pwsh: WindowsPowerShell install MSI install Standalone Binary
MSI
Documentationlearn.microsoft.com documentationtitle: Azure Developer CLI reference
|
Problem
When
azd upruns and a step (e.g.,provision) fails, the Copilot agent troubleshoots the issue and the user retries the command. The retry re-executes through the same cachedrootCmdsingleton viaworkflowCmdAdapter, which accumulates stale context/command state from previous executions. This causes "context cancelled" exceptions on the retry.Two re-execution paths trigger this:
ErrorMiddleware retry loop (
cmd/middleware/error.go:289): callsnext(ctx)which re-runs the action ->workflowRunner.Run()->rootCmd.ExecuteContext()on the same cached cobra command tree.gRPC WorkflowService (
internal/grpcserver/workflow_service.go:41): receives workflow requests from extensions/MCP and callsrunner.Run()->rootCmd.ExecuteContext()on the same cached tree.Root cause chain
\
workflowCmdAdapter (singleton) -> holds cached rootCmd (named instance "root-cmd")
|
workflow.Runner.Run() calls stepCtx, cancel := context.WithCancel(ctx)
|
rootCmd.ExecuteContext(stepCtx) reuses the same command tree
|
cobra_builder RunE creates cmdContainer scope, registers ctx, resolves singletons
|
cancel() called after step -> stepCtx cancelled -> stale references persist
|
Next execution: cobra command state still references cancelled context
-> "context cancelled" exception
\\
Example scenario
azd upprovisionstep failsnext(ctx)re-runs the action through the same cachedrootCmdSolution
Changed
workflowCmdAdapterto use a command factory that callsNewRootCmd()on eachExecuteContext()call, producing a fresh cobra command tree with no stale state.This follows the same proven pattern already used in
cmd/auto_install.go(lines 402, 417, 484) whereNewRootCmd()is called multiple times with the same IoC container. The container handles re-registration gracefully by overwriting.Key changes
cmd/container.go:workflowCmdAdapterstruct now holds anewCommandfactory (func() *cobra.Command) instead of a cached*cobra.CommandNewRootCmd(false, nil, container)to build a fresh tree per executionSetArgsstores args locally;ExecuteContextapplies them to the fresh treecmd/container_test.go:FreshCommandTreeOnEachExecutiontest verifying distinct tree instances per callWhy this works
RunEclosures (incobra_builder.go) create freshcmdContainerscopes with the new contextclearCommandContextmitigation (root.go:565) remains as defense-in-depthRelates to #6530