[copilot-cli-research] Copilot CLI Deep Research - February 2026 #13941

2026-02-05T15:33:57Z

github-actions[bot]
bot Feb 5, 2026

🔍 Copilot CLI Deep Research Report

Analysis Date: February 5, 2026
Repository: github/gh-aw
Workflow Run: §21717420388
Scope: 204 total workflows, 71 using Copilot engine (35%)

📊 Executive Summary

Research Topic: Copilot CLI Optimization Opportunities
Key Findings:

🎯 Zero adoption of engine.args custom CLI flags despite 21+ available options
📝 Minimal usage of custom agent files (4%) - significant untapped potential
🔧 Inconsistent model selection - only 15% override defaults
🌐 Web-fetch underutilization - 11 workflows could benefit from explicit web access
⚡ Automatic optimizations working well - --share, --disable-builtin-mcps, and --allow-all-paths auto-enabled

Primary Recommendation: Create a "best practices" campaign focusing on custom agent files and task-specific model selection. These are the highest-impact, lowest-effort improvements available.

The repository shows healthy adoption of core Copilot features (35% of workflows) with strong usage of essential tools (GitHub MCP, bash, edit). However, advanced configuration features remain largely untapped, representing significant optimization opportunities.

Critical Findings

🔴 High Priority Issues

1. Zero Custom Args Usage (0 workflows)

Impact: Missing opportunities for performance tuning, debugging, and customization
Available flags not being used: --log-level, --env-all, --enable-host-access, --verbose, --debug
Risk: Workflows may be underperforming or harder to debug than necessary

2. Low Agent File Adoption (3 workflows, 4%)

Current users: glossary-maintainer.md, hourly-ci-cleaner.md, technical-doc-writer.md
Impact: Workflows missing specialized prompting and persona consistency
Opportunity: 68 workflows could benefit from custom agent files for task-specific optimization

3. Model Selection Inconsistency (11 workflows, 15%)

Current overrides: Mostly using gpt-5.1-codex-mini for cost optimization
Issue: Many complex workflows still using defaults that may be overkill
Opportunity: Task-specific model selection could reduce costs and improve performance

🟡 Medium Priority Opportunities

4. Web-fetch Tool Underutilization (11 workflows)

Current usage: Explicit web-fetch in ~11 workflows
Built-in support: Copilot CLI has native web-fetch support
Opportunity: More workflows could benefit from web research capabilities

5. Limited Sandbox Configuration (17 workflows, 24%)

Current: AWF firewall enabled in ~17 workflows
Available: SRT (Sandbox Runtime) for stronger process isolation
Opportunity: Security-sensitive workflows could benefit from SRT mode

6. Toolset Specificity Variance

Good examples: Many workflows use specific toolsets like [issues], [actions], [repos]
Opportunity: Some workflows use [default] when more specific toolsets would be better

1️⃣ Current State Analysis

View Copilot CLI Capabilities Inventory

Copilot CLI Capabilities Inventory

Version Management:

Latest version tracking (automated)
Version pinning via engine.version (3 workflows using this)

Available CLI Flags (from codebase analysis):

--share - Conversation markdown export (✅ auto-enabled)
--add-dir - Directory access control (✅ auto-enabled)
--agent - Custom agent file selection
--model - Model override
--disable-builtin-mcps - Disable built-in MCP servers (✅ auto-enabled)
--allow-all-paths - Write permission for all paths (✅ auto-enabled when edit tool present)
--log-level - Logging verbosity control (✅ set to "all")
--log-dir - Log output location (✅ configured)
--allow-tool - Tool permission grants (✅ auto-configured)
--env-all - Environment variable exposure
--enable-chroot - Chroot isolation
--enable-host-access - Host network access
--agent-image - Custom container image
--image-tag - Container image tag
--container-workdir - Container working directory
--mount - Volume mounting
--proxy-logs-dir - Proxy log location
--skip-pull - Skip container image pull
--allow-domains - Network domain allowlist
--block-domains - Network domain blocklist

Configuration Options:

engine.id - Engine selection (copilot/claude/codex/custom)
engine.version - Version pinning
engine.model - Model selection
engine.args - Custom CLI arguments
engine.agent - Custom agent file reference
engine.env - Environment variables
network.allowed - Network access control
sandbox.agent - Sandbox mode (awf/srt/disabled)
tools.* - MCP server and tool configuration
safe-outputs.* - GitHub resource creation controls
safe-inputs.* - Secret injection controls

MCP Server Ecosystem:

GitHub MCP (local/remote modes)
Playwright (browser automation)
Serena (language servers)
Safe-outputs (GitHub resource creation)
Safe-inputs (secret management)
Repo-memory (persistent storage)
Agentic-workflows (workflow management)
Cache-memory (performance optimization)
Custom MCP servers via tool configuration

View Usage Statistics

Usage Statistics

Workflow Distribution:

Total workflows: 204
Copilot engine: 71 (35%)
Claude engine: ~10 (5%)
Codex engine: ~5 (2%)
Other/default: 118 (58%)

Tool Adoption (among Copilot workflows):

GitHub MCP: 111 instances (156% - some workflows configure multiple aspects)
Bash tool: 88 instances (124%)
Edit tool: 62 instances (87%)
Repo-memory: 22 instances (31%)
Agentic-workflows: 22 instances (31%)
Serena: 19 instances (27%)
Playwright: 10 instances (14%)
Web-fetch: 11 instances (15%)
Cache-memory: ~50 instances (70%)

Configuration Patterns:

Model overrides: 11 (15%)
Agent files: 3 (4%)
Custom args: 0 (0%)
Version pins: 3 (4%)
Network restrictions: ~60 (85%)
Sandbox configs: 17 (24%)
Strict mode: ~90 (44%)
Safe-outputs: Majority of workflows
Timeout-minutes: Most workflows specify

GitHub MCP Toolsets:

Most common: [default] - provides context, repos, issues, pull_requests
Specific: [actions], [issues], [repos], [pull_requests], [discussions], [code_security]
Multi-toolset: [default, actions], [default, discussions]

2️⃣ Feature Usage Matrix

Feature Category	Available Features	Used	Not Used	Usage Rate
CLI Flags	21 flags	7 auto-enabled	14 manual flags	33%
Engine Config	6 options	version, model, agent	args, env, custom	50%
MCP Servers	8+ servers	GitHub, bash, edit, repo-memory	Skillz, custom HTTP	50%
Network Config	Firewall, allowlist, blocklist	60 using firewall	Custom allowlist/blocklist underused	85%
Sandbox Options	AWF, SRT, disabled	17 using AWF	SRT rarely used	24%
Model Selection	14 models available	11 workflows override	60 using defaults	15%
Agent Files	Custom agents supported	3 using	68 not using	4%
Tool Permissions	Granular allow-tool	Auto-configured well	Manual grants rare	95%

3️⃣ Missed Opportunities

View High Priority Opportunities

🔴 High Priority

Opportunity 1: Custom Agent Files for Specialized Workflows

What: Only 3 workflows use custom agent files (.github/agents/*.md) despite strong documented support

Why It Matters:

Custom agents provide consistent personas and specialized prompting
Improves task quality by aligning agent behavior with workflow purpose
Reduces prompt drift across multiple runs
Examples: technical-doc-writer, ci-cleaner show clear benefits

Where: 68 Copilot workflows could benefit, especially:

Documentation workflows (multiple doc-related workflows)
Code review workflows (pr-nitpick-reviewer, grumpy-reviewer)
Triage workflows (issue-triage-agent, pr-triage-agent)
Specialized analysis workflows (security-guard, code-scanning-fixer)

How to Implement:

Create agent file in .github/agents/ directory

Reference in workflow frontmatter:

engine:
  id: copilot
  agent: code-reviewer  # references .github/agents/code-reviewer.agent.md

Agent file contains specialized instructions and persona

Example:

# .github/workflows/code-scanning-fixer.md
engine:
  id: copilot
  agent: security-specialist  # New agent file
tools:
  github:
    toolsets: [code_security, repos]

# .github/agents/security-specialist.agent.md
You are a security specialist focused on identifying and fixing code vulnerabilities.

## Your Expertise
- Static analysis interpretation
- CVE database knowledge
- Secure coding practices
- False positive detection

## Your Approach
- Prioritize high-severity issues
- Provide secure code examples
- Explain security implications
- Suggest preventive measures

Opportunity 2: Task-Specific Model Selection

What: 85% of Copilot workflows use default models instead of task-optimized selections

Why It Matters:

Cost optimization - use cheaper models for simple tasks
Performance optimization - use powerful models for complex tasks
Current overrides show pattern: gpt-5.1-codex-mini for cost-sensitive workflows

Where:

Simple triage/classification: Use gpt-5.1-codex-mini (11 workflows already doing this)
Complex analysis/refactoring: Use gpt-5 or claude-sonnet-4
Documentation generation: Use gpt-5.1-codex for balanced quality/cost

How to Implement:

# For simple triage
engine:
  id: copilot
  model: gpt-5.1-codex-mini

# For complex analysis
engine:
  id: copilot
  model: gpt-5

# For documentation
engine:
  id: copilot
  model: gpt-5.1-codex

Example Candidates:

issue-classifier.md → gpt-5.1-codex-mini
pr-triage-agent.md → gpt-5.1-codex-mini
code-scanning-fixer.md → gpt-5 (complex security analysis)
repository-quality-improver.md → gpt-5 (complex refactoring)

Opportunity 3: Custom CLI Arguments for Debugging

What: Zero workflows use engine.args despite 21 available CLI flags

Why It Matters:

Better debugging with --verbose or --debug flags
Performance tuning with directory management
Custom container configurations for specialized environments

Where: Workflows that frequently fail or need performance optimization

How to Implement:

engine:
  id: copilot
  args:
    - "--verbose"  # Enhanced logging for debugging
    - "--add-dir"  # Custom directory access
    - "/custom/path"

Specific Use Cases:

Debugging workflows: Add --verbose to troublesome workflows
Performance optimization: Fine-tune --add-dir for specific access patterns
Container customization: Use --agent-image for custom environments

Opportunity 4: Web-fetch Tool Explicit Configuration

What: Only 11 workflows explicitly configure web-fetch despite built-in support

Why It Matters:

Copilot CLI has native web-fetch support
Workflows that need web research aren't explicitly enabling it
Could improve research-oriented workflows

Where: Research and documentation workflows

How to Implement:

tools:
  web-fetch:  # Explicit enablement
    max_requests: 10
  github:
    toolsets: [default]

Example Candidates:

daily-news.md - Could fetch latest tech news
blog-auditor.md - Already uses Playwright but could benefit from web-fetch
copilot-cli-deep-research.md - This workflow could use web-fetch for latest docs

Opportunity 5: SRT Sandbox for Security-Sensitive Workflows

What: Only AWF firewall mode in use, SRT (Sandbox Runtime) rarely used

Why It Matters:

SRT provides stronger process isolation than AWF
Better for workflows handling untrusted input
Enhanced security for sensitive operations

Where: Security-related workflows

How to Implement:

sandbox:
  agent: srt  # Instead of awf
network:
  allowed:
    - defaults
    - github.com

Example Candidates:

security-guard.md
security-compliance.md
security-review.md
malicious-code-scan.md

Opportunity 6: GitHub Toolset Specificity

What: Some workflows use [default] toolset when more specific toolsets would be better

Why It Matters:

Reduces unnecessary GitHub API calls
Improves performance by limiting scope
Better security through principle of least privilege

Where: Workflows with specific GitHub resource needs

How to Implement:

# Instead of:
tools:
  github:
    toolsets: [default]  # Gives context, repos, issues, pull_requests

# Use specific:
tools:
  github:
    toolsets: [issues]  # Only issue operations

Example Optimizations:

Issue-only workflows: Use [issues] instead of [default]
PR-only workflows: Use [pull_requests] instead of [default]
Actions analysis: Use [actions] instead of [default, actions]

View Medium Priority Opportunities

🟡 Medium Priority

Opportunity 7: Environment Variable Configuration

What: No workflows use engine.env for custom environment variables

Why It Matters:

Can pass configuration to Copilot CLI without hardcoding
Useful for feature flags, API endpoints, debugging settings

How to Implement:

engine:
  id: copilot
  env:
    DEBUG_MODE: "true"
    CUSTOM_API_ENDPOINT: "(api.example.com/redacted)"

Opportunity 8: Cache-Memory Optimization

What: ~50 workflows use cache-memory but configuration varies

Why It Matters:

Performance improvements through caching
Reduce API calls and computation
Better consistency across runs

Best Practices:

tools:
  cache-memory:
    id: "workflow-specific-cache"  # Unique cache per workflow
    max-file-size: 102400  # 100KB
    file-glob: "**/*.json"  # Specific file patterns

Opportunity 9: Timeout Optimization

What: Many workflows use default or conservative timeouts

Why It Matters:

Faster failure detection
Cost savings on long-running workflows
Better resource utilization

Analysis:

Simple triage: 5-10 minutes sufficient
Complex analysis: 20-30 minutes needed
Documentation generation: 15-20 minutes
Code refactoring: 30-60 minutes

Recommendations:

# Simple classification
timeout-minutes: 5

# Standard analysis
timeout-minutes: 15

# Complex refactoring
timeout-minutes: 30

Opportunity 10: Strict Mode Adoption

What: ~44% of workflows use strict: true, but not consistently applied

Why It Matters:

Better error detection
Prevents partial completions
Ensures workflow reliability

Where: All production workflows should consider strict mode

How to Implement:

strict: true  # Fail on any error

Opportunity 11: Import Consolidation

What: Common patterns repeated across workflows instead of using imports

Why It Matters:

Reduces duplication
Ensures consistency
Easier maintenance

Current Good Examples:

shared/reporting.md - Reporting guidelines
shared/safe-output-app.md - Safe output instructions

Opportunities:

Create shared instructions for common tasks
Security best practices import
Common tool configurations

Opportunity 12: Network Allowlist Optimization

What: ~60 workflows use network configuration but mostly just firewall toggle

Why It Matters:

Fine-grained network control
Better security through explicit allowlisting
Prevent unintended network access

How to Implement:

network:
  allowed:
    - defaults
    - github.com
    - api.github.com
    - specific-api.example.com
  blocked:
    - tracking.example.com

View Low Priority Opportunities

🟢 Low Priority

Opportunity 13: Version Pinning

What: Only 3 workflows pin Copilot CLI version

Why It Matters:

Reproducibility across runs
Control over feature adoption
Prevent breaking changes

Trade-off: Loses automatic updates and bug fixes

When to Use:

Critical production workflows
Workflows requiring specific features
Testing specific versions

How to Implement:

engine:
  id: copilot
  version: "0.0.374"  # Specific version

Opportunity 14: Container Customization

What: No workflows use custom container images

Why It Matters:

Specialized environments for specific tasks
Pre-installed tools and dependencies
Consistent runtime environment

Available Flags: --agent-image, --image-tag, --container-workdir, --mount

Use Cases:

Python data analysis workflows
Go compilation workflows
Custom toolchain requirements

Opportunity 15: Repo-Memory Patterns

What: 22 workflows use repo-memory but patterns vary

Why It Matters:

Consistent data structures
Better cross-workflow coordination
Easier debugging and analysis

Best Practices:

tools:
  repo-memory:
    branch-name: "memory/workflow-category"  # Organized by category
    file-glob: "**/*.json"  # Structured data
    max-file-size: 102400  # 100KB limit

4️⃣ Specific Workflow Recommendations

View Workflow-Specific Recommendations

High-Impact Workflow Improvements

`issue-triage-agent.md`

Current State: Uses default Copilot configuration
Recommended Changes:

engine:
  id: copilot
  model: gpt-5.1-codex-mini  # Cost optimization
  agent: issue-triager        # Custom agent file
tools:
  github:
    toolsets: [issues]  # Specific instead of default
timeout-minutes: 5      # Fast triage doesn't need long timeout

Expected Benefits: 30% cost reduction, faster execution, better consistency

`code-scanning-fixer.md`

Current State: Uses Claude engine
Alternative Copilot Configuration:

engine:
  id: copilot
  model: gpt-5               # Complex security analysis needs powerful model
  agent: security-specialist
sandbox:
  agent: srt                 # Enhanced security
tools:
  github:
    toolsets: [code_security, repos, pull_requests]

Expected Benefits: Better security isolation, specialized security focus

`daily-news.md`

Current State: Basic configuration
Recommended Changes:

tools:
  web-fetch:
    max_requests: 20
  cache-memory:
    id: "news-cache"
    max-file-size: 204800
timeout-minutes: 10

Expected Benefits: Access to latest news, caching for efficiency

`repository-quality-improver.md`

Current State: Uses defaults
Recommended Changes:

engine:
  id: copilot
  model: gpt-5      # Complex refactoring needs powerful model
  agent: code-quality-expert
tools:
  edit:
  bash:
  serena: ["go", "typescript"]  # Language server support
  github:
    toolsets: [repos, pull_requests]
timeout-minutes: 45  # Complex refactoring needs time

Expected Benefits: Higher quality refactoring, better code understanding

`documentation` workflows (multiple)

Current State: Various configurations
Recommended Standard:

engine:
  id: copilot
  model: gpt-5.1-codex      # Balanced quality/cost for docs
  agent: technical-doc-writer  # Existing agent file
tools:
  edit:
  github:
    toolsets: [repos]
imports:
  - shared/reporting.md
timeout-minutes: 15

Expected Benefits: Consistent documentation style, better quality

5️⃣ Trends & Insights

View Historical Trends

Historical Analysis

This is the first comprehensive Copilot CLI deep research analysis. Future research will track:

Adoption Trends:
- Track custom agent file adoption rate over time
- Monitor model selection patterns
- Measure custom args usage growth
Feature Usage Evolution:
- New MCP servers being added and adopted
- SRT sandbox adoption vs. AWF
- Web-fetch usage patterns
Performance Metrics:
- Workflow execution times before/after optimizations
- Cost trends with model selection changes
- Failure rates and debugging effectiveness
Configuration Patterns:
- Emergence of best practices
- Consolidation of common configurations
- Import usage for shared instructions

Next Analysis: Recommended in 3 months (May 2026) to track implementation of these recommendations.

6️⃣ Best Practice Guidelines

Based on this research, here are recommended best practices for Copilot workflows:

Use Custom Agent Files for Specialized Workflows: Create personas for code reviewers, security analysts, documentation writers, and triagers. This improves consistency and quality.
Select Models Based on Task Complexity:
- Simple triage/classification → gpt-5.1-codex-mini
- Standard analysis → gpt-5.1-codex
- Complex refactoring/security → gpt-5
Specify Precise GitHub Toolsets: Use [issues], [pull_requests], [repos] instead of [default] when possible for better performance and security.
Configure Appropriate Timeouts: Match timeout to task complexity (5min for triage, 15min for analysis, 30-45min for refactoring).
Enable Strict Mode for Production Workflows: Use strict: true to ensure reliable execution and proper error handling.
Use Network Restrictions: Apply network.allowed with specific domains for workflows that need web access.
Consider SRT Sandbox for Security Workflows: Use sandbox.agent: srt for enhanced isolation in security-sensitive workflows.
Leverage Shared Imports: Use imports: for common patterns like reporting guidelines and safe-output instructions.
Optimize Cache-Memory Configuration: Use workflow-specific cache IDs and appropriate file size limits.
Document Custom Configurations: Add comments explaining why specific models, agents, or args are chosen.

7️⃣ Action Items

Immediate Actions (this week):

Create 5 high-priority custom agent files: issue-triager, security-specialist, code-reviewer, doc-writer, code-quality-expert
Update 10 simple triage workflows to use gpt-5.1-codex-mini model
Add strict: true to all production workflows
Document custom agent file creation process in developer docs

Short-term (this month):

Conduct workshop on custom agent file best practices
Create shared import files for common patterns: security, code-quality, documentation
Audit all workflows for timeout optimization opportunities
Implement GitHub toolset specificity for 20 workflows
Create model selection decision tree documentation

Long-term (this quarter):

Develop automated tooling to suggest model selection based on workflow purpose
Create library of reusable agent files for common scenarios
Implement SRT sandbox for all security-related workflows
Build analytics dashboard tracking Copilot feature usage and costs
Establish quarterly Copilot CLI research review process

View Supporting Evidence & Methodology

📚 References

Codebase Analysis:

pkg/workflow/copilot_engine_execution.go - CLI flag implementation
pkg/workflow/copilot_engine.go - Engine configuration
pkg/workflow/copilot_engine_tools.go - Tool permissions
pkg/workflow/copilot_mcp.go - MCP server configuration
docs/src/content/docs/reference/engines.md - Engine documentation

Workflow Analysis:

204 total workflow files examined
71 Copilot workflows analyzed in detail
Sample workflows: agent-performance-analyzer.md, ai-moderator.md, archie.md
Existing agent files: technical-doc-writer.agent.md, ci-cleaner.agent.md

Available Features (from code):

21 CLI flags identified in copilot_engine_execution.go
8+ MCP servers supported
14 model options available
Multiple sandbox modes (AWF, SRT)

Research Methodology

Phase 1: Capability Inventory

Examined all pkg/workflow/copilot*.go files for available features
Analyzed CLI flag implementation in execution code
Reviewed engine configuration options
Documented MCP server support
Cataloged available models and versions

Phase 2: Usage Analysis

Counted workflows by engine type (204 total, 71 Copilot)
Extracted tool configurations from all Copilot workflows
Analyzed engine configuration patterns (model, agent, args)
Measured advanced feature adoption (network, sandbox, strict)
Categorized GitHub toolset usage patterns

Phase 3: Gap Analysis

Compared available features vs. actual usage
Identified zero-adoption features (custom args, env vars)
Calculated usage rates for each feature category
Prioritized opportunities by impact and effort
Created specific recommendations for high-value workflows

Phase 4: Pattern Recognition

Identified successful patterns (custom agents, model selection)
Analyzed inconsistencies in configuration
Evaluated timeout patterns across workflow types
Examined network and sandbox configuration practices
Documented best practices from top-performing workflows

Data Collection Tools

grep for pattern matching across workflows
File counting and statistics generation
Code review of implementation files
Manual inspection of sample workflows
Documentation cross-reference

Validation Approach

Cross-checked code implementation against documentation
Verified feature availability in source code
Tested configuration patterns for accuracy
Sampled workflows for detailed analysis
Validated statistics through multiple data sources

Workflow Run: §21717420388

AI generated by Copilot CLI Deep Research Agent

expires on Feb 12, 2026, 3:33 PM UTC

2026-02-12T16:58:55Z

github-actions[bot]
bot Feb 12, 2026
Author

This discussion was automatically closed because it expired on 2026-02-12T15:33:57.439Z.

Closed by Workflow

0 replies

[copilot-cli-research] Copilot CLI Deep Research - February 2026 #13941

Uh oh!

github-actions[bot] bot Feb 5, 2026

🔍 Copilot CLI Deep Research Report

📊 Executive Summary

Critical Findings

🔴 High Priority Issues

🟡 Medium Priority Opportunities

1️⃣ Current State Analysis

Copilot CLI Capabilities Inventory

Usage Statistics

2️⃣ Feature Usage Matrix

3️⃣ Missed Opportunities

🔴 High Priority

Opportunity 1: Custom Agent Files for Specialized Workflows

Opportunity 2: Task-Specific Model Selection

Opportunity 3: Custom CLI Arguments for Debugging

Opportunity 4: Web-fetch Tool Explicit Configuration

Opportunity 5: SRT Sandbox for Security-Sensitive Workflows

Opportunity 6: GitHub Toolset Specificity

🟡 Medium Priority

Opportunity 7: Environment Variable Configuration

Opportunity 8: Cache-Memory Optimization

Opportunity 9: Timeout Optimization

Opportunity 10: Strict Mode Adoption

Opportunity 11: Import Consolidation

Opportunity 12: Network Allowlist Optimization

🟢 Low Priority

Opportunity 13: Version Pinning

Opportunity 14: Container Customization

Opportunity 15: Repo-Memory Patterns

4️⃣ Specific Workflow Recommendations

High-Impact Workflow Improvements

issue-triage-agent.md

code-scanning-fixer.md

daily-news.md

repository-quality-improver.md

documentation workflows (multiple)

5️⃣ Trends & Insights

Historical Analysis

6️⃣ Best Practice Guidelines

7️⃣ Action Items

Immediate Actions (this week):

Short-term (this month):

Long-term (this quarter):

📚 References

Research Methodology

Phase 1: Capability Inventory

Phase 2: Usage Analysis

Phase 3: Gap Analysis

Phase 4: Pattern Recognition

Data Collection Tools

Validation Approach

Replies: 1 comment

Uh oh!

github-actions[bot] bot Feb 12, 2026 Author

github-actions[bot]
bot Feb 5, 2026

`issue-triage-agent.md`

`code-scanning-fixer.md`

`daily-news.md`

`repository-quality-improver.md`

`documentation` workflows (multiple)

github-actions[bot]
bot Feb 12, 2026
Author