[prompt-clustering] Copilot Agent Prompt Clustering Analysis - 2026-02-13 #15296

2026-02-13T05:03:59Z

github-actions[bot] · Feb 13, 2026

Daily NLP-based clustering analysis of Copilot agent task prompts to identify patterns, success factors, and optimization opportunities.

Executive Summary

Analyzed 1,354 Copilot-created PRs from a 24-day window (2026-01-21 to 2026-02-13) using TF-IDF vectorization and K-means clustering. Identified 7 distinct task patterns with success rates ranging from 56% to 72%. The overall success rate is 67.1% (908 of 1,354 PRs merged), with notable differences across task types.

Key Findings:

Fix tasks struggle most (56% success rate) - primary improvement opportunity
Test and Mixed (Workflows) tasks perform best (~72% success rate) - effective patterns to replicate
Remove tasks require most iterations (5.1 avg commits) - complexity indicator
Most tasks are workflow/infrastructure related (workflows, github, mcp keywords dominate)

Analysis Metrics

Metric	Value
Total Tasks Analyzed	1,354
Date Range	2026-01-21 to 2026-02-13 (24 days)
Clusters Identified	7
Overall Success Rate	67.1% (908 merged, 427 closed, 19 open)
Avg Commits per Task	3.6
Clustering Method	K-means with TF-IDF (150 features, 1-3 grams)

Cluster Performance Overview

Rank	Cluster	Theme	Size	Success Rate	Avg Commits	Complexity
1	Cluster 2	Mixed (Workflows)	339 (25.0%)	72.3% ✅	4.1	Medium
2	Cluster 1	Test	265 (19.6%)	72.1% ✅	3.5	Low
3	Cluster 7	Update	74 (5.5%)	70.3% ✅	4.4	Medium
4	Cluster 3	Remove	57 (4.2%)	68.4% ⚠️	5.1	High
5	Cluster 4	Mixed (MCP)	248 (18.3%)	66.9% ⚠️	3.4	Low
6	Cluster 6	Docs	69 (5.1%)	66.7% ⚠️	2.8	Low
7	Cluster 5	Fix	302 (22.3%)	56.0% ❌	2.9	Medium

Legend: ✅ High (≥70%) | ⚠️ Moderate (60% to <70%) | ❌ Low (<60%)
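The legend thresholds amount to a simple classification rule. The Python sketch below (the helper name `success_tier` is hypothetical, not part of the analysis workflow) recomputes each cluster's rate from the merged counts and sizes given in the cluster sections and assigns a tier. Rounding to one decimal before classifying keeps the tier consistent with the rate as displayed in the table.

```python
def success_tier(rate_pct: float) -> str:
    """Map a success rate in percent to the report's legend tier."""
    if rate_pct >= 70.0:
        return "High"      # ✅
    if rate_pct >= 60.0:
        return "Moderate"  # ⚠️ (60% up to, but not including, 70%)
    return "Low"           # ❌


# Merged PRs and cluster sizes as listed in the cluster sections.
clusters = {
    "Mixed (Workflows)": (245, 339),
    "Test": (191, 265),
    "Update": (52, 74),
    "Remove": (39, 57),
    "Mixed (MCP)": (166, 248),
    "Docs": (46, 69),
    "Fix": (169, 302),
}

for theme, (merged, size) in clusters.items():
    # Round first so the tier agrees with the one-decimal rate shown in the table.
    rate = round(100 * merged / size, 1)
    print(f"{theme:<18} {rate:5.1f}%  {success_tier(rate)}")
```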

Detailed Cluster Analysis

Cluster 2: Mixed (Workflows) - 25.0% of tasks

Performance

Success Rate: 72.3% (245 merged, 91 closed, 3 open)
Keywords: workflows, github, set, agentic, workflow
Avg Commits: 4.1
Characteristics: General workflow configuration and infrastructure changes

Why It Works

Clear, structured workflow modifications
Well-defined scope and requirements
Existing patterns to follow
Good documentation coverage

Representative Examples

#12589 - Implement deterministic SHA-256 hash for workflow frontmatter
#12453 - Add W3C-style security architecture specification
#15055 - Update developer-docs-consolidator

Cluster 1: Test - 19.6% of tasks

Performance

Success Rate: 72.1% (191 merged, 65 closed, 9 open)
Keywords: test, tests, error, workflow, job
Avg Commits: 3.5
Characteristics: Test infrastructure, CI fixes, test coverage improvements

Why It Works

Concrete verification criteria (tests pass/fail)
Clear success conditions
Existing test patterns in codebase
Automated validation

Representative Examples

#11283 - Add retry logic to Copilot CLI installation
#12371 - Add missing test patterns to CI (353 untested functions)
#11778 - Phase 1: Consolidate validation helpers

Cluster 5: Fix - 22.3% of tasks ⚠️ NEEDS ATTENTION

Performance

Success Rate: 56.0% (169 merged, 129 closed, 4 open) ❌
Keywords: issue, workflow, comments, workflows, section
Avg Commits: 2.9
Characteristics: Bug fixes, issue resolutions, problem investigations

Why It Struggles

Often poorly defined problems
Unclear reproduction steps
Multiple root causes
Incomplete context in prompts
Investigation required before fixing

Improvement Opportunities

Better Context: Include error logs, stack traces, reproduction steps
Root Cause First: Investigate before attempting fixes
Scope Definition: Break down multi-issue PRs into focused tasks
Verification Steps: Define explicit success criteria

Representative Examples

#13005 - Document safe output patterns
#12662 - Add workflow health monitoring command
#15240 - Add allowed-repos support

Cluster 4: Mixed (MCP) - 18.3% of tasks

Performance

Success Rate: 66.9% (166 merged, 80 closed, 2 open)
Keywords: mcp, server, tool, mcp server, workflow
Avg Commits: 3.4
Characteristics: MCP server configuration, tool integration

Representative Examples

#15277 - Update Copilot CLI to 0.0.409
#11877 - Refactor mcp_inspect.go (1011→285 lines)
#11560 - Document Tavily MCP server secret dependency

Cluster 7: Update - 5.5% of tasks

Performance

Success Rate: 70.3% (52 merged, 21 closed, 1 open)
Keywords: project, handler, update, safe, safe outputs
Avg Commits: 4.4
Characteristics: Dependency updates, version bumps, refactoring

Representative Examples

#12967 - Unify safe output processing
#12208 - Add issues:write permission
#12157 - Track campaign label permission errors

Cluster 6: Docs - 5.1% of tasks

Performance

Success Rate: 66.7% (46 merged, 23 closed, 0 open)
Keywords: node, firewall, docs, aw, gh aw
Avg Commits: 2.8
Characteristics: Documentation updates, dependency updates

Representative Examples

#12576 - Bundle Node.js dependency updates for /docs
#12552 - Bundle Node.js docs dependencies
#14511 - Bump docs NPM dependencies

Cluster 3: Remove - 4.2% of tasks

Performance

Success Rate: 68.4% (39 merged, 18 closed, 0 open)
Keywords: campaign, project, security, md, workflow
Avg Commits: 5.1 (highest iteration count)
Characteristics: Deprecation, cleanup, removal tasks

Why More Iterations

Cascading changes
Breaking changes require careful handling
Multiple dependent components
Thorough testing needed

Representative Examples

#11087 - Replace campaign fusion with dispatch-only workers
#12310 - Expand test coverage for campaign injection
#12053 - Add project field with campaign orchestration

Recommendations

🎯 Priority 1: Improve Fix Task Success Rate (Currently 56%)

Problem: Fix tasks have the lowest success rate at 56%, representing 302 tasks (22.3% of all work).

Root Causes:

Vague problem descriptions ("fix the bug")
Missing reproduction steps
Incomplete error context
Multiple issues bundled together
Requires investigation before fixing

Action Items:

Template for Fix Tasks:

## Problem
- What's broken: [specific behavior]
- Expected: [what should happen]
- Actual: [what actually happens]

## Context
- Error logs: [paste full error]
- Reproduction: [exact steps]
- Environment: [relevant details]

## Success Criteria
- [ ] Error no longer occurs
- [ ] Tests pass
- [ ] No regressions

Two-Phase Approach:
- Phase 1: Investigation task (gather context, identify root cause)
- Phase 2: Fix task (with clear root cause and solution)
Better Scoping: Break "fix multiple issues" into separate focused PRs
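A Fix-task prompt can be checked against the template above before it is handed to the agent. The following is a minimal sketch, assuming prompts are written in Markdown using the template's `## Problem`, `## Context` and `## Success Criteria` headings; the function names are hypothetical. It checks structure only (sections present, at least one checklist item) and cannot tell whether the content is specific enough.

```python
import re

# Section headings taken from the Fix-task template.
REQUIRED_SECTIONS = ("Problem", "Context", "Success Criteria")


def missing_sections(prompt: str) -> list[str]:
    """Template sections with no matching '## <name>' heading in the prompt."""
    headings = {m.group(1) for m in re.finditer(r"^##[ \t]+(.+?)[ \t]*$", prompt, re.MULTILINE)}
    return [name for name in REQUIRED_SECTIONS if name not in headings]


def count_criteria(prompt: str) -> int:
    """Number of Markdown checklist items ('- [ ]' or '- [x]')."""
    return len(re.findall(r"^[ \t]*- \[[ xX]\]", prompt, re.MULTILINE))


def ready_for_agent(prompt: str) -> bool:
    """True when every template section is present and at least one criterion is listed."""
    return not missing_sections(prompt) and count_criteria(prompt) > 0


# A vague prompt such as "fix the bug" is missing every section.
print(missing_sections("fix the bug"))  # ['Problem', 'Context', 'Success Criteria']
```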

✅ Priority 2: Replicate Success Patterns

Successful Clusters: Test (72.1%) and Mixed Workflows (72.3%)

Success Factors:

Clear, concrete objectives
Automated validation (tests pass/fail)
Existing patterns to follow
Well-structured prompts

Action Items:

Analyze successful PR prompts for common patterns
Create prompt templates for each cluster type
Document "what good looks like" for each task category

📊 Priority 3: Monitor Iteration Counts

Observation: Remove tasks require 5.1 commits on average, about 42% more than the overall average of 3.6.

Action Items:

Flag tasks with >4 commits for review
Understand if high iteration count indicates:
- Task complexity (expected)
- Poor initial instructions (fixable)
- Inadequate context (fixable)
Track iteration count trends over time
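The iteration checks above can be sketched as a few small functions. This assumes each PR is available as a record with `number`, `commits` and `created` fields (hypothetical field names; the report does not describe its data schema). The sketch flags PRs over the 4-commit threshold, expresses a cluster average relative to the overall average, and groups commit counts by ISO week for trend tracking.

```python
from datetime import date
from statistics import mean

THRESHOLD = 4  # Priority 3: flag tasks with more than 4 commits


def flag_for_review(prs, threshold=THRESHOLD):
    """Return numbers of PRs whose commit count exceeds the threshold."""
    return [pr["number"] for pr in prs if pr["commits"] > threshold]


def pct_above(cluster_avg, overall_avg):
    """How far (in percent) a cluster's mean commit count sits above the overall mean."""
    return 100.0 * (cluster_avg / overall_avg - 1.0)


def weekly_avg_commits(prs):
    """Mean commits per PR, keyed by the ISO (year, week) of the creation date."""
    weeks = {}
    for pr in prs:
        iso = pr["created"].isocalendar()
        weeks.setdefault((iso[0], iso[1]), []).append(pr["commits"])
    return {week: mean(counts) for week, counts in sorted(weeks.items())}


# Hypothetical records for illustration only.
prs = [
    {"number": 101, "commits": 2, "created": date(2026, 1, 21)},
    {"number": 102, "commits": 6, "created": date(2026, 1, 22)},
    {"number": 103, "commits": 5, "created": date(2026, 2, 2)},
]
print(flag_for_review(prs))        # [102, 103]
print(round(pct_above(5.1, 3.6)))  # 42
print(weekly_avg_commits(prs))
```

With the report's figures, `pct_above(5.1, 3.6)` gives about 41.7, which is the roughly 42% gap cited for Remove tasks.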

🔍 Priority 4: Improve Prompt Quality

General Observations:

Many prompts lack specific success criteria
Context often incomplete
Multiple concerns mixed in single task

Action Items:

Prompt Checklist:
- Clear objective stated
- Success criteria defined
- Relevant context provided
- Single focus (not mixing concerns)
- Examples provided (where helpful)
Prompt Engineering Training: Share successful prompt examples

Methodology

Data Collection

Source: GitHub PR search API for Copilot-created PRs
Period: 2026-01-21 to 2026-02-13 (24 days)
Volume: 1,354 PRs with full metadata (bodies, titles, commits, reviews, files)
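The GitHub search API returns at most 1,000 results per query (100 per page), so collecting 1,354 PRs would require splitting the period into several queries, for example by creation date. The sketch below builds such queries for the REST endpoint `GET /search/issues` using the `is:pr`, `author:` and `created:` qualifiers. The login that identifies Copilot-created PRs is not given in the report, so it is a parameter here (`example-bot` is a hypothetical placeholder), and the 7-day window size is an arbitrary choice. No request is sent; authentication, pagination and rate limiting are out of scope.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.github.com/search/issues"


def date_windows(start: date, end: date, days: int = 7) -> list[tuple[date, date]]:
    """Split the inclusive range [start, end] into consecutive windows of at most `days` days."""
    windows = []
    current = start
    while current <= end:
        stop = min(current + timedelta(days=days - 1), end)
        windows.append((current, stop))
        current = stop + timedelta(days=1)
    return windows


def search_url(author: str, start: date, end: date, page: int = 1) -> str:
    """Search URL for PRs opened by `author` between start and end (inclusive)."""
    query = f"is:pr author:{author} created:{start.isoformat()}..{end.isoformat()}"
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': query, 'per_page': 100, 'page': page})}"


AUTHOR = "example-bot"  # hypothetical login; replace with the account that opens Copilot PRs
for start, end in date_windows(date(2026, 1, 21), date(2026, 2, 13)):
    print(search_url(AUTHOR, start, end))
```

If a window's `total_count` in the response exceeds 1,000, that window needs to be split further.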

Text Processing

Prompt Extraction: Cleaned PR bodies to extract task descriptions
Preprocessing: Removed markdown, URLs, code blocks, HTML comments
Keyword Extraction: Basic term frequency for initial insights
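The preprocessing and term-frequency steps can be approximated with regular expressions. This is a rough sketch, not the workflow's actual cleaner: the patterns and the small stop-word list are assumptions, and hyphens and underscores are deliberately kept inside tokens so identifiers such as `allowed-repos` survive as single terms.

```python
import re
from collections import Counter

# Small illustrative stop list; a real run would use a full English list.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "on", "with", "is", "this", "that"}


def clean_pr_body(body: str) -> str:
    """Strip HTML comments, code, links/URLs and block-level Markdown from a PR body."""
    text = re.sub(r"<!--.*?-->", " ", body, flags=re.DOTALL)    # HTML comments
    text = re.sub(r"`{3}.*?`{3}", " ", text, flags=re.DOTALL)   # fenced code blocks
    text = re.sub(r"`[^`\n]*`", " ", text)                      # inline code
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)        # [label](url) -> label
    text = re.sub(r"https?://\S+", " ", text)                   # bare URLs
    # Heading, blockquote and list markers at the start of a line.
    text = re.sub(r"^[ \t]*(#{1,6}|>|[-*+]|\d+\.)[ \t]+", "", text, flags=re.MULTILINE)
    text = re.sub(r"\*+", "", text)                             # bold/italic asterisks
    return re.sub(r"\s+", " ", text).strip()


def top_terms(bodies, n=10):
    """Plain term frequency over cleaned bodies, ignoring stop words."""
    counts = Counter()
    for body in bodies:
        tokens = re.findall(r"[a-z][a-z0-9_-]+", clean_pr_body(body).lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)
```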

Clustering Analysis

Vectorization: TF-IDF with 150 features, 1-3 grams, English stop words
Algorithm: K-means clustering with silhouette score optimization
Cluster Count: Optimal k=7 (elbow method + silhouette score)
Theme Identification: Pattern matching on cluster keywords
Validation: Manual review of sample tasks per cluster

Metrics

Success Rate: Merged PRs / Total PRs (open PRs are counted in the denominator)
Complexity: Avg files changed, commits, code churn
Iterations: Avg commits per PR (proxy for revisions needed)
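As a consistency check, the headline metrics can be recomputed from the per-cluster figures using the definitions above. The Python below uses only the merged counts, cluster sizes and average commits reported in the cluster sections. The size-weighted mean of already-rounded per-cluster averages only approximates the true average commits per task.

```python
# (theme, merged, size, avg_commits) per cluster, as listed in this report.
CLUSTERS = [
    ("Mixed (Workflows)", 245, 339, 4.1),
    ("Test", 191, 265, 3.5),
    ("Update", 52, 74, 4.4),
    ("Remove", 39, 57, 5.1),
    ("Mixed (MCP)", 166, 248, 3.4),
    ("Docs", 46, 69, 2.8),
    ("Fix", 169, 302, 2.9),
]


def success_rate(merged: int, total: int) -> float:
    """Merged PRs / total PRs; open PRs stay in the denominator."""
    return merged / total


total = sum(size for _, _, size, _ in CLUSTERS)
merged = sum(m for _, m, _, _ in CLUSTERS)
# Size-weighted mean of the per-cluster averages.
avg_commits = sum(size * commits for _, _, size, commits in CLUSTERS) / total

print(total, merged)                                 # 1354 908
print(f"{100 * success_rate(merged, total):.1f}%")   # 67.1%
print(f"{avg_commits:.1f}")                          # 3.6
```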

Next Steps

Weekly Tracking: Run this analysis weekly to track trends
Prompt Templates: Create cluster-specific prompt templates
Success Metrics: Track success rate changes after implementing recommendations
Deep Dive: Investigate failed Fix tasks to identify common failure modes
Benchmarking: Compare against human-created PRs for similar tasks

Analysis Run: §21974945192

AI generated by Copilot Agent Prompt Clustering Analysis

expires on Feb 20, 2026, 5:03 AM UTC

2026-02-20T05:07:09Z

github-actions[bot] · Feb 20, 2026

This discussion was automatically closed because it expired on 2026-02-20T05:03:58.699Z.

Closed by Workflow
