A data-driven analysis of pull request templates across 3,747 high-quality open source repositories.
I maintain a small Python library and wanted to improve my PR template. The problem: I have no intuition for what a good template looks like. Rather than copy a single popular project and hope for the best, I wanted to understand the patterns — what do well-maintained projects consistently ask for, and what's optional?
This project collects PR templates from popular open source repositories, extracts structural features using LLM-based analysis, and synthesizes the findings into actionable guidance.
| Stage | Count |
|---|---|
| Repos discovered (BigQuery) | 36,324 |
| Repos passing quality filter | 5,636 |
| PR template files collected | 5,621 |
| After dedup + cleaning | 3,747 |
| Features extracted | 3,746 |
Quality filter: ≥100 stars AND (≥10 watchers OR ≥20 forks). Median repository in the final dataset has 971 stars and 234 forks.
Discovery: BigQuery public dataset (github_repos.files) identified repositories containing PR template files (paths matching pull_request_template, markdown only).
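A minimal sketch of the discovery query, assuming the standard public-dataset layout (the exact SQL and path filters live in src/fetch_repos.py and may differ):

```python
# Stage 1 sketch: find repos that ship a markdown PR template.
# Requires google-cloud-bigquery and a billing-enabled project in BQ_PROJECT.
import os

from google.cloud import bigquery

DISCOVERY_SQL = """
SELECT DISTINCT repo_name, path
FROM `bigquery-public-data.github_repos.files`
WHERE LOWER(path) LIKE '%pull_request_template%'
  AND LOWER(path) LIKE '%.md'
"""

def discover_repos() -> list[dict]:
    client = bigquery.Client(project=os.environ["BQ_PROJECT"])
    rows = client.query(DISCOVERY_SQL).result()
    return [{"repo_name": row.repo_name, "path": row.path} for row in rows]
```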
Collection: GitHub GraphQL API in batches of 40 with 1-second delays. Each qualifying repository's metadata (stars, forks, watchers) and template file content were retrieved.
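Roughly, each batch becomes one aliased GraphQL query (illustrative sketch; the batching, retries, and rate-limit handling in src/collect.py will differ):

```python
# Stage 2 sketch: fetch metadata + template contents for a batch of repos in one request.
import os
import time

import requests

GQL_ENDPOINT = "https://api.github.com/graphql"

def fetch_batch(batch: list[tuple[str, str, str]]) -> dict:
    """batch: (owner, name, template_path) triples, up to 40 per request."""
    fields = []
    for i, (owner, name, path) in enumerate(batch):
        fields.append(
            f'r{i}: repository(owner: "{owner}", name: "{name}") {{'
            f' stargazerCount forkCount watchers {{ totalCount }}'
            f' object(expression: "HEAD:{path}") {{ ... on Blob {{ text }} }}'
            f' }}'
        )
    query = "query {" + " ".join(fields) + "}"
    resp = requests.post(
        GQL_ENDPOINT,
        json={"query": query},
        headers={"Authorization": f"bearer {os.environ['GITHUB_TOKEN']}"},
        timeout=30,
    )
    resp.raise_for_status()
    time.sleep(1)  # 1-second delay between batches
    return resp.json()["data"]
```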
- Size filtering: Removed files <200 bytes (trivial/placeholder templates)
- Deduplication: SHA-256 hash of file contents removed 1,046 fork duplicates
- Encoding validation: Removed files failing UTF-8 decode (see the sketch after this list)
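A minimal sketch of these cleaning steps (field names are hypothetical; the real logic is in src/clean.py):

```python
# Stage 3 sketch: size filter, UTF-8 validation, and SHA-256 dedup of fork copies.
import hashlib

def clean(templates: list[dict]) -> list[dict]:
    seen: set[str] = set()
    kept = []
    for t in templates:
        raw: bytes = t["content_bytes"]  # hypothetical field name
        if len(raw) < 200:               # trivial/placeholder templates
            continue
        try:
            raw.decode("utf-8")          # drop files that fail UTF-8 decode
        except UnicodeDecodeError:
            continue
        digest = hashlib.sha256(raw).hexdigest()
        if digest in seen:               # fork duplicate
            continue
        seen.add(digest)
        kept.append(t)
    return kept
```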
Structured extraction using gemini-2.5-flash-lite-preview-09-2025 with constrained JSON output (40 async workers, temperature=0). Each template was analyzed for:
- Structural metadata: section count, headings, word count, checklist presence
- Content categories: 11 boolean fields for what information the template requests (description, testing, related issues, etc.)
- Checklist composition: free-form topic labels, later normalized to a 36-category taxonomy
- Subjective assessments: tone (formal/neutral/casual), friction level (low/medium/high), specificity (generic/project-specific)
Extraction succeeded on 3,746 of 3,747 templates (1 failure due to a model output-length bug). Full schema in src/schema.py.
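A hedged sketch of the extraction call using the google-genai SDK (the fields shown are a truncated illustration of the real schema in src/schema.py):

```python
# Stage 4 sketch: constrained JSON output validated against a pydantic schema.
import asyncio

from google import genai
from google.genai import types
from pydantic import BaseModel

class TemplateFeatures(BaseModel):  # truncated; see src/schema.py for the full schema
    section_count: int
    word_count: int
    has_checklist: bool
    asks_for_description: bool
    asks_for_testing: bool
    tone: str  # "formal" | "neutral" | "casual"

client = genai.Client()      # reads GEMINI_API_KEY from the environment
SEM = asyncio.Semaphore(40)  # 40 concurrent workers

async def extract(template_text: str) -> TemplateFeatures:
    async with SEM:
        response = await client.aio.models.generate_content(
            model="gemini-2.5-flash-lite-preview-09-2025",
            contents="Extract structural features from this PR template:\n\n" + template_text,
            config=types.GenerateContentConfig(
                temperature=0,
                response_mime_type="application/json",
                response_schema=TemplateFeatures,  # constrains decoding to the schema
            ),
        )
        return response.parsed
```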
Free-form checklist topics from the LLM were mapped to 36 canonical categories via regex patterns. Coverage: 73.7% of 14,173 raw topic mentions matched a category. The remaining 26.3% are predominantly singletons (project-specific language not capturable by a generic taxonomy).
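The mapping is a first-match lookup over compiled patterns, roughly like this (categories and regexes here are illustrative, not the actual taxonomy in src/normalize.py):

```python
# Stage 5 sketch: map free-form checklist topics to canonical categories.
import re

CATEGORY_PATTERNS: dict[str, re.Pattern[str]] = {
    "tests_added": re.compile(r"\b(add|added|wrote|new)\b.*\btests?\b|\btests?\b.*\badded\b", re.I),
    "tests_pass": re.compile(r"\btests?\b.*\b(pass|passing|green)\b", re.I),
    "docs_updated": re.compile(r"\b(docs?|documentation)\b.*\b(update[ds]?|added)\b", re.I),
    "changelog_updated": re.compile(r"\bchangelog\b", re.I),
}

def normalize_topic(raw_topic: str) -> str | None:
    """Return the first matching canonical category, or None for the uncaptured long tail."""
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(raw_topic):
            return category
    return None  # counts toward the ~26% of mentions left unmatched
```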
- Univariate prevalence for all boolean, categorical, and numeric features
- Fork-tier stratification (tier 1: 10–49 forks, tier 2: 50–199, tier 3: 200–999, tier 4: 1000+) as a proxy for project maturity and contribution volume
- Correlation analysis: phi coefficients (boolean pairs) and point-biserial correlations (numeric × boolean) with Benjamini-Hochberg FDR correction (see the sketch after this list)
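A sketch of the boolean-pair correlation step (assumed implementation, not the code in src/analyze.py; the point-biserial case is analogous via scipy.stats.pointbiserialr):

```python
# Stage 6 sketch: phi coefficients for all boolean feature pairs with BH-FDR correction.
from itertools import combinations

import pandas as pd
from scipy.stats import pearsonr
from statsmodels.stats.multitest import multipletests

def boolean_correlations(df: pd.DataFrame, bool_cols: list[str]) -> pd.DataFrame:
    """Phi equals Pearson's r computed on 0/1-coded columns."""
    rows = []
    for a, b in combinations(bool_cols, 2):
        r, p = pearsonr(df[a].astype(int), df[b].astype(int))
        rows.append({"pair": f"{a} × {b}", "phi": r, "p": p})
    out = pd.DataFrame(rows)
    out["p_fdr"] = multipletests(out["p"], method="fdr_bh")[1]  # Benjamini-Hochberg
    return out.sort_values("phi", ascending=False)
```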
| Feature | Overall | Tier 4 (1000+ forks) | Recommendation |
|---|---|---|---|
| Description of changes | 78% | 78% | Must have |
| Checklist | 75% | 74% | Must have |
| Related issues | 69% | 72% | Must have |
| Testing evidence | 69% | 68% | Strongly recommended |
| Placeholder text | 61% | 66% | Strongly recommended |
| Documentation reminder | 47% | 44% | Context dependent |
| Motivation / context | 43% | 44% | Context dependent |
| Change type classification | 26% | 24% | Optional |
| Breaking changes | 20% | 16% | Optional |
| Screenshots | 14% | 14% | Optional |
| Reviewer notes | 13% | 13% | Optional |
The "must have" threshold is ≥70% prevalence among tier 4 projects. "Strongly recommended" is 50–70%. Features below 50% tier 4 prevalence are context-dependent or optional.
A few observations:
- The core four are very stable across tiers. Description, checklist, related issues, and testing all exceed 64% in every tier. Projects converge on these regardless of size.
- Placeholder text increases with maturity. 56% in tier 1 → 66% in tier 4. Larger projects invest more in guiding contributors through the template.
- Breaking changes decreases with maturity. 23% in tier 1 → 16% in tier 4. This likely reflects that mature projects handle breaking changes through separate processes (RFCs, changelogs) rather than PR templates.
| Metric | Mean | Median | IQR |
|---|---|---|---|
| Sections | 3.0 | 3 | 1–4 |
| Word count | 100 | 75 | 50–130 |
| Checklist items | 4.0 | 4 | 1–6 |
The typical PR template is short: ~75 words, 3 sections, 4 checklist items. Two-thirds are medium friction (2–5 minutes to complete), 30% are low friction. Only 4% are high friction.
84% use a neutral tone. 56% are project-specific (referencing particular tools, conventions, or workflows).
Among templates that include checklists, the most common items are:
| Topic | % of all templates | Category |
|---|---|---|
| Documentation updated | 37% | Essential |
| Tests added | 36% | Essential |
| Tests pass | 26% | Recommended |
| Code style followed | 19% | Recommended |
| Issue linked | 16% | Recommended |
| Contributor guidelines read | 16% | Recommended |
| PR formatting / commit hygiene | 14% | Recommended |
| Changelog updated | 13% | Recommended |
| Change type labeled | 9% | Optional |
| Target branch correct | 7% | Optional |
| Legal sign-off (CLA/DCO) | 7% | Optional |
| Breaking changes noted | 7% | Optional |
The long tail includes 24 additional categories at <5% prevalence each (build passes, self-review, code comments, screenshots provided, etc.). Full taxonomy: 36 categories, documented in src/normalize.py.
No surprising co-occurrence patterns emerged. The strongest non-trivial correlations:
- `asks_for_breaking_changes` × `asks_for_change_type` (phi = 0.52) — these tend to appear together as part of a "change classification" cluster
- `has_checklist` × `asks_for_documentation` (phi = 0.42) and `has_checklist` × `asks_for_testing` (phi = 0.42) — checklists tend to appear in templates that also ask for testing and docs, reflecting a "thoroughness" cluster
- `has_placeholder_text` × `asks_for_description` (phi = 0.37) — templates that guide contributors with placeholders tend to be the same ones asking for descriptions
Using these findings, I generated an improved PR template for Pollux, a Python library for multimodal LLM orchestration that I maintain solo.
Approach: A structured prompt containing the aggregate findings, 6 exemplar templates from tier 4 repos (vscode, react, bootstrap, docker-mailserver, nestjs, deepchem), the existing template, and project-specific constraints was provided to Claude Opus 4.6. The full prompt, original template, and design rationale are in synthesis/.
```markdown
## Summary
<!-- What does this PR do and why? One or two sentences on the change,
plus motivation if not obvious from the linked issue. -->
## Related issue
<!-- Link the issue this PR addresses. Use closing keywords if applicable:
"Closes #123", "Fixes #456". Write "None" for unprompted changes. -->
## Test plan
<!-- How did you verify this works? Describe what you tested, not just that
tests pass. Examples: "Added unit tests for X edge case", "Manually
tested against Y model provider", "N/A — docs-only change". -->
## Notes
<!-- Optional: anything reviewers should know, design trade-offs, follow-up
work, or context for your future self. -->
---
- [ ] PR title follows conventional commits
- [ ] `make check` passes
- [ ] Tests cover the meaningful cases, not just the happy path
- [ ] Docs updated (if this changes public API or user-facing behavior)
```

| Change | Data support |
|---|---|
| Added Related issue section | 72% of tier 4 projects ask for linked issues |
| Added Test plan section | 69% ask for testing evidence; a free-text section is more useful than a checkbox |
| Expanded Summary guidance to include "and why?" | Folds motivation (43%) into the existing section without a new heading |
| Added docs updated checklist item | Most common checklist topic (37% of all templates) |
| Feature | Prevalence | Why excluded for this project |
|---|---|---|
| Change type classification | 26% | Conventional commits already encode this (`fix:`, `feat:`, `feat!:`) |
| Breaking changes section | 20% | Covered by the `!` suffix; pre-1.0, breaking changes are expected |
| Code style checklist items | 19% | `make check` runs ruff + mypy; CI is authoritative |
| Contributor guidelines reminder | 16% | Contributing guide is linked from the conventional commits item |
| Changelog reminder | 13% | python-semantic-release generates the changelog automatically |
The revised template has 4 sections and 4 checklist items (dataset median: 3 sections, 4 items) and stays in the low-friction band (~1–2 minutes to fill out vs. the dataset median of 2–5 minutes). Full design rationale in synthesis/rationale.md.
- Descriptive, not causal. This study identifies what popular projects do, not what works. There is no outcome linkage (merge time, review quality, rework rates). The implicit argument is revealed preference: if 78% of high-fork projects independently converge on the same structure, there is signal in that convergence.
- Survivor bias. Only repositories with ≥100 stars are included. Patterns may not generalize to smaller or newer projects.
- Snapshot timing. BigQuery's `github_repos` dataset reports a last update of September 2025. Templates were fetched live from GitHub, but repository discovery reflects that snapshot.
- LLM extraction. Feature extraction via Gemini may misclassify edge cases. Spot-checking was performed but no formal gold-set evaluation. One template (of 3,747) failed extraction due to a model output-length bug.
- Normalization coverage. The regex-based taxonomy captures 73.7% of raw checklist topics. The remaining 26.3% is dominated by singletons (project-specific language). This means topic prevalence numbers are lower bounds — the true prevalence of concepts like "tests added" is likely higher than reported, since some variant phrasings go uncaptured.
- Template ≠ practice. A template requesting testing evidence does not mean tests are actually written. Templates were analyzed in isolation, without knowledge of enforcement or compliance.
```bash
# Install dependencies
uv sync
# Run the full pipeline (stages must run in order for first-time data generation)
uv run python -m src.fetch_repos # Stage 1: BigQuery repo discovery
uv run python -m src.collect # Stage 2: GitHub GraphQL collection
uv run python -m src.clean # Stage 3: Deduplication & filtering
uv run python -m src.extract # Stage 4: LLM feature extraction
uv run python -m src.normalize # Stage 5: Checklist topic normalization
uv run python -m src.analyze # Stage 6: Statistical analysis
uv run python -m src.synthesize   # Stage 7: Recommendations & exemplars
```

Required environment variables (see .env.example):
| Variable | Required for |
|---|---|
| `BQ_PROJECT` | Stage 1 (BigQuery discovery) |
| `GITHUB_TOKEN` | Stage 2 (GraphQL collection) |
| `GEMINI_API_KEY` | Stage 4 (feature extraction) |
Stages 5–7 require no API keys and run in under 5 seconds total on the pre-generated data.