# Code Change Characteristics and Description Alignment: A Comparative Study of Agentic versus Human Pull Requests
Replication Package for MSR 2026
This repository contains the replication package for the paper "Code Change Characteristics and Description Alignment: A Comparative Study of Agentic versus Human Pull Requests" accepted for publication at the MSR 2026 Conference.
## Overview

This study investigates how AI coding agents' pull requests (APRs) differ from human pull requests (HPRs) in terms of code change characteristics and description quality. We analyze 33,596 agent-generated PRs and 6,618 human PRs to answer two research questions:
- RQ1: How do APRs and HPRs differ in code change characteristics (files changed, code churn, lines added/removed, and change purposes)?
- RQ2: How well do APR descriptions and commit messages align with code changes?
## Repository Structure

```
.
├── notebooks/
│   ├── RQ1.ipynb                            # RQ1: Code change characteristics analysis
│   └── RQ2.ipynb                            # RQ2: Description alignment analysis
├── scripts/
│   ├── build_human_pr_commit_details_df.py  # Build human PR commit details
│   └── gen_commit_message.py                # Generate commit messages
├── data/                                    # Data directory (see Data Requirements below)
├── plots/                                   # Generated plots and visualizations
├── prompts/                                 # LLM prompts (e.g., LLM-as-judge)
├── pyproject.toml                           # Project dependencies
├── uv.lock                                  # Dependency lock file
└── README.md                                # This file
```
## Requirements

- Python 3.12 or higher
- uv (Python package manager)
- GitHub API tokens (for building human PR commit details)
## Setup

Install uv. On macOS:

```shell
brew install uv
```

On Linux/Windows, see: https://github.com/astral-sh/uv

Create a virtual environment and install dependencies:

```shell
uv venv --python 3.12
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv sync
```

Create a `.env` file in the project root:

```
GITHUB_TOKEN_1=your_github_token_1
GITHUB_TOKEN_2=your_github_token_2
```

Note: GitHub tokens are required to build the `human_pr_commit_details_df.parquet` file using the build script (see Additional Scripts section).
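Multiple tokens are configured because GitHub's API rate limit applies per token. A minimal stdlib sketch of how the numbered `GITHUB_TOKEN_*` variables can be collected and rotated (the build script's actual loading logic may differ; the function name here is illustrative):

```python
import os
from itertools import cycle

def load_github_tokens(prefix: str = "GITHUB_TOKEN_") -> list[str]:
    """Collect GITHUB_TOKEN_1, GITHUB_TOKEN_2, ... from the environment."""
    tokens = []
    i = 1
    while (token := os.environ.get(f"{prefix}{i}")) is not None:
        tokens.append(token)
        i += 1
    return tokens

# Rotate tokens round-robin to spread requests across rate limits:
#   token_pool = cycle(load_github_tokens())
#   next(token_pool)  # token for the next API request
```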
## Data Requirements

The following data files are required to run the replication notebooks:

- `pull_request.parquet` - Agent-generated PRs
- `pr_commits.parquet` - PR commits data
- `pr_commit_details.parquet` - Commit-level file change details for APRs
- `human_pull_request.parquet` - Human PRs
- `human_pr_commit_details_df.parquet` - Human PR commit details (must be generated; see Additional Scripts section)
- `pr_task_type.parquet` - Agent PR task type classifications
- `human_pr_task_type.parquet` - Human PR task type classifications
- `related_issue.parquet` - Related issue data
- `cleaned_train.csv` - PR description and commit message similarity benchmark (train set)
- `commitbench_test.csv` - Commit message similarity benchmark (test set)
These data files are NOT included in this repository. They must be downloaded from the following sources:
- AIDev dataset: https://huggingface.co/datasets/hao-li/AIDev
- PR-Description benchmark: https://figshare.com/s/58ee9c2a4e9d951305d7?file=46126455
- CommitBench dataset: https://huggingface.co/datasets/Maxscha/commitbench
After downloading, place all `.parquet` and `.csv` files in the `data/` directory:

```
data/
├── pull_request.parquet
├── pr_commits.parquet
├── pr_commit_details.parquet
├── human_pull_request.parquet
├── human_pr_commit_details_df.parquet  # (must be generated)
├── pr_task_type.parquet
├── human_pr_task_type.parquet
├── related_issue.parquet
├── cleaned_train.csv
└── commitbench_test.csv
```
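Before running the notebooks, you can confirm the data directory is complete with a small stdlib check (the file list mirrors the one above; the script itself is not part of the replication package):

```python
from pathlib import Path

REQUIRED_FILES = [
    "pull_request.parquet",
    "pr_commits.parquet",
    "pr_commit_details.parquet",
    "human_pull_request.parquet",
    "human_pr_commit_details_df.parquet",
    "pr_task_type.parquet",
    "human_pr_task_type.parquet",
    "related_issue.parquet",
    "cleaned_train.csv",
    "commitbench_test.csv",
]

def missing_data_files(data_dir: str = "data") -> list[str]:
    """Return the required files that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]

if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing data files:", ", ".join(missing))
    else:
        print("All required data files are present.")
```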
## RQ1: Code Change Characteristics (`notebooks/RQ1.ipynb`)

This notebook analyzes how APRs and HPRs differ in:
- Merge rates and change footprints (commits, files, directories, lines)
- Symbol churn and symbol lifetime
- Change purposes (feature, bug fix, documentation, etc.)
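To illustrate the change-footprint metrics, per-file commit records can be aggregated into per-PR counts with pandas. The column names below (`pr_id`, `filename`, `additions`, `deletions`) are assumptions for this sketch, not necessarily the dataset's actual schema:

```python
import pandas as pd

def change_footprint(details: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-file change records into per-PR footprint metrics.

    Assumed columns: pr_id, filename, additions, deletions.
    """
    df = details.assign(
        # Parent directory of each file; top-level files keep their own name.
        directory=details["filename"].str.rsplit("/", n=1).str[0],
        churn=details["additions"] + details["deletions"],
    )
    return df.groupby("pr_id").agg(
        files_changed=("filename", "nunique"),
        directories_touched=("directory", "nunique"),
        lines_added=("additions", "sum"),
        lines_removed=("deletions", "sum"),
        total_churn=("churn", "sum"),
    )
```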
Open and run the notebook:

```shell
jupyter notebook notebooks/RQ1.ipynb
```

Or using JupyterLab:

```shell
jupyter lab notebooks/RQ1.ipynb
```

## RQ2: Description Alignment (`notebooks/RQ2.ipynb`)

This notebook examines the quality of commit messages and PR descriptions using:
- PR-Commit Similarity (semantic alignment between PR description and commit messages)
- Patch-Commit Similarity (alignment between diff and messages)
- LLM-based Consistency Score (GPT-4o quality rating)
- Classification models to identify factors predicting strong descriptions
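The similarity metrics reduce to a cosine similarity between text embeddings. A minimal sketch of the scoring step, using toy vectors in place of real encoder output (the model name in the comment is illustrative, not necessarily the one used in the paper):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# In the real pipeline the vectors come from a sentence encoder, e.g.:
#   model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model name
#   desc_vec, msg_vec = model.encode([pr_description, commit_message])
desc_vec = np.array([0.2, 0.8, 0.1])
msg_vec = np.array([0.25, 0.75, 0.05])
score = cosine_similarity(desc_vec, msg_vec)
```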
Open and run the notebook:

```shell
jupyter notebook notebooks/RQ2.ipynb
```

Or using JupyterLab:

```shell
jupyter lab notebooks/RQ2.ipynb
```

## Additional Scripts

### build_human_pr_commit_details_df.py

The `human_pr_commit_details_df.parquet` file must be generated by running this script. The file is required by the analysis notebooks.
First-time run:

```shell
python scripts/build_human_pr_commit_details_df.py -o data/human_pr_commit_details_df.parquet
```

Resume from a previous run (if interrupted):

```shell
python scripts/build_human_pr_commit_details_df.py -o data/human_pr_commit_details_df.parquet --resume
```

Note: This script requires GitHub API tokens (see Setup section). It fetches commit details for all human PRs from the AIDev dataset.
### gen_commit_message.py

Generate commit messages using the CodeT5 model:

```shell
python scripts/gen_commit_message.py -i data/input.parquet -o data/output.parquet
```

Note: GPU support is recommended for faster processing. The input parquet file must contain a `patch` column.
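A minimal way to prepare an input file for the script; the only schema requirement stated above is a `patch` column (writing parquet requires `pyarrow`, and the diff content is a made-up example):

```python
import pandas as pd

# Build a minimal input table; only the 'patch' column is required.
diffs = [
    "diff --git a/app.py b/app.py\n-print('hi')\n+print('hello')",
]
input_df = pd.DataFrame({"patch": diffs})

# input_df.to_parquet("data/input.parquet")  # requires pyarrow
```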
## Results

All analysis results are embedded directly in the Jupyter notebooks (`RQ1.ipynb` and `RQ2.ipynb`). Run the notebooks to reproduce all findings from the paper.
## Dependencies

Key dependencies (managed via `pyproject.toml` and `uv.lock`):

- `pandas` - Data manipulation and analysis
- `numpy` - Numerical computations
- `scipy` - Statistical tests
- `scikit-learn` - Machine learning models
- `shap` - Model interpretability
- `matplotlib`, `seaborn` - Visualizations
- `sentence-transformers` - Text embeddings
- `transformers` - LLM fine-tuning and inference
- `pyarrow` - Parquet file support
## Notes

- The notebooks include extensive documentation and markdown cells explaining each analysis step.
- Some analyses (e.g., embedding generation, model inference) are computationally intensive. GPU support is recommended for faster processing but not required.
- The LLM-as-judge prompt used for RQ2 is available in `prompts/lllm_as_judge_prompt.md`.
## Citation

If you use this replication package, please cite the paper:

```bibtex
@inproceedings{pham2026agentic_codechange,
  title={Code Change Characteristics and Description Alignment: A Comparative Study of Agentic versus Human Pull Requests},
  author={Dung Pham and Taher A. Ghaleb},
  booktitle={Proceedings of the 23rd IEEE/ACM International Conference on Mining Software Repositories (MSR)},
  year={2026}
}
```

## Contact

For questions about this replication package, please contact:

- Dung Pham: dungpham290198@gmail.com
- Taher A. Ghaleb: taherghaleb@trentu.ca
## Acknowledgments

This work uses the AIDev dataset by Hao Li et al., available at https://huggingface.co/datasets/hao-li/AIDev.
We also use benchmark datasets from:
- Tire et al. (PR-Description benchmark)
- Schall et al. (CommitBench dataset)
This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC): RGPIN-2025-05897.