Taher-Ghaleb/AIAgentsAlignment-MSR2026

Code Change Characteristics and Description Alignment: A Comparative Study of Agentic versus Human Pull Requests

Replication Package for MSR 2026

This repository contains the replication package for the paper "Code Change Characteristics and Description Alignment: A Comparative Study of Agentic versus Human Pull Requests" accepted for publication at the MSR 2026 Conference.

Overview

This study investigates how agentic pull requests (APRs), i.e., PRs authored by AI coding agents, differ from human pull requests (HPRs) in code change characteristics and description quality. We analyze 33,596 agent-generated PRs and 6,618 human PRs to answer two research questions:

  • RQ1: How do APRs and HPRs differ in code change characteristics (files changed, code churn, lines added/removed, and change purposes)?
  • RQ2: How well do APR descriptions and commit messages align with code changes?

Project Structure

.
├── notebooks/
│   ├── RQ1.ipynb          # RQ1: Code change characteristics analysis
│   └── RQ2.ipynb          # RQ2: Description alignment analysis
├── scripts/
│   ├── build_human_pr_commit_details_df.py  # Build human PR commit details
│   └── gen_commit_message.py                # Generate commit messages
├── data/                  # Data directory (see Data Requirements below)
├── plots/                 # Generated plots and visualizations
├── prompts/               # LLM prompts (e.g., LLM-as-judge)
├── pyproject.toml         # Project dependencies
├── uv.lock               # Dependency lock file
└── README.md             # This file

Prerequisites

  • Python 3.12 or higher
  • uv (Python package manager)
  • GitHub API tokens (for building human PR commit details)

Setup

1. Install uv

On macOS:

brew install uv

On Linux/Windows, see: https://github.com/astral-sh/uv

2. Create and activate the virtual environment

uv venv --python 3.12
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

3. Install dependencies

uv sync

4. Configure environment variables (required for building human PR data)

Create a .env file in the project root:

GITHUB_TOKEN_1=your_github_token_1
GITHUB_TOKEN_2=your_github_token_2

Note: GitHub tokens are required to build the human_pr_commit_details_df.parquet file using the build script (see Additional Scripts section).
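The sequentially numbered token variables above can be collected programmatically. A minimal sketch using only the standard library; the `load_github_tokens` helper is illustrative, not part of the repository, and assumes the `GITHUB_TOKEN_N` naming from the `.env` example:

```python
import os

def load_github_tokens(prefix: str = "GITHUB_TOKEN_") -> list[str]:
    """Collect GITHUB_TOKEN_1, GITHUB_TOKEN_2, ... from the environment."""
    tokens = []
    i = 1
    while (token := os.environ.get(f"{prefix}{i}")) is not None:
        tokens.append(token)
        i += 1
    return tokens

# The build script can then rotate between tokens (e.g., with
# itertools.cycle) to spread requests across API rate limits:
#   headers = {"Authorization": f"Bearer {next(token_pool)}"}
```

Multiple tokens help because the GitHub REST API rate-limits authenticated requests per token, so rotating tokens raises the effective throughput of the build script.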

Data Requirements

Required Files

The following data files are required to run the replication notebooks:

From AIDev Dataset

  1. pull_request.parquet - Agent-generated PRs
  2. pr_commits.parquet - PR commits data
  3. pr_commit_details.parquet - Commit-level file change details for APRs
  4. human_pull_request.parquet - Human PRs
  5. human_pr_commit_details_df.parquet - Human PR commit details (must be generated, see Additional Scripts section)
  6. pr_task_type.parquet - Agent PR task type classifications
  7. human_pr_task_type.parquet - Human PR task type classifications
  8. related_issue.parquet - Related issue data

Benchmark Datasets

  1. cleaned_train.csv - PR description and commit message similarity benchmark (train set)
  2. commitbench_test.csv - Commit message similarity benchmark (test set)

Obtaining the Data

These data files are NOT included in this repository. They must be downloaded from the following sources:

  1. AIDev dataset: https://huggingface.co/datasets/hao-li/AIDev
  2. PR-Description benchmark: https://figshare.com/s/58ee9c2a4e9d951305d7?file=46126455
  3. CommitBench dataset: https://huggingface.co/datasets/Maxscha/commitbench

After downloading, place all .parquet and .csv files in the data/ directory:

data/
├── pull_request.parquet
├── pr_commits.parquet
├── pr_commit_details.parquet
├── human_pull_request.parquet
├── human_pr_commit_details_df.parquet  # (must be generated)
├── pr_task_type.parquet
├── human_pr_task_type.parquet
├── related_issue.parquet
├── cleaned_train.csv
└── commitbench_test.csv
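Before launching the notebooks, it is worth confirming the directory is complete. A minimal sketch; the `missing_files` helper is not part of the repository:

```python
from pathlib import Path

# File names taken from the Data Requirements list above.
REQUIRED = [
    "pull_request.parquet",
    "pr_commits.parquet",
    "pr_commit_details.parquet",
    "human_pull_request.parquet",
    "human_pr_commit_details_df.parquet",
    "pr_task_type.parquet",
    "human_pr_task_type.parquet",
    "related_issue.parquet",
    "cleaned_train.csv",
    "commitbench_test.csv",
]

def missing_files(data_dir: str = "data") -> list[str]:
    """Return the required files that are not yet present in data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED if not (root / name).is_file()]
```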

Running the Analysis

RQ1: Code Change Characteristics Analysis

Analyzes how APRs and HPRs differ in:

  • Merge rates and change footprints (commits, files, directories, lines)
  • Symbol churn and symbol lifetime
  • Change purposes (feature, bug fix, documentation, etc.)
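To illustrate the footprint metrics above, here is a per-PR aggregation sketch with pandas on toy data; the column names (`pr_id`, `filename`, `additions`, `deletions`) are assumptions and may differ from the actual AIDev schema:

```python
import pandas as pd

# Toy commit-detail rows; column names are assumptions, not the AIDev schema.
details = pd.DataFrame({
    "pr_id":     [1, 1, 2],
    "filename":  ["a.py", "b.py", "a.py"],
    "additions": [10, 5, 3],
    "deletions": [2, 0, 1],
})

# Footprint metrics per PR: files changed, lines added/removed, and
# churn (additions + deletions).
per_pr = details.groupby("pr_id").agg(
    files_changed=("filename", "nunique"),
    lines_added=("additions", "sum"),
    lines_removed=("deletions", "sum"),
)
per_pr["churn"] = per_pr["lines_added"] + per_pr["lines_removed"]
```

The same groupby pattern extends to directories and commits by adding more named aggregations.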

Open and run the notebook:

jupyter notebook notebooks/RQ1.ipynb

Or using JupyterLab:

jupyter lab notebooks/RQ1.ipynb

RQ2: Description Alignment Analysis

Examines the quality of commit messages and PR descriptions using:

  • PR-Commit Similarity (semantic alignment between PR description and commit messages)
  • Patch-Commit Similarity (alignment between diff and messages)
  • LLM-based Consistency Score (GPT-4o quality rating)
  • Classification models to identify factors predicting strong descriptions
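The similarity metrics above reduce to cosine similarity between text embeddings. A minimal sketch; the `all-MiniLM-L6-v2` model name is an assumption, not necessarily the embedding model used in the paper:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# With sentence-transformers (model choice here is an assumption):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   desc_vec, msg_vec = model.encode([pr_description, commit_message])
#   score = cosine_sim(desc_vec, msg_vec)
```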

Open and run the notebook:

jupyter notebook notebooks/RQ2.ipynb

Or using JupyterLab:

jupyter lab notebooks/RQ2.ipynb

Additional Scripts

Building Human PR Commit Details (Required)

The human_pr_commit_details_df.parquet file must be generated by running this script. This file is required for the analysis notebooks.

First time run:

python scripts/build_human_pr_commit_details_df.py -o data/human_pr_commit_details_df.parquet

Resume from previous run (if interrupted):

python scripts/build_human_pr_commit_details_df.py -o data/human_pr_commit_details_df.parquet --resume

Note: This script requires GitHub API tokens (see Setup section). The script fetches commit details for all human PRs from the AIDev dataset.
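The `--resume` behavior can be approximated as follows; a hypothetical sketch (the `prs_to_fetch` helper and the `pr_id` column name are assumptions, not the script's actual implementation):

```python
from pathlib import Path

import pandas as pd

def prs_to_fetch(all_pr_ids: list[int], out_path: str) -> list[int]:
    """On resume, skip PRs already present in the partial output file."""
    out = Path(out_path)
    if not out.is_file():
        return list(all_pr_ids)                 # fresh run: fetch everything
    done = set(pd.read_parquet(out)["pr_id"])   # 'pr_id' column is assumed
    return [pr for pr in all_pr_ids if pr not in done]
```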

Generating Commit Messages

Generate commit messages using the CodeT5 model:

python scripts/gen_commit_message.py -i data/input.parquet -o data/output.parquet

Note: GPU support is recommended for faster processing. The input parquet file must contain a patch column.
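A minimal sketch of preparing and checking an input file for this script; `validate_input` is a hypothetical helper, not part of the repository:

```python
import pandas as pd

def validate_input(df: pd.DataFrame) -> pd.DataFrame:
    """The generation script expects a 'patch' column holding unified diffs."""
    if "patch" not in df.columns:
        raise ValueError("input parquet must contain a 'patch' column")
    return df

example = pd.DataFrame({
    "patch": ["@@ -1 +1 @@\n-old line\n+new line"],
})
validate_input(example)
# example.to_parquet("data/input.parquet")  # writing requires pyarrow
```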

Results

All analysis results are embedded directly in the Jupyter notebooks (RQ1.ipynb and RQ2.ipynb). Run the notebooks to reproduce all findings from the paper.

Dependencies

Key dependencies (managed via pyproject.toml and uv.lock):

  • pandas - Data manipulation and analysis
  • numpy - Numerical computations
  • scipy - Statistical tests
  • scikit-learn - Machine learning models
  • shap - Model interpretability
  • matplotlib, seaborn - Visualizations
  • sentence-transformers - Text embeddings
  • transformers - LLM fine-tuning and inference
  • pyarrow - Parquet file support

Notes

  • The notebooks include extensive documentation and markdown cells explaining each analysis step.
  • Some analyses (e.g., embedding generation, model inference) are computationally intensive. GPU support is recommended for faster processing but not required.
  • The LLM-as-judge prompt used for RQ2 is available in prompts/lllm_as_judge_prompt.md.

Citation

If you use this replication package, please cite the paper:

@inproceedings{pham2026agentic_codechange,
  title={Code Change Characteristics and Description Alignment: A Comparative Study of Agentic versus Human Pull Requests},
  author={Dung Pham and Taher A. Ghaleb},
  booktitle={Proceedings of the 23rd IEEE/ACM International Conference on Mining Software Repositories (MSR)},
  year={2026}
}

Contact

For questions about this replication package, please contact the authors.

Acknowledgments

This work uses the AIDev dataset by Hao Li et al., available at https://huggingface.co/datasets/hao-li/AIDev.

We also use benchmark datasets from:

  • Tire et al. (PR-Description benchmark)
  • Schall et al. (CommitBench dataset)

Funding

This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC): RGPIN-2025-05897.

About

This is the replication package associated with the paper "Code Change Characteristics and Description Alignment: A Comparative Study of Agentic versus Human Pull Requests", accepted for publication at the MSR 2026 Conference (Mining Challenge).
