This is the replication package accompanying our paper, Curating Review Comments for Improved Code Review Automation.
The datasets of this paper are available on Zenodo and Hugging Face 🤗.
Note
[11-02-2025] 🤗 CuREV is now available on Hugging Face! You can access it here.
[05-02-2025] 🔥 We release the first version of CuREV and the complete replication package of our paper.
[12-01-2025] 📢 Our paper has been accepted for MSR 2025! Read it on arXiv.
We propose a methodology to curate a code review dataset to enhance its quality and improve the performance of language models on code review downstream tasks, namely comment generation and code refinement.
Overview of our curation pipeline for code reviews to create the CuREV dataset (see our paper for more details).
The main contributions of this work are threefold: (1) a data-centric evaluation framework, (2) a curation pipeline to improve the quality of review comments, and (3) an evaluation of the curated dataset, compared to the original, on downstream tasks (i.e., comment generation and code refinement).
- Project structure
- Environment setup
- Data
- Models
- 1- A Data-Centric Evaluation Framework
- 2- CuREV: a Curated Dataset for Code Review
- 3-a. Comment Generation
- 3-b. Code Refinement
- Contributors
- Citation
The project is structured as follows.
.
├── code_refinement/      # Code refinement package
├── comment_generation/   # Comment generation package
├── quality_assessment/   # Empirical study package
├── data_curation/        # Dataset curation package
├── util/                 # Package for helpers and config
├── data/                 # Folder for datasets and results
├── models/               # Folder for large language models
└── requirements.txt      # Required Python libraries
There are two ways to set up the environment to run the project and use the scripts. Choose the method that best suits your workflow: a Python virtual environment or Docker.
This method involves setting up a Python virtual environment to isolate the dependencies required by the project. Follow the steps below to create the environment, install the necessary libraries, and verify the installation.
To facilitate usage and results replication, we include a requirements.txt file to install the required Python libraries.
Here are the instructions to create a virtual environment, activate it, and install dependencies using the provided requirements.txt file:
- Create a Virtual Environment
  Run the following command to create a virtual environment named venv:
  python3 -m venv venv
- Activate the Virtual Environment
  - On macOS/Linux:
    source venv/bin/activate
  - On Windows:
    .\venv\Scripts\activate
- Install Dependencies
  With the virtual environment activated, install the required Python libraries from requirements.txt:
  pip install -r requirements.txt
- Verify the Installation
  To confirm that all dependencies are installed correctly, run:
  pip list
- Deactivating the Environment
  When you're finished, you can deactivate the virtual environment with:
  deactivate
Alternatively, you can run the project within a Docker container, which makes it easy to replicate the setup in an isolated environment.
- Docker installed on your system. You can download it from here.
- Build the Docker image
  Open a terminal in the root directory of the project and run the following command to build the Docker image:
  docker build -t curev .
  This will create a Docker image named curev based on the provided Dockerfile.
- Create the Docker container
  After building the image, you can create a Docker container using:
  docker run -it --name curev-container curev
  The above command should be run once (the first time) to create a container from the built image. For subsequent runs, use the following commands to start and connect to the created container:
  docker start curev-container
  docker exec -it curev-container bash
  To stop the container:
  docker stop curev-container
  The project is now set up to run within a Docker container, and you can easily start, connect to, and stop it as needed.
- Verify the Installation
  To confirm that all dependencies are installed correctly, run:
  pip list
The original code review dataset is available on Zenodo.
To run the experiments, you need to download Code_Refinement.zip and place the dataset under the data/ folder.
You can use the utility method create_HFdataset in util.dataset to merge the downloaded JSONL files into a Hugging Face dataset, as sketched below.
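A minimal sketch of that merging step is shown below; the argument names and the save location are assumptions, so check util/dataset.py for the actual signature of create_HFdataset:

```python
# Illustrative sketch only: the argument names below are assumptions;
# check util/dataset.py for the actual signature of create_HFdataset.
from util.dataset import create_HFdataset

create_HFdataset(
    data_dir="data/Code_Refinement/",              # folder with the downloaded JSONL files
    output_path="data/Code_Refinement/CRdataset",  # dataset path expected by the scripts below
)
```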
We run Llama-3.1-70B on our local machines using ExLlamaV2 to generate accurate judgments based on our defined evaluation framework.
You can use the same model, or download a quantized version of any other model that is compatible with ExLlamaV2.
The downloaded model should be placed under the models/ folder.
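For example, a quantized checkpoint can be fetched from the Hugging Face Hub with huggingface_hub; the repository id below is an assumption (any ExLlamaV2-compatible quantization works), and the local directory matches the --model_dir used in the commands that follow:

```python
# Hypothetical example: download a quantized Llama-3.1-70B-Instruct build into models/.
# The repo_id is an assumption; substitute any ExLlamaV2-compatible quantized model.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4",  # assumed repository id
    local_dir="models/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4",        # matches --model_dir below
)
```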
We propose an evaluation framework to categorize and assess the quality of code reviews. It consists of (1) a categorization scheme to classify the type, nature, and civility of code review comments, and (2) scoring criteria to assess the overall quality of code reviews based on their relevance, clarity, and conciseness. We apply our evaluation framework to the largest existing dataset of code reviews. Given the scale of the dataset, we utilize a large language model (LLM) as a judge to automatically annotate samples with thoroughly designed prompts to ensure reliable and consistent annotations.
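For illustration, each judged sample can be thought of as a record like the one sketched below; the field names, category values, and score scale are hypothetical and only reflect the framework's description (type, nature, civility, plus relevance, clarity, and conciseness scores), not the exact schema used in the code:

```python
# Hypothetical sketch of one LLM-judge annotation. Field names, category values,
# and the score scale are assumptions; see quality_assessment/ for the actual schema.
from dataclasses import dataclass

@dataclass
class ReviewJudgment:
    comment_id: str
    type: str         # category of the comment (e.g., refactoring, bug, documentation)
    nature: str       # e.g., prescriptive vs. descriptive feedback
    civility: str     # e.g., "civil" or "uncivil"
    relevance: int    # quality score (assumed integer scale)
    clarity: int      # quality score (assumed integer scale)
    conciseness: int  # quality score (assumed integer scale)
```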
The experiments conducted for this contribution are available under the quality_assessment/ folder.
To run the LLM judgments:
python quality_assessment/inference.py \
--model_dir="models/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4/" \
--dataset_path="data/Code_Refinement/CRdataset" \
--save_steps=5000
The full list of arguments is available in util/config.py.
The experiments conducted for this contribution are available under the data_curation/ folder.
To run the experiments for reformulating review comments:
python reformulate_reviews/inference.py \
--model_dir="models/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4/" \
--dataset_path="data/Code_Refinement/CRdataset" \
--output_path="data/eval_results/reform_results.jsonl" \
--save_steps=5000
The full list of arguments is available in util/config.py.
The experiments conducted for this contribution are available under the comment_generation/ folder.
- To train a language model on comment generation on the original dataset:
python comment_generation/sft_init.py \
--model_name_or_path="deepseek-ai/deepseek-coder-6.7b-instruct" \
--dataset_path="data/Code_Refinement/CRdataset" \
--save_steps=200 \
--checkpoint_path="models/comment_generation/init_ckpts" \
--output_path="models/comment_generation/final_model"
- To train a language model on comment generation on the curated dataset:
python comment_generation/sft_cur.py \
--model_name_or_path="deepseek-ai/deepseek-coder-6.7b-instruct" \
--dataset_path="data/Code_Refinement/CRdataset_reform" \
--save_steps=200 \
--checkpoint_path="models/comment_generation/init_ckpts" \
--output_path="models/comment_generation/final_model"
- To run the inference on the initial or curated dataset:
python comment_generation/hf_inference-init.py
python comment_generation/hf_inference-cur.py
- To run the evaluation of both models:
python comment_generation/evaluation.py
- The full list of arguments is available in util/config.py.
The experiments conducted for this contribution are available under the code_refinement/ folder.
- To run the inference of a model for code refinement on the initial dataset:
python code_refinement/hf_inference-init.py \
--model_name_or_path="deepseek-ai/deepseek-coder-6.7b-instruct" \
--dataset_path="data/Code_Refinement/CRdataset" \
--save_steps=1000 \
--output_path="models/init_coderef_results.jsonl"
- To run the inference of a model for code refinement on the curated dataset:
python code_refinement/hf_inference-cur.py \
--model_name_or_path="deepseek-ai/deepseek-coder-6.7b-instruct" \
--dataset_path="data/Code_Refinement/CRdataset_reform" \
--save_steps=1000 \
--output_path="models/cur_coderef_results.jsonl"
- To run the evaluation of both models:
python code_refinement/evaluate.py
- The full list of arguments is available in util/config.py.
For questions, collaborations, or further discussion, please feel free to reach out to any of our contributors via email or open an issue on GitHub.
| Name | Contact | Github |
|---|---|---|
| Oussama Ben Sghaier | oussama.ben.sghaier@umontreal.ca | OussamaSghaier |
| Martin Weyssow | mweyssow@smu.edu.sg | martin-wey |
| Houari Sahraoui | sahraouh@iro.umontreal.ca | |
@misc{sghaier2025harnessinglargelanguagemodels,
title={Harnessing Large Language Models for Curated Code Reviews},
author={Oussama Ben Sghaier and Martin Weyssow and Houari Sahraoui},
year={2025},
eprint={2502.03425},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2502.03425},
}