This repo contains the code for Rewriting the Code: A Simple Framework for Large Language Model Augmented Semantic Code Search, accepted to ACL 2024. This codebase provides instructions for reproducing the results reported in the paper. We hope this work will be useful for future research on the Generation-Augmented Retrieval (GAR) framework for code search.
To set up the environment, run:
conda create -n ReCo python=3.8 -y
conda activate ReCo
conda install pytorch-gpu=1.7.1 -y
pip install transformers datasets tqdm tree-sitter openai fairscale fire sentencepiece backoff edit_distance pyserini
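After installation, a quick sanity check can confirm that every package above is importable in the active environment. This is a minimal sketch, not part of this repository: the helper name `find_missing` is hypothetical, and the mapping from install names to import names (e.g. `tree-sitter` imports as `tree_sitter`, `pytorch-gpu` as `torch`) is an assumption. It only looks modules up with `importlib.util.find_spec` rather than importing them, so it stays fast and has no side effects.

```python
"""Hypothetical helper (not part of this repo): check that the installed packages are findable."""
import importlib.util
import sys

# Assumed import names for the packages installed above
# (e.g. pip "tree-sitter" -> "tree_sitter", conda "pytorch-gpu" -> "torch").
REQUIRED_MODULES = [
    "torch", "transformers", "datasets", "tqdm", "tree_sitter", "openai",
    "fairscale", "fire", "sentencepiece", "backoff", "edit_distance", "pyserini",
]


def find_missing(modules):
    """Return the names in `modules` that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The environment above is created with Python 3.8; warn on a mismatch.
    if sys.version_info[:2] != (3, 8):
        print(f"Warning: expected Python 3.8, found {sys.version.split()[0]}")
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules found.")
```

Save it under any name (for example, a hypothetical `check_env.py`) and run it with `python check_env.py` inside the activated ReCo environment; any module it reports as missing should be reinstalled before running the experiments.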
For detailed information about the data used in our experiments, please refer to README.md in ./data.
For detailed information about ReCo and GAR as described in our paper, please refer to README.md in ./ReCo.
For detailed information about the Code Style Distance used in our paper, please refer to README.md in ./metrics.
If you find this repository useful, please consider citing:
@article{li2024rewriting,
title={Rewriting the Code: A Simple Method for Large Language Model Augmented Code Search},
author={Li, Haochen and Zhou, Xin and Shen, Zhiqi},
journal={arXiv preprint arXiv:2401.04514},
year={2024}
}