This is the repository for the ESEC/FSE 2022 paper "No More Fine-Tuning? An Experimental Evaluation of Prompt Tuning in Code Intelligence".
This repository covers three tasks (defect detection, code summarization, and code translation), all of which are also described in detail in CodeXGLUE.
You can also design and experiment with your own prompt templates :).
First, download the dataset:

```shell
cd dataset
pip install gdown
gdown https://drive.google.com/uc?id=1x6hoF7G-tSYxg8AFybggypLZgMGDNHfF
cd ..
```
We provide both a prompt tuning version and a fine-tuning version.
To prompt-tune CodeBERT, simply run:

```shell
cd defect/prompt
python codebert.py
```
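As a rough illustration, a hard template plus verbalizer like the ones evaluated below can be built with the OpenPrompt library (a minimal sketch, not the exact contents of codebert.py):

```python
from openprompt.plms import load_plm
from openprompt.prompts import ManualTemplate, ManualVerbalizer
from openprompt import PromptForClassification

# CodeBERT is RoBERTa-based, so it loads through OpenPrompt's "roberta" wrapper.
plm, tokenizer, model_config, wrapper_class = load_plm("roberta", "microsoft/codebert-base")

# Hard prompt "the code [x] is [z]": text_a holds the input code,
# and the mask token is the answer slot filled by a label word.
template = ManualTemplate(
    text='the code {"placeholder":"text_a"} is {"mask"}',
    tokenizer=tokenizer,
)

# Verbalizer "bad, defective & clean, perfect": label words for the two classes.
verbalizer = ManualVerbalizer(
    classes=["defective", "clean"],
    label_words={"defective": ["bad", "defective"], "clean": ["clean", "perfect"]},
    tokenizer=tokenizer,
)

prompt_model = PromptForClassification(plm=plm, template=template, verbalizer=verbalizer)
```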
To prompt-tune CodeT5:

```shell
cd defect/prompt
python prompt_t5_2.py --visible_gpu <GPU> --data_dir=../dataset --max_source_length 512 --max_target_length 3
```
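The small --max_target_length of 3 reflects how defect detection is cast as text-to-text generation: CodeT5 generates a short label word rather than predicting a class index. A minimal sketch of that framing with Hugging Face transformers (illustrative only, not the repo's exact code):

```python
from transformers import RobertaTokenizer, T5ForConditionalGeneration

# CodeT5 uses a BPE tokenizer that loads via RobertaTokenizer.
tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

code = "int main() { char buf[8]; gets(buf); return 0; }"
inputs = tokenizer(f"the code {code} is", return_tensors="pt",
                   truncation=True, max_length=512)

# The model decodes a label word such as "defective" or "clean".
out = model.generate(**inputs, max_length=3)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```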
To fine-tune CodeT5, we provide both the official CodeT5 implementation and our own in:

```shell
cd defect/finetune
```
Download the dataset, where {LANG} is one of the six programming languages (ruby, javascript, go, python, java, php):

```shell
cd summarization/data
wget https://s3.amazonaws.com/code-search-net/CodeSearchNet/v2/{LANG}.zip
unzip {LANG}.zip
python preprocess.py
```
To fine-tune or prompt-tune CodeT5 (and try some different templates yourself :)):

```shell
cd summarization
python finetune_t5_gene.py --visible_gpu <GPU> --lang {LANG} --max_source_length 256 --max_target_length 128 --log_name=./log/{LANG}.log
```

For prompt tuning, use prompt_t5.py instead.
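Under the hood, both scripts frame summarization as text-to-text generation with a prompt around the source code. A minimal sketch of one training step (the template wording and names are illustrative, not the repo's exact code):

```python
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-small")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-small")

code = "def add(a, b):\n    return a + b"
summary = "Add two numbers."

# The prompt wraps the code (truncated to max_source_length 256);
# the target is the natural-language summary (max_target_length 128).
src = tokenizer(f"Code: {code} Summarization:", return_tensors="pt",
                truncation=True, max_length=256)
tgt = tokenizer(summary, return_tensors="pt", truncation=True, max_length=128)

loss = model(input_ids=src.input_ids, attention_mask=src.attention_mask,
             labels=tgt.input_ids).loss
loss.backward()  # an optimizer step would follow in the real training loop
```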
Download the dataset from CodeXGLUE:

```shell
cd translation/data
python preprocess.py
```

For fine-tuning and prompt tuning, the commands are similar to those for code summarization.
Accuracy (%) of different prompt templates and verbalizers on defect detection. [x] marks the input code slot, [z] the answer slot, and [SOFT] a trainable soft token; in each verbalizer, & separates the label words of the two classes (defective vs. clean). A sketch of how the soft-token templates translate to code follows the table.

Template | Verbalizer | ACC |
---|---|---|
[x] the code is [z] | bad, defective & clean, perfect | 63.68 |
the code [x] is [z] | bad, defective & clean, perfect | 64.17 |
[x] it is [z] | bad, defective & clean, perfect | 63.98 |
a [z] code [x] | bad, defective & clean, perfect | 63.36 |
the code [x] is [z] | yes & no | 63.08 |
the code [x] is [z] | bad, defective & indefective, perfect | 64.28 |
the code [x] is [z] | bad & perfect | 63.71 |
the code [x] is [z] | bad, defective, insecure & clean, perfect, secure | 63.26 |
the code [x] is [z] | bad, defective, insecure, vulnerable & clean, perfect, secure, invulnerable | 63.10 |
[SOFT] [z] [SOFT] [x] | bad, defective & clean, perfect | 62.95 |
[x] [SOFT]*2 [z] | bad, defective & clean, perfect | 62.77 |
[x] [SOFT]*3 [z] | bad, defective & clean, perfect | 63.15 |
[SOFT]*10 [x] [z] | bad, defective & clean, perfect | 62.52 |
[SOFT]*50 [x] [z] | bad, defective & clean, perfect | 62.96 |
[SOFT]*100 [x] [z] | bad, defective & clean, perfect | 62.46 |
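The [SOFT] templates above replace parts of the hard prompt with trainable soft tokens. In OpenPrompt terms this corresponds to a MixedTemplate, where each {"soft"} slot is a tuned embedding rather than a fixed word (again a sketch under the same assumptions as the snippet above):

```python
from openprompt.plms import load_plm
from openprompt.prompts import MixedTemplate

plm, tokenizer, model_config, wrapper_class = load_plm("roberta", "microsoft/codebert-base")

# "[x] [SOFT]*2 [z]": the input code, two trainable soft tokens, then the answer slot.
soft_template = MixedTemplate(
    model=plm,
    tokenizer=tokenizer,
    text='{"placeholder":"text_a"} {"soft"} {"soft"} {"mask"}',
)
```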
Accuracy (%) of CodeT5 on defect detection, comparing the hard prompt "Defect [X] [Z]" with trainable prefixes of different lengths ("prefix N" prepends N soft prefix tokens; see the sketch after the tables).

CodeT5-small | ACC |
---|---|
Defect [X] [Z] | 63 |
prefix 50 | 62.34 |
prefix 100 | 62.65 |
prefix 150 | 63.52 |
prefix 200 | 63.91 |
prefix 250 | 63.77 |

CodeT5-base | ACC |
---|---|
Defect [X] [Z] | 64.98 |
prefix 50 | 64.59 |
prefix 100 | 64.7 |
prefix 150 | 65.66 |
prefix 200 | 65.82 |
prefix 250 | 65.64 |
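A "prefix N" row prepends N trainable prefix tokens to the input. A minimal sketch, assuming OpenPrompt's PrefixTuningTemplate (the repo's scripts may wire this up differently):

```python
from transformers import RobertaTokenizer, T5ForConditionalGeneration
from openprompt.prompts import PrefixTuningTemplate

tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
plm = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

# num_token is the prefix length, matching the "prefix 50" ... "prefix 250" rows.
template = PrefixTuningTemplate(model=plm, tokenizer=tokenizer, num_token=200)
```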
Code summarization results (BLEU-4) on the six CodeSearchNet languages:

Model | Method | Ruby | JavaScript | Go | Python | Java | PHP | Overall |
---|---|---|---|---|---|---|---|---|
CodeT5-small | Fine-tuning | 13.38 | 14.94 | 21.27 | 17.88 | 18.38 | 24.70 | |
CodeT5-small | Prompt tuning | 13.60 | 15.91 | 22.33 | 18.34 | 20.60 | 26.95 | |
CodeT5-base | Fine-tuning | 13.70 | 15.80 | 22.60 | 17.97 | 19.56 | 25.77 | |
CodeT5-base | Prompt tuning | 14.29 | 16.04 | 23.11 | 18.52 | 19.72 | 27.06 | |
Low-resource code summarization (BLEU-4): columns give the number of training instances (100 to 1000) or the fraction of the full training set (1%), and "+PT" denotes prompt tuning.

Python | 100 | 200 | 300 | 500 | 1000 | 1% |
---|---|---|---|---|---|---|
CodeT5-small | 5.42 | 7.62 | 7.89 | 11.58 | 13.23 | 14.01 |
CodeT5-small+PT | 6.55 | 9.28 | 9.6 | 12.73 | 13.89 | 14.33 |
CodeT5-base | 5.8 | 8.46 | 9.36 | 13.58 | 13.86 | 14.22 |
CodeT5-base+PT | 7.82 | 10.78 | 12.63 | 14.77 | 14.78 | 14.81 |

Ruby | 100 | 200 | 300 | 500 | 1000 | 1% |
---|---|---|---|---|---|---|
CodeT5-small | 4.82 | 6.75 | 7.22 | 9.46 | 9.85 | 9.99 |
CodeT5-small+PT | 6.48 | 7.89 | 8.26 | 10.89 | 10.91 | 10.85 |
CodeT5-base | 4.93 | 6.83 | 7.19 | 10.1 | 11.22 | 10.36 |
CodeT5-base+PT | 6.99 | 8.52 | 9.41 | 10.79 | 11.87 | 10.64 |

PHP | 100 | 200 | 300 | 500 | 1000 | 1% |
---|---|---|---|---|---|---|
CodeT5-small | 6.41 | 9.5 | 11.89 | 13.21 | 16.71 | 17.25 |
CodeT5-small+PT | 7.9 | 12.23 | 14.13 | 16.26 | 17.47 | 17.88 |
CodeT5-base | 5.52 | 8.9 | 12.83 | 15.59 | 17.65 | 20.65 |
CodeT5-base+PT | 9.12 | 13.55 | 14.94 | 17.39 | 18.3 | 21.05 |

Go | 100 | 200 | 300 | 500 | 1000 | 1% |
---|---|---|---|---|---|---|
CodeT5-small | 5.24 | 7.18 | 8.65 | 12.99 | 15.05 | 17.65 |
CodeT5-small+PT | 7.2 | 11.51 | 12.42 | 14.32 | 16.88 | 17.95 |
CodeT5-base | 7.96 | 9.64 | 10.88 | 13.62 | 16.93 | 19.99 |
CodeT5-base+PT | 9.07 | 12.15 | 13.66 | 15.04 | 17.74 | 20.54 |

Java | 100 | 200 | 300 | 500 | 1000 | 1% |
---|---|---|---|---|---|---|
CodeT5-small | 2.7 | 3.86 | 5.33 | 6.94 | 7.88 | 10.12 |
CodeT5-small+PT | 3.56 | 5.89 | 7.35 | 9.9 | 10.44 | 11.18 |
CodeT5-base | 3.35 | 4.73 | 7.24 | 8.32 | 10.94 | 11.75 |
CodeT5-base+PT | 6.07 | 7.56 | 10.14 | 11.06 | 11.99 | 12.4 |

JavaScript | 100 | 200 | 300 | 500 | 1000 | 1% |
---|---|---|---|---|---|---|
CodeT5-small | 3.56 | 5.48 | 6.97 | 7.73 | 8.36 | 9.81 |
CodeT5-small+PT | 5.9 | 7.58 | 8.76 | 9.6 | 10.14 | 11.58 |
CodeT5-base | 4.14 | 5.6 | 7.07 | 10 | 10.62 | 11.53 |
CodeT5-base+PT | 6.5 | 8.37 | 9.61 | 11.27 | 11.81 | 12.17 |
Code translation results on CodeXGLUE (the first three metric columns are for C# → Java, the last three for Java → C#):

Model | Method | BLEU | Accuracy | CodeBLEU | BLEU | Accuracy | CodeBLEU |
---|---|---|---|---|---|---|---|
Naive copy | - | 18.69 | 0 | - | 18.54 | 0 | - |
Transformer | - | 50.47 | 37.90 | 61.59 | 55.84 | 33.00 | 63.74 |
RoBERTa (code) | - | 71.99 | 57.90 | 80.18 | 77.46 | 56.10 | 83.07 |
CodeBERT | - | 72.14 | 58.00 | 79.41 | 79.92 | 59.00 | 85.10 |
CodeT5-small | Fine-tuning | 78.67 | 65.40 | 82.55 | 82.29 | 63.80 | 87.01 |
CodeT5-small | Prompt tuning | 79.59 | 66.00 | 83.06 | 83.33 | 64.30 | 87.99 |
CodeT5-base | Fine-tuning | 79.45 | 66.10 | 83.96 | 83.61 | 65.30 | 88.32 |
CodeT5-base | Prompt tuning | 79.76 | 66.10 | 84.39 | 83.99 | 65.40 | 88.74 |