tomasmajercik/ai-code-completion-evaluation

Evaluation of AI code completion on my own projects

📝 Project description

This project was created as an application task for an internship at JetBrains. The task was to find a model that generates a hidden part of code taken from my own project(s). I was asked to create my own dataset, which I did with the help of an automated Python script, as doing it manually was not feasible. After that, I used a model from the Hugging Face Hub, prompted to generate the hidden middle part based on the content before and after it. To finish the task, I manually went through the generated code and checked whether each completion was an exact match (generated code identical to the hidden part), different but still working, or garbage code unable to work properly.
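The dataset-creation idea described above can be sketched as follows. This is a minimal illustration, not the actual generate_dataset.py: it assumes a random span of fixed length is hidden from each source file, and the function name and parameters are hypothetical.

```python
import random

def split_code(source: str, middle_len: int = 20, seed: int = 0) -> dict:
    """Split a source file into prefix / hidden middle / suffix.

    Sketch only: the real generate_dataset.py may choose spans differently.
    """
    rng = random.Random(seed)
    start = rng.randrange(0, max(1, len(source) - middle_len))
    return {
        "prefix": source[:start],
        "middle": source[start:start + middle_len],
        "suffix": source[start + middle_len:],
    }

record = split_code("<?php\nsession_start();\n$conn = mysqli_connect('localhost');\n")
# The three pieces always reconstruct the original file.
assert record["prefix"] + record["middle"] + record["suffix"] == \
    "<?php\nsession_start();\n$conn = mysqli_connect('localhost');\n"
```

The model is then shown only `prefix` and `suffix`, and its output is compared against `middle`.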

🛠️ Tech stack

  • Python
  • Huggingface

🌱 Skills gained & problems overcome

I gained hands-on experience with Hugging Face: finding and using suitable models, running simple evaluations, and preparing a dataset.

📊 Shortened example results

[
    // Correct output
    { 
        "prefix": "...cation/json; charset=UTF-8\");\n    session_start",
        "suffix": " \"root\", \"\", \"webTask\" ...",
        "real_middle": "();\n\n    $connection = mysqli_connect(\"localhost\",",
        "generated_middle": "();\n\n    $connection = mysqli_connect(\"localhost\",",
        "exact_match": true,
        "chrf": 100.0,
        "levenshtein_distance": 0,
        "label": "is_correct"
    },
    // Incorrect output
    {
        "prefix": "...$query = \"SELECT `ta",
        "suffix": "result = mysqli_query($connection, $query);\n ...",
        "real_middle": "g` FROM `tags` WHERE `username` = '$username'\";\n        $",
        "generated_middle": "g` FROM `tags` WHERE `fileName` = '$fileName'\";\n        $",
        "exact_match": false,
        "chrf": 62.37266202285458,
        "levenshtein_distance": 10,
        "label": "will_not_work"
    }
]
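The `levenshtein_distance` field in these records counts single-character edits between the real and generated middle. A pure-Python sketch of the metric (the project itself uses the python-Levenshtein package):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (Wagner-Fischer, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# "username" -> "fileName" takes 5 substitutions; the incorrect record
# above scores 10 because the identifier appears twice in the query.
assert levenshtein("username", "fileName") == 5
```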

⚙️ How to install

  1. Clone the repo.
  2. Create your own dataset(s) using the provided Python scripts, or feel free to use mine.
  3. Run the main script, save the results and look at the generated output.

Project dependencies:

  • pip install transformers
  • pip install evaluate
  • pip install python-Levenshtein
  • pip install sacrebleu

📂 File structure:

  • /data_files_vI/II/III contain files from three of my recent projects. These are the data used for dataset creation.
  • /datasets contains randomly selected parts of the code to be completed by the model. Each dataset holds 20 to 50 records with a prefix, middle and suffix (the code before the hidden part, the hidden code itself, and the remaining code).
  • /evaluation contains main.py, a script that runs the AI model (bigcode/starcoder2-3b) from the Hugging Face Hub to generate the hidden part and produces a .json file containing:
    • prefix
    • suffix
    • original middle (the hidden part)
    • generated middle (the model's completion of the part hidden from it)
  • and the computed metrics:
    • exact_match
    • chrf
    • levenshtein_distance
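For context, fill-in-the-middle models like the StarCoder family are typically prompted with sentinel tokens that mark where the prefix and suffix end, and the completion is generated after the middle sentinel. The exact token strings below are an assumption; check the bigcode/starcoder2-3b tokenizer before relying on them.

```python
# Sentinel tokens assumed for the StarCoder family (verify against the
# actual tokenizer's special tokens).
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a fill-in-the-middle prompt: the model sees the code
    before and after the gap, then generates the hidden middle."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

prompt = build_fim_prompt("session_start", ' "root", "", "webTask"')
```

The generated text that follows the middle sentinel is what gets compared against the real hidden middle.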

The remaining files are:

  • generate_dataset.py is the script that generates the datasets
  • load_dataset.py is a script that loads a dataset (for testing purposes only)
  • report.pdf is a report describing my thought process, findings and learnings

Model results are accessible in resultingDataset-wAnotations.json.
The "label" field says whether the generated code is an exact match (and therefore certainly correct), changed but still capable of working, or wrong and bound to lead to an error.
