Scripts and data for the manuscript titled "Evaluating Quantized Large Language Models for Code Generation on Low-Resource Language Benchmarks".
- ".env" contains variables used by the python scripts. Make sure to set the "MODEL_REP_PATH" variable with a path to a directory that contains the locally stored code LLMs.
- "config.json" provides the list of used code LLMs and benchmarks.
- "hfDatasetDownloader.py" downloads and formats the MultiPL-HumanEval, MultiPL-MBPP, and MCEVAL benchmarks from HuggingFace. The download benchmarks are stored inside the "benchmarks" directory.
- "genPipe.py" script that loads code LLMs one by one and applies code generation tasks to them. All generated code is stored inside the "genOutput" directory.
- "evalPipe.py" evaluates the Lua code generated by the code LLMs using several metrics mention in the manuscript. The evaluation results are stored inside the "evalOutput" directory.
- "analysis.R" to analyze the content of the "evalOutput" directory
Each model entry in "config.json" has the following fields:
{
"name": "Unique name for the model.",
"family": "Model family name. Same at all quantization precisions.",
"id": "HuggingFace URI of the model.",
"max_tokens": "Maximum number of tokens to generate.",
"temp": "Temperature at which the model is run.",
"top_k": "Next token sampling rate.",
"eos": "End-Of-Sequence tokens",
"qBits": "Precision. 2, 4, 8 for integer quantization precision and 16 for half-precision floating point.",
"skip": "0 or 1. If 1 the model will be ignored by genPipe.py."
}
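For illustration, a model entry could look like the following (all values are invented examples, not taken from the actual "config.json"):

```json
{
  "name": "codellama-7b-q4",
  "family": "codellama-7b",
  "id": "TheBloke/CodeLlama-7B-GGUF",
  "max_tokens": 512,
  "temp": 0.2,
  "top_k": 50,
  "eos": ["<EOT>"],
  "qBits": 4,
  "skip": 0
}
```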
Each benchmark entry in "config.json" has the following fields:
{
"name": "benchmark name",
"id": "jsonl file with the benchmark",
"sample": "None or integer number. If integer number N then a random sample with N tasks will be used to evaluate the models",
"skip": "0 or 1. If 1 the benchmark will be ignored by genPipe.py."
}
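For illustration, a benchmark entry restricted to a random sample of 50 tasks could look like this (the file name is invented):

```json
{
  "name": "multipl-humaneval-lua",
  "id": "benchmarks/multipl-humaneval-lua.jsonl",
  "sample": 50,
  "skip": 0
}
```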
The Python scripts require the following packages:
- python-dotenv (imported as "dotenv")
- pathlib (part of the Python standard library since 3.4; no separate install needed)
- airium
- pandas
- llama-cpp-python
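Assuming a standard pip setup, the third-party dependencies can be installed with:

```
pip install python-dotenv airium pandas llama-cpp-python
```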
Download the code LLMs and store them locally. The list of code LLMs is available inside the config.json file. Inside the .env file, make sure to set the "MODEL_REP_PATH" variable to the directory path that contains the locally stored code LLMs.
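A minimal ".env" could therefore contain a single line (the path is a placeholder):

```
MODEL_REP_PATH=/path/to/local/model/repository
```

As a rough sketch of how such a locally stored model can be loaded and queried with llama-cpp-python (the GGUF file name, prompt, and parameter values below are assumptions for illustration, not the actual "genPipe.py" logic):

```python
import os

from dotenv import load_dotenv
from llama_cpp import Llama

load_dotenv()  # reads MODEL_REP_PATH from .env
# Hypothetical GGUF file name; real names come from the models listed in config.json.
model_file = os.path.join(os.environ["MODEL_REP_PATH"], "codellama-7b.Q4_K_M.gguf")

llm = Llama(model_path=model_file, n_ctx=2048, verbose=False)
out = llm(
    "-- Lua: complete the function body\nlocal function factorial(n)",
    max_tokens=512,   # corresponds to "max_tokens" in config.json
    temperature=0.2,  # "temp"
    top_k=50,         # "top_k"
    stop=["<EOT>"],   # "eos"
)
print(out["choices"][0]["text"])
```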
Before running this script, ensure that:
- Lua 5.3 or higher is installed
- the path to the Lua executable is added to the PATH environment variable
- the LuaUnit package is installed
- the CLOC tool is installed to count lines of code (e.g., "winget install AlDanial.Cloc" on Windows): https://github.com/AlDanial/cloc
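Assuming LuaRocks is used to manage Lua packages, the prerequisites can be checked and installed from a shell as follows:

```
lua -v                    # confirm the Lua interpreter is on the PATH
luarocks install luaunit  # install the LuaUnit test framework
cloc --version            # confirm CLOC is installed
```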
Run the scripts in the following order:
- hfDatasetDownloader.py
- genPipe.py
- evalPipe.py
- analysis.R
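Assuming the scripts take no required command-line arguments and R is on the PATH, a full run of the pipeline looks like:

```
python hfDatasetDownloader.py   # download and format the benchmarks
python genPipe.py               # generate Lua code with each model
python evalPipe.py              # evaluate the generated code
Rscript analysis.R              # analyze the "evalOutput" directory
```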