qLMM-Lua-Eval-Pipeline

Script and data for the manuscript titled "Evaluating Quantized Large Language Models for Code Generation on Low-Resource Language Benchmarks".

Files

  • ".env" contains variables used by the python scripts. Make sure to set the "MODEL_REP_PATH" variable with a path to a directory that contains the locally stored code LLMs.
  • "config.json" provides the list of used code LLMs and benchmarks.
  • "hfDatasetDownloader.py" downloads and formats the MultiPL-HumanEval, MultiPL-MBPP, and MCEVAL benchmarks from HuggingFace. The download benchmarks are stored inside the "benchmarks" directory.
  • "genPipe.py" script that loads code LLMs one by one and applies code generation tasks to them. All generated code is stored inside the "genOutput" directory.
  • "evalPipe.py" evaluates the Lua code generated by the code LLMs using several metrics mention in the manuscript. The evaluation results are stored inside the "evalOutput" directory.
  • "analysis.R" to analyze the content of the "evalOutput" directory

config.json structure

Model entry:

{
  "name": "Unique name for the model.",
  "family": "Model family name. The same across all quantization precisions.",
  "id": "HuggingFace URI of the model.",
  "max_tokens": "Maximum number of tokens to generate.",
  "temp": "Temperature at which the model is run.",
  "top_k": "Number of highest-probability tokens considered when sampling the next token.",
  "eos": "End-of-sequence tokens.",
  "qBits": "Precision: 2, 4, or 8 for integer quantization precision, 16 for half-precision floating point.",
  "skip": "0 or 1. If 1, the model is ignored by genPipe.py."
}

Benchmark entry:

{
  "name": "Benchmark name.",
  "id": "jsonl file with the benchmark.",
  "sample": "None or an integer N. If an integer N is given, a random sample of N tasks is used to evaluate the models.",
  "skip": "0 or 1. If 1, the benchmark is ignored by genPipe.py."
}
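
For illustration only, a model entry and a benchmark entry filled in with hypothetical values might look like the following. The model name, HuggingFace URI, file name, and parameter values are placeholders rather than entries from the actual config.json, and the exact encoding of the "eos" and "sample" fields is an assumption:

{
  "name": "examplecoder-7b-q4",
  "family": "examplecoder-7b",
  "id": "example-org/ExampleCoder-7B-GGUF",
  "max_tokens": 512,
  "temp": 0.2,
  "top_k": 50,
  "eos": ["<|endoftext|>"],
  "qBits": 4,
  "skip": 0
}

{
  "name": "humaneval-lua",
  "id": "benchmarks/humaneval-lua.jsonl",
  "sample": "None",
  "skip": 0
}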

Setting the environment

The Python scripts require the following packages:

  • dotenv
  • pathlib
  • airium
  • pandas
  • llama-cpp-python
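
The repository does not specify an installation method; assuming a standard pip-based setup, the third-party packages can usually be installed with a command such as:

pip install python-dotenv airium pandas llama-cpp-python

The "dotenv" dependency is typically provided by the python-dotenv package, and "pathlib" is part of the Python 3 standard library, so it normally does not need to be installed separately.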

Download the code LLMs and store them locally. The list of code LLMs is available inside the config.json file. Inside the .env file, set the "MODEL_REP_PATH" variable to the path of the directory that contains the locally stored code LLMs. Make sure these steps are completed before running the scripts listed below.
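
As a minimal sketch, the .env file could contain a single line such as the following, where the path is a placeholder for the actual model directory:

MODEL_REP_PATH=/path/to/local/model/repository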

Order of execution

  1. hfDatasetDownloader.py
  2. genPipe.py
  3. evalPipe.py
  4. analysis.R
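
Assuming the scripts are run from the repository root with a Python interpreter and Rscript available on the PATH, the full pipeline can be executed step by step as:

python hfDatasetDownloader.py
python genPipe.py
python evalPipe.py
Rscript analysis.R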
