Artifact for HinglishEval: Evaluating the Effectiveness of Code-generation Models on Hinglish Prompts.
We used GPT-4 to translate the prompts in OpenAI's HumanEval benchmark to Hinglish and manually verified and corrected these translations. Our benchmark, HinglishEval, is available as a JSON file (HinglishEval.json) in this repository.
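The benchmark can be loaded with standard JSON tooling. The sketch below is a minimal example that assumes the file contains a list of problems with HumanEval-style fields such as task_id and prompt; adjust it to the actual schema of HinglishEval.json.

```python
import json

# Minimal loading sketch; the exact schema of HinglishEval.json may differ
# (e.g. a dict keyed by task_id instead of a list of problem objects).
with open("HinglishEval.json", encoding="utf-8") as f:
    problems = json.load(f)

for problem in problems[:3]:
    # Assumed HumanEval-style fields: "task_id" and "prompt", where the
    # prompt holds the function signature plus the Hinglish docstring.
    print(problem["task_id"])
    print(problem["prompt"])
```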
Hindi is one of the most widely spoken languages in the world, and the most widely spoken language in India. A majority of the population in India does not speak English as their first language, so language models that can understand prompts in native languages are important for wider accessibility. Hinglish is a blend of Hindi and English, with frequent use of English words in sentences that follow standard Hindi grammar. This is not representative of everyday spoken Hindi for most people, but it is common in conversations involving technical language, especially in the context of programming.
It is therefore most natural for Hindi-speaking users to prompt LLMs in Hinglish when they want to generate code or ask for help with programming in general (such as explanations or debugging). This benchmark is an attempt to understand how well LLMs can understand and generate code when prompted in such a language.
The HinglishEval benchmark contains all the problems in the HumanEval benchmark, with their prompts translated to Hinglish. The translation does not modify function signatures or doctests, and is limited to the purpose statement (supplied as a docstring in Python) of each function. The translations were manually verified and corrected to ensure that they sound like idiomatic Hinglish.
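For illustration, a translated prompt looks roughly like the following. This is a hypothetical rendering of HumanEval/0 (has_close_elements), not the exact text in HinglishEval.json; the signature and doctests are left untouched and only the purpose statement is in Hinglish.

```python
from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check karo ki diye gaye list of numbers mein koi do numbers
    ek doosre ke itne kareeb hain ki unka difference threshold se kam ho.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """
```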
We have publicly released completions generated by 18 models on the prompts in the HinglishEval benchmark.
We evaluate models on the HinglishEval dataset using the pass@1 metric as well as Item Response Theory (IRT).
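For reference, pass@1 is typically computed with the unbiased pass@k estimator of Chen et al. (2021). The sketch below is a generic reference implementation of that estimator (for k=1 it reduces to the fraction of passing samples per problem), not necessarily the exact evaluation script used for these results.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: total completions sampled for a problem
    c: completions that pass all tests
    k: evaluation budget (k=1 for pass@1)
    """
    if n - c < k:
        return 1.0
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 20 samples with 7 passing -> pass@1 = 7/20 = 0.35
print(pass_at_k(n=20, c=7, k=1))
```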