A method to evaluate the response of lightweight LLMs to TRUE-FALSE questions across languages
The supported natural languages are those reported to be among the best-performing ones in Google Translate:
- English
- Afrikaans
- German
- Portuguese
- Spanish
- Polish
Model | Parameters |
---|---|
llama-3.2-3b-instruct-q8_0 | 3.21 B |
Phi-3.5-mini-instruct.Q8_0 | 3.82 B |
- Dataset: BoolQ (https://github.com/google-research-datasets/boolean-questions)
- Train set: 9,427 labeled examples.
- Dev set: 3,270 labeled examples.
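
As an illustration of how the dataset above could be consumed, the following is a minimal sketch, not this project's actual code. It assumes the BoolQ files are JSON Lines with `question`, `passage`, and `answer` fields; the helper names `parse_boolq_lines`, `load_boolq`, and `build_prompt`, as well as the prompt wording, are hypothetical.

```python
import json
from pathlib import Path


def parse_boolq_lines(lines):
    """Parse an iterable of JSON Lines strings into a list of dicts."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))
    return records


def load_boolq(path):
    """Read a BoolQ-style .jsonl file (assumed layout) from disk."""
    with Path(path).open(encoding="utf-8") as handle:
        return parse_boolq_lines(handle)


def build_prompt(record):
    """Build a TRUE/FALSE prompt. The template wording is an assumption."""
    return (
        f"Passage: {record['passage']}\n"
        f"Question: {record['question']}\n"
        "Answer with TRUE or FALSE only.\n"
        "Answer:"
    )
```

For the other supported languages, the same template could be filled with translated passages and questions; how the translation is produced is outside the scope of this sketch.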
- llama-cpp-python
- pathlib
- pandas
- math
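
To show how a model's answers might be scored, here is a hedged sketch rather than the method actually used in this project. It assumes the model's reply is available as a plain string (for example, the generated text returned by llama-cpp-python) and that `math` is used for a simple standard error of the accuracy; both are assumptions. The function names `parse_answer` and `accuracy_with_error` are hypothetical.

```python
import math


def parse_answer(response):
    """Map a free-text reply to True, False, or None if unparseable."""
    text = response.strip().lower()
    if text.startswith("true"):
        return True
    if text.startswith("false"):
        return False
    return None


def accuracy_with_error(predictions, labels):
    """Return (accuracy, standard error); unparseable replies count as wrong."""
    n = len(labels)
    if n == 0 or n != len(predictions):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p is not None and p == y for p, y in zip(predictions, labels))
    acc = correct / n
    # Binomial standard error of a proportion.
    stderr = math.sqrt(acc * (1 - acc) / n)
    return acc, stderr
```

Counting unparseable replies as wrong is one possible design choice; reporting them separately would make it easier to tell formatting failures from genuinely incorrect answers, which may matter more in the non-English languages.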
This research was carried out using computing resources provided by Wikimedia Switzerland.