A method to evaluate the responses of lightweight LLMs to true/false questions
Model | Parameters |
---|---|
llama-3.2-1b-instruct-q8_0 | 1.24 B |
llama-3.2-3b-instruct-q8_0 | 3.21 B |
Phi-3.5-mini-instruct.Q8_0 | 3.82 B |
Mistral-7B-Instruct-v0.3.Q8_0 | 7.25 B |
llama-3.2-8b-instruct-q8_0 | 8.03 B |
- Dataset: BoolQ (https://github.com/google-research-datasets/boolean-questions)
- Training set: 9427 labeled examples.
- Dev set: 3270 labeled examples.
- llama-cpp-python
- pathlib
- pandas
- math
- jsonlines
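
The scoring step that these pieces suggest could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the function names (`parse_true_false`, `build_prompt`, `accuracy`), the prompt wording, and the two sample records are hypothetical. The records follow the BoolQ JSONL layout (`question`, `passage`, boolean `answer`). The standard `json` module stands in for `jsonlines`, and a stub replaces the llama-cpp-python model call, so the sketch runs without a model file.

```python
import json
import string
from typing import Callable, Iterable, Optional


def parse_true_false(response: str) -> Optional[bool]:
    """Map a free-text model reply to True/False, or None if it is unclear."""
    words = response.lower().split()
    if not words:
        return None
    # Only the first word is inspected, with surrounding punctuation removed.
    first = words[0].strip(string.punctuation)
    if first in ("true", "yes"):
        return True
    if first in ("false", "no"):
        return False
    return None


def build_prompt(passage: str, question: str) -> str:
    """Hypothetical prompt wording; the real prompt is not given in the source."""
    return (
        f"Passage: {passage}\n"
        f"Question: {question}?\n"
        "Answer with TRUE or FALSE only."
    )


def accuracy(records: Iterable[dict], answer_fn: Callable[[str], str]) -> float:
    """Fraction of records where the parsed reply matches the gold label.

    Replies that cannot be parsed count as wrong.
    """
    correct = 0
    total = 0
    for rec in records:
        reply = answer_fn(build_prompt(rec["passage"], rec["question"]))
        predicted = parse_true_false(reply)
        total += 1
        if predicted is not None and predicted == rec["answer"]:
            correct += 1
    return correct / total if total else 0.0


# Made-up records in the BoolQ JSONL layout (not taken from the dataset).
sample_lines = [
    '{"question": "is the sky blue", "passage": "The sky appears blue.", "answer": true}',
    '{"question": "is fire cold", "passage": "Fire is hot.", "answer": false}',
]
records = [json.loads(line) for line in sample_lines]


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call; always answers TRUE."""
    return "True."


print(accuracy(records, stub_model))  # 0.5 with the stub and samples above
```

In a real run, `stub_model` would be replaced by a function that sends the prompt to one of the models listed above and returns its text reply. Counting unparseable replies as wrong is one possible design choice; reporting them separately would be another.
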
This research was carried out using computing resources provided by Wikimedia Switzerland.