v0.1.19 - lm-evaluation-harness integration
lm-evaluation-harness provides a unified framework for testing generative language models on a large number of evaluation tasks.
Starting
# [Optional] pre-build the image
harbor build lmeval
Refer to the configuration for Harbor services before running evals.
# Run evals
harbor lmeval --tasks gsm8k,hellaswag
# Open results folder
harbor lmeval results
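The three commands above can be chained into one helper. This is a minimal sketch that only uses the invocations shown in this note. The `run_lmeval` function and the `HARBOR_BIN` and `TASKS` variables are hypothetical names introduced for illustration and are not part of Harbor.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the commands from this release note in one function.
# HARBOR_BIN and TASKS are hypothetical variables, not Harbor settings.
set -euo pipefail

HARBOR_BIN="${HARBOR_BIN:-harbor}"   # binary to invoke; override for a dry run
TASKS="${TASKS:-gsm8k,hellaswag}"    # comma-separated task list from the note

run_lmeval() {
  # [Optional] pre-build the image
  "$HARBOR_BIN" build lmeval
  # Run evals for the selected tasks
  "$HARBOR_BIN" lmeval --tasks "$TASKS"
  # Open results folder
  "$HARBOR_BIN" lmeval results
}
```

Calling `run_lmeval` runs the steps in order and stops at the first failure because of `set -e`. Setting `HARBOR_BIN=echo` beforehand prints the commands instead of executing them.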
Full Changelog: v0.1.18...v0.1.19