
v0.1.19

@av released this on 13 Sep, 10:31

v0.1.19 - lm-evaluation-harness integration


lm-evaluation-harness provides a unified framework for testing generative language models on a large number of evaluation tasks.

Getting started

# [Optional] pre-build the image
harbor build lmeval

Refer to the Harbor services configuration documentation for the available settings.

# Run evals
harbor lmeval --tasks gsm8k,hellaswag
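When evaluating more than a couple of tasks, the comma-separated `--tasks` value can get long. The following is a minimal sketch, assuming bash, that builds that value from a list of task names. `join_tasks` is a hypothetical helper, not part of Harbor. The script only prints the resulting command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Harbor): join task names with commas,
# producing the format passed to `harbor lmeval --tasks`.
join_tasks() {
  # "$*" expands all arguments joined by the first character of IFS
  local IFS=,
  echo "$*"
}

tasks=$(join_tasks gsm8k hellaswag)

# Print the command rather than executing it
echo "harbor lmeval --tasks $tasks"
```

To run the evaluation instead of printing it, drop the outer `echo`.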

# Open results folder
harbor lmeval results

Full Changelog: v0.1.18...v0.1.19