This document details the interface exposed by `lmms_eval` and describes the flags available to users. Equivalently, the library can be run via the `lmms_eval` entrypoint at the command line.

This mode supports a number of command-line arguments, the details of which can also be seen by running with `-h` or `--help`. Example invocations combining these flags appear after the list below:
- `--model` : Selects which model type or provider is evaluated. Must be a model registered under `lmms_eval/models`, for example `--model qwen_vl` or `--model llava`.
- `--model_args` : Controls parameters passed to the model constructor. Accepts a string of comma-separated keyword arguments to the model class in the format `"arg1=val1,arg2=val2,..."`, for example `--model_args pretrained=liuhaotian/llava-v1.5-7b,batch_size=1`. For a full list of supported keyword arguments, see the initialization of the corresponding model class in `lmms_eval/models/`.
- `--tasks` : Determines which tasks or task groups are evaluated. Accepts a comma-separated list of task names or task group names, all of which must be valid tasks/groups. You can use `--tasks list` to see all available tasks. If you add your own task but it does not appear in the list, try setting `--verbosity=DEBUG` to view the error message. You can also use `--tasks list_with_num` to check every task and the number of questions it contains; note that `list_with_num` downloads all available datasets and can require a lot of memory and time. (See the task-discovery example after this list.)
- `--batch_size` : Sets the batch size used for evaluation. Can be a positive integer or `"auto"` to automatically select the largest batch size that will fit in memory, speeding up evaluation. One can pass `--batch_size auto:N` to re-select the maximum batch size `N` times during evaluation; this can accelerate evaluation further, since `lmms_eval` sorts documents in descending order of context length. (See the auto batch-size example after this list.)
- `--output_path` : A string of the form `dir/file.jsonl` or `dir/`. Provides a path where high-level results will be saved, either into the file named or into the directory named. If `--log_samples` is passed as well, then per-document outputs and metrics will be saved into the directory as well.
- `--log_samples` : If this flag is passed, then the model's outputs, and the text fed into the model, will be saved at per-document granularity. Must be used with `--output_path`.
- `--limit` : Accepts an integer, or a float between 0.0 and 1.0. If passed, limits evaluation to the first X documents per task (if an integer) or the first X% of documents per task (if a float). Useful for debugging, especially on costly API models. (See the smoke-test example after this list.)
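
Putting the core flags together, a minimal end-to-end run might look like the following sketch. The model and checkpoint come from the examples above; the task name `mme` is an assumption for illustration, so substitute any name reported by `--tasks list`.

```bash
# Evaluate a LLaVA checkpoint on a single task, writing aggregate
# results and per-document samples under ./logs/.
# NOTE: the task name "mme" is a placeholder; pick one from --tasks list.
lmms_eval \
    --model llava \
    --model_args pretrained=liuhaotian/llava-v1.5-7b,batch_size=1 \
    --tasks mme \
    --output_path ./logs/ \
    --log_samples
```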
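
To discover valid values for `--tasks`, the listing modes described above can be run on their own:

```bash
# Print every registered task name.
lmms_eval --tasks list

# Print every task together with its question count. This downloads
# all datasets, so expect heavy memory, disk, and time usage.
lmms_eval --tasks list_with_num

# If a task you registered yourself is missing from the listing,
# debug logging will surface the registration error.
lmms_eval --tasks list --verbosity=DEBUG
```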
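
A sketch of automatic batch-size selection, assuming the chosen model backend supports batched inference; `auto:4` re-probes the largest batch size that fits in memory four times over the course of the run (the `mme` task name is again a placeholder):

```bash
# Start from the largest batch size that fits, and re-select it up to
# 4 times as the length-sorted documents get shorter.
lmms_eval \
    --model llava \
    --model_args pretrained=liuhaotian/llava-v1.5-7b \
    --tasks mme \
    --batch_size auto:4 \
    --output_path ./logs/
```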
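
For quick smoke tests of a new task or model configuration, `--limit` keeps runs short and cheap; a sketch, with the same placeholder task name as above:

```bash
# Run only the first 8 documents of each task...
lmms_eval --model llava \
    --model_args pretrained=liuhaotian/llava-v1.5-7b \
    --tasks mme --limit 8 --output_path ./logs/debug/

# ...or only the first 1% of documents per task.
lmms_eval --model llava \
    --model_args pretrained=liuhaotian/llava-v1.5-7b \
    --tasks mme --limit 0.01
```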