Commands
User Guide
This document describes the interface exposed by lmms_eval and the flags available to users.
Command-line Interface
The library can be run from the command line via the lmms_eval entrypoint.
This mode supports a number of command-line arguments, which can also be listed by running with -h or --help:
- --model: Selects which model type or provider is evaluated. Must be a model registered under lmms_eval/models. For example, --model qwen_vl or --model llava.
- --model_args: Controls parameters passed to the model constructor. Accepts a string of comma-separated keyword arguments for the model class, in the format "arg1=val1,arg2=val2,...", for example --model_args pretrained=liuhaotian/llava-v1.5-7b,batch_size=1. For a full list of supported keyword arguments, see the initialization of the corresponding model class in lmms_eval/models/.
- --tasks: Determines which tasks or task groups are evaluated. Accepts a comma-separated list of task names or task group names, all of which must be valid tasks or groups. Use --tasks list to see all available tasks. If you add your own tasks and they do not appear in the list, set --verbosity=DEBUG to view the error message. You can also use --tasks list_with_num to see every task along with the number of questions it contains. Note, however, that list_with_num downloads all available datasets and may require a lot of memory and time.
- --batch_size: Sets the batch size used for evaluation. Can be a positive integer or "auto" to automatically select the largest batch size that fits in memory, which speeds up evaluation. Pass --batch_size auto:N to re-select the maximum batch size N times during evaluation. This can accelerate evaluation further, since lm-eval sorts documents in descending order of context length.
- --output_path: A string of the form dir/file.jsonl or dir/. Provides a path where high-level results will be saved, either into the named file or into the named directory. If --log_samples is passed as well, per-document outputs and metrics are also saved into the directory.
- --log_samples: If this flag is passed, the model's outputs and the text fed into the model are saved at per-document granularity. Must be used with --output_path.
- --limit: Accepts an integer, or a float between 0.0 and 1.0. If passed, limits the number of documents evaluated per task to the first X documents (if an integer) or the first X% of documents (if a float). Useful for debugging, especially on costly API models.
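To make the --model_args format and the --limit semantics described above more concrete, here is a minimal Python sketch. It is not the code lmms_eval itself uses: the function names parse_model_args and resolve_limit are hypothetical, values are kept as plain strings rather than converted to other types, and rounding a fractional limit down is an assumption.

```python
from typing import Dict, Optional, Union


def parse_model_args(arg_string: str) -> Dict[str, str]:
    """Split a string like "arg1=val1,arg2=val2" into a dict.

    Hypothetical helper for illustration; values are left as strings.
    """
    result: Dict[str, str] = {}
    for pair in arg_string.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        result[key] = value
    return result


def resolve_limit(limit: Optional[Union[int, float]], num_docs: int) -> int:
    """Return how many documents of a task would be evaluated.

    Hypothetical helper: an integer is read as a document count, a float
    in [0.0, 1.0] as a fraction of the task's documents.
    """
    if limit is None:
        return num_docs  # no limit: evaluate every document
    if isinstance(limit, float):
        if not 0.0 <= limit <= 1.0:
            raise ValueError("a float limit must lie between 0.0 and 1.0")
        return int(num_docs * limit)  # assumption: round down
    return min(limit, num_docs)  # never exceed the documents available


if __name__ == "__main__":
    print(parse_model_args("pretrained=liuhaotian/llava-v1.5-7b,batch_size=1"))
    print(resolve_limit(10, 100))
    print(resolve_limit(0.5, 100))
```

In this sketch an integer limit larger than the task is simply capped at the number of documents available; how the real implementation handles that case should be checked against the source.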