The MLPerf inference submission rules are spread between the MLCommons policies and the MLCommons Inference policies documents. Further, the rules related to power submissions are given here. The points below are a summary of the official rules, intended as a checklist for the submitter; please see the original rules for any clarification.
- MLCommons inference results can be submitted on any hardware; past results range from Raspberry Pi to high-end inference servers.
- A closed division submission in the datacenter category needs ECC RAM and must also have the networking capabilities detailed here.
- Power submissions need an approved power analyzer.
- A closed submission needs performance and accuracy runs for all the required scenarios (per the edge/datacenter category), with accuracy within 99% or 99.9% of the reference, as given in the respective task READMEs. Further, the model weights must not be altered except for quantization. If any of these constraints is not met, the submission cannot go under the closed division but can still be submitted under the open division.
- Reference models are mostly fp32 and reference implementations are just for reference and not meant to be directly used by submitters as they are not optimized for performance.
- The calibration document is due one week before the submission deadline.
- A power submission needs a power analyzer (approved by SPEC Power) and a signed EULA to get access to SPEC PTDaemon.
- To submit under the `available` category, your submission system must be available (in whole or in part, and either publicly or to customers), and the software used must be either open source or an official or beta release as of the submission deadline. For example, submissions using a nightly release cannot be submitted under the available category.
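The closed-division accuracy constraint mentioned above can be sketched as a simple ratio check. This is a minimal illustration, not the official submission checker: the function name `meets_closed_accuracy` is hypothetical, it assumes a higher-is-better metric, and it reads "within 99% or 99.9%" as "at least that fraction of the reference accuracy". The numbers in the usage lines are placeholders; the actual targets are given in the task READMEs.

```python
def meets_closed_accuracy(measured: float, reference: float, target: float = 0.99) -> bool:
    """Return True if `measured` is at least `target` times `reference`.

    Hypothetical helper: assumes a higher-is-better accuracy metric and that
    `target` is one of the two ratios named in the checklist (0.99 or 0.999).
    """
    if target not in (0.99, 0.999):
        raise ValueError("target must be 0.99 or 0.999")
    if reference <= 0:
        raise ValueError("reference accuracy must be positive")
    return measured >= target * reference


# Placeholder values for illustration only.
reference_accuracy = 76.46
measured_accuracy = 75.70

passes_99 = meets_closed_accuracy(measured_accuracy, reference_accuracy, 0.99)
passes_999 = meets_closed_accuracy(measured_accuracy, reference_accuracy, 0.999)
print(passes_99, passes_999)
```

A result that fails this kind of check would not qualify for the closed division but could still be submitted under the open division.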
MLPerf inference submissions are expected to be run on various hardware and supported software stacks. Therefore, MLCommons provides only reference implementations to guide submitters in creating optimal implementations for their specific software and hardware configurations. Additionally, all implementations used for MLPerf inference submissions are available in the MLCommons Inference results repositories (under the `closed/<submitter>/code` directory), offering further guidance for submitters developing their own implementations.
- A closed submission in the datacenter category needs Offline and Server scenario runs, each lasting at least ten minutes.
- A closed submission in the edge category needs SingleStream, MultiStream (only for ResNet50 and RetinaNet), and Offline scenario runs, each lasting at least ten minutes.
- In addition, two compliance runs (three for ResNet50) are needed for the closed division, each taking at least ten minutes per scenario.
- The SingleStream, MultiStream, and Server scenarios use early stopping and so can always finish in around ten minutes.
- The Offline scenario needs a minimum of 24576 input queries to be processed, which can take hours for slow models such as 3dunet and LLMs.
- Open division has no accuracy constraints, no required compliance runs, and can be submitted for any single scenario. There is no constraint on the model used except that the model accuracy must be validated on the accuracy dataset used in the corresponding MLPerf inference task or must be preapproved.
- A power submission needs an extra ranging-mode run to determine the peak current usage, and this often doubles the overall experiment run time. If this overhead is too much, the ranging run can be shortened to 5 minutes using mechanisms like this.
- The MLCommons Inference submission checker is provided to ensure that all submissions pass the required checks.
- In the unlikely event that the submission checker reports an error for your submission, please raise a GitHub issue here.
- Any submission passing the submission checker is valid to go to the review discussions, but submitters are still required to answer any queries and fix any issues reported by other submitters.
- Ensure that the `system_desc_id.json` file has meaningful responses; `submission_checker` only checks for the existence of the fields.
- For power submissions, the `power settings` and `analyzer table` files are to be submitted. Even though the submission checker checks for the existence of these files, their content must be checked manually for validity.
- README files in the submission directory must be checked to make sure that the instructions are reproducible.
- For closed datacenter submissions, the ECC RAM and networking requirements must be met.
- The submission checker may report warnings, and some of these can warrant an answer from the submitter.
- One new benchmark in the datacenter category: Mixtral-8x7B. No changes in the edge category.
- For power submissions, there is no code change.
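The closed-division scenario and compliance-run requirements listed earlier can be turned into a rough lower bound on timed run duration. The sketch below is illustrative only: the function names, the lowercase model keys (`"resnet50"`, `"retinanet"`), and the constants are assumptions made for this example and are not part of any MLCommons tool. It counts only the runs with a stated ten-minute minimum (one performance run plus the compliance runs per scenario) and ignores accuracy runs, power ranging runs, and Offline runs that need longer than ten minutes.

```python
MIN_RUN_MINUTES = 10                          # stated minimum per timed run
MULTISTREAM_MODELS = {"resnet50", "retinanet"}  # edge models that need MultiStream


def required_scenarios(category: str, model: str) -> list[str]:
    """Scenarios a closed submission must cover (hypothetical helper)."""
    if category == "datacenter":
        return ["Offline", "Server"]
    if category == "edge":
        scenarios = ["SingleStream", "Offline"]
        if model in MULTISTREAM_MODELS:
            scenarios.insert(1, "MultiStream")
        return scenarios
    raise ValueError(f"unknown category: {category!r}")


def compliance_runs(model: str) -> int:
    """Two compliance runs per scenario, three for ResNet50."""
    return 3 if model == "resnet50" else 2


def min_timed_minutes(category: str, model: str) -> int:
    """Lower bound on minutes spent in performance and compliance runs."""
    runs_per_scenario = 1 + compliance_runs(model)  # 1 performance run
    return len(required_scenarios(category, model)) * runs_per_scenario * MIN_RUN_MINUTES


for category, model in [("edge", "resnet50"), ("edge", "bert"), ("datacenter", "bert")]:
    print(category, model, required_scenarios(category, model),
          min_timed_minutes(category, model), "min")
```

Actual run times are usually longer than this bound, in particular for the Offline scenario on slow models and for power submissions, where the ranging run adds to the total.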