Byte MLPerf Inference Benchmark Tool

Byte MLPerf（推理）是字节使用的一个基准套件，用于测量推理系统在各种部署场景中运行模型的速度。相比MLPerf，Byte MLPerf有如下特点：

模型和运行环境会更贴近真实业务；
对于新硬件，除了评估性能和精度之外，同时也会评估图编译的易用性、覆盖率等指标；
在开放Model Zoo上测试所得的性能和精度，会作为新硬件引入评估的参考；

厂商可以参考该文档接入测试：ByteMLPerf厂商接入指南 [English Version]

Usage

用户使用入口为launch.py, 在使用byte mlperf评估时，只需传入--task 、--hardware_type 两个参数，如下所示：

python3 launch.py --task xxx --hardware_type xxx

tasks --task 参数为传入的workload 名字，需要指定评估workload，例如：若要评估 open_bert-tf-fp16.json 定义的 workload，则需指定 --task open_bert-tf-fp16 。注：所有workload定义在byte_mlperf/workloads下，传参时名字需要和文件名对齐。目前格式为model-framework-precision。
hardware_type --hardware_type 参数为传入的hardware_type 名字，无默认值，必须用户指定。例如：若要评估 Habana Goya ，则需指定 --hardware_type GOYA 。注：所有hardware type定义在byte_mlperf/backends下，传参时名字需要和folder名对齐。
compile_only --compile_only 参数将在模型编译完成后停止任务
show_task_list --show_task_list 参数会打印所有任务名字
show_hardware_list --show_hardware_list 参数会打印目前所有支持的硬件Backend名称

Workload说明

一个workload定义需包含如下字段:

{
    "model": "bert-torch-fp32",   //待评估模型的名字，需要和model_zoo名字对齐
    "test_perf": true,            //是否评估模型性能
    "test_accuracy": true,        //是否评估模型精度
    "test_numeric": true,         //精度：是否评估数值误差
    "clients": 3,                 //性能：提交数据的client threads
    "iterations": 100,            //性能：每个thread提交多少iteration
    "batch_sizes":[1,4,8,16,32],  //性能：每个thread提交数据时的bs
    "data_percent": 50,           //精度：使用百分多少数据集评估精度, [1-100]
    "compile_only": false,        //是否仅编译模型
}

Model Zoo List

Model Zoo&Dataset Model Zoo下收录了Byte MlPerf支持的模型，从访问权限上，目前分为内部模型、开放模型。随Byte MlPerf 发布的是对应版本收录的开放模型。 Dataset为模型需要用到数据集，对应的dataloader、accuracy_checker从结构上也归入Dataset。

开放模型收录原则：

基础模型：包含十分常见的Rn50、Bert和WnD；
业务类似：包含目前内部较多的、或结构相似的模型结构；
前沿模型：包含业务领域对应的SOTA模型；

此外，除了完整模型结构，Byte MlPerf还会加入一些典型模型子结构子图或OP（前提是开放模型无法找到合适的完整模型包含这类经典子结构），比如各不同序列长度的transformer encoder/decoder，各类常见conv op，如group conv、depwise-conv、point-wise conv，以及rnn 常见结构，如gru/lstm等。

Model	Domain	Purpose	Framework	Dataset	Precision
resnet50-v1.5	cv	regular	tensorflow, pytorch	imagenet2012	fp32
bert-base	nlp	regular	tensorflow, pytorch	squad-1.1	fp32
wide&deep	rec	regular	tensorflow	criteo	fp32
videobert	mm	popular	onnx	cifra100	fp32
albert	nlp	popular	pytorch	squad-1.1	fp32
conformer	nlp	popular	onnx	none	fp32
roformer	nlp	popular	tensorflow	cail2019	fp32
yolov5	cv	popular	onnx	none	fp32
roberta	nlp	popular	pytorch	squad-1.1	fp32
deberta	nlp	popular	pytorch	squad-1.1	fp32
swin-transformer	cv	popular	pytorch	imagenet2012	fp32
gpt2	nlp	sota	pytorch	none	fp32
stable diffusion	cv	sota	onnx	none	fp32

Vendor List

目前支持的厂商Backend如下:

Vendor	SKU	Key Parameters	Supplement
Intel	Xeon	-	-
Stream Computing	STC P920	Computation Power:128 TFLOPS@FP16 Last Level Buffer: 8MB, 256GB/s Level 1 Buffer: 1.25MB, 512GB/s Memory: 16GB, 119.4GB/S Host Interface：PCIe 4, 16x, 32GB/s TDP: 160W	STC Introduction

Benchmark Summary

评估结果汇总 : QPS 视图

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.zh_CN.md

README.zh_CN.md

Byte MLPerf Inference Benchmark Tool

Usage

Workload说明

Model Zoo List

Vendor List

Benchmark Summary

Files

README.zh_CN.md

Latest commit

History

README.zh_CN.md

File metadata and controls

Byte MLPerf Inference Benchmark Tool

Usage

Workload说明

Model Zoo List

Vendor List

Benchmark Summary