• 4 Categories:
– Available – On-premise – Preview – RDI (Research, Development, or Internal)
• 2 Divisions:
– Closed:
▪ Intended to compare hardware platforms or software frameworks “apples-to-apples” and requires using the same model and optimizer as the reference implementation
– Open:
▪ Intended to foster faster models and optimizers and allows any ML approach that can reach the target quality
• 2 Benchmark suites:
– Datacenter – Edge
– GPU – CPU/Memory – FPGA and others – Cloud
• Solution providers
– e.g. Dell, publishing benchmarks on servers/storage – Answering RFPs
• Anyone who would like to contribute to the MLPerf community
MLPerf Inference v0.7 benchmarks (Area / Task: Model, Dataset, QSL size, Quality target, Server latency constraint):
– Vision / Image classification: ResNet50-v1.5, ImageNet (224x224), QSL size 1024, 99% of FP32 (76.46% top-1), 15 ms
– Vision / Object detection (large): SSD-ResNet34, COCO (1200x1200), QSL size 64, 99% of FP32 (0.20 mAP), 100 ms
– Vision / Medical image segmentation: 3D-UNET, BraTS 2019 (224x224x160), QSL size 16, 99% and 99.9% of FP32 (0.85300 mean DICE score), N/A
– Speech / Speech-to-text: RNN-T, Librispeech dev-clean (samples < 15 seconds), QSL size 2513, 99% of FP32 (1 - WER, where WER = 7.452253714852645%), 1000 ms
– Language / Language processing: BERT, SQuAD v1.1 (max_seq_len = 384), QSL size 10833, 99% and 99.9% of FP32 (F1 = 90.874%), 130 ms
– Commerce / Recommendation: DLRM, 1TB Click Logs, QSL size 204800, 99% and 99.9% of FP32 (AUC = 80.25%), 30 ms
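In the Quality column, "99% of FP32" means the submitted (often quantized or otherwise optimized) model must reach at least 99% of the accuracy of the FP32 reference model. Worked out for two of the rows above, using the table's reference values (the rounding is mine):

```latex
% ResNet50-v1.5: FP32 reference top-1 accuracy is 76.46%
\[ 0.99 \times 76.46\% \approx 75.70\% \quad \text{(minimum allowed top-1 accuracy)} \]
% RNN-T: the metric is accuracy = 1 - WER, with FP32 reference WER about 7.45%
\[ 0.99 \times (1 - 0.0745) \approx 0.9162 \;\Rightarrow\; \text{WER must be} \le 8.38\% \]
```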
Other systems
• 32 node AMD cluster, storage solutions, etc.
HPC and DL Engineering - what we do
• Design and build systems for HPC and Deep Learning workloads.
• What are the types of MLPerf runs?
– Submission: submitter only – Re-run: anyone can run
• What’s included in MLPerf
– Training and Inference
MLPerf introduction (cont.)
• Latest version is v0.7
• Benchmark suites
– Datacenter
▪ Server / Offline scenarios
– Edge
▪ Single Stream / Multi-Stream scenarios (a LoadGen configuration sketch follows below)
• More details: https://github.com/mlperf/inference_policies/blob/master/inference_rules.adoc
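These suites and scenarios are driven by MLPerf's LoadGen library. A minimal sketch of how a harness selects a scenario, assuming the `mlperf_loadgen` Python bindings that ship with the MLPerf inference repository (exact field names can differ slightly between LoadGen releases):

```python
import mlperf_loadgen as lg

settings = lg.TestSettings()

# Datacenter suite: Offline (maximize throughput over a large batch of queries)
# or Server (queries arrive at random intervals under a per-model latency bound,
# e.g. 130 ms for BERT in the table above).
settings.scenario = lg.TestScenario.Offline      # or lg.TestScenario.Server
settings.mode = lg.TestMode.PerformanceOnly      # AccuracyOnly checks the quality target

# Edge suite uses the stream scenarios instead:
# settings.scenario = lg.TestScenario.SingleStream
# settings.scenario = lg.TestScenario.MultiStream
```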
• Systems include compute, storage, network, software, services, support.
• Integration with factory, software, services.
• Power and performance analysis, tuning, best practices, trade-offs.
• Focus on application performance.
• Vertical solutions.
• Research and proof-of-concept studies.
• Publish white papers, blogs, conference papers.
• Access to the systems in the lab.
Starting point
• Join the MLPerf community (https://mlperf.org/get-involved/)
– Email distribution list
▪ Join the forum ▪ Join the working group
Example: Inference -> Available -> Closed -> Datacenter -> DLRM -> Offline
[Diagram: submission taxonomy tree. Levels: Training / Inference -> Available / On-premise / Preview / RDI -> Open / Closed -> Datacenter / Edge -> model (ResNet50, DLRM, BERT, SSD-ResNet34, 3D-UNET, RNN-T) -> Server / Offline]
• What is the goal of MLPerf?
– Fair and useful benchmarks for measuring training and inference performance of ML hardware, software, and services:
▪ Cover different DL domains
▪ Proper metrics (training time, accuracy, latency)
▪ Real datasets
• TOP500-class system based on Intel Scalable Systems Framework (OPA, KNL, Xeon, OpenHPC)
• 424 nodes dual Intel Xeon Gold processors, OmniPath fabric.
• +160 Intel Xeon Phi (KNL) servers. • Over 1 PF combined performance! • #265 on Top500 June 2018, 1.86 PF theoretical peak • Lustre, Isilon H600, Isilon F800 and NSS storage • Liquid cooled and air cooled
MLPerf Training v0.7 benchmarks (Area / Benchmark: Dataset, Quality Target, Reference Implementation Model):
– Vision / Image classification: ImageNet, 75.90% classification accuracy, ResNet-50 v1.5
– Vision / Object detection (light weight): COCO, 23.0% mAP, SSD
– Vision / Object detection (heavy weight): COCO, 0.377 Box min AP and 0.339 Mask min AP, Mask R-CNN
– Language / Translation (recurrent): WMT English-German, 24.0 Sacre BLEU, NMT
– Language / Translation (non-recurrent): WMT English-German, 25.00 BLEU, Transformer
– Language / NLP: Wikipedia 2020/01/01, 0.712 Mask-LM accuracy, BERT
– Commerce / Recommendation: 1TB Click Logs, 0.8025 AUC, DLRM
– Research / Reinforcement learning: Go, 50% win rate vs. checkpoint, Mini Go (based on the AlphaGo paper)
MLPerf Training
• The MLPerf training benchmark suite measures how fast a system can train ML models to their target quality.
• Latest version is v0.7
• More details: https://github.com/mlperf/training_policies/blob/master/training_rules.adoc
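The reported training metric is wall-clock time to reach the benchmark's quality target (listed in the table above). The toy sketch below illustrates that train-until-target-quality timing loop on synthetic data; it is an illustration only, not MLPerf reference code, and the model, data, and 0.98 target are made up:

```python
import time
import numpy as np

TARGET_QUALITY = 0.98                 # toy target; real targets are listed in the table

# Synthetic, linearly separable "dataset" and a logistic-regression "model".
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = (X @ rng.normal(size=20) > 0).astype(np.float64)
w = np.zeros(20)

start = time.time()
for epoch in range(1, 1001):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))                    # forward pass
    w -= 0.1 * (X.T @ (p - y)) / len(y)                   # one "epoch" of gradient descent
    accuracy = float(((X @ w > 0) == (y > 0.5)).mean())   # evaluate after the epoch
    if accuracy >= TARGET_QUALITY:                        # stop once the quality target is hit
        print(f"reached {accuracy:.4f} after {epoch} epochs "
              f"in {time.time() - start:.2f} s (the time-to-train metric)")
        break
```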
World-class infrastructure in the Innovation Lab
13K ft² lab, 1,300+ servers, ~10 PB of storage dedicated to HPC in collaboration with the community
Zenith
Rattler
• Research/development system with Mellanox, NVIDIA and Bright Computing
• 88 nodes with EDR InfiniBand and Intel Xeon Gold processors
• 32x PowerEdge C4140 nodes with 4x NVIDIA GPUs
MLPerf Inference
• The MLPerf inference benchmark measures how fast a system can perform ML inference using a trained model.
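Concretely, an inference submission wraps the model in a System Under Test (SUT) and a Query Sample Library (QSL) and lets LoadGen drive the queries. A stripped-down sketch with a dummy model, again assuming the `mlperf_loadgen` Python bindings (some LoadGen versions also pass a latency-reporting callback to ConstructSUT):

```python
import array
import numpy as np
import mlperf_loadgen as lg

QSL_SIZE = 1024                                   # e.g. the ResNet50 QSL size above
data = np.random.rand(QSL_SIZE, 8).astype(np.float32)   # stand-in dataset

def load_query_samples(indices):                  # LoadGen asks the QSL to stage samples
    pass                                          # a real harness loads/preprocesses here

def unload_query_samples(indices):
    pass

def issue_queries(query_samples):                 # LoadGen sends queries to the SUT
    for qs in query_samples:
        pred = np.int32(data[qs.index].argmax())  # stand-in "inference"
        buf = array.array("B", pred.tobytes())
        addr, length = buf.buffer_info()
        lg.QuerySamplesComplete(
            [lg.QuerySampleResponse(qs.id, addr, length * buf.itemsize)])

def flush_queries():
    pass

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(QSL_SIZE, QSL_SIZE, load_query_samples, unload_query_samples)
lg.StartTest(sut, qsl, settings)                  # LoadGen reports the scenario metric
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```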
MLPerf Benchmark Suite Technical Overview
Demystifying MLPerf Benchmark Suite
Agenda
• Target Audience • About us • MLPerf introduction • How to start
Target Audience
• Anyone who has an interest in Deep Learning coding • Anyone who wants a reference or baseline for system purchases
MLPerf introduction
• What is MLPerf
– An open-source ML benchmark suite for measuring the performance of ML frameworks, ML hardware accelerators, and ML cloud platforms.