Slides for In-Datacenter Performance Analysis of a Tensor Processing Unit

Carlo C. del Mundo · 15 slides · May 01, 2017

About This Presentation

Slides for the Google TPU (Tensor Processing Unit) paper, presented at ISCA 2017. PDF (created with Keynote). Slides compiled by Carlo C. del Mundo ([email protected]).


Slide Content

Motivation for TPU
•2006: “Just run DNNs on our CPU datacenter. It’s basically free.”
•2013: “3 minutes a day of DNN-based voice search == 2x more datacenter compute.”

The Players
•Norman P. Jouppi and his two musketeers
•David Patterson
•70+ other Google engineers

Tensor Processing Unit (TPU)
•30-80x TOPS/Watt vs. 2015 CPUs and GPUs.
•8 GiB DRAM.
•8-bit fixed point.
•256x256 MAC unit.
•Support for data reordering, matrix multiply, activation, pooling, and normalization.
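
The 8-bit, 256x256 matrix-multiply datapath above can be made concrete with a minimal NumPy sketch: 8-bit integer inputs, wide 32-bit accumulation, then an activation. This illustrates the quantized-matmul arithmetic only; it is not the TPU's actual pipeline or quantization scheme, and the tile size is simply borrowed from the slide.

```python
import numpy as np

TILE = 256  # tile size borrowed from the slide's 256x256 MAC array

def quantized_matmul_relu(a_int8, w_int8):
    """8-bit matmul with 32-bit accumulation, then ReLU.

    Accumulating int8 products in int32 mirrors how MAC arrays
    keep narrow 8-bit multiplies from overflowing.
    """
    acc = a_int8.astype(np.int32) @ w_int8.astype(np.int32)
    return np.maximum(acc, 0)  # activation stage (ReLU)

a = np.random.randint(-128, 128, (TILE, TILE), dtype=np.int8)
w = np.random.randint(-128, 128, (TILE, TILE), dtype=np.int8)
out = quantized_matmul_relu(a, w)
print(out.shape, out.dtype)  # (256, 256) int32
```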

Application Testbed
“The unexpected desire for TPUs by many Google services combined with the preference for low response time changed the equation, with application writers often opting for reduced latency over waiting for bigger batches to accumulate.”
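
The quote's trade-off can be sketched with a toy latency model (hypothetical numbers, not the paper's data): the time spent accumulating a batch adds directly to each query's response time, so bigger batches buy throughput at the cost of latency.

```python
def batched_latency_ms(batch_size, arrival_rate_qps, compute_ms_per_batch):
    """Rough latency of batched inference: wait to fill the batch, then compute.

    At an arrival rate of r queries/s, filling a batch of size B takes
    ~B/r seconds, and that wait lands on every query in the batch.
    """
    accumulate_ms = batch_size / arrival_rate_qps * 1000.0
    return accumulate_ms + compute_ms_per_batch

# Hypothetical load: 2,000 queries/s; compute time grows mildly with batch size.
for batch, compute_ms in [(16, 4.0), (64, 8.0), (256, 20.0)]:
    print(batch, round(batched_latency_ms(batch, 2000, compute_ms), 1), "ms")
```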

TPU Block Diagram & Floor Plan

Experimental Testbed
8x NVIDIA K80 GPUs

The Roofline Model
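
A one-line refresher on the model: attainable performance is the minimum of the compute peak (the flat roof) and memory bandwidth times operational intensity (the slanted part). A minimal sketch, using numbers only roughly in the ballpark of the paper's TPU (~92 TOPS peak at 8 bits, ~34 GB/s weight-memory bandwidth):

```python
def roofline(peak_ops_per_s, mem_bw_bytes_per_s, operational_intensity):
    """Attainable throughput under the roofline model.

    operational_intensity: operations performed per byte moved from memory.
    Performance is capped by the compute peak or by bandwidth * intensity,
    whichever is lower.
    """
    return min(peak_ops_per_s, mem_bw_bytes_per_s * operational_intensity)

peak, bw = 92e12, 34e9  # rough TPU-like numbers, for illustration only
for oi in (10, 100, 1000, 10000):
    print(oi, roofline(peak, bw, oi) / 1e12, "TOPS")  # ridge point near ~2700 ops/byte
```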

Rooflines of TPU with DNN Apps

App breakdown by Performance Counters

Latency Results (99th percentile)

Programming the TPU vs. Programming FPGAs
•TPU: TensorFlow → graph → TPU host → instructions → TPU
•FPGAs: design → bitstream
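
To make the “graph → instructions” flow concrete, here is a minimal TensorFlow 1.x-style sketch: the Python program only builds a dataflow graph, and a runtime (the TPU host, in the slide's flow) later compiles and runs it. This uses the ordinary CPU session API as a stand-in; it is not the TPU-specific runtime.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # TF 1.x graph-mode API

tf.disable_eager_execution()

# Graph construction: nothing executes here, we only describe the computation.
x = tf.placeholder(tf.float32, shape=(None, 256), name="x")
w = tf.Variable(tf.random.normal((256, 256)), name="w")
y = tf.nn.relu(tf.matmul(x, w))  # matmul + activation, the TPU's core workload

# Graph execution: the session hands the graph to a runtime, which lowers it
# to device instructions (on a TPU system, via the host).
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(y, feed_dict={x: np.ones((8, 256), dtype=np.float32)})
    print(out.shape)  # (8, 256)
```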

NVIDIA’s Rebuttal to the TPU
https://blogs.nvidia.com/blog/2017/04/10/ai-drives-rise-accelerated-computing-datacenter/

“Patterson” Discussion
1. Fallacy: NN inference applications in data centers value throughput as much as response time.
2. Fallacy: The K80 GPU architecture is a good match to NN inference.
3. Pitfall: Architects have neglected important NN tasks.
4. Pitfall: For NN hardware, Inferences Per Second (IPS) is an inaccurate summary performance metric.
5. Fallacy: The K80 GPU results would be much better if Boost mode were enabled.
6. Fallacy: CPU and GPU results would be comparable to the TPU if we used them more efficiently or compared to newer versions.
7. Pitfall: Performance counters added as an afterthought for NN hardware.
8. Fallacy: After two years of software tuning, the only path left to increase TPU performance is hardware upgrades.

Interesting quote
“CNNs constitute only about 5% of the representative NN workload for Google. More attention should be paid to MLPs and LSTMs. Repeating history, it’s similar to when many architects concentrated on floating-point performance when most mainstream workloads turned out to be dominated by integer operations.”