PowerArtist: RTL Design for Power Platform

ANSYS, Inc., 30 slides, Oct 16, 2014

About This Presentation

PowerArtist™ includes production-proven RTL power analysis with interactive visual debug, analysis-driven automatic RTL power reduction, and a Tcl interface to the database that enables custom reports and tracking of power through regressions. PowerArtist-generated models bridge the RTL and layout gap...


Slide Content

© 2014 ANSYS, Inc.6/23/2014 1
PowerArtist™: RTL Design-for-Power
Design Automation Conference 2014

Early Power Decisions: High Impact
[Figure: Power Reduction (100% to 0%) across design stages, from Large Impact to Small Impact: RTL Design, Logic Synthesis, Physical Design, Timing Closure]
RTL Design-for-Power
•Power-Performance-Area Trade-offs
•Voltage / Power Domain Planning
•Block-level Clock and Data Gating
•Eliminate Redundant Activity
Low Power Implementation
•Power Switch Sizing / Placement
•Clock Gater Cloning / Decloning
•Multi-Vt Optimization
•Power Integrity Verification

RTL Power ↔ Gate-level Power
[Figure: design flow from Design Specification through RTL Design and Gate-Level Design to Layout, annotated with ~20 hours and ~22 mins]
Quicker Design Iterations, Effective Design-for-Power
•RTL Power: Power-per-Function (adder, register, mux)
•Gate-level Power: Power-per-Gate

PowerArtist: RTL Design-for-Power Platform
RTL Power Analysis
•Average, time-based
•Power-critical vector selection
•Regressions via Tcl interface
RTL Power Reduction
•Clock, memory, logic
•Analysis-driven automation
•Interactive power debug
RTL Links with Physical
•PACE™: RTL power accuracy
•RPM™: RTL-driven physical power integrity
[Figure: PACE and RPM link RTL Power and Physical Power]

RTL Power: Ins and Outs
•RTL (VHDL, Verilog, SystemVerilog)
module PA (
...
always @ (posedge clk) begin
  dout <= din1;
end
assign out = sel ? dout : din2;
...
endmodule
•Power domains (UPF / CPF): Vdd1, Vdd2
•Capacitance model (WLM / PACE)
•Activity (FSDB / VCD / SAIF)
•Clock tree, gating (SDC, PACE, user input)
•Power models (Liberty .lib)
[Figure: RTL Power Analysis block with the design elements from the RTL: registers, mux, AND gate, clk]

Low Power RTL Design Methodology
•Profile power vectors (TRANSMIT MODE, RECEIVE MODE: residual receive activity in transmit mode)
•Check power vs. budget (Peak Power = 391 mW, Average Power = 239 mW)
•Debug power hotspots (Enabled Clock, Inactive Data)
•Reduce power automatically
•Perform design trade-offs
•Monitor power vs. budget (RTL Power Regression Flow)
[Figure: Power (W), 0 to 0.06, for Version 1 and Version 2 under Typ and Idle]

RTL vs. Gates: Accuracy and Performance
Nvidia Case Study
•RTL Power: ~30X faster
•RTL Power Accuracy: ~15%

RTL Capacity: Large Designs / FSDBs
Samsung Case Study
FSDB captures only power-critical signals identified by PowerArtist
•FSDB size: 1/4
•TAT: 4X faster
•Loss of accuracy: 2%

RTL Power Analysis

PowerArtist RTL Power Analysis
Activity Analysis
•Total Logic / Clock Activity per Hierarchical Instance
•Qualify Coverage per Power Mode
•Identify Power Bugs
Average Power Analysis
•Understand Power: Where? Why?
•Per Hierarchy, Category, Mode, Clock / Voltage Domains
•Qualify Power Efficiency with Multiple Metrics
Time-based Power Analysis
•Power Waveforms per Hierarchical Instance
•Waveforms per Category: Clock, Memory, Logic
•Identify Peak Power and Time

Clock Gating Efficiency
Temporal and Structural Metrics
Example
•16 of 20 bits are gated
•5 of 10 cycles are gated
•2 of 5 enabled cycles had data toggles
[Figure: waveforms of clk, en, gclk, data]
                SCGE          DCGE                   CGEE
Definition      % Gated Bits  % Gated Clock Cycles   % Ideally Gated Cycles
Type of Metric  Structural    Temporal (en, clk)     Temporal (data, en, clk)
Value           80%           50%                    40%
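
The three values in the example can be reproduced with a few lines of arithmetic. The sketch below is one reading of the slide, not PowerArtist's implementation: it assumes SCGE is gated bits over total bits, DCGE is gated cycles over total cycles, and CGEE is the share of enabled cycles in which data actually toggled. The function names and the per-cycle traces are hypothetical, chosen only to match the counts in the example.

```python
def percent(part, whole):
    """Return part/whole as a percentage; reject an empty denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


def clock_gating_metrics(gated_bits, total_bits, en, data_toggle):
    """Compute (SCGE, DCGE, CGEE) in percent.

    en[i] is 1 when the clock is enabled in cycle i (assumed encoding).
    data_toggle[i] is 1 when the register data changes in cycle i.
    """
    if len(en) != len(data_toggle):
        raise ValueError("traces must cover the same cycles")
    enabled_cycles = sum(en)
    gated_cycles = len(en) - enabled_cycles
    useful_cycles = sum(1 for e, d in zip(en, data_toggle) if e and d)
    scge = percent(gated_bits, total_bits)          # structural: % gated bits
    dcge = percent(gated_cycles, len(en))           # temporal: % gated cycles
    cgee = percent(useful_cycles, enabled_cycles)   # enabled cycles with toggles
    return scge, dcge, cgee


# Hypothetical traces: 10 cycles, 5 enabled, data toggles in 2 enabled cycles.
en = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
data_toggle = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
scge, dcge, cgee = clock_gating_metrics(16, 20, en, data_toggle)
print(f"SCGE={scge:.0f}% DCGE={dcge:.0f}% CGEE={cgee:.0f}%")
# prints: SCGE=80% DCGE=50% CGEE=40%
```

A register can score well on the structural metric and still waste clock power, which is why the temporal metrics are reported alongside it.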

Clock Gating Efficiency
Temporal and Structural Metrics
[Screenshot callouts: 100% Static CGE with 0% Dynamic CGE; CGEE and Power Impact; CGE: Static, Dynamic; Flop: Power, Activity]

RTL Power Reduction

PowerArtist RTL Power Reduction
Original RTL → Low-Power RTL
1. Interactive Power Debug
•Block-level Power “Bugs”
•Large Power Savings
2. Automated Power Reduction
•Instance-level Power Reduction
•15 Analysis-driven Techniques
3. Customizable Power Reports
•TCL Queries to OADB
•Automation Beyond PowerArtist Reports
# Report every register that has no clock gate.
# $output_file must be set by the caller before this script runs.
openPDB powerartist.pdb
set RPT [open $output_file "w"]
set ungated_registers [getRegisters -cg none]
foreach i $ungated_registers {
    set dyn_power [getPropVal $i Dynamic_Power "inst"]
    set bit_width [getInstWidth $i]
    set file [getPropVal $i File_Name "inst"]
    set line_num [getPropVal $i Line_Number "inst"]
    # Added so the collected values reach the report file.
    puts $RPT "$i $dyn_power $bit_width $file:$line_num"
}
close $RPT

Debug Power: Visualize-Analyze-Reduce
Inactive Data, Active Clock
Identify Block-level Clock Gating Enable

Block-Level Power Reduction
•Clock Active, Data Inactive: Block-level Clock Gating
•Clock Inactive, Data Active: Block-level Data Gating
Block-level Activity Analysis: Clock and Data Ports
1.1 Clock Pins
-------------------------------------------------------
Redundant  Total    Pin    Mode   Instance
Cycles     Cycles   Name   Name   Name
-------------------------------------------------------
200        201      CLKA   read   top.core1.t1.dpmem.m1
-------------------------------------------------------
1.2 Input and Redundant Pins
-------------------------------------------------------
Redundant  Total    Pin    Mode   Instance
Toggles    Toggles  Name   Name   Name
-------------------------------------------------------
1          1        AB[8]  read   top.core1.t1.dpmem.m1
-------------------------------------------------------
•Wasted Activity per Mode: redundant activity in read mode
•Clock Activity per Hierarchy: constant high activity may indicate missed clock gating
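
The "Redundant Cycles" column in the report above can be illustrated with a small counting sketch. The slide does not give the tool's exact definition, so this assumes a clock cycle is redundant when the clock pin is active while none of the block's data inputs change; `redundant_clock_cycles` and the traces are hypothetical and only mimic the shape of the 200-of-201 row.

```python
def redundant_clock_cycles(clk_active, data_active):
    """Return (redundant, total) clock cycles for one block.

    clk_active[i] is 1 when the clock pin toggles in cycle i.
    data_active[i] is 1 when any data input changes in cycle i.
    A cycle counts as redundant when the clock is active and data is not
    (an assumed definition, not taken from the tool).
    """
    if len(clk_active) != len(data_active):
        raise ValueError("traces must cover the same cycles")
    total = sum(1 for c in clk_active if c)
    redundant = sum(1 for c, d in zip(clk_active, data_active) if c and not d)
    return redundant, total


# Hypothetical trace: the clock runs for 201 cycles, data changes in one.
clk_active = [1] * 201
data_active = [0] * 201
data_active[100] = 1

redundant, total = redundant_clock_cycles(clk_active, data_active)
share = 100.0 * redundant / total
print(f"{redundant} of {total} clock cycles redundant ({share:.1f}%)")
# prints: 200 of 201 clock cycles redundant (99.5%)
```

A block with a ratio this high is a natural candidate for a block-level clock gate, since almost every clock edge it receives does no useful work.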

Instance-Level Power Reduction
Clock / Clock Gating
•Clock gating coverage
•Clock gating efficiency
•Sequential and combinational
Control Logic and Datapath
•Redundant activity
•Don’t care conditions
•Datapath operand isolation
Memory Subsystem
•Redundant read/write
•Splitting memories
•Exercising sleep modes

Analysis-Driven RTL Power Reduction
Wasted activity/power when sel is 0
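
To make the "sel is 0" case concrete, the sketch below models the small RTL fragment used in this deck (`dout <= din1` on each clock edge, `out = sel ? dout : din2`) and counts register toggles that the mux hides. It is a simplified single-cycle view written for illustration; the function name, the reset value, and the input traces are assumptions, and a real analysis would also look across multiple cycles.

```python
def wasted_toggles(din1, sel, reset_value=0):
    """Return (wasted, total) toggles of the dout register.

    Model: dout takes the value of din1 at each clock edge, and the mux
    passes dout to the output only when sel is 1. A toggle of dout is
    counted as wasted when sel is 0 in the cycle where the new value
    is visible, because the output does not observe it.
    """
    if len(din1) != len(sel):
        raise ValueError("traces must cover the same cycles")
    prev = reset_value
    dout = reset_value
    wasted = total = 0
    for d1, s in zip(din1, sel):
        if dout != prev:        # the register changed at the last clock edge
            total += 1
            if not s:           # mux selects din2, so the change is unobserved
                wasted += 1
        prev = dout
        dout = d1               # posedge clk: dout <= din1
    return wasted, total


# Hypothetical stimulus for six cycles.
din1 = [1, 0, 1, 1, 0, 1]
sel = [1, 0, 0, 1, 0, 1]
print(wasted_toggles(din1, sel))
# prints: (2, 4)
```

Here half of the register's toggles are never observed, so gating its clock with a condition derived from sel would remove that activity without changing the output.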

Analysis-Driven RTL Power Reduction
Pre-compute based new clock gate enables
Multi-cycle ODC sequential analysis

Analysis-Driven RTL Power Reduction
Pre-compute based new clock gate enables
Multi-cycle ODC sequential analysis
[Figure: Predicted Power Savings (normalized, 0.00 to 1.00) vs. # RTL Changes (Design Effort), 1 to 291]
Top 5 RTL changes: 50% of identified power savings
Maximize Power Savings, Minimize Design Impact
15 Power Reduction Techniques
•Clock, Memory, Logic
•Sequential, Combinational
•Vector-based, Vectorless
•Hierarchical, SoC capacity
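
The savings-versus-effort curve above is a cumulative ranking: sort the candidate RTL changes by predicted savings and count how many are needed to reach a target share of the total. The sketch below shows that calculation under stated assumptions; `changes_needed` and the savings values are hypothetical and are not the data behind the slide's chart.

```python
def changes_needed(savings, target_fraction):
    """Return how many of the largest changes reach target_fraction of total savings."""
    if not 0 < target_fraction <= 1:
        raise ValueError("target_fraction must be in (0, 1]")
    ranked = sorted(savings, reverse=True)   # biggest predicted savings first
    total = sum(ranked)
    if total <= 0:
        raise ValueError("total savings must be positive")
    running = 0
    for count, value in enumerate(ranked, start=1):
        running += value
        if running >= target_fraction * total:
            return count
    return len(ranked)


# Hypothetical predicted savings per RTL change, in mW (sum is 60 mW).
savings_mw = [12, 9, 8, 6, 5, 4, 4, 3, 3, 2, 2, 1, 1]

print(changes_needed(savings_mw, 0.5))
# prints: 4
```

With this made-up data, 4 of 13 changes deliver half of the predicted savings, which is the kind of trade-off the chart is meant to expose: most of the benefit for a small part of the design effort.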

Power Reduction Case Studies
MUX Reduction Technique:
•Scan clocks toggling in functional mode
•Redundant data activity in registers wasting power
[Figure: mux with inputs A and B, select values 1 and 0, scan_enable = 0, scan_clock, data_in]
GMC Technique:
•Redundant data toggles in read mode
•Cycle-based analysis reports % Redundant Cycles
[Figure: M_OUT across Write, Read, Write modes, with Redundant Data Toggles marked]

Power Database Access with TCL API
Power Database
(OpenAccess)
Design Queries
•getMemories/Flops/Combs
•getFanout
•getModulePorts
•reportDesignStats
Report Creation
•reportCGEfficiency
•diffPdbPower
•reportPower
•reportReductions
Power Queries
•getPropVal instance/net
•getClockPower
•getNetPower
•getClockEnableExpr
Design Navigation
•dls
•dpwd, dcd
•dpushd, dpopd
•show
Customize and Automate Power Reduction, Reports, Regressions
•Quick access to power and design properties
•Accomplish custom tasks with few lines of TCL

Custom Power Reports
50% Idle Power Reduction in Mobile SoC
Instance Name                Enable Efficiency  Clock Power  Clock     En Net
or1200_cpu.ckg12             0                  5.17E-03     clk       or1200_cpu.en_blk
or1200_cpu.or1200_ctrl.ckg5  0.1                1.36E-03     gclk_blk  or1200_cpu.or1200_ctrl.n1
[Figure: en_blk and clk drive the Block Clock Gate, producing gclk_blk; en_reg and gclk_blk drive the Register Clock Gate, producing gclk_reg; waveforms of en_blk, clk, data, gclk_blk]
•Inefficient enables waste power
•Block-level clock gates control significant power
•Single clock gate controls >5 mW
•Power Efficiency = 0
PowerArtist clock gating report identifies inefficient clock gates
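
A custom report like the one above amounts to a filter and a sort over clock gate records. The sketch below is a stand-alone illustration, not the PowerArtist Tcl API: the `ClockGate` record, the `inefficient_gates` function, and both thresholds are hypothetical, the first two rows are as read from the slide's table (the instance/efficiency split in the second row is assumed), and the third row is invented for contrast.

```python
from dataclasses import dataclass


@dataclass
class ClockGate:
    instance: str
    enable_efficiency: float   # fraction of enabled cycles doing useful work
    clock_power_w: float       # clock power controlled by this gate, in watts


def inefficient_gates(gates, max_efficiency=0.2, min_power_w=1e-3):
    """Return gates with low enable efficiency and significant clock power.

    Both thresholds are illustrative defaults, not values from the tool.
    The result is sorted so the largest clock power comes first.
    """
    picked = [
        g for g in gates
        if g.enable_efficiency <= max_efficiency and g.clock_power_w >= min_power_w
    ]
    return sorted(picked, key=lambda g: g.clock_power_w, reverse=True)


gates = [
    ClockGate("or1200_cpu.ckg12", 0.0, 5.17e-3),
    ClockGate("or1200_cpu.or1200_ctrl.ckg5", 0.1, 1.36e-3),
    ClockGate("example.well_gated", 0.9, 2.0e-3),   # hypothetical row
]

for g in inefficient_gates(gates):
    print(f"{g.instance}: efficiency={g.enable_efficiency}, power={g.clock_power_w} W")
```

Sorting by controlled clock power puts block-level gates first, which matches the slide's point that a single inefficient block-level gate can account for several milliwatts.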

RTL Power Regressions
•30+ blocks per typical SoC
•2+ vectors per block
•Vectors written for power: idle, active
•Daily block-level, weekly chip-level regressions monitor power changes
•Power metrics track power efficiency
•PowerArtist identifies where power changed
[Figure: RTL (Verilog, SV, VHDL) and Testbench → Simulator → FSDB → RTL Power Analysis, Reduction, Regression]
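
The regression flow described above can be pictured as a comparison of per-block, per-vector power against a stored baseline. The following sketch shows one way such a check might look; it is not part of PowerArtist, and the function name, the 5% tolerance, the block names, and all power values are hypothetical.

```python
def power_regressions(baseline, current, tolerance=0.05):
    """Return entries whose power rose by more than tolerance.

    baseline and current map (block, vector) to average power in watts.
    Each result is (block, vector, old_power, new_power, relative_change).
    Entries without a positive baseline value are skipped.
    """
    flagged = []
    for key in sorted(current):
        old = baseline.get(key)
        if old is None or old <= 0:
            continue
        new = current[key]
        change = (new - old) / old
        if change > tolerance:
            flagged.append((key[0], key[1], old, new, change))
    return flagged


# Hypothetical results for two blocks with idle and active vectors.
baseline = {
    ("cpu", "idle"): 0.010, ("cpu", "active"): 0.050,
    ("dsp", "idle"): 0.004, ("dsp", "active"): 0.030,
}
current = {
    ("cpu", "idle"): 0.013, ("cpu", "active"): 0.051,
    ("dsp", "idle"): 0.004, ("dsp", "active"): 0.029,
}

for block, vector, old, new, change in power_regressions(baseline, current):
    print(f"{block}/{vector}: {old} W -> {new} W ({change:+.0%})")
# prints: cpu/idle: 0.01 W -> 0.013 W (+30%)
```

Keeping idle and active vectors as separate keys mirrors the slide's advice to write vectors for power, so an idle-power regression is not masked by a stable active-power number.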

RTL Links with Physical Design

PACE™: Physical-Aware RTL Power Budgeting
module PA (
...
always @ (posedge clk) begin
  dout <= din1;
end
assign out = sel ? dout : din2;
...
endmodule
PACE Models (Cap, Clock)
•Clock Distribution
•Parasitics
•Multiple Vt
•Low-power Structures
•Optimization
[Figure: Post-Layout, Gate-level Power, PACE, RTL Power]
PACE Bridges the RTL vs. Layout Gap
Predictable RTL Power Accuracy

RTL PACE vs. Gate-Power: Mobile SoC @ 14nm
Total Power Correlation: Gate-SPEF vs. RTL-PACE vs. RTL-WLM
•RTL-PACE Power within 20%
Clock Power Correlation: Gate-SPEF vs. RTL-PACE
•RTL-PACE Clock Power within 20%

RTL Power-Driven Power Integrity
module PA (
...
always @ (posedge clk) begin
  dout <= din1;
end
assign out = sel ? dout : din2;
...
endmodule
•Shrinking geometries, increasing di/dt
•Gate vectors too late
•Layout late for changes
•Error-prone guesstimates
RPM Enables PDN Planning: Early, Optimal, Robust
[Figure: RTL Power → RTL Power Model (RPM) → Physical Power Integrity]

RPM Case Studies
[Figure legends: RPM, CPM(Layout)+Pkg, CPM(RPM)+Pkg, Pkg only; RPM, Gate, FSDB, Vectorless]
Peak and di/dt Cycle Selection on a GPU Core
•Peak = 6X Average Power
•Di/dt event not at the same time as the peak
Frame: DIDT
Start time: 0.0817704
Finish time: 0.0817706
Average leakage for supply VDD: 0.00257393
Average power for supply VDD: 0.185336
Peak power for supply VDD: 0.219776
Frame: CYCLE_POWER
Start time: 0.0806005
Finish time: 0.0806007
Average leakage for supply VDD: 0.002569
Average power for supply VDD: 0.250168
Peak power for supply VDD: 0.266678
Early Voltage Drop Analysis
Early Package Resonance Analysis
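
The observation that the di/dt event and the power peak fall in different cycles is easy to reproduce on any per-cycle power trace. The sketch below is illustrative only: it uses the largest cycle-to-cycle change in power as a stand-in for di/dt at a fixed supply voltage, and `select_cycles` and the trace values are hypothetical, not taken from the GPU core frames listed above.

```python
def select_cycles(power):
    """Return (peak_index, didt_index) for a per-cycle power trace.

    peak_index is the cycle with the highest power.
    didt_index is the cycle with the largest absolute change from the
    previous cycle, a simple proxy for di/dt at constant voltage.
    """
    if len(power) < 2:
        raise ValueError("need at least two cycles")
    peak_index = max(range(len(power)), key=lambda i: power[i])
    didt_index = max(
        range(1, len(power)),
        key=lambda i: abs(power[i] - power[i - 1]),
    )
    return peak_index, didt_index


# Hypothetical per-cycle power in watts.
power_w = [0.04, 0.05, 0.19, 0.20, 0.22, 0.25, 0.24, 0.15]

peak, didt = select_cycles(power_w)
print(f"peak at cycle {peak}, largest power step at cycle {didt}")
# prints: peak at cycle 5, largest power step at cycle 2
```

In this trace the steepest ramp happens three cycles before the peak, so a power-integrity run driven only by the peak cycle would miss the worst-case current transient.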

Related Presentations @ DAC 2014
•Power Analysis Using PowerArtist for WaveLogic3 ASIC – 100Gbs Coherent Metro Optical Modem
•Achieving RTL Power Efficiency and Automated Power Reduction
•Methods for Achieving RTL to Gate Power Consistency