Implementing Useful Clock Skew Using Skew Groups

miaofei 11,784 views 24 slides May 04, 2014

About This Presentation

Using IC Compiler's skew groups to implement useful clock skew to improve timing closure.


Slide Content

Implementing Useful Skew
Using Skew Groups
Matthew Mei
Cisco Systems

2
Outline
•Overview of skew
•Example design affected by skew
•What is useful skew
•Using skew groups to achieve useful skew
•Experimental results of trials on example design
•Inserting clock buffers to achieve useful skew
•Comparing skew groups and buffer insertion
•Conclusions

3
Skew
(Diagram: a clock port drives a launch flip flop and a capture flip flop)
•Skew equals insertion delay at capture minus insertion delay at launch
•Insertion delay is reported by:
report_clock_timing -to <pin> -type latency -setup
•Common path pessimism removal is reported by:
report_crpr -from <pin1> -to <pin2> -setup
4
The Example Design
•40 nm technology being used
•The block was about 8000 µm × 4000 µm
•Block utilization was about 75%, while standard cell utilization was only about 20% (~600K cells)
•The block was mostly Ternary Content Addressable Memories (TCAMs), which are large memory macros used for fast searches

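5
Example Failing Path (Diagram)
(Diagram: clk_core drives a memory and the capture flip flops. The clock insertion delay is 1.1783 ns at the memory and 1.0460 ns at the capture flip flops. The data path from the memory contributes 1.4831 ns plus 0.0000 ns.)
•Thus, the skew is equal to:
1.0460 ns - 1.1783 ns = -0.132 ns
•Therefore, this timing path has -132 ps of skew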
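
The skew arithmetic on this slide can be reproduced in a few lines of Python. This is only an illustrative sketch: the function name `clock_skew` is made up for the example, and the two latencies are the insertion delays quoted on the slide.

```python
def clock_skew(capture_latency_ns, launch_latency_ns):
    """Skew = insertion delay at capture minus insertion delay at launch."""
    return capture_latency_ns - launch_latency_ns

# Insertion delays from the slide: capture flip flops and launching memory.
skew_ns = clock_skew(1.0460, 1.1783)
print(f"skew = {skew_ns:.4f} ns ({skew_ns * 1000:.0f} ps)")
```

A negative result means the capture clock arrives earlier than the launch clock, which takes time away from the setup check.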

6
Example Failing Path (Timing Report)
Path Type: max

Point Incr Path
----------------------------------------------------------
clock clk_core (rise edge) 0.0000 0.0000
clock network delay (propagated) 1.1783 1.1783
w/m_36x1/CLK 0.0000 1.1783 r
w/m_36x1/QXY[13] 1.4831 2.6614 f
w/r0_data_read1_s_36x1_13_ (net) 0.0000 2.6614 f
w/r1_data_read1_s_36x1_reg_13_/D 0.0000 & 2.6614 f
data arrival time 2.6614

clock clk_core (rise edge) 1.6670 1.6670
clock network delay (propagated) 1.0460 2.7130
clock uncertainty -0.0580 2.6550
w/r1_data_read1_s_36x1_reg_13_/CK 0.0000 2.6550 r
library setup time -0.1197 2.5353
data required time 2.5353
----------------------------------------------------------
data required time 2.5353
data arrival time -2.6614
----------------------------------------------------------
slack (VIOLATED) -0.1261
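
As a cross-check, the slack in this report can be rebuilt from its own rows. The sketch below is illustrative only: the variable names are invented for the example, and every number is copied from the report (all values in ns).

```python
# Launch side (data arrival time).
launch_clock_latency = 1.1783   # clock network delay to w/m_36x1/CLK
memory_clk_to_q = 1.4831        # CLK to QXY[13] delay through the memory
net_delay = 0.0000

# Capture side (data required time).
clock_period = 1.6670
capture_clock_latency = 1.0460
clock_uncertainty = 0.0580
library_setup = 0.1197

arrival = launch_clock_latency + memory_clk_to_q + net_delay
required = (clock_period + capture_clock_latency
            - clock_uncertainty - library_setup)
slack = required - arrival

print(f"data arrival time  {arrival:.4f}")
print(f"data required time {required:.4f}")
print(f"slack              {slack:.4f}")
```

The capture clock latency appears with a positive sign in the required time, which is why delaying the capture clock helps this path.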

7
Example Failing Path (Layout)
•Pipeline flops already added and magnet placed

8
Using Skew Groups to Achieve Useful Skew
(Diagram: clk_core drives the TCAMs and the pipeline flip flops, with the target skew marked)
•To improve the setup timing performance, delay can be added to the red clock path
•Tried to achieve the target skew using skew groups
•Also tried manual buffer insertion (later)
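
The effect of adding clock delay can be sketched numerically. The snippet assumes that only the capture clock is delayed, that the launch clock and data path stay unchanged, and that any change in common path pessimism removal is ignored. The function name and the list of added delays are illustrative, not results from the deck. The starting slack is that of the example failing path.

```python
def setup_slack_after_skew(slack_ns, added_capture_delay_ns):
    """Setup slack after delaying only the capture clock by the given amount."""
    return slack_ns + added_capture_delay_ns

original_slack = -0.1261  # example failing path, in ns

# Smallest capture-clock delay that brings this one path to zero slack.
needed = max(0.0, -original_slack)
print(f"extra capture delay needed: {needed:.4f} ns")

# Illustrative added delays (ns), not measured values.
for added in (0.05, 0.12, 0.20):
    new_slack = setup_slack_after_skew(original_slack, added)
    print(f"added {added:.2f} ns -> slack {new_slack:+.4f} ns")
```

The gain is not free: paths launched from the delayed flip flops lose the same amount of setup slack, and hold checks at the delayed flip flops become harder to meet.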

9
Skew Groups
•Skew groups were defined before clock tree synthesis
•The following commands were used before clock_opt to create a skew group:
set_skew_group -name <name> -target_skew <skew> <pins list>
report_skew_group -name <name>
commit_skew_group
•The pins list in the example design included the clock pins of about 8000 flip flops
•Target skews tried: 50 ps, 120 ps, 200 ps, 240 ps, and 300 ps
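
One way to organize such a sweep is to generate the three commands for each trial from a small script. The Python sketch below only formats the commands quoted on this slide. It rests on several assumptions that are not in the deck: the group names (`sg_50ps` and so on) and the `$skew_pins` placeholder are hypothetical, and `-target_skew` is assumed to take its value in ns, which depends on the time unit the tool is using.

```python
# Target skews tried in the deck, in ps.
TARGET_SKEWS_PS = [50, 120, 200, 240, 300]

def skew_group_commands(name, target_skew_ps, pins="$skew_pins"):
    """Return the three commands from the slide for one trial run.

    Assumption: -target_skew is given in ns; check the tool's time unit.
    `pins` is a placeholder for the collection of flip flop clock pins.
    """
    target_ns = target_skew_ps / 1000
    return [
        f"set_skew_group -name {name} -target_skew {target_ns:.3f} {pins}",
        f"report_skew_group -name {name}",
        "commit_skew_group",
    ]

for ps in TARGET_SKEWS_PS:
    # Each trial is a separate run, issued before clock_opt.
    print(f"# trial: target skew {ps} ps")
    print("\n".join(skew_group_commands(f"sg_{ps}ps", ps)))
```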

10
Skew Groups
Effective Skew vs. Target Skew
(Chart: effective skew (ns) against target skew (ns); series: Clock Opt Effective Skew, Route Opt Effective Skew, Post Route Effective Skew)

11
Skew Groups
Setup Timing Performance
(Chart: Negative Slack vs. Effective Skew; negative slack (ns) against effective skew (ns); series: WNS, TNS)
(Chart: Failing Paths vs. Effective Skew; number of failing paths against effective skew (ns))

12
Skew Groups
Hold Timing Performance
(Chart: Failing Hold Paths vs. Effective Skew; number of failing paths against effective skew (ns))
(Chart: Negative Hold Slack vs. Effective Skew; negative slack (ns) against effective skew (ns); series: Worst Hold, Total Hold)

13
Skew Groups
Path Skew Distribution
(Chart: Cumulative Distribution of Path Skew Among Skew Group Flip Flops; cumulative number of flops against skew of individual path (ns); one curve each for effective skew 0.005 ns, 0.085 ns, 0.121 ns, and 0.138 ns)

14
Skew Groups
Effects on Clock Tree
•Using skew groups causes the clock tree to branch out at an early level
•The TCAMs and the pipeline flip flops had zero common path pessimism removed
•More complex clock tree, more cells and routing

15
Skew Groups
Clock Tree Cells and Buffer Area
(Chart: Clock Tree vs. Target Skew; buffer area (µm²) and number of clock cells against target skew (ns) for Control, 0.05, 0.12, 0.2, 0.24, and 0.3; series: Buffer Area, Clock Cells)
•Increased clock tree size by about 250 cells

16
Skew Groups
Power Consumption
(Chart: Power Increase vs. Target Skew; increase in total power (%) and increase in clock tree power (%) against target skew (ns) for 0.05, 0.12, 0.2, 0.24, and 0.3; series: Percent Total Power Increase, Percent Clock Tree Power Increase)
•On average, increase by 5.16% in clock tree and 0.66% in total block power consumption

17
Manual Buffer Insertion to Achieve Useful Skew
(Diagram: clk_core drives the TCAMs and the pipeline flip flops, with the target skew marked)
•The instinctive way of inserting delay is to manually insert clock buffers:
insert_buffer -no_of_cells <num buffers> <pins list> <buffer type>
•The target skew is determined by the number and type of buffers, not by a numerical value

18
Manual Buffer Insertion
•Clock buffers were inserted right before clock tree routing
•Two buffers of low drive strength were used. Each buffer added about 40 ps of delay
•The pins list in the example design included the clock pins of the same ~8000 flip flops
•The clock buffer insertion resulted in a “Post Route Effective Skew” of about 0.084 ns
•The TCAMs and the flip flops had on average 38 ps of common path pessimism removed
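
A quick sanity check on these numbers is to compare a first-order estimate (number of buffers times delay per buffer) with the reported effective skew. The sketch below is an illustration only: the variable names are invented, and the estimate deliberately ignores wire delay and load-dependent buffer delay.

```python
buffers_inserted = 2
delay_per_buffer_ps = 40          # approximate per-buffer delay from the slide
measured_effective_skew_ps = 84   # "Post Route Effective Skew" of about 0.084 ns

# First-order estimate: delays of the inserted buffers simply add up.
estimated_skew_ps = buffers_inserted * delay_per_buffer_ps
difference_ps = measured_effective_skew_ps - estimated_skew_ps

print(f"estimated {estimated_skew_ps} ps, "
      f"measured {measured_effective_skew_ps} ps, "
      f"difference {difference_ps} ps")
```

The two figures agree to within a few picoseconds, which is consistent with the slide's point that the achieved skew is set by the number and type of buffers rather than by a numerical target.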

19
Manual Buffer Insertion
Setup Timing Performance
(Chart: Negative Slack vs. Effective Skew; negative slack (ns) against effective skew (ns); series: WNS, WNS (clkbuf), TNS, TNS (clkbuf))
(Chart: Failing Paths vs. Effective Skew; number of failing paths against effective skew (ns); series: Failing Paths, Failing Paths (clkbuf))

20
Manual Buffer Insertion
Hold Timing Performance
(Chart: Failing Hold Paths vs. Effective Skew; number of failing paths against effective skew (ns); series: Failing Paths, Failing Paths (clkbuf))
(Chart: Negative Hold Slack vs. Effective Skew; negative slack (ns) against effective skew (ns); series: Worst Hold, Worst Hold (clkbuf), Total Hold, Total Hold (clkbuf))

21
Manual Buffer Insertion
Path Skew Distribution
(Chart: Cumulative Distribution of Path Skew Among Skew Group Flip Flops; cumulative number of flops against path skew (ns); one curve each for effective skew 0.005 ns, 0.085 ns, 0.121 ns, and 0.138 ns, plus one curve for clkbuf)

22
Manual Buffer Insertion
Power Consumption
•Buffer insertion resulted in about 22000 clock cells, dramatically increasing power
(Chart: Power Increase vs. Target Skew; increase in total power (%) and increase in clock tree power (%) against target skew (ns) for 0.05, 0.12, 0.2, 0.24, 0.3, and clkbuf; series: Percent Total Power Increase, Percent Clock Tree Power Increase)

23
Conclusions
•Both methods are easy to set up in IC Compiler

•Skew groups:
–Easy to specify target skew
–Results in smaller increase in cells, power, and area

•Manual buffer insertion:
–Relies on past experience for buffer selection
–Results in larger increase in cells, power, and area

Questions?