Understanding CTS Log Messages

63 slides, Oct 17, 2012

About This Presentation

Understanding clock tree synthesis messages


Slide Content

Understanding Clock Tree Synthesis Log Messages
© Synopsys 2012 1

Agenda
• Prerequisites for Clock Tree Synthesis
• Enabling Useful Debug Messages in IC Compiler Clock Tree Synthesis
• Clock Tree Synthesis Log Messages
• Clock Tree Optimization Log Messages


Prerequisite 1: Run the check_clock_tree Command
• Run the check_clock_tree command prior to clock tree synthesis, and fix the issues it reports.
• This command checks the following and reports issues that can lead to bad QoR:
– Clock tree structure
– Constraints
– Clock tree exceptions

Prerequisite 2: Ensure Placement Legality
• For clock tree synthesis to proceed without errors, the design must be legally placed.
• Use the check_legality command to check whether the design is properly placed and legalized prior to CTS.
• In case of legality issues, use the legalize_placement command to resolve them.
Note: In general, clock tree synthesis aborts if there are placement legality issues.
• In some cases, such as overlapping standard cells, it may still proceed and issue a warning during placement legality checking, but continuing with placement legality issues may lead to bad QoR:
Warning: Some cells in the design are not legal. (CTS-242)

Default Constraints
• The default constraints that clock tree synthesis uses are as follows:
– Maximum transition time: 0.5ns
– Maximum capacitance: 0.6pF
– Maximum fanout: 2000

Design Rule Constraints
In addition to the clock tree design rule constraint values specified using set_clock_tree_options, IC Compiler also considers the design rule constraint values from the logic library and the design.

The following table summarizes how IC Compiler determines the design rule constraint values used during the design rule fixing stage of clock tree synthesis and optimization.

Case 1: Default behavior
(cts_use_lib_max_fanout = false, cts_use_sdc_max_fanout = false, cts_force_user_constraints = false)
• Maximum capacitance: the minimum value from:
– The set_clock_tree_options value or the CTS default value (0.6pF)
– The logic library
– The SDC constraints
• Maximum transition time: the minimum value from:
– The set_clock_tree_options value or the CTS default value (0.5ns)
– The logic library
– The SDC constraints
• Maximum fanout: the value set using set_clock_tree_options

Case 2: Use library and SDC settings for maximum fanout
(cts_use_lib_max_fanout = true, cts_use_sdc_max_fanout = true, cts_force_user_constraints = false)
• Maximum capacitance: same as Case 1
• Maximum transition time: same as Case 1
• Maximum fanout: the minimum value from:
– The value set using set_clock_tree_options
– The logic library
– The SDC constraints

Case 3: Use only user settings for clock tree synthesis and clock tree optimization
(cts_force_user_constraints = true)
• Maximum capacitance: the value set using set_clock_tree_options
• Maximum transition time: the value set using set_clock_tree_options
• Maximum fanout: the value set using set_clock_tree_options
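The selection rules in the three cases above can be modeled as a short Python function. This is an illustrative sketch, not IC Compiler code: the function name and arguments are hypothetical, values are plain numbers in library units (ns and pF assumed for the defaults), None stands for a value that is not defined in the library or SDC, and an unset set_clock_tree_options value is assumed to fall back to the CTS default. Treating cts_use_lib_max_fanout and cts_use_sdc_max_fanout independently is also an assumption; the table only shows them set together.

```python
from typing import Optional

# CTS defaults from the "Default Constraints" slide (assumed library units: ns, pF).
CTS_DEFAULTS = {"max_transition": 0.5, "max_capacitance": 0.6, "max_fanout": 2000}


def effective_drc(
    rule: str,
    user: Optional[float],
    lib: Optional[float],
    sdc: Optional[float],
    use_lib_max_fanout: bool = False,
    use_sdc_max_fanout: bool = False,
    force_user_constraints: bool = False,
) -> float:
    """Hypothetical model of the design rule value CTS uses (Cases 1 to 3)."""
    # Assumption: an unset set_clock_tree_options value falls back to the CTS default.
    user_or_default = user if user is not None else CTS_DEFAULTS[rule]

    # Case 3: cts_force_user_constraints = true, so only the user setting counts.
    if force_user_constraints:
        return user_or_default

    if rule == "max_fanout":
        # Case 1 uses the user value alone; Case 2 also considers library and SDC.
        candidates = [user_or_default]
        if use_lib_max_fanout and lib is not None:
            candidates.append(lib)
        if use_sdc_max_fanout and sdc is not None:
            candidates.append(sdc)
        return min(candidates)

    # Max transition / capacitance (Cases 1 and 2): the most restrictive defined value.
    return min(v for v in (user_or_default, lib, sdc) if v is not None)
```

For example, a user max transition of 0.100 and an SDC value of 0.050 resolve to 0.050 under the default behavior, but to 0.100 when cts_force_user_constraints is true.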
Constraints Specified Using the set_clock_tree_options Command
• Library units are used for time and capacitance values specified by using the set_clock_tree_options command.
• The smallest values accepted for the -max_capacitance and -max_transition options of the set_clock_tree_options command are 1fF and 1ps, respectively.
• For example, if the library units are pF and ps, and you specify the following command, IC Compiler issues an error:
icc_shell> set_clock_tree_options -max_cap 0.0009 -max_tran 0.300
Error: User max_cap constraint (0.900000 fF) is too small. (CTS-206)
Error: User max_tran constraint (0.300000 ps) is too small. (CTS-207)
– IC Compiler does not accept these small values, and uses the previously specified values or the default values for maximum capacitance and maximum transition during clock tree synthesis.
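The minimum-value rule is a unit conversion followed by a comparison. The sketch below reproduces it in Python so that candidate values can be screened before they reach icc_shell; the helper name and the unit tables are assumptions (only fF/pF and ps/ns are covered), and the messages merely imitate the CTS-206/CTS-207 errors shown above.

```python
# Smallest values accepted by set_clock_tree_options, per the text above.
MIN_CAP_FF = 1.0   # 1 fF
MIN_TRAN_PS = 1.0  # 1 ps

# Assumed conversion factors from library units to fF and ps (extend as needed).
CAP_TO_FF = {"fF": 1.0, "pF": 1000.0}
TIME_TO_PS = {"ps": 1.0, "ns": 1000.0}


def check_cts_option_limits(max_cap, cap_unit, max_tran, time_unit):
    """Return error strings for values below the accepted minimums.

    max_cap and max_tran are given in library units, as set_clock_tree_options
    interprets them. The wording only imitates CTS-206 / CTS-207.
    """
    errors = []
    cap_ff = max_cap * CAP_TO_FF[cap_unit]
    tran_ps = max_tran * TIME_TO_PS[time_unit]
    if cap_ff < MIN_CAP_FF:
        errors.append(f"User max_cap constraint ({cap_ff:.6f} fF) is too small. (CTS-206)")
    if tran_ps < MIN_TRAN_PS:
        errors.append(f"User max_tran constraint ({tran_ps:.6f} ps) is too small. (CTS-207)")
    return errors
```

Running it on the example values (0.0009 with pF, 0.300 with ps) flags both options, while 0.05pF and 0.3ns pass.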


Enabling Debug Messages
• To enable clock tree synthesis debug messages in IC Compiler, use:
set cts_use_debug_mode true
• Many of the messages discussed in this presentation are available only when you enable the debug mode.


Messages in the compile_clock_tree Command Log
• Before clock tree synthesis:
– Design update
– Buffer and inverter information
– Clock tree constraints
– Clock structure before clock tree synthesis
• During clock tree synthesis:
– Clustering
– Meeting target early delay
– Gate level clock tree synthesis results
• After clock tree synthesis:
– Summary report
– Embedded clock tree optimization
– DRC fixing beyond exceptions
– Placement legalization

Overview of the compile_clock_tree Command Log
[Prelude]
START_CMD: compile_clock_tree CPU: 55 s ( 0.02 hr) ELAPSE: 288 s ( 0.08 hr) MEM-PEAK: 203 Mb Wed Dec 28 22:33:54 2011 (PSYN-508)
CTS: CTS Operating Condition(s): MAX(Worst)
START_FUNC: prelude CPU: 55 s ( 0.02 hr) ELAPSE: 288 s ( 0.08 hr) MEM-PEAK: 203 Mb Wed Dec 28 22:33:54 2011 (PSYN-508)
Loading design 'ORCA_TOP'
Information: Design Library and main library capacitance units are matched - 1.000 pf.
END_FUNC: prelude CPU: 56 s ( 0.02 hr) ELAPSE: 288 s ( 0.08 hr) MEM-PEAK: 203 Mb Wed Dec 28 22:33:54 2011 (PSYN-508)
[Extraction related messages]
****************************************************************
Information: TLUPlus based RC computation is enabled. (RCEX-141)
****************************************************************
Information: The distance unit in Capacitance and Resistance is 1 micron. (RCEX-007)
Information: The RC model used is TLU+. (RCEX-015)
…
CTS: Blockage Aware Algorithm
CTS: Marking Ignore Pins....
…
Warning: too small maximum transition (=0.300000) defined at library cell dl02d4. (CTS-619)
[Buffer characterization]
CTS: buffer estimated skew target delay driving res input cap
CTS: invbdk [0.009 0.010] [0.043 0.058] [0.197 0.213] [0.059 0.059]
...
CTS: Prepare sources for clock domain SD_DDR_CLK
CTS: Prepare sources for clock domain SDRAM_CLK
CTS: Prepare sources for clock domain SYS_2x_CLK
CTS: Region Aware Algorithm is automatically turned off when design has no region or only has one region.
CTS: Info: Found net sys_2x_clk, on cell I_RISC_CORE/I_REG_FILE/REG_FILE_B_RAM is macro. Will not treat as pad.
…
clean drc fixing cell first...
In all, 0 drc fixing cell(s) are cleaned
In all, 0 drc fixing cell(s) beyond exception pins are cleaned
…
CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_8/S is implicit ignore
CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_9/S is implicit ignore
…
CTS: I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_11/S is implicit ignore
Warning: Ignore net sd_CK since it has no synchronous pins. (CTS-231)
CTS: Info: will use target transition value for initial CTS stages
[Pruning of buffers and inverters]
Pruning library cells (r/f, pwr)
Min drive = 0.000372606.
Final pruned buffer set (7 buffers):
bufbd1
…
CTDN lib estimation: buffers should result in better clock power.
CTS: BA: Net 'sdram_clk'
[Reporting global clock tree constraints]
CTS: Starting clock tree synthesis ...
CTS: Conditions = worst(1)
CTS: Global design rule constraints [rise fall]
CTS: max transition = worst[0.300 0.300] GUI = worst[0.300 0.300] SDC = undefined/ignored
Information: Removing clock transition on clock PCI_CLK ... (CTS-103)
[Clock tree synthesis]
CTS: gate level 1 clock tree synthesis
CTS: clock net = sdram_clk
CTS: gate level 1 clock tree synthesis results
CTS: clock net : sdram_clk
…
CTS: Clock tree synthesis completed successfully
CTS: CPU time: 18 seconds
[Reporting the results of clock tree synthesis]
CTS: Reporting clock tree violations ...
…
CTS: ------------------------------------------------
CTS: Clock Tree Synthesis Summary
CTS: ------------------------------------------------
…
[Embedded clock tree optimization]
CTS: Starting block level clock tree optimization
…
CTS: gate level 1 clock tree optimization
CTS: clock net = pclk
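When reviewing a long compile_clock_tree log, it can help to split it into the phases labeled above. The Python sketch below keys on marker lines that appear in this excerpt (START_FUNC/END_FUNC: prelude, CTS: Starting clock tree synthesis, CTS: Clock tree synthesis completed successfully, CTS: Starting block level clock tree optimization); the phase names and the function are our own, and each marker is assumed to appear once and in this order.

```python
# Marker lines copied from the log excerpt above. The phase names are our own
# labels, and each marker is assumed to appear once, in this order.
PHASE_MARKERS = [
    ("START_FUNC: prelude", "prelude"),
    ("CTS: Starting clock tree synthesis", "cts"),
    ("CTS: Clock tree synthesis completed successfully", "cts_results"),
    ("CTS: Starting block level clock tree optimization", "cto"),
]


def split_phases(lines):
    """Group compile_clock_tree log lines by phase; a marker line opens its phase."""
    phases = {"start": []}
    current = "start"
    for line in lines:
        text = line.strip()
        for marker, phase in PHASE_MARKERS:
            if text.startswith(marker):
                current = phase
                phases.setdefault(current, [])
                break
        phases[current].append(text)
        # END_FUNC closes the prelude; what follows is setup before synthesis.
        if text.startswith("END_FUNC: prelude"):
            current = "before_cts"
            phases.setdefault(current, [])
    return phases
```

Each phase can then be searched on its own, for example looking for warnings only in "before_cts" (extraction, characterization, and pruning).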

Gate Upsizing During Clock Tree Synthesis
• The compile_clock_tree command upsizes all the preexisting cells in the clock tree before building the clock tree:
Information: Replaced the library cell of sys_ctl/sunburst_clk_mux_div1/clk_buf from bufbd4 to bufbdf. (CTS-152)
• In the previous example, the preexisting gate sys_ctl/sunburst_clk_mux_div1/clk_buf is upsized from a bufbd4 to a bufbdf.
• This upsizing helps reduce the number of buffer levels needed to build the clock tree, thereby reducing the buffer count.
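Because each upsized gate produces a CTS-152 message, one way to review them is to scan the log. A minimal Python sketch follows, assuming the message wording matches the example above exactly; the pattern and function name are ours, not part of IC Compiler.

```python
import re

# CTS-152 as shown above; the trailing period before "(CTS-152)" is optional.
CTS152 = re.compile(
    r"Replaced the library cell of (?P<inst>\S+) from (?P<old>\S+) "
    r"to (?P<new>\S+?)\.?\s*\(CTS-152\)"
)


def upsized_gates(log_text):
    """Return (instance, old_cell, new_cell) for each CTS-152 message."""
    return [(m["inst"], m["old"], m["new"]) for m in CTS152.finditer(log_text)]
```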

Maximum Capacitance and Transition Related Warnings
• Even if the set_clock_tree_options command does not issue any errors when you set the maximum capacitance and transition constraints, the compile_clock_tree command can issue warnings if the values are too small:
Warning: too small maximum transition (=0.050000) defined at pin instCLK1GC1/Q. (CTS-620)
Warning: too small maximum capacitance (=0.050000) defined at pin instCLK1GC1/Q. (CTS-620)
Warning: too small maximum transition (=0.050000) defined at library cell bufbdk. (CTS-619)
• In this example, a max transition of 50ps and a max capacitance of 50fF are too tight for the pin instCLK1GC1/Q.
• Tight constraints can cause clock tree synthesis to use an excessive number of buffers to build the clock trees.
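To find every pin or library cell with an overly tight limit, the CTS-619 and CTS-620 warnings can be collected in one pass. This Python sketch assumes the exact wording shown above; the values stay in library units (the slide's reading of 0.050000 as 50ps and 50fF implies ns and pF), and the helper name is hypothetical.

```python
import re

TOO_SMALL = re.compile(
    r"too small maximum (?P<kind>transition|capacitance) \(=(?P<value>[\d.]+)\) "
    r"defined at (?P<where>pin|library cell) (?P<name>\S+?)\.?\s*\((?P<id>CTS-6(?:19|20))\)"
)


def tight_constraints(log_text):
    """Collect (id, kind, value, location type, name) from CTS-619/620 warnings."""
    return [
        (m["id"], m["kind"], float(m["value"]), m["where"], m["name"])
        for m in TOO_SMALL.finditer(log_text)
    ]
```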

Buffers and Inverters Used During Clock Tree Synthesis
• Before synthesizing the clock tree, IC Compiler characterizes each buffer and inverter. To see the characterization details, set the following variable to true:
set cts_do_characterization true
• After characterization is done, the characterized values for each buffer and inverter are reported:
CTS: buffer estimated skew target delay driving res input cap
CTS: bufbdf [0.013 0.015] [0.217 0.200] [0.210 0.248] [0.007 0.007]
CTS: inv0da [0.018 0.021] [0.097 0.119] [0.294 0.347] [0.036 0.036]
CTS: bufbd7 [0.025 0.030] [0.223 0.234] [0.415 0.503] [0.008 0.008]
CTS: bufbd4 [0.047 0.053] [0.347 0.357] [0.786 0.880] [0.004 0.004]
• bufbdf, bufbd7, and bufbd4 are buffers, and inv0da is an inverter. Each bracketed pair gives the rise value followed by the fall value.
• Driving resistance determines the drive strength of the buffer or inverter: the smaller the driving resistance, the greater the drive strength.
• In the previous example, bufbdf is the buffer with the highest drive strength.
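The characterization table is regular enough to parse. The following Python sketch, with hypothetical helper names, turns each CTS: row into the four [rise fall] pairs named in the header and picks the cell with the lowest driving resistance (using the worse of rise and fall), which by the rule above is the strongest driver.

```python
import re

# One table row: cell name followed by four [rise fall] pairs
# (estimated skew, target delay, driving res, input cap).
ROW = re.compile(r"^CTS:\s+(\S+)((?:\s+\[[\d.]+ [\d.]+\]){4})\s*$")
PAIR = re.compile(r"\[([\d.]+) ([\d.]+)\]")
COLUMNS = ("estimated_skew", "target_delay", "driving_res", "input_cap")


def parse_characterization(lines):
    """Return {cell: {column: (rise, fall)}}; the header and other lines are skipped."""
    table = {}
    for line in lines:
        m = ROW.match(line.strip())
        if not m:
            continue
        pairs = [(float(r), float(f)) for r, f in PAIR.findall(m.group(2))]
        table[m.group(1)] = dict(zip(COLUMNS, pairs))
    return table


def strongest_driver(table):
    """Lowest driving resistance means highest drive; compare the worse of rise/fall."""
    return min(table, key=lambda cell: max(table[cell]["driving_res"]))
```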

Unbalanced Buffers
• Buffers and inverters that have a large difference between their rise and fall delays (referred to as the rise/fall delay skew) are reported:
CTS: inverter inv0da: rise/fall delay skew = 0.204816 (> 0.200000)
• Remove unbalanced buffers from the buffer list specified for clock tree synthesis, as they might cause bad skew.
• Use the set_clock_tree_references command to specify the buffers and inverters that should be used for clock tree synthesis.
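Before calling set_clock_tree_references, the reported cells can be filtered out of a candidate list. A minimal Python sketch with hypothetical helper names; it only handles the message format above, and accepting a "buffer" form of the message is an assumption, since only the inverter form is shown.

```python
import re

# Only the "inverter" form appears in the example; "buffer" is an assumption.
UNBALANCED = re.compile(
    r"CTS: (?:buffer|inverter) (?P<cell>\S+): rise/fall delay skew = "
    r"(?P<skew>[\d.]+) \(> [\d.]+\)"
)


def unbalanced_cells(log_text):
    """Return {cell: rise/fall delay skew} for each cell reported as unbalanced."""
    return {m["cell"]: float(m["skew"]) for m in UNBALANCED.finditer(log_text)}


def drop_unbalanced(candidates, log_text):
    """Remove reported cells from a candidate clock buffer/inverter list."""
    bad = unbalanced_cells(log_text)
    return [cell for cell in candidates if cell not in bad]
```

The filtered list is what you would then pass to set_clock_tree_references.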

Pruning of Buffers and Inverters
• Pruning is the process by which IC Compiler selects the buffers and inverters that are best suited for clock tree synthesis, based on the buffer and inverter characterization, and prevents the remaining ones from being used.
• IC Compiler prunes the buffers and inverters based on drive strength and power:
Pruning library cells (r/f, pwr)
Min drive = 0.264263.
Pruning inv0d0 because drive of 0.149845 is less than 0.264263.
Pruning inv0d2 because it is (w/ power-considered) inferior to invbd2.
• IC Compiler calculates a minimum drive value based on heuristics. Buffers and inverters whose drive strength is less than the minimum drive value are considered weak drivers and are pruned by IC Compiler.
• It is not possible to override the default pruning process.
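Since pruning cannot be overridden, it is mostly useful to know why a cell disappeared from the buffer set. This Python sketch maps each pruned cell to its reason, assuming the two message forms above are the only ones; the function name is hypothetical.

```python
import re

WEAK = re.compile(
    r"Pruning (?P<cell>\S+) because drive of (?P<drive>[\d.]+) "
    r"is less than (?P<min>[\d.]+)\."
)
POWER = re.compile(
    r"Pruning (?P<cell>\S+) because it is \(w/ power-considered\) "
    r"inferior to (?P<better>\S+?)\.?$"
)


def pruning_reasons(lines):
    """Map each pruned library cell to the reason given in the log."""
    reasons = {}
    for line in lines:
        text = line.strip()
        weak = WEAK.match(text)
        power = POWER.match(text)
        if weak:
            reasons[weak["cell"]] = f"weak driver ({weak['drive']} < {weak['min']})"
        elif power:
            reasons[power["cell"]] = f"power: inferior to {power['better']}"
    return reasons
```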

Maximum Transition, Maximum Capacitance and Timing Constraints
Before clock tree synthesis begins, all the global clock tree constraints are reported in the log in the following format:
CTS: Global design rule constraints [rise fall]
CTS: max transition = worst[0.050 0.050] GUI = worst[0.100 0.100] SDC = worst[0.050 0.050]
CTS: max capacitance = worst[0.600 0.600] GUI = worst[0.600 0.600] SDC = undefined/ignored
CTS: max fanout = 2000 GUI = 2000 SDC = undefined/ignored
CTS: Global timing/clock tree constraints
CTS: clock skew = worst[0.100]
CTS: insertion delay = worst[2.000]
CTS: levels per net = 200
• The first value on each design rule line (for example, worst[0.050 0.050]) is the value used by CTS.
• GUI: the default value or the value set using set_clock_tree_options.
• SDC: the value from the SDC constraints. "undefined" means no value is specified in SDC; "ignored" means the value from SDC is ignored because the cts_force_user_constraints variable is set to true.
• clock skew and insertion delay: the skew and insertion delay targets, set using the set_clock_tree_options command.
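The report can be parsed to confirm which source won. In this Python sketch (hypothetical helper names), each design rule line is split into its used, GUI, and SDC values; the final check only asserts that the used value is no looser than GUI or SDC, because library values are not printed here and may be the tighter source under the minimum-value rule described earlier.

```python
import re

RULE = re.compile(
    r"CTS: max (?P<rule>transition|capacitance|fanout) = (?P<used>.+?) "
    r"GUI = (?P<gui>.+?) SDC = (?P<sdc>.+)$"
)


def _values(field):
    """'worst[0.050 0.050]' -> (0.05, 0.05); '2000' -> (2000.0,); 'undefined/ignored' -> None."""
    field = field.strip()
    if field.startswith(("undefined", "ignored")):
        return None
    if "[" in field:
        field = field[field.index("[") + 1 : field.rindex("]")]
    return tuple(float(v) for v in field.split())


def parse_global_rules(lines):
    """Return {rule: {'used': ..., 'gui': ..., 'sdc': ...}} from the global report."""
    rules = {}
    for line in lines:
        m = RULE.search(line.strip())
        if m:
            rules[m["rule"]] = {k: _values(m[k]) for k in ("used", "gui", "sdc")}
    return rules


def used_within_sources(rule):
    """True if the used value is no looser than GUI and (when defined) SDC."""
    sources = [s for s in (rule["gui"], rule["sdc"]) if s is not None]
    return all(u <= s[i] for s in sources for i, u in enumerate(rule["used"]))
```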

Clock Tree Synthesis Target Specifications
• Target specifications are internal targets for clock tree synthesis and are not guaranteed to be met; only the constraints are guaranteed to be achieved.
CTS: Global target spec [rise fall]
CTS: transition = worst[0.250 0.250]
CTS: capacitance = worst[0.300 0.300]
CTS: fanout = 32
• The target fanout value (32) is not considered by CTS.
• Target specifications are derived as follows:
– maxTransSpec = Min(0.25, 80% of the max_transition constraint)
– maxCapSpec = Min(0.30, 80% of the max_capacitance constraint)
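The two formulas can be checked against the logged values. This Python sketch is a direct transcription of maxTransSpec and maxCapSpec; the function name is hypothetical and inputs are in library units.

```python
def target_specs(max_transition, max_capacitance):
    """Internal CTS targets from the formulas above (inputs in library units)."""
    max_trans_spec = min(0.25, 0.8 * max_transition)
    max_cap_spec = min(0.30, 0.8 * max_capacitance)
    return max_trans_spec, max_cap_spec
```

With max_transition = 0.5 and max_capacitance = 0.6 (the CTS defaults) this gives 0.25 and 0.30, the global target spec shown above; with both constraints at 0.300 it gives 0.240, matching the gate level 2 target spec later in this presentation.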

Preexisting Clock Tree Information in the Log File
• Before starting to build the clock tree, the preexisting clock tree structure is printed in the log file:
CTS: Design infomation
CTS: total gate levels = 8
CTS: Root clock net CLK2
CTS: clock gate levels = 2
CTS: clock sink pins = 4
CTS: level 2: gates = 1
CTS: level 1: gates = 1
CTS: Buffer/Inverter list for CTS for clock net CLK2:
CTS: invbdk
CTS: bufbdk
...
CTS: Root clock net CLK1
CTS: clock gate levels = 8
CTS: clock sink pins = 8431
CTS: level 8: gates = 2
CTS: level 7: gates = 3
CTS: level 6: gates = 4
CTS: level 5: gates = 3
CTS: level 4: gates = 1
CTS: level 3: gates = 5
CTS: level 2: gates = 4
CTS: level 1: gates = 1
CTS: Buffer/Inverter list for CTS for clock net CLK1:
CTS: invbdk
CTS: bufbdk
...
• total gate levels: the maximum number of gate levels available
• clock gate levels: the number of gate levels for that clock (for example, 2 for CLK2)
• clock sink pins: the number of sinks
• level N: gates = M: the existing gate levels and the number of gates at each level, listed from the flip-flops towards the clock source
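For designs with many clocks, the structure report can be summarized per root clock net. A minimal Python sketch, assuming the line formats above; the function name and output layout are hypothetical.

```python
import re

ROOT = re.compile(r"CTS: Root clock net (\S+)")
SINKS = re.compile(r"CTS: clock sink pins = (\d+)")
LEVEL = re.compile(r"CTS: level (\d+): gates = (\d+)")


def preexisting_structure(lines):
    """Summarize each root clock net: sink pin count and gates per gate level."""
    clocks = {}
    current = None
    for line in lines:
        text = line.strip()
        root = ROOT.match(text)
        if root:
            current = clocks.setdefault(root.group(1), {"sinks": 0, "levels": {}})
            continue
        if current is None:
            continue  # lines before the first root clock net are ignored
        sinks = SINKS.match(text)
        level = LEVEL.match(text)
        if sinks:
            current["sinks"] = int(sinks.group(1))
        elif level:
            current["levels"][int(level.group(1))] = int(level.group(2))
    return clocks
```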

Real Gates and Guide Buffers
• You may see the term real gates in the preexisting clock tree structure information section:
CTS: Root clock net CLK1
CTS: clock gate levels = 16
CTS: clock sink pins = 70644
...
CTS: level 13: gates = 14 (real gates= 4)
CTS: level 12: gates = 111 (real gates = 101)
CTS: level 11: gates = 146 (real gates = 136)
CTS: level 10: gates = 2488 (real gates = 2478)
• Real gates are preexisting gates in the clock tree; they are not gates added by the tool.
• Guide buffers are buffers or inverters that the tool inserts before it begins to build the tree. They are intended to help clock tree synthesis build a better clock tree.
• The number of guide buffers inserted at each level is the difference between gates and real gates.
– In the above example, the tool has added 10 guide buffers at each of the clock tree levels shown.
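The gates-minus-real-gates arithmetic is easy to automate. In this Python sketch (hypothetical function name) the pattern tolerates the inconsistent spacing around "=" seen in the example.

```python
import re

# Spacing around "=" differs between lines ("real gates= 4" vs "real gates = 101").
LEVEL_REAL = re.compile(r"CTS: level (\d+): gates = (\d+) \(real gates ?= ?(\d+)\)")


def guide_buffers_per_level(lines):
    """Guide buffers at each gate level = gates - real gates."""
    result = {}
    for line in lines:
        m = LEVEL_REAL.search(line)
        if m:
            level, gates, real = (int(g) for g in m.groups())
            result[level] = gates - real
    return result
```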

Buffers and Inverters Used
• Before it begins to build the clock tree, the tool lists all the buffers and inverters it will use to build the tree:
CTS: Buffer/Inverter list for CTS for clock net sdram_clk:
CTS: CLKBUFX20
CTS: CLKBUFX16
CTS: CLKBUFX12
[CTS uses this list]
CTS: Buffer/Inverter LEQ cell list for Boundary Cell for clock net sdram_clk:
CTS: CLKBUFX20
CTS: CLKBUFX16
CTS: CLKINVX8
[CTS uses this list for inserting boundary cells]
CTS: Buffer/Inverter LEQ cell list for CTO for clock net sdram_clk:
CTS: CLKBUFX20
CTS: CLKBUFX16
CTS: CLKINVX8
[CTO uses this list for sizing]
CTS: Buffer/Inverter list for DelayInsertion for clock net sdram_clk:
CTS: CLKBUFX20
CTS: CLKBUFX16
CTS: CLKINVX8
[CTO uses this list for delay insertion]
• You can change the buffer and inverter list by using the following command:
set_clock_tree_references
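To confirm which cells each stage will use for a given clock net, the four lists can be collected by purpose. A Python sketch under the assumption that cell names are single words and that any other line ends a list; the function name is hypothetical.

```python
import re

HEADER = re.compile(
    r"CTS: Buffer/Inverter (?:LEQ cell )?list for (?P<purpose>.+?) "
    r"for clock net (?P<net>\S+):"
)
CELL = re.compile(r"CTS: (\w+)$")


def reference_lists(lines):
    """Return {(net, purpose): [cells]} for each Buffer/Inverter list in the log."""
    lists, key = {}, None
    for line in lines:
        text = line.strip()
        header = HEADER.match(text)
        if header:
            key = (header["net"], header["purpose"])
            lists[key] = []
            continue
        cell = CELL.match(text)
        if key is not None and cell:
            lists[key].append(cell.group(1))
        else:
            key = None  # any other line ends the current list
    return lists
```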

Clock Tree Synthesis Removes User-Specified Ideal Attributes on Clocks
• Synthesized clocks are set to be propagated, and clock transition, which is an attribute of an ideal clock, is removed:
CTS: Information: Removing clock transition on clock SP0XCLK ... (CTS-103)
CTS: Information: Removing clock transition on clock SP0RCLK ... (CTS-103)
• Latency, another attribute of an ideal clock, is also removed:
CTS: Information: Removing clock latency on pin Idma_scr_wrap0__Idma_scrba0_m2m0_wrap/I_dma_scrba0_m2m0/I_dma@ ... (CTS-098)
• Source latency is removed for generated clocks:
Information: Removing clock source latency on clock CLK1GC1 ... (CTS-289)
• These messages are informational only, and no action is required.

Overlap or Reconvergent Paths
• Overlap or reconvergent paths occur when multiple clocks can drive a node.
• IC Compiler issues warnings about such paths:
Warning: Either the driven net has been synthesized previously or clock path overlaps/reconverges at pin periph/U1852/Y. (CTS-209)
• Such messages should be treated as informational rather than as warnings.
– IC Compiler has no problems handling such situations.
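When triaging a log, it helps to set aside the message IDs that the previous slides describe as informational. In this Python sketch the INFORMATIONAL set is our reading of those slides (CTS-209 is printed as a warning but, per the text, can be treated as informational), and the function name is hypothetical.

```python
import re
from collections import Counter

# Message IDs described above as informational only (our reading of the slides).
INFORMATIONAL = {"CTS-098", "CTS-103", "CTS-209", "CTS-289"}
MSG_ID = re.compile(r"\((CTS-\d+)\)\s*$")


def triage(lines):
    """Count CTS message IDs, split into informational and everything else."""
    info, other = Counter(), Counter()
    for line in lines:
        m = MSG_ID.search(line.strip())
        if m:
            bucket = info if m.group(1) in INFORMATIONAL else other
            bucket[m.group(1)] += 1
    return info, other
```

Anything left in the second counter, such as CTS-242 for placement legality, still deserves a closer look.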

Gate Level-by-Level Clock Tree Synthesis
• Clock tree building is done gate level by gate level, starting from the sinks and moving toward the clock root.
• For each gate level, just before synthesis starts, the following information is printed in the log:
CTS: gate level 2 clock tree synthesis
CTS: clock net = I_BLENDER_1/gclk
CTS: driving pin = I_BLENDER_1/U483/Z
CTS: gate level 2 design rule constraints [rise fall]
CTS: max transition = worst[0.300 0.300]
CTS: max capacitance = worst[0.300 0.300]
CTS: max fanout = 2000
CTS: gate level 2 target spec [rise fall]
CTS: transition = worst[0.240 0.240]
CTS: capacitance = worst[0.240 0.240]
CTS: driver cap. = worst[0.088 0.088]
CTS: fanout = 32
CTS: gate level 2 timing constraints
CTS: clock skew = worst[0.000]
CTS: levels per net = 200
CTS: -----------------------------------------------
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
• The clock net and driving pin lines identify the net and driver at this gate level.

Clustering During Clock Tree Synthesis
• Clock tree building starts with clustering. Clustering is the process of dividing a set of sink pins (fanouts) into groups. Each group is driven by a buffer, and the instances of a cluster are all close to each other.
• The following messages show that 423 sink pins are divided into 27 clusters, each with approximately 423/27 sink pins:
CTS: gate level 2 clock tree synthesis
...
CTS: gate level 2 design rule constraints [rise fall]
CTS: max transition = worst[0.300 0.300]
CTS: max capacitance = worst[0.300 0.300]
CTS: max fanout = 2000
CTS: gate level 2 target spec [rise fall]
CTS: transition = worst[0.240 0.240]
CTS: capacitance = worst[0.240 0.240]
CTS: driver cap. = worst[0.088 0.088]
CTS: fanout = 32
CTS: gate level 2 timing constraints
...
CTS: -----------------------------------------------
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 423 to 27 clustering
CTS: BA: lp (1.520, 0.673): skew (0.149, 0.080) c(1.481, 0.198) viol(n y)
CTS: -----------------------------------------------
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 27 to 4 clustering
CTS: BA: lp (0.673, 0.597): skew (0.080, 0.105) c(0.198, 0.026) viol(n n)
CTS: -----------------------------------------------
• One buffer level is added with each clustering.
• The values in each pair are before and after clustering, respectively; for example, skew (0.149, 0.080) is the skew before and after clustering.
• viol(cap trans) represents the DRCs (capacitance, transition): y means a violation is present, and n means no violation.
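The clustering messages can be turned into a per-step summary: average fanout per cluster, skew before and after, and the decoded viol() flags. A Python sketch with hypothetical names, assuming each "CTS: BA: lp" line belongs to the most recent "Completed" line.

```python
import re

COMPLETED = re.compile(r"CTS: Completed (\d+) to (\d+) clustering")
BA_LINE = re.compile(
    r"CTS: BA: lp \(([\d.]+), ([\d.]+)\): skew \(([\d.]+), ([\d.]+)\) "
    r"c\(([\d.]+), ([\d.]+)\) viol\(([yn]) ([yn])\)"
)


def clustering_steps(lines):
    """One entry per 'Completed N to M clustering' line, with BA results attached."""
    steps = []
    for line in lines:
        text = line.strip()
        done = COMPLETED.match(text)
        if done:
            pins, clusters = int(done.group(1)), int(done.group(2))
            steps.append({"pins": pins, "clusters": clusters, "avg_fanout": pins / clusters})
            continue
        ba = BA_LINE.match(text)
        if ba and steps:
            steps[-1]["skew"] = (float(ba.group(3)), float(ba.group(4)))  # (before, after)
            steps[-1]["cap_violation"] = ba.group(7) == "y"
            steps[-1]["trans_violation"] = ba.group(8) == "y"
    return steps
```

For the first step above, 423 / 27 works out to about 15.7 sink pins per cluster.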

Clustering With Hookup Pins
• Hookup pins are input pins of gates or macros.
• Unlike clock pins of flip-flops and latches (sink pins), hookup pins have a nonzero phase delay that must be balanced with the sink pins.

Clustering With Hookup Pins
• Initially, the tool attempts to cluster hookup pins along with the normal sinks (trial clustering). In this example, there are 479 sinks and 1 hookup pin:
CTS: gate level 1 clock tree synthesis
...
CTS: gate level 1 design rule constraints [rise fall]
CTS: max transition = worst[0.300 0.300]
CTS: max capacitance = worst[0.300 0.300]
CTS: max fanout = 2000
CTS: gate level 1 target spec [rise fall]
CTS: transition = worst[0.240 0.240]
CTS: capacitance = worst[0.240 0.240]
CTS: driver cap. = worst[0.150 0.150]
CTS: fanout = 32
CTS: gate level 1 timing constraints
...
CTS: -----------------------------------------------
[Trial clustering]
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 480 to 34 clustering
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: Completed 34 to 6 clustering
CTS: BA: this delay [max min] (skew) = worst[0.000 0.000] (0.000)
CTS: BA: next delay [max min] (skew) = worst[0.124 0.124] (0.000)
CTS: BA: target cap = 0.070 pf
[Actual clustering]
CTS: Starting clustering for bufbda with target load = worst[0.240 0.240]
CTS: BA: CAC set: target cap = 0.070317: targetWireCap = 0.274866
CTS: Completed 479 to 39 clustering
CTS: BA: lp (1.574, 0.770): skew (0.821, 0.451) c(1.737, 0.269) viol(n y)
CTS: -----------------------------------------------
• At the trial clustering stage, the hookup pin is considered along with the other sink pins, and (479+1) to 34 to 6 clustering is obtained.
• At the actual clustering stage, the tool clusters the 479 sink pins separately from the hookup pin.
Clustering With Hookup Pins: Hookup Pin Clustered With Sinks
• If the trial clustering gives good QoR results, the following messages are displayed:
CTS: BA: lp (1.968, 2.031): skew (0.257, 0.194) c(0.076, 0.072) viol(y y)
CTS: -----------------------------------------------
CTS: Starting clustering for bufbd7 with target load = worst[0.000 0.005]
CTS: BA: rootNetCap = 0.071776: targ cap = 0.045000: targ wirecap = 0.000000: not relaxed
CTS: Completed 2 to 2 clustering
CTS: Starting clustering for bufbd7 with target load = worst[0.000 0.005]
CTS: BA: rootNetCap = 0.071776: targ cap = 0.045000: targ wirecap = 0.000000: not relaxed
CTS: Completed 2 to 1 clustering
CTS: BA: this delay [max min] (skew) = worst[2.040 1.844] (0.196)
CTS: BA: next delay [max min] (skew) = worst[2.161 1.965] (0.196)
CTS: BA: target cap = 0.048 pf
CTS: Pin 1: periph/U5659/A is selected for next level
CTS: delay [max min] (skew) = worst[1.976 1.921] (0.055)
CTS: Starting clustering for bufbd7 with target load = worst[0.000 0.005]
CTS: Completed 2 to 2 clustering
CTS: BA: lp (2.031, 2.153): skew (0.194, 0.210) c(0.072, 0.026) viol(n n)
CTS: -----------------------------------------------
• When the phase delay of the hookup pin periph/U5659/A matches the delay of the already built tree at that gate level, the pin is clustered at that buffer level.

Meeting Target Early Delay
• After the synthesis of the root clock net (gate level 1 synthesis), the tool checks whether the delay constraint set by the user is met.
• If it is not met, the tool inserts buffers at the root clock net to achieve the target delay specified by the user.
• In the following messages, 16 buffers are inserted at the root clock net to increase the delay from 0.569ns to 2ns, which is the user-specified target (insertion delay = worst[2.000]):
CTS: gate level 1 clock tree synthesis
CTS: clock net = sys_clk
CTS: driving pin = sys_clk
CTS: gate level 1 design rule constraints [rise fall]
...
CTS: gate level 1 target spec [rise fall]
...
CTS: gate level 1 timing constraints
CTS: clock skew = worst[0.000]
CTS: insertion delay = worst[2.000]
CTS: levels per net = 200
CTS: -----------------------------------------------
CTS: Starting clustering for CLKBUF_X20 with target load = worst[0.211 0.270]
...
CTS: -----------------------------------------------
CTS: Starting clustering for CLKBUF_X20 with target load = worst[0.211 0.270]
CTS: Completed 19 to 2 clustering
CTS: BA: lp (0.563, 0.569): skew (0.142, 0.112) c(0.008, 0.008) viol(n n)
CTS: -----------------------------------------------
CTS: Inserting delay cells for clock tree sys_clk ...
CTS: current delay = worst[0.569] worst[0.457]
CTS: constraint = worst[2.000] worst[0.000]
CTS: inserted 16 (buffd3) delay cells to the clock net sys_clk
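The arithmetic behind the delay insertion can be read straight from these lines. This Python sketch (hypothetical names) extracts the shortfall between the current delay and the constraint; the per-cell figure is only a rough average under the loud assumption that the 16 inserted cells close the whole gap and nothing else changes, not a characterized buffd3 delay.

```python
import re

CURRENT = re.compile(r"CTS: current delay = worst\[([\d.]+)\]")
TARGET = re.compile(r"CTS: constraint = worst\[([\d.]+)\]")
INSERTED = re.compile(r"CTS: inserted (\d+) \((\S+)\) delay cells")


def delay_insertion_summary(lines):
    """Summarize a delay-insertion block: shortfall and implied delay per cell.

    The per-cell figure is a rough average only: it assumes the inserted cells
    close the whole gap and ignores any other delay changes.
    """
    found = {}
    for line in lines:
        text = line.strip()
        for key, pattern in (("current", CURRENT), ("target", TARGET), ("inserted", INSERTED)):
            m = pattern.match(text)
            if m:
                found[key] = m.groups()
    shortfall = float(found["target"][0]) - float(found["current"][0])
    count = int(found["inserted"][0])
    return {
        "cell": found["inserted"][1],
        "count": count,
        "shortfall": shortfall,
        "avg_delay_per_cell": shortfall / count,
    }
```

Here the shortfall is 2.000 - 0.569 = 1.431ns, or roughly 0.089ns per inserted buffd3 on average.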

Synthesis Results of One Gate Level
• After the synthesis of a gate level, the results are printed in the log:
CTS: gate level 1 clock tree synthesis results
CTS: clock net : sdram_clk
CTS: driving pin: sdram_clk
CTS: load pins : 5 sink pins, 0 gates/macros pins, 0 ignore pins
CTS: buffer level 1: bufbd7 (1)
CTS: buffer level 2: bufbd7 (1)
CTS: clock tree skew = worst[0.036]
CTS: longest path delay = worst[0.327](rise)
CTS: shortest path delay = worst[0.291](rise)
CTS: total capacitance = worst[0.389 0.389]
CTS: buffer level phase delay
CTS: 1 (I): worst[0.293](rise), worst[0.256](rise); skew = worst[0.036]
CTS: (O): worst[0.151](rise), worst[0.129](rise); skew = worst[0.022]
CTS: 2 (I): worst[0.150](rise), worst[0.128](rise); skew = worst[0.022]
CTS: (O): worst[0.004](rise), worst[0.000](rise); skew = worst[0.004]
CTS: buffer level output transition delays [rise fall]
CTS: level 0: worst[0.088 0.085] worst[0.088 0.085]
CTS: load 0: worst[0.088 0.085] worst[0.088 0.085]
CTS: level 1: worst[0.111 0.115] worst[0.091 0.092]
CTS: load 1: worst[0.111 0.115] worst[0.091 0.092]
CTS: level 2: worst[0.158 0.153] worst[0.080 0.071]
CTS: load 2: worst[0.158 0.153] worst[0.080 0.071]
CTS: buffer level total load capacitance
CTS: level 0: worst[0.045 0.045]
CTS: level 1: worst[0.093 0.093]
CTS: level 2: worst[0.251 0.251]
CTS: drc violations: 0 0
• worst is the operating condition.
• The clock net and driving pin identify the net being synthesized (here sdram_clk).
• The clock tree skew and the longest and shortest path delays give the skew and insertion delay at the driving pin.
• The buffer level total load capacitance values are added and reported as the total capacitance of the subtree (0.045 + 0.093 + 0.251 = 0.389).
• In "drc violations: 0 0", the first number is the number of capacitance violations and the second is the number of transition violations.
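The relationships described above can be checked mechanically for every gate level. In this Python sketch (hypothetical names), the skew check encodes the usual definition of skew as longest minus shortest path delay, and the capacitance check follows the summing rule stated above; the tolerance absorbs the three-decimal rounding in the log.

```python
import re


def _first_value(pattern, text):
    """First number captured by pattern in text."""
    return float(re.search(pattern, text).group(1))


def check_gate_level_results(block):
    """Cross-check one 'gate level N clock tree synthesis results' block."""
    tol = 0.0015  # log values are rounded to three decimals
    skew = _first_value(r"clock tree skew = worst\[([\d.]+)\]", block)
    longest = _first_value(r"longest path delay = worst\[([\d.]+)\]", block)
    shortest = _first_value(r"shortest path delay = worst\[([\d.]+)\]", block)
    total_cap = _first_value(r"total capacitance = worst\[([\d.]+)", block)
    # Only the "total load capacitance" section lists per-level capacitances.
    cap_section = block.split("buffer level total load capacitance", 1)[1]
    level_caps = [float(v) for v in re.findall(r"level \d+: worst\[([\d.]+)", cap_section)]
    drc = re.search(r"drc violations: (\d+) (\d+)", block)
    return {
        "skew_ok": abs((longest - shortest) - skew) <= tol,
        "cap_ok": abs(sum(level_caps) - total_cap) <= tol,
        "cap_violations": int(drc.group(1)),
        "trans_violations": int(drc.group(2)),
    }
```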

Maximum Transition and Capacitance Violations
• After each gate level is synthesized, the maximum capacitance and maximum transition violations at that gate level are reported:
CTS: gate level 3 clock tree synthesis results
...
CTS: buffer level total load capacitance
...
CTS: capacitance violation on periph/CTS_755
CTS: capacitance = worst[0.052 0.052]
CTS: constraint = worst[0.050 0.050]
CTS: capacitance violation on periph/CTS_757
CTS: capacitance = worst[0.051 0.051]
CTS: constraint = worst[0.050 0.050]
...
CTS: transition delay violation at periph/CLKBUFX20_G3B1I3/A
CTS: transition delay = worst[0.052 0.050] worst[0.052 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: transition delay violation at periph/CLKBUFX20_G3B2I14/A
CTS: transition delay = worst[0.053 0.051] worst[0.053 0.051]
CTS: constraint = worst[0.050 0.050]
...
CTS: drc violations: 18 5
• In "drc violations: 18 5", 18 is the number of capacitance violations and 5 is the number of transition violations.
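To see how far each violating object exceeds its limit, the violation entries can be paired with their constraint lines. A Python sketch with hypothetical names; for transition entries it reads only the first worst[rise fall] pair, since the meaning of the second pair is not explained above.

```python
import re

HEAD = re.compile(r"CTS: (capacitance violation on|transition delay violation at) (\S+)")
VALUE = re.compile(r"CTS: (?:capacitance|transition delay) = worst\[([\d.]+) ([\d.]+)\]")
LIMIT = re.compile(r"CTS: constraint = worst\[([\d.]+) ([\d.]+)\]")


def drc_violations(lines):
    """Return (kind, object, worst value, constraint, overshoot) per reported violation."""
    found, pending = [], None
    for line in lines:
        text = line.strip()
        head = HEAD.match(text)
        if head:
            kind = "cap" if head.group(1).startswith("capacitance") else "trans"
            pending = {"kind": kind, "object": head.group(2)}
            continue
        value = VALUE.match(text)
        if value and pending is not None:
            pending["value"] = max(float(value.group(1)), float(value.group(2)))
            continue
        limit = LIMIT.match(text)
        if limit and pending is not None and "value" in pending:
            bound = min(float(limit.group(1)), float(limit.group(2)))
            found.append((pending["kind"], pending["object"], pending["value"], bound,
                          round(pending["value"] - bound, 3)))
            pending = None
    return found
```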

A More Complex Synthesis Results
CTS: gate level 1 clock tree synthesis results
CTS: clock net : clk
CTS: driving pin: clk
CTS: load pins : 80 sink pins, 0 gates/macros pins, 0 ignore pins
CTS: buffer level 1: CLKBUFX20 (1)
CTS: buffer level 2: CLKBUFX20 (2) CLKBUFX12 (1)
CTS: clock tree skew = worst[0.001]
CTS: longest path delay = worst[0.248](rise)
CTS: shortest path delay = worst[0.246](rise)
CTS: total capacitance = worst[0.549 0.549]
CTS: buffer level phase delay
CTS: 1 (I): worst[0.247](rise), worst[0.246](rise); skew = worst[0.001]
CTS: (O): worst[0.141](rise), worst[0.140](rise); skew = worst[0.001]
CTS: 2 (I): worst[0.141](rise), worst[0.140](rise); skew = worst[0.001]
CTS: (O): worst[0.001](rise), worst[0.000](rise); skew = worst[0.001]
CTS: buffer level output transition delays [rise fall]
CTS: level 0: worst[0.000 0.000] worst[0.000 0.000]
CTS: load 0: worst[0.000 0.000] worst[0.000 0.000]
CTS: level 1: worst[0.089 0.076] worst[0.089 0.076]
CTS: load 1: worst[0.089 0.076] worst[0.089 0.076]
CTS: level 2: worst[0.109 0.093] worst[0.104 0.091]
CTS: load 2: worst[0.109 0.093] worst[0.104 0.091]
CTS: buffer level total load capacitance
CTS: level 0: worst[0.038 0.038]
CTS: level 1: worst[0.108 0.108]
CTS: level 2: worst[0.403 0.403]
CTS: drc violations: 0 0

Gate Level and Buffer Level Nomenclature
[Figure: an example clock tree, rooted at the clock source pin, annotated with its gate levels and with buffer levels 1 and 2 inside each gate level. Red: preexisting gates. Black: CTS-introduced gates.]
• At each gate level, the clock tree is built bottom-up, but the buffer names are changed to appear top-down.

DRC Violation Report After Synthesis
• After building the complete clock tree, all the remaining DRC violations in the entire clock tree are reported in the log file:
CTS: Clock tree synthesis completed successfully
CTS: CPU time: 50 seconds
CTS: Reporting clock tree violations ...
CTS: Global design rules:
CTS: maximum transition delay [rise,fall] = [0.05,0.05]
CTS: maximum capacitance = 0.05
CTS: maximum fanout = 2000
CTS: maximum buffer levels per net = 200
CTS: transition delay violation at sdram_clk
CTS: user specified transition delay = worst[0.056 0.050] worst[0.056 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: transition delay violation at CLKBUF_X20_G1B21I1/Z
CTS: transition delay = worst[0.051 0.050] worst[0.051 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: capacitance violation on CTS_6557
CTS: capacitance = worst[0.074 0.074]
CTS: constraint = worst[0.050 0.050]
CTS: Summary of clock tree violations:
CTS: Total number of transition violations = 2
CTS: Total number of capacitance violations = 1
• The global design rules at the top of the report are the constraints being checked.
• Only transition and capacitance violations are reported, followed by the total number of each.
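
As a quick consistency check on a report like the one above, a short Python sketch can count the listed violation records and compare them with the totals in the summary. It assumes the report section has been cut out of the log on its own (the per-gate-level violations printed during synthesis would otherwise be counted too) and that no records were elided with "...". The function name is hypothetical.

```python
import re
from collections import Counter

# A listed record starts with "... violation at ..." or "... violation on ...".
LISTED_RE = re.compile(r"CTS: (transition delay|capacitance) violation (?:at|on) ")
TOTAL_RE = re.compile(r"CTS: Total number of (transition|capacitance) violations = (\d+)")


def reconcile_violation_summary(report_lines):
    """Return {kind: (records listed, total reported)} for each total in the summary."""
    listed = Counter()
    totals = {}
    for line in report_lines:
        if (match := LISTED_RE.match(line)):
            kind = "transition" if match.group(1) == "transition delay" else "capacitance"
            listed[kind] += 1
        elif (match := TOTAL_RE.match(line)):
            totals[match.group(1)] = int(match.group(2))
    return {kind: (listed[kind], total) for kind, total in totals.items()}


report = [
    "CTS: transition delay violation at sdram_clk",
    "CTS: user specified transition delay = worst[0.056 0.050] worst[0.056 0.050]",
    "CTS: constraint = worst[0.050 0.050]",
    "CTS: transition delay violation at CLKBUF_X20_G1B21I1/Z",
    "CTS: capacitance violation on CTS_6557",
    "CTS: Total number of transition violations = 2",
    "CTS: Total number of capacitance violations = 1",
]
for kind, (listed, total) in reconcile_violation_summary(report).items():
    print(kind, "listed", listed, "reported", total, "OK" if listed == total else "MISMATCH")
```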

Summary Report After Clock Tree Synthesis
CTS: ------------------------------------------------
CTS: Clock Tree Synthesis Summary
CTS: ------------------------------------------------
CTS: 5 clock domain synthesized
CTS: 30 gated clock nets synthesized
CTS: 26 buffer trees inserted
CTS: 722 buffers used (total size = 45974.2)
CTS: 752 clock nets total capacitance = worst[76.868 76.868]
• Each gate level can have multiple nets.

Clock-by-Clock Summary
• A summary is reported for each clock:
CTS: ------------------------------------------------
CTS: Clock-by-Clock Summary
CTS: ------------------------------------------------
CTS: Root clock net pclk
CTS: 3 gated clock nets synthesized
CTS: 2 buffer trees inserted
CTS: 2 buffers used (total size = 159.667)
CTS: 5 clock nets total capacitance = worst[0.514 0.514]
CTS: clock tree skew = worst[0.341]
CTS: longest path delay = worst[5.959](rise)
CTS: shortest path delay = worst[5.619](rise)
CTS: Root clock net sys_clk
...
• A buffer tree is inserted only if necessary.

Embedded Clock Tree Optimization
• After clock tree synthesis, embedded clock tree optimization begins
• The characteristics of the buffers and inverters used are reported again
CTS: buffer estimated skew target delay driving res input cap
CTS: bufbdf [0.013 0.015] [0.217 0.200] [0.210 0.248] [0.007 0.007]
CTS: inv0da [0.018 0.021] [0.097 0.119] [0.294 0.347] [0.036 0.036]
...
• The global constraints for clock tree are also reported again
CTS: Global design rule constraints [rise fall]
CTS: max transition = worst[0.050 0.050] GUI = worst[0.050 0.050] SDC = undefined/ignored
...
CTS: Global timing/clock tree constraints
CTS: clock skew = worst[0.000]
...
CTS: Global target spec [rise fall]
CTS: transition = worst[0.040 0.040]
...
Note:
Embedded clock tree optimization is called only when the compile_clock_tree command is used. It is not called when the clock_opt command is used.
More Messages on Real Gates and Guide Buffers
• At the beginning of optimization, you might get the following guide buffer messages:
CTS: Root clock net chip_sclk_src
CTS: clock gate levels = 75
CTS: clock sink pins = 125896
...
CTS: level 73: gates = 3 (real gates = 1)
CTS: level 72: gates = 2 (no real gates, guide buffers only)
• At level 72, all the gates are guide buffers, that is, buffers and inverters inserted during clock tree synthesis.
• This information is similar to the information printed before clock tree synthesis.

Gate Level Optimization
• The clock tree optimization is also done for each gate level
• Similar to when the clock tree is built
• Before optimizing a gate level, the tool reports the current skew, longest path delay, and shortest path delay from the driving pin of that gate level:
CTS: gate level 2 clock tree optimization
CTS: clock net = I_BLENDER_1/gclk
CTS: driving pin = I_BLENDER_1/U483/Z
CTS: clock tree skew = worst[0.517]
CTS: longest path delay = worst[5.339](rise)
CTS: shortest path delay = worst[4.822](fall)
• The gate level is then optimized.
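
In the examples in this section, the reported skew agrees with the longest path delay minus the shortest path delay, within rounding (for gate level 2 above, 5.339 - 4.822 = 0.517). The following Python sketch checks that relationship for one block of messages. It assumes the three lines appear as shown, and its tolerance only accounts for the three-decimal rounding of the printed values. The function name is illustrative.

```python
import re

SKEW_RE = re.compile(r"CTS: clock tree skew = worst\[([\d.]+)\]")
PATH_RE = re.compile(r"CTS: (longest|shortest) path delay = worst\[([\d.]+)\]\((rise|fall)\)")

# Each printed value is rounded to 3 decimals (error <= 0.0005), so the difference of
# two rounded delays can differ from the rounded skew by up to 3 * 0.0005.
ROUNDING_TOL = 3 * 0.0005


def check_reported_skew(log_lines):
    """Return (reported skew, longest - shortest, whether they agree within rounding)."""
    skew = None
    delays = {}
    for line in log_lines:
        skew_match = SKEW_RE.match(line)
        if skew_match:
            skew = float(skew_match.group(1))
        path_match = PATH_RE.match(line)
        if path_match:
            delays[path_match.group(1)] = float(path_match.group(2))
    derived = delays["longest"] - delays["shortest"]
    return skew, derived, abs(derived - skew) <= ROUNDING_TOL


gate_level_2 = [
    "CTS: clock net = I_BLENDER_1/gclk",
    "CTS: clock tree skew = worst[0.517]",
    "CTS: longest path delay = worst[5.339](rise)",
    "CTS: shortest path delay = worst[4.822](fall)",
]
print(check_reported_skew(gate_level_2))
```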

Buffer Sizing
• The following messages indicate that buffer sizing was successful:
CTO-BS: Starting buffer sizing ...
Information: Replaced the library cell of CLKBUF_X20_G2B2I1 from CLKBUF_X20 to CLKBUF_X16. (CTS-152)
CTO-BS: CPU time = 0 seconds for buffer sizing
• Clock tree optimization tries to resize buffers to improve skew and insertion delay. If it does not find a resize beneficial, it restores the original cell master:
CTO-BS: Starting buffer sizing ...
CTO-BS: Restoring original cellMaster <CLKBUF_X20> of <CLKBUF_X20_G2B2I4>
CTO-BS: CPU time = 1 seconds for buffer sizing
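
When a log contains many buffer sizing attempts, it can help to separate the CTS-152 replacements from the CTO-BS restore messages. A minimal Python sketch, assuming the message text matches the two examples above exactly. The function name buffer_sizing_outcomes is hypothetical.

```python
import re

REPLACED_RE = re.compile(
    r"Information: Replaced the library cell of (\S+) from (\S+) to (\S+)\. \(CTS-152\)"
)
RESTORED_RE = re.compile(r"CTO-BS: Restoring original cellMaster <([^>]+)> of <([^>]+)>")


def buffer_sizing_outcomes(log_lines):
    """Split buffer sizing messages into replacements and restored cell masters."""
    replaced, restored = [], []
    for line in log_lines:
        if (match := REPLACED_RE.match(line)):
            replaced.append(match.groups())  # (instance, old cell, new cell)
        elif (match := RESTORED_RE.match(line)):
            master, instance = match.groups()
            restored.append((instance, master))
    return replaced, restored


log = [
    "CTO-BS: Starting buffer sizing ...",
    "Information: Replaced the library cell of CLKBUF_X20_G2B2I1 from CLKBUF_X20 to CLKBUF_X16. (CTS-152)",
    "CTO-BS: Restoring original cellMaster <CLKBUF_X20> of <CLKBUF_X20_G2B2I4>",
]
replaced, restored = buffer_sizing_outcomes(log)
print("replaced:", replaced)
print("restored:", restored)
```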

Gate Sizing
CTO-GS: Starting gate sizing ...
Information: Replaced the library cell of I7188625 from TLQMUX2X60 to TULQMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I7586451 from TLTMUX2X60 to TLTMUX2X50. (CTS-152)
Information: Replaced the library cell of I3342873 from TULTMUX2X50 to TLTMUX2ZSX60. (CTS-152)
Information: Replaced the library cell of I1387108 from TULTMUX2X80 to TULTMUX2ZSX80. (CTS-152)
...
Information: Replaced the library cell of I6717862 from THQMUX2ZSX80 to TSTMUX2ZSX20. (CTS-152)
Information: Replaced the library cell of I9359863 from TLTMUX2ZSX80 to TULTMUX2ZSX60. (CTS-152)
Information: Replaced the library cell of I10258160 from TLTMUX2ZSX60 to TLTMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I7636259 from TLTMUX2ZFFX80 to TULTMUX2ZSX60. (CTS-152)
CTO-GS: 1: Sized 14/40 cell instances (tested 40X247)
CTO-GS: delay (from) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: delay (to) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: improvement = worst[0.106%]
Information: Replaced the library cell of I2130284 from TLTMUX2X80 to TLTMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I8618764 from TLTMUX2ZFFX80 to TLTMUX2X80. (CTS-152)
Information: Replaced the library cell of I1749911 from TULTMUX2ZFFFX80 to TULTMUX2ZFFX80. (CTS-152)
Information: Replaced the library cell of I3342873 from TLTMUX2ZSX60 to TLTMUX2ZSX40. (CTS-152)
Information: Replaced the library cell of I8872989 from TULTMUX2ZFFFX60 to TLTMUX2ZFFX80. (CTS-152)
Information: Replaced the library cell of I1387108 from TULTMUX2ZSX80 to TULTMUX2X50. (CTS-152)
CTO-GS: 2: Sized 6/40 cell instances (tested 40X247)
CTO-GS: delay (from) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: delay (to) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: improvement = worst[0.000%]
CTO-GS: Summary of cell sizing
CTO-GS: Sized 20/40 cell instances (tested 80X247)
CTO-GS: delay (from) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: delay (to) = worst[9.104] worst[8.633]; skew = worst[0.471]
CTO-GS: improvement = worst[0.106%]
CTO-GS: CPU time = 2413 seconds for gate sizing
• The "1: Sized 14/40" block summarizes the first round of sizing: the number of gates sized (here, 14 out of 40 gates) and the improvement in skew.
• The "Summary of cell sizing" block is the overall summary of gate sizing at this gate level: in total, 14 + 6 = 20 gates were sized, giving a 0.106% improvement in skew at this gate level.
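
Because the same instance can be resized in more than one round (I3342873 and I1387108 appear in both rounds above), a per-instance view of the CTS-152 messages can be easier to read than the raw list. The Python sketch below is a hypothetical helper that assumes the message formats shown above. It records each instance's library cell before its first resize and after its last one, and it collects the per-round counts from the "CTO-GS: N: Sized" lines.

```python
import re

REPLACED_RE = re.compile(
    r"Information: Replaced the library cell of (\S+) from (\S+) to (\S+)\. \(CTS-152\)"
)
# Per-round lines look like "CTO-GS: 1: Sized 14/40 cell instances (...)".
ROUND_RE = re.compile(r"CTO-GS: (\d+): Sized (\d+)/(\d+) ?cell instances")


def net_cell_changes(log_lines):
    """Map each instance to (cell before its first resize, cell after its last resize)."""
    changes = {}
    for line in log_lines:
        match = REPLACED_RE.match(line)
        if match:
            instance, old_cell, new_cell = match.groups()
            original = changes[instance][0] if instance in changes else old_cell
            changes[instance] = (original, new_cell)
    return changes


def sized_per_round(log_lines):
    """Return {round number: number of instances sized in that round}."""
    counts = {}
    for line in log_lines:
        match = ROUND_RE.match(line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
    return counts


log = [
    "Information: Replaced the library cell of I3342873 from TULTMUX2X50 to TLTMUX2ZSX60. (CTS-152)",
    "CTO-GS: 1: Sized 14/40 cell instances (tested 40X247)",
    "Information: Replaced the library cell of I3342873 from TLTMUX2ZSX60 to TLTMUX2ZSX40. (CTS-152)",
    "CTO-GS: 2: Sized 6/40 cell instances (tested 40X247)",
    "CTO-GS: Sized 20/40 cell instances (tested 80X247)",
]
print(net_cell_changes(log))
print(sized_per_round(log), sum(sized_per_round(log).values()))
```

In this excerpt, I3342873 starts as TULTMUX2X50 and ends as TLTMUX2ZSX40, and the per-round counts add up to the 20 in the summary line. Because some instances appear in both rounds, that total may count resize operations rather than distinct instances. The log alone does not settle this.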

Gate Relocation
• Gate relocation works on preexisting gates.
• If you have no preexisting gates, you might see the following message:
CTO-GR: gate relocation is skipped since there are no hookup pins

A Successful Gate Relocation
CTO-GR: Starting gate relocation ...
CTO-GR: delay [max min] (skew) = worst[9.023 8.563] (0.460)
CTO-GR: 1: Relocated 1/40 cell instances (tested 2 cell instances at 47 points)
CTO-GR: delay (from) = worst[9.023] worst[8.563]; skew = worst[0.460]
CTO-GR: delay (to) = worst[9.023] worst[8.563]; skew = worst[0.460]
CTO-GR: improvement = worst[0.000%]
CTO-GR: delay [max min] (skew) = worst[9.018 8.563] (0.455)
CTO-GR: 2: Relocated 2/40 cell instances (tested 5 cell instances at 83 points)
CTO-GR: delay (from) = worst[9.023] worst[8.563]; skew = worst[0.460]
CTO-GR: delay (to) = worst[9.018] worst[8.563]; skew = worst[0.455]
CTO-GR: improvement = worst[1.118%]
CTO-GR: Summary of cell relocation
CTO-GR: Relocated 3/40 cell instances (tested 7 cell instances at 130 points)
CTO-GR: delay (from) = worst[9.023] worst[8.563]; skew = worst[0.460]
CTO-GR: delay (to) = worst[9.018] worst[8.563]; skew = worst[0.455]
CTO-GR: improvement = worst[1.118%]
CTO-GR: CPU time = 2 seconds for gate relocation
• In the first round, 2 cells were tried at 47 new locations, and 1 was moved.
• The "delay (from)" and "delay (to)" lines give the initial and final skew, and the "improvement" line gives the improvement in skew.
• The "Summary of cell relocation" block is the overall summary of gate relocation at this gate level.
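
In the relocation messages, the printed skew appears to be the first (max) delay minus the second (min) delay: 9.023 - 8.563 = 0.460 and 9.018 - 8.563 = 0.455. The following Python sketch assumes that reading of the bracketed values and flags any CTO-GR or CTO-BR line whose printed skew does not match within rounding. The function name skew_mismatches is hypothetical.

```python
import re

MAX_MIN_RE = re.compile(
    r"CTO-(\w+): delay \[max min\] \(skew\) = worst\[([\d.]+) ([\d.]+)\] \(([\d.]+)\)"
)
FROM_TO_RE = re.compile(
    r"CTO-(\w+): delay \((from|to)\) = worst\[([\d.]+)\] worst\[([\d.]+)\]; skew = worst\[([\d.]+)\]"
)

ROUNDING_TOL = 3 * 0.0005  # three values, each printed with 3 decimals


def skew_mismatches(log_lines):
    """Return lines whose printed skew differs from (max delay - min delay) beyond rounding."""
    bad = []
    for line in log_lines:
        match = MAX_MIN_RE.match(line)
        if match:
            max_d, min_d, skew = map(float, match.groups()[1:])
        else:
            match = FROM_TO_RE.match(line)
            if not match:
                continue
            max_d, min_d, skew = map(float, match.groups()[2:])
        if abs((max_d - min_d) - skew) > ROUNDING_TOL:
            bad.append(line)
    return bad


relocation = [
    "CTO-GR: delay [max min] (skew) = worst[9.023 8.563] (0.460)",
    "CTO-GR: delay (from) = worst[9.023] worst[8.563]; skew = worst[0.460]",
    "CTO-GR: delay (to) = worst[9.018] worst[8.563]; skew = worst[0.455]",
]
print(skew_mismatches(relocation))
```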

Gate Relocation: Failed Attempts
CTO-GR: Starting gate relocation ...
CTO-GR: Summary of cell relocation
CTO-GR: Relocated 0/1 cell instances (tested 1 cell instances at 24 points)
CTO-GR: delay (from) = worst[1.207] worst[0.980]; skew = worst[0.227]
CTO-GR: delay (to) = worst[1.207] worst[0.980]; skew = worst[0.227]
CTO-GR: improvement = worst[0.000%]
CTO-GR: CPU time = 0 seconds for gate relocation
• In this example, clock tree optimization tried to move one gate instance to 24 different locations. Because none of the attempts improved the QoR, gate relocation was abandoned.

Buffer Relocation
• Buffer relocation is done on all buffers inserted by clock tree synthesis:
CTO-BR: Buffer relocation ...
CTO-BR: Optimization level: net
CTO-BR: delay [max min] (skew) = worst[9.087 8.503] (0.584)
CTO-BR: 1: Relocated 1/6 cell instances (tested 6 cell instances at 74 points)
CTO-BR: delay (from) = worst[9.099] worst[8.503]; skew = worst[0.596]
CTO-BR: delay (to) = worst[9.087] worst[8.503]; skew = worst[0.584]
CTO-BR: improvement = worst[2.013%]
CTO-BR: delay [max min] (skew) = worst[9.087 8.503] (0.584)
CTO-BR: 2: Relocated 1/6 cell instances (tested 5 cell instances at 62 points)
CTO-BR: delay (from) = worst[9.087] worst[8.503]; skew = worst[0.584]
CTO-BR: delay (to) = worst[9.087] worst[8.503]; skew = worst[0.584]
CTO-BR: improvement = worst[0.000%]
CTO-BR: Summary of cell relocation
CTO-BR: Relocated 2/6 cell instances (tested 11 cell instances at 136 points)
CTO-BR: delay (from) = worst[9.099] worst[8.503]; skew = worst[0.596]
CTO-BR: delay (to) = worst[9.099] worst[8.503]; skew = worst[0.584]
CTO-BR: improvement = worst[2.013%]
CTO-BR: CPU time = 0 seconds for buffer relocation
• The information is similar to that for gate relocation.
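
The "improvement" figure appears to be the relative reduction in skew, (skew_from - skew_to) / skew_from * 100. That formula reproduces the buffer relocation values above: 0.596 to 0.584 gives 2.013%, and 0.584 to 0.584 gives 0.000%. It does not exactly reproduce the 1.118% reported for gate relocation (0.460 to 0.455 gives about 1.087%), presumably because the tool works from unrounded delays. Treat the following Python sketch as an approximation under that assumption. The function names are hypothetical.

```python
import re

FROM_TO_SKEW_RE = re.compile(r"CTO-(\w+): delay \((from|to)\) = .*; skew = worst\[([\d.]+)\]")


def skew_improvement_percent(skew_from, skew_to):
    """Relative skew reduction in percent (an inferred formula, not a documented one)."""
    return (skew_from - skew_to) / skew_from * 100.0


def improvements_from_log(log_lines):
    """Pair each 'delay (from)' line with the next 'delay (to)' line and compute the improvement."""
    results = []
    pending_from = None
    for line in log_lines:
        match = FROM_TO_SKEW_RE.match(line)
        if not match:
            continue
        skew = float(match.group(3))
        if match.group(2) == "from":
            pending_from = skew
        elif pending_from is not None:
            results.append(round(skew_improvement_percent(pending_from, skew), 3))
            pending_from = None
    return results


buffer_relocation = [
    "CTO-BR: delay (from) = worst[9.099] worst[8.503]; skew = worst[0.596]",
    "CTO-BR: delay (to) = worst[9.087] worst[8.503]; skew = worst[0.584]",
    "CTO-BR: delay (from) = worst[9.087] worst[8.503]; skew = worst[0.584]",
    "CTO-BR: delay (to) = worst[9.087] worst[8.503]; skew = worst[0.584]",
]
print(improvements_from_log(buffer_relocation))
```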

Post Embedded Clock Tree Synthesis
• After the embedded clock tree optimization, the tool prints a summary.
• It has the same format as the summary printed after clock tree synthesis:
CTS: ------------------------------------------------
CTS: Clock Tree Optimization Summary
CTS: ------------------------------------------------
CTS: 4 clock domain synthesized
CTS: 5 gated clock nets synthesized
CTS: 5 buffer trees inserted
CTS: 1000 buffers used (total size = 16570.8)
CTS: 1005 clock nets total capacitance = worst[14.010 14.010]
CTS: ------------------------------------------------
CTS: Clock-by-Clock Summary
CTS: ------------------------------------------------
CTS: Root clock net sdram_clk
CTS: 1 gated clock nets synthesized
CTS: 1 buffer trees inserted
CTS: 302 buffers used (total size = 5039.47)
CTS: 303 clock nets total capacitance = worst[4.170 4.170]
CTS: clock tree skew = worst[0.035]
CTS: longest path delay = worst[2.041](rise)
CTS: shortest path delay = worst[2.006](fall)
CTS: Root clock net sys_2x_clk
...
• After the summary, all the transition and capacitance violations on the clock tree are also reported:
CTS: Global design rules:
CTS: maximum transition delay [rise,fall] = [0.05,0.05]
CTS: maximum capacitance = 0.05
CTS: maximum fanout = 2000
CTS: maximum buffer levels per net = 200
CTS: transition delay violation at sdram_clk
CTS: user specified transition delay = worst[0.056 0.050] worst[0.056 0.050]
CTS: constraint = worst[0.050 0.050]
CTS: transition delay violation at buffd2_G1B1I1/Z
...
CTS: Summary of clock tree violations:
CTS: Total number of transition violations = 3994
CTS: Total number of capacitance violations = 1

DRC Fixing Beyond Exceptions
• After embedded clock tree optimization, the tool starts fixing the DRC violations beyond exceptions.
• The messages are similar to those for clustering:
CTS: fixing DRC beyond exception pins under clock CLK1
CTS: gate level 2 DRC fixing (exception level 1)
CTS: clock net = CLK1_G1IP
CTS: driving pin = bufbd2_G1IP_1/Z
CTS: gate level 2 design rule constraints [rise fall]
CTS: max transition = worst[0.100 0.100]
CTS: max capacitance = worst[0.600 0.600]
CTS: max fanout = 2000
CTS: -----------------------------------------------
CTS: Starting clustering for bufbdf with target load = worst[0.056 0.056]
CTS: Completed 4 to 1 clustering
CTS: -----------------------------------------------
CTS: Starting clustering for bufbd7 with target load = worst[0.050 0.050]
CTS: Completed 1 to 1 clustering
CTS: ------------------------------------------------
• After fixing the DRC violations, the overall summary and the clock-by-clock summary of DRC fixing beyond exceptions are reported.

Placement Legalization is Called After Clock Tree Synthesis
• When clock tree synthesis places a clock tree buffer or inverter, it places it at a legal location, but the location might already be occupied. This causes overlaps, which need to be resolved.
• The tool calls the placement legalizer, which moves the cells to resolve the overlaps.
• After legalization, the cells with large displacement are reported in the log:
Largest displacement cells:
Cell: periph/U122 (AND3X)
Input location: (906.380 1597.520)
Legal location: (897.140 1582.400)
Displacement: 17.720 um, e.g. 3.52 row height.
Total 6 cells has large displacement (e.g. > 15.120 um or 3 row height)
• The cell shown is 1 of the 6 cells that were displaced.
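
The reported displacement matches the straight-line (Euclidean) distance between the input and legal locations. The offsets are 9.240 um and 15.120 um, and sqrt(9.240² + 15.120²) ≈ 17.720 um, whereas the Manhattan distance would be 24.360 um. The row count can be reproduced by assuming a row height of 15.120 / 3 = 5.04 um, inferred from the "15.120 um or 3 row height" threshold. A Python sketch under those two assumptions, with hypothetical helper names:

```python
import math
import re

LOCATION_RE = re.compile(r"(Input|Legal) location: \(([-\d.]+) ([-\d.]+)\)")


def parse_locations(log_lines):
    """Return {'Input': (x, y), 'Legal': (x, y)} from one 'Largest displacement cells' entry."""
    locations = {}
    for line in log_lines:
        match = LOCATION_RE.search(line)
        if match:
            locations[match.group(1)] = (float(match.group(2)), float(match.group(3)))
    return locations


def displacement_um(input_xy, legal_xy):
    """Straight-line (Euclidean) distance between the input and legal locations, in um."""
    return math.hypot(legal_xy[0] - input_xy[0], legal_xy[1] - input_xy[1])


entry = [
    "Cell: periph/U122 (AND3X)",
    "Input location: (906.380 1597.520)",
    "Legal location: (897.140 1582.400)",
]
locs = parse_locations(entry)
dist = displacement_um(locs["Input"], locs["Legal"])
# Assumed row height, inferred from the threshold line: 15.120 um corresponds to 3 rows.
row_height_um = 15.120 / 3
print(f"{dist:.3f} um = {dist / row_height_um:.2f} rows")  # 17.720 um = 3.52 rows
```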

Agenda
• Prerequisites for Clock Tree Synthesis
• Enabling Useful Debug Messages in IC Compiler Clock Tree Synthesis
• Clock Tree Synthesis Log Messages
• Clock Tree Optimization Log Messages

The optimize_clock_tree Command Log File Messages
• Optimization options
• Report before optimization
• Optimization
• Report after optimization

Standalone Optimization Using the optimize_clock_tree Command
• Standalone optimization differs from embedded optimization in the algorithms used.
• Some of the log messages are similar to those you get when you use the compile_clock_tree command:
• Design update information
• Buffer characterization
• Pruning of cells
• List of cells used for clock tree optimization

CTS-352 Warning
• The default delay calculation engine is Elmore. Elmore delay calculation might lead to inferior accuracy in skew and latency estimation.
• Enable the Arnoldi delay calculation engine for more accurate delay calculation during optimization by using the following command:
set_delay_calculation -clock_arnoldi
• Otherwise, the optimize_clock_tree command issues the following warning:
Warning: set_delay_calculation is currently set to 'elmore'. 'clock_arnoldi' is suggested. (CTS-352)
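
Because the CTS-352 warning is easy to miss in a long log, one option is to count messages by severity and ID after each run. A minimal Python sketch, assuming each message fits on one line and ends with its ID in parentheses, as in the examples in this deck. The function name is hypothetical, and the printed reminder simply repeats the set_delay_calculation -clock_arnoldi command given above.

```python
import re
from collections import Counter

# Severity-prefixed messages that end with an ID such as "(CTS-352)".
MESSAGE_RE = re.compile(r"^(Information|Warning|Error): .*\((\w+-\d+)\)\s*$")


def message_id_counts(log_lines):
    """Count (severity, message ID) pairs in a log."""
    counts = Counter()
    for line in log_lines:
        match = MESSAGE_RE.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


log = [
    "Warning: set_delay_calculation is currently set to 'elmore'. 'clock_arnoldi' is suggested. (CTS-352)",
    "Information: Replaced the library cell of CLKBUF_X20_G2B2I1 from CLKBUF_X20 to CLKBUF_X16. (CTS-152)",
    "CTO-BS: CPU time = 0 seconds for buffer sizing",
]
counts = message_id_counts(log)
if counts[("Warning", "CTS-352")]:
    print("Elmore delay calculation in use; consider set_delay_calculation -clock_arnoldi")
```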

Optimization Options
• Before starting optimization, the optimize_clock_tree command reports the root pin and the optimization options for each clock.
• These are the options that you specified by using the set_clock_tree_optimization_options command:
Initializing parameters for clock CLK2GC:
Root pin: instCLK2GC/Q
Using the following optimization options:
gate sizing : on
gate relocation : on
preserve levels : off
area recovery : on
relax insertion delay : off
balance rc : off

Preoptimization Report
• Before the tool begins to optimize the clock tree, it reports some of the current characteristics of the clock tree:
*****************************************
* Preoptimization report (clock 'CLK3') *
*****************************************
Corner 'max'
Estimated Skew (r/f/b) = (0.073 0.000 0.073)
Estimated Insertion Delay (r/f/b) = (1.903 -inf 1.903)
Corner 'RC-ONLY'
Estimated Skew (r/f/b) = (0.005 0.000 0.005)
Estimated Insertion Delay (r/f/b) = (0.008 -inf 0.008)
Wire capacitance = 0.8 pf
Total capacitance = 2.3 pf
Max transition = 0.448 ns
Cells = 24 (area=67.500000)
Buffers = 23 (area=67.500000)
Buffer Types
============
bufbd2: 1
bufbdf: 8
bufbd7: 5
bufbd4: 3
bufbd1: 6
• The report header gives the clock name, and each "Corner" line names a CTS corner.
• The estimated skew and insertion delay (ID) are the starting values for the clock as seen by clock tree optimization (CTO).
• "Max transition" is the maximum transition value present in the clock tree.
• The buffer types section gives information about the buffers and inverters present in the clock tree.
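
The preoptimization report is regular enough to parse. The following Python sketch assumes the exact line formats shown above. It extracts the per-corner (r/f/b) estimates and checks that the Buffer Types counts add up to the Buffers total (1 + 8 + 5 + 3 + 6 = 23 here). The function name is hypothetical, and Python's float() accepts the "-inf" entries as printed.

```python
import re

CORNER_RE = re.compile(r"Corner '([^']+)'")
ESTIMATE_RE = re.compile(r"Estimated (Skew|Insertion Delay) \(r/f/b\) = \(([^)]*)\)")
BUFFERS_RE = re.compile(r"Buffers = (\d+)")
BUFFER_TYPE_RE = re.compile(r"(\w+): (\d+)$")


def parse_cto_report(report_lines):
    """Return (per-corner r/f/b estimates, Buffers total, {buffer type: count})."""
    corners, buffer_types = {}, {}
    buffers_total, corner = None, None
    for raw in report_lines:
        line = raw.strip()
        if (match := CORNER_RE.match(line)):
            corner = match.group(1)
            corners[corner] = {}
        elif (match := ESTIMATE_RE.match(line)) and corner is not None:
            # float() also parses the "-inf" entries exactly as the report prints them.
            corners[corner][match.group(1)] = tuple(float(v) for v in match.group(2).split())
        elif (match := BUFFERS_RE.match(line)):
            buffers_total = int(match.group(1))
        elif (match := BUFFER_TYPE_RE.match(line)):
            buffer_types[match.group(1)] = int(match.group(2))
    return corners, buffers_total, buffer_types


report = """Corner 'max'
Estimated Skew (r/f/b) = (0.073 0.000 0.073)
Estimated Insertion Delay (r/f/b) = (1.903 -inf 1.903)
Cells = 24 (area=67.500000)
Buffers = 23 (area=67.500000)
bufbd2: 1
bufbdf: 8
bufbd7: 5
bufbd4: 3
bufbd1: 6""".splitlines()
corners, total, types = parse_cto_report(report)
print(corners["max"]["Insertion Delay"], total, sum(types.values()))
```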

Optimization Messages
• During optimization, the tool prints messages for buffer removal, cell sizing, buffer insertion, and switching of metal layers:
Deleting cell I_SDRAM_TOP/bufbda_G1B1I10 and output net I_SDRAM_TOP/sdram_clk_G1B1I10.
iteration 1: (0.314104, 3.328620)
Total 1 buffers removed on clock CLK3
Start (3.256, 3.527), End (3.015, 3.329)
....
iteration 2: (0.313991, 3.314841)
iteration 3: (0.308073, 3.295621)
Total 2 cells sized on clock CLK3
Start (3.015, 3.329), End (2.988, 3.296)
....
iteration 6: (0.305181, 3.275623)
Total 1 delay buffers added on clock sck_in12 (LP)
Start (2.975, 3.283), End (2.970, 3.276)
....
Switch to low metal layer for clock 'CLK3': Total 9 out of 13 nets switched to low metal layer for clock 'CLK3' with largest cap change 0.00 percent
• The four groups of messages show, in order, buffer removal, cell sizing, buffer insertion, and metal layer switching.
• Each "iteration" line reports (skew, ID), where ID is the insertion delay.
• "Start (sp, lp)" gives the initial delays and "End (sp, lp)" gives the final delays, where sp is the shortest path delay and lp is the longest path delay.
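
Because sp and lp are the shortest and longest path delays, each Start/End line also gives the skew before and after the step (lp - sp). For the buffer removal step above, the end values give 3.329 - 3.015 = 0.314, which is consistent with the (skew, ID) pair printed for iteration 1. A minimal Python sketch, assuming the Start/End format shown above. The function name is hypothetical.

```python
import re

START_END_RE = re.compile(r"Start \(([\d.]+), ([\d.]+)\), End \(([\d.]+), ([\d.]+)\)")


def start_end_summary(line):
    """Return skew before/after and the longest-path change for one Start/End line."""
    match = START_END_RE.search(line)
    if match is None:
        return None
    start_sp, start_lp, end_sp, end_lp = map(float, match.groups())
    return {
        "skew_start": round(start_lp - start_sp, 3),
        "skew_end": round(end_lp - end_sp, 3),
        "lp_change": round(end_lp - start_lp, 3),
    }


print(start_end_summary("Start (3.256, 3.527), End (3.015, 3.329)"))
print(start_end_summary("Start (3.015, 3.329), End (2.988, 3.296)"))
```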

Optimization Messages
• If the area recovery option is enabled, the tool performs area recovery after optimizing each clock and reports the changes made to that clock:
Area recovery optimization for clock 'CLK3':
15% 23% 30% 46% 53% 61% 76% 84% 92% 100%
Deleting cell cell I_SDRAM_TOP/bufbda_G1B1I9 and output net I_SDRAM_TOP/sdram_clk_G1B1I9.
Total 1 buffers removed (all paths) for clock 'CLK3'

Post Optimization Report
• After completing the optimization of a clock, the tool reports the new characteristics of the clock tree.
• This is similar to the information printed before optimization:
**************************************************
* Multicorner optimization report (clock 'CLK3') *
**************************************************
Corner 'max'
Estimated Skew (r/f/b) = (0.041 0.000 0.041)
Estimated Insertion Delay (r/f/b) = (1.725 -inf 1.725)
Corner 'RC-ONLY'
Estimated Skew (r/f/b) = (0.007 0.000 0.007)
Estimated Insertion Delay (r/f/b) = (0.009 -inf 0.009)
Wire capacitance = 0.8 pf
Total capacitance = 2.3 pf
Max transition = 0.356 ns
Cells = 24 (area=59.000000)
Buffers = 23 (area=59.000000)
Buffer Types
============
bufbd7: 4
bufbdf: 6
bufbd4: 5
bufbd1: 7
bufbd2: 1
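
Comparing this report with the preoptimization report shows the effect of optimization for each corner. For corner 'max', the skew goes from 0.073 to 0.041 and the insertion delay goes from 1.903 to 1.725. A minimal Python sketch of that comparison, assuming both reports use the formats shown and using the third (b) value of each r/f/b triple. The function names are hypothetical.

```python
import re

CORNER_RE = re.compile(r"Corner '([^']+)'")
ESTIMATE_RE = re.compile(r"Estimated (Skew|Insertion Delay) \(r/f/b\) = \(([^)]*)\)")


def b_estimates(report_lines):
    """Return {corner: {metric: b value}}, where b is the third entry of (r/f/b)."""
    result, corner = {}, None
    for raw in report_lines:
        line = raw.strip()
        if (match := CORNER_RE.match(line)):
            corner = match.group(1)
            result[corner] = {}
        elif (match := ESTIMATE_RE.match(line)) and corner is not None:
            result[corner][match.group(1)] = float(match.group(2).split()[2])
    return result


def compare_reports(pre_lines, post_lines):
    """Return {(corner, metric): (before, after, after - before)} for metrics in both reports."""
    pre, post = b_estimates(pre_lines), b_estimates(post_lines)
    changes = {}
    for corner, metrics in pre.items():
        for metric, before in metrics.items():
            after = post.get(corner, {}).get(metric)
            if after is not None:
                changes[(corner, metric)] = (before, after, round(after - before, 3))
    return changes


pre = """Corner 'max'
Estimated Skew (r/f/b) = (0.073 0.000 0.073)
Estimated Insertion Delay (r/f/b) = (1.903 -inf 1.903)""".splitlines()
post = """Corner 'max'
Estimated Skew (r/f/b) = (0.041 0.000 0.041)
Estimated Insertion Delay (r/f/b) = (1.725 -inf 1.725)""".splitlines()
changes = compare_reports(pre, post)
for key, (before, after, delta) in changes.items():
    print(key, before, "->", after, f"({delta:+.3f})")
```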

Reporting the Longest and Shortest Paths
• The longest and shortest paths for all corners are reported shortly after the post optimization report:
++ Longest path for clock CLK3 in corner 'max':
object fan cap trn inc arr r location
clk3 (port) 32 0 0 r ( 440 748)
clk3 (net) 13 97
… I_SDRAM_TOP/I_SDRAM_READ_FIFO/reg_array_reg_3__8_/CP (senrq1)
167 4 289 r ( 521 520)
++ Shortest path for clock CLK3 in corner 'max':
object fan cap trn inc arr r location
clk3 (port) 32 0 0 r ( 440 748)
clk3 (net) 13 97
I_SDRAM_TOP/I_SDRAM_READ_FIFO/reg_array_reg_4__11_/CP (senrq1)
217 4 247 r ( 687 656)
• Messages related to placement legalization are located at the end of the optimize_clock_tree command log.

Thank you