Multipliers in VLSI

21,753 views 23 slides Jan 19, 2016
Slide 1
Slide 1 of 23
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23

About This Presentation

its a presentation on multipliers..which shows various types and its usage


Slide Content

Sp09 CMPEN 411 L20 S.1
CMPEN 411
VLSI Digital Circuits
Spring 2009
Lecture 20: Multiplier Design

[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003
J. Rabaey, A. Chandrakasan, B. Nikolic]

Sp09 CMPEN 411 L20 S.2
Review: Basic Building Blocks
Datapath
Execution units
-Adder, multiplier, divider, shifter, etc.
Register file and pipeline registers
Multiplexers, decoders
Control
Finite state machines (PLA, ROM, random logic)
Interconnect
Switches, arbiters, buses
Memory
Caches (SRAMs), TLBs, DRAMs, buffers

Sp09 CMPEN 411 L20 S.3
The Binary Multiplication
x
+
Partial products
Multiplicand
Multiplier
Result
1 0 1 0 1 0
1 0 1 0 1 0
1 0 1 0 1 0
1 1 1 0 0 1 1 1 0
0 0 0 0 0 0
1 0 1 0 1 0
1 0 1 1

Sp09 CMPEN 411 L20 S.4
Multiply Operation
Multiplication is just a a lot of additions
multiplicand
multiplier
partial
product
array
double precision product
N
2N
N
can be formed in parallel

Sp09 CMPEN 411 L20 S.5
Multiplication Approaches
Right shift and add
Partial product array rows are accumulated from top to bottom
on an N-bit adder
-After each addition, right shift (by one bit) the accumulated partial
product to align it with the next row to add
Time for N bits T
serial_mult
= O(N

T
adder
) = O(N
2
) for a RCA
Making it faster
Use a faster adder
Use higher radix (e.g., base 4) multiplication – O(N/2 T
adder
)
-Use multiplier recoding to simplify multiple formation (booth)
Form the partial product array in parallel and add it in parallel
Making it smaller (i.e., slower)
Use serial-parallel mult
Use an array multiplier
-Very regular structure with only short wires to nearest neighbor
cells. Thus, very simple and efficient layout in VLSI Can be easily
and efficiently pipelined

Sp09 CMPEN 411 L20 S.6
Serial-parallel multiplier structure

Sp09 CMPEN 411 L20 S.7
The Array Multiplier
Y
0
Y
1
X
3
X
2
X
1
X
0
X
3
HA
X
2
FA
X
1
FA
X
0
HA
Y
2X
3
FA
X
2
FA
X
1
FA
X
0
HA
Z
1
Z
3
Z
6
Z
7
Z
5
Z
4
Y
3X
3
FA
X
2
FA
X
1
FA
X
0
HA

Sp09 CMPEN 411 L20 S.11
Booth multiplier
Encoding scheme to reduce number of stages in
multiplication.
Performs two bits of multiplication at once—requires half
the stages.
Each stage is slightly more complex than simple
multiplier, but adder/subtracter is almost as small/fast as
adder.

Sp09 CMPEN 411 L20 S.12
Booth encoding
Two’s-complement form of multiplier:
y = -2
n
y
n
+ 2
n-1
y
n-1
+ 2
n-2
y
n-2
+ ... (first bit is the sign bit)
(example, y=18=010010 y= -18 = 101110 )
Rewrite using 2
a
= 2
a+1
- 2
a
:
y = 2
n
(y
n-1
-y
n
) + 2
n-1
(y
n-2
-y
n-1
) + 2
n-2
(y
n-3
-y
n-2
) + ...
Consider first two terms: by looking at three bits of y, we
can determine whether to add x, 2x to partial product.

Sp09 CMPEN 411 L20 S.13
Booth actions
y
i
y
i-1
y
i-2
increment
0 0 0 0
0 0 1 x
0 1 0 x
0 1 1 2x
1 0 0 -2x
1 0 1 -x
1 1 0 -x
1 1 1 0
y = 2
n
(y
n-1
-y
n
) + 2
n-1
(y
n-2
-y
n-1
) + 2
n-2
(y
n-3
-y
n-2
) + ...
Consider first two terms: by looking at three bits of y, we
can determine whether to add x, 2x to partial product.

Sp09 CMPEN 411 L20 S.14
Booth example
x = 1001 (9
10
), y = 0111 (7
10
).
P
0
= 00000000
y
3
y
2
y
1
=011 y
1
y
0
y
-1
=11(0)
y
1
y
0
y
-1
= 110, P
1
= P
0
- (1001) = 11110111
 x shift left for 2 bits to be 100100


y
3
y
2
y
1
= 011, P
2
= P
1
+ (10*100100) =
11110111+01001000 = 001111111 (63
10
)
An array multiplier needs N addtions, booth multiplier
needs only N/2 additions

Sp09 CMPEN 411 L20 S.15
Review: A 64-bit Adder/Subtractor
1-bit
FA S
0
C
0
=C
in
C
1
1-bit
FA S
1
C
2
1-bit
FA S
2
C
3
C
64
=C
out
1-bit
FA S
63
C
63
.

.

.
Ripple Carry Adder (RCA)
built out of 64 FAs
Subtraction – complement
all subtrahend bits (xor
gates) and set the low
order carry-in
RCA
advantage: simple logic,
so small (low cost)
disadvantage: slow (O(N)
for N bits) and lots of
glitching (so lots of energy
consumption)
A
0
B
0
A
1
B
1
A
2
B
2
A
63
B
63
add/subt

Sp09 CMPEN 411 L20 S.16
Booth structure

Sp09 CMPEN 411 L20 S.17
Wallace-Tree Multiplier
6543210 6543210
Partial products First stage
Bit position
6543210 6543210
Second stage Final adder
FA HA
(a) (b)
(c) (d)

Sp09 CMPEN 411 L20 S.18
Wallace-Tree Multiplier
Partial products
First stage
Second stage
Final adder
FA FA FA
HA HA
FA
x
3y
3
z
7
z
6
z
5
z
4
z
3
z
2
z
1
z
0
x
3
y
2
x
2y
3
x
1
y
1
x
3
y
0
x
2
y
0
x
0y
1
x
0y
2
x
2
y
2
x
1y
3
x
1
y
2
x
3
y
1
x
0y
3 x
1y
0x
0y
Full adder = (3,2) compressor

Sp09 CMPEN 411 L20 S.19
Making it Faster: Tree Multiplier Structure
partial
product
array
reduction
tree
fast carry
propagate
adder (CPA)
P (product)
mux
+
reduction
tree (log N)
+
CPA (log N)
Q (‘ier)
D (‘icand)
D
D
D
0
0
0
0
multiple
forming
circuits
in
t
e
r
c
o
n
n
e
c
t

Sp09 CMPEN 411 L20 S.20
(4,2) Counter
Built out of two (3,2) counters (just FA’s!)
all of the inputs (4 external plus one internal) have
the same weight (i.e., are in the same bit position)
the internal carry output is fed to the next higher
weight position (indicated by the )
(3,2)
(3,2)
Note: Two carry outs - one
“internal” and one “external”

Sp09 CMPEN 411 L20 S.22
Tiling (4,2) Counters
Reduces columns four high to columns only two high
Tiles with neighboring (4,2) counters
Internal carry in at same “level” (i.e., bit position weight) as the
internal carry out
(3,2)
(3,2)
(3,2)
(3,2)
(3,2)
(3,2)

Sp09 CMPEN 411 L20 S.24
4x4 Partial Product Array Reduction

multiplicand
multiplier
partial
product
array
reduced pp
array (to
CPA)
double
precision
product
Fast 4x4 multiplication using (4,2) counters
How would you
lay it out?
five (4,2) counters
5-bit CPA
multiplicand
m
u
lt
ip
lie
r
8-bit product

Sp09 CMPEN 411 L20 S.25
8x8 Partial Product Array Reduction

‘icand
‘ier
partial
product
array
reduced
partial
product
array
 Wallace tree
multiplier
two rows of
nine (4,2)
counters
one row of
thirteen
(4,2)
counters
to a 13-bit fast CPA

Sp09 CMPEN 411 L20 S.26
An 8x8 Multiplier Layout
How should it be laid out?
multiplicand
m
u
lt
ip
lie
r
thirteen (4,2) counters
nine (4,2) counters
nine (4,2) counters
13-bit CPA

Sp09 CMPEN 411 L20 S.32
Multipliers —Summary
• Optimization Goals Different Vs Binary Adder
• Once Again: Identify Critical Path
• Other possible techniques
- Data encoding (Booth)
- Pipelining
FIRST GLIMPSE AT SYSTEM LEVEL OPTIMIZATION
- Logarithmic versus Linear (Wallace Tree Mult)

Sp09 CMPEN 411 L20 S.33
Next Lecture and Reminders
Next lecture
Shifters, decoders, and multiplexers
-Reading assignment – Rabaey, et al, 11.5-11.6
Tags