Computer organization and architecture arithmetic.ppt

vizhivasu1 24 views 96 slides Jul 23, 2024
Slide 1
Slide 1 of 96
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55
Slide 56
56
Slide 57
57
Slide 58
58
Slide 59
59
Slide 60
60
Slide 61
61
Slide 62
62
Slide 63
63
Slide 64
64
Slide 65
65
Slide 66
66
Slide 67
67
Slide 68
68
Slide 69
69
Slide 70
70
Slide 71
71
Slide 72
72
Slide 73
73
Slide 74
74
Slide 75
75
Slide 76
76
Slide 77
77
Slide 78
78
Slide 79
79
Slide 80
80
Slide 81
81
Slide 82
82
Slide 83
83
Slide 84
84
Slide 85
85
Slide 86
86
Slide 87
87
Slide 88
88
Slide 89
89
Slide 90
90
Slide 91
91
Slide 92
92
Slide 93
93
Slide 94
94
Slide 95
95
Slide 96
96

About This Presentation

Computer organization Digital logic


Slide Content

Apr. 2020 Computer Arithmetic, Multiplication Slide 1
Part III
Multiplication Number Representation
Numbers and Arithmetic
Representing Signed Numbers
Redundant Number Systems
Residue Number Systems
Addition / Subtraction
Basic Addition and Counting
Carry-Lookahead Adders
Variations in Fast Adders
Multioperand Addition
Multiplication
Basic Multiplication Schemes
High-Radix Multipliers
Tree and Array Multipliers
Variations in Multipliers
Division
Basic Division Schemes
High-Radix Dividers
Variations in Dividers
Division by Convergence
Real Arithmetic
Floating-Point Reperesentations
Floating-Point Operations
Errors and Error Control
Precise and Certifiable Arithmetic
Function Evaluation
Square-Rooting Methods
The CORDIC Algorithms
Variations in Function Evaluation
Arithmetic by Table Lookup
Implementation Topics
High-Throughput Arithmetic
Low-Power Arithmetic
Fault-Tolerant Arithmetic
Past, Present, and Future
Parts Chapters
I.
II.
III.
IV.
V.
VI.
VII.
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
25.
26.
27.
28.
21.
22.
23.
24.
17.
18.
19.
20.
13.
14.
15.
16.
Elementary Operations

28. Reconfigurable Arithmetic
Appendix: Past, Present, and Future

Apr. 2020 Computer Arithmetic, Multiplication Slide 2
About This Presentation
Edition ReleasedRevisedRevisedRevisedRevised
First Jan. 2000Sep. 2001Sep. 2003Oct. 2005May 2007
Apr. 2008Apr. 2009
Second Apr. 2010Apr. 2011Apr. 2012Apr. 2015Apr. 2020
This presentation is intended to support the use of the textbook
ComputerArithmetic:Algorithms and Hardware Designs(Oxford
U. Press, 2nd ed., 2010, ISBN 978-0-19-532848-6). It is updated
regularly by the author as part of his teaching of the graduate
course ECE 252B, Computer Arithmetic, at the University of
California, Santa Barbara. Instructors can use these slides freely
in classroom teaching and for other educational purposes.
Unauthorized uses are strictly prohibited. ©BehroozParhami

Apr. 2020 Computer Arithmetic, Multiplication Slide 3
III Multiplication
TopicsinThisPart
Chapter9BasicMultiplicationSchemes
Chapter10High-RadixMultipliers
Chapter11TreeandArrayMultipliers
Chapter12VariationsinMultipliers
Review multiplication schemes and various speedup methods
•Multiplication is heavily used (in arith & array indexing)
•Division = reciprocation + multiplication
•Multiplication speedup: high-radix, tree, recursive
•Bit-serial, modular, and array multipliers

Apr. 2020 Computer Arithmetic, Multiplication Slide 4
“Well, well, for a rabbit, you’re not
very good at multiplying, are you?”

Apr. 2020 Computer Arithmetic, Multiplication Slide 5
9 Basic Multiplication Schemes
Chapter Goals
Studyshift/addorbit-at-a-timemultipliers
andsetthestageforfastermethodsand
variationstobecoveredinChapters10-12
Chapter Highlights
Multiplication=multioperandaddition
Hardware,firmware,softwarealgorithms
Multiplying2’s-complementnumbers
Thespecialcaseofoneconstantoperand

Apr. 2020 Computer Arithmetic, Multiplication Slide 6
Basic Multiplication Schemes: Topics
TopicsinThisChapter
9.1Shift/AddMultiplicationAlgorithms
9.2ProgrammedMultiplication
9.3BasicHardwareMultipliers
9.4MultiplicationofSignedNumbers
9.5MultiplicationbyConstants
9.6PreviewofFastMultipliers

Apr. 2020 Computer Arithmetic, Multiplication Slide 7
9.1 Shift/Add Multiplication Algorithms
Notationforourdiscussionofmultiplicationalgorithms:
a Multiplicand a
k–1a
k–2...a
1a
0
x Multiplier x
k–1x
k–2...x
1x
0
p Product(ax) p
2k–1p
2k–2...p
3p
2p
1p
0
Initially,weassumeunsignedoperands
Fig. 9.1 Multiplication of two 4-bit unsigned binary numbers in dot notation.Product
Partial
products
bit-matrix
a
x
p
2

x a

0

0
1
x a 2

1

x a 2

2

2
2

3

3

x a

Multiplicand
Multiplier 

Apr. 2020 Computer Arithmetic, Multiplication Slide 8
Preferred
Multiplication Recurrence
Multiplication with right shifts: top-to-bottom accumulation
p
(j+1)
=(p
(j)
+ x
ja2
k
) 2
–1
withp
(0)
= 0and
|–––add–––| p
(k)
= p= ax + p
(0)
2
–k
|––shift right––|Product
Partial
products
bit-matrix
a
x
p
2

x a

0

0
1
x a 2

1

x a 2

2

2
2

3

3

x a

Multiplicand
Multiplier 
Multiplication with left shifts: bottom-to-top accumulation
p
(j+1)
=2p
(j)
+ x
k–j–1a withp
(0)
= 0and
|shift| p
(k)
= p = ax + p
(0)
2
k
|––––add––––|
Fig. 9.1

Apr. 2020 Computer Arithmetic, Multiplication Slide 9
Why Premultiply the Multiplicand by 2
k
?
Additiontakesplacebetweenthedashedlinesinthefigurebelow
The0thPPiseventuallyshiftedrightbykbits,the1stbyk–1bits,…
EventhoughthecumulativePPwidensby1bitateachstep,
theadditionisalwayskbitswide
Fig. 9.1 Multiplication of two 4-bit unsigned binary numbers in dot notation.Product
Partial
products
bit-matrix
a
x
p
2

x a

0

0
1
x a 2

1

x a 2

2

2
2

3

3

x a

Multiplicand
Multiplier 
p
(j+1)
=(p
(j)
+ x
ja2
k
) 2
–1
withp
(0)
= 0and

Apr. 2020 Computer Arithmetic, Multiplication Slide 10
Examples of Basic Multiplication
Fig. 9.2
Examples
of
sequential
multipli-
cation with
right and
left shifts.
Right-shift algorithm Left-shift algorithm
======================== =======================
a 1 0 1 0a 1 0 1 0
x 1 0 1 1x 1 0 1 1
======================== =======================
p
(0)
0 0 0 0 p
(0)
0 0 0 0
+x
0a 1 0 1 0 2p
(0)
0 0 0 0 0
––––––––––––––––––––––––– +x
3a 1 0 1 0
2p
(1)
0 1 0 1 0 ––––––––––––––––––––––––
p
(1)
0 1 0 1 0 p
(1)
0 1 0 1 0
+x
1a 1 0 1 0 2p
(1)
0 1 0 1 0 0
––––––––––––––––––––––––– +x
2a 0 0 0 0
2p
(2)
0 1 1 1 1 0 ––––––––––––––––––––––––
p
(2)
0 1 1 1 1 0p
(2)
0 1 0 1 0 0
+x
2a 0 0 0 0 2p
(2)
0 1 0 1 0 0 0
––––––––––––––––––––––––– +x
1a 1 0 1 0
2p
(3)
0 0 1 1 1 1 0––––––––––––––––––––––––
p
(3)
0 0 1 1 1 1 0p
(3)
0 1 1 0 0 1 0
+x
3a 1 0 1 0 2p
(3)
0 1 1 0 0 1 0 0
––––––––––––––––––––––––– +x
0a 1 0 1 0
2p
(4)
0 1 1 0 1 1 1 0––––––––––––––––––––––––
p
(4)
0 1 1 0 1 1 1 0p
(4)
0 1 1 0 1 1 1 0
======================== =======================
p
(j+1)
=(p
(j)
+ x
ja2
k
) 2
–1
|–––add–––|
|––shift right––|
1 0 1 0
Check:
10 11
= 110
= 64 + 32 +
8 + 4 + 2

Apr. 2020 Computer Arithmetic, Multiplication Slide 11
Examples of Basic Multiplication (Continued)
Fig. 9.2
Examples
of
sequential
multipli-
cation with
right and
left shifts.
Right-shift algorithm Left-shift algorithm
======================== =======================
a 1 0 1 0a 1 0 1 0
x 1 0 1 1x 1 0 1 1
======================== =======================
p
(0)
0 0 0 0 p
(0)
0 0 0 0
+x
0a 1 0 1 0 2p
(0)
0 0 0 0 0
––––––––––––––––––––––––– +x
3a 1 0 1 0
2p
(1)
0 1 0 1 0 ––––––––––––––––––––––––
p
(1)
0 1 0 1 0 p
(1)
0 1 0 1 0
+x
1a 1 0 1 0 2p
(1)
0 1 0 1 0 0
––––––––––––––––––––––––– +x
2a 0 0 0 0
2p
(2)
0 1 1 1 1 0 ––––––––––––––––––––––––
p
(2)
0 1 1 1 1 0p
(2)
0 1 0 1 0 0
+x
2a 0 0 0 0 2p
(2)
0 1 0 1 0 0 0
––––––––––––––––––––––––– +x
1a 1 0 1 0
2p
(3)
0 0 1 1 1 1 0––––––––––––––––––––––––
p
(3)
0 0 1 1 1 1 0p
(3)
0 1 1 0 0 1 0
+x
3a 1 0 1 0 2p
(3)
0 1 1 0 0 1 0 0
––––––––––––––––––––––––– +x
0a 1 0 1 0
2p
(4)
0 1 1 0 1 1 1 0––––––––––––––––––––––––
p
(4)
0 1 1 0 1 1 1 0p
(4)
0 1 1 0 1 1 1 0
======================== =======================
p
(j+1)
=2p
(j)
+ x
k–j–1a
|shift|
|––––add––––|
Check:
10 11
= 110
= 64 + 32 +
8 + 4 + 2

Apr. 2020 Computer Arithmetic, Multiplication Slide 12
9.2 Programmed Multiplication
Fig. 9.3Programmed
multiplication (right-shift
algorithm).
{Using right shifts, multiply unsigned m_cand and m_ier,
storing the resultant 2 k-bit product in p_high and p_low.
Registers: R0 holds 0 Rc for counter
Ra for m_cand Rx for m_ier
Rp for p_high Rq for p_low}
{Load operands into registers Ra and Rx}
mult: load Ra with m_cand
load Rx with m_ier
{Initialize partial product and counter}
copy R0 into Rp
copy R0 into Rq
load kinto Rc
{Begin multiplication loop}
m_loop: shift Rx right 1 {LSB moves to carry flag}
branch no_add if carry = 0
add Ra to Rp {carry flag is set to cout}
no_add: rotate Rp right 1 {carry to MSB, LSB to carry}
rotate Rq right 1 {carry to MSB, LSB to carry}
decr Rc {decrement counter by 1}
branch m_loop if Rc 0
{Store the product}
store Rp into p_high
store Rq into p_low
m_done: ...R0 RcCounter0
Ra Rx
Rp Rq
Multiplicand Multiplier
Product, high Product, low

Apr. 2020 Computer Arithmetic, Multiplication Slide 13
Time Complexity of Programmed Multiplication
Assumek-bitwords
kiterationsofthemainloop
6-7instructionsperiteration,dependingonthemultiplierbit
Thus,6k+3to7k+3machineinstructions,
ignoringoperandloadsandresultstore
k=32implies200
+
instructionsonaverage
Thisistooslowformanymodernapplications!
Microprogrammedmultiplywouldbesomewhatbetter

Apr. 2020 Computer Arithmetic, Multiplication Slide 14
9.3 Basic Hardware Multipliers
Fig. 9.4Hardware realization of the sequential multiplication
algorithm with additions and right shifts.Multiplier x
Mux
Adder
0
out c
0 1
Doublewidth partial product p
Multiplicand a
Shift
Shift
(j)
j x
x a j
k
k
k
p
(j+1)
=(p
(j)
+ x
ja2
k
) 2
–1
|–––add–––|
|––shift right––|

Apr. 2020 Computer Arithmetic, Multiplication Slide 15
Example of Hardware Multiplication
Fig. 9.4a Hardware realization of the sequential multiplication
algorithm with additions and right shifts.
1 0 1 1
1 0 1 0
0 0 0 01 0 1 0
1 0 1
0 1 0 1 01 1 1 1 0
1 0
0 1 1 1 1 0
1
0 0 1 1 1 1 01 1 0 1 1 1 00 1 1 0 1 1 1 0
(11)
ten
(10)
ten
(110)
ten
p
(j+1)
=(p
(j)
+ x
ja2
k
) 2
–1
|–––add–––|
|––shift right––|

Apr. 2020 Computer Arithmetic, Multiplication Slide 16
Performing Add and Shift in One Clock CyclePartial product p
(j)
k
Unused
part of the
multiplier x
Adder’s
carry-out
Adder’s sum
k
k – 1
k – 1
To mux control To adder
Fig. 9.5 Combining the loading and shifting of the
double-width register holding the partial product and
the partially used multiplier.

Apr. 2020 Computer Arithmetic, Multiplication Slide 17
Sequential Multiplication with Left Shifts
Fig. 9.4b Hardware realization of the sequential multiplication
algorithm with left shifts and additions.Multiplier x
Mux
2k-bit adder
0
out c
0 1
Doublewidth partial product p
Multiplicand a
Shift
Shift
(j)
k-j-1 x
a
2k
k k-j-1 x
2k

Apr. 2020 Computer Arithmetic, Multiplication Slide 18
9.4 Multiplication of
Signed Numbers
Fig. 9.6Sequential
multiplication of
2’s-complement
numbers with right
shifts (positive
multiplier).
============================
a 1 0 1 1 0
x 0 1 0 1 1
============================
p
(0)
0 0 0 0 0
+x
0a 1 0 1 1 0
–––––––––––––––––––––––––––––
2p
(1)
1 1 0 1 1 0
p
(1)
1 1 0 1 1 0
+x
1a 1 0 1 1 0
–––––––––––––––––––––––––––––
2p
(2)
1 1 0 0 0 1 0
p
(2)
1 1 0 0 0 1 0
+x
2a 0 0 0 0 0
–––––––––––––––––––––––––––––
2p
(3)
1 1 1 0 0 0 1 0
p
(3)
1 1 1 0 0 0 1 0
+x
3a 1 0 1 1 0
–––––––––––––––––––––––––––––
2p
(4)
1 1 0 0 1 0 0 1 0
p
(4)
1 1 0 0 1 0 0 1 0
+x
4a 0 0 0 0 0
–––––––––––––––––––––––––––––
2p
(5)
1 1 1 0 0 1 0 0 1 0
p
(5)
1 1 1 0 0 1 0 0 1 0
============================
Negative multiplicand,
positive multiplier:
No change, other than
looking out for proper
sign extension
Check:

10 11
=

110
=

512 +
256 +
128 +
16 + 2

Apr. 2020 Computer Arithmetic, Multiplication Slide 19
The Case of a
Negative Multiplier
Fig. 9.7Sequential
multiplication of
2’s-complement
numbers with right
shifts (negative
multiplier).
============================
a 1 0 1 1 0
x 1 0 1 0 1
============================
p
(0)
0 0 0 0 0
+x
0a 1 0 1 1 0
–––––––––––––––––––––––––––––
2p
(1)
1 1 0 1 1 0
p
(1)
1 1 0 1 1 0
+x
1a 0 0 0 0 0
–––––––––––––––––––––––––––––
2p
(2)
1 1 1 0 1 1 0
p
(2)
1 1 1 0 1 1 0
+x
2a 1 0 1 1 0
–––––––––––––––––––––––––––––
2p
(3)
1 1 0 0 1 1 1 0
p
(3)
1 1 0 0 1 1 1 0
+x
3a 0 0 0 0 0
–––––––––––––––––––––––––––––
2p
(4)
1 1 1 0 0 1 1 1 0
p
(4)
1 1 1 0 0 1 1 1 0
+(-x
4a)0 1 0 1 0
–––––––––––––––––––––––––––––
2p
(5)
0 0 0 1 1 0 1 1 1 0
p
(5)
0 0 0 1 1 0 1 1 1 0
============================
Negative multiplicand,
negative multiplier:
In last step (the sign bit),
subtract rather than add
Check:

10 

11
= 110
= 64 + 32 +
8 + 4 + 2

Apr. 2020 Computer Arithmetic, Multiplication Slide 20
Signed 2’s-Complement Hardware Multiplier
Fig. 9.8 The 2’s-complement sequential hardware multiplier.
Adder
k + 1
0, except in
last cycle
01
Mux
k + 1
Enable
Select
Partial product
Multiplier
Multiplicand
k + 1
c
inc
out

Apr. 2020 Computer Arithmetic, Multiplication Slide 21
Booth’s Recoding
Table 9.1 Radix-2 Booth’s recoding
–––––––––––––––––––––––––––––––––––––
x
ix
i–1y
iExplanation
–––––––––––––––––––––––––––––––––––––
0 00No string of 1s in sight
0 11End of string of 1s in x
1 0
-
1Beginning of string of 1s in x
1 10Continuation of string of 1s in x
–––––––––––––––––––––––––––––––––––––
Example
1 0 0 1 1 1 0 1 1 0 1 0 1 1 1 0 Operand x
(1)
-
1 0 1 0 0
-
1 1 0
-
1 1
-
1 1 0 0
-
1 0 Recoded version y
Justification
2
j
+ 2
j–1
+ . . . + 2
i+1
+ 2
i
= 2
j+1
–2
i

Apr. 2020 Computer Arithmetic, Multiplication Slide 22
Example Multiplication
with Booth’s Recoding
Fig. 9.9Sequential
multiplication of
2’s-complement
numbers with right
shifts by means of
Booth’s recoding.
============================
a 1 0 1 1 0
x 1 0 1 0 1 Multiplier
y
-
1 1
-
1 1
-
1 Booth-recoded
============================
p
(0)
0 0 0 0 0
+y
0a 0 1 0 1 0
–––––––––––––––––––––––––––––
2p
(1)
0 0 1 0 1 0
p
(1)
0 0 1 0 1 0
+y
1a 1 0 1 1 0
–––––––––––––––––––––––––––––
2p
(2)
1 1 1 0 1 1 0
p
(2)
1 1 1 0 1 1 0
+y
2a 0 1 0 1 0
–––––––––––––––––––––––––––––
2p
(3)
0 0 0 1 1 1 1 0
p
(3)
0 0 0 1 1 1 1 0
+y
3a 1 0 1 1 0
–––––––––––––––––––––––––––––
2p
(4)
1 1 1 0 0 1 1 1 0
p
(4)
1 1 1 0 0 1 1 1 0
y
4a 0 1 0 1 0
–––––––––––––––––––––––––––––
2p
(5)
0 0 0 1 1 0 1 1 1 0
p
(5)
0 0 0 1 1 0 1 1 1 0
============================
––––––––––
x
ix
i–1y
i
––––––––––
0 00
0 11
1 0
-
1
1 10
––––––––––
Check:

10 

11
= 110
= 64 + 32 +
8 + 4 + 2

Apr. 2020 Computer Arithmetic, Multiplication Slide 23
9.5 Multiplication by Constants
Explicit, e.g.y := 12 *x + 1
Implicit, e.g. A[i, j] := A[i, j] + B[i, j]
Address of A[i, j] = base + n *i + j
Software aspects:
Optimizing compilers replace multiplications by shifts/adds/subs
Produce efficient code using as few registers as possible
Find the best code by a time/space-efficient algorithm
0 1 2 . . . n–1
0
1
2
.
.
.
m–1
Row i
Column j
Hardware aspects:
Synthesize special-purpose units such as filters
y[t] = a
0x[t] + a
1x[t–1] + a
2x[t–2] + b
1y[t–1] + b
2y[t–2]

Apr. 2020 Computer Arithmetic, Multiplication Slide 24
Multiplication Using Binary Expansion
Example: Multiply R1 by the constant 113 = (1 1 1 0 0 0 1)
two
R2  R1 shift-left 1
R3  R2 + R1
R6  R3 shift-left 1
R7  R6 + R1
R112  R7 shift-left 4
R113  R112 + R1
Shift, addShift
Ri: Register that contains itimes (R1)
This notation is for clarity; only one
register other than R1 is needed
Shorter sequence using shift-and-add instructions
R3  R1 shift-left 1 + R1
R7  R3 shift-left 1 + R1
R113  R7 shift-left 4 + R1

Apr. 2020 Computer Arithmetic, Multiplication Slide 25
Multiplication via Recoding
Example: Multiply R1 by 113 = (1 1 1 0 0 0 1)
two= (1 0 0
-
1 0 0 0 1)
two
R8  R1 shift-left 3
R7  R8 –R1
R112  R7 shift-left 4
R113  R112 + R1
Shift, add
Shift
Shorter sequence using shift-and-add/subtract instructions
R7  R1 shift-left 3 –R1
R113  R7 shift-left 4 + R1
Shift, subtract
6 shift or add (3 shift-and-add) instructions needed without recoding
The canonic signed-digit representation of a number contains no
consecutive nonzero digits: average number of shift-adds is O(k/3)

Apr. 2020 Computer Arithmetic, Multiplication Slide 26
Multiplication via Factorization
Example: Multiply R1 by 119 = 7 17
= (8 –1) (16 + 1)
R8  R1shift-left3
R7  R8–R1
R112 R7shift-left4
R119  R112 + R7
Shorter sequence using shift-and-add/subtract instructions
R7  R1 shift-left 3 –R1
R119  R7 shift-left 4 + R7
119 = (1 1 1 0 1 1 1)
two= (1 0 0 0
-
1 0 0
-
1)
two
More instructions may be needed without factorization
Requires a scratch register
for holding the 7 multiple
CSA
CPA
128x8xx
119x
1
1

Apr. 2020 Computer Arithmetic, Multiplication Slide 27
Multiplication by Multiple Constants
Example: Multiplying a number by 45, 49, and 65
R9  R1shift-left3+R1
R45  R9 shift-left 2 + R9
R7  R1shift-left3–R1
R49  R7 shift-left 3 –R7
R65  R1 shift-left 6 + R1
A combined solution for all three constants
R65  R1 shift-left 6 + R1
R49  R65 –R1 left-shift 4
R45  R49 –R1 left-shift 2
Separate solutions:
5 shift-add/subtract
operations
A programmable
block can perform
any of the three
multiplications

Apr. 2020 Computer Arithmetic, Multiplication Slide 28
9.6 Preview of Fast Multipliers
Viewingmultiplicationasamultioperandadditionproblem,
therearebuttwowaystospeeditup
a.Reducingthenumberofoperandstobeadded:
Handlingmorethanonemultiplierbitatatime
(high-radixmultipliers,Chapter10)
b.Addingtheoperandsfaster:
Parallel/pipelinedmultioperandaddition
(treeandarraymultipliers,Chapter11)
InChapter12,wecoverallremainingmultiplicationtopics:
Bit-serialmultipliers
Modularmultipliers
Multiply-addunits
Squaringas a special case

Apr. 2020 Computer Arithmetic, Multiplication Slide 29
10 High-Radix Multipliers
Chapter Goals
Studytechniquesthatallowustohandle
morethanonemultiplierbitineachcycle
(twobitsinradix4,threeinradix8,...)
ChapterHighlights
Highradixgivesriseto“difficult”multiples
Recoding(changeofdigit-set)asremedy
Carry-saveadditionreducescycletime
Implementationandoptimizationmethods

Apr. 2020 Computer Arithmetic, Multiplication Slide 30
High-Radix Multipliers: Topics
TopicsinThisChapter
10.1Radix-4Multiplication
10.2ModifiedBooth’sRecoding
10.3UsingCarry-SaveAdders
10.4Radix-8andRadix-16Multipliers
10.5MultibeatMultipliers
10.6VLSIComplexityIssues

Apr. 2020 Computer Arithmetic, Multiplication Slide 31
10.1 Radix-4 Multiplication
Preferred
Multiplication with right shifts in radix r: top-to-bottom accumulation
p
(j+1)
=(p
(j)
+ x
jar
k
)r
–1
withp
(0)
= 0and
|–––add–––| p
(k)
= p= ax + p
(0)
r
–k
|––shift right––|
Multiplication with left shifts in radix r: bottom-to-top accumulation
p
(j+1)
=rp
(j)
+ x
k–j–1a withp
(0)
= 0and
|shift| p
(k)
= p = ax + p
(0)
r
k
|––––add––––|
Fig. 9.1
(modified)Product
Partial
products
bit-matrix
a
x
p
2

x a

0

0
1
x a 2

1

x a 2

2

2
2

3

3

x a

Multiplicand
Multiplier 
x
0ar
0
x
1ar
1
x
2ar
2
x
3ar
3

Apr. 2020 Computer Arithmetic, Multiplication Slide 32
Radix-4 Multiplication in Dot Notation
Number of cycles is
halved, but now the
“difficult” multiple 3a
must be dealt withProduct
Partial
products
bit-matrix
a
x
p
2

x a

0

0
1
x a 2

1

x a 2

2

2
2

3

3

x a

Multiplicand
Multiplier  Multiplier x
p Product
Multiplicand a
(x x ) a 4
1
3 2 two
4
0
a (x x )
1 0 two

Fig. 9.1
Fig. 10.1 Radix-4,
or two-bit-at-a-time,
multiplication in dot
notation

Apr. 2020 Computer Arithmetic, Multiplication Slide 33
A Possible Design for a Radix-4 Multiplier
Precomputed via
shift-and-add
(3a= 2a+ a)
k/2 + 1 cycles, rather than k
One extra cycle over k/2
not too bad, but we would like
to avoid it if possible
Solving this problem for radix 4
may also help when dealing
with even higher radices0a2a
3a
Multiplier
To the adder
2-bit shifts
00 01 10 11
Mux
x
i+1x
i
Fig. 10.2 The multiple
generation part of a radix-4
multiplier with
precomputation of 3a.

Apr. 2020 Computer Arithmetic, Multiplication Slide 34
Example Radix-4 Multiplication Using 3a
================================
a 0110
3a 010010
x 1110
================================
p
(0)
0000
+(x
1x
0)
twoa 001100
–––––––––––––––––––––––––––––––––
4p
(1)
001100
p
(1)
001100
+(x
3x
2)
twoa 010010
–––––––––––––––––––––––––––––––––
4p
(2)
01010100
p
(2)
01010100
================================
Fig. 10.3 Example of
radix-4 multiplication
using the 3amultiple.Multiplier x
p Product
Multiplicand a
(x x ) a 4
1
3 2 two
4
0
a (x x )
1 0 two

Apr. 2020 Computer Arithmetic, Multiplication Slide 35
A Second Design for a Radix-4 Multiplier
x
i+1
x
i
c Muxcontrol Setcarry
---- --- --- ----------------------------
0 0 0 0 0 0
0 0 1 0 1 0
0 1 0 0 1 0
0 1 1 1 0 0
1 0 0 1 0 0
1 0 1 1 1 1
1 1 0 1 1 1
1 1 1 0 0 1
Fig. 10.4 The multiple generation
part of a radix-4 multiplier based on
replacing 3awith 4a(carry into next
higher radix-4 multiplier digit) and –a.0a2a–a
Multiplier
To the adder
+c
FF
Set if = = 1
or if = c = 1
c
00 01 10 11
Mux
2-bit shifts
mod 4
Carry
xi+1xi
xi+1
xi+1
xi
x
i+1(x
ic)
x
i+1x
ic
x
ic
c

Apr. 2020 Computer Arithmetic, Multiplication Slide 36
10.2 Modified Booth’s Recoding
Table 10.1 Radix-4 Booth’s recoding yielding (z
k/2. . . z
1z
0)
four
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
x
i+1 x
i x
i–1 y
i+1 y
i z
i/2 Explanation
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
0 0 0 0 0 0 No string of 1s in sight
0 0 1 0 1 1 Endofstringof1s
0 1 0 0 1 1 Isolated1
0 1 1 1 0 2 Endofstringof1s
1 0 0
-
1 0
-
2 Beginningofstringof1s
1 0 1
-
1 1
-
1 Endastring,beginnewone
1 1 0 0
-
1
-
1 Beginningofstringof1s
1 1 1 0 0 0 Continuationofstringof1s
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
(1)
-
2 2
-
1 2
-
1
-
1 0
-
2 Radix-4 version z
Context
Recoded
radix-2 digits
Radix-4 digit
Example
1 0 0 1 1 1 0 1 1 0 1 0 1 1 1 0 Operand x
(1)
-
1 0 1 0 0
-
1 1 0
-
1 1
-
1 1 0 0
-
1 0 Recoded version y

Apr. 2020 Computer Arithmetic, Multiplication Slide 37
Example Multiplication via Modified Booth’s Recoding
================================
a 0110
x 1010
z
-
1
-
2Radix-4
================================
p
(0)
000000
+z
0a 110100
–––––––––––––––––––––––––––––––––
4p
(1)
110100
p
(1)
11110100
+z
1a 111010
–––––––––––––––––––––––––––––––––
4p
(2)
11011100
p
(2)
11011100
================================
Fig. 10.5 Example of
radix-4 multiplication
with modified Booth’s
recoding of the 2’s-
complement multiplier.Multiplier x
p Product
Multiplicand a
(x x ) a 4
1
3 2 two
4
0
a (x x )
1 0 two

Apr. 2020 Computer Arithmetic, Multiplication Slide 38
Multiple Generation with Radix-4 Booth’s Recoding
Fig. 10.6 The multiple generation part of a radix-4
multiplier based on Booth’s recoding.
Could have named
this signal one/twotwo non0
a 2a
Enable
Select
z a
neg
ii+1 i–1
i/2
0 1
Mux
k+1
0, a, or 2a
To adder input
Add/subtract
control
x
Multiplier
xx
Recoding Logic
Multiplicand
0
k
0
2-bit shift
Init. 0
Sign
of a
----Encoding ----
Digitnegtwonon0
–2 1 1 1
–1 1 0 1
0 0 0 0
1 0 0 1
2 0 1 1

Apr. 2020 Computer Arithmetic, Multiplication Slide 39
10.3 Using Carry-Save Adders
Fig. 10.7 Radix-4 multiplication with a carry-save adder used to
combine the cumulative partial product, x
ia, and 2x
i+1ainto two numbers.M ux
0
2a
0 a
Multiplier
New Cumulative Partial Product
Old Cumulative
Partial Product
CSA
M ux
x
i+1x
i
Adder

Apr. 2020 Computer Arithmetic, Multiplication Slide 40
Keeping the Partial Product in Carry-Save Form
Fig. 10.8 Radix-2 multiplication with
the upper half of the cumulative partial
product kept in stored-carry form.0
Multiplier
k
k
k-Bit CSA
k
Partial Product
k
Mux
k-Bit Adder
Mux
Multiplicand
Carry
Sum
Upper half of PPLower half of PP
Right
shift
Sum
Carry
Sum
Carry
(a) Multiplier
block diagram (b) Operation in a typical cycle

Apr. 2020 Computer Arithmetic, Multiplication Slide 41
Carry-Save Multiplier with Radix-4 Booth’s Recoding
Fig. 10.9 Radix-4 multiplication with a CSA used to combine the
stored-carry cumulative partial product and z
i/2ainto two numbers.a

Multiplier

x

i+1

x

i

Adder

New cumulative
partial product

Old cumulative
partial product

FF

2-bit

Adder

To the lower half
of partial product

Booth recoder
and selector

CSA

x

i-1

z a

i/2

Extra “dot”

Apr. 2020 Computer Arithmetic, Multiplication Slide 42
Radix-4 Booth’s Recoding for Parallel Multiplication
Fig. 10.10 Booth
recoding and multiple
selection logic for
high-radix or parallel
multiplication.x x x x
Recoding Logic
two non0
a 2a
Enable
Select
z a
neg
ii+1 i–1
i/2
i–2
0 1
Mux
k+1
0, a, or 2a
k+2
Selective Complement
0, a, –a, 2a, or –2a
Extra "Dot"
for Column i
x
i+2

Apr. 2020 Computer Arithmetic, Multiplication Slide 43
Yet Another Design for Radix-4 Multiplication
Fig. 10.11 Radix-4 multiplication,
with the cumulative partial product,
x
ia, and 2x
i+1acombined into two
numbers by two CSAs.M ux
0 2a
0 a
Multiplier
CSA
M ux
x
i+1
x
i
Adder
CSA
New Cumulative
Partial Product
Old Cumulative
Partial Product
FF
2-Bit
Adder
To the Lower Half
of Partial Product
(4; 2)-counter

Apr. 2020 Computer Arithmetic, Multiplication Slide 44
10.4 Radix-8 and Radix-16 Multipliers
Fig. 10.12 Radix-16
multiplication with the
upper half of the
cumulative partial
product in carry-save
form. Multiplier
CSA CSA
CSA
CSA
Partial Product
(Upper Half)
Mux
0 8a
Mux
04a
Mux
02a
Mux
0a
xi+3
xi+2
xi+1
xi
Carry
Sum
4-Bit
Shift
FF
To the Lower Half
of Partial Product
3
4-Bit
Adder
4
4
4-bit
right
shift

Apr. 2020 Computer Arithmetic, Multiplication Slide 45
Other High-Radix MultipliersMultiplier
CSA CSA
CSA
CSA
Partial Product
(Upper Half)
Mux
0 8a
Mux
04a
Mux
02a
Mux
0a
xi+3
xi+2
xi+1
xi
Carry
Sum
4-Bit
Shift
FF
To the Lower Half
of Partial Product
3
4-Bit
Adder
4
4
Fig. 10.12
A radix-16 multiplier design
becomes a radix-256
multiplier if radix-4 Booth’s
recoding is applied first
(the muxes are replaced by
Booth recoding and multiple
selection logic)
Remove this mux & CSA and
replace the 4-bit shift (adder)
with a 3-bit shift (adder) to get
a radix-8 multiplier (cycle time
will remain the same, though)

Apr. 2020 Computer Arithmetic, Multiplication Slide 46
A Spectrum of Multiplier Design ChoicesBasic
binary
Adder
Adder
Next
multiple
Partial product
...
Several
multiples
Adder
. . .
All multiples
Small CSA
tree
Full CSA
tree
High-radix
or
partial tree
Full
treeSpeed up Economize
Partial product
Fig. 10.13 High-radix multipliers as intermediate
between sequential radix-2 and full-tree multipliers.

Apr. 2020 Computer Arithmetic, Multiplication Slide 47
10.5 Multibeat Multipliers
Observation: Half of the
clock cycle goes to waste
Fig. 10.15 Two-phase clocking for sequential logic.Next-state
logic
State
flip-flops
Inputs
Next-state
excitation
Present
state
Next-state
logic
State
latches
Inputs
Next-state
logic
Inputs
State
latches
PH1
PH2 CLK
(a) Sequential machine with FFs (b) Sequential machine with latches and 2-phase clock
Once cycle
Begin changing FF contents
Change becomes visible at FF output

Apr. 2020 Computer Arithmetic, Multiplication Slide 48
Twin-Beat and Three-Beat Multipliers
This radix-64 multiplier
runs at the clock rate of a
radix-8 design (2X speed)
Fig. 10.14 Twin-beat multiplier
with radix-8 Booth’s recoding.Adder
CSA
Sum
Carry
CSA
Sum
Carry
FF
To the Lower Half
of Partial Product
6-Bit
Adder
6
65
Pipelined
Radix-8
Booth
Recoder
& Selector
3a a 3a a
4 4
Twin Multiplier
Registers
Pipelined
Radix-8
Booth
Recoder
& Selector
Fig. 10.16 Conceptual view
of a three-beat multiplier.

Apr. 2020 Computer Arithmetic, Multiplication Slide 49
10.6 VLSI Complexity Issues
Aradix-2
b
multiplierrequires:
bktwo-inputANDgatestoformthepartialproductsbit-matrix
O(bk)areafortheCSAtree
AtleastQ(k)areaforthefinalcarry-propagateadder
Totalarea:A=O(bk)
Latency: T=O((k/b)logb+logk)
Any VLSI circuit computing the product of two k-bit integers must
satisfy the following constraints:
ATgrows at least as fast as k
3/2
AT
2
is at least proportional to k
2
The preceding radix-2
b
implementations are suboptimal, because:
AT= O(k
2
log b+ bklog k)
AT
2
= O((k
3
/b) log
2
b)

Apr. 2020 Computer Arithmetic, Multiplication Slide 50
Comparing High-and Low-Radix Multipliers
IntermediatedesignsdonotyieldbetterATorAT
2
values;
Themultipliersremainasymptoticallysuboptimalforanyb
Low-Cost
b=O(1)
High Speed
b= O(k)
AT-or AT
2
-
Optimal
AT O(k
2
) O(k
2
log k) O(k
3/2
)
AT
2
O(k
3
) O(k
2
log
2
k) O(k
2
)
AT= O(k
2
log b+ bklog k) AT
2
= O((k
3
/b) log
2
b)
By the ATmeasure (indicator of cost-effectiveness), slower radix-2
multipliers are better than high-radix or tree multipliers
Thus, when an application requires many independent multiplications,
it is more cost-effective to use a large number of slower multipliers
High-radix multiplier latency can be reduced from O((k/b) logb+ logk)
to O(k/b+ logk) through more effective pipelining (Chapter 11)

Apr. 2020 Computer Arithmetic, Multiplication Slide 51
11 Tree and Array Multipliers
ChapterGoals
Studythedesignofmultipliersforhighest
possibleperformance(speed,throughput)
ChapterHighlights
Treemultiplier=reductiontree
+redundant-to-binaryconverter
Avoidingfullsignextensioninmultiplying
signednumbers
Arraymultiplier=one-sidedreductiontree
+ripple-carryadder

Apr. 2020 Computer Arithmetic, Multiplication Slide 52
Tree and Array Multipliers: Topics
TopicsinThisChapter
11.1.Full-TreeMultipliers
11.2.AlternativeReductionTrees
11.3.TreeMultipliersforSignedNumbers
11.4.Partial-TreeandTruncatedMultipliers
11.5.ArrayMultipliers
11.6.PipelinedTreeandArrayMultipliers

Apr. 2020 Computer Arithmetic, Multiplication Slide 53
11.1 Full-Tree MultipliersBasic
binary
Adder
Adder
Next
multiple
Partial product
...
Several
multiples
Adder
. . .
All multiples
Small CSA
tree
Full CSA
tree
High-radix
or
partial tree
Full
treeSpeed up Economize
Partial product
Fig. 10.13 High-radix multipliers
as intermediate between sequential
radix-2 and full-tree multipliers.Higher-order
product bits
Multiplier
a
a
a
a
. . .
. . .
Some lower-order
product bits are
generated directly
Redundant result
Redundant-to-Binary
Converter
Multiple-
Forming
Circuits
(Multi-Operand
Addition Tree)
Partial-Products
Reduction Tree
Fig. 11.1 General structure of a full-tree multiplier.Multiplier x
p Product
Multiplicand a
(x x ) a 4
1
3 2 two
4
0
a (x x )
1 0 two

Apr. 2020 Computer Arithmetic, Multiplication Slide 54
Full-Tree versus Partial-Tree Multiplier
Schematic diagrams for full-tree and partial-tree multipliers.Adder
Large tree of
carry-save
adders
. . .
All partial products
Product
Adder
Small tree of
carry-save
adders
. . .
Several partial products
Product
Log-
depth
Log-
depth

Apr. 2020 Computer Arithmetic, Multiplication Slide 55
Variations in Full-Tree Multiplier Design
Designs are distinguished by
variations in three elements:Higher-order
product bits
Multiplier
a
a
a
a
. . .
. . .
Some lower-order
product bits are
generated directly
Redundant result
Redundant-to-Binary
Converter
Multiple-
Forming
Circuits
(Multi-Operand
Addition Tree)
Partial-Products
Reduction Tree
Fig. 11.1
2. Partial products reduction tree
3. Redundant-to-binary converter
1. Multiple-forming circuits

Apr. 2020 Computer Arithmetic, Multiplication Slide 56Product
Partial
products
bit-matrix
a
x
p
2

x a

0

0
1
x a 2

1

x a 2

2

2
2

3

3

x a

Multiplicand
Multiplier 
Example of Variations in CSA Tree Design
1 2 3 4 3 2 1
FA FA FA HA
--------------------
1 3 2 3 2 1 1
FA HA FA HA
----------------------
2 2 2 2 1 1 1
4-Bit Adder
----------------------
1 1 1 1 1 1 1 1
Wallace Tree
(5 FAs + 3 HAs + 4-Bit Adder)

1 2 3 4 3 2 1
FA FA
--------------------
1 3 2 2 3 2 1
FA HA HA FA
----------------------
2 2 2 2 1 2 1
6-Bit Adder
----------------------
1 1 1 1 1 1 1 1
Dadda Tree
(4 FAs + 2 HAs + 6-Bit Adder)
Fig. 11.2 Two different binary 4 4 tree multipliers.
HA
3
HA
3
FAFA HA
Corrections
shown in red
2

Apr. 2020 Computer Arithmetic, Multiplication Slide 57
Details of a CSA Tree
Fig. 11.3 Possible
CSA tree for a 7 7
tree multiplier.
CSA trees are quite
irregular, causing
some difficulties in
VLSI realization10-bit CPA
7-bit CSA 7-bit CSA
7-bit CSA
10-bit CSA
2Ignore
The index pair
[i, j] means that
bit positions
from i up to j
are involved.
7-bit CSA
[0, 6]
[1, 7]
[2, 8]
[6, 12]
[3, 11] [1,8]
[3, 9]
[4, 10]
[5, 11]
[2, 8] [5, 11]
[6, 12]
[2,12]
[3, 12]
[4,13] [4,12]
[4, 13]
[3,9]
3
[3,12]
[2, 8]
[3,12]
[1, 6]
01
Thus, our motivation
to examine alternate
methods for partial
products reduction

Apr. 2020 Computer Arithmetic, Multiplication Slide 58
11.2 Alternative Reduction Trees
Fig. 11.4
A slice of a
balanced-delay
tree for 11
inputs.
FA FA FA
FA FA
FA FA
FA
FA
Inputs
Level-1
carries
Level-2
carries
Level-3
carries
Level-4
carry
Outputs
FA
FA
FA
FA
FA
FA
FA
FA
FA
11 + y
1= 2y
1+ 3
Therefore, y
1= 8
carries are needed
Level
1
Level
5
Level
4
Level
3
Level
2

Apr. 2020 Computer Arithmetic, Multiplication Slide 59
Binary Tree of 4-to-2 Reduction Modules
Due to its recursive structure, a binary tree is more regular
than a 3-to-2 reduction tree when laid out in VLSI
Fig. 11.5 Tree multiplier with a more regular
structure based on 4-to-2 reduction modules.
(a) Binary tree of (4; 2)-counters
4-to-2 4-to-2 4-to-2 4-to-2
4-to-2 4-to-2
4-to-2
(b) Realization with FAs(c) A faster realization
FA
FA
c s c s
0 1
0 1
4-to-2 compressor

Apr. 2020 Computer Arithmetic, Multiplication Slide 60
Example Multiplier with 4-to-2 Reduction Tree
Fig. 11.6 Layout of a partial-products reduction tree composed of
4-to-2 reduction modules. Each solid arrow represents two numbers. M u l t i p l i c a n d
. . .
Redundant-to-binary converter
Multiple
generation
circuits
M u l t i p l e s e l e c t i o n s i g n a l s

Even if 4-to-2 reduction
is implemented using
two CSA levels, design
regularity potentially
makes up for the larger
number of logic levels
Similarly,
using Booth’s
recoding may
not yield any
advantage,
because it
introduces
irregularity

Apr. 2020 Computer Arithmetic, Multiplication Slide 61
11.3 Tree Multipliers for Signed Numbers
From Fig. 8.19a Sign extension in multioperand addition.
----------Extendedpositions----------SignMagnitudepositions---------
x
k–1 x
k–1 x
k–1 x
k–1 x
k–1 x
k–1x
k–2 x
k–3 x
k–4 ...
y
k–1 y
k–1 y
k–1 y
k–1 y
k–1 y
k–1y
k–2 y
k–3 y
k–4 ...
z
k–1 z
k–1 z
k–1 z
k–1 z
k–1 z
k–1z
k–2 z
k–3 z
k–4 ...



x 

















x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x
x x
FA FA FA FA FA FA
Five redundant copies
removed
Sign extensions
Signs
The difference in
multiplication is the
shifting sign positions
Fig. 11.7 Sharing of full adders to reduce
the CSA width in a signed tree multiplier.

Apr. 2020 Computer Arithmetic, Multiplication Slide 62
Using the Negative-Weight
Property of the Sign Bit
Fig. 11.8 Baugh-Wooley
2’s-complement multiplication.
Sign extension is a way of
converting negatively weighted bits
(negabits) to positively weighted
bits (posibits) to facilitate reduction,
but there are other methods of
accomplishing the same without
introducing a lot of extra bits
Baugh and Wooley have
contributed two such methods 4 3 2 1 0
4 3 2 1 0

4 3 2 1 0
4 3 2 1 0

a x a x a x a x a x

a a a a a
x x x x x

4 0 3 0 2 0 1 0 0 0
4 1 3 1 2 1 1 1 0 1
4 2 3 2 2 2 1 2 0 2
4 3 3 3 2 3 1 3 0 3
4 4 3 4 2 4 1 4 0 4




a a a a a
x x x x x

----------------------------
a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x
--------------------------------------------------------
-

p p p p p p p p p p


a a a a a
x x x x x
----------------------------
-a x a x a x a x a x
-a x a x a x a x a x
-a x a x a x a x a x
-a x a x a x a x a x

a x -a x -a x -a x -a x
--------------------------------------------------------
-

p p p p p p p p p p


a a a a a
x x x x x
----------------------------

a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x
a a
1 x x
--------------------------------------------------------
-

p p p p p p p p p p


---------------------------

a x a x a x a x a x

a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x


--------------------------------------------------------
-

p p p p p p p p p p


1 1

4 0 3 0 2 0 1 0 0 0
4 1 3 1 2 1 1 1 0 1
4 2 3 2 2 2 1 2 0 2
4 3 3 3 2 3 1 3 0 3
4 4 3 4 2 4 1 4 0 4
4 4
4 4

4 3 2 1 0

4 3 2 1 0

4 3 2 1 0
4 3 2 1 0

4 0 3 0 2 0 1 0 0 0
4 1 3 1 2 1 1 1 0 1
4 2 3 2 2 2 1 2 0 2
4 3 3 3 2 3 1 3 0 3
4 4 3 4 2 4 1 4 0 4

4 0 3 0 2 0 1 0 0 0
4 1 3 1 2 1 1 1 0 1
4 2 3 2 2 2 1 2 0 2
4 3 3 3 2 3 1 3 0 3
4 4 3 4 2 4 1 4 0 4







9 8 7 6 5 4 3 2 1 0

9 8 7 6 5 4 3 2 1 0

9 8 7 6 5 4 3 2 1 0

9 8 7 6 5 4 3 2 1 0

a. Unsigned

b. 2's-complement

c. Baugh-Wooley

d. Modified B-W
__
__
__
__
__ __ __ __
_
_
_
_
_ _ _ _

Apr. 2020 Computer Arithmetic, Multiplication Slide 63
Fig. 11.8 4 3 2 1 0
4 3 2 1 0

4 3 2 1 0
4 3 2 1 0

a x a x a x a x a x

a a a a a
x x x x x

4 0 3 0 2 0 1 0 0 0
4 1 3 1 2 1 1 1 0 1
4 2 3 2 2 2 1 2 0 2
4 3 3 3 2 3 1 3 0 3
4 4 3 4 2 4 1 4 0 4




a a a a a
x x x x x

----------------------------
a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x
--------------------------------------------------------
-

p p p p p p p p p p


a a a a a
x x x x x
----------------------------
-a x a x a x a x a x
-a x a x a x a x a x
-a x a x a x a x a x
-a x a x a x a x a x

a x -a x -a x -a x -a x
--------------------------------------------------------
-

p p p p p p p p p p


a a a a a
x x x x x
----------------------------

a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x
a a
1 x x
--------------------------------------------------------
-

p p p p p p p p p p


---------------------------

a x a x a x a x a x

a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x


--------------------------------------------------------
-

p p p p p p p p p p


1 1

4 0 3 0 2 0 1 0 0 0
4 1 3 1 2 1 1 1 0 1
4 2 3 2 2 2 1 2 0 2
4 3 3 3 2 3 1 3 0 3
4 4 3 4 2 4 1 4 0 4
4 4
4 4

4 3 2 1 0

4 3 2 1 0

4 3 2 1 0
4 3 2 1 0

4 0 3 0 2 0 1 0 0 0
4 1 3 1 2 1 1 1 0 1
4 2 3 2 2 2 1 2 0 2
4 3 3 3 2 3 1 3 0 3
4 4 3 4 2 4 1 4 0 4

4 0 3 0 2 0 1 0 0 0
4 1 3 1 2 1 1 1 0 1
4 2 3 2 2 2 1 2 0 2
4 3 3 3 2 3 1 3 0 3
4 4 3 4 2 4 1 4 0 4







9 8 7 6 5 4 3 2 1 0

9 8 7 6 5 4 3 2 1 0

9 8 7 6 5 4 3 2 1 0

9 8 7 6 5 4 3 2 1 0

a. Unsigned

b. 2's-complement

c. Baugh-Wooley

d. Modified B-W
__
__
__
__
__ __ __ __
_
_
_
_
_ _ _ _
The Baugh-Wooley Method and Its Modified Form
–a
4x
0= a
4(1 –x
0) –a
4
= a
4x
0–a
4
–a
4a
4x
0
a
4
In next column
–a
4x
0= (1 –a
4x
0) –1
= (a
4x
0)–1
–1(a
4x
0)
1
In next column

Apr. 2020 Computer Arithmetic, Multiplication Slide 64
Alternate Views of the
Baugh-Wooley Methods
+ 0 0 –a
4x
3–a
4x
2–a
4x
1–a
4x
0
+ 0 0 –a
3x
4–a
2x
4–a
1x
4–a
0x
4
--------------------------------------------
–0 0 a
4x
3a
4x
2a
4x
1a
4x
0
–0 0 a
3x
4a
2x
4a
1x
4a
0x
4
--------------------------------------------
+ 1 1 a
4x
3a
4x
2a
4x
1a
4x
0
+ 1 1 a
3x
4a
2x
4a
1x
4a
0x
4
1
1
--------------------------------------------
+ a
4a
4a
4x
3a
4x
2a
4x
1a
4x
0
+ x
4x
4 a
3x
4a
2x
4a
1x
4a
0x
4
a
4
x
4
--------------------------------------------
a
4
1 x
4 4 3 2 1 0
4 3 2 1 0

4 3 2 1 0
4 3 2 1 0

a x a x a x a x a x

a a a a a
x x x x x

4 0 3 0 2 0 1 0 0 0
4 1 3 1 2 1 1 1 0 1
4 2 3 2 2 2 1 2 0 2
4 3 3 3 2 3 1 3 0 3
4 4 3 4 2 4 1 4 0 4




a a a a a
x x x x x

----------------------------
a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x
--------------------------------------------------------
-

p p p p p p p p p p


a a a a a
x x x x x
----------------------------
-a x a x a x a x a x
-a x a x a x a x a x
-a x a x a x a x a x
-a x a x a x a x a x

a x -a x -a x -a x -a x
--------------------------------------------------------
-

p p p p p p p p p p


a a a a a
x x x x x
----------------------------

a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x
a a
1 x x
--------------------------------------------------------
-

p p p p p p p p p p


---------------------------

a x a x a x a x a x

a x a x a x a x a x
a x a x a x a x a x
a x a x a x a x a x


--------------------------------------------------------
-

p p p p p p p p p p


1 1

4 0 3 0 2 0 1 0 0 0
4 1 3 1 2 1 1 1 0 1
4 2 3 2 2 2 1 2 0 2
4 3 3 3 2 3 1 3 0 3
4 4 3 4 2 4 1 4 0 4
4 4
4 4

4 3 2 1 0

4 3 2 1 0

4 3 2 1 0
4 3 2 1 0

4 0 3 0 2 0 1 0 0 0
4 1 3 1 2 1 1 1 0 1
4 2 3 2 2 2 1 2 0 2
4 3 3 3 2 3 1 3 0 3
4 4 3 4 2 4 1 4 0 4

4 0 3 0 2 0 1 0 0 0
4 1 3 1 2 1 1 1 0 1
4 2 3 2 2 2 1 2 0 2
4 3 3 3 2 3 1 3 0 3
4 4 3 4 2 4 1 4 0 4







9 8 7 6 5 4 3 2 1 0

9 8 7 6 5 4 3 2 1 0

9 8 7 6 5 4 3 2 1 0

9 8 7 6 5 4 3 2 1 0

a. Unsigned

b. 2's-complement

c. Baugh-Wooley

d. Modified B-W
__
__
__
__
__ __ __ __
_
_
_
_
_ _ _ _

Apr. 2020 Computer Arithmetic, Multiplication Slide 65
11.4 Partial-Tree and Truncated Multipliers
Fig. 11.9 General structure
of a partial-tree multiplier.. . .

CSA Tree

h inputs

Adder

Lower part of
the cumulative
partial product

FF

h-Bit
Adder

Sum

Carry

Upper part of
the cumulative
partial product
(stored-carry)

High-radix versus partial-tree
multipliers: The difference is
quantitative, not qualitative
For small h, say 8 bits,
we view the multiplier of
Fig. 11.9 as high-radix
When his a significant
fraction of k, say k/2 or k/4,
then we tend to view it as
a partial-tree multiplier
Better design through pipelining
to be covered in Section 11.6

Apr. 2020 Computer Arithmetic, Multiplication Slide 66
Why Truncated Multipliers?
Fig. 11.10 The idea of a truncated multiplier with 8-bit fractional operands.
Nearly half of the hardware in array/tree multipliers is there to get the
last bit right (1 dot = one FPGA cell)
ulp
.  k-by-k fractional
.  multiplication
---------------------------------
. |
. |
. |
. |
. |
. |
. |
. | 
---------------------------------
. |
Max error = 8/2 + 7/4
+ 6/8 + 5/16 + 4/32
+ 3/64 + 2/128
+ 1/256 = 7.004 ulp
Mean error =
1.751 ulp

Apr. 2020 Computer Arithmetic, Multiplication Slide 67
Truncated Multipliers with Error Compensation
Constant and variable error compensation for truncated multipliers.
We can introduce additional “dots” on the left-hand side to compensate
for the removal of dots from the right-hand side
Constant compensation Variable compensation
. o o o o o o o| . o o o o o o o|
. o o o o o o| . o o o o o o|
. o o o o o| . o o o o o|
. o o o o| . o o o o|
. o o o| . o o o|
. 1o o| . o o|
. o| . x
-1
o|
. | . y
-1
|
Max error = +4 ulp
Max error -3 ulp
Max error = +? ulp
Max error -? ulp
Mean error = ? ulp Mean error = ? ulp

Apr. 2020 Computer Arithmetic, Multiplication Slide 68
11.5 Array Multipliers
Fig. 11.11 A basic array
multiplier uses a one-sided CSA
tree and a ripple-carry adder.0x ax ax a
x a
x a
CSA
CSA
CSA
CSA
Ripple-Carry Adder
012
3
4
ax p

0

p

1

p

2

p

3

p

4

p

6

p

7

p

8

a x

0 0

a x

1 0

a x

2 0

a x

3 0

a x

4 0

0

0

0

0

a x

0 1

a x

1 1

a x

2 1

a x

3 1

p

9

p

5

a x

4 1

a x

4 2

a x

4 3

a x

4 4

a x

0 2

a x

1 2

a x

2 2

a x

3 2

a x

0 3

a x

1 3

a x

2 3

a x

3 3

a x

0 4

a x

1 4

a x

2 4

a x

3 4

0

Fig. 11.12 Details of a 55
array multiplier using FA blocks.

Apr. 2020 Computer Arithmetic, Multiplication Slide 69
Signed (2’s-complement) Array Multiplier
Fig. 11.13
Modifications in a 55
array multiplier to deal
with 2’s-complement
inputs using the
Baugh-Wooley
method or to shorten
the critical path.p

0

p

1

p

2

p

3

p

4

p

6

p

7

p

8

a x

0 0

a x

1 0

a x

2 0

a x

3 0

a x

4 0

0

0

0

0

a x

0 1

a x

1 1

a x

2 1

a x

3 1

p

9

p

5

a x

4 1

a x

4 2

a x

4 3

a x

4 4

a x

0 2

a x

1 2

a x

2 2

a x

3 2

a x

0 3

a x

1 3

a x

2 3

a x

3 3

a x

0 4

a x

1 4

a x

2 4

a x

3 4
1

x

4

a

4

a

4

x

4

_

_

_

_

_

_

_

_

_

_

Apr. 2020 Computer Arithmetic, Multiplication Slide 70
Array Multiplier Built of Modified Full-Adder Cells
Fig. 11.14 Design of
a 5 5 array multiplier
with two additive
inputs and full-adder
blocks that include
AND gates.p p p p
p
4 3 2 1 0
a a a a a
4
3
2
1
0
x
x
x
x
x
4
3
2
1
0
p
p
p
p
p
9 8 7 6
5
FA

Apr. 2020 Computer Arithmetic, Multiplication Slide 71
Array Multiplier without a Final Carry-Propagate Adder
Fig. 11.15 Conceptual
view of a modified
array multiplier that
does not need a final
carry-propagate adder.i+1
i
i+1
i
i i
Mux
Mux
Mux
k
[k, 2k–1] 1i–1ii+1k–1
Level i
k k
0
Mux
.
.
.
.
.
.
B
i+1
B
i Dots in row i + 1

B

i



B

i+1

Dots in row i

i Conditional bits

i + 1 Conditional bits
of the final product

Fig. 11.16 Carry-save
addition, performed in
level i, extends the
conditionally computed
bits of the final product.
All remaining bits of the final product
produced only 2 gate levels after p
k–1

Apr. 2020 Computer Arithmetic, Multiplication Slide 72
11.6 Pipelined Tree and Array Multipliers. . .

CSA Tree

h inputs

Adder

Lower part of
the cumulative
partial product

FF

h-Bit
Adder

Sum

Carry

Upper part of
the cumulative
partial product
(stored-carry)

Fig. 11.9 General structure
of a partial-tree multiplier.
Fig. 11.17 Efficiently pipelined
partial-tree multiplier.. . .

h inputs

Adder

Lower part of
the cumulative
partial product

FF

h-Bit
Adder

Sum

Carry


CSA
Pipelined
CSA Tree

Latches

Latches

Latches

CSA
(h + 2)-input
CSA tree

Latch

Apr. 2020 Computer Arithmetic, Multiplication Slide 73
Pipelined Array Multipliers
With latches after every
FA level, the maximum
throughput is achieved
Latches may be inserted
after every hFA levels for
an intermediate design
Fig. 11.18 Pipelined 55
array multiplier using
latched FA blocks.
The small shaded boxes
are latches.p p p p p
4 3 2 1 0
a a a a a
4 3 2 1 0
x x x x x
4 3 2 1 0
p p p p p
9 8 7 6 5
Latched
FA with
AND gate
Latch
FA
FA
FA
FA
Example: 3-stage pipeline

Apr. 2020 Computer Arithmetic, Multiplication Slide 74
12 Variations in Multipliers
ChapterGoals
Learnadditionalmethodsforsynthesizing
fastmultipliersaswellasothertypes
ofmultipliers(bit-serial,modular,etc.)
ChapterHighlights
Buildingamultiplierfromsmallerunits
Performingmultiply-addasoneoperation
Bit-serialand(semi)systolicmultipliers
Usingamultiplierforsquaringiswasteful

Apr. 2020 Computer Arithmetic, Multiplication Slide 75
Variations in Multipliers: Topics
TopicsinThisChapter
12.1Divide-and-ConquerDesigns
12.2AdditiveMultiplyModules
12.3Bit-SerialMultipliers
12.4ModularMultipliers
12.5TheSpecialCaseofSquaring
12.6CombinedMultiply-AddUnits

Apr. 2020 Computer Arithmetic, Multiplication Slide 76
12.1 Divide-and-Conquer Designs
Building wide multiplier from narrower ones
Fig. 12.1 Divide-and-conquer (recursive) strategy for
synthesizing a 2b2bmultiplier from bbmultipliers.a

p
Rearranged partial products
in 2b-by-2b multiplication
2b bits
3b bits
H aL
xH xL
aLxH
aLxL
aHxL
xHaH
aHxL
aLxH
aLxLxHaH
b bits

Apr. 2020 Computer Arithmetic, Multiplication Slide 77
General Structure of a Recursive Multiplier
2b2buse(3;2)-counters
3b3buse(5;2)-counters
4b4buse(7;2)-counters
Fig. 12.2 Using bbmultipliers to synthesize
2b2b, 3b3b, and 4b4bmultipliers.4b  4b
3b  3b
2b  2b
b  b

Apr. 2020 Computer Arithmetic, Multiplication Slide 78
Using bc, rather than bbBuilding Blocks
2b2cusebcmultipliersand(3;2)-counters
2b4cusebcmultipliersand(5?;2)-counters
gbhcusebcmultipliersand(?;2)-counters4b  4b
3b  3b
2b  2b
b  b

Apr. 2020 Computer Arithmetic, Multiplication Slide 79
Wide Multiplier Built of Narrow Multipliers and Adders
Fig. 12.3 Using 4 4
multipliers and 4-bit
adders to synthesize
an 88 multiplier. a x a x a x a x
Add
Add
Add
Add Add
p p p p
000
8
8
12
12
H L H H H L L L
[4, 7] [4, 7] [0, 3] [4, 7] [4, 7] [0, 3] [0, 3] [0, 3]
[12,15] [8,11] [8,11] [4, 7] [8,11] [4, 7] [4, 7] [0, 3]
[4, 7]
[4, 7]
[8,11]
[8,11]
[12,15]
[12,15] [8,11] [0, 3] [4, 7]
Multiply Multiply Multiply Multiply

Apr. 2020 Computer Arithmetic, Multiplication Slide 80
Karatsuba Multiplication
2b2bmultiplicationrequiresfourbbmultiplications:
(2
b
a
H+a
L)(2
b
x
H+x
L)=2
2b
a
Hx
H+2
b
(a
Hx
L+a
Lx
H)+a
Lx
L
a
Ha
L
x
Hx
L
Karatsubanotedthatoneofthefourmultiplicationscanberemoved
attheexpenseofintroducingafewadditions:
(2
b
a
H+a
L)(2
b
x
H+x
L)=
2
2b
a
Hx
H+2
b
[(a
H+a
L)(x
H+x
L)–a
Hx
H–a
Lx
L]+a
Lx
L
Mult 1 Mult 2Mult 3
Benefitisquitesignificantforextremelywideoperands
(4/3)
5
=4.2 (4/3)
10
=17.8(4/3)
20
=315.3(4/3)
50
=1,765,781
bbits

Apr. 2020 Computer Arithmetic, Multiplication Slide 81
Computational Complexity of Multiplication
Arnold Schonhage and Volker Strassen (via FFT); best until 2007
O(log k) time
O(klog klog log k) complexity
It is an open problem whether there exist logarithmic-delay
multipliers with linear cost
(it is widely believed that there are not)
In the absence of a linear cost multiplication circuit, multiplication
must be viewed as a more difficult problem than addition
In 2007, Martin Furer managed to replace the log log kterm with
an asymptotically smaller term (for astronomically large numbers)
In 2019, David Harvey and Joris van der Hoeven developed an
O(klog k) multiplication algorithm, which is believed to be the best
possible theoretically (but not practical at present)

Apr. 2020 Computer Arithmetic, Multiplication Slide 82
12.2 Additive Multiply Modules
Fig. 12.4 Additive multiply module with 2 4 multiplier (ax)
plus 4-bit and 2-bit additive inputs (yand z).c

in



y

z

ax

p

4-bit adder

y

z

x
a

p = ax + y + z

(a) Block diagram
v
(b) Dot notation
v
b-bitandc-bitmultiplicativeinputs
bcAMM b-bitandc-bitadditiveinputs
(b+c)-bitoutput
(2
b
–1) (2
c
–1) + (2
b
–1) + (2
c
–1) = 2
b+c
–1

Apr. 2020 Computer Arithmetic, Multiplication Slide 83
Multiplier Built of AMMs
Fig. 12.5 An 8 8 multiplier built of 42 AMMs.
Inputs marked with an asterisk carry 0s. [0, 1]
[2, 3]
[4, 5]
[6, 7]
[8, 9][10,11][12,15]
[0, 1]
[2, 3]
[4,5]
[6, 7]
x
x
x
x
[0, 3]a
[0, 3]a
[0, 3]a
[0, 3]a
p
p
p
p
ppp
[0, 1]x
[2, 3]
[4, 5]
[6, 7]x
x
x
[10,11]
[8, 9]
[4, 7]a
[4, 7]
a
[4, 7]
a
[4, 7]a
[8, 9]
[0, 1]
[2, 3]
[4, 5]
[6, 7]
[4,5]
[6, 7]
[8, 11]
[10,13]
[2, 5]
[4,7]
[6, 9]
[8, 11]
[6, 9]
*
*
* *
*
*
Legend:
2 bits
4 bits
Understanding
an 8 8 multiplier
built of 4 2
AMMs using dot
notation

Apr. 2020 Computer Arithmetic, Multiplication Slide 84
Multiplier Built of AMMs: Alternate Design
Fig. 12.6 Alternate 88
multiplier design based on
42 AMMs. Inputs marked
with an asterisk carry 0s.[8, 9]p
* *
*
*
*
*
[0, 1]
[2, 3]
[4, 5]
[6, 7]
x
x
x
x
[10,11][12,15]
[0, 1]
[2, 3]
[4,5]
[6, 7]
p
p
p
p
p
p
[0,3] [4, 7] aa
Legend:
2 bits
4 bits
This design is more regular
than that in Fig. 12.5 and is
easily expandable to larger
configurations; its latency,
however, is greater

Apr. 2020 Computer Arithmetic, Multiplication Slide 85
12.3 Bit-Serial Multipliers
What goes inside the box to make a bit-serial multiplier?
Can the circuit be designed to support a high clock rate?
FA
FF
Bit-serial adder
(LSB first) x
0
y
0
s
0
x
1
y
1
s
1
x
2
y
2
s
2



Bit-serial multiplier
a
1
x
1
p
1
a
0
x
0
p
0
a
2
x
2
p
2


…(Must follow the k-bit
inputs with k0s;
alternatively, view
the product as being
only kbits wide)
?

Apr. 2020 Computer Arithmetic, Multiplication Slide 86
Semisystolic Serial-Parallel MultiplierMultiplicand (parallel in)
Multiplier
(serial in)
LSB-first
Carry
Sum
FA
Product
(serial out)
FA FA FA
a3 a2
a1 a0
x
0x
1x
2x
3
Fig. 12.7 Semi-systolic circuit for 44 multiplication in 8 clock cycles.
This is called “semisystolic” because it has a large signal fan-out of k
(k-way broadcasting) and a long wire spanning all kpositions

Apr. 2020 Computer Arithmetic, Multiplication Slide 87
Systolic Retiming as a Design Tool
Fig. 12.8 Example of retiming by delaying the inputs to C
L
and advancing the outputs from C
Lby dunits Cut
CL CR CL CR
e
f
g
h
e+d
f+d
g–d
h–d
+d
–d
–d
+d
Original delays Adjusted delays
A semisystolic circuit can be converted to a systolic circuit
via retiming, which involves advancing and retarding signals
by means of delay removal and delay insertion in such a
way that the relative timings of various parts are unaffected

Apr. 2020 Computer Arithmetic, Multiplication Slide 88
Alternate Explanation of Systolic Retiming
Transferring delay from the outputs of a subsystem to its
inputs does not change the behavior of the overall system
td
1
d
2 t+a+d
1+d
2
t+d
1
t+a+d
1
tt
t +ad
1d
2 t+a+d
1+d
2

Apr. 2020 Computer Arithmetic, Multiplication Slide 89
A First Attempt
at Retiming
Fig. 12.9 A retimed version
of our semi-systolic multiplier.Multiplicand (parallel in)
Multiplier
(serial in)
LSB-first
Carry
FA
Product
(serial out)
FA FA FA
a3 a2
a1 a0
x
0
x
1
x
2
x
3
Sum
Cut 1Cut 2Cut 3 Multiplicand (parallel in)
Multiplier
(serial in)
LSB-first
Carry
Sum
FA
Product
(serial out)
FA FA FA
a3 a2
a1 a0
x
0x
1x
2x
3
Fig. 12.7

Apr. 2020 Computer Arithmetic, Multiplication Slide 90
Deriving a Fully
Systolic MultiplierMultiplicand (parallel in)
Multiplier
(serial in)
LSB-first
Carry
Sum
FA
Product
(serial out)
FA FA FA
a3 a2
a1 a0
x
0x
1x
2x
3
Fig. 12.7
Fig. 12.10 Systolic circuit for
44 multiplication in 15 cycles.Multiplicand (parallel in)
Multiplier
(serial in)
LSB-first
Sum
FA
Product
(serial out)
FA FA FA
a3 a2
a1 a0
x
0x
1x
2x
3
Carry

Apr. 2020 Computer Arithmetic, Multiplication Slide 91
A Direct Design for a Bit-Serial Multiplier
Fig. 12.13 Bit-serial multiplier
design in dot notation.p

x

a

Already
accumulated
into three
numbers
(i - 1)

a

x

(i - 1)

i

a

x

i



x

i

(i - 1)

a

i

a

x

(i - 1)

x

i

i

a

Already output
(a) Structure of the bit-matrix
(b) Reduction after each input bit
p

(i - 1)

i

a

x

(i - 1)


x

i

(i - 1)

a

x

i

i

a

2p

(i )

Shift right to
obtain p

(i )
M ux
(5; 3)-counter
0
1
012
a x
a x
s
s
c c
t t
in
out in
in
out
out
p
ii
ii
(i–1) a
x
ss
cc
tt
in
outin
inout
out
p
i
i
. . .
. . .
. . .
. . .
. . .
i
LSB
0
Fig. 12.11 Building block for a
latency-free bit-serial multiplier.
Fig. 12.12 The cellular structure
of the bit-serial multiplier based on
the cell in Fig. 12.11.

Apr. 2020 Computer Arithmetic, Multiplication Slide 92
12.4 Modular Multipliers. . .
FA FAFAFAFA Mod-15 CSA
Divide by 16
4

4

4

4

Mod-15 CSA
4

Mod-15 CPA
Fig. 12.14 Modulo-(2
b
–1)
carry-save adder.
Fig. 12.15 Design of a
4 4 modulo-15 multiplier.

Apr. 2020 Computer Arithmetic, Multiplication Slide 93
Other Examples of Modular Multiplication
Fig. 12.16 One way
to design of a 4 4
modulo-13 multiplier.
Fig. 12.17 A method for
modular multioperand addition. . . .
Table
n inputs
CSA Tree
sum mod m
3-input
Modulo-m
Adder
.
.
.
Address
Data

Apr. 2020 Computer Arithmetic, Multiplication Slide 94
12.5 The Special Case of Squaringx 0 x 1 x 2 x 3 x 4
x 0 x 1 x 2 x 3 x 4
x 0 x 1 x 2 x 3 x 4 x 0 x 0
p 0
x 4
x 1
x 4
x 0
x 1
x
2
x
3
x 4
x 0
x 1
x
2
x
3
x 4
x 0
Multiply x by x
x 1 x 2 x 3 x 4 x 0
x
1 x
2 x
3 x
4 x
0
x
1 x
2 x
3 x
4 x
0
x 1 x 2 x 3 x 4 x 0
x 1
x
2
x
3
x 1
x
2
x
3
x
2
x
3
x 4
p 1 p 2 p 3 p 4 p 5 p 6 p 7 p 8 p 9
x 1 x 2 x 3 x 4 x 0
x 1
x 0
x 2
x 0
x 1
x 0
x 2 x 3
x 4 x 0
x 3
x 4
x 0
x 1
x 2 x 1
x 2
x 3
x 3 x 4
x 4
p 2 p 3 p 4 p 5 p 6 p 7 p 8 p 9
0
_
Simplify
Fig. 12.18 Design of a 5-bit squarer.
x
1x
0–x
1x
0

Apr. 2020 Computer Arithmetic, Multiplication Slide 95
Divide-and-Conquer Squarers
Building wide squarers from narrower ones
Divide-and-conquer (recursive) strategy for synthesizing a
2b2bsquarer from bbsquarers and multiplier.a

p
Rearranged partial products
in 2b-by-2b multiplication
2b bits
3b bits
H aL
xH xL
aLxH
aLxL
aHxL
xHaH
aHxL
aLxH
aLxLxHaH
b bits

x
L
x
H
x
L
x
L
x
L
x
H
x
L
x
H
x
H
x
H

Apr. 2020 Computer Arithmetic, Multiplication Slide 96
12.6 Combined Multiply-Add Units
Fig. 12.19
Dot-notation
representations
of various methods
for performing
a multiply-add
operation
in hardware.
Multiply-add
versus
multiply-accumulate
Multiply-accumulate
units often have wider
additive inputs
(c)
Additive input
Dot matrix for the
4 4 multiplication
(a)
Additive input
CSA tree output
(b)
Carry-save additive input
CSA tree output
(d)
Carry-save additive input
Dot matrix for the
4 4 multiplication