SOLUTION MANUAL OF COMPUTER ORGANIZATION BY CARL HAMACHER, ZVONKO VRANESIC & SAFWAT ZAKY

58,255 views 181 slides May 26, 2016
Slide 1
Slide 1 of 181
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55
Slide 56
56
Slide 57
57
Slide 58
58
Slide 59
59
Slide 60
60
Slide 61
61
Slide 62
62
Slide 63
63
Slide 64
64
Slide 65
65
Slide 66
66
Slide 67
67
Slide 68
68
Slide 69
69
Slide 70
70
Slide 71
71
Slide 72
72
Slide 73
73
Slide 74
74
Slide 75
75
Slide 76
76
Slide 77
77
Slide 78
78
Slide 79
79
Slide 80
80
Slide 81
81
Slide 82
82
Slide 83
83
Slide 84
84
Slide 85
85
Slide 86
86
Slide 87
87
Slide 88
88
Slide 89
89
Slide 90
90
Slide 91
91
Slide 92
92
Slide 93
93
Slide 94
94
Slide 95
95
Slide 96
96
Slide 97
97
Slide 98
98
Slide 99
99
Slide 100
100
Slide 101
101
Slide 102
102
Slide 103
103
Slide 104
104
Slide 105
105
Slide 106
106
Slide 107
107
Slide 108
108
Slide 109
109
Slide 110
110
Slide 111
111
Slide 112
112
Slide 113
113
Slide 114
114
Slide 115
115
Slide 116
116
Slide 117
117
Slide 118
118
Slide 119
119
Slide 120
120
Slide 121
121
Slide 122
122
Slide 123
123
Slide 124
124
Slide 125
125
Slide 126
126
Slide 127
127
Slide 128
128
Slide 129
129
Slide 130
130
Slide 131
131
Slide 132
132
Slide 133
133
Slide 134
134
Slide 135
135
Slide 136
136
Slide 137
137
Slide 138
138
Slide 139
139
Slide 140
140
Slide 141
141
Slide 142
142
Slide 143
143
Slide 144
144
Slide 145
145
Slide 146
146
Slide 147
147
Slide 148
148
Slide 149
149
Slide 150
150
Slide 151
151
Slide 152
152
Slide 153
153
Slide 154
154
Slide 155
155
Slide 156
156
Slide 157
157
Slide 158
158
Slide 159
159
Slide 160
160
Slide 161
161
Slide 162
162
Slide 163
163
Slide 164
164
Slide 165
165
Slide 166
166
Slide 167
167
Slide 168
168
Slide 169
169
Slide 170
170
Slide 171
171
Slide 172
172
Slide 173
173
Slide 174
174
Slide 175
175
Slide 176
176
Slide 177
177
Slide 178
178
Slide 179
179
Slide 180
180
Slide 181
181

About This Presentation

SOLUTION MANUAL OF COMPUTER ORGANIZATION BY CARL HAMACHER, ZVONKO VRANESIC & SAFWAT ZAKY


Slide Content

SOLUTION MANUAL
OF
COMPUTER ORGANIZATION
BY
CARL HAMACHER, ZVONK O VRANESIC &
SAFWAT ZAKY

Chapter1
BasicStructureofComputers
1.1.TransferthecontentsofregisterPCtoregisterMAR
IssueaReadcommandtomemory,andthenwaituntilithastransferredthe
requestedwordintoregisterMDR
TransfertheinstructionfromMDRintoIRanddecodeit
TransfertheaddressLOCAfromIRtoMAR
IssueaReadcommandandwaituntilMDRisloaded
TransfercontentsofMDRtotheALU
TransfercontentsofR0totheALU
PerformadditionofthetwooperandsintheALUandtransferresultinto
R0
TransfercontentsofPCtoALU
Add1tooperandinALUandtransferincrementedaddresstoPC
1.2.FirstthreestepsarethesameasinProblem1.1
TransfercontentsofR1andR2totheALU
PerformadditionoftwooperandsintheALUandtransferanswerintoR3
LasttwostepsarethesameasinProblem1.1
1.3.(a)
LoadA,R0
LoadB,R1
AddR0,R1
StoreR1,C
(b)Yes;
MoveB,C
Add A,C
1.4.(a)Non-overlappedtimeforProgramiis19timeunitscomposedas:
_ _
input compute output
Programi
1 3 21 1 1 1 13 32
1

Overlappedtimeiscomposedas:
_ _

Programi-1
Programi+1
input
input
1 3
9
1
1 1 1 1
1 1
3 3
3
output 15timeunits
Programi
compute output

Timebetweensuccessiveprogramcompletionsintheoverlappedcaseis15time
units,whileinthenon-overlappedcaseitis19timeunits.
Therefore,theratiois15/19.
(b)InthediscussioninSection1.5,theoverlapwasonlybetweeninputand
outputoftwosuccessivetasks.Ifitispossibletodooutputfromjobi1,
computeforjobi,andinputtojobi+1atthesametime,involvingallthreeunits
ofprinter,processor,anddiskcontinuously,thenpotentiallytheratiocouldbe
reducedtoward1/3.TheOSroutinesneededtocoordinatemultipleunitactivity
cannotbefullyoverlappedwithotheractivitybecausetheyusetheprocessor.
Therefore,theratiocannotactuallybereducedto1/3.
1.5.(a)LetTR=(NRSR)=RRandTC=(NCSC)=RCbeexecutiontimes
ontheRISCandCISCprocessors,respectively.Equatingexecutiontimesand
clockrates,wehave
1:2NR=1:5NC
Then
NC=NR=1:2=1:5=0:8
Therefore,thelargestallowablevalueforNCis80%ofNR.
2

(b)Inthiscase
1:2NR=1:15=1:5NC=1:00
Then
NC=NR=1:2=(1:151:5)=0:696
Therefore,thelargestallowablevalueforNCis69.6%ofNR.
1.6.(a)Letcacheaccesstimebe1andmainmemoryaccesstimebe20.Every
instructionthatisexecutedmustbefetchedfromthecache,andanadditional
fetchfromthemainmemorymustbeperformedfor4%ofthesecacheaccesses.
Therefore,
Speedupfactor=
1:020
(1:01)+(0:0420)
=11:1
(b)
Speedupfactor=
1:020
(1:01)+(0:0220)
=16:7
3

Chapter2
MachineInstructionsandPrograms
2.1.Thethreebinaryrepresentationsaregivenas:
DecimalSign-and-magnitude1's-complement2's-complement
values representationrepresentationrepresentation
5 0000101 0000101 0000101
2 1000010 1111101 1111110
14 0001110 0001110 0001110
10 1001010 1110101 1110110
26 0011010 0011010 0011010
19 1010011 1101100 1101101
51 0110011 0110011 0110011
43 1101011 1010100 1010101
2.2.(a)
(a) 00101(b) 00111(c) 10010
+01010 +01101 +01011
||| ||| |||
01111 10100 11101
nooverow overow nooverow
(d) 11011(e) 11101(f) 10110
+00111 +11000 +10011
||| ||| |||
00010 10101 01001
nooverow nooverow overow
(b)Tosubtractthesecondnumber,formits2's-complementandadditto
therstnumber.
(a) 00101(b) 00111(c) 10010
+10110 +10011 +10101
||| ||| |||
11011 11010 00111
nooverow nooverow overow
(d) 11011(e) 11101(f) 10110
+11001 +01000 +01101
||| ||| |||
10100 00101 00011
nooverow nooverow nooverow
1

2.3.No;anybinarypatterncanbeinterpretedasanumberorasaninstruction.
2.4.Thenumber44andtheASCIIpunctuationcharacter\comma".
2.5.Bytecontentsinhex,startingatlocation1000,willbe4A,6F,68,6E,73,
6F,6E.Thetwowordsat1000and1004willbe4A6F686Eand736F6EXX.
Byte1007(shownasXX)isunchanged.(SeeSection2.6.3forhexnota-
tion.)
2.6.Bytecontentsinhex,startingatlocation1000,willbe4A,6F,68,6E,73,
6F,6E.Thetwowordsat1000and1004willbe6E686F4AandXX6E6F73.
Byte1007(shownasXX)isunchanged.(Seesection2.6.3forhexnota-
tion.)
2.7.Clearthehigh-order4bitsofeachbyteto0000.
2.8.Aprogramfortheexpressionis:
Load A
MultiplyB
Store RESULT
Load C
MultiplyD
Add RESULT
Store RESULT
2

2.9.MemorywordlocationJcontainsthenumberoftests,j,andmemoryword
locationNcontainsthenumberofstudents,n.Thelistofstudentmarks
beginsatmemorywordlocationLISTintheformatshowninFigure2.14.
TheparameterStride=4(j+1)isthedistanceinbytesbetweenscores
onaparticulartestforadjacentstudentsinthelist.
TheBasewithindexaddressingmode(R1,R2)isusedtoaccessthescores
onaparticulartest.RegisterR1pointstothetestscoreforstudent1,
andR2isincrementedbyStrideintheinnerlooptoaccessscoresonthe
sametestbysuccessivestudentsinthelist.
Move J,R4 ComputeandplaceStride=4(j+1)
IncrementR4 intoregisterR4.
Multiply#4,R4
Move #LIST,R1 InitializebaseregisterR1tothe
Add #4,R1 locationofthetest1scoreforstudent1.
Move #SUM,R3 InitializeregisterR3tothelocation
ofthesumfortest1.
Move J,R10 InitializeouterloopcounterR10toj.
OUTER Move N,R11 InitializeinnerloopcounterR11ton.
Clear R2 ClearindexregisterR2tozero.
Clear R0 ClearsumregisterR0tozero.
INNER Add (R1,R2),R0AccumulatethesumoftestscoresinR0.
Add R4,R2 IncrementindexregisterR2byStridevalue.
DecrementR11 Checkifallstudentscoresoncurrent
Branch>0INNER testhavebeenaccumulated.
Move R0,(R3) Storesumofcurrenttestscoresand
Add #4,R3 incrementsumlocationpointer.
Add #4,R1 Incrementbaseregistertonexttest
scoreforstudent1.
DecrementR10 Checkifthesumsforalltestshave
Branch>0OUTER beencomputed.
3

2.10.(a)
Memory
accesses
||||
Move #AVEC,R1 1
Move #BVEC,R2 1
Load N,R3 2
Clear R0 1
LOOP Load (R1)+,R4 2
Load (R2)+,R5 2
MultiplyR4,R5 1
Add R5,R0 1
DecrementR3 1
Branch>0LOOP 1
Store R0,DOTPROD 2
(b)k1=1+1+2+1+2=7;andk2=2+2+1+1+1+1=8
2.11.(a)TheoriginalprograminFigure2.33isecientonthistask.
(b)k1=7;andk2=7
ThisisonlybetterthantheprograminProblem2.10(a)byasmall
amount.
2.12.ThedotproductprograminFigure2.33usesveregisters.Insteadof
usingR0toaccumulatethesum,thesumcanbeaccumulateddirectlyinto
DOTPROD.ThismeansthatthelastMoveinstructionintheprogramcan
beremoved,butDOTPRODisreadandwrittenoneachpassthroughthe
loop,signicantlyincreasingmemoryaccesses.ThefourregistersR1,R2,
R3,andR4,arestillneededtomakethisprogramecient,andtheyare
allusedintheloop.SupposethatR1andR2areretainedaspointersto
theAandBvectors.CounterregisterR3andtemporarystorageregister
R4couldbereplacedbymemorylocationsina2-registermachine;but
thenumberofmemoryaccesseswouldincreasesignicantly.
2.13.1220,partoftheinstruction,5830,4599,1200.
4

2.14.Linkedlistversionofthestudenttestscoresprogram:
Move #1000,R0
Clear R1
Clear R2
Clear R3
LOOP Add 8(R0),R1
Add 12(R0),R2
Add 16(R0),R3
Move 4(R0),R0
Branch>0LOOP
Move R1,SUM1
Move R2,SUM2
Move R3,SUM3
2.15.Assumethatthesubroutinecanchangethecontentsofanyregisterused
topassparameters.
Subroutine
Move R5,(SP)SaveR5onstack.
Multiply#4,R4 UseR4tocontaindistancein
bytes(Stride)betweensuccessive
elementsinacolumn.
Multiply#4,R1 BytedistancesfromA(0,0)
Multiply#4,R2 toA(0,x)andA(0,y)
placedinR1andR2.
LOOP Move (R0,R1),R5Addcorresponding
Add R5,(R0,R2)columnelements.
Add R4,R1 Incrementcolumnelement
Add R4,R2 pointersbyStridevalue.
DecrementR3 Repeatuntilall
Branch>0LOOP elementsareadded.
Move (SP)+,R5 RestoreR5.
Return Returntocallingprogram.
5

2.16.TheassemblerdirectivesORIGINandDATAWORDcausetheobjectpro-
grammemoryimageconstructedbytheassemblertoindicatethat300is
tobeplacedatmemorywordlocation1000atthetimetheprogramis
loadedintomemorypriortoexecution.
TheMoveinstructionplaces300intomemorywordlocation1000when
theinstructionisexecutedaspartofaprogram.
2.17.(a)
Move(R5)+,R0
Add(R5)+,R0
MoveR0,(R5)
(b)
Move16(R5),R3
(c)
Add#40,R5
6

2.18.(a)Wraparoundmustbeused.Thatis,thenextitemmustbeenteredat
thebeginningofthememoryregion,assumingthatlocationisempty.
(b)Acurrentqueueofbytesisshowninthememoryregionfrombyte
location1tobytelocationkinthefollowingdiagram.














































































Current queue
of bytes
Increasing addresses
INOUT
1 k
. . .. . .
TheINpointerpointstothelocationwherethenextbytewillbeappended
tothequeue.Ifthequeueisnotfullwithkbytes,thislocationisempty,
asshowninthediagram.
TheOUTpointerpointstothelocationcontainingthenextbytetobe
removedfromthequeue.Ifthequeueisnotempty,thislocationcontains
avalidbyte,asshowninthediagram.
Initially,thequeueisemptyandbothINandOUTpointtolocation1.
(c)Initially,asstatedinPartb,whenthequeueisempty,boththeIN
andOUTpointerspointtolocation1.Whenthequeuehasbeenlled
withkbytesandnoneofthemhavebeenremoved,theOUTpointerstill
pointstolocation1.ButtheINpointermustalsobepointingtolocation
1,because(followingthewraparoundrule)itmustpointtothelocation
wherethenextbytewillbeappended.Thus,inbothcases,bothpointers
pointtolocation1;butinonecasethequeueisempty,andintheother
caseitisfull.
(d)OnewaytoresolvetheprobleminPart(c)istomaintainatleastone
emptylocationatalltimes.Thatis,anitemcannotbeappendedtothe
queueif([IN]+1)Modulok=[OUT].Ifthisisdone,thequeueisempty
onlywhen[IN]=[OUT].
(e)Appendoperation:
LOC [IN]
IN ([IN]+1)Modulok
If[IN]=[OUT],queueisfull.RestorecontentsofINtocontentsof
LOCandindicatefailedappendoperation,thatis,indicatethatthe
queuewasfull.Otherwise,storenewitematLOC.
7

Removeoperation:
If[IN]=[OUT],thequeueisempty.Indicatefailedremoveoperation,
thatis,indicatethatthequeuewasempty.Otherwise,readtheitem
pointedtobyOUTandperformOUT ([OUT]+1)Modulok.
2.19.Usethefollowingregisterassignment:
R0Itemtobeappendedtoorremovedfromqueue
R1INpointer
R2OUTpointer
R3Addressofbeginningofqueueareainmemory
R4Addressofendofqueueareainmemory
R5Temporarystoragefor[IN]duringappendoperation
Assumethatthequeueisinitiallyempty,with[R1]=[R2]=[R3].
ThefollowingAPPENDandREMOVEroutinesimplementtheproce-
duresrequiredinPart(e)ofProblem2.18.
APPENDroutine:
Move R1,R5
IncrementR1 IncrementINpointer
Compare R1,R4 Modulok.
Branch0CHECK
Move R3,R1
CHECK Compare R1,R2 Checkifqueueisfull.
Branch=0FULL
MoveByteR0,(R5) Ifqueuenotfull,appenditem.
Branch CONTINUE
FULL Move R5,R1 RestoreINpointerandsend
Call QUEUEFULL messagethatqueueisfull.
CONTINUE :::
REMOVEroutine:
Compare R1,R2 Checkifqueueisempty.
Branch=0EMPTY Ifempty,sendmessage.
MoveByte(R2)+,R0 Otherwise,removebyteand
Compare R2,R4 incrementR2Modulok.
Branch0CONTINUE
Move R3,R2
Branch CONTINUE
EMPTY Call QUEUEEMPTY
CONTINUE :::
8

2.20.(a)Neithernestingnorrecursionaresupported.
(b)Nestingissupported,becausedierentCallinstructionswillsavethe
returnaddressatdierentmemorylocations.Recursionisnotsupported.
(c)Bothnestingandrecursionaresupported.
2.21.Toallownesting,therstactionperformedbythesubroutineistosave
thecontentsofthelinkregisteronastack.TheReturninstructionpops
thisvalueintotheprogramcounter.Thissupportsrecursion,thatis,
whenthesubroutinecallsitself.
2.22.AssumethatregisterSPisusedasthestackpointerandthatthestack
growstowardloweraddresses.Alsoassumethatthememoryisbyte-
addressableandthatallstackentriesare4-bytewords.Initially,thestack
isempty.Therefore,SPcontainstheaddress[LOWERLIMIT]+4.The
routinesCALLSUBandRETRNmustcheckforthestackfullandstack
emptycasesasshowninParts(b)and(a)ofFigure2.23,respectively.
CALLSUB Compare UPPERLIMIT,SP
Branch0FULLERROR
Move RL,(SP)
Branch (R1)
RETRN Compare LOWERLIMIT,SP
Branch>0EMPTYERROR
Move (SP)+,PC
2.23.IftheIDofthenewrecordmatchestheIDoftheHeadrecordofthe
currentlist,thenewrecordwillbeinsertedasthenewHead.IftheID
ofthenewrecordmatchestheIDofalaterrecordinthecurrentlist,the
newrecordwillbeinsertedimmediatelyafterthatrecord,includingthe
casewherethematchingrecordistheTailrecord.Inthislattercase,the
newrecordbecomesthenewTailrecord.
ModifyFigure2.37asfollows:
Addthefollowinginstructionastherstinstructionofthesubrou-
tine:
INSERTIONMove #0,ERRORAnticipatesuccessful
insertionofthenewrecord.
Compare#0,RHEAD (Existinginstruction.)
9

AfterthesecondCompareinstruction,insertthefollowingthreein-
structions:
Branch6=0CONTINUE1 Threenewinstructions.
Move RHEAD,ERROR
Return
CONTINUE1 Branch>0SEARCH (Existinginstruction.)
AfterthefourthCompareinstruction,insertthefollowingthreein-
structions:
Branch6=0CONTINUE2 Threenewinstructions.
Move RNEXT,ERROR
Return
CONTINUE2 Branch<0INSERT (Existinginstruction.)
2.24.Ifthelistisempty,theresultisunpredictablebecausetherstinstruction
willcomparetheIDofthenewrecordtothecontentsofmemorylocation
zero.Ifthelistisnotempty,thefollowinghappens.Ifthecontentsof
RIDNUMarelessthantheIDnumberoftheHeadrecord,theHeadrecord
willbedeleted.Otherwise,theroutineloopsuntilregisterRCURRENT
pointstotheTailrecord.ThenRNEXTgetsloadedwithzerobythe
instructionatLOOP,andtheresultisunpredictable.
ReplaceFigure2.38withthefollowingcode:
DELETION Compare #0,RHEAD Ifthelistisempty,
Branch6=0CHECKHEAD returnwithRIDNUMunchanged.
Return
CHECKHEAD Compare (RHEAD),RIDNUM CheckifHeadrecord
Branch6=0CONTINUE1 istobedeletedand
Move 4(RHEAD),RHEAD performdeletionifit
Move #0,RIDNUM is,returningwithzero
Return inRIDNUM.
CONTINUE1 Move RHEAD,RCURRENT Otherwise,continuesearching.
LOOP Move 4(CURRENT),RNEXT
Compare #0,RNEXT Ifallrecordschecked,
Branch6=0CHECKNEXT returnwithIDNUMunchanged.
Return
CHECKNEXT Compare (RNEXT),RIDNUM Checkifnextrecordis
Branch6=0CONTINUE2 tobedeletedandperform
Move 4(RNEXT),RTEMP deletionifitis,
Move RTEMP,4(RCURRENT) returningwithzero
Move #0,RIDNUM inRIDNUM.
Return
CONTINUE2 Move RNEXT,RCURRENT Otherwise,continue
Branch LOOP thesearch.
10

Chapter3
ARM,Motorola,andIntel
InstructionSets
PARTI:ARM
3.1.(a)R8,R9,andR10,contain1,2,and3,respectively.
(b)Thevalues20and30arepushedontoastackpointedtobyR1bythe
twoStoreinstructions,andtheyoccupymemorylocations1996and1992,
respectively.TheyarethenpoppedothestackintoR8andR9.Finally,
theSubtractinstructionresultsin10(3020)beingstoredinR10.The
stackpointerR1isreturnedtoitsoriginalvalueof2000.
(c)Thenumbersinmemorylocations1016and1020areloadedintoR4
andR5,respectively.Thesetwonumbersarethenaddedandthesumis
placedinregisterR4.ThenaladdressvalueinR2is1024.
3.2.(b)AmemoryoperandcannotbereferencedinaSubtractinstruction.
(d)Theimmediatevalue257is100000001inbinary,andisthustoolong
totinthe8-bitimmediateeld.Notethatitcannotbegeneratedby
therotationofany8-bitvalue.
3.3.Thefollowingtwoinstructionsperformthedesiredoperation:
MOVR0,R0,LSL#24
MOVR0,R0,ASR#24
3.4.UseregisterR0asacounterregisterandR1asaworkregister.
MOVR0,#32 LoadR0withcountvalue32.
MOVR1,#0 ClearregisterR1tozero.
LOOP MOVR2,R2,LSL#1ShiftcontentsofR2left
onebitposition,movingthe
high-orderbitintotheCag.
MOVR1,R1,RRX RotateR1rightonebit
position,includingtheCag,
asshowninFigure2.32d.
SUBSR0,R0,#1 Checkifnished.
BGT LOOP
MOVR2,R1 Loadreversedpattern
backintoR2.
1

3.5.Programtrace:
TIME R0R1R2
after1stexecutionofBGT 34NUM1+4
after2ndexecutionofBGT143NUM1+8
after3rdexecutionofBGT 132NUM1+12
3.6.Assumebytesareunsigned8-bitvalues.
LDR R0,N R0islistcounter
ADR R1,X R1pointstoXlist
ADR R2,Y R2pointstoYlist
ADR R3,LARGERR3pointstoLARGERlist
LOOP LDRB R4,[R1],#1LoadXlistbyteintoR4
LDRB R5,[R2],#1LoadYlistbyteintoR5
CMP R4,R5 Comparebytes
STRHSB R4,[R3],#1StoreXbyteiflargerorsame
STRLOB R5,[R3],#1StoreYbyteiflarger
SUBS R0,R0,#1 Checkifnished
BGT LOOP
3.7.Theinnerloopchecksforamatchateachpossibleposition.
LDR R0,N Computeouterloopcount
LDR R1,M andstoreinR2.
SUB R2,R0,R1
ADD R2,R2,#1
ADR R3,STRING UseR3andR4asbase
ADR R4,SUBSTRING pointersforeachmatch.
OUTER MOVR5,R3 UseR5andR6asrunning
MOVR6,R4 pointersforeachmatch.
LDR R7,M Initializeinnerloopcounter.
INNER LDRB R0,[R5],#1 Comparebytes.
LDRB R1,[R6],#1
CMP R0,R1
BNE NOMATCH Ifnotequal,gonext.
SUBS R7,R7,#1 Checkifallbytescompared.
BGT INNER
MOVR0,R3 Ifsubstringmatches,load
B NEXT itspositionintoR0andexit.
NOMATCHADD R3,R3,#1 Gotonextsubstring.
SUBS R2,R2,#1 Checkifallpositionstried.
BGT OUTER
MOVR0,#0 Ifyes,loadzerointo
NEXT ::: R0andexit.
2

3.8.Thissolutionassumesthatthelastnumberintheseriesofnnumberscan
berepresentedina32-bitword,andthatn>2.
MOVR0,N UseR0tocountnumbers
SUB R0,R0,#2 generatedafter1.
ADR R1,MEMLOC UseR1asmemorypointer.
MOVR2,#0 Storersttwonumbers,
STR R2,[R1],#4 0and1,fromR2
MOVR3,#1 andR3intomemory.
STR R3,[R1],#4
LOOP ADD R3,R2,R3 Startingwithnumberi1
STR R3,[R1],#4 inR2andiinR3,compute
andplacei+1inR3
andstoreinmemory.
SUB R2,R3,R2 Recoveroldiandplace
inR2.
SUBSR0,R0,#1 Checkifallnumbers
BGT LOOP havebeencomputed.
3.9.LetR0pointtotheASCIIwordbeginningatlocationWORD.Tochange
touppercase,weneedtochangebitb5from1to0.
NEXTLDRB R1,[R0] Getcharacter.
CMP #&20,R1 Checkifspacecharacter.
ANDNE #&DF,R1 Ifnotspace:clear
STRNEB R1,[R0],#1bit5,store
BNE NEXT convertedcharacter,
getnextcharacter.
3

3.10.MemorywordlocationJcontainsthenumberoftests,j,andmemoryword
locationNcontainsthenumberofstudents,n.Thelistofstudentmarks
beginsatmemorywordlocationLISTintheformatshowninFigure2.14.
TheparameterStride=4(j+1)isthedistanceinbytesbetweenscores
onaparticulartestforadjacentstudentsinthelist.
ThePost-indexedaddressingmode[R2],R3,LSL#2isusedtoaccessthe
successivescoresonaparticulartestintheinnerloop.Thevaluein
registerR2beforeeachentrytotheinnerloopistheaddressofthescore
onaparticulartestfortherststudent.RegisterR3containsthevalue
j+1.Therefore,registerR2isincrementedbytheStrideparameteron
eachpassthroughtheinnerloop.
LDR R3,J Loadj+1intoR3to
ADD R3,R3,#1 beusedasanaddressoset.
ADR R4,SUM InitializeR4tothesum
locationfortest1.
ADR R5,LIST Loadaddressoftest1score
ADD R5,R5,#4 forstudent1intoR5.
LDR R6,J Initializeouterloopcounter
R6toj.
OUTER LDR R7,N Initializeinnerloop
counterR7ton.
MOVR2,R5 InitializebaseregisterR2
tolocationofstudent1test
scorefornextinnerloop
sumcomputation.
MOVR0,#0 Clearsumaccumulator
registerR0.
INNER LDR R1,[R2],R3,LSL#2LoadtestscoreintoR1
andincrementR2byStrideto
pointtonexttestscore.
ADD R0,R0,R1 AccumulatescoreintoR0.
SUBSR7,R7,#1 Checkifallstudentscores
BGT INNER forcurrenttestareadded.
STR R0,[R4],#4 Storesuminmemory.
ADD R5,R5,#4 IncrementR5tonexttest
scoreforstudent1.
SUBSR6,R6,#1 Checkifsumsforalltest
BGT OUTER scoreshavebeenaccumulated.
4

3.11.Assumethatthesubroutinecanchangethecontentsofanyregistersused
topassparameters.
STR R5,[R13,#4]! Save[R5]onstack.
ADD R1,R0,R1,LSL#2 LoadaddressofA(0,x)intoR1.
ADD R2,R0,R2,LSL#2 LoadaddressofA(0,y)intoR2.
LOOP LDR R5,[R1],R4,LSL#2Load[A(i,x)]intoR5
andincrementpointerR1
byStride=4m.
LDR R0,[R2] Load[A(i,y)]intoR0.
ADD R0,R0,R5 Addcorrespondingcolumnentries.
STR R0,[R2],R4,LSL#2StoresuminA(i,y)and
incrementpointerR2byStride.
SUBSR3,R3,#1 Repeatloopuntilall
BGT LOOP entrieshavebeenadded.
LDR R5,[R13],#4 Restore[R5]fromstack.
MOVR15,R14 Return.
3.12.ThisprogramissimilartoFigure3.9,andmakesthesameassumptions
aboutregisterusageandstatuswordbitlocations.
LDR R0,N UseR0astheloopcounter
forreadingncharacters.
READ LDR R3,[R1] Load[INSTATUS]and
TST R3,#8 waitforcharacter.
BEQ READ
LDRB R3,[R1,#4] Readcharacterandpush
STRB R3,[R6,#1]!ontostack.
ECHO LDR R4,[R2] Load[OUTSTATUS]and
TST R4,#8 waitfordisplay.
BEQ ECHO
STRB R3,[R2,#4] Sendcharacter
todisplay.
SUBS R0,R0,#1 Repeatuntiln
BGT READ charactersread.
3.13.Assumethatmostofthetimebetweensuccessivecharactersbeingstruck
isspentinthethree-instructionwaitloopthatstartsatlocationREAD.
TheBEQ READ instructionisexecutedonceevery60nswhile
thisloopisbeingexecuted.Thereare10
9
=10=10
8
nsbetweensucces-
sivecharacters.Therefore,theBEQREAD instructionisexecuted
10
8
=60=1:666610
6
timespercharacterentered.
5

3.14.MainProgram
READLINE BL GETCHAR Callcharacterreadsubroutine.
STRBR3,[R0],#1Storecharacterinmemory.
BL PUTCHAR Callcharacterdisplaysubroutine.
TEQ R3,#CR Checkforend-of-linecharacter.
BNE READLINE
SubroutineGETCHAR
GETCHAR LDR R3,[R1] Waitforcharacter.
TST R3,#8
BEQ GETCHAR
LDRB R3,[R1,#4]LoadcharacterintoR3.
MOVR15,R14 Return.
SubroutinePUTCHAR
PUTCHAR STMFD R13!,fR4,R14gSaveR4andLinkregister.
DISPLAY LDR R4,[R2] Waitfordisplay.
TST R4,#8
BEQ DISPLAY
STRB R3,[R2,#4] Sendcharactertodisplay.
LDMFD R13!,fR4,R15gRestoreR4andReturn.
6

3.15.AddressINSTATUSispassedtoGETCHARonthestack;thecharacter
readispassedbackinthesamestackposition.Thecharactertobedis-
playedandtheaddressOUTSTATUSarepassedtoPUTCHAR onthe
stackinthatorder.ThestackframestructureshowninFigure3.13is
used.
MainProgram
READLINE LDR R1,POINTER1 LoadaddressINSTATUS
STR R1,[SP,#4]! containedinPOINTER1into
R1andpushontostack.
BL GETCHAR Callcharacterreadsubroutine.
LDRB R1,[SP] Loadcharacterfromtopof
STRB R1,[R0],#1 stackandstoreinmemory.
LDR R2,POINTER2 LoadaddressOUTSTATUS
STR R2,[SP,#4]! containedinPOINTER2into
R2andpushontostack.
BL PUTCHAR Callcharacterdisplaysubroutine.
ADD SP,SP,#8 Removeparametersfromstack.
TEQ R1,#CR Checkforend-of-linecharacter.
BNE READLINE
SubroutineGETCHAR
GETCHAR STMFD SP!,fR1,R3,FP,LRgSaveregisters.
ADD FP,SP,#8 Loadframepointer.
LDR R1[FP,#8] LoadaddressINSTATUSintoR1.
READ LDR R3,[R1] Waitforcharacter.
TST R3,#8
BEQ READ
LDRB R3,[R1,#4] LoadcharacterintoR3
STRB R3,[FP,#8] andoverwriteINSTATUS
onstack.
LDMFD SP!,fR1,R3,FP,PCgRestoreregistersandReturn.
SubroutinePUTCHAR
PUTCHAR STMFD SP!,fR2R4,FP,LRgSaveregisters.
ADD FP,SP,#12 Loadframepointer.
LDR R2,[FP,#8] LoadaddressOUTSTATUSinto
LDR R3,[FP,#12] R2andcharacterintoR3.
DISPLAY LDR R4,[R2] Waitfordisplay.
TST R4,#8
BEQ DISPLAY
STRB R3,[R2,#4] Sendcharactertodisplay.
LDMFD SP!,fR2R4,FP,PCgRestoreregistersandReturn.
7

3.16.Therstprogramsectionreadsthecharacters,storesthemina3-byte
areabeginningatCHARSTR,andechoesthemtoadisplay.Thesecond
sectiondoestheconversiontobinaryandstorestheresultinBINARY.
TheI/OdeviceaddressesINSTATUSandOUTSTATUSareinregisters
R1andR2.
ADR R0,CHARSTR Initializememorypointer
MOVR5,#3 R0andcounterR5.
READ LDR R3,[R1] Readacharacterand
TST R3,#8 storeitinmemory.
BEQ READ
LDRB R3,[R1,#4]
STRB R3,[R0],#1
ECHO LDR R4,[R2] Echothecharacter
TST R4,#8 tothedisplay.
BEQ ECHO
STRB R3,[R2,#4]
SUBS R5,R5,#1 Checkifallthree
BGT READ charactershavebeenread.
CONVERTADR R0,CHARSTR Initializememorypointers
ADR R1,HUNDREDS R0,R1,andR2.
ADR R2,TENS
LDRB R3,[R0,]#1 Loadhigh-orderBCDdigit
AND R3,R3,#&F intoR3.
LDR R4,[R1,R3,LSL#2]Loadbinaryvalue
correspondingtodecimal
hundredsvalueinto
accumulatorregisterR4.
LDRB R3,[R0],#1 LoadmiddleBCDdigit
AND R3,R3,#&F intoR3.
LDR R3,[R2,R3,LSL#2]Loadbinaryvalue
correspondingto
decimaltensvalue
intoregisterR3.
ADD R4,R4,R3 AccumulateintoR4.
LDRB R3,[R0],#1 Loadlow-orderBCDdigit
AND R3,R3,#&F intoR3.
ADD R4,R4,R3 AccumulateintoR4.
STR R4,BINARY Storeconvertedvalue
intolocationBINARY.
8

3.17.(a)ThenamesFP,SP,LR,andPC,areusedforregistersR12,R13,R14,
andR15(framepointer,stackpointer,linkregister,andprogramcounter).
The3-bytememoryareaforthecharactersbeginsataddressCHARSTR;
andtheconvertedbinaryvalueisstoredatBINARY.
Therstsubroutine,labeledREADCHARS, ispatternedafterthepro-
graminFigure3.9.Itechoesthecharactersbacktoadisplayaswellas
readingthemintomemory.ThesecondsubroutineislabeledCONVERT.
ThestackframeformatusedislikeFigure3.13.
Apossiblemainprogramis:
Mainprogram
ADR R10,CHARSTR Loadparametersinto
ADR R11,BINARY R10andR11and
STMFD SP!,fR10,R11g pushontostack.
BL READCHARS Branchtorstsubroutine.
RTNADDR ADD SP,SP,#8 Removetwoparameters
::: fromstackandcontinue.
FirstsubroutineREADCHARS
READCHARS STMFD SP!,fR0R5,FP,LRgSaveregisters
onstack.
ADD FP,SP,#28 Setupframe
pointer.
LDR R0,[FP,#4] LoadR0,R1,
ADR R1,INSTATUS andR2with
ADR R2,OUTSTATUS parameters.
MOV R5,#3 Samecodeas
::: insolutionto
BGT READ Problem3.16.
LDR R0,[FP,#8] LoadR0,R1,R2
LDR R5,[FP,#12] andR5with
ADR R1,HUNDREDS parameters.
ADR R2,TENS
BL CONVERT Callsecond
subroutine.
LDMFD SP!,fR0R5,FP,PCgReturnto
Mainprogram.
9

SecondsubroutineCONVERT
CONVERTSTMFD SP!,fR3,R4,FP,LRgSaveregisters
onstack.
ADD FP,SP,#8 Setupframe
pointer.
LDRB R3,[R0],#1 Samecodeas
::: insolutionto
ADD R4,R4,R3 Problem3.16.
STR R4,[R5] Storebinary
number.
LDMFD SP!,fR3,R4,FP,PCgReturnto
rstsubroutine.
(b)ThecontentsofthetopofthestackafterthecalltotheCONVERT
routineare:
[R0]
[R1]
[R2]
[R3]
[R4]
[R5]
FP! [FP]
[LR]=RTNADDR
CHARSTR
BINARY
OriginalTOS
10

3.18.SeethesolutiontoProblem2.18fortheproceduresneededtoperformthe
appendandremoveoperations.
Registerassignment:
R0Databytetoappendtoorremovefromqueue
R1INpointer
R2OUTpointer
R3Addressofrstqueuebytelocation
R4Addressoflastqueuebytelocation(=[R3]+k1)
R5Auxiliaryregisterforaddressofnextappendedbyte.
Initially,thequeueisemptywith[R1]=[R2]=[R3]
APPENDroutine:
MOV R5,R1
ADD R1,R1,#1 IncrementR1Modulok.
CMP R1,R4
MOVGTR1,R3
CMP R1,R2 Checkifqueueisfull.
MOVEQR1,R5 Ifqueuefull,restore
BEQ QUEUEFULL INpointerandsend
messagethatqueueisfull.
STRB R0,[R5] Ifqueuenotfull,
appendbyteandcontinue.
REMOVEroutine:
CMP R1,R2 Checkifqueueisempty.
BEQ QUEUEEMPTY Ifempty,sendmessage.
LDRB R0,[R2],#1 Otherwise,removebyte
CMP R2,R4 andincrementR2
MOVGTR2,R3 Modulok.
3.19.Programtrace:
TIME R0 R2 R3LISTLISTLISTLISTLIST
+1 +2 +3 +4
After1st12010041000106 13 67 45120
After2nd10610031000 67 13 45106120
After3rd6710021000 45 13 67106120
After4th4510011000 13 45 67106120
11

3.20.Callingprogram
ADRR4,LISTNPassparameterLISTNto
subroutineinR4.
AssumeLISTN+4=LIST.
BL SORT
SubroutineSORT
SORT STMFD R13!,fR0R3,R5,R14gSaveregisters.
LDR R0,[R4],#4 Initializeouterloop
ADD R2,R4,R0,LSL#2 baseregisterR2
toLIST+4n.
ADD R5,R4,#4 LoadLIST+4into
registerR5.
OUTER LDR R0,[R2,#4]! Commentssimilar
MOV R3,R2 asinFigure3.15.
INNER LDR R1,[R3,#4]!
CMP R1,R0
STRGT R1,[R2]
STRGT R0,[R3]
MOVGTR0,R1
CMP R3,R4
BNE INNER
CMP R2,R5
BNE OUTER
LDMFD R13!,fR0R3,R5,R15gRestoreregisters
andreturn.
12

3.21.ThealternativeprogramfromtheinstructionlabeledOUTERtotheend
is:
OUTER LDRB R0,[R2,#1]!LoadLIST(j)intoR0.
MOV R3,R2 Initializeinnerloopbaseregister
R3toLIST+n1.
MOV R6,R2 Loadaddressofinitiallargest
elementintoR6.
MOV R7,R0 Loadinitiallargestelement
intoR7.
INNER LDRB R1,[R3,#1]!LoadLIST(k)intoR1.
CMP R1,R7 CompareLIST(k)tocurrentlargest.
MOVGTR6,R3 Updateaddressandvalueof
MOVGTR7,R1 largestifLIST(k)larger.
CMP R3,R4 Checkifinnerloopcompleted.
BNE INNER
STRB R0,[R6] Swap;correctcodeevenifno
STRB R7,[R2] largerelementisfound.
CMP R2,R5
BNE OUTER
TheadvantageofthisapproachisthatthetwoMOVGTinstructionsin
theinnerloopofthealternativeprogramexecutefasterthanthethree-
instructioninterchangecodeinFigure3.15b.
3.22.TherecordpointerisregisterR0,andregistersR1,R2,andR3,areused
toaccumulatethethreesums,asinFigure2.15.Assumethatthelistis
notempty.
MOVR0,#1000
MOVR1,#0
MOVR2,#0
MOVR3,#0
LOOP LDR R5,[R0,#8]
ADD R1,R1,R5
LDR R5,[R0,#12]
ADD R2,R2,R5
LDR R5,[R0,#16]
ADD R3,R3,R5
LDR R0,[R0,#4]
CMP R0,#0
BNE LOOP
STR R1,SUM1
STR R2,SUM2
STR R3,SUM3
13

3.23.IftheIDofthenewrecordmatchestheIDoftheHeadrecord,thenew
recordwillbecomethenewHead.IftheIDmatchesthatofalaterrecord,
itwillbeinsertedimmediatelyafterthatrecord,includingthecasewhere
thematchingrecordistheTail.
ModifyFigure3.16asfollows:
Addthefollowinginstructionastherstinstructionofthesubrou-
tine:
INSERTIONMOVR10,#0Anticipatesuccessful
insertionofnewrecord.
AfterthesecondCMPinstruction,insertthefollowingtwoinstruc-
tions:
MOVEQR10,RHEAD IDmatchesthatof
MOVEQPC,R14 Headrecord.
AftertheinstructionlabeledLOOP,insertthefollowingfourinstruc-
tions:
LDR R0,[RNEXT]
CMP R0,R1
MOVEQR10,RNEXT
MOVEQPC,R14
Removetheinstructionwiththecomment\Gofurther?"becauseit
hasalreadybeendoneinthepreviousbullet.
14

3.24.Ifthelistisempty,theresultisunpredictablebecausethesecondinstruc-
tioncomparesthenewIDwiththecontentsofmemorylocationzero.If
thelistisnotempty,theprogramcontinuesuntilRCURRENTpointsto
theTailrecord.ThentheinstructionatLOOPloadszerointoRNEXT
andtheresultisunpredictable.
ReplaceFigure3.17withthefollowingcode:
DELETION CMP RHEAD,#0 Iflistisempty,return
MOVEQPC,R14 withRIDNUMunchanged.
CHECKHEAD LDR R0,[RHEAD] CheckifHeadrecordis
CMP R0,RIDNUM tobedeleted.Ifyes,
LDREQ RHEAD,[RHEAD,#4] deleteit,andthenreturn
MOVEQRIDNUM,#0 withzeroinRIDNUM.
MOVEQPC,R14
MOV RCURRENT,RHEAD Otherwise,continuesearch.
LOOP LDR RNEXT,[RCURRENT,#4]
CMP RNEXT,#0 Ifallrecordschecked,return
MOVEQPC,R14 withRIDNUMunchanged.
LDR R0,[RNEXT] Isnextrecordtheone
CMP R0,RIDNUM tobedeleted?
LDREQ R0,[RNEXT,#4] Ifyes,deleteit,and
STREQ R0,[RCURRENT,#4] returnwithzero
MOVEQRIDNUM,#0 inRIDNUM.
MOVEQPC,R14
MOV RCURRENT,RNEXT Otherwise,loopbackand
B LOOP continuetosearch.
15

PARTII:68000
3.25.(a)Location$2000 $1000+$3000=$4000
Theinstructionoccupiestwobytes.Onememoryaccessisneededtofetch
theinstructionand4toexecuteit.
(b)EectiveAddress=$1000+$1000=$2000,
D0 $3000+$1000=$4000
4bytes;2accessestofetchinstructionand2toexecuteit.
(c)$2000 $2000+$3000=$5000
6bytes;3accessestofetchinstructionand4toexecuteit.
3.26.(a)ADDX(A2),D3
InAddextended,boththedestinationandsourceoperandsmustusethe
sameaddressingmode,eitherregisterorautodecrement.
(b)LSR.L#9,D2
Thenumberofbitsshiftedmustbelessthan8.
(c)MOVE.B 520(A0,D2)
Theosetvaluerequiresmorethan8bits.Also,nodestinationoperand
isspecied.
(d)SUBA.L 12(A2,PC),A0
InrelativefulladdressingmodethePCmustbespeciedbeforethead-
dressregister.
(e)CMP.B#254,$12(A2,D1.B)
Thedestinationoperandmustbeadataregister.Alsothesourceoperand
isoutsidetherangeofsignedvaluesthatcanberepresentedin8bits.
3.27.Programtrace:
TIME D0D1A2NNUM1 SUM
after1stADD.W 835240252400 0
after2ndADD.W 1284240452400 0
after3rdADD.W 2843240652400 0
after4thADD.W 342240852400 0
after5thADD.W 1341241052400 0
afterlastMOVE.L1340241052400 134
16

3.28.(a)Thisprogramndsthelocationofthesmallestelementinalistwhose
startingaddressisstoredinMEM1,andsizeinMEM2.Thesmallest
elementisstoredinlocationDESIRED.
(b)16wordsarerequiredtostorethisprogram.Wehaveassumedthat
theassemblerusesshortabsoluteaddresses.(Longaddressesarenormally
speciedasMEM1.L,etc.)Otherwise,3morewordswouldbeneeded.
(c)TheexpressionformemoryaccessesisT=16+5n+4m.
3.29.(a)Theybothleavethe17thnegativewordinRSLT.
(b)Bothprogramsscanthroughthelisttondthe17thnegativenumber
inthelist.
(c)Program1takes26bytesofmemory,whileProgram2requires24.
(d)LetPbethenumberofnon-negativeentriesencountered.Program1
requires9+717+3PandProgram2requires10+617+4P
memoryaccesses.
(e)Program1requiresslightlymorememory,buthasaclearspeedad-
vantage.Program2destroystheoriginallist.
3.30.A68000programtocomparetwobytelistsatlocationsXandY,putting
thelargerbyteateachpositioninaliststartingatlocationLARGER,is:
MOVEA.L#X,A0
MOVEA.L#Y,A1
MOVEA.L#LARGER,A2
MOVE.W N,D0
SUBQ #1,D0 InitializeD0to[N]1
LOOP CMP.B (A0)+,(A1)+ Comparelistsandadvancepointers
BGT LISTY
MOVE.B 1(A0),(A2)+CopyitemfromlistX
BRA NEXT Checknextitem
LISTYMOVE.B 1(A1),(A2)+CopyitemfromlistY
NEXT DBRA D0,LOOP Continueifmoreentries
17

3.31.A68000programforstringmatching:
MOVEA.L#STRING,A0 GetlocationofSTRING
MOVE.W N,D0 LoadD0withappropriate
MOVE.W M,D1 countfor\matchattempts"
SUB.W D1,D0
LOOP MOVEA.L#SUBSTRING,A1 GetlocationofSUBSTRING
MOVE.W M,D1 GetsizeofSUBSTRING
MOVE.L A0,A2 SavelocationinSTRINGatwhich
comparisonwillstart
MATCHER DBRA D1,SUCCESS
CMP.B (A0)+,(A1)+ Compareandadvancepointers
BEQ MATCHER Ifsame,checknextcharacter
MOVEA.LA2,A0 Matchfailed;advancestarting
ADDQ.L #1,A0 characterpositioninSTRING
DBRA D0,LOOP CheckifendofSTRING
MOVE.L #0,D0 Substringnotfound
BRA NEXT
SUCCESS MOVEA.LA2,D0 Savelocationwherematchfound
NEXT Nextinstruction
NotethatDBRAisusedintwowaysinthisprogram,onceatthebeginning
andonceattheendofaloop.Intherstcase,thecounterisinitialized
to[M],whileinthesecondthecorrespondingcounterisinitializedto
[N][M].Thisarrangementhandlesasubstringofzerolengthcorrectly,
andstopstheattempttondamatchattheproperposition.
18

3.32.A68000programtogeneratetherstnnumbersoftheFibonacciseries:
MOVEA.L#MEMLOC,A0 Startingaddress
MOVE.B N,D0 Numberofentries
CLR D1 Therstentry=0
MOVE.B D1,(A0)+
MOVE #1,D2 Thesecondentry=1
MOVE.B D2,(A0)+
SUBQ.B #3,D0 Firsttwoentriesalreadysaved
LOOPMOVE.B 2(A0),D1 Getsecond-lastvalue
ADD.B D1,D2 Addtolastvalue
MOVE.B D2,(A0)+ Storenewvalue
DBRA D0,LOOP
Therst15numbersintheFibonaccisequenceare:0,1,1,2,3,5,8,13,
21,34,55,89,144,233,377.Therefore,thelargestvalueofnthatthis
programcanhandleis14,becausethelargestnumberthatcanbestored
inabyteis255.
3.33.LetA0pointtotheASCIIword.Tochangetouppercase,weneedto
changebitb5from1to0.
NEXTMOVE.B(A0),D0Getcharacter
CMP.B#$20,D0Checkifspacecharacter
BEQ END
ANDI.B #$DF,D0Clearbit5
MOVE.BD0,(A0)+Storeconvertedcharacter
BRA NEXT
END Nextinstruction
19

3.34.LetStride=2(j+1),whichisthedistanceinbytesbetweenscoresona
particulartestforadjacentstudentsinthelist.
MOVE J,D3 ComputeStride=2(j+1)
ADDQ #1,D3
LSL #1,D3
MOVEA.L#SUM,A4 UseA4aspointertothesums
MOVEA.L#LIST,A5 UseA5aspointertoscores
ADDQ #2,A5 forstudent1
MOVE J,D6 UseD6asouterloopcounter
SUBQ #1,D6 AdjustforuseofDBRAinstruction
OUTER MOVE N,D7 UseD7asinnerloopcounter
SUBQ #1,D7 AdjustforuseofDBRAinstruction
MOVE A5,A2 UseA2asbaseforscanningtestscores
CLR D0 UseD0assumaccumulator
INNER ADD [A2],D0 Accumulatetestscores
ADD D3,A2 Pointtonextscore
DBRA D7,INNER Checkifscoreforcurrenttest
forallstudentshavebeenadded
MOVE D0,[A4] Storesuminmemory
ADDQ #2,A5 Incrementtonexttest
ADDQ #2,A4 Pointtonextsum
DBRA D6,OUTER Checkifscoresforalltests
havebeenaccumulated
3.35.ThisprogramissimilartoFigure3.27,andmakesthesameassumptions
aboutstatuswordbitlocations.
MOVE #N,D0
SUBQ.W #1,D0 InitializeD0ton1
READBTST.W #3,INSTATUS
BEQ READ Waitfordataready
MOVE.BDATAIN,D1 Getnewcharacter
MOVE.BD1,(A0) Pushonuserstack
ECHOBTST.W #3,OUTSTATUS
BEQ ECHO Waitforterminalready
MOVE.BD1,DATAOUT Outputnewcharacter
DBRA D0,READ Readnextcharacter
20

3.36.Assumethatmostofthetimebetweensuccessivecharactersbeingstruck
isspentinthetwo-instructionwaitloopthatstartsatlocationREAD.The
BEQREAD instructionisexecutedonceevery40nswhilethisloopis
beingexecuted.Thereare10
9
=10=10
8
nsbetweensuccessivecharacters.
Therefore,theBEQREAD instructionisexecuted10
8
=40=2:5
10
6
timespercharacterentered.
3.37.AssumethatregisterA4isusedasamemorypointerbythemainprogram.
MainProgram
READLINE BSR GETCHAR Callcharacterreadsubroutine.
MOVE.BD0,(A4)+ Storecharacterinmemory.
BSR PUTCHAR Callcharacterdisplaysubroutine.
CMPI.B #CR,D0 Checkforend-of-linecharacter.
BNE READLINE
SubroutineGETCHAR
GETCHAR BTST.W #3,(A0) Waitforcharacter.
BEQ GETCHAR
MOVE.B(A1),D0 LoadcharacterintoD0.
RTS Return.
SubroutinePUTCHAR
PUTCHAR BTST.W #3,(A2) Waitfordisplay.
BEQ PUTCHAR
MOVE.BD0,(A3) Sendcharactertodisplay.
RTS Return.
21

3.38.AddressesINSTATUSandDATAINarepushedontotheprocessorstack
inthatorderbythemainprogramasparametersforGETCHAR.The
characterreadispassedbacktothemainprogramintheDATAINposi-
tiononthestack.TheaddressesOUTSTATUSandDATAOUTandthe
charactertobedisplayedarepushedontotheprocessorstackinthatorder
bythemainprogramasparametersforPUTCHAR.Astackstructurelike
thatshowninFigure3.29isused.
GETCHARusesregistersA0,A1,andD0toholdINSTATUS,DATAIN,
andthecharacterread.
PUTCHARusesregistersA0,A1,andD0toholdOUTSTATUS,DATAOUT,
andthecharactertobedisplayed.
ThemainprogramusesregisterA0asamemorypointer,andusesregister
D0toholdthecharacterread.
MainProgram
READLINE MOVE.L#INSTATUS,(A7) Pushaddressparameters
MOVE.L#DATAIN,(A7) ontothestack.
BSR GETCHAR Callcharacterreadsubroutine.
MOVE.L(A7)+,D0 Poplongwordcontaining
MOVE.BD0,(A0)+ characterfromtopof
stackintoD0and
storecharacterintomemory.
ADDI #4,A7 RemoveINSTATUSfromstack.
MOVE.L#OUTSTATUS,(A7)Pushaddressparameters
MOVE.L#DATAOUT,(A7) ontostack.
MOVE.LD0,(A7) Pushlongwordcontaining
characterontostack.
BSR PUTCHAR Callcharacterdisplaysubroutine.
ADDI #12,A7 Removethreeparametersfromstack.
CMPI.B #CR,D0 Checkforend-of-linecharacter.
BNE READLINE
SubroutineGETCHAR
GETCHAR MOVEM D0/A0-A1,(A7)Saveregisters.
MOVE.L20(A7),A0 LoadaddressINSTATUSintoA0.
MOVE.L16(A7),A1 LoadaddressDATAINintoA1.
READ BTST #3,(A0) Waitforcharacter.
BEQ READ
MOVE.B(A1),D0 LoadcharacterintoD0and
MOVE.LD0,16(A7) pushontothestack,
overwritingDATAIN.
MOVEM (A7)+,D0/A0-A1Restoreregisters.
RTS Return.
22

SubroutinePUTCHAR
PUTCHAR MOVEM D0/A0-A1,(A7)Saveregisters.
MOVE.L24(A7),A0 LoadaddressOUTSTATUSintoA0.
MOVE.L20(A7),A1 LoadaddressDATAOUTintoA1.
MOVE.L16(A7),D0 Loadlongwordcontaining
characterintoD0.
DISPLAY BTST #3,(A0) Waitfordeviceready.
BEQ DISPLAY
MOVE.BD0,(A1) Sendcharactertodisplay.
MOVEM (A7)+,D0/A0-A1Restoreregisters.
RTS Return.
23

3.39.SeethesolutiontoProblem2.18fortheproceduresneededtoperformthe
appendandremoveoperations.
Registerassignment:
D0Databytetoappendtoorremovefromqueue
A1INpointer
A2OUTpointer
A3Addressofrstqueuebytelocation
A4Addressoflastqueuebytelocation(=[A3]+k1)
A5Auxiliaryregisterforaddressofnextappendedbyte
Initially,thequeueisemptywith[A1]=[A2]=[A3]
APPENDroutine:
MOVEA.LA1,A5
ADDQ.L #1,A1 IncrementA1Modulok.
CMPA.L A1,A4
BGE CHECK
MOVEA.LA3,A1
CHECK CMPA.L A1,A2 Checkifqueueisfull.
BNE APPEND Ifqueuenotfull,appendbyte.
MOVEA.LA5,A1 Otherwise,restore
BRA QUEUEFULL INpointerandsend
messagethatqueueisfull.
APPEND MOVE.B D0,[A5] Appendbyte.
REMOVEroutine:
CMPA.L A1,A2 Checkifqueueisempty.
BEQ QUEUEEMPTY Ifempty,sendmessage.
MOVE.B (A2)+,D0 Otherwise,removebyte
CMPA.L A2,A4 andincrementA2
BGE NEXT Modulok.
MOVEA.LA3,A2
NEXT :::
24

3.40.UsingthesameassumptionsasinProblem3.35anditssolution,a68000
programtoconvert3inputdecimaldigitstoabinarynumberis:
BSR READ Getrstcharacter
ASL #1,D0 Multiplyby2forwordoset
MOVE.WHUNDREDS(D0),D1 Gethundredsvalue
BSR READ Getsecondcharacter
ASL #1,D0 Multiplyby2forwordoset
ADD.W TENS(D0),D1 Gettensvalue
BSR READ Getthirdcharacter
ADD.W D0,D1 D1containsvalueofbinary
number
READBTST.W #3,INSTATUS
BEQ READ Waitfornewcharacter
MOVE.BDATAIN,D0 Getnewcharacter
AND.B #$0F,D0 Converttoequivalentbinary
value
RTS
25

3.41.(a)Thesubroutinesconvert3decimaldigitstoabinaryvalue.
GETDECIMAL MOVEM.LD0/A0A1,(A7) Saveregisters
MOVEA.L20(A7),A0 Getstringbueraddress
MOVE.B #2,D0 UseD0ascharactercounter
READ BTST.W #3,INSTATUS
BEQ READ
MOVE.B DATAIN,(A0)+ Getandstorecharacter
DBRA D0,READ Repeatuntilallcharacters
received
MOVE.L 16(A7),A1 Pointertoresult
BSR CONVERSION
MOVEM.L(A7)+,D0/A0A1 Restoreregisters
RTS
CONVERSION MOVEM.LD0D1,(A7) Saveregisters
MOVE.B (A0),D0 Getleastsig.digit
AND.W #$0F,D0 Numericvalueofdigit
MOVE.B (A0),D1 Gettensdigit
AND.W #$0F,D1 Numericvalueofdigit
ASL #1,D1
ADD.W TENS(D1),D0 Addtensvalue
MOVE.B (A0),D1 Gethundredsdigit
AND.W #$0F,D1 Numericvalueofdigit
ASL #1,D1
ADD.W HUNDREDS(D1),D0 Addhundredsvalue
MOVE.W D0,(A1) Storeresult
MOVEM.L(A7)+,D0D1 Restoreregisters
RTS
(b)ThecontentsofthetopofthestackafterthecalltotheCONVERSION
routineare:
ReturnaddressofCONVERSION
D0MAIN
A1MAIN
A0MAIN
ReturnaddressofGETDECIMAL
Resultaddress
Bueraddress
ORIGTOS
26

3.42.Assumethatthesubroutinecanchangethecontentsofanyregistersused
topassparameters.LetStride=2m,whichisthedistanceinbytes
betweensuccessivewordelementsinagivencolumn.
LSL #1,D4 SetStrideinD4
SUB D1,D2 SetD2tocontain
LSL #1,D2 2(yx)
LSL #1,D1 SetA0toaddress
ADDAD1,A0 A(0,x)
BRA START
LOOP MOVE(A0),D1 Load[A(i,x)]intoD1
ADD D1,(A0,D2)Addarrayelements
ADD D4,A0 Movetonextrow
STARTDBRA D3,LOOP Repeatloopuntilall
entrieshavebeenadded
RTS Return
NotethatLOOPisenteredbybranchingtotheDBRAinstruction.So
DBRAdecrementsD3tocontainn1,whichisthecorrectstartingvalue
whentheDBRAinstructionisused.
3.43.A68000programtoreversetheorderofbitsinregisterD2:
MOVE#15,D0 UseD0ascounter
CLR D1 D1willreceivenewvalue
LOOPLSL D2 ShiftMSBofD2intoXbit
ROXRD1 ShiftXbitintoMSBofD1
DBRA D0,LOOPRepeatuntilD0reaches1
MOVED1,D2 PutnewvaluebackinD2
27

3.44. Bytes/access
MOVEA.L#LOC,A0 6/3
MOVE.B (A0)+,D0 2/2
LSL.B #4,D0 2/1
MOVE.B (A0),D1 2/2
ANDI.B #$F,D1 4/2
OR.B D0,D1 2/1
MOVE.B D1,PACKED 4/3
Totalsizeis22bytesandexecutioninvolves14memoryaccesscycles.
3.45.Thetracetableis:
TIME 10001001100210031004D1D2D3
after1stBGTOUTER 10613 67 4512031120
after2ndBGTOUTER 67 13 4510612021106
after3rdBGTOUTER 45 13 671061201167
after4thBGTOUTER 13 45 671061200145
3.46.AssumethelistaddressispassedtothesubroutineinregisterA1.When
thesubroutineisentered,thenumberoflistentriesneedstobeloaded
intoD1.ThenA1mustbeupdatedtopointtotherstentryinthelist.
Becauseaddressesmustbeincrementedordecrementedby2tohandle
wordquantities,theaddressmode(A1,D1)isnolongeruseful.Also,
sincetheinitialaddresspointstothebeginningofthelist,wewillscan
thelistforwards.
MOVE (A1)+,D1 Loadnumberofentries,n
SUBQ #2,D1 Outerloopcounter n2(j:0ton2)
OUTER MOVE D1,D2 Innerloop outerloopcounter
MOVEAA1,A2 UseA2asapointerintheinnerloop
ADDQ #2,A2 k j+1(k:1ton1)
INNER MOVE (A1),D3 CurrentmaximumvalueinD3
CMP (A2),D3
BLE NEXT IfLIST(j)LIST(k),gotonext
MOVE (A2),(A1)InterchangeLIST(k)
MOVE D3,(A2) andLIST(j).
NEXT ADDQ #2,A2
DBRA D2,INNER
ADDQ #2,A1
DBRA D1,OUTER Ifnotnished,
RTS return
28

3.47.UseD4tokeeptrackofthepositionofthelargestelementintheinner
loopandD5torecorditsvalue.
MOVEA.L#LIST,A1 Pointertothestartofthelist
MOVE N,D1 Initializeouterloop
SUBQ #1,D1 indexjinD1
OUTER MOVE D1,D2 Initializeinnerloop
SUBQ #1,D2 indexkinD2
MOVE.L D1,D4 Indexoflargestelement
MOVE.B (A1,D1),D5 Valueoflargestelement
INNER MOVE.B (A1,D2),D3 Getnewelement,LIST(k)
CMP.B D3,D5 Comparetocurrentmaximum
BCC NEXT Iflowergotonextentry
MOVE.L D2,D4 Updateindexoflargestelement
MOVE.L D3,D5 Updatelargestvalue
NEXT DBRA D2,INNER Innerloopcontrol
MOVE.B (A1,D1),(A1,D4)SwapLIST(j)andLIST(k);
MOVE.B D5,(A1,D1) correctevenifsame
SUBQ #1,D1 Branchback
BGT OUTER ifnotnished
Thepotentialadvantageisthattheinnerloopofthenewprogramshould
executefaster.
3.48.AssumethatregisterA0pointstotherstrecord.Wewilluseregisters
D1,D2,andD3toaccumulatethethreesums.Assumealsothatthelist
isnotempty.
CLR D1
CLR D2
CLR D3
LOOP ADD.L 8(A0),D1Accumulatescoresfortest1
ADD.L 12(A0),D2Accumulatescoresfortest2
ADD.L 16(A0),D3Accumulatescoresfortest3
MOVE.L 4(A0),D0Getlink
MOVEA.LD0,A0 Loadinpointerregister
BNE LOOP
MOVE.L D1,SUM1
MOVE.L D2,SUM2
MOVE.L D3,SUM3
NotethattheMOVEinstructionthatreadsthelinkvalueintoregister
D0setstheZandNags.TheMOVEAinstructiondoesnotaectthe
conditioncodeags.Hence,theBNEinstructionwilltestthecorrect
values.
29

3.49.IntheprogramofFigure3.35,iftheIDofthenewrecordmatchestheID
oftheHeadrecord,thenewrecordwillbecomethenewHead.IftheID
matchesthatofalaterrecord,itwillbeinsertedimmediatelyafterthat
record,includingthecasewherethematchingrecordistheTail.
Modifytheprogramasfollows.
Addthefollowingastherstinstruction
INSERTION MOVE.L#0,A6 Anticipateasuccessfulinsertion
AftertheinstructionlabeledHEADinsert
BEQ DUPLICATE1Newrecordmatcheshead
AftertheBLTINSERTinstructioninsert
BEQ DUPLICATE2Newrecordmatchesarecord
otherthanhead
Addthefollowinginstructionsattheend
DUPLICATE1MOVE.LA0,A6 Returntheaddressofthehead
RTS
DUPLICATE2MOVE.LA3,A6 Returnaddressofmatchingrecord
RTS
3.50.IftheIDofthenewrecordislessthanthatofthehead,theprogram
inFigure3.36willdeletethehead.Ifthelistisempty,theresultis
unpredictablebecausetherstinstructioncomparesthenewIDwiththe
contentsofmemorylocationzero.Ifthelistisnotempty,theprogram
continuesuntilA2pointstotheTailrecord.Thentheinstructionat
LOOPloadszerointoA3andtheresultisunpredictable.
Tocorrectbehavior,modifytheprogramasfollows.
AftertherstBGTinstructioninsert
BLT ERRORIDofnewrecordlessthanhead
MOVE.L#0,D1 Deletionsuccessful
AftertheBEQDELETEinstructioninsert
BGT ERRORIDofNewrecordislessthan
thatofthenextrecordand
greaterthanthecurrentrecord
AddthefollowinginstructionafterDELETE
MOVE.L#0,D1 Deletionsuccessful
Addthefollowinginstructionattheend
ERRORRTS Recordnotinthelist
30

PARTIII:IntelIA-32
3.51.Initialmemorycontentsare:
[1000]=1
[1004]=2
[1008]=3
[1012]=4
[1016]=5
[1020]=6
(a)[EBX+ESI*4+8]=1016
EAX 10+5=15
(b)Thevalues20and30arepushedontotheprocessorstack,andthen30
ispoppedintoEAXand20ispoppedintoEBX.TheSubtractinstruction
thenperforms3020,andplacestheresultof10intoEAX.
(c)Theaddressvalue1008isloadedintoEAX,andthen3isloadedinto
EBX.
3.52.(a)OK
(b)ERROR:Onlyoneoperandcanbeinmemory.
(c)OK
(d)ERROR:Scalefactorcanonlybe1,2,4,or8.
(e)OK
(f)ERROR:Animmediateoperandcannotbeadestination.
(g)ERROR:ESPcannotbeusedasanindexregister.
3.53.Programtrace:
TIME EAX EBXECX
After1stexecutionofLOOP 113NUM144
After2ndexecutionofLOOP 129NUM143
After3rdexecutionofLOOP 78NUM142
31

3.54.Assumebytesareunsigned8-bitvalues.
MOVECX,N ECXislistcounter.
LEA ESI,X ESIpointstoXlist.
SUB ESI,1
LEA EDI,Y EDIpointstoYlist.
SUB EDI,1
LEA EDX,LARGER EDXpointstoLARGERlist.
SUB EDX,1
START: MOVAL,[ESI+ECX] LoadXbyteintoAL.
MOVBL,[EDI+ECX],LoadYbyteintoBL.
CMP AL,BL Comparebytes.
JAE XLARGER BranchifXbyte
largerorsame.
MOV[EDX+ECX],BLOtherwise,store
Ybyte.
JMP CHECK
XLARGERMOV[EDX+ECX],ALStoreXbyte.
CHECK LOOP START Checkifdone.
32

3.55.Theinnerloopchecksforamatchateachpossibleposition.
MOVEDX,N Computeouterloopcount
SUB EDX,M andstoreinEDX.
INC EDX
LEA EAX,STRING UseEAXasabase
pointerforeachmatch
attempt.
OUTER: MOVESI,EAX UseESIandEDIas
LEA EDI,SUBSTRING runningpointersfor
eachmatchattempt.
MOVECX,M Initializeinnerloopcounter.
INNER: MOVBL,[EDI] Loadnextsubstringbyte
CMP BL,[ESI] intoBLandcompareto
correspondingstringbyte.
JNE NOMATCH Ifnotequal,goto
nextsubstringposition.
INC ESI Ifequal,incrementrunning
INC EDI pointerstonextbyte
positions.
LOOP INNER Checkifallsubstring
bytescompared.
JMP NEXT Ifamatchisfound,
exitwithstringposition
inEAX.
NOMATCH:INC EAX IncrementEAXtonextpossible
substringposition.
DEC EDX Checkifallpositionstried.
JG OUTER
MOVEAX,0 Ifyes,loadzerointo
EAXandexit.
NEXT: ...
33

3.56.Thissolutionassumesthatthelastnumberintheseriesofnnumberscan
berepresentedina32-bitdoubleword,andthatn>2.
MOVECX,N UseECXtocountnumbers
SUB ECX,2 generatedafter1.
LEA EDI,MEMLOC UseEDIasamemory
pointer.
MOVEAX,0 Storersttwonumbers
MOV[EDI],EAX fromEAXandEBXinto
MOVEBX,1 memory.
ADD EDI,4
MOV[EDI],EBX
LOOPSTART:ADD EDI,4 Incrementmemorypointer.
MOVEAX,[EDI8]Loadsecondlastvalue.
ADD EBX,EAX Addtolastvalue.
MOV[EDI],EBX Storenewvalue.
LOOP LOOPSTARTCheckifallnnumbers
generated.
3.57.AssumeregisterEAXcontainstheaddress(WORD)oftherstcharacter.
Tochangecharactersfromlowercasetouppercase,changebitb5from1
to0.
NEXT:MOVBL,[EAX]LoadnextcharacterintoBL.
CMP BL,20H Checkifspacecharacter.
JE END Ifspace,exit.
AND BL,DFH Clearbitb5.
MOV[EAX],BLStoreconvertedcharacter.
INC EAX Incrementmemorypointer.
JMP NEXT Convertnextcharacter.
END: ...
34

3.58.TheparameterStride=(j+1)isthedistanceindoublewordsbetween
scoresonaparticulartestforadjacentstudentsinthelist.
MOVEDX,J LoadouterloopcounterEDX.
INC J IncrementmemorylocationJ
tocontainStride=j+1.
LEA EBX,SUM LoadaddressSUMintoEBX.
LEA EDI,LIST Loadaddressoftestscore1
ADD EDI,4 forstudent1intoEDI.
OUTER: MOVECX,N LoadinnerloopcounterECX.
MOVEAX,0 ClearscoresaccumulatorEAX.
MOVESI,0 ClearindexregisterESI.
INNER: ADD EAX,[EDI+ESI*4]Addnexttestscore.
ADD ESI,J IncrementindexregisterESI
byStridevalue.
LOOP INNER Checkifallnscores
havebeenadded.
MOV[EBX],EAX Storecurrenttestsum.
ADD EBX,4 Incrementsumlocationpointer.
ADD EDI,4 Incrementbasepointertonext
testscoreforstudent1.
DEC EDX Checkifalltestscoressummed.
JG OUTER
ThissolutionusessixoftheIA-32registers.ItdoesnotuseregistersEBP
orESP,whicharenormallyreservedaspointersfortheprocessorstack.
UseofEBPtoholdtheparameterStridewouldresultinasomewhatmore
ecientinnerloop.
3.59.UseregisterECXasacounterregister,anduseEBXasaworkregister.
MOVECX,32 LoadECXwithcountvalue32.
MOVEBX,0 ClearworkregisterEBX.
LOOPSTART:SHL EAX,1 ShiftcontentsofEAXleft
onebitposition,movingthe
high-orderbitintotheCFag.
RCR EBX,1 RotateEBXrightonebit
position,includingtheCFag.
LOOP LOOPSTARTCheckifnished.
MOVEAX,EBX LoadreversedpatternintoEAX.
35

3.60.SeethesolutiontoProblem2.18fortheproceduresneededtoperformthe
appendandremoveoperations.
Registerassignment:
ALDatabytetoappendtoorremovefromthequeue
ESIINpointer
EDIOUTpointer
EBXAddressofrstqueuebytelocation
ECXAddressoflastqueuebytelocation([EBX]+k1)
EDXAuxiliaryregisterforlocationofnextappendedbyte
Initially,thequeueisemptywith[ESI]=[EDI]=[EBX].
Appendroutine:
MOVEDX,ESI SavecurrentvalueofIN
pointerESIinauxiliary
registerEDX.
INC ESI Thesefourinstructions
CMP ECX,ESI incrementESIModulok.
JGE CHECK
MOVESI,EBX
CHECK: CMP EDI,ESI Checkifqueueisfull.
JNE APPEND Ifnotfull,appendbyte.
MOVESI,EDX Otherwise,restoreINpointer
JMP QUEUEFULL andsendmessagethat
queueisfull.
APPEND: MOV[EDX],AL Appendbyte.
Removeroutine:
CMP EDI,ESI Checkifqueueisempty.
JE QUEUEEMPTY Ifempty,sendmessage.
MOVAL,[EDI] Otherwise,removebyteand
INC EDI incrementEDIModulok.
CMP ECX,EDI
JGE NEXT
MOVEDI,EBX
NEXT:...
36

3.61.ThisprogramissimilartoFigure3.44;anditmakesthesameassumptions
aboutstatuswordbitlocations.
MOVECX,N UseECXastheloopcounter.
READ:BT INSTATUS,3 Waitforthecharacter.
JNC READ
MOVAL,DATAIN TransfercharacterintoAL.
DEC EBX Pushcharacterontouserstack.
MOV[EBX],AL
ECHO:BT OUTSTATUS,3Waitforthedisplay.
JNC ECHO
MOVDATAOUT,AL Sendcharactertodisplay.
LOOP READ Checkifallncharactersread.
3.62.Assumethatmostofthetimebetweensuccessivecharactersbeingstruck
isspentinthetwo-instructionwaitloopthatstartsatlocationREAD.The
JNCREAD instructionisexecutedonceevery20nswhilethisloopis
beingexecuted.Thereare10
9
=10=10
8
nsbetweensuccessivecharacters.
Therefore,theJNCREAD instructionisexecuted10
8
=20=510
6
timespercharacterentered.
3.63AssumethatregisterECXisusedasamemorypointerbythemainpro-
gram.
MainProgram
READLINE: CALLGETCHAR
MOV[ECX],AL Storecharacterinmemory.
INC ECX Incrementmemorypointer.
CALLPUTCHAR
CMP AL,CR Checkforend-of-line.
JNE READLINE Gobackformore.
SubroutineGETCHAR
GETCHAR: BT DWORDPTR[EBX],3Waitforcharacter.
JNC GETCHAR
MOVAL,[EDX] LoadcharacterintoAL.
RET
SubroutinePUTCHAR
PUTCHAR: BT DWORDPTR[ESI],3Waitfordisplay.
JNC PUTCHAR
MOV[EDI],AL Displaycharacter.
RET
37

3.64.AddressesINSTATUSandDATAINarepushedontotheprocessorstack
inthatorderbythemainprogramasparametersforGETCHAR.The
characterreadispassedbacktothemainprogramintheDATAINposi-
tiononthestack.TheaddressesOUTSTATUSandDATAOUTandthe
charactertobedisplayedarepushedontotheprocessorstackinthatorder
bythemainprogramasparametersforPUTCHAR.Astackstructurelike
thatshowninFigure3.46isused.
GETCHARusesregistersEBX,EDX,andAL(EAX)toholdINSTATUS,
DATAIN,andthecharacterread.
PUTCHARusesregistersESI,EDI,andAL(EAX)toholdOUTSTATUS,
DATAOUT,andthecharactertobedisplayed.
AssumethatregisterECXisusedasamemorypointerbythemainpro-
gram.
MainProgram
READLINE: PUSHOFFSETINSTATUS Pushaddressparameters
PUSHOFFSETDATAIN ontothestack.
CALLGETCHAR
POP EAX Popthedoubleword
containingthecharacter
readintoEAX.
MOV[ECX],AL Storecharacterin
low-orderbyteofEAX
intothememory.
INC ECX Incrementthememorypointer.
ADD ESP,4 RemoveparameterINSTATUS
fromtopofthestack.
PUSHOFFSETOUTSTATUSPushaddressparameters
PUSHOFFSETDATAOUT ontothestack.
PUSHEAX Pushdoublewordcontaining
thecharactertobedisplayed
ontothestack.
CALLPUTCHAR
ADD ESP,12 Removethreeparameters
fromthestack.
CMP AL,CR Checkforend-of-line
character.
JNE READLINE Gobackformore.
38

SubroutineGETCHAR
GETCHAR: PUSHEAX Saveregisterstobe
PUSHEBX usedinthesubroutine.
PUSHEDX
MOVEBX,[ESP+20] LoadINSTATUSintoEBX.
MOVEDX,[ESP+16] LoadDATAINintoEDX.
READ: BT DWORDPTR[EBX],3Waitforcharacter.
JNC READ
MOVAL,[EDX] ReadcharacterintoAL.
MOV[ESP+16],EAX OverwriteDATAINinthe
stackwiththedoubleword
containingthecharacterread.
POP EDX Restoreregisters.
POP EBX
POP EAX
RET
SubroutinePUTCHAR
PUTCHAR: PUSHEAX Saveregisterstobe
PUSHESI usedinthesubroutine.
PUSHEDI
MOVESI,[ESP+24] LoadOUTSTATUS.
MOVEDI,[ESP+20] LoadDATAOUT.
MOVEAX,[ESP+16] Loaddoublewordcontaining
charactertobedisplayed
intoregisterEAX.
DISPLAY:BT DWORDPTR[ESI],3Waitforthedisplay.
JNC DISPLAY
MOV[EDI],AL Displaycharacter.
POP EDI Restoreregisters.
POP ESI
POP EAX
RET
39

3.65.UsingthesameassumptionsasinProblem3.61anditssolution,anIA-32
programtoconvert3inputdecimaldigitstoabinarynumberis:
CALLREAD Getrstcharacter
MOVEBX,[HUNDREDS +EAX*4]Gethundredsvalue
CALLREAD Getsecondcharacter
ADD EBX,[TENS+EAX*4] Addtensvalue
CALLREAD Getthirdcharacter
ADD EBX,EAX EBXcontainsvalueof
binarynumber
READ:BT INSTATUS,3
JNC READ Waitfornewcharacter
MOVAL,DATAIN Getnewcharacter
AND AL,0FH Converttoequivalent
binaryvalue
RET
40

3.66.(a)Thesubroutinesconvert3decimaldigitstoabinaryvalue.
GETCHARS: PUSHECX Saveregisters.
PUSHEBX
PUSHEAX
MOVECX,3 UseECXascharacter
counter.
MOVEBX,[ESP+20] Loadcharacterbuer
addressintoEBX.
READ: BT INSTATUS,3
JNC READ
MOVBYTEPTR[EBX],DATAIN Getandstorecharacter.
INC EBX Incrementbuerpointer.
LOOPREAD Repeatuntilall
charactersreceived.
MOVEAX,[ESP+16] Pointertoresult.
CALLCONVERT
POP EAX Restoreregisters.
POP EBX
POP ECX
RET
CONVERT:PUSHECX Saveregisters.
PUSHEDX
DEC EBX Loadlow-orderdigit
MOVDL,[EBX] numericalvalue
AND DL,0FH intoEDX.
DEC EBX Loadandadd
MOVCL,[EBX] tensdigitvalue
AND CL,0FH intoEDX.
ADD EDX,[TENS+ECX*4]
DEC EBX Loadandadd
MOVCL,[EBX] hundredsdigitvalue
AND CL,0FH intoEDX.
ADD EDX,[HUNDREDS+ECX*4]
MOV[EAX],EDX Storeresult.
POP EDX Restoreregisters.
POP ECX
RET
41

(b)ThecontentsofthetopofthestackafterthecalltotheCONVERT
subroutineare:
...
ReturnaddresstoGETCHARS
[EAX]
[EBX]
[ECX]
ReturnaddresstoMain
Resultaddress
Bueraddress
ORIGINALTOS
...
3.67.Assumethatthesubroutinecanchangethecontentsofanyregistersused
topassparameters.LetStride=4m,whichisthedistanceinbytes
betweensuccessivedoublewordelementsinagivencolumn.
SHL EBX,2 SetStrideinEBX.
SUB EDI,ESI SetEDItoyx.
SHL ESI,2 SetEDXto
ADD EDX,ESI addressA(0,x).
LOOP:MOVESI,[EDX] AddA(i,x)toA(i,y).
ADD [EDX+EDI*4],ESI
ADD EDX,EBX Movetonextrow.
DEC EAX Repeatloopuntilall
JG LOOP entrieshavebeenadded.
RET Return.
3.68.Programtrace:
TIME EDIECXDLLISTLISTLISTLISTLIST
+1 +2 +3 +4
After1st 3 1120106 13 67 45120
After2nd 2 1106 67 13 45106120
After3rd 1 167 45 13 67106120
After4th 0 145 13 45 67106120
42

3.69.AssumethatthecallingprogrampassestheaddressLIST4tothe
subroutineinregisterEAX.
SubroutineSORT
SORT: PUSH EDI Saveregisters.
PUSH ECX
PUSH EDX
MOV EDI,[EAX] Initializeouterloopindex
DEC EDI registerEDItoj=n1.
ADD EAX,4 SetEAXtocontainLIST.
OUTER: MOV ECX,EDI Initializeinnerloopindex
DEC ECX registertok=j1.
MOV EDX,[EAX+EDI*4]LoadLIST(j)intoEDX.
INNER: CMP [EAX+ECX*4],EDXCompareLIST(k)toLIST(j).
JLE NEXT IfLIST(k)LIST(j),
gotonextkindexentry;
XCHG[EAX+ECX*4],EDXOtherwise,interchangeLIST(k)
MOV [EAX+EDI*4],EDX andLIST(j),leaving
(new)LIST(j)inEDX.
NEXT: DEC ECX Decrementinnerloopindexk.
JGE INNER Repeatorterminateinnerloop.
DEC EDI Decrementouterloopindexj.
JG OUTER Repeatorterminateouterloop.
POP EDX Restoreregisters.
POP ECX
POP EDI
RET
43

3.70.UseregisterESItokeeptrackoftheindexpositionofthelargestelement
intheinnerloop,anduseregisterEDX(DL)torecorditsvalue.Register
EBX(BL)isusedtoholdsublistvaluestobecomparedtothecurrent
largestvalue.
LEA EAX,LIST
MOV EDI,N
DEC EDI
OUTER: MOV ECX,EDI
DEC ECX
MOV ESI,EDI Initialindexoflargest.
MOV DL,[EAX+EDI]Initialvalueoflargest.
INNER: MOV BL,[EAX+ECX]GetLIST(k)element.
CMP BL,DL Comparetocurrentlargest.
JLE NEXT Ifnotlarger,checknext;
MOV DL,BL Otherwise,updatelargest
MOV ESI,ECX andupdateitsindex.
NEXT: DEC ECX Repeatorterminate
JGE INNER innerloop.
XCHG[EAX+EDI],DLInterchangeLIST(j)
MOV [EAX+ESI],DL withLIST([ESI]).
DEC EDI Repeatorterminate
JG OUTER outerloop.
Thepotentialadvantageisthattheinnerloopshouldexecutefaster.
3.71.AssumethatregisterESIpointstotherstrecord,anduseregistersEAX,
EBX,andECX,toaccumulatethethreesums.
MOVEAX,0
MOVEBX,0
MOVECX,0
LOOP:ADD EAX,[ESI+8]Accumulatescoresfortest1.
ADD EBX,[ESI+12]Accumulatescoresfortest2.
ADD ECX,[ESI+16]Accumulatescoresfortest3.
MOVESI,[ESI+4] Getlink.
CMP ESI,0 Checkifdone.
JNE LOOP
MOVSUM1,EAX Storesums.
MOVSUM2,EBX
MOVSUM3,ECX
44

3.72.IftheIDofthenewrecordmatchestheIDoftheHeadrecordofthe
currentlist,thenewrecordwillbeinsertedasthenewHead.IftheID
ofthenewrecordmatchestheIDofalaterrecordinthecurrentlist,the
newrecordwillbeinsertedimmediatelyafterthatrecord,includingthe
casewherethematchingrecordistheTailrecord.Inthislattercase,the
newrecordbecomesthenewTailrecord.
ModifyFigure3.51asfollows:
Addthefollowinginstructionastherstinstructionofthesubrou-
tine:
INSERTION:MOVEDX,0 Anticipatesuccessful
insertionofthenew
record.
MOVRNEWID,[RNEWREC] (Existinginstruction.)
AfterthesecondCMPinstruction,insertthefollowingthreeinstruc-
tions:
JNE CONTINUE1 Threenewinstructions.
MOVEDX,RHEAD
RET
CONTINUE1: JG SEARCH (Existinginstruction.)
AfterthefourthCMPinstruction,insertthefollowingthreeinstruc-
tions:
JNE CONTINUE2 Threenewinstructions.
MOVEDX,RNEXT
RET
CONTINUE2: JL INSERT (Existinginstruction.)
45

3.73.Ifthelistisempty,theresultisunpredictablebecausetherstinstruction
willcomparetheIDofthenewrecordtothecontentsofmemorylocation
zero.Ifthelistisnotempty,thefollowinghappens.Ifthecontentsof
RIDNUMarelessthantheIDnumberoftheHeadrecord,theHeadrecord
willbedeleted.Otherwise,theroutineloopsuntilregisterRCURRENT
pointstotheTailrecord.ThenRNEXTgetsloadedwithzerobythe
instructionatLOOPSTART,andtheresultisunpredictable.
ReplaceFigure3.52withthefollowingcode:
DELETION: CMP RHEAD,0 Ifthelistisempty,
JNE CHECKHEAD returnwithRIDNUM
RET unchanged.
CHECKHEAD: CMP RIDNUM,[RHEAD] CheckifHeadrecord
JNE CONTINUE1 istobedeletedand
MOVRHEAD,[RHEAD +4] performdeletionifit
MOVRIDNUM,0 is,returningwithzero
RET inRIDNUM.
CONTINUE1: MOVRCURRENT,RHEAD Otherwise,continue
searching.
LOOPSTART:MOVRNEXT,[RCURRENT+4]
CMP RNEXT,0 Ifallrecordschecked,
JNE CHECKNEXT returnwithIDNUM
RET unchanged.
CHECKNEXT: CMP RIDNUM,[RNEXT] Checkifnextrecordis
JNE CONTINUE2 tobedeletedand
MOVRTEMP,[RNEXT+4] performdeletionif
MOV[RCURRENT+4],RTEMP itis,returningwith
MOVRIDNUM,0 zeroinRIDNUM.
RET
CONTINUE2: MOVRCURRENT,RNEXT Otherwise,continue
JMP LOOPSTART thesearch.
46

Chapter4–Input/OutputOrganization
4.1.Afterreadingtheinputdata,itisnecessarytocleartheinputstatusagbefore
theprogrambeginsanewreadoperation.Otherwise,thesameinputdatawould
bereadasecondtime.
4.2.TheASCIIcodeforthenumbers0to9canbeobtainedbyadding$30tothe
number.Thevalues10to15arerepresentedbythelettersAtoF,whoseASCII
codescanbeobtainedbyadding$37tothecorrespondingbinarynumber.
Assumetheoutputstatusbitis

inregisterStatus,andtheoutputdataregister
isOutput.
Move #10,R0 UseR0ascounter
Move #LOC,R1UseR1aspointer
Next Move (R1),R2Getnextbyte
Move R2,R3
Shift-right#4,R3 Preparebits

-

Call Convert
Move R2,R3 Preparebits

-

Call Convert
Move $20,R3 Printspace
Call Print
IncrementR1
DecrementR0
Branch
0Next Repeatifmorebytesleft
End
ConvertAnd #0F,R3 Keeponlylow-order4bits
Compare #9,R3
Branch
0LettersBranchif[R3] 9
Or #$30,R3ConverttoASCII,forvalues0to9
Branch Print
LettersAdd #$37,R3ConverttoASCII,forvalues10to15
PrintBitTest #4,StatusTestoutputstatusbit
Branch
0Print Loopbackifequalto0
Move R3,OutputSendcharactertooutputregister
Return
4.3.7CA4,7DA4,7EA4,7FA4.
4.4.Asubroutineiscalledbyaprograminstructiontoperformafunctionneededby
thecallingprogram.Aninterrupt-serviceroutineisinitiatedbyaneventsuchas
aninputoperationorahardwareerror.Thefunctionitperformsmaynotbeat
1

allrelatedtotheprogrambeingexecutedatthetimeofinterruption.Hence,it
mustnotaffectanyofthedataorstatusinformationrelatingtothatprogram.
4.5.Ifexecutionoftheinterruptedinstructionistobecompletedafterreturnfrom
interrupt,alargeamountofinformationneedstobesaved.Thisincludesthe
contentsofanytemporaryregisters,intermediateresults,etc.Analternativeis
toaborttheinterruptedinstructionandstartitsexecutionfromthebeginning
afterreturnfrominterrupt.Inthiscase,theresultsofaninstructionmustnotbe
storedinregistersormemorylocationsuntilitisguaranteedthatexecutionof
theinstructionwillbecompletedwithoutinterruption.
4.6.(a)Interruptsshouldbeenabled,exceptwhenCisbeingserviced.Thenesting
rulescanbeenforcedbymanipulatingtheinterrupt-enableagsintheinterfaces
ofAandB.
(b)AandBshouldbeconnectedtoINTR
,andCtoINTR .Whenaninterrupt
requestisreceivedfromeitherAorB,interruptsfromtheotherdevicewillbe
automaticallydisableduntiltherequesthasbeenserviced.However,interrupt
requestsfromCwillalwaysbeaccepted.
4.7.Interruptsaredisabledbeforetheinterrupt-serviceroutineisentered.Oncede-
vice
turnsoffitsinterruptrequest,interruptsmaybesafelyenabledinthepro-
cessor.Iftheinterfacecircuitofdevice
turnsoffitsinterruptrequestwhenit
receivestheinterruptacknowledgesignal,interruptsmaybeenabledatthebe-
ginningoftheinterrupt-serviceroutineofdevice
.Otherwise,interruptsmay
beenabledonlyaftertheinstructionthatcausesdevice
toturnoffitsinterrupt
requesthasbeenexecuted.
4.8.Yes,becauseotherdevicesmaykeeptheinterruptrequestlineasserted.
4.9.Thecontrolprogramincludesaninterrupt-serviceroutine,INPUT,whichreads
theinputcharacters.Transferofcontrolamongvariousprogramstakesplaceas
showninthediagrambelow.
_ _
RTI
INT
INTERRUPTCALL
PROG INPUT
CONTROL
RTI
RET
AnumberofstatusvariablesarerequiredtocoordinatethefunctionsofPROG
andINPUT,asfollows.
2

BLK-FULL:Abinaryvariable,indicatingwhetherablockisfullandreadyfor
processing.
IN-COUNT:Numberofcharactersread.
IN-POINTER:Pointsatthelocationwherethenextinputcharacteristobe
stored.
PROG-BLK:PointsatthelocationoftheblocktobeprocessedbyPROG.
Twomemorybuffersareneeded,eachcapableofstoringablockofdata.Let
BLK(0)andBLK(1)betheaddressesofthetwomemorybuffers.Thestructure
ofCONTROLandINPUTcanbedescribedasfollows.
CONTROLBLK-FULL:=false
IN-POINTER:=BLK(
)
IN-COUNT:=0
Enableinterrupts
:=0
Loop
WaitforBLK-FULL
Ifnotlastblockthen

BLK-FULL:=false Preparetoreadthenextblock
IN-POINTER:=BLK(
)
IN-COUNT:=0
Enableinterrupts

PROG-BLK:=BLK( ) Processtheblockjustread
CallPROG
Iflastblockthenexit

EndLoop
Interrupt-serviceroutine
INPUT: StoreinputcharacterandincrementIN-COUNTandIN-POINTER
IfIN-COUNT=NThen

disableinterruptsfromdevice
BLK-FULL:=true

Returnfrominterrupt
4.10.Correction:Inthelastparagraph,change“equivalentvalue”to“equivalent
condition”.
Assumethattheinterfaceregistersforeachvideoterminalarethesameasin
Figure4.3.Alistofdeviceaddressesisstoredinthememory,startingatDE-
VICES,wheretheaddressgiveninthelist,DEVADRS,isthatofDATAIN.The
pointerstodataareas,PNTR
,arealsostoredinalist,startingatPNTRS.
Notethatdependingontheprocessor,severalinstructionsmaybeneededto
performthefunctionofoneoftheinstructionsusedbelow.
3

POLL Move #20,R1 UseR1asdevicecounter,
LOOP Move DEVICES(R1),R2Getaddressofdevice
BitTest #0,2(R2) Testinputstatusofadevice
Branch
0NXTDV Skipreadoperationifnotready
Move PNTRS(R1),R3 Getpointertodatafordevice

MoveByte(R2),(R3)+ Getandstoreinputcharacter
Move R3,PNTRS(R1) Updatepointerinmemory
NXTDV DecrementR1
Branch
0LOOP
Return
INTERRUPTSameasPOLL,exceptthatitreturnsonceacharacter
isread.Ifseveraldevicesarereadyatthesametime,
theroutinewillbeenteredseveraltimesinsuccession.
Incasea,POLLmustbeexecutedatleast100timespersecond.Thus
!"
ms.
Theequivalentconditionforcasebcanbeobtainedbyconsideringthecasewhen
all20terminalsbecomereadyatthesametime.Thetimerequiredforinterrupt
servicingmustbelessthantheinter-characterdelay.Thatis,
#$&%'#(()%*,+.-0/
21(3
,or 34/5#(6$879$( char/s.
Thetimespentservicingtheterminalsineachsecondisgivenby:
Casea:Time
:(;%*<$(;%=",+>- ns ?<(A@ s
Caseb:Time
B#$C%EDC%*#$(;%=",+>-F%G(FH($ID ns
Casebisabetterstrategyfor
D*/5KJ # .
Thereadermayrepeatthisproblemusingaslightlymorecompletemodelin
whichthepollingtime,
L,forcase Misafunctionofthenumberofterminals.
Forexample,assumethat
Lincreasesby0.5 @sforeachterminalthatisready,
thatis,
LNO#$)P#$IDQ%Q8J 6 .
4.11.(a)Readtheinterruptvectornumberfromthedevice(1transfer).
SavePCandSR(3transfersona16-bitbus).
Readtheinterruptvector(2transfers)andloaditinthePC.
(b)The68000instructionrequiringthemaximumnumberofmemorytransfers
is:
MOVEM.LD0-D7/A0-A7,LOC.L
whereLOC.Lisa32-bitabsoluteaddress.Fourmemorytransfersare
neededtoreadtheinstruction,followedby2transfersforeachregister,
foratotalof36.
(c)36forcompletionofcurrentinstructionplus6forinterrupthandling,fora
totalof42.
4.12.(a)
RTSVUXWE&RTSVUAYE
RTSVUXWV#;RTSVUAYV#)Z
RTSVUAYE
RTSVUXWV[;RTSVUAYV[)Z RTSVUAYEBZ RTS0UAY\#
4

(b)Seelogicequationsinparta.
(c)Yes.
(d)Inthecircuitbelow,DECIDEisusedtolockinterruptrequests.Thepro-
cessorshouldsettheinterruptacknowledgesignal,INTA,afterDECIDE
returnstozero.Thiswillcausethehighestpriorityrequesttobeacknowl-
edged.Notethatlatchesareplacedattheinputsoftheprioritycircuit.
Theycouldbeplacedattheoutputs,butthecircuitwouldbelessreliable
wheninterruptschangeataboutthesametimeasarbitrationistakingplace
(racesmayoccur).
_ _
INTR
1
INTR
2
INTR
3
Reset
]
^
_
`
DECIDE
a
INTA
1
b
INTA
2
c INTA
3
INTA
d
e
f
g
h
4.13.Inthecircuitgivenbelow,registerArecordswhichdevicewasgivenagrant
mostrecently.Onlyoneofitsoutputsisequalto1atanygiventime,identifying
thehighest-priorityline.ThefallingedgeofDECIDErecordstheresultsof
thecurrentarbitrationcycleinAandatthesametimerecordsnewrequestsin
registerB.Thispreventsrequeststhatarrivelaterfromchangingthegrant.
Thecircuitrequirescarefulinitialization,becauseoneandonlyoneoutputof
registerAmustbeequalto1.Thisoutputdeterminesthehighest-priorityline
duringagivenarbitrationcycle.Forexample,iftheLSBofAisequalto1,point
E2willbeequalto0,givingREQ2thehighestpriority.
5

_ _
E1
GR1
E2
GR2
E3
GR3
E4
GR4
ADECIDE
B
DECIDE
REQ2
REQ3
REQ1
REQ4
4.14.Thetruthtableforapriorityencoderisgivenbelow.
1234567IPL
IPL
IPL

00000000 0 0
10000000 0 1
x1000000 1 0
xx100000 1 1
xxx10001 0 0
xxxx1001 0 1
xxxxx101 1 0
xxxxxx11 1 1
Apossibleimplementationforthisprioritycircuitisasfollows:
6

RTikjl mn

Pnpo0PGnpqVPn

RTikj

n
q
Pn

P RTirj
ps
n

PGn
ut
RTikj

n

Pn
o
Z"n
q
P RTirj
us
n

PGn

Z"n

t
4.15.AssumethattheinterfaceregistersarethesameasinFigure4.3andthatthe
characterstobeprintedarestoredinthememory.
*ProgramA(MAIN)pointstothecharacterstringandcallsDSPLYtwice
MAIN MOVE.L #ISR,VECTORInitializeinterruptvector
ORI.B #$80,STATUS Enableinterruptsfromdevice
MOVE #$2300,SR Setinterruptmaskto3
MOVEA.L#CHARS,A0 Setpointertocharacterlist
BSR DSPLY
MOVEA.L#CHARS,A0
BSR DSPLY
END MAIN
*SubroutineDSPLYprintsthecharacterstringpointedtobyA0
*ThelastcharacterinthestringmustbetheNULLcharacter
DSPLY ...
RTS
*ProgramB,theinterrupt-serviceroutine,pointsatthenumberstringandcallsDSPLY
ISR MOVEM.LA0,
(A7) Saveregistersused
MOVE.L NEWLINE,A0 Startanewline
BSR DSPLY
MOVEA.L#NMBRS,A0 Pointtothenumberstring
BSR DSPLY
MOVEM.L(A7)+,A0 Restoreregisters
RTE
*Charactersandnumberstobedisplayed
CHARS CC /AB...Z/
NEWLINE CB $0D,$0A,0 CodesforCR,LFandNull
NMBRS CB $0D,$0A
CC /01...901...901...9/
CB $0D,$0A,0
WhenISRisentered,theinterruptmaskinSRisautomaticallysetto4bythe
hardware.Toallowinterruptnesting,themaskmustbesetto3atthebeginning
ofISR.
4.16.ModifysubroutineDSPLYinProblem4.15tokeepcountofthenumberofchar-
actersprintedinregisterD1.BeforeISRreturns,itshouldcallRESTORE,which
sendsanumberofspacecharacters(ASCIIcode20
vq)equaltothecountinD1.
7

DSPLY ...
MOVE #$2400,SR Disablekeyboardinterrupts
MOVEBD0,DATAOUT Printcharacter
ADDQ #1,D1
MOVE #$2300,SR Enablekeyboardinterrupts
...
RESTOREMOVE.LD1,D2
BR TEST
LOOP BTST #1,STATUS
BEQ LOOP
MOVEB#$20,DATAOUT
TEST DBRA D2,LOOP
RTS
NotethatinterruptsaredisabledinDSPLYbeforeprintingacharactertoensure
thatnofurtherinterruptsareaccepteduntilthecountisupdated.
4.17.Thedebuggercanusethetraceinterrupttoexecutethesavedinstructionthen
regaincontrol.Thedebuggerputsthesavedinstructionatthecorrectaddress,
enablestraceinterruptsandreturns.Theinstructionwillbeexecuted.Then,a
secondinterruptionwilloccur,andthedebuggerbeginsexecutionagain.Thede-
buggercannowremovetheprograminstruction,reinstallthebreakpoint,disable
traceinterrupts,thenreturntoresumeprogramexecution.
4.18.(a)Thereturnaddress,whichisinregisterR14svc,isPC+4,wherePCisthe
addressoftheSWIinstruction.
LDRR2,[R14,#-4] GetSWIinstruction
BICR2,R2,#&FFFFFF00Clearhigh-orderbits
(b)Assumethatthelow-order8bitsinSWIhavethevalues1,2,3,...to
requestservicesnumber1,2,3,etc.UseregisterR3topointtoatable
ofaddressesofthecorrespondingroutines,ataddresses[R3]+4,[R3]+8,
respectively.
ADRR3,EntryTable Getthetable'saddress
LDRR15,[R3,R2,LSL#2]Loadstartingaddressofroutine
4.19.Eachdevicepullsthelinedown(closesaswitchtoground)whenitisnotready.
Itopenstheswitchwhenitisready.Thus,thelinewillbehighwhenalldevices
areready.
4.20.Therequestfromonedevicemaybemaskedbytheother,becausetheprocessor
mayseeonlyoneedge.
INTR
REQ1
REQ2
8

4.21.AssumethatwhenBRbecomesactive,theprocessorassertsBG1andkeepsit
asserteduntilBRisnegated.
BG1
BR1
BG3
Dev. 1 Dev. 3Processor
Dev. 3 asserts BR
BBSY
4.22.(a)Device2requeststhebusandreceivesagrant.Beforeitreleasesthebus,
device1alsoassertsBR.Whendevice2isnishednothingwillhappen.BRand
BG1remainactive,butsincedevice1doesnotseeatransitiononBG1itcannot
becomethebusmaster.
(b)NodevicemayassertBRifitsBGinputisactive.
4.23.Forbetterclarity,changeBRtowYanduseaninverterwithdelay xptogenerate
BG1.
BG1
BR3
BG3
BG4
d
1
2d
d
W
d
2
Assumingdevice3assertsBG4shortlyafteritdropsthebusrequest(delay xy ),
aspuriouspulseofwidth
z{|xp}PG[8x~xy willappearonBG4.
4.24.RefertothetimingdiagraminProblem4.23.AssumethatbothBR1andBR5
areactivatedduringthedelayperiod
xy .InputBG1willbecomeactiveandatthe
sametimethepulseonBG4willtraveltoBG5.Thus,bothdeviceswillreceive
abusgrantatthesametime.
9

4.25.Astatemachinefortherequiredcircuitisgiveninthegurebelow.Anoutput
calledACKhasbeenadded,indicatingwhenthedevicemayusethebus.Note
thattherestrictioninSolution4.22baboveisobserved(stateB).
BUSREQ, BGi, BBSY/BR, BG(i+1), BBSY, ACK
x1x/0100
x0x/0000
00x/0000
10x/0000
10x/1000
110/1000
1xx/0011
0xx/0000
x1x/0100
A C
D
B
4.26.Thepriorityregisterinthecircuitbelowcontains1111forthehighestpriority
deviceand0000forthelowest.
_ _
o.c.
ARB3*
o.c.
ARB2*
o.c.
ARB1*
o.c.
ARB0*
Winner
StartArbitration
register
Priority
10

4.27.Alargerdistancemeanslongerdelayforthesignalstravelingbetweenthepro-
cessorandtheinputdevice.Primarily,thismeansthat
9 €y ,


E9 and
9oXE

willincrease.Sincelongerdistancesmayalsomeanlargerskew,theintervals


=
and


G
mayhavetobeincreasedtocoverworst-casedifferencesin
propagationdelay.
InthecaseofFigure4.24,theclockperiodmustbeincreasedtoaccommodate
themaximumpropagationdelay.
4.28.Apossiblecircuitisgivenbelow.
_ _
D
0
D
7
V
cc
Drivers
‚
‚
‚
A
4
A
5
ƒ
ƒ
ƒ
DeviceSelected
AddressDecoder
Clock
„
„
„
Sensors
Tri-state
Enable
A
15
A
9
A
8
Read/Write
… …†…9…9…
A
3
A
0
11

4.29.AssumethatthedisplayhasthebusaddressFE40.Thecircuitbelowsetsthe
Loadsignalto0duringthesecondhalfofthewritecycle.Therisingedgeatthe
endoftheclockperiodwillloadthedataintothedisplayregister.
_ _
Display
7-segment
A
0
A
8,7,5,4
A
3
A
6
A
9
A
15
D
0
D
3
Clock
Read/Write
‡ ‡T‡T‡T‡
ˆ
ˆ
‰
‰
‰
Load
Register
4-bit
Š
4.30.GenerateSIN ‹ŒŽ’‘ inthesamewayasLoadinProblemP4.29.Thissignalshould
loadthedataonD6intoanInterrupt-Enableip-op,IntEn.Theinterruptre-
questcannowbegeneratedasRTSVUAY“ ”8RTS:ZRT•p–9—k• .
4.31.Hardwareorganizationandastatediagramforthememoryinterfacecircuitare
givenbelow.
MyAddress
MyAddress
A C
D
Read
Read
Enable
Slave-ready
Control
Memory Drivers
Tri-state
Data
Address
Clock
Slave-ready
EnableRead
12

4.32.(a)Oncethememoryreceivestheaddressanddata,thebusisnolongerneeded.
Operationsinvolvingotherdevicescanproceed.
(b)Thebusprotocolmaybedesignedsuchthatnoresponseisneededforwrite
operations,providedthatarrivaloftheaddressanddataintherstclockcycleis
guaranteed.Themainprecautionthatmustbetakenisthatthememoryinterface
cannotrespondtootherrequestsuntilithascompletedthewriteoperation.Thus,
asubsequentreadorwriteoperationmayencounteradditionaldelay.
Notethatwithoutaresponsesignaltheprocessorisnotinformedifthememory
doesnotreceivethedataforanyreason.Also,wehaveassumedasimpleuni-
processorenvironment.Foradiscussionoftheconstraintsinparallel-processing
systems,seeChapter12.
4.33.InthecaseofFigure4.24,thelackofresponsewillnotbedetectedandpro-
cessingwillcontinue,leadingtoerroneousresults.Forthisreason,aresponse
signalfromthedeviceshouldbeprovided,eventhoughitisnotessentialforbus
operation.TheschemesofbothFigures4.25and4.26providearesponsesignal,
Slave-ready.Noresponsewouldcausethebustohangup.Thus,aftersome
time-outperiodtheprocessorshouldabortthetransactionandbeginexecuting
anappropriatebuserrorexceptionroutine.
4.34.Thedevicemaycontainabuffertoholdtheaddressvalueifitrequiresadditional
timetodecodeitortoaccesstherequesteddata.Inthiscase,theaddressmaybe
removedfromthebusaftertherstcycle.
4.35.Minimumclockperiod=4+5+6+10+3=28ns
Maximumclockspeed=35.7MHz
Thesecalculationsassumenoclockskewbetweenthesenderandthereceiver.
4.36.
y}

busskew=4ns
9 0y =propagationdelay+addressdecoding+accesstime
=1to5+6+5to10=12to21ns


9 =propagationdelay+skew+setuptime
=1to5+4+3=8to12ns
9o0
=propagationdelay=1to5ns
Minimumcycle=4+12+8+1=25ns
Maximumcycle=4+21+12+5=42ns
13

Chapter5–TheMemorySystem
5.1.TheblockdiagramisessentiallythesameasinFigure5.10,exceptthat16rows
(offour5128chips)areneeded.AddresslinesA180areconnectedtoall
chips.AddresslinesA2219areconnectedtoa4-bitdecodertoselectoneofthe
16rows.
5.2.Theminimumrefreshrateisgivenby
5010
15
(4:53)
910
12
=8:3310
3
s
Therefore,eachrowhastoberefreshedevery8ms.
5.3.NeedcontrolsignalsMinandMouttocontrolstoringofdataintothememory
cellsandtogatethedatareadfromthememoryontothebus,respectively.A
possiblecircuitis
DQ DQ
ClkClkD
in
M
out
D
out
M
in
Data
Read/Write
circuits and latches
5.4.(a)Ittakes5+8=13clockcycles.
Totaltime=
13
(13310
6
)
=0:09810
6
s=98ns
Latency=
5
(13310
6
)
=0:03810
6
s=38ns
(b)Ittakestwiceaslongtotransfer64bytes,becausetwoindependent32-byte
transfershavetobemade.Thelatencyisthesame,i.e.38ns.
1

5.5.Afasterprocessorchipwillresultinincreasedperformance,buttheamount
ofincreasewillnotbedirectlyproportionaltotheincreaseinprocessorspeed,
becausethecachemisspenaltywillremainthesameifthemainmemoryspeed
isnotimproved.
5.6.(a)Mainmemoryaddresslengthis16bits.TAGeldis6bits.BLOCKeldis
3bits(8blocks).WORDeldis7bits(128wordsperblock).
(b)Theprogramwordsaremappedonthecacheblocksasfollows:
Block 0
Block 1
Block 2
Block 3
Block 4
Block 5
Block 6
Block 7
0 1024
127 1151
128 1152
255 1279
256 1280
383 1407
384 1408
511 1535
512
639
640
767
768
895
896
1023
Start
17
23
165
239
1200
1500
End
Hence,thesequenceofreadsfromthemainmemoryblocksintocacheblocksis
Block:0;1;2;3;4;5;6;7;0;1;
| {z }
Pass1
0;1;0;1;
|{z}
Pass2
0;1;:::;0;1;0;1;0;1;
|{z}
Pass9
0;1;0;1;2;3
|{z}
Pass10
2

Asthissequenceshows,boththebeginningandtheendoftheouterloopuse
blocks0and1inthecache.Theyoverwriteeachotheroneachpassthroughthe
loop.Blocks2to7remainresidentinthecacheuntiltheouterloopiscompleted.
Thetotaltimeforreadingtheblocksfromthemainmemoryintothecacheis
therefore
(10+49+2)12810=61;440
Executingtheprogramoutofthecache:
Outerloopinnerloop=[(120022)(239164)]101=11;030
Innerloop=(239164)2001=15;000
Endsectionofprogram=15001200=3001
Totalexecutiontime=87;770
5.7.Intherstpassthroughtheloop,theAddinstructionisstoredataddress4in
thecache,anditsoperand(A03C)ataddress6.Thentheoperandisoverwritten
bytheDecrementinstruction.TheBNEinstructionisstoredataddress0.Inthe
secondpass,thevalue05D9overwritestheBNEinstruction,thenBNEisread
fromthemainmemoryandagainstoredinlocation0.Thecontentsofthecache,
thenumberofwordsreadfromthemainmemoryandfromthecache,andthe
executiontimeforeachpassareasshownbelow.
005EBNE
005D
005D
Add
Dec
005EBNE
005D
005D
Add
Dec
005EBNE
005D
005D
Add
Dec
00AA10D7
Cache contentsMM accesses Cache accessesTimeAfter pass No.
1
2
4
2
1
0
2
3
40t
t
t
t
22
13
Total 5 75
3
7
3

5.8.Allthreeinstructionsarestoredinthecacheaftertherstpass,andtheyre-
maininplaceduringsubsequentpasses.Inthiscase,thereisatotalof6read
operationsfromthemainmemoryand6fromthecache.Executiontimeis66.
Instructionsanddataarebeststoredinseparatecachestoavoidthedataover-
writinginstructions,asinProblem5.7.
5.9.(a)4096blocksof128wordseachrequire12+7=19bitsforthemainmemory
address.
(b)TAGeldis8bits.SETeldis4bits.WORDeldis7bits.
5.10.(a)TAGeldis10bits.SETeldis4bits.WORDeldis6bits.
(b)Words0,1,2,,4351occupyblocks0to67inthemainmemory(MM).
Afterblocks0,1,2,,63havebeenreadfromMMintothecacheontherst
pass,thecacheisfull.Becauseofthefactthatthereplacementalgorithmis
LRU,MMblocksthatoccupytherstfoursetsofthe16cachesetsarealways
overwrittenbeforetheycanbeusedonasuccessivepass.Inparticular,MM
blocks0,16,32,48,and64continuallydisplaceeachotherincompetingfor
the4blockpositionsincacheset0.Thesamethingoccursincacheset1(MM
blocks,1,17,33,49,65),cacheset2(MMblocks2,18,34,50,66)andcache
set3(MMblocks3,19,35,51,67).MMblocksthatoccupythelast12sets
(sets4through15)arefetchedonceontherstpassandremaininthecachefor
thenext9passes.Ontherstpass,all68blocksoftheloopmustbefetched
fromtheMM.Oneachofthe9successivepasses,blocksinthelast12setsof
thecache(412=48)arefoundinthecache,andtheremaining20(6848)
blocksmustbefetchedfromtheMM.
Improvementfactor=
Timewithoutcache
Timewithcache
=
106810
16811+9(2011+481)
=2:15
5.11.Thisreplacementalgorithmisactuallybetteronthisparticular”large”loopex-
ample.Afterthecachehasbeenlledbythemainmemoryblocks0,1,,63
ontherstpass,block64replacesblock48inset0.Onthesecondpass,block
48replacesblock32inset0.Onthethirdpass,block32replacesblock16,and
onthefourthpass,block16replacesblock0.Onthefourthpass,therearetwo
replacements:0kicksout64,and64kicksout48.Onthesixth,seventh,and
eighthpasses,thereisonlyonereplacementinset0.Ontheninthpassthereare
tworeplacementsinset0,andonthenalpassthereisonereplacement.The
situationissimilarinsets1,2,and3.Again,thereisnocontentioninsets4
through15.Intotal,thereare11replacementsinset0inpasses2through10.
Thesameistrueinsets1,2,and3.Therefore,theimprovementfactoris
106810
16811+41111+(96844)1
=3:8
4

5.12.Fortherstloop,thecontentsofthecacheareasindicatedinFigures5.20
through5.22.Forthesecondloop,theyareasfollows.
(a)Direct-mappedcache
_ _
Contentsofdatacacheafterpass:
j=9i=1i=3i=5i=7i=9
Block
position
0
1
2
3
4
5
6
7
A(0,8)
A(0,9)
A(0,0)
A(0,1)
A(0,2)A(0,4)A(0,6)A(0,8)
A(0,3)A(0,5)A(0,7)A(0,9)
(b)Associative-mappedcache
_ _
A(0,9)
A(0,8)
A(0,7)
A(0,6)
A(0,2)
A(0,3)
A(0,4)
A(0,5)A(0,7) A(0,5)
A(0,4)
A(0,3)
A(0,2)
A(0,1)
A(0,0)
A(0,9)
A(0,8)
A(0,6)
A(0,5)
A(0,4)
A(0,3)
A(0,9)
A(0,8)
Contentsofdatacacheafterpass:
i=9i=5i=0j=9
Block
position
0
A(0,0)
A(0,7)
A(0,6)
A(0,5)
A(0,4)
A(0,3)
A(0,9)
A(0,8)
1
2
3
4
5
6
7
A(0,2)
5

(c)Set-associative-mappedcache
_ _
Set0
Set1
A(0,5)
A(0,4)
A(0,7)
A(0,6)
A(0,1)
A(0,3)
A(0,2)
A(0,7)
A(0,6)
i=7i=3
3
2
1
0
A(0,9)
A(0,8)
A(0,7)
A(0,6)
A(0,9)
A(0,8)
Contentsofdatacacheafterpass:
i=9j=9
Block
position
0
A(0,0)
1
2
3
Inall3cases,allelementsareoverwrittenbeforetheyareusedinthesecond
loop.ThissuggeststhattheLRUalgorithmmaynotleadtogoodperformanceif
usedwitharraysthatdonottintothecache.Theperformancecanbeimproved
byintroducingsomerandomnessinthereplacementalgorithm.
5.13.Thetwoleast-signicantbitsofanaddress,A10,specifyabytewithina32-bit
word.Foradirect-mappedcache,bitsA42specifytheblockposition.Fora
set-associative-mappedcache,bitA2speciestheset.
(a)Direct-mappedcache
_ _
Pass4Pass3Pass2Pass1
7
6
5
4
3
2
1
0
position
Block
Contentsofdatacacheafter:
[200]
[204]
[208]
[24C]
[2F0]
[2F4]
[218]
[21C] [21C]
[218]
[2F4]
[2F0]
[24C]
[208]
[204]
[200] [200]
[204]
[208]
[24C]
[2F0]
[2F4]
[218]
[21C][21C]
[218]
[2F4]
[2F0]
[24C]
[208]
[204]
[200]
Hitrate=33/48=0.69
6

(b)Associative-mappedcache
_ _
[200]
[204] [204]
[200][200]
[204]
[24C]
[21C]
[218]
[204]
[200]
Contentsofdatacacheafter:
Block
position
0
1
2
3
4
5
6
7
Pass1 Pass2 Pass3 Pass4
[24C]
[20C]
[2F4]
[2F0]
[21C]
[2F4] [2F4] [2F4]
[20C]
[2F0]
[218]
[218]
[21C]
[24C]
[20C]
[2F0]
[2F0]
[218]
[21C]
[24C]
[20C]
Hitrate=21/48=0.44
(c)Set-associative-mappedcache
_ _
Hitrate=30/48=0.63
2
Set1
3
1
0
Pass4Pass3Pass2Pass1
3
2
1
0
position
Block
Contentsofdatacacheafter:
[2F4]
[24C]
[204]
[218]
[2F0]
[208]
[200]
[21C]
[200]
[208]
[2F0]
[218]
[204]
[2F4] [2F4]
[204]
[218]
[2F0]
[208]
[200]
[21C]
[200]
[208]
[2F0]
[218]
[204]
[24C]
[2F4]
[21C] [21C]
[24C] [24C]
Set0
7

5.14.Thetwoleast-signicantbitsofanaddress,A10,specifyabytewithina32-bit
word.Foradirect-mappedcache,bitsA43specifytheblockposition.Fora
set-associative-mappedcache,bitA3speciestheset.
(a)Direct-mappedcache
_ _
Pass4Pass3Pass2Pass1
position
Block
Contentsofdatacacheafter:
[200]
[204]
[24C]
[2F0]
[2F4]
[218]
[21C] [21C]
[218]
[2F4]
[2F0]
[24C]
[204]
[200] [200]
[204]
[24C]
[2F0]
[2F4]
[218]
[21C][21C]
[218]
[2F4]
[2F0]
[24C]
[204]
[200]
0
1
2
3
[248] [248] [248] [248]
Hitrate=37/48=0.77
(b)Associative-mappedcache
_ _
[248][248]
3
2
1
0
[200]
[204]
[24C]
[2F0]
[2F4]
[218]
[21C]
[2F4]
[2F0]
[204]
[200]
[200]
[204]
[2F0]
[2F4]
[21C]
[218]
[2F4]
[2F0]
[24C]
[204]
[200]
Contentsofdatacacheafter:
Block
position
Pass1 Pass2 Pass3 Pass4
[218]
[21C]
[248]
[24C]
[218]
[21C]
[248]
[24C]
Hitrate=34/48=0.71
8

(c)Set-associative-mappedcache
_ _
[204][204][204][204]
[248][218][248][218]
[218][248][218][248]
[2F4][2F4][2F4][2F4]
1
1
0
0
Set1
Set0
[24C][24C]
[21C][21C] [24C]
[2F0]
[200]
[21C]
[200]
[2F0][2F0]
[200]
[21C]
[200]
[2F0]
[24C]
Contentsofdatacacheafter:
Block
position
Pass1 Pass2 Pass3 Pass4
Hitrate=34/48=0.71
5.15.Theblocksize(numberofwordsinablock)ofthecacheshouldbeatleast
aslargeas2
k
,inordertotakefulladvantageofthemultiplemodulememory
whentransferringablockbetweenthecacheandthemainmemory.Powerof2
multiplesof2
k
workjustasefciently,andarenaturalbecauseblocksizeis2
k
forkbitsinthe”word”eld.
5.16.Largersize
fewermissesifmostofthedataintheblockareactuallyused
wastefulifmuchofthedataarenotusedbeforethecacheblockisejected
fromthecache
Smallersize
moremisses
5.17.For16-wordblocksthevalueofMis1+8+34+4=25cycles.Then
Timewithoutcache
Timewithcache
=4:04
Inordertocomparethe8-wordand16-wordblocks,wecanassumethattwo
8-wordblocksmustbebroughtintothecacheforeach16-wordblock.Hence,
theeffectivevalueofMis217=34.Then
Timewithoutcache
Timewithcache
=3:3
9

Similarly,for4-wordblockstheeffectivevalueofMis4(1+8+4)=52cycles.
Then
Timewithoutcache
Timewithcache
=2:42
Clearly,interleavingismoreeffectiveiflargercacheblocksareused.
5.18.Thehitratesare
h1=h2=h=0:95forinstructions
=0:90fordata
Theaverageaccesstimeiscomputedas
tave=hC1+(1h)hC2+(1h)
2
M
(a)WithinterleavingM=17.Then
tave=0:951+0:050:9510+0:002517+0:3(0:91+0:10:910+0:0117)
=2:0585cycles
(b)WithoutinterleavingM=38.Thentave=2:174cycles.
(c)Withoutinterleavingtheaverageaccesstakes2:174=2:0585=1:056times
longer.
5.19.SupposethatittakesoneclockcycletosendtheaddresstotheL2cache,one
cycletoaccesseachwordintheblock,andonecycletotransferawordfromthe
L2cachetotheL1cache.ThisleadstoC2=6cycles.
(a)WithinterleavingM=1+8+4=13.Thentave=1:79cycles.
(b)WithoutinterleavingM=1+8+34+1=22.Thentave=1:86cycles.
(c)Withoutinterleavingtheaverageaccesstakes1:86=1:79=1:039times
longer.
5.20.Theanalogyisgoodwithrespectto:
relativesizesoftoolbox,truckandshopversusL1cache,L2cacheand
mainmemory
relativeaccesstimes
relativefrequencyofuseoftoolsinthe3storageplacesversusthedata
accessesincachesandthemainmemory
Theanalogyfailswithrespecttothefactsthat:
atthestartofaworkingdaythetoolsplacedintothetruckandthetoolbox
arepreselectedbasedontheexperiencegainedonpreviousjobs,whilein
thecaseofanewprogramthatisrunonacomputerthereisnorelevant
dataloadedintothecachesbeforeexecutionbegins
10

mostofthetoolsinthetoolboxandthetruckareusefulinsuccessivejobs,
whilethedataleftinacachebyoneprogramarenotusefulforthesubse-
quentprograms
toolsdisplacedbytheneedtouseothertoolsareneverthrownaway,while
datainthecacheblocksaresimplyoverwritteniftheblocksarenotagged
asdirty
5.21.Each32-bitnumbercomprises4bytes.Hence,eachpageholds1024numbers.
Thereisspacefor256pagesinthe1M-byteportionofthemainmemorythatis
allocatedforstoringdataduringthecomputation.
(a)Eachcolumnisonepage;therewillbe1024pagefaults.
(b)Processingofentirecolumns,oneatatime,wouldbeveryinefcientand
slow.However,ifonlyonequarterofeachcolumn(forallcolumns)isprocessed
beforethenextquarterisbroughtinfromthedisk,theneachelementofthearray
mustbeloadedintothememorytwice.Inthiscase,thenumberofpagefaults
wouldbe2048.
(c)Assumingthatthecomputationtimeneededtonormalizethenumbersis
negligiblecomparedtothetimeneededtobringapagefromthedisk:
Totaltimefor(a)is102440ms=41s
Totaltimefor(b)is204840ms=82s
5.22.Theoperatingsystemmayincreasethemainmemorypagesallocatedtoapro-
gramthathasalargenumberofpagefaults,usingspacepreviouslyallocatedto
aprogramwithafewpagefaults.
5.23.Continuingtheexecutionofaninstructioninterruptedbyapagefaultrequires
savingtheentirestateoftheprocessor,whichincludessavingallregistersthat
mayhavebeenaffectedbytheinstructionaswellasthecontrolinformationthat
indicateshowfartheexecutionhasprogressed.Thealternativeofre-executing
theinstructionfromthebeginningrequiresacapabilitytoreverseanychanges
thatmayhavebeencausedbythepartialexecutionoftheinstruction.
5.24.Theproblemisthatapagefaultmayoccurduringintermediatestepsintheexe-
cutionofasingleinstruction.Thepagecontainingthereferencedlocationmust
betransferredfromthediskintothemainmemorybeforeexecutioncanproceed.
Sincethetimeneededforthepagetransfer(adiskoperation)isverylong,as
comparedtoinstructionexecutiontime,acontext-switchwillusuallybemade.
(Acontext-switchconsistsofpreservingthestateofthecurrentlyexecutingpro-
gram,and”switching”theprocessortotheexecutionofanotherprogramthatis
residentinthemainmemory.)Thepagetransfer,viaDMA,takesplacewhile
thisotherprogramexecutes.Whenthepagetransferiscomplete,theoriginal
programcanberesumed.
Therefore,oneoftwofeaturesareneededinasystemwheretheexecutionof
anindividualinstructionmaybesuspendedbyapagefault.Therstpossibility
11

istosavethestateofinstructionexecution.Thisinvolvessavingmoreinfor-
mation(temporaryprogrammer-transparentregisters,etc.)thanneededwhena
programisinterruptedbetweeninstructions.Thesecondpossibilityisto”un-
wind”theeffectsoftheportionoftheinstructioncompletedwhenthepagefault
occurred,andthenexecutetheinstructionfromthebeginningwhentheprogram
isresumed.
5.25.(a)Themaximumnumberofbytesthatcanbestoredonthisdiskis2414000
400512=68:810
9
bytes.
(b)Thedatatransferrateis(4005127200)=60=24:5810
6
bytes/s.
(c)Need9bitstoidentifyasector,14bitsforatrack,and5bitsforasurface.
Thus,apossibleschemeistouseaddressbitsA80forsector,A229fortrack,
andA2723forsurfaceidentication.BitsA3128arenotused.
5.26.Theaverageseektimeandrotationaldelayare6and3ms,respectively.The
averagedatatransferratefromatracktothedatabufferinthediskcontrolleris
34Mbytes/s.Hence,ittakes8K/34M=0.23mstotransferablockofdata.
(a)Thetotaltimeneededtoaccesseachblockis9+0:23=9:23ms.The
portionoftimeoccupiedbyseekandrotationaldelayis9/9.23=0.97=97%.
(b)Onlyrotationaldelaysareinvolvedin90%ofthecases.Therefore,theaver-
agetimetoaccessablockis0:93+0:19+0:23=3:89ms.Theportion
oftimeoccupiedbyseekandrotationaldelayis3.6/3.89=0.92=92%.
5.27.(a)Therateoftransfertoorfromanyonediskis30megabytespersecond.
Maximummemorytransferrateis4/(1010
9
)=40010
6
bytes/s,whichis
400megabytespersecond.Therefore,13diskscanbesimultaneouslyowing
datato/fromthemainmemory.
(b)8K/30M=0.27msisneededtotransfer8Kbytesto/fromthedisk.Seekand
rotationaldelaysare6msand3ms,respectively.Therefore,8K/4=2Kwords
aretransferredin9.27ms.Butin9.27msthereare(9:2710
3
)=(0:01
10
6
)=92710
3
memory(word)cyclesavailable.Therefore,overalong
periodoftime,anyonediskstealsonly(2=927)100=0:2%ofavailable
memorycycles.
5.28.Thesectorsizeshouldinuencethechoiceofpagesize,becausethesectoristhe
smallestdirectlyaddressableblockofdataonthediskthatisreadorwrittenasa
unit.Therefore,pagesshouldbesomesmallintegralnumberofsectorsinsize.
5.29.Thenextrecord,j,tobeaccessedafteraforwardreadofrecordihasjustbeen
completedmightbeintheforwarddirection,withprobability0.5(4records
distancetothebeginningofj),ormightbeinthebackwarddirectionwithprob-
ability0.5(6recordsdistancetothebeginningofjplus2directionreversals).
Timetoscanoveronerecordandaninterrecordgapis
1
800
s
cm

1
2000
cm
bit
4000bits1000ms+3=2:5+3=5:5ms
12

Therefore,averageaccessandreadtimeis
0:5(45:5)+0:5(65:5+2225)+5:5=258ms
Ifrecordscanbereadwhilemovinginbothdirections,averageaccessandread
timeis
0:5(45:5)+0:5(55:5+225)+5:5=142:75ms
Therefore,theaveragepercentagegainis(258142:75)=258100=44:7%
Themajorgainisbecausetherecordsbeingreadarerelativelyclosetogether,
andonelessdirectionreversalisneeded.
13

Chapter6–Arithmetic
6.1.Overowcasesarespecicallyindicated.Inallothercases,nooverowoccurs.
010110 (+22) 101011 (21) 111111 (1)
+001001 +(+9) +100101 +(27) +000111 +(+7)
011111 (+31) 010000 (48) 000110 (+6)
overow
011001 (+25) 110111 (9) 010101 (+21)
+010000 +(+16) +111001 +(7) +101011 +(21)
101001 (+41) 110000 (16) 000000 (0)
overow
010110 (+22) 010110
011111 (+31) +100001
(9) 110111
111110 (2) 111110
100101 (27) +011011
(+25) 011001
100001 (31) 100001
011101 (+29) +100011
(60) 000100
overow
111111 (1) 111111
000111 (+7) +111001
(8) 111000
000111 (+7) 000111
111000 (8) +001000
(+15) 001111
011010 (+26) 011010
100010 (30) +011110
(+56) 111000
overow
1

6.2.(a)Inthefollowinganswers,roundinghasbeenusedasthetruncationmethod
(seeSection6.7.3)whentheanswercannotberepresentedexactlyinthesigned
6-bitformat.
0.5:010000allcases
0.123:100100Sign-and-magnitude
1110111's-complement
1111002's-complement
0.75:111000Sign-and-magnitude
1001111's-complement
1010002's-complement
0.1:100011Sign-and-magnitude
1111001's-complement
1111012's-complement
(b)
e=2
6
(assumingrounding,asin(a))
e=2
5
(assumingchoppingorVonNeumannrounding)
(c)assumingrounding:
(a)3
(b)6
(c)9
(d)19
6.3.Thetwoternaryrepresentationsaregivenasfollows:
Sign-and-magnitude3's-complement
+11011 011011
10222 212001
+2120 002120
1212 221011
+10 000010
201 222022
2

6.4.Ternarynumberswithadditionandsubtractionoperations:
Decimal Ternary Ternary
Sign-and-magnitude Sign-and-magnitude 3's-complement
56 +2002 002002
37 1101 221122
122 11112 011112
123 11120 211110
Additionoperations:
002002 002002 002002
+221122 +011112 +211110
000201 020121 220112
221122 221122 011112
+011112 +211110 +211110
010011 210002 222222
Subtractionoperations:
002002 002002
221122 +001101
010110
002002 002002
011112 +211111
220120
002002 002002
211110 +011120
020122
221122 221122
011112 +211111
210010
221122 221122
211110 +011120
010012
011112 011112
211110 +011120
100002
overow
3

6.5.(a)
x
y
x
y
x
y
c
s
xy sc
00 00
01 10
10 10
11 01
s=x Å y
c= x y
(b)
Half
adder
Half
adder
c
i+1
c
i
s
i
x
i
y
i
s
c
s
c
(c)ThelongestpaththroughthecircuitinPart(b)is6gatedelays(including
inputinversions)inproducingsi;andthelongestpaththroughthecircuitin
Figure6.2ais3gatedelaysinproducingsi,assumingthatsiisimplementedas
atwo-levelAND-ORcircuit,andincludinginputinversions.
4

6.6.AssumethatthebinaryintegerisinmemorylocationBINARY,andthestringof
bytesrepresentingtheanswerstartsatmemorylocationDECIMAL,high-order
digitsrst.
68000Program:
MOVE #10,D2
CLR.L D1
MOVE BINARY,D1 Getbinarynumber;
notethathigh-order
wordinD1isstillzero.
MOVE.B#4,D3 UseD3ascounter.
LOOPDIVU D2,D1 Leavesquotientin
lowhalfofD1and
remainderinhighhalf
ofD1.
SWAP D1
MOVE.BD1,DECIMAL(D3)
CLR D1 ClearslowhalfofD1.
SWAP D1
DBRA D3,LOOP
IA-32Program:
MOVEBX,10
MOVEAX,BINARY Getbinarynumber.
LEA EDI,DECIMAL
DEC EDI
MOVECX,5 LoadcounterECX.
LOOPSTART:DIV EBX [EAX]/[EBX];quotient
inEAXandremainder
inEDX.
MOV[EDI+ECX],DL
LOOPLOOPSTART
5

6.7.TheARMandIA-32subroutinesbothusethefollowingalgorithmtoconvertthe
four-digitdecimalintegerD3D2D1D0(eachDiisinBCDcode)intobinary:
MoveD0intoregisterREG.
MultiplyD1by10.
AddproductintoREG.
MultiplyD2by100.
AddproductintoREG.
MultiplyD3by1000.
AddproductintoREG.
(i)TheARMsubroutineassumesthattheaddressesDECIMALandBINARY
arepassedtoitontheprocessorstackinpositionsparam1andparam2asshown
inFigure3.13.Thesubroutinerstsavesregistersandsetsuptheframepointer
FP(R12).
ARMSubroutine:
CONVERTSTMFD SP!,fR0R6,FP,LRgSaveregisters.
ADD FP,SP,#28 Loadframepointer.
LDR R0,[FP,#8] LoadR0andR1
LDR R0,[R0] withdecimaldigits.
MOV R1,R0
AND R0,R0,#&F [R0]=D0.
MOV R2,#&F LoadmaskbitsintoR2.
MOV R4,#10 Loadmultipliers
MOV R5,#100 intoR4,R5,andR6.
MOV R6,#1000
AND R3,R2,R1,LSR#4 GetD1intoR3.
MLA R0,R3,R4,R0 Add10D1intoR0.
AND R3,R2,R1,LSR#8 GetD2intoR3.
MLA R0,R3,R5,R0 Add100D2intoR0.
AND R3,R2,R1,LSR#12GetD3intoR3.
MLA R0,R3,R6,R0 Add1000D3intoRo.
LDR R1,[FP,#12] Storeconvertedvalue
STR R0,[R1] intoBINARY.
LDMFD SP!,fR0R6,FP,PCgRestoreregisters
andreturn.
6

(ii)TheIA-32subroutineassumesthattheaddressesDECIMALandBINARY
arepassedtoitontheprocessorstackinpositionsparam1andparam2asshown
inFigure3.48.ThesubroutinerstsetsuptheframepointerEBP,andthen
allocatesandinitializesthelocalvariables10,100,and1000,onthestack.
IA-32Subroutine:
CONVERT:PUSHEBP Setupframe
MOVEBP,ESP pointer.
PUSH10 Allocateandinitialize
PUSH100 localvariables.
PUSH1000
PUSHEDX Saveregisters.
PUSHESI
PUSHEAX
MOVEDX,[EBP+8]Loadfourdecimal
MOVEDX,[EDX] digitsinto
MOVESI,EDX EDXandESI.
AND EDX,FH [EDX]=D0.
SHR ESI,4
MOVEAX,ESI
AND EAX,FH
MUL [EBP4]
ADD EDX,EAX [EDX]=binaryofD1D0.
SHR ESI,4
MOVEAX,ESI
AND EAX,FH
MUL [EBP8]
ADD EDX,EAX [EDX]=binaryofD2D1D0.
SHR ESI,4
MOVEAX,ESI
AND EAX,FH
MUL [EBP12]
ADD EDX,EAX [EDX]=binaryofD3D2D1D0.
MOVEAX,[EBP+12]Storeconverted
MOV[EAX],EDX valueintoBINARY.
POP EAX Restoreregisters.
POP ESI
POP EDX
ADD ESP,12 Removelocalparameters.
POP EBP RestoreEBP.
RET Return.
7

(iii)The68000subroutineusesaloopstructuretoconvertthefour-digitdecimal
integerD3D2D1D0(eachDiisinBCDcode)intobinary.Attheendofsucces-
sivepassesthroughtheloop,registerD0containstheaccumulatingvaluesD3,
10D3+D2,100D3+10D2+D1,andbinary=1000D3+100D2+10D1+D0.
AssumethatDECIMAListheaddressofa16-bitwordcontainingthefourBCD
digits,andthatBINARYistheaddressofa16-bitwordthatistocontainthe
convertedbinaryvalue.
TheaddressesDECIMALandBINARYarepassedtothesubroutineinregisters
A0andA1.
68000Subroutine:
CONVERTMOVEM.LD0D2,(A7)Saveregisters.
CLR.L D0
CLR.L D1
MOVE.W (A0),D1 Loadfourdecimal
digitsintoD1.
MOVE.B #3,D2 LoadcounterD3.
LOOP MULU.W #10,D0 Multiplyaccumulated
valueinD0by10.
ASL.L #4,D1 BringnextDidigit
SWAP.W D1 intolowhalfofD1.
ADD.W D1,D0 Addintoaccumulated
valueinD0.
CLR.W D1 Clearoutcurrent
SWAP.W D1 digitandbring
remainingdigitsinto
lowhalfofD1.
DBRA D2,LOOP Checkifdone.
MOVE.W D0,(A1) Storebinaryresult
inBINARY.
MOVEM.L(A7)+,D0D2Restoreregisters.
RTS Return.
8

6.8.(a)Theoutputcarryis1whenA+B10.Thisistheconditionthatrequires
thefurtheradditionof610.
(b)
(1)0101 5
+0110 +6
1011>101011
+0110
0001
outputcarry=1
(2)0011 3
+0100 +4
0111<1010 7
(c)
B
0B
1B
2B
3A
0A
1A
2A
3
S
3S
2S
1S
0
S
3S
2S
1S
0
ÒignoreÓ 0
c
out
0 0
c
in4-bit adder
4-bit adder
Ò+6
10Ó
S
3S
2S
1S
0
9

6.9.ConsiderthetruthtableinFigure6.1forthecasei=n1,thatis,forthesign
bitposition.Overowoccursonlywhenxn1andyn1arethesameandsn1
isdifferent.Thisoccursinthesecondandseventhrowsofthetable;andcnand
cn1aredifferentonlyinthoserows.Therefore,cncn1isacorrectindicator
ofoverow.
6.10.(a)Theadditionallogicisdenedbythelogicexpressions:
c16=G
II
0+P
II
0c0
c32=G
II
1
+P
II
1
G
II
0
+P
II
1
P
II
0
c0
c48=G
II
2+P
II
2G
II
1+P
II
2P
II
1G
II
0+P
II
2P
II
1P
II
0c0
c64=G
II
3
+P
II
3
G
II
2
+P
II
3
P
II
2
G
II
1
+P
II
3
P
II
2
P
II
1
G
II
0
+P
II
3
P
II
2
P
II
1
P
II
0
c0
Thisadditionallogicisidenticalinformtothelogicinsidethelookaheadcircuit
inFigure6.5.(Notethattheoutputsc16,c32,c48,andc64,producedbythe16-
bitaddersarenotneededbecausethoseoutputsareproducedbytheadditional
logic.)
(b)TheinputsG
II
i
andP
II
i
totheadditionallogicareproducedafter5gate
delays,thesameasthedelayforc16inFigure6.5.Thenalloutputsfromthe
additionallogic,includingc64,areproduced2gatedelayslater,foratotalof7
gatedelays.Thecarryinputc48tothelast16-bitadderisproducedafter7gate
delays.Thenc60intothelast4-bitadderisproducedafter2moregatedelays,
andc63isproducedafteranother2gatedelaysinsidethat4-bitadder.Finally,
afteronemoregatedelay(anXORgate),s63isproducedwithatotalof7+2+
2+1=12gatedelays.
(c)Thevariabless31andc32areproducedafter12and7gatedelays,respec-
tively,inthe64-bitadder.Thesetwovariablesareproducedafter10and7gate
delaysinthe32-bitadder,asshowninSection6.2.1.
10

6.11.(a)EachBcellrequires3gatesasshowninFigure6.4a.Thecarriesc1,c2,
c3,andc4,require2,3,4,and5,gates,respectively;andtheoutputsG
I
0and
P
I
0
require4and1gates,asseenfromthelogicexpressionsinSection6.2.1.
Therefore,atotalof12+19=31gatesarerequiredforthe4-bitadder.
(b)Four4-bitaddersrequire431=124gates,andthecarry-lookaheadlogic
blockrequires19gatesbecauseithasthesamestructureasthelookaheadblock
inFigure6.4.Totalgatecountisthus143.However,weshouldsubtract45=
20gatesfromthistotalcorrespondingtothelogicforc4,c8,c12,andc16,that
isinthe4-bitaddersbutwhichisreplacedbythelookaheadlogicinFigure6.5.
Therefore,totalgatecountforthe16-bitadderis14320=123gates.
6.12.Theworstcasedelaypathisshowninthefollowinggure:
Row 2
Row 3
Row (n-1)
Row n
n cells
EachofthetwoFAblocksinrows2throughn1introduces2gatedelays,for
atotalof4(n2)gatedelays.Rownintroduces2ngatedelays.Addinginthe
initialANDgatedelayforrow1andallothercells,totaldelayis:
4(n2)+2n+1=6n8+1=6(n1)1
11

6.13.Thesolutions,includingdecimalequivalentchecks,are:
_ _
´
(105)
(21)
(5)
(105)
´
000011
001000111
B
A
00101
11100
00011
0
01100
11000
10000
011
001
00111 512
20
1
4
12

6.14.Themultiplicationanddivisionchartsare:
_ _
0
0
0
0
0
0
1
1
000101
000000
add
000000000000
add
000000
000101
1
1
0
0
1
1
shift
shift
add
subtract
shift
3rdcycle
2ndcycle
0
0
1
1
111011
111101
000101
111000
0111100
111011
0000010
Initialcon®guration
000101
00000010101
1010100000
C A Q
0
0
0
0
0
0
0
0
0
0
00101
00010
00010
00001
00110
00011
00011
00001
00110
00011
10101
11010
11010
01101
01101
00110
00110
10011
10011
01001
1stcycle
2ndcycle
3rdcycle
4thcycle
5thcycle
0
00101
M
product
0
0
1
1
0
0
0
0
1
Initialcon®guration
1stcycle
QA
M
A´B:
A/B:
0
0
0
000000
111011
111011
1
1
4thcycle
shift
subtract
0
0 1
1
shift
0
000101addadd
0
0
1
1
0
0
add
110111
000101
111100
000101
000001
5thcycle
quotient
remainder
13

6.15.ARMProgram:
UseR0astheloopcounter.
MOV R1,#0
MOV R0,#32
LOOPTST R2,#1 TestLSBofmultiplier.
ADDNE R1,R3,R1 AddmultiplicandifLSB=1.
MOV R1,R1,RRXShift[R1]and[R2]right
MOV R2,R2,RRXonebitposition,with[C].
SUBS R0,R0,#1 Checkifdone.
BGT LOOP
68000program:
AssumethatD2andD3containthemultiplierandthemultiplicand,respectively.
Thehigh-andlow-orderhalvesoftheproductwillbestoredinD1andD2.Use
D0astheloopcounter.
CLR.L D1
MOVE.B#31,D0
LOOP ANDI.W #1,D2 TestLSBofmultiplier.
BEQ NOADD
ADD.L D3,D1 AddmultiplicandifLSB=1.
NOADDROXR.L#1,D1 Shift[D1]and[D2]right
ROXR.L#1,D2 onebitposition,with[C].
DBRA D0,LOOPCheckifdone.
IA-32Program:
UseregistersEAX,EDX,andEDI,asR1,R2,andR3,respectively,anduse
ECXastheloopcounter.
MOVEAX,0
MOVECX,32
SHR EDX,1 Set[CF]=LSBofmultiplier.
LOOPSTART:JNC NOADD
ADD EAX,EDI AddmultiplicandifLSB=1.
NOADD: RCR EAX,1 Shift[EAX]and[EDX]right
RCR EDX,1 onebitposition,with[CF].
LOOPLOOPSTARTCheckifdone.
14

6.16.ARMProgram:
UsetheregisterassignmentR1,R2,andR0,forthedividend,divisor,andre-
mainder,respectively.Ascomputationproceeds,thequotientwillbeshiftedinto
R1.
MOV R0,#0 ClearR0.
MOV R3,#32 InitializecounterR3.
LOOPMOVS R1,R1,LSL#1Two-registerleft
ADCS R0,R0,R0 shiftofR0andR1
byoneposition.
SUBCCS R0,R0,R2 Implementstep1
ADDCSS R0,R0,R2 ofthealgorithm.
ORRPL R1,R1,#1
SUBS R3,R3,#1 Checkifdone.
BGT LOOP
TST R0,R0 Implementstep2
ADDMI R0,R2,R0 ofthealgorithm.
68000Program:
AssumethatD1andD2containthedividendandthedivisor,respectively.We
willuseD0tostoretheremainder.Ascomputationproceeds,thequotientwill
beshiftedintoD1.
CLR D0 ClearD0.
MOVE.B#15,D3 InitializecounterD3.
LOOP ASL #1,D1 Two-registerleftshiftof
ROXL #1,D0 D0andD1byoneposition.
BCS NEGRM Implementstep1
SUB D2,D0 ofthealgorithm.
BRA SETQ
NEGRM ADD D2,D0
SETQ BMI COUNT
ORI #1,D1
COUNT DBRA D3,LOOPCheckifdone.
TST D0 Implementstep2
BPL DONE ofthealgorithm.
ADD D2,D0
DONE ...
15

IA-32Program:
UsetheregisterassignmentEAX,EBX,andEDX,forthedividend,divisor,and
remainder,respectively.Ascomputationproceeds,thequotientisshiftedinto
EAX.
MOVEDX,0 ClearEDX.
MOVECX,32 InitializecounterECX.
LOOPSTART:SHL EAX,1 Two-registerleft
RCL EDX,1 shiftofEDXandEAX
byoneposition.
JC NEGRM Implementstep1
SUB EDX,EBX ofthealgorithm.
JMP SETQ
NEGRM: ADD EDX,EBX
SETQ: JS COUNT
OR EAX,1
COUNT: LOOPLOOPSTARTCheckifdone.
TESTEDX,EDX Implementstep2
JNS DONE ofthealgorithm.
ADD EDX,EBX
DONE: ...
16

6.17.Themultiplicationanswersare:
_ _
´
-1 -1
-1
00 0
111100
+1
+1
011
+1
110
00
11111
1
11121
1111121
sign
extension
+1
+1
11111
0 00111111111
0111
0
11111101 100
0
111010000
1 1000
0
00110
0
1
0
01 11
0
1 0
00
0
110011
111000
0011 00000000
0000
11
0
011
0
1 110
1
0
01
10
0
011100000000
0111111
0000 011
´
´
´
´
´
´
(d)
(c)
(b)
(a) 010111
110110
101100
110011
011011
´ ´
´
+23
-230
-297
sign
extension
extension
sign
00
1111
000
0
111100011100
11
0
0
11111111
1110000 1
1
10000111000
´
0
15
15
225
-13
260
´
-10
-20
001111
001111
110101
27
-11
-1-1
-1
-1
17

6.18.Themultiplicationanswersare:
_ _
-1
-1
1112111
0001
1111
1
011011
110101
1
1
01011
0
100000000011
01
1
10100000
1011
010111
110110
0101
010010111111
0000101110
10010111
010110001111
0
1 0
101100
110011 1011
0
0000011
10
0 01
000011
0000000 11000
11111111 000
´
´
´
´
11
(a)
(b)
(c)
(d)
11
+2
+2
11 11
001111
001111
00
+1
11111111
11110000
10001110000 0
-2
-1-1
-1-1
18

6.19.BoththeAandMregistersareaugmentedbyonebittothelefttoholdasign
extensionbit.Theadderischangedtoann+1-bitadder.Abitisaddedtothe
rightendoftheQregistertoimplementtheBoothmultiplierrecodingoperation.
Itisinitiallysettozero.Thecontrollogicdecodesthetwobitsattherightendof
theQregisteraccordingtotheBoothalgorithm,asshowninthefollowinglogic
circuit.Therightshiftisanarithmeticrightshiftasindicatedbytherepetition
oftheextendedsignbitattheleftendoftheAregister.(Theonlycasethat
actuallyrequiresthesignextensionbitiswhenthen-bitmultiplicandisthevalue
2
(n1)
;forallotheroperands,theAandMregisterscouldhavebeenn-bit
registersandtheaddercouldhavebeenann-bitadder.)
q
n1Ð
m
n1Ð
n +1
Multiplicand M
Control
sequencer
Multiplier Q
Shift right (Arithmetic)
Register A (initially 0)
a
n1Ð
a
0
q
0
m
0
MUX
Nothing
Add M
Subtract M
Nothing
00
01
10
11
0
m
n
Nothing
Add M
Subtract M
O
M
M
bit
adder
~
~
~
ignore
Nothing
Add M
Subtract M
0
1
a
n
sign
extension
bit
19

6.20(a)
1110 2
1101 3
1110 6
0000
1000
0000
0110
(b)
0010 2
1110 2
0000 4
0100
1000
0000
1100
Thistechniqueworkscorrectlyforthesamereasonthatmodularadditioncan
beusedtoimplementsigned-numberadditioninthe2's-complementrepresenta-
tion,becausemultiplicationcanbeinterpretedasasequenceofadditionsofthe
multiplicandtoshiftedversionsofitself.
20

6.21.Thefour32-bitsubproductsneededtogeneratethe64-bitproductarelabeledA,
B,C,andD,andshownintheirpropershiftedpositionsinthefollowinggure:
A
R
0
R
1
R
2
R
3
R
2 R
0
R
2
R
1
R
3
R
0
R
3 R
1
R
15
R
14
R
13
R
12
B
C
D
X
X
X
X
X
21

The64-bitproductisthesumofA,B,C,andD.Usingregistertransfersand
multiplicationandadditionoperationsexecutedbythearithmeticunitdescribed,
the64-bitproductisgeneratedwithoutusinganyextraregistersbythefollowing
steps:
R12 [R0]
R13 [R2]
R14 [R1]
R15 [R3]
R3 [R14]
R1 [R15]
R13;R12 [R13][R12]
R15;R14 [R15][R14]
R3;R2 [R3][R2]
R1;R0 [R1][R0]
R13 [R2]Add[R13]
R14 [R3]Addwithcarry[R14]
R15 0Addwithcarry[R15]
R13 [R0]Add[R13]
R14 [R1]Addwithcarry[R14]
R15 0Addwithcarry[R15]
Thisproceduredestroystheoriginalcontentsoftheoperandregisters.Steps5
and6resultinswappingthecontentsofR1andR3sothatsubproductsBand
Ccanbecomputedinadjacentregisterpairs.Steps11,12,and13,addthe
subproductBintothe64-bitproductregisters;andsteps14,15,and16,addthe
subproductCintotheseregisters.
22

6.22.(a)TheworstcasedelaypathinFigure6.16aisalongthestaircasepatternthat
includesthetwoFAblocksattherightendofeachofthersttworows(atotal
offourFAblockdelays),followedbythefourFAblocksinthethirdrow.Total
delayistherefore17gatedelays,includingtheinitialANDgatedelaytodevelop
allbitproducts.
InFigure6.16b,theworstcasedelaypathisverticallythroughthersttworows
(atotaloftwoFAblockdelays),followedbythefourFAblocksinthethirdrow
foratotalof13gatedelays,includingtheinitialANDgatedelaytodevelopall
bitproducts.
(b)Botharraysare44cases.
Notethat17istheresultofapplyingtheexpression6(n1)1withn=4for
thearrayinFigure6.16a.
AsimilarexpressionfortheFigure6.16barrayisdevelopedasfollows.The
delaythrough(n2)carry-saverowsofFAblocksis2(n2)gatedelays,
followedby2ngatedelaysalongthenFAblocksofthelastrow,foratotalof
2(n2)+2n+1=4(n1)+1
gatedelays,includingtheinitialANDgatedelaytodevelopallbitproducts.The
answeristhus13,ascomputeddirectlyinPart(a),forthe44case.
6.23.Thenumberofreductionstepsntoreduceksummandsto2isgivenbyk(2=3)
n
=
2,becauseeachstepreduces3summandsto2.Thenwehave:
log
2k+n(log
22log
23)=log
22
log
2k=1+n(log
23log
22)
=1+n(1:591)
n=
(log
2
k)1
0:59
=1:7log
2
k1:7
Thisanswerisonlyanapproximationbecausethenumberofsummandsisnota
multipleof3ineachreductionstep.
23

6.24.(a)SixCSAlevelsareneeded:
1
2
6
5
4
3
(b)EightCSAlevelsareneeded:
1
2
6
5
4
3
8
7
(c)Theapproximationgives5.1and6.8CSAlevels,comparedto6and8from
Parts(a)and(b).
24

6.25.(a)
+1.7001111101101
0.012101000100010
+19010011001100
1/8001100000000
“Rounding”hasbeenusedasthetruncationmethodintheseanswers.
(b)Otherthanexact0andinnity,thesmallestnumbersare1:0000002
14
andthelargestnumbersare1:1111112
15
.
(c)Assumingsign-and-magnitudeformat,thesmallestandlargestintegers(other
than0)are1and(2
11
1);andthesmallestandlargestfractions(otherthan
0)are2
11
andapproximately1.
(d)
A+B=010001000000
AB=010001110110
AB=110010001011
A=B=110000011011
“Rounding”hasbeenusedasthetruncationmethodintheseanswers.
6.26.(a)ShiftthemantissaofBrighttwopositions,andtentativelysettheexponent
ofthesumto100001.Addmantissas:
(A) 1.11111111000
(B) 0.01001010101
10.01001001101
Shiftrightonepositiontoputinnormalizedform:1.001001001101andincrease
exponentofsumto100010.Truncatethemantissatotherightofthebinarypoint
to9bitsbyroundingtoobtain001001010.Theansweris0100010001001010.
(b)
Largest22
31
Smallest12
30
Thisassumesthatthetwoendvalues,63and0intheexcess-31exponent,are
usedtorepresentinnityandexact0,respectively.
25

6.27.LetAandBbetwooating-pointnumbers.First,assumethatSA=SB=0.
IfE
0
A
>E
0
B
,consideredasunsigned8-bitnumbers,thenA>B.IfE
0
A
=E
0
B
,
thenA>BifMA>MB.ThismeansthatA>Bifthe31bitsafterthesign
intherepresentationforAisgreaterthanthe31bitsrepresentingB,whenboth
areconsideredasintegers.Inthelogiccircuitshownbelow,allpossibilitiesfor
thesumbitarealsotakenintoaccount.Inthecircuit,letA=a31a30:::a0and
B=b31b30:::b0bethetwooating-pointnumberstobecompared.
Y= b
31b
30Éb
0X= a
31a
30Éa
0
b
31
a
31
X = YX > Y
A = B
A > B
These two outputs give the floating-point comparison.
If neither of these outputs is 1, then A < B.
32-bit unsigned
integer comparator
6.28.Convertthegivendecimalmantissaintoabinaryoating-pointnumberbyusing
theintegerfacilitiesinthecomputertoimplementtheconversionalgorithmsin
AppendixE.Thiswillyieldaoating-pointnumberfi.Then,usingthecom-
puter'soating-pointfacilities,computefiti,asrequired.
6.29.(0:1)
10
)(0:00011001100:::)
Thesigned,8-bitapproximationstothisdecimalnumberare:
Chopping: (0:1)10=(0:0001100)2
VonNeumannRounding:(0:1)10=(0:0001101)2
Rounding: (0:1)10=(0:0001101)2
26

6.30.ConsiderAB,whereAandBare6-bit(normalized)mantissasofoating-
pointnumbers.Becauseofdifferencesinexponents,Bmustbeshifted6posi-
tionsbeforesubtraction.
A=0:100000
B=0:100001
Aftershifting,wehave:
A=0.100000000
B=0.000000101 stickybit
0.011111011
normalize0.111110110
round0.111111 correctanswer(rounded)
Withonly2guardbits,wewouldhavehad:
A=0.10000000
B=0.00000011
0.01111101
normalize0.11111010
round0.111110
6.31.Thebinaryversionsofthedecimalfractions0:123and0:1arenotexact.
Using3guardbits,withthelastbitbeingthestickybit,thefractions0.123and
0.1arerepresentedas:
0:123=0:00011111
0:1=0:00011001
Thethreerepresentationsforbothfractionsusingeachofthethreetruncation
methodsare:
Chop VonNeumann Round
0:123:Sign-and-magnitude1.00011 1.00011 1.00100
1's-complement 1.11100 1.11100 1.11011
2's-complement 1.11101 1.11101 1.11100
0:1:Sign-and-magnitude1.00011 1.00011 1.00011
1's-complement 1.11100 1.11100 1.11100
2's-complement 1.11101 1.11101 1.11101
27

6.32.Therelevanttruthtableandlogicequationsare:
_ _
S
B8
sS
B8
s
25
s=1
1100
(25
s)
ADD(0)/
SUBTRACT(1)
(AS)
1
1
0
0
1
0
1
0
0
1
0
1
0
0
1
1
0
1
1
1
0
0
1
0
1
0
1
0
1
1
0
signfrom
subtractor
25-bitadder/
0
1
1
1
1
0 0 0
0
0
0 1 0
0
1 0 0
0
0
1 1 0
determineADD/SUB
thesevariables
1 1 1
0
101
0
0 1 1
0 0 1
S
A S
Bsignfrom
0
S
AS
B
0
0
0
01
1
1
10
1
001 1
0 0
00
00
0 0
0
0
d d
dddd
dddd
d d
11
11
11
11
1
1
0000 01111
00
0
0
1
11
1
00
0
11
1
01
00101110
ASS
A ASS
A
25
s=0
1
1
1
1
ADD/SUB=ASÅS
AÅS
B
ADD(0)/
SUBTRACT(1)
(AS)
0
d
d
0
1
d
1
1
d
1
1
1
1
0
d
0
1
d
0
d
0
1
0
d
d
1
1
d
0
1
d
0
0
d
0
0
1
1
0
S
R
0
ADD/
SUB
subtractor
8-bit
S
R=25
sS

A+25

sS
A8

s+ASS

B8
s+AS

S
B8
s
(8
s)
28

6.33.Thelargestthatncanbeis253fornormalvalues.Themantissas,including
theleadingbitof1,are24bitslong.Therefore,theoutputoftheSHIFTER
canbenon-zeroforn23only,ignoringguardbitconsiderations.Letn=
n7n6:::n0,anddeneanenablesignal,EN,asEN=n7n6n5.Thisvari-
ablemustbe1foranyoutputoftheSHIFTERtobenon-zero.Letm=
m23m22:::m0ands23s22:::s0betheSHIFTERinputsandoutputs,respec-
tively.Thelargestnetworkisrequiredforoutputs0,becauseanyofthe24input
bitscouldbeshiftedintothisoutputposition.Deneanintermediatebinaryvec-
tori=i23i22:::i0.WewillrstshiftmintoibasedonENandn4n3.(Then
wewillshiftiintos,basedonn2n1n0.)Onlythepartofineededtogenerates0
willbespecied.
i7=ENn4n3m23+ENn4n3m15+ENn4n3m7
i6=(:::)m22+(:::)m14+(:::)m6
i5=(:::)m21+(:::)m13+(:::)m5
:
:
:
i0=(:::)m16+(:::)m8+(:::)m0
Gateswithfan-inuptoonly4areneededtogeneratethese8signals.Notethat
allbitsofmareinvolved,asclaimed.Wenowgenerates0fromthesesignals
andn2n1n0asfollows:
s0=n2n1n0i7+n2n1n0i6+n2n1n0i5+n2n1n0i4
+n2n1n0i3+n2n1n0i2+n2n1n0i1+n2n1n0i0
Notethatthisrequiresafan-inof8intheORgate,sothat3gateswillbeneeded.
Othersipositionscanbegeneratedinasimilarway.
29

6.34.(a)

A0E«
A6E«
A7E«
B0E«
B6E«
B7
Sign

0E«
6E«
7
(b)TheSWAPnetworkisapairofmultiplexers,eachonesimilarto(a).
6.35.Letm=m24m23:::m0betheoutputoftheadder/subtractor.Theleftmostbit,
m24,istheoverowbitthatcouldresultfromaddition.(Weignorethehandling
ofguardbits.)Deriveaseriesofvariables,zi,asfollows:
z1=m24
z0=
m24m23
z1=m24m23m22
:
:
:
z23=m24m23:::m0
z24=m24m23:::m0
Notethatexactlyoneofthezivariablesisequalto1foranyparticularmvector.
Thenencodethesezivariables,for1i23,intoa6-bitsignalrepresenta-
tionforX,sothatifzi=1,thenX=i.Thevariablez24signieswhetheror
nottheresultingmantissaiszero.
30

6.36.Augmentthe24-bitoperandmagnitudesenteringtheadder/subtractorbyadding
asignbitpositionattheleftend.Subtractionisthenachievedbycomplement-
ingthebottomoperandandperformingaddition.Groupcorrespondingbit-pairs
fromthetwo,signed,25-bitoperandsintosixgroupsoffourbit-pairseach,plus
onebit-pairattheleftend,forpurposesofderivingPiandGifunctions.La-
belthesefunctionsP6,G6,:::,P0,G0,fromleft-to-right,followingthepattern
developedinSection6.2.
Thelookaheadlogicmustgeneratethegroupinputcarriesc0,c4,c8,...,c24,
accountingproperlyforthe“end-aroundcarry”.Thekeyfactisthatacarryci
mayhavethevalue1becauseofageneratecondition(i.e.,someGi=1)ina
higher-ordergroupaswellasinalower-ordergroup.Thisobservationleadsto
thefollowinglogicexpressionsforthecarries:
c0=G6+P6G5+:::+P6P5P4P3P2P1G0
c4=G0+P0G6+P0P6G5+:::+P0P6P5P4P3P2G1
:
:
:
Sincetheoutputofthisadderisin1's-complementform,thesignbitdetermines
whetherornottocomplementtheremainingbitsinordertosendthemagnitude
Montothe“NormalizeandRound”operation.Additionofpositivenumbers
leadingtooverowisavalidresult,asdiscussedinSection6.7.4,andmust
bedistinguishedfromanegativeresultthatmayoccurwhensubtractionisper-
formed.Somelogicattheleft-endsignpositionsolvesthisproblem.
31

Chapter7–BasicProcessingUnit
7.1.TheWMFCstepisneededtosynchronizetheoperationoftheprocessorandthe
mainmemory.
7.2.Datarequestedinstep1arefetchedduringstep2andloadedintoMDRatthe
endofthatclockcycle.Hence,thetotaltimeneededis7cycles.
7.3.Steps2and5willtake2cycleseach.Totaltime=9cycles.
7.4.TheminimumtimerequiredfortransferringdatafromoneregistertoregisterZ
isequaltothepropagationdelay+setuptime
=0.3+2+0.2=2.5ns.
7.5.FortheorganizationofFigure7.1:
(a)1.PCout,MARin,Read,Select4,Add,Zin
2.Zout,PCin,Yin,WMFC
3.MDRout,IRin
4.PCout,MARin,Read,Select4,Add,Zin
5.Zout,PCin,Yin
6.R1out,Yin,WMFC
7.MDRout,SelectY,Add,Zin
8.Zout,R1in,End
(b)1-4.Sameasin(a)
5.Zout,PCin,WMFC
6.MDRout,MARin,Read
7.R1out,Yin,WMFC
8.MDRout,Add,Zin
9.Zout,R1in,End
(c)1-5.Sameasin(b)
6.MDRout,MARin,Read,WMFC
7-10.Sameas6-9in(b)
7.6.Manyapproachesarepossible.Forexample,thethreemachineinstructionsim-
plementedbythecontrolsequencesinpartsa,b,andccanbethoughtofasone
instruction,Add,thathasthreeaddressingmodes,Immediate(Imm),Absolute
(Abs),andIndirect(Ind),respectively.Inordertosimplifythedecoderblock,
hardwaremaybeaddedtoenablethecontrolstepcountertobeconditionally
loadedwithanout-of-sequencenumberatanytime.Thisprovidesa”branch-
ing”facilityinthecontrolsequence.Thethreecontrolsequencesmaynowbe
mergedintoone,asfollows:
1-4.Sameasin(a)
5.Zout,PCin,IfImmbranchto10
1

6.WMFC
7.MDRout,MARin,Read,IfAbsbranchto10
8.WMFC
9.MDRout,MARin,Read
10.R1out,Yin,WMFC
11.MDRout,Add,Zin
12.Zout,R1in,End
Dependingonthedetailsofhardwaretiming,steps,6and7maybecombined.
Similarly,steps8and9maybecombined.
7.7.FollowingthetimingmodelofFigure7.5,steps2and5take16nseach.Hence,
the7-stepsequencetakes42nstocomplete,andtheprocessorisidle28/42=
67%ofthetime.
7.8.Usea4-inputmultiplexerwiththeinputs1,2,4,andY.
7.9.WithreferencetoFigure6.7,thecontrolsequenceneedstogeneratetheShift
rightandAdd/Noadd(multiplexercontrol)signalsandcontrolthenumberof
additions/subtractionsperformed.Assumethatthehardwareisconguredsuch
thatregisterZcanperformthefunctionoftheaccumulator,registerTEMPcan
beusedtoholdthemultiplierandisconnectedtoregisterZforshiftingasshown.
RegisterYwillbeusedtoholdthemultiplicand.Furthermore,themultiplexer
attheinputoftheALUhasthreeinputs,0,4,andY.Tosimplifycounting,a
counterregisterisavailableonthebus.Itisdecrementedbyacontrolsignal
DecrementanditsetsanoutputsignalZeroto1whenitcontainszero.Afacility
toplaceaconstantvalueonthebusisalsoavailable.
Afterfetchingtheinstructionthecontrolsequencecontinuesasfollows:
4.Constant=32,Constantout,Counterin
5.R1out,TEMPin
6.R2out,Yin
7.Zout,ifTEMP0=1thenSelectYelseSelect0,Add,Zin,Decrement
8.Shift,ifZero=0thenBranch7
9.Zout,R2in,End
7.10.Thecontrolstepsare:
1-3.Fetchinstruction(asinFigure7.9)
4.PCout,Offset-eld-of-IRout,Add,IfN=1thenPCin,End
2

7.11.LetSPbethestackpointerregister.Thefollowingsequenceisforaprocessor
thatstoresthereturnaddressonastackinthememory.
1-3.Fetchinstruction(asinFigure7.6)
4.SPout,Select4,Subtract,Zin
5.Zout,SPin,MARin
6.PCout,MDRin,Write,Yin
7.Offset-eld-of-IRout,Add,Zin
8.Zout,PCin,End,WMFC
7.12.1-3.Fetchinstruction(asinFigure7.9)
4.SPoutB,Select4,Subtract,SPin,MARin
5.PCout,R=B,MDRin,Write
6.Offset-eld-of-IRout,PCout,Add,PCin,WMFC,End
7.13.ThelatchinFigureA.27cannotbeusedtoimplementaregisterthatcanbeboth
thesourceandthedestinationofadatatransferoperation.Forexample,itcannot
beusedtoimplementregisterZinFigure7.1.Itmaybeusedinotherregisters,
providedthatholdtimerequirementsaremet.
7.14.Thepresenceofagateattheclockinputofaip-opintroducesclockskew.
Thismeansthatclockedgesdonotreachallip-opsatthesametime.For
example,considertwoip-opsAandB,withoutputQAconnectedtoinput
DB.AclockedgeloadsnewdataintoA,andthenextclockedgetransfersthese
datatoB.However,ifclockBisdelayed,thenewdataloadedintoAmayreach
BbeforetheclockandbeloadedintoBoneclockperiodtooearly.
ClockA
QA
ClockB
ClockA ClockB
QA QB
skew
Intheabsenceofclockskew,ip-opBrecordsa0attherstclockedge.
However,ifClockBisdelayedasshown,theip-oprecordsa1.
3

7.15.AddalatchsimilartothatinFigureA.27ateachofthetworegisterleoutputs.
AreadoperationisperformedintheRAMinthersthalfofaclockcycleand
thelatchinputsareenabledatthattime.Thedatareadenterthetwolatchesand
appearonthetwobusesimmediately.Duringthesecondphaseoftheclockthe
latchinputsaredisabled,lockingthedatain.Hence,thedatareadwillcontinue
tobeavailableonthebuseseveniftheoutputsoftheRAMchange.TheRAM
performsawriteoperationduringthisphasetorecordtheresultsofthedata
transfer.
Bus A Bus CBus B
Enable
in
RAM
WriteRead
Clock
Read
Write
Enable
in
7.16.ThestepcounteradvancesattheendofaclockperiodinwhichRunisequal
to1.WithreferencetoFigure7.5,Runshouldbesetto0duringtherstclock
cycleofstep2andsetto1assoonasMFCisreceived.Ingeneral,Runshould
besetto0byWMFCandreturnedto1whenMFCisreceived.Toaccountfor
thepossibilitythatamemoryoperationmayhavebeenalreadycompletedby
thetimeWMFCisissued,Runshouldbesetto0onlyiftherequestedmemory
operationisstillinprogress.Astatemachinethatcontrolsbusoperationand
generatestherunsignalisgivenbelow.
Write Read
MFC MFC
A BC
Run = WNFC
×(B + C)
4

7.17.ThefollowingcircuitusesamultiplexerarrangementsimilartothatinFigure
7.3.
00
01
10
Clock
MR
0
1
QD
7.18.Apossiblearrangementisshownbelow.Forclarity,wehaveassumedthatMDR
consistsoftwoseparateregistersforinputandoutputdata.MultiplexersMux-1
andMux-2selectinputBforevenandinputAforoddbyteoperations.Mux
3selectsinputAforwordoperationsandinputBforbyteoperations.Input
Bprovideseitherzeroextensionorsignextensionofbyteoperands.Forsign-
extensionitshouldbeconnectedtothemost-signicantbitoutputofmultiplexer
Mux-2.
MDR
H (in) MDR
L (in) MDR
H (out) MDR
L (out)
AB
Mux 3
AB
Mux 2
Sign ext.
Zero or
AB
Mux 1
Memory bus
7.19.Usethedelayelementinaringoscillatorasshownbelow.Thefrequencyof
oscillationis1/(2T).Byaddingthecontrolcircuitshown,theoscillatorwillrun
onlywhileRunisequalto1.Whenstopped,itsoutputAisequalto0.The
oscillatorwillalwaysgeneratecompleteoutputpulses.IfRungoesto0whileA
is1,thelatchwillnotchangestateuntilBgoesto1attheendofthepulse.
5

DelayT
DelayT
Ring oscillator
Ring oscillator
with run/stop control
Run
Output
Output
7.20.Inthecircuitbelow,Enableisequalto1wheneverShort/
Longisequalto1,
indicatingashortpulse.Whenthislinechangesto0,Enablechangesto0for
oneclockcycle.
D Q
Enable
Short/Long
Clock
D
Q
Enable
Short/Long
Clock
Short/LongQD
0 01
0 10
1 00
1 10
6

7.21.(a)Countsequenceis:000010001100111011110111001100010000
(b)A5-bitJohnsoncounterisshownbelow,withtheoutputsQ1toQ5decoded
togeneratethesignalsT1toT10.Thefeedbackcircuithasbeenmodiedto
makethecounterself-starting.Itimplementsthefunction
D1=Q5+Q3+Q4
ThiscircuitdetectsstatesthathaveQ3Q4Q5=010andchangesthefeedback
valuefrom1to0.Withoutthisorasimilarmodicationtothefeedbackcircuit,
thecountermaybestuckinsequencesotherthanthedesiredoneabove.
TheadvantageofaJohnsoncounteristhattherearenoglitchesindecodingthe
countvaluetogeneratethetimingsignals.
D Q D Q D Q D QD Q
T
5
T
6
T
7
T
8
T
9
T
4T
3T
2T
1T
0
7.22.WewillgenerateasignalcalledStoretorecirculatedatawhennoexternalaction
isrequired.
Store=(ARS+LSR+SL+LLD)
D15=ASRQ15+SLQ14+RORCarry+LDD15+StoreQ15
D1 =(ASR+LSR+ROR)Q2+SLQ0+LDD1+StoreQ1
D0 =(ASR+LSR+ROR)Q1+LDD0++StoreQ0
7

7.23.Astatediagramfortherequiredcontrollerisgivenbelow.ThisisaMoore
machine.Theoutputvaluesaregiveninsideeachstateastheyarefunctionsof
thestateonly.
Sincethereare6independentstates,aminimumofthreeip-opsr,s,andt
arerequiredfortheimplementation.Apossiblestateassignmentisshowninthe
diagram.IthasbeenchosentosimplifythegenerationoftheoutputsX,Y,and
Z,whicharegivenby
X=r+s+tY=sZ=t
UsingDip-opsforimplementationofthecontroller,therequiredinputstothe
ip-opsmaybegeneratedasfollows
D(r)=stB+st
D(s)=stA+stB
D(t)=stB+stA+stB
S0
000
S0
100
S0
110
S0
101
S0
001
S0
111
Initialization
rst
A
A
B
B
8

7.24.Microroutine:
AddressMicroinstruction
(Octal)
000-002SameasinFigure7.21
300 BranchfPC 161
161 PCout,MARin,Read,Select4,Add,Zin
162 Zout,PCin,WMFC
163 MDRout,Yin
164 Rsrcout,SelectY,Add,Zin
165 Zout,MARin,Read
166 BranchfPC 170;PC0 [IR8]g,WMFC
170-173SameasinFigure7.21
7.25.Conditionalbranch
AddressMicroinstruction
(Octal)
000-002SameasinFigure7.21
003 BranchfPC 300
300 ifZ+(NV=1thenBranchfPC 304g
301 PCout,Yin
302 Addressout,SelectY,Add,Zin
303 Zout,PCin,End
7.26.Assumemicroroutinestartsat300forallthreeinstructions.(Altenatively,the
instructiondecodermaybranchto302directlyinthecaseofanunconditional
branchinstruction.)
AddressMicroinstruction
(Octal)
000-002SameasinFigure7.21
003 BranchfPC 300g
300 ifZ+(NV=1)thenBranchfPC 000g
301 if(N=1)thenBranchfPC 000g
302 PCout,Yin
303 Offset-eld-of-IRout,SelectY,Add,Zin
304 Zout,PCin,End
9

7.27.Theanswertoproblem3.26holdsinthiscaseaswell,withtherestrictionthat
oneoftheoperandlocations(eithersourceordestination)mustbeadataregister.
AddressMicroinstruction
(Octal)
000-002SameasinFigure7.21
003 BranchfPC 010g
010 if(IR108=000)thenBranchfPC 101g
011 if(IR108=001)thenBranchfPC 111g
012 if(IR109=01)thenBranchfPC 121g
013 if(IR109=10)thenBranchfPC 141g
014 BranchfPC 161g
121 Rsrcout,MARin,Read,Select4,Add,Zin
122 Zout,Rsrcin
123 if(IR8=1)thenBranchfPC 171g
124 BranchfPC 170g
170-173SameasinFigure7.21
7.28.ThereisnochangefortheveaddressmodesinFigure7.20.AbsoluteandIm-
mediatemodesrequireaseparatepath.However,somesharingmaybepossible
amongabsolute,immediate,andindexed,asallthreemodesreadthewordfol-
lowingtheinstruction.Also,FullIndexedmodeneedstobeimplementedby
addingthecontentsofthesecondregistertogeneratetheeffectiveaddress.After
eachmemoryaccess,theprogramcountershouldbeupdatedby2,ratherthan4,
inthecaseofthe16-bitprocessor.
7.29.Thesamegeneralstructurecanbeused.Sincethedstoperandcanbespecied
inanyoftheveaddressingmodesasthesrcoperand,itisnecessarytorepli-
catethemicroinstructionsthatdeterminetheeffectiveaddressofanoperand.At
microinstruction172,thesourceoperandshouldplacedinatemporaryregister
andanothertreeofmicroinstructionsshouldbeenteredtofetchthedestination
operand.
10

7.30.(a)Apossibleaddressassignmentisasfollows.
AddressMicroinstruction
0000 A
0001 B
0010 if(b6b5)=00)thenBranch0111
0011 if(b6b5)=01)thenBranch1010
0100 if(b6b5)=10)thenBranch1100
0101 I
0110 Branch1111
0111 C
1000 D
1001 Branch1111
1010 E
1011 Branch1111
1100 F
1101 G
1110 H
1111 J
(b)Assumethatbitsb65ofIRareORedintobitPC32
AddressMicroinstruction
0000 A
0001 B;PC32 b65
0010 C
0011 D
0100 Branch1111
0101 E
0110 Branch1111
0111 F
1011 G
1100 H
1101 Branch1111
1110 I
1111 J
11

(c)
Address Microinstruction
Nextaddress Function
0000 0001 A
0001 0010 B;PC32 b65
0010 0011 C
0011 1111 D
0110 1111 E
1010 1011 F
1011 1100 G
1100 1111 H
1110 1111 I
1111 – J
7.31.PuttheYincontrolsignalasthefourthsignalinF5,toreduceF3byonebit.
CombineeldsF6,F7,andF8intoasingle2-biteldthatrepresents:
00:Select4
01:SelectY
10:WMFC
11:End
Combiningsignalsmeansthattheycannotbeissuedinthesamemicroinstruc-
tion.
7.32.Toreducethenumberofbits,weshoulduselargereldsthatspecifymoresig-
nals.This,inevitably,leadstofewerchoicesinwhatsignalscanbeactivatedat
thesametime.Thechoiceastowhichsignalscanbecombinedshouldtakeinto
accountwhatsignalsarelikelytobeneededinagivenstep.
Onewaytoprovideexibilityistodenecontrolsignalsthatperformmultiple
functions.Forexample,wheneverMARisloaded,itislikelythatareadcom-
mandshouldbeissued.Wecanusetwosignals:MARinandMARinRead.We
activatethesecondonewhenareadcommandistobeissued.Similarly,Zinis
alwaysaccompaniedbyeitherSelectYorSelect4.Hence,insteadofthesethree
signals,wecanuseZinSelect4andZinSelectY.
Apossible12-bitencodingusesthree4-biteldsFA,FB,andFC,whichcom-
binesignalsfromFigure7.19asfollows:
FA:F1plus,ZoutEnd,ZoutWMFC.(11signals)
FB:F2,F3,InsteadofZin,MARin,andMDRinuseZinSelect4,ZinSelectY,
MARin,MARinRead,andMDRinWrite.(13signals)
FC:F4(16signals)
12

Withthesechoices,step5inFigure7.6mustbesplitintotwosteps,leadingto
an8-stepsequence.Figure7.7remainsunchanged.
7.33.Figure7.8containstwobuses,AandB,oneconnectedtoeachofthetwoinputs
oftheALU.Therefore,twoeldsareneededinsteadofF1;oneeldtoprovide
gatingofregistersontobusA,andanotherontobusB.
7.34.Horizontalmicroinstructionsarelonger.Hence,theyrequirealargermicropro-
grammemory.Averticalorganizationrequiresmoreencodinganddecoding
ofsignals,hencelongerdelays,andleadstolongermicroprogramsandslower
operation.Withthehigh-densityoftoday'sintegratedcircuits,theverticalorga-
nizationisnolongerjustied.
7.35.Themainadvantageofhardwiredcontrolisfastoperation.Thedisadvantages
include:highercost,inexibilitywhenchangesoradditionsaretobemade,and
longertimerequiredtodesignandimplementsuchunits.
Microprogrammedcontrolischaracterizedbylowcostandhighexibility.Lower
speedofoperationbecomesaprobleminhigh-performancecomputers.
13

Chapter8–Pipelining
8.1.(

)Theoperationperformedineachstepandtheoperandsinvolvedareasgiven
inthegurebelow.
Fetch
Decode,
20, 2000
Add
Fetch
Decode,
3, 50
Mul
Fetch
Decode,
$3A, 50
And
Fetch
Decode,
2000, 50
Add
R1¬
2020
R3¬
150
R4¬
50
R5¬
2050
Clock cycle 1 2 3 4 5 6 7
I
2
: Mul
I
3
: And
I
4
: Add
I
1
: Add
Instruction
(

)
Clockcycle2 3 4 5
BufferB1 Addinstruction
(I
)
Mulinstruction
(I
)
Andinstruction
(I
)
Addinstruction
(I
)
BufferB2 Information
fromaprevious
instruction
DecodedI

Source
operands:
20,2000
DecodedI

Source
operands:
3,50
DecodedI

Source
operands:
$3A,50
BufferB3 Information
fromaprevious
instruction
Information
fromaprevious
instruction
ResultofI
:
2020
Destination

R1
ResultofI
:
150
Destination

R3
1

8.2.(

)
Fetch
Decode,
20, 2000
Add
Fetch
Decode,
3, 50
Mul
Fetch
Decode,
$3A, 2020
And
Fetch
Decode,
2000, 50
Add
R1¬
2020
R3¬
150
R4¬
32
R5¬
2050
Clock cycle 1 2 3 4 5 6 7
Mul
And
Add
Add
Instruction
$3A, ?
(

)Cycles2to4arethesameasinP8.1,butcontentsofR1arenotavailable
untilcycle5.Incycle5,B1andB2havethesamecontentsasincycle4.B3
containstheresultofthemultiplyinstruction.
8.3.StepD
maybeabandoned,toberepeatedincycle5,asshownbelow.But,
instructionI
mustremaininbufferB1.ForI toproceed,bufferB1mustbe
capableofholdingtwoinstructions.ThedecodestepforI
hastobedelayedas
shown,assumingthatonlyoneinstructioncanbedecodedatatime.
F
1
E
1
Clock cycle 1 2 3 4 5 6 7
I
2
(Add)
I
3
I
4
I
1
(Mul)
Instruction
D
1
W
1
F
2
E
2
D
2
W
2
F
3
E
3
D
3
W
3
F
4
E
4
D
4
W
4
D
2
8
2

8.4.Ifalldecodeandexecutestagescanhandletwoinstructionsatatime,onlyin-
structionI
isdelayed,asshownbelow.Inthiscase,allbuffersmustbecapable
ofholdinginformationfortwoinstructions.NotethatcompletinginstructionI

beforeI
couldcauseproblems.SeeSection8.6.1.
F
1
E
1
Clock cycle 1 2 3 4 5 6 7
I
2
(Add)
I
3
I
4
I
1
(Mul)
Instruction
D 1
W
1
F
2
E
2
D
2
W
2
F
3
E
3
D
3
W
3
F
4
E
4
D
4
W
4
8.5.Executionproceedsasfollows.
F
1
E
1
Clock cycle 1 2 3 4 5 6 7
I
2
I
3
I
4
I
1
Instruction
D
1
W
1
F
2
E
2
D
2
W
2
E
3
D
3
W
3
F
4
E
4
D
4
W
4
F
3
98
8.6.Theinstructionimmediatelyprecedingthebranchshouldbeplacedafterthe
branch.
LOOPInstruction1 LOOPInstruction1

Instruction
Instruction

Instruction ConditionalBranchLOOP
ConditionalBranchLOOP Instruction

Thisreorganizationispossibleonlyifthebranchinstructiondoesnotdependon
instruction
.
3

8.7.TheUltraSPARCarrangementisadvantageouswhenthebranchinstructionisat
theendoftheloopanditispossibletomoveoneinstructionfromthebodyof
theloopintothedelayslot.Thealternativearrangementisadvantageouswhen
thebranchinstructionisatthebeginningoftheloop.
8.8.Theinstructionexecutedonaspeculativebasisshouldbeonethatislikelytobe
thecorrectchoicemostoften.Thus,theconditionalbranchshouldbeplacedat
theendoftheloop,withaninstructionfromthebodyoftheloopmovedtothe
delayslotifpossible.Alternatively,acopyoftherstinstructionintheloop
bodycanbeplacedinthedelayslotandthebranchaddresschangedtothatof
thesecondinstructionintheloop.
8.9.Therstbranch(BLE)hastobefollowedbyaNOPinstructioninthedelayslot,
becausenoneoftheinstructionsarounditcanbemoved.Theinnerandouter
loopcontrolscanbeadjustedasshownbelow.Therstinstructionintheouter
loopisduplicatedinthedelayslotfollowingBLE.Itwillbeexecutedonemore
timethanintheoriginalprogram,changingthevalueleftinR3.However,this
shouldcausenodifcultyprovidedthecontentsofR3arenotneededoncethe
sortiscompleted.Themodiedprogramisasfollows:
ADD R0,LIST,R3
ADD R0,N,R1
SUB R1,1,R1
SUB R1,1,R2
OUTER LDUB [R3+R1],R5GetLIST(j)
LDUB [R3+R2],R6GetLIST(k)
INNER SUB R6,R5,R0
BLE,ptNEXT
SUB R2,1,R2 k
k 1
STUB R5,[R3+R2]
STUB R6,[R3+R1]
OR R0,R6,R5
NEXT BGE,pt,aINNER
LDUB [R3+R2],R6GetLIST(k)
SUB R1,1,R1
BGT,ptOUTER
SUB R1,1,R2
4

8.10.Withoutconditionalinstructions:
CompareA,B CheckA
B
Branch
0Action1
Action2... ... Oneormoreinstructions
Branch Next
Action1... ... Oneormoreinstructions
Next ...
Ifconditionalinstructionsareavailable,wecanuse:
CompareA,BCheckA
B
... ...Action1instruction(s),conditional
... ...Action2instruction(s),conditional
Next...
Inthesecondcase,allAction1andAction2instructionsmustbefetchedand
decodedtodeterminewhethertheyaretobeexecuted.Hence,thisapproachis
benecialonlyifeachactionconsistsofoneortwoinstructions.
F
1
Clock cycle 1 2 3 4 5 6
Branch>0
É
Branch
Compare
Instruction
E
1
É
Next
F
2E
2
F
3E
3
F
4E
4
F
6E

A,B
Action1
Next
Without conditional instructions
If >0 then action1
If£0then action2
CompareA,B
NEXTÉ
F
1E
1
F
2E
2
F
3E
3
F
4E
4
With conditional instructions
Action2
Action1
5

8.11.Buffercontentswillbeasshownbelow.
RSLT
Cycle No.
Clock
198 130 260
3 4 5
ALU Operation + Shift O
3
R3 45 130 260
8.12.UsingLoadandStoreinstructions,theprogrammayberevisedasfollows:
INSERTIONTest RHEAD
Branch
0HEAD
Move RNEWREC,RHEAD
Return
HEAD Load RTEMP1,(RHEAD)
Load RTEMP2,(RNEWREC)
CompareRTEMP1,RTEMP2
Branch
0SEARCH
Store RHEAD,4(RNEWREC)
Move RNEWREC,RHEAD
Return
SEARCH Move RHEAD,RCURRENT
LOOP Load RNEXT,4(RCURRENT)
Test RNEXT
Branch=0TAIL
Load RTEMP1,(RNEXT)
Load RTEMP2,(RNEWREC)
CompareRTEMP1,RTEMP2
Branch
0INSERT
Move RNEXT,RCURRENT
Branch LOOP
INSERT Store RNEXT,4(RNEWREC)
TAIL Store RNEWREC,4(RCURRENT)
Return
Thisprogramcontainsmanydependenciesandbranchinstructions.Therevery
fewpossibilitiesforinstructionreordering.Thecriticalpartwhereoptimization
shouldbeattemptedistheloop.Giventhatnoinformationisavailableonbranch
behaviorordelayslots,theonlyoptimizationpossibleistoseparateinstructions
thatdependoneach.Thiswouldreducetheprobabilityofstallingthepipeline.
Theloopmaybereorganizedasfollows.
6

LOOP Load RNEXT,4(RCURRENT)
Load RTEMP2,(RNEWREC)
Test RNEXT
Load RTEMP1,(RNEXT)
Branch=0TAIL
CompareRTEMP1,RTEMP2
Branch
0INSERT
Move RNEXT,RCURRENT
Branch LOOP
INSERTStore RNEXT,4(RNEWREC)
TAIL Store RNEWREC,4(RCURRENT)
Return
NotethatwehaveassumedthattheLoadinstructiondoesnotaffectthecondition
codeags.
8.13.Becauseofbranchinstructions,120clockcyclesareneededtoexecute100pro-
graminstructionswhendelayslotsarenotused.Usingthedelayslotswillelim-
inate0.85oftheidlecycles.Thus,theimprovementisgivenby:








Thatis,instructionthroughputwillincreaseby8.1%.
8.14.Numberofcyclesneededtoexecute100instructions:
Withoutoptimization 140
Withoptimization(
!"#$

#%$

) 127
Thus,throughputimprovementis
!'&(*)+,

,or10.2%
8.15.Throughputimprovementduetopipeliningis
,where
isthenumberofpipeline
stages.
Numberofcyclesneeded
toexecuteoneinstruction:
Throughput
4-stage:



$

-,

! 4/1.043.85
6-stage:


!#



.#



$

-

/ 6/1.19
5.04
Thus,the6-stagepipelineleadstohigherperformance.
7

8.16.Fora“dowhile”loop,theterminationconditionistestedatthebeginningofthe
loop.Aconditionalbranchatthatlocationwillbetakenwhenexitingtheloop.
Hence,itshouldbepredictednottaken.Thatis,thestatemachineshouldbe
startedinthestateLNT,unlesstheloopisnotlikelytobeexecutedatall.
A“dountil”loopisexecutedatleastonce,andthebranchconditionistested
attheendoftheloop.Assumingthattheloopislikelytobeexecutedseveral
times,thebranchshouldbepredictedtaken.Thatis,thestatemachineshouldbe
startedinstateLT.
8.17.Aninstructionfetchedincycle
0reachestheheadofthequeueandentersthe
decodestageincycle
02143 .AssumethattheinstructionprecedingI isdecoded
andinstructionI
5isfetchedincycle1.ThisleadstoinstructionsI toI 5beingin
thequeueatthebeginningofcycle2.Executionwouldthenproceedasshown
below.
Notethatthequeueisalwaysfull,becauseatmostoneinstructionisdispatched
anduptotwoinstructionsarefetchedinanygivencycle.Undertheseconditions,
thequeuelengthwoulddropbelow6onlyinthecaseofacachemiss.
X
D
1E
1E
1E
1W
1
W
3E
3
I
5
(Branch)
I
1
D
2
1 2 3 4 5 6 7 8 9Clock cycle
E
2W
2
D
3
E
4
D
4
W
4
D
5
F
6
F
k
D
k
E
k
F
k+1 D
k+1
I
2
I
3
I
4
I
6
I
k
I
k+1
W
k
E
k+1
10
6 6 6 6 6 6 6 6 6Queue length 6
Time
É
É
8

Chapter9–EmbeddedSystems
9.1.Connectcharacterinputtotheserialportandthe7-segmentdisplayunittopar-
allelportA.Connectbits

to

tothedisplaysegments
to

,respectively.UsethesegmentencodingshowninFigureA.37.Forexample,
thedecimaldigit0setsthesegments
,
,...,

tothehexpattern7E.
AsuitableprogrammayuseatabletoconverttheASCIIcharactersintothehex
patternsforthedisplay.TheASCII-encodeddigits(seeTableE.2)arerepre-
sentedbythepattern111inbitpositions

andthecorrespondingBCDvalue
(seeTableE.1)inbitpositions

.Hence,extractingthebits
fromthe
ASCIIcodeprovidesanindex,
,whichcanbeusedtoaccesstherequiredentry
intheconversiontable(list).Apossibleprogramisisobtainedbymodifyingthe
programinFigure9.11asfollows:
/*Deneregisteraddresses*/
#deneRBUF(volatilechar*)0xFFFFFFE0
#deneSSTAT(volatilechar*)0xFFFFFFE2
#denePAOUT(char*)0xFFFFFFF1
#denePADIR(char*)0xFFFFFFF2
voidmain()

/*Denetheconversiontable*/
charseg7[10]=

0x7E,0x30,0x6C,0x79,0x33,0x5B,0x5F,0x30,0x3F,0x3B
;
charj;
/*Initializetheparallelport*/
*PADIR=0xFF; /*CongurePortAasoutput*/
/*Transferthecharacters*/
while(1)

/*Inniteloop*/
while((*SSTAT&0x1)==0);/*Waitforanewcharacter*/
j=*RBUF&0xF; /*ExtracttheBCDvalue*/
*PAOUT=seg7[j]; /*Sendthe7-segmentcodetoPortA*/


1

9.2.ThearrangementexplainedinthesolutionforProblem9.1canbeused.Theen-
triesintheconversiontablecanbeaccessedusingtheindexedaddressingmode.
LetthetableoccupytenbytesstartingataddressSEG7.Then,usingregister
R0astheindexregister,thetableisaccessedusingthemodeSEG7(R0).The
desiredprogrammaybeobtainedbymodifyingtheprograminFigure9.10as
follows:
RBUF EQU $FFFFFFE0 Receivebuffer.
SSTATEQU $FFFFFFE2 Statusregisterforserialinterface.
PAOUTEQU $FFFFFFF1 PortAoutputdata.
PADIREQU $FFFFFFF2 PortAdirectionregister.
*Denetheconversiontable
ORIGIN $200
SEG7 DataByte$7E,$30,$6C,$79,$33,$5B,$5F,$30,$3F,$3B
*Initialization
ORIGIN $1000
MoveByte#$FF,PADIR CongurePortAasoutput.
*Transferthecharacters
LOOP Testbit#0,SSTAT Checkifnewcharacterisready.
Branch=0LOOP
MoveByteRBUF,R0 TransferacharactertoR0.
And #$F,R0 ExtracttheBCDvalue.
MoveByteSEG7(R0),PAOUTSendthe7-segmentcodetoPortA.
Branch LOOP
2

9.3.ThearrangementexplainedinthesolutionforProblem9.1canbeused.The
desiredprogrammaybeobtainedbymodifyingtheprograminFigure9.16as
follows:
/*Deneregisteraddresses*/
#deneRBUF(volatilechar*)0xFFFFFFE0
#deneSCONT(char*)0xFFFFFFE3
#denePAOUT(char*)0xFFFFFFF1
#denePADIR(char*)0xFFFFFFF2
#deneintaddr(int*)(0x24)
voidintserv();
voidmain()

/*Denetheconversiontable*/
charseg7[10]=

0x7E,0x30,0x6C,0x79,0x33,0x5B,0x5F,0x30,0x3F,0x3B ;
charj;
/*Initializetheparallelport*/
*PADIR=0xFF; /*CongurePortAasoutput*/
/*Initializetheinterruptmechanism*/
int
addr=&intserv; /*Setinterruptvector*/
asm(”Move#0x40,%PSR”);/*ProcessorrespondstoIRQinterrupts*/
*SCONT=0x10; /*Enablereceiverinterrupts*/
/*Transferthecharacters*/
while(1); /*Inniteloop*/

/*Interruptserviceroutine*/
voidintserv()

j=*RBUF&0xF; /*ExtracttheBCDvalue*/
*PAOUT=seg7[j]; /*Sendthe7-segmentcodetoPortA*/
asm(”ReturnI”); /*Returnfrominterrupt*/

3

9.4.ThearrangementexplainedinthesolutionsforProblems9.1and9.2canbeused.
ThedesiredprogrammaybeobtainedbymodifyingtheprograminFigure9.14
asfollows:
RBUF EQU $FFFFFFE0 Receivebuffer.
SCONT EQU $FFFFFFE3 Controlregisterforserialinterface.
PAOUT EQU $FFFFFFF1 PortAoutputdata.
PADIR EQU $FFFFFFF2 PortAdirectionregister.
*Denetheconversiontable
ORIGIN $200
SEG7 DataByte$7E,$30,$6C,$79,$33,$5B,$5F,$30,$3F,$3B
*Initialization
ORIGIN $1000
MoveByte#$FF,PADIR CongurePortAasoutput.
Move #INTSERV,$24 Settheinterruptvector.
Move #$40,PSR ProcessorrespondstoIRQinterrupts.
MoveByte#$10,SCONT Enablereceiverinterrupts.
*Transferloop
LOOP Branch LOOP Innitewaitloop.
*Interruptserviceroutine
INTSERVMoveByteRBUF,R0 TransferacharactertoR0.
And #$F,R0 ExtracttheBCDvalue.
MoveByteSEG7(R0),PAOUTSendthe7-segmentcodetoPortA.
ReturnI Returnfrominterrupt.
4

9.5.ThearrangementexplainedinthesolutionforProblem9.1canbeused,having
the7-segmentdisplaysconnectedtoPortsAandB.Upondetectingthecharacter
H,therstdigithastobesavedanddisplayedonlywhentheseconddigitarrives.
ThedesiredprogrammaybeobtainedbymodifyingtheprograminFigure9.11
asfollows:
/*Deneregisteraddresses*/
#deneRBUF(volatilechar*)0xFFFFFFE0
#deneSSTAT(volatilechar*)0xFFFFFFE2
#denePAOUT(char*)0xFFFFFFF1
#denePADIR(char*)0xFFFFFFF2
#denePBOUT(char*)0xFFFFFFF4
#denePBDIR(char*)0xFFFFFFF5
voidmain()

/*Denetheconversiontable*/
charseg7[10]=

0x7E,0x30,0x6C,0x79,0x33,0x5B,0x5F,0x30,0x3F,0x3B ;
charj,temp;
/*Initializetheparallelports*/
*PADIR=0xFF; /*CongurePortAasoutput*/
*PBDIR=0xFF; /*CongurePortBasoutput*/
/*Transferthecharacters*/
while(1)

/*Inniteloop*/
while((*SSTAT&0x1)==0);/*Waitforanewcharacter*/
if(*RBUF=='H')

while((*SSTAT&0x1)==0);/*Waitfortherstdigit*/
j=*RBUF&0xF; /*ExtracttheBCDvalue*/
temp=seg7[j]; /*Prepare7-segmentcodeforPortA*/
while((*SSTAT&0x1)==0);/*Waitfortheseconddigit*/
j=*RBUF&0xF; /*ExtracttheBCDvalue*/
*PBOUT=seg7[j]; /*Sendthe7-segmentcodetoPortB*/
*PAOUT=temp; /*Sendthe7-segmentcodetoPortA*/



5

9.6.ThearrangementexplainedinthesolutionsforProblems9.1and9.2canbeused,
havingthe7-segmentdisplaysconnectedtoPortsAandB.Upondetectingthe
characterH,therstdigithastobesavedanddisplayedonlywhenthesecond
digitarrives.Thedesiredprogrammaybeobtainedbymodifyingtheprogram
inFigure9.10asfollows:
RBUF EQU $FFFFFFE0 Receivebuffer.
SSTATEQU $FFFFFFE2 Statusregisterforserialinterface.
PAOUTEQU $FFFFFFF1 PortAoutputdata.
PADIR EQU $FFFFFFF2 PortAdirectionregister.
PBOUTEQU $FFFFFFF4 PortBoutputdata.
PBDIREQU $FFFFFFF5 PortBdirectionregister.
*Denetheconversiontable
ORIGIN $200
SEG7 DataByte$7E,$30,$6C,$79,$33,$5B,$5F,$30,$3F,$3B
*Initialization
ORIGIN $1000
MoveByte#$FF,PADIR CongurePortAasoutput.
MoveByte#$FF,PBDIR CongurePortBasoutput.
*Transferthecharacters
LOOP Testbit#0,SSTAT Checkifnewcharacterisready.
Branch=0LOOP
MoveByteRBUF,R0 Readthecharacter.
Compare#$48,R0 CheckifH.
Branch

0LOOP
LOOP2Testbit#0,SSTAT Checkifrstdigitisready.
Branch=0LOOP2
MoveByteRBUF,R0 Readtherstdigit.
And #$F,R0 ExtracttheBCDvalue.
LOOP3Testbit#0,SSTAT Checkifseconddigitisready.
Branch=0LOOP3
MoveByteRBUF,R1 Readtheseconddigit.
And #$F,R1 ExtracttheBCDvalue.
MoveByteSEG7(R1),PBOUTSendthe7-segmentcodetoPortB.
MoveByteSEG7(R0),PAOUTSendthe7-segmentcodetoPortA.
Branch LOOP
6

9.7.ThearrangementexplainedinthesolutionforProblem9.1canbeused,having
the7-segmentdisplaysconnectedtoPortsAandB.Upondetectingthecharacter
H,therstdigithastobestoredanddisplayedonlywhentheseconddigitarrives.
InterruptsareusedtodetectthearrivalofbothHandthesubsequentpairof
digits.Therefore,theinterruptserviceroutinehastokeeptrackofthereceived
characters.Variable
issetto2whenanHisdetected,anditisdecrementedas
thesubsequentdigitsarrive.Thedesiredprogrammaybeobtainedbymodifying
theprograminFigure9.16asfollows:
/*Deneregisteraddresses*/
#deneRBUF(volatilechar*)0xFFFFFFE0
#deneSCONT(char*)0xFFFFFFE3
#denePAOUT(char*)0xFFFFFFF1
#denePADIR(char*)0xFFFFFFF2
#denePBOUT(char*)0xFFFFFFF4
#denePBDIR(char*)0xFFFFFFF5
#deneint
addr(int*)(0x24)
voidintserv();
voidmain()

/*Denetheconversiontable*/
charseg7[10]=

0x7E,0x30,0x6C,0x79,0x33,0x5B,0x5F,0x30,0x3F,0x3B
;
charj;
chardigits[2]; /*BufferforreceivedBCDdigits*/
intk=0; /*SetuptodetecttherstH*/
/*Initializetheparallelports*/
*PADIR=0xFF; /*CongurePortAasoutput*/
*PBDIR=0xFF; /*CongurePortBasoutput*/
/*Initializetheinterruptmechanism*/
int
addr=&intserv; /*Setinterruptvector*/
asm(”Move#0x40,%PSR”);/*ProcessorrespondstoIRQinterrupts*/
*SCONT=0x10; /*Enablereceiverinterrupts*/
/*Transferthecharacters*/
while(1); /*Inniteloop*/

7

/*Interruptserviceroutine*/
voidintserv()

*SCONT=0; /*Disableinterrupts*/
if(k
0)

j=*RBUF&0xF; /*ExtracttheBCDvalue*/
k=k
1;
digits[k]=seg7[j];/*Save7-segmentcodefornewdigit*/
if(k==0)

*PAOUT=digits[1];/*SendrstdigittoPortA*/
*PBOUT=digits[0];/*SendseconddigittoPortB*/

elseif(*RBUF=='H')k=2;

*SCONT=0x10; /*Enablereceiverinterrupts*/
asm(”ReturnI”);/*Returnfrominterrupt*/

9.8.ThearrangementexplainedinthesolutionsforProblems9.1and9.2canbe
used,havingthe7-segmentdisplaysconnectedtoPortsAandB.Upondetecting
thecharacterH,therstdigithastobestoredanddisplayedonlywhenthe
seconddigitarrives.InterruptsareusedtodetectthearrivalofbothHandthe
subsequentpairofdigits.Therefore,theinterruptserviceroutinehastokeep
trackofthereceivedcharacters.Variable
issetto2whenanHisdetected,
anditisdecrementedasthesubsequentdigitsarrive.Thedesiredprogrammay
beobtainedbymodifyingtheprograminFigure9.14asfollows:
8

RBUF EQU $FFFFFFE0 Receivebuffer.
SCONT EQU $FFFFFFE3 Controlregforserialinterface.
PAOUT EQU $FFFFFFF1 PortAoutputdata.
PADIR EQU $FFFFFFF2 PortAdirectionregister.
PBOUT EQU $FFFFFFF4 PortBoutputdata.
PBDIR EQU $FFFFFFF5 PortBdirectionregister.
*Denetheconversiontableandbufferforrstdigit
ORIGIN $200
SEG7 DataByte $7E,$30,$6C,$79,$33,$5B,$5F,$30,$3F,$3B
DIG ReserveByte1 Bufferforrstdigit.
K Data 0 SetuptodetectrstH.
*Initialization
ORIGIN $1000
MoveByte#$FF,PADIR CongurePortAasoutput.
MoveByte#$FF,PBDIR CongurePortBasoutput.
Move #INTSERV,$24 Settheinterruptvector.
Move #$40,PSR ProcessorrespondstoIRQ.
MoveByte#$10,SCONT Enablereceiverinterrupts.
*Transferloop
LOOP Branch LOOP Innitewaitloop.
*Interruptserviceroutine
INTSERVMoveByte#0,SCONT Disableinterrupts.
MoveByteRBUF,R0 Readthecharacter.
Move K,R1 Seeifanewdigit
Branch
0 NEWDIG isexpected.
Compare #$48,R0 CheckifH.
Branch

0 DONE
Move #2,K DetectedanH.
Branch DONE
NEWDIG And #$F,R0 ExtracttheBCDvalue.
Subtract #1,R1 DecrementK.
Move R1,K
Branch=0 DISP Seconddigitreceived.
MoveByteSEG7(R0),DIG Savetherstdigit.
Branch DONE
DISP MoveByteDIG,PAOUT Send7-segmentcodetoPortA.
MoveByteSEG7(R0),PBOUTSend7-segmentcodetoPortB.
DONE MoveByte#$10,SCONT Enablereceiverinterrupts.
ReturnI Returnfrominterrupt.
9

9.9.ConnecttheparallelportsAandBtothefourBCDto7-segmentdecoders.
Choosethat

,



,
!"
and
!


displaytherst,second,third
andfourthreceiveddigits,respectively.Assumethatallfourdigitsarriveim-
mediatelyafterthecharacterHhasbeenreceived.Thetaskcanbeachievedby
modifyingtheprograminFigure9.11asfollows:
/*Deneregisteraddresses*/
#deneRBUF(volatilechar*)0xFFFFFFE0
#deneSSTAT(volatilechar*)0xFFFFFFE2
#denePAOUT(char*)0xFFFFFFF1
#denePADIR(char*)0xFFFFFFF2
#denePBOUT(char*)0xFFFFFFF4
#denePBDIR(char*)0xFFFFFFF5
voidmain()

chartemp;
chardigits[4]; /*Bufferforreceiveddigits*/
inti;
/*Initializetheparallelports*/
*PADIR=0xFF; /*CongurePortAasoutput*/
*PBDIR=0xFF; /*CongurePortBasoutput*/
/*Transferthecharacters*/
while(1)

/*Inniteloop*/
while((*SSTAT&0x1)==0); /*Waitforanewcharacter*/
if(*RBUF=='H')

for(i=3;i =0;i )

while((*SSTAT&0x1)==0); /*Waitforthenextdigit*/
digits[i]=*RBUF; /*Savethenewdigit(ASCII)*/

temp=digits[3] ##4; /*Shiftleftrstdigitby4bits,*/
*PAOUT=temp
$(digits[2]&0xF);/*appendsecondandsendtoA*/
temp=digits[1]
##4; /*Shiftleftthirddigitby4bits,*/
*PBOUT=temp
$(digits[0]&0xF);/*appendfourthandsendtoB*/



10

9.10.ConnecttheparallelportsAandBtothefourBCDto7-segmentdecoders.
Choosethat

,



,
!"
and
!


displaytherst,second,third
andfourthreceiveddigits,respectively.Assumethatallfourdigitsarriveimme-
diatelyafterthecharacterHhasbeenreceived.Then,thedesiredprogrammay
beobtainedbymodifyingtheprograminFigure9.10asfollows:
RBUF EQU $FFFFFFE0Receivebuffer.
SSTATEQU $FFFFFFE2Statusregisterforserialinterface.
PAOUTEQU $FFFFFFF1PortAoutputdata.
PADIR EQU $FFFFFFF2PortAdirectionregister.
PBOUTEQU $FFFFFFF4PortBoutputdata.
PBDIREQU $FFFFFFF5PortBdirectionregister.
*Initialization
ORIGIN $1000
MoveByte#$FF,PADIRCongurePortAasoutput.
MoveByte#$FF,PBDIRCongurePortBasoutput.
*Transferthecharacters
LOOP Testbit#0,SSTAT Checkifnewcharacterisready.
Branch=0LOOP
MoveByteRBUF,R0 Readthecharacter.
Compare#$48,R0 CheckifH.
Branch

0LOOP
LOOP2Testbit#0,SSTAT Checkifrstdigitisready.
Branch=0LOOP2
MoveByteRBUF,R0 Readtherstdigit.
LShiftL#4,R0 Shiftleft4bitpositions.
LOOP3Testbit#0,SSTAT Checkifseconddigitisready.
Branch=0LOOP3
MoveByteRBUF,R1 Readtheseconddigit.
And #$F,R1 ExtracttheBCDvalue.
Or R1,R0 ConcatenatedigitsforPortA.
LOOP4Testbit#0,SSTAT Checkifthirddigitisready.
Branch=0LOOP4
MoveByteRBUF,R1 Readthethirddigit.
LShiftL#4,R1 Shiftleft4bitpositions.
LOOP5Testbit#0,SSTAT Checkiffourthdigitisready.
Branch=0LOOP5
MoveByteRBUF,R2 Readthefourthdigit.
And #$F,R2 ExtracttheBCDvalue.
Or R2,R1 ConcatenatedigitsforPortB.
MoveByteR0,PAOUT SenddigitstoPortA.
MoveByteR1,PBOUT SenddigitstoPortB.
Branch LOOP
11

9.11.ConnecttheparallelportsAandBtothefourBCDto7-segmentdecoders.
Choosethat

,



,
!"
and
!


displaytherst,second,third
andfourthreceiveddigits,respectively.UpondetectingthecharacterH,thesub-
sequentfourdigitshavetobesavedanddisplayedonlywhenthefourthdigit
arrives.InterruptsareusedtodetectthearrivalofbothHandthefourdigits.
Therefore,theinterruptserviceroutinehastokeeptrackofthereceivedcharac-
ters.Variable
issetto4whenanHisdetected,anditisdecrementedasthe
subsequentdigitsarrive.Thedesiredprogrammaybeobtainedbymodifyingthe
programinFigure9.16asfollows:
/*Deneregisteraddresses*/
#deneRBUF(volatilechar*)0xFFFFFFE0
#deneSCONT(char*)0xFFFFFFE3
#denePAOUT(char*)0xFFFFFFF1
#denePADIR(char*)0xFFFFFFF2
#denePBOUT(char*)0xFFFFFFF4
#denePBDIR(char*)0xFFFFFFF5
#deneint
addr(int*)(0x24)
voidintserv();
voidmain()

/*Denetheconversiontable*/
charseg7[10]=

0x7E,0x30,0x6C,0x79,0x33,0x5B,0x5F,0x30,0x3F,0x3B
;
chartemp;
chardigits[4]; /*BufferforreceivedBCDdigits*/
intk=0; /*SetuptodetecttherstH*/
/*Initializetheparallelports*/
*PADIR=0xFF; /*CongurePortAasoutput*/
*PBDIR=0xFF; /*CongurePortBasoutput*/
/*Initializetheinterruptmechanism*/
int
addr=&intserv; /*Setinterruptvector*/
asm(”Move#0x40,%PSR”);/*ProcessorrespondstoIRQinterrupts*/
*SCONT=0x10; /*Enablereceiverinterrupts*/
/*Transferthecharacters*/
while(1); /*Inniteloop*/

12

/*Interruptserviceroutine*/
voidintserv()

*SCONT=0; /*Disableinterrupts*/
if(k
0)

k=k
1;
digits[k]=*RBUF; /*Savethenewdigit(ASCII)*/
if(k==0)

temp=digits[3]
##4; /*Shiftleftrstdigitby4bits,*/
*PAOUT=temp
$(digits[2]&0xF);/*appendsecondandsendtoA*/
temp=digits[1]
##4; /*Shiftleftthirddigitby4bits*/
*PBOUT=temp
$(digits[0]&0xF);/*appendfourthandsendtoB*/

elseif(*RBUF=='H')k=4;

*SCONT=0x10; /*Enablereceiverinterrupts*/
asm(”ReturnI”); /*Returnfrominterrupt*/

9.12.ConnecttheparallelportsAandBtothefourBCDto7-segmentdecoders.
Choosethat

,



,
!"
and
!


displaytherst,second,third
andfourthreceiveddigits,respectively.UpondetectingthecharacterH,thesub-
sequentfourdigitshavetobesavedanddisplayedonlywhenthefourthdigit
arrives.InterruptsareusedtodetectthearrivalofbothHandthefourdigits.
Therefore,theinterruptserviceroutinehastokeeptrackofthereceivedcharac-
ters.Variable
issetto4whenanHisdetected,anditisdecrementedasthe
subsequentdigitsarrive.Thedesiredprogrammaybeobtainedbymodifyingthe
programinFigure9.14asfollows:
13

RBUF EQU $FFFFFFE0 Receivebuffer.
SCONT EQU $FFFFFFE3 Controlregforserialinterface.
PAOUT EQU $FFFFFFF1 PortAoutputdata.
PADIR EQU $FFFFFFF2 PortAdirectionregister.
PBOUT EQU $FFFFFFF4 PortBoutputdata.
PBDIR EQU $FFFFFFF5 PortBdirectionregister.
ORIGIN $200
DIG ReserveByte4 Bufferforreceiveddigits.
K Data 0 SetuptodetectrstH.
*Initialization
ORIGIN $1000
MoveByte#$FF,PADIR CongurePortAasoutput.
MoveByte#$FF,PBDIR CongurePortBasoutput.
Move #INTSERV,$24Settheinterruptvector.
Move #$40,PSR ProcessorrespondstoIRQ.
MoveByte#$10,SCONT Enablereceiverinterrupts.
*Transferloop
LOOP Branch LOOP Innitewaitloop.
*Interruptserviceroutine
INTSERVMoveByte#0,SCONT Disableinterrupts.
MoveByteRBUF,R0 Readthecharacter.
Move K,R1 Seeifanewdigit
Branch
0 NEWDIG isexpected.
Compare #$48,R0 CheckifH.
Branch

0 DONE
Move #4,K DetectedanH.
Branch DONE
NEWDIG And #$F,R0 ExtracttheBCDvalue.
Subtract #1,R1 DecrementK.
MoveByteR0,DIG(R1) Savethedigit.
Move R1,K
Branch
0 DONE Expectmoredigits.
Move #DIG,R0 Pointertobufferfordigits.
DISP MoveByte(R0)+,R1 Getfourthdigit.
MoveByte(R0)+,R2 Getthirddigitand
LShiftL #4,R2 shiftitleft.
Or R1,R2 ConcatenatedigitsforPortB.
MoveByteR2,PBOUT SenddigitstoPortB.
MoveByte(R0)+,R1 Getseconddigit.
MoveByte(R0)+,R2 Getrstdigitand
LShiftL #4,R2 shiftitleft.
Or R1,R2 ConcatenatedigitsforPortA.
MoveByteR2,PAOUT SenddigitstoPortA.
DONE MoveByte#$10,SCONT Enablereceiverinterrupts.
ReturnI Returnfrominterrupt.
14

9.13.UseatabletoconvertareceivedASCIIdigitintoa7-segmentcodeasexplained
inthesolutionforProblem9.1.Connectthebits
%'&to %)(ofallfourregisters
tobits

ofPortA.Usebits
!

to
!
ofPortBasLoadsignalsfor
theregistersdisplayingtherst,second,thirdandfourthreceiveddigits,recpec-
tively.Then,therequiredtaskcanbeachievedbymodifyingtheprogramin
Figure9.11asfollows:
/*Deneregisteraddresses*/
#deneRBUF(volatilechar*)0xFFFFFFE0
#deneSSTAT(volatilechar*)0xFFFFFFE2
#denePAOUT(char*)0xFFFFFFF1
#denePADIR(char*)0xFFFFFFF2
#denePBOUT(char*)0xFFFFFFF4
#denePBDIR(char*)0xFFFFFFF5
voidmain()

/*Denetheconversiontable*/
charseg7[10]=

0x7E,0x30,0x6C,0x79,0x33,0x5B,0x5F,0x30,0x3F,0x3B
;
charj;
chardigits[4]; /*BufferforreceivedBCDdigits*/
intk=0; /*SetuptodetecttherstH*/
inti;
/*Initializetheparallelports*/
*PADIR=0xFF; /*CongurePortAasoutput*/
*PBDIR=0xFF; /*CongurePortBasoutput*/
/*Transferthecharacters*/
while(1)

/*Inniteloop*/
while((*SSTAT&0x1)==0);/*Waitforanewcharacter*/
if(*RBUF=='H')

for(i=3;i
=0;i
)

while((*SSTAT&0x1)==0);/*Waitforthenextdigit*/
j=*RBUF&0xF; /*ExtracttheBCDvalue*/
digits[i]=seg7[j]; /*Save7-segmentcodeforthedigit*/

for(i=0;i #=3;i++)

*PAOUT=digits[i]; /*SendadigittoPortA*/
*PBOUT=1
##i; /*Loadthedigitintoitsregister*/
*PBOUT=0; /*CleartheLoadsignal*/




15

9.14.UseatabletoconvertareceivedASCIIdigitintoa7-segmentcodeasexplained
inthesolutionforProblem9.1.Connectthebits
%'&to
%)(ofallfourregisters
tobits


ofPortA.Usebits
!

to
!

ofPortBasLoadsignalsfor
theregistersdisplayingtherst,second,thirdandfourthreceiveddigits,recpec-
tively.Then,therequiredtaskcanbeachievedbymodifyingtheprogramin
Figure9.10asfollows:
RBUF EQU$FFFFFFE0Receivebuffer.
SSTATEQU$FFFFFFE2Statusregisterforserialinterface.
PAOUTEQU$FFFFFFF1PortAoutputdata.
PADIR EQU$FFFFFFF2PortAdirectionregister.
PBOUTEQU$FFFFFFF4PortBoutputdata.
PBDIREQU$FFFFFFF5PortBdirectionregister.
*Denetheconversiontableandbufferforreceiveddigits
ORIGIN $200
SEG7 DataByte $7E,$30,$6C,$79,$33,$5B,$5F,$30,$3F,$3B
DIG ReserveByte4 Bufferforreceiveddigits.
*Initialization
ORIGIN $1000
MoveByte#$FF,PADIR CongurePortAasoutput.
MoveByte#$FF,PBDIR CongurePortBasoutput.
*Transferthecharacters
LOOP Testbit #0,SSTAT Checkifnewcharacterisready.
Branch=0 LOOP
MoveByteRBUF,R0 Readthecharacter.
Compare #$48,R0 CheckifH.
Branch

0 LOOP
Move #3,R1 Setupacounter.
LOOP2Testbit #0,SSTAT Checkifnextdigitisavailable.
Branch=0 LOOP2
MoveByteRBUF,R0 Readthedigit.
And #4,R0 ExtracttheBCDvalue.
MoveByteSEG7(R0),DIG(R1)Save7-segcodeforthedigit.
Subtract #1,R1 Checkifmoredigits
Branch

0LOOP2 areexpected.
Move #DIG,R0 Pointertobufferfordigits.
Move #8,R1 SetupLoadsignalfor
*+.
DISP MoveByte(R0)+,PAOUT Send7-segmentcodetoPortA.
MoveByteR1,PBOUT Loadthedigitintoitsregister.
MoveByte#0,PBOUT CleartheLoadsignal.
LShiftR #1,R1 SetLoadforthenextdigit.
Branch
0 DISP Therearemoredigitstosend.
Branch LOOP
16

9.15.UseatabletoconvertareceivedASCIIdigitintoa7-segmentcodeasexplained
inthesolutionsforProblems9.1.and9.2.Connectthebits
%)&to %)(ofall
fourregisterstobits

ofPortA.Usebits
!

to
!"
ofPortBasLoad
signalsfortheregistersdisplayingtherst,second,thirdandfourthreceived
digits,recpectively.UpondetectingthecharacterH,thesubsequentfourdigits
havetobesavedanddisplayedonlywhenthefourthdigitarrives.Interruptsare
usedtodetectthearrivalofbothHandthefourdigits.Therefore,theinterrupt
serviceroutinehastokeeptrackofthereceivedcharacters.Variable
issetto
4whenanHisdetected,anditisdecrementedasthesubsequentdigitsarrive.
ThedesiredprogrammaybeobtainedbymodifyingtheprograminFigure9.16
asfollows:
/*Deneregisteraddresses*/
#deneRBUF(volatilechar*)0xFFFFFFE0
#deneSCONT(char*)0xFFFFFFE3
#denePAOUT(char*)0xFFFFFFF1
#denePADIR(char*)0xFFFFFFF2
#denePBOUT(char*)0xFFFFFFF4
#denePBDIR(char*)0xFFFFFFF5
#deneint
addr(int*)(0x24)
voidintserv();
voidmain()

/*Denetheconversiontable*/
charseg7[10]=

0x7E,0x30,0x6C,0x79,0x33,0x5B,0x5F,0x30,0x3F,0x3B ;
charj;
chardigits[4]; /*BufferforreceivedBCDdigits*/
intk=0; /*SetuptodetecttherstH*/
inti;
/*Initializetheparallelports*/
*PADIR=0xFF; /*CongurePortAasoutput*/
*PBDIR=0xFF; /*CongurePortBasoutput*/
/*Initializetheinterruptmechanism*/
int
addr=&intserv; /*Setinterruptvector*/
asm(”Move#0x40,%PSR”);/*ProcessorrespondstoIRQinterrupts*/
*SCONT=0x10; /*Enablereceiverinterrupts*/
/*Transferthecharacters*/
while(1); /*Inniteloop*/

17

/*Interruptserviceroutine*/
voidintserv()

*SCONT=0; /*Disableinterrupts*/
if(k
0)

j=*RBUF&0xF; /*ExtracttheBCDvalue*/
k=k
1;
digits[k]=seg7[j]; /*Save7-segmentcodefornewdigit*/
if(k==0)

for(i=0;i #=3;i++)

*PAOUT=digits[i];/*SendadigittoPortA*/
*PBOUT=1
##i; /*Loadthedigitintoitsregister*/
*PBOUT=0; /*CleartheLoadsignal*/


elseif(*RBUF=='H')k=4;

*SCONT=0x10; /*Enablereceiverinterrupts*/
asm(”ReturnI”); /*Returnfrominterrupt*/

9.16.UseatabletoconvertareceivedASCIIdigitintoa7-segmentcodeasexplained
inthesolutionsforProblems9.1.and9.2.Connectthebits
%)&to
%)(ofall
fourregisterstobits

ofPortA.Usebits
!

to
!"
ofPortBasLoad
signalsfortheregistersdisplayingtherst,second,thirdandfourthreceived
digits,recpectively.UpondetectingthecharacterH,thesubsequentfourdigits
havetobesavedanddisplayedonlywhenthefourthdigitarrives.Interruptsare
usedtodetectthearrivalofbothHandthefourdigits.Therefore,theinterrupt
serviceroutinehastokeeptrackofthereceivedcharacters.Variable
issetto
4whenanHisdetected,anditisdecrementedasthesubsequentdigitsarrive.
ThedesiredprogrammaybeobtainedbymodifyingtheprograminFigure9.14
asfollows:
RBUF EQU$FFFFFFE0Receivebuffer.
SCONTEQU$FFFFFFE3Controlregforserialinterface.
PAOUTEQU$FFFFFFF1PortAoutputdata.
PADIR EQU$FFFFFFF2PortAdirectionregister.
PBOUTEQU$FFFFFFF4PortBoutputdata.
PBDIREQU$FFFFFFF5PortBdirectionregister.
18

*Denetheconversiontableandbufferforreceiveddigits
ORIGIN $200
SEG7 DataByte $7E,$30,$6C,$79,$33,$5B,$5F,$30,$3F,$3B
DIG ReserveByte4 Bufferforreceiveddigits.
K Data 0 SetuptodetectrstH.
*Initialization
ORIGIN $1000
MoveByte#$FF,PADIR CongurePortAasoutput.
MoveByte#$FF,PBDIR CongurePortBasoutput.
Move #INTSERV,$24 Settheinterruptvector.
Move #$40,PSR ProcessorrespondstoIRQ.
MoveByte#$10,SCONT Enablereceiverinterrupts.
*Transferloop
LOOP Branch LOOP Innitewaitloop.
*Interruptserviceroutine
INTSERVMoveByte#0,SCONT Disableinterrupts.
MoveByteRBUF,R0 Readthecharacter.
Move K,R1 Seeifanewdigit
Branch
0 NEWDIG isexpected.
Compare #$48,R0 CheckifH.
Branch

0 DONE
Move #4,K DetectedanH.
Branch DONE
NEWDIG And #$F,R0 ExtracttheBCDvalue.
Subtract #1,R1 DecrementK.
MoveByteSEG7(R0),DIG(R1)Save7-segcodeforthedigit.
Move R1,K
Branch
0 DONE Expectmoredigits.
Move #DIG,R0 Pointertobufferfordigits.
Move #8,R1 SetupLoadsignalfor
*
.
DISP MoveByte(R0)+,PAOUT Send7-segmentcodetoPortA.
MoveByteR1,PBOUT Loadthedigitintoitsregister.
MoveByte#0,PBOUT CleartheLoadsignal.
LShiftR #1,R1 SetLoadforthenextdigit.
Branch
0 DISP Therearemoredigitstosend.
DONE MoveByte#$10,SCONT Enablereceiverinterrupts.
ReturnI Returnfrominterrupt.
9.17.ProgramsinFigures9.17and9.18wouldnotworkproperlyifthecircularbuffer
waslledwith80characters.Aftertheheadpointerwrapsaround,itwould
trailthetailpointerandwouldcatchupwithitifthebufferisfull.Atthis
pointitwouldbeimpossibletousethesimplecomparisonofthetwopointers
todeterminewhetherthebufferisemptyorfull.Thesimplestmodicationisto
increasethebuffersizeto81characters.
19

9.18.Usingacountervariable,
,,theprograminFigure9.17canbemodiedas
follows:
/*Deneregisteraddresses*/
#deneRBUF(volatilechar*)0xFFFFFFE0
#deneSSTAT(volatilechar*)0xFFFFFFE2
#denePAOUT(char*)0xFFFFFFF1
#denePADIR(char*)0xFFFFFFF2
#denePSTAT(volatilechar*)0xFFFFFFF6
#deneBSIZE80
voidmain()

unsignedcharmbuffer[BSIZE];
unsignedcharn,fout;
unsignedchartemp;
intM=0;
/*InitializePortAandcircularbuffer*/
*PADIR=0xFF; /*CongurePortAasoutput*/
n=0;
fout=0;
/*Transferthecharacters*/
while(1)

/*Inniteloop*/
while((*SSTAT&0x1)==0)

/*Waitforanewcharacter*/
if(M
0)

/*Ifcircularbufferisnotempty*/
if(*PSTAT&0x2)

/*andoutputdeviceisready*/
*PAOUT=mbuffer[fout];/*sendacharactertoPortA*/
M=M
1; /*Decrementthequeuecounter*/
if(fout
#BSIZE
1) /*Updatetheoutputindex*/
fout++;
else
fout=0;



mbuffer[n]=*RBUF; /*Readacharacterfromreceivebuffer*/
M=M+1; /*Incrementthequeuecounter*/
if(n
#BSIZE 1) /*Updatetheinputindex*/
n++;
else
n=0;


20

9.19.Usingacountervariable, ,,theprograminFigure9.18canbemodiedas
follows:
RBUF EQU $FFFFFFE0 Receivebuffer.
SSTATEQU $FFFFFFE2 Statusregforserialinterface.
PAOUTEQU $FFFFFFF1 PortAoutputdata.
PADIREQU $FFFFFFF2 PortAdirectionregister.
PSTATEQU $FFFFFFF6 Statusregforparallelinterface.
MBUFReserveByte80 Denethecircularbuffer.
*Initialization
ORIGIN $1000
MoveByte#$FF,PADIR CongurePortAasoutput.
Move #MBUF,R0 R0pointstothebuffer.
Move #0,R1 Initializeheadpointer.
Move #0,R2 Initializetailpointer.
Move #0,R3 Initializequeuecounter.
*Transferthecharacters
LOOP Testbit #0,SSTAT Checkifnewcharacterisready.
Branch

0 READ
Compare #0,R3 Checkifqueueisempty.
Branch=0 LOOP Queueisempty.
Testbit #1,PSTAT CheckifPortAisready.
Branch=0 LOOP
MoveByte(R0,R2),PAOUTSendacharactertoPortA.
Subtract #1,R3 Decrementthequeuecounter.
Add #1,R2 Incrementthetailpointer.
Compare #80,R2 Isthepointerpastqueuelimit?
Branch
#0 LOOP
Move #0,R2 Wraparound.
Branch LOOP
READ MoveByteRBUF,(R0,R1)Placenewcharacterintoqueue.
Add #1,R3 Incrementthequeuecounter.
Add #1,R1 Incrementtheheadpointer.
Compare #80,R1 Isthepointerpastqueuelimit?
Branch
#0 LOOP
Move #0,R1 Wraparound.
Branch LOOP
21

9.20.Connectthetwo7-segmentdisplaystoPortA.Usethe3bitsofPortBtoconnect
totheswitchesandLEDasshowninFigure9.19.Itisnecessarytomodifythe
conversionanddisplayportionsofprogramsinFigures9.20and9.21.
TheendoftheprograminFigure9.20shouldbe:
/*Computethetotalcount*/
totalcount=(0xFFFFFFFF countervalue);
/*Convertcounttotime*/;
actualtime=totalcount/1000000; /*Timeinhundredthsofseconds*/
tenths=actualtime/10;
hundredths=actualtime
tenths*10;
*PAOUT=((tenths
##4) $hundredths);/*Displaytheelapsedtime*/


TheendoftheprograminFigure9.20shouldbe:
*Convertthecounttoactualtimeinhundredthsofseconds,
*andthentoBCD.PuttheBCDdigitsinR4.
Move #1000000,R1Determinethecountin
Divide R1,R2 hundredthsofseconds.
Move #10,R1 Divideby10tondthedigitthat
Divide R1,R2 denotes1/10thofasecond.
LShiftL#4,R3 TheBCDdigits
Or R2,R3 areplacedinR3.
MoveByteR3,PAOUT SenddigitstoPortA.
Branch START Readyfornexttest.
22

Chapter10–ComputerPeripherals
10.1.Revisedproblemstatement:Thetotaltimerequiredtoilluminateeachpixelon
thedisplayscreenofacomputermonitoris5ns.Thebeamisthenturnedoffand
movedtothenextpointtobeilluminated.Onaverage,movingthebeamfrom
onespottothenexttakes12ns.Whatisthemaximumpossibleresolutionof
thisdisplayifitistoberefreshed70timespersecond.
ForNpixelsweget
"!$#%&'%() *+
Hence,N=840000pixels
Acommercialstandardthatwouldnotexceedthisresolutionis1024
,768.
10.2.Eachsymbolcanhaveoneofeightpossiblevalues,whichmeansitrepresents
threebits.Therefore:
-
/.0123*
,54
67.028*
,&9":
"3;<1;"
4
+=!+
10.3.Inpreparingthisdesignwehaveassumedthefollowing:
>
ThecounterhasasynchronousClearsignal.Thatis,thecounteriscleared
to0ontheclockedgeattheendofaclockperiodduringwhichClear=1.
>
TheshiftregisterhasasynchronouscontrolsignalcalledShift.Thedata
valueatitsserialinputisshiftedintotheregisterontheclockedgeatthe
endofaclockperiodduringwhichShift=1.
WeuseaDip-opasasynchronizerfortheinputdata.Itsoutput,SData,
followstheinputdata,butissynchronizedwiththelocalclock.Itisconnected
totheserialinputoftheshiftregister.Boththeshiftregisterandthecounterare
drivenbythelocalclock.
WewillnowdescribethecontrollogicthatgeneratestheClearandShiftsignals.
StartingfromanidlestateinwhichSData=1,Clear=1,andShift=0,the
sequenceofeventsthatthecontrollogicneedstoimplementisasfollows:
(a)WhenSData=0changeClearto0.Thecounterstartstocount.
(b)Whencount=3(thefourthclockcycle),setClear=1foroneclockcycle.
Theclockedgeattheendofthiscycleisthemid-pointoftheStartbit.The
counterisclearedto0atthispoint,thenitstartstocountagain.
(c)Whencountreaches7,setbothClearandShiftto1foroneclockcycle.At
theendofthisclockcycle,therstdatabitisloadedintheshiftregister
andthecounterisagainclearedto0.Repeattwice.
(d)Whencount=7,setA=SDATAandB=
@?A"
.
1

Strt1 Shft1
Shft2Shft3
End
Idle
Stp
SData=0/
Clear
Cnt=3/
Clear
Cnt=7/
Clear, Shift
Cnt=7/
Clear, Shift
Cnt=7/
Clear, Shift
Cnt=7/
Clear, Set A&B
Cnt½<7/
Clear
Cnt½<3/
Clear
Cnt½<7/
Clear
Cnt½<7/
Clear
SData=1/
Clear
Cnt½<7/
Clear
SData=1/
Clear
SData=0/
Clear
(e)WaituntilSData=1thenreturntostep1.
Astatediagramforthecontrollogicisgivenbelow.Whennotspecied,outputs
areequaltozero.
10.4Eachdatabyterequires10bitstotransmit.Hence,theeffectivetransmissionrate
is38,800/10=3,800bytes/s.
10.5A:11000001, P:01010000, =:00111101, 5:1011
0101
2

10.6(Correction:Bit
BCistheDataSetReadysignal,CC).
WewillrefertotheregistergivenintheproblemasSTATUS.Theprogrambelow
dealswithanincomingcall.
BitSet #1,STATUS Enableautomaticanswering
RING BitTest#14,StatusWaitforringingsignal
Branch=0RING
*Atthispoint,theprogrammayalerttheuser(ortheoperating-system)ofanin-comingcall
Ready BitTest#7,Status WaitforDataSetReady
Branch=0Ready
BitSet #2,STATUS Enablesendcarrier
SENDC BitTest#13,STATUSWaitforconrmation
Branch=0SENDC
RECVC BitTest#12,STATUSWaitforreceivecarrier
Branch=0RECVC
*Programisnowreadytosendandreceivedata
3

Chapter11
ProcessorFamilies
11.1.ThemainideasofconditionalexecutionofARMinstructions(seeSections3.1.2
andB.1)andconditionalexecutionofIA-64instructions,calledpredication(see
Section11.7.2),areverysimilar.
Thedifferencesoccurinthewaythattheconditionsaresetandstoredinthe
processor,andinthewaythattheyarereferencedbytheconditionallyexecuted
instructions.
InARMprocessors,thestateisstoredinfourconventionalconditioncodeags
N,Z,C,andV(seeSection3.1.1).Theseagsareoptionallysetbytheresults
ofinstructionexecution.Theparticularcondition,whichmaybeafunctionof
morethanoneag,isnamedintheconditioneldofeachARMinstruction(see
FigureB.1andTableB.1).
IntheIA-64architecture,therearenoconventionalconditioncodeags.In-
stead,theresult(trueorfalse)ofexecutingaCompare<condition>instruction
isstoredinoneof64one-bitpredicateregisters,asdescribedinSection11.7.2.
Eachinstructioncannameoneofthesebitsinits6-bitpredicateeld;andthe
instructionisexecutedonlyifthebitis1(true).
1

11.2.AssumethatThumbarithmeticinstructionshavea2-operandformat,expressed
inassemblylanguageas
OPRdst,Rsrc
asdiscussedinSection11.1.1
AlsoassumethatasignedintegerDivideinstruction(DIV)isavailableinthe
Thumbinstructionsetwiththeassemblylanguageformat
DIVRdst,Rsrc
Thisinstructionperformstheoperation[Rdst]/[Rsrc].Itstoresthequotientin
RdstandstorestheremainderinRsrc.
Undertheseassumptions,apossibleThumbprogramwouldbe:
LDRR0,G
LDRR1,H
ADDR0,R1Leavesg+hinR0.
LDRR1,E
LDRR2,F
MULR1,R2LeavesefinR1.
DIVR1,R0Leaves(ef)=(g+h)inR1.
LDRR0,C
LDRR2,D
DIVR0,R2Leavesc=dinR0.
ADDR0,R1LeavesdenominatorinR0.
LDRR1,A
LDRR2,B
ADDR1,R2Leavesa+binR1.
DIVR1,R0LeavesresultinR1.
STRR1,WStoresresultinw.
Thisprogramrequires16instructionsascomparedto13instructionwords(some
combinedinstructions)intheHP3000.
2

11.3.Thefollowingtableshowssomeoftheimportantareasforsimilarity/difference
comparisons.
MOTOROLA680X0 INTEL80X86
8Dataregistersand8Address8Generalregisters(including
registers(includinga aprocessorstackregister)
processorstackregister)
CISCinstructionsetwith CISCinstructionsetwith
exibleaddressingmodes exibleaddressingmodes
Largeinstructionsetwith Largeinstructionsetwith
multiple-registerload/storemultiple-registerpush/pop
instructions instructions
Memory-mappedI/Oonly SeparateI/Ospaceaswellas
memory-mappedI/O
Flataddressspace Segmentedaddressspace
Big-endianaddressing Little-endianaddressing
Thereisroughlycomparablecapabilityandperformancebetweenpairsfrom
thesetwofamilies;thatis68000vs.8086,68020vs.80286,68030vs.80386,
and68040vs.80486.Thecacheandpipeliningaspectsforthehighendofeach
familyaresummarizedinSections11.2.2and11.3.3.
11.4.Aninstructioncacheissimplertoimplement,becauseitsentriesdonothaveto
bewrittenbacktothemainmemory.Adatacachemusthaveaprovisionforwrit-
inganychangedentriesbacktothememory,beforetheyareoverwrittenbynew
entries.¿Fromaperformancestandpoint,asinglelargerinstructioncachewould
beadvantageousonlyifthefrequencyofmemorydataaccesseswereverylow.
Auniedcachehasthepotentialperformanceadvantagethattheproportionsof
instructionsanddatavaryautomaticallyasaprogramisexecuted.However,if
separateinstructionanddatacachesareused,theycanbeaccessedinparallelin
apipelinedmachine;andthisisthemajorperformanceadvantage.
11.5.Memory-mappedI/Orequiresnospecializedsupportintermsofeitherinstruc-
tionsorbussignals.AseparateI/OspaceallowssimplerI/Ointerfacesand
potentiallyfasteroperation.ProcessorssuchasthoseintheIA-32family,that
haveaseparateI/Ospace,canalsousememory-mappedI/O.
3

11.6.MOTOROLA-TheAutoincrementandAutodecrementmodesfacilitatestack
implementationandaccessingsuccessiveitemsinalist.Signicantexibility
inaccessingstructuredlistsandarraysofaddressesanddataofdifferentsizes
isprovidedbythedisplacement,offset,andscalefactorfeatures,coupledwith
indirection.
INTEL-Relocatabilityinthephysicaladdressspaceisfacilitatedbytheway
inwhichbase,indexanddisplacementfeaturesareusedingeneratingvirtual
addresses.AsintheMotorolaprocessors,thesemultiple-componentaddress
featuresenableexibleaccesstoaddresslistsanddatastructures.
Inbothfamiliesofprocessors,byte-addressabilityenableshandlingofcharacter
strings,andtheIntelIA-32Stringinstructions(seeSections3.21.3andD.4.1)
facilitatemovementandprocessingofbyteanddoubleworddatablocks.The
MotorolaMOVEMandMOVEPinstructionsperformsimilaroperations.
11.7.Flataddressspace—Simplestcongurationfromthestandpointofasingleuser
programanditscompilation.
Oneormorevariable-lengthsegments—Efcientallocationofavailablemem-
oryspacetovariable-lengthuseroroperatingsystemprograms.
Pagedmemory—Facilitatesautomatedmemorymanagementbetweentherandom-
accessmainmemoryandasector-organizeddisksecondarymemory(seeChap-
ters5and10).Accessprivilegescanbecontrolledonapage-by-pagebasis
toensureprotectionamongusers,andbetweenusersandtheoperatingsystem
whenshareddataareinvolved.
Segmentationandpaging—Mostexiblearrangementformanagingmultiple
userandsystemaddressspaces,includingprotectionmechanisms.Thevirtual
addressspacecanbesignicantlylargerthanthephysicalmainmemoryspace.
4

11.8.ARMprogram:
AssumethatasignedintegerDivideinstructionisavailableintheARMinstruc-
tionset,andthatithasthesameformatastheMultiply(MUL)instruction(see
FigureB.4).TheassemblylanguageexpressionfortheDivide(DIV)instruction
is
DIVRd,Rm,Rs
anditperformstheoperation[Rm]/[Rs],loadingthequotientintoRmandthe
remainderintoRd.
LDRR0,C
LDRR1,D
DIVR2,R0,R1 Leavesc=dinR0.
LDRR1,G
LDRR2,H
ADDR1,R1,R2 Leavesg+hinR1.
LDRR2,F
DIVR3,R2,R1 Leavesf=(g+h)inR2.
LDRR3,E
MLAR1,R2,R3,R0LeavesdenominatorinR1.
LDRR0,A
LDRR2,B
ADDR0,R0,R2 Leavesa+binR0.
DIVR2,R0,R1 LeavesresultinR0.
STRR0,W Storesresultinw.
Thisprogramrequires15instructionsascomparedto13instructionwords(some
combinedinstructions)intheHP3000.
5

68000program(assume16-bitoperands):
MOVEG,D0
ADD H,D0Leavesg+hinD0.
MOVEE,D1
MULS F,D1LeavesefinD1.
DIVS D0,D1Leaves(ef)=(g+h)inD1.
MOVEC,D0
EXT.LD0 SeeNotebelow.
DIVS D,D0Leavesc=dinD0.
ADD D1,D0LeavesdenominatorinD0.
MOVEA,D1
ADD B,D1
EXT.LD1 SeeNotebelow.
DIVS D0,D1LeavesresultinD1.
MOVED1,WStoresresultinw.
Note:TheEXT.Linstructionsign-extendsthe16-bitdividendinthedestination
registerto32bits,arequirementoftheDivideinstruction.
Thisprogramcontains14instructions,ascomparedto13instructionwords
(somecombinedinstructions)intheHP3000.
IA-32program:
MOV EBX,G
ADD EBX,H Leavesg+hinEBX.
MOV EAX,E
IMUL EAX,F LeavesefinEAX.
CDQ SeeNotebelow.
IDIV EBX
MOV EBX,EAX Leaves(ef)=(g+h)inEBX.
MOVEEAX,C
CDQ SeeNotebelow.
IDIV D Leavesc=dinEAX.
ADD EBX,EAX LeavesdenominatorinEBX.
MOVEEAX,A
ADD EAX,B Leavesa+binEAX.
CDQ SeeNotebelow.
IDIV EBX LeavesresultinEAX.
MOV W,EAX Storesresultinw.
Note:TheCDQinstructionsign-extendsEAXintoEDX(seeSection3.23.1),a
requirementoftheDivideinstruction.
Thisprogramcontains16instructions,ascomparedto13instructionwords
(somecombinedinstructions)intheHP3000.
6

11.9.A4-waymultiplexerisrequired,asshowninthefollowinggure.
_ _
datapathout
low-orderbyte
32-bit
datapathin
4-waymultiplexer MUX
8 888
11.10.TherearenodirectcounterpartsofthememorystackpointerregistersSPandFP
intheIA-64architecture.TheregisterremappinghardwareinIA-64processors
allowsthemainprogramandanysequenceofnestedsubroutinestoalluselogi-
calregisteraddressesR32andupwardfortheirownlocalvariables,withtherst
partofthatregisterspacecontainingparameterspassedfromthecallingroutine.
AnexampleofthisisshowninFigure11.4.
Ifthe92registersofthestackedphysicalregisterspaceareusedupbyregister
allocationsforasequenceofnestedsubroutinecalls,thensomeofthosephysical
registersmustbespilledintomemorytocreatephysicalregisterspaceforany
additionalnestedsubroutines.Thememorypointerregisterusedbytheproces-
sorforthatmemoryareacouldbeconsideredasacounterpartofSP;butitisnot
actuallyusedasaTOSpointerbythecurrentroutine.Infact,itisnotvisibleto
userprograms.
7

11.11.Considertheexampleofamainprogramcallingasubroutine,asshowninFig-
ure11.4.Thephysicalregisteraddressesofregistersusedbythemainprogram
arethesameasthelogicalregisteraddressesusedinthemainprograminstruc-
tions.However,thelogicalregisteraddressesabove31usedbyinstructionsin
thesubroutinemusthave8addedtothemtogeneratethecorrectphysicalregister
addresses.
Thevalue8istherstoperandintheAlloc8,4instructionexecutedbythemain
program.Whenthatinstructionisexecuted,thevalue8isstoredinaprocessor
stateregisterassociatedwiththemainprogram.Afterthesubroutineisentered,
alllogicalregisteraddressesabove31issuedbyitsinstructionsmustbeadded,
inasmalladder,tothevalue(8)inthatregister.Theoutputofthisadderisthe
physicalregisteraddresstobeusedwhileinthesubroutine.
Theoperand7intheAlloc7,3instructionexecutedbythesubroutineisstored
inasecondprocessorstateregisterassociatedwiththesubroutine.Theoutputof
thatregisterisaddedinasecondaddertotheoutputoftherstadder.Afterthe
subroutinecallsasecondsubroutine,logicalregisteraddressesabove31issued
bythesecondsubroutinearesentintotherstadder.Theoutputofthesecond
adder(logicaladdress+8+7)isthephysicalregisteraddressusedwhileinthe
secondsubroutine.
Moreregister/adderpairsarecascadedontothisstructureasmoresubroutines
arecalled.Notethatlogicalregisteraddressesabove31arealwaysappliedto
therstadder;andtheoutputofthenthadderisthephysicalregisteraddress
tobeusedinthenthsubroutine.Allregistersandaddersareonly7bitswide
becausethelargestphysicalregisteraddressthatneedstobegeneratedis127.
8

11.12.Consideringcacheingeffectsonly,theaverageaccesstimeoverbothinstruction
anddataaccessesisafunctionofbothcachehitratesandmisspenalties(see
Sections5.6.2and5.6.3forgeneralexpressionsforaverageaccesstime).
Thehitratesinthe21264L1cacheswillbemuchhigherthaninthe21164L1
cachesbecausethe21264cachesareeighttimeslarger.Therefore,theaverage
accesstimeforaccessesthatcanbemadeon-chipwillbelargerinthe21164
becauseofthemisspenaltyingoingtoitson-chipL2cache.
Next,weneedtoconsidertheeffectonaverageaccesstimeofgoingtotheoff-
chipcachesineachsystem.Thetotalon-chipcachecapacity(112Kbytesin
the21164and128Kbytesinthe21264)isaboutthesameinboththesystems.
Therefore,wecanassumeaboutthesamehitrateforon-chipaccesses;sothe
effectonaverageaccesstimeofthemisspenaltiesingoingtotheoff-chipcaches
willbeaboutthesameineachsystem.
Finally,iftheoff-chipcacheshaveaboutthesamecapacity,theeffectonaverage
accesstimesofthemisspenaltiesingoingtothemainDRAMmemorieswillbe
aboutthesameineachsytem.
Thenetresultisthataverageaccesstimesinthe21264shouldbeshorterthanin
the21164,leadingtofasterprogramexecution,primarilybecauseofthedifferent
arrangementsoftheon-chipcaches.
11.13.HP3000program:
LOADA
LOADB
MPYM C
LOADD
MPYM E
ADD
LOADF
MPYM G
LOADH
MPYM I
DIV
DEL Combinedwithprevious
instruction.
ADD
MPY Combinedwithprevious
instruction.
STOR W
9

11.14.Procedureigenerates8wordsofdata,Procedurejgenerates10wordsofdata,
andProcedurekgenerates3wordsofdata.Then,thetopwordsinthestackhave
thefollowingcontents:
_ _
TOS
[SR]
i
Returnaddress
i
[Indexreg.]
i
12
14
DQ
i
DI
1-DI
8
[Indexreg.]
j
Returnaddress
j
[SR]
j
[Indexreg.]
k
Returnaddress
k
[SR]
k
DJ
1-DJ
10
12
14
DK
2
DK
3
DK
1



10

11.15.HP3000program:
LOADA
ADDM B
LOADC
ADDM D
MPY
LOADD
MPYM E
ADD
STOR W
ARMprogram:
LDRR0,A
LDRR1,B
ADDR0,R0,R1
LDRR1,C
LDRR2,D
ADDR1,R1,R2
LDRR3,E
MULR2,R2,R3
MLAR0,R0,R1,R2
STRR0,W
68000program(assume16-bitoperands):
MOVEA,D0
ADD B,D0
MOVEC,D1
ADD D,D1
MULS D1,D0
MOVED,D1
MULS E,D1
ADD D1,D0
MOVED0,W
11

IA-32program:
MOVEAX,A
ADD EAX,B
MOVEBX,C
ADD EBX,D
IMULEAX,EBX
MOVEBX,D
IMULEBX,E
ADD EAX,EBX
MOVW,EAX
11.16.Four
11.17.Fourandtwo
12

Chapter12–LargeComputerSystems
12.1.Apossibleprogramis:
LOOPMove 0,STATUS
Move CURRENT,R1
Move R1,R2
Move R1,Rnet
ShiftrightRnet
Add Rnet,R2 Addcurrentvaluefromleft
Move R1,Rnet
ShiftleftRnet
Add Rnet,R2 Addcurrentvaluefromright
Move R1,Rnet
Shiftup Rnet
Add Rnet,R2 Addcurrentvaluefrombelow
Move R1,Rnet
ShiftdownRnet
Add Rnet,R2 Addcurrentvaluefromabove
Divide 5,R2 Averageallvevalues
Move R2,CURRENT
SubtractR2,R1
AbsoluteR1
SubtractEPSILON,R1
Skipif0
Move 1,STATUS
fControlprocessorANDsallSTATUSagsandexitsLOOPifresultis1;
otherwise,LOOPisrepeated.g
END LOOP
12.2.Assumethateachbushas64addresslinesand64datalines.Therearetwocases
toconsider.
i)Foruncachedreads,eachreadwithasplit-transactionbusrequires2T,con-
sistingof1Ttosendtheaddresstomemoryand1Ttotransferthedatatothe
processor.Usingaconventionalbus,ittakes6Tbecauseofthe4Tdelayin
readingthecontentsofthememory.Therefore,3conventionalbuseswouldgive
approximatelythesameperformanceasthesplit-transactionbus.
ii)Forcachedreads,itisnecessarytoconsiderthesizeofthecacheblock.
Assumethatthissizeis64bytes;therefore,ittakes8clockcyclestotransferan
entireblockoverthebus.
1

Usingasplit-transactionbusitispossibletouseallcyclestotransfereitherread
requests(addresses)ordata;therefore,ittakes9Tperread(notinconsecutive
clockcycles!).Usingaconventionalbuseachreadtakes13T(consecutiveclock
cycles).Thus,4ofthese13cyclesarewastedwaitingforthememoryresponse.
Thismeansthatinthiscasealsoitwouldbenecessarytouse3conventional
busestoobtainapproximatelythesameperformance.
12.3.Theperformancewouldnotimprovebyafactorof4,becausesomebustransac-
tionsinvolveuncachedreadsandwrites.Sinceuncachedaccessesinvolveonly
onewordofdata,theyuseonlyonequarterofthe4-wordwidebus.Ofcourse,
theoverallperformancewoulddependontheratioofcachedanduncachedac-
cesses.
12.4.Assumenisapowerof2becauseoftheformoftheshufenetwork.
Crossbarcost=n
2
.
Shufenetworkcost=2(n=2)log
2
n.
Solvingforsmallestnsatisfyingn
2
5[2(n=2)log
2
n]wherenisapowerof2,
givesn5log
2n.Atn=16,inequalityisnotsatised.Atn=32,inequalityis
satised.Therefore,thesmallestnis32.
12.5.Thenetworkis
Notethatthedenitionoftheshufepatternmustbegeneralizedinsuchaway
thatforeachsourceinputthereisapath(infact,exactlyonepath)toeachdesti-
nationoutput.
Costofnetworkbuiltfrom22switchesis(n=2)log
2n.
Costofnetworkbuiltfrom44switchesis4(n=4)log
4
n=n(log
2
n=log
2
4)=
(n=2)log
2n.
Therefore,thecostofthetwotypesofnetworksisthesame.
2

Blockingprobability:The44switchisanonblockingcrossbar,andcanbe
builtfrom22switchesas
Butthisisablockingnetwork.Therefore,theblockingprobabilityofalarge
networkbuiltfrom44switchesislowerthanonebuiltfrom22switches.
12.6.Programstructure:
SequentialsegmentS1(ktimeunits)
PARsegmentP1(1timeunit)
SequentialsegmentS2(ktimeunits)
PARsegmentP2(1timeunit)
SequentialsegmentS3(ktimeunits)
T1=3k+2k
Tn=3k+2d(k=n)e
Speedup=(5k)=(3k+2d(k=n)e)
Limitingvalueforspeedupis5/3.Thisshowsthatthesequentialsegmentsofa
programseverelylimitthespeedupwhenthesequentialsegmentstakeaboutthe
sametimetoexecuteasthetimetakentoexecutethePARsegmentsonasingle
processor.
12.7.Then-dimensionalhypercubeissymmetricwithrespecttoanynode.Thedis-
tancebetweennodesxandyisthenumberofbitpositionsthataredifferent
intheirbinaryaddresses.Thenumberofnodesthatarekhopsawayfromany
particularnodeis(
n
k
).Therefore,theaveragedistanceamessagetravelsis
[
n
X
k=1
k(
n
k)]=(2
n
1)
whichsimpliesto[2
n1
n]=(2
n
1),andislessthan(1+n)=2,ascanbe
veriedbytryingafewvalues.Forlargen,theaveragedistanceapproachesn/2.
12.8.WhenaTest-and-Setinstruction“fails,”thatis,whenthelockwasalreadyset,
thetaskshouldcalltheoperatingsystemtohaveitstasknamequeuedandto
allowsomeothertasktoexecute.Whenthetaskholdingthelockwishesto
releasethelock(setitto0),thetaskcallstheoperatingsystemtodoso,andthen
theoperatingsystemdequeuesandrunsoneofthewaitingtaskswhichisthen
3

theoneowningthelock.Ifnotaskiswaiting,thelockiscleared(=0)tothefree
state.
12.9.Thedetailsofhoweitherinvalidationorupdatingcanbeimplementedarede-
scribedinSection12.6.2,andtheadvantages/disadvantagesofthetwotech-
niquescanbededuceddirectlyfromthatdiscussion.Ingeneral,itwouldseem
thatinvalidationandwrite-backofdirtyvariablesresultsinlessbustrafcand
eliminatespotentiallywastedcacheupdatingoperations.However,cachehit
ratesmaybeloweredbyusingthisstrategy.Updatingassociatedwithawrite
throughpolicymayleadtohigherhitratesandmaybesimplertoimplement,
butmaycauseunacceptablyhighbustrafcandwastedupdateoperations.The
detailsofhowreadsandwritesonsharedcachedblocks(lines)arenormally
interleavedfromdistinctprocessorsinsomeclassofapplicationswillactually
determinewhichcoherencestrategyismostappropriate.
12.10.No.Ifcoherencecontrolsarenotused,asharedvariableincacheBmaynotget
updated/invalidatedwhenitiswrittentoincacheAwhileA'sprocessorhasmu-
tuallyexclusiveaccess.Later,whenB'sprocessoracquiresmutuallyexclusive
access,thevariablewillbeincorrect.
12.11.InFigure12.18,boththreadscontinuouslywritethesamesharedvariabledotproduct;
hence,thisisdoneserially.InFigure12.19,eachthreadupdatesitslocalvariable
localdotproduct,whichisdoneinparallel.Therefore,ifverylargevectorsare
used(sothattheactualcomputationofthedotproductdominatestheprocessing
time),theprograminFigure12.19maygivealmosttwiceasgoodperformance
astheprograminFigure12.18.
12.12.Itisonlynecessarytocreate3newthreads(ratherthanjustoneinFigure12.19),
andassignprocessingofonequarterofeachvectortoeachthread.
12.13.Theonlysignicantmodicationisfortheprogramwithid=0tosendone
quarterofeachvectortoprogramswithid=1,2,3.Havingcompletedthedot-
productcomputation,eachprogramwithid>0sendstheresulttotheprogram
withid=0,whichthenprintsthenalresult.
12.14.Overheadincreatingathreadisthemostsignicantconsideration.Otherover-
headisencounteredinthelockandbarriermechanisms.Assumethatthethread
overheadis300timesgreaterthantheexecutiontimeofthestatementthatcom-
putesthenewvalueofthedotproductforagivenvalueofk.Also,assumethat
theoverheadforlockandbarriermechanismsisonly10timesgreater.
Then,asaroughapproximation,thevectorsmusthaveatleast3202=640
elementsbeforeanyspeedupwillbeachieved.
12.15.Thedominantfactorinmessagepassingistheoverheadofsendingandreceiving
messages.Assumethattheoverheadofeithersendingorreceivingamessageis
1000timesgreaterthantheexecutiontimeofthestatementthatcomputesthe
newvalueofthedotproductforagivenvalueofk.Then,sincethereare3
4

sendand3receivemessagesinvolved,thevectorswillhavetohaveatleast
10006=6000elementsbeforeanyspeedupisachieved.
Notethatwehaveassumedthattheoverheadof1000isindependentofthesize
ofthemessage–asarstorderapproximation.
12.16.Theshared-memorymultiprocessorcanemulatethemessage-passingmulticom-
putereasierthantheotherwayaround.Theactofmessage-passingcanbeim-
plementedbythetransferof(message)bufferpointersorcomplete(message)
buffersbetweenthetwocommunicatingprocessesthatotherwiseonlyoperatein
theirownassignedareaofmainmemory.Amulticomputersystemcanemulate
amultiprocessorbyconsideringtheaggregateofallofthelocalmemoriesofthe
individualcomputersasthesharedmemoryofthemultiprocessor.Accessfrom
acomputertoanonlocalcomponentofthesharedmemorycanbefacilitatedby
passingmessagesbetweenthetwocomputersinvolved.Thisisacumbersome
andslowprocess.
12.17.Thesituationdescribedispossible.ConsiderstationsA,B,andC,situatedatthe
leftend,middle,andrightendofthebus,respectively.StationAstartstosenda
messagepacketof0:25durationtodestinationstationBattimet0.Thepacket
isobservedandcopiedintostationBduringtheinterval[t0+0:5;t0+0:75].
Justbeforet0+,stationCbeginstotransmitapackettosomeotherstation.
ItimmediatelycollideswithA'spacket,andthegarbledsignalarrivesbackat
stationAjustbeforet0+2.
12.18.(a)TheF/Ebitistested.Ifitis1(denoting”full”),thenthecontentsofBOXLOC
areloadedintoregisterR0,F/Eissetto0(denoting“empty”),andexecution
continueswiththenextsequentialinstruction.Otherwise(i.e.,for[F/E]=0),
nooperationsareperformedandexecutioncontrolispassedtotheinstructionat
locationWAITREC.
(b)Inthemultiprocessorsystemwiththemailboxmemory,eachone-wordmes-
sageissentfromT1toT2byusingthesingleinstructions:
SENDPUTR0,BOXLOC,SEND (1)
and
REC GETR0,BOXLOC,REC (2)
intasksT1andT2,respectively,assumingthat[F/E]=0initially.
Inthesystemwithoutthemailboxmemory,replace(1)intaskT1withthese-
quence:
WLOCK TAS.B WRITE
BMI WLOCK
MOV.WR0,LOC
CLR.B READ
andreplace(2)intaskT2withthesequence:
5

RLOCK TAS.B READ
BMI RLOCK
MOV.WLOCK,R0
CLR.B WRITE
LetthenotationV(7)standforbitb7ofbytelocationV.Ordinarywordlo-
cationLOCrepresentsthedataeldofmailboxlocationBOXLOC,andthe
combinationofWRITE(7)andREAD(7)representstheF/Ebitassociatedwith
BOXLOC.
Inparticular,[WRITE(7)]=0meansthatLOCisempty,and[READ(7)]=0
meansthatLOCisfull.
Initially,whenLOCisempty,thesettingsmustbe[WRITE(7)]=0and[READ(7)]
=1.NotethatwhentheinstructionMOV.Wisbeingexecutedineithertask,we
have[WRITE(7)]=[READ(7)]=1,indicatingthatLOCiseitherbeinglledor
emptied.Alsonotethatitisneverthecasethat[WRITE(7)]=[READ(7)]=0.
Thissolutionworkscorrectlyforthegeneralcasewhereanumberoftaskspass
datathroughLOC.
Forthecasesuggestedintheproblem,withasingletaskT1andasingletask
T2,thefollowingsequencesaresufcient.InT1,use:
TESTW TST.BFULL
BNE TESTW
MOV.WR0,LOC
MOV.B#1,FULL
InT2use:
TESTRTST.BFULL
BEQ TESTR
MOV.WLOC,R0
CLR.B FULL
Inthiscase,FULLplaystheroleoftheF/EbitofBOXLOC(with[FULL]=0
initially),andtheTASinstructionisnotneeded.
6