Processor Organization and Architecture

18,390 views 130 slides May 08, 2018

About This Presentation

The topic focuses on different aspects of processor organization and architecture such as architecture models, register organization, instruction formats, addressing modes etc.


Slide Content

MODULE 2
PROCESSOR ORGANIZATION AND ARCHITECTURE
By Prof. Vinit Raut

Classification of Processors
•Categorized by memory organization
Von Neumann architecture
Harvard architecture
•Categorized by instruction type
CISC
RISC
VLIW

Von Neumann Model
•In 1946, John von Neumann and his colleagues began the design of a new stored-program computer, referred to as the IAS (Institute for Advanced Study) computer.
•It stores program and data in the same memory.
•It was designed to overcome the limitation of the earlier ENIAC computer.
•The limitation of the ENIAC computer: the task of entering and altering programs for the ENIAC was extremely tedious.

Structure of IAS computer
The IAS consists of:
•A main memory, which stores both data and instructions
•An ALU capable of operating on binary data
•A control unit, which interprets the instructions in memory and causes them to be executed
•I/O equipment operated by the control unit

IAS Memory Formats
The memory of the IAS consists of 1000 storage locations, called words, of 40 binary digits (bits) each.
Both data and instructions are stored there.
Each number is represented by a sign bit and a 39-bit value.
Number word: bit 0 = sign bit, bits 1-39 = value
A word may also contain two 20-bit instructions, with each instruction consisting of an 8-bit operation code (opcode) specifying the operation to be performed and a 12-bit address designating one of the words in memory.
Instruction word: bits 0-7 = left opcode, bits 8-19 = left address, bits 20-27 = right opcode, bits 28-39 = right address
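
The instruction-word layout above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the opcode values in the usage note are made up, and the left instruction is placed in the high-order bits because the slides number bit 0 at the left.

```python
# Field widths taken from the slide: a 40-bit word holds two 20-bit
# instructions, each with an 8-bit opcode and a 12-bit address.
OPCODE_BITS = 8
ADDRESS_BITS = 12
INSTR_BITS = OPCODE_BITS + ADDRESS_BITS  # 20


def pack_instruction_word(left, right):
    """Pack two (opcode, address) pairs into one 40-bit integer."""
    def pack_one(opcode, address):
        if not 0 <= opcode < (1 << OPCODE_BITS):
            raise ValueError("opcode does not fit in 8 bits")
        if not 0 <= address < (1 << ADDRESS_BITS):
            raise ValueError("address does not fit in 12 bits")
        return (opcode << ADDRESS_BITS) | address

    # Assumption: the left instruction occupies the high-order 20 bits.
    return (pack_one(*left) << INSTR_BITS) | pack_one(*right)


def unpack_instruction_word(word):
    """Split a 40-bit word back into ((opcode, address), (opcode, address))."""
    def unpack_one(instr):
        return instr >> ADDRESS_BITS, instr & ((1 << ADDRESS_BITS) - 1)

    left = word >> INSTR_BITS
    right = word & ((1 << INSTR_BITS) - 1)
    return unpack_one(left), unpack_one(right)
```

For example, `pack_instruction_word((0x01, 500), (0x05, 501))` yields a value below 2**40, and `unpack_instruction_word` recovers the same two pairs.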

Harvard Architecture
•Physically separate storage and signal pathways for instructions and data.
•Originated from the Harvard Mark I relay-based computer, which stored
Instructions on punched tape (24 bits wide)
Data in electro-mechanical counters
•In some systems, instructions can be stored in read-only memory, while data memory generally requires read-write memory.
•In some systems, there is much more instruction memory than data memory.
•Used in MCS-51, MIPS, etc.

Register Organization
•The CPU must have some working space (temporary storage) called registers.
•A computer system employs a memory hierarchy.
•At the highest level of the hierarchy, memory is faster, smaller and more expensive per bit.
•Within the CPU, there is a set of registers, which can be treated as memory at the highest level of the hierarchy.

Register Organization
•The registers in the CPU can be categorized into two groups:
1. User-visible registers:
–These enable the machine- or assembly-language programmer to minimize main memory references by optimizing the use of registers.
2. Control and status registers:
–These are used by the control unit to control the operation of the CPU.
–Operating system programs may also use these in privileged mode to control the execution of programs.

User-visible registers
•General Purpose
•Data
•Address
•Condition Codes

1. General Purpose Registers:
Used for a variety of functions by the programmer.
Sometimes used for holding operands (data) of an instruction.
Sometimes used for addressing functions (e.g., register indirect, displacement).
2. Data registers:
Used to hold only data.
Cannot be employed in the calculation of an operand address.

3. Address registers:
Used exclusively for the purpose of addressing.
Examples include the following:
1. Segment pointer:
–In a machine with segment addressing, a segment register holds the address of the base of the segment.
–There may be multiple registers, one for the code segment and one for the data segment.
2. Index registers:
–These are used for indexed addressing and may be auto-indexed.
3. Stack pointer:
–A dedicated register that points to the top of the stack.
–Auto-incremented or auto-decremented by PUSH and POP operations.

4. Condition Code Registers:
Sets of individual bits
•e.g. result of last operation was zero
Can be read (implicitly) by programs
•e.g. jump if zero
Cannot (usually) be set by programs

Control and status registers
•Four registers are essential to instruction execution:
1. Program Counter (PC):
–Contains the address of the next instruction to be fetched.
2. Instruction Register (IR):
–Contains the instruction most recently fetched.
3. Memory Address Register (MAR):
–Contains the address of the main memory location from which information is to be fetched or to which information is to be stored.
4. Memory Buffer Register (MBR):
–Contains a word of data to be written to memory or the word most recently read.

Control and status registers
Program Status Word (PSW)
Condition code bits are collected into one or more registers, known as the program status word (PSW), that contain status information.
Common fields or flags include the following:
•Sign: Contains the sign bit of the result of the last arithmetic operation.
•Zero: Set when the result is zero.
•Carry: Set if an operation resulted in a carry (addition) into or borrow (subtraction) out of a high-order bit.
•Equal: Set if a logical compare result is equal.
•Overflow: Used to indicate arithmetic overflow.
•Interrupt enable/disable: Used to enable or disable interrupts.
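
As a rough illustration of how the sign, zero, carry and overflow flags relate to an addition, here is a minimal Python sketch. The function name, the dictionary layout and the default 8-bit width are assumptions made for the example; they do not describe any particular processor's PSW.

```python
def add_with_flags(a, b, bits=8):
    """Add two unsigned `bits`-wide values and derive PSW-style flags."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    raw = (a & mask) + (b & mask)
    result = raw & mask
    flags = {
        "sign": bool(result & sign_bit),   # copy of the result's sign bit
        "zero": result == 0,               # result is all zeros
        "carry": raw > mask,               # carry out of the high-order bit
        # Signed overflow: both operands have the same sign and the
        # result's sign differs from it.
        "overflow": bool(~(a ^ b) & (a ^ result) & sign_bit),
    }
    return result, flags
```

With 8-bit operands, `add_with_flags(0x7F, 0x01)` sets sign and overflow but not carry, while `add_with_flags(0xFF, 0x01)` sets zero and carry but not overflow.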

Register organization of INTEL 8086 processor
•16-bit flags, Instruction Pointer
•General Registers, 16 bits
AX–Accumulator, favored in calculations
BX–Base, normally holds an address of a variable or function
CX–Count, normally used for loops
DX–Data, normally used for multiply/divide
•Segment, 16 bits
SS–Stack, base segment of stack in memory
CS–Code, base location of code
DS–Data, base location of variable data
ES–Extra, additional location for memory data

Register organization of INTEL 8086 processor
•Index, 16 bits
BP–Base Pointer, offset from SS for locating subroutines
SP–Stack Pointer, offset from SS for top of stack
SI–Source Index, used for copying data/strings
DI–Destination Index, used for copying data/strings
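
The slides describe segment registers as base locations and the index and pointer registers as offsets. The standard 8086 real-mode rule for combining the two (shift the segment left by 4 bits, add the offset, keep 20 bits) is not stated in the deck, so the sketch below should be read as a supplementary illustration; the function name is hypothetical.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address.

    Assumes 8086 real-mode address formation: segment * 16 + offset,
    wrapped to 20 bits.
    """
    return ((segment << 4) + offset) & 0xFFFFF
```

For instance, a stack segment SS = 0x1234 with SP = 0x0010 gives the physical address 0x12350.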

INSTRUCTION FORMAT
•The operation of the computer system is determined by the instructions executed by the central processing unit.
•These instructions are known as machine instructions and are in the form of binary codes.
•Each instruction of the CPU has specific information fields which are required to execute it.
•These information fields of instructions are called elements of instruction.

Elements of Instruction
1. Operation Code:
Binary code that specifies which operation is to be performed.
2. Source operand address:
Specifies one or more source operands.
3. Destination operand address:
The operation executed by the CPU may produce a result, which is stored at the destination address.
4. Next instruction address:
Tells the CPU from where to fetch the next instruction after the current instruction has finished executing.

Representation of Instruction
Opcode | Operand address 1 | Operand address 2

Instruction Types According to Number of Addresses

Three Address Instruction

Two Address Instruction

One Address Instruction

Zero Address Instruction
•The locations of the operands are defined implicitly.
•For implicit reference, a processor register is used, and it is termed the accumulator (AC).
•E.g. CMA //complements the content of the accumulator
•i.e. AC ← AC' (the complement of AC)

Instruction Format Design Issues:
•An instruction consists of an opcode and one or more operands, specified implicitly or explicitly.
•Each explicit operand is referenced using one of the addressing modes available on that machine.
•An instruction format defines the layout of the bits allocated to these elements of an instruction.
•Some of the issues affecting instruction design are:
1. Instruction length
2. Allocation of bits for different fields in an instruction
3. Variable-length instructions

1. Instruction Length
•A longer instruction means more time spent fetching the instruction.
•For example, a 32-bit instruction on a machine with a 16-bit word size needs two memory fetches to bring in the instruction.
•The programmer desires:
–More opcodes and operands in an instruction, as this reduces the program length.
–More addressing modes, for greater flexibility in accessing various types of data.
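
The fetch count in the example above is a ceiling division of instruction length by word size. A minimal sketch, with a hypothetical function name and the simplifying assumption that exactly one word is transferred per memory fetch:

```python
def fetches_needed(instruction_bits, word_bits):
    """Number of memory fetches needed to bring in one instruction.

    Assumes one word is transferred per fetch.
    """
    return -(-instruction_bits // word_bits)  # ceiling division
```

`fetches_needed(32, 16)` reproduces the two fetches mentioned on the slide.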

Factors for deciding the instruction length:
A. Memory Size
–More bits are required in the address field to access a larger memory range.
B. Memory Organization
–If the system supports virtual memory, the addressable memory range is larger than the physical memory, so more addressing bits are required.
C. Bus Structure
–The instruction length should be equal to the data bus width or a multiple of it.
D. Processor Speed
–The data transfer rate from memory should match the processor speed.

2. Allocation of Bits
•More opcodes mean more bits in the opcode field.
•Factors considered when choosing the number of addressing bits are:
A. Number of addressing modes:
•More addressing modes need more bits.
B. Number of operands:
•More operands need more bits.
C. Register versus memory:
•The more that registers can be used for operand references, the fewer bits are needed, because the number of registers is far smaller than the memory size.

2. Allocation of Bits
D. Number of register sets:
•Assume that a machine has 16 general purpose registers; a register address then requires 4 bits.
•However, if these 16 registers are divided into two groups of 8, then selecting one of the 8 registers within a group needs only 3 bits.
E. Address range:
•The range of addresses that can be referenced is related to the number of address bits.
•With displacement addressing, the range is opened up to the length of the address register.
F. Address granularity:
•In a system with 16- or 32-bit words, an address can reference a word or a byte, at the designer's choice.
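
The register-set arithmetic above follows from the fact that selecting one of n items needs ceil(log2(n)) bits. A small sketch, with a hypothetical function name:

```python
def field_bits(choices):
    """Minimum number of bits needed to select one of `choices` items."""
    if choices < 1:
        raise ValueError("need at least one item to select from")
    return (choices - 1).bit_length()
```

`field_bits(16)` gives 4 and `field_bits(8)` gives 3, matching the two cases on the slide. The same function applies to address fields: more addressable locations need a wider field.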

3. Variable Length Instruction
•Instead of using a fixed-length instruction format, the designer may choose to provide a variety of instruction formats of different lengths.
•Addressing can be more flexible, with various combinations of register and memory references plus addressing modes.
•Disadvantage: an increase in the complexity of the CPU.

Concept of Program Execution
•The instructions constituting a program to be executed by a computer are loaded into sequential locations in its main memory.
•The processor fetches one instruction at a time and performs the operation specified.
•Instructions are fetched from successive memory locations until a branch or a jump instruction is encountered.
•The processor keeps track of the address of the memory location containing the next instruction to be fetched using the Program Counter (PC).
•The Instruction Register (IR) holds the instruction most recently fetched.

Executing an Instruction
1. Fetch the contents of the memory location pointed to by the PC. The contents of this location are loaded into the IR (fetch phase).
IR←[[PC]]
2. Assuming that the memory is byte addressable and each instruction is 4 bytes long, increment the contents of the PC by 4 (fetch phase).
PC←[PC]+4
3. Carry out the actions specified by the instruction in the IR (execution phase).
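
The three steps can be sketched as a small Python loop. Everything specific here is an assumption made for the example: memory is a dictionary keyed by byte address, each instruction is a tuple occupying 4 bytes, and the LOADI, ADD and HALT operations are hypothetical.

```python
def run(memory, pc=0, max_steps=100):
    """Run a toy fetch/execute loop and return (registers, final PC)."""
    registers = {"R0": 0, "R1": 0}
    for _ in range(max_steps):
        ir = memory[pc]            # step 1: IR <- [[PC]]
        pc = pc + 4                # step 2: PC <- [PC] + 4
        op = ir[0]                 # step 3: execute the instruction in IR
        if op == "LOADI":          # hypothetical: load an immediate value
            registers[ir[1]] = ir[2]
        elif op == "ADD":          # hypothetical: dest <- dest + src
            registers[ir[1]] += registers[ir[2]]
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown operation {op!r}")
    return registers, pc
```

A four-instruction program placed at addresses 0, 4, 8 and 12 is fetched in that order, because the PC advances by 4 after every fetch.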

Addressing Modes
•The term addressing mode refers to the mechanism employed for specifying operands.
•An operand can be specified as part of the instruction, or a reference to its memory location can be given.
•An operand can also be held in a CPU register, whose address is given in the instruction.
•The most common addressing techniques are:
Immediate
Direct
Indirect
Register
Register Indirect
Displacement
Stack

Addressing Modes
To explain the addressing modes, we use the following notation:
A = contents of an address field in the instruction that refers to a memory location
R = contents of an address field in the instruction that refers to a register
EA = actual (effective) address of the location containing the referenced operand
(X) = contents of memory location X or register X

1. Immediate Addressing:
•The operand is actually present in the instruction.
•OPERAND = A
•This mode can be used to define and use constants or to set initial values of variables.
•The advantage of immediate addressing is that no memory reference other than the instruction fetch is required to obtain the operand.
•e.g. MOVE R0, 300
[Figure: Immediate addressing]

2. Direct Addressing
•The address field contains the effective address of the operand: EA = A
•It requires only one memory reference and no special calculation.
•Here, 'A' indicates the memory address field for the operand.
•e.g. MOVE R1, 1001
[Figure: Direct addressing]

3. Indirect Addressing
•The effective address of the operand is stored in memory, and the instruction contains the address of the memory location that holds the address of the data.
•This is known as indirect addressing:
EA = (A)
•Here 'A' indicates the memory address field that points to the address of the required operand.
•E.g. MOVE R0, (1000)
[Figure: Indirect addressing]

4. Register Addressing
•The instruction specifies the address of the register containing the operand.
•The instruction contains the name of a CPU register.
EA = R indicates a register where the operand is present.
•E.g. MOVE R1, 1010
[Figure: Register addressing]

5. Register Indirect Addressing
•The effective address of the operand is stored in a register, and the instruction contains the address of the register that holds the address of the data.
•EA = (R)
•Here 'R' indicates the register that holds the memory address of the required operand.
•E.g. MOVE R0, (R1)
[Figure: Register indirect addressing]

6. Displacement Addressing
•A combination of both direct addressing and register indirect addressing modes.
EA = A + (R)
•The value contained in one address field (value = A) is used directly. The other address field refers to a register whose contents are added to A to produce the effective address.
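
The six modes covered so far differ only in how the operand is located. The sketch below puts the EA rules side by side; it assumes memory and registers are plain Python dictionaries, and the function name and mode strings are hypothetical.

```python
def fetch_operand(mode, field, memory, registers, displacement=0):
    """Return the operand selected by `field` under the given addressing mode."""
    if mode == "immediate":            # OPERAND = A
        return field
    if mode == "direct":               # EA = A
        return memory[field]
    if mode == "indirect":             # EA = (A)
        return memory[memory[field]]
    if mode == "register":             # EA = R, operand is in the register
        return registers[field]
    if mode == "register_indirect":    # EA = (R)
        return memory[registers[field]]
    if mode == "displacement":         # EA = A + (R), A passed as displacement
        return memory[displacement + registers[field]]
    raise ValueError(f"unknown addressing mode {mode!r}")
```

Note how the number of dictionary lookups into `memory` mirrors the number of memory references: none for immediate and register, one for direct, register indirect and displacement, and two for indirect.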

6. Displacement Addressing
Three of the most common uses of displacement addressing are:
Relative addressing
Base-register addressing
Indexing

Relative addressing
•For relative addressing, the implicitly referenced register is the program counter (PC).
•The current instruction address is added to the address field to produce the EA.
•Thus, the effective address is a displacement relative to the address of the instruction.
•e.g. 1001 JC X1
1050 X1: ADD R1, 5
•X1 = address of the target instruction - address of the current instruction
= 1050 - 1001 = 49
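
The displacement arithmetic in this example can be checked with two one-line functions. They follow the slide's convention that the displacement is measured from the address of the current instruction; many real processors measure it from the address of the next instruction instead, which would change the stored value. The function names are hypothetical.

```python
def relative_displacement(current, target):
    """Displacement stored in the address field of a relative jump."""
    return target - current


def relative_target(current, displacement):
    """Effective address: EA = (PC) + A, with PC at the current instruction."""
    return current + displacement
```

With the jump at 1001 and the target at 1050, the stored displacement is 49, and adding it back to 1001 recovers 1050.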

Base Register Addressing
•The base register (reference register) contains a memory address, and the address field contains a displacement from the base address specified by the base register.
EA = A + (B)

Indexing or Indexed Addressing
•Used to access the elements of an array, which are stored in consecutive locations of memory.
EA = A + (R)
•Address field A gives a main memory address, and R contains a positive displacement with respect to that base address.
•The displacement can be specified either directly in the instruction or through another register.
•E.g. MOVE R1, (BR+5)
MOVE R0, (BR+R1)
[Figure: starting address plus offset (index)]
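
Indexed addressing is what makes a loop over an array possible with a single instruction body. A minimal sketch, assuming memory is a dictionary keyed by address and each element occupies one location; the function name is hypothetical.

```python
def sum_array(memory, base, length):
    """Sum `length` consecutive locations starting at address `base`."""
    total = 0
    index_register = 0
    while index_register < length:
        ea = base + index_register    # EA = A + (R)
        total += memory[ea]
        index_register += 1           # step to the next element
    return total
```

Here `base` plays the role of the address field A and `index_register` the role of R.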

Auto Indexing
•Index registers are generally used for iterative tasks, so there is typically a need to increment or decrement the index register after each reference to it.
•Because this is such a common operation, some systems do it automatically as part of the same instruction cycle.
•This is known as auto-indexing.
•Two types of auto-indexing:
1. auto-incrementing
2. auto-decrementing

a. Auto Increment Mode
•Register R contains the address of the operand.
•After the operand is accessed, the contents of register R are incremented to point to the next item in the list.
•Auto-indexing using increment can be depicted as follows:
EA = A + (R) or EA = (R)+
(R) = (R) + 1
•E.g. MOVE R1, 1010 /* starting memory location 1010 is stored in R1 */
•ADD AC, (R1)+ /* contents of memory location 1010 are added to AC and the contents of R1 are incremented by 1 */

b. Auto Decrement Mode
•The contents of the register specified in the instruction are decremented, and the decremented contents are then used as the effective address of the operand.
•Auto-indexing using decrement can be depicted as follows:
EA = A – (R) or EA = –(R)
(R) = (R) – 1
•The contents of the register are decremented before being used as the effective address.
•E.g. ADD R1, -(R2)
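
The difference between the two auto-indexing modes is the order of the access and the register update: auto-increment uses the register and then increments it, auto-decrement decrements first and then uses it. A short sketch, assuming dictionary-based memory and registers, a step of 1, and hypothetical function names:

```python
def auto_increment(registers, name, memory):
    """(R)+ : use the register as the address, then increment it."""
    ea = registers[name]
    registers[name] += 1
    return memory[ea]


def auto_decrement(registers, name, memory):
    """-(R) : decrement the register, then use it as the address."""
    registers[name] -= 1
    return memory[registers[name]]
```

Starting with R1 = 1010, an auto-increment access reads location 1010 and leaves R1 = 1011; an auto-decrement access that follows moves R1 back to 1010 and reads the same location again.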

7. Stack Addressing
•A stack is a linear array or list of locations.
•Sometimes referred to as a pushdown list or last-in-first-out queue.
•Associated with the stack is a pointer whose value is the address of the top of the stack.
•The stack pointer is maintained in a register. Thus, references to stack locations in memory are in fact register indirect addresses.
•The stack mode of addressing is a form of implied addressing.
•E.g. PUSH and POP
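
PUSH and POP name no operand address because the stack pointer supplies it. The sketch below assumes a stack that grows toward lower addresses and a stack pointer that always points at the current top item; both are design choices that vary between machines, and the class name is hypothetical.

```python
class Stack:
    """Toy memory stack addressed through a stack pointer (SP)."""

    def __init__(self, memory, top):
        self.memory = memory   # dictionary keyed by address
        self.sp = top          # address just above the first item

    def push(self, value):
        self.sp -= 1                   # auto-decrement SP
        self.memory[self.sp] = value   # register indirect store via SP

    def pop(self):
        value = self.memory[self.sp]   # register indirect load via SP
        self.sp += 1                   # auto-increment SP
        return value
```

Pushing 1 and then 2 and popping twice returns 2 and then 1, which is the last-in-first-out behaviour described above.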

Basic Instruction Cycle
•The fetch cycle involves reading the next instruction from memory into the CPU and, along with that, updating the contents of the program counter.
•In the execution phase, the CPU interprets the opcode and performs the indicated operation.
•The instruction fetch and execution phases together are known as the instruction cycle.

Basic Instruction Cycle with Interrupt
An instruction cycle includes the following subcycles:
1. Fetch: Read the next instruction from memory into the processor.
2. Execute: Interpret the opcode and perform the indicated operation.
3. Interrupt: If interrupts are enabled and an interrupt has occurred, save the current process state and service the interrupt.
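
The three subcycles can be sketched as a loop that checks for a pending interrupt only after each instruction has finished executing. The representation is an assumption for illustration: `program` is a list of instruction names, and `pending` maps "number of instructions completed" to an interrupt name. Servicing is reduced to recording the event.

```python
def run_with_interrupts(program, pending, interrupts_enabled=True):
    """Return a trace of executed instructions and serviced interrupts."""
    pending = dict(pending)   # work on a copy, leave the caller's dict alone
    trace = []
    pc = 0
    while pc < len(program):
        instruction = program[pc]                 # fetch subcycle
        pc += 1
        trace.append(("execute", instruction))    # execute subcycle
        if interrupts_enabled and pc in pending:  # interrupt subcycle
            saved_pc = pc                         # save the current state
            trace.append(("service", pending.pop(pc)))
            pc = saved_pc                         # resume where we left off
    return trace
```

With interrupts disabled, the same program runs straight through and the pending interrupt is never serviced.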

The Indirect Cycle
•The execution of an instruction may involve one or more operands in memory, each of which requires a memory access.
•Further, if indirect addressing is used, then additional memory accesses are required.
•Fetching the indirect addresses requires one more instruction subcycle.
•After an instruction is fetched, it is examined to determine if any indirect addressing is involved. If so, the required operands are fetched using indirect addressing.

Instruction Cycle State Diagram
•Instruction address calculation (iac): Determine the address of the next instruction to be executed. Usually, this involves adding a fixed number to the address of the previous instruction.
•Instruction fetch (if): Read the instruction from its memory location into the processor.
•Instruction operation decoding (iod): Analyze the instruction to determine the type of operation to be performed and the operand(s) to be used.
•Operand address calculation (oac): If the operation involves reference to an operand in memory or available via I/O, then determine the address of the operand.
•Operand fetch (of): Fetch the operand from memory or read it in from I/O.
•Data operation (do): Perform the operation indicated in the instruction.
•Operand store (os): Write the result into memory or out to I/O.

Instruction Interpretation and Sequencing
•Every processor has some basic types of instructions, such as data transfer instructions, arithmetic and logical instructions, branch instructions and so on.
•To perform a particular task on the computer, it is the programmer's job to select and write appropriate instructions one after the other. This job of the programmer is known as instruction sequencing.
•Two types:
1. Straight line sequencing
2. Branch instruction

1. Straight line sequencing
•The processor executes a program with the help of the Program Counter (PC), which holds the address of the next instruction to be executed.
•To begin execution of a program, the address of its first instruction is placed into the PC.
•The processor control circuit fetches an instruction from the memory address specified by the PC and executes instructions one at a time.
•At the same time, the content of the PC is incremented so as to point to the address of the next instruction.
•This is called straight line sequencing.

2. Branch Instruction
•After executing a decision-making instruction, the processor has to follow one of two program sequences.
•A branch instruction transfers program control from one straight line sequence of instructions to another.
•In a branch instruction, the new address, called the target address or branch target, is loaded into the PC and the instruction is fetched from the new address.

Figure 7.1. Single-bus organization of the datapath inside a processor.

Register Transfers
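[Figure 7.2. Input and output gating for the registers in Figure 7.1. Labels: internal processor bus; register Ri with gates Riin and Riout; register Y with gate Yin; MUX with Select and Constant 4 feeding ALU input A, bus feeding ALU input B; register Z with gates Zin and Zout.]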

Register Transfers
•The data transfer between registers and the common bus is shown by a line with arrowheads.
•But in actual practice each register has input and output gating, and these gates are controlled by corresponding control signals.
•Riin and Riout: control signals for input and output gating of register Ri.
•When Riin = 1, the data available on the common data bus is loaded into register Ri.
•When Riout = 1, the contents of Ri are placed on the common data bus.
•The signals Riin and Riout are commonly known as the input enable and output enable signals of registers, respectively.

Register Transfers
•Consider the transfer of data from register R1 to R2. This can be done as follows:
1. Activate R1out = 1; this signal places the contents of register R1 on the common bus.
2. Activate R2in = 1; this loads data from the common data bus into register R2.
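A minimal sketch of this gating, assuming a hypothetical `SingleBus` class in which one register at most drives the bus per step and any number of registers may load from it. Both signals are asserted in the same step here, which is an assumption about timing rather than something the slide spells out.

```python
class SingleBus:
    """Toy model of registers attached to one shared bus."""

    def __init__(self, registers):
        self.reg = dict(registers)
        self.bus = None

    def step(self, out=None, ins=()):
        # "out" models Riout: the named register places its value on the bus.
        if out is not None:
            self.bus = self.reg[out]
        # Each name in "ins" models Riin: the register loads the bus value.
        for name in ins:
            self.reg[name] = self.bus


dp = SingleBus({"R1": 42, "R2": 0})
dp.step(out="R1", ins=["R2"])   # R1out, R2in
```

After the step, R2 holds a copy of R1 and R1 itself is unchanged.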

Micro Operations
•The primary function of a processor unit is to execute a sequence of instructions stored in memory.
•The sequence of operations involved in processing an instruction constitutes an instruction cycle, which can be divided into three major phases: fetch cycle, decode cycle and execute cycle.
•To perform the fetch, decode and execute cycles, the processor unit has to perform a set of operations called micro-operations.

Micro-operation includes
•Transfer a word of data from one CPU register to another or to the ALU.
•Perform an arithmetic or a logic operation on the data from a CPU register and store the result in a CPU register.
•Fetch the contents of a given memory location and load them into a CPU register.
•Store a word of data from a CPU register into a given memory location.

1. Performing an Arithmetic or Logic Operation
•The ALU is a combinational circuit that has no internal storage.
•The ALU gets its two operands from the MUX and the bus. The result is temporarily stored in register Z.
•What is the sequence of operations to add the contents of register R1 to those of R2 and store the result in R3?
Control Signals:
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in

2. Fetching a Word from Memory
•The processor loads the required address into MAR; at the same time it issues the Read signal.
•When the requested data is received from the memory, it is stored into MDR.

2. Fetching a Word from Memory
•Consider the instruction MOVE R2,(R1)
•The processor waits until it receives an indication that the requested operation has been completed (Memory-Function-Completed, MFC).
•The actions needed to execute this instruction are:
MAR ← [R1]
Activate the Read control signal to perform the read operation
If memory is slow, activate wait for memory function complete (WMFC)
Load MDR from the memory bus
R2 ← [MDR]

2. Fetching a Word from Memory
•The control signals that must be activated to perform the given action in each step:
Control Signals:
1. R1out, MARin, Read
2. WMFC
3. MDRout, R2in

3. Storing A Word In Memory:
To write a word of data into a memory location, the processor has to load the address of the desired memory location into the MAR, load the data to be written into the MDR, and activate the Write control signal.
Consider the instruction MOVE (R2),R1.
The actions needed to execute this instruction are:
MAR ← [R2]
MDR ← [R1]
Activate the Write control signal to perform the write operation
If memory is slow, activate wait for memory function complete (WMFC)

3. Storing A Word In Memory:
The control signals that must be activated to perform the given action in each step:
Control Signals:
1. R2out, MARin
2. R1out, MDRin, Write
3. WMFC

Execution of a Complete Instruction
•ADD R1,(R2)
This instruction adds the contents of register R1 and the contents of the memory location specified by register R2, and stores the result in register R1.
1. Fetch the instruction
2. Fetch the first operand (the contents of the memory location pointed to by R2)
3. Perform the addition
4. Load the result into R1

Execution of a Complete Instruction
ADD R1,(R2)
Control Signals:
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R2out, MARin, Read
5. MDRout, Yin, WMFC
6. R1out, SelectY, Add, Zin
7. Zout, R1in, End
Micro Instructions:
Instruction Fetch (1-3):
MAR ← PC
MDR ← M(MAR)
PC ← PC+1
IR ← MDR (opcode)
Operand Fetch (4):
MAR ← R2
MDR ← M(MAR)
Execute Cycle (5-7):
Y ← MDR
Z ← R1+Y
R1 ← Z
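The seven control steps for ADD R1,(R2) can be traced with a small interpreter. This is a sketch under simplifying assumptions, not a model of real hardware timing: the function name `run_sequence`, the dictionary-based register file and the memory contents are hypothetical, memory is assumed to respond within the step that issues Read (so WMFC does nothing here), and the instruction word is stored as a plain string.

```python
def run_sequence(state, memory, steps):
    """Apply control steps to a toy single-bus datapath.

    state:  dict of register values (PC, MAR, MDR, IR, Y, Z, R1, R2).
    memory: dict mapping address -> word.
    steps:  list of lists of control-signal names.
    """
    for signals in steps:
        bus, alu = None, None
        # At most one "...out" signal may drive the bus in a step.
        outs = [s for s in signals if s.endswith("out")]
        assert len(outs) <= 1, "only one register may drive the bus"
        if outs:
            bus = state[outs[0][:-3]]
        # ALU: input A comes from the MUX (Constant 4 or Y), input B from the bus.
        if "Add" in signals:
            a = 4 if "Select4" in signals else state["Y"]
            alu = a + bus
        # Input gating: Z loads the ALU result, other registers load the bus.
        for s in signals:
            if s == "Zin":
                state["Z"] = alu
            elif s.endswith("in"):
                state[s[:-2]] = bus
        # Assumption: memory answers immediately, so MDR is valid right away.
        if "Read" in signals:
            state["MDR"] = memory[state["MAR"]]
    return state


ADD_R1_IND_R2 = [
    ["PCout", "MARin", "Read", "Select4", "Add", "Zin"],
    ["Zout", "PCin", "Yin", "WMFC"],
    ["MDRout", "IRin"],
    ["R2out", "MARin", "Read"],
    ["MDRout", "Yin", "WMFC"],
    ["R1out", "SelectY", "Add", "Zin"],
    ["Zout", "R1in", "End"],
]

memory = {100: "ADD R1,(R2)", 200: 35}
start = {"PC": 100, "MAR": 0, "MDR": 0, "IR": None, "Y": 0, "Z": 0, "R1": 7, "R2": 200}
final = run_sequence(dict(start), memory, ADD_R1_IND_R2)
```

With R1 = 7 and the word 35 stored at the address held in R2, the trace leaves 42 in R1 and advances the PC by the constant 4 selected in step 1.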

Execution of Branch Instructions
•A branch instruction replaces the contents of the PC with the branch target address, which is usually obtained by adding an offset X given in the branch instruction to the current contents of the PC.
•The offset X is usually the difference between the branch target address and the address immediately following the branch instruction.

Execution of Branch Instructions
The control sequence for an unconditional branch instruction is as follows:
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
The first three steps constitute the opcode fetch operation. Step 2 also copies the updated PC into Y.
4. Offset_field_of_IRout, SelectY, Add, Zin
The contents of the PC (held in Y) and the offset field of the IR register are added and the result is stored in register Z.
5. Zout, PCin, End
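The offset arithmetic can be checked with a short worked example. The helper names and the addresses below are made up for illustration, and the instruction size of 4 is an assumption taken from the Constant 4 input of the MUX.

```python
INSTRUCTION_SIZE = 4  # assumed, matching the "Constant 4" MUX input


def branch_offset(branch_address, target_address):
    """Offset X: target minus the address immediately following the branch."""
    return target_address - (branch_address + INSTRUCTION_SIZE)


def branch_target(branch_address, offset):
    """New PC: the already-incremented PC plus the offset field."""
    updated_pc = branch_address + INSTRUCTION_SIZE
    return updated_pc + offset


forward = branch_offset(1000, 1048)    # branch at 1000 jumping ahead to 1048
backward = branch_offset(1000, 980)    # branch at 1000 jumping back to 980
```

A backward branch simply yields a negative offset; adding it to the updated PC recovers the target in both cases.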

Execution of Branch Instructions
In the case of a conditional branch instruction, the status of the condition code specified in the instruction is checked.
If the status specified within the instruction matches the current status of the condition codes, the branch target address is loaded into the PC by adding the offset specified in the instruction to the contents of the PC.
Otherwise, the processor fetches the next instruction in the sequence.

Quiz
•Write the control sequence with micro-program for
ADD R1,R2
including the instruction fetch phase. (Assume single bus architecture)
•Write the control sequence for
SUB (R3)+,R1
Hint:
R1 = R1 - (R3)
R3 = R3 + 1
[Figure 7.1. Single-bus organization of the datapath inside a processor. Labels: PC, MAR, MDR, IR, Y, Z, TEMP and general registers R0 to R(n-1) on the internal processor bus; MUX (Select, Constant 4) feeding ALU input A; ALU control lines (Add, Sub, XOR, Carry-in); instruction decoder and control logic producing the control signals; address and data lines to the memory bus.]

ADD R1,R2
Control Signals:
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R1out, Yin
5. R2out, SelectY, Add, Zin
6. Zout, R1in, End
(R1 and R2 cannot both drive the single bus in the same step, so the execute phase takes three steps.)
Micro Instructions:
Instruction Fetch (1-3):
MAR ← PC
MDR ← M(MAR)
PC ← PC+1
IR ← MDR (opcode)
Operand Fetch:
Not required
Execute Cycle (4-6):
Y ← R1
Z ← R2+Y
R1 ← Z

SUB (R3)+,R1
Control Signals:
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R3out, MARin, Read
5. MDRout, Yin, WMFC
6. R1out, SelectY, Sub, Zin
7. Zout, R1in
8. R3out, Select4, Add, Zin
9. Zout, R3in, End
Micro Instructions:
Instruction Fetch (1-3):
MAR ← PC
MDR ← M(MAR)
PC ← PC+1
IR ← MDR (opcode)
Operand Fetch (4):
MAR ← R3
MDR ← M(MAR)
Execute Cycle (5-9):
Y ← MDR
Z ← R1-Y
R1 ← Z
R3 ← R3+1

Arithmetic Logic Unit (ALU)
In this and the next section, we deal with the detailed design of typical ALUs and shifters.
Decompose the ALU into:
•An arithmetic circuit
•A logic circuit
•A selector to pick between the two circuits
Arithmetic circuit design
•Decompose the arithmetic circuit into:
An n-bit parallel adder
A block of logic that selects four choices for the B input to the adder

Arithmetic Circuit Design
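There are only four functions of B to select as Y in $G = A + Y$:

Y selected        $C_{in} = 0$                                           $C_{in} = 1$
All 0's           $G = A$ (Transfer of A)                                $G = A + 1$ (Increment)
$B$               $G = A + B$ (Addition)                                 $G = A + B + 1$
$\overline{B}$    $G = A + \overline{B}$ (Subtraction, 1's complement)   $G = A + \overline{B} + 1$ (Subtraction, 2's complement)
All 1's           $G = A - 1$ (Decrement)                                $G = A$ (Transfer of A)

[Figure: arithmetic circuit block diagram. The B input logic takes the n-bit input B and the select lines S1, S0 and produces Y; the n-bit parallel adder takes X = A, Y and $C_{in}$ and produces $G = X + Y + C_{in}$ and $C_{out}$.]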
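The eight arithmetic functions follow directly from the adder identity G = X + Y + Cin. The sketch below assumes a 4-bit word and selects Y by name instead of by (S1, S0) code, since the code assignment is not legible in this slide; `arithmetic_circuit` and its parameters are hypothetical names.

```python
def arithmetic_circuit(a, b, select, c_in, n=4):
    """Return (G, Cout) for an n-bit adder with a selectable Y input.

    select is one of "zeros", "B", "not_B", "ones" (names are illustrative).
    """
    mask = (1 << n) - 1
    y_choices = {
        "zeros": 0,
        "B": b & mask,
        "not_B": ~b & mask,   # bitwise complement of B within n bits
        "ones": mask,
    }
    total = (a & mask) + y_choices[select] + c_in
    return total & mask, total >> n


# With A = 5 and B = 3:
transfer = arithmetic_circuit(5, 3, "zeros", 0)
increment = arithmetic_circuit(5, 3, "zeros", 1)
addition = arithmetic_circuit(5, 3, "B", 0)
subtract = arithmetic_circuit(5, 3, "not_B", 1)   # A + not(B) + 1 = A - B
decrement = arithmetic_circuit(5, 3, "ones", 0)   # A + all ones = A - 1
```

Subtraction and decrement produce a carry-out of 1 because adding the complement (or all ones) wraps around the n-bit range.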

Logic Circuit
The text gives a circuit implemented using a multiplexer plus gates implementing AND, OR, XOR and NOT.
Here we custom design a circuit for bit $G_i$ by beginning with a truth table organized as a logic operation K-map and assigning (S1, S0) codes to AND, OR, etc.
$G_i = S_0 \overline{A_i} B_i + \overline{S_1} A_i B_i + S_0 A_i \overline{B_i} + S_1 \overline{S_0}\,\overline{A_i}$
The custom design is better.

S1 S0:    00     01     11     10
Ai Bi     AND    OR     XOR    NOT
0  0      0      0      0      1
0  1      0      1      1      1
1  1      1      1      0      0
1  0      0      1      1      0
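The sum-of-products expression for Gi can be checked exhaustively against the truth table. The overbars are lost in the slide text, so the complement placement used in `g_custom` is a reconstruction from the truth table (codes 00 = AND, 01 = OR, 11 = XOR, 10 = NOT) and should be treated as an assumption.

```python
from itertools import product


def g_custom(s1, s0, a, b):
    """Gi = S0.A'.B + S1'.A.B + S0.A.B' + S1.S0'.A' on single bits."""
    inv = lambda x: 1 - x
    return (
        (s0 & inv(a) & b)
        | (inv(s1) & a & b)
        | (s0 & a & inv(b))
        | (s1 & inv(s0) & inv(a))
    )


# Operation selected by each (S1, S0) code in the custom design.
EXPECTED = {
    (0, 0): lambda a, b: a & b,   # AND
    (0, 1): lambda a, b: a | b,   # OR
    (1, 1): lambda a, b: a ^ b,   # XOR
    (1, 0): lambda a, b: 1 - a,   # NOT (of A)
}

mismatches = [
    (s1, s0, a, b)
    for s1, s0, a, b in product((0, 1), repeat=4)
    if g_custom(s1, s0, a, b) != EXPECTED[(s1, s0)](a, b)
]
```

An empty `mismatches` list means the expression reproduces all sixteen table entries.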

Arithmetic Logic Unit (ALU)
The custom circuit has interchanged the (S1, S0) codes for XOR and NOT compared to the MUX circuit. To preserve compatibility with the text, we use the MUX solution.
Next, use the arithmetic circuit, the logic circuit, and a 2-way multiplexer to form the ALU.
The input connections to the arithmetic circuit and logic circuit have been assigned to prepare for seamless addition of the shifter, keeping the selection codes for the combined ALU and the shifter at 4 bits:
•Carry-in Ci and Carry-out Ci+1 go between bits
•Ai and Bi are connected to both units
•A new signal S2 performs the arithmetic/logic selection
•The select signal entering the LSB of the arithmetic circuit, Cin, is connected to the least significant selection input for the logic circuit, S0.

Arithmetic Logic Unit (ALU)
The next most significant select signals, S0 for the arithmetic circuit and S1 for the logic circuit, are wired together, completing the two select signals for the logic circuit.
The remaining S1 completes the three select signals for the arithmetic circuit.
[Figure: one stage of the ALU. One stage of the arithmetic circuit (inputs Ci, Ai, Bi, S0, S1; carry output Ci+1) and one stage of the logic circuit (inputs Ai, Bi, S0, S1) feed inputs 0 and 1 of a 2-to-1 MUX, whose select input is S2 and whose output is Gi. Cin enters the least significant stage.]

Shifters
Required for data processing, multiplication, division etc.
Direction: Left, Right
Number of positions with examples:
•Single bit:
1 position
0 and 1 positions
•Multiple bit:
1 to n–1 positions
0 to n–1 positions
Filling of vacant positions
•Many options depending on instruction set
•Here, will provide input lines or zero fill

4-Bit Basic Left/Right Shifter
Serial Inputs:
•IR for right shift
•IL for left shift
Serial Outputs:
•R for right shift (Same as MSB input)
•L for left shift (Same as LSB input)
Shift Functions:
(S1, S0) = 00 Pass B unchanged
           01 Right shift
           10 Left shift
           11 Unused
[Figure: 4-bit basic left/right shifter. Four multiplexers, each with data inputs 0, 1, 2 and a common 2-bit select S, take B3 to B0 and the serial inputs IR and IL and produce outputs H3 to H0, along with serial outputs L and R.]
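The shift functions can be sketched bit by bit. This example assumes that a right shift moves every bit one position toward the LSB with IR entering at the MSB, and that a left shift does the opposite with IL entering at the LSB; the function name `shifter4` and the list representation are hypothetical, and the serial outputs are left out.

```python
def shifter4(b, s1, s0, i_r=0, i_l=0):
    """b is [B3, B2, B1, B0]; returns [H3, H2, H1, H0]."""
    b3, b2, b1, b0 = b
    if (s1, s0) == (0, 0):
        return [b3, b2, b1, b0]    # pass B unchanged
    if (s1, s0) == (0, 1):
        return [i_r, b3, b2, b1]   # right shift, IR fills the MSB
    if (s1, s0) == (1, 0):
        return [b2, b1, b0, i_l]   # left shift, IL fills the LSB
    raise ValueError("select code 11 is unused")


passed = shifter4([1, 0, 1, 1], 0, 0)
right = shifter4([1, 0, 1, 1], 0, 1)
left = shifter4([1, 0, 1, 1], 1, 0, i_l=1)
```

Each output bit depends only on one input bit and the select code, which is why the hardware needs just one multiplexer per bit.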

What is Pipelining?
A technique used in advanced microprocessors where the microprocessor begins executing a second instruction before the first has been completed.
A pipeline is a series of stages, where some work is done at each stage. The work is not finished until it has passed through all stages.
With pipelining, the computer architecture allows the next instructions to be fetched while the processor is performing arithmetic operations, holding them in a buffer close to the processor until each instruction operation can be performed.

How Pipeline Works?
The pipeline is divided into segments, and each segment can execute its operation concurrently with the other segments. Once a segment completes an operation, it passes the result to the next segment in the pipeline and fetches the next operation from the preceding segment.

Basic Ideas

Data Dependence

Advantages/Disadvantages
Advantages:
•More efficient use of the processor
•Quicker execution of a large number of instructions
Disadvantages:
•Pipelining involves adding hardware to the chip
•Inability to continuously run the pipeline at full speed because of pipeline hazards, which disrupt the smooth execution of the pipeline.

Instruction Pipelining
Pipelining can be applied to the instruction stream as well as to the data stream.
Consecutive instructions are read from memory while previous instructions are executed in various pipeline stages.
Difficulties
•Different execution times for different pipeline stages
•Some instructions may skip some of the stages, e.g. there is no need for effective address calculation in immediate or register addressing
•Two or more stages may require memory access at the same time, e.g. instruction fetch and operand fetch

Pipeline Stages
Consider the following decomposition for processing the instructions:
Fetch instruction (FI): Read into a buffer
Decode instruction (DI): Determine opcode, operands
Calculate operands (CO): Indirect, Register indirect, etc.
Fetch operands (FO): Fetch operands from memory
Execute instructions (EI): Execute
Write operand (WO): Store result if applicable
Overlap these operations to make a 6-stage pipeline

Timing of Instruction Pipeline - six stages
Reduction in instruction execution time from 54 time units to 14 time units
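The two figures are consistent with nine instructions passing through six one-unit stages, although the instruction count is an inference from 54 = 9 x 6 and is not stated on the slide. Under that assumption, and with no stalls, the counts follow from n x k for sequential execution and k + (n - 1) for pipelined execution:

```python
def sequential_time(n_instructions, n_stages):
    """Each instruction runs through all stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_time(n_instructions, n_stages):
    """The first instruction needs n_stages units; each later one adds one unit."""
    return n_stages + (n_instructions - 1)


before = sequential_time(9, 6)
after = pipelined_time(9, 6)
speedup = before / after
```

The speedup for this case is a little under 4, and it approaches the number of stages only as the number of instructions grows.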

Data Dependencies
•Occur when two instructions access the same register.
•RAW: Read-After-Write
•True dependency
•WAR: Write-After-Read
•Anti-dependency
•WAW: Write-After-Write
•False dependency (output dependency)
•The key problem with regular in-order pipelines is RAW.

Pipeline Hazards
Structural Hazard or Resource Conflict: Two instructions need to access the same resource at the same time.
Data Hazard or Data Dependency: An instruction uses the result of the previous instruction before it is ready.
A hazard occurs exactly when an instruction tries to read a register in its ID stage that an earlier instruction intends to write in its WO stage.
Control Hazard or Branch Difficulty: The location of an instruction depends on a previous instruction.
Conditional branches break the pipeline.
Instructions fetched in advance are useless if the branch is taken.
A pipeline implementation needs to detect and resolve hazards.

Structural Hazard
•Occurs when the IF stage requires memory access for instruction fetch and the MEM stage needs memory access for operand fetch at the same time.
•Solutions:
Stalling (Waiting/Delaying): Delayed by one clock cycle
Split cache: Separate caches for instructions (code cache) and operands (data cache)

Data Hazard
•Data hazards occur when data is used before it is ready.
Read After Write (RAW)
InstrJ tries to read an operand before InstrI writes it
•Caused by a “Dependence” (in compiler nomenclature). This hazard results from an actual need for communication.
Execution order is: InstrI, then InstrJ
I: add r1,r2,r3
J: sub r4,r1,r3

Data Hazard (cont.)
Write After Read (WAR)
InstrJ tries to write an operand before InstrI reads it
- InstrI gets the wrong operand
- Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.
Execution order is: InstrI, then InstrJ
I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7

Data Hazard (cont.)
Write After Write (WAW)
InstrJ tries to write an operand before InstrI writes it
- Leaves the wrong result (InstrI's, not InstrJ's)
•Called an “output dependence” by compiler writers. This also results from the reuse of the name “r1”.
Execution order is: InstrI, then InstrJ
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
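The three kinds of dependence can be told apart by comparing the destination and source registers of two instructions. The sketch below is one possible classifier, with a hypothetical `classify` function and a made-up (destination, sources) tuple representation; the register names come from the examples on the preceding slides.

```python
def classify(first, second):
    """Return the dependences of `second` on `first` (earlier in program order).

    Each instruction is a tuple (dest, [sources]).
    """
    d1, s1 = first
    d2, s2 = second
    kinds = []
    if d1 in s2:
        kinds.append("RAW")   # second reads what first writes
    if d2 in s1:
        kinds.append("WAR")   # second writes what first reads
    if d1 == d2:
        kinds.append("WAW")   # both write the same register
    return kinds


raw = classify(("r1", ["r2", "r3"]), ("r4", ["r1", "r3"]))   # add r1,r2,r3 ; sub r4,r1,r3
war = classify(("r4", ["r1", "r3"]), ("r1", ["r2", "r3"]))   # sub r4,r1,r3 ; add r1,r2,r3
waw = classify(("r1", ["r4", "r3"]), ("r1", ["r2", "r3"]))   # sub r1,r4,r3 ; add r1,r2,r3
```

Only RAW reflects a real flow of data; WAR and WAW come from reusing a register name.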

Data Hazard (cont.)
•Solutions for Data Hazards
–Stalling
–Forwarding:
»connect new value directly to next stage
–Reordering

Data Hazard-Stalling
[Figure: pipeline timing diagram (stages IF, ID, EX, MEM, WB along a time axis) for add $s0,$t0,$t1 followed by sub $t2,$s0,$t3. $s0 is written in the WB stage of add; sub is stalled, with bubbles inserted, so that it reads $s0 in its ID stage only after that write.]

Data Hazard-Forwarding
•Key idea: connect the new value directly to the next stage
•Still read s0, but ignore it in favor of the new result
[Figure: pipeline timing diagram (stages IF, ID, EX, MEM, WB along a time axis) for add $s0,$t0,$t1 followed by sub $t2,$s0,$t3. The new value of s0 is passed directly from add to sub, so sub proceeds without stalling.]

Data Hazard
This is another representation of the stall.
Without the stall:
LW  R1, 0(R2)     IF  ID  EX  MEM WB
SUB R4, R1, R5        IF  ID  EX  MEM WB
AND R6, R1, R7            IF  ID  EX  MEM WB
OR  R8, R1, R9                IF  ID  EX  MEM WB
With the stall:
LW  R1, 0(R2)     IF  ID  EX    MEM   WB
SUB R4, R1, R5        IF  ID    stall EX    MEM   WB
AND R6, R1, R7            IF    stall ID    EX    MEM   WB
OR  R8, R1, R9                  stall IF    ID    EX    MEM   WB

Data Hazard-Reordering
•Consider a program segment
1. t = 5
2. x = y + z
3. p = x + 1
•Instruction 3 is dependent on instruction 2, but instructions 2 and 3 have no dependency on instruction 1
Pipelined execution (x marks a stall cycle):
Cycle:  1   2   3   4   5   6
        IF  ID  OF  EX
            IF  ID  x   OF  EX
•So after reordering the program segment will be
1. x = y + z
2. t = 5
3. p = x + 1
Pipelined execution after reordering:
Cycle:  1   2   3   4   5   6
        IF  ID  OF  EX
            IF  ID  OF  EX
                IF  ID  OF  EX

Control Hazard
•Caused by branch instructions, both unconditional and conditional.
Unconditional: always branches
Conditional: may or may not cause branching
•In a pipelined processor the following actions are critical:
Timely detection of a branch instruction
Early calculation of the branch address
Early testing of the branch condition for conditional branch instructions

Control Hazard (cont.)
•Branch Not Taken

Control Hazard (cont.)
•Branch in a Pipeline: Flushed Pipeline

Dealing with Branches
•Delayed branches
•Branch prediction
•Multiple streams
•Prefetch branch target
•Loop buffer

Delayed Branch
Delayed Branch: used with RISC machines
Requires some clever rearrangement of instructions
A burden on programmers, but can increase performance
Most RISC machines do not flush the pipeline in case of a branch
This is called the Delayed Branch
This means that if we take a branch, we will still continue to execute whatever is currently in the pipeline, at a minimum the next instruction
Benefit: Simplifies the hardware quite a bit
But we need to make sure it is safe to execute the remaining instructions in the pipeline
Simple solution to get the same behavior as a flushed pipeline: Insert NOP (No Operation) instructions after a branch
The position after the branch is called the Delay Slot

Delayed Branch (cont.)
Normal vs Delayed Branch
One delay slot - the next instruction is always in the pipeline.
The “normal” path contains an implicit “NOP” instruction as the pipeline gets flushed. A delayed branch requires an explicit NOP instruction placed in the code!

Delayed Branch (cont.)
Optimized Delayed Branch
But we can optimize this code by rearrangement! Notice we always add 1 to A, so we can use this instruction to fill the delay slot.
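
The NOP-filled and rearranged versions can be compared with a toy interpreter. This is only a minimal sketch under stated assumptions: the tuple-based instruction set, the `run` function, and both sample programs are hypothetical, and they model a machine with exactly one delay slot rather than any real RISC ISA.

```python
def run(program, memory):
    """Run a toy program on a machine with one branch delay slot.

    Returns (memory_after, executed), where executed lists the instruction
    indices in the order they ran. The instruction set is hypothetical.
    """
    regs = {"A": 0}
    mem = dict(memory)
    executed = []
    pc = 0
    pending_target = None  # set by JUMP, applied after the delay slot runs
    while pc < len(program):
        op, *args = program[pc]
        executed.append(pc)
        next_pc = pc + 1
        if pending_target is not None:
            # This instruction sat in the delay slot; the jump takes effect now.
            next_pc = pending_target
            pending_target = None
        if op == "LOAD":
            addr, reg = args
            regs[reg] = mem[addr]
        elif op == "ADD":
            value, reg = args
            regs[reg] += value
        elif op == "STORE":
            reg, addr = args
            mem[addr] = regs[reg]
        elif op == "JUMP":
            pending_target = args[0]
        elif op != "NOP":
            raise ValueError(f"unknown opcode: {op}")
        pc = next_pc
    return mem, executed


# Delayed branch with an explicit NOP in the delay slot.
with_nop = [
    ("LOAD", "X", "A"),   # 0
    ("ADD", 1, "A"),      # 1
    ("JUMP", 5),          # 2
    ("NOP",),             # 3: delay slot, does no useful work
    ("ADD", 100, "A"),    # 4: skipped by the jump
    ("STORE", "A", "Z"),  # 5
]

# Rearranged: the ADD that always runs is moved into the delay slot.
optimized = [
    ("LOAD", "X", "A"),   # 0
    ("JUMP", 4),          # 1
    ("ADD", 1, "A"),      # 2: delay slot, does useful work
    ("ADD", 100, "A"),    # 3: skipped by the jump
    ("STORE", "A", "Z"),  # 4
]

start = {"X": 41, "Z": 0}
mem_nop, trace_nop = run(with_nop, start)
mem_opt, trace_opt = run(optimized, start)
print("NOP version:      ", mem_nop["Z"], "after", len(trace_nop), "instructions")
print("Optimized version:", mem_opt["Z"], "after", len(trace_opt), "instructions")
```

Both programs store the same value, but the rearranged one executes one instruction fewer because the delay slot is no longer wasted on a NOP.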

Branch Prediction
Predict never taken
Assume that jump will not happen
Always fetch next instruction
68020 & VAX 11/780
VAX will not prefetch after branch if a page fault would result
Predict always taken
Assume that jump will happen
Always fetch target instruction
Studies indicate branches are taken around 60% of the time in most programs
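
The effect of the two static policies can be checked on a branch trace. The sketch below is illustrative only: `static_accuracy` is a hypothetical helper, and the trace is made-up data in which 6 of 10 branches are taken, chosen to mirror the roughly 60% figure quoted above.

```python
def static_accuracy(outcomes, predict_taken):
    """Fraction of branches a fixed (static) prediction gets right."""
    if not outcomes:
        raise ValueError("trace must not be empty")
    hits = sum(1 for taken in outcomes if taken == predict_taken)
    return hits / len(outcomes)


# Hypothetical trace: True means the branch was taken (6 of 10 taken).
trace = [True, True, False, True, False, True, True, False, True, False]

never_taken = static_accuracy(trace, predict_taken=False)
always_taken = static_accuracy(trace, predict_taken=True)
print(f"predict never taken:  {never_taken:.0%} correct")
print(f"predict always taken: {always_taken:.0%} correct")
```

With a taken rate above 50%, predicting always taken is right more often, but it needs the branch target address early enough to fetch from it.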

Branch Prediction (cont.)
Predict by Opcode
Some types of branch instructions are more likely to result in a jump than others (e.g. LOOP vs. JUMP)
Can get up to 75% success
Taken/Not taken switch – 1-bit branch predictor
Based on previous history
If a branch was taken last time, predict it will be taken again
If a branch was not taken last time, predict it will not be taken again
Good for loops
Could use a single bit to indicate history of the previous result
Need to somehow store this bit with each branch instruction
Could use more bits to remember a more elaborate history
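
A history-based predictor for a single branch can be sketched as follows. The 1-bit version follows the taken/not taken switch described above. The 2-bit saturating counter is one common way to "use more bits" and is an assumption here, not something the slides specify. The function names and the loop trace are hypothetical.

```python
def one_bit_mispredictions(outcomes, initial=False):
    """Count mispredictions when predicting 'same as last time'."""
    prediction = initial
    misses = 0
    for taken in outcomes:
        if taken != prediction:
            misses += 1
        prediction = taken  # remember only the most recent outcome
    return misses


def two_bit_mispredictions(outcomes, initial=0):
    """Count mispredictions for a 2-bit saturating counter (0..3).

    Counter values 2 and 3 predict taken; 0 and 1 predict not taken.
    """
    counter = initial
    misses = 0
    for taken in outcomes:
        if taken != (counter >= 2):
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# Hypothetical loop-closing branch: taken 4 times, then falls through,
# with the whole loop executed 3 times.
trace = ([True] * 4 + [False]) * 3

one_bit = one_bit_mispredictions(trace)
two_bit = two_bit_mispredictions(trace)
print("1-bit predictor mispredictions:", one_bit)
print("2-bit predictor mispredictions:", two_bit)
```

Once warmed up, the 1-bit predictor misses twice per loop execution (at loop exit and again on re-entry), while the 2-bit counter misses only at loop exit.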

Performance Measures
•The most important measure of the performance of a computer is how quickly it can execute programs.
•The performance of a computer is affected by the design of its hardware, the compiler and the machine language instructions it produces, the instruction set, the implementation language, etc.
•The computer user is always interested in reducing the execution time.
•The execution time is also referred to as response time.
•Reducing response time usually increases throughput.
•Throughput: the total amount of work done in a given time.

1. The System Clock
•Processor circuits are controlled by a timing signal called a clock.
•The clock defines regular time intervals, called clock cycles.
•To execute a machine instruction, the processor divides the action to be performed into a sequence of basic steps, such that each step can be completed in one clock cycle.
•The constant cycle time (in nanoseconds) is denoted by t.
•The clock rate is given by f = 1/t, which is measured in cycles per second (CPS).
•The unit for cycles per second is the hertz (Hz).
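
As a worked example of f = 1/t, assume a hypothetical cycle time of t = 0.5 ns (a value chosen only for illustration):

```latex
f = \frac{1}{t}
  = \frac{1}{0.5 \times 10^{-9}\,\mathrm{s}}
  = 2 \times 10^{9}\,\mathrm{Hz}
  = 2\,\mathrm{GHz}
```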

2. Instruction Execution Rate
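No formulas accompany this heading, so the sketch below assumes the usual textbook definitions rather than anything stated here: average cycles per instruction CPI = (sum of CPI_i × I_i) / I_c, execution time T = I_c × CPI × t, and MIPS rate = f / (CPI × 10^6). The function names and the instruction mix are hypothetical.

```python
def average_cpi(mix):
    """Average cycles per instruction for a list of (cpi, count) pairs."""
    total_instructions = sum(count for _, count in mix)
    if total_instructions == 0:
        raise ValueError("instruction mix must not be empty")
    total_cycles = sum(cpi * count for cpi, count in mix)
    return total_cycles / total_instructions


def execution_time(instruction_count, cpi, cycle_time_s):
    """Execution time T = I_c * CPI * t, in seconds."""
    return instruction_count * cpi * cycle_time_s


def mips_rate(clock_hz, cpi):
    """Millions of instructions per second: f / (CPI * 10^6)."""
    return clock_hz / (cpi * 1e6)


# Hypothetical mix: (cycles per instruction, number of instructions).
mix = [(1, 60_000), (2, 30_000), (4, 10_000)]
clock_hz = 400e6  # assumed 400 MHz clock, so t = 2.5 ns

cpi = average_cpi(mix)
total = sum(count for _, count in mix)
print(f"average CPI:    {cpi:.2f}")
print(f"MIPS rate:      {mips_rate(clock_hz, cpi):.1f}")
print(f"execution time: {execution_time(total, cpi, 1 / clock_hz) * 1e3:.3f} ms")
```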

3. Speedup
•Increase in speed due to a parallel system compared to a uniprocessor system:
Speedup(N) = T(1) / T(N)
Where,
T(N) represents the execution time taken by the program running on N processors,
and
T(1) represents the time taken by the best serial implementation of the program measured on one processor.

4. Efficiency
Efficiency(N) = Speedup(N) / N
Where,
Speedup(N) = speedup measured on N processors
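
Speedup and efficiency can be computed directly from measured run times. This is a small sketch assuming the usual definitions Speedup(N) = T(1) / T(N) and Efficiency(N) = Speedup(N) / N. The function names and the timings are hypothetical, not measurements.

```python
def speedup(t1, tn):
    """Speedup(N) = T(1) / T(N)."""
    if t1 <= 0 or tn <= 0:
        raise ValueError("execution times must be positive")
    return t1 / tn


def efficiency(t1, tn, n):
    """Efficiency(N) = Speedup(N) / N, at most 1.0 in the ideal case."""
    if n < 1:
        raise ValueError("processor count must be at least 1")
    return speedup(t1, tn) / n


# Hypothetical timings: 120 s on one processor, 20 s on 8 processors.
s = speedup(120.0, 20.0)
e = efficiency(120.0, 20.0, 8)
print(f"speedup on 8 processors:    {s:.2f}")
print(f"efficiency on 8 processors: {e:.2%}")
```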

5. Throughput (ω_p)

6. Amdahl’s Law
•Gene Amdahl [AMDA67]
•Potential speedup of a program using multiple processors
•Concluded that:
–Code needs to be parallelizable
–Speedup is bounded, giving diminishing returns for more processors
•Task dependent
–Servers gain by maintaining multiple connections on multiple processors
–Databases can be split into parallel tasks

6. Amdahl’s Law
•For a program running on a single processor
–Fraction f of the code is infinitely parallelizable with no scheduling overhead
–Fraction (1 - f) of the code is inherently serial
–T is the total execution time for the program on a single processor
–N is the number of processors that fully exploit the parallel portions of the code
•Conclusions
–When f is small, using parallel processors has little effect
–As N -> ∞, speedup is bounded by 1/(1 - f), so there are diminishing returns for using more processors
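
The bound follows from the definitions of f, T and N above: the serial part takes T·(1 - f) regardless of N, while the parallel part T·f is divided among N processors. Written out (standard Amdahl's law algebra, using only those symbols):

```latex
\begin{aligned}
\text{Speedup}
  &= \frac{\text{time on one processor}}{\text{time on } N \text{ processors}}
   = \frac{T \cdot (1-f) + T \cdot f}{T \cdot (1-f) + \dfrac{T \cdot f}{N}}
   = \frac{1}{(1-f) + \dfrac{f}{N}} \\[4pt]
\lim_{N \to \infty} \text{Speedup}
  &= \frac{1}{1-f}
\end{aligned}
```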

6. Amdahl’s Law
Fig. Speed-up vs number of processors for Amdahl’s law
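
The shape of such a curve can be reproduced numerically. The sketch below assumes the standard form of Amdahl's law, Speedup = 1 / ((1 - f) + f/N). The function name and the chosen values of f and N are illustrative only.

```python
def amdahl_speedup(f, n):
    """Speedup for parallel fraction f on n processors (Amdahl's law)."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


# Print one row per parallel fraction, with the limiting bound 1 / (1 - f).
for f in (0.5, 0.9, 0.95):
    row = ", ".join(
        f"N={n}: {amdahl_speedup(f, n):.2f}" for n in (1, 2, 8, 64, 1024)
    )
    print(f"f={f}: {row} (bound {1.0 / (1.0 - f):.1f})")
```

Each row flattens out as N grows, which is the diminishing-returns behaviour the figure illustrates.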

6. Amdahl’s Law Exercise

6. Amdahl’s Law Exercise (?)

Thank You