Maximizing CPU Efficiency: A Comprehensive Exploration of Pipelining in Computer Architecture.pdf
haseebali10701
79 slides
May 29, 2024
About This Presentation
Pipelining in computer architecture is a technique used to increase CPU throughput by allowing multiple instructions to be processed simultaneously. Instead of completing one instruction at a time, the CPU divides the execution of instructions into several stages and overlaps the execution of different instructions. Each stage performs a different operation on the instruction, such as fetching, decoding, executing, and writing back the results. By breaking down the execution into smaller stages and processing multiple instructions concurrently, pipelining reduces the overall execution time of a program.
Increasing Process Speed
▪Cooking Process
Process 1: Material Preparing (Meat/Veggie Chop), 5 mins
Process 2: Cooking, 10 mins
Process 3: Garnishing Food, 5 mins
Process 4: Serving, 3 mins
It takes 23 mins to finish 1 dish!
Parallel Processing
▪With 3 chefs (Chef 1, Chef 2, Chef 3) each cooking a separate dish in parallel, we get a 3x speed up.
▪Average: 23/3 = 7.67 min/dish
Parallel Processing Limitation
▪Each processor (chef) must be an expert in all processes 1-4.
▪You need many expert chefs to speed up the process.
▪Every processor (chef) must have the same skills.
Pipelining Process
[Diagram: Dish 1 through Dish 4 move through the 4 processes over time; a new dish is started every 10 mins, the time of the slowest process.]
4 Processes. Average: 10 mins/dish
Balancing Pipeline Stages
▪Since the bottleneck is Process 2 (10 mins), we can break it into 2 subprocesses of 5 mins each.
[Diagram: Dish 1 through Dish 5 move through the 5 balanced processes over time; a new dish finishes every 5 mins.]
5 Processes. Average: 5 mins/dish
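The throughput arithmetic above can be sketched in a few lines. This is a minimal illustration of the cooking analogy (stage times taken from the slides), not part of the original deck:

```python
# Sequential cooking takes the sum of all stage times; a pipeline's
# steady-state rate is set by its slowest stage, so balancing the
# stages improves the average time per dish.
stages = [5, 10, 5, 3]            # prepare, cook, garnish, serve (mins)
balanced = [5, 5, 5, 5, 3]        # cooking split into two 5-min steps

sequential_per_dish = sum(stages)  # 23 mins/dish, one dish at a time
pipelined_per_dish = max(stages)   # 10 mins/dish once the pipe is full
balanced_per_dish = max(balanced)  # 5 mins/dish after balancing

print(sequential_per_dish, pipelined_per_dish, balanced_per_dish)  # 23 10 5
```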
What is pipelining
▪Pipelining is a set of data processing elements connected in series, where the output of one element is the input of the next one.
▪The elements of a pipeline are often executed in parallel or in time-sliced fashion.
▪Some amount of buffer storage is often inserted between elements.
Pipelining and Maintaining Correctness
•We can break the execution of instructions into stages and overlap the execution of different instructions.
•We need to ensure that the results produced by our new pipelined processor are no different from those of the unpipelined one.
•What sort of situations could cause problems once we pipeline our processor?
Basic Processor Architecture
[Diagram: the datapath contains the Program Counter (PC), a bank of data registers (REG 0 … REG N), a control unit (CTRL), the ALU, the Instruction Register (IR), and the flag register (Z, C, N, V), all connected by the data bus, address bus, and control bus.]
5-stage Pipelined Processor
[Diagram: the datapath is split into five stages. Stage 1 Fetch: PC, +4 adder, and program memory (Addr/Read Inst). Stage 2 Decode: CTRL box, register file (R0-R14) read ports Rn/Rm/Rs producing Vn/Vm/Vs, and the immediate #imm. Stage 3 Execute: shifter and ALU, flag register, and a +offset adder. Stage 4 Memory: data memory with Addr, Read Data, and Write Data ports. Stage 5 Writeback: the result is written back to the register file via Vd/Vidx. Control signals from the CTRL box drive every stage.]
Stage 1 : Fetch (Read Instructions)
•The next instruction is fetched from the memory address currently stored in the program counter (PC) and placed into the instruction register.
•At the end of the fetch operation, the PC points to the instruction that will be read in the next cycle.
•The next instruction might also come from a branch target address.
Stage 1 : Fetch
1. The CPU places the contents of the PC on the address bus and issues a read command on the control bus.
2. In response to the read command (with address equal to PC), the program memory returns the data stored at the memory location indicated by the PC on the data bus.
3. The CPU copies the data from the data bus into its Instruction Register (IR).
4. The PC is incremented so that it points to the next instruction. This step prepares the CPU for the next cycle.
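The four fetch steps can be sketched as a toy model. This is an illustration only, not the exact datapath in the slides; `program_memory` and its contents are made up:

```python
# Toy fetch cycle: PC to address bus, memory read on the data bus,
# latch into the IR, then increment the PC.
program_memory = {0x10: "LDR R2, [R0,#4]", 0x14: "LDR R4, [R0],#4"}

def fetch(pc):
    address_bus = pc                        # 1. PC -> address bus, read cmd
    data_bus = program_memory[address_bus]  # 2. memory returns instruction
    ir = data_bus                           # 3. latch into the IR
    pc = pc + 4                             # 4. PC points to next instruction
    return pc, ir

pc, ir = fetch(0x10)
print(hex(pc), ir)  # 0x14 LDR R2, [R0,#4]
```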
Stage 2 : Decode (Inst Interpretation)
▪During this stage, the encoded instruction present in the instruction register is interpreted by the decoder.
▪Register values and immediate constants are also read in this stage.
Stage 3 : Execute (Calculation)
▪The CPU sends the decoded instruction as a set of control signals to the corresponding components.
▪If the instruction involves arithmetic or logic, the ALU computes the result.
▪This is the only stage of the instruction cycle that computes data. Flags are updated in this stage.
▪The branch target address (for jumps) is also computed here.
Stage 4 : Memory (Access RAM)
▪This stage accesses the data memory (LDR/STR).
▪It must come after the Execute stage because the address can be pre-indexed.
▪The address from the ALU calculation is placed on the address bus.
▪At the end of the stage, the data memory returns/stores 32-bit data to/from the data bus.
Stage 5 : Writeback (Write Output)
▪In this stage, the CPU writes the data registers (R0-R12).
▪Results from the Execute and Memory stages are stored in the register selected by the instruction (the destination register).
Example: LDR R2, [R0,#4]
[Diagram: the instruction flows through the pipeline. Stage 1 Fetch: the instruction at PC=0x10 is read into the instruction register and the PC advances to 0x14. Stage 2 Decode: Rn=0 and imm=4 are extracted and Vn=R0 is read from the register file. Stage 3 Execute: the ALU computes ADD, R0+4. Stage 4 Memory: Mem[R0+4] is read using the address R0+4. Stage 5 Writeback: R2=Mem[R0+4].]
Example: LDR R4, [R0], #4 (post-indexed)
[Diagram: Stage 1 Fetch: the instruction at PC=0x14 is read and the PC advances to 0x18. Stage 2 Decode: Rn=0, imm=4, Vn=R0. Stage 3 Execute: the ALU is BYPASSed, so the memory address is R0 itself. Stage 4 Memory: Mem[R0] is read, and the +offset adder computes R0+4. Stage 5 Writeback: R4=Mem[R0] and the base register is updated, R0=R0+4.]
Example: STR R3, [R0,R2]!
[Diagram: Stage 1 Fetch: the instruction at PC=0x18 is read and the PC advances to 0x1C. Stage 2 Decode: Rn=0, Rm=2, Rs=3; Vn=R0, Vm=R2, Vs=R3 are read. Stage 3 Execute: the ALU computes ADD, R0+R2. Stage 4 Memory: R3 is written to address R0+R2 (offset 0). Stage 5 Writeback: the base register is updated, R0=R0+R2.]
Example: STR R5, [R1], R9
[Diagram: Stage 1 Fetch: the instruction at PC=0x1C is read and the PC advances to 0x20. Stage 2 Decode: Rn=1, Rm=9, Rs=5; Vn=R1, Vm=R9, Vs=R5 are read. Stage 3 Execute: the ALU is BYPASSed, so the store address is R1. Stage 4 Memory: R5 is written to address R1, and R1+R9 is computed. Stage 5 Writeback: the base register is updated, R1=R1+R9.]
The pipeline flow
        FE         DE         EX         ME         WB
CLK 1   INST 0x00
CLK 2   INST 0x04  INST 0x00
CLK 3   INST 0x08  INST 0x04  INST 0x00
CLK 4   INST 0x0C  INST 0x08  INST 0x04  INST 0x00
CLK 5   INST 0x10  INST 0x0C  INST 0x08  INST 0x04  INST 0x00
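The flow table above follows a simple rule: an instruction fetched at clock c occupies stage (clock - c) thereafter. A small sketch (illustrative helper names, not from the slides):

```python
# Which pipeline stage an instruction occupies at a given clock,
# given the clock at which it was fetched.
STAGES = ["FE", "DE", "EX", "ME", "WB"]

def stage_at(fetch_clock, clock):
    """Stage occupied at `clock` by the instruction fetched at `fetch_clock`."""
    i = clock - fetch_clock
    return STAGES[i] if 0 <= i < len(STAGES) else None

# INST 0x00 is fetched at CLK 1; by CLK 5 it has reached Writeback,
# while INST 0x10 (fetched at CLK 5) is in Fetch.
print(stage_at(1, 5), stage_at(5, 5))  # WB FE
```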
The pipeline in Action
[Diagram: five instructions advance through FETCH, DEC, EXE, MEM, WB over clock cycles 1-9. On clock cycle 5, the pipeline contains all five instructions, one per stage:]
1. LDR R1, [R2]
2. ADD R2, R2, R3
3. STR R4, [R5], #4
4. SUB R0, R0, #1
5. LDR R5, [R3]
Example: BEQ JUMP1 (Condition Met)
[Diagram: Fetch reads BEQ JUMP1 at PC=0x24 and the PC advances to the fall-through address 0x28. Decode computes the branch target as (Offset*4)+4 relative to the PC. Execute produces the target address 0x38. In the Memory stage the condition is evaluated: it is MET, so the PC is redirected to 0x38 and the next instruction fetched is SUB R7, R7, #1 at 0x38, with the PC advancing to 0x3C.]
Example: BEQ JUMP1 (Condition Not Met)
[Diagram: the branch target 0x38 is computed exactly as before, but in the Memory stage the condition is NOT MET, so the PC keeps the fall-through address 0x28. The next instruction fetched is ADD R7, R7, #1 at 0x28, with the PC advancing to 0x2C.]
Data Hazard
1. LDR R1, [R2]
2. ADD R1, R1, R3
[Diagram: the two instructions overlap in the pipeline over clock cycles 1-6. R1 is updated by the LDR only in its Writeback stage, but R1 = R1+R3 is executed before the LDR result is written back.]
The result of ADD R1, R1, R3 is wrong. This is called a "Data Hazard".
Data Dependencies - True Data Dependencies
[Diagram: 1. LDR R1, [R2] followed by 2. ADD R1, R1, R3. Two STALL cycles hold the ADD in Decode until the LDR updates R1 in Writeback. If an independent instruction (2. ADD R2, R2, R3) is placed between them, the dependent 3. ADD R1, R1, R3 needs only one STALL cycle.]
Data Dependencies
True Data Dependencies
▪True data dependence (green arrow), also known as a Read-After-Write (RAW) dependence.
▪If we imagine two instructions, i followed by j: if j uses the result of i, we say that j is data dependent on i. This is an example of a true data dependence (a read after a write).
▪This type of data dependency causes pipeline hazards.
SUB R1, R1, #2    (Write R1)
ADD R1, R1, R3    (Read R1)
ADD R2, R2, R3
Data dependencies
▪Name dependencies may also exist when two instructions refer to the same register. Unlike true data dependencies, they do not force a pipeline stall.
▪Output dependence (red arrow)
▪Anti-dependence (green arrow)
SUB R1, R1, #2    (writes R1)
ADD R1, R1, R3    (writes R1, reads R3)
LDR R3, [R2]      (writes R3)
SUB R2, R3, #1
Data dependencies
▪Output dependence (red arrow), also known as a Write-After-Write (WAW) dependence.
▪We need to ensure we don't reorder writes to the same register. Otherwise subsequent instructions could receive the wrong data value.
▪This type doesn't cause a data hazard if the execution order doesn't change.
SUB R1, R1, #2    (Write R1)
ADD R1, R1, R3    (Write R1)
Data dependencies
▪Anti-dependence (green arrows), also known as a Write-After-Read (WAR) dependence.
▪We need to be careful not to overwrite a register whose current value is still required by an earlier instruction.
▪This type doesn't cause a data hazard if the execution order doesn't change.
ADD R1, R1, R3    (Read R3)
LDR R3, [R2]      (Write R3)
SUB R2, R3, #1    (Read R3)
Data Hazards
▪Data hazards occur when an instruction depends on the result of a previous instruction and that result has not yet been computed. Whenever two different instructions use the same storage location, it must appear as if they are executed in sequential order.
[Diagram: 1. LDR R1, [R2] followed by 2. ADD R1, R1, R3; two STALL cycles make the ADD wait until R1 is updated in Writeback, and the following 3. ADD R2, R2, R3 is delayed as well.]
An ideal Pipeline
In the ideal case, when our pipeline never stalls, our CPI will equal 1 (and IPC = 1). If we need to stall the pipeline, our CPI will increase. For example, if we must stall for 1 cycle for 20% of instructions, and 3 cycles for 5% of instructions, our new CPI would be:
Pipeline CPI = ideal pipeline CPI + pipeline stalls per instruction
             = 1 + 1*0.20 + 3*0.05 = 1.35
Note: to make the best use of pipelining, we should avoid stalling as much as possible (without increasing our clock period!). Remember:
Time = instructions executed × Clocks Per Instruction (CPI) × clock period
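The CPI arithmetic above, written out (the 1 ns clock and 1M-instruction program in the last two lines are made-up numbers for illustration):

```python
# Ideal CPI of 1 plus stall contributions: 1 stall cycle for 20% of
# instructions and 3 stall cycles for 5% of instructions.
ideal_cpi = 1.0
stalls_per_instruction = 1 * 0.20 + 3 * 0.05
pipeline_cpi = ideal_cpi + stalls_per_instruction
print(round(pipeline_cpi, 2))  # 1.35

# Time = instructions executed x CPI x clock period
time_s = 1_000_000 * pipeline_cpi * 1e-9  # 1M instructions, 1 ns clock
print(round(time_s, 5))  # 0.00135
```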
Hazard Elimination Methods
1. Pipeline bubbling / Stall
As instructions are fetched, control logic determines whether a hazard could or will occur. If so, the control logic inserts a "STALL" (bubble) into the pipeline. Thus, before the next instruction (which would cause the hazard) executes, the prior one will have had sufficient time to finish and prevent the hazard.
[Diagram: 1. LDR R1, [R2] followed by 2. ADD R1, R1, R3; two STALL cycles hold the ADD until R1 is updated in Writeback; 3. ADD R2, R2, R3 follows after the stalls.]
Hazard Elimination Methods
2. Out-of-Order Execution
We can avoid data hazards by reordering instructions. However, the reordered program must not break the data dependencies:
▪True data dependencies - Read After Write (RAW)
▪Output dependencies - Write After Write (WAW)
▪Anti-dependencies - Write After Read (WAR)
Note: Read After Read (RAR) reordering does not affect data dependencies.
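A reordering pass has to check exactly these three relations between every pair of instructions. A minimal sketch, assuming instructions are described by their register read/write sets (that encoding is an illustration, not the slides' format):

```python
# Classify the dependence(s) between instruction i and a later
# instruction j from their register read/write sets.
def classify(i_writes, i_reads, j_writes, j_reads):
    deps = []
    if i_writes & j_reads:
        deps.append("RAW")  # true dependence: j reads what i wrote
    if i_writes & j_writes:
        deps.append("WAW")  # output dependence: both write the same reg
    if i_reads & j_writes:
        deps.append("WAR")  # anti-dependence: j overwrites what i read
    return deps

# LDR R1, [R2]  followed by  ADD R4, R1, R3  -> a RAW dependence on R1
print(classify({"R1"}, {"R2"}, {"R4"}, {"R1", "R3"}))  # ['RAW']
```

Two instructions may be safely swapped only when `classify` returns an empty list (a pure RAR relation).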
Hazard Elimination Methods
3. Data forwarding
A technique to resolve data hazards by passing a value directly from a later stage to an earlier stage, avoiding a stall.
[Diagram: in the FE-DE-EX-ME-WB pipeline, data is forwarded from ME back to DE.]
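The idea can be sketched with two dependent instructions. This is a toy illustration with made-up register and memory contents, not the slides' datapath:

```python
# Forwarding from the Memory stage to Decode: the loaded value is handed
# straight to the dependent ADD instead of waiting for Writeback.
regs = {"R1": 0, "R2": 8, "R3": 5}
memory = {8: 42}

# LDR R1, [R2] is in the Memory stage: the value exists in a pipeline
# latch, but regs["R1"] has not been written back yet.
mem_stage_result = memory[regs["R2"]]

# ADD R1, R1, R3 in Decode reads the forwarded value, not the stale
# register file entry (which still holds 0).
forwarded_operand = mem_stage_result
add_result = forwarded_operand + regs["R3"]
print(add_result)  # 47
```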
No Data forwarding (Memory)
[Diagram: 1. LDR R1, [R2] followed by 2. ADD R1, R1, R3. Without forwarding, the ADD stalls two cycles in Decode, waiting for R1 to be updated in Writeback.]
Data forwarding (Memory)
[Diagram: with forwarding from ME to DE, 2. ADD R1, R1, R3 stalls only one cycle (waiting for the memory read) and then receives R1 directly from the Memory stage.]
No Data forwarding (Execute)
[Diagram: 1. MVN R1, R1 followed by 2. ADD R1, R1, #1. Without forwarding, the ADD stalls two cycles waiting for R1 to be updated in Writeback.]
Data forwarding (Execute)
[Diagram: with forwarding, 2. ADD R1, R1, #1 proceeds with no stall: the result of the MVN is passed from a later stage directly back to Decode.]
Data forwarding Disadvantage
▪Data forwarding reduces pipeline stalls. However, the forwarding path increases the logic delay significantly.
▪A poorly chosen forwarding path will reduce the overall clock speed due to the longer critical path.
▪Processor designers must compromise between data forwarding and pipeline hazards to optimize processor performance.
[Diagram: the forwarding path between DE and EX adds delay to the stage logic.]
Exercise 5.1
▪Draw a pipeline stage diagram of these instructions, with pipeline stalls to avoid data hazards.
MOV R0, #0x30
MOV R1, #0x54
MOV R2, #0xAA
ADD R3, R1, R2
SUB R4, R0, #1
Exercise 5.3
▪Draw a pipeline stage diagram of these instructions, with pipeline stalls, after reordering the instructions to avoid data hazards.
MOV R0, #0x41000000
MOV R1, #0x12
MOV R2, #0x45
STRB R1, [R0], #1
STRB R2, [R0], #1
ADD R3, R3, R1
SUB R4, R4, R2
Control Hazard (Branching Hazard)
▪A control hazard occurs whenever the pipeline must make a decision to jump or not.
CMP R0, #0
BEQ ZERO
ONE:  ADD R1, R1, #1
ZERO: ADD R2, R2, #1
[Diagram: if R0==0, jump to ZERO; otherwise fall through to ONE.]
Control Hazard (Branching Hazard)
[Diagram: 1. CMP R0, #0, 2. BEQ ZERO, then either 3. ZERO: ADD R2, R2, #1 (case R0==0) or 3. ONE: ADD R1, R1, #1 (case R0!=0). In both cases the instruction after the branch stalls two cycles, because fetch cannot proceed until the branch is resolved.]
Control Hazard
What happens immediately after we fetch a conditional branch instruction?
1. We must determine if the branch is taken or not.
2. If the branch is taken, we must compute the target address and branch to it.
How Would We Like to Handle Branches?
- In a pipelined processor, we would like to calculate the next value of the program counter (PC) in parallel with reading our instruction memory.
- In practice, this is non-trivial to support, as branches may not be decoded and evaluated until later in the pipeline.
[Diagram: the 5-stage datapath; the branch outcome is only known deep in the pipeline, while Fetch needs the next PC immediately.]
How Would We Like to Handle Branches?
How to speed up the branch instruction
•Option 1: Assume a branch is not taken.
•Option 2: Evaluate the branch earlier in the pipeline.
•Option 3: Branch prediction.
Assume the Branch Is Not Taken
[Diagram: 1. CMP R0, #0, 2. BEQ ZERO. The pipeline fetches the next sequential instruction (3. ONE: ADD R1, R1, #1) before the branch decision. If the branch is taken, the wrongly fetched instructions are FLUSHed (two lost cycles) and fetch restarts at 3. ZERO: ADD R2, R2, #1.]
Assume a Branch Is Not Taken
If we evaluate the branch in the Execute stage, we would lose two cycles every time we encountered a taken branch. If we assume 20% of instructions are branches, 60% of branches are taken, and an otherwise perfect CPI of 1:
CPI contribution from taken branches = 0.2 * 2 * 0.6 = 0.24
Our new CPI = 1.0 + branch stalls = 1.24
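The same branch-penalty arithmetic, written out:

```python
# 2-cycle loss per taken branch, 20% branch frequency, 60% taken.
branch_fraction = 0.20
taken_fraction = 0.60
penalty_cycles = 2

branch_stalls = branch_fraction * penalty_cycles * taken_fraction
new_cpi = 1.0 + branch_stalls
print(round(branch_stalls, 2), round(new_cpi, 2))  # 0.24 1.24
```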
Why don't we assume the branch is taken?
MOV R3, #0
LOOP ADD R3, R3, #1
     ADD R1, R1, #2
     SUB R2, R2, #4
     CMP R3, #10
     BNE LOOP
- From the code above, the branch is taken far more often than not taken (9:1).
- So why don't we assume the branch is taken instead? Because we don't know where to jump to until the offset (target) address is calculated.
[Diagram: BNE LOOP still stalls two cycles before LOOP SUB R1, R1, R0 can be fetched, since the target address is not yet known.]
Evaluate the branch earlier in the pipeline
▪The same idea as "data forwarding": the branch is evaluated early, in the DEC stage, and the flag is bypassed.
[Diagram: the flag calculation and branch address computation feed the DEC stage of the FE-DE-EX-ME-WB pipeline.]
Evaluate the branch earlier in the pipeline
[Diagram: CMP R3, #10, then BNE LOOP, then LOOP ADD R3, R3, R1. With the branch resolved in Decode (flag forwarded, branch address calculated early), only one STALL cycle is needed before the target instruction is fetched.]
Evaluate the branch earlier in the pipeline
- This technique causes a longer delay in the circuit. It affects the clock speed of the whole system and is often considered not worth it.
[Diagram: the flag calculation and branch address computation lengthen the FE-DE-EX critical path.]
Dynamic Branch Prediction
▪Predict branch direction before branch is executed.
▪Prediction based on branch history
▪Start fetching instruction at predicted direction
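The history-based prediction idea can be sketched as a tiny 1-bit predictor. The dict-based BPB and function names are illustrative, not the hardware in the slides:

```python
# 1-bit dynamic branch predictor: the Branch Prediction Buffer (BPB)
# maps a branch's address to its last outcome and its target address.
bpb = {}  # branch PC -> (last_taken, target_address)

def predict(pc, fallthrough):
    """Return the address to fetch next, based on branch history."""
    last_taken, target = bpb.get(pc, (False, None))
    return target if last_taken else fallthrough

def update(pc, taken, target):
    """Record the actual outcome once the branch has executed."""
    bpb[pc] = (taken, target)

print(hex(predict(0x44, 0x48)))  # no history: predict not taken -> 0x48
update(0x44, True, 0x4C)         # the branch was actually taken to 0x4C
print(hex(predict(0x44, 0x48)))  # history says taken -> 0x4C
```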
[Diagram: a Branch Prediction Buffer (BPB) sits next to the Fetch stage. It is indexed by the PC and supplies the predicted jump address and the taken/not-taken decision that select the next PC.]
1-Bit Branch Prediction (Not Taken)
0x00000040 : CMP R0, #0
0x00000044 : BEQ ZERO
0x00000048 : ONE  ADD R1, R1, #1
0x0000004C : ZERO ADD R2, R2, #1
[Diagram: the BPB is indexed by bits of the branch address (0x44 = 0b01000100). The entry stores a 1-bit prediction (0 = Not Taken, 1 = Taken) and the jump address 0x0000004C. Here the bit is 0, so the branch is predicted not taken.]
1-Bit Branch Prediction
[Diagram: 1. CMP R0, #0, 2. BEQ ZERO. With a not-taken prediction, fetch continues at PC+4 with 3. ONE: ADD R1, R1, #1 before the prediction is verified. If the prediction is wrong, the fetched instruction is discarded and fetch restarts at the branch address B_addr with 3. ZERO: ADD R2, R2, #1.]
1-Bit Branch Prediction (Taken)
0x00000040 : CMP R0, #0
0x00000044 : BEQ ZERO
0x00000048 : ONE  ADD R1, R1, #1
0x0000004C : ZERO ADD R2, R2, #1
[Diagram: the same BPB entry (indexed by 0x44, with 0 = Not Taken and 1 = Taken), but the prediction bit now indicates Taken, so fetch is redirected to the stored jump address 0x0000004C.]
1-Bit Branch Prediction
[Diagram: with a taken prediction, fetch jumps to 0x4C and 3. ZERO: ADD R2, R2, #1 follows the branch. If the prediction is wrong, the pipeline discards the fetched instruction and restarts at PC+4 with 3. ONE: ADD R1, R1, #1.]
Exercise 5.4
▪Draw a pipeline stage diagram of these instructions, with pipeline stalls, using the Branch Not-Taken method to avoid the branch hazard.
MOV R0, #0x41000000
MOV R1, #0x0
MOV R2, #0x82
B1 ADD R1, R1, #1
ADD R3, R3, R0
SUB R4, R4, R2
CMP R1, #0x2
BNE B1
NOP
Exercise 5.5
▪Draw a pipeline stage diagram of these instructions, with pipeline stalls, using Branch Prediction to avoid the branch hazard.
MOV R0, #0x41000000
MOV R1, #0x0
MOV R2, #0x82
B1 ADD R1, R1, #1
ADD R3, R3, R0
SUB R4, R4, R2
CMP R1, #0x2
BNE B1
NOP