pipeline_ppt_1 computer organisation design

anuprekshasahula 13 views 25 slides Sep 16, 2025
Slide 1
Slide 1 of 25
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25

About This Presentation

It is pipelining in computer organisation design


Slide Content

Pipeline

2
W + D + R
T
For N clothes, time T
1= N.T
SupposewehavebuiltamachineMthatcan
wash(W),dry(D),andiron(R)clothes,one
clothatatime.
TotaltimerequiredisT.
A Real-life Example
Wants to speed up the process by factor 3?
Approach 1:
W + D + R
T
W + D + R
T
W + D + R
T
Number of machine: 3
Cost : 3x

Asanalternative,wesplitthemachineintothreesmaller
machinesM
w(onlywash),M
D(onlyDry),andM
R(onlyIron)
,which
canperformthespecifictaskonly.
TimerequiredbyeachofthesmallermachinesisT/3(say)
3
W D R
T/3 T/3 T/3
For N clothes, time T3 = (2+N).T/3
Wants to speed up the process by factor 3? Approach2
Suppose we have N (large number, say 1000) cloths to wash
Time required to complete:
Approach 1: T1 = N.T
Approach2: T3 = (2+N).T/3.
Which is approximately T1/3, if N is large, speed up by 3
Cost not increases by 3x (only marginal increment of the cost)
W + D + R
T
For N clothes, time T
1= N.T
Approach 1
Approach 2

4
How does the pipeline work?

Space –time diagram / Reservation Table
5
Time required to fill the pipe
Output is available from here

What is Pipelining?
6
Amechanismforoverlappedexecutionofseveralinputsetsbypartitioning
somecomputationintoasetofk-subcomputations(orstages).
Verynominalincreaseinthecostofimplementation.
Verysignificantspeed-up(ideallyk).

Extending the concept to processor pipeline
7

Example -1
•Suppose we want to perform the following task
A
i * B
i + C
i wherei = 1,2,…..7
8
Non pipeline
2 stage Pipeline
Table1: Content of registers in pipeline version
1 clock cycle delay

9
Model of a Synchronous k-stage pipeline
clock
S
1
S
2 S
k
Stage 1 Stage 2 Stage k
Input
Output
Latch/Register (buffer) to store intermediate results
Total task is :: S
1+S
2+ … S
k

Minimum clock period??
10
Task
(1+2)
Clock (f
clk)
t = 10nS
task1
Clock (f
clk)
t = 4nStask2
t = 6nS
task1
Clock (f
clk)
t = 5nStask2
t = 5nS

11
Clock Period
??????=clock period of the pipeline system
�
??????= time delay of the circuit in stage �
??????t
1 t
2 t
k
??????
�= delay of a latch
Maximum stage delay ??????
??????= ??????????????????{�
??????}
Thus??????=??????
??????+ ??????
�
Pipeline frequency ??????=
1
??????
If one result is expected to come out of pipeline every clock cycle, fwill be the
maximum throughput of the pipeline.
clock
S
1
S
2 S
k
Stage 1 Stage 2 Stage k
Input Output

12
Speedup Calculation
The total time to process N data sets by a pipeline system (k stages) is given by
�
??????=??????−1+??????.??????
??????−1.??????times required to fill the pipeline;
1 result every ??????time after that is ??????.??????
For an equivalent non-pipelined processor (i.e., one stage) the total time to process N data
sets is given by
�
1=N.k.τ(Ignoring the latch delay)
we assume that the time it takes to process a task is the same in the pipeline and non-pipeline
processor.
Speed-up of the k-stage pipeline over equivalent non-pipelined processor.
�
??????=
�1
�
??????
=
??????.??????.??????
[(??????−1)+??????].??????
=
??????.??????
�+(??????−1)
If ??????→∞, �
??????→??????(Maximum speed up possible ≡ no of stages in the pipeline(k))

Problem-1
13

14
Pipeline Efficiency
How close is the performance to its ideal
value
??????
??????=
�
??????
??????
=
??????.??????
??????+??????−1
×
1
??????
=
??????.??????
�+(??????−1)
Pipeline Throughput
Number of operations computed per unit time
??????
??????=
??????
�
??????
=
??????
??????+??????−1.??????
Latency
It is the length of time required to perform a single task or Time required to get the first
result.

15
How does the pipeline work?
Latency : 3 x T/3 = T

Critical path delay (t
critical) = The path through the logic which determines
the ultimate speed of the structure.
Maximum clock frequency (f
clk) =
t
latch= delay due to latch.
Latency = Length of time required to perform a single task =10nS.
(t
latch= ignored).
throughput = Numbers of tasks executed per unit time. i.e.
= 1 output/10nS.
Sampling rate = 1/10nS = 100 Msamples/sec1
()tt
criticallatch

task
Clock (f
clk)
t = 10nS
02 1345
10nS
Non-Pipeline
16

•T
clk= max {t1, t2, t3}, i.e. T
clk= 4nS.
(t
latch= ignored).
•f
clk= 1/T
clk= 250 MHz.
•Speedup =
•throughput = 1 output/4nS
•latency = Nos. of stages ×T
clk
= 3 ×4nS = 12nS.
•Sampling Rate = 250 Msamples/sec 1/4
1/10
outputnS
outputnS
P1
P2
P3
clock
t1= 3nS
t2=4nS
t3=3nS
Utilization
¾ or 75%
1 or 100%
¾ or 75%
02 1345
4nS
Pipeline
P3
P2
P1
0 4 8 1216
T1
T1
T1
T2
T2
T2
T3
T3
T3
Space
Time (ns)
20
17

18
Problem-2
Suppose that the time delays of the four segments of a pipeline systems are t
1= 60ns, t
2= 70ns, t
3= 100ns
and t
4= 80ns, and interface registers have a delay of t
r= 10ns. Calculate the speed-up of the pipelined system
over the equivalent non pipelined system.

19
Instruction Pipeline

20
Four segment CPU pipeline
clock
FI DA EX
Input
Output
Latch/Register
FO

Problem 3
21
Location/Address
0
1
021
022
023
Write a program using the instructions mentioned below, which can perform 5-6?
5
6
result
PC

22
Arithmetic Pipeline
Floating Point Addition
A = 2.381 ×10
38
B = 5.113 ×10
37
Exponent Equalization:
23.81 ×10
37
5.113 ×10
37
Mantissa Addition
23.81 ×10
37
5.113 ×10
37
28.923 ×10
37
Renormalization (if necessary)
2.892 ×10
38

23
Pipelined implementation of a Floating Point Adder

24
Multiplier
Adder
A
i B
i
C
i
R1 R2
R5
R3 R4

25
Thank you
Tags