Design pipeline architecture for various stage pipelines


About This Presentation

Design pipeline architecture for various stage pipelines


Slide Content

Design pipeline architecture for various stage pipelines. WELCOME TO OUR PRESENTATION Slide 1

Mahmudul Hasan [email protected] Group Members Slide 2

Contents: Pipeline, Instruction, Design Diagram, Performance, Example, Advantages, Disadvantages, Conclusion. Slide 3

Before there was pipelining… Single-cycle control: hardwired; low CPI (1); long clock period (to accommodate the slowest instruction). Multi-cycle control: micro-programmed; short clock period; high CPI. [timing figure: single-cycle runs insn0.(fetch,decode,exec) then insn1.(fetch,decode,exec); multi-cycle runs insn0.fetch, insn0.dec, insn0.exec, insn1.fetch, insn1.dec, insn1.exec over time] Slide 3

Pipelining: Start with the multi-cycle design. When insn0 goes from stage 1 to stage 2, insn1 starts stage 1. Each instruction passes through all stages, but instructions enter and leave at a faster rate. [timing figure: multi-cycle runs insn0.fetch, insn0.dec, insn0.exec, insn1.fetch, insn1.dec, insn1.exec in sequence; pipelined overlaps insn0, insn1, and insn2 so that fetch, decode, and execute of different instructions happen in the same cycle] Can have as many insns in flight as there are stages. Slide 4

Pipelining: A pipeline is a series of stages, where some work is done at each stage in parallel. The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end. Slide 5

Pipeline categories. Linear pipelines: a linear pipeline processor is a series of processing stages with memory access. Non-linear pipelines: a non-linear pipeline (also called a dynamic pipeline) can be configured to perform various functions at different times; in a dynamic pipeline there are also feed-forward or feed-back connections. A non-linear pipeline also allows very long instruction words. Slide 6

Processor Pipeline Review Slide 7

Instruction Pipeline: An instruction pipeline has six operations: Fetch Instruction (FI), Decode Instruction (DI), Calculate Operands (CO), Fetch Operands (FO), Execute Instruction (EI), Write Result (WR). Overlap these operations (a timing sketch follows below). Slide 8
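
As a quick illustration (a sketch only; the table layout and function name are my own, not from the slides), this prints how the six operations overlap when one instruction enters per cycle:

    STAGES = ["FI", "DI", "CO", "FO", "EI", "WR"]

    def print_timing(n_instructions):
        # One instruction enters per cycle; each occupies successive stages.
        total_cycles = n_instructions + len(STAGES) - 1
        print("insn  " + " ".join(f"{c:>2}" for c in range(1, total_cycles + 1)))
        for i in range(n_instructions):
            row = ["  "] * total_cycles
            for s, name in enumerate(STAGES):
                row[i + s] = name
            print(f"I{i + 1:<4} " + " ".join(row))

    print_timing(4)   # 4 instructions complete in 4 + 6 - 1 = 9 cycles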

Stage 1: Fetch Diagram [datapath figure: the PC indexes the instruction cache; the instruction bits and PC+1 are written to the IF/ID pipeline register; a MUX chooses between PC+1 and the branch target from Decode as the next PC] Slide 9

Stage 1: Instruction Fetch. Fetch an instruction from memory every cycle: use the PC to index memory, increment the PC (assume no branches for now), and write state to the pipeline register (IF/ID); the next stage will read this pipeline register. Slide 10
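
A minimal Python sketch of this stage, assuming a toy machine in which instruction memory is a list and the IF/ID register is a plain dictionary (all names here are illustrative, not from the slides):

    def fetch_stage(pc, instruction_memory):
        # Use the PC to index memory, then increment it (no branches yet).
        insn = instruction_memory[pc]
        if_id = {"insn": insn, "pc_plus_1": pc + 1}   # written to the IF/ID register
        return if_id, pc + 1                          # the next stage reads if_id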

Stage 2: Decode Diagram [datapath figure: instruction bits from the IF/ID register index the register file (regA, regB); the regA/regB contents, PC+1, and control signals are written to the ID/EX pipeline register; destReg and write data arrive back from Write-back] Slide 11

Stage 2: Instruction Decode. Decode the opcode bits and set up control signals for later stages. Read the input operands from the register file, as specified by the decoded instruction bits. Write state to the pipeline register (ID/EX): the opcode, register contents, PC+1 (even though decode didn't use it), and control signals (from the insn) for the opcode and destReg. Slide 12
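
Continuing the same toy model (illustrative field names, not the slide's actual control signals), decode might look like:

    def decode_stage(if_id, register_file):
        # Decode the opcode, read regA/regB from the register file, and carry
        # PC+1 plus control information forward in the ID/EX register.
        insn = if_id["insn"]
        return {
            "opcode":    insn["opcode"],
            "regA_val":  register_file[insn["regA"]],
            "regB_val":  register_file[insn["regB"]],
            "offset":    insn.get("offset", 0),
            "destReg":   insn.get("destReg"),
            "pc_plus_1": if_id["pc_plus_1"],   # carried even though decode does not use it
        }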

Stage 3: Execute Diagram [datapath figure: the ALU takes the regA contents and a MUX-selected second input (regB contents or offset); the ALU result, regB contents, PC+1+offset target, and control signals are written to the EX/Mem pipeline register] Slide 13

Stage 3: Execute. Perform the ALU operation to calculate the result of the instruction: control signals select the operation, the contents of regA are one input, and either regB or a constant offset (from the insn) is the second input. Calculate the PC-relative branch target: PC+1+(constant offset). Write state to the pipeline register (EX/Mem): the ALU result, the contents of regB, PC+1+offset, and control signals (from the insn) for the opcode and destReg. Slide 14
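
A sketch of execute under the same toy model; the "add"/"ld"/"st" opcodes and the add-only ALU are simplifying assumptions of mine, not the slide's ISA:

    def execute_stage(id_ex):
        # Control signals (here just the opcode) pick the second ALU input:
        # regB contents for register-register ops, the constant offset for LD/ST.
        second = id_ex["regB_val"] if id_ex["opcode"] == "add" else id_ex["offset"]
        return {
            "opcode":        id_ex["opcode"],
            "alu_result":    id_ex["regA_val"] + second,            # toy ALU: add only
            "regB_val":      id_ex["regB_val"],
            "branch_target": id_ex["pc_plus_1"] + id_ex["offset"],  # PC-relative target
            "destReg":       id_ex["destReg"],
        }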

Stage 4: Memory Diagram [datapath figure: the ALU result addresses the data cache (in_addr) and regB contents supply in_data, with opcode bits driving the R/W and enable signals; the ALU result, loaded data, and control signals go into the Mem/WB pipeline register] Slide 15

Stage 4: Memory. Perform the data cache access: the ALU result contains the address for LD or ST, and opcode bits control the R/W and enable signals. Write state to the pipeline register (Mem/WB): the ALU result, the loaded data, and control signals (from the insn) for the opcode and destReg. Slide 16
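
A sketch of the memory stage with the data cache modelled as a dictionary (again illustrative only):

    def memory_stage(ex_mem, data_cache):
        # The ALU result is the address for LD/ST; other opcodes skip the cache.
        loaded = None
        if ex_mem["opcode"] == "ld":
            loaded = data_cache[ex_mem["alu_result"]]
        elif ex_mem["opcode"] == "st":
            data_cache[ex_mem["alu_result"]] = ex_mem["regB_val"]
        return {
            "opcode":     ex_mem["opcode"],
            "alu_result": ex_mem["alu_result"],
            "loaded":     loaded,
            "destReg":    ex_mem["destReg"],
        }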

Stage 5: Write-back Diagram [datapath figure: a MUX selects either the loaded data or the ALU result from the Mem/WB pipeline register and writes it to destReg in the register file] Slide 17

Stage 5: Write Back. Write the result to the register file (if required): write the loaded data to destReg for LD, and write the ALU result to destReg for an arithmetic insn. Opcode bits control the register write-enable signal. Slide 18
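
And a sketch of write-back in the same toy model:

    def writeback_stage(mem_wb, register_file):
        # LD writes the loaded data, arithmetic writes the ALU result,
        # and ST writes nothing back.
        if mem_wb["opcode"] == "ld":
            register_file[mem_wb["destReg"]] = mem_wb["loaded"]
        elif mem_wb["opcode"] == "add":
            register_file[mem_wb["destReg"]] = mem_wb["alu_result"]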

Putting It All Together [full-pipeline figure: PC, instruction cache, register file (R0–R7), ALU, data cache, and the IF/ID, ID/EX, EX/Mem, and Mem/WB pipeline registers carrying PC+1, valA, valB, offset, ALU result, mdata, dest, and op between stages, with MUXes for the next PC and the write-back data] Slide 19
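
A hypothetical driver that chains the five stage sketches above; the state dictionary s holds pc, imem, regs, dcache, and the four pipeline registers (initially None), and hazards are deliberately ignored here since later slides treat them separately:

    def run_cycle(s):
        # Advance the pipeline one clock: update the registers back-to-front so
        # each stage consumes the values its neighbour produced last cycle.
        if s["mem_wb"]:
            writeback_stage(s["mem_wb"], s["regs"])
        s["mem_wb"] = memory_stage(s["ex_mem"], s["dcache"]) if s["ex_mem"] else None
        s["ex_mem"] = execute_stage(s["id_ex"]) if s["id_ex"] else None
        s["id_ex"]  = decode_stage(s["if_id"], s["regs"]) if s["if_id"] else None
        if s["pc"] < len(s["imem"]):
            s["if_id"], s["pc"] = fetch_stage(s["pc"], s["imem"])
        else:
            s["if_id"] = None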

Six Stage Instruction Pipeline Slide 20

Designing a Pipelined Processor: Let's examine our datapath and control diagram. Associate resources with states. Ensure that flows do not conflict, or figure out how to resolve them. Assert control in the appropriate stage. Slide 21

5 Steps of MIPS Datapath. Register operands: all ALU operations are performed on register operands. Two kinds of memory system: 1) instruction memory, 2) data memory. Memory is accessed only by dedicated instructions (loads and stores), since ALU operations work on register operands. Five instruction cycles (stages) must complete to execute an operation.

5 Steps of MIPS Datapath [unpipelined datapath figure: Instruction Fetch, Instr. Decode / Reg. Fetch, Execute / Addr. Calc, Memory Access, Write Back, with next-PC adder, register file (RS1, RS2, RD), sign extend of the immediate, ALU with Zero? output, data memory, and a write-back data MUX]. What do we need to do to pipeline the process? Slide 22

5 Steps of MIPS/DLX Datapath [pipelined datapath figure: the same five stages separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, carrying next SEQ PC, RD, and the write-back data forward]. Data stationary control: local decode for each instruction phase / pipeline stage. Slide 23

Graphically Representing Pipelines: this representation can help with answering questions like: How many cycles does it take to execute this code? What is the ALU doing during cycle 4? Use this representation to help understand datapaths. Slide 24

Visualizing Pipelining [timing figure: instruction order vs. time (clock cycles 1–7); each instruction occupies Ifetch, Reg, ALU, DMem, Reg in successive cycles, one cycle behind the previous instruction] Slide 25

Conventional Pipelined Execution Representation [timing figure: six instructions, each passing through IFetch, Dcd, Exec, Mem, WB, staggered by one cycle; program flow runs down, time runs across] Slide 26

Single Cycle, Multiple Cycle, vs. Pipeline [timing figure over clock cycles 1–10: the single-cycle implementation finishes one instruction per long cycle (Load, then Store), wasting time on the faster instructions; the multiple-cycle implementation breaks Load into Ifetch, Reg, Exec, Mem, Wr and Store into Ifetch, Reg, Exec, Mem; the pipeline implementation overlaps Load, Store, and an R-type instruction, one starting per cycle] Slide 27

Why pipelining? Suppose we execute 100 instructions. Single-cycle machine: 45 ns/cycle x 1 CPI x 100 inst = 4500 ns. Multicycle machine: 10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns. Ideal pipelined machine: 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns. Slide 28
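
The same arithmetic as a quick Python check (numbers taken from the slide):

    single_cycle = 45 * 1 * 100        # 45 ns/cycle x 1 CPI x 100 inst        -> 4500 ns
    multi_cycle  = 10 * 4.6 * 100      # 10 ns/cycle x 4.6 CPI x 100 inst      -> 4600 ns
    pipelined    = 10 * (1 * 100 + 4)  # 10 ns/cycle, 100 insts + 4-cycle drain -> 1040 ns
    print(single_cycle, multi_cycle, pipelined)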

Pipeline Performance. At best, pipelining has no impact on latency: you still need to wait n stages (cycles) for completion of an instruction. It improves throughput: no single instruction executes faster, but overall throughput is higher, the average instruction execution time decreases, and successive instructions complete in successive cycles (no 5-cycle wait between instructions). In reality, the clock is determined by the slowest stage, and there is pipeline overhead: clock skew, register delay, and pipeline fill and drain. Slide 29
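
As a rough illustration of the throughput argument (the stage count and timings below are made up for the example, not from the slides): with equal-length stages and no overhead, the speedup approaches the number of stages as the instruction count grows.

    n_stages, t_insn, n_insn = 5, 50, 1_000        # illustrative numbers only
    unpipelined = n_insn * t_insn                  # every insn pays the full 50 ns
    pipelined   = (n_stages + n_insn - 1) * (t_insn / n_stages)   # fill + one per cycle
    print(unpipelined / pipelined)                 # ~4.98, approaching n_stages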

Pipeline Performance: Example. Consider a non-pipelined machine with 6 execution stages of lengths 50 ns, 50 ns, 60 ns, 60 ns, 50 ns, and 50 ns. Find the instruction latency on this machine. How much time does it take to execute 100 instructions? Solution: Instruction latency = 50+50+60+60+50+50 = 320 ns. Time to execute 100 instructions = 100*320 = 32000 ns. Suppose we introduce pipelining on this machine. Assume that when introducing pipelining, clock skew adds 5 ns of overhead to each execution stage. What is the instruction latency on the pipelined machine? How much time does it take to execute 100 instructions? Solution: Remember that in the pipelined implementation the pipe stages must all have the same length, i.e., the speed of the slowest stage plus overhead. With 5 ns overhead: length of a pipelined stage = MAX(lengths of unpipelined stages) + overhead = 60 + 5 = 65 ns. Instruction latency = 6*65 = 390 ns. Time to execute 100 instructions = 65*6*1 + 65*1*99 = 390 + 6435 = 6825 ns (checked in the sketch below). Slide 30
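
A short Python check of the slide's numbers:

    stage_ns = [50, 50, 60, 60, 50, 50]
    overhead = 5
    n = 100

    # Non-pipelined: every instruction pays the sum of all stage times.
    latency = sum(stage_ns)                            # 320 ns
    total   = n * latency                              # 32000 ns

    # Pipelined: every stage takes the slowest stage time plus overhead.
    cycle        = max(stage_ns) + overhead            # 65 ns
    pipe_latency = len(stage_ns) * cycle               # 6 * 65 = 390 ns
    pipe_total   = (len(stage_ns) + n - 1) * cycle     # (6 + 99) * 65 = 6825 ns
    print(latency, total, pipe_latency, pipe_total)    # 320 32000 390 6825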

Problems with pipeline processors? Three types of pipeline hazards: Structural hazards arise from resource conflicts when the hardware cannot support all possible combinations of instructions in simultaneous overlapped execution. Data hazards arise when an instruction depends on the result of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline. Control hazards arise from the pipelining of branches and other instructions that change the PC. Slide 31
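
As a minimal illustration of the data-hazard case (a sketch only, reusing the toy instruction format from the earlier stage sketches; structural and control hazards are not modelled):

    def raw_hazard(producer, consumer):
        # Data (RAW) hazard: the consumer reads a register that the producer
        # has not yet written back.
        return producer.get("destReg") in (consumer.get("regA"), consumer.get("regB"))

    i0 = {"opcode": "add", "regA": 2, "regB": 3, "destReg": 1}
    i1 = {"opcode": "ld",  "regA": 1, "regB": 0, "destReg": 4, "offset": 8}
    print(raw_hazard(i0, i1))   # True: i1 reads R1 before i0 has written it back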

Advantages: Pipelining makes efficient use of resources. It gives quicker execution of a large number of instructions. The parallelism is invisible to the programmer. Slide 32

Disadvantages: Pipelining involves adding hardware to the chip. The pipeline cannot continuously run at full speed because of pipeline hazards, which disrupt its smooth execution. Slide 33

Conclusion: The pipelining concept brings many advantages to many systems, though it also has some hazards. Pipelining of instructions reduces CPI, increases the speed of execution, and increases the throughput of the overall system. This is a basic concept of any system, and a lot of improvement can still be made to the pipelining concept to increase system speed. The future scope is to apply the concept to different embedded systems and see how performance increases with it. Slide 34

References: https://en.wikipedia.org/wiki/Pipeline_(computing) https://en.wikipedia.org/wiki/Instruction_pipelining https://docs.marklogic.com/guide/cpf/pipelines#id_50300 https://www.tutorialspoint.com/computer_organization/instruction_pipeline_architecture.asp Slide 36