Unit IV Fundamentals of Computer Organization.pptx

U23ITTC01 Digital design and system architecture

UNIT-IV Fundamentals of Computer Organization (CO4): Block diagram of Digital Computer Organization and Design: Instruction codes, Registers, Instruction cycle, Memory Reference Instructions, Input-Output and Interrupt, ALU design, Execution of a complete instruction, Multiple bus organization, Hardwired control, Microprogrammed control, Pipelining: Basic concepts, Data hazards, Instruction hazards, Parallel and Vector Processors.

Computer organization: What is CO? Computer Organization is the study of the internal working, structuring and implementation of a computer system.

Computer Architecture: Computer Architecture is a functional description of the requirements and design implementation for the various parts of a computer. It deals with the functional behavior of the computer system.

Computer organization: Computer Organization is how operational attributes are linked together and contribute to realizing the architectural specification. Computer Organization deals with structural relationships.

Difference between Computer Architecture (CA) and Computer Organization (CO)
1. CA: Architecture describes what the computer does. CO: Organization describes how it does it.
2. CA: Deals with the functional behavior of the computer system. CO: Deals with structural relationships.
3. CA: For designing a computer, its architecture is fixed first. CO: The organization is decided after the architecture.
4. CA: Is also called instruction set architecture. CO: Is frequently called microarchitecture.
5. CA: Coordinates between the hardware and software of the system. CO: Handles the segments of the network in a system.

Block diagram (or) internal logic diagram of computer

Basic components of computer: A computer has five functionally independent main parts: input unit, memory unit, arithmetic and logic unit, output unit, and control unit.

Input unit: The input unit contains input devices, which are used to give information to the computer. Input is the process of entering data and instructions into the computer system. E.g. mouse, keyboard, scanner, touchpad and light pen.

Output unit: Output devices are used to get the result of the task performed by the computer. E.g. monitor, printer, projector and speaker.

Arithmetic and Logic Unit (ALU): The ALU performs arithmetic and logical operations on data in order to convert them into useful information. Arithmetic operations include addition, subtraction, multiplication and division. Logical operations include AND, OR and NOT.

Control unit: The control unit is used to coordinate the operations of the memory, ALU, input and output units. Memory unit: The function of the memory unit is to store programs and data.

MEMORY TYPES

Primary memory (main memory) (RAM/ROM): It is faster than secondary memory. Programs must be stored in main memory while they are being executed. Data and program instructions are stored in RAM while the computer is in operation. RAM (Random Access Memory) is a volatile memory: information is lost when the computer is switched off.

Primary memory (main memory), ROM: Read-only memory holds software that can be read but not written by the user. It is a small area of permanent memory that provides startup instructions when the computer is turned on. E.g. the ROM on a network card or video card.

Secondary memory: It is used when large amounts of data and many programs have to be stored and accessed. E.g. hard disk, floppy disk, CD-ROM, flash drive.

INSTRUCTION CODES
Let us discuss some basic terms.
Program: A program is a set of instructions that specify the operations, the operands (data) and the sequence in which processing has to occur. In "1 + 2", "1" and "2" are the operands (data) and "+" is the operator (symbol).
Computer instruction: A computer instruction is a binary code that specifies a sequence of micro-operations for the computer. The computer reads each instruction from memory and places it in a control register, and then it is executed.

Register: A processor register (CPU register) is one of a small set of data holding places that are part of the computer processor. A register may hold an instruction, a storage address, or any kind of data (such as a bit sequence or individual characters).
Interrupt: An interrupt is a request from an I/O device for service by the processor. The processor then provides the requested service.

Assembler: software that converts an assembly language program to machine-level language (MLL).
Compiler: converts a high-level language (HLL) program to MLL by reading the whole source program at once.
Interpreter: converts HLL to MLL statement by statement.

An instruction code is a group of bits that instructs the computer to perform a specific operation, i.e. a group of bits (0101...) tells the processor to perform a specific task.

Instruction code format: An instruction code has two parts, (1) the opcode and (2) the address. The opcode specifies the operation to be performed and the address part specifies an address, typically where the operand is found in memory.

The operation code (opcode) of an instruction is a group of bits that defines operations such as add, subtract, multiply and divide. A unique binary code is assigned to every opcode. The number of bits required for the operation code depends on the total number of operations available in the computer: the operation code must consist of at least n bits for 2^n (or fewer) distinct operations.
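The rule "at least n bits for 2^n or fewer operations" can be checked with a few lines of Python. This is only an illustrative sketch; the function name `min_opcode_bits` is made up for this example and does not come from the slides.

```python
def min_opcode_bits(num_operations: int) -> int:
    """Return the smallest n such that 2**n >= num_operations."""
    if num_operations < 1:
        raise ValueError("a computer needs at least one operation")
    # For k >= 1, (k - 1).bit_length() equals ceil(log2(k)).
    return (num_operations - 1).bit_length()


for ops in (2, 8, 16, 17):
    print(ops, "operations need", min_opcode_bits(ops), "opcode bits")
```

For example, 16 operations fit in a 4-bit opcode, while a 17th operation forces a 5-bit opcode.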

Stored program organization: The diagram shows this type of organization for a memory with 16-bit words. Instructions are stored in one section of memory and data in another. The memory unit has 4096 words (2^12), so 12 bits are needed to specify an address. If each instruction code is stored in one 16-bit memory word, 4 bits are available for the operation code (2^4 = 16 possible operations) and 12 bits for the address (2^12 = 4096 words). The control uses the 12-bit address to read the operand data from memory.
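As a small illustration of the 4-bit opcode and 12-bit address split described above, the following Python sketch separates a 16-bit instruction word into its two fields. The names `split_instruction`, `ADDRESS_BITS` and `ADDRESS_MASK` are assumptions chosen for this example.

```python
ADDRESS_BITS = 12                        # 2**12 = 4096 memory words
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1   # 0x0FFF


def split_instruction(word: int) -> tuple[int, int]:
    """Split a 16-bit instruction word into (opcode, address)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction word must fit in 16 bits")
    opcode = word >> ADDRESS_BITS        # upper 4 bits
    address = word & ADDRESS_MASK        # lower 12 bits
    return opcode, address


opcode, address = split_instruction(0x1234)
print(f"opcode={opcode:04b} address={address:012b}")
```

The word 0x1234 yields opcode 0001 and address 0x234; the largest address that can be encoded is 4095.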

Direct and indirect addressing

COMPUTER REGISTER: A processor register (CPU register) is one of a small set of data holding places that are part of the computer processor. It is not a part of the main memory; registers are located in the CPU and are its smallest data-holding elements. A register may hold an instruction, a storage address, or any kind of data (such as a bit sequence or individual characters), and typically holds a small amount of data, around 32 to 64 bits.

WHY ARE COMPUTER REGISTERS NEEDED? Computer instructions are normally stored in consecutive memory locations and are executed sequentially, one at a time. The control reads an instruction from a specific address in memory and executes it. It then continues by reading the next instruction in sequence and executes it, and so on. This type of instruction sequencing needs a counter to calculate the address of the next instruction after execution of the current instruction is completed. It is also necessary to provide a register in the control unit for storing the instruction code after it is read from memory, and the computer needs a processor register for holding a memory address.

DIFFERENT TYPES OF REGISTERS: Data Register (DR), Address Register (AR), Accumulator Register (AC), Instruction Register (IR), Program Counter (PC), Temporary Register (TR), Input Register (INPR), Output Register (OUTR).

Program Counter (PC): 12 bits; holds the address of the next instruction to be executed from memory.
Address Register (AR): 12 bits; holds an address for memory.
Instruction Register (IR): 16 bits; holds the instruction code.
Temporary Register (TR): 16 bits; holds temporary data. A temporary register is the only register that can be read and written more than once in a single instruction.

Input Register (INPR): 8 bits; holds the input character.
Accumulator Register (AC): 16 bits; stores the result of an operation (also called the processor register).
Output Register (OUTR): 8 bits; holds the output character.
Data Register (DR): 16 bits; holds the memory operand.

TYPES OF COMPUTER INSTRUCTIONS: A basic computer has three instruction code formats: memory-reference instructions, register-reference instructions, and input-output instructions.

1) MEMORY-REFERENCE INSTRUCTION: Memory-reference instructions perform operations on operands located in memory.
2) REGISTER-REFERENCE INSTRUCTION: Register-reference instructions perform operations on registers.
3) INPUT-OUTPUT INSTRUCTION: Input-output instructions perform operations on the input and output registers.

MEMORY - REFERENCE INSTRUCTION FORMAT

REGISTER - REFERENCE INSTRUCTION FORMAT: The register operation part is 12 bits (bits 0 to 11). The opcode is 3 bits (bits 12 to 14) and its value is 111. Bit 15 (I) is 0; opcode 111 together with I = 0 identifies a register-reference instruction.

REGISTER - REFERENCE INSTRUCTIONS

INPUT-OUTPUT INSTRUCTION FORMAT: The I/O operation part is 12 bits (bits 0 to 11). The opcode is 3 bits (bits 12 to 14) and its value is 111. Bit 15 (I) is 1; opcode 111 together with I = 1 identifies an input-output instruction.

INPUT-OUTPUT INSTRUCTIONS

INSTRUCTION CYCLE: A program residing in the memory unit of a computer consists of a sequence of instructions. The processor executes these instructions by going through a cycle for each instruction. In a basic computer, each instruction cycle consists of the following phases:
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Read the effective address from memory if the instruction has an indirect address.
4. Execute the instruction.

INSTRUCTION CYCLE FLOW CHART: STEPS IN THE FLOW CHART
STEP 1: Start with SC ← 0 (the sequence counter is cleared to zero).
STEP 2: T0: AR ← PC. During timing signal T0, the content of the program counter (PC) is transferred to the address register (AR), i.e. the address of the instruction to be fetched is placed in AR.

STEP 3: T1: IR ← M[AR], PC ← PC + 1. During timing signal T1, the content of the memory word addressed by AR is placed in the instruction register (IR), so IR now holds the instruction code, including the opcode. PC is then incremented by 1 so that it holds the address of the next instruction. Steps 2 and 3 make up the fetch phase.

STEP 4: T2: D0, ..., D7 ← decode IR(12-14), AR ← IR(0-11), I ← IR(15). At T2, the instruction fetched from memory is decoded. The opcode in bits 12 to 14 of IR is decoded, bits 0 to 11, which hold the address, are placed in AR, and bit 15 is transferred to I, which indicates a direct (0) or indirect (1) address. Step 4 is the decode phase.

STEP 5: T3: decision. D7 determines the type of instruction: D7 = 0 means a memory-reference instruction, while D7 = 1 means a register-reference (I = 0) or input-output (I = 1) instruction. After decoding and before execution, a memory-reference instruction is checked for direct or indirect addressing: with I = 1 the effective address is first read from memory (AR ← M[AR]).
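The timing steps T0 to T3 above can be traced with a short Python sketch. It is a simplified model under stated assumptions: memory is a plain dictionary from 12-bit addresses to 16-bit words, the whole sequence runs in one function call rather than on clock edges, and the function name `fetch_and_decode` and the returned dictionary keys are invented for this example.

```python
def fetch_and_decode(memory: dict[int, int], pc: int) -> dict:
    """Trace T0-T3 of the basic-computer instruction cycle (simplified)."""
    # T0: AR <- PC
    ar = pc
    # T1: IR <- M[AR], PC <- PC + 1
    ir = memory[ar]
    pc = (pc + 1) & 0xFFF
    # T2: decode IR(12-14), AR <- IR(0-11), I <- IR(15)
    opcode = (ir >> 12) & 0b111
    ar = ir & 0xFFF
    i_bit = (ir >> 15) & 1
    # T3: D7 (opcode 111) separates register/I-O from memory reference
    if opcode == 0b111:
        kind = "input-output" if i_bit else "register-reference"
    else:
        kind = "memory-reference"
        if i_bit:
            # Indirect address: AR <- M[AR] gives the effective address
            ar = memory[ar] & 0xFFF
    return {"pc": pc, "ir": ir, "opcode": opcode, "i": i_bit,
            "ar": ar, "kind": kind}


# 0x9005 = I=1, opcode 001, address 005: an indirect memory reference.
demo_memory = {0x000: 0x9005, 0x005: 0x0123}
print(fetch_and_decode(demo_memory, 0x000))
```

In the demo, the address field 005 is only a pointer; the effective address 0x123 is read from memory location 005 because I = 1.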

MEMORY REFERENCE INSTRUCTION

MEMORY REFERENCE INSTRUCTION: The effective address of the instruction is in AR and was placed there during timing signal T2 when I = 0, or during timing signal T3 when I = 1. The memory cycle is assumed to be short enough to be completed in one CPU cycle. The execution of a memory-reference instruction starts with T4.

INPUT-OUTPUT AND INTERRUPT: INPUT-OUTPUT CONFIGURATION. In computer architecture, input-output devices act as an interface between the machine and the user. Instructions and data stored in the memory must come from some input device. The results are displayed to the user through some output device.

Input-Output and Interrupt
● Computational results must be transmitted to the user through some output device.
● For the system to communicate with an input device, serial information is shifted into the input register INPR.
● To output information, it is stored in the output register OUTR.
The following block diagram shows the input-output configuration for a basic computer.

INPUT-OUTPUT CONFIGURATION: The input-output terminals send and receive information. Each item of information transferred is an eight-bit alphanumeric code. The information generated through the keyboard is shifted into the input register INPR. The information for the printer is stored in the output register OUTR.

INPUT-OUTPUT CONFIGURATION: Registers INPR and OUTR communicate with a communication interface serially and with the AC in parallel. The transmitter interface receives information from the keyboard and transmits it to INPR. The receiver interface receives information from OUTR and sends it to the printer serially.

Input-Output and Interrupt (cont.)
● INPR and OUTR communicate with a communication interface serially and with the AC in parallel. They hold 8-bit alphanumeric information.
● I/O devices are slower than the computer system, so the timing rate difference between the input/output device and the computer must be synchronized.
● FGI: 1-bit input flag (flip-flop) used to control the input operation.

● FGI is set to 1 when new information is available in the input device and is cleared to 0 when the information is accepted by the computer.
● FGO: 1-bit output flag used as a control flip-flop to control the output operation.
● If FGO is set to 1, the computer can send out the information from AC. If it is 0, the output device is busy and the computer has to wait.

The process of input information transfer:
● Initially, FGI is cleared to 0.
● An 8-bit alphanumeric code is shifted into INPR (keyboard key strike) and the input flag FGI is set to 1.
● As long as the flag is set, the information in INPR cannot be changed by another data entry.
● The computer checks the flag bit; if it is 1, the information from INPR is transferred in parallel into AC and FGI is cleared to 0.

● Once the flag is cleared, new information can be shifted into INPR by the input device (striking another key).
The process of outputting information:
● Initially, the output flag FGO is set to 1.
● The computer checks the flag bit; if it is 1, the information from AC is transferred in parallel to OUTR and FGO is cleared to 0.
● The output device accepts the coded information (prints the corresponding character).

● When the operation is completed, the output device sets FGO back to 1.
● The computer does not load new information into OUTR when FGO is 0, because this condition indicates that the output device is still busy and cannot receive more information at the moment.
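The FGI/FGO handshake described above can be sketched as a small Python model. It is a simplification under stated assumptions: the device and the computer are modelled as method calls on one object rather than as independent hardware, and the class name `BasicIO` and its method names are hypothetical.

```python
class BasicIO:
    """Toy model of INPR, OUTR, AC and the flags FGI and FGO."""

    def __init__(self) -> None:
        self.inpr = 0      # 8-bit input register
        self.outr = 0      # 8-bit output register
        self.ac = 0        # 16-bit accumulator
        self.fgi = 0       # input flag: initially cleared
        self.fgo = 1       # output flag: initially set (device ready)
        self.printed = []  # characters accepted by the output device

    def key_strike(self, code: int) -> bool:
        """Input device: load INPR only while FGI is 0."""
        if self.fgi == 1:
            return False                  # INPR is protected while FGI = 1
        self.inpr = code & 0xFF
        self.fgi = 1
        return True

    def cpu_input(self) -> bool:
        """Computer: AC(0-7) <- INPR, FGI <- 0, only if FGI = 1."""
        if self.fgi == 0:
            return False
        self.ac = (self.ac & 0xFF00) | self.inpr
        self.fgi = 0
        return True

    def cpu_output(self) -> bool:
        """Computer: OUTR <- AC(0-7), FGO <- 0, only if FGO = 1."""
        if self.fgo == 0:
            return False                  # output device is busy
        self.outr = self.ac & 0xFF
        self.fgo = 0
        return True

    def device_done(self) -> None:
        """Output device: print the character, then set FGO back to 1."""
        if self.fgo == 0:
            self.printed.append(chr(self.outr))
            self.fgo = 1


io = BasicIO()
io.key_strike(ord("A"))
io.cpu_input()
io.cpu_output()
io.device_done()
print(io.printed)
```

A second key strike before `cpu_input()` is rejected, and a second `cpu_output()` before `device_done()` is rejected, which mirrors the two waiting conditions in the text.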

Input-Output Instructions
Needed for:
● Transferring information to and from the AC register
● Checking the flag bits
● Controlling the interrupt facility
The control unit recognizes an input-output instruction when D7 = 1 and I = 1. The remaining bits of the instruction specify the particular operation, which is executed with the clock transition associated with timing signal T3. The input-output instructions are summarized next.

Input-Output Instructions
D7·I·T3 = p (common to all input-output instructions)
IR(i) = Bi, i = 6, ..., 11 (the bit of IR(6-11) that selects the instruction)
INP   pB11:  AC(0-7) ← INPR, FGI ← 0            Input character to AC
OUT   pB10:  OUTR ← AC(0-7), FGO ← 0            Output character from AC
SKI   pB9:   if (FGI = 1) then (PC ← PC + 1)    Skip on input flag
SKO   pB8:   if (FGO = 1) then (PC ← PC + 1)    Skip on output flag
ION   pB7:   IEN ← 1                            Interrupt enable on
IOF   pB6:   IEN ← 0                            Interrupt enable off

Design of ALU
An arithmetic logic unit (ALU):
● Performs arithmetic and logic operations
● Is a fundamental building block of the central processing unit (CPU) of a computer
● Is present even in the simplest microprocessors, for purposes such as maintaining timers
● Is a combinational logic circuit

Complex ALU

Simple ALU The S input is controlled by the processor based on the op code

Adder: full adder from a previous lecture

Adder/Subtractor
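The adder/subtractor diagram is not reproduced in this text, so the following Python sketch shows one common textbook construction and is an assumption rather than the circuit from the slide: a chain of full adders in which each B bit is XORed with a `subtract` control line and the same line feeds the first carry-in, which yields A + B or A - B in two's complement. The function names and the 4-bit default width are chosen for illustration.

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out


def add_sub(a: int, b: int, subtract: int, width: int = 4) -> tuple[int, int]:
    """Ripple-carry adder/subtractor; returns (result, final carry)."""
    carry = subtract                      # carry-in is 1 when subtracting
    result = 0
    for i in range(width):
        a_bit = (a >> i) & 1
        b_bit = ((b >> i) & 1) ^ subtract  # XOR gate complements B
        s, carry = full_adder(a_bit, b_bit, carry)
        result |= s << i
    return result, carry


print(add_sub(5, 3, 0))   # 5 + 3
print(add_sub(5, 3, 1))   # 5 - 3
```

With a 4-bit width, 3 - 5 produces 14 (binary 1110), which is -2 in two's complement.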

Textbook ALU

Execution of a complete instruction: Fundamental Concepts
● The processor fetches one instruction at a time and performs the operation specified.
● Instructions are fetched from successive memory locations until a branch or a jump instruction is encountered.
● The processor keeps track of the address of the memory location containing the next instruction to be fetched using the Program Counter (PC).
● The instruction currently being executed is held in the Instruction Register (IR).

Executing an Instruction
● Fetch the contents of the memory location pointed to by the PC. The contents of this location are loaded into the IR (fetch phase): IR ← [[PC]]
● Assuming that the memory is byte addressable and each instruction occupies 4 bytes, increment the contents of the PC by 4 (fetch phase): PC ← [PC] + 4
● Carry out the actions specified by the instruction in the IR (execution phase).

Executing an Instruction Transfer a word of data from one processor register to another or to the ALU. Perform an arithmetic or a logic operation and store the result in a processor register. Fetch the contents of a given memory location and load them into a processor register. Store a word of data from a processor register into a given memory location.

Register Transfers: Figure 7.2. Input and output gating for the registers in Figure 7.1 (register Ri with control signals Ri-in and Ri-out on the internal processor bus; registers Y and Z with Y-in, Z-in and Z-out; MUX with Select choosing between Y and Constant 4; ALU inputs A and B).

Register Transfers
● All operations and data transfers are controlled by the processor clock.

Performing an Arithmetic or Logic Operation
● The ALU is a combinational circuit that has no internal storage.
● It performs arithmetic and logic operations on the two operands applied to its A and B inputs. One of the operands is the output of the multiplexer MUX and the other operand is obtained directly from the bus.
● The result produced by the ALU is stored temporarily in register Z.
Therefore, a sequence of operations to add the contents of register R1 to those of register R2 and store the result in register R3 is:
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in

Performing an Arithmetic or Logic Operation

Fetching a Word from Memory
● To fetch a word of information from memory, the processor has to specify the address of the memory location where this information is stored and request a Read operation.
● The processor transfers the required address to the MAR, whose output is connected to the address lines of the memory bus. At the same time, the processor uses the control lines of the memory bus to indicate that a Read operation is needed.
● When the requested data are received from the memory, they are stored in register MDR, from where they can be transferred to other registers in the processor.

Fetching a Word from Memory
● The connections for register MDR have four control signals: MDRin and MDRout control the connection to the internal bus, and MDRinE and MDRoutE control the connection to the external bus.
● The internal-bus input is selected when MDRin = 1.

Fetching a Word from Memory: Address into MAR; issue Read operation; data into MDR.

Figure 7.4. Connection and control signals for register MDR (MDR sits between the memory-bus data lines and the internal processor bus; signals MDRin, MDRout, MDRinE, MDRoutE).

Fetching a Word from Memory: The response time of each memory access varies (cache miss, memory-mapped I/O, ...). To accommodate this, the processor waits until it receives an indication that the requested operation has been completed (Memory-Function-Completed, MFC). For example, Move (R1), R2 involves:
➢ MAR ← [R1]
➢ Start a Read operation on the memory bus
➢ Wait for the MFC response from the memory
➢ Load MDR from the memory bus
➢ R2 ← [MDR]

Timing: Figure 7.5. Timing of a memory Read operation (signals shown over steps 1 to 3: Clock, MARin, Address, Read, MR, MDRinE, Data, MFC, MDRout). Assume MAR is always available on the address lines of the memory bus.

Storing a word in memory: Writing a word into a memory location follows a similar procedure. The desired address is loaded into MAR. Then the data to be written are loaded into MDR, and a Write command is issued.

Execution of a Complete Instruction: Let us now put together the sequence of elementary operations required to execute one instruction. Consider the instruction Add (R3),R1, which adds the contents of the memory location pointed to by R3 to register R1.

Execution of a Complete Instruction: Executing the instruction Add (R3),R1 requires the following actions:

Execution of a Complete Instruction: Add (R3),R1
Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     R3out, MARin, Read
5     R1out, Yin, WMFC
6     MDRout, SelectY, Add, Zin
7     Zout, R1in, End
Figure 7.6. Control sequence for execution of the instruction Add (R3),R1.
Figure 7.1. Single-bus organization of the datapath (internal processor bus connecting PC, MAR, MDR, Y, Z, IR, TEMP and registers R0 to R(n-1); MUX selecting between Y and Constant 4 for ALU input A; ALU control lines such as Add, Sub, XOR and carry-in; instruction decoder and control logic producing the control signals; MAR and MDR connected to the address and data lines of the memory bus).
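To make the seven-step control sequence for Add (R3),R1 concrete, the Python sketch below applies the register transfers that each step implies. It is a simplified model under stated assumptions: memory responds immediately (WMFC never actually waits), the instruction word is stored as a placeholder string, and the function name `execute_add_r3_r1` and the dictionary-based registers are invented for this example.

```python
def execute_add_r3_r1(regs: dict, memory: dict, pc: int) -> dict:
    """Trace the single-bus control sequence for Add (R3),R1."""
    s = {"PC": pc, "MAR": 0, "MDR": 0, "IR": None, "Y": 0, "Z": 0}
    # Step 1: PCout, MARin, Read, Select4, Add, Zin
    s["MAR"] = s["PC"]
    s["Z"] = s["PC"] + 4                 # MUX selects the constant 4
    # Step 2: Zout, PCin, Yin, WMFC
    s["PC"] = s["Z"]
    s["Y"] = s["Z"]
    s["MDR"] = memory[s["MAR"]]          # memory read completes
    # Step 3: MDRout, IRin
    s["IR"] = s["MDR"]
    # Step 4: R3out, MARin, Read
    s["MAR"] = regs["R3"]
    # Step 5: R1out, Yin, WMFC
    s["Y"] = regs["R1"]
    s["MDR"] = memory[s["MAR"]]          # operand arrives in MDR
    # Step 6: MDRout, SelectY, Add, Zin
    s["Z"] = s["Y"] + s["MDR"]
    # Step 7: Zout, R1in, End
    regs["R1"] = s["Z"]
    return s


registers = {"R1": 5, "R3": 2000}
# Address 100 holds the instruction (placeholder), address 2000 the operand.
main_memory = {100: "Add (R3),R1", 2000: 7}
state = execute_add_r3_r1(registers, main_memory, pc=100)
print(registers["R1"], state["PC"])
```

Steps 1 to 3 are the fetch phase common to every instruction; steps 4 to 7 are specific to Add (R3),R1.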

Branch Instructions A branch instruction replaces the contents of PC with the branch target address, which is usually obtained by adding an offset X given in the branch instruction. The offset X is usually the difference between the branch target address and the address immediately following the branch instruction. Conditional branch

Branch Instructions: Figure 7.7 gives a control sequence that implements an unconditional branch instruction. Processing starts, as usual, with the fetch phase. This phase ends when the instruction is loaded into the IR in step 3. The offset value is extracted from the IR by the instruction decoding circuit, which will also perform sign extension if required.

Execution of Branch Instructions: Since the value of the updated PC is already available in register Y, the offset X is gated onto the bus in step 4, and an addition operation is performed. The result, which is the branch target address, is loaded into the PC in step 5.

Execution of Branch Instructions
Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     Offset-field-of-IRout, Add, Zin
5     Zout, PCin, End
Figure 7.7. Control sequence for an unconditional branch instruction.

Multiple-Bus Organization: A three-bus structure is used to connect the registers and the ALU of a processor. All general-purpose registers are combined into a single block called the register file. In VLSI technology, the most efficient way to implement a number of registers is in the form of an array of memory cells similar to those used in the implementation of random-access memories (RAMs).

Multiple- Bus Organization There are two outputs, allowing the contents of two different registers to be accessed simultaneously and have their contents placed on buses A and B. The third port allows the data on bus C to be loaded into a third register during the same clock cycle.

Multiple-Bus Organization
● Buses A and B are used to transfer the source operands to the A and B inputs of the ALU, where an arithmetic or logic operation may be performed. The result is transferred to the destination over bus C.
● If needed, the ALU may simply pass one of its two input operands unmodified to bus C. We will call the ALU control signals for such an operation R=A or R=B.
● The three-bus arrangement obviates the need for registers Y and Z.

Multiple-Bus Organization: Figure 7.8. Three-bus organization of the datapath (buses A, B and C; register file; PC with incrementer; MUX selecting Constant 4; ALU with inputs A and B and result R; instruction decoder and IR; MDR on the memory-bus data lines; MAR on the address lines).

Multiple-Bus Organization: Add R4, R5, R6
Step  Action
1     PCout, R=B, MARin, Read, IncPC
2     WMFC
3     MDRoutB, R=B, IRin
4     R4outA, R5outB, SelectA, Add, R6in, End
Figure 7.9. Control sequence for the instruction Add R4,R5,R6 for the three-bus organization in Figure 7.8.

Hardwired Control

Hardwired Control: In the hardwired control organization, the control logic is implemented with gates, flip-flops, decoders, and other digital circuits. The following image shows the block diagram of a hardwired control organization.

Hardwired Control
● A hardwired control unit consists of two decoders, a sequence counter, and a number of logic gates.
● An instruction fetched from the memory unit is placed in the instruction register (IR).
● The instruction register is divided into three parts: the I bit, the operation code, and bits 0 through 11.
● The operation code in bits 12 through 14 is decoded with a 3 x 8 decoder.

● The outputs of the decoder are designated by the symbols D0 through D7.
● Bit 15 of the instruction is transferred to a flip-flop designated by the symbol I.
● Bits 0 through 11 are applied to the control logic gates.
● The sequence counter (SC) can count in binary from 0 through 15.

Hardwired Control To execute instructions, the processor must have some means of generating the control signals needed in the proper sequence. Two categories: hardwired control and microprogrammed control Hardwired system can operate at high speed; but with little flexibility.

Control Unit Organization: Figure 7.10. Control unit organization (a clock CLK drives the control step counter; the decoder/encoder combines the step counter, the IR, the condition codes and the external inputs to produce the control signals).

Microprogrammed Control

Microprogrammed Control: The microprogrammed control organization is implemented by using a programming approach, i.e. control signals are generated with the help of programs (software). In microprogrammed control, the micro-operations are performed by executing a program consisting of micro-instructions. It acts as middleware between hardware and software.

Microprogrammed Control: It is easy to design, test and implement, and it is also flexible to modify. The following image shows the block diagram of a microprogrammed control organization.

Microprogrammed Control
● The control memory address register specifies the address of the micro-instruction.
● The control memory is assumed to be a ROM, within which all control information is permanently stored.
● The control register holds the micro-instruction fetched from the memory.

Microprogrammed Control
● The micro-instruction contains a control word that specifies one or more micro-operations for the data processor.
● While the micro-operations are being executed, the next address is computed in the next-address generator circuit and then transferred into the control address register to read the next micro-instruction.
● The next-address generator is often referred to as a micro-program sequencer, as it determines the address sequence that is read from control memory.

Microprogrammed Control: A microprogrammed control unit involves control signals, control variables, control words, control memory, micro-instructions and micro-programs.

Microprogrammed Control
➢ Control signal: a group of bits used to select the path (for example, whether data goes to a multiplexer or to the ALU)
➢ Control variable: a binary variable that specifies a micro-operation
➢ Control word: a string of ones and zeros that represents the control variables
➢ Control memory: the memory that contains the control words
➢ Micro-instruction: specifies the execution of one or more micro-operations
➢ Micro-program: a sequence of micro-instructions
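The relationship between control memory, the control address register, the control register and the next-address generator can be sketched in Python. This is a deliberately reduced model under stated assumptions: each micro-instruction carries an explicit `next` field instead of a real sequencer circuit, the three control words reuse the fetch-phase signals from the control sequences earlier in this unit, and all names (`CONTROL_MEMORY`, `run_microprogram`) are hypothetical.

```python
# Control memory (modelled as a read-only tuple): each entry is one
# micro-instruction holding a control word and the next address.
CONTROL_MEMORY = (
    {"signals": ("PCout", "MARin", "Read", "Select4", "Add", "Zin"), "next": 1},
    {"signals": ("Zout", "PCin", "Yin", "WMFC"), "next": 2},
    {"signals": ("MDRout", "IRin"), "next": None},   # end of this routine
)


def run_microprogram(control_memory, start: int = 0) -> list:
    """Step through a micro-program and collect the issued control words."""
    car = start                      # control address register
    issued = []
    while car is not None:
        control_register = control_memory[car]   # fetch micro-instruction
        issued.append(control_register["signals"])
        car = control_register["next"]            # next-address generator
    return issued


for step, signals in enumerate(run_microprogram(CONTROL_MEMORY), start=1):
    print(step, ", ".join(signals))
```

Changing the behaviour of the control unit here means editing the contents of `CONTROL_MEMORY`, not rewiring logic, which is the flexibility the slides attribute to microprogrammed control.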

● The previous organization cannot handle the situation when the control unit is required to check the status of the condition codes or external inputs to choose between alternative courses of action.
● Use a conditional branch microinstruction.
Address  Microinstruction
0        PCout, MARin, Read, Select4, Add, Zin
1        Zout, PCin, Yin, WMFC
2        MDRout, IRin
3        Branch to starting address of appropriate microroutine
...      ...
25       If N=0, then branch to microinstruction 0
26       Offset-field-of-IRout, SelectY, Add, Zin
27       Zout, PCin, End
Figure 7.17. Microroutine for the instruction Branch<0.

Pipelining

Overview
● Pipelining is a technique where multiple instructions are overlapped during execution.
● It allows storing and executing instructions in an orderly process. It is also known as pipeline processing.
● A pipeline is divided into stages, and these stages are connected with one another to form a pipe-like structure. Instructions enter from one end and exit from the other end.

Overview
● Pipelining increases the overall instruction throughput.
● In a pipelined system, each segment consists of an input register followed by a combinational circuit.
● The register is used to hold data and the combinational circuit performs operations on it. The output of the combinational circuit is applied to the input register of the next segment.

Overview Pipelining is widely used in modern processors. Pipelining improves system performance in terms of throughput. Pipelined organization requires sophisticated compilation techniques.

Basic Concepts

Making the Execution of Programs Faster Use faster circuit technology to build the processor and the main memory. Arrange the hardware so that more than one operation can be performed at the same time. In the latter way, the number of operations performed per second is increased even though the elapsed time needed to perform any one operation is not changed.

Traditional Pipeline Concept: Laundry Example
● Ann, Brian, Cathy and Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold.
● The washer takes 30 minutes.
● The dryer takes 40 minutes.
● The "folder" takes 20 minutes.

Traditional Pipeline Concept: Sequential laundry takes 6 hours for 4 loads (each load takes 30 + 40 + 20 minutes in turn, from 6 PM to midnight). If they learned pipelining, how long would laundry take?

Traditional Pipeline Concept: Pipelined laundry takes 3.5 hours for 4 loads (30 minutes for the first wash, then four 40-minute dryer slots, then 20 minutes for the last fold, starting at 6 PM).
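The 6-hour and 3.5-hour figures can be reproduced with a small Python simulation. It assumes that each machine handles one load at a time and that a finished load can wait between machines; the names `STAGE_MINUTES`, `sequential_minutes` and `pipelined_minutes` are made up for this sketch.

```python
STAGE_MINUTES = (30, 40, 20)   # washer, dryer, folder


def sequential_minutes(loads: int, stages=STAGE_MINUTES) -> int:
    """Each load finishes all stages before the next load starts."""
    return loads * sum(stages)


def pipelined_minutes(loads: int, stages=STAGE_MINUTES) -> int:
    """A load enters a stage as soon as both it and the stage are free."""
    stage_free_at = [0] * len(stages)
    finish = 0
    for _ in range(loads):
        t = 0                                   # time this load is ready
        for index, duration in enumerate(stages):
            start = max(t, stage_free_at[index])
            t = start + duration
            stage_free_at[index] = t
        finish = t
    return finish


print(sequential_minutes(4) / 60, "hours sequential")
print(pipelined_minutes(4) / 60, "hours pipelined")
```

The 40-minute dryer is the slowest stage, so once the pipeline is full one load completes every 40 minutes regardless of how fast the washer and folder are.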

Traditional Pipeline Concept
● Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload.
● The pipeline rate is limited by the slowest pipeline stage.
● Multiple tasks operate simultaneously using different resources.
● Potential speedup = number of pipe stages.
● Unbalanced lengths of pipe stages reduce speedup.
● Time to "fill" the pipeline and time to "drain" it reduce speedup.
● Stall for dependences.

Use the Idea of Pipelining in a Computer (Fetch + Execution): Figure 8.1. Basic idea of instruction pipelining: (a) sequential execution, where F1, E1, F2, E2, F3, E3 follow one another; (b) hardware organization, with an instruction fetch unit and an execution unit separated by interstage buffer B1; (c) pipelined execution, where the fetch of each instruction overlaps the execution of the previous one, so I1, I2 and I3 complete in clock cycles 1 to 4.

Use the Idea of Pipelining in a Computer (Fetch + Decode + Execution + Write): Figure 8.2. A 4-stage pipeline: (a) instruction execution divided into four steps, F: fetch instruction, D: decode instruction and fetch operands, E: execute operation, W: write results, with instructions I1 to I4 completing in clock cycles 1 to 7; (b) hardware organization with interstage buffers B1, B2 and B3.
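The timing of the 4-stage pipeline in Figure 8.2 follows a simple rule when there are no stalls: instruction Ii occupies stage number s (counting F as 0) in clock cycle i + s. The Python sketch below builds that schedule; the names `PIPELINE_STAGES`, `pipeline_schedule` and `total_cycles` are assumptions for this example, and the model ignores hazards entirely.

```python
PIPELINE_STAGES = "FDEW"   # Fetch, Decode, Execute, Write


def pipeline_schedule(num_instructions: int) -> dict:
    """Map instruction number -> {stage: clock cycle} for an ideal pipeline."""
    return {
        i: {stage: i + s for s, stage in enumerate(PIPELINE_STAGES)}
        for i in range(1, num_instructions + 1)
    }


def total_cycles(num_instructions: int, num_stages: int = len(PIPELINE_STAGES)) -> int:
    """Cycles to finish all instructions: fill the pipe, then one per cycle."""
    return num_instructions + num_stages - 1


schedule = pipeline_schedule(4)
for i, cycles in schedule.items():
    print(f"I{i}:", cycles)
print("total cycles:", total_cycles(4))
```

Four instructions finish in 7 cycles instead of the 16 cycles that four unpipelined 4-step instructions would take, which is why the speedup approaches the number of stages only for long instruction streams.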

Role of Cache Memory
● Each pipeline stage is expected to complete in one clock cycle.
● The clock period should be long enough to let the slowest pipeline stage complete. Faster stages can only wait for the slowest one to complete.
● Since main memory is very slow compared to execution, the pipeline would be almost useless if each instruction had to be fetched from main memory. Fortunately, we have cache.

Pipeline Performance
The potential increase in performance resulting from pipelining is proportional to the number of pipeline stages. However, this increase would be achieved only if all pipeline stages required the same time to complete and there were no interruption throughout program execution. In practice, neither condition holds.
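The ideal case can be made concrete with a small calculation. The sketch below assumes k equal-length stages of one cycle each and no stalls, so n instructions take n × k cycles without pipelining and k + n − 1 cycles with it; the function name is illustrative.

```python
def ideal_speedup(n_instructions, k_stages):
    """Speedup of an ideal k-stage pipeline over unpipelined execution."""
    unpipelined_cycles = n_instructions * k_stages
    pipelined_cycles = k_stages + n_instructions - 1  # fill once, then 1 per cycle
    return unpipelined_cycles / pipelined_cycles

for n in (1, 4, 100, 1_000_000):
    print(n, round(ideal_speedup(n, 4), 3))
```

The speedup approaches the number of stages only for long instruction streams; for a single instruction there is no gain at all, because fill and drain time dominate.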

Pipeline Performance
● The previous pipeline is said to have been stalled for two clock cycles.
● Any condition that causes a pipeline to stall is called a hazard.
● Data hazard: any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. So some operation has to be delayed, and the pipeline stalls.
● Instruction (control) hazard: a delay in the availability of an instruction causes the pipeline to stall.
● Structural hazard: the situation when two instructions require the use of a given hardware resource at the same time.

Pipeline Performance
Again, pipelining does not result in individual instructions being executed faster; rather, it is the throughput that increases. Throughput is measured by the rate at which instruction execution is completed. A pipeline stall causes degradation in pipeline performance. We need to identify all hazards that may cause the pipeline to stall and find ways to minimize their impact.

Hazards

Hazards
Hazards are problems in the instruction pipeline of a CPU microarchitecture that arise when the next instruction cannot execute in the following clock cycle; if not handled, they can lead to incorrect computation results. The three common types of hazards are data hazards, structural hazards, and control hazards (branching hazards).

Data Hazards
A data hazard occurs when an instruction depends on the result of a previous instruction and that result has not yet been computed. Whenever two different instructions use the same storage location, the result must appear as if the instructions were executed in sequential order. There are four types of data dependencies: Read after Write (RAW), Write after Read (WAR), Write after Write (WAW), and Read after Read (RAR). These are explained below.

Data Hazards
● Read after Write (RAW): Also known as true dependency or flow dependency. It occurs when the value produced by an instruction is required by a subsequent instruction. For example, ADD R1, --, --; SUB --, R1, --;
● Write after Read (WAR): Also known as anti-dependency. It occurs when an instruction writes a register that a previous instruction reads. For example, ADD --, R1, --; SUB R1, --, --;

Data Hazards
● Write after Write (WAW): Also known as output dependency. It occurs when an instruction writes a register that a previous instruction also writes. For example, ADD R1, --, --; SUB R1, --, --;
● Read after Read (RAR): It occurs when both instructions read from the same register. It does not cause a hazard, because neither instruction changes the register. For example, ADD --, R1, --; SUB --, R1, --;
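The four dependency types reduce to set intersections between the registers each instruction reads and writes. The sketch below is illustrative only: the Instr record and classify function are hypothetical names, and each instruction is described directly by its destination and source register sets rather than parsed from assembly text.

```python
from typing import NamedTuple, Set, List

class Instr(NamedTuple):
    name: str
    dest: Set[str]  # registers written
    srcs: Set[str]  # registers read

def classify(first: Instr, second: Instr) -> List[str]:
    """List the dependencies of `second` on `first` (first comes earlier)."""
    kinds = []
    if first.dest & second.srcs:
        kinds.append("RAW")  # second reads what first writes
    if first.srcs & second.dest:
        kinds.append("WAR")  # second writes what first reads
    if first.dest & second.dest:
        kinds.append("WAW")  # both write the same register
    if first.srcs & second.srcs:
        kinds.append("RAR")  # both only read: no hazard
    return kinds

# The four examples from the slides, with "--" operands left out.
print(classify(Instr("ADD", {"R1"}, set()), Instr("SUB", set(), {"R1"})))  # ['RAW']
print(classify(Instr("ADD", set(), {"R1"}), Instr("SUB", {"R1"}, set())))  # ['WAR']
print(classify(Instr("ADD", {"R1"}, set()), Instr("SUB", {"R1"}, set())))  # ['WAW']
print(classify(Instr("ADD", set(), {"R1"}), Instr("SUB", set(), {"R1"})))  # ['RAR']
```

A pair of instructions can carry more than one dependency at once, which is why the function returns a list.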

Data Hazards
● Handling Data Hazards: The methods used to handle data hazards are forwarding, code reordering, and stall insertion. These are explained below.
● Forwarding: It adds special circuitry to the pipeline that routes a result directly to the stage that needs it. This method works because it takes less time for the required values to travel through a wire than it does for a pipeline segment to compute its result.

Data Hazards
● Code reordering: We need a special type of software to reorder code. We call this type of software a hardware-dependent compiler.
● Stall insertion: It inserts one or more stalls (no-op instructions) into the pipeline, which delays the execution of the current instruction until the required operand is written to the register file. This method decreases pipeline efficiency and throughput.

Data Hazards
● We must ensure that the results obtained when instructions are executed in a pipelined processor are identical to those obtained when the same instructions are executed sequentially.
● Hazard occurs: A ← 3 + A, then B ← 4 × A
● No hazard: A ← 5 × C, then B ← 20 + C
● When two operations depend on each other, they must be executed sequentially in the correct order.
● Another example: Mul R2, R3, R4 followed by Add R5, R4, R6

Operand Forwarding
Instead of reading from the register file, the second instruction can get its data directly from the output of the ALU once the previous instruction has completed its execute step. A special arrangement needs to be made to "forward" the output of the ALU to the input of the ALU.

Handling Data Hazards in Software
Let the compiler detect and handle the hazard:
I1: Mul R2, R3, R4
    NOP
    NOP
I2: Add R5, R4, R6
The compiler can reorder the instructions to perform some useful work during the NOP slots.
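The compiler's NOP insertion can be sketched as a single pass over the program. The code below is an illustration under stated assumptions, not a real compiler pass: insert_nops is a hypothetical name, only read-after-write dependencies are handled, the last operand is taken to be the destination (so Add reads the R4 that Mul writes), and the gap of two slots is simply the number of NOPs shown on the slide.

```python
def insert_nops(program, min_gap=2):
    """Pad a program with NOPs so each reader trails its writer by min_gap slots.

    program: list of (text, dest_register_or_None, set_of_source_registers).
    Returns the list of instruction texts, including inserted "NOP" entries.
    """
    out = []          # emitted instruction texts
    last_write = {}   # register -> index in `out` of its most recent writer
    for text, dest, srcs in program:
        earliest = len(out)  # next free slot
        for reg in srcs:
            if reg in last_write:
                # The reader must leave min_gap slots after the writer.
                earliest = max(earliest, last_write[reg] + min_gap + 1)
        out.extend(["NOP"] * (earliest - len(out)))
        out.append(text)
        if dest is not None:
            last_write[dest] = len(out) - 1
    return out

program = [
    ("Mul R2, R3, R4", "R4", {"R2", "R3"}),
    ("Add R5, R4, R6", "R6", {"R5", "R4"}),
]
for line in insert_nops(program):
    print(line)
```

A reordering compiler would first try to move independent instructions into those slots and fall back to NOPs only when none are available.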

Side Effects
● The previous example is explicit and easily detected.
● Sometimes an instruction changes the contents of a register other than the one named as the destination.
● When a location other than one explicitly named in an instruction as a destination operand is affected, the instruction is said to have a side effect.
● Example: condition code flags. In the sequence Add R1, R3 followed by AddWithCarry R2, R4, the second instruction implicitly depends on the carry flag set by the first.
● Instructions designed for execution on pipelined hardware should have few side effects.

Instruction Hazards

Instruction Hazards
● The instruction fetch unit of the CPU is responsible for providing a stream of instructions to the execution unit. The instructions fetched by the fetch unit are in consecutive memory locations, and they are executed in that order.
● A problem arises when one of the instructions is a branch to some other memory location. All the instructions already fetched into the pipeline from consecutive memory locations are then invalid and need to be removed (also called flushing the pipeline). This induces a stall until new instructions are fetched from the memory address specified in the branch instruction.

Instruction Hazards
The effect of branch instructions and the techniques that can be used for mitigating their impact are discussed for unconditional branches and conditional branches.

Unconditional Branches
Figure 8.8. An idle cycle caused by a branch instruction. The chart plots instructions I1, I2 (Branch), I3, Ik, and Ik+1 against clock cycles 1 to 6, each with a fetch step (F) and an execute step (E). I3 is fetched after the branch and then discarded (X), which leaves the execution unit idle for one cycle before Ik, the branch target, is executed.

Instruction Queue and Prefetching
Figure 8.10. Use of an instruction queue in the hardware organization of Figure 8.2b. Units shown: F: Fetch instruction (instruction fetch unit), instruction queue, D: Dispatch/Decode unit, E: Execute instruction, W: Write results.

Conditional Branches
A conditional branch instruction introduces the added hazard caused by the dependency of the branch condition on the result of a preceding instruction. The decision to branch cannot be made until the execution of that instruction has been completed. Branch instructions represent about 20% of the dynamic instruction count of most programs.
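The 20% figure gives a rough sense of what branches can cost. The sketch below uses a deliberately simple model that is an assumption of this example, not a claim from the slides: every branch pays a fixed penalty in cycles, and every other instruction completes in one cycle.

```python
def average_cpi(branch_fraction, penalty_cycles, base_cpi=1.0):
    """Average cycles per instruction when each branch adds a fixed penalty."""
    return base_cpi + branch_fraction * penalty_cycles

def throughput_loss(branch_fraction, penalty_cycles):
    """Fraction of the ideal throughput lost to branch penalties."""
    return 1.0 - 1.0 / average_cpi(branch_fraction, penalty_cycles)

# 20% branches, one lost cycle per branch (as in the two-stage example).
print(average_cpi(0.20, 1))                 # 1.2
print(round(throughput_loss(0.20, 1), 3))   # 0.167
```

Under this model a one-cycle penalty on a fifth of all instructions costs about a sixth of the ideal throughput, which is why the following techniques try to hide or avoid the penalty.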

Conditional Branches
A conditional branch is an instruction that directs the computer to another part of the program based on the result of a comparison. The branch is taken only when some condition holds, like an if statement in C. Transfer of control depends on the outcome of this condition.

Delayed Branch
The instructions in the delay slots are always fetched. Therefore, we would like to arrange for them to be fully executed whether or not the branch is taken. The objective is to place useful instructions in these slots. The effectiveness of the delayed branch approach depends on how often it is possible to reorder instructions.

Delayed Branch
(a) Original program loop
LOOP    Shift_left    R1
        Decrement     R2
        Branch=0      LOOP
NEXT    Add           R1,R3
(b) Reordered instructions
LOOP    Decrement     R2
        Branch=0      LOOP
        Shift_left    R1
NEXT    Add           R1,R3
Figure 8.12. Reordering of instructions for a delayed branch.

Branch Prediction
● The aim is to predict whether or not a particular branch will be taken.
● Simplest form: assume the branch will not take place and continue to fetch instructions in sequential address order.
● Until the branch is evaluated, instruction execution along the predicted path must be done on a speculative basis.
● Speculative execution: instructions are executed before the processor is certain that they are in the correct execution sequence.
● Care is needed so that no processor registers or memory locations are updated until it is confirmed that these instructions should indeed be executed.

Branch Prediction Better performance can be achieved if we arrange for some branch instructions to be predicted as taken and others as not taken. Use hardware to observe whether the target address is lower or higher than that of the branch instruction. Let compiler include a branch prediction bit. So far the branch prediction decision is always the same every time a given instruction is executed – static branch prediction.
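The two static schemes mentioned here can be sketched as one decision function. This is an illustration only: predict_taken and hint_bit are hypothetical names, and the rule that a backward branch (target address lower than the branch address) is predicted taken is the usual loop heuristic, assumed here rather than stated on the slide.

```python
def predict_taken(branch_address, target_address, hint_bit=None):
    """Static prediction: the same answer every time a given branch executes."""
    if hint_bit is not None:
        # Compiler-supplied prediction bit overrides the address heuristic.
        return bool(hint_bit)
    # Backward branches usually close loops, so predict them as taken;
    # forward branches are predicted as not taken.
    return target_address < branch_address

print(predict_taken(0x120, 0x100))              # True  (backward branch)
print(predict_taken(0x100, 0x140))              # False (forward branch)
print(predict_taken(0x100, 0x140, hint_bit=1))  # True  (compiler hint)
```

Because the function depends only on fixed properties of the instruction, it cannot adapt to run-time behavior.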

Parallel processing
Parallel processing is a computing technique in which multiple streams of calculations or data processing tasks run at the same time on several central processing units (CPUs). It uses two or more processors simultaneously to handle different parts of a single task.

Parallel processing
Figure 8.19. A processor with two execution units. Units shown: F: Instruction fetch unit, instruction queue, dispatch unit, floating-point unit, integer unit, W: Write results.

Vector Processing
A vector processor or array processor is a central processing unit (CPU) that implements an instruction set whose instructions are designed to operate efficiently on large one-dimensional arrays of data called vectors. Vector processing handles many data elements at once: a single instruction operates on every element of a vector, or on elements in parallel, which avoids the overhead of a processing loop.
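The saving in loop overhead can be illustrated by counting loop iterations. The sketch below is a software model only: each slice of vector_length elements stands in for one vector instruction, and the value 64 is an arbitrary assumption, not a property of any particular machine.

```python
def scalar_add(a, b):
    """Add element by element; returns (result, loop iterations)."""
    result = []
    iterations = 0
    for x, y in zip(a, b):
        result.append(x + y)
        iterations += 1
    return result, iterations

def vector_add(a, b, vector_length=64):
    """Add vector_length elements per iteration; returns (result, iterations)."""
    result = []
    iterations = 0
    for start in range(0, len(a), vector_length):
        chunk_a = a[start:start + vector_length]
        chunk_b = b[start:start + vector_length]
        # One iteration models one vector instruction over the whole chunk.
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        iterations += 1
    return result, iterations

a = list(range(200))
b = list(range(200))
print(scalar_add(a, b)[1])  # 200 iterations
print(vector_add(a, b)[1])  # 4 iterations
```

Both functions produce the same sums; the difference is how many times the loop control (index update, test, branch) has to run.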

Vector processing
Figure 8.19. A processor with two execution units. Units shown: F: Instruction fetch unit, instruction queue, dispatch unit, floating-point unit, integer unit, W: Write results.
