Department of Computer Science and Engineering Welcome… Slide Sources: Patterson & Hennessy COD book website (copyright Morgan Kaufmann) adapted and supplemented
Syllabus Unit Titles: Unit I Basic Structure of a Computer System Unit II Arithmetic for Computers Unit III Processor and Control Unit Unit IV Parallelism Unit V Memory & I/O Systems
Syllabus – Unit I UNIT-I BASIC STRUCTURE OF A COMPUTER SYSTEM Functional Units – Basic operational concepts –– Instructions: Operations, Operands – Instruction representation – Instruction Types – MIPS addressing, Performance
Syllabus – Unit II UNIT-II ARITHMETIC FOR COMPUTERS Addition and Subtraction – Multiplication – Division – Floating Point Representation – Floating Point Addition and Subtraction.
Syllabus – Unit III UNIT-III PROCESSOR AND CONTROL UNIT A Basic MIPS implementation – Building a Datapath – Control Implementation Scheme – Pipelining – Pipelined datapath and control – Handling Data Hazards & Control Hazards.
Syllabus – Unit IV UNIT-IV PARALLELISM Introduction to Multicore processors and other shared memory multiprocessors – Flynn’s classification: SISD, MIMD, SIMD, SPMD and Vector – Hardware multithreading – GPU architecture.
Syllabus – Unit V UNIT-V MEMORY & I/O SYSTEMS Memory Hierarchy – memory technologies – Cache Memory – Performance Considerations , Virtual Memory,TLB’s – Accessing I/O devices – Interrupts – Direct Memory Access – Bus Structure – Bus operation.
Text Books Book 1: Name: Computer Organization and Design: The Hardware/Software Interface Authors: David A. Patterson and John L. Hennessy Publisher: Morgan Kaufmann / Elsevier Edition: Fifth Edition, 2014 Book 2: Name: Computer Organization and Embedded Systems Interface Authors: Carl Hamacher , Zvonko Vranesic , Safwat Zaky and Naraig Manjikian Publisher: Tata McGraw Hill Edition: Sixth Edition, 2012
Introduction What is mean by Computer Architecture? Hardware parts Instruction set Interface between hardware & software
Instruction Set Architecture (ISA) ISA: The interface or contact between the hardware and the software Rules about how to code and interpret machine instruc tions: Execution model (program counter) Operations (instructions) Data formats (sizes, addressing modes) Processor state (registers) Input and Output (memory, etc.)
Introduction What is meant by Computer Architecture? Computer architecture encompasses the specification of an instruction set and the functional behavior of the hardware units that implement the instructions.
Introduction
Technology Evolution
UNIT-I BASIC STRUCTURE OF A COMPUTER SYSTEM Topics: Functional Units Basic operational concepts Instructions: Operations, Operands Instruction representation Instruction Types MIPS addressing mode Performance
Functional Units Also called as Datapath
Functional Units
Functional Units Input unit Output unit Memory unit Arithmetic Logic unit Control unit
Functional Units Input unit
Functional Units Output unit
Functional Units Memory unit
Functional Units
Functional Units
Functional Units Arithmetic & Logic unit and Control unit
Basic Operational Concepts Unit I
Connection between the processor and the main memory Code Snippet: Load R2, LOC Add R4, R3, R2 Store LOC, R4
IR & PC Instruction Register: The instruction register (IR) holds the instruction that is currently being executed. Program Counter: The program counter (PC) contains the memory address of the next instruction to be fetched and executed.
Memory Locations and Addresses
Examples of encoded information in a 32-bit word.
Instructions
Steps in program translation ISA
Translations
Machine vs Assembly Language Machine Language Assembly Language A particular set of instructions that the CPU can directly execute – but these are ones and zeros Ex: 0100001010101 Assembly language is a symbolic version of the equivalent machine language Ex: add a,b
Mnemonics
Instructions Instruction Set: The vocabulary of commands understand by a given architecture. Some ISA: ARM Intel x86 IBM Power MIPS SPARC Different CPUs implement different set of instructions.
MIPS MIPS - M icroprocessor with I nterlocked P ipeline S tages Features: five-stage execution pipeline: fetch, decode, execute, memory-access, write-result regular instruction set, all instructions are 32-bit three-operand arithmetical and logical instructions 32 general-purpose registers of 32-bits each only the load and store instruction access memory flat address space of 4 GBytes of main memory (2^32 bytes)
MIPS Assembly Language Categories: Arithmetic – Only processor and registers involved (sum of two registers) Data transfer – Interacts with memory (load and store) Logical - Only processor and registers involved (and, sll ) Conditional branch – Change flow of execution (branch instructions) Unconditional Jump – Change flow of execution (jump to a subroutine)
MIPS Registers
Arithmetic
Data Transfer
Load & Store Instructions Load: Transfer data from memory to a register Store: Transfer a data from a register to memory Memory address must be specified by load and store Processor Memory STORE LOAD
Logical
Conditional
Unconditional Jump
MIPS Arithmetic All MIPS arithmetic instructions have 3 operands Operand order is fixed (e.g., destination first) Example : C code: A = B + C MIPS code: add $s0, $s1, $s2 compiler’s job to associate variables with registers
MIPS Arithmetic Design Principle 1 : simplicity favors regularity . Translation : Regular instructions make for simple hardware ! Simpler hardware reduces design time and manufacturing cost. Of course this complicates some things... C code: A = B + C + D; E = F - A; MIPS code add $t0, $s1, $s2 (arithmetic): add $s0, $t0, $s3 sub $s4, $s5, $s0 Performance penalty: high-level code translates to denser machine code. Allowing variable number of operands would simplify the assembly code but complicate the hardware.
MIPS Arithmetic a b c f g h i j $s0 $s1 $s2 $s3 $s4 $s5 $s6 $s7 a = b - c; f = ( g+h )–( i+j ); sub $s0, $s1, $s2 add $t0, $s4, $s5 add $t1, $s6, $s7 sub $s3, $t0, $t1 19 /67 Try: f = g + (h – 5) f = ( i + j) – (k – 20)
Registers vs. Memory Arithmetic instructions operands must be in registers MIPS has 32 registers Compiler associates variables with registers What about programs with lots of variables (arrays, etc.)? Use memory , load/store operations to transfer data from memory to register – if not enough registers spill registers to memory MIPS is a load/store architecture Processor I/O Control Datapath Memory Input Output
Memory Organization Viewed as a large single-dimension array with access by address A memory address is an index into the memory array Byte addressing means that the index points to a byte of memory, and that the unit of memory accessed by a load/store is a byte 1 2 3 4 5 6 ... 8 bits of data 8 bits of data 8 bits of data 8 bits of data 8 bits of data 8 bits of data 8 bits of data
Memory Organization Bytes are load/store units, but most data items use larger words For MIPS, a word is 32 bits or 4 bytes. 2 32 bytes with byte addresses from 0 to 2 32 -1 2 30 words with byte addresses 0, 4, 8, ... 2 32 -4 i.e., words are aligned what are the least 2 significant bits of a word address? 4 8 12 ... 32 bits of data 32 bits of data 32 bits of data 32 bits of data Registers correspondingly hold 32 bits of data
The Endian Question Big Endian 31 MIPS can also load and store 4-byte words and 2-byte halfwords. The endian question: when you read a word, in what order do the bytes ap pear? Little Endian: Intel, DEC, et al. Big Endian: Motorola, IBM, Sun, et al. MIPS can do either SPIM adopts its host’s c onvention byte byte 1 byte 2 byte 3 Little Endian 31 byte 3 byte 2 byte 1 byte 32 /67
The Endian Question x = 0x01234567
Load/Store Instructions Load and store instructions Example : C code: A[8] = h + A[8]; MIPS code (load): lw $t0, 32($s3) (arithmetic): add $t0, $s2, $t0 (store): sw $t0, 32($s3) Load word has destination first, store has destination last Remember MIPS arithmetic operands are registers, not memory locations therefore, words must first be moved from memory to registers using loads before they can be operated on; then result can be stored back to memory offset address value
So far we’ve learned: MIPS loading words but addressing bytes arithmetic on registers only Instruction Meaning add $s1, $s2, $s3 $s1 = $s2 + $s3 sub $s1, $s2, $s3 $s1 = $s2 – $s3 lw $s1, 100($s2) $s1 = Memory[$s2+100] sw $s1, 100($s2) Memory[$s2+100]= $s1 Try:Find the assembly code of B[8]=A[ i ]+A[j]; A and B available in $s6 and $s7 respectively $so-$s5 consists of the values f-j
Exercise Q: For the following C statement, what is the corresponding MIPS assembly code? Assume that the variables f, g, h, and i are given and could be considered 32-bit integers as declared in a C program. Use a minimal number of MIPS assembly instructions. f = g + (h − 5); Solution: f -> $s1, g -> $s2, h -> $s3 addi $t0, $s3,-5 add $s1, $s2, $t0
Representing Instructions in the Computer Instruction format: A form of representation of an instruction composed of fields of binary numbers. All MIPS instructions are 32 bit long. Three types of instruction formats: R-type (for register) or R-format I-type (for immediate) or I-format J-type (for jump) or J-format
R-type (for register) MIPS fields: op: Basic operation of the instruction ( opcode ) rs : The first register source operand rt : The second register source operand rd : The register destination operand shamt : Shift amount funt : Function. It selects the specific variant of the operation in the op filed. (function code) Ex: add $t0, $s1, $s2
I-type (for immediate) MIPS fields: op: Basic operation of the instruction ( opcode ) rs : The register source operand rt : destination register, which receives the result of the load constant or address: It contains 16 bit constant or address value.
Translating a MIPS Assembly Instruction into a Machine Instruction Given instruction: add $t0,$s1,$s2 Solution: Identify the type instruction format: R-type Format: Operation rd , rs , rt rs -> $s1, rt -> $s2, rd -> $t0, shamt – NA Op -> , funct -> Decimal representation: Binary representation: op rs rt rd shamt funct 17 18 8 32 op rs rt rd shamt funct 000000 10001 10010 01000 00000 100000
Exercise Q: Translate the following MIPS Assembly code into binary code. sub $t3,$s4,$s5 op rs rt rd Shamt Funct 20 21 11 34 000000 10100 10101 01011 00000 100010
Exercise Q: Translate the following MIPS Assembly code into binary code. sub $t3,$s4,$s5 000000 10100 10101 01011 00000 100010
Translating a MIPS Assembly Instruction into a Machine Instruction Given instruction: lw $t0,32($s3) Solution: Identify the type instruction format: I-type Format: Operation rt , addr .( rs ) rs -> $s3, rt -> $to, immediate -> 32 Decimal representation: Binary representation: op rs rt address 35 19 8 32 op rs rt 100011 10011 01000 0000 0000 0010 0000
Exercise Q: Translate the following MIPS Assembly code into binary code. sw $t2,58($s5) 101011 10101 01010 0000 0000 0011 1010
Translating High level Language into Machine Language Q: Consider the following high level statement A[300] = h + A[300]; If $t1 has the base of the array A and $s2 corresponds to h, What is the MIPS machine language code?
Logical Operations
Shift operations Shift allow bits to be moved around inside of a register. Shift left logical Example: sll $t2,$s0,4 # reg $t2 = reg $s0 << 4 bits Machine Code: op rs rt rd shamt funct 000000 00000 10000 01010 00100 000000
Shift Left Logical( sll ) Example: sll $t2,$s0,4 # reg $t2 = reg $s0 << 4 bits If $s0=10 Value of $t2=???
Shift Right Logical( srl ) Example: srl $t5,$s3,2 # reg $t5 = reg $s3 >> 2 bits If $s3=12 Value of $t5=???
Logical Operations – AND, OR & NOT A logical bit-by-bit operation with two operands. EX: and $t0,$t1,$t2 # reg $t0 = reg $t1 & reg $t2 or $t0,$t1,$t2 # reg $t0 = reg $t1 | reg $t2 nor $t0,$t1,$t3 # reg $t0 = ~ ( reg $t1 | reg $t3)
Example
Instructions for Making Decisions Sequences that allow programs to execute statements in order one after another. Branches that allow programs to jump to other points in a program. Loops that allow a program to execute a fragment of code multiple times. MIPS Instructions: beq register1, register2, L1 bne register1, register2, L1 beq and bne are mnemonics Conditional branches
Instructions for Making Decisions Q: In the following code segment, f , g , h , i , and j are variables. If the five variables f through j correspond to the five registers $s0 through $s4 , what is the compiled MIPS code for this C if statement? if (i == j) f = g + h; else f = g - h;
Instructions for Making Decisions Solution:
Instructions for Making Decisions High level code: if (i == j) f = g + h; else f = g - h; MIPS code: bne $s3,$s4,Else # go to Else if i ≠ j add $s0,$s1,$s2 # f = g + h (skipped if i ≠ j) j Exit # go to Exit Else: sub $s0,$s1,$s2 # f = g - h (skipped if i = j) Exit:
Compiling a while Loop in C while (save[i] == k) i += 1; Assume that i and k correspond to registers $s3 and $s5 and the base of the array save is in $s6 . What is the MIPS assembly code corresponding to this C segment?
Compiling a while Loop in C while (save[i] == k) i += 1; load save[i] into a temporary register add i to the base of array save to form the address performs the loop test go to Exit if save[i] ≠ k adds 1 to I back to the while test at the top of the loop Exit
while (save[i] == k) i += 1; Assume that i and k correspond to registers $s3 and $s5 and the base of the array save is in $s6 . What is the MIPS assembly code corresponding to this C segment? Solution: Loop: sll $t1,$s3,2 # Temp reg $t1 = i * 4 add $t1,$t1,$s6 # $t1 = address of save[i] lw $t0,0($t1) # Temp reg $t0 = save [i] bne $t0,$s5, Exit # go to Exit if save[i] ≠ k addi $s3,$s3,1 # i = i + 1 j Loop # go to Loop Exit:
MIPS Addressing Mode The different ways for specifying the locations of instruction operands are known as addressing mode. The MIPS addressing modes are the following: Immediate addressing mode Register addressing mode Base or displacement addressing mode PC-relative addressing mode Pseudodirect addressing mode
Immediate addressing mode Def : the operand is a constant within the instruction itself Ex: addi $s1, $s2, 20 #$s1=$s2+20 Ilustration :
Register addressing mode Def : source and destination operands are registers which are available in processor registers. Direct addressing mode Ex: add $s1, $s2, $s3 #$s1=$s2+$s3 Ilustration :
Base or displacement addressing mode Def : the operand is at the memory location whose address is the sum of a register and a constant in the instruction Indirect addressing mode Ex: lw $s1, 20 ($s3) #$s1= Memory[$s3+20] Ilustration :
PC-relative addressing mode Def : the branch address is the sum of the PC and a constant in the instruction Ex: bne $s4, $s5, 25 # if ($s4 != $s5), go to pc=12+4+100 Ilustration :
Pseudodirect addressing mode Def : the jump address is the 26 bits of the instruction concatenated with the upper bits of the PC Ex: j 1000 Ilustration :
Decoding Machine Code Q: What is the assembly language statement corresponding to this machine instruction? 00af8020hex Solution: converting hexadecimal to binary Binary instruction format Assembly instruction
Translating Machine Language to Assembly Language Translate the following machine language code into assembly language. 0x02F34022
Performance Performance is the key to understanding underlying motivation for the hardware and its organization Measure, report, and summarize performance to enable users to make intelligent choices see through the marketing hype! Why is some hardware better than others for different programs? What factors of system performance are hardware related? (e.g., do we need a new machine, or a new operating system?) How does the machine's instruction set affect performance?
Computer Performance: TIME, TIME, TIME!!! Response Time ( elapsed time , latency ): how long does it take for my job to run? how long does it take to execute (start to finish) my job? how long must I wait for the database query? Throughput : how many jobs can the machine run at once? what is the average execution rate? how much work is getting done? If we upgrade a machine with a new processor what do we increase? If we add a new machine to the lab what do we increase? Individual user concerns… Systems manager concerns…
Execution Time Elapsed Time counts everything ( disk and memory accesses, waiting for I/O, running other programs, etc. ) from start to finish a useful number, but often not good for comparison purposes elapsed time = CPU time + wait time (I/O, other programs, etc.) CPU time doesn't count waiting for I/O or time spent running other programs can be divided into user CPU time and system CPU time (OS calls) CPU time = user CPU time + system CPU time elapsed time = user CPU time + system CPU time + wait time Our focus: user CPU time ( CPU execution time or, simply, execution time ) time spent executing the lines of code that are in our program
Definition of Performance For some program running on machine X: Performance X = 1 / Execution time X If there are two machines X and Y if the performance of X is greater than performance of Y, Performance X > Performance Y ie ., 1 / Execution time X > 1 / Execution time Y X is n times faster than Y means: Performance X / Performance Y = n Performance X / Performance Y = Execution time Y / Execution time X = n
Q: If computer A runs a program in 10 sec and computer B runs the same program in 15 secs, how much faster is A than B We know that, Performance A / Performance B = Execution time B / Execution time A = n Thus the performance ratio is, Execution time B / Execution time A = 15 / 10 = 1.5 ie ., Performance A / Performance B = 1.5 Therfore Peformance of A 1.5 times faster than Performance of B
Clock Cycles Instead of reporting execution time in seconds, we often use cycles . In modern computers hardware events progress cycle by cycle: in other words, each event, e.g., multiplication, addition, etc., is a sequence of cycles Clock ticks indicate start and end of cycles: cycle time = time between ticks = seconds per cycle clock rate ( frequency ) = clock cycles per second (1 Hz. = 1 cycle/sec, 1 MHz. = 10 6 cycles/sec) Example : A 200 Mhz. clock has a cycle time of ???? time cycle tick tick
Performance Equation I So, to improve performance one can either: reduce the number of cycles for a program, or reduce the clock cycle time, or, equivalently, increase the clock rate CPU execution time CPU clock cycles Clock cycle time for a program for a program = equivalently Also, CPU execution time CPU clock cycles / Clock cycle rate for a program for a program
Our favorite program runs in 10 seconds on computer A, which has a 2 GHz clock. We are trying to help a computer designer build a computer, B, which will run this program in 6 seconds. The designer has determined that a substantial increase in the clock rate is possible, but this increase will affect the rest of the CPU design, causing computer B to require 1.2 times as many clock cycles as computer A for this program. What clock rate should we tell the designer to target? CPU time A = CPU Clock cycles A / clock rate A 10 sec = CPU Clock cycles A / 2*10 9 cycles/sec CPU Clock cycles A = 10 sec * 2*10 9 cycles/sec = 20 *10 9 cycles CPU time B = 1.2 * CPU Clock cycles A / clock rate B 6 secs = 1.2 * 20 *10 9 cycles / clock rate B clock rate B = 1.2 * 20 *10 9 cycles / 6 sec= 4 * 10 9 Hz To run the program in 6 secs, B must be 4 * 10 9 Hz
Instruction Performance No reference to no of instructions in previous equation The execution time depends on the number of instructions in the program Clock cycles per instruction (CPI) Average number of clock cycles per instruction for a program or program fragment
Suppose we have two implementations of the same instruction set architecture. Computer A has a clock cycle time of 250 ps and a CPI of 2.0 for some program, and computer B has a clock cycle time of 500 ps and a CPI of 1.2 for the same program. Which computer is faster for this program and by how much? Same number of instructions are instructions are executed
Instruction Performance CPU execution time = Instruction count * average CPI * Clock cycle time for a program for a program Or CPU execution time = Instruction count * average CPI / Clock rate for a program for a program
Instruction Performance
Which code sequence executes the most? Sequence 1 executes, 2 + 1 + 2 = 5 instructions Sequence 2 executes, 4+ 1 + 1 = 6 instructions Sequence 2 executes most no of instructions
Which will be faster? So code sequence 2 is faster
What is the CPI for each sequence? Sequence 2 has lower CPI as it takes fewer clock cycles but has more instructions
Basic components of Performance
Factors affecting Peformance
Operations and Operands Operands are definite elements of computer instruction that show what information is to be operated on. The most important general categories of data are Addresses Numbers Characters Logical data
Addresses Addresses are nothing but a form of data. Here some calculations must be performed on the operand reference in an instruction, which is to determine the physical address of an instruction.
Numbers All machine languages include numeric data types. Even in non-numeric data processing, numbers are needed to act as counters, field widths, etc. An important difference between numbers used in ordinary mathematics and numbers stored in a computer is that the latter is limited. Thus, the programmer is faced with understanding the consequences of rounding, overflow and underflow.
Integer or fixed point Fixed point representation is used to store integers, the positive and negative whole numbers (… -3, -2, -1, 0, 1, 2, 3, …). However, the programmer assigns a radix point location to each number and tracks the radix point through every operation. High-level programs, such as C and BASIC usually allocate 16 bits to store each integer.
Floating point A Floating Point number usually has a decimal point, which means 0, 3.14, 6.5, and -125.5 are Floating Point The term floating point is derived from the fact that there is no fixed number of digits before and after the decimal point, which means the decimal point can float. There are also representations in which the number of digits before and after the decimal point is set, called fixed-point representations. In general, floating-point representations are slower and less accurate than fixed-point representations, but they can handle a larger range of numbers.
Decimal number The decimals are an extension of our number system. We also know that decimals can be considered fractions with 10, 100, 1000, etc. The numbers expressed in the decimal form are called decimal numbersor decimals. For example:1, 4.09, 13.83, etc. A decimal number has two parts, and a dot separates these parts (.) called the decimal point . Whole number part: The digits lying to the left of the decimal point form the whole number part. The places begin with ones, tens, hundreds, thousands and so on. Decimal part: The decimal point and the digits laying on the right of the decimal point form the decimal part. The places begin with tenths, hundredths, thousandths and so on.
Characters A common form of data is text or character strings. While textual data are most convenient for humans. But computers work in binary. So, all characters, whether letters, punctuation or digits, are stored as binary numbers. All of the characters that a computer can use are called character set s. Here are the two common standards, such as: American Standard Code for Information Interchange (ASCII) Unicode ASCII uses seven bits, giving a character set of 128 characters. The characters are represented in a table called the ASCII table. The 128 characters include: 32 control codes (mainly to do with printing) 32 punctuation codes, symbols, and space 26 upper-case letters 26 lower-case letters numeric digits 0-9