UNIT IV Computer architecture Analysis.pptx


About This Presentation

CA


Slide Content

UNIT IV MEMORY AND I/O INTERFACING
Memory Technologies – Basics of Caches – Measuring and Improving Cache Performance – Virtual Memory – Translation Lookaside Buffer (TLB) – Memory Hierarchy – Input/output system, programmed I/O, DMA and interrupts, I/O processors.

MEMORY TECHNOLOGIES
The four primary technologies used in memory hierarchies are:
Static Random Access Memory (SRAM)
Dynamic Random Access Memory (DRAM)
ROM and Flash Memory
Magnetic Disk

Static Random Access Memory (SRAM)
SRAMs are simply integrated circuits: memory arrays with a single access port that provides reads and writes. SRAM is used for the memory levels located very close to the processor, such as caches. SRAMs don't need to be refreshed, so the access time is very close to the cycle time. They use 6 to 8 transistors per bit to prevent the information from being disturbed when read. SRAM needs only minimal power to retain the charge in standby mode.

Dynamic Random Access Memory (DRAM)
In a dynamic RAM (DRAM), the value kept in a cell is stored as a charge in a capacitor. A single transistor is used to access this stored charge, either to read the value or to overwrite the charge stored there. Because DRAMs use only one transistor per bit of storage, they are much denser and cheaper per bit than SRAM. As DRAMs store the charge on a capacitor, it cannot be kept indefinitely and must periodically be refreshed. That is why this memory structure is called dynamic, in contrast to the static storage in an SRAM cell.

[Figure: DRAM internal organization]

Dynamic Random Access Memory (DRAM)
Modern DRAMs are organized in banks, typically four for DDR3. Each bank consists of a series of rows. Sending a PRE (precharge) command opens or closes a bank. A row address is sent with an ACT (activate), which causes the row to transfer to a buffer. When the row is in the buffer, it can be transferred by successive column addresses at whatever the width of the DRAM is (typically 4, 8, or 16 bits in DDR3), or by specifying a block transfer and the starting address. Each command, as well as block transfers, is synchronized with a clock. To improve the interface to processors further, DRAMs added a clock; these are called synchronous DRAMs, or SDRAMs. The fastest version is called Double Data Rate (DDR) SDRAM.
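As a rough illustration of the bank/row/column organization described above, the following Python sketch splits an address into column, bank, and row fields. The field widths (10 column bits, 3 bank bits, the rest row bits) are assumed for the example, not taken from any particular DRAM part.

# Illustrative only: decompose an address into DRAM fields.
# Field widths below are assumptions for the example.
COL_BITS, BANK_BITS = 10, 3

def split_dram_address(addr):
    col  = addr & ((1 << COL_BITS) - 1)                  # low bits: column within the open row
    bank = (addr >> COL_BITS) & ((1 << BANK_BITS) - 1)   # middle bits: bank select
    row  = addr >> (COL_BITS + BANK_BITS)                # high bits: row to activate
    return bank, row, col

bank, row, col = split_dram_address(0x12345678)
print(f"bank={bank} row={row} col={col}")
# A new row in the same bank needs PRE then ACT; accesses to the
# already-open row can issue column transfers directly.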

Flash Memory
Flash memory is a type of Electrically Erasable Programmable Read-Only Memory (EEPROM). Unlike disks and DRAM, but like other EEPROM technologies, writes can wear out flash memory bits. To cope with such limits, most flash products include a controller to spread the writes by remapping blocks that have been written many times to less trodden blocks. This technique is called wear leveling. With wear leveling, personal mobile devices are very unlikely to exceed the write limits in the flash. Such wear leveling lowers the potential performance of flash, but it is needed unless higher-level software monitors block wear. Flash controllers that perform wear leveling can also improve yield by mapping out memory cells that were manufactured incorrectly.
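To make wear leveling concrete, here is a minimal sketch assuming a controller that simply remaps each logical write to the least-written free physical block. Real flash translation layers are far more sophisticated; this is an illustrative model only.

# Illustrative wear-leveling model (not a real flash controller):
# every write to a logical block is redirected to the least-worn
# physical block not currently mapped to another logical block.
class FlashTranslationLayer:
    def __init__(self, n_blocks):
        self.writes = [0] * n_blocks     # per-physical-block write counts
        self.mapping = {}                # logical block -> physical block

    def write(self, logical_block):
        in_use = set(self.mapping.values()) - {self.mapping.get(logical_block)}
        free = [p for p in range(len(self.writes)) if p not in in_use]
        target = min(free, key=lambda p: self.writes[p])   # least-worn block
        self.mapping[logical_block] = target
        self.writes[target] += 1

ftl = FlashTranslationLayer(8)
for _ in range(100):
    ftl.write(0)                         # hammer a single logical block
print(ftl.writes)                        # the 100 writes end up spread across all 8 blocks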

Disk Memory
A magnetic hard disk consists of a collection of platters, which rotate on a spindle at 5400 to 15,000 revolutions per minute. The metal platters are covered with magnetic recording material on both sides, similar to the material found on a cassette or videotape. To read and write information on a hard disk, a movable arm containing a small electromagnetic coil called a read-write head is located just above each surface.

Disk Memory
Each disk surface is divided into concentric circles, called tracks. There are typically tens of thousands of tracks per surface. Each track is in turn divided into sectors that contain the information; each track may have thousands of sectors. Sectors are typically 512 to 4096 bytes in size. The term cylinder is used to refer to all the tracks under the heads at a given point on all surfaces.

Disk Memory
To access data, the operating system must direct the disk through a three-stage process:
Position the head over the proper track. This operation is called a seek, and the time to move the head to the desired track is called the seek time.
Once the head has reached the correct track, wait for the desired sector to rotate under the read/write head. This time is called the rotational latency or rotational delay.
The last component of a disk access, the transfer time, is the time to transfer a block of bits.
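Putting the three stages together, a worked example in Python; the disk parameters below (7200 RPM, 5 ms average seek, 100 MB/s transfer, 4096-byte sector) are assumed for illustration only.

# Average access time = seek time + rotational latency + transfer time.
rpm           = 7200       # assumed spindle speed
avg_seek_ms   = 5.0        # assumed average seek time
transfer_mb_s = 100.0      # assumed sustained transfer rate
sector_bytes  = 4096

avg_rotation_ms = 0.5 * (60_000 / rpm)                     # half a revolution on average: ~4.17 ms
transfer_ms = sector_bytes / (transfer_mb_s * 1e6) * 1e3   # ~0.04 ms

total_ms = avg_seek_ms + avg_rotation_ms + transfer_ms
print(f"average access time = {total_ms:.2f} ms")          # about 9.21 ms

Note that the mechanical components (seek and rotation) dominate; the actual data transfer is a tiny fraction of the total.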

Magnetic Disk vs. Semiconductor Memory Technology

Magnetic Disk Memory Technology    Semiconductor Memory Technology
Non-volatile                       Volatile
Slower access time                 Faster access time

DRAM vs. SRAM Memory Technology

DRAM                                             SRAM
A single transistor accesses the stored charge   Uses 6 to 8 transistors per bit
Much denser and cheaper per bit than SRAM        Higher cost per bit
Needs periodic refresh                           Does not need to be refreshed
The value cannot be kept indefinitely            The value can be kept indefinitely

CACHE
Cache memory is a special very high-speed memory that acts as a buffer between RAM and the CPU. Cache memory is costlier than main memory or disk memory but more economical than CPU registers. It holds frequently requested data and instructions so that they are immediately available to the CPU when needed.

[Figure: Memory hierarchy]

CACHE BASICS
Cache was the name chosen to represent the level of the memory hierarchy between the processor and main memory. When the CPU finds a requested data item in the cache, it is called a cache hit. When the CPU does not find a data item it needs in the cache, it is called a cache miss. A cache miss is handled by hardware and causes an in-order processor to stall until the data are available. The time required for the cache miss depends on both the latency and bandwidth of the memory. Latency determines the time to retrieve the first word of the block. Bandwidth determines the time to retrieve the rest of the block.

Cache Performance
When the processor needs to read or write a location in main memory, it first checks for a corresponding entry in the cache. If the processor finds that the memory location is in the cache, a cache hit has occurred and the data is read from the cache. If the processor does not find the memory location in the cache, a cache miss has occurred. For a cache miss, the cache allocates a new entry and copies in data from main memory; then the request is fulfilled from the contents of the cache.

Cache Performance
The performance of cache memory is frequently measured in terms of a quantity called the hit ratio:
Hit ratio = hits / (hits + misses) = number of hits / total accesses
We can improve cache performance by using a larger cache block size and higher associativity, and by reducing the miss rate, the miss penalty, and the time to hit in the cache.
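A short sketch of these formulas in use. The access counts and latencies below are assumed example values; average memory access time (AMAT = hit time + miss rate × miss penalty) is the standard way to combine them.

# Hit ratio and average memory access time from assumed example values.
hits, misses = 960, 40
hit_time     = 1      # cycles to hit in the cache (assumed)
miss_penalty = 100    # cycles to fetch the block from memory (assumed)

hit_ratio = hits / (hits + misses)        # hit / (hit + miss)
amat = hit_time + (1 - hit_ratio) * miss_penalty

print(f"hit ratio = {hit_ratio:.2%}")     # 96.00%
print(f"AMAT      = {amat:.1f} cycles")   # 1 + 0.04 * 100 = 5.0 cycles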

[Figure: Cache memory]

Basics of Cache
Cache: a safe place for hiding or storing things.
Direct-mapped cache: each memory location is mapped to exactly one location in the cache.
Mapping rule: (block address) modulo (number of cache blocks in the cache)
[Figure: main memory addresses 0b00000 through 0b10001 mapping onto the 8 cache blocks 0b000 through 0b111]
Observation: with 8 cache blocks, only the 3 least significant bits of the address determine the cache block.
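A minimal sketch of this mapping rule, assuming the 8-block cache of the figure:

# Direct-mapped index: (block address) modulo (number of cache blocks).
# With 8 blocks this is just the 3 least significant bits of the address.
NUM_BLOCKS = 8

def direct_mapped_index(block_address):
    return block_address % NUM_BLOCKS    # equivalent to block_address & 0b111

for addr in (0b00101, 0b01101, 0b10001):
    print(f"memory block {addr:#07b} -> cache block {direct_mapped_index(addr):03b}")
# 0b00101 and 0b01101 share the same low bits, so they contend
# for the same cache block (101).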


Direct Mapping
In direct mapping, a particular block of main memory can map only to a particular line of the cache. The line number of the cache to which a particular block can map is given by:
Cache line number = (Main Memory Block Address) modulo (Number of lines in Cache)

Direct Mapping
Tag: a field in a table used for a memory hierarchy that contains the address information required to identify whether the associated block in the hierarchy corresponds to a requested word.

[Figure: Direct-mapped cache organization]

Fully Associative Mapping
In fully associative mapping, a block of main memory can map to any line of the cache that is freely available at that moment. This makes fully associative mapping more flexible than direct mapping.

[Figures: Fully associative mapping]

Need for a Replacement Algorithm
In fully associative mapping, a replacement algorithm is required. The replacement algorithm selects the block to be replaced when all the cache lines are occupied. Thus, a replacement algorithm such as FCFS or LRU is employed.
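A minimal LRU sketch for a fully associative cache follows. This is an illustrative model (a Python OrderedDict standing in for the hardware), not any particular implementation.

from collections import OrderedDict

class FullyAssociativeCache:
    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()            # block -> data, least recent first

    def access(self, block):
        if block in self.lines:               # hit: mark as most recently used
            self.lines.move_to_end(block)
            return "hit"
        if len(self.lines) == self.num_lines: # all lines occupied:
            self.lines.popitem(last=False)    # evict the least recently used
        self.lines[block] = None
        return "miss"

cache = FullyAssociativeCache(2)
for block in (1, 2, 1, 3, 2):                 # 3 evicts block 2; 2 then misses again
    print(block, cache.access(block))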

K-way Set Associative Mapping
In k-way set associative mapping, cache lines are grouped into sets where each set contains k lines. A particular block of main memory can map to only one particular set of the cache. However, within that set, the memory block can map to any cache line that is freely available. The set of the cache to which a particular block of the main memory can map is given by:
Cache set number = (Main Memory Block Address) modulo (Number of sets in Cache)

[Figures: 2-way set associative mapping]

Here, k = 2 means that each set contains two cache lines. Since the cache contains 6 lines, the number of sets in the cache = 6 / 2 = 3 sets. Block j of main memory can map only to set number (j mod 3) of the cache. Within that set, block j can map to any cache line that is freely available at that moment. If all the cache lines are occupied, then one of the existing blocks will have to be replaced.
Need for a Replacement Algorithm
Set associative mapping is a combination of direct mapping and fully associative mapping. It uses fully associative mapping within each set. Thus, set associative mapping requires a replacement algorithm.
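The set-number rule for this example, as a quick sketch:

# 6 cache lines, k = 2 lines per set -> 6 / 2 = 3 sets.
NUM_LINES, K = 6, 2
NUM_SETS = NUM_LINES // K

for j in range(8):                            # a few main-memory blocks
    print(f"main memory block {j} -> set {j % NUM_SETS}")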

Write Through and Write Back
Whenever a processor wants to write a word, it checks to see whether the address it wants to write the data to is present in the cache. If the address is present in the cache (a write hit), we can update the value in the cache and avoid an expensive main memory access, but this can result in the inconsistent data problem. Two popular cache write policies are:
Write Through
Write Back

[Figure: Write through]

Write Through
In write through, data is simultaneously updated in the cache and in memory. This process is simpler and more reliable. It is used when there are no frequent writes to the cache (the number of write operations is low). It helps in data recovery (in case of a power outage or system failure). A data write will experience latency (delay), as we have to write to two locations (both memory and cache). It solves the inconsistency problem.

[Figure: Write back]

Write Back (contd.)
The data is updated only in the cache and written to memory at a later time. Data is updated in memory only when the cache line is ready to be replaced (cache line replacement is done using algorithms such as Belady's optimal algorithm, Least Recently Used (LRU), FIFO, or LIFO, depending on the application).
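A toy model contrasting the two policies; the Line class and the dictionary standing in for main memory are illustrative, not a real cache design.

# Write-through updates memory on every write; write-back sets a
# dirty bit and writes memory only when the line is evicted.
memory = {}

class Line:
    def __init__(self, block, value):
        self.block, self.value, self.dirty = block, value, False

def write_through(line, value):
    line.value = value
    memory[line.block] = value        # memory updated immediately

def write_back(line, value):
    line.value = value
    line.dirty = True                 # memory update deferred

def evict(line):
    if line.dirty:                    # write-back pays its cost here
        memory[line.block] = line.value
        line.dirty = False

line = Line(block=7, value=0)
write_back(line, 42)
print(memory.get(7))                  # None: memory is stale until eviction
evict(line)
print(memory.get(7))                  # 42: flushed when the line is replaced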

Measuring and Improving Cache Performance
Two different techniques for improving cache performance:
Reducing the miss rate by reducing the probability that two distinct memory blocks will contend for the same cache location.
Reducing the miss penalty by adding an additional level to the hierarchy. This technique is called multilevel caching.

In direct-mapped placement, there is only one cache block where memory block 12 can be found, and that block is given by (12 modulo 8) = 4. In a two-way set-associative cache, there would be four sets, and memory block 12 must be in set (12 mod 4) = 0; the memory block could be in either element of the set. In a fully associative placement, the memory block for block address 12 can appear in any of the eight cache blocks.

Example: Assume the miss rate of an instruction cache is 2% and the miss rate of the data cache is 4%. If a processor has a CPI of 2 without any memory stalls, and the miss penalty is 100 cycles for all misses, determine how much faster a processor would run with a perfect cache that never missed. Assume the frequency of all loads and stores is 36%.
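Working the example through (stall cycles per instruction = miss rate × miss penalty, weighted by how often each cache is accessed):

base_cpi       = 2.0
miss_penalty   = 100
i_miss_rate    = 0.02    # instruction cache
d_miss_rate    = 0.04    # data cache
mem_refs_per_i = 0.36    # frequency of loads and stores

i_stalls = i_miss_rate * miss_penalty                    # 2.00 cycles/instruction
d_stalls = mem_refs_per_i * d_miss_rate * miss_penalty   # 1.44 cycles/instruction

cpi_with_stalls = base_cpi + i_stalls + d_stalls         # 2 + 2.00 + 1.44 = 5.44
print(f"speedup with a perfect cache = {cpi_with_stalls / base_cpi:.2f}")  # 2.72x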

MEMORY HIERARCHY
A memory hierarchy consists of multiple levels of memory with different speeds and sizes. The faster memories are more expensive per bit than the slower memories and thus are smaller. The principle of locality states that programs access a relatively small portion of their address space at any instant of time. There are two different types of locality:
■ Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon. If you recently brought a book to your desk to look at, you will probably need to look at it again soon.
■ Spatial locality (locality in space): if an item is referenced, items whose addresses are close by will tend to be referenced soon. Libraries put books on the same topic together on the same shelves to increase spatial locality.
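A small code illustration of both kinds of locality:

a = list(range(1000))

total = 0                      # reused on every iteration: temporal locality
for i in range(len(a)):        # a[0], a[1], ... neighboring addresses: spatial locality
    total += a[i]
print(total)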

BASIC STRUCTURE OF MEMORY HIERARCHY
[Figures: Basic structure of the memory hierarchy]

The faster memory is close to the processor, and the slower, less expensive memory is below it. The goal is to present the user with as much memory as is available in the cheapest technology, while providing access at the speed offered by the fastest memory. A level closer to the processor is generally a subset of any level further away, and all the data is stored at the lowest level. A memory hierarchy can consist of multiple levels, but data is copied between only two adjacent levels at a time. The upper level—the one closer to the processor—is smaller and faster than the lower level, since the upper level uses technology that is more expensive. The minimum unit of information that can be either present or not present in the two-level hierarchy is called a block or a line. If the data requested by the processor appears in some block in the upper level, this is called a hit. If the data is not found in the upper level, the request is called a miss.

The hit rate, or hit ratio, is the fraction of memory accesses found in the upper level; it is often used as a measure of the performance of the memory hierarchy. The miss rate (1 − hit rate) is the fraction of memory accesses not found in the upper level. Hit time is the time to access the upper level of the memory hierarchy, which includes the time needed to determine whether the access is a hit or a miss. The miss penalty is the time to replace a block in the upper level with the corresponding block from the lower level, plus the time to deliver this block to the processor. Because the upper level is smaller and built using faster memory parts, the hit time will be much smaller than the time to access the next level in the hierarchy, which is the major component of the miss penalty. Main memory is extended with a higher-speed, smaller cache. The cache is not usually visible to the programmer or to the processor. The cache is a device for staging the movement of data between main memory and processor registers to improve performance. A portion of memory can be used as a buffer to hold data temporarily that is to be read out to disk. This technique is referred to as disk cache.

VIRTUAL MEMORY
Virtual memory is a technique that uses main memory as a "cache" for secondary storage. There are two major motivations for virtual memory:
to allow efficient and safe sharing of memory among multiple programs;
to remove the programming burdens of a small, limited amount of main memory.
To allow multiple virtual machines to share the same memory, we must be able to protect the virtual machines from each other, ensuring that a program can only read and write the portions of main memory that have been assigned to it. Main memory need contain only the active portions of the many virtual machines, just as a cache contains only the active portion of one program.

The virtual machines sharing the memory change dynamically while the virtual machines are running. Because of this dynamic interaction, we would like to compile each program into its own address space—a separate range of memory locations accessible only to this program. Virtual memory implements the translation of a program's address space to physical addresses. This translation process enforces protection of a program's address space from other virtual machines. The second motivation for virtual memory is to allow a single user program to exceed the size of primary memory. Formerly, if a program became too large for memory, it was up to the programmer to make it fit.

A virtual memory block is called a page, and a virtual memory miss is called a page fault. With virtual memory, the processor produces a virtual address, which is translated by a combination of hardware and software to a physical address, which in turn can be used to access main memory. Figure 5.25 shows the virtually addressed memory with pages mapped to main memory. This process is called address mapping or address translation. Today, the two memory hierarchy levels controlled by virtual memory are usually DRAMs and flash memory in personal mobile devices, and DRAMs and magnetic disks in servers. Virtual memory also simplifies loading the program for execution by providing relocation. Relocation maps the virtual addresses used by a program to different physical addresses before the addresses are used to access memory. This relocation allows us to load the program anywhere in main memory. In virtual memory, the address is broken into a virtual page number and a page offset.


Figure 5.26 shows the translation of the virtual page number to a physical page number. The physical page number constitutes the upper portion of the physical address, while the page offset, which is not changed, constitutes the lower portion. The number of bits in the page offset field determines the page size. The number of pages addressable with the virtual address need not match the number of pages addressable with the physical address. Having a larger number of virtual pages than physical pages is the basis for the illusion of an essentially unbounded amount of virtual memory.
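A sketch of this translation, assuming a 4 KiB page (12 offset bits) and a toy page table whose mappings are made-up values:

PAGE_OFFSET_BITS = 12                             # page size = 4096 bytes
page_table = {0x1: 0xA3, 0x2: 0x11}               # virtual page -> physical page (toy values)

def translate(virtual_address):
    vpn    = virtual_address >> PAGE_OFFSET_BITS
    offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)
    if vpn not in page_table:
        raise LookupError("page fault")           # would be handled by the OS
    return (page_table[vpn] << PAGE_OFFSET_BITS) | offset   # offset passes through unchanged

print(hex(translate(0x1234)))                     # VPN 0x1 -> 0xa3234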


[Figure: Virtual memory]

TRANSLATION LOOKASIDE BUFFER (TLB)
Modern processors include a special cache that keeps track of recently used translations. This special address translation cache is traditionally referred to as a translation-lookaside buffer (TLB): a cache that keeps track of recently used address mappings to try to avoid an access to the page table. Figure 5.29 shows that each tag entry in the TLB holds a portion of the virtual page number, and each data entry of the TLB holds a physical page number.
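A minimal TLB lookup sketch consistent with this description; the dictionary sizes and contents are illustrative only.

tlb = {}                                   # virtual page -> physical page, small and fast
page_table = {1: 0xA3, 2: 0x11, 3: 0x7F}   # full mapping, kept in memory (toy values)

def lookup(vpn):
    if vpn in tlb:                         # TLB hit: page table not touched
        return tlb[vpn], "TLB hit"
    ppn = page_table[vpn]                  # TLB miss: consult the page table
    tlb[vpn] = ppn                         # install the translation for next time
    return ppn, "TLB miss"

for vpn in (1, 2, 1):
    print(vpn, lookup(vpn))                # the repeated access to page 1 hits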
