Cache Memory


About This Presentation

A presentation on Cache Memory prepared for the Computer Architecture class, B.Tech CSE, 2nd Year, 4th Semester.


Slide Content

CACHE MEMORY ONLINE PRESENTATION BY SUBID BISWAS MAIL: [email protected] CS 401 - Computer Architecture & Organisation

GROUP MEMBERS
A: Introduction, Working and Levels of Cache Memory
B: Cache Structure and Organization, Conclusion
C: Performance of Cache Memory
D: Cache Coherence
E: Synchronization Mechanism
F: Mapping Techniques of Cache Memory
G: Paging Techniques in Cache Memory
H: Replacement Algorithm
I: Cache Write Policies
J: Cache Memory Hierarchy

WHAT IS CACHE MEMORY ?

What is Cache Memory? Cache memory is a small, high-speed RAM buffer located between the CPU and main memory. It holds a copy of the instructions and data currently being used by the CPU. The main purpose of a cache is to accelerate the computer while keeping its price low. Cache is volatile: it stores and retains data only while the computer is powered on. The cache holds copies of the data from frequently used main memory locations. Cache memory is faster than RAM, and because it is located closer to the CPU, it can fetch and start processing instructions and data much more quickly. Note: a buffer contains data that is stored for a short amount of time, typically in the computer's memory (RAM); its purpose is to hold data right before it is used.

Why is Cache Needed? Cache memory is required to balance the speed mismatch between the main memory and the CPU. The processor clock is very fast, while main memory access time is comparatively slow, so without a cache the overall processing speed is limited by the speed of the main memory. How does it differ from RAM? Cache memory is a type of super-fast RAM designed to make a computer or device run more efficiently. By itself this may not be particularly useful, but cache memory plays a key role in computing when used with the other parts of the memory system.
ADVANTAGES & DISADVANTAGES
Advantages of cache memory: It is faster than main memory. It creates a path for fast data transfers, so it consumes less access time compared to main memory. It stores frequently accessed data and instructions, which can therefore be fetched within a short period of time.
Disadvantages of cache memory: It has limited capacity. It is very expensive compared to main memory (RAM) and hard disk.

LEVELS OF CACHE MEMORY

LEVELS OF CACHE MEMORY
Levels of memory:
Level 1, Registers: memory locations inside the CPU that hold the data it is working on immediately. Commonly used registers include the accumulator, program counter, and address registers.
Level 2, Cache memory: the fastest memory, with the shortest access time, where data is temporarily stored for faster access.
Level 3, Main memory: the memory on which the computer currently works. It is small in size, and once power is off, the data no longer stays in this memory.
Level 4, Secondary memory: external memory that is not as fast as main memory, but in which data stays permanently.

WORKING OF CACHE MEMORY

HOW DOES IT WORK? The CPU (Central Processing Unit) is the brain of the computer; it performs the arithmetic and logical operations of the system by carrying out the instructions in the code. MEMORY ORGANIZATION: at the core of the system is the CPU, surrounded in turn by the cache, RAM, and the storage device. [Figure: memory organization of a system]

CACHE STRUCTURE AND ORGANIZATION

CACHE STRUCTURE AND ORGANIZATION
Cache Structure: Cache row entries usually have the following structure:
Tag | Data Block | Flag Bits
The data block (cache line) contains the actual data fetched from the main memory. The tag contains (part of) the address of that data. An instruction cache requires only one flag bit per cache row entry: a valid bit.
An effective memory address, which goes along with the cache line (memory block), is split into three fields:
Tag | Index | Block Offset
The index identifies the cache set into which the data has been placed. The block offset specifies the desired data within the stored data block in the cache row.
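To make the field split concrete, here is a minimal sketch of decomposing a byte address into tag, index, and block offset. The cache parameters are illustrative assumptions, not values from the slides.

```python
# Minimal sketch: splitting a byte address into tag, index, and block
# offset. BLOCK_SIZE and NUM_SETS are illustrative assumptions.

BLOCK_SIZE = 64        # bytes per cache line
NUM_SETS   = 256       # number of sets (= lines, for a direct-mapped cache)

OFFSET_BITS = BLOCK_SIZE.bit_length() - 1   # log2(64)  = 6
INDEX_BITS  = NUM_SETS.bit_length() - 1     # log2(256) = 8

def split_address(addr: int):
    """Return (tag, index, offset) for a byte address."""
    offset = addr & (BLOCK_SIZE - 1)
    index  = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag    = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(split_address(0x12345678))   # -> (tag, index, offset)
```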

Cache Organization: Cache organization is about mapping data in memory to a location in the cache. One way to perform this mapping is to take the last few bits of the long memory address as the small cache address and place the block at that location. The problem with this approach is that we lose the information in the high-order bits and have no way to tell which higher-order address bits a given cached block belongs to. CACHE STRUCTURE AND ORGANIZATION

To handle the above problem, more information is stored in the cache to record which block of memory is stored in each cache line. This additional information is stored as the Tag. CACHE STRUCTURE AND ORGANIZATION What is a Cache Block? Programs exhibit spatial locality (once a location is retrieved, it is highly probable that nearby locations will be retrieved in the near future), so a cache is organized in the form of blocks. Typical cache block sizes are 32 bytes or 64 bytes.

PERFORMANCE OF CACHE MEMORY

Issues of Cache Memory Performance: The performance of cache memory mainly concerns two aspects: the cycle count and the hit ratio. The cycle count: the cache speed is affected by the underlying static or dynamic RAM technology, the cache organization, and the cache hit ratio. The hit ratio: it is affected by the cache size and by the block size. Effect of block size: with a fixed cache size, cache performance is rather sensitive to block size. Effect of set number: for a fixed cache capacity, the hit ratio may increase as the number of sets increases. PERFORMANCE OF CACHE MEMORY
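A standard way to see how the hit ratio and cycle counts combine is the average memory access time, AMAT = hit time + miss rate × miss penalty. A quick sketch with made-up example timings:

```python
# Illustration of how the hit ratio drives performance, via the
# standard average-memory-access-time (AMAT) formula. The timings
# below are made-up example values, not measurements.

hit_time     = 1    # cycles to access the cache
miss_penalty = 100  # cycles to fetch the block from main memory

for hit_ratio in (0.90, 0.95, 0.99):
    amat = hit_time + (1 - hit_ratio) * miss_penalty
    print(f"hit ratio {hit_ratio:.2f} -> AMAT {amat:.1f} cycles")
# hit ratio 0.90 -> AMAT 11.0 cycles
# hit ratio 0.95 -> AMAT 6.0 cycles
# hit ratio 0.99 -> AMAT 2.0 cycles
```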

Techniques to Improve Cache Memory Performance:
Technique 1: Larger block size.
Technique 2: Larger cache, to reduce the miss rate.
Technique 3: Higher associativity, to reduce the miss rate.
Technique 4: Multi-level caches. Caches should be fast to keep pace with the speed of the processor, and large to overcome the widening gap between the processor and main memory.
Technique 5: Prioritize read misses, to reduce the miss penalty.
Technique 6: Avoid address translation when indexing, to reduce hit time.
PERFORMANCE OF CACHE MEMORY

CACHE COHERENCE

Cache Coherence: Cache coherence is the uniformity of shared resource data that ends up stored in multiple local caches. CACHE COHERENCE The cache and the main memory may hold inconsistent copies of the same object. In a multiprocessor system, data inconsistency may occur among adjacent levels or within the same level of the memory hierarchy.

There are three types of coherence mechanisms: Directory-based: the data being shared is placed in a common directory that maintains the coherence between caches. Snooping: the individual caches monitor address lines for accesses to memory locations that they have cached. Snarfing: a cache controller watches both address and data in an attempt to update its own copy of a memory location when a second master modifies a location in main memory. CACHE COHERENCE

CACHE COHERENCE There are three distinct levels of cache coherence:
1. Every write operation appears to occur instantaneously.
2. All processors see exactly the same sequence of changes of values for each separate operand.
3. Different processors may see an operation and assume different sequences of values; this is known as non-coherent behavior.
There are various cache coherence protocols for multiprocessor systems:
MSI protocol (Modified, Shared, Invalid)
MOSI protocol (Modified, Owned, Shared, Invalid)
MESI protocol (Modified, Exclusive, Shared, Invalid)
MOESI protocol (Modified, Owned, Exclusive, Shared, Invalid)
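As a toy illustration of the simplest of these, here is a sketch of MSI state transitions for a single cache line as seen by one cache. Bus signalling and data transfer are omitted; this is a deliberate simplification, not a full protocol implementation.

```python
# Toy sketch of MSI state transitions for one cache line, as seen by
# one cache. Only the state machine is tracked; bus transactions and
# the actual data movement are omitted.

MODIFIED, SHARED, INVALID = "M", "S", "I"

def on_processor(state, op):
    """Local processor read or write."""
    if op == "read":
        return SHARED if state == INVALID else state   # I -> S on a read miss
    if op == "write":
        return MODIFIED                                # any state -> M
    raise ValueError(op)

def on_bus(state, op):
    """Another processor's read or write observed (snooped) on the bus."""
    if op == "read":
        return SHARED if state == MODIFIED else state  # M -> S (data flushed)
    if op == "write":
        return INVALID                                 # remote write invalidates us
    raise ValueError(op)

state = INVALID
state = on_processor(state, "read")   # I -> S
state = on_processor(state, "write")  # S -> M
state = on_bus(state, "read")         # M -> S, line written back
print(state)                          # S
```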

SYNCHRONIZATION MECHANISM

Hardware Synchronization Mechanisms: Synchronization is a special form of communication in which, instead of data, control information is exchanged between communicating processes residing in the same or different processors. Multiprocessor systems use hardware mechanisms to implement low-level synchronization operations. Most multiprocessors have hardware support for atomic operations such as memory read, write, or read-modify-write operations, which are used to implement synchronization primitives. Besides atomic memory operations, some inter-processor interrupts are also used for synchronization purposes. SYNCHRONIZATION MECHANISM
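As one sketch of such a primitive, the classic test-and-set spinlock below shows how an atomic read-modify-write operation yields mutual exclusion. The class and names are illustrative; a threading.Lock merely stands in for the atomicity real hardware provides in a single instruction.

```python
import threading

# Conceptual sketch of a spinlock built on an atomic test-and-set
# (read-modify-write) primitive. On real hardware this is one atomic
# instruction; here threading.Lock only simulates that indivisibility.

class TestAndSetLock:
    def __init__(self):
        self._flag = False
        self._atomic = threading.Lock()   # simulates hardware atomicity

    def _test_and_set(self) -> bool:
        with self._atomic:                # one indivisible step in hardware
            old, self._flag = self._flag, True
            return old

    def acquire(self):
        while self._test_and_set():       # spin until the old value was False
            pass

    def release(self):
        self._flag = False

lock = TestAndSetLock()
lock.acquire()
# ... critical section ...
lock.release()
```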

CACHE MAPPING TECHNIQUES

What is Cache Mapping? Cache mapping is the technique by which the contents of main memory are brought into the cache memory. Cache mapping is performed using the following three techniques: Direct Mapping, Fully Associative Mapping, and K-way Set Associative Mapping. CACHE MAPPING TECHNIQUES [Figure: diagram illustrating the mapping process]

Direct Mapping Technique: In direct mapping, each memory block is assigned to one specific line in the cache. If that line is already occupied by a memory block when a new block needs to be loaded, the old block is evicted. The address is split into a tag field and a line-number (index) field; the cache stores the tag alongside the data, and the index selects the cache line. Direct mapping's performance is directly proportional to the hit ratio. DIRECT MAPPING TECHNIQUE In direct mapping, the physical address is divided as:
Tag | Line Number | Block Offset
The line number of the cache to which a particular block can map is given by:
cache line no. = (main memory block address) mod (number of lines in cache)
[Example of direct mapping]
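A minimal sketch of that modulo rule, with an assumed cache size, showing why distinct blocks collide on the same line:

```python
# Sketch of the direct-mapping rule from the slide:
#   cache line no. = (main memory block address) mod (no. of lines in cache)
# The cache size is an illustrative assumption.

NUM_LINES = 128   # lines in the cache

def cache_line(block_address: int) -> int:
    return block_address % NUM_LINES

# Blocks 5, 133, and 261 all map to line 5, so they evict one
# another -- the source of conflict misses in a direct-mapped cache.
for block in (5, 133, 261):
    print(block, "->", cache_line(block))
```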

Fully Associative Mapping Technique: In this type of mapping, an associative memory is used to store both the content and the address of each memory word. Any block can go into any line of the cache: the word bits identify which word within the block is needed, and the tag consists of all the remaining address bits. This enables the placement of any block at any place in the cache memory, and it is considered the fastest and most flexible mapping form. FULLY ASSOCIATIVE MAPPING In fully associative mapping, the physical address is divided as:
Tag (Block Number) | Block/Line Offset
[Example of fully associative mapping]

Set Associative Mapping Technique: Set-associative mapping addresses the problem of possible thrashing in the direct mapping method. Instead of having exactly one line that a block can map to in the cache, a few lines are grouped together to create a set, and a block in memory can map to any one of the lines of a specific set. Thus two or more blocks of main memory that share the same index can be present in the cache at the same time. Set-associative cache mapping combines the best of the direct and associative cache mapping techniques. SET ASSOCIATIVE MAPPING In set-associative mapping, the physical address is divided as:
Tag | Set Number | Block/Line Offset
The set of the cache to which a particular block of main memory can map is given by:
cache set no. = (main memory block address) mod (number of sets in cache)
[Example of 2-way set-associative mapping]
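A compact sketch of a 2-way set-associative lookup built on that rule; the sizes, and the FIFO eviction within a set, are assumptions chosen for brevity:

```python
# Sketch of a 2-way set-associative lookup combining the slide's rule
#   cache set no. = (main memory block address) mod (no. of sets)
# with a tag check inside the chosen set. Sizes are illustrative.

NUM_SETS = 64
WAYS     = 2

# cache[set_index] holds up to WAYS tags currently resident in that set
cache = [[] for _ in range(NUM_SETS)]

def access(block_address: int) -> bool:
    """Return True on hit, False on miss (inserting the block)."""
    set_index = block_address % NUM_SETS
    tag = block_address // NUM_SETS
    ways = cache[set_index]
    if tag in ways:
        return True                      # hit: tag found in the set
    if len(ways) == WAYS:
        ways.pop(0)                      # evict the oldest way (FIFO, for brevity)
    ways.append(tag)
    return False

# Blocks 7 and 71 share set 7 yet coexist, unlike in a direct-mapped cache.
print(access(7), access(71), access(7))  # False False True
```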

PAGING TECHNIQUES IN CACHE MEMORY

What is the Paging Technique? Paging is a memory management scheme that eliminates the need for contiguous allocation of physical memory; it permits the physical address space of a process to be non-contiguous. The mapping from virtual to physical addresses is done by the memory management unit (MMU), which is a hardware device, and this mapping is known as the paging technique. The physical address space is conceptually divided into a number of fixed-size blocks called frames. The logical address space is likewise split into fixed-size blocks called pages, with page size = frame size. Consider an example: physical address = 12 bits, so physical address space = 4 K words; logical address = 13 bits, so logical address space = 8 K words; page size = frame size = 1 K words (assumption). PAGING TECHNIQUES

ADDRESS SPACE IN PAGING TECHNIQUES The address generated by the CPU is divided into: Page number (p), the number of bits required to represent the pages in the logical address space; and Page offset (d), the number of bits required to represent a particular word within a page (the word number within a page). The physical address is divided into: Frame number (f), the number of bits required to represent the frames in the physical address space; and Frame offset (d), the number of bits required to represent a particular word within a frame. If the main memory access time is m and the page table is kept in main memory, then effective access time = m (to read the page table entry) + m (to access the word in the frame) = 2m. The address split is sketched below.
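A sketch of that split using the 13-bit logical address and 1 K-word pages from the earlier example (so p = 3 bits and d = 10 bits):

```python
# Sketch of the logical-address split described above, using the
# slide's example numbers: 13-bit logical address, 1 K-word (2^10)
# pages, hence a 3-bit page number and a 10-bit offset.

PAGE_SIZE   = 1 << 10          # 1 K words per page
OFFSET_BITS = 10

def split_logical(addr: int):
    page_number = addr >> OFFSET_BITS       # p: top 3 bits
    page_offset = addr & (PAGE_SIZE - 1)    # d: bottom 10 bits
    return page_number, page_offset

print(split_logical(0b1010000000101))   # -> (5, 5): page 5, offset 5
```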

TLB IN PAGING TECHNIQUES The hardware implementation of the page table can be done using dedicated registers, but registers are satisfactory only if the page table is small. If the page table contains a large number of entries, we can instead use a TLB (Translation Look-aside Buffer): a special, small, fast hardware lookup cache. The TLB is an associative, high-speed memory. Each entry in the TLB consists of two parts, a tag and a value; when this memory is used, an item is compared with all tags simultaneously, and if the item is found, the corresponding value is returned.
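With a TLB, the standard effective-access-time formula becomes EAT = h(t + m) + (1 - h)(t + 2m), where h is the TLB hit ratio, t the TLB lookup time, and m the memory access time (a TLB miss costs one extra memory access to read the page table). A quick sketch with assumed timings:

```python
# Worked example of effective access time (EAT) with a TLB.
# The timing values are assumptions for illustration.

t = 20    # ns, TLB lookup
m = 100   # ns, one main-memory access

for h in (0.80, 0.98):
    eat = h * (t + m) + (1 - h) * (t + 2 * m)
    print(f"TLB hit ratio {h:.2f} -> EAT {eat:.0f} ns")
# 0.80 -> 140 ns, 0.98 -> 122 ns; without a TLB it would be 2m = 200 ns
```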

PAGE TABLE IN PAGING TECHNIQUES A page table consists of page table entries (PTEs), where each entry stores a frame number and optional status bits (such as protection bits). Many of the status bits are used by the virtual memory system, but the most important field in a PTE is the frame number. Caching in the page table: the caching enabled/disabled bit. Sometimes we need fresh data. Say the user is typing input at the keyboard and the program must act on that input: the input arrives in main memory, so main memory holds the latest information the user typed. If that page were served from the cache, the cache could still show the old information. So whenever freshness is required, we do not want to go through the cache or many levels of the memory hierarchy: the information in the level closest to the CPU and the information in the level closest to the user might differ, and we want them to be consistent, so that whatever the user enters, the CPU sees it as soon as possible. That is why caching can be turned off; this bit enables or disables caching of the page.
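To make the bit discussion concrete, here is a hypothetical PTE layout and decoder. The bit positions are assumptions made for illustration; every real architecture defines its own layout.

```python
# Hypothetical page-table-entry (PTE) layout -- an assumption for
# illustration only, not any particular architecture's format.

PTE_VALID   = 1 << 0   # entry maps a frame
PTE_DIRTY   = 1 << 1   # page has been written
PTE_NOCACHE = 1 << 2   # caching disabled for this page
FRAME_SHIFT = 12       # low 12 bits hold flags; frame number sits above

def decode_pte(pte: int):
    return {
        "frame":    pte >> FRAME_SHIFT,
        "valid":    bool(pte & PTE_VALID),
        "dirty":    bool(pte & PTE_DIRTY),
        "no_cache": bool(pte & PTE_NOCACHE),
    }

print(decode_pte((42 << FRAME_SHIFT) | PTE_VALID | PTE_NOCACHE))
# {'frame': 42, 'valid': True, 'dirty': False, 'no_cache': True}
```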

ADVANTAGES & DISADVANTAGES
Advantages of paging: The paging technique is easy to implement. It makes efficient use of memory. It supports time-sharing systems. It supports non-contiguous memory allocation.
Disadvantages of paging: Paging may encounter a problem called page break. When the number of pages in virtual memory is quite large, maintaining the page table becomes a burden.

REPLACEMENT ALGORITHM

What is a Replacement Algorithm? In computing, cache algorithms (also frequently called cache replacement algorithms or cache replacement policies) are optimizing instructions, or algorithms, that a computer program or a hardware-maintained structure can use to manage a cache of information stored on the computer. Caching improves performance by keeping recent or often-used data items in memory locations that are faster, or computationally cheaper to access, than normal memory stores. When the cache is full, the algorithm must choose which items to discard to make room for the new ones.
There are two primary figures of merit of a cache, the latency and the hit rate, along with a number of secondary factors affecting cache performance. The "hit ratio" of a cache describes how often a searched-for item is actually found in the cache; more efficient replacement policies keep track of more usage information in order to improve the hit rate for a given cache size. The "latency" of a cache describes how long after requesting a desired item the cache can return that item (on a hit); faster replacement strategies typically track less usage information, or, in the case of a direct-mapped cache, none, to reduce the time required to update that information. Each replacement strategy is therefore a compromise between hit rate and latency. REPLACEMENT ALGORITHM

Different Replacement Algorithms: Bélády's algorithm: the most efficient caching algorithm would be to always discard the information that will not be needed for the longest time in the future. This optimal result is referred to as Bélády's optimal algorithm, the optimal replacement policy, or the clairvoyant algorithm. Since it is generally impossible to predict how far in the future information will be needed, this is not implementable in practice; the practical minimum can be calculated only after experimentation, and the effectiveness of the actually chosen cache algorithm can then be compared against it. First in, first out (FIFO): using this algorithm, the cache behaves like a FIFO queue: it evicts blocks in the order they were added, without any regard to how often or how many times they were accessed before. DIFFERENT REPLACEMENT ALGORITHMS

Different Replacement Algorithms: Least-frequently used (LFU): counts how often an item is needed; those used least often are discarded first. This works very similarly to LRU, except that instead of storing how recently a block was accessed, we store how many times it was accessed, and we replace the block that was used the fewest times. E.g., if A was accessed 5 times, B 3 times, and C and D 10 times each, we replace B. Random replacement (RR): randomly selects a candidate item and discards it to make space when necessary. This algorithm requires no information about the access history; for its simplicity it has been used in ARM processors, and it admits efficient stochastic simulation. Least recently used (LRU): discards the least recently used items first. This algorithm requires keeping track of what was used when, which is expensive if one wants to ensure it always discards the least recently used item. General implementations keep "age bits" for cache lines and track the least recently used line based on them; in such an implementation, every time a cache line is used, the ages of all the other cache lines change. An LRU sketch appears below. DIFFERENT REPLACEMENT ALGORITHMS
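A minimal LRU sketch using an ordered dictionary as the age structure; this is a software illustration of the policy, not how hardware age bits are implemented:

```python
from collections import OrderedDict

# Minimal LRU cache sketch: most recently used keys move to the end,
# so the least recently used key sits at the front and is evicted first.

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()

    def access(self, key) -> bool:
        """Return True on hit, False on miss (inserting the key)."""
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as most recently used
            return True
        if len(self.entries) == self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used
        self.entries[key] = True
        return False

cache = LRUCache(2)
print([cache.access(k) for k in "ABAB"])     # [False, False, True, True]
print(cache.access("C"), cache.access("A"))  # C evicts A (the LRU), so A misses
```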

CACHE WRITE POLICIES

CACHE WRITE POLICIES What is a Cache Write Policy? Caching is a technique of temporarily storing a copy of data in rapidly accessible storage. A cache stores the most recently used words in a small memory to increase the speed at which data is accessed; it acts as a buffer between RAM and the CPU and thus increases the speed at which data is available to the processor. There are two main cache write policies: 1. Write-through policy. 2. Write-back policy.

TYPES OF CACHE WRITE POLICIES Write-through policy: Write-through is the most commonly used method of writing into the cache memory. In the write-through method, when the cache is updated, the main memory is updated simultaneously; thus at any given time, the main memory contains the same data that is available in the cache. Note that write-through is a slow technique, since every write needs to access main memory. Write-back policy: Under write-back, during a write operation only the cache location is updated. The updated location is marked by a flag, known as the modified or dirty bit. When the word is replaced from the cache, it is written into main memory if its flag bit is set. The logic behind this technique is that, during cache write operations, a word present in the cache may be written several times, so this method reduces the number of references to main memory. The two policies are contrasted in the sketch below.
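A simplified model contrasting the two policies for a single cached word; the classes and counters are illustrative assumptions, not a real cache controller:

```python
# Sketch contrasting the two write policies for one cached word.
# `memory_writes` counts traffic to main memory.

class WriteThrough:
    def __init__(self, value=0):
        self.cached = value
        self.memory = value
        self.memory_writes = 0

    def write(self, value):
        self.cached = value
        self.memory = value          # memory updated on every write
        self.memory_writes += 1

class WriteBack:
    def __init__(self, value=0):
        self.cached = value
        self.memory = value
        self.dirty = False           # the "modified" (dirty) flag bit
        self.memory_writes = 0

    def write(self, value):
        self.cached = value
        self.dirty = True            # memory is now stale

    def evict(self):
        if self.dirty:               # write back only when the word is replaced
            self.memory = self.cached
            self.memory_writes += 1
            self.dirty = False

wt, wb = WriteThrough(), WriteBack()
for v in (1, 2, 3):
    wt.write(v); wb.write(v)
wb.evict()
print(wt.memory_writes, wb.memory_writes)   # 3 vs 1
```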

CACHE MEMORY HIERARCHY

CACHE MEMORY HIERARCHY Hierarchy list: registers, L1 cache, L2 cache, main memory, disk cache, disk, optical media, tape. As one goes down the hierarchy: decreasing cost per bit; increasing capacity; increasing access time; decreasing frequency of access of the memory by the processor (locality of reference).

CACHE MEMORY HIERARCHY Semiconductor memory comes in two families. RAM (Random Access Memory) is somewhat misnamed, as all semiconductor memory is random access; it is read/write, volatile, temporary storage, and has two main types, static and dynamic. ROM (Read Only Memory) is permanent storage, used for microprogramming, library subroutines, system programs (BIOS), and function tables.

QUESTIONS ON CACHE MEMORY

QUESTIONS
1. The number of successful accesses to memory, stated as a fraction, is called the _____.
2. A cache line is 64 bytes. The main memory has latency 32 ns and bandwidth 1 GByte/s. The time required to fetch the entire cache line from the main memory is _____.
3. Consider a 4-way set-associative cache with block size 4 KB. The size of main memory is 16 GB and there are 10 bits in the tag. Find the size of the cache memory.
4. A computer has a 256 KByte, 4-way set-associative, write-back data cache with a block size of 32 bytes. The processor sends 32-bit addresses to the cache controller. Each cache tag directory entry contains, in addition to the address tag, 2 valid bits, 1 modified bit and 1 replacement bit. The number of bits in the tag field of an address is: a) 11 b) 14 c) 16 d) 27
5. Consider a direct-mapped cache of size 512 KB with block size 1 KB. There are 7 bits in the tag. Find the size of the main memory. a) 32 MB b) 60 MB c) 64 KB d) 64 MB
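As a sketch of how such questions can be worked, here is the arithmetic for question 4 (computed here, not an official answer key):

```python
# Worked sketch for question 4 above: 256 KB, 4-way set-associative
# cache, 32-byte blocks, 32-bit addresses -> width of the tag field.

cache_size = 256 * 1024   # bytes
block_size = 32           # bytes
ways       = 4
addr_bits  = 32

num_sets    = cache_size // (block_size * ways)   # 2048 sets
offset_bits = (block_size - 1).bit_length()       # log2(32)   = 5
index_bits  = (num_sets - 1).bit_length()         # log2(2048) = 11
tag_bits    = addr_bits - index_bits - offset_bits

print(tag_bits)   # 16 -> option (c)
```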

QUESTIONS
6. The memory management technique in which the system stores and retrieves data from secondary storage for use in main memory is called ______: a) fragmentation b) paging c) mapping d) none of the mentioned
7. The address of a page table in memory is pointed to by ____________: a) stack pointer b) page table base register c) page register d) program counter
8. The page table contains ____________: a) base address of each page in physical memory b) page offset c) page size d) none of the mentioned

QUESTIONS
9. The operating system maintains the page table for ____________: a) each process b) each thread c) each instruction d) each address
10. LRU provides very bad performance when: a) the blocks being accessed are sequential b) the blocks are randomized c) the consecutive blocks accessed are at the extremes d) none of the mentioned
11. The algorithm which removes the most recently used page first is ________: a) LRU b) MRU c) OFM d) none of the mentioned

THANK YOU! ONLINE PRESENTATION BY SUBID BISWAS MAIL: [email protected]