
William Stallings, Computer Organization and Architecture, 10th Edition

UNIT II Computer Memory Organization

Lecture 9

Computer memory exhibits the widest range of type, technology, organization, performance, and cost of any feature of a computer system. No single technology is optimal in satisfying the memory requirements for a computer system. As a consequence, the typical computer system is equipped with a hierarchy of memory subsystems, some internal to the system (directly accessible by the processor) and some external (accessible by the processor via an I/O module).

Table 4.1 Key Characteristics of Computer Memory Systems

Characteristics of Memory Systems
Location: Refers to whether memory is internal or external to the computer. Internal memory is often equated with main memory; the processor also requires its own local memory, in the form of registers, and cache is another form of internal memory. External memory consists of peripheral storage devices that are accessible to the processor via I/O controllers.
Capacity: Memory capacity is typically expressed in terms of bytes or words.
Unit of transfer: For internal memory, the unit of transfer is equal to the number of electrical lines into and out of the memory module. This may be equal to the word length, but is often larger, such as 64, 128, or 256 bits.

To clarify this point, consider three related concepts for internal memory:
• Word: The "natural" unit of organization of memory. The size of a word is typically equal to the number of bits used to represent an integer and to the instruction length. Unfortunately, there are many exceptions. For example, the CRAY C90 (an older model CRAY supercomputer) has a 64-bit word length but uses a 46-bit integer representation. The Intel x86 architecture has a wide variety of instruction lengths, expressed as multiples of bytes, and a word size of 32 bits.
• Addressable units: In some systems, the addressable unit is the word. However, many systems allow addressing at the byte level. In any case, the relationship between the length in bits A of an address and the number N of addressable units is 2^A = N.
• Unit of transfer: For main memory, this is the number of bits read out of or written into memory at a time. The unit of transfer need not equal a word or an addressable unit. For external memory, data are often transferred in much larger units than a word, and these are referred to as blocks.

Method of Accessing Units of Data

Capacity and Performance:

Memory: The most common forms are semiconductor memory, magnetic surface memory, optical memory, and magneto-optical memory. Several physical characteristics of data storage are important:
Volatile memory: Information decays naturally or is lost when electrical power is switched off.
Nonvolatile memory: Once recorded, information remains without deterioration until deliberately changed; no electrical power is needed to retain information.
Magnetic-surface memories are nonvolatile. Semiconductor memory (memory on integrated circuits) may be either volatile or nonvolatile.
Nonerasable memory: Cannot be altered, except by destroying the storage unit. Semiconductor memory of this type is known as read-only memory (ROM).
For random-access memory, the organization is a key design issue. Organization refers to the physical arrangement of bits to form words.

Memory Hierarchy: Design constraints on a computer's memory can be summed up by three questions: how much? how fast? how expensive? There is a trade-off among capacity, access time, and cost: faster access time means greater cost per bit; greater capacity means smaller cost per bit; and greater capacity means slower access time. The way out of the memory dilemma is not to rely on a single memory component or technology, but to employ a memory hierarchy.

Figure 4.1. As one goes down the hierarchy, the following occur: a. Decreasing cost per bit b. Increasing capacity c. Increasing access time d. Decreasing frequency of access of the memory by the processor

Suppose that the processor has access to two levels of memory. Level 1 contains 1000 words and has an access time of 0.01 μs; level 2 contains 100,000 words and has an access time of 0.1 μs. Assume that if a word to be accessed is in level 1, then the processor accesses it directly. If it is in level 2, then the word is first transferred to level 1 and then accessed by the processor. For simplicity, we ignore the time required for the processor to determine whether the word is in level 1 or level 2. Figure 4.2 shows the general shape of the curve that covers this situation. The figure shows the average access time to a two-level memory as a function of the hit ratio H, where H is defined as the fraction of all memory accesses that are found in the faster memory (e.g., the cache), T1 is the access time to level 1, and T2 is the access time to level 2. As can be seen, for high percentages of level 1 accesses, the average total access time is much closer to that of level 1 than that of level 2.

In our example, suppose 95% of the memory accesses are found in level 1. Then the average time to access a word can be expressed as (0.95)(0.01 μs) + (0.05)(0.01 μs + 0.1 μs) = 0.0095 + 0.0055 = 0.015 μs. The average access time is much closer to 0.01 μs than to 0.1 μs, as desired.
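To make the arithmetic above easy to check, here is a minimal Python sketch of the same formula, T_avg = H*T1 + (1 - H)*(T1 + T2); the function name and the microsecond units are illustrative choices, not part of the original example.

```python
def average_access_time(hit_ratio, t1_us, t2_us):
    """Average access time of a two-level memory where a miss first
    transfers the word to level 1 (t1 + t2) and then reads it."""
    return hit_ratio * t1_us + (1 - hit_ratio) * (t1_us + t2_us)

# Values from the example: T1 = 0.01 us, T2 = 0.1 us, H = 0.95
print(average_access_time(0.95, 0.01, 0.1))   # -> ~0.015 us
```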

Accordingly, it is possible to organize data across the hierarchy such that the percentage of accesses to each successively lower level is substantially less than that of the level above. Consider the two-level example already presented. Let level 2 memory contain all program instructions and data. The current clusters can be temporarily placed in level 1. From time to time, one of the clusters in level 1 will have to be swapped back to level 2 to make room for a new cluster coming into level 1. On average, however, most references will be to instructions and data contained in level 1.

This principle can be applied across more than two levels of memory, as suggested by the hierarchy shown in Figure 4.1. The fastest, smallest, and most expensive type of memory consists of the registers internal to the processor. Typically, a processor will contain a few dozen such registers, although some machines contain hundreds of registers. Main memory is the principal internal memory system of the computer. Each location in main memory has a unique address. Main memory is usually extended with a higher-speed, smaller cache. The cache is not usually visible to the programmer or, indeed, to the processor. It is a device for staging the movement of data between main memory and processor registers to improve performance.

The use of three levels exploits the fact that semiconductor memory comes in a variety of types which differ in speed and cost. Data are stored more permanently on external mass storage devices (hard disk and removable media, such as removable magnetic disks, tape, and optical storage). External, nonvolatile memory is also referred to as secondary memory or auxiliary memory. These devices are used to store program and data files and are usually visible to the programmer only in terms of files and records, as opposed to individual bytes or words.
Disk cache: A portion of main memory can be used as a buffer to hold data temporarily that is to be read out to disk. A few large transfers of data can be used instead of many small transfers, and data can be retrieved rapidly from the software cache rather than slowly from the disk.

Lecture 10

The first kind of memory inside our computer is the non-volatile kind, such as a hard disk drive. A hard disk drive contains mechanically rotating disks, also known as platters. The typical rotational speed of the disks is either 5400 or 7200 RPM. Because the hard disk drive contains mechanically rotating parts, it is quite slow in terms of speed.

The typical read or write speed of a hard disk drive is in the range of 80 to 150 MB/s, whereas the clock speed of the CPU is in the range of 1 GHz up to 4 GHz. So, even though the processor is capable of accepting data at a much higher rate, it cannot get the data from the hard disk drive at that rate.

This is where Random Access Memory (RAM) comes into the picture. It is faster than the hard disk drive, so RAM can provide data to the CPU at a much faster rate. The typical speed of RAM is in the range of 400 MHz up to 800 MHz, but this is still slower than the clock speed of the CPU.

Newer generations of RAM, like DDR3 and DDR4, can operate at much faster rates, such as 1600 MHz or up to 2100 MHz, which is comparable to the speed of the CPU. But even these newer versions of RAM cannot supply data to the CPU at the required rate, because modern processors are not single-core but multicore processors.

So, if all the cores are asking for data at the same time, the RAM cannot deliver data to all the cores simultaneously. That is where cache memory comes into the picture. Cache memory is the fastest among all these memories.

Cache memory is also random access memory, but it is a special kind of RAM known as static RAM. Unlike the dynamic RAM found inside normal RAM modules, static RAM is quite fast. The typical size of cache memory is in the range of kilobytes up to a few megabytes. The data that is frequently required by the CPU can be supplied by this cache memory, so generally, the instructions and data required by the CPU are stored in the cache.

Cache memory is designed to combine the access time of expensive, high-speed memory with the large size of less expensive, lower-speed memory. The concept is illustrated in Figure 4.3a. There is a relatively large and slow main memory together with a smaller, faster cache memory. The cache contains a copy of portions of main memory. When the processor attempts to read a word of memory, a check is made to determine if the word is in the cache. If so, the word is delivered to the processor. If not, a block of main memory, consisting of some fixed number of words, is read into the cache and then the word is delivered to the processor. Because of the phenomenon of locality of reference, when a block of data is fetched into the cache to satisfy a single memory reference, it is likely that there will be future references to that same memory location or to other words in the block.

L1 Cache: The first level of cache memory is known as Level 1 cache or L1 cache. In L1 cache, a tiny amount of memory is integrated inside the CPU itself, and every core inside the CPU has its own individual L1 cache. Because the L1 cache is integrated inside the CPU, it can operate at the same speed as the CPU, which is why L1 cache is the fastest of all the caches. The typical size of L1 cache is 2 KB up to 64 KB. Inside the L1 cache there are two kinds of cache: the instruction cache, which stores the instructions required by the CPU, and the data cache, which stores the data required by the CPU.

L2 Cache: This cache is known as Level 2 cache or L2 cache. The L2 cache can be either inside or outside the CPU, and it can be separate for each core or shared between all cores of the CPU. If the L2 cache is outside the CPU, it is connected to the CPU by a very high-speed bus. The size of the L2 cache is in the range of 256 KB to 512 KB, but in terms of speed it is slower than the L1 cache.

The third kind of cache is known as Level 3 cache. Not all processors have an L3 cache, but some higher-end processors do. The L3 cache is used to enhance the performance of the L1 and L2 caches. It is shared among all cores and is outside the CPU. The size of the L3 cache is in the range of 1 MB up to 8 MB for high-end processors. It is slower than the L1 and L2 caches, but still faster than the random access memory (RAM).

Figure 4.3b depicts the use of multiple levels of cache.

Whenever the CPU needs some data, it first looks inside the L1 cache. If it does not find it in the L1 cache, it looks inside the L2 cache, and if the data is not in the L2 cache, it looks in the L3 cache. If the CPU finds the data inside the cache memory, it is known as a cache hit; if the data is not available inside the cache, it is known as a cache miss.

The performance of cache memory is frequently measured in terms of a quantity called the hit ratio.
Hit Ratio (H) = hits / (hits + misses) = number of hits / total accesses
Miss Ratio = misses / (hits + misses) = number of misses / total accesses = 1 - H

Suppose the data is not available in the L3 cache; the CPU will then look inside the random access memory (RAM). And if it still does not find the data in RAM, it will get it from the hard disk drive. When you start your computer, or open some application for the first time, that data will not be available in either the cache memory or RAM, so the CPU gets it from the hard disk drive. But once the application is open, any data it subsequently requires will come from either RAM or cache memory. If cache memory is so fast, why is its size only a few megabytes? The reason is that cache memory is much costlier than the other memories, and that is why you find only a few megabytes of cache memory inside the CPU.
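The search order described above (L1, then L2, then L3, then RAM, then disk) can be sketched as a simple lookup loop. This is purely illustrative: the level names, the dictionary-based storage, and the copy-into-L1 step are simplifications, not how the hardware actually implements the hierarchy.

```python
# Illustrative only: each "level" is a dict of address -> data, ordered fastest to slowest.
levels = [("L1", {}), ("L2", {}), ("L3", {}), ("RAM", {}), ("Disk", {0x1A: "data"})]

def read(address):
    for i, (name, store) in enumerate(levels):
        if address in store:
            if i > 0:
                levels[0][1][address] = store[address]  # found lower down: copy it up into L1
            return name, store[address]
    raise KeyError("address not present at any level")

print(read(0x1A))   # first access: satisfied from Disk, then cached in L1
print(read(0x1A))   # second access: now a hit in L1
```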

Lecture 11

Secondary memory-main memory

Main memory-Cache

Cache Mapping
There are three different types of mapping used for cache memory:
Direct Mapping
Associative Mapping
Set-Associative Mapping

1. Direct Mapping: The simplest technique, known as direct mapping, maps each block of main memory into only one possible cache line. In other words, direct mapping assigns each memory block to a specific line in the cache. If a line is already occupied by a memory block when a new block needs to be loaded, the old block is discarded.

To find the line number in the cache for a block: i = j modulo m, where i = cache line number, j = main memory block number, and m = number of lines in the cache. Equivalently: Cache line number = (Main memory block address) modulo (Total number of lines in the cache).

Example: For block 0, 0 mod 3 = 0, i.e., block 0 will be placed in line 0. Similarly, 1 mod 3 = 1, 2 mod 3 = 2, 3 mod 3 = 0, 4 mod 3 = 1, 5 mod 3 = 2, 6 mod 3 = 0, and so on.
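A small Python sketch of the rule i = j mod m, reproducing the 3-line example above (the helper name is arbitrary):

```python
def direct_mapped_line(block_number, num_lines):
    """Direct mapping: each main-memory block maps to exactly one cache line."""
    return block_number % num_lines

for block in range(7):
    print(f"block {block} -> line {direct_mapped_line(block, 3)}")
# block 0 -> line 0, block 1 -> line 1, block 2 -> line 2, block 3 -> line 0, ...
```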

We need to move main memory data into the cache, so we need to find the line at which each block's data should be stored.

Because there are more blocks than lines, an individual line cannot be uniquely and permanently dedicated to a particular block. For example, on line 3, blocks 3, 7, 11, and 15 can all be stored. How, then, do we identify exactly which block is stored on line 3 at a particular time? For this purpose, tag bits are used. Thus, each line includes a tag that identifies which particular block is currently being stored in the cache. The number of tag bits also indicates how many different blocks can share a single line; in this case, as the tag is 2 bits, a line can hold any of 4 blocks (obviously only one at a time).

Direct mapping: For purposes of cache access, each main memory address can be viewed as consisting of three fields.

The Need for Replacement Algorithm In the case of direct mapping, There is no requirement for a replacement algorithm. It is because the block of the main memory would be able to map to a certain line of the cache only. Thus, the incoming (new) block always happens to replace the block that already exists, if any, in this certain line.

Advantages of direct mapping Simplest type of mapping Fast as only tag field matching is required while searching for a word. It is comparatively less expensive than associative mapping. Needs only one comparison because of using direct formula to get the effective cache address. Search time is less here because there is one possible location in the cache organization for each block from main memory. It has least number of tag bits.

Disadvantages of direct mapping: Direct mapping is easy, as only simple arithmetic is required and every block maps to exactly one line in the cache. But what happens when a program repeatedly uses blocks such as B1, B5, B1, B5, ...? Each access results in a cache miss and a load into cache line 1. This cache has 3 lines, but direct mapping might not let us use all of them, which can result in more misses. This is called a conflict miss.

Example: Main memory size = 128 words, block size = 4 words, cache size = 16 words. Find the number of blocks in main memory and the number of lines in the cache. How many bits are required to represent the physical address (P.A.)? Map the address 0001010 to main memory and to the cache.
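One way to work this example, assuming the physical address splits into tag | line | word-offset fields as in the direct-mapping discussion above (the variable names are just for illustration):

```python
import math

mm_words, block_words, cache_words = 128, 4, 16

pa_bits     = int(math.log2(mm_words))           # 7 bits to address 128 words
num_blocks  = mm_words // block_words            # 32 blocks in main memory
num_lines   = cache_words // block_words         # 4 lines in the cache
offset_bits = int(math.log2(block_words))        # 2
line_bits   = int(math.log2(num_lines))          # 2
tag_bits    = pa_bits - line_bits - offset_bits  # 3

addr   = 0b0001010                               # the address from the slide (decimal 10)
block  = addr // block_words                     # block 2 in main memory
line   = block % num_lines                       # maps to cache line 2
offset = addr % block_words                      # word 2 within the block
print(pa_bits, num_blocks, num_lines, (tag_bits, line_bits, offset_bits), block, line, offset)
```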

Lecture 12

Associative Mapping: Fully associative mapping refers to a cache mapping technique that allows a main memory block to be mapped to any freely available cache line. A fully associative cache permits the storage of data in any cache block; no memory address is forced into one particular block. When data and information are fetched from memory, they can be placed in any unused block of the cache.

Every single line of the cache is freely available. Thus, any main memory block can map to any line of the cache. In case all the cache lines are occupied, one of the existing blocks needs to be replaced.

The Need for Replacement Algorithm In the case of fully associative mapping, The replacement algorithm is always required. The replacement algorithm suggests a block that is to be replaced whenever all the cache lines happen to be occupied. So, replacement algorithms such as LRU Algorithm, FCFS Algorithm, etc., are employed.

Division of Physical Address: In the case of fully associative mapping, blocks can be placed in any line, so a line-number field is not required in the P.A.; hence all of its bits apart from the word offset can be used as tag bits. The address is therefore divided as Tag | Word-offset.

Example

To map the memory address to the cache: in this mapping, any block can be placed at any line number in the cache which is free; this increases the hit ratio.

P.A.: For a 64-word memory (2^6), 6 bits are required to specify an address. The 2 LSB bits represent the block offset (0-3) and the 4 MSB bits represent the block number (0-15). Here, we do not need to fix a line number as in the case of direct mapping, so the P.A. is divided into only two parts, and the block number becomes the tag. The tag is very important, as it tells which block is present in which line at a particular moment of time. So the tag will be 4 bits, since any of the 16 blocks can be placed on a line. Fields: Block number | Block offset.

Advantages It is fast. Easy to implement The mapping of the main memory block can be done with any of the cache block.

Disadvantages: Needs comparison with all tag bits, i.e., the cache control logic must examine every line's tag simultaneously in order to determine whether a block is in the cache or not. It has the greatest number of tag bits. It is expensive because it needs to store the tag (address) along with the data.

Direct mapping badly suffers from conflict misses. Associative mapping was too flexible, because we could map any main memory block to any of the cache lines. This definitely solved the conflict-miss problem; however, due to the need for a comparator for every single cache line, the cost of the implementation became too high. In these circumstances, designers had to come up with a new cache mapping technique combining the advantages of strict direct mapping, where easy retrieval of a main memory block from the cache is guaranteed, and the flexible associative mapping, which is strongly resistant to conflict misses. This is nothing but set-associative mapping.

In this scheme, the lines of the cache memory are subdivided into sets. Therefore, when a main memory block is mapped onto the cache, it gains the flexibility of being mapped onto any of the lines belonging to one specific set.

Suppose block number 0 of main memory is mapped to cache line number 0. Since line 0 belongs to a particular set, main memory block 0 has more than one mapping option: apart from line 0, it can also be mapped onto line 1 or line 2. In other words, main memory block 0 can be mapped onto any of the cache lines belonging to that particular set. So set-associative mapping becomes flexible, because the organization provides mapping options.

Now coming to the cache sets: the sets are numbered set 0, set 1, and so on, and all the sets are of equal size. The "way" is specified by how many lines there are in a set; that means if one set contains k lines (1 set = k lines), we call it a k-way set-associative cache. The mapping rule below is borrowed from the direct-mapping policy.

Direct mapping P.A. fields: Tag (3 bits) | Line no (2 bits) | Word-offset (2 bits)
Set-associative mapping P.A. fields: Tag (3 bits) | Set no (2 bits) | Word-offset (2 bits)
Direct mapping: Cache line number = (Main memory block address) modulo (Total number of lines in the cache)
Set-associative mapping: Cache set number = (Main memory block address) modulo (Total number of sets in the cache)

Now think about this situation: in order to find block number 0 or block number 4, we need not search the entire cache; rather, we only need to search inside set number 0. So we only need two comparators of three bits each, one per line in the set.
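A minimal sketch of k-way set-associative placement: the set is chosen by block mod (number of sets), and the block may then occupy any line within that set. The 4-set, 2-way geometry and the consecutive grouping of lines into sets are illustrative assumptions, not taken from the slides.

```python
def set_associative_place(block_number, num_sets, ways):
    """Return the set a block maps to and the candidate lines inside that set."""
    s = block_number % num_sets
    candidate_lines = [s * ways + w for w in range(ways)]
    return s, candidate_lines

# e.g. a cache with 4 sets of 2 lines each (2-way set associative)
for block in (0, 4, 5):
    print(block, set_associative_place(block, 4, 2))
# blocks 0 and 4 both land in set 0 and may use either line 0 or line 1
```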

Lecture 13

Replacement policies/algorithms: Caches are small when compared to the size of main memory. Because of the limited size of a cache, the cache also needs to follow a policy that governs when stored data is to be replaced with incoming fresh data. This type of policy is better known as the cache replacement policy.

A cache replacement policy or cache eviction algorithm governs the circumstances under which data is subject to being overwritten. The cache eviction policy is necessary because caches tend to be substantially smaller than the main memory, to stay cost-effective and efficient. There are several popular methodologies for determining when to replace data stored in a cache system. Some cache replacement policies:
Random
First in first out (FIFO)
Last in first out (LIFO)
Least recently used (LRU cache)
Least frequently used (LFU cache)
Most recently used (MRU cache)

Random

FIFO: The FIFO policy is similar to a queue. In a first-in, first-out (FIFO) replacement policy, the item that was first added to the cache is also the first item to be evicted when there is not enough remaining space. The cache evicts blocks according to the order in which they were added, without regard to how often or how many times they were accessed before.

FIFO-example

LIFO: The LIFO policy is similar to a stack. In a last-in, first-out (LIFO) replacement policy, the item that was most recently added to the cache is the first item to be evicted when there is not enough remaining space.

Least Recently Used (LRU cache): This style of cache is more interesting and more common in real-world settings. The Least Recently Used (LRU) cache keeps track of the items added to the cache and of when they are used. The assumption is that items that haven't been accessed for a longer time are less likely to be used in the near future. LRU maintains a record of the order in which items are accessed, and when the cache is full, it evicts the item that hasn't been accessed for the longest period.

b2 is a hit: just update the list. b3 is a hit: just update the list.

b1 is a hit: just update the list. b5 is a miss and needs a replacement; the LRU block is b4, so replace it and update the list.

b6 is a miss and needs a replacement; the LRU block is b2, so replace it and update the list.
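A minimal LRU cache sketch in Python using collections.OrderedDict. The capacity of 4 and the block names are chosen to reproduce the trace above; the eviction rule is the one described (replace the least recently used block on a miss). Running it shows b5 evicting b4 and b6 evicting b2, matching the walkthrough.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()          # ordered oldest (least recently used) first

    def access(self, block):
        if block in self.store:
            self.store.move_to_end(block)   # hit: just update the recency order
            return "hit"
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # miss: evict the least recently used block
        self.store[block] = True
        return "miss"

cache = LRUCache(4)
for b in ["b1", "b2", "b3", "b4", "b2", "b3", "b1", "b5", "b6"]:
    print(b, cache.access(b))
```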

LFU (Least Frequently Used)

Initially, all blocks have frequency 0. Once a block is referenced in a cache line, we update its frequency.

MRU

Lecture 14

General: Figure 4.4 depicts the structure of a cache/main-memory system. Main memory consists of up to 2^n addressable words, with each word having a unique n-bit address. For mapping purposes, this memory is considered to consist of a number of fixed-length blocks of K words each. That is, there are M = 2^n / K blocks in main memory. The cache consists of m blocks, called lines. Each line contains K words, plus a tag of a few bits. Each line also includes control bits (not shown), such as a bit to indicate whether the line has been modified since being loaded into the cache.

The length of a line, not including tag and control bits, is the line size. The line size may be as small as 32 bits, with each “word” being a single byte; in this case the line size is 4 bytes. The number of lines is considerably less than the number of main memory blocks (m << M). At any time, some subset of the blocks of memory resides in lines in the cache. If a word in a block of memory is read, that block is transferred to one of the lines of the cache. Because there are more blocks than lines, an individual line cannot be uniquely and permanently dedicated to a particular block. Thus, each line includes a tag that identifies which particular block is currently being stored. The tag is usually a portion of the main memory address

Figure 4.5 illustrates the read operation. The processor generates the read address (RA) of a word to be read. If the word is contained in the cache, it is delivered to the processor. Otherwise, the block containing that word is loaded into the cache, and the word is delivered to the processor.

In this organization, the cache connects to the processor via data, control, and address lines. The data and address lines also attach to data and address buffers, which attach to a system bus from which main memory is reached. When a cache hit occurs, the data and address buffers are disabled and communication is only between processor and cache, with no system bus traffic. When a cache miss occurs, the desired address is loaded onto the system bus and the data are returned through the data buffer to both the cache and the processor. In other organizations, the cache is physically interposed between the processor and the main memory for all data, address, and control lines. In this latter case, for a cache miss, the desired word is first read into the cache and then transferred from cache to processor.

Write Policy

What are Cache Write Policies? Cache write policies dictate how data modifications in the cache are propagated to the main memory. The primary goal of these policies is to balance performance and data consistency. They determine when and how the changes made to the cached data are written back to the main memory.

Cache Write Policies 1. Write-Through 2. Write-Back 3. Write-Around

Write-Through In the write-through policy, data is written to both the cache and the main memory simultaneously. This ensures data consistency between the cache and main memory at the cost of higher write latency. This policy is easy to implement in the architecture but is not as efficient since every write to cache is a write to the slower main memory.

Subsequent updates on X will be immediately written to both Cache and MM

Advantages Ensures data consistency. Simplifies the process of maintaining coherency between cache and memory. Disadvantages Slower write operations due to simultaneous writes. Increased memory bandwidth usage.

Write-Back: The write-back policy, also known as write-behind, allows data to be written only to the cache initially. The modified data is written to the main memory at a later time, either when the cache block is evicted or at specific intervals. This policy provides lower latency than the other policies because data is written to the cache and only "eventually" to the backing storage. This means the I/O happens in the background and the write does not block waiting for completion.

Write-Back

Subsequent updates on X will be written only to Cache and then eventually to MM

Advantages Faster write operations as writes are initially only to the cache. Reduced memory bandwidth usage. Disadvantages Requires mechanisms to ensure data consistency. Complexity in handling cache coherency.

Write-Around Policy: In the write-around policy, data is written directly to the main memory, bypassing the cache. The cache is only updated if the same data is read again. This policy works best when we do not expect the written data to be re-read soon: data is written only to the backing storage and there is no write allocation in the cache.

Advantages Reduces cache pollution with infrequently accessed data. Suitable for workloads with low write locality. Disadvantages May lead to slower write operations. Potential for cache misses on subsequent reads.
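The contrast between write-through and write-back can be sketched as follows; the dirty-bit bookkeeping is a simplified model of what a cache controller does, and the class and function names are invented for illustration.

```python
class CacheLine:
    def __init__(self):
        self.data = None
        self.dirty = False    # only meaningful for write-back

def write_through(line, main_memory, addr, value):
    line.data = value
    main_memory[addr] = value          # write goes to cache AND main memory immediately

def write_back(line, addr, value):
    line.data = value
    line.dirty = True                  # main memory is updated later, on eviction

def evict_write_back(line, main_memory, addr):
    if line.dirty:
        main_memory[addr] = line.data  # the deferred write happens only now
        line.dirty = False

memory = {0x10: 0}
line = CacheLine()
write_back(line, 0x10, 42)             # memory[0x10] is still 0 here
evict_write_back(line, memory, 0x10)   # now memory[0x10] == 42
```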

Line Size

Multilevel Caches: As logic density has increased, it has become possible to have a cache on the same chip as the processor. The on-chip cache reduces the processor's external bus activity, speeds up execution time, and increases overall system performance. When the requested instruction or data is found in the on-chip cache, the bus access is eliminated; on-chip cache accesses complete appreciably faster than would even zero-wait-state bus cycles, and during this period the bus is free to support other transfers. Two-level cache: the internal cache is designated as level 1 (L1) and the external cache as level 2 (L2). The potential savings due to the use of an L2 cache depend on the hit rates in both the L1 and L2 caches. The use of multilevel caches complicates all of the design issues related to caches, including size, replacement algorithm, and write policy.

Unified cache and a split cache The main difference between a unified cache and a split cache is how they store data and instructions: Unified cache: Stores both instructions and data in the same cache. Split cache: Has separate caches for instructions and data, which work in parallel.

Unified Versus Split Caches: It has become common to split the cache, with one cache dedicated to instructions and one dedicated to data, both existing at the same level, typically as two L1 caches. Advantages of a unified cache: higher hit rate; balances the load of instruction and data fetches automatically; only one cache needs to be designed and implemented. The trend is toward split caches at the L1 level and unified caches for higher levels. Advantages of a split cache: eliminates cache contention between the instruction fetch/decode unit and the execution unit, which is important in pipelining.

Table 4.2 Elements of Cache Design

Table 4.3 Cache Sizes of Some Processors a Two values separated by a slash refer to instruction and data caches. b Both caches are instruction only; no data caches. (Table can be found on page 134 in the textbook.)

Table 4.4 Intel Cache Evolution (Table is on page 150 in the textbook.)

Lecture 15

Cache Addresses: Virtual memory is a facility that allows programs to address memory from a logical point of view, without regard to the amount of main memory physically available. When virtual memory is used, the address fields of machine instructions contain virtual addresses. For reads from and writes to main memory, a hardware memory management unit (MMU) translates each virtual address into a physical address in main memory.

When a program's memory requirements exceed the available main memory, one solution is to expand main memory so that it can accommodate more processes. But there are two flaws in this approach. First, main memory is expensive, even today. Second, the appetite of programs for memory has grown as fast as the cost of memory has dropped. So larger memory results in larger processes, not more processes.

Another solution is swapping, depicted in Figure 8.12. We have a long-term queue of process requests, typically stored on disk. These are brought in, one at a time, as space becomes available. As processes are completed, they are moved out of main memory. Swapping, however, is an I/O operation, and therefore there is the potential for making the problem worse, not better. But because disk I/O is generally the fastest I/O on a system (e.g., compared with tape or printer I/O), swapping will usually enhance performance. A more sophisticated scheme, involving virtual memory, improves performance over simple swapping.

The simplest scheme for partitioning available memory is to use fixed-size partitions, as shown in Figure 8.13. Note that, although the partitions are of fixed size, they need not be of equal size. When a process is brought into memory, it is placed in the smallest available partition that will hold it. Even with the use of unequal fixed-size partitions, there will be wasted memory. In most cases, a process will not require exactly as much memory as provided by the partition. For example, a process that requires 3M bytes of memory would be placed in the 4M partition of Figure 8.13b, wasting 1M that could be used by another process. A more efficient approach is to use variable-size partitions. When a process is brought into memory, it is allocated exactly as much memory as it requires and no more.

Logical address - expressed as a location relative to the beginning of the program Physical address - an actual location in main memory Base address - current starting location of the process

An example, using 64 Mbytes of main memory, is shown in Figure 8.14. Initially, main memory is empty, except for the OS (a). The first three processes are loaded in, starting where the OS ends and occupying just enough space for each process (b, c, d). This leaves a “hole” at the end of memory that is too small for a fourth process. At some point, none of the processes in memory is ready. The OS swaps out process 2 (e), which leaves sufficient room to load a new process, process 4 (f). Because process 4 is smaller than process 2, another small hole is created.

Later, a point is reached at which none of the processes in main memory is ready, but process 2, in the Ready-Suspend state, is available. Because there is insufficient room in memory for process 2, the OS swaps process 1 out (g) and swaps process 2 back in (h).

Before we consider ways of dealing with the shortcomings of partitioning, we must clear up one loose end. Consider Figure 8.14; it should be obvious that a process is not likely to be loaded into the same place in main memory each time it is swapped in. Furthermore, if compaction is used, a process may be shifted while in main memory. A process in memory consists of instructions plus data. The instructions will contain addresses for memory locations of two types: • Addresses of data items • Addresses of instructions, used for branching instructions

But these addresses are not fixed. They will change each time a process is swapped in. To solve this problem, a distinction is made between logical addresses and physical addresses. A logical address is expressed as a location relative to the beginning of the program. Instructions in the program contain only logical addresses. A physical address is an actual location in main memory. When the processor executes a process, it automatically converts from logical to physical address by adding the current starting location of the process, called its base address, to each logical address. This is another example of a processor hardware feature designed to meet an OS requirement. The exact nature of this hardware feature depends on the memory management strategy in use.
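A tiny sketch of the translation just described: the hardware simply adds the process's base address to every logical address. The example addresses are made up for illustration.

```python
def to_physical(logical_address, base_address):
    """Simple relocation: physical = base + logical, done in hardware on every access."""
    return base_address + logical_address

# A process loaded at base 0x4000: logical address 0x012C becomes physical 0x412C.
print(hex(to_physical(0x012C, 0x4000)))
```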

Both unequal fixed-size and variable-size partitions are inefficient in the use of memory. Suppose, however, that memory is partitioned into equal fixed-size chunks that are relatively small, and that each process is also divided into small fixed-size chunks of some size. Then the chunks of a program, known as pages, could be assigned to available chunks of memory, known as frames, or page frames. At most, then, the wasted space in memory for that process is a fraction of the last page.

Figure 8.15 shows an example of the use of pages and frames. At a given point in time, some of the frames in memory are in use and some are free. The list of free frames is maintained by the OS. Process A, stored on disk, consists of four pages. When it comes time to load this process, the OS finds four free frames and loads the four pages of the process A into the four frames.

Now suppose, as in this example, that there are not sufficient unused contiguous frames to hold the process. Does this prevent the OS from loading A? The answer is no, because we can once again use the concept of logical address. A simple base address will no longer suffice. Rather, the OS maintains a page table for each process. The page table shows the frame location for each page of the process. Within the program, each logical address consists of a page number and a relative address within the page. Recall that in the case of simple partitioning, a logical address is the location of a word relative to the beginning of the program; the processor translates that into a physical address. With paging, the logical-to-physical address translation is still done by processor hardware. The processor must know how to access the page table of the current process. Presented with a logical address (page number, relative address), the processor uses the page table to produce a physical address (frame number, relative address). An example is shown in Figure 8.16.
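A minimal sketch of the paging translation (in the spirit of Figure 8.16): split the logical address into a page number and an offset, look up the frame number in the per-process page table, and rebuild the physical address. The page size and page-table contents are invented for illustration, and page-fault handling is omitted.

```python
PAGE_SIZE = 1024                           # assumed page/frame size in words (a power of two)

page_table = {0: 5, 1: 6, 2: 11, 3: 2}     # page number -> frame number (example values only)

def translate(logical_address):
    page   = logical_address // PAGE_SIZE
    offset = logical_address %  PAGE_SIZE
    frame  = page_table[page]              # a missing entry would be a page fault (not handled here)
    return frame * PAGE_SIZE + offset

print(translate(2 * PAGE_SIZE + 100))      # page 2, offset 100 -> frame 11, offset 100
```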

This approach solves the problems raised earlier. Main memory is divided into many small equal-size frames. Each process is divided into frame-size pages: smaller processes require fewer pages, larger processes require more. When a process is brought in, its pages are loaded into available frames, and a page table is set up.

Virtual Memory (Demand Paging): Each page of a process is brought in only when it is needed. By the principle of locality, when working with a large process, execution may be confined to a small section of the program (such as a subroutine), so it is a better use of memory to load in just a few pages. If the program references data or branches to an instruction on a page not in main memory, a page fault is triggered, which tells the OS to bring in the desired page. Advantages: more processes can be maintained in memory, and time is saved because unused pages are not swapped in and out of memory. Disadvantages: when one page is brought in, another page must be thrown out (page replacement); if a page is thrown out just before it is about to be used, the OS will have to go get the page again. Thrashing occurs when the processor spends most of its time swapping pages rather than executing instructions.

The basic mechanism for reading a word from memory involves the translation of a virtual, or logical, address, consisting of page number and offset, into a physical address, consisting of frame number and offset, using a page table. Because the page table is of variable length, depending on the size of the process, we cannot expect to hold it in registers. Instead, it must be in main memory to be accessed.

When virtual addresses are used, the system designer may choose to place the cache between the processor and the MMU or between the MMU and main memory (Figure 4.7). A logical cache, also known as a virtual cache, stores data using virtual addresses. The processor accesses the cache directly, without going through the MMU. A physical cache stores data using main memory physical addresses.

Segmentation Usually visible to the programmer Provided as a convenience for organizing programs and data and as a means for associating privilege and protection attributes with instructions and data Allows the programmer to view memory as consisting of multiple address spaces or segments Advantages: Simplifies the handling of growing data structures Allows programs to be altered and recompiled independently without requiring that an entire set of programs be re-linked and re-loaded Lends itself to sharing among processes Lends itself to protection

Lecture 16

In earlier computers, the most common form of random-access storage for computer main memory employed an array of doughnut-shaped ferromagnetic loops referred to as cores. Hence, main memory was often referred to as core. Today, the use of semiconductor chips for main memory is almost universal.

The basic element of a semiconductor memory is the memory cell. Although a variety of electronic technologies are used, all semiconductor memory cells share certain properties: • They exhibit two stable (or semistable) states, which can be used to represent binary 1 and 0. • They are capable of being written into (at least once), to set the state. • They are capable of being read to sense the state.

Figure 5.1 depicts the operation of a memory cell. Most commonly, the cell has three functional terminals capable of carrying an electrical signal. The select terminal, as the name suggests, selects a memory cell for a read or write operation. The control terminal indicates read or write. For writing, the other terminal provides an electrical signal that sets the state of the cell to 1 or 0. For reading, that terminal is used for output of the cell's state. The details of the internal organization, functioning, and timing of the memory cell depend on the specific integrated circuit technology used.

Random access memory: Random access means individual words of memory are directly accessed through wired-in addressing logic. The most common type is referred to as random-access memory (RAM). This is, in fact, a misuse of the term, because all of the semiconductor memory types are random access.

Volatile and non-volatile memory: Non-volatile memory is a type of computer memory that retains stored information even when power is removed. It is less expensive than volatile memory and has a large storage capacity. Volatile memory, on the other hand, is temporary memory: the data is held while the system is powered, but once the power is turned off the data within volatile memory is lost automatically.

Read Only Memory (ROM): Contains a permanent pattern of data that cannot be changed or added to. No power source is required to maintain the bit values in memory. The data or program is permanently in main memory and never needs to be loaded from a secondary storage device. The data is actually wired into the chip as part of the fabrication process. Disadvantages of this: there is no room for error, since if one bit is wrong the whole batch of ROMs must be thrown out, and the data insertion step includes a relatively large fixed cost.

Read-Only Memory (ROM) Read-only memory, or ROM, is a type of computer storage containing non-volatile, permanent data that, normally, can only be read, not written to. ROM contains the programming that allows a computer to start up or regenerate each time it is turned on. ROM also performs large input/output (I/O) tasks and protects programs or software instructions. Once data is written on a ROM chip, it cannot be removed. Almost every computer incorporates a small amount of ROM that contains the start-up firmware. This boot firmware is called the basic input/output system (BIOS). This software consists of code that instructs the boot-up processes for the computer -- such as loading the operating system (OS) into the random access memory (RAM) or running hardware diagnostics. Consequently, ROM is most often used for firmware updates.

However, ROM is also utilized in video game consoles, allowing one system to run various games. Additionally, ROM is used in optical storage, including different kinds of compact discs (CD) -- such as CD-ROM and CD-RW. ROM is also used frequently in calculators and peripheral devices like laser printers, whose fonts are commonly stored in ROM.

Types of ROM: MROM (Masked read-only memory): ROM is as old as semiconductor technology, and MROM was the very first ROM. It consists of a grid of word lines and bit lines joined together by transistor switches. In this type of ROM, the data is physically encoded in the circuit and can only be programmed during fabrication. It was not very expensive. PROM (Programmable read-only memory): PROM is a form of digital memory in which the stored data is permanent and cannot be changed or erased. It is used in low-level programs such as firmware or microcode.

EPROM (Erasable programmable read-only memory): EPROM, also called EROM, is a type of PROM that can be reprogrammed. The data stored in EPROM can be erased and reprogrammed again using ultraviolet light, although the number of times it can be reprogrammed is limited. Before the era of EEPROM and flash memory, EPROM was used in microcontrollers. EEPROM (Electrically erasable programmable read-only memory): As its name suggests, it can be programmed and erased electrically. The data and program of this ROM can be erased and reprogrammed about ten thousand times. The duration of erasing and programming the EEPROM is about 4 ms to 10 ms. It is used in microcontrollers and remote keyless systems.

Read-Mostly Memory

Advantages of ROM ROM provides the necessary instructions for communication between various hardware components. It is essential for the storage and operation of the BIOS, but it can also be used for basic data management, to hold software for basic processes of utilities and to read and write to peripheral devices. Its static nature means it does not require refreshing. It is easy to test. ROM is more reliable than RAM since it is non-volatile in nature and cannot be altered or accidentally changed. The contents of the ROM can always be known and verified. Less expensive than RAM.

Disadvantages of ROM It is a read-only memory, so it cannot be modified. It is slower as compared to RAM.

RAM: It is one part of main memory, also famously known as read-write memory. Random access memory is present on the motherboard, and the computer's data is temporarily stored in RAM. As the name says, RAM supports both reads and writes. RAM is volatile memory, which means its contents are retained only as long as the computer is in the ON state; as soon as the computer turns OFF, the memory is erased.

Features of RAM: RAM is volatile in nature, which means the data is lost when the device is switched off. RAM is known as the primary memory of the computer. RAM is known to be expensive, since the memory can be accessed directly. RAM is the fastest memory and is therefore internal memory for the computer. The speed of a computer depends on RAM: if the computer has less RAM, it will take more time to load and the computer slows down.

Types of RAM: RAM is further divided into two types: SRAM (Static Random Access Memory) and DRAM (Dynamic Random Access Memory).

Static Random Access Memory: SRAM is used for cache memory; it holds its data as long as power is available. It is made with CMOS technology, contains 4 to 6 transistors per cell, and also uses clocks. It does not require a periodic refresh cycle, due to the way the transistors latch the stored value. Although SRAM is faster, it requires more power and is more expensive. Since SRAM requires more power, more heat is dissipated as well; another drawback of SRAM is that it cannot store as many bits per chip, for instance, for the same amount of memory stored in DRAM, SRAM would require one more chip.

Dynamic RAM (DRAM): DRAM is made with cells that store data as charge on capacitors. The presence or absence of charge in a capacitor is interpreted as a binary 1 or 0. Because capacitors have a natural tendency to discharge, DRAM requires periodic charge refreshing to maintain data storage. The term dynamic refers to the tendency of the stored charge to leak away, even with power continuously applied.

Dynamic RAM (DRAM): DRAM is used for main memory. It has a different construction than SRAM: it uses one transistor and one capacitor, which needs to be recharged within milliseconds because the capacitor leaks. Dynamic RAM was the first memory integrated circuit to be sold commercially. DRAM is the second most compact memory technology in production (the first is flash memory). DRAM uses one transistor and one capacitor per memory bit. Although DRAM is slower, it can store more bits per chip; for instance, for the same amount of memory stored in SRAM, DRAM requires one less chip. DRAM requires less power and hence produces less heat.

If you look at one bit of the DRAM cell, it consists of one transistor and one capacitor. In the DRAM cell, the memory bit is stored in the form of charge across this capacitor, so by charging and discharging the capacitor we can tell whether the bit stored is logic 1 or logic 0. The capacitor is accessed through the pass transistor: when the pass transistor is turned ON, we can read the capacitor's data or write onto the capacitor, and when the pass transistor is OFF, the charge across the capacitor should remain as it is. In the ideal case, the capacitor would not lose its charge, but in practice there is some leakage current, and because of that the capacitor gradually loses its charge. That is why a dynamic cell requires periodic refresh cycles, and that is the reason this memory is known as dynamic RAM.

Static RAM (SRAM): A digital device that uses the same logic elements used in the processor. Binary values are stored using traditional flip-flop logic-gate configurations. It will hold its data as long as power is supplied to it.

An SRAM cell consists of 6 transistors. Out of the 6 transistors, two are pass transistors which give access to the bit lines, while the remaining four transistors form two cross-coupled inverters: transistors 1 and 2 are the first CMOS inverter pair, and transistors 3 and 4 are the second CMOS inverter pair.

In the SRAM cell, the memory bit is stored between these two cross-coupled inverters. Say we have latched a logic 1; then at the output of the first inverter we will have logic 0, and at the output of the second inverter we will again have logic 1. So, as long as power is supplied to the SRAM, the logic 1 keeps circulating between these two inverter pairs. Unlike the dynamic RAM cell, we do not require any refresh cycles during SRAM operation, and that is the reason SRAM is known as static RAM.

SRAM versus DRAM
Both are volatile: power must be continuously supplied to the memory to preserve the bit values.
Dynamic cell: simpler to build and smaller; more dense (smaller cells mean more cells per unit area); less expensive; requires the supporting refresh circuitry; tends to be favored for large memory requirements; used for main memory.
Static cell: faster; used for cache memory (both on and off chip).

Table 5.1 Semiconductor Memory Types

Advanced DRAM Organization: One of the most critical system bottlenecks when using high-performance processors is the interface to main internal memory. The traditional DRAM chip is constrained both by its internal architecture and by its interface to the processor's memory bus. A number of enhancements to the basic DRAM architecture have been explored, including SDRAM, RDRAM, and DDR-DRAM; the schemes that currently dominate the market are SDRAM and DDR-DRAM.

The traditional DRAM is asynchronous in nature

Synchronous DRAM (SDRAM): SDRAM exchanges data with the processor synchronized to an external clock signal, running at the full speed of the processor/memory bus without imposing wait states.

Synchronous DRAM (SDRAM)

So data path is 8 bytes

Double Data Rate SDRAM (DDR SDRAM): Developed by the JEDEC Solid State Technology Association (the Electronic Industries Alliance's semiconductor-engineering-standardization body). Numerous companies make DDR chips, which are widely used in desktop computers and servers. DDR achieves higher data rates in three ways: first, data transfer is synchronized to both the rising and falling edges of the clock, rather than just the rising edge; second, DDR uses a higher clock rate on the bus to increase the transfer rate; third, a buffering scheme is used.

Table 5.4 DDR Characteristics

Into this mix, as we have seen, has been added flash memory. Flash memory has the advantage over traditional memory that it is nonvolatile. NOR flash is best suited to storing programs and static application data in embedded systems, while NAND flash has characteristics intermediate between DRAM and hard disks. Over time, each of these technologies has seen improvements in scaling: higher bit density, higher speed, lower power consumption, and lower cost. However, for semiconductor memory, it is becoming increasingly difficult to continue the pace of improvement [ITRS14].

Recently, there have been breakthroughs in developing new forms of nonvolatile semiconductor memory that continue scaling beyond flash memory. The most promising technologies are spin-transfer torque RAM (STT-RAM), phase-change RAM (PCRAM), and resistive RAM (ReRAM). All of these are in volume production. However, because NAND flash and, to some extent, NOR flash still dominate the applications, these emerging memories have been used in specialty applications and have not yet fulfilled their original promise of becoming the dominant mainstream high-density nonvolatile memory. This is likely to change in the next few years.

Lecture 17 External Memory

Example Problem 1
Consider a disk with 4 platters, 2 surfaces per platter, 1000 tracks per surface, 50 sectors per track, and 512 bytes per sector. What is the disk capacity?
Capacity of disk pack = Total number of surfaces x Number of tracks per surface x Number of sectors per track x Number of bytes per sector
= 8 x 1000 x 50 x 512
= 204,800,000 bytes
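As a quick check of the arithmetic, the same calculation can be scripted; the values below are exactly those given in the problem statement.

```python
# Disk capacity for Example Problem 1 (values from the slide).
platters = 4
surfaces_per_platter = 2
tracks_per_surface = 1000
sectors_per_track = 50
bytes_per_sector = 512

surfaces = platters * surfaces_per_platter            # 8 surfaces in total
capacity = surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector
print(capacity)                                        # 204800000 bytes (about 195.3 MB)
```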

Example Problem 2
Consider a disk pack with the following specifications: 16 surfaces, 128 tracks per surface, 256 sectors per track, and 512 bytes per sector.
What is the capacity of the disk pack? What is the number of bits required to address a sector?
Given:
- Number of surfaces = 16
- Number of tracks per surface = 128
- Number of sectors per track = 256
- Number of bytes per sector = 512 bytes

Part-01: Capacity of Disk Pack
Capacity of disk pack = Total number of surfaces x Number of tracks per surface x Number of sectors per track x Number of bytes per sector
= 16 x 128 x 256 x 512 bytes
= 2^4 x 2^7 x 2^8 x 2^9
= 2^28 bytes
To convert bytes to MB:
= 2^28 / (1024 x 1024)
= 2^28 / (2^10 x 2^10)
= 2^8 MB = 256 MB
(Reference: 1 KB = 2^10 bytes, 1 MB = 2^20, 1 GB = 2^30, 1 TB = 2^40, 1 PB = 2^50.)

Part-02: Number of Bits Required to Address a Sector
Total number of sectors = Total number of surfaces x Number of tracks per surface x Number of sectors per track
= 16 x 128 x 256 sectors
= 2^4 x 2^7 x 2^8
= 2^19 sectors
Thus, the number of bits required to address a sector = 19 bits
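Both parts of Example Problem 2 can be verified with a few lines of code; the inputs are the ones stated in the problem.

```python
import math

# Example Problem 2 (values from the slide): capacity and sector-address width.
surfaces = 16
tracks_per_surface = 128
sectors_per_track = 256
bytes_per_sector = 512

capacity_bytes = surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector
print(capacity_bytes, "bytes =", capacity_bytes // 2**20, "MB")   # 268435456 bytes = 256 MB

total_sectors = surfaces * tracks_per_surface * sectors_per_track
address_bits = int(math.log2(total_sectors))                      # sectors is a power of two here
print(total_sectors, "sectors ->", address_bits, "address bits")  # 524288 sectors -> 19 bits
```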

RAID -Redundant Array of Independent (or Inexpensive) Disks As businesses grow, demand arises for more reliable infrastructure that can handle critical systems. An important component in stable and scalable infrastructure is proper memory management.

What is RAID? RAID is a storage virtualization technology used to organize multiple drives into various arrangements to meet goals such as redundancy, speed, and capacity. RAID can be categorized into software RAID and hardware RAID. In software RAID, the array is managed by the operating system. In hardware RAID, a dedicated controller with its own processor manages the array. There are various RAID levels.

RAID Levels

RAID 0 RAID 0 is based on data striping. A stream of data is divided into multiple segments or blocks, and each block is stored on a different disk. When the system wants to read that data, it can do so from all the disks simultaneously and join the blocks together to reconstruct the entire data stream (see the sketch below). The benefit is that speed increases drastically for read and write operations, making RAID 0 a good fit where performance is the priority. Also, the total capacity of the volume is the sum of the capacities of the individual disks. The downside, as you may have guessed, is that there is no redundancy: if one of the disks fails, the entire data set is lost, since it cannot be reconstructed.
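The following is a minimal sketch of the striping idea: blocks are dealt round-robin across the disks. The function name, block size, and sample data are illustrative, not from the slides.

```python
# Minimal sketch of RAID 0 striping: split a data stream into fixed-size
# blocks and deal them round-robin across the disks.
def stripe(data: bytes, num_disks: int, block_size: int):
    disks = [[] for _ in range(num_disks)]
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        disks[(i // block_size) % num_disks].append(block)
    return disks

disks = stripe(b"ABCDEFGHIJKL", num_disks=3, block_size=2)
# disks[0] == [b'AB', b'GH'], disks[1] == [b'CD', b'IJ'], disks[2] == [b'EF', b'KL']
# A read can fetch from all three disks in parallel and reassemble the stream.
```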

RAID 0
Advantages:
- Performance boost for read and write operations
- Space is not wasted: the entire volume of the individual disks is used to store unique data
Disadvantages:
- There is no redundancy/duplication of data; if one of the disks fails, the entire data is lost

RAID 1 RAID 1 uses the concept of data mirroring. Data is mirrored or cloned to an identical set of disks, so that if one of the disks fails, the other one can be used. It also improves read performance, since different blocks of data can be accessed from all the disks simultaneously: a multi-threaded process can access Block 1 from Disk 1 and Block 2 from Disk 2 at once, increasing read speed much as RAID 0 does. But unlike RAID 0, write performance is reduced, since all the drives must be updated whenever new data is written. Another disadvantage is that space is spent duplicating the data, which increases the cost per unit of storage.

RAID 1
Advantages:
- Data can be recovered in case of disk failure
- Increased performance for read operations
Disadvantages:
- Slow write performance
- Space is wasted by duplicating data, which increases the cost per unit of memory

RAID 4 RAID 4 stripes the data across multiple disks just like RAID 0. In addition, it stores parity information for all the disks on a separate, dedicated disk to achieve redundancy. In the usual four-disk example, Disk 4 serves as the parity disk, holding parity blocks Ap, Bp, Cp, and Dp. If one of the data disks fails, its contents can be reconstructed using the parity information (see the sketch below). Space is used more efficiently here than in RAID 1, since parity information takes far less space than mirroring a whole disk. Write performance suffers, however, because all parity updates go to a single disk, which becomes a bottleneck. This problem is solved in RAID 5.
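The parity block in RAID 4 (and RAID 5) is the bytewise XOR of the data blocks in a stripe, so any single missing block can be rebuilt by XOR-ing the surviving blocks with the parity. The sketch below demonstrates this with made-up block contents; the helper name is illustrative.

```python
# Sketch of RAID 4/5 parity: parity = XOR of all data blocks in a stripe,
# so one lost block can be recovered from the survivors plus the parity.
def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

a, b, c = b"\x0f\x0f", b"\xf0\x00", b"\x33\x55"   # data blocks of one stripe
parity = xor_blocks([a, b, c])                    # stored on the dedicated parity disk
rebuilt_b = xor_blocks([a, c, parity])            # suppose the disk holding b fails
assert rebuilt_b == b                             # the lost block is recovered exactly
```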

RAID 4
Advantages:
- Efficient data redundancy in terms of cost per unit of memory
- Performance boost for read operations due to data striping
Disadvantages:
- Write operations are slow
- If the dedicated parity disk fails, data redundancy is lost

RAID 5 RAID 5 is very similar to RAID 4, but the parity information is distributed over all the disks instead of being stored on a dedicated disk. This has two benefits: first, there is no longer a single-disk bottleneck, because the parity load is spread evenly across all the disks; second, no single disk failure can wipe out all redundancy, since no one disk stores all the parity information. A sketch of how parity rotates across the disks follows.
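The snippet below illustrates one common way the parity block's location can rotate from stripe to stripe; the specific rotation convention is an assumption for illustration, as real controllers use several layouts.

```python
# Sketch of RAID 5 parity rotation: each stripe places its parity block on a
# different disk, so no single disk absorbs all parity writes (contrast RAID 4).
def parity_disk_for_stripe(stripe: int, num_disks: int) -> int:
    # One possible left-rotating convention (illustrative, not the only layout).
    return (num_disks - 1 - stripe) % num_disks

for stripe in range(4):
    print(f"stripe {stripe}: parity on disk {parity_disk_for_stripe(stripe, 4)}")
# stripe 0: parity on disk 3
# stripe 1: parity on disk 2
# stripe 2: parity on disk 1
# stripe 3: parity on disk 0
```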

RAID 5
Advantages:
- All the advantages of RAID 4, plus increased write speed and better data redundancy
Disadvantages:
- Can only handle up to a single disk failure

RAID 6 RAID 6 uses double parity blocks to achieve better data redundancy than RAID 5. This increases the fault tolerance to up to two drive failures in the array. Each stripe has two parity blocks, which are stored on different disks across the array. RAID 6 is a very practical infrastructure for maintaining high-availability systems.

RAID 6
Advantages:
- Better data redundancy: can handle up to 2 failed drives
Disadvantages:
- Large parity overhead

RAID 10 (RAID 1+0) RAID 10 combines RAID 1 and RAID 0 by layering them: it is sometimes called "nested" or "hybrid" RAID. This is a best-of-both-worlds approach, because it has the fast performance of RAID 0 and the redundancy of RAID 1. In this setup, multiple RAID 1 mirrored pairs are striped together as in RAID 0. It is used in cases where very high disk performance (greater than RAID 5 or 6) is required along with redundancy.

RAID 10 (RAID 1+0)
Advantages:
- Very fast performance
- Redundancy and fault tolerance
Disadvantages:
- Cost per unit of memory is high, since data is mirrored
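To tie the levels together, the helper below sketches the usable-capacity trade-off each level makes, assuming n identical disks; the function and the RAID 1 interpretation (all disks hold copies of the same data) are illustrative assumptions, not from the slides.

```python
# Rough usable-capacity arithmetic for the RAID levels above, assuming
# n identical disks of disk_gb gigabytes each (illustrative helper).
def usable_capacity(level: str, n: int, disk_gb: float) -> float:
    if level == "RAID 0":   return n * disk_gb          # striping only, no redundancy
    if level == "RAID 1":   return disk_gb              # every disk is a copy of the same data
    if level == "RAID 4":   return (n - 1) * disk_gb    # one disk's worth of parity
    if level == "RAID 5":   return (n - 1) * disk_gb    # one disk's worth, distributed
    if level == "RAID 6":   return (n - 2) * disk_gb    # two disks' worth of parity
    if level == "RAID 10":  return (n // 2) * disk_gb   # mirrored pairs, then striped
    raise ValueError(level)

for lvl in ("RAID 0", "RAID 1", "RAID 5", "RAID 6", "RAID 10"):
    print(lvl, usable_capacity(lvl, n=4, disk_gb=1000), "GB usable out of 4000 GB raw")
```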

RAID Implementation and Support Many operating systems have built-in support for RAID. Understanding the RAID levels is crucial for developing storage infrastructure that meets the needs of the organization. RAID can protect against disk failures and provide fast performance. However, it does not provide any means of protecting against data corruption, nor does it implement security capabilities.

Thank You