Title: Exploring Computer Architecture: Unraveling the Foundations of Modern Computing
Introduction:
Computer architecture serves as the bedrock of modern computing, delineating the structural design principles and methodologies that underpin the functionality and performance of digital systems. This comprehensive exploration delves into the intricate realm of computer architecture, elucidating its fundamental concepts, historical evolution, and contemporary advancements. From the rudimentary architectures of early computing machines to the intricacies of modern multi-core processors, this book embarks on a journey to unravel the complexities of computer architecture and illuminate its pivotal role in shaping the digital landscape.
Chapter 1: Foundations of Computer Architecture
The inaugural chapter lays the groundwork by elucidating the foundational principles of computer architecture. It delves into digital logic, binary representation, and Boolean algebra, showing how these elemental concepts form the building blocks of computational systems. Furthermore, it examines the evolution of computing devices from mechanical calculators to electronic computers, tracing the lineage of computer architecture from its nascent stages to contemporary manifestations.
Chapter 2: Instruction Set Architecture
Central to the functionality of any computing system is its instruction set architecture (ISA). This chapter scrutinizes the intricacies of ISA design, exploring the diverse instruction formats, addressing modes, and instruction types prevalent in modern computing devices. Additionally, it delves into the role of compilers and assemblers in translating high-level programming constructs into machine-executable instructions, offering insights into the symbiotic relationship between software and hardware in the realm of computer architecture.
Chapter 3: Processor Architecture
At the heart of every computing system lies the processor, or central processing unit (CPU), which executes instructions and orchestrates the flow of data within the system. This chapter delves into the architectural intricacies of CPUs, examining the organization of functional units, pipelining techniques, and cache hierarchies employed to enhance performance and efficiency. Moreover, it explores the evolution of processor architectures from single-core designs to the parallelism and concurrency inherent in modern multi-core processors, shedding light on the challenges and opportunities associated with scaling computational resources.
Chapter 4: Memory Hierarchy and Storage Systems
Efficient management of memory resources is paramount to the performance and scalability of computing systems. This chapter elucidates the memory hierarchy, encompassing registers, caches, main memory, and secondary storage devices, and analyzes the trade-offs inherent in optimizing memory access latency, throughput, and capacity. Furthermore, it explores emerging mem
Computer Organisation And Architecture
Lecture Notes
Online/Blended Teaching Mode | Level 1, Semester 2 | 10 Weeks | 20 Lecture Contact Hours
Introductory Concepts-
• Memory Hierarchy (Page 3-5)
• Memory Organization | Simultaneous Vs Hierarchical (Page 6-16)
Cache Memory-
• Cache Memory | Multi-level Cache Organization (Page 17-21)
• Cache Mapping Techniques (Page 22-29)
• Direct Mapping | Implementation & Formulas (Page 30-34)
• Problems On Direct Mapping (Page 35-49)
• Problems On Fully Associative Mapping (Page 50-56)
• K-way Set Associative Mapping | Implementation & Formulas (Page 57-61)
• Problems On Set Associative Mapping (Page 62-75)
• Misc Problems On Cache Mapping Techniques (Page 76-100)
• Cache Line | Effects of Changing Cache Line Size (Page 101-109)
Magnetic Disk-
• Magnetic Disk | Important Formulas (Page 110-120)
• Practice Problems On Disk Formulas (Page 121-137)
Addressing Modes-
• Addressing Modes | Types and their Applications (Page 138-151)
• Syntax of Addressing Modes (Page 152-156)
• Problems On Addressing Modes (Page 157-166)
System Bus-
• System Bus | Components of System Bus (Page 167-174)
Pipelining-
• Introduction to Pipelining | Pipelined Architecture (Page 174-179)
• Pipeline Execution | Performance | Formulas (Page 180-185)
• Problems On Pipelining (Page 186-205)
Memory Hierarchy-
• Memory hierarchy is the hierarchy of memory and storage devices found in a computer system.
• It ranges from the slowest but high-capacity auxiliary memory to the fastest but low-capacity cache memory.
Need- There is a trade-off among the three key characteristics of memory, namely-
• Cost
• Capacity
• Access time
Memory hierarchy is employed to balance this trade-off.
Memory Hierarchy Diagram-
Level-0:
• At level-0, registers are present, which are contained inside the CPU.
• Since they are present inside the CPU, they have the least access time.
• They are the most expensive and therefore smallest in size (in KB).
• Registers are implemented using flip-flops.
Level-1:
• At level-1, cache memory is present.
• It stores the segments of a program that are frequently accessed by the processor.
• It is expensive and therefore smaller in size (in MB).
• Cache memory is implemented using static RAM.
Level-2:
• At level-2, main memory is present.
• It can communicate directly with the CPU and with auxiliary memory devices through an I/O processor.
• It is less expensive than cache memory and therefore larger in size (in a few GB).
• Main memory is implemented using dynamic RAM.
Level-3:
• At level-3, secondary storage devices like magnetic disk are present.
• They are used as backup storage.
• They are cheaper than main memory and therefore much larger in size (in a few TB).
Level-4:
• At level-4, tertiary storage devices like magnetic tape are present.
• They are used to store removable files.
• They are the cheapest and largest in size (1-20 TB).
Observations- The following observations can be made when going down the memory hierarchy-
• Cost / bit decreases
• Frequency of access decreases
• Capacity increases
• Access time increases
Goals of Memory Hierarchy- The goals of the memory hierarchy are-
• To obtain the highest possible average access speed
• To minimize the total cost of the entire memory system
To gain better understanding about Memory Hierarchy-
Watch Video1 Self-Study Learning
Memory Organization in Computer Architecture- In a computer,
• Memory is organized at different levels.
• The CPU may try to access different levels of memory in different ways.
• On this basis, memory organization is broadly divided into two types-
1. Simultaneous Access Memory Organization- In this memory organization,
• All the levels of memory are directly connected to the CPU.
• Whenever the CPU requires any word, it starts searching for it in all the levels simultaneously.
Example-01: Consider the following simultaneous access memory organization-
Here, two levels of memory are directly connected to the CPU.
Let L1 be the cache memory-
• T1 = Access time of level L1
• S1 = Size of level L1
• C1 = Cost per byte of level L1
• H1 = Hit rate of level L1
Similar are the notations for level L2, the main memory-
• T2 = Access time of level L2
• S2 = Size of level L2
• C2 = Cost per byte of level L2
• H2 = Hit rate of level L2
Average Memory Access Time- Average time required to access memory per operation
= H1 x T1 + (1 – H1) x H2 x T2
= H1 x T1 + (1 – H1) x 1 x T2
= H1 x T1 + (1 – H1) x T2
Important Note
In any memory organization,
• The data item being searched will definitely be present in the last level.
• Thus, the hit rate for the last level is always 1.
Average Cost Per Byte- Average cost per byte of the memory
= { C1 x S1 + C2 x S2 } / { S1 + S2 }
Example-02: Consider the following simultaneous access memory organization-
Here, three levels of memory are directly connected to the CPU.
Let-
• T1 = Access time of level L1
• S1 = Size of level L1
• C1 = Cost per byte of level L1
• H1 = Hit rate of level L1
Similar are the notations for the other two levels.
Average Memory Access Time- Average time required to access memory per operation
= H1 x T1 + (1 – H1) x H2 x T2 + (1 – H1) x (1 – H2) x H3 x T3
= H1 x T1 + (1 – H1) x H2 x T2 + (1 – H1) x (1 – H2) x 1 x T3
= H1 x T1 + (1 – H1) x H2 x T2 + (1 – H1) x (1 – H2) x T3
Average Cost Per Byte- Average cost per byte of the memory
= { C1 x S1 + C2 x S2 + C3 x S3 } / { S1 + S2 + S3 }
2. Hierarchical Access Memory Organization- In this memory organization, memory levels are organized as-
• Level-1 is directly connected to the CPU.
• Level-2 is directly connected to level-1.
• Level-3 is directly connected to level-2, and so on.
Whenever the CPU requires any word,
• It first searches for the word in level-1.
• If the required word is not found in level-1, it searches for the word in level-2.
• If the required word is not found in level-2, it searches for the word in level-3, and so on.
Example-01: Consider the following hierarchical access memory organization-
Here, two levels of memory are connected to the CPU in a hierarchical fashion.
Let-
• T1 = Access time of level L1
• S1 = Size of level L1
• C1 = Cost per byte of level L1
• H1 = Hit rate of level L1
Similar are the notations for level L2.
Average Memory Access Time- Average time required to access memory per operation
= H1 x T1 + (1 – H1) x H2 x (T1 + T2)
= H1 x T1 + (1 – H1) x 1 x (T1 + T2)
= H1 x T1 + (1 – H1) x (T1 + T2)
Average Cost Per Byte- Average cost per byte of the memory
= { C1 x S1 + C2 x S2 } / { S1 + S2 }
Example-02: Consider the following hierarchical access memory organization-
Here, three levels of memory are connected to the CPU in a hierarchical fashion.
Let-
• T1 = Access time of level L1
• S1 = Size of level L1
• C1 = Cost per byte of level L1
• H1 = Hit rate of level L1
Similar are the notations for the other two levels.
Average Memory Access Time- Average time required to access memory per operation
= H1 x T1 + (1 – H1) x H2 x (T1 + T2) + (1 – H1) x (1 – H2) x H3 x (T1 + T2 + T3)
= H1 x T1 + (1 – H1) x H2 x (T1 + T2) + (1 – H1) x (1 – H2) x 1 x (T1 + T2 + T3)
= H1 x T1 + (1 – H1) x H2 x (T1 + T2) + (1 – H1) x (1 – H2) x (T1 + T2 + T3)
Average Cost Per Byte- Average cost per byte of the memory
= { C1 x S1 + C2 x S2 + C3 x S3 } / { S1 + S2 + S3 }
PRACTICE PROBLEMS BASED ON MEMORY ORGANIZATION-
Problem-01: What is the average memory access time for a machine with a cache hit rate of 80%, a cache access time of 5 ns, and a main memory access time of 100 ns when-
1. Simultaneous access memory organization is used.
2. Hierarchical access memory organization is used.
Solution- Part-01: Simultaneous Access Memory Organization- The memory organization will be as shown-
Average memory access time
= H1 x T1 + (1 – H1) x H2 x T2
= 0.8 x 5 ns + (1 – 0.8) x 1 x 100 ns
= 4 ns + 0.2 x 100 ns
= 4 ns + 20 ns
= 24 ns
Part-02: Hierarchical Access Memory Organization- The memory organization will be as shown-
Average memory access time
= H1 x T1 + (1 – H1) x H2 x (T1 + T2)
= 0.8 x 5 ns + (1 – 0.8) x 1 x (5 ns + 100 ns)
= 4 ns + 0.2 x 105 ns
= 4 ns + 21 ns
= 25 ns
Problem-02: A computer has a cache, main memory and a disk used for virtual memory. An access to the cache takes 10 ns. An access to main memory takes 100 ns. An access to the disk takes 10,000 ns. Suppose the cache hit ratio is 0.9 and the main memory hit ratio is 0.8. The effective access time required to access a referenced word on the system is _______ when-
1. Simultaneous access memory organization is used.
2. Hierarchical access memory organization is used.
Solution- Part-01: Simultaneous Access Memory Organization- The memory organization will be as shown-
Effective memory access time
= H1 x T1 + (1 – H1) x H2 x T2 + (1 – H1) x (1 – H2) x H3 x T3
= 0.9 x 10 ns + (1 – 0.9) x 0.8 x 100 ns + (1 – 0.9) x (1 – 0.8) x 1 x 10000 ns
= 9 ns + 8 ns + 200 ns
= 217 ns
Part-02: Hierarchical Access Memory Organization- The memory organization will be as shown-
Effective memory access time
= H1 x T1 + (1 – H1) x H2 x (T1 + T2) + (1 – H1) x (1 – H2) x H3 x (T1 + T2 + T3)
= 0.9 x 10 ns + (1 – 0.9) x 0.8 x (10 ns + 100 ns) + (1 – 0.9) x (1 – 0.8) x 1 x (10 ns + 100 ns + 10000 ns)
= 9 ns + 8.8 ns + 202.2 ns
= 220 ns
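The two access-time formulas can be checked numerically. Below is a short Python sketch (the function names are our own, for illustration only); it reproduces the answers to Problem-01 and Problem-02:

```python
def amat_simultaneous(times, hit_rates):
    """Average access time when every level is probed directly by the CPU.
    The hit rate of the last level is forced to 1 (the word is always there)."""
    rates = hit_rates[:-1] + [1.0]
    miss, total = 1.0, 0.0
    for t, h in zip(times, rates):
        total += miss * h * t      # reach this level only after missing all earlier ones
        miss *= (1 - h)
    return total

def amat_hierarchical(times, hit_rates):
    """Average access time when a miss at level i falls through to level i+1,
    paying the access times of every level visited so far."""
    rates = hit_rates[:-1] + [1.0]
    miss, total, cumulative = 1.0, 0.0, 0.0
    for t, h in zip(times, rates):
        cumulative += t            # T1, then T1 + T2, then T1 + T2 + T3, ...
        total += miss * h * cumulative
        miss *= (1 - h)
    return total

# Problem-01: cache 5 ns (hit rate 0.8), main memory 100 ns
print(round(amat_simultaneous([5, 100], [0.8, 1.0]), 2))   # 24.0
print(round(amat_hierarchical([5, 100], [0.8, 1.0]), 2))   # 25.0

# Problem-02: cache 10 ns (0.9), main memory 100 ns (0.8), disk 10000 ns
print(round(amat_simultaneous([10, 100, 10000], [0.9, 0.8, 1.0]), 2))  # 217.0
print(round(amat_hierarchical([10, 100, 10000], [0.9, 0.8, 1.0]), 2))  # 220.0
```

Note how the only difference between the two organizations is whether the access times accumulate level by level.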
Important Note
While solving numerical problems, if the kind of memory organization used is not mentioned, assume hierarchical access memory organization.
To gain better understanding about Memory Organization,
Watch Video2 Self-Study Learning
Cache Memory-
• Cache memory is a random access memory.
• The main advantage of cache memory is its very fast speed.
• It can be accessed by the CPU at a much faster speed than main memory.
Location-
• Cache memory lies on the path between the CPU and the main memory.
• It facilitates the transfer of data between the processor and the main memory at a speed that matches the speed of the processor.
• Data is transferred in the form of words between the cache memory and the CPU.
• Data is transferred in the form of blocks or pages between the cache memory and the main memory.
Purpose-
• The fast speed of the cache memory makes it extremely useful.
• It is used for bridging the speed mismatch between the fast CPU and the slower main memory.
• It does not let the CPU performance suffer due to the slower speed of the main memory.
Execution Of Program-
• Whenever any program has to be executed, it is first loaded in the main memory.
• The portion of the program that is most probably going to be executed in the near future is kept in the cache memory.
• This allows the CPU to access the most probable portion at a faster speed.
Step-01: Whenever the CPU requires any word of memory, it is first searched in the CPU registers. Now, there are two cases possible-
Case-01: If the required word is found in the CPU registers, it is read from there.
Case-02: If the required word is not found in the CPU registers, Step-02 is followed.
Step-02:
• When the required word is not found in the CPU registers, it is searched in the cache memory.
• The tag directory of the cache memory is used to check whether the required word is present in the cache memory or not.
Now, there are two cases possible-
Case-01:
• If the required word is found in the cache memory, the word is delivered to the CPU.
• This is known as a cache hit.
Case-02:
• If the required word is not found in the cache memory, Step-03 is followed.
• This is known as a cache miss.
Step-03:
• When the required word is not found in the cache memory, it is searched in the main memory.
• The page table is used to determine whether the required page is present in the main memory or not.
Now, there are two cases possible-
Case-01: If the page containing the required word is found in the main memory,
• The page is mapped from the main memory to the cache memory.
• This mapping is performed using cache mapping techniques.
• Then, the required word is delivered to the CPU.
Case-02: If the page containing the required word is not found in the main memory,
• A page fault occurs.
• The page containing the required word is mapped from the secondary memory to the main memory.
• Then, the page is mapped from the main memory to the cache memory.
• Then, the required word is delivered to the CPU.
Multilevel Cache Organization-
• A multilevel cache organization is an organization where cache memories of different sizes are organized at multiple levels to increase the processing speed to a greater extent.
• The smaller the size of the cache, the faster its speed.
• The smallest cache memory is placed closest to the CPU.
• This helps to achieve better performance in terms of speed.
Example- A three-level cache organization consists of three cache memories of different sizes organized at three different levels, as shown below-
Size (L1 Cache) < Size (L2 Cache) < Size (L3 Cache) < Size (Main Memory)
To gain better understanding about Cache Memory,
Watch Video3 Lecture Self Study Learning
Cache Memory- We have discussed-
Cache memory bridges the speed mismatch between the processor and the main memory.
When a cache hit occurs,
• The required word is present in the cache memory.
• The required word is delivered to the CPU from the cache memory.
When a cache miss occurs,
• The required word is not present in the cache memory.
• The page containing the required word has to be mapped from the main memory.
• This mapping is performed using cache mapping techniques.
Next, we will discuss different cache mapping techniques.
Cache Mapping-
• Cache mapping defines how a block from the main memory is mapped to the cache memory in case of a cache miss.
OR
• Cache mapping is a technique by which the contents of main memory are brought into the cache memory.
The following diagram illustrates the mapping process-
Now, before proceeding further, it is important to note the following points-
NOTES
• Main memory is divided into equal size partitions called blocks or frames.
• Cache memory is divided into partitions having the same size as that of blocks, called lines.
• During cache mapping, a block of main memory is simply copied to the cache; the block is not actually removed from the main memory.
Cache Mapping Techniques- Cache mapping is performed using the following three different techniques-
1. Direct Mapping
2. Fully Associative Mapping
3. K-way Set Associative Mapping
1. Direct Mapping- In direct mapping,
• A particular block of main memory can map only to a particular line of the cache.
• The line number of the cache to which a particular block can map is given by-
Cache line number = ( Main Memory Block Address ) Modulo (Number of lines in Cache)
Example-
• Consider that the cache memory is divided into 'n' number of lines.
• Then, block 'j' of main memory can map to line number (j mod n) only of the cache.
Need of Replacement Algorithm- In direct mapping,
• There is no need of any replacement algorithm.
• This is because a main memory block can map only to a particular line of the cache.
• Thus, the new incoming block will always replace the existing block (if any) in that particular line.
Division of Physical Address- In direct mapping, the physical address is divided as-
Tag | Line Number | Block / Line Offset
2. Fully Associative Mapping- In fully associative mapping,
• A block of main memory can map to any line of the cache that is freely available at that moment.
• This makes fully associative mapping more flexible than direct mapping.
Example- Consider the following scenario-
Here,
• All the lines of cache are freely available.
• Thus, any block of main memory can map to any line of the cache.
• Had all the cache lines been occupied, then one of the existing blocks would have to be replaced.
Need of Replacement Algorithm- In fully associative mapping,
• A replacement algorithm is required.
• The replacement algorithm suggests the block to be replaced if all the cache lines are occupied.
• Thus, a replacement algorithm like the FCFS algorithm, LRU algorithm, etc. is employed.
Division of Physical Address- In fully associative mapping, the physical address is divided as-
Tag | Block / Line Offset
3. K-way Set Associative Mapping- In k-way set associative mapping,
• Cache lines are grouped into sets, where each set contains k number of lines.
• A particular block of main memory can map to only one particular set of the cache.
• However, within that set, the memory block can map to any cache line that is freely available.
• The set of the cache to which a particular block of the main memory can map is given by-
Cache set number = ( Main Memory Block Address ) Modulo (Number of sets in Cache)
Example- Consider the following example of 2-way set associative mapping-
Here,
• k = 2 (because it is 2-way set associative mapping), which means each set contains two cache lines.
• The cache contains 6 lines, so the number of sets in the cache = 6 / 2 = 3 sets.
• Block 'j' of main memory can map to set number (j mod 3) only of the cache.
• Within that set, block 'j' can map to any cache line that is freely available at that moment.
• If all the cache lines are occupied, then one of the existing blocks will have to be replaced.
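The placement rules of all three techniques can be compared with a small Python sketch (a toy model of our own, not part of the syllabus). Setting k = 1 gives direct mapping and k = number of lines gives fully associative mapping:

```python
def candidate_lines(block, num_lines, k):
    """Return the cache lines that main memory block 'block' may occupy
    in a k-way set associative cache with num_lines lines.
    k = 1 -> direct mapping; k = num_lines -> fully associative mapping."""
    num_sets = num_lines // k
    s = block % num_sets              # cache set number = block mod number of sets
    return list(range(s * k, s * k + k))

# A 6-line cache, main memory block j = 7:
print(candidate_lines(7, 6, 1))   # direct: 7 mod 6 = 1 -> [1]
print(candidate_lines(7, 6, 2))   # 2-way: set 7 mod 3 = 1 -> lines [2, 3]
print(candidate_lines(7, 6, 6))   # fully associative -> [0, 1, 2, 3, 4, 5]
```

The last two calls also illustrate the special cases noted below: 1-way set associative mapping behaves exactly like direct mapping, and k equal to the total number of lines behaves like fully associative mapping.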
Need of Replacement Algorithm-
• Set associative mapping is a combination of direct mapping and fully associative mapping.
• It uses fully associative mapping within each set.
• Thus, set associative mapping requires a replacement algorithm.
Division of Physical Address- In set associative mapping, the physical address is divided as-
Tag | Set Number | Block / Line Offset
Special Cases-
• If k = 1, then k-way set associative mapping becomes direct mapping i.e.
1-way Set Associative Mapping ≡ Direct Mapping
• If k = Total number of lines in the cache, then k-way set associative mapping becomes fully associative mapping.
Cache Mapping-
Cache mapping is a technique by which the contents of main memory are brought into the cache memory.
The different cache mapping techniques are-
1. Direct Mapping
2. Fully Associative Mapping
3. K-way Set Associative Mapping
Next, we will discuss direct mapping in detail.
Direct Mapping- In direct mapping,
• A particular block of main memory can map to only one particular line of the cache.
• The line number of the cache to which a particular block can map is given by-
Cache line number = ( Main Memory Block Address ) Modulo (Number of lines in Cache)
Division of Physical Address- In direct mapping, the physical address is divided as-
Tag | Line Number | Block / Line Offset
Direct Mapped Cache-
A direct mapped cache employs the direct cache mapping technique.
The following steps explain the working of a direct mapped cache- After the CPU generates a memory request,
• The line number field of the address is used to access the particular line of the cache.
• The tag field of the CPU address is then compared with the tag of the line.
• If the two tags match, a cache hit occurs and the desired word is found in the cache.
• If the two tags do not match, a cache miss occurs.
• In case of a cache miss, the required word has to be brought from the main memory.
• It is then stored in the cache together with the new tag, replacing the previous one.
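The lookup steps above can be sketched as a simplified software model (a Python class of our own for illustration; real caches implement this in hardware):

```python
class DirectMappedCache:
    """Toy direct mapped cache: each line holds (valid, tag, block data)."""

    def __init__(self, num_lines, block_size):
        self.num_lines = num_lines
        self.block_size = block_size
        self.lines = [(False, None, None)] * num_lines

    def split(self, address):
        """Split a physical address into its (tag, line number, offset) fields."""
        offset = address % self.block_size
        block = address // self.block_size
        line = block % self.num_lines        # line number field selects the line
        tag = block // self.num_lines        # tag field identifies the block
        return tag, line, offset

    def access(self, address):
        """Return True on a cache hit, False on a miss (filling the line)."""
        tag, line, _ = self.split(address)
        valid, stored_tag, _ = self.lines[line]
        if valid and stored_tag == tag:
            return True                      # tags match: cache hit
        # tags differ or line empty: miss; fetch the block, replace the tag
        self.lines[line] = (True, tag, f"block {address // self.block_size}")
        return False

cache = DirectMappedCache(num_lines=4, block_size=16)
print(cache.access(0x40))   # first touch of the block: miss -> False
print(cache.access(0x44))   # same block, tags now match: hit -> True
```

The miss-then-hit pair at the end mirrors the hit/miss cases listed above.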
Implementation- The following diagram shows the implementation of a direct mapped cache-
(For simplicity, this diagram does not show all the lines of the multiplexers)
The steps involved are as follows-
Step-01:
• Each multiplexer reads the line number from the generated physical address using its select lines in parallel.
• To read a line number of L bits, the number of select lines each multiplexer must have = L.
Step-02:
• After reading the line number, each multiplexer goes to the corresponding line in the cache memory using its input lines in parallel.
• Number of input lines each multiplexer must have = Number of lines in the cache memory
Step-03:
• Each multiplexer outputs the tag bit it has selected from that line to the comparator using its output line.
• Number of output lines in each multiplexer = 1.
UNDERSTAND
It is important to understand-
• A multiplexer can output only a single bit on its output line.
• So, to output the complete tag to the comparator,
Number of multiplexers required = Number of bits in the tag
• Each multiplexer is configured to read the tag bit at a specific location.
Example-
• The 1st multiplexer is configured to output the first bit of the tag.
• The 2nd multiplexer is configured to output the second bit of the tag.
• The 3rd multiplexer is configured to output the third bit of the tag, and so on.
So,
• Each multiplexer selects the tag bit of the selected line for which it has been configured and outputs it on the output line.
• The complete tag as a whole is sent to the comparator for comparison in parallel.
Step-04:
• The comparator compares the tag coming from the multiplexers with the tag of the generated address.
• Only one comparator is required for the comparison, where-
Size of comparator = Number of bits in the tag
• If the two tags match, a cache hit occurs; otherwise, a cache miss occurs.
Hit latency- The time taken to find out whether the required word is present in the cache memory or not is called the hit latency. For a direct mapped cache,
Hit latency = Multiplexer latency + Comparator latency
Important Results- Following are a few important results for a direct mapped cache-
• Block j of main memory can map to line number (j mod number of lines in cache) only of the cache.
• Number of multiplexers required = Number of bits in the tag
• Size of each multiplexer = Number of lines in cache x 1
• Number of comparators required = 1
• Size of comparator = Number of bits in the tag
• Hit latency = Multiplexer latency + Comparator latency
To gain better understanding about direct mapping,
Watch Video4 Lecture Self Study Learning
Direct Mapping - Practice Problems
In direct mapping,
• A particular block of main memory can be mapped to one particular cache line only.
• Block 'j' of main memory will map to line number (j mod number of cache lines) of the cache.
• There is no need of any replacement algorithm.
Next, we will discuss practice problems based on direct mapping.
PRACTICE PROBLEMS BASED ON DIRECT MAPPING-
Problem-01: Consider a direct mapped cache of size 16 KB with block size 256 bytes. The size of main memory is 128 KB. Find-
1. Number of bits in tag
2. Tag directory size
Solution- Given-
• Cache memory size = 16 KB
• Block size = Frame size = Line size = 256 bytes
• Main memory size = 128 KB
We consider that the memory is byte addressable.
Number of Bits in Physical Address- We have,
Size of main memory = 128 KB = 2^17 bytes
Thus, Number of bits in physical address = 17 bits
Number of Bits in Block Offset- We have,
Block size = 256 bytes = 2^8 bytes
Thus, Number of bits in block offset = 8 bits
Number of Bits in Line Number- Total number of lines in cache
= Cache size / Line size
= 16 KB / 256 bytes
= 2^14 bytes / 2^8 bytes
= 2^6 lines
Thus, Number of bits in line number = 6 bits
Number of Bits in Tag- Number of bits in tag
= Number of bits in physical address – (Number of bits in line number + Number of bits in block offset)
= 17 bits – (6 bits + 8 bits)
= 17 bits – 14 bits
= 3 bits
Thus, Number of bits in tag = 3 bits
Tag Directory Size- Tag directory size
= Number of tags x Tag size
= Number of lines in cache x Number of bits in tag
= 2^6 x 3 bits
= 192 bits
= 24 bytes
Thus, size of tag directory = 24 bytes
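The same bit-field arithmetic can be packaged as a small Python helper (our own illustration; it assumes a byte-addressable memory and power-of-two sizes, as in these problems):

```python
import math

def direct_mapped_params(cache_size, block_size, main_memory_size):
    """Bit-field breakdown for a byte-addressable direct mapped cache.
    All sizes in bytes. Returns (offset_bits, line_bits, tag_bits,
    tag_directory_bits)."""
    address_bits = int(math.log2(main_memory_size))
    offset_bits = int(math.log2(block_size))
    num_lines = cache_size // block_size
    line_bits = int(math.log2(num_lines))
    tag_bits = address_bits - line_bits - offset_bits
    tag_directory_bits = num_lines * tag_bits     # one tag per cache line
    return offset_bits, line_bits, tag_bits, tag_directory_bits

# Problem-01: 16 KB cache, 256-byte blocks, 128 KB main memory
print(direct_mapped_params(16 * 1024, 256, 128 * 1024))
# (8, 6, 3, 192): 8 offset bits, 6 line bits, 3 tag bits, 192-bit tag directory
```

The same helper reproduces Problems 04 and 05 below by changing the arguments.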
Problem-02: Consider a direct mapped cache of size 512 KB with block size 1 KB. There are 7 bits in the tag. Find the size of main memory and the tag directory size.
Solution- Given-
• Cache memory size = 512 KB
• Block size = Frame size = Line size = 1 KB
• Number of bits in tag = 7 bits
We consider that the memory is byte addressable.
Number of Bits in Block Offset- We have,
Block size = 1 KB = 2^10 bytes
Thus, Number of bits in block offset = 10 bits
Number of Bits in Line Number- Total number of lines in cache
= Cache size / Line size
= 512 KB / 1 KB
= 2^9 lines
Thus, Number of bits in line number = 9 bits
Number of Bits in Physical Address- Number of bits in physical address
= Number of bits in tag + Number of bits in line number + Number of bits in block offset
= 7 bits + 9 bits + 10 bits
= 26 bits
Thus, Number of bits in physical address = 26 bits
Size of Main Memory- We have,
Number of bits in physical address = 26 bits
Thus, Size of main memory = 2^26 bytes = 64 MB
Tag Directory Size- Tag directory size
= Number of tags x Tag size
= Number of lines in cache x Number of bits in tag
= 2^9 x 7 bits
= 3584 bits
= 448 bytes
Thus, size of tag directory = 448 bytes
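Working backwards from the tag width, as this problem does, can also be sketched in Python (a hypothetical helper of our own, for illustration):

```python
import math

def main_memory_from_tag(cache_size, block_size, tag_bits):
    """Given a byte-addressable direct mapped cache and its tag width,
    recover the physical address width and the main memory size in bytes."""
    offset_bits = int(math.log2(block_size))
    line_bits = int(math.log2(cache_size // block_size))
    address_bits = tag_bits + line_bits + offset_bits
    return address_bits, 2 ** address_bits

# Problem-02: 512 KB cache, 1 KB blocks, 7 tag bits
bits, size = main_memory_from_tag(512 * 1024, 1024, 7)
print(bits, size // (1024 * 1024))   # 26 bits, 64 MB
```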
Problem-03: Consider a direct mapped cache with block size 4 KB. The size of main memory is 16 GB and there are 10 bits in the tag. Find-
1. Size of cache memory
2. Tag directory size
Solution- Given-
• Block size = Frame size = Line size = 4 KB
• Size of main memory = 16 GB
• Number of bits in tag = 10 bits
We consider that the memory is byte addressable.
Number of Bits in Physical Address- We have,
Size of main memory = 16 GB = 2^34 bytes
Thus, Number of bits in physical address = 34 bits
Number of Bits in Block Offset- We have,
Block size = 4 KB = 2^12 bytes
Thus, Number of bits in block offset = 12 bits
Number of Bits in Line Number- Number of bits in line number
= Number of bits in physical address – (Number of bits in tag + Number of bits in block offset)
= 34 bits – (10 bits + 12 bits)
= 34 bits – 22 bits
= 12 bits
Thus, Number of bits in line number = 12 bits
Number of Lines in Cache- We have,
Number of bits in line number = 12 bits
Thus, Total number of lines in cache = 2^12 lines
Size of Cache Memory- Size of cache memory
= Total number of lines in cache x Line size
= 2^12 x 4 KB
= 2^14 KB
= 16 MB
Thus, Size of cache memory = 16 MB
Tag Directory Size- Tag directory size
= Number of tags x Tag size
= Number of lines in cache x Number of bits in tag
= 2^12 x 10 bits
= 40960 bits
= 5120 bytes
Thus, size of tag directory = 5120 bytes
Problem-04: Consider a direct mapped cache of size 32 KB with block size 32 bytes. The CPU generates 32-bit addresses. The number of bits needed for cache indexing and the number of tag bits are respectively-
A. 10, 17
B. 10, 22
C. 15, 17
D. 5, 17
Solution- Given-
• Cache memory size = 32 KB
• Block size = Frame size = Line size = 32 bytes
• Number of bits in physical address = 32 bits
Number of Bits in Block Offset- We have,
Block size = 32 bytes = 2^5 bytes
Thus, Number of bits in block offset = 5 bits
Number of Bits in Line Number- Total number of lines in cache
= Cache size / Line size
= 32 KB / 32 bytes
= 2^10 lines
Thus, Number of bits in line number = 10 bits
Number of Bits Required For Cache Indexing- Number of bits required for cache indexing
= Number of bits in line number
= 10 bits
Number of Bits in Tag- Number of bits in tag
= Number of bits in physical address – (Number of bits in line number + Number of bits in block offset)
= 32 bits – (10 bits + 5 bits)
= 32 bits – 15 bits
= 17 bits
Thus, Number of bits in tag = 17 bits
Thus, Option (A) is correct.
Problem-05: Consider a machine with a byte addressable main memory of 2^32 bytes divided into blocks of size 32 bytes. Assume that a direct mapped cache having 512 cache lines is used with this machine. The size of the tag field in bits is ______.
Solution- Given-
• Main memory size = 2^32 bytes
• Block size = Frame size = Line size = 32 bytes
• Number of lines in cache = 512 lines
Number of Bits in Physical Address- We have,
Size of main memory = 2^32 bytes
Thus, Number of bits in physical address = 32 bits
Number of Bits in Block Offset- We have,
Block size = 32 bytes = 2^5 bytes
Thus, Number of bits in block offset = 5 bits
Number of Bits in Line Number- Total number of lines in cache = 512 lines = 2^9 lines
Thus, Number of bits in line number = 9 bits
Number of Bits in Tag- Number of bits in tag
= Number of bits in physical address – (Number of bits in line number + Number of bits in block offset)
= 32 bits – (9 bits + 5 bits)
= 32 bits – 14 bits
= 18 bits
Thus, Number of bits in tag = 18 bits
Problem-06: An 8 KB direct-mapped write back cache is organized as multiple blocks, each of size 32 bytes. The processor generates 32 bit addresses. The cache controller maintains the tag information for each cache block comprising of the following-
• 1 valid bit
• 1 modified bit
• As many bits as the minimum needed to identify the memory block mapped in the cache
What is the total size of memory needed at the cache controller to store meta data (tags) for the cache?
A. 4864 bits
B. 6144 bits
C. 6656 bits
D. 5376 bits

Solution- Given-
• Cache memory size = 8 KB
• Block size = Frame size = Line size = 32 bytes
• Number of bits in physical address = 32 bits
Number of Bits in Block Offset-
We have, Block size = 32 bytes = 2^5 bytes
Thus, Number of bits in block offset = 5 bits

Number of Bits in Line Number-
Total number of lines in cache
= Cache memory size / Line size
= 8 KB / 32 bytes
= 2^13 bytes / 2^5 bytes
= 2^8 lines
Thus, Number of bits in line number = 8 bits

Number of Bits in Tag-
Number of bits in tag
= Number of bits in physical address – (Number of bits in line number + Number of bits in block offset)
= 32 bits – (8 bits + 5 bits)
= 32 bits – 13 bits
= 19 bits
Thus, Number of bits in tag = 19 bits

Memory Size Needed At Cache Controller-
Size of memory needed at cache controller
= Number of lines in cache x (1 valid bit + 1 modified bit + 19 bits to identify block)
= 2^8 x 21 bits
= 5376 bits
Thus, Option (D) is correct. To watch video solutions and practice more problems,
Watch Video5 Lecture Self Study Learning
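The controller metadata calculation from Problem-06 can be sketched as follows (the helper name is ours, and power-of-two sizes are assumed):

```python
def controller_metadata_bits(cache_bytes, block_bytes, addr_bits,
                             valid_bits=1, modified_bits=1):
    """Total tag-store size for a direct mapped cache:
    one entry per line, each holding the tag plus status bits."""
    num_lines   = cache_bytes // block_bytes
    offset_bits = (block_bytes - 1).bit_length()  # log2 for powers of two
    line_bits   = (num_lines - 1).bit_length()
    tag_bits    = addr_bits - line_bits - offset_bits
    return num_lines * (tag_bits + valid_bits + modified_bits)

# Problem-06: 8 KB cache, 32 B blocks, 32 bit addresses
print(controller_metadata_bits(8 * 1024, 32, 32))  # 5376
```

Note the metadata cost scales with the number of lines, which is why larger blocks (fewer lines) shrink the tag store.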
Fully Associative Mapping- In fully associative mapping,
• A block of main memory can be mapped to any freely available cache line.
• This makes fully associative mapping more flexible than direct mapping.
• A replacement algorithm is needed to replace a block if the cache is full.
Next, we will discuss practice problems based on fully associative mapping.
PRACTICE PROBLEMS BASED ON FULLY ASSOCIATIVE MAPPING- Problem-01: Consider a fully associative mapped cache of size 16 KB with block size 256 bytes. The size of main memory is 128 KB. Find-
1. Number of bits in tag
2. Tag directory size

Solution- Given-
• Cache memory size = 16 KB
• Block size = Frame size = Line size = 256 bytes
• Main memory size = 128 KB
We consider that the memory is byte addressable.

Number of Bits in Physical Address-
We have, Size of main memory = 128 KB = 2^17 bytes
Thus, Number of bits in physical address = 17 bits

Number of Bits in Block Offset-
We have, Block size = 256 bytes = 2^8 bytes
Thus, Number of bits in block offset = 8 bits

Number of Bits in Tag-
Number of bits in tag
= Number of bits in physical address – Number of bits in block offset
= 17 bits – 8 bits
= 9 bits
Thus, Number of bits in tag = 9 bits
Number of Lines in Cache-
Total number of lines in cache
= Cache size / Line size
= 16 KB / 256 bytes
= 2^14 bytes / 2^8 bytes
= 2^6 lines

Tag Directory Size-
Tag directory size
= Number of tags x Tag size
= Number of lines in cache x Number of bits in tag
= 2^6 x 9 bits
= 576 bits
= 72 bytes
Thus, size of tag directory = 72 bytes

Problem-02: Consider a fully associative mapped cache of size 512 KB with block size 1 KB. There are 17 bits in the tag. Find-
1. Size of main memory
2. Tag directory size
Solution- Given-
• Cache memory size = 512 KB
• Block size = Frame size = Line size = 1 KB
• Number of bits in tag = 17 bits
We consider that the memory is byte addressable.

Number of Bits in Block Offset-
We have, Block size = 1 KB = 2^10 bytes
Thus, Number of bits in block offset = 10 bits

Number of Bits in Physical Address-
Number of bits in physical address
= Number of bits in tag + Number of bits in block offset
= 17 bits + 10 bits
= 27 bits
Thus, Number of bits in physical address = 27 bits
Size of Main Memory-
We have, Number of bits in physical address = 27 bits
Thus, Size of main memory = 2^27 bytes = 128 MB

Number of Lines in Cache-
Total number of lines in cache
= Cache size / Line size
= 512 KB / 1 KB
= 512 lines
= 2^9 lines

Tag Directory Size-
Tag directory size
= Number of tags x Tag size
= Number of lines in cache x Number of bits in tag
= 2^9 x 17 bits
= 8704 bits
= 1088 bytes
Thus, size of tag directory = 1088 bytes

Problem-03: Consider a fully associative mapped cache with block size 4 KB. The size of main memory is 16 GB. Find the number of bits in tag.

Solution- Given-
• Block size = Frame size = Line size = 4 KB
• Size of main memory = 16 GB
We consider that the memory is byte addressable.

Number of Bits in Physical Address-
We have, Size of main memory = 16 GB = 2^34 bytes
Thus, Number of bits in physical address = 34 bits

Number of Bits in Block Offset-
We have, Block size = 4 KB = 2^12 bytes
Thus, Number of bits in block offset = 12 bits

Number of Bits in Tag-
Number of bits in tag
= Number of bits in physical address – Number of bits in block offset
= 34 bits – 12 bits
= 22 bits
Thus, Number of bits in tag = 22 bits
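The fully associative calculations above have no line-number field, so the arithmetic is shorter. A small Python sketch (helper names are ours; power-of-two sizes assumed):

```python
import math

def fully_associative_tag_bits(mem_bytes, block_bytes):
    """In fully associative mapping there is no line-number field:
    tag bits = physical address bits - block offset bits."""
    return int(math.log2(mem_bytes)) - int(math.log2(block_bytes))

def tag_directory_bits(cache_bytes, block_bytes, tag_bits):
    """The tag directory stores one tag per cache line."""
    return (cache_bytes // block_bytes) * tag_bits

# Problem-03: 16 GB memory, 4 KB blocks
print(fully_associative_tag_bits(16 * 2**30, 4 * 2**10))  # 22
# Problem-01: 16 KB cache, 256 B blocks, 9-bit tag
print(tag_directory_bits(16 * 2**10, 256, 9))             # 576
```

The larger tag (compared with direct mapping) is the price of being able to place a block in any line.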
To watch video explanation and practice more problems,
Watch Video6A & Video6B Lecture Self Study Learning
Cache Mapping-
Cache mapping is a technique by which the contents of main memory are brought into the cache memory. Different cache mapping techniques are-
1. Direct Mapping
2. Fully Associative Mapping
3. K-way Set Associative Mapping
Next, we will discuss set associative mapping in detail.

Set Associative Mapping- In k-way set associative mapping,
• Cache lines are grouped into sets where each set contains k number of lines.
• A particular block of main memory can map to only one particular set of the cache.
• However, within that set, the memory block can map to any freely available cache line.
• The set of the cache to which a particular block of the main memory can map is given by-
Cache set number = (Main Memory Block Address) Modulo (Number of sets in Cache)
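The formula above is a one-line computation; a tiny Python sketch (the function name is ours) shows it, and also that direct mapping is just the special case where every set has one line:

```python
def cache_set_number(block_address, num_sets):
    """Set to which a main memory block maps in k-way set associative
    mapping. Direct mapping is the case num_sets == num_lines."""
    return block_address % num_sets

# e.g. with 32 sets, block 100 maps to set 4
print(cache_set_number(100, 32))  # 4
# with 4 sets, block 3 maps to set 3
print(cache_set_number(3, 4))     # 3
```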
Division of Physical Address- In set associative mapping, the physical address is divided into three fields- tag, set number and block offset.

Set Associative Cache-
Set associative cache employs the set associative cache mapping technique. The following steps explain the working of set associative cache- After CPU generates a memory request,
• The set number field of the address is used to access the particular set of the cache.
• The tag field of the CPU address is then compared with the tags of all k lines within that set.
• If the CPU tag matches the tag of any cache line, a cache hit occurs.
• If the CPU tag does not match the tag of any cache line, a cache miss occurs.
• In case of a cache miss, the required word has to be brought from the main memory.
• If the cache is full, a replacement is made in accordance with the employed replacement policy.
Implementation- The following diagram shows the implementation of 2-way set associative cache-
(For simplicity, this diagram does not show all the lines of multiplexers)

The steps involved are as follows-
Step-01:
• Each multiplexer reads the set number from the generated physical address using its select lines in parallel.
• To read the set number of S bits, number of select lines each multiplexer must have = S.
Step-02:
• After reading the set number, each multiplexer goes to the corresponding set in the cache memory.
• Then, each multiplexer goes to the lines of that set using its input lines in parallel.
• Number of input lines each multiplexer must have = Number of lines in one set
Step-03:
• Each multiplexer outputs the tag bit it has selected from the lines of the selected set to the comparators using its output line.
• Number of output lines in each multiplexer = 1.

UNDERSTAND
It is important to understand-
• A multiplexer can output only a single bit on its output line.
• So, to output one complete tag to the comparator,
Number of multiplexers required = Number of bits in the tag
• If there are k lines in one set, then number of tags to output = k, thus-
Number of multiplexers required = Number of lines in one set (k) x Number of bits in the tag
• Each multiplexer is configured to read the tag bit of a specific line at a specific location.
• So, each multiplexer selects the tag bit for which it has been configured and outputs it on the output line.
• The complete tags as a whole are sent to the comparators for comparison in parallel.

Step-04:
• Comparators compare the tags coming from the multiplexers with the tag of the generated address.
• This comparison takes place in parallel.
• If there are k lines in one set (thus k tags), then-
Number of comparators required = k
and
Size of each comparator = Number of bits in the tag
• The output result of each comparator is fed as an input to an OR Gate.
• OR Gate is usually implemented using a 2 x 1 multiplexer.
• If the output of the OR Gate is 1, a cache hit occurs, otherwise a cache miss occurs.

Hit latency-
• The time taken to find out whether the required word is present in the Cache Memory or not is called hit latency.
For set associative mapping,
Hit latency = Multiplexer latency + Comparator latency + OR Gate latency

Important Results-
Following are a few important results for set associative cache-
• Block j of main memory maps to set number (j mod number of sets in cache) of the cache.
• Number of multiplexers required = Number of lines in one set (k) x Number of bits in tag
• Size of each multiplexer = Number of lines in one set (k) x 1
• Number of comparators required = Number of lines in one set (k)
• Size of each comparator = Number of bits in the tag
• Hit latency = Multiplexer latency + Comparator latency + OR Gate latency
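The counts in the list above can be packaged as a small Python sketch (the function name and return dictionary are ours, simply restating the results for given k and tag width):

```python
def set_associative_hardware(k, tag_bits):
    """Hardware counts for a k-way set associative cache, per the
    'Important Results' list: one k x 1 multiplexer per tag bit per
    line, and one full-width comparator per line in the set."""
    return {
        "num_multiplexers": k * tag_bits,  # k lines x tag_bits bits
        "mux_size":         (k, 1),        # each is a k x 1 multiplexer
        "num_comparators":  k,             # one per line in the set
        "comparator_bits":  tag_bits,      # each compares a full tag
    }

# 2-way cache with 16-bit tags
print(set_associative_hardware(2, 16))
# {'num_multiplexers': 32, 'mux_size': (2, 1), 'num_comparators': 2, 'comparator_bits': 16}
```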
To gain better understanding about set associative mapping,
Watch Video7 Lecture Self Study Learning
Set Associative Mapping- In set associative mapping,
• A particular block of main memory can be mapped to one particular cache set only.
• Block ‘j’ of main memory will map to set number (j mod number of sets in cache) of the cache.
• A replacement algorithm is needed if the cache is full.
Next, we will discuss practice problems based on set associative mapping.
PRACTICE PROBLEMS BASED ON SET ASSOCIATIVE MAPPING- Problem-01: Consider a 2-way set associative mapped cache of size 16 KB with block size 256 bytes. The size of main memory is 128 KB. Find-
1. Number of bits in tag
2. Tag directory size

Solution- Given-
• Set size = 2
• Cache memory size = 16 KB
• Block size = Frame size = Line size = 256 bytes
• Main memory size = 128 KB
We consider that the memory is byte addressable.

Number of Bits in Physical Address-
We have, Size of main memory = 128 KB = 2^17 bytes
Thus, Number of bits in physical address = 17 bits

Number of Bits in Block Offset-
We have, Block size = 256 bytes = 2^8 bytes
Thus, Number of bits in block offset = 8 bits

Number of Lines in Cache-
Total number of lines in cache
= Cache size / Line size
= 16 KB / 256 bytes
= 2^14 bytes / 2^8 bytes
= 64 lines
Thus, Number of lines in cache = 64 lines

Number of Sets in Cache-
Total number of sets in cache
= Total number of lines in cache / Set size
= 64 / 2
= 32 sets
= 2^5 sets
Thus, Number of bits in set number = 5 bits

Number of Bits in Tag-
Number of bits in tag
= Number of bits in physical address – (Number of bits in set number + Number of bits in block offset)
= 17 bits – (5 bits + 8 bits)
= 17 bits – 13 bits
= 4 bits
Thus, Number of bits in tag = 4 bits
Tag Directory Size-
Tag directory size
= Number of tags x Tag size
= Number of lines in cache x Number of bits in tag
= 64 x 4 bits
= 256 bits
= 32 bytes
Thus, size of tag directory = 32 bytes

Problem-02: Consider an 8-way set associative mapped cache of size 512 KB with block size 1 KB. There are 7 bits in the tag. Find-
1. Size of main memory
2. Tag directory size

Solution- Given-
• Set size = 8
• Cache memory size = 512 KB
• Block size = Frame size = Line size = 1 KB
• Number of bits in tag = 7 bits
We consider that the memory is byte addressable.

Number of Bits in Block Offset-
We have, Block size = 1 KB = 2^10 bytes
Thus, Number of bits in block offset = 10 bits

Number of Lines in Cache-
Total number of lines in cache
= Cache size / Line size
= 512 KB / 1 KB
= 512 lines
Thus, Number of lines in cache = 512 lines

Number of Sets in Cache-
Total number of sets in cache
= Total number of lines in cache / Set size
= 512 / 8
= 64 sets
= 2^6 sets
Thus, Number of bits in set number = 6 bits

Number of Bits in Physical Address-
Number of bits in physical address
= Number of bits in tag + Number of bits in set number + Number of bits in block offset
= 7 bits + 6 bits + 10 bits
= 23 bits
Thus, Number of bits in physical address = 23 bits

Size of Main Memory-
We have, Number of bits in physical address = 23 bits
Thus, Size of main memory = 2^23 bytes = 8 MB

Tag Directory Size-
Tag directory size
= Number of tags x Tag size
= Number of lines in cache x Number of bits in tag
= 512 x 7 bits
= 3584 bits
= 448 bytes
Thus, size of tag directory = 448 bytes

Problem-03: Consider a 4-way set associative mapped cache with block size 4 KB. The size of main memory is 16 GB and there are 10 bits in the tag. Find-
1. Size of cache memory
2. Tag directory size

Solution- Given-
• Set size = 4
• Block size = Frame size = Line size = 4 KB
• Main memory size = 16 GB
• Number of bits in tag = 10 bits
We consider that the memory is byte addressable.

Number of Bits in Physical Address-
We have, Size of main memory = 16 GB = 2^34 bytes
Thus, Number of bits in physical address = 34 bits
Number of Bits in Block Offset-
We have, Block size = 4 KB = 2^12 bytes
Thus, Number of bits in block offset = 12 bits

Number of Bits in Set Number-
Number of bits in set number
= Number of bits in physical address – (Number of bits in tag + Number of bits in block offset)
= 34 bits – (10 bits + 12 bits)
= 34 bits – 22 bits
= 12 bits
Thus, Number of bits in set number = 12 bits

Number of Sets in Cache-
We have, Number of bits in set number = 12 bits
Thus, Total number of sets in cache = 2^12 sets

Number of Lines in Cache-
We have, Total number of sets in cache = 2^12 sets and each set contains 4 lines.
Thus, Total number of lines in cache
= Total number of sets in cache x Number of lines in each set
= 2^12 x 4 lines
= 2^14 lines

Size of Cache Memory-
Size of cache memory
= Total number of lines in cache x Line size
= 2^14 x 4 KB
= 2^16 KB
= 64 MB
Thus, Size of cache memory = 64 MB

Tag Directory Size-
Tag directory size
= Number of tags x Tag size
= Number of lines in cache x Number of bits in tag
= 2^14 x 10 bits
= 163840 bits
= 20480 bytes
= 20 KB
Thus, size of tag directory = 20 KB

Problem-04: Consider an 8-way set associative mapped cache. The size of cache memory is 512 KB and there are 10 bits in the tag. Find the size of main memory.

Solution- Given-
• Set size = 8
• Cache memory size = 512 KB
• Number of bits in tag = 10 bits
We consider that the memory is byte addressable. Let-
• Number of bits in set number field = x bits
• Number of bits in block offset field = y bits

Sum of Number of Bits of Set Number Field and Block Offset Field-
We have, Cache memory size = Number of sets in cache x Number of lines in one set x Line size
Now, substituting the values, we get-
512 KB = 2^x x 8 x 2^y bytes
2^19 bytes = 2^(3+x+y) bytes
19 = 3 + x + y
x + y = 19 – 3
x + y = 16

Number of Bits in Physical Address-
Number of bits in physical address
= Number of bits in tag + Number of bits in set number + Number of bits in block offset
= 10 bits + x bits + y bits
= 10 bits + (x + y) bits
= 10 bits + 16 bits
= 26 bits
Thus, Number of bits in physical address = 26 bits
Size of Main Memory-
We have, Number of bits in physical address = 26 bits
Thus, Size of main memory = 2^26 bytes = 64 MB
Thus, size of main memory = 64 MB

Problem-05: Consider a 4-way set associative mapped cache. The size of main memory is 64 MB and there are 10 bits in the tag. Find the size of cache memory.

Solution- Given-
• Set size = 4
• Main memory size = 64 MB
• Number of bits in tag = 10 bits
We consider that the memory is byte addressable.

Number of Bits in Physical Address-
We have, Size of main memory = 64 MB = 2^26 bytes
Thus, Number of bits in physical address = 26 bits

Sum of Number of Bits of Set Number Field and Block Offset Field- Let-
• Number of bits in set number field = x bits
• Number of bits in block offset field = y bits
Then, Number of bits in physical address = Number of bits in tag + Number of bits in set number + Number of bits in block offset
So, we have-
26 bits = 10 bits + x bits + y bits
26 = 10 + (x + y)
x + y = 26 – 10
x + y = 16
Thus, Sum of number of bits of set number field and block offset field = 16 bits

Size of Cache Memory-
Cache memory size
= Number of sets in cache x Number of lines in one set x Line size
= 2^x x 4 x 2^y bytes
= 2^(2+x+y) bytes
= 2^(2+16) bytes
= 2^18 bytes
= 256 KB
Thus, size of cache memory = 256 KB To watch video solutions and practice more problems,
Watch the Video8 Lecture Self Study Learning
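The set associative field breakdowns used throughout these problems follow one pattern, which can be sketched in Python (the helper name is ours; power-of-two sizes assumed):

```python
import math

def set_associative_fields(mem_bytes, cache_bytes, block_bytes, k):
    """(tag, set, offset) bit widths for a k-way set associative,
    byte addressable cache."""
    addr_bits   = int(math.log2(mem_bytes))
    offset_bits = int(math.log2(block_bytes))
    num_sets    = cache_bytes // block_bytes // k  # lines / associativity
    set_bits    = int(math.log2(num_sets))
    tag_bits    = addr_bits - set_bits - offset_bits
    return tag_bits, set_bits, offset_bits

# Problem-01: 128 KB memory, 16 KB 2-way cache, 256 B blocks
print(set_associative_fields(128 * 2**10, 16 * 2**10, 256, 2))  # (4, 5, 8)
```

Doubling k halves the number of sets, which moves one bit from the set field into the tag — the effect examined in Problem-06 of the next section.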
Cache Mapping- We have discussed-
• Cache mapping is a technique by which the contents of main memory are brought into the cache.
• Cache mapping is performed using three different techniques- direct mapping, fully associative mapping and k-way set associative mapping.
Next, we will discuss practice problems based on cache mapping techniques.

PRACTICE PROBLEMS BASED ON CACHE MAPPING TECHNIQUES- Problem-01: The main memory of a computer has 2cm blocks while the cache has 2c blocks. If the cache uses the set associative mapping scheme with 2 blocks per set, then block k of the main memory maps to the set-
A. (k mod m) of the cache
B. (k mod c) of the cache
C. (k mod 2c) of the cache
D. (k mod 2cm) of the cache
Solution- Given-
• Number of blocks in main memory = 2cm
• Number of blocks in cache = 2c
• Number of blocks in one set of cache = 2

Number of Sets in Cache-
Number of sets in cache
= Number of blocks in cache / Number of blocks in one set
= 2c / 2
= c

Required Mapping- In set associative mapping,
• Block ‘j’ of main memory maps to set number (j modulo number of sets in cache) of the cache.
• So, block ‘k’ of main memory maps to set number (k mod c) of the cache.
• Thus, option (B) is correct.
Problem-02: In a k-way set associative cache, the cache is divided into v sets, each of which consists of k lines. The lines of a set are placed in sequence one after another. The lines in set s are sequenced before the lines in set (s+1). The main memory blocks are numbered 0 onwards. The main memory block numbered ‘j’ must be mapped to any one of the cache lines from-
A. (j mod v) x k to (j mod v) x k + (k – 1)
B. (j mod v) to (j mod v) + (k – 1)
C. (j mod k) to (j mod k) + (v – 1)
D. (j mod k) x v to (j mod k) x v + (v – 1)

Solution-
Let-
• 2-way set associative mapping is used, then k = 2
• Number of sets in cache = 4, then v = 4
• Block number ‘3’ has to be mapped, then j = 3
The given options on substituting these values reduce to-
A. (3 mod 4) x 2 to (3 mod 4) x 2 + (2 – 1) = 6 to 7
B. (3 mod 4) to (3 mod 4) + (2 – 1) = 3 to 4
C. (3 mod 2) to (3 mod 2) + (4 – 1) = 1 to 4
D. (3 mod 2) x 4 to (3 mod 2) x 4 + (4 – 1) = 4 to 7
Now, we have been asked to what range of cache lines block number 3 can be mapped. With 4 sets of 2 lines each, set 3 occupies cache lines 6 and 7. In set associative mapping,
• Block ‘j’ of main memory maps to set number (j modulo number of sets in cache) of the cache.
• So, block 3 of main memory maps to set number (3 mod 4) = 3 of the cache.
• Within set number 3, block 3 can be mapped to any of the cache lines.
• Thus, block 3 can be mapped to cache lines ranging from 6 to 7.
Thus, Option (A) is correct.

Problem-03: A block-set associative cache memory consists of 128 blocks divided into four-block sets. The main memory consists of 16,384 blocks and each block contains 256 eight bit words.
1. How many bits are required for addressing the main memory?
2. How many bits are needed to represent the TAG, SET and WORD fields?

Solution- Given-
• Number of blocks in cache memory = 128
• Number of blocks in each set of cache = 4
• Main memory size = 16384 blocks
• Block size = 256 bytes
• 1 word = 8 bits = 1 byte

Main Memory Size-
We have, Size of main memory
= 16384 blocks
= 16384 x 256 bytes
= 2^22 bytes
Thus, Number of bits required to address main memory = 22 bits

Number of Bits in Block Offset-
We have-
Block size = 256 bytes = 2^8 bytes
Thus, Number of bits in block offset or word = 8 bits

Number of Bits in Set Number-
Number of sets in cache
= Number of lines in cache / Set size
= 128 blocks / 4 blocks
= 32 sets
= 2^5 sets
Thus, Number of bits in set number = 5 bits

Number of Bits in Tag-
Number of bits in tag
= Number of bits in physical address – (Number of bits in set number + Number of bits in word)
= 22 bits – (5 bits + 8 bits)
= 22 bits – 13 bits
= 9 bits
Thus, Number of bits in tag = 9 bits
Thus, the physical address is divided as 9 TAG bits, 5 SET bits and 8 WORD bits.
Problem-04: A computer has a 256 KB, 4-way set associative, write back data cache with block size of 32 bytes. The processor sends 32 bit addresses to the cache controller. Each cache tag directory entry contains, in addition to address tag, 2 valid bits, 1 modified bit and 1 replacement bit.
Part-01: The number of bits in the tag field of an address is-
A. 11
B. 14
C. 16
D. 27
Part-02: The size of the cache tag directory is-
A. 160 Kbits
B. 136 Kbits
C. 40 Kbits
D. 32 Kbits

Solution- Given-
• Cache memory size = 256 KB
• Set size = 4 blocks
• Block size = 32 bytes
• Number of bits in physical address = 32 bits
Number of Bits in Block Offset-
We have, Block size = 32 bytes = 2^5 bytes
Thus, Number of bits in block offset = 5 bits

Number of Lines in Cache-
Number of lines in cache
= Cache size / Line size
= 256 KB / 32 bytes
= 2^18 bytes / 2^5 bytes
= 2^13 lines
Thus, Number of lines in cache = 2^13 lines

Number of Sets in Cache-
Number of sets in cache
= Number of lines in cache / Set size
= 2^13 lines / 2^2 lines
= 2^11 sets
Thus, Number of bits in set number = 11 bits

Number of Bits in Tag-
Number of bits in tag
= Number of bits in physical address – (Number of bits in set number + Number of bits in block offset)
= 32 bits – (11 bits + 5 bits)
= 32 bits – 16 bits
= 16 bits
Thus, Number of bits in tag = 16 bits
Tag Directory Size-
Size of tag directory
= Number of lines in cache x Size of tag entry
= 2^13 x (16 tag bits + 2 valid bits + 1 modified bit + 1 replacement bit)
= 2^13 x 20 bits
= 163840 bits
= 20 KB or 160 Kbits
Thus,
• For part-01, Option (C) is correct.
• For part-02, Option (A) is correct.
Problem-05: A 4-way set associative cache memory unit with a capacity of 16 KB is built using a block size of 8 words. The word length is 32 bits. The size of the physical address space is 4 GB. The number of bits for the TAG field is _____.

Solution- Given-
• Set size = 4 lines
• Cache memory size = 16 KB
• Block size = 8 words
• 1 word = 32 bits = 4 bytes
• Main memory size = 4 GB

Number of Bits in Physical Address-
We have,
Main memory size = 4 GB = 2^32 bytes
Thus, Number of bits in physical address = 32 bits

Number of Bits in Block Offset-
We have, Block size = 8 words = 8 x 4 bytes = 32 bytes = 2^5 bytes
Thus, Number of bits in block offset = 5 bits

Number of Lines in Cache-
Number of lines in cache
= Cache size / Line size
= 16 KB / 32 bytes
= 2^14 bytes / 2^5 bytes
= 2^9 lines
= 512 lines
Thus, Number of lines in cache = 512 lines

Number of Sets in Cache-
Number of sets in cache
= Number of lines in cache / Set size
= 512 lines / 4 lines
= 2^9 lines / 2^2 lines
= 2^7 sets
Thus, Number of bits in set number = 7 bits

Number of Bits in Tag-
Number of bits in tag
= Number of bits in physical address – (Number of bits in set number + Number of bits in block offset)
= 32 bits – (7 bits + 5 bits)
= 32 bits – 12 bits
= 20 bits
Thus, number of bits in tag = 20 bits

Problem-06: If the associativity of a processor cache is doubled while keeping the capacity and block size unchanged, which one of the following is guaranteed to be NOT affected?
A. Width of tag comparator
B. Width of set index decoder
C. Width of way selection multiplexer
D. Width of processor to main memory data bus

Solution- Since block size is unchanged, the number of bits in block offset remains unchanged.

Effect On Width Of Tag Comparator-
• Associativity of cache is doubled means number of lines in one set is doubled.
• Since number of lines in one set is doubled, the number of sets reduces to half.
• Since number of sets reduces to half, the number of bits in set number decrements by 1.
• Since number of bits in set number decrements by 1, the number of bits in tag increments by 1.
• Since number of bits in tag increases, the width of the tag comparator also increases.

Effect On Width of Set Index Decoder-
• Associativity of cache is doubled means number of lines in one set is doubled.
• Since number of lines in one set is doubled, the number of sets reduces to half.
• Since number of sets reduces to half, the number of bits in set number decrements by 1.
• Since number of bits in set number decreases, the width of the set index decoder also decreases.

Effect On Width Of Way Selection Multiplexer-
• Associativity of cache (k) is doubled means number of lines in one set is doubled.
• New associativity of cache is 2k.
• To handle the new associativity, size of multiplexers must be 2k x 1.
• Therefore, width of way selection multiplexer increases.

Effect On Width Of Processor To Main Memory Data Bus-
• Processor to main memory data bus has nothing to do with cache associativity.
• It depends on the number of bits in block offset, which is unchanged here.
• So, width of processor to main memory data bus is not affected and remains unchanged.
Thus, Option (D) is correct.

Problem-07: Consider a direct mapped cache with 8 cache blocks (0-7). If the memory block requests are in the order-
3, 5, 2, 8, 0, 6, 3, 9, 16, 20, 17, 25, 18, 30, 24, 2, 63, 5, 82, 17, 24
Which of the following memory blocks will not be in the cache at the end of the sequence?
A. 3
B. 18
C. 20
D. 30
Also, calculate the hit ratio and miss ratio.

Solution- We have,
• There are 8 blocks in cache memory numbered from 0 to 7.
• In direct mapping, a particular block of main memory is mapped to a particular line of cache memory.
• The line number is given by-
Cache line number = Block address modulo Number of lines in cache
For the given sequence-
• Requests for memory blocks are generated one by one.
• The line number of the block is calculated using the above relation.
• Then, the block is placed in that particular line.
• If another block already exists in that line, it is replaced.
Thus,
• Out of the given options, only block-18 is not present in the cache at the end of the sequence.
• Option (B) is correct.
• Out of the 21 requests, 3 are hits (the second requests for blocks 3, 5 and 24) and 18 are misses.
• Hit ratio = 3 / 21
• Miss ratio = 18 / 21
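Traces like this one are easy to replay mechanically. A minimal Python simulator for a direct mapped cache (the function name is ours) confirms the counts and the final cache contents:

```python
def simulate_direct_mapped(requests, num_lines):
    """Replay a block-request trace through a direct mapped cache,
    returning the hit count and the final line -> block contents."""
    cache = [None] * num_lines
    hits = 0
    for block in requests:
        line = block % num_lines
        if cache[line] == block:
            hits += 1
        else:
            cache[line] = block  # miss: (re)place the block in its line
    return hits, cache

trace = [3, 5, 2, 8, 0, 6, 3, 9, 16, 20, 17, 25,
         18, 30, 24, 2, 63, 5, 82, 17, 24]
hits, final = simulate_direct_mapped(trace, 8)
print(hits, len(trace) - hits)  # 3 18  (3 hits, 18 misses)
print(final)                    # [24, 17, 82, 3, 20, 5, 30, 63]
```

Block 18 was placed in line 2 but later overwritten by blocks 2 and 82, which is why it is absent at the end.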
Problem-08: Consider a fully associative cache with 8 cache blocks (0-7). The memory block requests are in the order-
4, 3, 25, 8, 19, 6, 25, 8, 16, 35, 45, 22, 8, 3, 16, 25, 7
If LRU replacement policy is used, which cache block will have memory block 7? Also, calculate the hit ratio and miss ratio.

Solution- We have,
• There are 8 blocks in cache memory numbered from 0 to 7.
• In fully associative mapping, any block of main memory can be mapped to any line of the cache that is freely available.
• If all the cache lines are already occupied, then a block is replaced in accordance with the replacement policy.
Thus,
• Line-5 contains the block-7.
• Hit ratio = 5 / 17
• Miss ratio = 12 / 17
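The result can be replayed with a short LRU simulator (a sketch with our own naming; it assumes, as the worked answer does, that empty lines fill in order 0, 1, 2, ...):

```python
def simulate_fully_associative_lru(requests, num_lines):
    """Fully associative cache with LRU replacement. Empty lines fill
    in index order; on a full cache, the least recently used block's
    line is reused. Returns (hits, line -> block list)."""
    lines = [None] * num_lines  # line index -> resident block
    use_order = []              # resident blocks, least recent first
    hits = 0
    for block in requests:
        if block in lines:
            hits += 1
            use_order.remove(block)
        elif None in lines:
            lines[lines.index(None)] = block
        else:
            victim = use_order.pop(0)           # evict the LRU block
            lines[lines.index(victim)] = block  # reuse its line
        use_order.append(block)                 # block is now most recent
    return hits, lines

trace = [4, 3, 25, 8, 19, 6, 25, 8, 16, 35, 45, 22, 8, 3, 16, 25, 7]
hits, lines = simulate_fully_associative_lru(trace, 8)
print(hits, len(trace) - hits)  # 5 12  (5 hits, 12 misses)
print(lines.index(7))           # 5 -> block 7 sits in line 5
```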
Problem-09: Consider a 4-way set associative mapping with 16 cache blocks. The memory block requests are in the order-
0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155
If LRU replacement policy is used, which memory block will not be present in the cache?
A. 3
B. 8
C. 129
D. 216
Also, calculate the hit ratio and miss ratio.

Solution- We have,
• There are 16 blocks in cache memory numbered from 0 to 15.
• Each set contains 4 cache lines, so there are 4 sets.
• In set associative mapping, a particular block of main memory is mapped to a particular set of cache memory.
• The set number is given by-
Cache set number = Block address modulo Number of sets in cache
For the given sequence-
• Requests for memory blocks are generated one by one.
• The set number of the block is calculated using the above relation.
• Within that set, the block is placed in any freely available cache line.
• If all the lines of that set are already occupied, then one of the blocks is replaced in accordance with the employed replacement policy.
Thus,
• Out of the given options, only block-216 is not present in the cache at the end of the sequence.
• Option (D) is correct.
• Hit ratio = 1 / 17
• Miss ratio = 16 / 17
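A per-set LRU simulator (again a sketch with our own naming) reproduces this: each set keeps its resident blocks ordered least recently used first.

```python
def simulate_set_associative_lru(requests, num_lines, k):
    """k-way set associative cache with per-set LRU replacement.
    Each set is a list of resident blocks, LRU first."""
    num_sets = num_lines // k
    sets = [[] for _ in range(num_sets)]
    hits = 0
    for block in requests:
        s = sets[block % num_sets]
        if block in s:
            hits += 1
            s.remove(block)
        elif len(s) == k:
            s.pop(0)        # set full: evict its LRU block
        s.append(block)     # block is now the most recently used
    return hits, sets

trace = [0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155]
hits, sets = simulate_set_associative_lru(trace, 16, 4)
print(hits, len(trace) - hits)      # 1 16  (1 hit, 16 misses)
print(any(216 in s for s in sets))  # False -> 216 was evicted
```

Block 216 maps to set 0, which sees the most traffic (0, 4, 8, 216, 48, 32, 92), so it is evicted as the LRU victim before the trace ends.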
Problem-10: Consider a small 2-way set associative mapping with a total of 4 blocks. LRU replacement policy is used for choosing the block to be replaced. The number of cache misses for the following sequence of block addresses 8, 12, 0, 12, 8 is ____.

Solution-
• Practice yourself.
• Total number of cache misses = 4

Problem-11: Consider a cache that has 4 blocks. For the memory references-
5, 12, 13, 17, 4, 12, 13, 17, 2, 13, 19, 13, 43, 61, 19
What is the hit ratio for the following cache organizations and replacement algorithms-
A. FIFO
B. LRU
C. Direct mapping
D. 2-way set associative mapping using LRU

Solution-
• Practice yourself.
• Using FIFO replacement, hit ratio = 5/15 = 1/3.
• Using LRU replacement, hit ratio = 6/15 = 2/5.
• Using direct mapping, hit ratio = 1/15.
• Using 2-way set associative mapping with LRU, hit ratio = 5/15 = 1/3.
Problem-12: A hierarchical memory system has the following specification: 20 MB main storage with access time of 300 ns, 256 bytes cache with access time of 50 ns, word size 4 bytes, page size 8 words. What will be the hit ratio if the page address trace of a program has the pattern 0, 1, 2, 3, 0, 1, 2, 4 following LRU page replacement technique?

Solution- Given-
• Main memory size = 20 MB
• Main memory access time = 300 ns
• Cache memory size = 256 bytes
• Cache memory access time = 50 ns
• Word size = 4 bytes
• Page size = 8 words

Line Size-
We have, Line size = 8 words = 8 x 4 bytes = 32 bytes

Number of Lines in Cache-
Number of lines in cache
= Cache size / Line size
= 256 bytes / 32 bytes
= 8 lines
Thus,
• Number of hits = 3
• Hit ratio = 3 / 8
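With 8 lines and only 5 distinct pages, the whole trace fits in the cache, so only first references miss. A tiny LRU replay (our own naming) confirms this:

```python
def lru_page_hits(trace, num_lines):
    """Count hits for a page trace under LRU replacement.
    The cache list is kept ordered least recently used first."""
    cache = []
    hits = 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.remove(page)
        elif len(cache) == num_lines:
            cache.pop(0)    # evict the LRU page (never triggered here)
        cache.append(page)  # page is now the most recently used
    return hits

trace = [0, 1, 2, 3, 0, 1, 2, 4]
print(lru_page_hits(trace, 8), "/", len(trace))  # 3 / 8
```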
Problem-13: Consider an array A[100] where each element occupies 4 words. A 32 word cache is used and divided into 8 word blocks. What is the hit ratio for the following code-

for (i=0 ; i < 100 ; i++)
    A[i] = A[i] + 10;

Solution-
Number of lines in cache
= Cache size / Line size
= 32 words / 8 words
= 4 lines
Since each element of the array occupies 4 words,
Number of elements that can be placed in one line = 2
Now, let us analyze the statement-
A[i] = A[i] + 10;
For each i,
• Firstly, the value of A[i] is read.
• Secondly, 10 is added to A[i].
• Thirdly, the updated value of A[i] is written back.
Thus,
• For each i, A[i] is accessed two times.
• Two memory accesses are required- one for the read operation and the other for the write operation.
Assume the cache is empty initially. In the first loop iteration,
• First, the value of A[0] has to be read.
• Since the cache is empty, A[0] is brought into the cache.
• Along with A[0], element A[1] also enters the cache since each block can hold 2 elements of the array.
•
Thus, For A[0], a miss occurred for the read operation.
•
Now, 10 is added to the value of A[0].
•
Now, the updated value has to be written back to A[0].
•
Again cache memory is accessed to get A[0] for writing its updated value.
•
This time, for A[0], a hit occurs for the write operation.
In the second loop iteration,
•
First, value of A[1] has to be read.
•
A[1] is already present in the cache.
•
Thus, For A[1]. a hit occurs for the read operation.
•
Now, 10 is added to the value of A[1].
•
Now, the updated value has to be written back to A[1].
•
Again cache memory is accessed to get A[1] for writing its updated value.
•
Again, for A[1], a hit occurs for the write operation.
In the similar manner, for every next two consecutive elements-
•
There will be a miss for the read operation for the first element.
•
There will be a hit for the write operation for the first element.
•
There will be a hit for both read and write operations for the second element.
Likewise, for 100 elements, we will have 50 such pairs in the cache, and in every pair, there will be one miss and three hits. Thus,
• Total number of hits = 50 x 3 = 150
• Total number of misses = 50 x 1 = 50
• Total number of references = 200 (100 for read and 100 for write)
Thus,
• Hit ratio = 150 / 200 = 3 / 4 = 0.75
• Miss ratio = 50 / 200 = 1 / 4 = 0.25
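The counting above can be checked with a short simulation. This is a minimal sketch (the function name and parameters are illustrative): one block holds two consecutive array elements, and each element is accessed twice by the read-modify-write loop. Since the access pattern is strictly sequential, an evicted block is never reused, so the 4-line capacity does not change the count and is omitted here.

```python
# Simulate the loop "A[i] = A[i] + 10" over a cache whose blocks hold
# two consecutive array elements, counting hits and misses.
def hit_ratio(num_elements=100, elements_per_block=2):
    cached_blocks = set()               # blocks currently resident in the cache
    hits = misses = 0
    for i in range(num_elements):
        block = i // elements_per_block
        for _ in ("read", "write"):     # each element is accessed twice
            if block in cached_blocks:
                hits += 1
            else:
                misses += 1
                cached_blocks.add(block)   # bring the block into the cache
    return hits / (hits + misses)

print(hit_ratio())  # 0.75
```

The simulation reproduces the 1-miss / 3-hits pattern per pair of elements derived above.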
Problem-14: Consider a two dimensional array A[10][10] having 100 elements in total, where each element occupies 4 words. A 32 word cache divided into blocks of 8 words is used. What is the hit ratio for the following code-
for (i=0 ; i < 10 ; i++)
    for (j=0 ; j < 10 ; j++)
        A[i][j] = A[i][j] + 10;
if-
1. Row major order is used
2. Column major order is used
Solution- According to the question, the cache is divided into 4 lines of 8 words each.
In each cache line, two elements of the array can reside.
Case-01: When Row Major Order is Used-
In row major order, elements of the array are stored row wise in the memory.
• In the first iteration, when A[0][0] is brought into the cache, A[0][1] is also brought.
• Then, things go exactly as in the previous question.
• There will be a miss for the A[0][0] read operation and a hit for the A[0][0] write operation.
• For A[0][1], there will be a hit for both read and write operations.
• The same holds for every next two consecutive elements.
Thus,
• Total number of hits = 50 x 3 = 150
• Total number of misses = 50 x 1 = 50
• Total number of references = 200 (100 for read and 100 for write)
Thus,
• Hit ratio = 150 / 200 = 3 / 4 = 0.75
• Miss ratio = 50 / 200 = 1 / 4 = 0.25
Case-02: When Column Major Order is Used-
In column major order, elements of the array are stored column wise in the memory.
• In the first iteration, when A[0][0] is brought into the cache, A[1][0] is also brought.
• This time, A[1][0] is not useful in iteration-2.
• A[1][0] is needed only after all 10 elements of row-0 have been accessed.
• But by that time, its block will have been replaced, since there are only 4 lines in the cache.
• For A[1][0] to still be present, there would have to be at least 10 lines in the cache.
Thus, under the given scenario, for each element of the array-
• There will be a miss for the read operation.
• There will be a hit for the write operation.
Thus,
• Total number of hits = 100 x 1 = 100
• Total number of misses = 100 x 1 = 100
• Total number of references = 200 (100 for read and 100 for write)
Thus,
• Hit ratio = 100 / 200 = 1 / 2 = 0.50
• Miss ratio = 100 / 200 = 1 / 2 = 0.50
To increase the hit rate, the following measures can be taken-
• Row major order can be used (hit rate = 75%)
• The statement A[i][j] = A[i][j] + 10 can be replaced with A[j][i] = A[j][i] + 10 (hit rate = 75%)
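Both cases can be checked with a short simulation. This is a minimal sketch under stated assumptions: a fully associative 4-line cache with LRU replacement and 2 array elements per block (the function and parameter names are illustrative).

```python
# Compare the hit ratio of the i/j loop when the 10x10 array is stored
# in row major versus column major order.
from collections import OrderedDict

def hit_ratio(order, n=10, lines=4, elems_per_block=2):
    cache = OrderedDict()                  # block number, kept in LRU order
    hits = misses = 0
    for i in range(n):
        for j in range(n):
            # The linear element index depends on the storage order.
            index = i * n + j if order == "row" else j * n + i
            block = index // elems_per_block
            for _ in ("read", "write"):    # A[i][j] = A[i][j] + 10
                if block in cache:
                    hits += 1
                    cache.move_to_end(block)       # refresh LRU position
                else:
                    misses += 1
                    cache[block] = True
                    if len(cache) > lines:
                        cache.popitem(last=False)  # evict the LRU block
    return hits / (hits + misses)

print(hit_ratio("row"))     # 0.75
print(hit_ratio("column"))  # 0.5
```

In the column major case, the block holding A[1][0] is evicted long before row-1 is reached, exactly as argued above.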
Problem-15: An access sequence of cache block addresses is of length N and contains n unique block addresses. The number of unique block addresses between two consecutive accesses to the same block address is bounded above by k. What is the miss ratio if the access sequence is passed through a cache of associativity A>=k exercising LRU replacement policy?
A. n/N
B. 1/N
C. 1/A
D. k/n
Solution- Since the associativity A >= k and LRU replacement is used, no block is evicted before its next access, so each of the n unique blocks misses only on its first access. Total misses = n, hence the required miss ratio = n/N. Thus, Option (A) is correct.
To watch video solutions and practice more problems, watch the Video8A and Video8B lectures (self study learning).
Cache Memory- We have discussed-
• Cache memory is a random access memory.
• It lies on the path between the processor and the main memory.
• It bridges the speed mismatch between the fast processor and the slower main memory.
Cache Lines-
Cache memory is divided into equal size partitions called cache lines.
• While designing a computer’s cache system, the size of cache lines is an important parameter.
• The size of the cache line affects many parameters of the caching system.
The following results discuss the effect of changing the cache block (or line) size in a caching system.
Result-01: Effect of Changing Block Size on Spatial Locality-
The larger the block size, the better the spatial locality.
Explanation- Keeping the cache size constant, we have-
Case-01: Decreasing the Block Size-
• A smaller block contains a smaller number of nearby addresses.
• Thus, only a smaller number of nearby addresses are brought into the cache.
• This increases the chances of a cache miss, which reduces the exploitation of spatial locality.
• Thus, the smaller the block size, the inferior the spatial locality.
Case-02: Increasing the Block Size-
• A larger block contains a larger number of nearby addresses.
• Thus, a larger number of nearby addresses are brought into the cache.
• This increases the chances of a cache hit, which increases the exploitation of spatial locality.
• Thus, the larger the block size, the better the spatial locality.
Result-02: Effect of Changing Block Size On Cache Tag in Direct Mapped Cache-
In direct mapped cache, block size does not affect the cache tag in any way.
Explanation- Keeping the cache size constant, we have-
Case-01: Decreasing the Block Size-
• Decreasing the block size increases the number of lines in the cache.
• With the decrease in block size, the number of bits in the block offset decreases.
• However, with the increase in the number of cache lines, the number of bits in the line number increases.
• So, number of bits in line number + number of bits in block offset remains constant.
• Thus, there is no effect on the cache tag.
Case-02: Increasing the Block Size-
• Increasing the block size decreases the number of lines in the cache.
• With the increase in block size, the number of bits in the block offset increases.
• However, with the decrease in the number of cache lines, the number of bits in the line number decreases.
• Thus, number of bits in line number + number of bits in block offset remains constant.
• Thus, there is no effect on the cache tag.
Result-03: Effect of Changing Block Size On Cache Tag in Fully Associative Cache-
In fully associative cache, on decreasing the block size, the cache tag becomes larger, and vice versa.
Explanation- Keeping the cache size constant, we have-
Case-01: Decreasing the Block Size-
• Decreasing the block size decreases the number of bits in the block offset.
• With the decrease in the number of bits in the block offset, the number of bits in the tag increases.
Case-02: Increasing the Block Size-
• Increasing the block size increases the number of bits in the block offset.
• With the increase in the number of bits in the block offset, the number of bits in the tag decreases.
Result-04: Effect of Changing Block Size On Cache Tag in Set Associative Cache-
In set associative cache, block size does not affect the cache tag in any way.
Explanation- Keeping the cache size constant, we have-
Case-01: Decreasing the Block Size-
• Decreasing the block size increases the number of lines in the cache.
• With the decrease in block size, the number of bits in the block offset decreases.
• With the increase in the number of cache lines, the number of sets in the cache increases.
• With the increase in the number of sets, the number of bits in the set number increases.
• So, number of bits in set number + number of bits in block offset remains constant.
• Thus, there is no effect on the cache tag.
Case-02: Increasing the Block Size-
• Increasing the block size decreases the number of lines in the cache.
• With the increase in block size, the number of bits in the block offset increases.
• With the decrease in the number of cache lines, the number of sets in the cache decreases.
• With the decrease in the number of sets, the number of bits in the set number decreases.
• So, number of bits in set number + number of bits in block offset remains constant.
• Thus, there is no effect on the cache tag.
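Results 02 to 04 can be made concrete with a small calculation. This is a minimal sketch (the 32-bit address width, 1 KB cache size, and block sizes are illustrative assumptions) that splits an address into tag, index and offset bits and shows the tag behaviour described above.

```python
# Compute the tag/index/offset split of an address for a cache of fixed
# total size, for a given block size and associativity (number of ways).
from math import log2

def address_split(cache_bytes, block_bytes, ways, addr_bits=32):
    offset_bits = int(log2(block_bytes))
    sets = cache_bytes // (block_bytes * ways)
    index_bits = int(log2(sets))            # 0 for a fully associative cache
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

# Direct mapped (ways = 1), 1 KB cache: the tag width is unaffected by block size.
print(address_split(1024, 16, 1))   # (22, 6, 4)
print(address_split(1024, 64, 1))   # (22, 4, 6)

# Fully associative (ways = number of blocks): the tag shrinks as the block grows.
print(address_split(1024, 16, 1024 // 16))   # (28, 0, 4)
print(address_split(1024, 64, 1024 // 64))   # (26, 0, 6)
```

In the direct mapped case, the bits lost by the offset are exactly gained by the index, so the tag stays at 22 bits; in the fully associative case there is no index to compensate, so the tag changes with the block size.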
Result-05: Effect of Changing Block Size On Cache Miss Penalty-
A smaller cache block incurs a lower cache miss penalty.
Explanation-
• When a cache miss occurs, the block containing the required word has to be brought from the main memory.
• If the block size is small, the time taken to bring the block into the cache is less, so a lower miss penalty is incurred.
• If the block size is large, the time taken to bring the block into the cache is more, so a higher miss penalty is incurred.
Result-06: Effect of Cache Tag On Cache Hit Time-
A smaller cache tag ensures a lower cache hit time.
Explanation-
• Cache hit time is the time required to determine whether the required block is in the cache or not.
• It involves comparing the tag of the generated address with the tags of the cache lines.
• The smaller the cache tag, the less time taken to perform the comparisons; hence, a smaller cache tag ensures a lower cache hit time.
• On the other hand, the larger the cache tag, the more time taken to perform the comparisons; thus, a larger cache tag results in a higher cache hit time.
PRACTICE PROBLEM BASED ON CACHE LINE- Problem- In designing a computer’s cache system, the cache block or cache line size is an important parameter. Which of the following statements is correct in this context?
A. A smaller block size implies better spatial locality
B. A smaller block size implies a smaller cache tag and hence lower cache tag overhead
C. A smaller block size implies a larger cache tag and hence lower cache hit time
D. A smaller block size incurs a lower cache miss penalty
Solution- Option (D) is correct. (Result-05) Reasons-
Option (A) is incorrect because-
• A smaller block size does not imply better spatial locality.
• The larger the block size, the better the spatial locality.
Option (B) is incorrect because-
• In direct mapped cache and set associative cache, changing the block size has no effect on the cache tag.
• In fully associative cache, on decreasing the block size, the cache tag becomes larger.
• Thus, a smaller block size does not imply a smaller cache tag in any cache organization.
Option (C) is incorrect because-
• “A smaller block size implies a larger cache tag” is true only for fully associative cache.
• However, a larger cache tag does not imply a lower cache hit time; rather, the cache hit time is increased.
Magnetic Disk in Computer Architecture- In computer architecture,
• Magnetic disk is a storage device that is used to write, rewrite and access data.
• It uses a magnetization process.
Architecture-
• The entire disk is divided into platters.
• Each platter consists of concentric circles called tracks.
• These tracks are further divided into sectors, which are the smallest divisions of the disk.
• A cylinder is formed by combining the tracks at a given radius of the disk pack.
• There exists a mechanical arm called the read / write head.
• It is used to read from and write to the disk.
• The head has to reach a particular track and then wait for the rotation of the platter.
• The rotation causes the required sector of the track to come under the head.
• Each platter has 2 surfaces, top and bottom, and both surfaces are used to store data.
• Each surface has its own read / write head.
Disk Performance Parameters- The time taken by the disk to complete an I/O request is called the disk service time or disk access time. The components that contribute to the service time are-
1. Seek time
2. Rotational latency
3. Data transfer rate
4. Controller overhead
5. Queuing delay
1. Seek Time-
• The time taken by the read / write head to reach the desired track is called seek time.
• It is the component that contributes the largest percentage of the disk service time.
• The lower the seek time, the faster the I/O operation.
Specifications-
Seek time specifications include-
1. Full stroke
2. Average
3. Track to track
1. Full Stroke-
• It is the time taken by the read / write head to move across the entire width of the disk, from the innermost track to the outermost track.
2. Average-
• It is the average time taken by the read / write head to move from one random track to another.
Average seek time = 1 / 3 x Full stroke
3. Track to Track-
• It is the time taken by the read / write head to move between adjacent tracks.
2. Rotational Latency-
• The time taken by the desired sector to come under the read / write head is called rotational latency.
• It depends on the rotation speed of the spindle.
Average rotational latency = 1 / 2 x Time taken for one full rotation
3. Data Transfer Rate-
• The amount of data that passes under the read / write head in a given amount of time is called the data transfer rate.
• The time taken to transfer the data is called the transfer time.
It depends on the following factors-
1. Number of bytes to be transferred
2. Rotation speed of the disk
3. Density of the track
4. Speed of the electronics that connects the disk to the computer
4. Controller Overhead-
• The overhead imposed by the disk controller is called controller overhead.
• The disk controller is a device that manages the disk.
5. Queuing Delay-
• The time spent waiting for the disk to become free is called queuing delay.
NOTE-
All the tracks of a disk have the same storage capacity.
Storage Density-
• All the tracks of a disk have the same storage capacity.
• This is because each track has a different storage density.
• Storage density decreases as we move from one track to another away from the center.
Thus,
• The innermost track has the maximum storage density.
• The outermost track has the minimum storage density.
Important Formulas-
1. Disk Access Time- Disk access time is calculated as-
Disk access time = Seek time + Rotational delay + Transfer time + Controller overhead + Queuing delay
2. Average Disk Access Time- Average disk access time is calculated as-
Average disk access time = Average seek time + Average rotational delay + Transfer time + Controller overhead + Queuing delay
3. Average Seek Time- Average seek time is calculated as-
Average seek time = 1 / 3 x Time taken for one full stroke
Alternatively, if the time taken by the head to move from one track to the adjacent track = t units and there are k tracks in total, then-
Average seek time
= { Time taken to move from track 1 to track 1 + Time taken to move from track 1 to the last track } / 2
= { 0 + (k-1)t } / 2
= (k-1)t / 2
4. Average Rotational Latency- Average rotational latency is calculated as-
Average rotational latency = 1 / 2 x Time taken for one full rotation
Average rotational latency may also be referred to as-
• Average rotational delay
• Average latency
• Average delay
5. Capacity Of Disk Pack- Capacity of a disk pack is calculated as-
Capacity of a disk pack = Total number of surfaces x Number of tracks per surface x Number of sectors per track x Storage capacity of one sector
6. Formatting Overhead- Formatting overhead is calculated as-
Formatting overhead = Number of sectors x Overhead per sector
7. Formatted Disk Space- Formatted disk space, also called usable disk space, is the disk space excluding the formatting overhead. It is calculated as-
Formatted disk space = Total disk space or capacity – Formatting overhead
8. Recording Density Or Storage Density- Recording density or storage density is calculated as-
Storage density of a track = Capacity of the track / Circumference of the track
From here, we can infer-
Storage density of a track ∝ 1 / Circumference of the track
9. Track Capacity- Capacity of a track is calculated as-
Capacity of a track = Recording density of the track x Circumference of the track
10. Data Transfer Rate- Data transfer rate is calculated as-
Data transfer rate = Number of heads x Bytes that can be read in one full rotation x Number of rotations in one second
OR
Data transfer rate = Number of heads x Capacity of one track x Number of rotations in one second
11. Tracks Per Surface- Total number of tracks per surface is calculated as-
Total number of tracks per surface = (Outer radius – Inner radius) / Inter track gap
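The formulas above can be collected as a few helper functions. This is a minimal sketch; the function names and units are assumptions for illustration (times in milliseconds, capacities in bytes), and the example values are taken from Problem-01 below.

```python
# Helper functions for the disk formulas: rotational latency, seek time,
# disk pack capacity, and data transfer rate.
def avg_rotational_latency_ms(rpm):
    full_rotation_ms = 60_000 / rpm            # one full rotation, in msec
    return full_rotation_ms / 2

def avg_seek_time_ms(track_to_track_ms, num_tracks):
    # Average of "stay on track 1" (0) and "track 1 to last track" ((k-1)t).
    return (num_tracks - 1) * track_to_track_ms / 2

def disk_pack_capacity(surfaces, tracks_per_surface, sectors_per_track, bytes_per_sector):
    return surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector

def data_transfer_rate(heads, track_capacity_bytes, rpm):
    return heads * track_capacity_bytes * (rpm / 60)   # bytes per second

# Example: 16 surfaces, 128 tracks per surface, 256 sectors of 512 bytes.
print(disk_pack_capacity(16, 128, 256, 512))    # 268435456 bytes = 256 MB
print(avg_rotational_latency_ms(3000))          # 10.0 msec
print(data_transfer_rate(16, 256 * 512, 3600))  # 125829120 bytes/sec = 120 MB/s
```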
Points to Remember-
• The entire disk space is not usable for storage because some space is wasted in formatting.
• When the rotational latency is not given, use the average rotational latency for solving numerical problems.
• When the seek time is not given, use the average seek time for solving numerical problems.
• It is wrong to say that as we move from one track to another away from the center, the capacity increases.
• All the tracks have the same storage capacity.
To gain a better understanding about magnetic disk, watch the Video9 lecture (self study learning).
Magnetic Disk in Computer Architecture- We have discussed-
• Magnetic disk is divided into platters, which are further divided into tracks and sectors.
• The read / write head is a mechanical arm that is used to write to and read from the disk.
• Various important formulas.
Next, we will discuss practice problems based on magnetic disk.
PRACTICE PROBLEMS BASED ON MAGNETIC DISK- Problem-01: Consider a disk pack with the following specifications- 16 surfaces, 128 tracks per surface, 256 sectors per track and 512 bytes per sector. Answer the following questions-
1. What is the capacity of the disk pack?
2. What is the number of bits required to address a sector?
3. If the format overhead is 32 bytes per sector, what is the formatted disk space?
4. If the format overhead is 64 bytes per sector, how much memory is lost due to formatting?
5. If the diameter of the innermost track is 21 cm, what is the maximum recording density?
6. If the diameter of the innermost track is 21 cm with a density of 2 KB/cm, what is the capacity of one track?
7. If the disk is rotating at 3600 RPM, what is the data transfer rate?
8. If the disk system has a rotational speed of 3000 RPM, what is the average access time with a seek time of 11.5 msec?
Solution- Given-
• Number of surfaces = 16
• Number of tracks per surface = 128
• Number of sectors per track = 256
• Number of bytes per sector = 512 bytes
Part-01: Capacity of Disk Pack-
Capacity of disk pack
= Total number of surfaces x Number of tracks per surface x Number of sectors per track x Number of bytes per sector
= 16 x 128 x 256 x 512 bytes
= 2^28 bytes
= 256 MB
Part-02: Number of Bits Required To Address a Sector-
Total number of sectors
= Total number of surfaces x Number of tracks per surface x Number of sectors per track
= 16 x 128 x 256 sectors
= 2^19 sectors
Thus, number of bits required to address a sector = 19 bits
Part-03: Formatted Disk Space-
Formatting overhead
= Total number of sectors x Overhead per sector
= 2^19 x 32 bytes
= 2^19 x 2^5 bytes
= 2^24 bytes
= 16 MB
Now,
Formatted disk space
= Total disk space – Formatting overhead
= 256 MB – 16 MB
= 240 MB
Part-04: Formatting Overhead-
Amount of memory lost due to formatting
= Formatting overhead
= Total number of sectors x Overhead per sector
= 2^19 x 64 bytes
= 2^19 x 2^6 bytes
= 2^25 bytes
= 32 MB
Part-05: Maximum Recording Density-
Storage capacity of a track
= Number of sectors per track x Number of bytes per sector
= 256 x 512 bytes
= 2^8 x 2^9 bytes
= 2^17 bytes
= 128 KB
Circumference of innermost track
= 2 x π x radius
= π x diameter
= 3.14 x 21 cm
= 65.94 cm
Now,
Maximum recording density
= Recording density of innermost track
= Capacity of a track / Circumference of innermost track
= 128 KB / 65.94 cm
= 1.94 KB/cm
Part-06: Capacity Of Track-
Circumference of innermost track
= π x diameter
= 3.14 x 21 cm
= 65.94 cm
Capacity of a track
= Storage density of the innermost track x Circumference of the innermost track
= 2 KB/cm x 65.94 cm
= 131.88 KB
≅ 132 KB
Part-07: Data Transfer Rate-
Number of rotations in one second
= (3600 / 60) rotations/sec
= 60 rotations/sec
Now,
Data transfer rate
= Number of heads x Capacity of one track x Number of rotations in one second
= 16 x (256 x 512 bytes) x 60
= 2^4 x 2^8 x 2^9 x 60 bytes/sec
= 60 x 2^21 bytes/sec
= 120 MBps
Part-08: Average Access Time-
Time taken for one full rotation
= (60 / 3000) sec
= (1 / 50) sec
= 0.02 sec
= 20 msec
Average rotational delay
= 1/2 x Time taken for one full rotation
= 1/2 x 20 msec
= 10 msec
Now,
Average access time
= Average seek time + Average rotational delay + Other factors
= 11.5 msec + 10 msec + 0
= 21.5 msec
Problem-02: What is the average access time for transferring 512 bytes of data with the following specifications-
• Average seek time = 5 msec
• Disk rotation = 6000 RPM
• Data rate = 40 KB/sec
• Controller overhead = 0.1 msec
Solution- Given-
• Average seek time = 5 msec
• Disk rotation = 6000 RPM
• Data rate = 40 KB/sec
• Controller overhead = 0.1 msec
Time Taken For One Full Rotation-
Time taken for one full rotation
= (60 / 6000) sec
= (1 / 100) sec
= 0.01 sec
= 10 msec
Average Rotational Delay-
Average rotational delay
= 1/2 x Time taken for one full rotation
= 1/2 x 10 msec
= 5 msec
Transfer Time-
Transfer time
= (512 bytes / 40 KB/sec)
= 0.0125 sec
= 12.5 msec
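The arithmetic of Problem-02 can be checked end to end with a short script. This is a minimal sketch (the variable names are illustrative, and 1 KB is taken as 1024 bytes, as the 12.5 msec transfer time implies).

```python
# Check the Problem-02 arithmetic: seek + rotational delay + transfer
# time + controller overhead.
avg_seek_ms = 5.0
rpm = 6000
data_rate_bytes_per_s = 40 * 1024
controller_overhead_ms = 0.1

avg_rotational_delay_ms = (60_000 / rpm) / 2         # half a rotation, in msec
transfer_ms = 512 / data_rate_bytes_per_s * 1000     # time to move 512 bytes

avg_access_ms = (avg_seek_ms + avg_rotational_delay_ms
                 + transfer_ms + controller_overhead_ms)
print(round(avg_access_ms, 1))  # 22.6
```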
Average Access Time-
Average access time
= Average seek time + Average rotational delay + Transfer time + Controller overhead + Queuing delay
= 5 msec + 5 msec + 12.5 msec + 0.1 msec + 0
= 22.6 msec
Problem-03: A certain moving arm disk storage with one head has the following specifications-
• Number of tracks per surface = 200
• Disk rotation speed = 2400 RPM
• Track storage capacity = 62500 bits
• Average latency = P msec
• Data transfer rate = Q bits/sec
What is the value of P and Q?
Solution- Given-
• Number of tracks per surface = 200
• Disk rotation speed = 2400 RPM
• Track storage capacity = 62500 bits
Time Taken For One Full Rotation-
Time taken for one full rotation
= (60 / 2400) sec
= (1 / 40) sec
= 0.025 sec
= 25 msec
Average Latency-
Average latency (average rotational latency)
= 1/2 x Time taken for one full rotation
= 1/2 x 25 msec
= 12.5 msec
Data Transfer Rate-
Data transfer rate
= Number of heads x Capacity of one track x Number of rotations in one second
= 1 x 62500 bits x (2400 / 60)
= 2500000 bits/sec
= 2.5 x 10^6 bits/sec
Thus, P = 12.5 and Q = 2.5 x 10^6
Problem-04: A disk pack has 19 surfaces, and the storage area on each surface has an outer diameter of 33 cm and an inner diameter of 22 cm. The maximum recording density on any track is 200 bits/cm and the minimum spacing between tracks is 0.25 mm. Calculate the capacity of the disk pack.
Solution- Given-
• Number of surfaces = 19
• Outer diameter = 33 cm
• Inner diameter = 22 cm
• Maximum recording density = 200 bits/cm
• Inter track gap = 0.25 mm
Number Of Tracks On Each Surface-
Number of tracks on each surface
= (Outer radius – Inner radius) / Inter track gap
= (16.5 cm – 11 cm) / 0.25 mm
= 55 mm / 0.25 mm
= 220 tracks
Capacity Of Each Track-
Capacity of each track
= Maximum recording density x Circumference of innermost track
= 200 bits/cm x (3.14 x 22 cm)
= 200 x 69.08 bits
= 13816 bits
= 1727 bytes
Capacity Of Disk Pack-
Capacity of disk pack
= Total number of surfaces x Number of tracks per surface x Capacity of one track
= 19 x 220 x 1727 bytes
= 7218860 bytes
≅ 6.88 MB
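The Problem-04 calculation can be verified with a few lines of code. This is a minimal sketch (the figures come from the problem statement; π is approximated as 3.14 to match the book's arithmetic).

```python
# Verify the Problem-04 capacity calculation.
surfaces = 19
outer_radius_mm, inner_radius_mm = 165, 110   # 33 cm and 22 cm diameters
track_gap_mm = 0.25
density_bits_per_cm = 200

tracks_per_surface = round((outer_radius_mm - inner_radius_mm) / track_gap_mm)
innermost_circumference_cm = 3.14 * 22                      # π x diameter
track_capacity_bytes = density_bits_per_cm * innermost_circumference_cm / 8

capacity_bytes = surfaces * tracks_per_surface * track_capacity_bytes
print(tracks_per_surface)         # 220
print(round(capacity_bytes))      # 7218860
```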
Problem-05: Consider a typical disk that rotates at 15000 RPM and has a transfer rate of 50 x 10^6 bytes/sec. If the average seek time of the disk is twice the average rotational delay and the controller’s transfer time is 10 times the disk transfer time, what is the average time (in milliseconds) to read or write a 512 byte sector of the disk?
Solution- Given-
• Rotation speed of the disk = 15000 RPM
• Transfer rate = 50 x 10^6 bytes/sec
• Average seek time = 2 x Average rotational delay
• Controller’s transfer time = 10 x Disk transfer time
Time Taken For One Full Rotation-
Time taken for one full rotation
= (60 / 15000) sec
= 0.004 sec
= 4 msec
Average Rotational Delay-
Average rotational delay
= 1/2 x Time taken for one full rotation
= 1/2 x 4 msec
= 2 msec
Average Seek Time-
Average seek time
= 2 x Average rotational delay
= 2 x 2 msec
= 4 msec
Disk Transfer Time-
Disk transfer time
= Time taken to read or write 512 bytes
= 512 bytes / (50 x 10^6 bytes/sec)
= 10.24 x 10^-6 sec
= 0.01024 msec
Controller’s Transfer Time-
Controller’s transfer time
= 10 x Disk transfer time
= 10 x 0.01024 msec
= 0.1024 msec
Average Time To Read Or Write 512 Bytes-
Average time to read or write 512 bytes
= Average seek time + Average rotational delay + Disk transfer time + Controller’s transfer time + Queuing delay
= 4 msec + 2 msec + 0.01024 msec + 0.1024 msec + 0
≅ 6.11 msec
Problem-06: A hard disk system has the following parameters-
• Number of tracks = 500
• Number of sectors per track = 100
• Number of bytes per sector = 500
• Time taken by the head to move from one track to an adjacent track = 1 msec
• Rotation speed = 600 RPM
What is the average time taken for transferring 250 bytes from the disk?
Solution- Given-
• Number of tracks = 500
• Number of sectors per track = 100
• Number of bytes per sector = 500
• Time taken by the head to move from one track to an adjacent track = 1 msec
• Rotation speed = 600 RPM
Average Seek Time-
Average seek time
= (Time taken by the head to move from track-1 to track-1 + Time taken by the head to move from track-1 to track-500) / 2
= (0 + 499 x 1 msec) / 2
= 249.5 msec
Time Taken For One Full Rotation-
Time taken for one full rotation
= (60 / 600) sec
= 0.1 sec
= 100 msec
Average Rotational Delay-
Average rotational delay
= 1/2 x Time taken for one full rotation
= 1/2 x 100 msec
= 50 msec
Capacity Of One Track-
Capacity of one track
= Number of sectors per track x Number of bytes per sector
= 100 x 500 bytes
= 50000 bytes
Data Transfer Rate-
Data transfer rate
= Number of heads x Capacity of one track x Number of rotations in one second
= 1 x 50000 bytes x (600 / 60)
= 5 x 10^5 bytes/sec
Transfer Time-
Transfer time
= (250 bytes / 5 x 10^5 bytes/sec)
= 50 x 10^-5 sec
= 0.5 msec
Average Time Taken To Transfer 250 Bytes-
Average time taken to transfer 250 bytes
= Average seek time + Average rotational delay + Transfer time + Controller overhead + Queuing delay
= 249.5 msec + 50 msec + 0.5 msec + 0 + 0
= 300 msec
Problem-07: A hard disk has 63 sectors per track, 10 platters each with 2 recording surfaces and 1000 cylinders. The address of a sector is given as a triple (c, h, s), where c is the cylinder number, h is the surface number and s is the sector number. Thus, the 0th sector is addressed as (0, 0, 0), the 1st sector as (0, 0, 1) and so on.
Part-01: The address <400, 16, 29> corresponds to sector number-
A. 505035
B. 505036
C. 505037
D. 505038
Part-02: The address of the 1039th sector is-
A. <0, 15, 31>
B. <0, 16, 30>
C. <0, 16, 31>
D. <0, 17, 31>
Solution-
Know this Concept-
In general, when the counting of items starts from 0, then-
• For any item-n, the number ‘n’ specifies the number of items that must be crossed in order to reach that item.
Example- If counting starts from 0, then-
• To reach cylinder-5, the number of cylinders that must be crossed = 5 cylinders
• To reach surface-5, the number of surfaces that must be crossed = 5 surfaces
• To reach sector-5, the number of sectors that must be crossed = 5 sectors
To solve this question, we assume there is only one track on each surface.
Part-01: We have to calculate the sector number for the address <400, 16, 29>.
Step-01: To reach our desired cylinder, we have to cross 400 cylinders.
Total number of sectors crossed in 400 cylinders
= Number of cylinders x Number of surfaces per cylinder x Number of tracks per surface x Number of sectors per track
= 400 x (10 x 2) x 1 x 63
= 504000
Now, after crossing 400 cylinders (cylinder-0 to cylinder-399), we are at cylinder-400.
Step-02: To reach our desired surface, we have to cross 16 surfaces.
Total number of sectors crossed in 16 surfaces
= Number of surfaces x Number of tracks per surface x Number of sectors per track
= 16 x 1 x 63
= 1008
Now, after crossing 16 surfaces (surface-0 to surface-15) in cylinder-400, we are at surface-16.
Step-03: To reach our desired sector, we have to cross 29 sectors.
Now, after crossing 29 sectors on surface-16 of cylinder-400, we are at sector-29.
Thus, total number of sectors crossed = 504000 + 1008 + 29 = 505037
Thus,
• After crossing 505037 sectors, we are at sector-505037.
• So, the required sector number is 505037.
• Option (C) is correct.
Part-02: We have to find the address of sector-1039. Let us check all the options one by one.
Option-A: For the address <0, 15, 31>, the sector number is- 0 + (15 x 1 x 63) + 31 = 976
Option-B: For the address <0, 16, 30>, the sector number is- 0 + (16 x 1 x 63) + 30 = 1038
Option-C: For the address <0, 16, 31>, the sector number is- 0 + (16 x 1 x 63) + 31 = 1039
Option-D: For the address <0, 17, 31>, the sector number is- 0 + (17 x 1 x 63) + 31 = 1102
Thus, Option (C) is correct.
To gain a better understanding about disk calculation problems, watch the Video10A, Video10B, Video10C and Video10D lectures (self study learning).
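The step-by-step counting of Problem-07 can be written as a pair of conversion functions. This is a minimal sketch (the function names are illustrative; the geometry of 20 surfaces per cylinder, 1 track per surface and 63 sectors per track is taken from the problem).

```python
# Convert between a (c, h, s) address and a linear sector number.
def chs_to_sector(c, h, s, surfaces=20, sectors_per_track=63):
    # Sectors crossed in c full cylinders, plus h full surfaces, plus s sectors.
    return c * surfaces * sectors_per_track + h * sectors_per_track + s

def sector_to_chs(n, surfaces=20, sectors_per_track=63):
    per_cylinder = surfaces * sectors_per_track
    c, rest = divmod(n, per_cylinder)
    h, s = divmod(rest, sectors_per_track)
    return c, h, s

print(chs_to_sector(400, 16, 29))   # 505037
print(sector_to_chs(1039))          # (0, 16, 31)
```

The second function inverts the first, so Part-02 can be answered directly instead of checking each option.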
Addressing Modes-
The different ways of specifying the location of an operand in an instruction are
called as addressing modes.
Types of Addressing Modes- In computer architecture, the commonly used addressing modes are implied, stack, immediate, direct, indirect, register direct, register indirect, relative, indexed, base register and auto-increment addressing modes.
Next, we will discuss these addressing modes in detail.
1. Implied Addressing Mode- In this addressing mode,
• The definition of the instruction itself specifies the operands implicitly.
• It is also called implicit addressing mode.
Examples-
• The instruction “Complement Accumulator” is an implied mode instruction.
• In a stack organized computer, Zero Address Instructions are implied mode instructions (since the operands are always implied to be present on the top of the stack).
2. Stack Addressing Mode- In this addressing mode,
• The operand is contained at the top of the stack.
Example- ADD
• This instruction pops the two operands at the top of the stack.
• The addition of those two operands is performed.
• The result so obtained is pushed back onto the top of the stack.
3. Immediate Addressing Mode- In this addressing mode,
• The operand is specified in the instruction explicitly.
• Instead of an address field, an operand field is present that contains the operand.
Examples-
• ADD 10 will increment the value stored in the accumulator by 10.
• MOV R, #20 initializes register R to the constant value 20.
4. Direct Addressing Mode- In this addressing mode,
• The address field of the instruction contains the effective address of the operand.
• Only one reference to memory is required to fetch the operand.
• It is also called absolute addressing mode.
Example-
• ADD X will increment the value stored in the accumulator by the value stored at memory location X.
AC ← AC + [X]
5. Indirect Addressing Mode- In this addressing mode,
• The address field of the instruction specifies the address of the memory location that contains the effective address of the operand.
• Two references to memory are required to fetch the operand.
Example-
• ADD X will increment the value stored in the accumulator by the value stored at the memory location specified by the contents of X.
AC ← AC + [[X]]
6. Register Direct Addressing Mode- In this addressing mode,
• The operand is contained in a register.
• The address field of the instruction refers to a CPU register that contains the operand.
• No reference to memory is required to fetch the operand.
Example-
• ADD R will increment the value stored in the accumulator by the content of register R.
AC ← AC + [R]
NOTE- It is interesting to note-
• This addressing mode is similar to direct addressing mode.
• The only difference is that the address field of the instruction refers to a CPU register instead of a main memory location.
7. Register Indirect Addressing Mode- In this addressing mode,
• The address field of the instruction refers to a CPU register that contains the effective address of the operand.
• Only one reference to memory is required to fetch the operand.
Example-
• ADD R will increment the value stored in the accumulator by the content of the memory location specified in register R.
AC ← AC + [[R]]
NOTE- It is interesting to note-
• This addressing mode is similar to indirect addressing mode.
• The only difference is that the address field of the instruction refers to a CPU register.
8. Relative Addressing Mode- In this addressing mode,
•
Effective address of the operand is obtained by adding the content of program counter with the address part of the instruction.
Effective Address
= Content of Program Counter + Address part of the instruction
145 | Page
NOTE-
•
Program counter (PC) always contains the address of the next instruction to be executed.
•
After fetching the address of the instruction, the value of program counter immediately increases.
•
The value increases irrespective of whether the fetched instruction has completely executed or not.
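The ADD examples above (direct, indirect, register direct, register indirect) differ only in how many dereferences stand between the address field and the operand. A minimal Python sketch of this accumulator machine, with made-up memory and register contents chosen only for illustration:

```python
# Hypothetical accumulator machine; memory and register values are made up.
memory = {100: 7, 7: 42}
registers = {"R": 100}

def add_direct(ac, x):
    """ADD X: AC <- AC + [X] (one memory reference)."""
    return ac + memory[x]

def add_indirect(ac, x):
    """ADD X: AC <- AC + [[X]] (two memory references)."""
    return ac + memory[memory[x]]

def add_register_direct(ac, r):
    """ADD R: AC <- AC + [R] (no memory reference)."""
    return ac + registers[r]

def add_register_indirect(ac, r):
    """ADD R: AC <- AC + [[R]] (one memory reference)."""
    return ac + memory[registers[r]]

print(add_direct(0, 100))             # [100] = 7
print(add_indirect(0, 100))           # [[100]] = [7] = 42
print(add_register_direct(0, "R"))    # [R] = 100
print(add_register_indirect(0, "R"))  # [[R]] = [100] = 7
```

Note how the number of memory references (0, 1 or 2) matches the bullet points for each mode.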
9. Indexed Addressing Mode- In this addressing mode,
• Effective address of the operand is obtained by adding the content of the index register to the address part of the instruction.
Effective Address = Content of Index Register + Address part of the instruction
10. Base Register Addressing Mode- In this addressing mode,
• Effective address of the operand is obtained by adding the content of the base register to the address part of the instruction.
Effective Address = Content of Base Register + Address part of the instruction
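Relative, indexed and base register addressing all compute the effective address with the same addition; they differ only in which register supplies the base. A short sketch, with made-up register values:

```python
def effective_address(register_content, address_part):
    """EA = content of (PC / index register / base register) + address part."""
    return register_content + address_part

# Hypothetical register contents, for illustration only:
pc, index_reg, base_reg = 2000, 50, 4000
print(effective_address(pc, 24))         # relative: 2000 + 24 = 2024
print(effective_address(index_reg, 600)) # indexed: 50 + 600 = 650
print(effective_address(base_reg, 24))   # base register: 4000 + 24 = 4024
```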
11. Auto-Increment Addressing Mode-
• This addressing mode is a special case of Register Indirect Addressing Mode where-
Effective Address of the Operand = Content of Register
In this addressing mode,
• After accessing the operand, the content of the register is automatically incremented by step size ‘d’.
• Step size ‘d’ depends on the size of the operand accessed.
• Only one reference to memory is required to fetch the operand.
Example-
Assume operand size = 2 bytes and register RAUTO initially contains 3300. Here,
• After fetching the operand 6B stored at address 3300, the register RAUTO will be automatically incremented by 2.
• Then, the updated value of RAUTO will be 3300 + 2 = 3302.
• At memory address 3302, the next operand will be found.
NOTE- In auto-increment addressing mode,
• First, the operand value is fetched.
• Then, the register RAUTO value is incremented by step size ‘d’.
12. Auto-Decrement Addressing Mode-
• This addressing mode is again a special case of Register Indirect Addressing Mode where-
Effective Address of the Operand = Content of Register – Step Size
In this addressing mode,
• First, the content of the register is decremented by step size ‘d’.
• Step size ‘d’ depends on the size of the operand accessed.
• After decrementing, the operand is read.
• Only one reference to memory is required to fetch the operand.
Example-
Assume operand size = 2 bytes and register RAUTO initially contains 3302. Here,
• First, the register RAUTO will be decremented by 2.
• Then, the updated value of RAUTO will be 3302 – 2 = 3300.
• At memory address 3300, the operand will be found.
NOTE- In auto-decrement addressing mode,
• First, the register RAUTO value is decremented by step size ‘d’.
• Then, the operand value is fetched.
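The fetch-then-increment and decrement-then-fetch orderings can be sketched directly. The memory contents below mirror the 3300/6B example; the value at 3302 is made up:

```python
def auto_increment_fetch(memory, reg, d):
    """Fetch the operand at [reg], then increment the register by step size d."""
    operand = memory[reg["value"]]
    reg["value"] += d
    return operand

def auto_decrement_fetch(memory, reg, d):
    """Decrement the register by step size d first, then fetch the operand."""
    reg["value"] -= d
    return memory[reg["value"]]

# Operand size d = 2 bytes, RAUTO = 3300, memory[3300] = 0x6B as in the example;
# the operand 0x1F at 3302 is a made-up "next operand".
memory = {3300: 0x6B, 3302: 0x1F}
r_auto = {"value": 3300}
print(hex(auto_increment_fetch(memory, r_auto, 2)))  # fetches 0x6b, RAUTO -> 3302
print(r_auto["value"])                               # 3302
print(hex(auto_decrement_fetch(memory, r_auto, 2)))  # RAUTO -> 3300, fetches 0x6b
```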
Applications of Addressing Modes-
Immediate Addressing Mode-
• To initialize registers to a constant value
Direct Addressing Mode and Register Direct Addressing Mode-
• To access static data
• To implement variables
Indirect Addressing Mode and Register Indirect Addressing Mode-
• To implement pointers, because pointers are memory locations that store the address of another variable
• To pass an array as a parameter, because the array name is the base address and a pointer is needed to point to the address
Relative Addressing Mode-
• For program relocation at run time i.e. for position independent code
• To change the normal sequence of execution of instructions
• For branch type instructions, since it directly updates the program counter
Indexed Addressing Mode-
• For array implementation or array addressing
• For records implementation
Base Register Addressing Mode-
• For writing relocatable code i.e. for relocation of a program in memory even at run time
• For handling recursive procedures
Auto-Increment Addressing Mode and Auto-Decrement Addressing Mode-
• For implementing loops
• For stepping through arrays in a loop
• For implementing a stack with push and pop
To gain better understanding about Addressing Modes,
Watch the Video11 Lecture for Self Study Learning.
Addressing Modes in Computer Architecture- We have discussed-
• Addressing modes are the different ways of specifying the location of an operand.
• Various kinds of addressing modes and their applications.
Next, we will discuss the syntax used for addressing modes.
Syntax Of Addressing Modes-
• Computer instructions may use different addressing modes.
• Different addressing modes use different syntax for their representation.
The syntax used for each addressing mode is discussed below-
1. Immediate Addressing Mode-
Syntax- #Expression
Interpretation-
Operand = Expression
Examples-
• Load R1, #1000 is interpreted as R1 ← 1000
• ADD R2, #3 is interpreted as R2 ← [R2] + 3
2. Direct Addressing Mode-
Syntax- Constant
Interpretation-
Effective address of the operand = Constant
Operand = Content of the memory location ‘Constant’
Examples-
• Load R1, 1000 is interpreted as R1 ← [1000]
• ADD R2, 3 is interpreted as R2 ← [R2] + [3]
3. Register Direct Addressing Mode-
Syntax- Rn or [Rn]
Interpretation-
Operand = Content of the register Rn
Operand = [Rn]
Examples-
• Load R1, R2 is interpreted as R1 ← [R2]
• ADD R1, R2 is interpreted as R1 ← [R1] + [R2]
4. Indirect Addressing Mode-
Syntax- @Expression or @(Expression) or (Expression)
Interpretation-
Effective address of the operand = Content of the memory location ‘Expression’
Operand = [[Expression]]
Examples-
• Load R1, @1000 is interpreted as R1 ← [[1000]]
• ADD R1, @(1000) is interpreted as R1 ← [R1] + [[1000]]
• ADD R1, (1000) is interpreted as R1 ← [R1] + [[1000]]
5. Register Indirect Addressing Mode-
Syntax- @Rn or @(Rn) or (Rn)
Interpretation-
Effective address of the operand = Content of the register Rn
Operand = [[Rn]]
Examples-
• Load R1, @R2 is interpreted as R1 ← [[R2]]
• ADD R1, @(R2) is interpreted as R1 ← [R1] + [[R2]]
• ADD R1, (R2) is interpreted as R1 ← [R1] + [[R2]]
6. Displacement Addressing Mode- This addressing mode may be-
• Relative addressing mode
• Indexed addressing mode
• Base register addressing mode
Syntax- disp(Rn)
Interpretation-
Effective address of the operand = disp + Content of the register Rn
Operand = [disp + [Rn]]
Examples-
• Load R1, 100(R2) is interpreted as R1 ← [100 + [R2]]
• ADD R1, 100(R2) is interpreted as R1 ← [R1] + [100 + [R2]]
7. Auto-Increment Addressing Mode-
Syntax- (Rn)+
Interpretation-
Effective address of the operand = Content of the register Rn
Operand = [[Rn]]
After fetching the operand, increment [Rn] by step size ‘d’
Examples-
• Load R1, (R2)+ is interpreted as R1 ← [[R2]] followed by R2 ← [R2] + d
• ADD R1, (R2)+ is interpreted as R1 ← [R1] + [[R2]] followed by R2 ← [R2] + d
8. Auto-Decrement Addressing Mode-
Syntax- -(Rn)
Interpretation-
Before fetching the operand, decrement [Rn] by step size ‘d’
Then, Effective address of the operand = Content of the register Rn
Operand = [[Rn]]
Examples-
• Load R1, -(R2) is interpreted as R2 ← [R2] – d followed by R1 ← [[R2]]
• ADD R1, -(R2) is interpreted as R2 ← [R2] – d followed by R1 ← [R1] + [[R2]]
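The per-mode syntax rules above can be collected into a single operand resolver. The following Python sketch is illustrative only: the token grammar, the assumed step size d = 1, and the tiny memory/register dictionaries are assumptions for demonstration, not a real assembler:

```python
import re

def resolve_operand(token, memory, regs):
    """Return the operand value for one source token, per the syntax rules above."""
    if token.startswith("#"):                     # immediate: #Expression
        return int(token[1:])
    m = re.fullmatch(r"(-?)(\d*)\(R(\d+)\)\+?", token)
    if m:                                         # disp(Rn), (Rn)+, -(Rn)
        disp = int(m.group(2) or 0)
        r = "R" + m.group(3)
        if m.group(1) == "-":                     # auto-decrement: -(Rn)
            regs[r] -= 1                          # step size d = 1 assumed
            return memory[regs[r]]
        if token.endswith("+"):                   # auto-increment: (Rn)+
            val = memory[regs[r]]
            regs[r] += 1
            return val
        return memory[disp + regs[r]]             # displacement: disp(Rn)
    if token.startswith("@"):                     # indirect: @addr, @Rn, @(...)
        inner = token[1:].strip("()")
        ea = regs[inner] if inner in regs else memory[int(inner)]
        return memory[ea]
    if token in regs:                             # register direct: Rn
        return regs[token]
    return memory[int(token)]                     # direct: Constant

memory = {1000: 5, 1001: 9, 5: 77}
regs = {"R2": 1000}
print(resolve_operand("#3", memory, regs))      # immediate: 3
print(resolve_operand("1000", memory, regs))    # direct: [1000] = 5
print(resolve_operand("@1000", memory, regs))   # indirect: [[1000]] = [5] = 77
print(resolve_operand("R2", memory, regs))      # register direct: 1000
print(resolve_operand("@R2", memory, regs))     # register indirect: [1000] = 5
print(resolve_operand("1(R2)", memory, regs))   # displacement: [1001] = 9
```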
To gain better understanding about Syntax of Addressing Modes,
Watch the Video12A, Video12B & Video12C Lectures for Self Study Learning.
Addressing Modes in Computer Architecture- We have discussed-
• Addressing modes are the different ways of specifying the location of an operand.
• The syntax used for representing the various addressing modes.
Next, we will discuss practice problems based on addressing modes.
PRACTICE PROBLEMS BASED ON ADDRESSING MODES-
Problem-01:
The most appropriate matching for the following pairs is-
Column-1:
X: Indirect addressing
Y: Immediate addressing
Z: Auto decrement addressing
Column-2:
1. Loops
2. Pointers
3. Constants
A. X-3, Y-2, Z-1 B. X-1, Y-3, Z-2 C. X-2, Y-3, Z-1 D. X-3, Y-1, Z-2
Solution- Option (C) is correct.
Problem-02:
In the absolute addressing mode,
A. The operand is inside the instruction
B. The address of the operand is inside the instruction
C. The register containing the address of the operand is specified inside the instruction
D. The location of the operand is implicit
Solution-
Option (B) is correct.
Problem-03:
Which of the following addressing modes are suitable for program relocation at run time?
1. Absolute addressing
2. Base addressing
3. Relative addressing
4. Indirect addressing
A. 1 and 4 B. 1 and 2 C. 2 and 3 D. 1, 2 and 4
Solution- Option (C) is correct.
Problem-04:
What is the most appropriate match for the items in the first column with the items in the second column-
Column-1:
1. Array implementation
2. Writing relocatable code
3. Passing array as parameter
Column-2:
X: Indirect addressing
Y: Indexed addressing
Z: Base register addressing
A. X-3, Y-1, Z-2 B. X-2, Y-3, Z-1 C. X-3, Y-2, Z-1 D. X-1, Y-3, Z-2
Solution- Option (A) is correct.
Problem-05:
Which of the following addressing modes permits relocation without any change whatsoever in the code?
A. Indirect addressing B. Indexed addressing C. Base register addressing D. PC relative addressing
Solution- Option (C) is correct.
Problem-06:
Consider a three word machine instruction-
ADD A[R0], @B
The first operand (destination) “A[R0]” uses indexed addressing mode with R0 as the index register. The second operand (source) “@B” uses indirect addressing mode. A and B are memory addresses residing at the second and the third words, respectively. The first word of the instruction specifies the opcode, the index register designation and the source and destination addressing modes. During execution of the ADD instruction, the two operands are added and stored in the destination (first operand). The number of memory cycles needed during the execution cycle of the instruction is-
A. 3 B. 4 C. 5 D. 6
Solution- For the first operand,
• It uses indexed addressing mode.
• Thus, one memory cycle will be needed to fetch the operand.
For the second operand,
• It uses indirect addressing mode.
• Thus, two memory cycles will be needed to fetch the operand.
After fetching the two operands,
• The operands will be added and the result is stored back in memory.
• Thus, one memory cycle will be needed to store the result.
Total number of memory cycles needed = 1 + 2 + 1 = 4
Thus, Option (B) is correct.
Problem-07:
Consider a hypothetical processor with an instruction of type LW R1, 20(R2), which during execution reads a 32-bit word from memory and stores it in a 32-bit register R1. The effective address of the memory location is obtained by the addition of a constant 20 and the contents of register R2. Which of the following best reflects the addressing mode implemented by this instruction for the operand in memory?
A. Immediate Addressing B. Register Addressing C. Register Indirect Scaled Addressing D. Base Indexed Addressing
Solution- Clearly, the instruction uses base indexed addressing mode. Thus, Option (D) is correct.
Problem-08:
The memory locations 1000, 1001 and 1020 have data values 18, 1 and 16 respectively before the following program is executed.
MOVI Rs, 1 (Move immediate)
LOAD Rd, 1000(Rs) (Load from memory)
ADDI Rd, 1000 (Add immediate)
STOREI 0(Rd), 20 (Store immediate)
Which of the statements below is TRUE after the program is executed?
A. Memory location 1000 has value 20 B. Memory location 1020 has value 20 C. Memory location 1021 has value 20 D. Memory location 1001 has value 20
Solution- Before the execution of the program, memory locations 1000, 1001 and 1020 contain the values 18, 1 and 16 respectively.
Now, let us execute the program instructions one by one-
Instruction-01: MOVI Rs, 1
• This instruction uses immediate addressing mode.
• The instruction is interpreted as Rs ← 1.
• Thus, value = 1 is moved to the register Rs.
Instruction-02: LOAD Rd, 1000(Rs)
• This instruction uses displacement addressing mode.
• The instruction is interpreted as Rd ← [1000 + [Rs]].
• Value of the operand = [1000 + [Rs]] = [1000 + 1] = [1001] = 1.
• Thus, value = 1 is moved to the register Rd.
Instruction-03: ADDI Rd, 1000
• This instruction uses immediate addressing mode.
• The instruction is interpreted as Rd ← [Rd] + 1000.
• Value of the operand = [Rd] + 1000 = 1 + 1000 = 1001.
• Thus, value = 1001 is moved to the register Rd.
Instruction-04: STOREI 0(Rd), 20
• This instruction uses displacement addressing mode.
• The instruction is interpreted as [0 + [Rd]] ← 20.
• Value of the destination address = 0 + [Rd] = 0 + 1001 = 1001.
• Thus, value = 20 is moved to the memory location 1001.
Thus,
• After the program execution is completed, memory location 1001 has value 20.
• Option (D) is correct.
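The four-instruction trace of Problem-08 can be re-checked by direct simulation, with the register file and memory modelled as Python dictionaries:

```python
# Simulating Problem-08: memory values before execution.
memory = {1000: 18, 1001: 1, 1020: 16}
regs = {}

regs["Rs"] = 1                           # MOVI Rs, 1
regs["Rd"] = memory[1000 + regs["Rs"]]   # LOAD Rd, 1000(Rs)  -> [1001] = 1
regs["Rd"] = regs["Rd"] + 1000           # ADDI Rd, 1000      -> 1001
memory[0 + regs["Rd"]] = 20              # STOREI 0(Rd), 20   -> M[1001] = 20

print(memory[1001])  # 20, matching option (D)
```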
Problem-09:
Consider the following memory values and a one-address machine with an accumulator. What values do the following instructions load into the accumulator?
• Word 20 contains 40
• Word 30 contains 50
• Word 40 contains 60
• Word 50 contains 70
Instructions are-
1. Load immediate 20
2. Load direct 20
3. Load indirect 20
4. Load immediate 30
5. Load direct 30
6. Load indirect 30
Solution-
Instruction-01: Load immediate 20
• This instruction uses immediate addressing mode.
• The instruction is interpreted as Accumulator ← 20.
• Thus, value 20 is loaded into the accumulator.
Instruction-02: Load direct 20
• This instruction uses direct addressing mode.
• The instruction is interpreted as Accumulator ← [20].
• It is given that word 20 contains 40.
• Thus, value 40 is loaded into the accumulator.
Instruction-03: Load indirect 20
• This instruction uses indirect addressing mode.
• The instruction is interpreted as Accumulator ← [[20]].
• It is given that word 20 contains 40 and word 40 contains 60.
• Thus, value 60 is loaded into the accumulator.
Instruction-04: Load immediate 30
• This instruction uses immediate addressing mode.
• The instruction is interpreted as Accumulator ← 30.
• Thus, value 30 is loaded into the accumulator.
Instruction-05: Load direct 30
• This instruction uses direct addressing mode.
• The instruction is interpreted as Accumulator ← [30].
• It is given that word 30 contains 50.
• Thus, value 50 is loaded into the accumulator.
Instruction-06: Load indirect 30
• This instruction uses indirect addressing mode.
• The instruction is interpreted as Accumulator ← [[30]].
• It is given that word 30 contains 50 and word 50 contains 70.
• Thus, value 70 is loaded into the accumulator.
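The six loads of Problem-09 follow the same three rules, so they can be checked in a few lines of Python:

```python
# One-address machine for Problem-09: the given word contents.
word = {20: 40, 30: 50, 40: 60, 50: 70}

def load_immediate(x):
    return x              # AC <- x

def load_direct(x):
    return word[x]        # AC <- [x]

def load_indirect(x):
    return word[word[x]]  # AC <- [[x]]

for x in (20, 30):
    print(load_immediate(x), load_direct(x), load_indirect(x))
# 20 40 60
# 30 50 70
```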
For further comprehension,
Watch the Video13A, Video13B & Video13C Lectures for Self Study Learning.
System Bus in Computer Architecture- What Is A System Bus?
• A bus is a set of electrical wires (lines) that connects the various hardware components of a computer system.
• It works as a communication pathway through which information flows from one hardware component to another.
A bus that connects the major components (CPU, memory and I/O devices) of a computer system is called a System Bus.
Why Do We Need A Bus?
• A computer system is made of different components such as memory, ALU, registers etc.
• Each component should be able to communicate with the others for proper execution of instructions and information flow.
• If we tried to implement a mesh topology among the different components, it would be really expensive.
• So, we use a common pathway to connect all the necessary components i.e. the BUS.
Components Of A System Bus- The system bus consists of three major components-
1. Data Bus
2. Address Bus
3. Control Bus
Let us learn about each component one by one.
1) Data Bus-
• As the name suggests, the data bus is used for transmitting data / instructions from CPU to memory/IO and vice-versa.
• It is bi-directional.
Data Bus Width-
• The width of a data bus refers to the number of bits (electrical wires) that the bus can carry at a time.
• Each line carries 1 bit at a time. So, the number of lines in the data bus determines how many bits can be transferred in parallel.
• The width of the data bus is an important parameter because it determines how much data can be transmitted at one time.
• The wider the bus, the faster the data flow on the data bus and the better the system performance.
Examples-
• A 32-bit bus has thirty two (32) wires and thus can transmit 32 bits of data at a time.
• A 64-bit bus has sixty four (64) wires and thus can transmit 64 bits of data at a time.
2) Control Bus-
• As the name suggests, the control bus is used to transfer control and timing signals from one component to another.
• The CPU uses the control bus to communicate with the devices that are connected to the computer system.
• The CPU transmits different types of control signals to the system components.
• It is bi-directional.
What Are Control & Timing Signals?
Control signals are generated in the control unit of the CPU.
Timing signals are used to synchronize the memory and I/O operations with the CPU clock.
Typical control signals carried by the control bus-
• Memory Read – Data from the memory address location is to be placed on the data bus.
• Memory Write – Data from the data bus is to be placed at the memory address location.
• I/O Read – Data from the I/O address location is to be placed on the data bus.
• I/O Write – Data from the data bus is to be placed at the I/O address location.
Other control signals carried by the control bus are interrupt, interrupt acknowledge, bus request, bus grant and several others. The type of action taking place on the system bus is indicated by these control signals.
Example-
When the CPU wants to read or write data, it sends the memory read or memory write control signal on the control bus to perform the read or write operation on the main memory. Similarly, when the processor wants to read from an I/O device, it generates the I/O read signal.
3) Address Bus-
• As the name suggests, the address bus is used to carry addresses from the CPU to memory/IO devices.
• It is used to identify a particular location in memory.
• It carries the source or destination address of data i.e. where to store the data or from where to retrieve it.
• It is uni-directional.
Example-
When the CPU wants to read or write data, it sends the memory read or memory write control signal on the control bus, and the address of the memory location is sent on the address bus. If the CPU wants to read data stored at memory location (address) 4, it sends the value 4 in binary on the address bus.
Address Bus Width-
• The width of the address bus determines the amount of physical memory addressable by the processor.
• In other words, it determines the size of the memory that the computer can use.
• The wider the address bus, the more memory a computer is able to use.
• The addressing capacity of the system can be increased by adding more address lines.
Examples-
• An address bus that consists of 16 wires can convey 2^16 (= 64K) different addresses.
• An address bus that consists of 32 wires can convey 2^32 (= 4G) different addresses.
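Both examples follow from the same power-of-two rule; a quick sketch:

```python
def addressable_locations(address_lines):
    """An n-line address bus can convey 2**n distinct addresses."""
    return 2 ** address_lines

print(addressable_locations(16))  # 65536 (64K)
print(addressable_locations(32))  # 4294967296 (4G)
```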
PRACTICE PROBLEMS BASED ON SYSTEM BUS-
Problem-01:
Which component of the system bus is used to designate the source or destination of the data on the bus itself?
A. Control bus B. Data bus C. Address bus D. System bus
Solution- The correct option is (C) Address bus. The address bus carries the source or destination address of data i.e. where to store the data or from where to retrieve it.
Problem-02:
The bus which is used to transfer data from main memory to a peripheral device is-
A. Data bus B. Input bus C. DMA bus D. Output bus
Solution- The correct option is (A) Data bus. The data bus carries data / instructions from CPU to memory/IO and vice-versa.
Problem-03:
How many memory locations can a system with a 32-bit address bus address?
A. 2^8 B. 2^16 C. 2^32 D. 2^64
Solution- The correct option is (C) 2^32. A 32-bit address bus can address 2^32 memory locations.
Problem-04:
How many bits can be transmitted at a time using a bus with 32 data lines?
A. 8 bits B. 16 bits C. 32 bits D. 1024 bits
Solution- Each line carries one bit. So, a bus with 32 data lines can transmit 32 bits at a time. The correct option is (C).
Problem-05:
A microprocessor has a data bus with 64 lines and an address bus with 32 lines. The maximum number of bits that can be stored in memory is-
A. 32 x 2^12 B. 32 x 2^64 C. 64 x 2^32 D. 64 x 2^64
Solution- The correct option is (C) 64 x 2^32.
The number of memory locations that can be addressed is 2^32. Since the data bus has 64 lines, each location stores 64 bits. Thus, the maximum number of bits stored in memory is 2^32 x 64 bits.
Problem-06:
The address bus width for a ROM of size 1024 x 8 bits is-
A. 8 bits B. 10 bits C. 12 bits D. 16 bits
Solution- The correct option is (B) 10 bits. The size of the ROM is 1024 x 8 = 2^10 x 8. Here, 10 indicates the address bus width and 8 indicates the data bus width.
Problem-07:
The data bus width of a ROM of size 2048 x 8 bits is-
A. 8 B. 10 C. 12 D. 16
Solution- The correct option is (A) 8. The size of the ROM is 2048 x 8 = 2^11 x 8. Here, 11 indicates the address bus width and 8 indicates the data bus width.
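Problems 06 and 07 apply the same rule: for a memory of size L x w bits, the address bus width is log2(L) and the data bus width is w. A sketch:

```python
import math

def bus_widths(locations, bits_per_location):
    """For a memory of size locations x bits_per_location, return
    (address bus width, data bus width)."""
    return math.ceil(math.log2(locations)), bits_per_location

print(bus_widths(1024, 8))  # (10, 8), as in Problem-06
print(bus_widths(2048, 8))  # (11, 8), as in Problem-07
```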
Pipelining in Computer Architecture
Introduction-
• A program consists of a number of instructions.
• These instructions may be executed in the following two ways-
1. Non-Pipelined Execution
2. Pipelined Execution
1. Non-Pipelined Execution- In non-pipelined architecture,
• All the instructions of a program are executed sequentially one after the other.
• A new instruction executes only after the previous instruction has executed completely.
• This style of executing the instructions is highly inefficient.
Example- Consider a program consisting of three instructions. In a non-pipelined architecture, these instructions execute one after the other.
If the time taken for executing one instruction = t, then-
Time taken for executing ‘n’ instructions = n x t
2. Pipelined Execution- In pipelined architecture,
• Multiple instructions are executed in parallel (at the same time).
• This style of executing the instructions is highly efficient.
Now, let us discuss instruction pipelining in detail.
Instruction Pipelining-
Instruction pipelining is a technique that implements a form of parallelism called instruction level parallelism within a single processor.
• A pipelined processor does not wait until the previous instruction has executed completely.
• Rather, it fetches the next instruction and begins its execution.
Pipelined Architecture- In pipelined architecture,
• The hardware of the CPU is split up into several functional units.
• Each functional unit performs a dedicated task.
• The number of functional units may vary from processor to processor.
• These functional units are called the stages of the pipeline.
• The control unit manages all the stages using control signals.
• There is a register associated with each stage that holds the data.
• There is a global clock that synchronizes the working of all the stages.
• At the beginning of each clock cycle, each stage takes its input from its register.
• Each stage then processes the data and feeds its output to the register of the next stage.
Four-Stage Pipeline- In four-stage pipelined architecture, the execution of each instruction is completed in the following 4 stages-
1. Instruction fetch (IF)
2. Instruction decode (ID)
3. Instruction execute (IE)
4. Write back (WB)
To implement the four-stage pipeline,
• The hardware of the CPU is divided into four functional units.
• Each functional unit performs a dedicated task.
Stage-01: At stage-01,
• The first functional unit performs instruction fetch.
• It fetches the instruction to be executed.
Stage-02: At stage-02,
• The second functional unit performs instruction decode.
• It decodes the instruction to be executed.
Stage-03: At stage-03,
• The third functional unit performs instruction execution.
• It executes the instruction.
Stage-04: At stage-04,
• The fourth functional unit performs write back.
• It writes back the result obtained after executing the instruction.
Execution- In pipelined architecture,
• Instructions of the program execute in parallel.
• When one instruction goes from the nth stage to the (n+1)th stage, another instruction goes from the (n-1)th stage to the nth stage.
Phase-Time Diagram-
• A phase-time diagram shows the execution of instructions in the pipelined architecture.
• For three instructions in a four-stage pipeline, each instruction enters the pipeline one clock cycle after the previous one.
Time taken to execute three instructions in four-stage pipelined architecture = 6 clock cycles.
NOTE-
In non-pipelined architecture,
Time taken to execute three instructions would be
= 3 x Time taken to execute one instruction
= 3 x 4 clock cycles
= 12 clock cycles
Clearly, pipelined execution of instructions is far more efficient than non-pipelined execution.
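The 6-cycle and 12-cycle counts above follow from two small formulas, sketched here:

```python
def pipelined_cycles(k, n):
    """k-stage pipeline, n instructions: the first instruction takes k cycles,
    then one instruction completes per cycle."""
    return k + (n - 1)

def non_pipelined_cycles(k, n):
    """Without pipelining, every instruction takes all k cycles."""
    return k * n

print(pipelined_cycles(4, 3))      # 6 clock cycles, as in the phase-time diagram
print(non_pipelined_cycles(4, 3))  # 12 clock cycles
```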
To gain better understanding about Pipelining in Computer Architecture,
Watch the Video14 Lecture for Self Study Learning
Instruction Pipelining- In instruction pipelining,
• A form of parallelism called instruction level parallelism is implemented.
• Multiple instructions execute simultaneously.
• The efficiency of pipelined execution is more than that of non-pipelined execution.
Performance of Pipelined Execution- The following parameters serve as criteria to estimate the performance of pipelined execution-
• Speed Up
• Efficiency
• Throughput
1. Speed Up- It gives an idea of “how much faster” the pipelined execution is as compared to non-pipelined execution. It is calculated as-
Speed Up = Non-pipelined execution time / Pipelined execution time
2. Efficiency- The efficiency of pipelined execution is calculated as-
Efficiency = Speed Up / Number of stages in the pipeline
3. Throughput- Throughput is defined as the number of instructions executed per unit time. It is calculated as-
Throughput = Number of instructions executed / Total execution time
Calculation of Important Parameters- Let us learn how to calculate certain important parameters of pipelined architecture.
Consider-
• A pipelined architecture consisting of a k-stage pipeline
• Total number of instructions to be executed = n
Point-01: Calculating Cycle Time-
In pipelined architecture,
• There is a global clock that synchronizes the working of all the stages.
• The frequency of the clock is set such that all the stages are synchronized.
• At the beginning of each clock cycle, each stage reads the data from its register and processes it.
• Cycle time is the duration of one clock cycle.
There are two cases possible-
Case-01: All the stages offer the same delay-
Cycle time = Delay offered by one stage including the delay due to its register
Case-02: The stages offer different delays-
Cycle time = Maximum delay offered by any stage including the delay due to its register
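Case-02 can be checked numerically. A sketch using the same stage delays that appear later in Practice Problem-01:

```python
def cycle_time(stage_delays, register_delay):
    """Cycle time = maximum stage delay plus the register (latch) delay."""
    return max(stage_delays) + register_delay

# Stage delays 60, 50, 90, 80 ns and a 10 ns latch, as in Problem-01 below:
print(cycle_time([60, 50, 90, 80], 10))  # 100 ns
```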
Point-02: Calculating Frequency Of Clock-
Frequency of the clock (f) = 1 / Cycle time
Point-03: Calculating Non-Pipelined Execution Time-
In non-pipelined architecture,
• The instructions execute one after the other.
• The execution of a new instruction begins only after the previous instruction has executed completely.
• So, the number of clock cycles taken by each instruction = k clock cycles
Thus,
Non-pipelined execution time
= Total number of instructions x Time taken to execute one instruction
= n x k clock cycles
Point-04: Calculating Pipelined Execution Time-
In pipelined architecture,
• Multiple instructions execute in parallel.
• Number of clock cycles taken by the first instruction = k clock cycles
• After the first instruction has completely executed, one instruction comes out per clock cycle.
• So, number of clock cycles taken by each remaining instruction = 1 clock cycle
Thus,
Pipelined execution time
= Time taken to execute first instruction + Time taken to execute remaining instructions
= 1 x k clock cycles + (n-1) x 1 clock cycle
= (k + n – 1) clock cycles
Point-05: Calculating Speed Up-
Speed up
= Non-pipelined execution time / Pipelined execution time
= n x k clock cycles / (k + n – 1) clock cycles
= n x k / (k + n – 1)
= k / { 1 + (k – 1)/n }
• For a very large number of instructions, n→∞. Thus, speed up → k.
• Practically, the total number of instructions never tends to infinity.
• Therefore, speed up is always less than the number of stages in the pipeline.
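The limiting behaviour of the speed up formula can be verified numerically; a sketch:

```python
def speed_up(k, n):
    """Speed up of a k-stage pipeline over non-pipelined execution
    for n instructions: (n * k) / (k + n - 1)."""
    return (n * k) / (k + n - 1)

print(speed_up(4, 1))     # 1.0 (no gain at all for a single instruction)
print(speed_up(4, 1000))  # about 3.99, approaching but never reaching k = 4
```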
Important Notes-
Note-01:
• The aim of pipelined architecture is to execute one complete instruction in one clock cycle.
• In other words, the aim of pipelining is to maintain CPI ≅ 1.
• Ideally, a pipelined architecture executes one complete instruction per clock cycle (CPI = 1).
• Practically, it is not possible to achieve CPI ≅ 1 due to the delays introduced by the registers.
Note-02:
• The maximum speed up that can be achieved is always equal to the number of stages.
• This is achieved when efficiency becomes 100%.
• Practically, efficiency is always less than 100%.
• Therefore, speed up is always less than the number of stages in pipelined architecture.
Note-03: Under ideal conditions,
• One complete instruction is executed per clock cycle i.e. CPI = 1.
• Speed up = Number of stages in the pipelined architecture
Note-04:
• Experiments show that a 5-stage pipelined processor gives the best performance.
Note-05: In case only one instruction has to be executed, then-
• Non-pipelined execution gives better performance than pipelined execution.
• This is because delays are introduced due to the registers in pipelined architecture.
• Thus, the time taken to execute one instruction in non-pipelined architecture is less.
Note-06: High efficiency of a pipelined processor is achieved when-
• All the stages are of equal duration.
• There are no conditional branch instructions.
• There are no interrupts.
• There are no register and memory conflicts.
Performance degrades in the absence of these conditions.
To gain better understanding about Pipelining in Computer Architecture,
Watch the Video14 Lecture for Self Study Learning.
Pipelining in Computer Architecture- In instruction pipelining,
• A form of parallelism called instruction level parallelism is implemented.
• Multiple instructions execute simultaneously.
• Speed up, efficiency and throughput serve as performance measures of pipelined execution.
Next, we will discuss practice problems based on pipelining.
PRACTICE PROBLEMS BASED ON PIPELINING IN COMPUTER ARCHITECTURE-
Problem-01:
Consider a pipeline having 4 phases with durations 60, 50, 90 and 80 ns. The latch delay is 10 ns. Calculate-
1. Pipeline cycle time
2. Non-pipeline execution time
3. Speed up ratio
4. Pipeline time for 1000 tasks
5. Sequential time for 1000 tasks
6. Throughput
Solution- Given-
• A four-stage pipeline is used
• Delay of stages = 60, 50, 90 and 80 ns
• Latch delay or delay due to each register = 10 ns
Part-01: Pipeline Cycle Time-
Cycle time = Maximum delay due to any stage + Delay due to its register
= Max { 60, 50, 90, 80 } + 10 ns
= 90 ns + 10 ns
= 100 ns
Part-02: Non-Pipeline Execution Time-
Non-pipeline execution time for one instruction
= 60 ns + 50 ns + 90 ns + 80 ns
= 280 ns
Part-03: Speed Up Ratio-
Speed up = Non-pipeline execution time / Pipeline execution time
= 280 ns / Cycle time
= 280 ns / 100 ns
= 2.8
Part-04: Pipeline Time For 1000 Tasks-
Pipeline time for 1000 tasks
= Time taken for 1st task + Time taken for remaining 999 tasks
= 1 x 4 clock cycles + 999 x 1 clock cycle
= 4 x cycle time + 999 x cycle time
= 4 x 100 ns + 999 x 100 ns
= 400 ns + 99900 ns
= 100300 ns
Part-05: Sequential Time For 1000 Tasks-
Non-pipeline time for 1000 tasks
= 1000 x Time taken for one task
= 1000 x 280 ns
= 280000 ns
Part-06: Throughput-
Throughput for pipelined execution
= Number of instructions executed per unit time
= 1000 tasks / 100300 ns
≈ 9.97 x 10^6 tasks per second
Problem-02:
A four-stage pipeline has stage delays of 150, 120, 160 and 140 ns respectively. Registers are used between the stages and have a delay of 5 ns each. Assuming a constant clocking rate, the total time taken to process 1000 data items on the pipeline will be-
A. 120.4 microseconds
B. 160.5 microseconds
C. 165.5 microseconds
D. 590.0 microseconds
Solution-

Given-
• Four stage pipeline is used
• Delay of stages = 150, 120, 160 and 140 ns
• Delay due to each register = 5 ns
• 1000 data items or instructions are processed
Cycle Time-
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 150, 120, 160, 140 } + 5 ns
= 160 ns + 5 ns
= 165 ns

Pipeline Time To Process 1000 Data Items-

Pipeline time to process 1000 data items
= Time taken for 1st data item + Time taken for remaining 999 data items
= 1 x 4 clock cycles + 999 x 1 clock cycle
= 4 x cycle time + 999 x cycle time
= 4 x 165 ns + 999 x 165 ns
= 660 ns + 164835 ns
= 165495 ns
= 165.5 μs

Thus, Option (C) is correct.

Problem-03:

Consider a non-pipelined processor with a clock rate of 2.5 gigahertz and an average of 4 cycles per instruction. The same processor is upgraded to a pipelined processor with five stages, but due to the internal pipeline delay, the clock speed is reduced to 2 gigahertz. Assume there are no stalls in the pipeline. The speed up achieved in this pipelined processor is-
A. 3.2
B. 3.0
C. 2.2
D. 2.0
Solution-
Cycle Time in Non-Pipelined Processor-

Frequency of the clock = 2.5 gigahertz
Cycle time
= 1 / frequency
= 1 / (2.5 x 10^9 hertz)
= 0.4 ns

Non-Pipeline Execution Time-

Non-pipeline execution time to process 1 instruction
= Number of clock cycles taken to execute one instruction
= 4 clock cycles
= 4 x 0.4 ns
= 1.6 ns

Cycle Time in Pipelined Processor-

Frequency of the clock = 2 gigahertz
Cycle time
= 1 / frequency
= 1 / (2 x 10^9 hertz)
= 0.5 ns

Pipeline Execution Time-
Since there are no stalls in the pipeline, ideally one instruction is executed per clock cycle. So,
Pipeline execution time
= 1 clock cycle
= 0.5 ns

Speed Up-

Speed up
= Non-pipeline execution time / Pipeline execution time
= 1.6 ns / 0.5 ns
= 3.2

Thus, Option (A) is correct.

Problem-04:

The stage delays in a 4 stage pipeline are 800, 500, 400 and 300 picoseconds. The first stage is replaced with a functionally equivalent design involving two stages with respective delays 600 and 350 picoseconds. The throughput increase of the pipeline is _____%.

Solution-

Execution Time in 4 Stage Pipeline-

Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 800, 500, 400, 300 } + 0
= 800 picoseconds

Thus, Execution time in 4 stage pipeline = 1 clock cycle = 800 picoseconds.
Throughput in 4 Stage Pipeline-

Throughput
= Number of instructions executed per unit time
= 1 instruction / 800 picoseconds

Execution Time in the New 5 Stage Pipeline-

After the split, the pipeline has five stages with delays 600, 350, 500, 400 and 300 picoseconds.
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 600, 350, 500, 400, 300 } + 0
= 600 picoseconds

Thus, Execution time in the new 5 stage pipeline = 1 clock cycle = 600 picoseconds.

Throughput in the New 5 Stage Pipeline-

Throughput
= Number of instructions executed per unit time
= 1 instruction / 600 picoseconds

Throughput Increase-

Throughput increase
= { (Final throughput – Initial throughput) / Initial throughput } x 100
= { (1 / 600 – 1 / 800) / (1 / 800) } x 100
= { (800 / 600) – 1 } x 100
= (1.3333 – 1) x 100
= 0.3333 x 100
= 33.33 %
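As a quick cross-check, the Problem-04 throughput comparison can be reproduced in a few lines of Python (a sketch; the variable names are our own, and register delays are taken as zero as in the worked solution):

```python
# Problem-04: throughput before and after splitting the first stage.
old_stages = [800, 500, 400, 300]        # picoseconds, original 4 stage pipeline
new_stages = [600, 350, 500, 400, 300]   # first stage replaced by two stages

old_cycle = max(old_stages)  # 800 ps -> throughput 1/800 instructions per ps
new_cycle = max(new_stages)  # 600 ps -> throughput 1/600 instructions per ps

increase = ((1 / new_cycle) - (1 / old_cycle)) / (1 / old_cycle) * 100
print(round(increase, 2))    # 33.33
```

Note that although only the first stage changed, the bottleneck moved from the 800 ps stage to the new 600 ps stage, which is why the cycle time (and hence throughput) is set by the new maximum.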
Problem-05:

A non-pipelined single cycle processor operating at 100 MHz is converted into a synchronous pipelined processor with five stages requiring 2.5 ns, 1.5 ns, 2 ns, 1.5 ns and 2.5 ns respectively. The delay of the latches is 0.5 ns. The speed up of the pipeline processor for a large number of instructions is-
A. 4.5
B. 4.0
C. 3.33
D. 3.0
Solution-

Cycle Time in Non-Pipelined Processor-

Frequency of the clock = 100 MHz
Cycle time
= 1 / frequency
= 1 / (100 x 10^6 hertz)
= 0.01 μs

Non-Pipeline Execution Time-

Non-pipeline execution time to process 1 instruction
= Number of clock cycles taken to execute one instruction
= 1 clock cycle
= 0.01 μs
= 10 ns
Cycle Time in Pipelined Processor-

Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 2.5, 1.5, 2, 1.5, 2.5 } + 0.5 ns
= 2.5 ns + 0.5 ns
= 3 ns

Pipeline Execution Time-

Pipeline execution time
= 1 clock cycle
= 3 ns

Speed Up-

Speed up
= Non-pipeline execution time / Pipeline execution time
= 10 ns / 3 ns
= 3.33

Thus, Option (C) is correct.

Problem-06:

We have 2 designs, D1 and D2, for a synchronous pipeline processor. D1 has a 5 stage pipeline with execution times of 3 ns, 2 ns, 4 ns, 2 ns and 3 ns, while design D2 has 8 pipeline stages each with 2 ns execution time. How much time can be saved using design D2 over design D1 for executing 100 instructions?
A. 214 ns
B. 202 ns
C. 86 ns
D. 200 ns
Solution-

Cycle Time in Design D1-

Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 3, 2, 4, 2, 3 } + 0
= 4 ns

Execution Time For 100 Instructions in Design D1-

Execution time for 100 instructions
= Time taken for 1st instruction + Time taken for remaining 99 instructions
= 1 x 5 clock cycles + 99 x 1 clock cycle
= 5 x cycle time + 99 x cycle time
= 5 x 4 ns + 99 x 4 ns
= 20 ns + 396 ns
= 416 ns

Cycle Time in Design D2-

Cycle time
= Delay due to a stage + Delay due to its register
= 2 ns + 0
= 2 ns

Execution Time For 100 Instructions in Design D2-

Execution time for 100 instructions
= Time taken for 1st instruction + Time taken for remaining 99 instructions
= 1 x 8 clock cycles + 99 x 1 clock cycle
= 8 x cycle time + 99 x cycle time
= 8 x 2 ns + 99 x 2 ns
= 16 ns + 198 ns
= 214 ns

Time Saved-

Time saved
= Execution time in design D1 – Execution time in design D2
= 416 ns – 214 ns
= 202 ns

Thus, Option (B) is correct.

Problem-07:

Consider an instruction pipeline with four stages (S1, S2, S3 and S4), each with combinational circuit only. Pipeline registers are required between each stage and at the end of the last stage. Delays for the stages and for the pipeline registers are as given in the figure-
[Figure: stage delays are S1 = 5 ns, S2 = 6 ns, S3 = 11 ns, S4 = 8 ns; each pipeline register has a delay of 1 ns]
What is the approximate speed up of the pipeline in steady state under ideal conditions when compared to the corresponding non-pipeline implementation?
A. 4.0
B. 2.5
C. 1.1
D. 3.0
Solution-

Non-Pipeline Execution Time-

Non-pipeline execution time for 1 instruction
= 5 ns + 6 ns + 11 ns + 8 ns
= 30 ns

Cycle Time in Pipelined Processor-

Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 5, 6, 11, 8 } + 1 ns
= 11 ns + 1 ns
= 12 ns

Pipeline Execution Time-

Pipeline execution time
= 1 clock cycle
= 12 ns

Speed Up-
Speed up
= Non-pipeline execution time / Pipeline execution time
= 30 ns / 12 ns
= 2.5

Thus, Option (B) is correct.

Problem-08:

Consider a 4 stage pipeline processor. The number of cycles needed by the four instructions I1, I2, I3 and I4 in stages S1, S2, S3 and S4 is shown below-
What is the number of cycles needed to execute the following loop?
for(i=1 to 2) { I1; I2; I3; I4; }
A. 16
B. 23
C. 28
D. 30
Solution-

From the phase-time diagram, the number of clock cycles required to execute the loop = 23 clock cycles.

Thus, Option (B) is correct.

Problem-09:

Consider a pipelined processor with the following four stages-
IF : Instruction Fetch
ID : Instruction Decode and Operand Fetch
EX : Execute
WB : Write Back

The IF, ID and WB stages take one clock cycle each to complete the operation. The number of clock cycles for the EX stage depends on the instruction. The ADD and SUB instructions need 1 clock cycle and the MUL instruction needs 3 clock cycles in the EX stage. Operand forwarding is used in the pipelined processor. What is the number of clock cycles taken to complete the following sequence of instructions?

ADD R2, R1, R0    R2 ← R0 + R1
MUL R4, R3, R2    R4 ← R3 x R2
SUB R6, R5, R4    R6 ← R5 – R4
A. 7
B. 8
C. 10
D. 14
Solution-

From the phase-time diagram, the number of clock cycles required to execute the instructions = 8 clock cycles.

Thus, Option (B) is correct.

Problem-10:

Consider the following processors. Assume that the pipeline registers have zero latency.
P1 : 4 stage pipeline with stage latencies 1 ns, 2 ns, 2 ns, 1 ns
P2 : 4 stage pipeline with stage latencies 1 ns, 1.5 ns, 1.5 ns, 1.5 ns
P3 : 5 stage pipeline with stage latencies 0.5 ns, 1 ns, 1 ns, 0.6 ns, 1 ns
P4 : 5 stage pipeline with stage latencies 0.5 ns, 0.5 ns, 1 ns, 1 ns, 1.1 ns

Which processor has the highest peak clock frequency?
A. P1
B. P2
C. P3
D. P4
Solution-

It is given that the pipeline registers have zero latency. Thus,
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Maximum delay due to any stage

For Processor P1:
Cycle time = Max { 1 ns, 2 ns, 2 ns, 1 ns } = 2 ns
Clock frequency = 1 / Cycle time = 1 / 2 ns = 0.5 gigahertz

For Processor P2:
Cycle time = Max { 1 ns, 1.5 ns, 1.5 ns, 1.5 ns } = 1.5 ns
Clock frequency = 1 / Cycle time
= 1 / 1.5 ns
= 0.67 gigahertz

For Processor P3:
Cycle time = Max { 0.5 ns, 1 ns, 1 ns, 0.6 ns, 1 ns } = 1 ns
Clock frequency = 1 / Cycle time = 1 / 1 ns = 1 gigahertz

For Processor P4:
Cycle time = Max { 0.5 ns, 0.5 ns, 1 ns, 1 ns, 1.1 ns } = 1.1 ns
Clock frequency = 1 / Cycle time = 1 / 1.1 ns = 0.91 gigahertz

Clearly, Processor P3 has the highest peak clock frequency.

Thus, Option (C) is correct.

Problem-11:
Consider a 3 GHz (gigahertz) processor with a three-stage pipeline and stage latencies T1, T2 and T3 such that T1 = 3T2/4 = 2T3. If the longest pipeline stage is split into two pipeline stages of equal latency, the new frequency is ____ GHz, ignoring delays in the pipeline registers.

Solution-

Let T1 = t. Then, from T1 = 3T2/4 = 2T3-
• T1 = t
• T2 = 4t / 3
• T3 = t / 2
Pipeline Cycle Time-

Pipeline cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { t, 4t/3, t/2 } + 0
= 4t/3

Frequency Of Pipeline-

Frequency
= 1 / Pipeline cycle time
= 1 / (4t / 3)
= 3 / 4t

Given frequency = 3 GHz. So,
3 / 4t = 3 GHz
4t = 1 ns
t = 0.25 ns
Stage Latencies Of Pipeline-

The stage latencies are-
• Latency of stage-01 = 0.25 ns
• Latency of stage-02 = 0.33 ns
• Latency of stage-03 = 0.125 ns
Splitting The Pipeline-

The stage with the longest latency, i.e. stage-02, is split into two stages of equal latency, giving a four-stage pipeline. After splitting, the stage latencies are-
• Latency of stage-01 = 0.25 ns
• Latency of stage-02 = 0.165 ns
• Latency of stage-03 = 0.165 ns
• Latency of stage-04 = 0.125 ns
Split Pipeline Cycle Time-

Split pipeline cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 0.25, 0.165, 0.165, 0.125 } + 0
= 0.25 ns

Frequency Of Split Pipeline-

Frequency
= 1 / Split pipeline cycle time
= 1 / 0.25 ns
= 4 GHz

Thus, the new frequency = 4 GHz.
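The Problem-11 calculation can be cross-checked with a short Python sketch (our own sketch, assuming zero register delay as the problem states):

```python
# Problem-11: T1 = 3*T2/4 = 2*T3, clock = 3 GHz before the split.
# With T1 = t: T2 = 4t/3 (the longest stage), T3 = t/2.
# Cycle time = 4t/3 = 1/3 ns at 3 GHz, so t = 0.25 ns.
t = 0.25                                  # ns
stages = [t, 4 * t / 3, t / 2]            # 0.25, 0.333..., 0.125 ns
assert abs(1 / max(stages) - 3.0) < 1e-9  # original frequency is 3 GHz

# Split the longest stage into two equal halves -> a four-stage pipeline.
longest = max(stages)
stages.remove(longest)
stages += [longest / 2, longest / 2]      # 0.25, 0.125, 0.1667, 0.1667 ns

new_frequency = 1 / max(stages)           # cycle time is now 0.25 ns
print(new_frequency)                      # 4.0 (GHz)
```

After the split, stage-01 (0.25 ns) becomes the bottleneck, which is why the frequency rises to exactly 4 GHz rather than doubling.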
To gain a better understanding of pipelining problem solving, watch the Video 15A, Video 15B and Video 15C lectures for self-study learning.
Recommended Books:
• Computer Organization by Carl Hamacher
• Computer Architecture by Hennessy
• Computer Organization & Architecture by William Stallings