Computer Architecture
Memory System Design
Veton Këpuska
Chapter Outline
7.1 Introduction: The Components of the Memory System
7.2 RAM Structure: The Logic Designer’s Perspective
7.3 Memory Boards and Modules
7.4 Two-Level Memory Hierarchy
7.5 The Cache
7.6 Virtual Memory
7.7 The Memory Subsystem in the Computer

Introduction
So far, we’ve treated memory as an array of words, limited in size only by the number of address bits. Real-world issues arise:
cost
speed
size
power consumption
volatility
...

Topics Covered

Memory components:
  RAM memory cells and cell arrays
  Static RAM—more expensive, but simpler to use (no refresh needed)
  Tree and matrix decoders—needed for large RAM chips
  Dynamic RAM—less expensive, but needs “refreshing”
  Chip organization
  Timing
  ROM—read-only memory
Memory boards:
  Arrays of chips give more addresses and/or wider words
  2-D and 3-D chip arrays
Memory modules:
  Large systems can benefit by partitioning memory for
    separate access by system components
    fast access to multiple words
Topics Covered

The memory hierarchy: from fast and expensive to slow and cheap
  Example: Registers → Cache → Main Memory → Disk
  At first, consider just two adjacent levels in the hierarchy
The cache: high speed and expensive
  Kinds: direct mapped, associative, set associative
Virtual memory—makes the hierarchy transparent
  Translate the address from the CPU’s logical address to the physical address where the information is actually stored
  Memory management—how to move information back and forth
  Multiprogramming—what to do while we wait
  The Translation Lookaside Buffer, “TLB,” helps speed up the address translation process
Overall consideration of the memory as a subsystem
The CPU–Memory Interface

Sequence of events:

Read:
1. CPU loads MAR, issues Read, and REQUEST
2. Main memory transmits word(s) to MDR
3. Main memory asserts COMPLETE

Write:
1. CPU loads MAR and MDR, asserts Write, and REQUEST
2. Value in MDR is written into address in MAR
3. Main memory asserts COMPLETE
[Figure: The CPU–main memory interface. The CPU’s MAR drives the m-bit address bus (A0–A(m-1)) and its MDR connects to the b-bit data bus (D0–D(b-1)); the control lines are R/W, REQUEST, and COMPLETE. Main memory holds 2^m addressable locations of s bits each; the CPU’s register file is w bits wide.]

The CPU–Memory Interface (cont’d.)

Additional points:
If b < w, main memory must make w/b b-bit transfers.
Some CPUs allow reading and writing of word sizes < w.
  Example: Intel 8088: m = 20, w = 16, s = b = 8; 8- and 16-bit values can be read and written.
If memory is sufficiently fast, or if its response is predictable, then COMPLETE may be omitted.
Some systems use separate R and W lines, and omit REQUEST.

Some Memory Properties

Symbol    Definition                             Intel 8088     Intel 8086     PowerPC 601
w         CPU word size                          16 bits        16 bits        64 bits
m         Bits in a logical memory address       20 bits        20 bits        32 bits
s         Bits in smallest addressable unit      8 bits         8 bits         8 bits
b         Data bus size                          8 bits         16 bits        64 bits
2^m       Memory word capacity, s-sized words    2^20 words     2^20 words     2^32 words
2^m × s   Memory bit capacity                    2^20 × 8 bits  2^20 × 8 bits  2^32 × 8 bits

Big-Endian and Little-Endian Storage

When data types having a word size larger than the smallest addressable unit are stored in memory, the question arises: is the least significant part of the word stored at the lowest address (little-endian, “little end first”), or is the most significant part of the word stored at the lowest address (big-endian, “big end first”)?

Example: the hexadecimal 16-bit number ABCDH (msb AB ... lsb CD), stored at address 0:

  Little-endian:  address 0 → CD, address 1 → AB
  Big-endian:     address 0 → AB, address 1 → CD
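For illustration, a minimal C probe of the same ABCDH example; it inspects the byte at the lowest address to report which convention the host machine uses (nothing here is from the slides beyond the value ABCDH):

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint16_t word = 0xABCD;             /* the 16-bit value from the example */
    uint8_t *byte0 = (uint8_t *)&word;  /* the byte stored at the lowest address */

    if (*byte0 == 0xCD)
        printf("little-endian: CD is at the lowest address\n");
    else
        printf("big-endian: AB is at the lowest address\n");
    return 0;
}
```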

Big-Endian and Little-Endian Storage of a PowerPC 32-Bit Word

CPU word bit locations:        b31...b24 | b23...b16 | b15...b8 | b7...b0
Big-endian byte addresses:         0          1           2         3
Little-endian byte addresses:      3          2           1         0

Little-Endian Storage              Big-Endian Storage
Memory Address   Contents          Memory Address   Contents
0                b7 ... b0         0                b31 ... b24
1                b15 ... b8        1                b23 ... b16
2                b23 ... b16       2                b15 ... b8
3                b31 ... b24       3                b7 ... b0

Memory Performance Parameters

Symbol            Definition         Units      Meaning
t_a               Access time        Time       Time to access a memory word, up to assertion of the memory completion signal
t_c               Cycle time         Time       Time from the start of one read to the start of the next memory operation
k                 Block size         Words      Number of words per block
β                 Bandwidth          Words/sec  Word transmission rate
t_l               Latency            Time       Time to access the first of a sequence of words
t_bl = t_l + k/β  Block access time  Time       Time to access an entire block from the start of a read

The Memory Hierarchy, Cost, and Performance

Some typical values:

Component        CPU (registers)    Cache     Main Memory  Disk Memory  Tape Memory
Access type      Random             Random    Random       Direct       Sequential
Capacity, bytes  64–1024            8–512 KB  8–64 MB      1–10 GB      1 TB
Latency          1–10 ns            20 ns     50 ns        10 ms        10 ms–10 s
Block size       1 word             16 words  16 words     4 KB         4 KB
Bandwidth        System clock rate  8 MB/s    1 MB/s       1 MB/s       1 MB/s
Cost/MB          High               $500      $30          $0.25        $0.02

Memory Hierarchy

[Figure: The memory hierarchy. The CPU exchanges words with the cache (primary level); the cache exchanges cache lines with main memory (secondary level); main memory exchanges blocks with the hard disk (ternary level), the realm of virtual memory. Moving away from the CPU, block size and latency increase while the frequency of transactions decreases; the levels run from fast & small to slow & large, and the arrangement works because of the principle of locality.]

Memory Hierarchy
Considering Any Two Adjacent Levels of the Memory
Hierarchy
Some definitions:
Temporal locality: the property of most programs that if a
given memory location is referenced, it is likely to be
referenced again, “soon.”
Spatial locality: if a given memory location is referenced,
those locations near it are likely to be referenced “soon.”
Working set: The set of memory locations referenced over a
fixed period of time, or within a time window.
Notice that temporal and spatial locality both work to
assure that the contents of the working set change only
slowly over execution time.
[Figure: Defining the primary and secondary levels: of any two adjacent levels in the hierarchy, the one nearer the CPU is the faster, smaller primary level, and the other is the slower, larger secondary level.]

Primary and Secondary Levels of the Memory Hierarchy

•The item of commerce between any two levels is the block.
•Blocks may/will differ in size at different levels in the hierarchy.
  Example: cache block size ~ 16–64 bytes; disk block size ~ 1–4 Kbytes.
•As the working set changes, blocks are moved back and forth through the hierarchy to satisfy memory access requests.
•The primary-level block size is often determined more by the properties of information transfer than by the spatial locality principle.
•Speed between levels is defined by
  latency: the time to access the first word, and
  bandwidth: the number of words per second transmitted between levels.
•Typical latencies: cache, a few clocks; disk, about 100,000 clocks.

Primary and Secondary Addresses

•A complication: addresses will differ depending on the level.
•Primary address: the address of a value in the primary level.
•Secondary address: the address of a value in the secondary level.

Primary and secondary address examples:
  Main memory address: an unsigned integer.
  Disk address: track number, sector number, offset of word in sector.

Addressing and Accessing a Two-Level Hierarchy

The computer system, HW or SW, must perform any address translation that is required.

Two ways of forming the address: segmentation and paging. Paging is more common. Sometimes the two are used together, one “on top of” the other. More about address translation and paging later...

[Figure: The memory management unit (MMU) applies a translation function (mapping tables, permissions, etc.) to the system address. On a hit it produces the block and word address in primary memory; on a miss it produces the address in secondary memory, from which the block is fetched into the primary level.]

Primary Address Formation

[Figure: (a) Paging: the system address is split into block and word fields; the block field indexes a lookup table that supplies the primary block address, to which the word field is concatenated. (b) Segmentation: the system address is likewise split, but the lookup table supplies a base address that is added (+) to the word offset to form the primary address.]

Hits and Misses; Paging; Block Placement

Hit: the word was found at the level from which it was requested.
Miss: the word was not found at the level from which it was requested. A miss results in a request for the block containing the word from the next higher level in the hierarchy.
Hit ratio (or hit rate): h = (number of hits)/(total number of references).
Miss ratio: 1 - h.
t_p = primary memory access time; t_s = secondary memory access time.
Access time: t_a = h·t_p + (1 - h)·t_s.
Page: commonly, a disk block.
Page fault: synonymous with a miss.
Demand paging: pages are moved from disk to main memory only when a word in the page is requested by the processor.
Block placement and replacement decisions must be made each time a block is moved.

Hits and Misses; Paging; Block Placement

The Cache Regime:
Secondary latency is only about 10 times as long as the primary-level access time.
Making the CPU wait out a miss is affordable as long as the miss ratio is small.
A small, fast memory (sometimes part of the CPU), called a cache, is connected to a slower memory, typically of a different semiconductor technology.
The cache’s access time matches the processor speed.
Two-level caches:
  Primary cache (faster level)
  Secondary cache (slower level)
Since the CPU waits on a cache miss, software cannot be used in the miss response; everything has to be done in hardware. This means that placement and replacement decisions and secondary address formation must be kept simple, so they can be done quickly with limited hardware.
There is less flexibility of placement in a cache than in the primary level of a virtual memory system.

Hits and Misses; Paging; Block Placement

The Cache Miss versus the Main Memory Miss:
The cache hit time is typically 1–2 clock cycles.
The cache miss penalty is only a few tens of clock cycles, to bring the desired block into the cache from main memory.
The main memory miss penalty typically runs to 10,000–100,000 clock cycles, to bring the block from the disk.
Cache miss rates of no greater than about 2% are tolerable.
Main memory miss rates must be no greater than about 0.001% to avoid significant degradation in system performance.
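As a quick check on these tolerances, a small C sketch using the typical values from the hierarchy table (50-ns main memory, 10-ms disk; the particular miss ratios tried are illustrative):

```c
#include <stdio.h>

int main(void) {
    const double tp = 50e-9;   /* main memory access, 50 ns */
    const double ts = 10e-3;   /* disk access, 10 ms        */
    const double miss[] = { 1e-3, 1e-4, 1e-5, 1e-6 };

    for (int i = 0; i < 4; i++) {
        /* two-level access time: ta = h*tp + (1-h)*ts */
        double ta = (1.0 - miss[i]) * tp + miss[i] * ts;
        printf("miss ratio %.0e -> average access %.3e s\n", miss[i], ta);
    }
    return 0;
}
```

Even a 0.1% page-fault rate pushes the average access time to about 10 µs, hundreds of times slower than the DRAM itself, which is why the main memory miss rate must stay near 0.001%.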

Virtual Memory

A virtual memory is a memory hierarchy, usually consisting of at least main memory and disk, in which the processor issues all memory references as effective addresses in a flat address space. All translations to primary and secondary addresses are handled transparently to the process making the address reference, thus providing the illusion of a flat address space.

Recall that disk accesses may require 100,000 clock cycles to complete, due to the slow access time of the disk subsystem. Once the processor has, through mediation of the operating system, made the proper request to the disk subsystem, it is available for other tasks.

Multiprogramming shares the processor among independent programs that are resident in main memory and thus available for execution.

Decisions in Designing a 2-Level Hierarchy

Translation procedure to translate from system address to primary address.
Block size—block transfer efficiency and miss ratio will be affected.
Processor dispatch on miss—processor wait or processor multiprogrammed.
Primary-level placement—direct, associative, or a combination. Discussed later.
Replacement policy—which block is to be replaced upon a miss.
Direct access to secondary level—in the cache regime, can the processor directly access main memory upon a cache miss?
  Write-through—can the processor write directly to main memory upon a cache miss?
  Read-through—can the processor read directly from main memory upon a cache miss as the cache is being updated?
  Read or write bypass—can certain infrequent read or write misses be satisfied by a direct access of main memory without any block movement?

The Cache Mapping Function

[Figure: The CPU presents an address; the mapping function determines where the addressed word’s block resides in the cache, with blocks moving between the cache and main memory as needed. Example sizes: 256-KB cache, 16-word blocks, 32-MB main memory.]

The cache mapping function is responsible for all cache operations:
•Placement strategy: where to place an incoming block in the cache
•Replacement strategy: which block to replace upon a miss
•Read and write policy: how to handle reads and writes upon cache misses
The mapping function must be implemented in hardware.
Three different types of mapping functions:
•Associative
•Direct mapped
•Block-set associative

Memory Fields and Address Translation

Example of a processor-issued 32-bit virtual address:

  |                32 bits                |
  31                                      0

That same 32-bit address partitioned into two fields, a block field and a word field. The word field represents the offset into the block specified in the block field:

  | Block number: 26 bits | Word: 6 bits |

This gives 2^26 blocks of 64 words each.

Example of a specific memory reference, block 9, word 11:

  00 ••• 001001 | 001011
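The field extraction is plain shifting and masking. A small C sketch of the block 9, word 11 reference above (the constants mirror the 26/6 split on the slide):

```c
#include <stdio.h>
#include <stdint.h>

#define WORD_BITS 6u                        /* 64-word blocks -> 6-bit word field */

int main(void) {
    uint32_t addr = (9u << WORD_BITS) | 11u;          /* block 9, word 11 */

    uint32_t block = addr >> WORD_BITS;               /* upper 26 bits */
    uint32_t word  = addr & ((1u << WORD_BITS) - 1u); /* lower 6 bits  */

    printf("address 0x%08X -> block %u, word %u\n", addr, block, word);
    return 0;
}
```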

Associative Cache

Associative-mapped cache model: any block from main memory can be put anywhere in the cache. Assume a 16-bit main memory address.*

*16 bits, while unrealistically small, simplifies the examples.

[Figure: An associative cache of 256 lines (cache blocks 0–255), each line 8 bytes, with a tag memory holding a 13-bit tag field and a valid bit per line. Any of main memory’s 8192 blocks can occupy any cache line; e.g., lines holding MM blocks 421, 119, and 313 carry those block numbers as tags. The main memory address is split into a 13-bit tag and a 3-bit byte field, and lookup is an associative search of the tag memory.]

Associative Cache

In general:
  B – number of bytes per block
  T – number of tag bits in the tag memory (main memory holds 2^T blocks)
  L – number of blocks in the cache

  S_MM = B × 2^T – size of main memory in bytes
  S_CM = L × B – size of cache memory in bytes

Main memory address length in bits: log2(S_MM) = T + log2(B)

Associative Cache Mechanism

Because any block can reside anywhere in the cache, an associative (content-addressable) memory is used. All locations are searched simultaneously.

[Figure: The 13-bit tag field of the main memory address is loaded into the argument register and compared against every entry of the associative tag memory at once. A match with the valid bit set signals a hit and selects the corresponding cache line; the 3-bit byte field then selects the desired byte of the 8-byte line, which is gated to the CPU.]
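Software can only mimic the parallel search with a loop, but a toy C model makes the bookkeeping concrete. The 256-line, 8-byte-line, 13-bit-tag geometry is the slide’s; everything else is an illustrative sketch:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define LINES 256
#define LINE_BYTES 8

typedef struct {
    uint16_t tag[LINES];              /* 13-bit tags fit in 16 bits */
    bool     valid[LINES];
    uint8_t  data[LINES][LINE_BYTES];
} AssocCache;

/* Returns a pointer to the requested byte on a hit, NULL on a miss.
   Hardware compares all tags at once; this loop is the serial analogue. */
uint8_t *assoc_lookup(AssocCache *c, uint16_t addr) {
    uint16_t tag  = addr >> 3;        /* upper 13 bits of the 16-bit address */
    uint16_t byte = addr & 0x7u;      /* lower 3 bits: byte within the line  */

    for (size_t i = 0; i < LINES; i++)
        if (c->valid[i] && c->tag[i] == tag)
            return &c->data[i][byte]; /* match AND valid: a hit */
    return NULL;                      /* miss: fetch block from main memory */
}
```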

Advantages and Disadvantages of the Associative Mapped Cache

Advantage:
  Most flexible of all—any MM block can go anywhere in the cache.
Disadvantages:
  Large tag memory.
  The need to search the entire tag memory simultaneously means lots of hardware.
  Replacement policy is an issue when the cache is full.

Q.: How is an associative search conducted at the logic gate level?

Direct-mapped caches simplify the hardware by allowing each MM block to go into only one place in the cache:

Direct-Mapped Cache

Key idea: all the MM blocks from a given group can go into only one location in the cache, corresponding to the group number. Now the cache need only examine the single group that its reference specifies.

[Figure: A direct-mapped cache of 256 lines (one per group, groups 0–255), each line 8 bytes, with a 5-bit tag field and a valid bit per line. Main memory’s 8192 blocks form 32 columns (tags 0–31) of 256 groups; block number = tag × 256 + group, so group g can hold only blocks g, 256 + g, 512 + g, ..., 7936 + g (e.g., block 2305 has tag 9, group 1). The main memory address is split into a 5-bit tag, an 8-bit group field (the cache address), and a 3-bit byte field.]

Direct-Mapped Cache

In general:
  B – number of bytes per block
  T – number of tag bits in the tag memory (main memory holds 2^T columns of blocks)
  G – number of groups, equal to L
  L – number of blocks in the cache

  S_MM = G × B × 2^T – size of main memory in bytes
  S_CM = L × B – size of cache memory in bytes

Main memory address length in bits: log2(S_MM) = T + log2(G) + log2(B)

Direct-Mapped Cache Operation

1. Decode the group number of the incoming MM address to select the group.
2. Gate out the tag field of the selected tag memory entry.
3. Compare the cache tag with the incoming tag (5-bit comparator).
4. If they match AND the entry is valid, it is a cache hit; otherwise, a cache miss.
5. On a hit, gate out the cache line,
6. and use the word (byte) field to select the desired word.

[Figure: The 8-bit group field drives an 8-to-256 decoder that selects one entry of the tag memory; a 5-bit comparator checks the stored tag against the incoming tag, and Hit = match AND valid. On a hit, the selected 8-byte cache line is gated out and the byte field selects the word.]
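The same toy model, direct-mapped, shows how little work a lookup takes: one index, one comparison. The 5/8/3 field split is the slide’s; the code itself is an illustrative sketch:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define GROUPS 256
#define LINE_BYTES 8

typedef struct {
    uint8_t tag[GROUPS];              /* 5-bit tags fit in a byte */
    bool    valid[GROUPS];
    uint8_t data[GROUPS][LINE_BYTES];
} DirectCache;

uint8_t *direct_lookup(DirectCache *c, uint16_t addr) {
    uint16_t byte  = addr & 0x7u;          /* bits 2..0               */
    uint16_t group = (addr >> 3) & 0xFFu;  /* bits 10..3: the group   */
    uint16_t tag   = addr >> 11;           /* bits 15..11             */

    /* Only the one group is examined; hit = match AND valid. */
    if (c->valid[group] && c->tag[group] == tag)
        return &c->data[group][byte];
    return NULL;                           /* miss */
}
```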

Direct-Mapped Caches

The direct-mapped cache uses less hardware, but is much more restrictive in block placement. If two blocks from the same group are frequently referenced, then the cache will “thrash”: that is, repeatedly bring the two competing blocks into and out of the cache. This causes a performance degradation.
Block replacement strategy is trivial.
Compromise—allow several cache blocks in each group: the block-set-associative cache.

2-Way Set-Associative Cache

Example shows 256 groups, with a set of two blocks per group. This is referred to as a 2-way set-associative cache.

[Figure: As in the direct-mapped cache, the main memory address is split into a 5-bit tag, an 8-bit set field, and a 3-bit byte field, but each of the 256 sets now holds two 8-byte cache lines, each with its own 5-bit tag and valid bit. Set 0 can, for example, hold MM blocks 2304 (tag 9) and 7680 (tag 30) at the same time; in a direct-mapped cache these blocks would compete for one line.]

K-Way Set-Associative Cache

[Figure: The general case. The cache holds L = K × S lines of B bytes, organized as S sets of K ways; the tag memory holds L entries of T bits each, plus L valid bits. The main memory address is split into a T-bit tag, a log2(S)-bit set field, and a log2(B)-bit byte field, from which the primary address is formed.]

K-Way Set-Associative Cache

A 1-way set-associative cache (K = 1) is a direct-mapped cache.
A set-associative cache with a single set (S = 1) is a fully associative cache.
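A sketch that makes the two special cases visible: the lookup below searches only the K ways of one set, so setting K = 1 gives the direct-mapped case and S = 1 gives the fully associative case. The sizes chosen are illustrative:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define K 2     /* ways per set: 1 -> direct mapped       */
#define S 256   /* number of sets: 1 -> fully associative */
#define B 8     /* bytes per line                         */

typedef struct {
    uint16_t tag[S][K];
    bool     valid[S][K];
    uint8_t  data[S][K][B];
} SetAssocCache;

uint8_t *set_lookup(SetAssocCache *c, uint16_t addr) {
    uint16_t byte = addr % B;             /* log2(B) byte bits      */
    uint16_t set  = (addr / B) % S;       /* log2(S) set bits       */
    uint16_t tag  = addr / (B * S);       /* remaining bits: the tag */

    for (int w = 0; w < K; w++)           /* search only within the set */
        if (c->valid[set][w] && c->tag[set][w] == tag)
            return &c->data[set][w][byte];
    return NULL;                          /* miss in this set */
}
```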

Read/Write Policies

Cache write policies upon a cache hit:
  Write-through: updates both cache and main memory upon each write to a block that is present in the cache.
  Write-back: postpones the main memory update until the block is replaced; uses a “dirty bit” (set upon the first write to a block in the cache).
Read and write miss policies:
  Miss upon a read, either:
    Read-through: the word is forwarded immediately from MM to the CPU, then the cache is updated; or
    the block is loaded into the cache first, and the word supplied from there.
  Miss upon a write:
    Write-allocate: the block is first brought into the cache and then updated.
    Write no-allocate: write directly to main memory (without bringing the block into the cache).

Block Replacement Policy

Not needed with a direct-mapped cache.
Least Recently Used (LRU):
  Track usage with a counter. Each time a block is accessed:
    clear the counter of the accessed block;
    increment the counters with values less than the one just cleared;
    all others remain unchanged.
  When the set is full, remove the line with the highest count. (See the sketch below.)
Random replacement—replace a block at random. Even random replacement is a fairly effective strategy.
FIFO—the longest-residing block is replaced first.
Smaller cache memories: random is worst, LRU best.
Large cache memories: random and LRU are comparable.
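A minimal C sketch of the counter method described above, for one set (the 4-way width is an assumption for illustration):

```c
#include <stdint.h>

#define WAYS 4   /* assumed set width, for illustration */

/* On an access to line `hit`: counters smaller than the accessed line's
   old value are incremented, the accessed line's counter is cleared,
   and all others remain unchanged. */
void lru_touch(uint8_t ctr[WAYS], int hit) {
    uint8_t old = ctr[hit];
    for (int i = 0; i < WAYS; i++)
        if (ctr[i] < old)
            ctr[i]++;
    ctr[hit] = 0;
}

/* When the set is full, the line with the highest count is replaced. */
int lru_victim(const uint8_t ctr[WAYS]) {
    int worst = 0;
    for (int i = 1; i < WAYS; i++)
        if (ctr[i] > ctr[worst])
            worst = i;
    return worst;
}
```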

Performance and Cache Design

Direct-mapped and block-set-associative caches may suffer minor time penalties due to the need to decode the group (set) field.
For an equivalent amount of hardware (e.g., number of gates), direct-mapped and set-associative caches will have more capacity than an associative cache.
Quantitative analysis of cache performance is quite complex and is the subject of ongoing research. A simplified mathematical analysis, however, provides sufficient insight into the issues involved:

Performance and Cache Design

Definitions:
  h – hit ratio; (1 - h) – miss ratio.
  T_C – cache access time upon a cache hit.
  T_M – access time upon a cache miss.
  T_a – average memory access time during a particular machine execution interval.

Recall the two-level access time, T_a = h·T_p + (1 - h)·T_s, for the primary (p) and secondary (s) levels.

With T_p = T_C for the cache and T_s = T_M for main memory:

  T_a = h·T_C + (1 - h)·T_M

This equation is valid for a read-through cache, where the value is forwarded from main memory to the CPU directly as it is received, without waiting for the entire cache line to be loaded before initiating another cache read to forward the value. If this is not the case, T_M is replaced by the cache-line fill time.

Performance and Cache Design

Speedup S:

  S = T_without / T_with

where T_without is the time taken without the improvement (e.g., without the cache), and T_with is the time the process takes with the improvement (e.g., with the cache).

Having a model for cache and MM access times and cache-line fill time, the speedup can be calculated once the hit ratio is known.
Performance and Cache Design

Example:
Changing a certain cache from direct-mapped to two-way set-associative caused the hit ratio measured for a certain set of benchmarks to improve from 0.91 to 0.96, with T_C = 20 ns and T_M = 70 ns. What is the speedup, assuming no change in T_C from the set-associative multiplexer?

Answer:
  T_without = 0.91 × 20 + (1 - 0.91) × 70 = 24.5 ns
  T_with    = 0.96 × 20 + (1 - 0.96) × 70 = 22.0 ns
Thus the speedup is S = 24.5/22.0 ≈ 1.11, an 11% improvement in execution time.
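The arithmetic is easy to check mechanically; a few lines of C reproducing the example:

```c
#include <stdio.h>

/* Two-level average access time: Ta = h*Tc + (1-h)*Tm */
static double t_avg(double h, double tc, double tm) {
    return h * tc + (1.0 - h) * tm;
}

int main(void) {
    double t_without = t_avg(0.91, 20.0, 70.0);  /* 24.5 ns */
    double t_with    = t_avg(0.96, 20.0, 70.0);  /* 22.0 ns */
    printf("S = %.3f\n", t_without / t_with);    /* prints S = 1.114 */
    return 0;
}
```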

Virtual Memory

The memory management unit, MMU, is responsible for mapping logical addresses issued by the CPU to physical addresses that are presented to the cache and main memory.

[Figure: On the CPU chip, the CPU issues a logical address to the MMU, which consults its mapping tables to produce the physical address presented to the cache and main memory; disk accesses are made through virtual addresses.]

A word about addresses:
•Effective address—an address computed by the processor while executing a program. Synonymous with logical address. The term effective address is often used when referring to activity inside the CPU; logical address is most often used when referring to addresses as viewed from outside the CPU.
•Virtual address—the address generated from the logical address by the memory management unit, MMU, prior to making the address translation.
•Physical address—the address presented to the memory unit.
(Note: every address reference must be translated.)

Virtual Addresses—Why?

The logical address provided by the CPU is translated to a virtual address by the MMU. Often the virtual address space is larger than the logical address space, allowing program units to be mapped to a much larger virtual address space.

Virtual Addressing—Advantages

Simplified addressing. Each program unit can be compiled into its own memory space, beginning at address 0 and potentially extending far beyond the amount of physical memory present in the system.
  No address relocation is required at load time.
  No need to fragment the program to accommodate memory limitations.
Cost-effective use of physical memory.
  Less expensive secondary (disk) storage can replace primary storage. (The MMU brings portions of the program into physical memory as required; not all portions of the program need occupy physical memory at one time.)
Access control. As each memory reference is translated, it can be simultaneously checked for read, write, and execute privileges.
  This allows access/security control at the most fundamental levels.
  It can be used to prevent buggy programs and intruders from causing damage to other users or the system.
  This is the origin of those “bus error” and “segmentation fault” messages.

Memory Management

[Figure: Primary address formation, repeated from the earlier slide: (a) paging, where the block field of the system address indexes a lookup table whose output is concatenated with the word field; (b) segmentation, where the lookup table supplies a base address that is added (+) to the word offset.]

Memory Management by Segmentation

Notice that each segment’s virtual address starts at 0, different from its physical address. Repeated movement of segments into and out of physical memory will result in gaps between segments. This is called external fragmentation. Compaction (defragmentation) routines must occasionally be run to remove these fragments.

[Figure: Segments 1, 5, 6, 9, and 3 occupy main memory at various physical addresses, separated by gaps. Each segment’s virtual memory addresses begin at 0000, while its physical memory addresses depend on where it was placed.]
Segmentation Mechanism

The computation of the physical address from the virtual address requires an integer addition for each memory reference, and a comparison if segment limits are checked.

Q: How does the MMU switch references from one segment to another?

[Figure: The virtual memory address from the CPU (an offset in the segment) is added (+) to the segment base register to form the physical address in main memory. The offset is also compared with the segment limit register; if it exceeds the limit, a bounds error is raised.]

Memory Management by Paging

This figure shows the mapping between virtual memory pages, physical memory pages, and pages in secondary memory. Page n – 1 is not present in physical memory, but only in secondary memory. The MMU manages this mapping (logical to physical address).

Demand paging: pages are brought into physical memory only as they are needed.

[Figure: A program unit’s virtual memory pages 0 through n – 1 map to scattered pages of physical memory; pages not resident in physical memory (such as page n – 1) reside only in secondary memory.]

Virtual Address Translation in a Paged MMU

A page fault will result in 100,000 or more cycles passing before the page has been brought from secondary storage to MM.

Points to note:
•One page table per user per program unit.
•One translation per memory access.
•The page table is potentially large.

[Figure: The virtual address from the CPU is split into a page number and an offset in the page. The page number, added (+) to the page table base register and bounds-checked against the page table limit register, selects a page table entry. Each entry holds access-control bits (presence bit, dirty bit, usage bits) and either a physical page number or a pointer to secondary storage. On a hit (page in primary memory), the physical page number is joined with the offset to form the physical address of the desired word in main memory; on a miss (page fault, page in secondary memory), the entry is translated to a disk address.]
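A simplified C sketch of the translation step (the 4-KB page size and the struct layout are assumptions for illustration, not from the slides):

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_BITS 12u                  /* assumed 4-KB pages */

typedef struct {
    bool     present;                  /* presence bit                      */
    bool     dirty;                    /* dirty bit                         */
    uint32_t physical_page;            /* or a pointer to secondary storage */
} PTE;

/* One translation per memory access: split the virtual address, index the
   page table (bounds-checked), and either form the physical address or
   report a page fault. */
bool translate(const PTE *table, size_t limit, uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpage  = vaddr >> PAGE_BITS;
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1u);

    if (vpage >= limit)                /* bounds error */
        return false;
    if (!table[vpage].present)         /* miss: page fault, fetch from disk */
        return false;
    *paddr = (table[vpage].physical_page << PAGE_BITS) | offset;
    return true;                       /* hit: page in primary memory */
}
```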

Page Placement and Replacement

Page tables are direct mapped, since the physical page is computed directly from the virtual page number. But physical pages can reside anywhere in physical memory.
Page table organizations such as the one on the previous slide result in large page tables, since there must be a page table entry for every page in the program unit. Some implementations resort to hash tables instead, which need have entries only for those pages actually present in physical memory.
Replacement strategies are generally LRU, or at least employ a “use bit” to guide replacement.

Fast Address Translation: Regaining Lost Ground

The concept of virtual memory is very attractive, but it leads to considerable overhead:
  There must be a translation for every memory reference.
  There must be two memory references for every program reference: one to retrieve the page table entry, and one to retrieve the value.
  Most caches are addressed by physical address, so there must be a virtual-to-physical translation before the cache can be accessed.

Fast Address Translation: Regaining Lost Ground

The answer: a small cache in the processor that retains the last few virtual-to-physical translations, a Translation Lookaside Buffer, TLB. The TLB contains not only the virtual-to-physical translations, but also the valid, dirty, and protection bits, so a TLB hit allows the processor to access physical memory directly. The TLB is usually implemented as a fully associative cache:
Translation Lookaside Buffer Structure and Operation

[Figure: The virtual address from the CPU is split into a virtual page number and a word offset. The virtual page number is looked up associatively in the TLB. On a TLB hit (page in primary memory), the matching entry supplies the physical page number and access-control bits (presence, dirty, valid, usage bits); the physical page number joined with the word offset forms the physical address, and the desired word is fetched from main memory or cache. On a TLB miss, the physical page is looked up in the page table instead.]

Operation of the Memory Hierarchy

[Flowchart: For a virtual address from the CPU: search the TLB; on a TLB hit, generate the physical address. On a TLB miss, search the page table; on a page table hit, generate the physical address and update the TLB. On a page table miss (page fault), get the page from secondary memory and update MM, cache, and page table. With the physical address in hand, search the cache; on a cache miss, update the cache from MM; finally, return the value from the cache.]
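The flowchart reads naturally as straight-line code. The sketch below mirrors it in C; every helper is a stub with an invented name, present only so the sketch compiles:

```c
#include <stdint.h>
#include <stdbool.h>

/* Stubs standing in for the hardware/OS mechanisms in the flowchart. */
static bool tlb_lookup(uint32_t v, uint32_t *p)        { *p = v; return true; }
static bool page_table_lookup(uint32_t v, uint32_t *p) { *p = v; return true; }
static void update_tlb(uint32_t v, uint32_t p)         { (void)v; (void)p; }
static void fetch_page_from_disk(uint32_t v)           { (void)v; }
static bool cache_lookup(uint32_t p)                   { (void)p; return true; }
static void update_cache_from_mm(uint32_t p)           { (void)p; }
static uint32_t read_from_cache(uint32_t p)            { return p; }

uint32_t access_memory(uint32_t vaddr) {
    uint32_t paddr;
    if (!tlb_lookup(vaddr, &paddr)) {                /* TLB miss             */
        if (!page_table_lookup(vaddr, &paddr)) {     /* page fault           */
            fetch_page_from_disk(vaddr);             /* update MM, cache,    */
            page_table_lookup(vaddr, &paddr);        /*   and page table     */
        }
        update_tlb(vaddr, paddr);                    /* remember translation */
    }
    if (!cache_lookup(paddr))                        /* cache miss           */
        update_cache_from_mm(paddr);
    return read_from_cache(paddr);                   /* value from cache     */
}
```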

I/O Connection to a Memory with a Cache

The memory system is quite complex, and affords many possible tradeoffs.
The only realistic way to choose among these alternatives is to study a typical workload, using either simulations or prototype systems.
Instruction and data accesses usually have different patterns.
It is possible to employ a cache at the disk level, using the disk hardware.
Traffic between MM and disk is I/O, and direct memory access, DMA, can be used to speed the transfers:

[Figure: The CPU connects to the cache and main memory; paging traffic between main memory and the disk, like other I/O, moves by DMA.]

Summary

Most memory systems are multileveled—cache, main memory, and disk.
Static and dynamic RAM are the fastest components, and their speed has the strongest effect on system performance.
Chips are organized into boards and modules.
Larger, slower memory is attached to faster memory in a hierarchical structure.
The cache-to-main-memory interface requires hardware address translation.
Virtual memory—the main memory–disk interface—can employ software for address translation because of the slower speeds involved.
The hierarchy must be carefully designed to ensure optimum price-performance.