Virtual Memory: Topics in computer organization and architecture
About This Presentation
Virtual memory is a key concept in computer organization and architecture, enabling a system to use more memory than physically available. It works by using a combination of hardware and software to map virtual addresses (used by programs) to physical addresses in RAM and on secondary storage (like a hard drive). This allows programs to run as if they have a continuous, large memory space, even if RAM is limited.
Here's a breakdown of key topics related to virtual memory:
1. Need for Virtual Memory:
Limited Physical Memory:
Virtual memory addresses the constraint of having a limited amount of RAM by allowing programs to use more memory than physically available.
Simplified Programming:
It simplifies programming by providing an abstraction layer where programs can operate as if they have a continuous memory space, without needing to worry about physical memory fragmentation or limitations.
Efficient System Performance:
Virtual memory enhances system performance by allowing multiple programs to run concurrently without needing to load everything into RAM at once.
2. Implementation Techniques:
Paging:
This is the most common technique, where virtual memory is divided into fixed-size blocks called pages, and physical memory into equally-sized frames. The operating system maintains a page table that maps virtual addresses to physical addresses.
Segmentation:
This technique divides the address space into variable-sized logical segments, which are then mapped to physical memory.
Demand Paging:
Pages are only loaded into RAM when they are actually needed by a running program, improving memory efficiency.
3. Key Concepts and Terminology:
Virtual Address:
The address used by the program to access memory, which is translated into a physical address by the Memory Management Unit (MMU).
Physical Address:
The actual address in RAM where data is stored.
Page Fault:
An event that occurs when a program tries to access a page that is not currently in RAM, requiring the operating system to load the page from secondary storage.
Page Table:
A data structure used by the MMU to map virtual addresses to physical addresses.
Swap Space:
A designated area on the hard drive where pages can be stored when they are not actively used in RAM.
Memory Management Unit (MMU):
A hardware component responsible for translating virtual addresses to physical addresses.
Translation Lookaside Buffer (TLB):
A cache that stores recently used virtual-to-physical address translations to speed up the translation process.
4. Address Translation:
The MMU uses a combination of hardware and software to translate virtual addresses to physical addresses.
This translation process involves using the page table to locate the physical location of the requested data in RAM.
If the requested page is not in RAM (page fault), the MMU triggers a page fault interrupt, and the operating system handles loading the page from secondary storage.
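A minimal C sketch may help make this lookup concrete. It models a software page-table walk with an assumed 4 KB page size, a small flat table, and invented names (page_table, translate); a real MMU performs these steps in hardware.

    #include <stdint.h>
    #include <stdbool.h>

    #define PAGE_SIZE  4096u                /* assumed 4 KB pages */
    #define PAGE_SHIFT 12
    #define NUM_PAGES  1024u                /* assumed small virtual address space */

    typedef struct {
        bool     valid;                     /* is the page resident in RAM? */
        uint32_t ppn;                       /* physical page number */
    } pte_t;

    static pte_t page_table[NUM_PAGES];     /* one entry per virtual page */

    /* Translate a virtual address; flags a page fault if the page is absent. */
    uint32_t translate(uint32_t va, bool *page_fault)
    {
        uint32_t vpn    = va >> PAGE_SHIFT;       /* virtual page number */
        uint32_t offset = va & (PAGE_SIZE - 1);   /* offset within the page */

        if (vpn >= NUM_PAGES || !page_table[vpn].valid) {
            *page_fault = true;     /* OS must load the page from disk */
            return 0;
        }
        *page_fault = false;
        return (page_table[vpn].ppn << PAGE_SHIFT) | offset;
    }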
5. Memory Management Strategies:
Page Replacement:
When physical memory is full, the operating system must choose a victim page to evict, using a replacement policy such as FIFO, LRU, or NRU (all covered in the slides below).
Slide Content
CPE 631 Memory
Electrical and Computer Engineering
University of Alabama in Huntsville
Aleksandar Milenkovic [email protected]
http://www.ece.uah.edu/~milenka
Slide 3: Another View of Memory Hierarchy
[Figure: the hierarchy from upper (faster) to lower (larger) levels: Regs, Cache, L2 Cache, Memory, Disk, Tape, with the units of transfer between adjacent levels being instructions/operands, blocks, blocks, pages, and files. Thus far: caches. Next: virtual memory.]
Slide 4: Why Virtual Memory?
Today computers run multiple processes,
each with its own address space
Too expensive to dedicate a full-address-space
worth of memory for each process
Principle of Locality
allows caches to offer the speed of cache memory
with the size of DRAM memory
DRAM can act as a "cache" for secondary storage (disk) → Virtual Memory
Virtual memory divides physical memory into
blocks and allocates them to different processes
Slide 5: Virtual Memory Motivation
Historically virtual memory was invented when
programs became too large for physical memory
Allows OS to share memory and protect programs
from each other (main reason today)
Provides illusion of very large memory
sum of the memory of many jobs
greater than physical memory
allows each job to exceed the size of physical mem.
Allows available physical memory
to be very well utilized
Exploits memory hierarchy
to keep average access time low
Slide 6: Mapping Virtual to Physical Memory
Program with 4 pages (A, B, C, D)
Any chunk of Virtual Memory can be assigned
to any chunk of Physical Memory ("page")
[Figure: a 16 KB virtual address space holds pages A, B, C, D at 0, 4 KB, 8 KB, and 12 KB; they map into arbitrary 4 KB frames of a 32 KB physical memory (0–28 KB), and a page that does not fit resides on disk]
Slide 7: Virtual Memory Terminology
Virtual Address
address used by the programmer;
CPU produces virtual addresses
Virtual Address Space
collection of such addresses
Memory (Physical or Real) Address
address of word in physical memory
Memory mapping or address translation
process of virtual to physical address translation
More on terminology (cache analogues):
Page or Segment ↔ Block
Page Fault or Address Fault ↔ Miss
Slide 8: Comparing the Two Levels of the Hierarchy
Parameter: L1 Cache vs. Virtual Memory
Block/Page size: 16 B – 128 B vs. 4 KB – 64 KB
Hit time: 1 – 3 cc vs. 50 – 150 cc
Miss penalty: 8 – 150 cc vs. 1M – 10M cc (page fault)
  Access time: 6 – 130 cc vs. 800K – 8M cc
  Transfer time: 2 – 20 cc vs. 200K – 2M cc
Miss rate: 0.1 – 10% vs. 0.00001 – 0.001%
Placement: DM or N-way SA vs. fully associative (OS allows pages to be placed anywhere in main memory)
Address mapping: 25–45 bit physical address to 14–20 bit cache address vs. 32–64 bit virtual address to 25–45 bit physical address
Replacement: LRU or random (HW controlled) vs. LRU (SW controlled)
Write policy: WB or WT vs. WB
Slide 9: Paging vs. Segmentation
Two classes of virtual memory
Pages - fixed size blocks (4KB – 64KB)
Segments - variable size blocks
(1B – 64KB/4GB)
Hybrid approach: Paged segments –
a segment is an integral number of pages
[Figure: paging divides code and data into fixed-size pages; segmentation maps code and data as variable-sized segments]
Slide 10: Paging vs. Segmentation: Pros and Cons
Words per address: Page: one; Segment: two (segment + offset)
Programmer visible? Page: invisible to the application programmer; Segment: may be visible
Replacing a block: Page: trivial (all blocks are the same size); Segment: hard (must find a contiguous, variable-size unused portion)
Memory use inefficiency: Page: internal fragmentation (unused portion of page); Segment: external fragmentation (unused pieces of main memory)
Efficient disk traffic: Page: yes (adjust page size to balance access time and transfer time); Segment: not always (small segments transfer few bytes)
Slide 11: Virtual to Physical Address Translation
[Figure: the program operates in its virtual address space; an HW mapping translates each virtual address (instruction fetch, load, store) into a physical address (instruction fetch, load, store) in physical memory, including the caches]
Each program operates in its own
virtual address space
Each is protected from the other
OS can decide where each goes in memory
Combination of HW + SW provides
the virtual → physical mapping
Slide 12: Virtual Memory Mapping Function
Use table lookup ("Page Table") for mappings:
the Virtual Page Number is the index
Physical Offset = Virtual Offset
Physical Page Number (P.P.N., or "page frame") = PageTable[Virtual Page Number]
[Figure: Virtual Address = Virtual Page No. (bits 31 … 10) | Offset (bits 9 … 0); translation maps it to Physical Address = Phys. Page No. (bits 29 … 10) | Offset (bits 9 … 0)]
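As a quick worked example under the bit split shown above (10-bit offset): for virtual address 0x3208, the virtual page number is 0x3208 >> 10 = 0xC and the offset is 0x208; if PageTable[0xC] holds physical page 0x5, the physical address is (0x5 << 10) | 0x208 = 0x1608. (The addresses and table contents here are made up for illustration.)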
Slide 13: Address Mapping: Page Table
[Figure: the virtual address splits into a virtual page number and an offset; the virtual page number, together with the Page Table Base Register, indexes into the Page Table, whose entries hold a Valid bit, Access Rights, and a Physical Page Number; the physical address is the physical page number concatenated with the unchanged offset]
Slide 14: Page Table
A page table is an operating system structure which
contains the mapping of
virtual addresses to physical locations
There are several different ways,
all up to the operating system, to keep this data
around
Each process running in the operating system
has its own page table
“State” of process is PC, all registers, plus page table
OS changes page tables by changing contents of
Page Table Base Register
Slide 15: Page Table Entry (PTE) Format
Valid bit indicates if page is in memory
OS maps to disk if Not Valid (V = 0)
Contains mappings for every possible virtual page
If valid, also check if have permission to use page:
Access Rights (A.R.) may be
Read Only, Read/Write, Executable
[Figure: the Page Table is an array of PTEs; each entry holds V (Valid), A.R. (Access Rights), and P.P.N. (Physical Page Number)]
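As an illustration of how the V, A.R., and P.P.N. fields might be packed into a single 32-bit PTE word, here is a C sketch; the exact bit positions and the 2-bit access-rights encoding are assumptions, not a specific machine's format.

    #include <stdint.h>

    /* Assumed 32-bit PTE layout: | PPN (bits 31..12) | unused | A.R. (bits 2..1) | V (bit 0) | */
    #define PTE_V         0x1u        /* valid bit */
    #define PTE_AR_MASK   0x6u        /* access rights: 0=RO, 1=RW, 2=EX (assumed encoding) */
    #define PTE_AR_SHIFT  1
    #define PTE_PPN_SHIFT 12

    static inline int      pte_valid(uint32_t pte) { return pte & PTE_V; }
    static inline unsigned pte_ar(uint32_t pte)    { return (pte & PTE_AR_MASK) >> PTE_AR_SHIFT; }
    static inline uint32_t pte_ppn(uint32_t pte)   { return pte >> PTE_PPN_SHIFT; }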
Slide 16: Virtual Memory Problem #1
Not enough physical memory!
Only, say, 64 MB of physical memory
N processes, each 4GB of virtual memory!
Could have 1K virtual pages/physical page!
Spatial Locality to the rescue
Each page is 4 KB, lots of nearby references
No matter how big program is,
at any time only accessing a few pages
“Working Set”: recently used pages
Slide 17: VM Problem #2: Fast Address Translation
PTs are stored in main memory
Every memory access logically takes at least
twice as long, one access to obtain physical address
and second access to get the data
Observation: since there is locality in the pages of data, there must be
locality in the virtual addresses of those pages
Remember the last translation(s)
Address translations are kept in a special cache
called Translation Look-Aside Buffer or TLB
TLB must be on chip;
its access time is comparable to cache
Slide 18: Typical TLB Format
Tag: Portion of virtual address
Data: Physical Page number
Dirty: since we use write-back, we need to know whether
or not to write the page to disk when it is replaced
Ref: Used to help calculate LRU on replacement
Valid: Entry is valid
Access rights: R (read permission), W (write perm.)
[Entry fields: Virtual Address (tag) | Physical Address | Dirty | Ref | Valid | Access Rights]
Slide 19: Translation Look-Aside Buffers
TLBs usually small, typically 128 - 256 entries
Like any other cache, the TLB can be fully
associative, set associative, or direct mapped
[Figure: the processor sends the VA to the TLB; on a TLB hit the PA goes straight to the cache (a cache hit returns data, a miss goes to main memory); on a TLB miss the translation is fetched from the page table in memory]
Slide 20: TLB Translation Steps
Assume 32 entries, fully-associative TLB
(Alpha AXP 21064)
1: Processor sends the virtual address to all
tags
2: If there is a hit (there is an entry in TLB
with that Virtual Page number and valid bit is
1) and there is no access violation, then
3: Matching tag sends the corresponding
Physical Page number
4: Combine Physical Page number and
Page Offset to get full physical address
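The four steps map naturally onto a small C sketch of a fully-associative lookup. The loop stands in for the parallel tag comparison that hardware does in one step; the 8 KB page size and the structure/field names are assumptions for illustration.

    #include <stdint.h>
    #include <stdbool.h>

    #define TLB_ENTRIES 32          /* as in the Alpha AXP 21064 example */
    #define PAGE_SHIFT  13          /* assumed 8 KB pages */

    typedef struct {
        bool     valid;
        uint64_t vpn;               /* tag: virtual page number */
        uint64_t ppn;               /* data: physical page number */
    } tlb_entry_t;

    static tlb_entry_t tlb[TLB_ENTRIES];

    /* Steps 1-4: compare the VPN against every tag; on a hit, splice the
     * matching PPN onto the unchanged page offset. Returns false on a miss. */
    bool tlb_lookup(uint64_t va, uint64_t *pa)
    {
        uint64_t vpn    = va >> PAGE_SHIFT;               /* step 1 */
        uint64_t offset = va & ((1u << PAGE_SHIFT) - 1);

        for (int i = 0; i < TLB_ENTRIES; i++) {
            if (tlb[i].valid && tlb[i].vpn == vpn) {      /* step 2: hit, no violation */
                *pa = (tlb[i].ppn << PAGE_SHIFT)          /* step 3: matching PPN */
                      | offset;                           /* step 4: PPN + offset */
                return true;
            }
        }
        return false;   /* TLB miss: handled as on the next slides */
    }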
Slide 21: What if not in TLB?
Option 1: Hardware checks page table and loads
new Page Table Entry into TLB
Option 2: Hardware traps to OS, up to OS to decide
what to do
When in the operating system, we don't do translation
(turn off virtual memory)
The operating system knows which program caused
the TLB fault or page fault, and knows which
virtual address was requested
So it looks the data up in the page table
If the data is in memory, simply add the entry to the
TLB, evicting an old entry from the TLB
Slide 22: What if the data is on disk?
We load the page off the disk into
a free block of memory, using a DMA transfer
Meantime we switch to some other process
waiting to be run
When the DMA is complete, we get an
interrupt and update the process's page table
So when we switch back to the task,
the desired data will be in memory
Slide 23: What if we don't have enough memory?
We choose some other page belonging to a
program and transfer it to disk if it is dirty
If it is clean (the disk copy is up-to-date),
we can simply overwrite that data in memory
We choose the page to evict based on a
replacement policy (e.g., LRU)
And update that program's page table to
reflect the fact that its memory moved
somewhere else
Slide 24: Page Replacement Algorithms
First-In/First-Out (FIFO)
in response to page fault, replace the page that has
been in memory for the longest period of time
does not make use of the principle of locality:
an old but frequently used page could be replaced
easy to implement
(OS maintains history thread through page table
entries)
usually exhibits the worst behavior
Least Recently Used
selects the least recently used page for replacement
requires knowledge of past references
more difficult to implement, good performance
Slide 25: Page Replacement Algorithms (cont'd)
Not Recently Used
(an estimation of LRU)
A reference bit is associated with each page
table entry such that
Ref flag = 1 - if page has been referenced in recent
past
Ref flag = 0 - otherwise
If replacement is necessary, choose any page
frame such that its reference bit is 0
OS periodically clears the reference bits
Reference bit is set whenever a page is
accessed
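A C sketch of this reference-bit scheme follows; the frame count, the flat arrays, and the function names are invented for illustration (real systems keep the bit in the PTE and often scan frames in clock order).

    #include <stdbool.h>

    #define NFRAMES 128                  /* assumed number of page frames */

    static bool ref_bit[NFRAMES];        /* set whenever the page is accessed */

    void touch(int frame) { ref_bit[frame] = true; }   /* on every access */

    void clear_ref_bits(void)            /* called periodically by the OS */
    {
        for (int f = 0; f < NFRAMES; f++)
            ref_bit[f] = false;
    }

    int nru_victim(void)                 /* on replacement: any frame with ref bit 0 */
    {
        for (int f = 0; f < NFRAMES; f++)
            if (!ref_bit[f])
                return f;
        return 0;                        /* everything referenced recently: fall back */
    }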
Slide 26: Selecting a Page Size
Balance the forces favoring larger pages against those favoring smaller pages
Larger pages
reduce the page table size (save space)
allow larger caches with fast hits
give more efficient transfers from disk, or possibly over networks
mean fewer TLB entries and fewer TLB misses
Smaller pages
conserve space better: less wasted storage (internal fragmentation)
shorten startup time, especially with plenty of small processes
Slide 27: VM Problem #3: Page Table Too Big!
Example
4 GB virtual memory ÷ 4 KB pages
=> ~1 million Page Table Entries
=> 4 MB (at ~4 bytes per PTE) just for the Page Table of 1 process;
25 processes => 100 MB for Page Tables!
The problem gets worse on modern 64-bit machines
Solution is Hierarchical Page Table
Slide 28: Page Table Shrink
Single page table virtual address: Page Number (20 bits) | Offset (12 bits)
Multilevel page table virtual address: Super Page Number (10 bits) | Page Number (10 bits) | Offset (12 bits)
Only have a second-level page table for the valid entries
of the super (first-level) page table
If only 10% of the entries of the Super Page Table
are valid, then the total mapping size is roughly 1/10th of a
single-level page table
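Here is a minimal C sketch of a walk through the 10/10/12 split shown above; the table names, the NULL-for-unmapped convention, and a PTE that holds only a PPN are simplifying assumptions.

    #include <stdint.h>
    #include <stddef.h>

    #define L1_SHIFT 22            /* super page number: bits 31..22 */
    #define L2_SHIFT 12            /* page number:       bits 21..12 */
    #define IDX_MASK 0x3FFu        /* 10-bit index */
    #define OFF_MASK 0xFFFu        /* 12-bit offset */

    /* Second-level tables exist only where the super table entry is non-NULL. */
    static uint32_t *super_table[1024];

    /* Returns the physical address, or 0 for an unmapped region (assumed convention). */
    uint32_t walk(uint32_t va)
    {
        uint32_t  l1 = (va >> L1_SHIFT) & IDX_MASK;
        uint32_t  l2 = (va >> L2_SHIFT) & IDX_MASK;
        uint32_t *pt = super_table[l1];

        if (pt == NULL)            /* no 2nd-level table: this is where space is saved */
            return 0;
        return (pt[l2] << L2_SHIFT) | (va & OFF_MASK);
    }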
Slide 30: The Big Picture
[Flowchart] A memory access proceeds as follows:
1. The CPU issues a virtual address; TLB access.
2. TLB hit? If no: TLB miss stall; try to read the PTE from the page table; if that raises a page fault, replace a page from disk; then set the entry in the TLB and retry. If yes: continue.
3. Write? If no: try to read from the cache; on a cache hit, deliver the data to the CPU; on a miss, cache miss stall. If yes: write to the cache/write buffer and perform the memory write.
Slide 31: The Big Picture (cont'd)
Worked example parameters: L1 = 8 KB, L2 = 4 MB, page size = 8 KB, cache line = 64 B, VA = 64 bits, PA = 41 bits. With 8 KB pages the page offset is 13 bits, so the physical page number is 41 − 13 = 28 bits; the remaining field widths are left as an exercise.
Slide 32: Things to Remember
Apply Principle of Locality Recursively
Manage memory to disk? Treat as cache
Included protection as bonus, now critical
Use Page Table of mappings vs. tag/data in cache
Spatial locality means Working Set of pages is all that
must be in memory for process to run
Virtual memory to Physical Memory Translation
too slow?
Add a cache of Virtual to Physical Address
Translations, called a TLB
Need a more compact representation to reduce the
memory cost of a simple 1-level page table
(especially as addresses grow from 32 to 64 bits)
Slide 33: Main Memory Background
Next level down in the hierarchy
satisfies the demands of caches + serves as the I/O interface
Performance of Main Memory:
Latency: Cache Miss Penalty
Access Time: time between when a read is requested and
when the desired word arrives
Cycle Time: minimum time between requests to memory
Bandwidth (the number of bytes read or written per unit time):
I/O & Large Block Miss Penalty (L2)
Main Memory is DRAM: Dynamic Random Access Memory
Dynamic since it needs to be refreshed periodically (every 8 ms; ~1% of the time)
Addresses divided into 2 halves (Memory as a 2D matrix):
RAS or Row Access Strobe + CAS or Column Access Strobe
Cache uses SRAM: Static Random Access Memory
No refresh (6 transistors/bit vs. 1 transistor)
Slide 34: Memory Background: Static RAM (SRAM)
Six transistors in a cross-connected fashion
Provides regular AND inverted outputs
Implemented in a CMOS process
[Figure: single-port 6-T SRAM cell]
Slide 35: Memory Background: Dynamic RAM
SRAM cells exhibit high speed / poor density
DRAM: simple transistor/capacitor pairs in high-density form
[Figure: 1-T DRAM cell: the word line gates a pass transistor connecting the bit line to a storage capacitor C; a sense amp on the bit line reads the stored charge]
Slide 37: Memory Organizations
Simple: CPU, cache, bus, and memory
all the same width (32 or 64 bits)
Wide: CPU/mux 1 word; mux/cache, bus,
and memory N words
(Alpha: 64 bits & 256 bits; UltraSPARC: 512 bits)
Interleaved: CPU, cache, and bus 1 word;
memory is N modules (here 4 modules);
the example is word-interleaved
Slide 38: 1st Technique for Higher Bandwidth: Wider Main Memory (cont'd)
Timing model (word size is 8 bytes = 64 bits):
4 cc to send the address, 56 cc access time per word,
4 cc to send the data
Cache block is 4 words
Simple M.P. = 4 × (4 + 56 + 4) = 256 cc (1/8 B/cc)
Wide M.P. (2 words) = 2 × (4 + 56 + 4) = 128 cc (1/4 B/cc)
Wide M.P. (4 words) = 4 + 56 + 4 = 64 cc (1/2 B/cc)
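The arithmetic above can be checked with a few lines of C; the 4/56/4 cycle counts and the 4-word block come straight from the slide.

    #include <stdio.h>

    int main(void)
    {
        const int addr = 4, access = 56, xfer = 4;    /* cc per step (slide) */
        const int words = 4;                          /* cache block = 4 words */

        int simple = words * (addr + access + xfer);        /* 1-word-wide bus */
        int wide2  = (words / 2) * (addr + access + xfer);  /* 2-word-wide bus */
        int wide4  = (words / 4) * (addr + access + xfer);  /* 4-word-wide bus */

        printf("simple: %d cc, 2-wide: %d cc, 4-wide: %d cc\n",
               simple, wide2, wide4);                 /* prints 256, 128, 64 */
        return 0;
    }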
Slide 39: 2nd Technique for Higher Bandwidth: Simple Interleaved Memory
Take advantage of the potential parallelism of having many chips in a
memory system
Memory chips are organized in banks, allowing multi-word reads
or writes at a time
Interleaved M.P. = 4 + 56 + 4 × 4 = 76 cc (0.4 B/cc)
[Figure: word-interleaved banks]
Bank 0: 0, 4, 8, 12
Bank 1: 1, 5, 9, 13
Bank 2: 2, 6, 10, 14
Bank 3: 3, 7, 11, 15
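The placement in the figure is just modular arithmetic; a two-line C sketch (word addresses, 4 banks, names invented) shows the mapping:

    #define NBANKS 4   /* as in the figure */

    int bank_of(unsigned word_addr)       { return word_addr % NBANKS; }   /* which bank */
    int index_in_bank(unsigned word_addr) { return word_addr / NBANKS; }   /* which row */

For example, word 9 maps to bank 9 % 4 = 1 at index 9 / 4 = 2, matching the figure.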
Slide 40: 2nd Technique for Higher Bandwidth: Simple Interleaved Memory (cont'd)
How many banks?
Rule of thumb: number of banks ≥ number of clock cycles to access a word in a bank
This holds for sequential accesses; otherwise the CPU returns to the
original bank before it has the next word ready
Consider the following example:
10 cc to read a word from a bank, 8 banks: a sequential stream returns to the first bank after only 8 cc, 2 cc before its next word is ready
Problem#1: Chip size increase
512MB DRAM using 4Mx4bits: 256 chips =>
easy to organize in 16 banks with 16 chips
512MB DRAM using 64Mx4bits: 16 chips => 1 bank?
Problem#2: Difficulty in main memory expansion
Slide 41: 3rd Technique for Higher Bandwidth: Independent Memory Banks
Memory banks for independent accesses
vs. faster sequential accesses
Multiprocessor
I/O
CPU with Hit under n Misses, Non-blocking Cache
Superbank: all memory active
on one block transfer (or Bank)
Bank: portion within a superbank that is word
interleaved (or Subbank)
[Address fields: Superbank number | Superbank offset, where the superbank offset further splits into Bank number | Bank offset]
Slide 42: Avoiding Bank Conflicts
Lots of banks may still conflict:
even with 128 banks, since 512 is a multiple of 128,
the inner loop of the code below conflicts on every word access
SW fixes: loop interchange, or
declaring the array dimension not a power of 2 ("array padding")
HW fix: a prime number of banks
bank number = address mod number of banks
address within bank = address / number of words in bank
a modulo and a divide per memory access with a prime number of banks?
address within bank = address mod number of words in bank
bank number? easy if 2^N words per bank
int x[256][512];
for (j = 0; j < 512; j = j + 1)
    for (i = 0; i < 256; i = i + 1)
        x[i][j] = 2 * x[i][j];   /* inner loop strides by one 512-word row per access */
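A short C sketch (assuming the word-interleaved mapping above and ignoring the array's base address) makes the conflict visible: the inner loop's accesses are 512 words apart, and 512 mod 128 = 0, so every access lands in the same bank, while the interchanged loop sweeps across banks.

    #include <stdio.h>

    int main(void)
    {
        const unsigned nbanks = 128, row_words = 512;

        /* Inner loop over i: consecutive accesses are one 512-word row apart,
         * so the bank number never changes -> accesses serialize. */
        for (unsigned i = 0; i < 4; i++)
            printf("x[%u][0] -> bank %u\n", i, (i * row_words) % nbanks);

        /* After loop interchange, consecutive accesses are adjacent words
         * and visit a different bank each time. */
        for (unsigned j = 0; j < 4; j++)
            printf("x[0][%u] -> bank %u\n", j, j % nbanks);
        return 0;
    }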
Slide 43: Fast Bank Number
Chinese Remainder Theorem: as long as the two sets of integers a_i and b_i follow these rules
  b_i = x mod a_i,  0 ≤ b_i < a_i,  0 ≤ x < a_0 × a_1 × a_2
and a_i and a_j are co-prime if i ≠ j, then the integer x has only one solution (an unambiguous mapping):
bank number = b_0, number of banks = a_0 (= 3 in the example)
address within bank = b_1, number of words in bank = a_1 (= 8 in the example)
N-word address 0 to N − 1, a prime number of banks, and words per bank a power of 2

                        Seq. Interleaved     Modulo Interleaved
Bank number:             0    1    2          0    1    2
Address within bank:
  0                      0    1    2          0   16    8
  1                      3    4    5          9    1   17
  2                      6    7    8         18   10    2
  3                      9   10   11          3   19   11
  4                     12   13   14         12    4   20
  5                     15   16   17         21   13    5
  6                     18   19   20          6   22   14
  7                     21   22   23         15    7   23
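The modulo-interleaved column of the table can be regenerated with two mod operations per address; since 3 and 8 are co-prime, the Chinese Remainder Theorem guarantees each (bank, index) pair occurs exactly once for addresses 0..23. A small C sketch:

    #include <stdio.h>

    int main(void)
    {
        const int banks = 3, words_per_bank = 8;    /* co-prime, as required */

        for (int addr = 0; addr < banks * words_per_bank; addr++) {
            int b   = addr % banks;                 /* bank number = x mod a_0 */
            int idx = addr % words_per_bank;        /* address within bank = x mod a_1 */
            printf("addr %2d -> bank %d, index %d\n", addr, b, idx);
        }
        return 0;
    }

Because words_per_bank is a power of 2, the second mod is just the low bits of the address, so no divide is needed per access.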
Slide 44: DRAM Logical Organization (64 Mbit)
[Figure: the array is roughly square, with about the square root of the total bits selected per RAS/CAS dimension]
Slide 45: 4 Key DRAM Timing Parameters
t_RAC: minimum time from the RAS line falling to valid data output
  Quoted as the speed of the DRAM when you buy it; a typical 4 Mbit DRAM has t_RAC = 60 ns
t_RC: minimum time from the start of one row access to the start of the next
  t_RC = 110 ns for a 4 Mbit DRAM with a t_RAC of 60 ns
t_CAC: minimum time from the CAS line falling to valid data output
  15 ns for a 4 Mbit DRAM with a t_RAC of 60 ns
t_PC: minimum time from the start of one column access to the start of the next
  35 ns for a 4 Mbit DRAM with a t_RAC of 60 ns
Slide 46: DRAM Read Timing
[Figure: a 256K x 8 DRAM with multiplexed address pins (9 bits), data pins, and RAS_L, CAS_L, WE_L, OE_L control lines; read waveforms show the row address followed by the column address, with Read Access Time, Output Enable Delay, and DRAM Read Cycle Time marked]
Every DRAM access begins with the assertion of RAS_L
Two ways to read: early or late relative to CAS
  Early read cycle: OE_L asserted before CAS_L
  Late read cycle: OE_L asserted after CAS_L
Slide 47: DRAM Performance
A 60 ns (t_RAC) DRAM can
perform a row access only every 110 ns (t_RC)
perform a column access (t_CAC) in 15 ns, but
the time between column accesses is at least 35 ns (t_PC)
In practice, external address delays and turning
around buses make it 40 to 50 ns
These times do not include the time to drive
the addresses off the microprocessor nor the
memory controller overhead!
Slide 48: Improving Memory Performance in Standard DRAM Chips
Fast Page Mode
allows repeated accesses to the row buffer
without another row access
Slide 49: Improving Memory Performance in Standard DRAM Chips (cont'd)
Synchronous DRAM
adds a clock signal to the DRAM interface
DDR – Double Data Rate
transfers data on both the rising and falling edges of the
clock signal
Slide 50: Improving Memory Performance via a New DRAM Interface: RAMBUS (cont'd)
RAMBUS provides a new interface – the memory
chip now acts more like a system
First generation: RDRAM
Protocol-based RAM with a narrow (16-bit) bus
High clock rate (400 MHz), but long latency
Pipelined operation
Multiple arrays w/ data transferred on both
edges of clock
Second generation: direct RDRAM
(DRDRAM) offers up to 1.6 GB/s
Slide 51: Improving Memory Performance via a New DRAM Interface: RAMBUS
[Figures: RDRAM memory system; RAMBUS bank]