Virtual Memory: Topics in computer organization and architecture


About This Presentation

Virtual memory is a key concept in computer organization and architecture, enabling a system to use more memory than is physically available. It works by using a combination of hardware and software to map virtual addresses (used by programs) to physical addresses in RAM and on secondary storage (such as a disk).


Slide Content

CPE 631 Memory
Electrical and Computer Engineering
University of Alabama in Huntsville
Aleksandar Milenkovic
[email protected]
http://www.ece.uah.edu/~milenka

Virtual Memory: Topics

Why virtual memory?

Virtual to physical address translation

Page Table

Translation Lookaside Buffer (TLB)

Another View of Memory Hierarchy
[Figure: the memory hierarchy, from upper (faster) levels to lower (larger) levels, with the units transferred between adjacent levels:]
  Regs <-> Cache: instructions, operands
  Cache <-> L2 Cache: blocks
  L2 Cache <-> Memory: blocks
  Memory <-> Disk: pages
  Disk <-> Tape: files
Caches (covered thus far) manage the upper levels; virtual memory (next) manages the Memory-Disk boundary.

Why Virtual Memory?

Today computers run multiple processes,
each with its own address space

Too expensive to dedicate a full-address-space
worth of memory for each process

Principle of Locality

allows caches to offer the speed of cache memory
with the size of DRAM memory

DRAM can act as a "cache" for secondary storage
(disk) → virtual memory

Virtual memory divides physical memory into
blocks and allocates them to different processes

Virtual Memory Motivation
Historically virtual memory was invented when
programs became too large for physical memory
Allows OS to share memory and protect programs
from each other (main reason today)
Provides illusion of very large memory

sum of the memory of many jobs
greater than physical memory

allows each job to exceed the size of physical mem.
Allows available physical memory
to be very well utilized
Exploits memory hierarchy
to keep average access time low

Mapping Virtual to Physical Memory

Program with 4 pages (A, B, C, D)

Any chunk of virtual memory can be assigned
to any chunk of physical memory ("page")

[Figure: the four 4 KB virtual pages A, B, C, D (at 0, 4 KB, 8 KB, 12 KB in the virtual address space) land in arbitrary page frames of a physical memory spanning 0 to 28 KB; pages may also reside on disk.]

Virtual Memory Terminology
Virtual Address

address used by the programmer;
CPU produces virtual addresses
Virtual Address Space

collection of such addresses
Memory (Physical or Real) Address

address of word in physical memory
Memory mapping or address translation

process of virtual to physical address translation
More on terminology

Page or Segment ↔ Block
Page Fault or Address Fault ↔ Miss

Comparing the 2 levels of hierarchy
Parameter       | L1 Cache                   | Virtual Memory
Block/Page size | 16 B - 128 B               | 4 KB - 64 KB
Hit time        | 1 - 3 cc                   | 50 - 150 cc
Miss penalty    | 8 - 150 cc                 | 1M - 10M cc (page fault)
  access time   | 6 - 130 cc                 | 800K - 8M cc
  transfer time | 2 - 20 cc                  | 200K - 2M cc
Miss rate       | 0.1 - 10%                  | 0.00001 - 0.001%
Placement       | DM or N-way SA             | fully associative (the OS allows pages to be placed anywhere in main memory)
Address mapping | 25-45-bit physical address to 14-20-bit cache address | 32-64-bit virtual address to 25-45-bit physical address
Replacement     | LRU or random (HW counter) | LRU (SW controlled)
Write policy    | WB or WT                   | WB

Paging vs. Segmentation

Two classes of virtual memory

Pages - fixed size blocks (4KB – 64KB)

Segments - variable size blocks
(1B – 64KB/4GB)

Hybrid approach: Paged segments –
a segment is an integral number of pages
[Figure: paging chops the code and data into fixed-size pages; segmentation keeps code and data as variable-size segments.]

Paging vs. Segmentation:
Pros and Cons
Attribute               | Page                                    | Segment
Words per address       | One                                     | Two (segment + offset)
Programmer visible?     | Invisible to application programmer     | May be visible to application programmer
Replacing a block       | Trivial (all blocks are the same size)  | Hard (must find a contiguous, variable-size unused portion)
Memory use inefficiency | Internal fragmentation (unused portion of a page) | External fragmentation (unused pieces of main memory)
Efficient disk traffic  | Yes (adjust page size to balance access time and transfer time) | Not always (small segments transfer few bytes)

Virtual to Physical Addr. Translation

[Figure: the program operates in its virtual address space; an HW mapping translates each virtual address (instruction fetch, load, store) into a physical address (instruction fetch, load, store) in physical memory, including the caches.]

Each program operates in its own
virtual address space

Each is protected from the other

OS can decide where each goes in memory

Combination of HW + SW provides
virtual → physical mapping

Virtual Memory Mapping Function

Use a table lookup ("Page Table") for mappings:
the Virtual Page Number is the index

Physical Offset = Virtual Offset

Physical Page Number (P.P.N. or "page frame")
= PageTable[Virtual Page Number]

[Figure: translation keeps the offset and replaces the page number: the virtual address is Virtual Page No. (bits 31 ... 10) + Offset (bits 9 ... 0); the physical address is Phys. Page No. (bits 29 ... 10) + the same Offset (bits 9 ... 0).]
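
To make the lookup concrete, here is a minimal C sketch of the mapping function above. It assumes the figure's 32-bit virtual address with a 10-bit offset, and a hypothetical flat page_table[] array indexed by the virtual page number (both names are illustrative, not from the slides):

#include <stdint.h>

#define OFFSET_BITS 10                        /* 1 KB pages, as in the figure */
#define OFFSET_MASK ((1u << OFFSET_BITS) - 1)

/* hypothetical single-level page table: entry i holds the
   physical page number for virtual page i */
extern uint32_t page_table[];

uint32_t translate(uint32_t vaddr)
{
    uint32_t vpn    = vaddr >> OFFSET_BITS;   /* bits 31 ... 10 */
    uint32_t offset = vaddr & OFFSET_MASK;    /* bits  9 ... 0  */
    uint32_t ppn    = page_table[vpn];        /* PageTable[Virtual Page Number] */
    return (ppn << OFFSET_BITS) | offset;     /* physical offset = virtual offset */
}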

Address Mapping: Page Table
[Figure: the virtual address is split into virtual page no. + offset; the virtual page number, combined with the Page Table Base Reg, indexes into the Page Table, whose entry holds a Valid bit, Access Rights, and the Physical Page Number; the physical address is physical page no. + offset.]

Page Table

A page table is an operating-system structure that
contains the mapping from virtual addresses
to physical locations

How this data is kept around is entirely up to
the operating system; there are several different ways

Each process running in the operating system
has its own page table

“State” of process is PC, all registers, plus page table

OS changes page tables by changing contents of
Page Table Base Register

Page Table Entry (PTE) Format

Valid bit indicates if page is in memory

OS maps to disk if Not Valid (V = 0)

Contains mappings for every possible virtual page

If valid, also check whether we have permission to use the page:
Access Rights (A.R.) may be
Read Only, Read/Write, or Executable
[Figure: the page table is an array of P.T.E.s, each holding a Valid bit (V.), Access Rights (A.R.), and the Physical Page Number (P.P.N.).]
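
One way to picture a PTE is as a C bitfield; the widths below (a 20-bit physical page number, one bit per access right) are illustrative assumptions rather than the layout of any particular machine:

#include <stdint.h>

/* One page-table entry with the fields from the slide. */
typedef struct {
    uint32_t valid   : 1;    /* V: page is present in physical memory     */
    uint32_t read    : 1;    /* A.R.: Read Only / Read-Write / Executable */
    uint32_t write   : 1;
    uint32_t execute : 1;
    uint32_t ppn     : 20;   /* physical page number ("page frame")       */
} pte_t;

/* When valid == 0 the OS maps the page on disk: any reference
   raises a page fault and the OS brings the page into memory. */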

Virtual Memory Problem #1

Not enough physical memory!

Only, say, 64 MB of physical memory

N processes, each 4GB of virtual memory!

Could have 1K virtual pages/physical page!

Spatial Locality to the rescue

Each page is 4 KB, lots of nearby references

No matter how big the program is,
at any time it is only accessing a few pages

“Working Set”: recently used pages

VM Problem #2: Fast Address
Translation
PTs are stored in main memory →
every memory access logically takes at least
twice as long: one access to obtain the physical address
and a second access to get the data

Observation: if there is locality in the pages of data, there must be
locality in the virtual addresses of those pages

Remember the last translation(s):
address translations are kept in a special cache
called the Translation Look-Aside Buffer, or TLB

The TLB must be on chip;
its access time is comparable to the cache's

Typical TLB Format
Tag: Portion of virtual address
Data: Physical Page number
Dirty: since we use write-back, we need to know whether
to write the page to disk when it is replaced
Ref: used to help calculate LRU on replacement
Valid: entry is valid
Access rights: R (read permission), W (write permission)

[TLB entry: Virtual Addr. | Physical Addr. | Dirty | Ref | Valid | Access Rights]

Translation Look-Aside Buffers

TLBs usually small, typically 128 - 256 entries

Like any other cache, the TLB can be fully
associative, set associative, or direct mapped
[Figure: the processor issues a VA to the TLB; on a TLB hit, the resulting PA goes to the cache (a cache hit returns data, a cache miss goes to main memory); on a TLB miss, the translation is fetched from the page table in memory and the access is retried.]

TLB Translation Steps
Assume 32 entries, fully-associative TLB
(Alpha AXP 21064)
1: Processor sends the virtual address to all
tags
2: If there is a hit (there is an entry in TLB
with that Virtual Page number and valid bit is
1) and there is no access violation, then
3: Matching tag sends the corresponding
Physical Page number
4: Combine Physical Page number and
Page Offset to get full physical address
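
The four steps can be modeled in a few lines of C. This is a sketch, not the 21064 hardware: the loop stands in for the 32 tag comparators that operate in parallel, 8 KB pages are assumed, and a single write-permission bit represents the access-rights check:

#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 32               /* fully associative, as on the slide */
#define OFFSET_BITS 13               /* assumed 8 KB pages */

typedef struct {
    bool     valid;
    bool     write_ok;               /* simplified access rights   */
    uint64_t vpn;                    /* tag: virtual page number   */
    uint64_t ppn;                    /* data: physical page number */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Returns true on a TLB hit and fills *paddr. */
bool tlb_lookup(uint64_t vaddr, bool is_write, uint64_t *paddr)
{
    uint64_t vpn    = vaddr >> OFFSET_BITS;              /* step 1 */
    uint64_t offset = vaddr & ((1u << OFFSET_BITS) - 1);

    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {         /* step 2 */
            if (is_write && !tlb[i].write_ok)
                return false;                            /* access violation */
            *paddr = (tlb[i].ppn << OFFSET_BITS) | offset;  /* steps 3 and 4 */
            return true;
        }
    }
    return false;                                        /* TLB miss */
}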

What if not in TLB?

Option 1: Hardware checks page table and loads
new Page Table Entry into TLB

Option 2: Hardware traps to OS, up to OS to decide
what to do

While in the operating system, we don't do translation
(virtual memory is turned off)

The operating system knows which program caused
the TLB fault or page fault, and knows which virtual
address was requested

So it looks the data up in the page table

If the data is in memory, simply add the entry to the
TLB, evicting an old entry from the TLB

What if the data is on disk?

We load the page off the disk into
a free block of memory, using a DMA transfer

Meanwhile, we switch to some other process
waiting to be run

When the DMA is complete, we get an
interrupt and update the process's page table

So when we switch back to the task,
the desired data will be in memory

What if we don't have enough
memory?

We choose another page belonging to some
program and transfer it onto the disk if it is
dirty

If it is clean (the disk copy is up-to-date),
we just overwrite that data in memory

We choose the page to evict based on a
replacement policy (e.g., LRU)

And we update that program's page table to
reflect the fact that its memory has moved
somewhere else

Page Replacement Algorithms
First-In/First-Out (FIFO)

in response to a page fault, replace the page that has
been in memory for the longest period of time

does not make use of the principle of locality:
an old but frequently used page could be replaced

easy to implement
(the OS maintains a history thread through the page table
entries)

usually exhibits the worst behavior
Least Recently Used

selects the least recently used page for replacement

requires knowledge of past references

more difficult to implement, good performance

Page Replacement Algorithms (cont’d)
Not Recently Used (NRU)
(an approximation of LRU)

A reference bit is associated with each page
table entry such that

Ref flag = 1 if the page has been referenced in the recent
past

Ref flag = 0 otherwise

If replacement is necessary, choose any page
frame whose reference bit is 0 (see the sketch below)

The OS periodically clears the reference bits

The reference bit is set whenever a page is
accessed
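
A minimal sketch of NRU in C, with a hypothetical ref_bit[] array standing in for the reference bits kept in the page table entries:

#include <stdbool.h>

#define NUM_FRAMES 1024

static bool ref_bit[NUM_FRAMES];   /* set whenever the frame's page is accessed */

/* Periodic OS sweep: after this, "recently used" means
   "used since the last sweep". */
void clear_reference_bits(void)
{
    for (int f = 0; f < NUM_FRAMES; f++)
        ref_bit[f] = false;
}

/* NRU victim selection: any frame whose reference bit is still 0
   was not used recently and may be replaced. Returns -1 if every
   frame has been referenced since the last sweep. */
int choose_victim(void)
{
    for (int f = 0; f < NUM_FRAMES; f++)
        if (!ref_bit[f])
            return f;
    return -1;
}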

Selecting a Page Size
Balance the forces favoring larger pages against those
favoring smaller pages

Larger pages

reduce the size of the PT (save space)

allow larger caches with fast hits

give more efficient transfers from the disk, or possibly over
the network

need fewer TLB entries, so cause fewer TLB misses

Smaller pages

conserve space better: less wasted storage
(internal fragmentation)

shorten startup time, especially with plenty of small
processes

VM Problem #3: Page Table too big!

Example

4GB Virtual Memory ÷ 4 KB page
=> ~ 1 million Page Table Entries
=> 4 MB just for Page Table for 1 process,
25 processes => 100 MB for Page Tables!

Problem gets worse on modern 64-bits
machines

Solution is Hierarchical Page Table

Page Table Shrink
Single-level page table virtual address:
  Page Number (20 bits) | Offset (12 bits)

Multilevel page table virtual address:
  Super Page Number (10 bits) | Page Number (10 bits) | Offset (12 bits)

Only keep a second-level page table for the valid entries
of the super-level page table

If only 10% of the entries of the super page table
are valid, then the total mapping size is roughly 1/10th of a
single-level page table (a two-level walk is sketched below)
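
A sketch of the two-level walk in C using the 10/10/12 split above; super_table and the pte_t layout are illustrative assumptions:

#include <stdint.h>

#define L1_SHIFT 22          /* super page number: bits 31 ... 22 */
#define L2_SHIFT 12          /* page number:       bits 21 ... 12 */
#define IDX_MASK 0x3FFu      /* 10-bit index                      */
#define OFF_MASK 0xFFFu      /* 12-bit offset (4 KB pages)        */

typedef struct { uint32_t valid : 1; uint32_t ppn : 20; } pte_t;

/* Super page table: a valid slot points at a 1024-entry
   second-level table; invalid slots need no level-2 table at all. */
extern pte_t *super_table[1024];

int translate2(uint32_t vaddr, uint32_t *paddr)
{
    pte_t *l2 = super_table[(vaddr >> L1_SHIFT) & IDX_MASK];
    if (!l2)
        return -1;                        /* no level-2 table: fault */

    pte_t pte = l2[(vaddr >> L2_SHIFT) & IDX_MASK];
    if (!pte.valid)
        return -1;                        /* page fault */

    *paddr = (pte.ppn << L2_SHIFT) | (vaddr & OFF_MASK);
    return 0;
}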

2-level Page Table

[Figure: the virtual address space regions (Code, Static, Heap, ..., Stack) map through a Super PageTable to 2nd-level page tables, which in turn point into a 64 MB physical memory (addresses 0 to 64 MB).]

The Big Picture
[Flowchart, paraphrased:]

1. Send the virtual address to the TLB.
2. TLB miss? Try to read the PTE from the page table.
   If that access page-faults, replace the page from disk;
   otherwise it is a plain TLB miss stall: refill the TLB and retry.
3. TLB hit, read access: try to read from the cache.
   Cache hit? Deliver the data to the CPU.
   Cache miss? Stall while the block is fetched.
4. TLB hit, write access: set the dirty bit in the TLB and
   write to the cache/write buffer.

The Big Picture (cont’d)
Example parameters: L1 = 8 KB, L2 = 4 MB, page = 8 KB,
cache line = 64 B, VA = 64 bits, PA = 41 bits
(with an 8 KB page the page offset is 13 bits, so the physical
page number is 41 - 13 = 28 bits)

Things to Remember
Apply the principle of locality recursively:
manage memory to disk? Treat it as a cache

Protection was included as a bonus; now it is critical

Use a page table of mappings vs. tag/data in a cache

Spatial locality means the working set of pages is all that
must be in memory for a process to run

Virtual-to-physical memory translation
too slow?

Add a cache of virtual-to-physical address
translations, called a TLB

Need a more compact representation to reduce the
memory cost of a simple 1-level page table
(especially with the move from 32- to 64-bit addresses)

Main Memory Background
Next level down in the hierarchy

satisfies the demands of caches + serves as the I/O interface
Performance of Main Memory:

Latency: Cache Miss Penalty

Access Time: time between when a read is requested and
when the desired word arrives

Cycle Time: minimum time between requests to memory

Bandwidth (the number of bytes read or written per unit time):
I/O & Large Block Miss Penalty (L2)
Main Memory is DRAM: Dynamic Random Access Memory

Dynamic since needs to be refreshed periodically (8 ms, 1% time)

Addresses divided into 2 halves (Memory as a 2D matrix):

RAS or Row Access Strobe + CAS or Column Access Strobe
Cache uses SRAM: Static Random Access Memory

No refresh (6 transistors/bit vs. 1 transistor)

Memory Background:
Static RAM (SRAM)

Six transistors in a cross-connected fashion

Provides regular AND inverted outputs

Implemented in CMOS process
Single Port 6-T SRAM Cell


Memory Background:
Dynamic RAM

SRAM cells exhibit high speed/poor density

DRAM: simple transistor/capacitor pairs in
high-density form

[Figure: a one-transistor DRAM cell: the word line gates a transistor connecting the bit line to a storage capacitor C; a sense amp on the bit line reads the stored charge.]

Techniques for Improving Performance

1. Wider Main Memory

2. Simple Interleaved Memory

3. Independent Memory Banks

Memory Organizations
Simple: CPU, cache, bus, and memory all have the
same width (32 or 64 bits)

Wide: CPU/mux is 1 word; mux/cache, bus, and
memory are N words
(Alpha: 64 bits and 256 bits;
UltraSPARC: 512)

Interleaved: CPU, cache, and bus are 1 word;
memory has N modules
(4 modules in the example, which is
word interleaved)

1st Technique for Higher Bandwidth:
Wider Main Memory (cont'd)

Timing model (word size is 8 bytes = 64 bits)

4 cc to send the address, 56 cc access time per word,
4 cc to send the data

Cache block is 4 words

Simple M.P. = 4 x (4 + 56 + 4) = 256 cc (1/8 B/cc)

Wide M.P. (2W) = 2 x (4 + 56 + 4) = 128 cc (1/4 B/cc)

Wide M.P. (4W) = 4 + 56 + 4 = 64 cc (1/2 B/cc)
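
The arithmetic is easy to check in C; the constants (4 cc to send the address, 56 cc access time, 4 cc to send data, 4-word block) come straight from the timing model above:

#include <stdio.h>

int main(void)
{
    const int addr = 4, access = 56, xfer = 4, block_words = 4;
    const int per_access = addr + access + xfer;                 /* 64 cc */

    printf("simple (1-word wide): %d cc\n", block_words * per_access);       /* 256 */
    printf("wide (2 words):       %d cc\n", (block_words / 2) * per_access); /* 128 */
    printf("wide (4 words):       %d cc\n", per_access);                     /*  64 */
    return 0;
}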

2nd Technique for Higher Bandwidth:
Simple Interleaved Memory

Take advantage of the potential parallelism of having many chips in a
memory system

Memory chips are organized in banks, allowing multi-word reads
or writes at a time

Interleaved M.P. = 4 + 56 + 4x4 = 76 cc (0.4 B/cc)

Word-interleaved layout across 4 banks:
  Bank 0: words 0, 4, 8, 12
  Bank 1: words 1, 5, 9, 13
  Bank 2: words 2, 6, 10, 14
  Bank 3: words 3, 7, 11, 15
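
Word interleaving is just modulo arithmetic on the word address; this short C program prints the layout above (bank = address mod 4, location within the bank = address / 4):

#include <stdio.h>

int main(void)
{
    const int banks = 4;
    for (int addr = 0; addr < 16; addr++)
        printf("word %2d -> bank %d, location %d\n",
               addr, addr % banks, addr / banks);
    return 0;
}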

2nd Technique for Higher Bandwidth:
Simple Interleaved Memory (cont'd)

How many banks?

number of banks ≥ number of clocks to access a word in a
bank, for sequential accesses; otherwise we will return to the
original bank before it has its next word ready

Consider the following example:
10 cc to read a word from a bank, 8 banks

Problem #1: chip size increase

512 MB DRAM using 4M x 4-bit chips: 256 chips =>
easy to organize in 16 banks of 16 chips

512 MB DRAM using 64M x 4-bit chips: 16 chips => 1 bank?

Problem #2: difficulty in main memory expansion
3rd Technique for Higher Bandwidth:
Independent Memory Banks

Memory banks for independent accesses
vs. faster sequential accesses

Multiprocessor

I/O

CPU with hit under n misses, non-blocking cache

Superbank: all memory active
on one block transfer (also called a bank)

Bank: portion within a superbank that is word
interleaved (also called a subbank)

Address layout: Superbank number | Superbank offset,
where the superbank offset splits into Bank number | Bank offset

Avoiding Bank Conflicts

Lots of banks help, but are not always enough

Even with 128 banks, since 512 is a multiple of 128,
the column-order loop below conflicts on consecutive
word accesses (the stride of 512 words keeps hitting one bank)

SW fixes: loop interchange, or
declaring the array dimension not to be a power of 2 ("array padding")

HW fix: a prime number of banks

bank number = address mod number of banks

address within bank = address / number of banks

but a modulo and a divide on every memory access, with a
prime number of banks?

trick: address within bank = address mod number of words in bank,
which is easy if there are 2^N words per bank (just the low-order bits)
int x[256][512];
for (j = 0; j < 512; j = j + 1)
    for (i = 0; i < 256; i = i + 1)
        x[i][j] = 2 * x[i][j];
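
As one concrete instance of the "array padding" fix listed above, the inner dimension can be padded from 512 to 513 words. The column-order stride then becomes 513, and 513 mod 128 = 1, so successive accesses cycle through all the banks instead of hitting one. The sketch below assumes the pad column is simply never used:

/* Padded version of the example above: one extra, unused
   column per row breaks the power-of-2 stride. */
int x[256][513];                       /* was x[256][512] */

void scale(void)
{
    for (int j = 0; j < 512; j++)      /* still only 512 real columns */
        for (int i = 0; i < 256; i++)
            x[i][j] = 2 * x[i][j];
}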

Fast Bank Number

Chinese Remainder Theorem: as long as two sets of integers a_i
and b_i follow these rules

  b_i = x mod a_i,  0 <= b_i < a_i,  0 <= x < a_0 * a_1 * a_2 * ...

and a_i and a_j are co-prime if i ≠ j,
then the integer x has only one solution (an unambiguous mapping):

bank number = b_0, number of banks = a_0 (= 3 in the example)

address within bank = b_1, number of words in bank = a_1 (= 8 in the example)

N-word addresses 0 to N-1; a prime number of banks; words per bank a power of 2

Address within bank | Seq. interleaved (Bank 0, 1, 2) | Modulo interleaved (Bank 0, 1, 2)
0                   | 0, 1, 2                         | 0, 16, 8
1                   | 3, 4, 5                         | 9, 1, 17
2                   | 6, 7, 8                         | 18, 10, 2
3                   | 9, 10, 11                       | 3, 19, 11
4                   | 12, 13, 14                      | 12, 4, 20
5                   | 15, 16, 17                      | 21, 13, 5
6                   | 18, 19, 20                      | 6, 22, 14
7                   | 21, 22, 23                      | 15, 7, 23
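
The modulo-interleaved column of the table needs no divide at all: both fields are remainders, and the within-bank one is a power-of-2 mask. A short C check using the example's 3 banks and 8 words per bank reproduces the table:

#include <stdio.h>

int main(void)
{
    for (int x = 0; x < 24; x++)       /* N = 24 = 3 banks * 8 words */
        printf("word %2d -> bank %d, address within bank %d\n",
               x, x % 3, x % 8);       /* x % 8 is just the low 3 bits */
    return 0;
}

For example, word 16 lands in bank 1 (16 mod 3) at address 0 (16 mod 8), matching the table.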

DRAM Logical Organization (64 Mbit)

[Figure: the cell array is roughly square, so about the square root of the bits is selected per RAS (row) and per CAS (column).]

4 Key DRAM Timing Parameters

tRAC: minimum time from the RAS line falling to valid data output

quoted as the speed of a DRAM when you buy it
(it is the speed on the purchase sheet)

a typical 4 Mbit DRAM has tRAC = 60 ns

tRC: minimum time from the start of one row access to the start
of the next

tRC = 110 ns for a 4 Mbit DRAM with a tRAC of 60 ns

tCAC: minimum time from the CAS line falling to valid data output

15 ns for a 4 Mbit DRAM with a tRAC of 60 ns

tPC: minimum time from the start of one column access to the
start of the next

35 ns for a 4 Mbit DRAM with a tRAC of 60 ns

DRAM Read Timing

Every DRAM access begins with the assertion of RAS_L

2 ways to read: early or late relative to CAS

early read cycle: OE_L asserted before CAS_L

late read cycle: OE_L asserted after CAS_L

[Figure: a 256K x 8 DRAM with 9 multiplexed address lines (A), 8 data lines (D), and control inputs RAS_L, CAS_L, WE_L, OE_L. The timing diagram latches the row address on RAS_L and the column address on CAS_L; after the read access time and the output-enable delay, data appears on D, which is otherwise high-Z; the full sequence is the DRAM read cycle time.]

DRAM Performance

A 60 ns (tRAC) DRAM can

perform a row access only every 110 ns (tRC)

perform a column access (tCAC) in 15 ns, but the
time between column accesses is at least 35 ns (tPC)

In practice, external address delays and turning
around buses make it 40 to 50 ns

These times do not include the time to drive
the addresses off the microprocessor, nor the
memory controller overhead!
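
A rough bandwidth comparison for this 4 Mbit part, sketched in C under the assumed simplification of a 1-byte-wide data path: back-to-back row cycles deliver about 9 MB/s, while page-mode column accesses reach about 29 MB/s:

#include <stdio.h>

int main(void)
{
    const double tRC = 110e-9, tPC = 35e-9;   /* seconds per access */
    printf("row-cycle accesses: %.1f MB/s\n", 1.0 / tRC / 1e6);   /* ~9.1  */
    printf("page-mode accesses: %.1f MB/s\n", 1.0 / tPC / 1e6);   /* ~28.6 */
    return 0;
}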

Improving Memory Performance in
Standard DRAM Chips

Fast Page Mode

allows repeated access to the row buffer
without another row access

Improving Memory Performance in
Standard DRAM Chips (cont’d)
Synchronous DRAM (SDRAM)

adds a clock signal to the DRAM interface

DDR (Double Data Rate)

transfers data on both the rising and falling edges of the
clock signal

Improving Memory Performance via a
New DRAM Interface: RAMBUS

RAMBUS provides a new interface: the memory
chip now acts more like a system

First generation: RDRAM

protocol-based RAM with a narrow (16-bit) bus

high clock rate (400 MHz), but long latency

pipelined operation

multiple arrays, with data transferred on both
edges of the clock

Second generation: Direct RDRAM
(DRDRAM) offers up to 1.6 GB/s

Improving Memory Performance via a
New DRAM Interface: RAMBUS (cont'd)

[Figure: an RDRAM memory system and the internal organization of a RAMBUS bank.]