CO-UNIT-V-zxzxxxxxxxxxxxxzxxxxzxzz-MA.ppt

AnanyaSingh813245 12 views 91 slides Mar 05, 2025


Computer Organization
Department of CSE, SSE Mukka
The Memory System
www.bookspar.com | Website for students |
VTU NOTES

Chapter Objectives
Basic memory circuits
Organization of the main memory
Cache memory concept –
Shortens the effective memory access time
Virtual memory mechanism –
Increases the apparent size of the main memory
Secondary storage
Magnetic disks
Optical disks
Magnetic tapes

Basic Memory Concepts
The maximum size of the Main Memory (MM) that can be
used in any computer is determined by its addressing
scheme.
For example:
A 16-bit computer that generates 16-bit addresses can
address up to 2^16 = 64K memory locations
A 32-bit computer with 32-bit addresses can address up to
2^32 = 4G memory locations
A 40-bit computer can address up to 2^40 = 1T memory
locations
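The relationship between address width and memory size can be checked with a short sketch. The function name addressable_locations is illustrative only and does not come from the slides.

```python
def addressable_locations(address_bits):
    """Number of distinct locations a k-bit address can name: 2**k."""
    return 2 ** address_bits


if __name__ == "__main__":
    for bits in (16, 32, 40):
        print(f"{bits}-bit address: {addressable_locations(bits):,} locations")
```

Each extra address bit doubles the number of locations that can be addressed.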

Word addressability and byte-addressability
If the smallest addressable unit of information is a
memory word, the machine is called word-addressable.
If individual memory bytes are assigned distinct addresses,
the computer is called byte-addressable.
Most commercial machines are byte-addressable.
For example, in a byte-addressable 32-bit computer, each
memory word contains 4 bytes.
A possible word-address assignment would be:
 Word Address   Byte Addresses
 0              0  1  2  3
 4              4  5  6  7
 8              8  9  10 11

Basic Memory Concepts
Word length of a computer is the number of bits
actually stored or retrieved in one memory access
For example, consider a byte-addressable 32-bit computer whose
instructions generate 32-bit addresses:
The high-order 30 bits determine which word in memory is accessed
The low-order 2 bits determine which byte in that word is accessed
Suppose we want to fetch only one byte from a word.
In a Read operation, the other bytes are discarded by the
processor
In a Write operation, care must be taken not to
overwrite the other bytes
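A minimal sketch of the 30-bit / 2-bit split described above, assuming 4-byte words. The helper names split_byte_address and word_address are hypothetical.

```python
def split_byte_address(addr):
    """Split a 32-bit byte address into (word index, byte within word)."""
    word_index = addr >> 2        # high-order 30 bits select the word
    byte_in_word = addr & 0b11    # low-order 2 bits select the byte
    return word_index, byte_in_word


def word_address(addr):
    """Byte address of the word that contains `addr` (low 2 bits cleared)."""
    return addr & ~0b11


if __name__ == "__main__":
    for addr in (0, 5, 9, 11):
        print(addr, split_byte_address(addr), word_address(addr))
```

For instance, byte address 9 lies in the word that starts at address 8, at byte position 1 within that word.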

Basic Memory concepts
Data transfer between memory and the processor takes
place through the use of 2 processor registers
MAR – Memory address register
MDR – Memory Data Register
If MAR is k bits long and MDR is n bits long:
The memory unit may contain up to 2^k addressable locations
During a memory cycle, n bits of data are transferred between
memory and the processor
The processor therefore needs k address lines and n data lines
There are additional control lines: Read/Write (R/W), MFC, the
number of bytes to be transferred, etc.

Figure 5.1. Connection of the memory to the processor.
[Diagram: the processor's MAR drives a k-bit address bus and its MDR connects to an n-bit data bus; control lines (R/W, MFC, etc.) run between processor and memory. The memory has up to 2^k addressable locations; word length = n bits.]

How does the processor read data from
memory?
It loads the address of the required memory location into
MAR
It sets the R/W line to 1
The memory responds by placing the requested data on the
data lines
The memory confirms this action by asserting the MFC (Memory Function Completed) signal
Upon receipt of the MFC signal, the processor loads the data on
the data lines into the MDR register

How does the processor write data into
memory?
It loads the address of the location into MAR
It loads the data into MDR
It indicates a Write operation by setting the R/W line to 0
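The read and write sequences can be sketched as a toy model. This is only an illustration: the class names Memory and Processor are made up, and the model assumes the memory also asserts MFC at the end of a write, which the slides do not state.

```python
class Memory:
    """Toy memory with 2**k locations (k address bits)."""

    def __init__(self, k):
        self.cells = [0] * (2 ** k)

    def cycle(self, address, rw, data=None):
        """One memory cycle. rw=1 means Read, rw=0 means Write.

        Returns (data_lines, mfc). MFC on a write is an assumption.
        """
        if rw == 1:
            return self.cells[address], True
        self.cells[address] = data
        return None, True


class Processor:
    def __init__(self, memory):
        self.memory = memory
        self.mar = 0   # Memory Address Register
        self.mdr = 0   # Memory Data Register

    def read(self, address):
        self.mar = address                               # load address into MAR
        data, mfc = self.memory.cycle(self.mar, rw=1)    # set R/W line to 1
        if mfc:                                          # memory asserted MFC
            self.mdr = data                              # load data lines into MDR
        return self.mdr

    def write(self, address, value):
        self.mar = address                               # load address into MAR
        self.mdr = value                                 # load data into MDR
        _, mfc = self.memory.cycle(self.mar, rw=0, data=self.mdr)  # R/W = 0
        return mfc


if __name__ == "__main__":
    cpu = Processor(Memory(k=4))
    cpu.write(3, 0xAB)
    print(hex(cpu.read(3)))
```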

Some concepts
Memory Access Time:
It is a useful measure of the speed of the memory unit. It is
the time that elapses between the initiation of an operation
and the completion of that operation (for example, the time
between READ and MFC).
Memory Cycle Time:
It is an important measure of the memory system. It
is the minimum time delay required between the initiations of
two successive memory operations (for example, the time
between two successive READ operations). The cycle time is
usually slightly longer than the access time.

Random Access Memory (RAM)
A memory unit is called a Random Access Memory if
any location can be accessed for a READ or WRITE operation
in some fixed amount of time that is independent of the
location’s address.
Main memory units are of this type.
This distinguishes them from serial or partly serial access
storage devices such as magnetic tapes and disks which
are used as the secondary storage device.

Cache Memory
The CPU processes instructions and data faster than they can
be fetched from compatibly priced main memory unit.
Memory cycle time becomes the bottleneck in the system.
One way to reduce the memory access time is to use a cache
memory.
It is a small, fast memory that is inserted between the larger,
slower main memory and the CPU.
Holds the currently active segments of a program and its data.
Because of the locality of address references,
CPU finds the relevant information mostly in the cache memory itself
(cache hit)
infrequently needs access to the main memory (cache miss)
With suitable size of the cache memory, cache hit rates of
over 90% are possible

Memory Interleaving
This technique divides the memory system into a number
of memory modules
Arranges addressing so that successive words in the
address space are placed in different modules.
When requests for memory access involve consecutive
addresses, the access will be to different modules.
Since parallel access to these modules is possible, the average
rate of fetching words from the Main Memory can be
increased
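One common way to place successive words in different modules is to use the address modulo the number of modules. The sketch below assumes 4 modules; both the module count and the function name locate are illustrative, not taken from the slides.

```python
NUM_MODULES = 4  # assumed number of memory modules


def locate(addr, num_modules=NUM_MODULES):
    """Map a word address to (module number, address within that module)."""
    module = addr % num_modules    # consecutive addresses rotate through modules
    offset = addr // num_modules   # position inside the chosen module
    return module, offset


if __name__ == "__main__":
    for addr in range(8):
        print(addr, locate(addr))
```

With this arrangement, addresses 0, 1, 2, 3 fall in modules 0, 1, 2, 3, so a run of consecutive accesses can keep all modules busy at once.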

Virtual Memory
In a virtual memory system, the addresses generated by the
program may be different from the actual physical addresses.
The address generated by the CPU is referred to as a virtual or
logical address.
The required mapping between the logical address space and
physical memory is implemented by a special memory control
unit, called the memory management unit (MMU).
The mapping function may be changed during program
execution according to system requirements.
The logical (virtual) address space
can be as large as the addressing capability of the CPU
The physical address space
(the actual physical memory) can be much smaller.

Virtual memory
Only the active portion of the virtual address space is
mapped onto the physical memory
The rest of the virtual address space is mapped onto a bulk
storage device such as a magnetic disk (hard disk)
If the addressed information is in the Main Memory (MM),
it is accessed and execution proceeds.
Otherwise, an exception is generated, in response to
which
the memory management unit transfers a contiguous
block of words containing the desired word from the
bulk storage unit to the MM,
displacing some block that is currently inactive.

Figure 5.2. Organization of bit cells in a memory chip.
[Diagram: address inputs A0 to A3 feed an address decoder that drives word lines W0 to W15; each row of flip-flop (FF) memory cells connects through bit-line pairs to Sense/Write circuits, which drive the data input/output lines b7 ... b1, b0; R/W and CS are control inputs.]

An example of memory organization
A memory chip consisting of 16 words of 8 bits each is
usually referred to as a 16 x 8 organization.
The data input and the data output of each
Sense/Write circuit are connected to a single bi-directional
data line in order to reduce the number of pins required.
One control line, the R/W (Read/Write) input, is used to specify
the required operation, and
another control line, the CS (Chip Select) input, is used to
select a given chip in a multichip memory system.
This circuit requires 14 external connections (4 address, 8 data
and 2 control lines) and, allowing 2
pins for power supply and ground connections, can be
manufactured in the form of a 16-pin chip.
It can store 16 x 8 = 128 bits.
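The pin count can be reproduced with a small calculation. This is a sketch under the assumptions stated in the text (bidirectional data lines, two control lines R/W and CS, two supply pins); the function names are hypothetical.

```python
def signal_connections(words, bits_per_word, control_lines=2):
    """External signal connections: address + data + control lines."""
    address_lines = (words - 1).bit_length()  # bits needed to select one word
    data_lines = bits_per_word                # one bidirectional line per bit
    return address_lines + data_lines + control_lines


def package_pins(words, bits_per_word, supply_pins=2):
    """Signal connections plus power supply and ground pins."""
    return signal_connections(words, bits_per_word) + supply_pins


if __name__ == "__main__":
    print(signal_connections(16, 8), package_pins(16, 8))
    print(signal_connections(1024, 1), package_pins(1024, 1))
```

The same calculation applied to a 1K x 1 organization gives 10 address lines, 1 data line and 2 control lines, which also fits in a 16-pin package.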

Figure 5.3. Organization of a 1K X 1 memory chip.
[Diagram: the 10-bit address is split into a 5-bit row address, which a 5-bit decoder turns into word lines W0 to W31 of a 32 X 32 memory cell array, and a 5-bit column address, which controls a 32-to-1 output multiplexer and input demultiplexer attached to the Sense/Write circuitry; R/W and CS are control inputs, and there is a single data input/output line.]

1K X 1 memory chip
The 10-bit address is divided into two groups of 5 bits
each to form the row and column addresses for the cell
array.
A row address selects a row of 32 cells, all of which are
accessed in parallel.
One of these, selected by the column address, is
connected to the external data lines by the input and
output multiplexers.
This structure can store 1024 bits and can be implemented in
a 16-pin chip.
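The row/column selection can be sketched as follows. The sketch assumes the high-order 5 bits form the row address and the low-order 5 bits the column address; the slides do not say which half is which, and the function names are illustrative.

```python
ROWS = COLS = 32  # 32 x 32 cell array = 1024 bits


def split_address(addr):
    """Split a 10-bit address into (row, column), 5 bits each."""
    if not 0 <= addr < ROWS * COLS:
        raise ValueError("address must fit in 10 bits")
    row = addr >> 5          # assumed: high-order 5 bits select the row
    col = addr & 0b11111     # assumed: low-order 5 bits select the column
    return row, col


def read_bit(cell_array, addr):
    """Read one bit: select a whole row, then pick one cell from it."""
    row, col = split_address(addr)
    selected_row = cell_array[row]   # all 32 cells of the row are accessed
    return selected_row[col]         # the 32-to-1 multiplexer picks one


if __name__ == "__main__":
    cells = [[0] * COLS for _ in range(ROWS)]
    cells[21][3] = 1
    print(split_address(0b1010100011), read_bit(cells, 0b1010100011))
```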

Static memories
Memories that consist of circuits capable of retaining
their state as long as power is applied are called static memories
Static RAMs can be accessed very quickly, within a few nanoseconds

Figure 5.4. A static RAM cell.
[Diagram: two inverters cross-connected to form a latch with internal nodes X and Y; two transistors T1 and T2, controlled by the word line, connect the latch to bit lines b and b'.]
When the word line is at ground level, the transistors are turned off
and the latch retains its state

Read and Write operations in SRAM
Read
The word line is activated to close switches T1 and T2
If the cell is in state 1, the signal on bit line b is high and the signal on
bit line b’ is low
The opposite holds if the cell is in state 0
Sense/Write circuits at the end of the bit lines monitor the
states of b and b’ and set the output accordingly
Write
The state of the cell is set by placing the appropriate value on bit line b
and its complement on b’, and then activating the word line
This forces the cell into the corresponding state
The required signals on the bit lines are generated by the Sense/Write
circuit

Figure 5.5. An example of a CMOS memory cell.
[Diagram: transistors T3 to T6 form the latch (nodes X and Y) between V_supply and ground; T1 and T2, controlled by the word line, connect the latch to bit lines b and b'.]

Dynamic RAMs
Static RAMs are fast but come at a higher cost
Their cells require several transistors
Less expensive RAMs can be built using fewer transistors per cell,
but their cells cannot retain their state indefinitely
These are called Dynamic RAMs (DRAMs)
Information is stored in the form of charge on a capacitor
This charge can be maintained only for tens of milliseconds
Contents must be periodically refreshed

Figure 5.6. A single-transistor dynamic memory cell.
[Diagram: transistor T, controlled by the word line, connects capacitor C to the bit line.]
A dynamic RAM cell needs to be refreshed
periodically to hold data

A 16-Mbit DRAM chip, configured as 2M X 8
The cells are organized as a 4K X 4K array
The 4096 cells in each row are divided into 512 groups of 8
A row can hence store 512 bytes of data
12 address bits are required to select a row
9 address bits are needed to specify a group of 8 bits in the selected row
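The 21-bit address of this chip therefore splits into a 12-bit row part and a 9-bit column part. A minimal sketch, assuming the high-order 12 bits (A20-9) select the row and the low-order 9 bits (A8-0) select the byte within the row; the function name is hypothetical.

```python
ROW_BITS = 12   # 4096 rows
COL_BITS = 9    # 512 byte groups per row


def split_dram_address(addr):
    """Split a 21-bit byte address into (row, column group)."""
    if not 0 <= addr < 2 ** (ROW_BITS + COL_BITS):
        raise ValueError("address must fit in 21 bits")
    row = addr >> COL_BITS                 # A20-9, latched with RAS
    col = addr & ((1 << COL_BITS) - 1)     # A8-0, latched with CAS
    return row, col


if __name__ == "__main__":
    print(2 ** ROW_BITS * 2 ** COL_BITS)   # number of bytes in the chip
    print(split_dram_address(512))
    print(split_dram_address(0x1FFFFF))
```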

Figure 5.7. Internal organization of a 2M X 8 dynamic memory chip.
[Diagram: address pins A20-9 / A8-0 feed a row address latch (strobed by RAS) and a column address latch (strobed by CAS); the row decoder selects one row of the 4096 X (512 X 8) cell array, and the column decoder selects one byte through the Sense/Write circuits, which connect to data lines D7 to D0; CS and R/W are control inputs.]
Timing is controlled asynchronously.
A specialized memory controller circuit provides the necessary
control signals, RAS and CAS, that govern the timing.
Hence this is an asynchronous DRAM

Fast Page mode
All bits of a row are sensed, but only 8 bits are placed on the data lines
This byte is selected by the column address bits
A simple modification makes it possible to access the other bytes of the same
row without having to reselect the row
Add a latch to the output of the sense amplifier in each column
Applying a row address loads the latches corresponding to
all bits in the selected row
Only different column addresses are then needed to place the different bytes on the
data lines
The most useful arrangement is to transfer bytes in sequential order
Apply a consecutive sequence of column addresses under the control of
successive CAS signals.
This scheme allows a block of data to be transferred at a much faster rate
than can be achieved for transfers involving random addresses
This block transfer capability is called fast page mode

SYNCHRONOUS DRAMs
Operation directly synchronized with a clock signal
Called as SDRAMs
The cell array is the same as in Asynchronous DRAMs.
The address and data connections are buffered by means of
registers
Output of each sense amplifier is connected to a latch
A read operation causes the contents of all cells in the selected
row to be loaded into these latches
If an access is made for refreshing purposes only, it will not change
the contents of these latches
Data held in the latches that correspond to the selected
column(s) are transferred into the data output register

Figure 5.8. Synchronous DRAM.
[Diagram: the row/column address input feeds a row address latch and row decoder, and a column address counter and column decoder; a refresh counter also supplies row addresses; the cell array connects to Read/Write circuits and latches, and then to the data input and data output registers on the data lines; a mode register and timing control block receives Clock, RAS, CAS, R/W and CS.]

SYNCHRONOUS DRAMs
SDRAMs have several different modes of operation
Selected by writing control information into a mode register
Can specify burst operations of different lengths
In SDRAMs, it is not necessary to provide externally
generated pulses on the CAS line to select successive
columns
Necessary signals are provided using a column counter and
clock signal
Hence new data can be placed on data lines at each clock cycle
All actions triggered by rising edge of the clock

Figure 5.9. Burst read of length 4 in an SDRAM.
[Timing diagram: signals Clock, R/W, RAS, CAS, Address (Row, then Col) and Data (D0, D1, D2, D3 in successive clock cycles).]

Burst read of length 4 in an SDRAM
The row address is latched under control of the RAS signal
The memory takes about 2-3 cycles to activate the selected row
The column address is latched under control of the CAS signal
After a delay of 1 cycle, the first set of data bits is placed on the data
lines
The SDRAM automatically increments the column address to access
the next 3 sets of bits in the selected row, which are placed on the data lines
in successive clock cycles
SDRAMs have built-in refresh circuitry
It provides the addresses of the rows that are selected for refreshing
Each row must be refreshed at least every 64 ms

Latency and Bandwidth
The parameters that indicate the performance of the
memory
Memory latency – amount of time it takes to transfer a
word of data to or from memory
In block transfers, latency is used to denote the time it takes to
transfer the first word of data
This is longer than the time needed to transfer each subsequent word
of a block
In the previous timing diagram (Figure 5.9), the access cycle begins with the
assertion of RAS and the first word is transferred 5 cycles later.
Hence the latency is 5 clock cycles
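The timing of the burst can be tabulated with a short sketch. It assumes one word per clock cycle after the first word, as in the burst read described earlier; the function name burst_read_schedule is illustrative.

```python
def burst_read_schedule(latency_cycles=5, burst_length=4):
    """Clock cycle, counted from the assertion of RAS, at which each word
    of the burst appears on the data lines."""
    return [latency_cycles + i for i in range(burst_length)]


if __name__ == "__main__":
    schedule = burst_read_schedule()
    for word, cycle in enumerate(schedule):
        print(f"D{word} transferred in cycle {cycle}")
    print("burst complete after", schedule[-1], "cycles")
```

Under these assumptions the first word costs 5 cycles, but each further word costs only 1 cycle, which is why block transfers are much more efficient than isolated accesses.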

Bandwidth
Bandwidth is usually expressed as the number of bits or bytes that can be
transferred in one second
It depends on
Speed of memory access
Transfer capability of the links (speed of the bus)
Number of bits that can be accessed in parallel
Bandwidth is the product of the rate at which data are
transferred (and accessed) and the width of the data bus
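As a worked example of this product, the sketch below uses made-up figures (100 million transfers per second on a 64-bit bus); neither number comes from the slides.

```python
def bandwidth_bytes_per_sec(transfers_per_sec, bus_width_bits):
    """Bandwidth = transfer rate x bus width, expressed in bytes per second."""
    return transfers_per_sec * bus_width_bits / 8


if __name__ == "__main__":
    # Hypothetical figures chosen only for illustration.
    rate = 100_000_000   # transfers per second
    width = 64           # bits per transfer
    print(bandwidth_bytes_per_sec(rate, width))
```

Doubling either the transfer rate or the bus width doubles the bandwidth.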

Double-Data-Rate SDRAM (DDR SDRAM)
The standard SDRAM performs all actions on the rising edge
of the clock signal
DDR SDRAMs access the cell array in the same way but transfer
data on both edges of the clock
The latency is the same as standard SDRAMs
But since they transfer data on both the edges of clock,
bandwidth is essentially doubled for long burst transfers
To make this possible, the cell array is organized into 2 banks
Each bank can be accessed separately
Consecutive words of a given block are stored in different banks
Efficiently used in applications where block transfers are
prevalent
Eg., main memory to and from processor caches

Questions for assignment
1. Explain how the processor reads data from and writes data to
memory
2. Explain the organization of a 1K X 1 memory chip
3. Explain a single SRAM cell with a diagram. How are read and
write operations carried out?
4. Explain a DRAM cell with a diagram. How are read and write
operations carried out?
5. Explain the 2M X 8 DRAM chip. How can you modify it for
fast page mode?
6. Explain SDRAMs with the help of a diagram
7. Explain the terms latency and bandwidth
8. Explain a burst read of length 4 in an SDRAM with a timing
diagram
9. Explain DDR SDRAMs

Structure of larger memories
Memory chips can be connected to form larger memories
There are 2 types of memory systems
Static memory systems
Dynamic memory systems

Static Memory systems
The following figure shows the implementation of a 2M X 32 memory using 16
static memory chips of 512K X 8 each
There are 4 columns, each containing 4 chips, and each column implements one
byte position
Only the selected chips (using the chip select input) place data on the output lines
21 address bits are needed to select a 32-bit word in this memory
The high-order 2 bits determine which of the 4 chip select signals should be
activated
The remaining 19 bits access a specific byte location inside each chip of the selected row
The R/W inputs of all chips are tied together to form a single R/W signal
Dynamic memory systems are organized in much the same manner as
static ones
Their physical implementation is more conveniently done in the form of memory
modules
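The address decoding for the 2M X 32 memory can be sketched as below. The function name decode and the returned tuple layout are illustrative choices, not part of the slides.

```python
CHIP_ADDR_BITS = 19   # each 512K X 8 chip has 2**19 locations
NUM_ROWS = 4          # 4 rows of chips, one chip-select signal per row


def decode(addr):
    """Split a 21-bit word address into (chip-select row, internal chip address)."""
    if not 0 <= addr < 2 ** 21:
        raise ValueError("address must fit in 21 bits")
    row_select = addr >> CHIP_ADDR_BITS               # high-order 2 bits
    internal = addr & ((1 << CHIP_ADDR_BITS) - 1)     # low-order 19 bits
    return row_select, internal


def chip_selects(addr):
    """Chip-select lines produced by the 2-bit decoder: exactly one is active."""
    row_select, _ = decode(addr)
    return [i == row_select for i in range(NUM_ROWS)]


if __name__ == "__main__":
    print(decode(2 ** 19), chip_selects(2 ** 19))
```

All 4 chips in the selected row receive the same 19-bit internal address and together supply the 32 bits of the word, 8 bits per chip.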

Figure 5.10. Organization of a 2M X 32 memory module using 512K X 8 static memory chips.
[Diagram: of the 21-bit address A0 to A20, the 2 high-order bits drive a 2-bit decoder that generates the chip select signals, and the remaining 19 bits form the internal chip address; each 512K X 8 memory chip has a 19-bit address input, a chip select input and an 8-bit data input/output; the four chip columns supply D31-24, D23-16, D15-8 and D7-0.]

Memory System Considerations
The choice of a RAM for a given system depends on several
factors
Cost
Speed
Power dissipation
Size of chip
Static RAMs are used when very fast operation is the primary
requirement
Used mostly in cache memories
Dynamic RAMs are predominant choice for computer main
memories
High densities achievable make larger memories economically
feasible

Memory Controller
To reduce number of pins, dynamic memory chips use
multiplexed address inputs
Address divided into 2 parts
High-order address bits, to select a row in a cell array, are
provided first and latched into memory under control of RAS
Low-order address bits, to select a column, are provided on
the same address pins and latched under CAS signal
The processor issues all bits of an address at the same time
The required multiplexing of the address bits is performed
by a memory controller circuit

Figure 5.11. Use of a memory controller.
[Diagram: the processor sends Address, R/W, Request and Clock to the memory controller; the controller sends Row/Column address, RAS, CAS, R/W, CS and Clock to the memory; the Data lines connect the processor and the memory directly.]

Memory controller functions
Interposed between processor and memory
Processor sends Request signal
Accepts complete address and R/W signal from the processor
The controller forwards the row and column portions of
address to the memory
Generates the RAS and CAS signals
Also sends R/W and CS signals to the memory
Data lines are directly connected between the processor and
the memory
When used with DRAM chips, the memory controller
provides all the information needed to control the refreshing
process
Contains a refresh counter – to refresh all rows within the time limit
specified for a device

RAMBUS Memory
To increase the system bandwidth we need to increase either the
system bus width or the system bus speed
A wide bus is expensive and requires a lot of space on the
motherboard
Rambus uses a narrow but much faster bus
Its key feature is the fast signaling method used to transfer
information between chips
It uses the concept of differential signaling
Instead of either 0 volts or Vsupply (5 volts), it uses 0.3 volt differences
from a reference voltage called Vref

READ-ONLY Memories (ROMs)
Both SRAMs and DRAMs are volatile
They lose their data if power is turned off
Many applications need to retain data even if power is off
E.g., a hard disk is used to store information, including the OS
When the system is turned on, the OS must be loaded from the hard disk into
memory
This requires executing a program that boots the OS
That boot program, since it is large, is stored on disk
The processor must execute some instructions that load the boot program
into memory
So we need a small amount of non-volatile memory that holds the
instructions needed to load the boot program into RAM
A special type of writing process is used to place information into non-volatile
memories
Such a memory is called a ROM (Read-Only Memory)

Figure 5.12. A ROM cell.
[Diagram: transistor T, controlled by the word line, sits between the bit line and point P; P is connected to ground to store a 0 and left unconnected to store a 1.]

ROM
If the transistor is connected to ground at point P, a 0 is
stored
Otherwise a 1 is stored
The bit line is connected to a power supply through a resistor
To read, the word line is activated
If the voltage on the bit line drops, the cell holds a 0
If the voltage stays the same, the cell holds a 1

PROM
Allows data to be loaded by the user
Achieved by inserting a fuse at point P in Figure 5.12
Before it is programmed, the memory contains all 0s
The user can insert 1s at the required locations using high-
current pulses
The process is irreversible

EPROM
Allows the stored data to be erased and new data to be
loaded
Erasable, reprogrammable ROM, called EPROM
Can be used while the memory contents are still being developed,
so that it can accommodate changes
Cell structure is similar to ROM
The connection to ground is always made at point P
A special transistor is used, which can function either as a normal
transistor or as a disabled transistor that is always turned off
It can be programmed to behave as a permanently open switch
The chip is erased by exposing it to UV light, which dissipates the
charges trapped in the transistors of the memory cells

EEPROM
Disadvantages of EPROMs
The chip must be physically removed from the circuit for
reprogramming
The entire contents are erased by the UV light
EEPROM is another version of erasable PROM that can
be both programmed and erased electrically
It need not be removed for erasure
Cell contents can be erased selectively
Disadvantage
Different voltages are needed for erasing, writing and reading
stored data

Flash Memory
An approach similar to EEPROM
A flash cell is based on a single transistor controlled by trapped
charge
In an EEPROM, a single cell can be read and written
In flash memory, the contents of a single cell can be read, but
writing is possible only to an entire block of cells
Flash devices have greater density
Higher capacity
Lower cost per bit
Require a single power supply voltage
Consumes less power in operation
Used in portable equipment that is battery driven – handheld
computers, cell phones, digital cameras, MP3 players

Flash Cards and Flash Drives
Single flash chips do not provide sufficient storage capacity
Larger memory modules are required – flash cards and flash
drives
Flash cards
Mount flash chips on a small card
A card is simply plugged into a conveniently accessible slot
Variety of sizes
Flash Drives
Larger modules to replace hard disk drives
Designed to fully emulate hard disks – not yet possible
Storage capacity is significantly lower
Have shorter seek and access times hence faster response
Lower power consumption
Insensitive to vibration

Speed, Size and Cost
Ideal memory – fast, large and inexpensive
Very fast memory if SRAM chips are used
These chips are expensive
So impractical to build a large Memory using SRAM chips
DRAM chips are cheaper
But also slower
Solution for space is to provide large secondary storage
devices
Very large disks at reasonable prices
For main memory – use DRAMs
Use SRAMs in smaller memories like cache memory

Figure 5.13. Memory hierarchy.
[Diagram: levels from top to bottom are processor registers, primary cache (L1), secondary cache (L2), main memory, and magnetic disk secondary memory. Size increases toward the bottom; speed and cost per bit increase toward the top.]

Cache Memories
Main memory is slower than modern processors
The processor cannot afford to waste time waiting for instructions
and data to arrive from main memory
A cache memory, which is much faster, makes the main
memory appear faster to the processor than it really is
The effectiveness of a cache is based on locality of reference: many
instructions in localized areas of the program are executed
repeatedly during some time period, and the remainder of the
program is accessed relatively infrequently. Locality appears in two ways:
Temporal: a recently executed instruction is likely to be executed
again very soon
Spatial: instructions in close proximity to a recently executed
instruction (with respect to the instructions’ addresses) are likely to be
executed soon

Operation of a cache
If the active segments of the program can be placed in fast
cache memory – can reduce total execution time significantly
Memory control circuitry designed to take advantage of
locality of reference
Temporal – whenever an item( instruction or data) is first needed,
this item is brought into the cache – remains there till needed again
Spatial – instead of fetching just one item from the main memory to
the cache, fetch several items that reside at adjacent addresses. –
referred to as block or cache line.
Replacement algorithm – to decide which block of data to be
moved back from cache to main memory so that a new block
can be accommodated

Figure 5.14. Use of a cache memory.
[Diagram: the cache sits between the processor and main memory.]

Operation of a cache
Read request from processor
The contents of a block of memory words containing the location
specified are transferred into the cache one word at a time
When the program references any of the locations in this block, the
desired contents are read directly from the cache.
The cache can store a reasonable number of words, but it is small
compared to main memory
The correspondence between main memory blocks and those in the
cache is specified by a mapping function
When the cache is full and a memory word that’s not in cache
is referenced, the cache control hardware decides which block
must be removed to create space for newly arrived block
The collection of rules for this operation is called replacement
algorithm

Cache operation
The processor does not need to know explicitly about the existence of the cache
It issues read/write requests using memory addresses
The cache control circuitry determines whether the requested word
is currently in the cache.
If it is in the cache, the read/write operation is performed on the appropriate cache
location
In this case, a read or write hit is said to have occurred
If it is a read operation, main memory is not involved
For a write operation there are 2 options
Write-through protocol: both the cache and main memory are updated
simultaneously
Write-back or copy-back protocol: only the cache is updated during the
write operation, and the block is marked with a dirty or modified bit. Later, when
the block is moved back to main memory, main
memory is updated.

Limitations of write-through and write-back protocols
The write-through protocol is simpler but it results in
unnecessary write operations in the main memory when
a given cache word is updated several times during its
cache residency
The write-back protocol may also result in unnecessary
write operations because when a cache block is written
back to the memory all words of the block are written
back, even if only a single word has been changed while
the block was in cache
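Both limitations can be seen in a toy model that counts how many words are written to main memory. This is a deliberately simplified sketch: it models a single cache block of 4 words (an assumed block size), only write hits, and the class and function names are hypothetical.

```python
BLOCK_WORDS = 4  # assumed block size for this sketch


class OneBlockCache:
    """A single cache block, enough to compare the two write policies."""

    def __init__(self, policy):
        if policy not in ("write-through", "write-back"):
            raise ValueError(policy)
        self.policy = policy
        self.memory = [0] * BLOCK_WORDS   # main-memory copy of the block
        self.block = list(self.memory)    # cached copy of the block
        self.dirty = False                # dirty (modified) bit
        self.words_written_to_memory = 0

    def write_hit(self, offset, value):
        self.block[offset] = value
        if self.policy == "write-through":
            self.memory[offset] = value          # memory updated on every write
            self.words_written_to_memory += 1
        else:
            self.dirty = True                    # memory updated later

    def evict(self):
        if self.policy == "write-back" and self.dirty:
            self.memory = list(self.block)       # whole block is written back
            self.words_written_to_memory += BLOCK_WORDS
            self.dirty = False


def run(policy, updates):
    """Update word 0 `updates` times, evict, and report memory traffic."""
    cache = OneBlockCache(policy)
    for value in range(1, updates + 1):
        cache.write_hit(0, value)
    cache.evict()
    return cache.words_written_to_memory, cache.memory[0]


if __name__ == "__main__":
    for policy in ("write-through", "write-back"):
        print(policy, run(policy, updates=10), run(policy, updates=1))
```

With 10 updates to the same word, write-through performs 10 memory writes where one would have been enough; with a single update, write-back still writes all 4 words of the block.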

Read miss
When the addressed word is not present in cache
The block of words that contains the requested word is
copied from the main memory into cache
After that, the requested word is sent to processor
Alternatively, this word may be sent to the processor as
soon as it is read from the memory
Called as load-through or early restart
Reduces the processor’s waiting period
But needs more complex circuitry

Write miss
Occurs if the addressed word is not in cache
If write-through protocol is used, the information is
written directly into the main memory
If write-back protocol is used,
the block containing the addressed word is first brought into
the cache
The desired word in the cache is overwritten with new info

Mapping functions
Correspondence between main memory blocks and those in
the cache
3 techniques
Direct mapping
Associative mapping
Set-associative mapping
Consider a cache
128 blocks of 16 words each
Total of 2K words ( 2048)
Consider a main memory that
has 16-bit addresses
64K words
4K blocks of 16 words each
Consecutive addresses refer to consecutive memory locations

Direct Mapping
The simplest way to determine cache locations in which to
store memory blocks is the direct-mapping technique
Block j of main memory maps onto block j modulo 128 of the
cache. ( refer following figure )
Whenever one of main memory blocks 0, 128, 256, ... is loaded
in the cache, it is stored in cache block 0.
Blocks 1, 129, 257, ... are stored in cache block 1, and so on.
Since more than one memory block is mapped onto a given
cache block position, contention may arise even when the
cache is not full
E.g., instructions of a program may start in block 1 and continue in
block 129 (possibly after a branch)
Contention is resolved by allowing the new block to overwrite the
currently resident block

[Figure 5.15. Direct-mapped cache. Main memory blocks 0 to 4095 map onto cache blocks 0 to 127, each stored with a tag. Main memory address fields: Tag (5 bits), Block (7 bits), Word (4 bits).]

Direct mapping contd..
Placement of a block in the cache is determined from the
memory address.
The memory address is divided into 3 fields
Low-order 4 bits – select 1 out of 16 words in a block
7-bit cache block field – determines the cache block in which the new
block is stored
High-order 5 bits – tag bits stored with that location in the cache
They identify which of the 32 memory blocks that map into this cache position
is currently resident in the cache
On an access, the tag bits of the address are compared with the tag bits
stored at that cache position
If they match, the desired word is in the cache
If there is no match, the block containing the required word must first be
read from the main memory and loaded into the cache
Direct mapping is easy to implement but not flexible
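The field split above can be sketched in a few lines. This assumes the 16-bit word address and the 128-block, 16-words-per-block cache used on these slides; the function name `split_direct` is illustrative and not from the text.

```python
WORD_BITS = 4    # 16 words per block
BLOCK_BITS = 7   # 128 cache blocks
TAG_BITS = 5     # 16 - 7 - 4 remaining high-order bits

def split_direct(address):
    """Split a 16-bit word address into (tag, cache block, word) fields."""
    word = address & ((1 << WORD_BITS) - 1)
    block = (address >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
    tag = address >> (WORD_BITS + BLOCK_BITS)
    return tag, block, word

# Memory blocks 1 and 129 both land in cache block 1 but carry different tags,
# which is how the cache tells them apart.
for mem_block in (1, 129):
    print(mem_block, split_direct(mem_block * 16))
```

The cache block field is simply the memory block number modulo 128, and the tag is the memory block number divided by 128.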

Associative mapping
A main memory block can be placed into any cache block
position
12 tag bits to identify a memory block when it is resident in
the cache
The tag bits of an address received from the processor are
compared to the tag bits of each block of the cache to see if
desired block is present
This is called the associative-mapping technique
It gives complete freedom in choosing the cache location in
which to place the memory block
A new block has to replace an existing block only if the cache is full
A replacement algorithm is needed to choose which block to replace
The cost of this mapping technique is higher than that of direct mapping
because all 128 tag patterns must be searched, which is called an
associative search
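A minimal sketch of the associative search, assuming the same 128-block cache with 12-bit tags. The list `tags` and the function `associative_lookup` are hypothetical names for this example, and the loop stands in for comparisons that hardware would perform in parallel.

```python
WORD_BITS = 4
NUM_BLOCKS = 128

def associative_lookup(tags, address):
    """Return the cache position holding `address`, or None on a miss.

    `tags` has one entry per cache block: the 12-bit tag stored with that
    block, or None if the block is empty.
    """
    tag = address >> WORD_BITS  # upper 12 bits = main memory block number
    for position, stored in enumerate(tags):  # sequential stand-in for a parallel compare
        if stored == tag:
            return position
    return None

tags = [None] * NUM_BLOCKS
tags[5] = 129  # memory block 129 happens to sit in cache position 5
```

Any memory block may occupy any position, so every stored tag has to be checked, which is where the extra hardware cost comes from.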

[Figure 5.16. Associative-mapped cache. Any main memory block (0 to 4095) can be placed in any of cache blocks 0 to 127, each stored with a tag. Main memory address fields: Tag (12 bits), Word (4 bits).]

Set-associative mapping
Combination of direct mapping and associative mapping
Blocks of the cache are grouped into sets
Mapping allows a block of the main memory to reside in any block of
a specific set.
Because there are a few choices of where to place the block, the
contention problem of the direct method is eased
The hardware cost is reduced by decreasing the size of the
associative search.
The following figure is an example with 2 blocks per set.
Memory blocks 0,64,128,….,4032 map into cache set 0, and they can
occupy either of the two block positions within this set.
There are 64 sets in total, so 6 bits are needed to choose a set
The tag field of the address is compared with the tags of the two blocks
of the selected set to check whether the desired block is present

[Figure 5.17. Set-associative-mapped cache with two blocks per set. Cache blocks 0 to 127 are grouped into sets 0 to 63, each block stored with a tag; main memory holds blocks 0 to 4095. Main memory address fields: Tag (6 bits), Set (6 bits), Word (4 bits).]

Set associative mapping contd…
The number of blocks per set is a parameter that can be selected to
suit the requirements of the computer
Four blocks per set can be accommodated by a 5-bit set field.
Eight blocks per set can be accommodated by a 4-bit set field
128 blocks per set requires no set bits and is the fully associative
technique, with 12 tag bits
The other extreme, one block per set, is the direct-mapping method
A cache with k blocks per set is called a k-way set-associative
cache.
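The field widths quoted above follow directly from the number of sets. The sketch below assumes the 128-block cache, 16-word blocks, and 16-bit addresses of the running example, and that all sizes are powers of two; `field_widths` is an illustrative name.

```python
def field_widths(blocks_per_set, cache_blocks=128, words_per_block=16, address_bits=16):
    """Return (tag bits, set bits, word bits) for a k-way set-associative cache.

    Assumes every size is a power of two, so bit_length() - 1 equals log2.
    """
    num_sets = cache_blocks // blocks_per_set
    word_bits = words_per_block.bit_length() - 1
    set_bits = num_sets.bit_length() - 1
    tag_bits = address_bits - set_bits - word_bits
    return tag_bits, set_bits, word_bits

for k in (1, 2, 4, 8, 128):
    print(k, field_widths(k))
```

With k = 1 the split is the direct-mapped 5/7/4, with k = 2 it is the 6/6/4 of Figure 5.17, and with k = 128 the set field disappears and the tag grows to 12 bits.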

Valid bit and cache coherence problem
A control bit called the valid bit is provided for each block
It indicates whether the block contains valid data
It is different from the dirty (modified) bit
The dirty bit is required only in systems that do not use the write-through
method
The valid bits are all set to 0 when
power is initially applied to the system
main memory is loaded with new programs and data from the
disk
Transfers from the disk to the main memory are carried
out by a DMA mechanism
Normally DMA transfers bypass the cache (for cost and performance reasons)

Valid bit and cache coherence problem
The valid bit of a block is set to 1 the first time the block is loaded from
main memory
Whenever a main memory block is updated by a source that
bypasses the cache
A check is made to determine whether the block being loaded is
currently in the cache
If so, its valid bit is cleared to 0.
This ensures that stale data do not exist in the cache
A similar problem arises when a DMA transfer is made from the main memory
to the disk and the cache uses the write-back protocol
Data in the memory might not reflect the changes that have been made
in the cached copy.
Solution: flush the cache by forcing the dirty data to be written back to
the memory before the DMA transfer takes place
The need to ensure that two different entities (processor and DMA in
this case) use the same copies of data is referred to as the cache
coherence problem

Replacement algorithms
In the direct-mapping method, the position of each block is
predetermined
No replacement strategy is needed
In the associative and set-associative methods, there is some
flexibility
If the cache is full when a new block arrives, the cache controller must
decide which of the old blocks to overwrite
This decision is important because it affects system performance
The aim is to keep in the cache the blocks likely to be referenced in the near future
Some algorithms overwrite
the LRU block
the oldest block
a random block

Least Recently Used ( LRU )
replacement algorithm
Uses the property of locality of reference
There is a high probability that blocks that have been referenced
recently will be referenced again soon
So when a block needs to be overwritten, overwrite the one
that has gone the longest time without being referenced
This block is called the least recently used block
The cache controller must track references to all the blocks
For a set of 4 blocks, a 2-bit counter is kept for each block
When a hit occurs:
The counter of the referenced block is set to 0
Counters with values originally lower than the referenced one are incremented by 1
Counters with higher values are unchanged
When a miss occurs:
Set is not full – the new block is loaded and assigned counter value 0,
and the counters of all other blocks are incremented by 1
Set is full – the block with counter value 3 is removed and the new block is put
in its place with counter value 0. The other 3 blocks’ counters are incremented by 1
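The counter rules can be checked with a small simulation of one 4-block set. This is a sketch under the assumption that on any miss the counters of the blocks already in the set are incremented; the class name `LRUSet` and the dictionary representation are illustrative choices.

```python
class LRUSet:
    """One cache set with an age counter per resident block (0 = most recent)."""

    def __init__(self, size=4):
        self.size = size
        self.counters = {}  # block tag -> counter value

    def access(self, tag):
        """Reference block `tag`; return the evicted tag, or None."""
        evicted = None
        if tag in self.counters:  # hit
            old = self.counters[tag]
            for other, value in self.counters.items():
                if value < old:  # only the lower counters move up
                    self.counters[other] = value + 1
        else:  # miss
            if len(self.counters) == self.size:
                # Set is full: the block with the highest counter (3) goes.
                evicted = max(self.counters, key=self.counters.get)
                del self.counters[evicted]
            for other in self.counters:
                self.counters[other] += 1
        self.counters[tag] = 0
        return evicted

s = LRUSet()
for t in "ABCD":
    s.access(t)
s.access("A")          # hit: A becomes the most recently used block
print(s.access("E"))   # miss on a full set: B is now the least recently used
```

Because the counters of the resident blocks always stay distinct, exactly one block has the value 3 when the set is full.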

Reading assignment :
Go through the examples of mapping techniques in the
text book

Performance considerations
2 key factors in the success of a computer: cost and performance
The objective is to achieve the best possible performance at the lowest
possible cost
The challenge in considering design alternatives is to improve performance without
increasing cost
Measure of success: price/performance ratio
Performance depends on how fast instructions can be brought
into the processor for execution and how fast they can be
executed
In this unit, we will focus on the first aspect
For the memory, we need a shorter access time and a larger
capacity
When a slow unit and a faster unit exchange data, it is beneficial to transfer
data at the rate of the faster unit. To achieve this we use parallel
access, with a technique called interleaving

Interleaving
The main memory of a computer is structured as a collection of
physically separate modules
Each has its own address buffer register (ABR) and data buffer
register (DBR)
Memory access operations may proceed in more than one module at
the same time.
Two ways of distributing addresses among the modules
(a) High-order k bits name one of n modules and low-order m bits name a
particular word in that module
When consecutive locations are accessed, only one module is involved
Meanwhile, devices with DMA capability can access information in other memory
modules
(b) Low-order k bits select a module and high-order m bits name a location
within that module
Consecutive addresses are located in successive modules
Hence faster access and higher average utilization of the memory system
This is called memory interleaving and is the more effective way
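The two address layouts can be compared with a short sketch. The function names and the choice of 4 modules (k = 2) with 16 words each (m = 4) are assumptions for illustration only.

```python
def high_order_module(address, m_bits):
    """Scheme (a): high-order bits select the module, low-order m bits the word."""
    return address >> m_bits, address & ((1 << m_bits) - 1)

def low_order_module(address, k_bits):
    """Scheme (b), interleaving: low-order k bits select the module."""
    return address & ((1 << k_bits) - 1), address >> k_bits

# Four consecutive addresses, assuming k = 2 and m = 4.
for a in range(4):
    print(a, high_order_module(a, 4), low_order_module(a, 2))
```

Under scheme (a) all four addresses fall in module 0, while under scheme (b) they fall in modules 0, 1, 2, 3, so the four accesses can overlap.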

[Figure 5.25. Addressing multiple-module memory systems. (a) Consecutive words in a module: the high-order k bits of the MM address select one of modules 0 to n-1 and the low-order m bits give the address in the module. (b) Consecutive words in consecutive modules: the low-order k bits select one of modules 0 to 2^k - 1 and the high-order m bits give the address in the module. Each module has its own ABR and DBR.]
Go through the example in the text for better understanding

Problem
Consider a cache with 8-word blocks. On a read miss, the block that contains
the desired word must be copied from the main memory into the cache.
Assume it takes one clock cycle (CC) to send an address to the main memory
The memory is built using DRAM chips: the first word access takes 8 CC and
subsequent words in the same block can be accessed in 4 CC per word.
One CC is needed to send one word to the cache
Using a single memory module, the time needed to load the desired block
into the cache is
1 + 8 + (7 × 4) + 1 = 38 CC
Using memory interleaving with 4 modules, 4 words are accessed in 8 CC and
transferred in the next 4 CC, word by word, during which the remaining 4
words are read and stored in the DBRs. These 4 words are then transferred
to the cache one word at a time
So the time required to transfer a block is
1 + 8 + 4 + 4 = 17 CC
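The two totals can be reproduced with a small calculation. The function names are illustrative, and the interleaved formula is an assumption that generalizes the slide's two-group case: it supposes each group's transfer overlaps the next group's access.

```python
def load_time_single(words=8, addr=1, first=8, subsequent=4, transfer=1):
    """Clock cycles to load one block from a single memory module."""
    return addr + first + (words - 1) * subsequent + transfer

def load_time_interleaved(words=8, modules=4, addr=1, first=8, subsequent=4, transfer=1):
    """Clock cycles to load one block from `modules` interleaved modules.

    Assumption: while one group of words is sent to the cache, the modules
    are already reading the next group.
    """
    groups = words // modules
    group_transfer = modules * transfer
    overlapped = (groups - 1) * max(subsequent, group_transfer)
    return addr + first + overlapped + group_transfer

print(load_time_single())       # 1 + 8 + 7*4 + 1
print(load_time_interleaved())  # 1 + 8 + 4 + 4
```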

Hit rate and miss penalty
The number of hits stated as a fraction of all attempted
accesses is called the hit rate
The number of misses stated as a fraction of all
attempted accesses is called the miss rate
Hit rates well over 0.9 are essential for high-performance
computers
Performance is adversely affected by the actions that must
be taken after a miss
The extra time needed to bring the desired information into the cache
is called the miss penalty
The miss penalty is the time needed to bring a block of data from a
slower unit in the memory hierarchy to a faster unit
Interleaving can reduce the miss penalty substantially

Problem
Let h be the hit rate, M the miss penalty (the time to access information in the main
memory), and C the time to access information in the cache. The average
access time is
$t_{ave} = hC + (1 - h)M$
Consider the same parameters as in the previous problem
If the computer has no cache, then using a fast processor and a typical DRAM main
memory, it takes 10 clock cycles for each memory read access
Suppose the computer has a cache that holds 8-word blocks and an interleaved main
memory.
Then it requires 17 cycles (as discussed before) to load one block into the cache
Suppose 30 percent of the instructions in a program perform a read or write
operation, giving 130 memory accesses for every 100 instructions
Assume hit rates are 0.95 for instructions and 0.9 for data
Assume the miss penalty is the same for both read and write accesses
An estimate of the improvement in performance is
Time without cache / time with cache =
(130 × 10) / (100(0.95 × 1 + 0.05 × 17) + 30(0.9 × 1 + 0.1 × 17)) = 1300 / 258 = 5.04
So the computer with the cache performs about 5 times better (assuming the processor clock
and the system bus have the same speed)
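The estimate can be verified numerically. The helper name `cycles_with_cache` is illustrative; the figures are those stated in the problem.

```python
def cycles_with_cache(accesses, hit_rate, hit_time, miss_penalty):
    """Total cycles for `accesses` references at the given hit rate."""
    return accesses * (hit_rate * hit_time + (1 - hit_rate) * miss_penalty)

# Per 100 instructions: 100 instruction fetches plus 30 data accesses.
without_cache = 130 * 10
with_cache = (cycles_with_cache(100, 0.95, 1, 17)
              + cycles_with_cache(30, 0.90, 1, 17))
speedup = without_cache / with_cache
print(round(speedup, 2))
```

The denominator works out to 180 cycles for instruction fetches plus 78 cycles for data accesses, 258 in total.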

Caches on the processor chip
From the speed point of view, the optimal place for a cache is on the
processor chip
Since space on the processor chip is required for many other functions,
this limits the size of the cache that can be accommodated
Either a combined cache for instructions and data (offers greater flexibility
in mapping) or separate caches for instructions and data (increases parallel
access to information but needs more complex circuitry) can be used.
Normally 2 levels of caches are used
L1 and L2 cache
L1 is designed to allow very fast access by the processor
Its access time has a very large effect on the clock rate of the processor
L2 can be slower but it should be much larger to ensure a high hit rate
A workstation computer may include an L1 cache with a capacity of tens of
kilobytes and an L2 cache with a capacity of several megabytes
Including an L2 cache further reduces the impact of main
memory speed on the performance of the computer

Cache on processor chip
Average access time experienced by the processor with 2
levels of caches is
$t_{ave} = h_1 C_1 + (1 - h_1) h_2 C_2 + (1 - h_1)(1 - h_2) M$
h1 – hit rate in L1
h2 – hit rate in L2
C1 – time to access info in L1 cache
C2 – time to access info in L2 cache
M – time to access info in main memory
The number of misses in the L2 cache must be very low
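The two-level formula is easy to evaluate directly. In the sketch below the function name is illustrative and the numeric values are hypothetical, chosen only to show the effect of each term; they are not from the slides.

```python
def t_ave_two_level(h1, c1, h2, c2, m):
    """Average access time with L1 and L2 caches, per the formula above."""
    return h1 * c1 + (1 - h1) * h2 * c2 + (1 - h1) * (1 - h2) * m

# Hypothetical values: L1 hit in 1 cycle, L2 hit in 10 cycles,
# main memory access in 100 cycles.
example = t_ave_two_level(h1=0.95, c1=1, h2=0.9, c2=10, m=100)
print(round(example, 2))
```

With these assumed numbers only 0.5 percent of accesses reach main memory, yet they contribute 0.5 of the 1.9 cycles, which is why the L2 miss count must be kept very low.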

Write buffers
Temporary storage area for write requests
Usage when write-through protocol is used
Each write operation results in writing a new value to the main memory
If the processor waits for the memory write to be completed, the
processor is slowed down
The processor does not immediately require the result of a write operation
Instead of waiting for the write operation to complete, the processor places
the write request into this buffer and continues execution of the next
instruction
The write requests are sent to the memory whenever the memory is not
servicing read requests
Read requests must be serviced immediately because the processor
cannot proceed without the data to be read

Write buffers
The write buffer may hold a number of write requests
A read request may refer to data that are still in the write buffer
So, addresses of data to be read from the memory are compared with
addresses of the data in the write buffer
In case of a match, the data in the write buffer are used
Usage of write buffers when the write-back protocol is used
Write operations are simply performed on the corresponding word
in the cache
If a new block of data comes into the cache as a result of a read miss, it
may replace an existing block that has some dirty (modified) data,
which have to be written into the main memory
If the write-back operation is performed first, the processor has to wait
longer for the new block to be read into the cache
So, to read the new block first, provide a fast write buffer for temporary
storage of the dirty block that is ejected
Afterwards the contents of the write buffer are written into the memory
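The address comparison used with a write-through write buffer can be sketched as follows. The class `WriteBuffer` and its method names are hypothetical, and a dictionary stands in for main memory; real hardware does the comparison with dedicated comparators.

```python
from collections import OrderedDict

class WriteBuffer:
    """Queue of pending writes that also answers reads to buffered addresses."""

    def __init__(self):
        self.pending = OrderedDict()  # address -> data, oldest first

    def write(self, address, data):
        """Processor side: queue the write and continue without waiting."""
        self.pending[address] = data
        self.pending.move_to_end(address)

    def read(self, address, memory):
        """Compare the read address with buffered addresses before using memory."""
        if address in self.pending:
            return self.pending[address]  # match: use the buffered data
        return memory[address]

    def drain_one(self, memory):
        """Memory side: perform the oldest pending write when memory is free."""
        if self.pending:
            address, data = self.pending.popitem(last=False)
            memory[address] = data

memory = {0x10: 1}
wb = WriteBuffer()
wb.write(0x10, 99)
print(wb.read(0x10, memory), memory[0x10])  # buffered value vs. stale memory value
```

Without the comparison step, the read would return the stale value still held in memory.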

Prefetching
Normally, new data are brought into the cache when they are first needed
The processor has to pause until the new data arrive
To avoid this stalling of the processor, it is possible to prefetch the data
into the cache before they are needed
Prefetching is done either through software or hardware
In software: include a separate prefetch instruction, which loads the
data into the cache by the time they are required in the program
This allows overlapping of accesses to main memory with computation in
the processor
Prefetch instructions are inserted either by the compiler or by the
programmer; compiler insertion is usually better
In hardware: add circuitry that attempts to discover a pattern in
memory references and prefetches data according to this pattern

Lock-up free cache
Software prefetching does not work well if it
interferes with the normal execution of instructions
This happens if the action of prefetching stops other accesses to the cache until
the prefetch is completed
A cache of this type is said to be locked while it services a miss
The remedy is to allow the processor to access the cache while a miss is
being serviced
A cache that can support multiple outstanding misses is called a lockup-free
cache
Since it can service only one miss at a time, it must have
circuitry to keep track of all outstanding misses
This is done by including special registers to hold the pertinent information

VIRTUAL Memories
Refer to slides given separately

SECONDARY Storage
Magnetic hard disks
Organization and accessing of data on a disk
Access time
Typical disks
Data buffer/cache
Disk controller
Floppy disks
RAID Disk arrays
Commodity disk considerations
Optical disks
CD technology
CD-ROM
CD-Recordable
CD-Rewritable
DVD Technology
DVD-RAM
Magnetic tape systems