Multiprocessor Systems

vampugani 8,893 views 48 slides Sep 08, 2017

About This Presentation

Introduction to Multiprocessor Systems


Slide Content

Introduction to Multiprocessor Systems
V.V. Subrahmanyam
SOCIS, IGNOU
Date: 23rd April, 2011
Time: 13-00 to 13-45

Objectives

Introduction

Processor Coupling
•Loosely Coupled Systems
•Tightly Coupled Systems

Multiprocessor Interconnections

Types of Multiprocessor O/S

Functions of Multiprocessor O/S

Multiprocessor Synchronization

Multiprocessor Systems

A multiprocessor system is a collection of a number of standard processors put together in an innovative way to improve the performance / speed of computer hardware.

The main feature of this architecture is to provide high speed at low cost in comparison to a uniprocessor.

Contd…

Multiprocessor operating systems aim to support high performance through multiple CPUs.

An important goal is to make the number of CPUs transparent to the application.

Achieving such transparency is relatively easy because the communication between different (parts of) applications uses the same primitives as those in multitasking uniprocessor operating systems.

Contd…

Multiprogramming is the more appropriate term for this multitasking concept, which is implemented mostly in software, whereas multiprocessing is the more appropriate term for the use of multiple hardware CPUs.

The multiprocessor system is generally characterised by increased system throughput and application speedup, both obtained through parallel processing.

Throughput and Application Speedup

Throughput can be improved, in a time-sharing environment, by executing a number of unrelated user processes on different processors in parallel. As a result, a large number of different tasks can be completed in a unit of time without explicit user direction.

Application speedup is possible by creating multiple processes, belonging to one application, that are scheduled to work on different processors.
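
The difference between the two measures can be made concrete with a small calculation. The sketch below uses an idealized model that is an assumption of this example, not something stated on the slides: all tasks take one time unit, are independent, and incur no scheduling or communication overhead. The function names are illustrative.

```python
import math


def completion_time(num_tasks: int, num_processors: int) -> int:
    """Time units needed to run independent unit-time tasks in parallel."""
    return math.ceil(num_tasks / num_processors)


def throughput(num_tasks: int, num_processors: int) -> float:
    """Tasks completed per time unit (num_tasks must be positive)."""
    return num_tasks / completion_time(num_tasks, num_processors)


def speedup(num_subtasks: int, num_processors: int) -> float:
    """Speedup of one application split into unit-time subtasks,
    relative to running every subtask on a single processor."""
    serial = completion_time(num_subtasks, 1)
    parallel = completion_time(num_subtasks, num_processors)
    return serial / parallel
```

Under these assumptions, 8 unrelated unit-time processes on 4 processors finish in 2 time units, so throughput is 4 processes per time unit, and a single application split into 8 equal subtasks sees a speedup of 4 over one processor. Real systems fall short of these figures because of contention and synchronization costs.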

Processor Coupling

Multiprocessor systems have more than one processing unit sharing memory/peripheral devices. They have greater computing power and higher reliability. Multiprocessor systems are classified into two types:
•Tightly-coupled
•Loosely-coupled

Tightly-Coupled Systems

Each processor is assigned a specific duty but processors work in close association, possibly sharing one memory module.

Tightly-coupled multiprocessor systems contain multiple CPUs that are connected at the bus level.

These CPUs may have access to a central shared memory (SMP), or may participate in a memory hierarchy with both local and shared memory (NUMA). The IBM p690 Regatta is an example of a high-end SMP system.

Contd…

Mainframe systems with multiple processors are often tightly-coupled.

Tightly-coupled systems perform better and are physically smaller than loosely-coupled systems, but have historically required greater initial investments and may depreciate rapidly.

Contd…

Tightly-coupled systems tend to be much more energy efficient than clusters. This is due to the fact that considerable economies can be realised by designing components to work together from the beginning in tightly-coupled systems.

Loosely-Coupled Systems

Loosely-coupled multiprocessor systems, often referred to as clusters, are based on multiple standalone single or dual processor commodity computers interconnected via a high speed communication system.

A Linux Beowulf cluster is an example of a loosely-coupled system.

Contd…

Nodes in a loosely-coupled system are usually inexpensive commodity computers and can be recycled as independent machines upon retirement from the cluster.

Loosely-coupled systems are less energy efficient than tightly-coupled systems.

Multiprocessor Interconnections

Bus-oriented System

Crossbar-connected System

Hypercubes

Multistage Switch-based System

Bus-Oriented Multiprocessor Interconnection
[Figure: processors P1, P2, P3 and P4 and the shared memory attached to a common bus]

Contd…

Processors and memory are connected by a common bus.

Communication between processors (P1, P2, P3 and P4) and with globally shared memory is possible over a shared bus.

Disadvantage

The above architecture gives rise to a problem of contention at two points: one is the shared bus and the other is the shared memory.

Two Ways to Overcome

Cache along with the Shared Memory

Cache associated with each individual processor

Cache with Shared Memory
[Figure: processors P1, P2, P3 and P4, a single cache, and the shared memory]

Cache with each individual processor
[Figure: processors P1, P2, P3 and P4, each with its own cache, and the shared memory]

Techniques

Techniques which can be employed to decrease the impact of bus and memory saturation in a bus-oriented system:
•Wider Bus Technique
•Split Request / Reply Protocols

Wider Bus Technique

A bus is made wider so that more bytes can be transferred in a single bus cycle. In other words, a wider parallel bus increases the bandwidth by transferring more bytes in a single bus cycle.
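
As a back-of-the-envelope illustration of why width matters, the sketch below computes peak bandwidth and the number of bus cycles needed to move a block. The function names and the example figures are assumptions made for illustration; the slides give no concrete bus parameters.

```python
def bus_bandwidth(width_bytes: int, cycles_per_second: float) -> float:
    """Peak bytes per second: bytes moved per cycle times cycles per second."""
    return width_bytes * cycles_per_second


def cycles_to_transfer(block_bytes: int, width_bytes: int) -> int:
    """Bus cycles needed to move a block, rounding up a partial last transfer."""
    return -(-block_bytes // width_bytes)  # ceiling division on integers
```

For a 64-byte block, a 4-byte-wide bus needs 16 cycles while an 8-byte-wide bus needs 8, so doubling the width halves the time the bus is occupied per block.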

Split Request / Reply Protocols

The memory request and reply are split into two individual operations and are treated as separate bus transactions. As soon as a processor requests a block, the bus is released to other users for the time it takes memory to fetch and assemble the related group of items.
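
The benefit can be estimated by counting how many cycles the bus is held per memory access. The following sketch assumes a simple cost model that is not taken from the slides: each access needs a fixed number of bus cycles for the request, a fixed number of cycles for memory to fetch the block, and a fixed number of bus cycles for the reply. All names and numbers are hypothetical.

```python
def bus_cycles_held(accesses: int, request_cycles: int,
                    memory_cycles: int, reply_cycles: int,
                    split: bool) -> int:
    """Total cycles the bus is unavailable to other users.

    Without a split protocol the bus stays reserved while memory works;
    with a split protocol it is held only for the request and the reply.
    """
    per_access = request_cycles + reply_cycles
    if not split:
        per_access += memory_cycles  # bus idles but stays locked
    return accesses * per_access
```

With 10 accesses, 1 request cycle, 4 memory cycles and 2 reply cycles, the bus is held for 70 cycles without splitting and for 30 cycles with it. The freed cycles are what other processors can use.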

Other Schemes

Individual processors may or may not have private/cache memory.

Individual processors may or may not have input/output devices attached.

Input/output devices may be attached to the shared bus.

Shared memory may be implemented in the form of multiple physical banks connected to the shared bus.

Crossbar-Connected System
[Figure: grid connecting processors P0, P1, ..., P(n-1) to memory modules M0, M1, ..., M(n-1)]

Contd…

It is a grid structure of processors and memory modules.

Every cross point of the grid structure has a switch attached. The figure shows simultaneous access between processors and memory modules, as N processors are provided with N memory modules. Thus each processor can access a different memory module at the same time.

Contd…

A crossbar needs $N^2$ switches for a fully connected network between processors and memory.

Processors may or may not have their private memories.
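
The two crossbar properties described above, the switch count and conflict-free simultaneous access, can be sketched in a few lines. The function names and the request representation (a mapping from processor index to memory module index) are assumptions of this example.

```python
def crossbar_switches(n: int) -> int:
    """One switch at every cross point of an n-by-n grid."""
    return n * n


def conflict_free(requests: dict) -> bool:
    """requests maps processor index -> requested memory module index.

    All requests can be served at once only if no memory module is
    requested by more than one processor.
    """
    modules = list(requests.values())
    return len(modules) == len(set(modules))
```

For example, `conflict_free({0: 1, 1: 0, 2: 2})` holds because every processor targets a different module, whereas `conflict_free({0: 1, 1: 1})` does not, since two processors contend for module 1.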

Hypercubes System
[Figure: 3-dimensional hypercube with nodes labelled 000, 001, 010, 011, 100, 101, 110 and 111]

Contd…

This architecture has some advantages over other architectures of multiprocessing system.

In an n-degree hypercube architecture, we have:
•$2^n$ nodes (total number of processors)
•Nodes are arranged in an n-dimensional cube, i.e. each node is connected to n other nodes.
•Each node is assigned a unique address, which lies between 0 and $2^n - 1$.
•Adjacent nodes have addresses that differ in exactly 1 bit, and the maximum inter-node distance is n hops, between nodes whose addresses differ in all n bits.

Example

A 3-degree hypercube will have $2^n$ nodes, i.e., $2^3 = 8$ nodes.

Nodes are arranged in a 3-dimensional cube, that is, each node is connected to 3 other nodes.

Each node is assigned a unique address, which lies between 0 and 7 ($2^n - 1$), i.e., 000, 001, 010, 011, 100, 101, 110, 111.

Adjacent nodes differ in exactly 1 bit (for example, node 000 is adjacent to 001, 010 and 100), and the maximum inter-node distance is 3 hops (for example, from 000 to 111).
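
The addressing rules in this example translate directly into bit operations. The sketch below is illustrative and uses hypothetical function names: a node's neighbours are found by flipping each address bit in turn, and the distance between two nodes is the number of bit positions in which their addresses differ.

```python
def neighbours(node: int, n: int) -> list:
    """Addresses one hop away: flip each of the n address bits in turn."""
    return [node ^ (1 << bit) for bit in range(n)]


def distance(a: int, b: int) -> int:
    """Hops between two nodes: the number of differing address bits."""
    return bin(a ^ b).count("1")


def label(node: int, n: int) -> str:
    """Binary address of a node, zero-padded to n bits."""
    return format(node, "0{}b".format(n))
```

For n = 3, `[label(x, 3) for x in neighbours(0b000, 3)]` gives `['001', '010', '100']`, and `distance(0b000, 0b111)` is 3, the maximum for a 3-degree hypercube.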

Contd…

Hypercubes provide a good basis for scalable systems because the communication length grows logarithmically with the number of nodes.

It provides bi-directional communication between two processors.

It is usually used in loosely coupled multiprocessor systems because the transfer of data between two processors goes through several intermediate processors.

The longest inter-node delay is n hops in an n-degree hypercube.
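
To show how a message passes through intermediate processors, here is a minimal routing sketch. The slides do not name a routing algorithm; this example assumes dimension-order routing, which corrects the differing address bits one at a time from the lowest bit to the highest. The function name is hypothetical.

```python
def route(src: int, dst: int, n: int) -> list:
    """Nodes visited from src to dst in an n-degree hypercube,
    fixing one differing address bit per hop (lowest bit first)."""
    path = [src]
    current = src
    for bit in range(n):
        mask = 1 << bit
        if (current ^ dst) & mask:   # this bit still differs
            current ^= mask          # hop to the neighbour across it
            path.append(current)
    return path
```

Routing from 000 to 111 visits 000, 001, 011 and 111: three hops, which matches the longest inter-node delay for n = 3. The number of hops always equals the number of differing address bits.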

To increase the input/output bandwidth, input/output devices can be attached to every node (processor).

Multistage Switch-based System

A multistage switch-based system permits simultaneous connections between several input-output pairs.

It consists of several stages of switches which provide a multistage interconnection network.

Contd…

A network with N inputs and N outputs contains $K = \log_2 N$ stages, with N/2 switches at each stage. In simple words, an N×N processor-memory interconnection network requires $(N/2) \times \log_2 N$ switches.

For example, an 8×8 processor-memory interconnection network requires $(8/2) \times \log_2 8 = 4 \times 3 = 12$ switches. Each switch acts as a 2×2 crossbar.
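
The switch-count formula is easy to check numerically and to compare against the $N^2$ cost of a crossbar. The sketch below assumes N is a power of two, which the formula implicitly requires; the function names are illustrative.

```python
def multistage_switches(n: int) -> int:
    """2x2 switches in an n-by-n multistage network:
    log2(n) stages with n/2 switches in each stage."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    stages = n.bit_length() - 1  # exact log2 for a power of two
    return (n // 2) * stages


def crossbar_switches(n: int) -> int:
    """Switches in an n-by-n crossbar, for comparison."""
    return n * n
```

`multistage_switches(8)` reproduces the 12 switches of the example, against 64 for an 8×8 crossbar. At N = 64 the gap widens to 192 versus 4096.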

Types of Multiprocessor O/S

Multiprocessor operating systems are complex in comparison to multiprogramming uniprocessor operating systems because a multiprocessor executes tasks concurrently.

Therefore, the operating system must be able to support the concurrent execution of multiple tasks to increase processor performance.

Contd…

Depending upon the control structure and its organisation, the three basic types of multiprocessor operating system are:
•Separate Supervisor
•Master-Slave
•Symmetric Supervision

Separate Supervisor

In a separate supervisor system each processor behaves independently.

Each system has its own operating system, which manages local input/output devices, file system and memory, and keeps its own copy of the kernel, supervisor and data structures. Some common data structures also exist for communication between processors.

Access protection between processors is maintained by using a synchronization mechanism such as semaphores.
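
A minimal sketch of semaphore-protected access to a common data structure follows. It is an analogy rather than an operating system implementation: Python threads stand in for processors, a dictionary stands in for the shared structure, and all names are hypothetical.

```python
import threading

# Common data structure used for inter-processor communication (illustrative).
shared = {"messages": 0}

# Binary semaphore: at most one "processor" may touch the structure at a time.
guard = threading.Semaphore(1)


def post_messages(count: int) -> None:
    """Record count messages in the shared structure, one at a time."""
    for _ in range(count):
        with guard:                   # wait (P) on entry, signal (V) on exit
            shared["messages"] += 1   # protected read-modify-write


def run_senders(num_senders: int, count: int) -> int:
    """Start the senders, wait for them, and return the final tally."""
    shared["messages"] = 0
    threads = [threading.Thread(target=post_messages, args=(count,))
               for _ in range(num_senders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["messages"]
```

Because every update happens while holding the semaphore, `run_senders(4, 500)` always returns 2000; no update is lost to interleaving.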

Limitations
Such an architecture will face the following problems:

Little coupling among processors.

Parallel execution of a single task is difficult to achieve.

The system degrades when a processor fails.

Inefficient configuration, as the supervisor/kernel/data structure code is replicated for each processor.

Master-Slave

In master-slave, out of many processors one processor behaves as a master whereas the others behave as slaves.

The master processor is dedicated to executing the operating system.

It works as scheduler and controller over the slave processors. It schedules the work and also controls the activity of the slaves. Therefore, the operating system data structures are usually stored in its private memory.

Slave processors are treated only as a schedulable pool of resources; in other words, the slave processors execute application programmes.

Contd…

This arrangement allows the parallel execution of a single task by allocating several subtasks to multiple processors concurrently. Since the operating system is executed only by the master processor, this system is relatively simple to develop and efficient to use. Limited scalability is the main limitation of this system, because the master processor becomes a bottleneck and will consequently fail to fully utilize the slave processors.

Symmetric

In a symmetric organization the configurations of all processors are identical.

All processors are autonomous and are treated equally. To make all the processors functionally identical, all the resources are pooled and are available to them.

The operating system is also symmetric, as any processor may execute it. In other words, there is one copy of the kernel that can be executed by all processors concurrently. To that end, access to scarce data structures and pooled resources must be controlled with proper interlocks.

Contd…

The simplest way to achieve this is to treat the entire operating system as a critical section and allow only one processor to execute the operating system at one time.

This method is called the ‘floating master’ method because, in spite of the presence of many processors, only one operating system exists.

The processor that executes the operating system has a special role and acts as a master.

As the operating system is not bound to any specific processor, it floats from one processor to another.
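
The floating master idea can be sketched as a single lock around all operating system code. In this illustration, which is an analogy and not a kernel implementation, Python threads stand in for processors, a dictionary stands in for kernel data structures, and whichever thread holds the lock is the master for that moment. All names are hypothetical.

```python
import threading

# One lock for the whole "operating system": the entire OS is one critical section.
kernel_lock = threading.Lock()


def system_call(cpu_id: int, kernel_state: dict) -> None:
    """Run OS code; the calling processor is the master while it holds the lock."""
    with kernel_lock:
        kernel_state["calls"] += 1
        kernel_state["last_master"] = cpu_id


def cpu(cpu_id: int, kernel_state: dict, calls: int) -> None:
    """A processor that repeatedly enters the operating system."""
    for _ in range(calls):
        system_call(cpu_id, kernel_state)


def run(num_cpus: int, calls: int) -> dict:
    """Start the processors, wait for them, and return the kernel state."""
    kernel_state = {"calls": 0, "last_master": None}
    threads = [threading.Thread(target=cpu, args=(i, kernel_state, calls))
               for i in range(num_cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return kernel_state
```

The design is simple because no kernel data structure needs its own lock, but it serializes all operating system work: while one processor is master, every other processor that needs the operating system must wait.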

Contd…

Parallel execution of different applications is achieved by maintaining a queue of ready processes in shared memory. Processor allocation is then reduced to assigning the first ready process to the first available processor until either all processors are busy or the queue is emptied. Therefore, each idle processor fetches the next work item from the queue.
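
The shared ready queue can be sketched with Python's thread-safe `queue.Queue`. As before, threads stand in for processors; the function name and the task representation (a name paired with a callable) are assumptions of this example.

```python
import queue
import threading


def run_ready_queue(tasks: list, num_processors: int) -> list:
    """Run (name, callable) tasks; each idle processor takes the next one.

    Returns a list of (name, processor id, result) tuples.
    """
    ready = queue.Queue()             # ready queue in "shared memory"
    for task in tasks:
        ready.put(task)

    results = []
    results_lock = threading.Lock()

    def processor(cpu_id: int) -> None:
        while True:
            try:
                name, work = ready.get_nowait()   # fetch next work item
            except queue.Empty:
                return                            # queue emptied: go idle
            value = work()
            with results_lock:
                results.append((name, cpu_id, value))

    threads = [threading.Thread(target=processor, args=(i,))
               for i in range(num_processors)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

No central scheduler assigns work here: load balancing falls out of the fact that a processor asks for more work only when it has finished its current item.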

Multiprocessor O/S Functions and Requirements

A multiprocessor operating system manages all the available resources and scheduling functionality to form an abstraction.

It facilitates programme execution and interaction with users.

The processor is one of the important and basic types of resource that need to be managed. For effective use of multiprocessors, processor scheduling is necessary.

Processor Scheduling

Processor scheduling undertakes the following tasks:
•Allocation of processors among applications in a manner consistent with system design objectives. It affects the system throughput. Throughput can be improved by co-scheduling several applications together, thus making fewer processors available to each.
•Ensuring efficient use of the processors allocated to an application. This primarily affects the speedup of the system.

Contd…
The two primary facets of OS support for multiprocessing are:
•Flexible and efficient interprocess and interprocessor synchronization mechanisms, and
•Efficient creation and management of a large number of threads of activity, such as processes or threads.

Memory Management

In multiprocessor systems, memory management is highly dependent on the architecture and interconnection scheme.
•In loosely coupled systems memory is usually handled independently on a per-processor basis, whereas in multiprocessor systems shared memory may be simulated by means of a message passing mechanism.
•In shared memory systems the operating system should provide a flexible memory model that facilitates safe and efficient access to shared data structures and synchronization variables.

Device Management

Device management concerns the third basic resource, but it has received little attention in multiprocessor systems to date, because earlier the main focus was the speedup of compute-intensive applications, which generally do not generate much input/output after the initial loading.

Now that multiprocessors are applied to more balanced general-purpose applications, the input/output requirement increases in proportion with the realised throughput and speed.

Multiprocessor Synchronization

A multiprocessor system facilitates parallel program execution and read/write sharing of data, and thus may cause the processors to concurrently access the same location in shared memory. Therefore, a correct and reliable mechanism is needed to serialize this access. This is called a synchronization mechanism.

The mechanism should make accesses to a shared data structure appear atomic with respect to each other.

Techniques

Test-and-Set

Compare-and-Swap

Fetch-and-Add
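
The semantics of the three primitives can be illustrated in software. On real hardware each one is a single indivisible instruction; Python exposes no such instructions, so the sketch below emulates that guarantee with an internal lock. The class and method names are hypothetical, and the spin lock at the end shows one common use of test-and-set.

```python
import threading
import time


class AtomicCell:
    """A memory word with emulated atomic operations.

    The internal lock stands in for the hardware guarantee that each
    operation completes without interference from other processors.
    """

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._guard = threading.Lock()

    def load(self) -> int:
        with self._guard:
            return self._value

    def store(self, value: int) -> None:
        with self._guard:
            self._value = value

    def test_and_set(self) -> int:
        """Set the word to 1 and return the value it held before."""
        with self._guard:
            old = self._value
            self._value = 1
            return old

    def compare_and_swap(self, expected: int, new: int) -> bool:
        """Store new only if the word equals expected; report success."""
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False

    def fetch_and_add(self, delta: int) -> int:
        """Add delta to the word and return the value it held before."""
        with self._guard:
            old = self._value
            self._value += delta
            return old


class SpinLock:
    """Mutual exclusion built on test-and-set: 0 means free, 1 means held."""

    def __init__(self) -> None:
        self._flag = AtomicCell(0)

    def acquire(self) -> None:
        while self._flag.test_and_set() == 1:
            time.sleep(0)  # lock was already held: yield and retry

    def release(self) -> None:
        self._flag.store(0)
```

Test-and-set suits simple locks, compare-and-swap lets a processor update a word only if nobody changed it in the meantime, and fetch-and-add lets many processors increment a shared counter or claim distinct queue slots without taking a lock at all.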