Lecture Notes Unit3 chapter21 - parallel databases

Murugan146644 155 views 19 slides Oct 15, 2024
Slide 1
Slide 1 of 19
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19

About This Presentation

Description:
Welcome to the comprehensive guide on Relational Database Management System (RDBMS) concepts, tailored for final year B.Sc. Computer Science students affiliated with Alagappa University. This document covers fundamental principles and advanced topics in RDBMS, offering a structured appr...


Slide Content

RDBMS -Unit III
Chapter 21
Parallel Databases
Prepared By
Dr.S.Murugan, Associate Professor
Department of Computer Science,
AlagappaGovernmentArts College, Karaikudi.
(Affiliated by AlagappaUniversity)
Mailid: [email protected]
Reference Book:
Database System Concepts by Abraham Silberschatz, Henry
F.Korth, S. Sudharshan

21.1 Introduction
➢Alargenumberofcomputersusedbytheorganization.
➢Organizationsareusinglargevolumesofdata-suchas
dataaboutwhatitemspeoplebuy,whatWeblinks
usersclickon,andwhenpeoplemaketelephonecalls-
toplantheiractivitiesandpricing.
➢Asmicroprocessorshavebecomecheap,parallel
machineshavebecomecommonandrelatively
inexpensive

21.2 lOParallelism
➢I/Oparallelismreferstoreducingthetimerequiredto
retrieverelationsfromdiskbypartitioningthe
relationsonmultipledisks.
➢Themostcommonformofdatapartitioningis
horizontalpartitioning.
➢Inhorizontalpartitioning,thetuplesofarelationare
divided(ordeclustered)amongmanydisks.

21.2.1 Partitioning Techniques
➢Therearethreebasicdata-partitioningstrategiesare
discussedhere.Assumethattherearendisks,Do,D
1,.
..,D
n-1,acrosswhichthedataaretobepartitioned.
➢Round-robin.Thisstrategyscanstherelationinany
orderandsendsthei
th
tupletodisknumberD
imodn.
Eachdiskhasapproximatelythesamenumberof
tuplesastheothers.(Numberofrecordsdividedwith
modfunction)

21.2.1 Partitioning Techniques
Round-robin:
Records
Mod
FunctionDisk Block
1 1 mod 3 1
2 2 mod 3 2
3 3 mod 3 0
4 4 mod 3 1
5 5 mod 3 2
6 6 mod 3 0
7 7 mod 3 1
8 8 mod 3 2
9 9 mod 3 0
Disk BlockRecords
0 3,6,9
1 1,4,7
2 2,5,8

21.2.1 Partitioning Techniques
➢Hashpartitioning.Thisdeclusteringstrategy
designatesoneormoreattributesfromthegiven
relation'sschemaasthepartitioningattributes.Ahash
functionischosenwhoserangeis{0,1,...,n-1}.
Numberofattributesdivided.
➢Forex,Assumethatthetablecontains9Attributes
with100records.Disk0maycontains100records
withfirst3attribute;Disk1maycontains100records
withnext3attribute;Disk2maycontains100records
withlast3attribute;

21.2.1 Partitioning Techniques
➢Rangepartitioning.Thisstrategydistributes
contiguousattribute-valuerangestoeachdisk.
➢Forexample,rangepartitioningwiththreedisks
numbered0,1,and2mayassigntupleswithvalues
lessthan5todisk0,valuesbetween5and40todisk
1,andvaluesgreaterthan40todisk2.(Numberof
recordsdividedwithrangefunction)

21.2.2 Comparison of Partitioning Techniques
➢Oncearelationhasbeenpartitionedamongseveral
disks,wecanretrieveitinparallel,usingallthedisks.
➢Similarly,whenarelationisbeingpartitioned,itcan
bewrittentomultipledisksinparallel.
➢Thetransferratesforreadingorwritinganentire
relationaremuchfasterwithI/Oparallelism.
➢Readinganentirerelation,orscanningarelation,is
onlyonekindofaccesstodata.

21.2.2 Comparison of Partitioning Techniques
➢Accesstodatacanbeclassifiedasfollows:
1.Scanningtheentirerelation
2.Locatingatupleassociatively(forexample,
employee_nam=e"Campbell");
thesequeries,calledpointqueries,seektuplesthat
haveaspecifiedvalueforaspecificattribute.
3.Locatingalltuplesforwhichthevalueofagiven
attributelieswithinaspecifiedrange(forexample,
10000Isalary<20000);thesequeriesarecalled
rangequeries

21.2.3 Handling of Skew
➢Whenarelationispartitioned(byatechniqueother
thanround-robin),theremaybeaskewinthe
distributionoftuples,withahighpercentageoftuples
placedinsomepartitionsandfewertuplesinother
partitions.
➢Theskewmaybeclassifiedas:
➢Attribute-valueskew
➢Partitionskew

21.2.3 Handling of Skew
➢Attribute-valueskewreferstothefactthatsome
valuesappearinthepartitioningattributesofmany
tuples.
➢Partitionskewreferstothefactthattheremaybe
loadimbalanceinthepartitioning.

21.2.3 Handling of Skew
➢Abalancedrange-partitioningvectorcanbe
constructedbysorting:
➢Therelationisfirstsortedonthepartitioning
attributes.
➢Therelationisthenscannedinsortedorder.
➢Afterevery1/noftherelationhasbeenread,thevalue
ofthepartitioningattributeofthenexttupleisadded
tothepartitionvector.Here,ndenotesthenumberof
partitionstobeconstructed.

21.2.3 Handling of Skew
➢Incasetherearemanytupleswiththesamevalueforthe
partitioningattribute,thetechniquecanstillresultin
someskew.
➢ThemaindisadvantageofthismethodistheextraI/O
overheadincurredindoingtheinitialsort.
➢TheI/Ooverheadforconstructingbalancedrange-
partitionvectorscanbereducedbyconstructingand
storingafrequencytable,orhistogram,oftheattribute
valuesforeachattributeofeachrelation.
➢Figure21.1showsanexampleofahistogramforan
integer-valuedattributethattakesvaluesintherange1
to25.

21.2.3 Handling of Skew

21.2.3 Handling of Skew
➢Incasetherearemanytupleswiththesamevalueforthe
partitioningattribute,thetechniquecanstillresultin
someskew.
➢ThemaindisadvantageofthismethodistheextraI/O
overheadincurredindoingtheinitialsort.
➢TheI/Ooverheadforconstructingbalancedrange-
partitionvectorscanbereducedbyconstructingand
storingafrequencytable,orhistogram,oftheattribute
valuesforeachattributeofeachrelation.
➢Figure21.1showsanexampleofahistogramforan
integer-valuedattributethattakesvaluesintherange1
to25.

21.3 InterqueryParallelism
➢Ininterqueryparallelism,differentqueriesor
transactionsexecuteinparallelwithoneanother.
➢Transactionthroughputandscaleupcanbeincreased
bythisformofparallelism.
➢Interqueryparallelismistheeasiestformof
parallelismtosupportinadatabasesystem-
particularlyinashared-memoryparallelsystem.Lock
tablesandLoginformationaremaintainedinthesame
memory.
➢Supportinginterqueryparallelismismorecomplicated
inashared-diskorsharednothingarchitecture.

21.3 InterqueryParallelism
➢Processorshavetoperformsometasks,suchas
lockingandlogging,inacoordinatedfashion,andthat
requiresthattheypassmessagestoeachother.
➢Aparalleldatabasesystemmustalsoensurethattwo
processorsdonotupdatethesamedataindependently
atthesametime.
➢Whenaprocessoraccessesorupdatesdata,the
databasesystemmustensurethattheprocessorhasthe
latestversionofthedatainitsbufferpool.The
problemofensuringthattheversionisthelatestis
knownasthecache-coherencyproblem.

21.4 IntraqueryParallelism
ItistheformofparallelismwhereSingleQueryis
executedinparallelonmanyprocessors.
Advantages
➢Tospeedupasinglecomplexlongrunningqueries.
➢Bestsuitedforcomplexscientificcalculations
(queries).
SupportedParallelDatabaseArchitectures
SharedMemory,SharedDiskandSharedNothingparallel
architecturesaresupported.
Weneednotworryaboutlockingandloggingasbecause
itinvolvesparallelizingsinglequery.

21.4 IntraqueryParallelism
Types
Intra-operationparallelism–theprocessofspeedingupa
querythroughparallelizingtheexecutionofindividual
operations.
TheoperationswhichcanbeparallelizedareSort,Join,
Projection,Selectionandsoon.
Inter-operationparallelism–theprocessofspeedingupa
querythroughparallelizingvariousoperationswhicharepart
ofthequery.
Forexample,aquerywhichinvolvesjoinof4tablescanbe
executedinparallelintwoprocessorsinsuchawaythateach
processorshalljointworelationslocallyandtheresult1and
result2canbejoinedfurthertoproducethefinalresult.