Lecture Notes Unit 3 Chapter 22 - Distributed Databases

Murugan146644, 12 slides, Oct 15, 2024

About This Presentation

Description:
Welcome to the comprehensive guide on Relational Database Management System (RDBMS) concepts, tailored for final year B.Sc. Computer Science students affiliated with Alagappa University. This document covers fundamental principles and advanced topics in RDBMS, offering a structured appr...


Slide Content

RDBMS - Unit III
Chapter 22
Distributed Databases
Prepared By
Dr. S. Murugan, Associate Professor
Department of Computer Science,
Alagappa Government Arts College, Karaikudi.
(Affiliated to Alagappa University)
Mail id: [email protected]
Reference Book:
Database System Concepts by Abraham Silberschatz, Henry F. Korth, S. Sudarshan

22.1 Homogeneous and Heterogeneous Databases
➢ In a homogeneous distributed database system, all sites have identical database-management system software, are aware of one another, and agree to cooperate in processing users' requests. (Ex: all systems use Oracle DB.)
➢ In a heterogeneous distributed database, different sites may use different schemas and different database-management system software.
➢ The sites may not be aware of one another. (Ex: one system uses Oracle while another uses Access.)
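The distinction can be sketched as a small check over a description of each site. This is a simplified, hypothetical model: the `Site` class and `is_homogeneous` function are assumptions for illustration, and the check only compares DBMS software and schema; it does not model whether the sites are aware of one another or agree to cooperate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical description of one site in a distributed database."""
    name: str
    dbms: str    # DBMS software running at the site, e.g. "Oracle"
    schema: str  # label for the local schema the site uses


def is_homogeneous(sites: list[Site]) -> bool:
    """Simplified test: every site runs the same DBMS with the same schema.

    Awareness of other sites and willingness to cooperate are not modelled.
    """
    return len({(site.dbms, site.schema) for site in sites}) <= 1


all_oracle = [Site("S1", "Oracle", "bank"), Site("S2", "Oracle", "bank")]
mixed = [Site("S1", "Oracle", "bank"), Site("S2", "Access", "legacy_bank")]
print(is_homogeneous(all_oracle))  # True
print(is_homogeneous(mixed))       # False
```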

22.2 Distributed Data Storage
➢ Consider a relation r that is to be stored in the database. There are two approaches to storing this relation in the distributed database:
➢ Replication: The system maintains several identical copies (replicas) of the relation, and stores each replica at a different site.
➢ Fragmentation: The system partitions the relation into several fragments, and stores each fragment at a different site. (Horizontal or vertical fragmentation)

22.2.1 Data Replication
➢ If relation r is replicated, a copy of relation r is stored in two or more sites.
➢ There are a number of advantages and disadvantages to replication.
Advantages:
Availability: If one of the sites containing relation r fails, then relation r can be found in another site.
Increased parallelism: If most accesses to relation r only read it, then several sites can process queries involving r in parallel.
Disadvantage:
Increased overhead on update: The system must keep all replicas of r consistent, so whenever r is updated, the update must be propagated to every site that holds a replica.
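The trade-off can be sketched with a naive read-one/write-all scheme. This is a hypothetical illustration, not a protocol from the text: `ReplicatedRelation` keeps one copy of r per site, a read is served by any site that is still up, and an update is refused unless every replica can be written, which is the simplest way to keep all copies consistent.

```python
class ReplicatedRelation:
    """Hypothetical sketch of relation r replicated at several sites.

    Each replica maps a key (e.g. account_number) to a tuple of r.
    """

    def __init__(self, sites):
        self.replicas = {site: {} for site in sites}  # identical copies of r
        self.down = set()                             # sites that have failed

    def read(self, key):
        # Availability: any replica that is still up can answer the read.
        for site, copy in self.replicas.items():
            if site not in self.down:
                return copy.get(key)
        raise RuntimeError("no replica of r is available")

    def update(self, key, value):
        # Update overhead: every replica must be written to stay consistent.
        if self.down:
            raise RuntimeError("some replicas are unreachable; update refused")
        for copy in self.replicas.values():
            copy[key] = value
        return len(self.replicas)  # number of sites that had to be written


r = ReplicatedRelation(["S1", "S2", "S3"])
print(r.update("A-101", ("A-101", "Hillside", 500)))  # 3
r.down.add("S1")                                      # S1 fails
print(r.read("A-101"))  # ('A-101', 'Hillside', 500), served by S2
```

Refusing updates while any site is down is the simplest choice here; it keeps the replicas identical at the price of update availability.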

22.2.2 Data Fragmentation
➢ If relation r is fragmented, r is divided into a number of fragments $r_1, r_2, \ldots, r_n$.
➢ There are two different schemes for fragmenting a relation: horizontal fragmentation and vertical fragmentation.
➢ Horizontal fragmentation splits the relation by assigning each tuple of r to one or more fragments.
➢ Vertical fragmentation splits the relation by decomposing the scheme R of relation r.

➢ We shall illustrate these approaches by fragmenting the relation account, with the schema:
Account-schema = (account_number, branch_name, balance)
➢ In horizontal fragmentation, a relation r is partitioned into a number of subsets, $r_1, r_2, \ldots, r_n$. Each tuple of relation r must belong to at least one of the fragments.

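➢ For example, the account relation can be divided into several different fragments, each of which consists of tuples of accounts belonging to a particular branch.
➢ If the banking system has only two branches, Hillside and Valleyview, then there are two different fragments:
$account_1 = \sigma_{branch\_name = \text{"Hillside"}}(account)$
$account_2 = \sigma_{branch\_name = \text{"Valleyview"}}(account)$
➢ We reconstruct the relation r by taking the union of all fragments:
$r = r_1 \cup r_2 \cup \cdots \cup r_n$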
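The Hillside/Valleyview example can be sketched in Python by treating a relation as a set of tuples. The sample account tuples are made up for illustration, and `select` and `horizontal_fragments` are hypothetical helper names: `select` plays the role of σ, and the final union rebuilds the original relation.

```python
# A relation is modelled as a set of tuples (account_number, branch_name, balance).
# The sample tuples below are invented for illustration.
BRANCH_NAME = 1  # position of branch_name in each tuple

account = {
    ("A-305", "Hillside", 500),
    ("A-226", "Hillside", 336),
    ("A-177", "Valleyview", 205),
}


def select(relation, predicate):
    """Selection (sigma): keep only the tuples that satisfy predicate."""
    return {t for t in relation if predicate(t)}


def horizontal_fragments(relation, branches):
    """One fragment per branch: sigma_{branch_name = b}(relation)."""
    return {b: select(relation, lambda t, b=b: t[BRANCH_NAME] == b) for b in branches}


fragments = horizontal_fragments(account, ["Hillside", "Valleyview"])
account1, account2 = fragments["Hillside"], fragments["Valleyview"]

# Reconstruction: r = r_1 ∪ r_2 ∪ ... ∪ r_n
reconstructed = set().union(*fragments.values())
print(len(account1), len(account2))  # 2 1
print(reconstructed == account)      # True
```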

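➢ Vertical fragmentation of r(R) involves the definition of several subsets of attributes $R_1, R_2, \ldots, R_n$ of the schema R so that
$R = R_1 \cup R_2 \cup \cdots \cup R_n$
Each fragment $r_i$ of r is defined by $r_i = \Pi_{R_i}(r)$.
➢ For example, consider a university database with a relation employee_info = (employee_id, name, designation, salary).
➢ The employee_info relation may be fragmented into two relations:
employee_private_info = (employee_id, salary)
employee_public_info = (employee_id, name, designation)
➢ These may be stored at different sites. Both fragments keep employee_id, so employee_info can be reconstructed by a natural join of the two fragments on employee_id.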
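The employee_info example can be sketched the same way. The rows, names and salaries below are invented, and `project` and `natural_join` are hypothetical helpers written for this illustration; the join assumes employee_id is unique, which is what lets the two projections be glued back together.

```python
# Each tuple of employee_info is a dict from attribute name to value.
# The rows below are invented sample data.
employee_info = [
    {"employee_id": 1, "name": "Anu", "designation": "Lecturer", "salary": 50000},
    {"employee_id": 2, "name": "Ravi", "designation": "Professor", "salary": 90000},
]


def project(relation, attributes):
    """Projection (Pi): keep only the listed attributes of every tuple.

    Duplicates are not removed; that is harmless here because every
    fragment includes the unique employee_id.
    """
    return [{a: row[a] for a in attributes} for row in relation]


def natural_join(left, right, key):
    """Join on key, assuming key values are unique in right."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


employee_private_info = project(employee_info, ["employee_id", "salary"])
employee_public_info = project(employee_info, ["employee_id", "name", "designation"])

# Both fragments keep employee_id, so the join restores every original tuple.
rebuilt = natural_join(employee_public_info, employee_private_info, "employee_id")
print(rebuilt == employee_info)  # True
```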

22.3 Distributed Transactions
➢ Access to the various data items in a distributed system is usually accomplished through transactions, which must preserve the ACID properties.
➢ There are two types of transaction that we need to consider:
➢ The local transactions are those that access and update data in only one local database;
➢ The global transactions are those that access and update data in several local databases.
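Whether a transaction is local or global depends only on how many sites hold the data items it touches. A minimal sketch, assuming a hypothetical `catalog` that records the site storing each data item (here, account tuples stored at their own branch) and a hypothetical `transaction_kind` function:

```python
# Hypothetical catalog: data item -> site that stores it.
catalog = {
    "A-305": "Hillside",
    "A-226": "Hillside",
    "A-177": "Valleyview",
}


def transaction_kind(items, catalog):
    """Classify a transaction by the set of sites its data items live on."""
    if not items:
        raise ValueError("a transaction must access at least one data item")
    sites = {catalog[item] for item in items}
    return "local" if len(sites) == 1 else "global"


print(transaction_kind(["A-305", "A-226"], catalog))  # local (both at Hillside)
print(transaction_kind(["A-305", "A-177"], catalog))  # global (two sites)
```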

22.7 Distributed Query Processing
➢ There are several techniques for choosing a strategy for processing a query that minimize the amount of time that it takes to compute the answer.
➢ For centralized systems, the primary criterion for measuring the cost of a particular strategy is the number of disk accesses.
➢ In a distributed system, we must take into account several other matters, including:
➢ The cost of data transmission over the network.
➢ The potential gain in performance from having several sites process parts of the query in parallel.

➢ In general, we cannot focus solely on disk costs or on network costs. Rather, we must find a good trade-off between the two.
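The trade-off can be illustrated with a deliberately simple linear cost model. The `Strategy` class, the per-block and per-byte times, and the two candidate strategies are hypothetical numbers chosen for illustration, and the model ignores the parallelism gain; it only shows that the cheapest strategy changes when the relative cost of disk and network changes.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    disk_blocks: int    # blocks read from disk, summed over all sites
    bytes_shipped: int  # bytes sent over the network


def estimated_time(s, sec_per_block=0.01, sec_per_byte=1e-6):
    """Hypothetical linear cost model: disk time plus network time, in seconds."""
    return s.disk_blocks * sec_per_block + s.bytes_shipped * sec_per_byte


def cheapest(strategies, **costs):
    """Return the strategy with the lowest estimated time."""
    return min(strategies, key=lambda s: estimated_time(s, **costs))


strategies = [
    # Ship the whole relation and process it at the query site.
    Strategy("ship_all", disk_blocks=100, bytes_shipped=10_000_000),
    # Spend more disk work at the remote sites so that less data is shipped.
    Strategy("filter_first", disk_blocks=300, bytes_shipped=500_000),
]

print(cheapest(strategies).name)                     # filter_first (slow network)
print(cheapest(strategies, sec_per_byte=1e-8).name)  # ship_all (fast network)
```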

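22.7.1 Query Transformation
➢ Consider an extremely simple query: "Find all the tuples in the account relation."
➢ If the account relation is replicated, we have a choice of replica to make. If none of the replicas is fragmented, we choose the replica for which the transmission cost is lowest.
➢ If a replica is fragmented, the choice is not so easy to make, since we need to compute several joins or unions to reconstruct the account relation.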
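The choice of replica can be sketched as follows. Each candidate copy of account is described by the pieces that would have to be shipped and their transmission costs; the replica names, sites and cost figures are hypothetical, only horizontal fragments (recombined by union) are modelled, and the local cost of computing the union is ignored.

```python
def plan_full_scan(replicas):
    """Choose how to answer "find all tuples in account".

    replicas maps a replica name to a list of (site, transmission_cost) pieces.
    An unfragmented replica has one piece; a horizontally fragmented replica
    has one piece per fragment, and the pieces must be unioned.
    """
    def total_cost(name):
        return sum(cost for _site, cost in replicas[name])

    best = min(replicas, key=total_cost)
    steps = [f"fetch from {site}" for site, _cost in replicas[best]]
    if len(replicas[best]) > 1:
        steps.append("union the fragments")
    return best, steps


replicas = {
    "full_copy_S1": [("S1", 40)],
    "fragmented_copy": [("S2", 15), ("S3", 10)],
}
best, steps = plan_full_scan(replicas)
print(best)   # fragmented_copy
print(steps)  # ['fetch from S2', 'fetch from S3', 'union the fragments']
```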