11. dfs

sandpoonia 16,133 views 37 slides Nov 18, 2013

About This Presentation

A Distributed File System (DFS) is simply a classical model of a file system distributed across multiple machines. The purpose is to promote sharing of dispersed files.


Slide Content

Distributed Operating Systems
FILE SYSTEM
Sandeep Kumar Poonia
Head of Dept. CS/IT
B.E., M.Tech., UGC-NET
LM-IAENG, LM-IACSIT,LM-CSTA, LM-AIRCC, LM-SCIEI, AM-UACEE

2
Introduction
File systems were originally developed for centralized computer systems and desktop computers.
A file system was an operating system facility providing a convenient programming interface to disk storage.

3
DISTRIBUTED FILE SYSTEMS
DEFINITIONS:
• A Distributed File System (DFS) is simply a classical model of a file system distributed across multiple machines. The purpose is to promote sharing of dispersed files.
• The resources on a particular machine are local to itself. Resources on other machines are remote.
• A file system provides a service for clients. The server interface is the normal set of file operations: create, read, etc. on files.

4
DISTRIBUTED FILE SYSTEMS
Clients, servers, and storage are dispersed across machines.
Configuration and implementation may vary:
a) Servers may run on dedicated machines, OR
b) Servers and clients can be on the same machines.
c) The OS itself can be distributed (with the file system a part of that distribution).
d) A distribution layer can be interposed between a conventional OS and the file system.
Clients should view a DFS the same way they would a centralized FS; the distribution is hidden at a lower level.
Performance is concerned with throughput and response time.
Definitions

DISTRIBUTED FILE SYSTEMS
A distributed file system supports:
• Remote information sharing: allows a file to be transparently accessed by processes on any node of the system, irrespective of the file's location
• User mobility: users have the flexibility to work on different nodes at different times
• Availability: better fault tolerance
• Diskless workstations
5

DISTRIBUTED FILE SYSTEMS
A distributed file system provides the following types of services:
•Storage Service
•True File Service
•Name Service
6

DISTRIBUTED FILE SYSTEMS
Desirable features of a good distributed file system
• Transparency
– Structure transparency
– Access transparency
– Naming transparency
– Replication transparency
• User mobility
• Performance
• Simplicity and ease of use
• Scalability
• High availability
• High reliability
• Data integrity
• Security
• Heterogeneity
7

File Models
Criteria: Structure and Modifiability
Structured and Unstructured Files
– Structured files: a file appears to the file server as an ordered sequence of records.
• Files with indexed records
• Files with non-indexed records
– Unstructured files: no substructure is known to the file server
8

File Models
Mutable and Immutable Files
•Mutable
– An update performed on a file overwrites its old contents
– A file is represented as a single stored sequence that is altered by each update operation.
• Immutable Files
– A file cannot be modified once it has been created
– A file versioning approach is used to implement file updates
– Immutable files support consistent sharing, so file caching and replication are easier to support
9

File Accessing Models
The file accessing model of a DFS mainly depends on the method used for accessing remote files and on the unit of data access.
Accessing Remote Files
– Remote Service Model:
• The client's request is processed at the server's node
• In this case, packing and communication overhead can be significant
– Data Caching Model:
• The client's request is processed on the client's node itself by using the cached data
• This model greatly reduces network traffic
• A cache consistency problem may occur
LOCUS and NFS use the remote service model but add caching for better performance.
Sprite uses the data caching model but employs the remote service model under certain circumstances.
10

File Accessing Models
Unit of Data Transfer
– File Level Transfer Model (e.g., Amoeba, AFS)
• The whole file is moved when an operation requires file data
• It is simple and has better scalability
• Disk access routines on the servers can be better optimized
• But it requires sufficient storage space on the client's node
– Block Level Transfer Model (e.g., LOCUS, Sprite)
• Data is transferred in units of file blocks
• It does not require the client node to have large storage space
• It can be used in diskless workstations
• Network traffic may be significant
– Byte Level Transfer Model (e.g., Cambridge File Server)
• Data is transferred in units of bytes
• Low storage requirements, but cache management is difficult
– Record Level Transfer Model (e.g., Research Storage System)
• Suitable for the structured file model
11

12
DISTRIBUTED FILE SYSTEMS
Naming is the mapping between logical and physical objects.
– Example: A user filename maps to <cylinder, sector>.
– In a conventional file system, it's understood where the file actually resides; the system and disk are known.
– In a transparent DFS, the location of a file, somewhere in the network, is hidden.
– File replication means multiple copies of a file; mapping returns a SET of locations for the replicas.
Location transparency -
a) The name of a file does not reveal any hint of the file's physical storage location.
b) File name still denotes a specific, although hidden, set of physical disk blocks.
c) This is a convenient way to share data.
d) Can expose correspondence between component units and machines.
Naming and Transparency

13
DISTRIBUTED FILE SYSTEMS
Location independence -
– The name of a file doesn't need to be changed when the file's physical storage location changes. Dynamic, one-to-many mapping.
– Better file abstraction.
– Promotes sharing the storage space itself.
– Separates the naming hierarchy from the storage devices hierarchy.
Most DFSs today:
– Support location transparent systems.
– Do NOT support migration (automatic movement of a file from machine to machine).
– Files are permanently associated with specific disk blocks.
Naming and Transparency

14
DISTRIBUTED FILE SYSTEMS
The ANDREW DFS AS AN EXAMPLE:
– Is location independent.
– Supports file mobility.
– Separation of FS and OS allows for disk-less systems. These have lower cost and convenient system upgrades. The performance is not as good.
NAMING SCHEMES:
There are three main approaches to naming files:
1. Files are named with a combination of host and local name.
– This guarantees a unique name. NEITHER location transparent NOR location independent.
– Same naming works on local and remote files. The DFS is a loose collection of independent file systems.
Naming and Transparency

15
DISTRIBUTED FILE SYSTEMS
NAMING SCHEMES:
2. Remote directories are mounted to local directories.
– So a local system seems to have a coherent directory structure.
– The remote directories must be explicitly mounted. The files are location independent.
– SUN NFS is a good example of this technique.
3. A single global name structure spans all the files in the system.
– The DFS is built the same way as a local file system. Location independent.
Naming and Transparency
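
Scheme 2 (remote directories mounted onto local directories) can be illustrated with a small lookup table. This is only a sketch under assumptions: the table `MOUNTS`, the host names, and the function `resolve` are hypothetical and are not taken from NFS or any real implementation. It shows how a local path name is turned into a host plus a path on that host.

```python
# Hypothetical mount table: local mount point -> (server, exported directory).
MOUNTS = {
    "/usr/local": ("server1", "/usr/shared"),
    "/usr/local/docs": ("server2", "/export/docs"),
}

def resolve(path):
    """Return (host, path_on_host) for a local path name.

    The longest matching mount point wins, so a mount nested inside
    another mount is honoured. Unmounted paths stay on the local host.
    """
    best = None
    for mount_point in MOUNTS:
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        return ("localhost", path)
    server, exported = MOUNTS[best]
    return (server, exported + path[len(best):])
```

The user only ever types the local path; which machine serves it is decided by the table, which is why the local system appears to have one coherent directory tree.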

16
Mounting Remote Directories (NFS)

17
DISTRIBUTED FILE SYSTEMS
IMPLEMENTATION TECHNIQUES:
– Can map directories or larger aggregates rather than individual files.
– A non-transparent mapping technique:
name ----> <system, disk, cylinder, sector>
– A transparent mapping technique:
name ----> file_identifier ----> <system, disk, cylinder, sector>
– So when changing the physical location of a file, only the file identifier need be modified. This identifier must be "unique" in the universe.
Naming and Transparency
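
The transparent (two-level) mapping can be sketched with two dictionaries. All names and values here (`name_to_id`, `id_to_location`, `fid-0017`, the location tuples) are made up for illustration. The sketch reads the slide's statement as "only the entry kept for the file identifier changes when a file moves", which is an interpretation rather than something the slides spell out.

```python
# Level 1: user-visible name -> location-independent file identifier.
name_to_id = {"/home/ann/report.txt": "fid-0017"}

# Level 2: file identifier -> <system, disk, cylinder, sector>.
id_to_location = {"fid-0017": ("machineA", "disk0", 120, 7)}

def locate(name):
    """Resolve a name through both mapping levels."""
    return id_to_location[name_to_id[name]]

def migrate(file_id, new_location):
    """Move a file: only the second-level entry changes, never the name."""
    id_to_location[file_id] = new_location
```

After `migrate("fid-0017", ("machineB", "disk3", 9, 1))`, the same name resolves to the new location and the first-level table is untouched.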

18
DISTRIBUTED FILE SYSTEMS
CACHING
• Reduce network traffic by retaining recently accessed disk blocks in a cache, so that repeated accesses to the same information can be handled locally.
• If required data is not already cached, a copy of the data is brought from the server to the user.
• Perform accesses on the cached copy.
• Files are identified with one master copy residing at the server machine.
• Copies of (parts of) the file are scattered in different caches.
• Cache Consistency Problem -- Keeping the cached copies consistent with the master file.
• A remote service (RPC) has these characteristic steps:
a) The client makes a request for file access.
b) The request is passed to the server in message format.
c) The server makes the file access.
d) Return messages bring the result back to the client.
This is equivalent to performing a disk access for each request.
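
The difference between the two approaches can be made concrete with a small client-side block cache. This is a sketch under assumptions: the class `BlockCache`, the `fetch_from_server` callable, and the choice of least-recently-used eviction are illustrative and are not prescribed by the slides. A hit is served locally; only a miss costs a remote request.

```python
from collections import OrderedDict

class BlockCache:
    """Client-side cache of recently accessed disk blocks (LRU eviction)."""

    def __init__(self, fetch_from_server, capacity=2):
        self.fetch = fetch_from_server   # hypothetical remote-read callable
        self.capacity = capacity
        self.blocks = OrderedDict()      # (file, block_no) -> data
        self.remote_reads = 0

    def read(self, file, block_no):
        key = (file, block_no)
        if key in self.blocks:
            self.blocks.move_to_end(key)     # mark as most recently used
            return self.blocks[key]
        self.remote_reads += 1               # miss: go to the server
        data = self.fetch(file, block_no)
        self.blocks[key] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
        return data
```

With pure remote service, `remote_reads` would grow by one on every call; with the cache it grows only on misses.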

19
DISTRIBUTED FILE SYSTEMS
CACHE LOCATION:
• Caching is a mechanism for maintaining disk data on the local machine. This data can be kept in the local memory or in the local disk. Caching can be advantageous both for read ahead and read again.
• The cost of getting data from a cache is a few HUNDRED instructions; disk accesses cost THOUSANDS of instructions.
• The master copy of a file doesn't move, but caches contain replicas of portions of the file.
• Caching behaves just like "networked virtual memory".
• What should be cached? << blocks <---> files >>. Bigger sizes give a better hit rate; smaller give better transfer times.
• Caching on disk gives:
– Better reliability.
• Caching in memory gives:
– The possibility of diskless workstations,
– Greater speed.
• Since the server cache is in memory, it allows the use of only one mechanism.

20
DISTRIBUTED FILE SYSTEMS
CACHE UPDATE POLICY:
A write-through cache
• When a cache entry is modified, the new value is immediately sent to the server to update the master copy of the file.
• It has good reliability, but the user must wait for writes to get to the server. Used by NFS.
Delayed write
• The modified value is written only to the cache and the client makes a note.
• All updates are gathered and sent to the server at one time:
– Write on ejection from cache
– Periodic write
– Write on close
• Write requests complete more rapidly. Data may be written over the previous cache write, saving a remote write. Poor reliability on a crash.
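
The two update policies can be compared side by side in a short sketch. The classes below are hypothetical; a plain dictionary stands in for the server's master copy, and `remote_writes` counts how often the server is contacted.

```python
class WriteThroughCache:
    def __init__(self, server):
        self.server = server      # dict standing in for the master copy
        self.cache = {}
        self.remote_writes = 0

    def write(self, key, value):
        self.cache[key] = value
        self.server[key] = value  # sent to the server immediately
        self.remote_writes += 1


class DelayedWriteCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.dirty = set()        # the client's "note" of modified entries
        self.remote_writes = 0

    def write(self, key, value):
        self.cache[key] = value   # overwrites any earlier cached write
        self.dirty.add(key)

    def flush(self):
        """Called on close, periodically, or on ejection from the cache."""
        for key in sorted(self.dirty):
            self.server[key] = self.cache[key]
            self.remote_writes += 1
        self.dirty.clear()
```

Writing the same block twice costs two remote writes with write-through but only one after a delayed-write flush. The price is that anything still marked dirty is lost if the client crashes before `flush` runs.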

21
DISTRIBUTED FILE SYSTEMS
CACHE CONSISTENCY:
The basic issue is how to determine that the client-cached data is consistent with what's on the server.
• Client-initiated approach -
The client asks the server if the cached data is OK. What should be the frequency of "asking"? Before every access, on file open, at fixed time intervals, ...?
• Server-initiated approach -
Possibilities: A and B both have the same file open. When A closes the file, B "discards" its copy. Then B must start over.
The server is notified on every open. If a file is opened for writing, then disable caching by other clients for that file.
Get read/write permission for each block; then disable caching only for particular blocks.
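
A minimal sketch of the client-initiated approach, validating on every file open. The use of a per-file version number is an assumption made for this example (a modification timestamp would serve the same purpose), and the `Server` and `Client` classes are hypothetical.

```python
class Server:
    def __init__(self):
        self.files = {}                 # name -> (version, data)

    def write(self, name, data):
        version = self.files.get(name, (0, None))[0] + 1
        self.files[name] = (version, data)

    def get_version(self, name):
        return self.files[name][0]

    def read(self, name):
        return self.files[name]


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}                 # name -> (version, data)

    def open(self, name):
        """Validate the cached copy against the server on every open."""
        cached = self.cache.get(name)
        if cached is None or cached[0] != self.server.get_version(name):
            self.cache[name] = self.server.read(name)   # refetch stale copy
        return self.cache[name][1]
```

Between two opens the client may still hold stale data, which is exactly the "how often to ask" trade-off raised above.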

22
DISTRIBUTED FILE SYSTEMS
COMPARISON OF CACHING AND REMOTE SERVICE:
• Many remote accesses can be handled by a local cache. There's a great deal of locality of reference in file accesses. Servers can be accessed only occasionally rather than for each access.
• Caching causes data to be moved in a few big chunks rather than in many smaller pieces; this leads to considerable efficiency for the network.
• Disk accesses can be better optimized on the server if it's understood that requests are always for large contiguous chunks.
• Cache consistency is the major problem with caching. When there are infrequent writes, caching is a win. In environments with many writes, the work required to maintain consistency overwhelms caching advantages.
• Caching works best on machines with considerable local store - either local disks or large memories. With neither of these, use remote-service.
• Caching requires a whole separate mechanism to support acquiring and storage of large amounts of data. Remote service merely does what's required for each call. As such, caching introduces an extra layer and mechanism and is more complicated than remote service.

23
DISTRIBUTED FILE SYSTEMS
STATEFUL VS. STATELESS SERVICE:
Stateful: A server keeps track of information about client requests.
– It maintains what files are opened by a client; connection identifiers; server caches.
– Memory must be reclaimed when client closes file or when client dies.
Stateless: Each client request provides complete information needed by the server (i.e., file name, file offset).
– The server can maintain information on behalf of the client, but it's not required.
– Useful things to keep include file info for the last N files touched.

24
DISTRIBUTED FILE SYSTEMS
STATEFUL VS. STATELESS SERVICE:
Performance is better for stateful.
– Don't need to parse the filename each time, or "open/close" file on every request.
– Stateful can have a read-ahead cache.
Fault Tolerance: A stateful server loses everything when it crashes.
– Server must poll clients in order to renew its state.
– Client crashes force the server to clean up its encached information.
– Stateless remembers nothing so it can start easily after a crash.
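
The contrast can be sketched with two toy read interfaces. Everything here is hypothetical (the `FILES` store, the handle numbering, the `crash` method); the point is only where the file offset lives.

```python
FILES = {"/data/log.txt": b"0123456789"}   # hypothetical server-side store

class StatefulServer:
    def __init__(self):
        self.open_files = {}    # handle -> [name, offset]; lost on a crash
        self.next_handle = 1

    def open(self, name):
        handle = self.next_handle
        self.next_handle += 1
        self.open_files[handle] = [name, 0]
        return handle

    def read(self, handle, count):
        name, offset = self.open_files[handle]
        self.open_files[handle][1] = offset + count   # server tracks position
        return FILES[name][offset:offset + count]

    def crash(self):
        self.open_files.clear()


def stateless_read(name, offset, count):
    """Each request carries the file name and offset, so nothing is kept."""
    return FILES[name][offset:offset + count]
```

After `crash()`, the stateful client's handle is useless until state is rebuilt, while the stateless request `stateless_read("/data/log.txt", 4, 4)` works unchanged because the client supplied everything.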

25
DISTRIBUTED FILE SYSTEMS
FILE REPLICATION:
• Duplicating files on multiple machines improves availability and performance.
• Placed on failure-independent machines (they won't fail together). Replication management should be "location-opaque".
• The main problem is consistency - when one copy changes, how do other copies reflect that change? Often there is a tradeoff: consistency versus availability and performance.
• Example:
"Demand replication" is like whole-file caching; reading a file causes it to be cached locally. Updates are done only on the primary file at which time all other copies are invalidated.
• Atomic and serialized invalidation isn't guaranteed (message could get lost / machine could crash).
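
The demand-replication example can be sketched as follows. The `Site` and `Primary` classes are hypothetical, and invalidation is modelled as a direct, always-successful method call; as noted above, a real system cannot assume the invalidation message arrives.

```python
class Site:
    """A machine that may hold a demand replica of the file."""

    def __init__(self):
        self.copy = None
        self.valid = False


class Primary:
    def __init__(self, data):
        self.data = data
        self.replicas = []              # sites holding a demand replica

    def read_from(self, site):
        site.copy = self.data           # reading caches the whole file
        site.valid = True
        if site not in self.replicas:
            self.replicas.append(site)
        return site.copy

    def update(self, data):
        self.data = data                # updates go to the primary only
        for site in self.replicas:
            site.valid = False          # all other copies are invalidated
        self.replicas.clear()
```

A site whose copy has been invalidated simply reads again and receives the new contents.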

26
OVERVIEW:
• Runs on SUNOS - NFS is both an implementation and a specification of how to access remote files. It's both a definition and a specific instance.
• The goal: to share a file system in a transparent way.
• Uses client-server model (for NFS, a node can be both simultaneously.) Can act between any two nodes (no dedicated server.) Mount makes a server file-system visible from a client.
mount server:/usr/shared client:/usr/local
• Then, transparently, a request for /usr/local/dir-server accesses a file that is on the server.
• The mount is controlled by: (1) access rights, (2) server specification of what's mountable.
• Can use heterogeneous machines - different hardware, operating systems, network protocols.
• Uses RPC for isolation - thus all implementations must have the same RPC calls. These RPCs implement the mount protocol and the NFS protocol.
DISTRIBUTED FILE SYSTEMS
SUN Network File System

27
THE MOUNT PROTOCOL:
The following operations occur:
1. The client's request is sent via RPC to the mount server (on server machine.)
2. Mount server checks export list containing
a) file systems that can be exported,
b) legal requesting clients.
c) It's legitimate to mount any directory within the legal file system.
3. Server returns "file handle" to client.
4. Server maintains list of clients and mounted directories -- this is state information! But this data is only a "hint" and isn't treated as essential.
5. Mounting often occurs automatically when client or server boots.
DISTRIBUTED FILE SYSTEMS
SUN Network File System
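
The mount steps can be sketched as a single function. This is an illustration under assumptions, not the real mount protocol: `EXPORTS`, `mounted`, and `mount` are hypothetical names, and the file handle is derived from a hash only so the example has something opaque to return. Real NFS file handles are defined by the server implementation.

```python
import hashlib

# Hypothetical export list: exported file system -> clients allowed to mount.
EXPORTS = {"/usr/shared": {"clientA", "clientB"}}
mounted = []   # (client, directory) pairs; only a hint, not essential state

def mount(client, directory):
    """Return an opaque file handle, or raise PermissionError."""
    for fs, clients in EXPORTS.items():
        # Any directory within an exported file system may be mounted.
        inside = directory == fs or directory.startswith(fs + "/")
        if inside and client in clients:
            mounted.append((client, directory))
            # Illustrative handle only; real handles are server-defined.
            return hashlib.sha256(directory.encode()).hexdigest()[:16]
    raise PermissionError(f"{client} may not mount {directory}")
```

Losing the `mounted` list after a server restart does not invalidate handles already given out, which matches the description of that list as a hint.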

28
THE NFS PROTOCOL:
RPCs support these remote file operations:
a) Search for file within directory.
b) Read a set of directory entries.
c) Manipulate links and directories.
d) Read/write file attributes.
e) Read/write file data.
Note:
– Open and close are conspicuously absent from this list. NFS servers are stateless. Each request must provide all information. With a server crash, no information is lost.
– Modified data must actually get to server disk before client is informed the action is complete. Using a cache would imply state information.
– A single NFS write is atomic. A client write request may be broken into several atomic RPC calls, so the whole thing is NOT atomic. Since lock management is stateful, NFS doesn't do it. A higher level must provide this service.
DISTRIBUTED FILE SYSTEMS
SUN Network File System

29
NFS ARCHITECTURE:
Follow local and remote access through this figure:
[Figure: NFS architecture]
DISTRIBUTED FILE SYSTEMS
SUN Network File System

30
NFS ARCHITECTURE:
1. UNIX file system layer - does normal open / read / etc. commands.
2. Virtual file system (VFS) layer -
a) Gives clean layer between user and file system.
b) Acts as deflection point by using global vnodes.
c) Understands the difference between local and remote names.
d) Keeps in memory information about what should be deflected (mounted directories) and how to get to these remote directories.
3. System call interface layer -
a) Presents sanitized validated requests in a uniform way to the VFS.
DISTRIBUTED FILE SYSTEMS
SUN Network File System

31
PATH-NAME TRANSLATION:
• Break the complete pathname into components.
• For each component, do an NFS lookup using the component name + directory vnode.
• After a mount point is reached, each component piece will cause a server access.
• Can't hand the whole operation to server since the client may have a second mount on a subsidiary directory (a mount on a mount).
• A directory name cache on the client speeds up lookups.
DISTRIBUTED FILE SYSTEMS
SUN Network File System
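
Component-by-component translation, together with a client-side name cache, can be sketched as below. The directory tree, the handle strings, and the function names are invented for the example; `nfs_lookup` merely stands in for one lookup RPC and does not reproduce the real call's signature.

```python
# Hypothetical server directory tree: directory handle -> {name: child handle}.
DIRECTORIES = {
    "root": {"usr": "d1"},
    "d1": {"local": "d2"},
    "d2": {"notes.txt": "f9"},
}
lookup_calls = []   # records every simulated server access
name_cache = {}     # client-side cache: (directory handle, name) -> handle

def nfs_lookup(dir_handle, component):
    """Stand-in for one lookup RPC: (directory handle, name) -> handle."""
    lookup_calls.append((dir_handle, component))
    return DIRECTORIES[dir_handle][component]

def translate(path, root_handle="root"):
    """Resolve a path one component at a time, one lookup per component."""
    handle = root_handle
    for component in path.strip("/").split("/"):
        key = (handle, component)
        if key not in name_cache:
            name_cache[key] = nfs_lookup(handle, component)
        handle = name_cache[key]
    return handle
```

The first translation of `/usr/local/notes.txt` costs three lookups; repeating it costs none, because every (directory, name) pair is already in the name cache.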

32
CACHES OF REMOTE DATA:
• The client keeps:
File block cache - (the contents of a file)
File attribute cache - (file header info (inode in UNIX)).
• The local kernel hangs on to the data after getting it the first time.
• On an open, the local kernel checks with the server that cached data is still OK.
• Cached attributes are thrown away after a few seconds.
• Data blocks use read ahead and delayed write.
• Mechanism has:
Server consistency problems.
Good performance.
DISTRIBUTED FILE SYSTEMS
SUN Network File System
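
The attribute cache's time-based expiry can be sketched as follows. The class `AttributeCache`, the `fetch_attrs` callable, and the 3-second lifetime are assumptions for illustration (the slides say only "a few seconds"). The current time is passed in explicitly so the behaviour is easy to follow.

```python
class AttributeCache:
    """Cache file attributes and discard them after `ttl` seconds."""

    def __init__(self, fetch_attrs, ttl=3.0):
        self.fetch_attrs = fetch_attrs   # hypothetical getattr-style call
        self.ttl = ttl
        self.entries = {}                # name -> (fetched_at, attrs)
        self.server_calls = 0

    def get(self, name, now):
        entry = self.entries.get(name)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]              # still fresh: no server access
        self.server_calls += 1           # missing or expired: ask the server
        attrs = self.fetch_attrs(name)
        self.entries[name] = (now, attrs)
        return attrs
```

Within the lifetime the client can act on attributes the server has since changed, which is one source of the consistency problems listed above.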

33
A distributed environment at CMU. Strongest characteristic is scalability.
OVERVIEW:
• Machines are either servers or clients.
• Clients see a local name space and a shared name space.
• Servers:
Run vice, which presents a homogeneous, location transparent directory structure to all clients.
• Clients (workstations):
Run virtue protocol to communicate with vice.
Have local disks (1) for local name space, (2) to cache shared data.
• For scalability, off-load work from servers to clients. Uses whole file caching.
• NO clients or their programs are considered trustworthy.
DISTRIBUTED FILE SYSTEMS
Andrew File System

34
SHARED NAME SPACE:
• The server file space is divided into volumes. Volumes contain files of only one user. It's these volumes that are the level of granularity attached to a client.
• A vice file can be accessed using a fid = <volume number, vnode>. The fid doesn't depend on machine location. A client queries a volume-location database for this information.
• Volumes can migrate between servers to balance space and utilization. Old server has "forwarding" instructions and handles client updates during migration.
• Read-only volumes (system files, etc.) can be replicated. The volume database knows how to find these.
DISTRIBUTED FILE SYSTEMS
Andrew File System

35
FILE OPERATIONS AND CONSISTENCY SEMANTICS:
• If a file is remote, the client operating system passes control to a client user-level process named Venus.
• The client talks to Vice server only during open/close; reading/writing are only to the local copy.
• A further optimization - if data is locally cached, it's assumed to be good until the client is told otherwise.
• A client is said to have a callback on a file.
• When a client encaches a file, the server maintains state for this fact.
• Before allowing a write to a file, the server does a callback to anyone else having this file open; all other cached copies are invalidated.
• When a client is rebooted, all cached data is suspect.
• If too much storage used by server for callback state, the server can break some callbacks.
• The system clearly has consistency concerns.
DISTRIBUTED FILE SYSTEMS
Andrew File System
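
The callback idea can be sketched with two small classes. This is a simplification under assumptions: the class and method names (`ViceServer`, `VenusClient`, `break_callback`) are hypothetical, whole files are plain strings, and other copies are invalidated when the writer stores the file on close. It is not the actual Andrew implementation.

```python
class ViceServer:
    def __init__(self):
        self.files = {}
        self.callbacks = {}             # file name -> clients caching it

    def fetch(self, client, name):
        # Server state: remember that this client holds a cached copy.
        self.callbacks.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, writer, name, data):
        self.files[name] = data
        for client in self.callbacks.get(name, set()) - {writer}:
            client.break_callback(name)         # invalidate other copies
        self.callbacks[name] = {writer}


class VenusClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}                         # whole files, kept locally

    def open(self, name):
        if name not in self.cache:              # cached copy assumed good
            self.cache[name] = self.server.fetch(self, name)
        return self.cache[name]

    def close(self, name, data):
        self.cache[name] = data
        self.server.store(self, name, data)     # contact server on close

    def break_callback(self, name):
        self.cache.pop(name, None)
```

A client that still holds its callback opens the file without contacting the server at all; a client whose callback was broken fetches the new contents on its next open.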

36
IMPLEMENTATION:
• Deflection of open/close:
– The client kernel is modified to detect references to vice files.
– The request is forwarded to Venus with these steps:
   – Venus does pathname translation.
   – Asks Vice for the file.
   – Moves the file to local disk.
   – Passes inode of file back to client kernel.
• Venus maintains caches for status (in memory) and data (on local disk.)
• A server user-level process handles client requests.
– A lightweight process handles concurrent RPC requests from clients.
– State information is cached in this process.
– Susceptible to reliability problems.
DISTRIBUTED FILE SYSTEMS
Andrew File System

37
In this section we have looked at how file systems are implemented across systems. Of special concern are consistency, caching, and performance.
DISTRIBUTED FILE SYSTEMS
Wrap Up