GOOGLE FILE SYSTEM
Presented by
Junyoung Jung (2012104030)
Jaehong Jeong (2011104050)
Dept. of Computer Engineering
Kyung Hee Univ.
Big Data Programming (Prof. Lee Hae Joon), 2017-Fall
INTRODUCTION
This paper…
Background
Different Points in the Design Space
1.1 THIS PAPER…
The Google File System
- Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
- Google
- 19th ACM Symposium on Operating Systems Principles (SOSP), 2003
1.2 BACKGROUND
Previous distributed file systems pursued:
- Performance, Scalability, Reliability, Availability
GFS (Google File System) shares the same goals.
However, it reflects a marked departure from some earlier file system designs.
1.3 DIFFERENT POINTS IN THE DESIGN SPACE
1. Component failures are the norm rather than the exception.
2. Files are huge by traditional standards.
3. Most files are mutated by appending new data.
4. Applications and the file system API are co-designed.
5. Sustained bandwidth is more critical than low latency.
DESIGN
Interface
Architecture
2.1 INTERFACE
Familiar Interface
-create
-delete
-open
-close
-read
-write
Moreover…
-Snapshot
ㆍLow cost
ㆍMake a copy of a file/directory tree
ㆍDuplicate metadata
-Atomic Record append
ㆍAtomicity with multiple concurrent writes
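To make the API surface concrete, here is a minimal Python sketch of what a GFS-style client interface could look like; the class and method signatures are hypothetical, since the paper lists the operations but does not specify a concrete API.

```python
# Hypothetical GFS-style client API; names and signatures are illustrative.
class Handle:
    """Opaque per-file handle returned by open()."""

class GFSClient:
    # The familiar operations:
    def create(self, path: str) -> None: ...
    def delete(self, path: str) -> None: ...
    def open(self, path: str) -> Handle: ...
    def close(self, handle: Handle) -> None: ...
    def read(self, handle: Handle, offset: int, length: int) -> bytes: ...
    def write(self, handle: Handle, offset: int, data: bytes) -> None: ...

    # The GFS-specific additions:
    def snapshot(self, src: str, dst: str) -> None:
        """Cheaply copy a file or directory tree by duplicating metadata."""

    def record_append(self, handle: Handle, data: bytes) -> int:
        """Append data atomically at least once; GFS picks the offset
        and returns it to the caller."""
```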
2.2 ARCHITECTURE
Chunk
-Files are divided into fixed-size chunks
-64 MB: much larger than typical file system block sizes
Advantages of the large chunk size
-Reduces interaction between the client and the master
-A client can perform many operations on a given chunk
-Reduces the size of the metadata stored on the master
GFS chunkservers
-Store chunks on local disks as Linux files.
-Read/Write chunk data specified by a chunk handle & byte range
GFS master
-Maintains all file system metadata
ㆍNamespace
ㆍAccess-control information
ㆍChunk locations
ㆍ‘Lease’ management
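As a rough sketch of how this metadata could be organized in the master's memory (all names below are illustrative, not the paper's actual structures; the refcounts field anticipates snapshots, section 3.4):

```python
from dataclasses import dataclass, field

@dataclass
class ChunkInfo:
    version: int = 0                  # for stale-replica detection (section 4.5)
    replicas: list = field(default_factory=list)   # locations of this chunk's replicas
    primary: str = ""                 # replica currently holding the lease
    lease_expiration: float = 0.0     # when the primary's lease runs out

@dataclass
class FileInfo:
    acl: dict = field(default_factory=dict)            # access-control information
    chunk_handles: list = field(default_factory=list)  # file -> ordered chunk handles

@dataclass
class MasterState:
    namespace: dict = field(default_factory=dict)   # full pathname -> FileInfo
    chunks: dict = field(default_factory=dict)      # chunk handle -> ChunkInfo
    refcounts: dict = field(default_factory=dict)   # chunk handle -> reference count
                                                    # (used by snapshots, section 3.4)
```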
GFS client
-GFS client code is linked into each application
-It implements the file system API
-It communicates with the master & the chunkservers
Read flow (see the sketch below):
1. Using the fixed chunk size, the client translates the file name & byte offset into a chunk index, and sends the request to the master.
2. The master replies with the chunk handle & the locations of the chunkserver replicas (including which one is the ‘primary’).
3. The client caches this info, using the file name & chunk index as the key.
4. The client requests the data from the nearest chunkserver: “chunk handle & byte range within the chunk”.
5. No further interaction with the master is needed for the same 64 MB chunk until the cached info expires or the file is re-opened.
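A compact sketch of steps 1-5 on the client side; master.lookup, pick_nearest, and the chunkserver's read stub stand in for the real RPCs and network-distance logic, which the paper does not spell out at this level.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # fixed 64 MB chunks

def pick_nearest(replicas):
    return replicas[0]           # placeholder for "closest in the network topology"

def gfs_read(cache, master, filename, offset, length):
    chunk_index = offset // CHUNK_SIZE           # step 1: offset -> chunk index
    key = (filename, chunk_index)                # step 3: cache key
    if key not in cache:
        cache[key] = master.lookup(filename, chunk_index)   # step 2: ask the master
    handle, replicas = cache[key]
    server = pick_nearest(replicas)              # step 4: nearest chunkserver
    # step 5: as long as `key` stays cached, the master is not contacted again
    return server.read(handle, offset % CHUNK_SIZE, length)
```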
Decoupled flows
-Client ↔ master: metadata only
-Client ↔ chunkservers: data only
Metadata
-The master stores three types: the file & chunk namespaces, the file-to-chunk mapping, and the locations of each chunk's replicas
-All metadata is kept in memory
-The namespaces and the mapping are kept persistent through an operation log (chunk locations are instead polled from the chunkservers)
SYSTEM INTERACTIONS
Leases & Mutation Order
Data Flow
Atomic Appends
Snapshot
3.1 LEASES & MUTATION ORDER
Objective
-Ensure data are consistent & defined
-Minimize load on the master
The master grants a ‘lease’ to one replica
-That replica is called the ‘primary’ chunkserver
The primary defines a serial order for all mutations (sketched below)
-All secondaries follow this order
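A minimal sketch of this ordering, assuming (as the paper describes) that the primary assigns consecutive serial numbers to mutations and the secondaries apply them in that order; the apply plumbing is illustrative.

```python
import itertools

class Primary:
    """Replica holding the lease: it alone picks the mutation order."""
    def __init__(self, secondaries):
        self.secondaries = secondaries
        self.serials = itertools.count(1)

    def mutate(self, chunk_handle, data):
        n = next(self.serials)             # the primary assigns the serial number
        self.apply(chunk_handle, n, data)  # apply locally...
        for s in self.secondaries:         # ...then all secondaries follow the
            s.apply(chunk_handle, n, data) # exact same order

    def apply(self, chunk_handle, serial, data):
        pass  # write `data` to the chunk; `serial` fixes the relative order
```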
3.2 DATA FLOW
Data flow is decoupled from control flow: data is pushed linearly along a chain of chunkservers, pipelined over TCP connections, to fully use each machine's outbound bandwidth.
3.3 ATOMIC APPENDS
The client specifies only the data, not the offset
Similar to writes
-The mutation order is determined by the primary
-All secondaries use the same mutation order
GFS appends the data to the file at least once atomically, at an offset of GFS's choosing, which is returned to the client (see the sketch below)
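A sketch of the primary's append decision; the Chunk bookkeeping below is invented for illustration, and replication/retry details are omitted.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB

@dataclass
class Chunk:
    used: int = 0   # bytes already written
    buf: bytearray = field(default_factory=lambda: bytearray(CHUNK_SIZE))

def record_append(chunk: Chunk, record: bytes):
    """Primary-side decision: either the record fits, or pad and retry."""
    if chunk.used + len(record) > CHUNK_SIZE:
        chunk.used = CHUNK_SIZE          # pad the rest (secondaries do the same)
        return None                      # client must retry on the next chunk
    offset = chunk.used                  # GFS, not the client, picks the offset
    chunk.buf[offset:offset + len(record)] = record
    chunk.used += len(record)
    return offset                        # the same offset is used at every replica
```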
3.4 SNAPSHOT
Goals
-To quickly create branch copies of huge data sets
-To easily checkpoint the current state
Copy-on-write technique (sketched below)
-The metadata for the source file or directory tree is duplicated
-The reference counts of all affected chunks are incremented
-Chunks are copied lazily, at the first write
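A sketch of the copy-on-write steps, reusing the illustrative MasterState/FileInfo structures from section 2.2; clone_chunk is a hypothetical helper for the chunkserver-local copy.

```python
def snapshot(master, src, dst):
    src_info = master.namespace[src]
    # Duplicate only the metadata: both files now point at the same chunks.
    master.namespace[dst] = FileInfo(chunk_handles=list(src_info.chunk_handles))
    for h in src_info.chunk_handles:
        master.refcounts[h] = master.refcounts.get(h, 1) + 1

def write_chunk(master, path, index, data):
    h = master.namespace[path].chunk_handles[index]
    if master.refcounts.get(h, 1) > 1:     # chunk still shared after a snapshot?
        new_h = master.clone_chunk(h)      # hypothetical: copy the chunk now,
        master.refcounts[h] -= 1           # locally on each chunkserver
        master.refcounts[new_h] = 1
        master.namespace[path].chunk_handles[index] = new_h
        h = new_h
    # ...continue with the normal write path (section 3.1) to chunk `h`
```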
MASTER OPERATION
Namespace Management and Locking
Replica Placement
Creation, Re-replication, Rebalancing
Garbage Collection
Stale Replica Detection
“The master executes all namespace operations and manages chunk replicas throughout the system.”
4.1 Namespace Management and Locking
▰Namespaces are represented as a lookup table mapping full pathnames to metadata
▰Use locks over regions of the namespace to ensure proper serialization
▰Each master operation acquires a set of locks before it runs
▰Example of the locking mechanism
▻Preventing /home/user/foo from being created while /home/user is being snapshotted to /save/user
▻Snapshot operation acquires
 - Read locks on /home and /save
 - Write locks on /home/user and /save/user
▻File creation acquires
 - Read locks on /home and /home/user
 - A write lock on /home/user/foo
▻The two lock sets conflict on /home/user, so the operations are correctly serialized
▰A key property of this scheme is that it allows concurrent mutations in the same directory (see the sketch below)
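A small sketch of the lock-set computation (Python's standard library has no reader-writer lock, so only the lock sets are modeled here, not the locking itself):

```python
# Read locks on each ancestor directory, write lock on the leaf.
def ancestors(path: str) -> list:
    parts = path.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]

def lock_set(target: str):
    """(read-locked paths, write-locked paths) for one operation on `target`."""
    return ancestors(target), [target]

# Snapshot of /home/user to /save/user: union of both targets' lock sets
#   read:  /home, /save           write: /home/user, /save/user
# Creating /home/user/foo:
#   read:  /home, /home/user      write: /home/user/foo
# The snapshot's write lock and the creation's read lock collide on
# /home/user, so the two operations are serialized; two creations in the
# same directory only take read locks on it, so they can run concurrently.
print(lock_set("/home/user/foo"))  # (['/home', '/home/user'], ['/home/user/foo'])
```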
4.2 Replica Placement
▰The chunk replica placement policy serves two purposes:
▻Maximize data reliability and availability
▻Maximize network bandwidth utilization.
4.3 Creation, Re-replication, Rebalancing
▰Creation: the master considers
▻Disk space utilization
▻The number of recent creations on each chunkserver
▰Re-replication
▻Prioritized by how far a chunk is from its replication goal
▻The highest-priority chunk is cloned first, by copying the chunk data directly from an existing replica
▰Rebalancing
▻The master periodically moves replicas for better disk space and load balance
4.4 Garbage Collection
▰Deleted files
▻The deletion operation is logged
▻The file is renamed to a hidden name; it may be permanently removed later, or recovered until then (see the sketch after this list)
▰Orphaned chunks (unreachable chunks)
▻Identified and removed during a regular scan of the chunk namespace
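A sketch of the lazy-deletion path; master.log and the hidden-name format are illustrative, and the three-day grace period is the paper's stated default.

```python
import time

HIDDEN_PREFIX = ".deleted."      # illustrative; the paper only says "hidden name"
GRACE_PERIOD = 3 * 24 * 3600     # the paper's default: reclaimed after ~3 days

def delete_file(master, path):
    master.log("delete", path)                    # the deletion is logged first
    hidden = "%s%d%s" % (HIDDEN_PREFIX, int(time.time()), path)
    master.namespace[hidden] = master.namespace.pop(path)   # rename, don't reclaim

def namespace_scan(master, now):
    for name in list(master.namespace):
        if not name.startswith(HIDDEN_PREFIX):
            continue
        deleted_at = int(name[len(HIDDEN_PREFIX):].split("/", 1)[0])
        if now - deleted_at > GRACE_PERIOD:       # grace period over:
            del master.namespace[name]            # drop the metadata; the chunks
                                                  # become orphans and are removed
                                                  # by the chunk-namespace scan
```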
4.5 Stale Replica Detection
▰Stale replicas are detected with chunk version numbers
▻The master increases a chunk's version number whenever it grants a new lease on it
▻The client or the chunkserver verifies the version number when it performs an operation, so that it is always accessing up-to-date data (sketched below)
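A sketch of the version-number mechanism, reusing the illustrative ChunkInfo from section 2.2 and treating each replica entry as an object with a hypothetical set_version stub.

```python
def grant_lease(master, handle, new_primary):
    info = master.chunks[handle]          # ChunkInfo from the section 2.2 sketch
    info.version += 1                     # new version for this round of mutations
    for replica in info.replicas:
        replica.set_version(handle, info.version)  # live replicas record it;
    info.primary = new_primary                     # a crashed one misses the bump

def is_stale(master, handle, reported_version):
    """A replica reporting an old version missed some mutations."""
    return reported_version < master.chunks[handle].version
```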
FAULT TOLERANCE AND DIAGNOSIS
High Availability
Data Integrity
Diagnostic Tools
“We cannot completely trust the machines, nor can we completely trust the disks.”
5.1 High Availability
▰Fast Recovery
▻Operation log and checkpointing
▰Chunk Replication
▻Each chunk is replicated on multiple chunkservers on different racks
▰Master Replication
▻The operation log and checkpoints are replicated on multiple machines
5.2 Data Integrity
▰Checksumming to detect corruption of stored data
▰Each chunkserver independently verifies the integrity of its own copies (see the sketch below)
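A sketch of the read-side check, using the paper's parameters (64 KB blocks, 32-bit checksums) with CRC-32 as an illustrative checksum function.

```python
import zlib

BLOCK = 64 * 1024   # the paper: each chunk is divided into 64 KB blocks,
                    # each protected by its own 32-bit checksum

def checksum(block: bytes) -> int:
    return zlib.crc32(block)    # CRC-32 as an illustrative 32-bit checksum

def read_block(blocks, checksums, i):
    """Chunkserver-side read path: verify before returning data."""
    data = blocks[i]
    if checksum(data) != checksums[i]:
        # Report the mismatch to the master and return an error; the
        # client then reads the block from another replica.
        raise IOError(f"checksum mismatch in block {i}")
    return data
```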
5.3 Diagnostic Tools
▰Extensive and detailed diagnostic logging has helped immeasurably in problem isolation, debugging, and performance analysis, while incurring only a minimal cost
▰RPC requests and replies, etc.
MEASUREMENTS
Micro-benchmarks
Real World Clusters
Workload Breakdown
“A few micro-benchmarks to illustrate the bottlenecks inherent in the GFS architecture and implementation”
6.1 Micro-benchmarks
▰One master, two master replicas, 16 chunkservers, and 16 clients (2003)
▰All machines are configured with dual 1.4 GHz PIII processors, 2 GB of memory, two 80 GB 5400 rpm disks, and a 100 Mbps full-duplex Ethernet connection to an HP 2524 switch
6.2 Real World Clusters
6.3 Workload Breakdown
▰Methodology and Caveats
▰Chunkserver Workload
▰Appends versus Writes
▰Master Workload
CONCLUSIONS
▰Different from previous file systems
▰Satisfies the needs of its applications
▰Fault tolerant