GOOGLE FILE SYSTEM
Presented by
Junyoung Jung (2012104030)
Jaehong Jeong (2011104050)
Dept. of Computer Engineering
Kyung Hee Univ.
Big Data Programming (Prof. Lee Hae Joon), 2017-Fall
INTRODUCTION
This paper…
Background
Different Points in the Design Space
1.1 THIS PAPER…
The Google File System
- Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
- Google
- 19th ACM Symposium on Operating Systems Principles (SOSP), 2003
1.2 BACKGROUND
Previous distributed file systems pursued:
- Performance, Scalability, Reliability, Availability
GFS (Google File System) shares the same goals.
However, it reflects a marked departure from some earlier file system designs.
1.3 DIFFERENT POINTS IN THE DESIGN SPACE
1. Component failures are the norm rather than the exception.
2. Files are huge by traditional standards.
3. Most files are mutated by appending new data.
4. Applications and the file system API are co-designed.
5. Sustained bandwidth is more critical than low latency.
DESIGN
Interface
Architecture
2.1 INTERFACE
Familiar Interface
-create
-delete
-open
-close
-read
-write
Moreover…
-Snapshot
ㆍLow cost
ㆍMake a copy of a file/directory tree
ㆍDuplicate metadata
-Atomic Record append
ㆍAtomicity with multiple concurrent writes
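To make the API surface concrete, here is a minimal Python sketch of what a GFS-style client interface could look like; the class and method signatures are hypothetical, since the paper lists the operations but does not specify a concrete API.

```python
# Hypothetical GFS-style client API; names and signatures are illustrative.
class Handle:
    """Opaque per-file handle returned by open()."""

class GFSClient:
    # The familiar operations:
    def create(self, path: str) -> None: ...
    def delete(self, path: str) -> None: ...
    def open(self, path: str) -> Handle: ...
    def close(self, handle: Handle) -> None: ...
    def read(self, handle: Handle, offset: int, length: int) -> bytes: ...
    def write(self, handle: Handle, offset: int, data: bytes) -> None: ...

    # The GFS-specific additions:
    def snapshot(self, src: str, dst: str) -> None:
        """Cheaply copy a file or directory tree by duplicating metadata."""

    def record_append(self, handle: Handle, data: bytes) -> int:
        """Append data atomically at least once; GFS picks the offset
        and returns it to the caller."""
```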
2.2 ARCHITECTURE
Chunk
-Files are divided into fixed-size chunks
-64 MB: much larger than typical file system block sizes
Advantages of the large chunk size
-Reduces interaction between the client and the master
-A client can perform many operations on a given chunk
-Reduces the size of the metadata stored on the master
GFS chunkservers
-Store chunks on local disks as Linux files.
-Read/Write chunk data specified by a chunk handle & byte range
GFS master
-Maintains all file system metadata
ㆍNamespace
ㆍAccess-control information
ㆍChunk locations
ㆍ‘Lease’ management
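As a rough sketch of how this metadata could be organized in the master's memory (all names below are illustrative, not the paper's actual structures; the refcounts field anticipates snapshots, section 3.4):

```python
from dataclasses import dataclass, field

@dataclass
class ChunkInfo:
    version: int = 0                  # for stale-replica detection (section 4.5)
    replicas: list = field(default_factory=list)   # locations of this chunk's replicas
    primary: str = ""                 # replica currently holding the lease
    lease_expiration: float = 0.0     # when the primary's lease runs out

@dataclass
class FileInfo:
    acl: dict = field(default_factory=dict)            # access-control information
    chunk_handles: list = field(default_factory=list)  # file -> ordered chunk handles

@dataclass
class MasterState:
    namespace: dict = field(default_factory=dict)   # full pathname -> FileInfo
    chunks: dict = field(default_factory=dict)      # chunk handle -> ChunkInfo
    refcounts: dict = field(default_factory=dict)   # chunk handle -> reference count
                                                    # (used by snapshots, section 3.4)
```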
GFS client
-GFS client code is linked into each application
-It implements the file system API
-It communicates with the master & the chunkservers
Read flow (see the sketch below):
1. Using the fixed chunk size, the client translates the file name & byte offset into a chunk index, and sends the request to the master.
2. The master replies with the chunk handle & the locations of the chunkserver replicas (including which one is the ‘primary’).
3. The client caches this info, using the file name & chunk index as the key.
4. The client requests the data from the nearest chunkserver: “chunk handle & byte range within the chunk”.
5. No further interaction with the master is needed for the same 64 MB chunk until the cached info expires or the file is re-opened.
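A compact sketch of steps 1-5 on the client side; master.lookup, pick_nearest, and the chunkserver's read stub stand in for the real RPCs and network-distance logic, which the paper does not spell out at this level.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # fixed 64 MB chunks

def pick_nearest(replicas):
    return replicas[0]           # placeholder for "closest in the network topology"

def gfs_read(cache, master, filename, offset, length):
    chunk_index = offset // CHUNK_SIZE           # step 1: offset -> chunk index
    key = (filename, chunk_index)                # step 3: cache key
    if key not in cache:
        cache[key] = master.lookup(filename, chunk_index)   # step 2: ask the master
    handle, replicas = cache[key]
    server = pick_nearest(replicas)              # step 4: nearest chunkserver
    # step 5: as long as `key` stays cached, the master is not contacted again
    return server.read(handle, offset % CHUNK_SIZE, length)
```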
Decoupled flows
-Client ↔ master: metadata only
-Client ↔ chunkservers: data only
Metadata
-The master stores three types: the file & chunk namespaces, the file-to-chunk mapping, and the locations of each chunk's replicas
-All metadata is kept in memory
-The namespaces and the mapping are kept persistent through an operation log (chunk locations are instead polled from the chunkservers)
SYSTEM INTERACTIONS
Leases & Mutation Order
Data Flow
Atomic Appends
Snapshot
3.1 LEASES & MUTATION ORDER
Objective
-Ensure data are consistent & defined
-Minimize load on the master
The master grants a ‘lease’ to one replica
-That replica is called the ‘primary’ chunkserver
The primary defines a serial order for all mutations (sketched below)
-All secondaries follow this order
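A minimal sketch of this ordering, assuming (as the paper describes) that the primary assigns consecutive serial numbers to mutations and the secondaries apply them in that order; the apply plumbing is illustrative.

```python
import itertools

class Primary:
    """Replica holding the lease: it alone picks the mutation order."""
    def __init__(self, secondaries):
        self.secondaries = secondaries
        self.serials = itertools.count(1)

    def mutate(self, chunk_handle, data):
        n = next(self.serials)             # the primary assigns the serial number
        self.apply(chunk_handle, n, data)  # apply locally...
        for s in self.secondaries:         # ...then all secondaries follow the
            s.apply(chunk_handle, n, data) # exact same order

    def apply(self, chunk_handle, serial, data):
        pass  # write `data` to the chunk; `serial` fixes the relative order
```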
3.2 DATA FLOW
Data flow is decoupled from control flow: data is pushed linearly along a chain of chunkservers, pipelined over TCP connections, to fully use each machine's outbound bandwidth.
3.3 ATOMIC APPENDS
The client specifies only the data, not the offset
Similar to writes
-The mutation order is determined by the primary
-All secondaries use the same mutation order
GFS appends the data to the file at least once atomically, at an offset of GFS's choosing, which is returned to the client (see the sketch below)
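A sketch of the primary's append decision; the Chunk bookkeeping below is invented for illustration, and replication/retry details are omitted.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB

@dataclass
class Chunk:
    used: int = 0   # bytes already written
    buf: bytearray = field(default_factory=lambda: bytearray(CHUNK_SIZE))

def record_append(chunk: Chunk, record: bytes):
    """Primary-side decision: either the record fits, or pad and retry."""
    if chunk.used + len(record) > CHUNK_SIZE:
        chunk.used = CHUNK_SIZE          # pad the rest (secondaries do the same)
        return None                      # client must retry on the next chunk
    offset = chunk.used                  # GFS, not the client, picks the offset
    chunk.buf[offset:offset + len(record)] = record
    chunk.used += len(record)
    return offset                        # the same offset is used at every replica
```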
3.4 SNAPSHOT
Goals
-To quickly create branch copies of huge data sets
-To easily checkpoint the current state
Copy-on-write technique (sketched below)
-The metadata for the source file or directory tree is duplicated
-The reference counts of all affected chunks are incremented
-Chunks are copied lazily, at the first write
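A sketch of the copy-on-write steps, reusing the illustrative MasterState/FileInfo structures from section 2.2; clone_chunk is a hypothetical helper for the chunkserver-local copy.

```python
def snapshot(master, src, dst):
    src_info = master.namespace[src]
    # Duplicate only the metadata: both files now point at the same chunks.
    master.namespace[dst] = FileInfo(chunk_handles=list(src_info.chunk_handles))
    for h in src_info.chunk_handles:
        master.refcounts[h] = master.refcounts.get(h, 1) + 1

def write_chunk(master, path, index, data):
    h = master.namespace[path].chunk_handles[index]
    if master.refcounts.get(h, 1) > 1:     # chunk still shared after a snapshot?
        new_h = master.clone_chunk(h)      # hypothetical: copy the chunk now,
        master.refcounts[h] -= 1           # locally on each chunkserver
        master.refcounts[new_h] = 1
        master.namespace[path].chunk_handles[index] = new_h
        h = new_h
    # ...continue with the normal write path (section 3.1) to chunk `h`
```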
MASTER OPERATION
Namespace Management and Locking
Replica Placement
Creation, Re-replication, Rebalancing
Garbage Collection
Stale Replica Detection
“The master executes all namespace operations and manages chunk replicas throughout the system.”
4.1 Namespace Management and Locking
▰Namespaces are represented as a lookup table mapping full pathnames to metadata
▰Use locks over regions of the namespace to ensure proper serialization
▰Each master operation acquires a set of locks before it runs
▰Example of the locking mechanism
▻Preventing /home/user/foo from being created while /home/user is being snapshotted to /save/user
▻Snapshot operation acquires
 - Read locks on /home and /save
 - Write locks on /home/user and /save/user
▻File creation acquires
 - Read locks on /home and /home/user
 - A write lock on /home/user/foo
▻The two lock sets conflict on /home/user, so the operations are correctly serialized
▰A key property of this scheme is that it allows concurrent mutations in the same directory (see the sketch below)
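A small sketch of the lock-set computation (Python's standard library has no reader-writer lock, so only the lock sets are modeled here, not the locking itself):

```python
# Read locks on each ancestor directory, write lock on the leaf.
def ancestors(path: str) -> list:
    parts = path.strip("/").split("/")
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]

def lock_set(target: str):
    """(read-locked paths, write-locked paths) for one operation on `target`."""
    return ancestors(target), [target]

# Snapshot of /home/user to /save/user: union of both targets' lock sets
#   read:  /home, /save           write: /home/user, /save/user
# Creating /home/user/foo:
#   read:  /home, /home/user      write: /home/user/foo
# The snapshot's write lock and the creation's read lock collide on
# /home/user, so the two operations are serialized; two creations in the
# same directory only take read locks on it, so they can run concurrently.
print(lock_set("/home/user/foo"))  # (['/home', '/home/user'], ['/home/user/foo'])
```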
4.2 Replica Placement
▰The chunk replica placement policy serves two purposes:
▻Maximize data reliability and availability
▻Maximize network bandwidth utilization.
4.3 Creation, Re-replication, Rebalancing
▰Creation: the master considers
▻Disk space utilization
▻The number of recent creations on each chunkserver
▰Re-replication
▻Prioritized by how far a chunk is from its replication goal
▻The highest-priority chunk is cloned first, by copying the chunk data directly from an existing replica
▰Rebalancing
▻The master periodically moves replicas for better disk space and load balance
4.4 Garbage Collection
▰Deleted files
▻The deletion operation is logged
▻The file is renamed to a hidden name; it may be permanently removed later, or recovered until then (see the sketch after this list)
▰Orphaned chunks (unreachable chunks)
▻Identified and removed during a regular scan of the chunk namespace
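A sketch of the lazy-deletion path; master.log and the hidden-name format are illustrative, and the three-day grace period is the paper's stated default.

```python
import time

HIDDEN_PREFIX = ".deleted."      # illustrative; the paper only says "hidden name"
GRACE_PERIOD = 3 * 24 * 3600     # the paper's default: reclaimed after ~3 days

def delete_file(master, path):
    master.log("delete", path)                    # the deletion is logged first
    hidden = "%s%d%s" % (HIDDEN_PREFIX, int(time.time()), path)
    master.namespace[hidden] = master.namespace.pop(path)   # rename, don't reclaim

def namespace_scan(master, now):
    for name in list(master.namespace):
        if not name.startswith(HIDDEN_PREFIX):
            continue
        deleted_at = int(name[len(HIDDEN_PREFIX):].split("/", 1)[0])
        if now - deleted_at > GRACE_PERIOD:       # grace period over:
            del master.namespace[name]            # drop the metadata; the chunks
                                                  # become orphans and are removed
                                                  # by the chunk-namespace scan
```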
4.5 Stale Replica Detection
▰Stale replicas are detected with chunk version numbers
▻The master increases a chunk's version number whenever it grants a new lease on it
▻The client or the chunkserver verifies the version number when it performs an operation, so that it is always accessing up-to-date data (sketched below)
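A sketch of the version-number mechanism, reusing the illustrative ChunkInfo from section 2.2 and treating each replica entry as an object with a hypothetical set_version stub.

```python
def grant_lease(master, handle, new_primary):
    info = master.chunks[handle]          # ChunkInfo from the section 2.2 sketch
    info.version += 1                     # new version for this round of mutations
    for replica in info.replicas:
        replica.set_version(handle, info.version)  # live replicas record it;
    info.primary = new_primary                     # a crashed one misses the bump

def is_stale(master, handle, reported_version):
    """A replica reporting an old version missed some mutations."""
    return reported_version < master.chunks[handle].version
```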
FAULT TOLERANCE AND DIAGNOSIS
High Availability
Data Integrity
Diagnostic Tools
“We cannot completely trust the machines, nor can we completely trust the disks.”
5.1 High Availability
▰Fast Recovery
▻Operation log and checkpointing
▰Chunk Replication
▻Each chunk is replicated on multiple chunkservers on different racks
▰Master Replication
▻The operation log and checkpoints are replicated on multiple machines
5.2 Data Integrity
▰Checksumming to detect corruption of stored data
▰Each chunkserver independently verifies the integrity of its own copies (see the sketch below)
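A sketch of the read-side check, using the paper's parameters (64 KB blocks, 32-bit checksums) with CRC-32 as an illustrative checksum function.

```python
import zlib

BLOCK = 64 * 1024   # the paper: each chunk is divided into 64 KB blocks,
                    # each protected by its own 32-bit checksum

def checksum(block: bytes) -> int:
    return zlib.crc32(block)    # CRC-32 as an illustrative 32-bit checksum

def read_block(blocks, checksums, i):
    """Chunkserver-side read path: verify before returning data."""
    data = blocks[i]
    if checksum(data) != checksums[i]:
        # Report the mismatch to the master and return an error; the
        # client then reads the block from another replica.
        raise IOError(f"checksum mismatch in block {i}")
    return data
```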
5.3 Diagnostic Tools
▰Extensive and detailed diagnostic logging has helped immeasurably in problem isolation, debugging, and performance analysis, while incurring only a minimal cost
▰RPC requests and replies, etc.
MEASUREMENTS
Micro-benchmarks
Real World Clusters
Workload Breakdown
“A few micro-benchmarks to illustrate the bottlenecks inherent in the GFS architecture and implementation”
6.1 Micro-benchmarks
▰One master, two master replicas, 16 chunkservers, and 16 clients (2003)
▰All machines are configured with dual 1.4 GHz PIII processors, 2 GB of memory, two 80 GB 5400 rpm disks, and a 100 Mbps full-duplex Ethernet connection to an HP 2524 switch
6.2 Real World Clusters
6.3 Workload Breakdown
▰Methodology and Caveats
▰Chunkserver Workload
▰Appends versus Writes
▰Master Workload
CONCLUSIONS
▰Different from previous file systems
▰Satisfies the needs of its applications
▰Fault tolerant