Title: Large-Scale Sharing: GFS and PAST
1. Large-Scale Sharing: GFS and PAST
2. Distributed File Systems
- Traditional Definition
- Data and/or metadata stored at remote locations, accessed by clients over the network.
- Various degrees of centralization, from NFS to xFS.
- GFS and PAST
- Unconventional, specialized functionality
- Large-scale in data and nodes
3. The Google File System
- Specifically designed for Google's backend needs
- Web Spiders append to huge files
- Application data patterns
- Multiple producer multiple consumer
- Many-way merging
- GFS ≠ traditional file systems
4. Design Space Coordinates
- Commodity Components
- Very large files (multi-GB)
- Large sequential accesses
- Co-design of Applications and File System
- Supports small files and random-access reads and writes, but not efficiently
5. GFS Architecture
- Interface
- Usual: create, delete, open, close, etc.
- Special: snapshot and record append
- Files are divided into fixed-size chunks
- Each chunk is replicated across chunkservers
- A single master maintains all metadata
- Master, chunkservers, and clients run as user-level processes on Linux workstations
6. Client File Request
- Client translates the byte offset within the file into a chunk index (see the sketch below)
- Client sends <filename, chunk index> to the Master
- Master returns the chunk handle and chunkserver locations
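A minimal sketch of this lookup from the client side, assuming GFS's fixed 64 MB chunk size; locate_chunk and master.lookup are hypothetical names, not the real GFS interface:

    CHUNK_SIZE = 64 * 1024 * 1024   # GFS uses fixed-size 64 MB chunks

    def locate_chunk(master, filename, offset):
        # Translate the byte offset into a chunk index within the file.
        chunk_index = offset // CHUNK_SIZE
        # Ask the master for the chunk handle and the replica locations.
        # master.lookup is an assumed RPC stub.
        chunk_handle, chunkserver_locations = master.lookup(filename, chunk_index)
        return chunk_handle, chunkserver_locations

Clients cache this mapping, so the master is contacted roughly once per chunk rather than on every read or write.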
7. Design Choices: Master
- Single master maintains all metadata
- Simple Design
- Global decision making for chunk replication and placement
- Bottleneck?
- Single Point of Failure?
8. Design Choices: Master
- Single master maintains all metadata in memory! (sketch below)
- Fast master operations
- Allows background scans of the entire state
- Memory Limit?
- Fault Tolerance?
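A rough Python sketch of why keeping metadata in memory makes background scans cheap; the structures and helper below are illustrative, not the actual GFS layout:

    class MasterMetadata:
        def __init__(self):
            # File namespace: full pathname -> ordered list of chunk handles.
            self.file_to_chunks = {}
            # Chunk handle -> chunkserver locations; these are not persisted
            # but rebuilt by polling chunkservers when the master starts.
            self.chunk_locations = {}

        def under_replicated(self, goal=3):
            # With everything in memory, periodic full scans (e.g. to find
            # chunks that need re-replication) are cheap.
            return [h for h, locs in self.chunk_locations.items()
                    if len(locs) < goal]

The two questions on this slide are addressed in the paper by the small per-chunk metadata footprint and by a replicated, checkpointed operation log.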
9. Relaxed Consistency Model
- File regions can be:
- Consistent: all clients see the same data, no matter which replica they read
- Defined: after a mutation, all clients see exactly what the mutation wrote
- Ordering of concurrent mutations
- For each chunk's replica set, the Master grants one replica a primary lease
- The primary replica decides the order of mutations and sends it to the other replicas
10. Anatomy of a Mutation
- (1, 2) Client gets chunkserver locations from the master
- (3) Client pushes data to the replicas, in a chain
- (4) Client sends the write request to the primary; the primary assigns a sequence number to the write and applies it (sketch below)
- (5, 6) Primary tells the other replicas to apply the write; they acknowledge
- (7) Primary replies to the client
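A small sketch of the primary's part of this flow; PrimaryReplica, apply_locally and the secondaries' apply method are placeholders for the real chunkserver logic:

    class PrimaryReplica:
        def __init__(self, secondaries):
            self.secondaries = secondaries   # the other replicas of this chunk
            self.next_serial = 0             # mutation order for this chunk

        def apply_locally(self, data_id, serial):
            pass  # placeholder: write the already-pushed data in serial order

        def handle_write(self, data_id):
            # The data was already pushed to all replicas in a chain (step 3),
            # so only small control messages flow from here on.
            serial = self.next_serial        # step 4: choose the order
            self.next_serial += 1
            self.apply_locally(data_id, serial)
            acks = [s.apply(data_id, serial)           # steps 5 and 6
                    for s in self.secondaries]
            return "ok" if all(acks) else "error"      # step 7: reply to client

Decoupling the bulky data push (step 3) from the small control messages lets the data flow along a chain chosen for network topology, independent of which replica is primary.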
11. Connection with the Consistency Model
- A secondary replica encounters an error while applying the write (step 5): the region becomes Inconsistent.
- Client code breaks a single large write into multiple small writes, which can interleave with other clients' writes: the region is Consistent, but Undefined.
12. Special Functionality
- Atomic Record Append (sketch below)
- The primary appends the record to its own replica, then tells the other replicas to write at that same offset
- If a secondary replica fails to write the data (step 5):
- duplicates in the successful replicas, padding in the failed ones
- the region is defined where the append succeeded, inconsistent where it failed
- Snapshot
- Copy-on-write: chunks are copied lazily, on the same chunkserver that holds the original replica
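A toy model of the record-append semantics, with a simulated secondary failure to show where duplicates and padding come from; the Replica class is purely illustrative:

    class Replica:
        def __init__(self, fail=False):
            self.data = {}        # offset -> record
            self.fail = fail

        def write_at(self, offset, record):
            if self.fail:
                return False      # step-5 failure: this replica keeps padding here
            self.data[offset] = record
            return True

    def record_append(primary, secondaries, record, chunk_end):
        # The primary picks the offset (the current end of the chunk), writes
        # locally, then tells the secondaries to write at that same offset.
        offset = chunk_end
        primary.write_at(offset, record)
        ok = all(s.write_at(offset, record) for s in secondaries)
        # On failure the client retries, so replicas that succeeded may hold
        # the record twice (duplicates) while failed ones hold padding here.
        return offset if ok else None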
13. Master Internals
- Namespace management
- Replica Placement
- Chunk Creation, Re-replication, Rebalancing
- Garbage Collection
- Stale Replica Detection
14. Dealing with Faults
- High availability
- Fast master and chunkserver recovery
- Chunk replication
- Master state replication; read-only shadow masters
- Data Integrity (sketch below)
- Each chunk is broken into 64 KB blocks, each with a 32-bit checksum
- Checksums are kept in memory and logged to disk
- Checksumming is optimized for appends: the checksum of the last partial block is updated incrementally, without re-verifying it
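A minimal sketch of per-block checksumming, using CRC-32 as a stand-in for whatever 32-bit checksum GFS actually computes:

    import zlib

    BLOCK = 64 * 1024     # each chunk is checksummed in 64 KB blocks

    def block_checksums(chunk_bytes):
        # One 32-bit checksum per 64 KB block, kept in memory and logged.
        return [zlib.crc32(chunk_bytes[i:i + BLOCK])
                for i in range(0, len(chunk_bytes), BLOCK)]

    def verify_block(chunk_bytes, checksums, block_index):
        # A chunkserver verifies the covered blocks before returning data, so
        # corruption is never propagated to clients or other chunkservers.
        block = chunk_bytes[block_index * BLOCK:(block_index + 1) * BLOCK]
        return zlib.crc32(block) == checksums[block_index]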
15. Micro-benchmarks
16. Storage Data for Real Clusters
17. Performance
18. Workload Breakdown
- % of operations for a given size
- % of bytes transferred for a given operation size
19. GFS Conclusion
- Very application-specific: more engineering than research
20. PAST
- Internet-based P2P global storage utility
- Strong persistence
- High availability
- Scalability
- Security
- Not a conventional FS
- Files have unique id
- Clients can insert and retrieve files
- Files are immutable
21. PAST Operations
- Nodes have random, unique nodeIds
- No searching, directory lookup, or key distribution
- Supported operations (see the sketch below):
- Insert(name, key, k, file) → fileId
- Stores the file on the k nodes whose nodeIds are numerically closest to fileId
- Lookup(fileId) → file
- Reclaim(fileId, key)
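A sketch of how a fileId is derived and where a file lands, assuming (as in the PAST paper) a 160-bit SHA-1 hash over the file name, the owner's public key and a random salt; closest_nodes is a hypothetical helper standing in for Pastry routing:

    import hashlib

    def make_file_id(name, owner_pubkey, salt):
        # fileId = secure hash of the file name, the owner's public key and a salt.
        h = hashlib.sha1()
        h.update(name.encode())
        h.update(owner_pubkey)
        h.update(salt)
        return int.from_bytes(h.digest(), "big")   # 160-bit identifier

    def insert(file_id, file_bytes, k, closest_nodes):
        # Store the file on the k nodes whose nodeIds are numerically closest
        # to fileId; closest_nodes(file_id, k) is an assumed routing helper.
        for node in closest_nodes(file_id, k):
            node.store(file_id, file_bytes)
        return file_id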
22. Pastry
- P2P routing substrate
- route(key, msg) delivers the message to the node with nodeId numerically closest to key in fewer than log_{2^b}(N) steps
- Per-node state: (2^b - 1) * log_{2^b}(N) routing-table entries + 2l (see the calculation below)
- b controls the tradeoff between per-node state and lookup cost
- l controls failure tolerance: delivery is guaranteed unless l/2 nodes with adjacent nodeIds fail simultaneously
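To make the state/hops tradeoff concrete, a small calculation of the per-node state and routing bound exactly as stated on the slide:

    import math

    def pastry_state_entries(n_nodes, b, l):
        # Per-node state as on the slide: (2^b - 1) * log_{2^b}(N) routing-table
        # entries plus 2l for the leaf set.
        rows = math.ceil(math.log(n_nodes, 2 ** b))
        return (2 ** b - 1) * rows + 2 * l

    def routing_hops_bound(n_nodes, b):
        # Expected routing cost: fewer than log_{2^b}(N) hops.
        return math.ceil(math.log(n_nodes, 2 ** b))

    # With b = 4, 2250 nodes (the setup on a later slide) and an assumed l = 16:
    # pastry_state_entries(2250, 4, 16) -> 15 * 3 + 32 = 77 entries
    # routing_hops_bound(2250, 4)       -> 3 hops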
23. Routing Table of Node 10233102 (example)
- Leaf set: l/2 numerically larger and l/2 numerically smaller nodeIds
- Routing table entries, one row per shared-prefix length (see the routing sketch below)
- Neighborhood set: the M closest nodes by the proximity metric
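A simplified sketch of how one routing step uses this table, assuming nodeIds and keys are equal-length strings of base-2^b digits (hex digits for b = 4) and ignoring the leaf-set and neighborhood-set cases:

    def shared_prefix_len(a, b):
        # Number of leading digits the two ids have in common.
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n

    def next_hop(key, node_id, routing_table):
        # routing_table[r][d] holds a nodeId that shares r digits with node_id
        # and whose (r+1)-th digit is d, or None if no such node is known.
        r = shared_prefix_len(key, node_id)
        if r == len(node_id):
            return node_id                    # this node is responsible for the key
        entry = routing_table[r][int(key[r], 16)]
        return entry if entry is not None else node_id   # fallback; leaf set omitted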
24. PAST Operations / Security
- Insert
- A certificate is created with the fileId, a hash of the file content, and the replication factor, signed with the owner's private key (sketch below)
- The file and certificate are routed through Pastry
- The first node among the k closest accepts the file and forwards it to the other k-1
- Security: smartcards
- Hold public/private key pairs
- Generate and verify certificates
- Ensure integrity of nodeId and fileId assignments
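A sketch of the file certificate built during Insert; smartcard_sign stands in for the signature produced by the owner's smartcard, and the field layout is illustrative rather than the exact PAST format:

    import hashlib

    def make_file_certificate(file_id, content, k, smartcard_sign):
        # The certificate binds the fileId, a hash of the file content and the
        # replication factor k; the smartcard signs it with the private key.
        cert = {
            "fileId": file_id,
            "content_hash": hashlib.sha1(content).hexdigest(),
            "replicas": k,
        }
        cert["signature"] = smartcard_sign(repr(sorted(cert.items())).encode())
        return cert

Each of the k storing nodes can check the certificate against the owner's public key and the received content before accepting the replica.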
25. Storage Management
- Design Goals
- High global storage utilization
- Graceful degradation near max utilization
- PAST tries to
- Balance free storage space amongst nodes
- Maintain the invariant that replicas reside on the k nodes closest to the fileId
- Storage Load Imbalance
- Variance in number of files assigned to node
- Variance in size distribution of inserted files
- Variance in storage capacity of PAST nodes
26. Storage Management
- Large-capacity storage nodes are assigned multiple nodeIds
- Replica Diversion
- If node A cannot store a file, a leaf-set node B that is not among the k closest stores it instead, and A keeps a pointer to B
- What if A or B fails? A duplicate pointer is kept at the (k+1)-th closest node
- Policies for diverting and accepting replicas: t_pri and t_div thresholds on file size / free space (see the sketch below)
- File Diversion
- If the insert still fails, the client retries with a different fileId
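A minimal sketch of the threshold test, assuming the policy is "reject a file whose size divided by the node's free space exceeds t", with the stricter threshold t_div applied to diverted replicas; the default values follow the later slides:

    def accepts(file_size, free_space, diverted, t_pri=0.1, t_div=0.05):
        # A node rejects a file that is too large relative to its remaining
        # free space. Diverted replicas face the stricter threshold
        # (t_div < t_pri), so nodes are not filled up by other nodes' overflow.
        t = t_div if diverted else t_pri
        return free_space > 0 and (file_size / free_space) <= t

If the file is rejected even after replica diversion, the insert fails and the client falls back to file diversion with a new fileId.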
27. Storage Management
- Maintaining the replication invariant
- On node failures and joins
- Caching
- k-replication in PAST is for availability
- Extra copies are cached to reduce client latency and network traffic
- Unused disk space is used for the cache
- GreedyDual-Size replacement policy (sketch below)
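A compact sketch of GreedyDual-Size with unit cost, which captures the shape of the policy named above; PAST's exact cost function is not spelled out on the slide:

    def gds_value(size, inflation, cost=1.0):
        # GreedyDual-Size assigns each cached file H = L + cost/size, where L
        # is a running "inflation" value; a larger H means keep longer, so
        # small (cheap-to-refetch) files are favoured.
        return inflation + cost / size

    def evict_one(cache):
        # cache: file_id -> (size, H). Evict the entry with the lowest H; its H
        # becomes the new inflation value L for subsequently inserted files.
        victim = min(cache, key=lambda f: cache[f][1])
        size, h = cache.pop(victim)
        return victim, size, h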
28. Performance
- Workloads
- 8 web proxy logs
- Combined file systems
- k = 5, b = 4
- # of nodes: 2250
- Without replica and file diversion:
- 51.1% of insertions failed
- 60.8% global utilization
- 4 normal distributions of node storage sizes
29. Effect of Storage Management
30. Effect of t_pri
Lower t_pri: better utilization, but more insertion failures
(t_div = 0.05, t_pri varied)
31. Effect of t_div
Trend similar to t_pri
(t_pri = 0.1, t_div varied)
32. File and Replica Diversions
Ratio of replica diversions vs utilization
Ratio of file diversions vs utilization
33. Distribution of Insertion Failures
File system trace
Web logs trace
34. Caching
35. Conclusion