Title: Distributed File Systems (DFS)
Distributed File Systems (DFS)
- Problem: facilitate access to remote data
  - Uniform access to data from multiple, network-connected nodes
  - Aggregate the storage offered by multiple nodes
- The DFS is in charge of
  - Organization
  - Retrieval
  - Storage sharing
  - Naming
  - Protection
Distributed File System Goals
- Access transparency
  - Clients are unaware that files are remote
- Location transparency
  - Consistent name space (local and remote)
- Concurrency transparency
  - Modifications are coherent
- Failure transparency
  - Clients and client programs should operate correctly after a server failure
  - One client's failure should not impact the others
- Heterogeneity
  - File service should be provided across different hardware and software platforms
Distributed File System Goals (continued)
- Scalability
  - Scale from a few machines to many (tens of thousands?)
- Replication transparency
  - Clients are unaware of data replication
  - Coherence is maintained
- Migration transparency
  - Files should be able to move around without the clients' knowledge
- Fine-grained distribution of data
  - Locate objects near the processes that use them
A few terms
- File service
  - Specification of what the file system offers to clients
- File
  - Name, data, attributes
- Immutable file
  - Cannot be changed once created
  - Easier to cache and replicate
- Protection
  - Capabilities
  - Access control lists
File service types
- Upload/download model
  - Read file: copy the file from server to client
  - Write file: copy the file from client to server
  - Advantage
    - Simple
  - Problems
    - Wasteful: what if the client needs only a small piece?
    - Problematic: what if the client doesn't have enough space?
    - Consistency: what if others need to modify the same file?
File service types
- Remote access model
  - The file service provides a functional interface
    - create, delete, read bytes, write bytes, etc.
  - Advantages
    - The client gets only what's needed
    - The server can manage a coherent view of the file system
  - Problems
    - Possible server and network congestion
    - Servers are accessed for the duration of the file access
    - The same data may be requested repeatedly
  - (A sketch contrasting the two models follows this list.)
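To make the contrast concrete, here is a minimal C sketch of what the two client-side interfaces might look like. All names and signatures are hypothetical illustrations, not the API of any particular system.

    #include <stddef.h>

    /* Upload/download model: whole-file transfer in each direction. */
    int dfs_download(const char *remote_path, const char *local_path);
    int dfs_upload(const char *local_path, const char *remote_path);

    /* Remote access model: per-operation calls on the server, so the
       client can create or delete files and touch only the bytes it
       needs instead of shipping whole files. */
    typedef int dfs_handle;
    dfs_handle dfs_create(const char *path);
    int dfs_delete(const char *path);
    int dfs_read(dfs_handle h, long offset, void *buf, size_t len);
    int dfs_write(dfs_handle h, long offset, const void *buf, size_t len);

Note how every remote-access call carries an offset and a length: that is what lets the server return only the requested piece, at the price of being contacted for the entire duration of the access.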
File service types
- Data caching model
  - File access becomes local file access; the client caches a local copy
  - Advantage: reduces communication overhead
  - Problem: data consistency
File-Accessing Granularity

Transfer level | Merits | Problems
File | Simple, less communication overhead, immune to server crashes | Client must have large storage space
Block | Less storage space needed at the client | More network traffic/overhead
Byte | Flexibility is maximized | Difficult cache management for variable-length data
Record | Handles structured and indexed files | More network traffic; more overhead to reconstruct a file
File-Sharing Semantics
- Define when modifications of file data made by one user become observable by other users
- Sequential semantics (Unix)
- Session semantics
- Immutable shared-files semantics
- Transaction-like semantics
Sequential Semantics (Unix Semantics)
- A read returns the result of the last write
- Easily achieved if
  - There is only one server
  - Clients do not cache data
- BUT
  - Performance problems if there is no cache
  - We can use write-through caching and deal with obsolete data
    - Must notify clients holding copies
    - Requires extra state, generates extra traffic
Session Semantics
- Relax the rules
  - Changes to an open file are initially visible only to the process (or machine) that modified it
  - The last process to modify the file wins
Session Semantics
(Figure: a timeline of clients A, B, and C opening, appending to, and closing the same file on the server. Each client's appends remain private to its own session until Close(file); the last session to close determines the contents, m, that later opens observe.)
Other solutions
- Make files immutable
  - Aids in replication
  - Does not help with detecting modification
- Or...
- Use atomic transactions
  - Each file access is an atomic transaction
  - If multiple transactions start concurrently, the resulting modifications are serialized
File-Sharing Semantics: Immutable Shared-Files Semantics
(Figure: clients A and B each start a tentative copy based on version 1.0 of a file on the server. One client commits first, creating version 1.1; the other client's commit then raises a version conflict. The outcome depends on the file system: abort is simplest (client A can later decide to overwrite by pointing the corresponding directory entry at its tentative copy), or the two copies can be merged into version 1.2, or the conflict can simply be ignored.)
File usage patterns
- We can't have the best of all worlds
- Where to compromise?
  - Semantics vs. efficiency
  - Efficiency: client performance, network traffic, server load
  - Modified semantics break transparency, reduce functionality, etc.
- To help decide: understand how files are used
  - 1981 study by Satyanarayanan
File usage patterns
- Most files are <10 KB
  - (2005: the average size of the 385,341 files on a typical Mac was 197 KB)
  - (files accessed within 30 days: 147,398 files, average size 56.95 KB)
  - Feasible to transfer entire files (simpler)
  - Still have to support long files
- Most files have short lifetimes
  - Perhaps keep them local
- Few files are shared
  - An overstated problem
  - Session semantics will cause no problem most of the time
Design issues

Namespace: Location transparency
- Is the name of the server known to the client?
  - //server1/dir/file
  - The server can move without the client caring
    - ...if the name stays the same
    - If the file moves to server2, we have problems!
- Location independence
  - Files can be moved without changing the pathname
  - //archive/paul
Namespace: Where do you find the remote files?
- Should all machines have the exact same view of the directory hierarchy?
  - e.g., a global root directory?
    - //server/path
  - or forced remote directories?
    - /remote/server/path
- or...
- Should each machine have its own hierarchy, with remote resources located as needed?
  - /usr/local/games
Access: How do you access files?
- Requirement: access remote files as if they were local
  - The remote FS name space should be syntactically consistent with the local name space
- Option 1: redefine the way all files are named and provide a syntax for specifying remote files
  - e.g., //server/dir/file
  - Can cause legacy applications to fail
- Option 2: use a file-system mounting mechanism
  - Overlay portions of another FS name space over the local name space
Name resolution: how to handle ".."
- Parse
  - (a) one component at a time
  - (b) the entire path at once
- (b) is more efficient, but
  - offers less flexibility (e.g., naming as indirection)
- Perhaps use (a) and cache bindings to increase performance (see the sketch below)
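A minimal C sketch of option (a) with a binding cache. lookup_component() stands in for the round trip to whichever server owns the current directory; it, the node_id type, and the fixed-size cache are all hypothetical simplifications (no eviction, no overflow checks on long paths).

    #include <string.h>

    typedef int node_id;                 /* opaque directory/file id */
    extern node_id lookup_component(node_id dir, const char *name);

    #define CACHE_SIZE 64
    static struct { char prefix[256]; node_id node; } cache[CACHE_SIZE];
    static int cache_used;

    node_id resolve(const char *path, node_id root) {
        char prefix[256] = "";
        char copy[256];
        node_id cur = root;
        strncpy(copy, path, sizeof copy - 1);
        copy[sizeof copy - 1] = '\0';
        for (char *comp = strtok(copy, "/"); comp; comp = strtok(NULL, "/")) {
            strcat(prefix, "/");
            strcat(prefix, comp);
            int hit = 0;
            for (int i = 0; i < cache_used; i++)   /* check binding cache */
                if (strcmp(cache[i].prefix, prefix) == 0) {
                    cur = cache[i].node;
                    hit = 1;
                    break;
                }
            if (!hit) {
                cur = lookup_component(cur, comp); /* one round trip */
                if (cache_used < CACHE_SIZE) {     /* remember the binding */
                    strcpy(cache[cache_used].prefix, prefix);
                    cache[cache_used].node = cur;
                    cache_used++;
                }
            }
        }
        return cur;
    }

Repeated resolutions of the same prefix then skip the per-component round trips, which is how (a) can approach (b)'s efficiency while keeping per-component indirection.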
Stateful or stateless design?
- Stateful: the server maintains client-specific state
  - Shorter requests
  - Better performance in processing requests
  - Cache coherence is possible
  - The server can know who's accessing what
  - File locking is possible
Stateful or stateless design?
- Stateless: the server maintains no information on client accesses
  - Each request must identify the file and offsets (see the request-format sketch below)
  - The server can crash and recover
    - No state to lose
  - The client can crash and recover
  - No open/close calls needed
    - They only establish state
  - No server space used for state
    - No need to worry about supporting many clients (with low activity)
  - Problems with consistency
    - e.g., if the file is deleted on the server
  - File locking is not possible
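The difference shows up directly in the request formats. A sketch in C; the field names are illustrative (loosely NFS-flavored), not a real wire format:

    #include <stdint.h>

    /* Stateless design: every read carries everything the server needs. */
    struct stateless_read_req {
        uint8_t  fhandle[32];  /* self-describing file id, valid across server crashes */
        uint64_t offset;       /* the client tracks its own file position */
        uint32_t count;
    };

    /* Stateful design: the request is shorter because the server
       remembers which file is open and the current offset. */
    struct stateful_read_req {
        int32_t  open_id;      /* index into per-client state on the server */
        uint32_t count;        /* offset is implicit */
    };

If the server crashes, every pending stateless_read_req can simply be retried, while the open_id table behind stateful_read_req is lost and must be rebuilt.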
Caching
- Goal: hide latency to improve performance for repeated accesses
- Four places to keep the data
  - The server's disk
  - The server's buffer cache
  - The client's buffer cache
  - The client's disk
  - (the last two introduce cache-consistency problems!)
Approaches to caching
- Write-through
  - What if another client reads its own cached copy?
  - Consistency
    - All accesses will require checking with the server
    - Or the server maintains state and sends invalidations
  - Performance overheads
- Delayed writes (see the sketch below)
  - Write data can be buffered locally; overwriting cached data produces no additional overhead
  - Decide when to perform the writes (when the cache is full or periodically, and on close)
  - One bulk write is more efficient than lots of little writes
  - Problem: the semantics become ambiguous
Approaches to caching
- Write on close
  - Admit that we have session semantics
- Centralized control
  - Keep track of who has what open on each node
  - A stateful file system with signaling traffic
Striping

Cluster Architecture
(Figure: a cluster node with two processors, memory, two NICs, and a local disk, attached to the interconnect.)
- Each node has its own (small) disk
  - Used to store (i.e., copy) the executables and some data
- For many applications there needs to be a globally visible file system
  - Large shared input/output data files are too big for the local disks
Distributed File System?
- Question: how do we make files visible across a set of machines?
- Answer: use a distributed file system
  - Dedicate one of the nodes to be the server
  - Attach several (large) disks to it
  - e.g., NFS
Distributed File System?
- Question: how do we make files visible across a set of machines?
- Answer: use a distributed file system
  - Use a NAS (Network-Attached Storage)
    - Does the NFS thing in hardware
Distributed File System?
- Advantages
  - Simple and well understood
- Disadvantages
  - The file server can be a bottleneck
    - Especially for a cluster that runs many scientific applications at once
  - The intended usage is that a single process reads/writes a file at a time
    - But parallel applications would most likely prefer doing concurrent reads and concurrent writes
  - Often not built for top performance (NFS)
Parallel File System
- Improves on the drawbacks of distributed file systems
- Multiple disks
  - Each disk has its own I/O channel
  - Disks can be used simultaneously
- I/O is parallel at both ends
  - Multiple processes writing/reading
  - Multiple disks writing/reading
  - Not necessarily in matching numbers
Parallel File System
(Figure: compute nodes connected through the interconnect to I/O nodes with attached disks.)

Parallel File System
(Figure: the same architecture with the I/O nodes' disks placed on a Storage Area Network.)
Parallel File Systems
- A number of commercial parallel file systems exist
  - e.g., IBM's GPFS
- They use disk striping
  - Stripe factor: the number of disks
  - Stripe depth: the size of each block
(Figure: a file's consecutive blocks distributed round-robin across the disks.)
Striping
- Multiple physical disks + separate I/O channels + striping = parallel access to a single file
- Typically implements some form of RAID to combine striping with fault tolerance
  - e.g., RAID 5
- The file system needs to figure out where the blocks are located (see the sketch below)
  - Each I/O node maintains some directory
  - There is a global name service
- Concurrent writes: locking of blocks, not files!
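As a sketch of the block-location computation: with stripe factor N and stripe depth D, block k of a file sits on disk k mod N, and each disk stores every N-th block of the file. The numbers below are illustrative only.

    #include <stdio.h>

    /* Map a byte offset in a striped file to (disk, offset within that
       file's data on the disk). Real systems add directories and RAID
       parity on top of this arithmetic. */
    void locate(long file_offset, int stripe_factor, long stripe_depth,
                int *disk, long *offset_on_disk) {
        long block = file_offset / stripe_depth;
        *disk = (int)(block % stripe_factor);
        /* each disk holds every stripe_factor-th block of the file */
        *offset_on_disk = (block / stripe_factor) * stripe_depth
                          + file_offset % stripe_depth;
    }

    int main(void) {
        int disk;
        long off;
        locate(1000000, 4, 65536, &disk, &off);  /* 4 disks, 64 KB blocks */
        printf("disk %d, offset %ld\n", disk, off);
        return 0;
    }

Because consecutive blocks land on different disks, a single large request keeps all N disks busy at once, which is where the parallel-access speedup comes from.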
Application view: Parallel Applications and I/O
- Option 1: a single node does all the I/O
  - Amdahl's law (where B is the proportion of the program that is sequential) says that if your data is large, forget parallel speedup
- Option 2: before the application runs, split the input data and store it on the nodes' local disks, then gather the output at the end
  - Cumbersome
  - Storage may not be sufficient anyway
- Option 3: do parallel I/O with a parallel file system
  - Allows accessing non-contiguous pieces of data in parallel
    - e.g., interleaved pieces of a matrix for a cyclic data distribution
  - But the UNIX API is not convenient for writing parallel applications that access a parallel file system
    - No complex access patterns
    - No collective I/O
    - Different APIs make code non-portable
  - Solution: use MPI I/O (part of MPI-2)
Simple Example
(Figure: a file divided into equal contiguous chunks, one per process P0 through P4.)

    /* rank, nprocs, bufsize, nints, filesize, and buf declared elsewhere */
    MPI_File fh;
    MPI_Status status;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    bufsize = filesize / nprocs;
    nints = bufsize / sizeof(int);
    MPI_File_open(MPI_COMM_WORLD, "/pfs/data",
                  MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
    MPI_File_seek(fh, rank * bufsize, MPI_SEEK_SET);  /* each rank reads its own chunk */
    MPI_File_read(fh, buf, nints, MPI_INT, &status);
    MPI_File_close(&fh);
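MPI I/O also provides collective versions of these calls. For example, replacing the independent read above with its collective variant lets the MPI library merge the processes' requests into fewer, larger, better-aligned accesses:

    MPI_File_read_all(fh, buf, nints, MPI_INT, &status);  /* collective read */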
Striping Summary (from an application developer's viewpoint)
- If your application is stuck doing I/O for most of its time
  - Buy I/O hardware; do not use NFS but rather some parallel file system
  - Write code using MPI I/O
  - All processes should do the same amount of I/O
  - Make I/O requests as large as possible at a time to benefit from striping
- The performance benefits compared to the naive solution can be orders of magnitude
- Other striping solutions
  - Striping FTP server
Next
- Case study: FreeLoader
- Case study on data access patterns: small worlds and the data-sharing graph
Next classes
- Volunteers: discussion leader for Thursday
- Tuesday: DFS
  - Scale and Performance in a Distributed File System, J. H. Howard et al., ACM Transactions on Computer Systems, Feb. 1988, Vol. 6, No. 1, pp. 51-81
  - The Google File System, Ghemawat et al., SOSP 2003
- Thursday: Data replication
  - Efficient Replica Maintenance for Distributed Storage Systems, Byung-Gon Chun et al., NSDI 2006
  - Drafting Behind Akamai (Travelocity-Based Detouring), Ao-Jan Su et al., SIGCOMM 2006