Title: Distributed File Systems
Slide 1: Distributed File Systems
- Synchronization (11.5)
- Consistency and Replication (11.6)
- Fault Tolerance (11.7)
Slide 2: 11.5 Synchronization
- File System Semantics
- File Locking
Slide 3: Synchronization
- Synchronization is an issue only if files are shared.
- Sharing in a distributed system is often necessary, but it can affect performance in various ways.
- In the following discussion we assume that file sharing takes place in the absence of process-implemented synchronization operations such as mutual exclusion.
Slide 4: UNIX File Semantics
- In a single-processor system, any file read operation returns the result of the most recent write operation.
- Even if two writes occur very close together, the next read returns the result of the last write.
- It is as if all reads and writes are time-stamped from the same clock. Operation order is based on strict time ordering.
Slide 5: UNIX Semantics in DFS
- Possible to (almost) achieve if:
  - There is only one server
  - There is no caching at the client
- In this case every read and write goes directly to the server, which processes them in sequential order.
- Network delays might make minor differences relative to wall-clock ordering.
Slide 6: Caching and UNIX Semantics
- A single server with no client caching leads to poor performance, so most file systems allow clients to make local copies of files (or file blocks) that are currently in use.
- Now UNIX semantics are problematic: a write executed only on a local copy will not be seen by another client that reads the file from the server, or by other clients that have the file cached.
Slide 7: Write-Through
- A possible solution is to require all changes to local copies to be written immediately to the server.
- Inefficient: caching is no longer as useful.
- Not a total solution: what happens when two users have the same file cached?
Slide 8: Consistency Models
- Recall the discussion of consistency models in Chapter 7.
- Realistically, strict consistency or even sequential consistency can't be easily achieved without synchronization techniques such as transactions or locks.
- Here we consider what the file system can do in the absence of user-enabled methods.
Slide 9: Session Semantics
- Instead of trying to implement UNIX semantics where it really is impractical, define a new semantics:
  - Local changes to a file are not made permanent until the file is closed. If another user opens the file in the meantime, it gets the original version.
- This approach is common in DFSs.
- In effect, this turns a remote-access model into an upload-download model.
Slide 10: Simultaneous Caching
- What if two users concurrently cache and modify the same file? How do we determine the new state of the file?
- Possibilities:
  - The most recently closed file becomes the new official version (most common)
  - The decision is unspecified (an unlikely choice)
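The "last close wins" rule can be illustrated with a minimal sketch. The class and method names below (`SessionFileServer`, `Session`) are hypothetical and exist only for illustration; this is a toy in-memory model, not the behavior of any particular DFS.

```python
class SessionFileServer:
    """Toy server that holds the official contents of each file."""

    def __init__(self):
        self.files = {}

    def open(self, name):
        # Download: the client receives a private copy of the official version.
        return Session(self, name, self.files.get(name, ""))


class Session:
    """A client's open file under session semantics."""

    def __init__(self, server, name, contents):
        self.server = server
        self.name = name
        self.contents = contents  # local cached copy

    def write(self, text):
        # Changes stay local until close().
        self.contents += text

    def close(self):
        # Upload: the closing copy replaces the official version.
        self.server.files[self.name] = self.contents


server = SessionFileServer()
server.files["f"] = "v0"

a = server.open("f")
b = server.open("f")
a.write("+A")
b.write("+B")
c = server.open("f")  # opened before any close: sees the original version
a.close()
b.close()             # last close wins; A's change is lost
```

After both closes the official version contains only B's change, which is why concurrent writers under session semantics need some extra coordination if no update may be lost.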
Slide 11: Immutable Files
- The only operations on a file are, effectively, create, read, and replace.
- Once a file is created it can be read but not changed.
- A new file (incorporating changes to a current file) can be created and placed in the directory instead of the original version.
- If several users try to replace an existing file at the same time, one is chosen: either the last to close, or non-deterministically.
Slide 12: Review: File System Semantics
- UNIX semantics: every file operation is instantly visible to all processes.
- Session semantics: no changes are visible until the file is closed.
- Immutable files: no updates are possible; files can only be replaced.
Slide 13: Transaction Semantics
- Transactions are a way of grouping several file operations together and ensuring that they are either all executed or none is executed.
- We say they are atomic.
- The transaction system is responsible for ensuring that all of the operations are carried out in order, without any interference from concurrent transactions.
Slide 14: The Transaction Model
- Transaction: a set of operations which must be executed entirely, or not at all.
- Processes in a transaction can fail at random.
  - Failure causes: hardware or software problems, network problems, lost messages, etc.
- Transactions will either commit or abort.
  - Commit => successful completion (All)
  - Abort => partial results are undone (Nothing)
Slide 15: Transaction Model
- Transactions are delimited by two special primitives:
  - Begin_transaction // or something similar
  - transaction operations (read, write, open, close, etc.)
  - End_transaction
- If the transaction successfully reaches the end statement, it commits and all changes become permanent; otherwise it aborts.
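A minimal sketch of the begin/end bracket, assuming an in-memory dictionary stands in for the file store. The `Transaction` class and the `transfer` helper are hypothetical names chosen for illustration; a real system would also need logging for durability and concurrency control for isolation, both omitted here.

```python
class Transaction:
    """Context manager: entering is Begin_transaction, leaving is End_transaction."""

    def __init__(self, store):
        self.store = store

    def __enter__(self):
        # Operations run against a private working copy.
        self.working = dict(self.store)
        return self.working

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Commit: publish all changes (not truly atomic in this toy model).
            self.store.clear()
            self.store.update(self.working)
        # Abort: the working copy is simply discarded.
        return False  # let any exception propagate to the caller


def transfer(store, src, dst, amount):
    with Transaction(store) as t:
        t[src] -= amount
        if t[src] < 0:
            raise ValueError("insufficient funds")
        t[dst] += amount


accounts = {"alice": 100, "bob": 50}
transfer(accounts, "alice", "bob", 30)       # commits
try:
    transfer(accounts, "bob", "alice", 500)  # aborts: nothing is changed
except ValueError:
    pass
```

The aborted transfer leaves both balances as they were after the first commit, so the total amount of money in the system is unchanged.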
Slide 16: ACID Properties of Transactions
- Atomic: either all or none of the operations in a transaction are performed.
- Consistent: the transaction doesn't violate system invariants, e.g., no money is lost in a banking system.
- Isolated (serializable): one transaction can't affect others until it completes.
- Durable: changes made by a committed transaction are permanent, even if the process or server fails.
Slide 17: Atomicity
- An atomic action is one that appears to be indivisible and instantaneous to the rest of the system, for example a machine language instruction.
- Transactions support the execution of multiple instructions as if they were a single atomic instruction.
Slide 18: Consistent
- A state is consistent if invariants hold.
- An invariant is a predicate which states a condition that must be true.
- Invariants for the airline ticket example:
  - seatsLeft = seatsTotal - seatsSold
  - seatsLeft >= 0
- In the bank case (simplified):
  - balance_final = balance_original - withdrawals + deposits
Slide 19: Isolated
- No other transaction will see the intermediate results of a transaction.
- Concurrent transactions have the same effect on the database as if they had run serially. Notice the similarity to critical sections, which do run serially.
- This characteristic is enforced through special concurrency control measures.
Slide 20: AD Properties
- ACID is a commonly used term, but somewhat redundant.
- Transactions that execute atomically will be consistent and isolated.
- Atomicity and durability capture the essential qualities.
Slide 21: Semantics of File Sharing in Distributed Systems
- UNIX semantics: every file operation is instantly visible to all processes.
- Session semantics: no changes are visible until the file is closed.
- Immutable files: no updates are possible; files can only be replaced.
- Transactions: all changes occur and are visible atomically, or not at all.
Slide 22: File Locking
- UNIX file semantics are impractical in a DFS.
- Session semantics and immutable files do not always support the kind of sharing processes need.
- Transactions have a heavy overhead.
- Thus some additional form of locking is desirable to enforce mutual exclusion on writes.
Slide 23: File Locking in NFSv4
- Lock managers in NFS, as in other file systems, are based on the centralized scheme discussed in Chapter 6:
  - Client requests lock
  - Lock manager grants lock
  - Client releases lock (or it expires after a time)
- In NFS, if a client requests a lock which cannot be granted, the client is not blocked; it must try again later.
Slide 24: Denied Requests
- If a client's request for a lock is denied, it receives an error message.
  - The client can poll the server later for lock availability.
- Clients can request to be put on a FIFO queue: when a lock is released it is reserved for the first process on the queue; if that process polls within a certain amount of time, it gets the lock.
Slide 25: File Locking in NFS
- Two types of locks:
  - Reader locks, which can be held simultaneously
  - Writer locks, which guarantee exclusive access
- The lock operation is applied to a range of consecutive bytes in the file, rather than to the whole file.
Slide 26: NFSv4 Lock-Related Operations
Operation: Description
- Lock: Create a lock for a range of bytes
- Lockt: Test whether a conflicting lock has been granted
- Locku: Remove a lock from a range of bytes
- Renew: Renew the lease on a lock
Slide 27: Leases
- Locks are granted for a specific time interval.
- At the end of that interval the lock is removed
unless the client has requested an extension.
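The four lock operations and the lease idea can be sketched together. This is a toy, single-file, in-memory model under stated assumptions: the `LockManager` class, its method signatures, and the lease length are all hypothetical and do not reproduce the NFSv4 wire protocol. The methods only borrow the operation names from the previous slide.

```python
import time


class LockManager:
    """Toy centralized lock manager for byte ranges of one file."""

    def __init__(self, lease_seconds=30.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.locks = []  # dicts: client, start, end, exclusive, expires

    def _live_locks(self):
        # Drop locks whose lease has run out.
        now = self.clock()
        self.locks = [l for l in self.locks if l["expires"] > now]
        return self.locks

    def lockt(self, client, start, length, exclusive):
        """Return a conflicting lock held by another client, or None."""
        end = start + length
        for held in self._live_locks():
            overlaps = start < held["end"] and held["start"] < end
            conflicting = exclusive or held["exclusive"]  # readers share
            if overlaps and conflicting and held["client"] != client:
                return held
        return None

    def lock(self, client, start, length, exclusive):
        """Grant the lock or return False at once; the caller is never blocked."""
        if self.lockt(client, start, length, exclusive) is not None:
            return False
        self.locks.append({"client": client, "start": start,
                           "end": start + length, "exclusive": exclusive,
                           "expires": self.clock() + self.lease_seconds})
        return True

    def locku(self, client, start, length):
        key = (client, start, start + length)
        self.locks = [l for l in self.locks
                      if (l["client"], l["start"], l["end"]) != key]

    def renew(self, client):
        new_expiry = self.clock() + self.lease_seconds
        for held in self._live_locks():
            if held["client"] == client:
                held["expires"] = new_expiry


now = [0.0]  # fake clock so the example does not need to sleep
mgr = LockManager(lease_seconds=10.0, clock=lambda: now[0])
shared_1 = mgr.lock("c1", 0, 100, exclusive=False)
shared_2 = mgr.lock("c2", 50, 100, exclusive=False)  # reader locks coexist
denied = mgr.lock("c3", 80, 10, exclusive=True)      # writer must retry later
now[0] = 5.0
mgr.renew("c1")                                      # c1 extends its lease
now[0] = 11.0                                        # c2's lease has expired
still_denied = mgr.lock("c3", 80, 10, exclusive=True)
mgr.locku("c1", 0, 100)
granted = mgr.lock("c3", 80, 10, exclusive=True)
```

Client c2 never released its lock, yet the writer eventually gets in: the lease removes the lock once c2 stops renewing it.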
Slide 28: Share Reservations in NFS
- An open request specifies the kind of access the application requires: READ, WRITE, or BOTH.
- It also specifies the kind of access that should be denied to other clients: NONE, READ, WRITE, or BOTH.
- If the requirements can't be met, the open fails.
- Share reservations = implicit locking
Slide 29: Share Reservations - Example
- A client tries to open a file for reading and writing, and to deny concurrent write access.
- If no other client has the file open, the request succeeds.
- If another client has opened the file for reading, the request succeeds.
- If another client has opened the file for writing, the request fails.
- If another client has the file open and has denied read or write access, the request fails.
Slide 30: 11.6 Consistency and Replication
- Client-Side Caching
- Server-Side Replication
- Replication in P2P Systems
Slide 31: Introduction
- Replication (and caching) => multiple copies of something
- Two reasons for replication:
  - Reliability (protection against failure, corruption)
  - Performance (size of user base, geographical extent of system)
- Replication can cause inconsistency: at least one copy is different from the rest.
Slide 32: Caching in a DFS
- Caching in any DFS reduces access delays due to disk access times or network latency.
- Caches can be located in the main memory of either the server or the client, and/or on the disk of the client.
- Client-side caching (memory or disk) offers the most benefits, but also leads to potential inconsistencies.
Slide 33: Cache Consistency Measures
- Server-initiated consistency: the server notifies a client if its data becomes stale
  - e.g., another client closes its copy of the file, which was opened for writing.
- Client-initiated consistency: the client is responsible for the consistency of its data
  - e.g., client-side software can periodically check with the server to see if the file has been modified.
Slide 34: Caching in NFS
- NFSv3 did not define a caching protocol.
- Different implementations led to different results.
- Stale data (data that doesn't agree with the data at the server) could exist for periods ranging from a few seconds to half a minute.
Slide 35: Cache Consistency Problem
- How can stale data (relative to the server) be avoided?
- NFSv4 does not improve the system enormously, but there are some changes.
- Many details are still implementation dependent.
- General structure: next slide
Slide 36: Client-Side Caching in NFS (Figure 11-21)
[Figure: a client application with a memory cache and a disk cache, connected over the network to the NFS server]
Slide 37: What Do Clients Cache?
- File data blocks
- File handles for future reference
- Directories
Slide 38: Caching File Data
- The simplest approach to caching allows the server to retain control over the file.
- Procedure:
  - Client opens file
  - Data blocks are transferred to the client (by read ops)
  - Client can read and write data in the cache
  - When the file closes, flush changes back to the server
- Session semantics in NFS: the last (most recent) process to close a file has its changes become permanent. Changes made by processes that run concurrently are lost.
Slide 39: Caching with Server Control
- In caching with server control:
  - All clients on a single machine may read and write the same cached data if they have access rights.
  - Data remaining in the cache after a file closes doesn't need to be removed, although changes must be sent to the server.
  - If a new client on the same machine opens a file after it has been closed, the client cache manager usually must validate the locally cached data with the server.
  - If the data is stale, replace it.
Slide 40: Caching with Open Delegation
- Allows a client machine to handle some local open and close operations from other clients on the same machine.
- Normally the server decides if a client can open a file.
- Delegation can improve performance by limiting contact with the server.
- The client machine gets a copy of the entire file, not just certain blocks.
Slide 41: Open Delegation Examples
- Suppose a client machine has opened a file for writing, and has been delegated rights to control the file locally.
  - If another local client tries to lock the file, the local machine can decide whether or not to grant the lock.
  - If a remote client tries to lock the file (at the server), the server will deny file access.
- If a client has opened the file for reading only, local clients desiring write privileges must still contact the server.
Slide 42: Delegation and Callbacks
- The server may need to undelegate the file, perhaps when another client needs to obtain access.
- This can be done with a callback, which is essentially an RPC from server to client.
- Callbacks require the server to maintain state (knowledge) about clients: a reason for NFS to be stateful.
Slide 43: Caching Attributes
- Clients can cache attributes as well as data (size of file, number of links, last date modified, etc.).
- Cached attributes are kept consistent by the client, if at all.
- There is no guarantee that the same file cached at two sites will have the same attributes at both sites.
- Attribute modifications should be written through to the server (write-through cache coherence policy), although there's no requirement to do so.
Slide 44: Leases
- Lease: cached data is automatically invalidated after a certain period of time.
- Applies to file attributes, file handles (mapping of name to file handle), directories, and sometimes data.
- When the lease expires, the client must renew the data from the server.
- Helps with consistency.
Slide 45: An Implementation of Leases
- Data blocks have time-stamps applied by the server that indicate when they were last modified.
- When a block is cached at a client, the server's time-stamp is also cached.
- After a period of time, the client confirms the validity of the data:
  - Compare the time-stamp at the client to the time-stamp at the server.
  - If the server time-stamp is more recent, invalidate the client data.
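The validation steps can be sketched as follows. `Server`, `ClientCache`, the logical clock, and the lease length are all assumptions made for this toy model; time is passed in explicitly as `now` so the example is deterministic.

```python
class Server:
    """Toy block server that stamps each block with its last-modified time."""

    def __init__(self):
        self.blocks = {}  # block_id -> (data, modified_at)
        self.clock = 0    # logical clock, advanced on every write

    def write(self, block_id, data):
        self.clock += 1
        self.blocks[block_id] = (data, self.clock)

    def read(self, block_id):
        return self.blocks[block_id]

    def modified_at(self, block_id):
        return self.blocks[block_id][1]


class ClientCache:
    def __init__(self, server, lease=3):
        self.server = server
        self.lease = lease
        self.entries = {}  # block_id -> (data, server_timestamp, checked_at)
        self.fetches = 0

    def read(self, block_id, now):
        entry = self.entries.get(block_id)
        if entry is not None:
            data, stamp, checked_at = entry
            if now - checked_at < self.lease:
                return data  # lease still running: no server contact
            if self.server.modified_at(block_id) <= stamp:
                # Server copy is not newer: renew the lease on the cached data.
                self.entries[block_id] = (data, stamp, now)
                return data
        # Not cached, or the server time-stamp is more recent: refetch.
        data, stamp = self.server.read(block_id)
        self.fetches += 1
        self.entries[block_id] = (data, stamp, now)
        return data


server = Server()
cache = ClientCache(server, lease=3)
server.write("b", "old")
first = cache.read("b", now=0)
server.write("b", "new")                  # another client updates the block
within_lease = cache.read("b", now=1)     # stale data is still served
after_lease = cache.read("b", now=5)      # validation detects the newer copy
revalidated = cache.read("b", now=10)     # unchanged at server: no refetch
```

The read at `now=1` returns stale data, which shows the trade-off: a longer lease means less server traffic but a longer window in which stale data can be seen.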
Slide 46: Coda: A Prototype Distributed File System
- Developed at CMU by M. Satyanarayanan
- Started in 1987 as an improvement on the Andrew File System (a classic research FS)
- Most recent version of Coda (6.9.3) was released 1/11/2008 (http://www.coda.cs.cmu.edu/news.html)
Slide 47: Objectives of Coda
- Support disconnected operation (server goes down, laptop is disconnected from network, etc.)
- Client-side caching is extensive
  - Uses client disk cache
- Replication contributes to availability, fault tolerance, scalability
Slide 48: Caching in Coda
- Critical, because of Coda's objectives
- Caching achieves scalability and provides more fault tolerance for the client in case it is disconnected from the server.
- When a client opens a file, the entire file is downloaded. This is true for reads and writes.
Slide 49: Concurrent Access
- In Coda, many clients may have a file open for reading, but only one for writing.
- Multiple readers and a single writer may exist concurrently.
- In NFS and most other file systems, multiple readers and multiple writers can exist concurrently.
Slide 50: Callbacks / Server-Initiated Cache Consistency
- A Coda callback is an agreement between the server and a client: the server agrees to notify the client when a file has been modified by another client.
- At this time, the client may purge the file from its cache, but it may also continue reading the outdated copy.
- This is a blend of session and transaction semantics.
Slide 51: Coda Callbacks
- Callback promise: the server's commitment to notify the client when the file changes
- Callback break: notice from the server that the client's file is stale; called a break because it terminates the agreement. There will be no further callbacks unless the client renews the promise.
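A minimal sketch of callback promises and breaks, assuming whole-file caching and a single server. `CodaServer`, `CodaClient`, and their methods are hypothetical names for illustration and are not Coda's actual interfaces.

```python
class CodaServer:
    def __init__(self):
        self.files = {}
        self.promises = {}  # file name -> set of clients holding a promise

    def fetch(self, client, name):
        # Handing out a copy also records a callback promise for the client.
        self.promises.setdefault(name, set()).add(client)
        return self.files.get(name, "")

    def store(self, writer, name, data):
        self.files[name] = data
        holders = self.promises.get(name, set())
        for client in holders - {writer}:
            client.callback_break(name)  # notify, then forget the promise
        self.promises[name] = holders & {writer}


class CodaClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.valid = set()  # files with an outstanding callback promise

    def open(self, name):
        if name not in self.valid:
            self.cache[name] = self.server.fetch(self, name)
            self.valid.add(name)
        return self.cache[name]

    def close_modified(self, name, data):
        self.cache[name] = data
        self.server.store(self, name, data)

    def callback_break(self, name):
        # The stale copy stays in the cache, but the next open refetches.
        self.valid.discard(name)


server = CodaServer()
server.files["f"] = "v1"
c1, c2 = CodaClient(server), CodaClient(server)
c1.open("f")
c2.open("f")
c1.close_modified("f", "v2")   # server breaks c2's callback promise
stale_copy = c2.cache["f"]     # an already-open session may still read this
fresh_copy = c2.open("f")      # a new open fetches the current version
```

Once the promise is broken the server sends c2 nothing further about the file until c2 opens it again, which re-establishes the promise.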
Slide 52: Figure 11-23, page 523
- Local copies of files can be used as long as the client still has an outstanding callback promise, that is, as long as no other client has closed a modified file.
Slide 53: [Diagram: client 1 with cache, server, client 2 with cache]
Suppose clients 1 and 2 have cached the same file. Client 1 modifies the file. How and when does client 2 know? What role, if any, does the server have? Are Coda and NFS different?
Slide 54: 11.6.2 Server-Side Replication
- Caching is a form of replication at the client side.
  - Initiated by client request
  - Cached information is temporary
  - Unit of caching: a file, or less (usually)
  - Purpose: improved performance
- Server replication
  - Mainly for fault tolerance and availability
  - May actually degrade performance (overhead)
  - Less common than caching in DFS
Slide 55: Caching and Replication in Coda
- Unit of replication: volume (group of related files)
- Each volume is stored on several servers, its Volume Storage Group (VSG).
- The Available Volume Storage Group (AVSG) is the set of servers in the VSG that a client can actually reach.
- Contact one server to get permission to R/W; contact all reachable servers when closing an updated file.
Slide 56: Figure 11-24. Two clients with a different AVSG for the same file
[Figure: servers S1, S2, and S3 separated by a broken network; client A and client B each perform Open(f)]
Slide 57: Writing in Disconnected Systems
- Each file has a Coda version vector (CVV), analogous to vector timestamps, with one component per server. It starts at (1, 1, 1).
- Update the local component after a file is updated.
- As long as all servers get all updates, all timestamps will be equal.
Slide 58: Detecting Inconsistencies
- In the previous example, both A and B will be allowed to open a file for writing.
- When A closes, it will update S1 and S2, but not S3; B will update S3, but not S1 and S2.
- The timestamp at S1 and S2 will be (2, 2, 1).
- The timestamp at S3 will be (1, 1, 2).
- It is easy to detect the inconsistency, but knowing how to resolve it is application dependent.
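The detection step amounts to a component-wise comparison of version vectors. The sketch below replays the example with A reaching S1 and S2 and B reaching only S3; the function names `apply_update` and `compare` are illustrative, and the update rule (each reached server increments the components of all servers reached by that update) is an assumption chosen to reproduce the vectors on the slide.

```python
SERVERS = ["S1", "S2", "S3"]


def apply_update(cvvs, avsg):
    """Record one file update that reached exactly the servers in avsg."""
    for server in avsg:
        for member in avsg:
            cvvs[server][member] += 1


def compare(cvv_a, cvv_b):
    """Classify two version vectors given as equal-length tuples."""
    a_ge = all(x >= y for x, y in zip(cvv_a, cvv_b))
    b_ge = all(y >= x for x, y in zip(cvv_a, cvv_b))
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a dominates"  # a has seen every update b has seen
    if b_ge:
        return "b dominates"
    return "conflict"         # each side saw an update the other missed


def as_tuple(cvv):
    return tuple(cvv[s] for s in SERVERS)


# Every server starts with the vector (1, 1, 1).
cvvs = {s: {t: 1 for t in SERVERS} for s in SERVERS}
apply_update(cvvs, ["S1", "S2"])  # client A closes the file
apply_update(cvvs, ["S3"])        # client B closes the file

s1, s2, s3 = (as_tuple(cvvs[s]) for s in SERVERS)
```

Neither (2, 2, 1) nor (1, 1, 2) dominates the other, so the comparison reports a conflict. Had S3 simply missed A's update, its vector (1, 1, 1) would be dominated and S3 could be brought up to date without involving the application.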
Slide 59: Replication in P2P Systems
- In P2P systems replication is more important because:
  - P2P members are less reliable: they may leave the system or remove files
  - Load balance is important since there are no designated servers
- File usage in P2P is different: most files are read only and updates consist of adding new files, so consistency is less of an issue.
Slide 60: Unstructured P2P Systems (each node knows n neighbors)
- Look-up = search (in structured systems, lookup is directed by some algorithm)
- Replication speeds up the process
- How to allocate files to nodes (it may not be possible to force a node to store files):
  - Uniformly distribute n copies across the network
  - Allocate more replicas for popular files
  - Users who download files are responsible for sharing them with others (as in BitTorrent)
Slide 61: Structured P2P Systems
- Replication is used primarily for load balance
- Possible approaches:
  - Store a replica at each node in the search path (concentrates replicas near the prime copy, but may unbalance some nodes)
  - Store replicas at nodes that request a file, and store pointers to it at nodes along the way.
Slide 62: 11.7 Fault Tolerance in DFS
- Review of Fault Tolerance
- Handling Byzantine Failures
- High Availability in P2P systems
Slide 63: Basic Concepts - Review
- Distributed systems may experience partial failure.
- Build systems to automatically recover from crashes.
- Continue to operate normally while failures are being repaired, i.e., be fault tolerant.
- Fault tolerant systems exhibit dependability:
  - Availability: the system is immediately ready to use
  - Reliability: the system can run continuously without failing (remember the availability/reliability example)
  - Safety: system failure doesn't have disastrous consequences
  - Maintainability: easy to repair
Slide 64: Failure Models
- Failure may be due to an error at any place in the system:
  - The server crashes
  - The network goes down
  - A disk crashes
  - Security violations occur
- Crash failure, omission failure, Byzantine failure
  - Incorrect, but undetectable
  - Malicious servers produce deliberately wrong results
  - ...
Slide 65: Handling Byzantine Failures in Distributed File Systems
- Replication handles many errors in DFS, but Byzantine errors are harder to solve.
- The text presents an algorithm by Castro and Liskov that works as long as fewer than 1/3 of the nodes are faulty at any moment.
- Clients must get the same answer from k+1 servers (in a system with 3k+1 servers, of which at most k are faulty) to be sure the answer is correct.
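Only the client-side acceptance rule is sketched here; the agreement protocol among the servers is not modeled. The function name `accept_reply` is hypothetical, and the sketch assumes that correct servers all return the same value.

```python
from collections import Counter


def accept_reply(replies, k):
    """Return the value reported by at least k+1 servers, or None.

    With at most k faulty servers, any k+1 matching replies include
    at least one reply from a correct server.
    """
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= k + 1 else None


k = 1  # tolerate one faulty server, so the system has 3k + 1 = 4 servers
one_liar = accept_reply(["v7", "v7", "bogus", "v7"], k)
too_few_matches = accept_reply(["v7", "x", "y"], k)
```

In the first call three replies agree, so the client accepts the value despite the one wrong reply. In the second call no value reaches k+1 matching replies, so the client has to keep waiting or retry.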
Slide 66: Availability in P2P Systems
- Possible approaches:
  - Replication (although it must be at very high levels due to the unreliability of nodes)
  - Erasure coding: divides a file into m fragments and recodes them into n > m fragments such that any set of m fragments can be used to reconstruct the entire file. Distribute fragments, rather than entire file replicas.
  - Erasure coding requires less redundancy than full replication.
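The simplest instance of the idea uses m = 2 data fragments plus one XOR parity fragment, giving n = 3. This sketch is for illustration only; practical systems typically use more general codes such as Reed-Solomon, and the function names here are assumptions.

```python
def xor_bytes(x, y):
    return bytes(p ^ q for p, q in zip(x, y))


def encode(data):
    """Split data into 2 fragments and add 1 parity fragment (m=2, n=3)."""
    half = (len(data) + 1) // 2
    a = data[:half]
    b = data[half:].ljust(half, b"\x00")  # pad the second half if needed
    return {0: a, 1: b, 2: xor_bytes(a, b)}, len(data)


def decode(fragments, length):
    """Rebuild the data from any 2 of the 3 fragments."""
    if len(fragments) < 2:
        raise ValueError("need at least 2 fragments")
    if 0 in fragments and 1 in fragments:
        a, b = fragments[0], fragments[1]
    elif 0 in fragments:
        a = fragments[0]
        b = xor_bytes(a, fragments[2])  # b = a XOR parity
    else:
        b = fragments[1]
        a = xor_bytes(b, fragments[2])  # a = b XOR parity
    return (a + b)[:length]


original = b"hello"
fragments, length = encode(original)

# Lose any single fragment and the file can still be reconstructed.
recovered = [
    decode({i: f for i, f in fragments.items() if i != lost}, length)
    for lost in (0, 1, 2)
]
```

The three fragments together take about 1.5 times the size of the file and survive the loss of any one node, whereas surviving one loss with full replication needs two complete copies, i.e., 2 times the size.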
Slide 67: THE END