Distributed System Structures

Slides: 49

Provided by: simeo2

1
Distributed System Structures

Background
Topology
Network Types
Communication
Communication Protocol
Robustness
Design Strategies

2
Learning Objectives

What is a distributed system? A distributed OS?
What are the advantages of a distributed system?
What is data migration? Process migration? Why are they needed?
How can pairs of processes that want to communicate over a network be connected? What are the common schemes?
How are failures detected in distributed systems? How does a distributed system recover from a failure?

3
A Distributed System
4
Motivation

Resource sharing:
sharing and printing files at remote sites
processing information in a distributed database
using remote specialized hardware devices
Computation speedup: load sharing
Maintain responsiveness: load balancing
Availability: detect and recover from site failure, function transfer, reintegrate failed site
Communication: message passing

5
Network Operating Systems

Users aware of multiple machines
Explicit access to other machine resources
remote login to another machine (ssh)
transfer of data from remote to local machines:
file transfer protocol (FTP)
secure copy (scp)
hypertext transfer protocol (HTTP)

6
Distributed Operating Systems

Users not aware of multiple machines
access to remote and local resources is similar
Data migration: transfer data by transferring the entire file, or only the portions necessary for the immediate task
Process migration: transfer the computation rather than the data

7
Distributed Operating Systems

Process migration: execute the entire process, or parts of it, at different sites
load balancing: distribute processes to even out the workload
computation speedup: subprocesses can run concurrently on different sites
hardware preference: process execution may require specialized hardware (e.g., a different CPU)
software preference: required software may be available at only one site
data access: run the process remotely, rather than transfer all the data
Downsides?

8
Topology

Sites in the system can be physically connected in a variety of ways, compared with respect to the following criteria:
basic cost: how expensive is it to link the sites in the system?
communication cost: how long does it take to send a message from site A to site B?
availability: if a link or site fails, can the remaining sites still communicate?
Topologies are depicted as graphs:
nodes correspond to sites
an edge from node A to node B is a direct connection between the two sites
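
The availability criterion above can be checked mechanically once a topology is written down as a graph. The following is a minimal sketch, not part of the original slides: the site names, the RING and STAR link lists, and both helper functions are illustrative assumptions. It counts links as a stand-in for basic cost and tests whether the sites can still communicate after any single link fails.

```python
from collections import deque

def connected(sites, links):
    """Return True if every site can reach every other site over the links."""
    sites = set(sites)
    if not sites:
        return True
    neighbors = {s: set() for s in sites}
    for a, b in links:
        # Links touching a site that is not in `sites` (e.g. a failed site) are ignored.
        if a in sites and b in sites:
            neighbors[a].add(b)
            neighbors[b].add(a)
    start = next(iter(sites))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search from an arbitrary site
        for nxt in neighbors[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == sites

def survives_any_single_link_failure(sites, links):
    """Availability check: remove each link in turn and test connectivity."""
    return all(
        connected(sites, [link for link in links if link != failed])
        for failed in links
    )

# Hypothetical four-site examples (names are assumptions for illustration).
SITES = ["A", "B", "C", "D"]
RING = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
STAR = [("A", "B"), ("A", "C"), ("A", "D")]
```

Under these assumptions the ring needs one more link than the star (higher basic cost) but tolerates any single link failure, whereas the star is partitioned by the loss of any link or of its hub.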

9
Network Topology
10
Network Types

Local-Area Network (LAN): covers a small geographical area
multiaccess bus, ring, or star network
growing popularity of wireless networking
speed range: roughly 10 Mbit/s to 1 Gbit/s
broadcast is fast and cheap
nodes:
usually workstations, personal computers
fewer servers, printers

11
Network Types

Typical LAN

12
Network Types

Wide-Area Network (WAN): links geographically separated sites
point-to-point connections: long-haul lines, satellite links
speed: 10s to 100s of Mbit/s
nodes (communication processors):
usually a high percentage are routers
can also be big servers

13
Communication Processors in a WAN
14
Communication
Design of a communication network must address 5 basic issues:

Naming and name resolution: how do 2 processes locate each other to communicate?
Routing strategies: how are messages sent through the network?
Packet strategies: are packets variable-length or fixed-size?
Connection strategies: how do 2 processes send a sequence of messages?
Contention: the network is a shared resource, so how are conflicting demands for its use resolved?

15
Naming and Name Resolution

Name systems in the network
Address messages with the process-id
Identify processes on remote systems by a <host-name, identifier> pair
Domain name service (DNS): specifies the naming structure of the hosts, as well as name-to-address resolution (Internet)

16
Routing Strategies

Fixed routing: path from A to B is specified in advance; the path changes only if a hardware failure disables it
shortest path is usually chosen, so communication cost is minimized
fixed routing cannot adapt to load changes
ensures messages will be delivered in the order they were sent
Virtual circuit: path from A to B is fixed for the duration of one session; other sessions may have different paths from A to B
partial remedy for adapting to load changes
ensures messages will be delivered in the order they were sent

17
Routing Strategies

Dynamic routing: path for a message is chosen only when the message is sent
usually a site sends a message on the link least used at that time
adapts to load changes by avoiding routing messages on heavily used paths
messages may arrive out of order; remedy: a sequence number on each message
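
The sequence-number remedy can be illustrated with a small receiver-side buffer. This is a sketch under assumptions, not a description of any particular protocol: the class name ReorderBuffer, the choice to start numbering at 0, and the policy of silently dropping duplicates are all illustrative.

```python
class ReorderBuffer:
    """Receiver-side buffer that releases messages in sequence-number order."""

    def __init__(self):
        self.next_seq = 0   # next sequence number the application expects (assumed to start at 0)
        self.pending = {}   # seq -> payload for messages that arrived early

    def receive(self, seq, payload):
        """Accept one message; return the payloads that can now be delivered in order."""
        if seq < self.next_seq or seq in self.pending:
            return []       # duplicate of something already seen: ignore it
        self.pending[seq] = payload
        ready = []
        # Drain every message that is now contiguous with what was already delivered.
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

A message that arrives ahead of its predecessors is held back, and it is released together with them as soon as the gap is filled.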

18
Packet Strategies

Fixed size: e.g., ATM; simplifies switching
Reliable: needs an acknowledgement protocol
Unreliable: e.g., datagrams; rely on higher layers to check delivery and order

19
Connection Strategies

Circuit switching: permanent physical link for the duration of the communication (e.g., telephone call)
Message switching: temporary link established for the duration of 1 message transfer (e.g., post-office mail)
Packet switching: variable-length messages are divided into fixed-length packets; each packet may take a different path. Packets are reassembled into messages as they arrive

Circuit switching: requires setup time, but has less overhead for shipping each message; may waste bandwidth (why is voice over IP growing?)
Message and packet switching: less setup time, more overhead per message
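
The packet-switching step, dividing a variable-length message into fixed-length packets and reassembling it on arrival, might look like the sketch below. The packet layout (index, total count, payload length, padded payload) and the zero-byte padding are assumptions made for illustration; real protocols define their own headers.

```python
def to_packets(message: bytes, size: int):
    """Split a message into packets whose payload is exactly `size` bytes.

    Each packet is (index, total, length, payload); `length` records how many
    payload bytes are real so that the padding can be stripped again.
    """
    if size <= 0:
        raise ValueError("packet size must be positive")
    total = max(1, -(-len(message) // size))   # ceiling division, at least one packet
    packets = []
    for i in range(total):
        chunk = message[i * size:(i + 1) * size]
        packets.append((i, total, len(chunk), chunk.ljust(size, b"\0")))
    return packets

def reassemble(packets):
    """Rebuild the message from packets that may have arrived in any order."""
    ordered = sorted(packets)                   # packet index is the first field
    total = ordered[0][1]
    if [p[0] for p in ordered] != list(range(total)):
        raise ValueError("missing or duplicated packets")
    return b"".join(payload[:length] for _, _, length, payload in ordered)
```

The per-packet header fields are the "more overhead per message" noted above: every packet carries bookkeeping that a dedicated circuit would not need.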
20
Communication Protocol
Communication network is organized into the following layers:

Physical layer: mechanical and electrical details of the physical transmission of a bit stream
Data-link layer: handles frames, or fixed-length parts of packets; includes
error detection
recovery from physical layer errors
Network layer: provides connections and routes packets in the network
handling addresses of outgoing packets
decoding addresses of incoming packets
maintaining routing information to respond to changing load levels

21
Communication Protocol

Transport layer: low-level network access and message transfer between clients, including
partitioning messages into packets
maintaining packet order
flow control
generating physical addresses
Session layer: implements sessions, or process-to-process communication protocols
Presentation layer: resolves differences in formats among sites in the network, including
character conversions
half duplex/full duplex (echoing)
not common in real networks (handled in the application layer)

22
Communication Protocol

Application layer: interacts directly with users, e.g.,
file transfer (FTP, HTTP, etc.)
remote-login protocols (telnet, ssh)
email (SMTP, etc.)
schemas for distributed databases
the layer the user sees is relatively independent of the underlying technology

23
ISO Network Model
24
ISO Network Packet
25
TCP/IP Protocol Layers
26
Robustness

Failure detection
Reconfiguration

27
Failure Detection

Detecting hardware failure is difficult
To detect link failure, a handshaking protocol can be used
Assume sites A and B have established a link. At fixed intervals, they exchange an I-am-up message indicating that they are up and running
If Site A doesn't receive the message within the fixed interval, it assumes either (a) the other site is not up or (b) the message was lost
Site A can now send an Are-you-up? message to B
If A doesn't receive a reply, it can repeat the message or try an alternative route to B

28
Failure Detection

If Site A doesn't ultimately receive a reply from Site B, it concludes that some type of failure has occurred
Types of failures:
- site B is down
- direct link between A and B is down
- alternative link from A to B is down
- message was lost
A can't determine exactly why the failure occurred
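
The handshaking protocol can be sketched as a small state machine on Site A. This is an illustration only: the class name LinkMonitor, the max_probes limit, and the use of explicit timestamps instead of a real network and clock are all assumptions. The sketch also assumes tick() is called about once per interval.

```python
class LinkMonitor:
    """Site A's view of Site B, driven by explicit timestamps (no real network)."""

    def __init__(self, interval, max_probes=2):
        self.interval = interval      # expected spacing of I-am-up messages
        self.max_probes = max_probes  # Are-you-up? attempts before giving up (assumed policy)
        self.last_heard = 0.0
        self.probes_sent = 0

    def on_message(self, now):
        """Any I-am-up message or Are-you-up? reply from B resets the detector."""
        self.last_heard = now
        self.probes_sent = 0

    def tick(self, now):
        """Periodic check; returns 'ok', 'probe' (send Are-you-up?), or 'failed'."""
        if now - self.last_heard <= self.interval:
            return "ok"
        if self.probes_sent < self.max_probes:
            self.probes_sent += 1     # repeat the probe, possibly over another route
            return "probe"
        # Silence persisted through every probe. The cause stays unknown:
        # B down, a link down, or messages lost all look identical from A.
        return "failed"
```

The single "failed" outcome mirrors the limitation stated above: the detector can conclude that something went wrong, but not which of the listed failures it was.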

29
Reconfiguration

When Site A determines that a failure has occurred, it must reconfigure the system
1. If the link from A to B has failed, this must be broadcast to every site
2. If a site has failed, every other site must also be notified that the services offered by the failed site are no longer available
When the link or site becomes available again, this must again be broadcast to all other sites

30
Design Issues

Transparency: distributed system should appear as a conventional, centralized system to the user
Fault tolerance: distributed system should continue to function in the face of failure
Scalability: as demands increase, it should be easy to add new resources to accommodate the increased demand
Cluster: collection of semi-autonomous machines that acts as a single system

31
Distributed File Systems

Background
Naming and Transparency
Remote File Access
Stateful vs. Stateless Service
File Replication

32
Learning Objectives

What is a distributed file system (DFS)?
What does transparency mean in a DFS?
What do the terms location transparency and location independence mean for name mapping in a DFS?
Stateful vs. stateless: what are the advantages and disadvantages of each type of DFS?

33
Background

Distributed file system (DFS): distributed implementation of the classical shared file system
multiple users share files and storage resources
DFS manages a set of dispersed storage devices
Overall storage space managed by a DFS is composed of different, remotely located, smaller storage spaces
Usually there is a correspondence between storage spaces and sets of files

34
DFS Structure

Service: software entity running on 1 or more machines and providing a particular type of function to a priori unknown clients
Server: service software running on a single machine
Client: process that can invoke a service using the set of operations that forms its client interface
Client interface for a file service is formed by a set of primitive file operations (create, delete, read, write)
Client interface of a DFS should be transparent, i.e., not distinguish between local and remote files

35
Clients and Servers

Client/Server: a software concept
marketing dictates selling a machine as a server
can increase the price
Roles can vary
machine A running an application: an application server
machine A needs a file on B, so A becomes a client of B

36
Naming and Transparency

Naming: mapping between logical and physical objects
Multilevel mapping: abstraction of a file hides details of how and where on disk the file is actually stored
Transparent DFS: hides the location of the file on the network
For a file replicated on several sites, the mapping returns the set of locations of the file's replicas
existence of multiple copies and their locations are hidden

37
Naming Structures

Location transparency: name doesn't reveal the file's physical storage location
file name still denotes a specific, if hidden, set of physical disk blocks
convenient way to share data
can expose correspondence between component units and machines if the scheme breaks, e.g., when a machine moves
Location independence: file name does not need to be changed when the file's physical storage location changes
better file abstraction
promotes sharing the storage space itself
separates the naming hierarchy from the storage-devices hierarchy
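
Location independence amounts to keeping the name-to-location mapping in a table that can change without the name changing. A minimal sketch follows; the NameService class, its method names, and the host and path strings are hypothetical and chosen only to illustrate the idea.

```python
class NameService:
    """Maps a logical file name to the set of places where the file is stored."""

    def __init__(self):
        self._locations = {}   # logical name -> set of (host, local_path)

    def register(self, name, host, local_path):
        """Record one physical location for a logical name."""
        self._locations.setdefault(name, set()).add((host, local_path))

    def migrate(self, name, old, new):
        """Move a file: only the mapping changes, the logical name stays the same."""
        locations = self._locations[name]
        locations.remove(old)
        locations.add(new)

    def resolve(self, name):
        """Return the current locations; callers never embed them in the name."""
        return frozenset(self._locations[name])
```

A name of the form host:local-name, by contrast, would have to change whenever the file moved, which is why that scheme is neither location transparent nor location independent.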

38
Naming Schemes 3 Main Approaches

Files named by a combination of host name and local name: guarantees a unique systemwide name
Attach remote directories to local directories, giving the appearance of a coherent directory tree
must mount remote directories to access them transparently
Total integration of the component file systems
single global name structure spans all files in the system
if a server is unavailable, some arbitrary set of directories on different machines shouldn't disappear (as can happen, e.g., in NFS)

39
Remote File Access

Reduce network traffic by caching recently accessed disk blocks: repeated accesses are handled locally
if needed data is not already cached, a copy is brought from the server
accesses are performed on the local cached copy
files are identified with 1 master copy at the server machine, but copies of (parts of) the file are scattered in different caches
cache-consistency problem: keeping cached copies consistent with the master file

40
Cache Location: Disk vs. Memory

Advantages of disk caches
more reliable
cached data on disk is still there after recovery, no need to refetch
Advantages of main-memory caches
workstations can be diskless
data is accessed faster
memory upgrade increases speed advantage
server caches (to speed up disk I/O) are in main memory regardless of where user caches are located
main-memory caches on the user machine allow a single caching mechanism for servers and users

41
Cache Update Policy

Write-through: write data to disk as soon as the cache is modified
reliable, cache consistency is easy, but poor performance
Delayed-write: modifications are written to the cache first and to the server later. Write accesses complete quickly; some data may be overwritten before write-back, and so need never be written at all
poor reliability: unwritten data is lost whenever the user machine crashes
variation: scan the cache regularly, flush blocks modified since the last scan
variation: write-on-close, writes data back to the server when the file is closed. Best for files open for long periods and frequently modified
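
The trade-off between the two policies shows up clearly in a toy client cache. The sketch below is an assumption-laden illustration: BlockCache, its counters, and the plain dict standing in for the server's master copy are hypothetical. flush() plays the role of the periodic scan or of write-on-close.

```python
class BlockCache:
    """Client-side block cache with a selectable update policy."""

    def __init__(self, server, write_through):
        self.server = server              # dict standing in for the server's master copy
        self.write_through = write_through
        self.blocks = {}                  # block number -> cached data
        self.dirty = set()                # blocks modified but not yet written back
        self.server_writes = 0            # how many writes actually reached the server

    def write(self, block_no, data):
        self.blocks[block_no] = data
        if self.write_through:
            self._send(block_no)          # reliable, but one server write per update
        else:
            self.dirty.add(block_no)      # delayed-write: just remember it is dirty

    def flush(self):
        """Write back dirty blocks (periodic scan or write-on-close)."""
        for block_no in sorted(self.dirty):
            self._send(block_no)
        self.dirty.clear()

    def _send(self, block_no):
        self.server[block_no] = self.blocks[block_no]
        self.server_writes += 1
```

With delayed-write, three successive updates to the same block cost a single server write at flush time, but a client crash before the flush would lose all three.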

42
Consistency

Is locally cached data consistent with master
copy?
Client-initiated approach
client initiates validity check
server checks whether local data consistent with
master copy
Server-initiated approach
server records, for each client, (parts of) files
it caches
when server detects potential inconsistency, must
react
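
One way to picture the client-initiated approach is a version number kept with each file: the client asks the server whether its cached version is still current before using it. This is a sketch under that assumption; the Server and Client classes, the per-file version counter, and the check-on-every-read policy are illustrative choices, not something the slides prescribe.

```python
class Server:
    """Holds the master copies, each tagged with a version number."""

    def __init__(self):
        self.files = {}   # name -> (version, data)

    def write(self, name, data):
        version = self.files.get(name, (0, None))[0] + 1
        self.files[name] = (version, data)

    def fetch(self, name):
        return self.files[name]

    def is_current(self, name, version):
        """Validity check requested by a client."""
        return self.files[name][0] == version


class Client:
    """Caches whole files and validates the cache before every read."""

    def __init__(self, server):
        self.server = server
        self.cache = {}    # name -> (version, data)
        self.fetches = 0   # number of full transfers from the server

    def read(self, name):
        cached = self.cache.get(name)
        if cached is None or not self.server.is_current(name, cached[0]):
            self.cache[name] = self.server.fetch(name)   # stale or missing: refetch
            self.fetches += 1
        return self.cache[name][1]
```

Validating on every read keeps the data fresh but costs a message per access; checking less often trades consistency for traffic. In the server-initiated approach the server would instead track which clients cache which files and notify them.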

43
Stateful File Service

Mechanism
client opens file
server fetches information about file from disk,
stores in its memory, gives client identifier
unique to client and open file
identifier used for subsequent accesses until
session ends
server must reclaim main-memory space used by
inactive clients
Increased performance
fewer disk accesses
stateful server knows if file opened for
sequential access and can read ahead next blocks

44
Stateless File Server

Avoids state information by making each request
self-contained
Each request identifies file, position in file
No need to establish and terminate connection by
open and close operations

45
Stateful vs. Stateless Service

Failure Recovery
stateful server loses all volatile state in a crash
restore state by a recovery protocol based on a dialog with clients, or abort the operations that were underway when the crash occurred
server must be aware of client failures to reclaim space allocated to record the state of crashed client processes (orphan detection and elimination)
stateless server: effects of server failure and recovery are much less noticeable; a newly reincarnated server can respond to a self-contained request with no difficulty
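
The difference in failure recovery can be seen in a toy pair of servers. Everything in this sketch is an assumption made for illustration: the FILES dict, both class names, and crash_and_restart(), which simply discards in-memory state to imitate a crash.

```python
FILES = {"log.txt": b"abcdefghij"}   # hypothetical on-disk contents that survive a crash

class StatefulServer:
    """Keeps an open-file table in memory; clients send only a handle."""

    def __init__(self):
        self.open_files = {}   # handle -> [name, offset]; volatile state
        self.next_handle = 1

    def open(self, name):
        handle = self.next_handle
        self.next_handle += 1
        self.open_files[handle] = [name, 0]
        return handle

    def read(self, handle, count):
        entry = self.open_files[handle]        # raises KeyError if the table was lost
        name, offset = entry
        data = FILES[name][offset:offset + count]
        entry[1] = offset + len(data)          # server remembers the position
        return data

    def crash_and_restart(self):
        self.open_files.clear()                # all volatile state is gone


class StatelessServer:
    """Every request names the file and the position, so nothing is remembered."""

    def read(self, name, offset, count):
        return FILES[name][offset:offset + count]

    def crash_and_restart(self):
        pass                                   # nothing to lose
```

After a restart the stateful server no longer recognises the client's handle, so a recovery dialog (or aborting the operation) is needed, while the stateless server answers the next self-contained request as if nothing had happened. The cost is the longer request: name and offset travel with every read.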

46
Distinctions

Penalties for using the robust stateless service
longer request messages
slower request processing
additional constraints on DFS design
Some environments require stateful service
server using server-initiated cache validation can't offer stateless service: it records which files are cached by which clients
UNIX use of file descriptors and implicit offsets is inherently stateful: servers must maintain tables to map file descriptors to inodes, and store the current file offset

47
File Replication

Replicas of a file reside on failure-independent machines
Improves availability and can shorten service time
Naming scheme maps a replicated file name to 1 particular replica
existence of replicas should be invisible to higher levels
replicas are distinguished by different lower-level names
Updates: replicas of a file denote the same logical entity, so an update to any replica must be reflected on all others
Demand replication: reading a nonlocal replica causes it to be cached locally, thereby generating a new nonprimary replica
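
The update rule and demand replication can be combined in one small sketch. It is illustrative only: the ReplicatedFile class, the site names used with it, and the choice to propagate updates to every replica synchronously are assumptions, and a real DFS would also have to handle concurrent writers and unreachable replicas.

```python
class ReplicatedFile:
    """One logical file backed by per-site replicas."""

    def __init__(self, primary_site, data):
        self.primary = primary_site
        self.replicas = {primary_site: data}   # site -> replica contents

    def read(self, site):
        """Read at `site`; a nonlocal read creates a new nonprimary replica there."""
        if site not in self.replicas:
            self.replicas[site] = self.replicas[self.primary]   # demand replication
        return self.replicas[site]

    def write(self, data):
        """An update to the logical file must be reflected on every replica."""
        for site in self.replicas:
            self.replicas[site] = data
```

Callers work with the one logical file and never pick a replica by hand, which keeps the existence of replicas invisible to higher levels.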