Title: Distributed Systems
1 Distributed Systems
- Lecture 4
- Communication
- 7 May 2002
2 Schedule of Today
Overview
- Summary of hardware trends
- Overview
- Local IPC
- Remote IPC Pattern
- Bridges, Stubs
- Communication Protocols
- Multicast/Broadcast
- Group Communication
3 Technology Trends
Summary Hardware
[Figure: growth of CPU performance (MIPS), memory (MB), LAN bandwidth (Mbit/s), WAN bandwidth (Mbit/s) and OS overhead for the periods 1985-1990, 1990-1995, 1995-2000 and 2000-2005; vertical axis 0 to 700]
Source: Scientific American, Sept. 1995
4 Typical Latencies (in ms)
[Figure: latencies of disk I/O, Ethernet RPC, ATM roundtrip and WAN roundtrip on a logarithmic scale from 0.01 ms to 1000 ms, for the periods 1985-1990, 1990-1995, 1995-2000 and 2000-2005]
- WAN and disk latencies are fairly constant due to physical limitations
- Note the dramatic drop in LAN latencies over ATM
5 OS Latency: Most Expensive Overhead on LAN Communication!
[Figure: OS overhead as a percentage, vertical axis 0 to 40, for the periods 1985-1990 and 1995-2000]
6 Broad Observations
- A discontinuity is currently occurring in WAN communication speeds!
- Other performance curves are all similar
- Disks have maxed out and hence look slower and slower
- Memory of remote computers looks closer and closer
- OS-imposed communication latencies have risen in relative terms over the past decade!
7 Implications?
- The revolution in WAN communication we are now seeing is not surprising and will continue
- Look for a shift from disk storage towards more access to remote objects over the network
- OS overhead is already by far the main obstacle to low latency, and this problem will seem worse and worse unless OS communication architectures evolve in major ways
8 More Implications
- Look for full-motion video to the workstation by around 2005 or 2010
- Low LAN latencies: an unexploited niche
- One puzzle: what to do with extremely high data throughput but relatively high WAN latencies?
- The OS architecture and the whole concept of an OS must change to better exploit the pool of memory of a cluster of machines; otherwise, disk latencies will loom higher and higher
9 Reliability and Performance
- Some think that more reliable means slower
- Indeed, it usually costs time to overcome failure
- For example, if a packet is lost, you probably need to resend it, and may need to solicit the retransmission
- But for many applications, performance is a big part of the application itself: too slow means not reliable for these!
- Reliable systems thus must look for the highest possible performance
- ... but unlike unreliable systems, they can't cut corners in ways that make them flaky but faster
10 Base Model of Communication
[Figure: a sender and a receiver, separated by a spatial distance, exchange a message over a medium; each side uses a service interface at a service entry point]
- Participants act as sender or receiver.
- A client uses a service via a service interface that accesses a specific service entry point.
- The medium bridges the spatial distance.
11 Why Communication?
- Processes (of a distributed application) work together
- Due to the lack of shared memory, processes must transfer messages in order to cooperate
- Preconditions
- Physical interconnection network
- Electric signals in copper links
- Rules obeyed by each communicating partner
- Communication protocols
- Common language and common semantics
12 Characteristics of Services
- A client wants a specific service
- A server delivers a specific service
- Services can be performed with or without acknowledgment
- Performing a service includes the complete protocol of request and reply
[Figure: client, service, server]
13 Horizontal versus Vertical Communication
[Figure: a WWW user starts a search with Request(...) and the AltaVista WWW server produces an answer page; vertical communication takes place within each node, horizontal communication (transfer of data) between the two nodes]
14 Message Based Communication
Process P1: begin ... send(...) ... send(...) end
Process P2: begin ... receive(...) ... receive(...) ... receive(...) end
Why not the last receive(...) of P2?
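The question above can be made concrete with a minimal sketch, assuming an in-process queue stands in for the message passing system: P1 sends two messages and P2 issues three blocking receives. The helpers `send` and `receive` and the one-second timeout are hypothetical; the timeout only exists so that the demonstration terminates instead of blocking forever.

```python
import queue
import threading

# A shared queue plays the role of the channel from P1 to P2
# (an assumption for illustration; a real system transfers over a network).
channel = queue.Queue()

def send(msg):
    channel.put(msg)

def receive(timeout=None):
    # Blocks until a message arrives; returns None if the timeout expires.
    try:
        return channel.get(timeout=timeout)
    except queue.Empty:
        return None

def p1():
    send("m1")
    send("m2")

def p2(results):
    results.append(receive())
    results.append(receive())
    # Third receive: no matching send exists, so without a timeout
    # P2 would block here forever.
    results.append(receive(timeout=1.0))

results = []
t1 = threading.Thread(target=p1)
t2 = threading.Thread(target=p2, args=(results,))
t2.start()
t1.start()
t1.join()
t2.join()
print(results)  # ['m1', 'm2', None]
```

Every receive needs a matching send; an unmatched blocking receive is a silent hang, which is why real systems often offer receive with a timeout.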
15 Message Passing System
- Implements data transfer via the network
- Offers communication primitives at the API
- at least a send(...) and a receive(...) operation
- up to a whole library of communication services
[Figure: the message passing system, a simple form of middleware, below the distributed application]
16 Message Passing System
- Functionality of a message passing system
- Uses standard protocols or implements new ones
- Guarantees specific properties according to its semantics
- e.g. order of messages
- Abstracts from implementation details
- e.g. buffering, low-level addressing
- Masks certain failures
- e.g. automatic repetition after a timeout
- Hides the heterogeneity of HW and OS, thereby improving portability
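As one illustration of how a message passing system can guarantee an ordering property while hiding buffering from the application, here is a minimal sketch, assuming each message carries a sender-assigned sequence number (a hypothetical field; the slide does not prescribe a mechanism). Early arrivals are held back until the gap is filled, which yields FIFO delivery, and duplicates are dropped.

```python
class FifoReceiver:
    """Delivers messages in sequence-number order, buffering early arrivals."""

    def __init__(self):
        self.next_seq = 0   # sequence number expected next
        self.pending = {}   # seq -> payload, for messages that arrived early

    def on_arrival(self, seq, payload):
        """Called by the transport; returns the messages now deliverable."""
        delivered = []
        if seq < self.next_seq or seq in self.pending:
            return delivered  # duplicate: drop it silently
        self.pending[seq] = payload
        # Release every message whose predecessors have all been delivered.
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered

rx = FifoReceiver()
out = []
for seq, msg in [(1, "b"), (0, "a"), (3, "d"), (2, "c"), (2, "c")]:
    out.extend(rx.on_arrival(seq, msg))
print(out)  # ['a', 'b', 'c', 'd']
```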
17 Pragmatic Design Parameters
- Length of messages
- Constant or fixed
- Variable, but limited in size
- Unlimited
- Loss of messages
- Not noticed
- Suspected and notified
- Avoided
- Integrity of messages
- Not noticed
- Detected and notified
- Automatically corrected
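The "detected and notified" level of integrity can be sketched with a checksum, assuming the sender appends a CRC-32 to every message (one common choice, used here for illustration; the helper names `frame` and `check` are hypothetical). Automatic correction would additionally need redundancy for error correction or a retransmission protocol such as ARQ.

```python
import zlib

def frame(payload: bytes) -> bytes:
    # Append a 4-byte CRC-32 of the payload (big-endian).
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check(data: bytes):
    """Return the payload if the CRC matches, else None (corruption detected)."""
    payload, received_crc = data[:-4], int.from_bytes(data[-4:], "big")
    if zlib.crc32(payload) != received_crc:
        return None
    return payload

good = frame(b"hello")
bad = bytearray(good)
bad[0] ^= 0x01           # flip one bit "in transit"
print(check(good))        # b'hello'
print(check(bytes(bad)))  # None
```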
18 Prioritized Messages?
- What are the semantics of a high-priority message, and how should it be handled?
- Shall the transport subsystem favour high-priority messages?
- Can we overwrite low-priority messages in a buffer?
- How many priority levels are useful?
- Within a receive, shall we first search for high-priority messages?
- Discuss for yourself very carefully!!!
19 Order of Messages
- Often a communication system offers no guarantee concerning the order of messages
- Even FIFO ordering does not prevent messages from violating the order indirectly, e.g.:
20 Order of Messages
[Figure: send events s(Ni1), s(Ni), s(Ni2) and receive events r(Ni1), r(Ni2), r(Ni); the messages are not causally ordered]
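One standard remedy for such causal order violations, not given in the slides, is causal delivery with vector clocks. The minimal sketch below assumes a broadcast setting with three processes and the usual delivery condition: a message from sender j with timestamp V may be delivered at a process with clock L when V[j] = L[j] + 1 and V[k] <= L[k] for every k != j. Otherwise it is buffered. The scenario and function names are illustrative.

```python
def can_deliver(msg_vc, sender, local_vc):
    # Must be the next message from this sender ...
    if msg_vc[sender] != local_vc[sender] + 1:
        return False
    # ... and everything it causally depends on must already be delivered.
    return all(msg_vc[k] <= local_vc[k]
               for k in range(len(local_vc)) if k != sender)

def receive_all(arrivals, n):
    """Deliver broadcast messages causally at one process; buffer the rest."""
    local_vc = [0] * n
    buffer, delivered = list(arrivals), []
    progress = True
    while progress:
        progress = False
        for msg in list(buffer):
            name, sender, vc = msg
            if can_deliver(vc, sender, local_vc):
                local_vc[sender] += 1
                delivered.append(name)
                buffer.remove(msg)
                progress = True
    return delivered, buffer

# P0 broadcasts m1; P1 delivers m1 and then broadcasts m2, so m1 -> m2.
# At P2 the network happens to deliver m2 first.
arrivals = [("m2", 1, [1, 1, 0]), ("m1", 0, [1, 0, 0])]
delivered, stuck = receive_all(arrivals, 3)
print(delivered)  # ['m1', 'm2']
```

m2 is held back until m1 has been delivered, although both arrived over FIFO channels.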
21 Failure Models
- Faulty send
- Faulty receive
- Faulty transmit
- Crash
- Fail-stop
- Timing failures
- Byzantine failures
22 Faulty Send
- The sender may detect the failure, but the receiver doesn't notice it
[Figure: message s sent among processes P1, P2 and P3]
Discuss the above statement!
23 Faulty Receive
- The receiver may detect the failure, but the sender doesn't notice it
[Figure: message s sent among processes P1, P2 and P3]
Discuss the above statement!
24 Faulty Transmit
- Neither sender nor receiver is able to notice it
[Figure: message s sent among processes P1, P2 and P3]
25 Crash
[Figure: messages s among processes P1, P2 and P3; one node crashes]
Crash of a node without notification
26 Fail Stop
[Figure: message s among processes P1, P2 and P3]
Crash of node P1 with notification
27 Timing Failure
- Event occurs too late or too early
28 Byzantine Failure
- Any kind of failure, e.g.
- forged messages
- a process sending senseless messages
29 Basic Communication Scheme
Due to transparency requirements, the interface, i.e. the system calls send() and receive(), should look the same for local and remote communication.
Where to place the logical channel? Where to place the boundary between the two nodes? How to simulate the local case?
30 The Bridge Principle
[Figure: data transfer over the network]
Remark: The functionality of the substitutes may vary depending on the requirements of the sender/receiver pair.
31 Summary Bridges
- Local communication interfaces are preserved
- The missing partner is replaced by a substitute, called a stub
- Both stubs have to transfer the message via the network
- Bridges may become very complex
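A minimal sketch of the bridge principle, assuming a connected socket pair stands in for the network and JSON with newline framing as the wire format (both choices are assumptions for illustration). The hypothetical `SenderStub` and `ReceiverStub` classes give each side an ordinary local send() or receive() and hide the transfer.

```python
import json
import socket

class SenderStub:
    """Stands in for the remote receiver on the sender's node."""
    def __init__(self, sock):
        self.sock = sock

    def send(self, msg):
        # Marshal the message and transfer it, framed by a trailing newline.
        self.sock.sendall(json.dumps(msg).encode() + b"\n")

class ReceiverStub:
    """Stands in for the remote sender on the receiver's node."""
    def __init__(self, sock):
        self.reader = sock.makefile("rb")

    def receive(self):
        # Read one frame from the network and unmarshal it.
        line = self.reader.readline()
        return json.loads(line) if line else None

net_a, net_b = socket.socketpair()   # the "network" between the two nodes
tx, rx = SenderStub(net_a), ReceiverStub(net_b)

tx.send({"op": "print", "text": "hello"})   # looks like a local send()
tx.send([1, 2, 3])
first = rx.receive()
second = rx.receive()
print(first)   # {'op': 'print', 'text': 'hello'}
print(second)  # [1, 2, 3]
rx.reader.close()
net_a.close()
net_b.close()
```

Even this toy bridge has to choose a wire format and message framing; add failure handling and the "very complex" bullet above becomes visible quickly.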
32 Daily Communication
33 Daily Communication
Remote Communication
[Figure: person S talks to() a microphone and person R listens to() a loudspeaker; each is connected via an acoustic channel, and the telephone network crosses the room boundary]
34 Global Remote Communication
How to do it? Analogies to daily life? What problems have to be solved?
35 Problems To Solve?
What may happen to a remote communication?
- We addressed the wrong communication partner
- We cannot get a communication line
- Due to line problems, our partner cannot understand us
- The line or even the complete network breaks down
- We talk, but our partner does not respond
- ... Think about further problems
36 Communication Templates
- Notification or uni-directional
Sender: send(...)
Receiver: receive(...)
- Request or bi-directional
Requester: send(...) ... receive(...)
Responder: receive(...) ... send(...)
37 Synchronous Communication
- Blocking send: the sending process is blocked until the message transaction is completed
- The sender knows the message has been delivered/received
[Figure: send passes through the buffer in the OS of the sender, the transport layer and the buffer in the OS of the receiver; steps numbered 1 to 4]
38 Communication Deadlocks
[Figure: P1 and P2 each block in receive, waiting for the other]
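The deadlock can be reproduced with a minimal sketch, assuming one in-process inbox queue per process (a hypothetical setup): both processes start with a blocking receive and only reply afterwards, so neither ever sends. The timeout exists only to let the demonstration end; with a truly blocking receive both threads would hang forever.

```python
import queue
import threading

# One inbox per process (hypothetical setup for illustration).
inbox = {"P1": queue.Queue(), "P2": queue.Queue()}
outcome = {}

def process(me, other, timeout=0.5):
    # Each process first waits for a message from the other, then replies.
    try:
        msg = inbox[me].get(timeout=timeout)   # blocking receive
    except queue.Empty:
        outcome[me] = "still waiting: deadlock"
        return
    inbox[other].put(f"reply from {me}")
    outcome[me] = f"got {msg}"

threads = [threading.Thread(target=process, args=("P1", "P2")),
           threading.Thread(target=process, args=("P2", "P1"))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(outcome)
```

Letting one process send before it receives breaks the circular wait.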
39 Asynchronous Communication
- No-wait send: the sending process is blocked only until its message has been delivered to the transport system
- Longer blocking time if no buffer space is available
Pros
- Sending process can continue during message transfer
- Higher degree of parallelism
Cons
- Sender doesn't know if and when the message has arrived
- Debugging of a distributed application is more difficult
- OS has to reserve buffers (how many??)
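The buffer question can be sketched with a bounded queue, assuming a two-slot buffer between sender and transport system (the size and the helper `nowait_send` are arbitrary choices for illustration). A no-wait send succeeds while buffer space is free; a blocking variant would call `os_buffer.put(msg)` and wait until the transport system drains a slot.

```python
import queue

# The OS buffer between sender and transport system, limited to 2 slots.
os_buffer = queue.Queue(maxsize=2)

def nowait_send(msg):
    """Return True if the message was handed to the buffer, False if it is full."""
    try:
        os_buffer.put_nowait(msg)
        return True
    except queue.Full:
        return False

results = [nowait_send(m) for m in ("m1", "m2", "m3")]
print(results)         # [True, True, False]

os_buffer.get()        # the transport system drains one message
retry = nowait_send("m3")
print(retry)           # True
```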
40 Communication Libraries
- PVM (Parallel Virtual Machine)
- MPI (Message Passing Interface)
status = send(buffer, size, dest, ...)
status < 0 on failure; status > 0 gives the number of transferred bytes
41 Duality of Communication Models
- Synchronous communication can be implemented with asynchronous communication
Sender: ... send m1; receive ack ...
Receiver: ... receive m1; send ack ...
Assumption: receive is blocking
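The scheme above can be sketched with two asynchronous channels, assuming Python queues as the asynchronous send/receive primitives (an illustrative stand-in). `sync_send` returns only after the acknowledgment arrives, so the sender knows the message was received.

```python
import queue
import threading

to_receiver = queue.Queue()   # asynchronous channel sender -> receiver
to_sender = queue.Queue()     # asynchronous channel receiver -> sender
log = []

def sync_send(msg):
    # Asynchronous send, then block on the acknowledgment.
    to_receiver.put(msg)
    ack = to_sender.get()          # blocking receive, as the slide assumes
    log.append(f"sender: {ack}")

def receiver():
    msg = to_receiver.get()        # blocking receive of m1
    log.append(f"receiver: got {msg}")
    to_sender.put(f"ack {msg}")    # send ack

t = threading.Thread(target=receiver)
t.start()
sync_send("m1")
t.join()
print(log)  # ['receiver: got m1', 'sender: ack m1']
```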
42 Duality of Communication Models
- Asynchronous communication can be implemented with synchronous communication
Idea: Use an additional buffer process for buffering messages between the sender and the receiver.
Buffer process:
... receive   /wait until the sender sends a message/
put message into buffer[i]
?????   /wait for the next message, or send to the receiver?/
...
43 Buffer Server Process
- The buffer process acts as a secretary
loop: receive ...
      if message = request then send buffer[i]
      else put message into buffer[i]
      go to loop
Solve the problem with a limited number of buffers
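One possible approach to the exercise, sketched under stated assumptions: threads and queues play the processes, every interaction with the buffer process goes through its single inbox, and the message format ("put", msg, ack_queue) / ("get", reply_queue) is hypothetical. The buffer process defers a sender while all buffers are full and defers a receiver while no message is buffered, so it never has to choose which party to wait for.

```python
import queue
import threading
from collections import deque

def buffer_process(inbox, capacity):
    buf = deque()             # at most `capacity` stored messages
    waiting_puts = deque()    # senders deferred because all buffers are full
    waiting_gets = deque()    # receivers deferred because nothing is buffered
    while True:
        req = inbox.get()     # the buffer process's only (blocking) receive
        if req[0] == "stop":
            return
        if req[0] == "put":
            waiting_puts.append((req[1], req[2]))
        else:
            waiting_gets.append(req[1])
        progress = True
        while progress:       # serve everything that can be served now
            progress = False
            if waiting_puts and len(buf) < capacity:
                msg, ack_q = waiting_puts.popleft()
                buf.append(msg)
                ack_q.put("stored")          # the sender may continue
                progress = True
            if waiting_gets and buf:
                waiting_gets.popleft().put(buf.popleft())
                progress = True

inbox = queue.Queue()
server = threading.Thread(target=buffer_process, args=(inbox, 2))
server.start()
received = []

def sender():
    ack_q = queue.Queue()
    for i in range(5):
        inbox.put(("put", f"m{i}", ack_q))
        ack_q.get()           # returns once the message is buffered

def receiver():
    reply_q = queue.Queue()
    for _ in range(5):
        inbox.put(("get", reply_q))
        received.append(reply_q.get())

workers = [threading.Thread(target=sender), threading.Thread(target=receiver)]
for t in workers:
    t.start()
for t in workers:
    t.join()
inbox.put(("stop",))
server.join()
print(received)  # ['m0', 'm1', 'm2', 'm3', 'm4']
```

From the sender's point of view the send is asynchronous as long as a buffer is free, and it blocks only when all buffers are occupied.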
44 Classification of Communication
               asynchronous                                   synchronous
Notification   No-wait send (datagram)                        Rendezvous
Request        Remote service invocation (e.g. asynch. RPC)   Remote procedure call (RPC)
RPC, or RMI (Remote Method Invocation) in object-oriented systems. Further variants concern ports, mailboxes, broadcast, etc.
45 Overview on Basic Communication Protocols
46 Overview on Multicast Technology
- Reliability
- Ordering
- Membership
- Routing
- Quality of Service
47 Overview on Group Communication
[Figure: a process group addressed by a group identifier; its members are addressed by process identifiers]
48 Latency Problem in Networks
- We all know about latency in the Internet.
- What's the main reason for this phenomenon?
- Distance
- Involved switches etc.
- Load on the path between source and target
- Buffer overflow
- Something else?
49 Scheduling and Congestion Problems in Switches/Routers
What to do if packets arrive faster than they can be delivered? → Buffering. What to do when the buffer is full?
50 Basic Communication Model
Purpose of a communication system: exchange data between parties
[Figure: source system (source, transmitter) → transmission system → destination system (receiver, destination)]
51 Communicating and Routing
Remark: The following slides are for better understanding and completeness and go beyond the scope of this lecture (see Telematik etc.)
52 Open System Interconnection Model (OSI)
53 Physical Layer
- Provides an unreliable bit pipe
- Timing
- synchronous
- intermittent synchronous
- asynchronous
54 Data Link Layer
- Provides: direct link for reliable packets
- Packet boundaries
- Error detection and correction in packets
- Data link control (omission free)
- Media access control
55 Network Layer
Provides: end-to-end link for reliable packets
[Figure: the network layer on top of several data-link connections]
56 Transport Layer
Provides: end-to-end link for messages (of arbitrary size)
- Break messages into packets and reassemble them
- Flow control
- Internetworking (gateways)
57 Session Layer
Provides: virtual session
- Name service, directory
- From processor to process
- Access rights
- Billing
58 Presentation Layer
Provides: virtual network service
- Conversion issues (ASCII-EBCDIC, floats, endianness, etc.)
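The conversion issues can be illustrated with Python's `struct` module and codecs, assuming big-endian "network byte order" as the agreed transfer format and code page 037 as one representative EBCDIC variant (other EBCDIC code pages exist). Both sides encode and decode with the agreed format, independent of their own CPU or character set.

```python
import struct

value = 0x12345678
big = struct.pack(">I", value)      # network byte order (big-endian)
little = struct.pack("<I", value)   # e.g. the in-memory order on x86
print(big.hex(), little.hex())      # 12345678 78563412

# The receiver decodes with the agreed format, independent of its own CPU.
decoded = struct.unpack(">I", big)[0]

# An IEEE 754 double in network byte order.
payload = struct.pack("!d", 1.5)
print(struct.unpack("!d", payload)[0])   # 1.5

# Character-set conversion: ASCII text to EBCDIC (code page 037) and back.
ebcdic = "HELLO".encode("cp037")
print(ebcdic.hex())                 # c8c5d3d3d6
print(ebcdic.decode("cp037"))       # HELLO
```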
59 Application Layer
- File-transfer (ftp)
- Remote Login (e.g. telnet)
- E-mail (X.400)
- Name- and directory service (X.500)
60 Protocols for OSI Layers
61 Automatic Repeat Request Protocols (ARQ)
- Causes for message omission
- buffer spill
- error detection in a packet
- ARQ protocols
- Send-and-Wait
- Arpanet
- Go-back-n
- Selective repeat
62 Send-and-Wait ARQ
[Figure: Send-and-Wait ARQ timing diagram]
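A minimal simulation of the Send-and-Wait scheme (often called stop-and-wait), assuming a channel that drops transmissions according to a fixed set of indices (hypothetical, to keep the run deterministic), a 1-bit alternating sequence number, and a retransmission after each timeout. Losing an acknowledgment causes a duplicate, which the sequence bit lets the receiver discard.

```python
def send_and_wait(messages, lost):
    """Simulate Send-and-Wait ARQ.

    `lost` holds transmission numbers (data frames and acks counted
    together) that the channel drops; a drop triggers a timeout and resend.
    """
    tx_count = 0
    delivered = []
    expected = 0          # receiver: sequence bit it expects next
    log = []
    for i, msg in enumerate(messages):
        seq = i % 2       # alternating-bit sequence number
        while True:
            tx_count += 1
            if tx_count in lost:
                log.append(f"data {seq} lost, timeout")
                continue
            # Receiver side: deliver only new frames, always acknowledge.
            if seq == expected:
                delivered.append(msg)
                expected ^= 1
            else:
                log.append(f"duplicate {seq} discarded")
            tx_count += 1
            if tx_count in lost:
                log.append(f"ack {seq} lost, timeout")
                continue
            break         # ack arrived: the sender may send the next frame
    return delivered, log

delivered, log = send_and_wait(["a", "b", "c"], lost={2, 3})
print(delivered)  # ['a', 'b', 'c']
print(log)
```

Only one frame is ever outstanding, which is why the line stays idle while the sender waits for each acknowledgment.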
63 Arpanet ARQ
- Send-and-Wait channels 0, 1, 2, ..., 7 are multiplexed onto one line
- Better line utilization than Send-and-Wait
- Unlimited memory required (at least in theory)
64 Go-back-n ARQ (n = 4)
[Figure: timing diagram; frames sent: 0 1 2 3 4 1 2 3 4 5 6; acknowledgments: a(0) a(0) a(0) a(0) a(0) a(1) a(2) a(3)]
- Good utilization
- Limited memory required
- Full window is retransmitted in case of (one) error
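A simplified round-based simulation of Go-back-n, assuming the sender transmits its whole window per round and learns the receiver's cumulative acknowledgment at the end of the round (real implementations pipeline continuously, so the exact sequence differs from the timing diagram). Frames listed in `lost_once` are lost on their first transmission only.

```python
def go_back_n(num_frames, n, lost_once):
    """Return the sequence of transmitted frame numbers.

    The receiver accepts only the next in-order frame and discards all
    others; its cumulative ack moves the window base forward.
    """
    base, expected = 0, 0
    sent, lost = [], set(lost_once)
    while base < num_frames:
        for seq in range(base, min(base + n, num_frames)):
            sent.append(seq)
            if seq in lost:
                lost.discard(seq)     # lost this time, arrives next time
                continue
            if seq == expected:       # in order: accept
                expected += 1
            # out of order: discarded by the receiver
        base = expected               # cumulative ack a(expected - 1)
    return sent

print(go_back_n(7, 4, lost_once={1}))
# [0, 1, 2, 3, 1, 2, 3, 4, 5, 6]
```

Frames 2 and 3 arrived correctly but are sent again because frame 1 was lost; Selective Repeat would resend only frame 1.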
65 Selective Repeat ARQ
- Combines acknowledgments and negative acknowledgments
- Sliding window technique (as in Go-back-n)
- Specifically indicates which packet is missing
- Limited memory required (a full window)
66 Time Division Multiplexing (TDM)
- Best utilization if every node has something to send all the time
- Wastes time if this is not the case
- Slots can be unevenly assigned
[Figure: repeating frame of Slot 1, Slot 2, Slot 3, Slot 4, Slot 1, ...]
67 Slotted Aloha (Theoretical)
- Send at the next slot
- If a collision occurs → pick a random waiting time and send again at the next slot
- Limits
- Maximal utilization is about 0.37 (1/e)
- (but much less for a desired behavior)
[Figure: transmissions in Slot 2, Slot 4, Slot 1]
68 Aloha
- Send immediately
- If a collision occurs → pick a random waiting time and send again at that time
- Limits
- Maximal utilization is about 0.18 (1/(2e))
- (but much less for a desired behavior)
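The utilization limits on the two Aloha slides follow from the classical throughput analysis, which assumes an infinite population of nodes whose transmission attempts form a Poisson process with G attempts per frame time (a modelling assumption not stated on the slides). A frame succeeds only if no other frame starts in its vulnerable period: one slot for Slotted Aloha, two frame times for pure Aloha.

```latex
S_{\text{slotted}}(G) = G\,e^{-G}, \qquad
\max_G S_{\text{slotted}} = S_{\text{slotted}}(1) = \frac{1}{e} \approx 0.368

S_{\text{pure}}(G) = G\,e^{-2G}, \qquad
\max_G S_{\text{pure}} = S_{\text{pure}}\!\left(\tfrac{1}{2}\right) = \frac{1}{2e} \approx 0.184
```

Setting the derivative to zero gives the maxima: e^{-G}(1 - G) = 0 yields G = 1, and e^{-2G}(1 - 2G) = 0 yields G = 1/2.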
69 Carrier Sense Multiple Access
- Listen to the line. Send if the line is free
- If a collision occurs → pick a random waiting time and try again at that time
[Figure: effect of the propagation delay]
70 CSMA/CD (with collision detection)
- Points to clarify
- propagation delay
- X-persistent CSMA
- splitting algorithm for collisions
Well-known example: Ethernet, persistent CSMA/CD with binary exponential backoff
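A minimal sketch of truncated binary exponential backoff as used by classic Ethernet: after the k-th collision of a frame, a station waits a random number of slot times drawn uniformly from 0 to 2^min(k, 10) - 1, and it drops the frame after 16 failed attempts. The function name and the injectable random generator are illustrative assumptions.

```python
import random

BACKOFF_LIMIT = 10   # the range stops growing after 10 collisions
ATTEMPT_LIMIT = 16   # the frame is dropped after 16 failed attempts

def backoff_slots(collisions, rng=random):
    """Number of slot times to wait after the given number of collisions."""
    if collisions >= ATTEMPT_LIMIT:
        raise RuntimeError("excessive collisions: frame dropped")
    k = min(collisions, BACKOFF_LIMIT)
    return rng.randint(0, 2 ** k - 1)   # randint includes both ends

rng = random.Random(42)
for c in (1, 2, 3, 10, 15):
    print(c, backoff_slots(c, rng), "of", 2 ** min(c, BACKOFF_LIMIT))
```

Doubling the range after each collision adapts the waiting time to the unknown number of competing stations without any coordination.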
71 Token Ring
- Disadvantages
- Token loss
- Node crash
72 Star Configuration
Can be used to mimic a bus configuration, e.g. for Ethernet, Fast Ethernet, Gigabit Ethernet or Token Ring
Major drawback: single point of failure
73 Distance Vector Routing
- Each router knows the ID of every router in the network
- Each router maintains a vector with an entry for every destination that contains
- the cost to reach the destination from this router
- the direct link that is on the cheapest path
- Each router periodically sends its vector to its neighbors
- Upon receiving a vector, a router updates its local vector based on the direct link's cost and the received vector
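The update step above is a Bellman-Ford relaxation. A minimal sketch, assuming a routing table that maps each destination to (cost, next hop) and a hypothetical three-router example: the router adopts a cheaper path via the neighbor and also follows the current next hop when that neighbor's cost changes (the second rule is a common refinement, added here as an assumption).

```python
INF = float("inf")

def dv_update(own, link_cost, neighbor, neighbor_vector):
    """Merge a neighbor's vector into our own; True if anything changed."""
    changed = False
    for dest, cost in neighbor_vector.items():
        new_cost = link_cost + cost
        old_cost, old_hop = own.get(dest, (INF, None))
        # Take a cheaper path, or track the new cost of our current next hop.
        if new_cost < old_cost or (old_hop == neighbor and new_cost != old_cost):
            own[dest] = (new_cost, neighbor)
            changed = True
    return changed

# Router A has direct links A-B (cost 1) and A-C (cost 5).
table_a = {"A": (0, None), "B": (1, "B"), "C": (5, "C")}
# B advertises its vector: it reaches C over a link of cost 1.
changed = dv_update(table_a, 1, "B", {"A": 1, "B": 0, "C": 1})
print(changed, table_a)  # True {'A': (0, None), 'B': (1, 'B'), 'C': (2, 'B')}
```

When `changed` is true, the router would advertise its new vector to its own neighbors in the next periodic exchange.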
74 Link State Routing
- Each router knows the ID of every router in the network
- Each router maintains a topology map of the whole network
- Each router periodically floods its link state updates (with its direct connectivity information)
- Upon receiving a link state update, a router updates its local map and recalculates the shortest paths
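The shortest-path recalculation is typically done with Dijkstra's algorithm over the local topology map. A minimal sketch, assuming the map is a dictionary router -> {neighbor: link cost} and using a hypothetical four-router topology; for each destination it records the total cost and the first hop, which is what the routing table needs.

```python
import heapq

def shortest_paths(topology, source):
    """Return router -> (cost, first_hop) as seen from `source`."""
    table = {}
    heap = [(0, source, None)]
    done = set()
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        table[node] = (cost, first_hop)
        for nbr, w in topology.get(node, {}).items():
            if nbr not in done:
                # Leaving the source, the first hop is the neighbor itself.
                hop = nbr if node == source else first_hop
                heapq.heappush(heap, (cost + w, nbr, hop))
    return table

topology = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1, "D": 4},
    "C": {"A": 5, "B": 1, "D": 1},
    "D": {"B": 4, "C": 1},
}
table = shortest_paths(topology, "A")
print(table)  # {'A': (0, None), 'B': (1, 'B'), 'C': (2, 'B'), 'D': (3, 'B')}
```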
75 Internet Routing
- Routing Information Protocol (RIP)
- distance vector protocol
- hop count metric
- exchange is done every 30 seconds
- fault detection every 180 seconds
- cheap and easy to implement
- unstable in the presence of faults
- Open Shortest Path First (OSPF)
- link state protocol
- internal hierarchy for better scaling
- optimization for broadcast LANs with routers on them (a designated router represents the whole LAN)
- saves control messages and size
76 Internet Routing (2)
- A hierarchical routing protocol connects networks, each of which runs an internal routing protocol
- RIP and OSPF are common internal protocols
- BGP (Border Gateway Protocol)
- A path vector protocol with additional policy information for each path
- (Path vector protocols have the complete path in each entry, not only the next direct member.)
- Generally used as the hierarchical routing protocol
77 TCP/IP Protocol Architecture
- Developed during work on the packet-switched network ARPANET
- Only 5 independent layers
- application
- transport (host-to-host)
- internet
- network access
- physical
78 Physical Layer of TCP/IP
- Physical interface between a data transmission device and a network
- Deals with signals, data rate, etc.
79 Network Access Layer of TCP/IP
- Concerned with the exchange of data between an end system and the network
- Software used depends on the type of network
- Concerned with access to and routing of data across a network
80 Internet Layer of TCP/IP
- Used when two devices are attached to different networks
- The Internet Protocol (IP) is used to provide the routing function
- Routers are used to relay data from one network to another
81 Transport Layer of TCP/IP
- Provides logic for assuring that data exchanged between hosts is reliably delivered
- The protocol at this layer is the Transmission Control Protocol (TCP)
82 Application Layer of TCP/IP
- Contains protocols for specific user applications
83 Data Transmission via TCP/IP (1)
- TCP may break a block into smaller pieces to make it more manageable
- TCP adds a header to each piece containing
- destination port
- sequence number
- checksum
- TCP hands the message down to IP with instructions to send it to a host
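The checksum in the TCP header is the 16-bit ones' complement Internet checksum described in RFC 1071. A minimal sketch of the computation, assuming plain segment bytes (real TCP also covers a pseudo-header with the IP addresses, omitted here); the test bytes are the example from RFC 1071.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                 # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # big-endian 16-bit words
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF

segment = bytes.fromhex("0001f203f4f5f6f7")
csum = internet_checksum(segment)
print(hex(csum))  # 0x220d
# The receiver checksums data plus checksum; 0 means "no error detected".
print(internet_checksum(segment + csum.to_bytes(2, "big")))  # 0
```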
84 Data Transmission via TCP/IP (2)
- IP hands the message down to the network access layer with instructions to send it to the router
- IP appends a header of control information, e.g. the destination host address; the result is called an IP datagram
- The network access layer appends header information to create a packet or frame
85 Protocol Data Units in TCP/IP
86 TCP Header
87 User Datagram Protocol (UDP)
- Connectionless service
- Does not guarantee delivery
- Adds a port addressing capability to IP
88 Typical TCP/IP Applications
- Simple Mail Transfer Protocol (SMTP)
- File Transfer Protocol (FTP)
- TELNET
89 Simple Mail Transfer Protocol (SMTP)
- Provides a basic electronic mail facility
- Provides a mechanism for transferring messages among separate hosts
- Includes mailing lists, return receipts and forwarding
90 File Transfer Protocol (FTP)
- Used to send files from one system to another under user commands
- Allows user IDs and passwords to be transmitted
- Allows the user to specify the file and file actions as desired
91 TELNET
- Provides remote log-on capability, which enables a user at a terminal or personal computer to log on to a remote computer
- The user functions as if directly connected to the computer
- Remote terminals appear as local to the application
92 Preview
- Implementation of remote communication
- RPC
- Sockets