Title: Distributed Systems (part 1)
1. Distributed Systems (part 1)
- Chris Gill
- cdgill_at_cse.wustl.edu
- Department of Computer Science and Engineering
- Washington University, St. Louis, MO, USA
CSE 591 Area 5 Talk, Monday, November 10, 2008
2. What is a Distributed System?
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." - Leslie Lamport
(BTW, this is entirely "ha ha, only serious" ;-)
3. Key Characteristics of a Distributed System
- Programs on different computers must interact
  - A distributed system spans multiple computers
  - Programs must send information to each other
  - Programs must receive information from each other
  - Programs also need to do some work ;-)
- Programs play different roles in those interactions
  - Send a request (client), process the request (server), send a reply (server), receive and process the reply (client)
  - Remember where to find things (directory, etc. services)
  - Mediate interactions among distributed programs (coordination, orchestration, etc. services)
- Programs can interact in many other ways as well
  - Coordination tuple spaces (JavaSpaces, Linda, LIME)
  - Publish-subscribe and message passing middleware
  - Externally driven (e.g., a workflow management system)
4. Distribution Semantics Matters a Lot
- How are the different computers inter-connected?
  - Does all traffic move on a common data bus?
  - Or, does traffic move across (hierarchical) networks?
  - Or, does traffic move point-to-point between hosts?
- Are there spatial and/or temporal factors?
  - Do hosts' physical locations/movements matter?
  - Is delay noticeable, are bandwidth limits relevant?
  - Are connections always on or can they be intermittent?
- Does the inter-connection topology change?
  - Is the inter-connection topology entirely dynamic?
5. Distribution Semantics Examples (1/3)
- Wired (hierarchical) internet
  - Can reach any host from any other host
  - Hosts are always on and available (no failure, no downtime)
  - Much of the WWW depends on this notion (example?)
[Figure: wired hierarchical network connecting hosts A through J]
6. Distribution Semantics Examples (2/3)
- Nomadic (hierarchical) internet
  - Some hosts are mobile, connect to nearest access point
  - Hosts may be unavailable, but reconnect eventually
  - Host-to-host path topology may change due to this
  - Cell phones, wireless laptops exhibit this behavior
[Figure: nomadic hierarchical network with mobile hosts A through J]
7. Distribution Semantics Examples (3/3)
- Mobile ad hoc networks (MANETs)
  - Mobile hosts connect to each other (w/out access point)
  - Hosts may detect dynamic connection, disconnection
  - Hosts must exploit communication windows of opportunity
  - Enables ad-hoc routing, message mule behaviors
[Figure: mobile ad hoc network of hosts A through J connecting peer-to-peer]
8. Distributed System Example (Wired)
- Real-time avionics middleware
  - Layer(s) between the application and the operating system
  - Ensures non-critical activities don't interfere with timing of critical ones
- Based on other open-source middleware projects
  - ACE C++ library and TAO object request broker
  - Standards-based (CORBA), written in C++/Ada
Flight demonstrations: BBN, WUSTL, Boeing, Honeywell
9. Distributed System Example (Nomadic/MANET)
- Sliver
  - A compact (small footprint) workflow engine for personal computing devices (e.g., cell phones, PDAs)
  - Supports mobile collaboration to assemble and complete automated workflows (task graphs)
  - Standards-based (BPEL, SOAP), written in Java
Developed by Greg Hackmann at WUSTL
10. How do Distributed Systems Interact?
- Remote method invocations are one popular style
  - Allow method calls to be made between programs
  - Middleware uses threads, sockets, etc. to make it so
  - CORBA, Java RMI, SOAP, etc. standardize the details
- Other styles (better for nomadic/mobile settings)
  - Coordination tuple spaces (JavaSpaces, Linda, LIME)
  - Publish-subscribe and message passing middleware
  - Externally driven (e.g., a workflow management system)
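To make the remote-invocation style concrete, here is a hedged sketch of the plumbing that middleware such as CORBA or Java RMI automates: the client side marshals a method name and arguments into a request message, and the server side demultiplexes it to the right method and marshals the reply. All names (`Calculator`, `invoke`, `dispatch`) are illustrative, and an in-process callback stands in for real socket I/O:

```python
import json

# Hypothetical server-side object; real middleware would generate the
# dispatching glue from an interface definition (e.g., CORBA IDL).
class Calculator:
    def add(self, a, b):
        return a + b
    def scale(self, x, factor):
        return x * factor

def dispatch(servant, request_bytes):
    """Server side: unmarshal the request, invoke the named method,
    marshal the reply (a stand-in for an ORB's request demultiplexing)."""
    request = json.loads(request_bytes.decode("utf-8"))
    method = getattr(servant, request["method"])
    result = method(*request["args"])
    return json.dumps({"result": result}).encode("utf-8")

def invoke(send_to_server, method, *args):
    """Client side: marshal the call into a message, 'send' it, and
    unmarshal the reply, as an RMI stub does under the hood."""
    request = json.dumps({"method": method, "args": list(args)}).encode("utf-8")
    reply_bytes = send_to_server(request)   # in real middleware: socket I/O
    return json.loads(reply_bytes.decode("utf-8"))["result"]

# Wire the two halves together directly; a real system would put a
# socket (and threads) between them.
servant = Calculator()
transport = lambda msg: dispatch(servant, msg)
print(invoke(transport, "add", 2, 3))      # 5
print(invoke(transport, "scale", 4, 10))   # 40
```

The value of the middleware is that application code only ever sees the `invoke` side and the servant's methods; marshaling, demultiplexing, and transport stay hidden.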
11. Challenges for (Wired) Distributed Systems
- Distributed systems are inherently complex
  - Remote concurrent programs must inter-operate
  - Interactions must be assured of liveness and safety
- Also must avoid accidental complexity
  - Design for ease of configuration, avoidance of mistakes
  - System architectures and design patterns can help map low level abstractions into appropriate higher level ones
12. How to Abstract Concurrent Event Handling?
Goal: process multiple service requests concurrently using OS level threads
[Figure: server accepting client connections on ports 27098 and 26545]
13. Basis: Synchronous vs. Reactive Read
[Figure: two client/server diagrams. Left, synchronous read: the server blocks in read() on a connection until that client's data arrives. Right, reactive read: the server calls select() over a handle set and only calls read() on a handle once data is ready on it]
14. Approach: Reactive Serial Event Dispatching
[Figure: the application registers event handlers with a Reactor; the Reactor waits in select() on a handle set and, when a client sends data, dispatches a handle_*() upcall to the matching event handler, which performs the read()]
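The dispatching loop sketched above fits in a few lines. Here is a minimal single-threaded reactor sketch in Python, using `select()` over a handle set and OS pipes as stand-in client connections; the names (`Reactor`, `register_handler`, `handle_events`) follow the pattern's vocabulary rather than any real ACE API:

```python
import os
import select

class Reactor:
    """Minimal single-threaded reactor: a handle set plus
    select()-based serial dispatching (a sketch of the pattern)."""
    def __init__(self):
        self.handlers = {}          # handle (fd) -> event handler callable

    def register_handler(self, fd, handler):
        self.handlers[fd] = handler

    def handle_events(self):
        # Wait in the synchronous event demultiplexer...
        ready, _, _ = select.select(list(self.handlers), [], [])
        # ...then serially dispatch one upcall per ready handle.
        for fd in ready:
            self.handlers[fd](fd)

# Demo: two pipes stand in for two client connections.
r1, w1 = os.pipe()
r2, w2 = os.pipe()
received = []

def make_handler(name):
    def handle_input(fd):           # the handler does the read() itself
        received.append((name, os.read(fd, 1024).decode()))
    return handle_input

reactor = Reactor()
reactor.register_handler(r1, make_handler("client1"))
reactor.register_handler(r2, make_handler("client2"))

os.write(w1, b"hello")
os.write(w2, b"world")
reactor.handle_events()             # dispatches both ready handles
print(sorted(received))             # [('client1', 'hello'), ('client2', 'world')]
```

Note the "serial" part: one thread performs all upcalls in turn, which is exactly why a long or blocking upcall delays everyone else.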
15. Interactions among Participants
[Figure: sequence diagram. The main program calls register_handler(handler, event_types) on the Reactor, which calls get_handle() on the concrete event handler; the main program then calls handle_events(); the Reactor invokes select() on the synchronous event demultiplexer, and when an event arrives the Reactor dispatches handle_event() on the concrete event handler]
16. Distributed Interactions with Reactive Hosts
- Application components implemented as handlers
  - Use reactor threads to run input and output methods
  - Send requests to other handlers via sockets, upcalls
- Example of a multi-host request/result chain
  - h1 to h2, h2 to h3, h3 to h4
[Figure: handlers h1 through h4 hosted on reactors r1, r2, and r3, connected by sockets]
17. WaitOnConnection Strategy
- Handler waits on the socket connection for the reply
  - Makes a blocking call to the socket's recv() method
- Benefits
  - No interference from other requests that arrive while the reply is pending
- Drawbacks
  - One less thread in the Reactor for new requests
  - Could allow deadlocks when upcalls are nested
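A hedged sketch of the WaitOnConnection strategy: after sending a request, the upcall blocks in `recv()` on that connection until the reply arrives. A `socketpair` and a helper thread stand in for the remote handler; the function names are illustrative:

```python
import socket
import threading

def serve(peer):
    """Stand-in for the remote handler: receive a request, send a reply."""
    request = peer.recv(1024)
    peer.sendall(b"reply:" + request)
    peer.close()

def handler_upcall(conn):
    """WaitOnConnection: after sending its request, the handler's reactor
    thread blocks in recv() on this connection until the reply arrives.
    While it blocks, that thread can serve no other handles, which is
    where nested upcalls can deadlock."""
    conn.sendall(b"ping")
    reply = conn.recv(1024)         # blocking wait on the connection
    return reply

local, remote = socket.socketpair()
threading.Thread(target=serve, args=(remote,), daemon=True).start()
result = handler_upcall(local)
local.close()
print(result)                       # b'reply:ping'
```

The benefit shows up in the code's simplicity: the reply is consumed inline, with no interleaving. The cost is the thread held hostage for the round trip.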
18. WaitOnReactor Strategy
- Handler returns control to the reactor until the reply comes back
  - Reactor can keep processing other requests while replies are pending
- Benefits
  - Thread available, no deadlock
  - Thread stays fully occupied
- Drawbacks
  - Interleaving of request/reply processing
  - Interference from other requests issued while a reply is pending
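By contrast, a WaitOnReactor sketch: the handler issues its request, arranges for the reply to be handled as a separate upcall, and returns to the reactor. A toy in-memory event queue stands in for the reactor's demultiplexer here, and the `f2`/`f3` names echo the flows discussed on the next slide; none of this is a real API:

```python
from collections import deque

class Reactor:
    """Toy serial reactor driven by an in-memory event queue
    (a sketch; a real reactor waits in select())."""
    def __init__(self):
        self.queue = deque()        # pending (handler, payload) events
        self.log = []

    def post(self, handler, payload):
        self.queue.append((handler, payload))

    def run(self):
        while self.queue:
            handler, payload = self.queue.popleft()
            handler(self, payload)

def request_upcall(reactor, payload):
    """WaitOnReactor: issue the request, then RETURN to the reactor
    instead of blocking; the reply is processed by a later upcall."""
    reactor.log.append("request sent: " + payload)
    # Pretend the remote reply arrives later: it queues up behind
    # whatever other events are already pending.
    reactor.post(reply_upcall, payload)

def reply_upcall(reactor, payload):
    reactor.log.append("reply processed: " + payload)

def other_upcall(reactor, payload):
    reactor.log.append("intervening request: " + payload)

r = Reactor()
r.post(request_upcall, "f2")
r.post(other_upcall, "f3")   # arrives while f2's reply is pending
r.run()
print(r.log)
# ['request sent: f2', 'intervening request: f3', 'reply processed: f2']
```

The log shows both the benefit (the thread never blocks) and the drawback: the intervening f3 upcall runs between f2's request and its reply, which is exactly the interleaving and blocking-factor issue the next slide quantifies.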
19. Blocking with WaitOnReactor
- Wait-on-Reactor strategy could cause interleaved request/reply processing
- Blocking factor could be large or even unbounded
  - Based on the upcall duration
  - And sequence of other intervening upcalls
- Blocking factors may affect real-time properties of other end-systems
- Call-chains can have a cascading blocking effect
[Figure: timeline. f2 issues a request to f5; the f5 reply is queued while an intervening f3 upcall runs; only after f3 completes is the f5 reply processed and f2 completed, so the blocking factor for f2 spans that whole interval]
20. Why not a Stackless WaitOnReactor Variant?
- What if we didn't stack processing of results?
  - But instead allowed them to be handled asynchronously as they are ready
- Stackless Python takes this approach
  - Thanks to Caleb Hines who pointed this out in CSE 532
- Benefits
  - No interference from other requests that arrive when a reply is pending
  - No risk of deadlock as the thread still returns to the reactor
- Drawbacks
  - Significant increase in implementation complexity
  - Time and space overhead to match requests to results (other patterns we cover in CSE 532 could help, though)
21. Could WaitOnConnection Be Used?
- Main limitation is its potential for deadlock
  - And, it offers low overhead, ease of implementation/use
- Could we make a system deadlock-free if we knew its call-graph and were careful about how threads were allowed to proceed?
- Notice that a lot of distributed systems research has this kind of flavor
  - Given one approach (of probably several alternatives)
  - Can we solve problem X that limits its applicability and/or utility?
  - Can we apply that solution efficiently in practice?
  - Does the solution raise other problems that need to be solved?
22. Deadlock Problem in Terms of a Call Graph
- Call graph often can be obtained
- Each reactor is assigned a color
- Deadlock can exist
  - If there exist > Kc segments of color C
  - Where Kc is the number of threads in the node with color C
  - E.g., f3-f2-f4-f5-f2 needs at least 2 + 1 threads
[Figure: call graph over f1 through f5, nodes colored by reactor]
From V. Subramonian and C. Gill, "A Generative Programming Framework for Adaptive Middleware", 2004
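The counting argument above can be sketched directly: split a call chain into maximal runs ("segments") of functions hosted on the same reactor color, then compare the number of segments per color against that reactor's thread count Kc. The coloring below is an assumption for illustration (the deck does not give it): f2, f3, and f5 on a "red" reactor, f1 and f4 on a "blue" one.

```python
from itertools import groupby

def color_segments(chain, color_of):
    """Count maximal same-color segments of a call chain per color."""
    counts = {}
    for color, _ in groupby(chain, key=lambda f: color_of[f]):
        counts[color] = counts.get(color, 0) + 1
    return counts

def deadlock_possible(chain, color_of, threads):
    """Deadlock can exist if some color C contributes more segments
    than the K_c threads available in that reactor."""
    return any(n > threads[c]
               for c, n in color_segments(chain, color_of).items())

# Assumed coloring: which reactor hosts each function.
color_of = {"f1": "blue", "f2": "red", "f3": "red", "f4": "blue", "f5": "red"}
chain = ["f3", "f2", "f4", "f5", "f2"]

print(color_segments(chain, color_of))                     # {'red': 2, 'blue': 1}
print(deadlock_possible(chain, color_of, {"red": 1, "blue": 1}))   # True
print(deadlock_possible(chain, color_of, {"red": 2, "blue": 1}))   # False
```

Under this coloring the chain needs 2 red threads plus 1 blue thread, matching the slide's "2 + 1" example: with only one thread per reactor, the chain can wedge.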
23. Simulation Showing Thread Exhaustion
Clients send requests:
3 Client3 TRACE_SAP_Buffer_Write(13,10)
4 Unidir_IPC_13_14 TRACE_SAP_Buffer_Transfer(13,14,10)
5 Client2 TRACE_SAP_Buffer_Write(7,10)
6 Unidir_IPC_7_8 TRACE_SAP_Buffer_Transfer(7,8,10)
7 Client1 TRACE_SAP_Buffer_Write(1,10)
8 Unidir_IPC_1_2 TRACE_SAP_Buffer_Transfer(1,2,10)
Reactor1 makes upcalls to event handlers:
10 Reactor1_TPRHE1 ---handle_input(2,1)---> Flow1_EH1
12 Reactor1_TPRHE2 ---handle_input(8,2)---> Flow2_EH1
14 Reactor1_TPRHE3 ---handle_input(14,3)---> Flow3_EH1
Flow1 proceeds:
15 Time advanced by 25 units. Global time is 28
16 Flow1_EH1 TRACE_SAP_Buffer_Write(3,10)
17 Unidir_IPC_3_4 TRACE_SAP_Buffer_Transfer(3,4,10)
19 Reactor2_TPRHE4 ---handle_input(4,4)---> Flow1_EH2
20 Time advanced by 25 units. Global time is 53
21 Flow1_EH2 TRACE_SAP_Buffer_Write(5,10)
22 Unidir_IPC_5_6 TRACE_SAP_Buffer_Transfer(5,6,10)
Flow2 proceeds:
23 Time advanced by 25 units. Global time is 78
24 Flow2_EH1 TRACE_SAP_Buffer_Write(9,10)
25 Unidir_IPC_9_10 TRACE_SAP_Buffer_Transfer(9,10,10)
27 Reactor2_TPRHE5 ---handle_input(10,5)---> Flow2_EH2
28 Time advanced by 25 units. Global time is 103
29 Flow2_EH2 TRACE_SAP_Buffer_Write(11,10)
30 Unidir_IPC_11_12 TRACE_SAP_Buffer_Transfer(11,12,10)
Flow3 proceeds:
31 Time advanced by 25 units. Global time is 128
32 Flow3_EH1 TRACE_SAP_Buffer_Write(15,10)
33 Unidir_IPC_15_16 TRACE_SAP_Buffer_Transfer(15,16,10)
35 Reactor2_TPRHE6 ---handle_input(16,6)---> Flow3_EH2
36 Time advanced by 25 units. Global time is 153
37 Flow3_EH2 TRACE_SAP_Buffer_Write(17,10)
38 Unidir_IPC_17_18 TRACE_SAP_Buffer_Transfer(17,18,10)
39 Time advanced by 851 units. Global time is 1004
[Figure: three flows, Flow1-Flow3, from Client1-Client3 through handlers EH11-EH33 on Server1 and Server2, dispatched by Reactor1 and Reactor2]
Formally, increasing the number of reactor threads may not prevent deadlock
24. Solution: New Deadlock Avoidance Protocols
- Papers at FORTE 2005 through EMSOFT 2006
  - http://www.cse.wustl.edu/~cdgill/PDF/forte05.pdf
  - http://www.cse.wustl.edu/~cdgill/PDF/emsoft06_liveness.pdf
- César Sánchez's PhD dissertation at Stanford
  - Collaboration with Henny Sipma and Zohar Manna
- Paul Oberlin's MS project here at WUSTL
- Avoid interactions leading to deadlock
  - a liveness property
- Like synchronization, achieved via scheduling
  - Upcalls are delayed until enough threads are ready
- But, introduces small blocking delays
  - a timing property
  - In real-time systems, also a safety property
25. Deadlock Avoidance Protocol Overview
- Regulates upcalls based on the number of available reactor threads and the call graph's thread height
- Does not allow exhaustion
- BASIC-P protocol implemented in the ACE Thread Pool Reactor
  - Using handle suspension and resumption
  - Backward compatible, minimal overhead
[Figure: same three-flow, two-reactor configuration as before (Client1-Client3, handlers EH11-EH33, Reactor1, Reactor2)]
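The admission idea can be sketched loosely: before a thread-pool reactor dispatches an upcall, it compares the number of free threads against the "height" that the call may still require from this reactor (taken from the call graph), and delays the upcall if too few remain. This is only the flavor of BASIC-P, with hypothetical names, not the published protocol or ACE's actual mechanism:

```python
import threading

class DAReactorPool:
    """Loose sketch of deadlock-avoidance admission in a thread-pool
    reactor: an upcall of height h (how many of this reactor's threads
    the call chain may still need) is admitted only while at least h
    threads remain free."""
    def __init__(self, num_threads):
        self.free = num_threads
        self.cv = threading.Condition()

    def enter_upcall(self, height):
        """Blocking form: delay the upcall until enough threads are free."""
        with self.cv:
            while self.free < height:
                self.cv.wait()
            self.free -= 1

    def try_enter_upcall(self, height):
        """Non-blocking form, used in the demo below."""
        with self.cv:
            if self.free >= height:
                self.free -= 1
                return True
            return False

    def exit_upcall(self):
        with self.cv:
            self.free += 1
            self.cv.notify_all()

pool = DAReactorPool(num_threads=2)
admitted = [pool.try_enter_upcall(2),   # 2 free >= height 2: admitted
            pool.try_enter_upcall(2),   # only 1 free: would be delayed
            pool.try_enter_upcall(1)]   # height 1 still fits
pool.exit_upcall()                      # a finished upcall frees its thread
print(admitted)                         # [True, False, True]
```

Delaying the second upcall is what prevents exhaustion: the pool never commits its last threads to a chain that may still need more of them.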
26. Timing Traces: DA Protocol at Work
[Figure: timing traces for Flow1, Flow2, and Flow3 across reactors R1 and R2, through handlers EH11-EH33]
Timing traces from model/execution show the DA protocol regulating the flows to use available resources without deadlock
27. DA Blocking Delay (Simulated vs. Actual)
[Figure: actual execution vs. model execution, showing the blocking delay for Client2 and for Client3]
28. Overhead of ACE TP Reactor with DA
- Negligible overhead with no DA protocol
- Overhead increases with the number of event handlers because of their suspension and resumption on protocol entry and exit
29. Where Can We Go From Here?
- Distributed computing is ubiquitous
  - in planes, trains, and automobiles
  - in medical devices and equipment
  - in more and more places each day
- Distributed systems offer many research opportunities
  - Discover them from specific problems
  - May allow advances even in well worked areas (e.g., deadlock avoidance)
- What new systems can we build by spanning different platforms?
  - I'll leave that as an open question for you to consider (and ultimately, to answer)