Title: Distributed Systems (part 1)
1. Distributed Systems (part 1)
- Chris Gill
- cdgill_at_cse.wustl.edu
- Department of Computer Science and Engineering
- Washington University, St. Louis, MO, USA
CSE 591 Area 5 Talk, Monday, November 10, 2008
2. What is a Distributed System?
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." - Leslie Lamport
(BTW, this is entirely "ha ha, only serious" ;-)
3. Key Characteristics of a Distributed System
- Programs on different computers must interact
  - A distributed system spans multiple computers
  - Programs must send information to each other
  - Programs must receive information from each other
  - Programs also need to do some work ;-)
- Programs play different roles in those interactions
  - Send a request (client), process the request (server), send a reply (server), receive and process the reply (client)
  - Remember where to find things (directory, etc. services)
  - Mediate interactions among distributed programs (coordination, orchestration, etc. services)
- Programs can interact in many other ways as well
  - Coordination tuple spaces (JavaSpaces, Linda, LIME)
  - Publish-subscribe and message passing middleware
  - Externally driven (e.g., a workflow management system)
4. Distribution Semantics Matters a Lot
- How are the different computers inter-connected?
  - Does all traffic move on a common data bus?
  - Or, does traffic move across (hierarchical) networks?
  - Or, does traffic move point-to-point between hosts?
- Are there spatial and/or temporal factors?
  - Do hosts' physical locations/movements matter?
  - Is delay noticeable, are bandwidth limits relevant?
  - Are connections always on or can they be intermittent?
- Does the inter-connection topology change?
  - Is the inter-connection topology entirely dynamic?
5. Distribution Semantics Examples (1/3)
- Wired (hierarchical) internet
  - Can reach any host from any other host
  - Hosts are always on and available (no failure, no downtime)
  - Much of the WWW depends on this notion (example?)
[Figure: wired hierarchical network connecting hosts A through J]
6. Distribution Semantics Examples (2/3)
- Nomadic (hierarchical) internet
  - Some hosts are mobile, connect to nearest access point
  - Hosts may be unavailable, but reconnect eventually
  - Host-to-host path topology may change due to this
  - Cell phones, wireless laptops exhibit this behavior
[Figure: nomadic hierarchical network with mobile hosts A through J]
7. Distribution Semantics Examples (3/3)
- Mobile ad hoc networks (MANETs)
  - Mobile hosts connect to each other (w/out access point)
  - Hosts may detect dynamic connection, disconnection
  - Hosts must exploit communication windows of opportunity
  - Enables ad-hoc routing, message mule behaviors
[Figure: mobile ad hoc network of hosts A through J connecting peer-to-peer]
8. Distributed System Example (Wired)
- Real-time avionics middleware
  - Layer(s) between the application and the operating system
  - Ensures non-critical activities don't interfere with timing of critical ones
- Based on other open-source middleware projects
  - ACE C++ library and TAO object request broker
  - Standards-based (CORBA), written in C++/Ada
Flight demonstrations: BBN, WUSTL, Boeing, Honeywell
9. Distributed System Example (Nomadic/MANET)
- Sliver
  - A compact (small footprint) workflow engine for personal computing devices (e.g., cell phones, PDAs)
  - Supports mobile collaboration to assemble and complete automated workflows (task graphs)
  - Standards-based (BPEL, SOAP), written in Java
Developed by Greg Hackmann at WUSTL
10. How do Distributed Systems Interact?
- Remote method invocations are one popular style
  - Allow method calls to be made between programs
  - Middleware uses threads, sockets, etc. to make it so
  - CORBA, Java RMI, SOAP, etc. standardize the details
- Other styles (better for nomadic/mobile settings)
  - Coordination tuple spaces (JavaSpaces, Linda, LIME)
  - Publish-subscribe and message passing middleware
  - Externally driven (e.g., a workflow management system)
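To make the remote-invocation style concrete, here is a hedged sketch of the plumbing that middleware such as CORBA or Java RMI automates: the client side marshals a method name and arguments into a request message, and the server side demultiplexes it to the right method and marshals the reply. All names (`Calculator`, `invoke`, `dispatch`) are illustrative, and an in-process callback stands in for real socket I/O:

```python
import json

# Hypothetical server-side object; real middleware would generate the
# dispatching glue from an interface definition (e.g., CORBA IDL).
class Calculator:
    def add(self, a, b):
        return a + b
    def scale(self, x, factor):
        return x * factor

def dispatch(servant, request_bytes):
    """Server side: unmarshal the request, invoke the named method,
    marshal the reply (a stand-in for an ORB's request demultiplexing)."""
    request = json.loads(request_bytes.decode("utf-8"))
    method = getattr(servant, request["method"])
    result = method(*request["args"])
    return json.dumps({"result": result}).encode("utf-8")

def invoke(send_to_server, method, *args):
    """Client side: marshal the call into a message, 'send' it, and
    unmarshal the reply, as an RMI stub does under the hood."""
    request = json.dumps({"method": method, "args": list(args)}).encode("utf-8")
    reply_bytes = send_to_server(request)   # in real middleware: socket I/O
    return json.loads(reply_bytes.decode("utf-8"))["result"]

# Wire the two halves together directly; a real system would put a
# socket (and threads) between them.
servant = Calculator()
transport = lambda msg: dispatch(servant, msg)
print(invoke(transport, "add", 2, 3))      # 5
print(invoke(transport, "scale", 4, 10))   # 40
```

The value of the middleware is that application code only ever sees the `invoke` side and the servant's methods; marshaling, demultiplexing, and transport stay hidden.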
11. Challenges for (Wired) Distributed Systems
- Distributed systems are inherently complex
  - Remote concurrent programs must inter-operate
  - Interactions must be assured of liveness and safety
- Also must avoid accidental complexity
  - Design for ease of configuration, avoidance of mistakes
  - System architectures and design patterns can help map low level abstractions into appropriate higher level ones
12. How to Abstract Concurrent Event Handling?
Goal: process multiple service requests concurrently using OS level threads
[Figure: server accepting client connections on ports 27098 and 26545]
13. Basis: Synchronous vs. Reactive Read
[Figure: two client/server diagrams. Left, synchronous read: the server blocks in read() on a connection until that client's data arrives. Right, reactive read: the server calls select() over a handle set and only calls read() on a handle once data is ready on it]
14. Approach: Reactive Serial Event Dispatching
[Figure: the application registers event handlers with a Reactor; the Reactor waits in select() on a handle set and, when a client sends data, dispatches a handle_*() upcall to the matching event handler, which performs the read()]
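The dispatching loop sketched above fits in a few lines. Here is a minimal single-threaded reactor sketch in Python, using `select()` over a handle set and OS pipes as stand-in client connections; the names (`Reactor`, `register_handler`, `handle_events`) follow the pattern's vocabulary rather than any real ACE API:

```python
import os
import select

class Reactor:
    """Minimal single-threaded reactor: a handle set plus
    select()-based serial dispatching (a sketch of the pattern)."""
    def __init__(self):
        self.handlers = {}          # handle (fd) -> event handler callable

    def register_handler(self, fd, handler):
        self.handlers[fd] = handler

    def handle_events(self):
        # Wait in the synchronous event demultiplexer...
        ready, _, _ = select.select(list(self.handlers), [], [])
        # ...then serially dispatch one upcall per ready handle.
        for fd in ready:
            self.handlers[fd](fd)

# Demo: two pipes stand in for two client connections.
r1, w1 = os.pipe()
r2, w2 = os.pipe()
received = []

def make_handler(name):
    def handle_input(fd):           # the handler does the read() itself
        received.append((name, os.read(fd, 1024).decode()))
    return handle_input

reactor = Reactor()
reactor.register_handler(r1, make_handler("client1"))
reactor.register_handler(r2, make_handler("client2"))

os.write(w1, b"hello")
os.write(w2, b"world")
reactor.handle_events()             # dispatches both ready handles
print(sorted(received))             # [('client1', 'hello'), ('client2', 'world')]
```

Note the "serial" part: one thread performs all upcalls in turn, which is exactly why a long or blocking upcall delays everyone else.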
15. Interactions among Participants
[Figure: sequence diagram. The main program calls register_handler(handler, event_types) on the Reactor, which calls get_handle() on the concrete event handler; the main program then calls handle_events(); the Reactor invokes select() on the synchronous event demultiplexer, and when an event arrives the Reactor dispatches handle_event() on the concrete event handler]
16. Distributed Interactions with Reactive Hosts
- Application components implemented as handlers
  - Use reactor threads to run input and output methods
  - Send requests to other handlers via sockets, upcalls
- Example of a multi-host request/result chain
  - h1 to h2, h2 to h3, h3 to h4
[Figure: handlers h1 through h4 hosted on reactors r1, r2, and r3, connected by sockets]
17. WaitOnConnection Strategy
- Handler waits on the socket connection for the reply
  - Makes a blocking call to the socket's recv() method
- Benefits
  - No interference from other requests that arrive while the reply is pending
- Drawbacks
  - One less thread in the Reactor for new requests
  - Could allow deadlocks when upcalls are nested
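A hedged sketch of the WaitOnConnection strategy: after sending a request, the upcall blocks in `recv()` on that connection until the reply arrives. A `socketpair` and a helper thread stand in for the remote handler; the function names are illustrative:

```python
import socket
import threading

def serve(peer):
    """Stand-in for the remote handler: receive a request, send a reply."""
    request = peer.recv(1024)
    peer.sendall(b"reply:" + request)
    peer.close()

def handler_upcall(conn):
    """WaitOnConnection: after sending its request, the handler's reactor
    thread blocks in recv() on this connection until the reply arrives.
    While it blocks, that thread can serve no other handles, which is
    where nested upcalls can deadlock."""
    conn.sendall(b"ping")
    reply = conn.recv(1024)         # blocking wait on the connection
    return reply

local, remote = socket.socketpair()
threading.Thread(target=serve, args=(remote,), daemon=True).start()
result = handler_upcall(local)
local.close()
print(result)                       # b'reply:ping'
```

The benefit shows up in the code's simplicity: the reply is consumed inline, with no interleaving. The cost is the thread held hostage for the round trip.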
18. WaitOnReactor Strategy
- Handler returns control to the reactor until the reply comes back
  - Reactor can keep processing other requests while replies are pending
- Benefits
  - Thread available, no deadlock
  - Thread stays fully occupied
- Drawbacks
  - Interleaving of request/reply processing
  - Interference from other requests issued while a reply is pending
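By contrast, a WaitOnReactor sketch: the handler issues its request, arranges for the reply to be handled as a separate upcall, and returns to the reactor. A toy in-memory event queue stands in for the reactor's demultiplexer here, and the `f2`/`f3` names echo the flows discussed on the next slide; none of this is a real API:

```python
from collections import deque

class Reactor:
    """Toy serial reactor driven by an in-memory event queue
    (a sketch; a real reactor waits in select())."""
    def __init__(self):
        self.queue = deque()        # pending (handler, payload) events
        self.log = []

    def post(self, handler, payload):
        self.queue.append((handler, payload))

    def run(self):
        while self.queue:
            handler, payload = self.queue.popleft()
            handler(self, payload)

def request_upcall(reactor, payload):
    """WaitOnReactor: issue the request, then RETURN to the reactor
    instead of blocking; the reply is processed by a later upcall."""
    reactor.log.append("request sent: " + payload)
    # Pretend the remote reply arrives later: it queues up behind
    # whatever other events are already pending.
    reactor.post(reply_upcall, payload)

def reply_upcall(reactor, payload):
    reactor.log.append("reply processed: " + payload)

def other_upcall(reactor, payload):
    reactor.log.append("intervening request: " + payload)

r = Reactor()
r.post(request_upcall, "f2")
r.post(other_upcall, "f3")   # arrives while f2's reply is pending
r.run()
print(r.log)
# ['request sent: f2', 'intervening request: f3', 'reply processed: f2']
```

The log shows both the benefit (the thread never blocks) and the drawback: the intervening f3 upcall runs between f2's request and its reply, which is exactly the interleaving and blocking-factor issue the next slide quantifies.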
19. Blocking with WaitOnReactor
- Wait-on-Reactor strategy could cause interleaved request/reply processing
- Blocking factor could be large or even unbounded
  - Based on the upcall duration
  - And sequence of other intervening upcalls
- Blocking factors may affect real-time properties of other end-systems
- Call-chains can have a cascading blocking effect
[Figure: timeline. f2 issues a request to f5; the f5 reply is queued while an intervening f3 upcall runs; only after f3 completes is the f5 reply processed and f2 completed, so the blocking factor for f2 spans that whole interval]
20. Why not a Stackless WaitOnReactor Variant?
- What if we didn't stack processing of results?
  - But instead allowed them to be handled asynchronously as they are ready
- Stackless Python takes this approach
  - Thanks to Caleb Hines who pointed this out in CSE 532
- Benefits
  - No interference from other requests that arrive when a reply is pending
  - No risk of deadlock as the thread still returns to the reactor
- Drawbacks
  - Significant increase in implementation complexity
  - Time and space overhead to match requests to results (other patterns we cover in CSE 532 could help, though)
21. Could WaitOnConnection Be Used?
- Main limitation is its potential for deadlock
  - And, it offers low overhead, ease of implementation/use
- Could we make a system deadlock-free if we knew its call-graph and were careful about how threads were allowed to proceed?
- Notice that a lot of distributed systems research has this kind of flavor
  - Given one approach (of probably several alternatives)
  - Can we solve problem X that limits its applicability and/or utility?
  - Can we apply that solution efficiently in practice?
  - Does the solution raise other problems that need to be solved?
22. Deadlock Problem in Terms of a Call Graph
- Call graph often can be obtained
- Each reactor is assigned a color
- Deadlock can exist
  - If there exist > Kc segments of color C
  - Where Kc is the number of threads in the node with color C
  - E.g., f3-f2-f4-f5-f2 needs at least 2 + 1 threads
[Figure: call graph over f1 through f5, nodes colored by reactor]
From V. Subramonian and C. Gill, "A Generative Programming Framework for Adaptive Middleware", 2004
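The counting argument above can be sketched directly: split a call chain into maximal runs ("segments") of functions hosted on the same reactor color, then compare the number of segments per color against that reactor's thread count Kc. The coloring below is an assumption for illustration (the deck does not give it): f2, f3, and f5 on a "red" reactor, f1 and f4 on a "blue" one.

```python
from itertools import groupby

def color_segments(chain, color_of):
    """Count maximal same-color segments of a call chain per color."""
    counts = {}
    for color, _ in groupby(chain, key=lambda f: color_of[f]):
        counts[color] = counts.get(color, 0) + 1
    return counts

def deadlock_possible(chain, color_of, threads):
    """Deadlock can exist if some color C contributes more segments
    than the K_c threads available in that reactor."""
    return any(n > threads[c]
               for c, n in color_segments(chain, color_of).items())

# Assumed coloring: which reactor hosts each function.
color_of = {"f1": "blue", "f2": "red", "f3": "red", "f4": "blue", "f5": "red"}
chain = ["f3", "f2", "f4", "f5", "f2"]

print(color_segments(chain, color_of))                     # {'red': 2, 'blue': 1}
print(deadlock_possible(chain, color_of, {"red": 1, "blue": 1}))   # True
print(deadlock_possible(chain, color_of, {"red": 2, "blue": 1}))   # False
```

Under this coloring the chain needs 2 red threads plus 1 blue thread, matching the slide's "2 + 1" example: with only one thread per reactor, the chain can wedge.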
23. Simulation Showing Thread Exhaustion
Clients send requests:
3 Client3 TRACE_SAP_Buffer_Write(13,10)
4 Unidir_IPC_13_14 TRACE_SAP_Buffer_Transfer(13,14,10)
5 Client2 TRACE_SAP_Buffer_Write(7,10)
6 Unidir_IPC_7_8 TRACE_SAP_Buffer_Transfer(7,8,10)
7 Client1 TRACE_SAP_Buffer_Write(1,10)
8 Unidir_IPC_1_2 TRACE_SAP_Buffer_Transfer(1,2,10)
Reactor1 makes upcalls to event handlers:
10 Reactor1_TPRHE1 ---handle_input(2,1)---> Flow1_EH1
12 Reactor1_TPRHE2 ---handle_input(8,2)---> Flow2_EH1
14 Reactor1_TPRHE3 ---handle_input(14,3)---> Flow3_EH1
Flow1 proceeds:
15 Time advanced by 25 units. Global time is 28
16 Flow1_EH1 TRACE_SAP_Buffer_Write(3,10)
17 Unidir_IPC_3_4 TRACE_SAP_Buffer_Transfer(3,4,10)
19 Reactor2_TPRHE4 ---handle_input(4,4)---> Flow1_EH2
20 Time advanced by 25 units. Global time is 53
21 Flow1_EH2 TRACE_SAP_Buffer_Write(5,10)
22 Unidir_IPC_5_6 TRACE_SAP_Buffer_Transfer(5,6,10)
Flow2 proceeds:
23 Time advanced by 25 units. Global time is 78
24 Flow2_EH1 TRACE_SAP_Buffer_Write(9,10)
25 Unidir_IPC_9_10 TRACE_SAP_Buffer_Transfer(9,10,10)
27 Reactor2_TPRHE5 ---handle_input(10,5)---> Flow2_EH2
28 Time advanced by 25 units. Global time is 103
29 Flow2_EH2 TRACE_SAP_Buffer_Write(11,10)
30 Unidir_IPC_11_12 TRACE_SAP_Buffer_Transfer(11,12,10)
Flow3 proceeds:
31 Time advanced by 25 units. Global time is 128
32 Flow3_EH1 TRACE_SAP_Buffer_Write(15,10)
33 Unidir_IPC_15_16 TRACE_SAP_Buffer_Transfer(15,16,10)
35 Reactor2_TPRHE6 ---handle_input(16,6)---> Flow3_EH2
36 Time advanced by 25 units. Global time is 153
37 Flow3_EH2 TRACE_SAP_Buffer_Write(17,10)
38 Unidir_IPC_17_18 TRACE_SAP_Buffer_Transfer(17,18,10)
39 Time advanced by 851 units. Global time is 1004
[Figure: three flows, Flow1-Flow3, from Client1-Client3 through handlers EH11-EH33 on Server1 and Server2, dispatched by Reactor1 and Reactor2]
Formally, increasing the number of reactor threads may not prevent deadlock
24. Solution: New Deadlock Avoidance Protocols
- Papers at FORTE 2005 through EMSOFT 2006
  - http://www.cse.wustl.edu/~cdgill/PDF/forte05.pdf
  - http://www.cse.wustl.edu/~cdgill/PDF/emsoft06_liveness.pdf
- César Sánchez's PhD dissertation at Stanford
  - Collaboration with Henny Sipma and Zohar Manna
- Paul Oberlin's MS project here at WUSTL
- Avoid interactions leading to deadlock
  - a liveness property
- Like synchronization, achieved via scheduling
  - Upcalls are delayed until enough threads are ready
- But, introduces small blocking delays
  - a timing property
  - In real-time systems, also a safety property
25. Deadlock Avoidance Protocol Overview
- Regulates upcalls based on the number of available reactor threads and the call graph's thread height
- Does not allow exhaustion
- BASIC-P protocol implemented in the ACE Thread Pool Reactor
  - Using handle suspension and resumption
  - Backward compatible, minimal overhead
[Figure: same three-flow, two-reactor configuration as before (Client1-Client3, handlers EH11-EH33, Reactor1, Reactor2)]
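The admission idea can be sketched loosely: before a thread-pool reactor dispatches an upcall, it compares the number of free threads against the "height" that the call may still require from this reactor (taken from the call graph), and delays the upcall if too few remain. This is only the flavor of BASIC-P, with hypothetical names, not the published protocol or ACE's actual mechanism:

```python
import threading

class DAReactorPool:
    """Loose sketch of deadlock-avoidance admission in a thread-pool
    reactor: an upcall of height h (how many of this reactor's threads
    the call chain may still need) is admitted only while at least h
    threads remain free."""
    def __init__(self, num_threads):
        self.free = num_threads
        self.cv = threading.Condition()

    def enter_upcall(self, height):
        """Blocking form: delay the upcall until enough threads are free."""
        with self.cv:
            while self.free < height:
                self.cv.wait()
            self.free -= 1

    def try_enter_upcall(self, height):
        """Non-blocking form, used in the demo below."""
        with self.cv:
            if self.free >= height:
                self.free -= 1
                return True
            return False

    def exit_upcall(self):
        with self.cv:
            self.free += 1
            self.cv.notify_all()

pool = DAReactorPool(num_threads=2)
admitted = [pool.try_enter_upcall(2),   # 2 free >= height 2: admitted
            pool.try_enter_upcall(2),   # only 1 free: would be delayed
            pool.try_enter_upcall(1)]   # height 1 still fits
pool.exit_upcall()                      # a finished upcall frees its thread
print(admitted)                         # [True, False, True]
```

Delaying the second upcall is what prevents exhaustion: the pool never commits its last threads to a chain that may still need more of them.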
26. Timing Traces: DA Protocol at Work
[Figure: timing traces for Flow1, Flow2, and Flow3 across reactors R1 and R2, through handlers EH11-EH33]
Timing traces from model/execution show the DA protocol regulating the flows to use available resources without deadlock
27. DA Blocking Delay (Simulated vs. Actual)
[Figure: actual execution vs. model execution, showing the blocking delay for Client2 and for Client3]
28. Overhead of ACE TP Reactor with DA
- Negligible overhead with no DA protocol
- Overhead increases with the number of event handlers because of their suspension and resumption on protocol entry and exit
29. Where Can We Go From Here?
- Distributed computing is ubiquitous
  - in planes, trains, and automobiles
  - in medical devices and equipment
  - in more and more places each day
- Distributed systems offer many research opportunities
  - Discover them from specific problems
  - May allow advances even in well worked areas (e.g., deadlock avoidance)
- What new systems can we build by spanning different platforms?
  - I'll leave that as an open question for you to consider (and ultimately, to answer)