Failure Handling in a Modal Language

Learn more at: http://www.cs.cmu.edu
Transcript and Presenter's Notes

1
Failure Handling in a Modal Language
  • Nels Eric Beckman
  • Research Talk
  • Institute for Software Research
  • October 30, 2006

2
Claims Made in this Talk
  • ML5 is an elegant language for programming
    distributed systems.
  • In the face of node failure, the meaning of ML5
    programs becomes unclear.
  • We propose extensions to ML5 that make their
    meaning clear.
  • (In reality, this research is a work in progress.)

3
ML5
  • A Programming Language for Distributed Systems
  • Based on a Modal Logic
  • i.e., a Logic With an Embedded Notion of Place
  • Tom Murphy's Thesis Work
  • Targeted for Grid Programming

4
ML5, Briefly...
  • Allows Hosts to Send Thunks to One Another for
    Execution
  • In practice, code can be more cleanly decomposed.
  • Has An Advanced Type System
  • Location-specific resources can be typed as such.

5-14
RPC-Style Distributed Programming
[Animated figure across slides 5-14. Text shown: "fun a", "fun b", "rpc(b,19.x.x.x) r", "return x". Message labels: "rpc b" (slide 8), "ret x" (slides 12-14). Legend: Host, Active thread, Blocked thread, Message.]
15-22
ML5 Illustration
[Animated figure across slides 15-22. Legend: Host, Location of thread, Migration of thread.]
23
Example
  • Remotely Finding a List's Sum (RPC)
  • Server Code

class ListServ {
  List<Integer> myList = new ...;
  List<Integer> getList() { return myList; }
}

24
Example
  • Remotely Finding a List's Sum (RPC)
  • Client Code

class ListClient {
  ListServerStub myServ = new ...;
  public void foo() {
    List<Integer> list = myServ.getList();
    for (Integer item : list)
      count += item.intValue();
    if (count > 40) ...
  }
}
25
Example
  • Remotely Finding a List's Sum (RPC)
  • To Fix, Should We:
  • Add a new server operation that returns true if a
    list's sum is greater than 40?
  • Weird if the operation is only used once.
  • We wouldn't structure the application this way in a
    centralized setting.
  • Bite the performance bullet and send the whole
    list?

26
Example
  • Remotely Finding a List's Sum (ML5)
  • Before

fun foo remote_host remote_list_ref =
  let fun sum a_list = foldl op+ 0 a_list
  in
    if sum ( get[remote_host]( !remote_list_ref ) ) > 40
    then true else false
  end
27
Example
  • Remotely Finding a List's Sum (ML5)
  • After

fun foo remote_host remote_list_ref =
  let fun sum a_list = foldl op+ 0 a_list
  in
    get[remote_host]( if sum ( !remote_list_ref ) > 40
                      then true else false )
  end
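The difference between the "Before" and "After" versions can be sketched outside ML5. The following is a minimal Python simulation, not ML5 and not the talk's implementation: the Host class, its traffic counter, and the use of pickle to estimate the size of what crosses the network are all assumptions made for illustration.

```python
import pickle

class Host:
    """Toy stand-in for a remote host; `traffic` counts bytes it ships back."""
    def __init__(self, data):
        self.data = data
        self.traffic = 0

    def get(self, thunk):
        """Run `thunk` against this host's data and ship the result back."""
        result = thunk(self.data)
        self.traffic += len(pickle.dumps(result))
        return result

def sum_exceeds_before(host, limit=40):
    # "Before": fetch the whole list, then sum it on the calling host.
    remote_list = host.get(lambda data: list(data))
    return sum(remote_list) > limit

def sum_exceeds_after(host, limit=40):
    # "After": sum on the remote host and ship back only a boolean.
    return host.get(lambda data: sum(data) > limit)

before_host = Host(list(range(1000)))
after_host = Host(list(range(1000)))
before_answer = sum_exceeds_before(before_host)
after_answer = sum_exceeds_after(after_host)
print(before_host.traffic, after_host.traffic)
```

Both versions compute the same answer; only the amount of data returned by the remote host differs, which is the point of moving the get outward.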
28
Types
  • ML5 Type System Embeds a Notion of Place
  • Some values can be used at any place.
  • e.g. Primitive data types, structures
  • Some values can only be used at the location
    where they make sense.
  • e.g. File descriptors, reference cells, printers
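As a rough illustration of location-specific values, here is a small Python sketch. ML5 enforces this discipline statically in its type system; the sketch below only checks it at run time, and the names Located, WrongPlace, use_at, and get are hypothetical.

```python
class WrongPlace(Exception):
    """Raised when a located value is used away from its home host."""

class Located:
    """A value that only makes sense at `home` (e.g. a reference cell)."""
    def __init__(self, value, home):
        self._value = value
        self.home = home

    def use_at(self, host):
        # A run-time stand-in for the check ML5 performs at compile time.
        if host != self.home:
            raise WrongPlace(f"value lives at {self.home}, not {host}")
        return self._value

def get(target, body):
    """Run `body` at `target` and hand its portable result back to the caller."""
    return body(target)

cell = Located(21, home="w")
# Reading the cell inside a get that runs at its home host is fine.
doubled = get("w", lambda here: cell.use_at(here) + cell.use_at(here))

# Reading it directly from another host is rejected.
try:
    cell.use_at("w_prime")
    rejected = False
except WrongPlace:
    rejected = True
```

The integer result is usable anywhere, while the cell itself stays tied to its host.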

29
Just a Few Types
  • t @ w: The type t is well-typed on host w.

30-32
Just a Few Types
  • get[w,a] e: Evaluate e on host w and return
    the result to the current host. Change e's type
    from @ w to @ w', where w' is the current host.
  • Example
      fun foo (x : int ref @ w,
               a : w addr @ w') =
        get[w,a] ( !x + !x )

Typed: int @ w (annotation added on slides 31 and 32)
33
Just a Few Types
  • □t: Suspended code that can be evaluated
    anywhere. Produces a value of type t.
  • Example
      (let fun sum il = foldl op+ 0 il
       in
         box (sum [1,2,3,4,5])
       end) : □int @ w

34
Just a Few Types
  • ◇t: A value of type t that exists at some other
    location.
  • Example
      here (ref 5) : ◇(ref int) @ w

35-38
But What About Host Failure?
  • What happens here?

(* at host 1 *)
get[w_2, a_2](
  (* at host 2 *)
  !int_ref_at_w_2 +
  get[w_3, a_3](
    (* at host 3 *)
    !int_ref_at_w_3))
Host 2 dies!
Throw an exception?
Continue on from Host 3?
39
But What About Host Failure?
  • What happens here?

(* at host 1 *)
get[w_2, a_2](
  (* at host 2 *)
  !int_ref_at_w_2 +
  get[w_3, a_3](
    (* at host 3 *)
    !int_ref_at_w_3) or_if_i_cant_return
  (...)))
Continue on from Host 3?
Throw an exception?
Host 2 dies!
40
But What About Host Failure?
  • What happens here?

(* at host 1 *)
get[w_2, a_2](
  (* at host 2 WHICH DOESN'T EXIST! *)
  !int_ref_at_w_2 +
  get[w_3, a_3](
    (* at host 3 *)
    !int_ref_at_w_3) or_if_i_cant_return (...)))
Continue on from Host 3?
Throw an exception?
Host 2 dies!
41-43
What We Want (Intuitively)
callcc x =>
  (* at host 1 *)
  get[w_2, a_2](
    (* at host 2 *)
    !int_ref_at_h_2 +
    get[w_3, a_3](
      (* at host 3 *)
      !int_ref_at_h_3
      or_if_i_cant_return
        (throw (raise NetFail) to x)))

Don't actually throw something through the
network.
Have host one detect the failure.
44
Isn't This Just a Timeout Exception?
  • A Good Question
  • Why not just have the get operation throw a
    timeout exception, like in Java?
  • e.g.

get[w_2, a_2] ( !int_on_w2 ) handle TimeOut =>
  (* do something *)
45-47
Answers
  1. This is actually a little smarter than just a
    timeout.
  2. The Implicit Spawn Problem

get[w_2, a_2] ( (* extremely complicated op *) )
  handle TimeOut => (* do something *)
[Figure labels added on slide 47: T1, T2.]
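One reading of the implicit spawn problem is that a timed-out caller carries on while the remote work is still running, so there are now two threads where the programmer wrote one. The Python sketch below illustrates that reading with local threads; it is an assumption-laden analogy, not ML5, and remote_op, get_with_timeout, and the effects list are hypothetical names.

```python
import threading
import time

effects = []  # stands in for state on the remote host

def remote_op(done):
    time.sleep(0.2)           # the "extremely complicated op"
    effects.append("wrote")   # side effect lands after the caller gave up
    done.set()

def get_with_timeout(timeout):
    """Start the remote work and wait at most `timeout` seconds for it."""
    done = threading.Event()
    worker = threading.Thread(target=remote_op, args=(done,))
    worker.start()
    if not done.wait(timeout):
        return worker, "TimeOut"   # the caller moves on; the worker does not stop
    return worker, "ok"

worker, outcome = get_with_timeout(0.05)
effects_at_timeout = list(effects)   # nothing has happened yet
worker.join()                        # the abandoned thread still finishes
print(outcome, effects_at_timeout, effects)
```

The timeout handler runs, yet the side effect still happens later, which is why a plain timeout exception does not settle what the program means.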
48
What We Need
  • Share the Fact that Host 1 Has Given Up
  • Kill the Thread ASAP
  • Make That Thread's Actions Irrelevant
  • Each host gets a chance to undo potential
    effects.
  • All with Best Effort
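The requirements above can be sketched as cooperative cancellation with compensating actions. This Python fragment is a simplified illustration under stated assumptions: the shared gave_up flag stands in for whatever mechanism tells other hosts that host 1 has given up, and RemoteHost, run, and steps_with_give_up are hypothetical names.

```python
import threading

class RemoteHost:
    """Toy remote host that records its effects and how to undo them."""
    def __init__(self):
        self.log = []     # effects performed on this host
        self.undo = []    # compensating actions, newest last

    def run(self, steps, gave_up):
        """Perform steps until finished or until the origin has given up."""
        for step in steps:
            if gave_up.is_set():          # the origin's decision has reached us
                while self.undo:          # best effort: roll back, newest first
                    self.undo.pop()()
                return False              # stop the thread as soon as possible
            self.log.append(step)
            self.undo.append(lambda s=step: self.log.remove(s))
        return True

def steps_with_give_up(gave_up):
    yield "write A"
    yield "write B"
    gave_up.set()   # host 1 gives up while the remote thread is mid-operation
    yield "write C"

gave_up = threading.Event()
host = RemoteHost()
completed = host.run(steps_with_give_up(gave_up), gave_up)
print(completed, host.log)
```

After the run, the remote host has stopped early and its earlier effects have been rolled back, so the abandoned thread's actions no longer matter to the rest of the system.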

49-51
One More Wrinkle
[Animated figure across slides 49-51 showing Catom 1 and Catom 2. Captions in order: "Grab continuation" (slide 49), "Assign Catom1 to myLeader" (slide 50).]
52-55
The Design, In Short
try
  e_1
continuing
  e_2
end
  1. Execute e_1.
  2. In the event of node failure, the entire
    expression will throw an exception on this host.
  3. On the other hosts, e_2 will be executed, and its
    value discarded.
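The three steps above can be sketched as a small single-process simulation. This is a hedged illustration in Python, not the ML5 design itself: Host, NodeFailure, try_continuing, and the route parameter are hypothetical, and failure is modeled simply as a host whose alive flag is false. The usage borrows the myLeader idea from the talk's leader example.

```python
class NodeFailure(Exception):
    """Raised at the originating host when a node needed by e_1 has failed."""

class Host:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.state = {}
        self.pending = None   # stored e_2 for this host, if any

def try_continuing(origin, route, e1, e2):
    """Run e1 along `route`; on failure, raise at origin and run e2 elsewhere."""
    visited = []
    try:
        for host in route:
            if not host.alive:
                raise NodeFailure(host.name)
            host.pending = e2          # remember how to clean up on this host
            visited.append(host)
            e1(host)                   # step 1: execute e_1
        for host in visited:           # success: nothing to clean up
            host.pending = None
    except NodeFailure:
        for host in visited:
            if host.alive and host is not origin:
                host.pending(host)     # step 3: run e_2, discard its value
            host.pending = None
        raise                          # step 2: the expression fails at origin

def set_leader(host):
    host.state["myLeader"] = "host_1"

def clear_leader(host):
    if host.state.get("myLeader") == "host_1":
        host.state["myLeader"] = None

h1, h2, h3, h4 = (Host(n) for n in ("host_1", "host_2", "host_3", "host_4"))
h3.alive = False
try:
    try_continuing(h1, [h2, h3, h4], set_leader, clear_leader)
    outcome = "ok"
except NodeFailure:
    outcome = "failed"
print(outcome, h2.state, h4.state)
```

Host 2 was visited before the failure, so its leader assignment is undone; host 4 was never reached and is untouched.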

56
The Design, In Short
  (* host 1 *)
  try
    (* set all of my neighbors'
       myLeader to host 1 *)
  continuing
    if !myLeader = host_1
    then myLeader := NONE
    else ()
  end

57-67
ML5-C Error Continuations
[Animated figure across slides 57-67. Code shown: "try continuing l end". Captions in order: "Store Cont(stack)" (slide 58), "Store Cont(?l)" (slides 59 and 61), "Error!" (slide 64), "Restore Cont." (slide 65), "raise Fail) handle..." (slides 66-67). Legend: Host, Visited Host, Location of thread, Migration of thread.]
68
Interesting Note
  • In Failure Case, We Have to Reason About Client
    and Server.
  • (The avoidance of this was one of the touted
    benefits of ML5!)

69
Future Work
  • This Work is Not Yet Finished
  • More Restrictive Modal Basis
  • Only neighbor catoms are accessible
  • This would be a lower-level language in some
    sense.

70
Thanks!
  • Additional Questions?

71
Failure Handling is More Natural
  • In Claytronics, Failure is Possible at Any
    Moment.
  • Intuitively, it would be nice to say:

try {
  // a complex, multi-host operation
} catch (Failure v) {
  // take an alternate
  // course of action.
}
72
So You Want to See the Typing Rules...
  • Note: These rules represent just a snapshot of
    the work.