Extending MRNet with Scalable Failure Recovery Mechanisms

1
Extending MRNet with Scalable Failure Recovery Mechanisms
Dorian Arnold, Paradyn Project
Paradyn Week, April 29-30, 2008, Madison, WI
2
Today's Talk
  • Brief TBON/MRNet overview
  • State compensation recovery model
  • MRNet fault-tolerant extensions
  • Evaluation

3
Tree-based Overlay Networks for Scalable Performance
  • Scalable data multicast and aggregation
  • Flexible topologies
  • User-defined filters
  • Trade-off: extra processing nodes for performance

[Figure: tree-based overlay network with the front-end (FE) at the root and back-ends (BE) at the leaves]
4
Tree-based Overlay Networks for Scalable Performance
  • Integer Maximum Computation
  • Practically infinite input stream
  • Stateful filters for incremental updates

[Figure: integer maximum computation on a TBON; values from the back-ends (BE) are reduced up the tree and the front-end (FE) holds the maximum, 9]
5
To Infinity and Beyond
  • TBONs provide very scalable performance
  • Several MRNet-based examples today
  • Increasing scale calls for attention to reliability
  • Today: 10^5
  • Tomorrow: 10^6! 10^7? 10^8?
  • Need recovery mechanisms that do not compromise
    scalability or performance

6
Current Reliability Approaches
  • Fail-over
      • Replace failed primary w/ backup replica
      • Quick failure recovery
      • High synchronization/utilization overhead
  • Checkpointing (coordinated)
      • Simplicity
      • Coordination overhead
      • Petascale checkpointing [Elnozahy, Plank '04]
      • Need resources dedicated to fault-tolerance
      • May overload network/storage resources

7
State Compensation
  • Conceptual and empirical frameworks
  • Filter properties and weak consistency
  • Relax recovery model constraints
  • Limit recovery participants
  • Avoid coordination protocols
  • Inherent redundancy
  • No explicit replication
  • Surviving state compensates for lost state
  • Rapid recovery
  • Minimal application perturbation

8
Failure Model
  • Fail-stop (detectable crashes)
  • Any TBON process failure
  • Multiple, simultaneous failures
  • Application process failures
  • Restart or sequential checkpointing

9
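TBON Data Aggregation
$f(in, fs_n) \rightarrow (out, fs_{n+1})$
where $in$ is the input, $fs_n$ the filter state, $out$ the output, and $fs_{n+1}$ the updated filter state.
Filter state encapsulates input history (merges new input): $in \sqcup fs_n \rightarrow fs_{n+1}$
Output is the incremental update: $fs_{n+1} - fs_n \rightarrow out$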
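
A minimal sketch of such a stateful filter for the integer maximum example, assuming the filter state is simply the largest value seen so far and that an output is emitted only when the state changes. The names `max_filter` and `run_filter` are illustrative and are not part of the MRNet API.

```python
from typing import Iterable, List, Optional, Tuple


def max_filter(new_input: int, state: Optional[int]) -> Tuple[Optional[int], int]:
    """One filter step: (input, old state) -> (output, new state)."""
    # Merge the new input into the state (the state summarizes input history).
    new_state = new_input if state is None else max(state, new_input)
    # The output is the incremental update: nothing is emitted if the state is unchanged.
    out = new_state if new_state != state else None
    return out, new_state


def run_filter(stream: Iterable[int]) -> Tuple[List[int], Optional[int]]:
    """Feed a stream through the filter, collecting the emitted updates."""
    state: Optional[int] = None
    outputs: List[int] = []
    for value in stream:
        out, state = max_filter(value, state)
        if out is not None:
            outputs.append(out)
    return outputs, state
```

For the stream 4, 2, 9, 8, 9 this emits an update only for 4 and for the first 9; the other inputs do not change the state.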
10
Complicit Reductions
  • Idempotent
      • Upper/lower bound computations
      • Set unions
      • Graph merging (e.g. STAT, Paradyn)
      • Equivalence class computations
      • Data classification
      • Anomaly detection, ...
  • Non-idempotent
      • Summation, average, ...
      • Counting-based operations → overvaluation
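
The difference can be illustrated with a small sketch, assuming that recovery may cause a child's state to be merged a second time. The merge functions below are illustrative stand-ins, not MRNet filters.

```python
def merge_max(states):
    """Idempotent: merging the same state twice leaves the result unchanged."""
    return max(states)


def merge_union(states):
    """Idempotent: set union ignores repeated elements."""
    return set().union(*states)


def merge_sum(states):
    """Non-idempotent: a repeated state is counted again (overvaluation)."""
    return sum(states)


child_states = [4, 9, 2]
# Assumed recovery scenario: the state holding 9 is delivered twice.
duplicated = child_states + [9]

print(merge_max(child_states), merge_max(duplicated))
print(merge_sum(child_states), merge_sum(duplicated))
```

Maximum and set union give the same answer with or without the duplicate, while the sum grows by the duplicated value.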

11
Inherent Redundancy
Integer Maximum Computation
Joining children's state forms parent's state

[Figure: integer maximum TBON; each parent's state is the maximum of its children's states, up to the value 9 at the front-end (FE)]
12
State Composition
If CPj fails, all state associated with CPj is lost.
TBON Output Theorem: Output depends only on channel states and root filter state.
All-encompassing Leaf State Theorem: Leaf states subsume their sub-tree's state.
Therefore, leaf states can replace lost channel state without changing the computation's semantics.
Compose state from below points of failure for compensation.

[Figure: communication processes CPi, CPj, CPk, and CPl with their filter state, channel state, and output]
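
A rough sketch of this idea for the integer maximum example, assuming a two-level tree in which each internal process joins the states of the leaves below it. The tree layout, the process names, and the helpers `root_state` and `recover` are hypothetical.

```python
def join(states):
    """Join operator for the integer maximum filter."""
    return max(states)


# Hypothetical tree: the root has two internal processes, each with four leaves.
leaves_under = {"cp_j": [2, 4, 1, 2], "cp_k": [6, 8, 9, 2]}


def root_state(tree):
    """Root state is the join of the internal states, each a join of leaf states."""
    return join(join(leaf_states) for leaf_states in tree.values())


def recover(tree, failed, adopter):
    """Drop the failed process; its orphaned leaves resend their state to the adopter."""
    survivors = {cp: list(states) for cp, states in tree.items() if cp != failed}
    survivors[adopter].extend(tree[failed])
    return survivors


before = root_state(leaves_under)
after = root_state(recover(leaves_under, failed="cp_k", adopter="cp_j"))
print(before, after)
```

The state held by the failed process is rebuilt from the leaf states below the failure, so the root sees the same result before and after recovery.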
13
MRNet Extension I: Event Detection Service
  • Detecting Component Failures
      • Premature connection termination
      • Process failures detected immediately
      • Node failures detected via keep-alive
  • Disseminating Failure Information
      • Failures detected by multiple peers
      • Peers use TBON for rapid propagation

[Figure: TBON with the front-end (FE) at the root and back-ends (BE) at the leaves]
14
MRNet Extension I: Event Detection Service
  • New child connections
  • Dynamic topologies at startup/recovery
  • Topologies can expand/change arbitrarily
  • How do new/deleted processes impact existing
    streams?
  • How do we notify application processes about
    topology changes?

15
MRNet Extension II: Tree Reconstruction (Algorithm)
  • Orphans independently rank each adopter
  • Fan-out (overloading/imbalances → decreased
    bandwidth)
  • Depth (more hops → increased latencies)
  • Proximity (virtual topology vs. physical topology)

16
MRNet Extension II: Tree Reconstruction (Algorithm)
  • Weighted random sampling to mitigate overloading the
    same parent

[Figure: each candidate's sort key is derived from its parent weight and a random float]
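
One common way to realize weighted random sampling with a sort key is to raise a uniform random float to the power 1/weight and sort by the result. The sketch below assumes that scheme; the slide does not give the exact key formula, and the weights and the function name `rank_adopters` are hypothetical.

```python
import random
from typing import Dict, List


def rank_adopters(weights: Dict[str, float], rng: random.Random) -> List[str]:
    """Order candidate parents by a weighted random sort key.

    Assumed key: u ** (1 / weight) with u uniform in [0, 1). Larger weights
    push the key toward 1, so better candidates tend to sort first, but
    different orphans still spread across different parents.
    """
    keys = {parent: rng.random() ** (1.0 / weight) for parent, weight in weights.items()}
    return sorted(keys, key=keys.get, reverse=True)


# Hypothetical weights, e.g. favouring low fan-out and shallow depth.
candidate_weights = {"cp_a": 9.0, "cp_b": 1.0}
rng = random.Random(0)
print(rank_adopters(candidate_weights, rng))
```

Because each orphan draws its own random floats, orphans that rank independently are less likely to all choose the same top-weighted parent.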
17
MRNet Extension II: Tree Reconstruction (Update Propagation)
  • Failure report (failed rank)
  • Recovery report (child rank, parent rank)
  • Consistency Issues
      • Stale/missing failure report?
          • Retry on connect failure
      • Stale/missing recovery report?
          • Reconstruction algorithm inputs stale topology
      • Multiple identical reports?
          • Failure/recovery reports are idempotent
      • Conflicting/out-of-order recovery reports?
          • Resolve using incarnation (version) number
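
A sketch of how a process might apply these reports to its local view of the topology, assuming each recovery report carries a per-child incarnation number that increases with every re-adoption. The class `TopologyView` and its fields are illustrative and do not reflect MRNet's actual data structures.

```python
class TopologyView:
    """Local, possibly stale, view of the tree held by one process."""

    def __init__(self, parent_of):
        self.parent_of = dict(parent_of)  # child rank -> parent rank
        self.incarnation = {}             # child rank -> newest incarnation applied
        self.failed = set()               # ranks known to have failed

    def apply_failure(self, failed_rank):
        # Idempotent: applying the same failure report again changes nothing.
        self.failed.add(failed_rank)
        self.parent_of.pop(failed_rank, None)

    def apply_recovery(self, child, parent, incarnation):
        # Stale, duplicate, or out-of-order reports carry an incarnation
        # number that is not newer than the one already applied: ignore them.
        if incarnation <= self.incarnation.get(child, 0):
            return False
        self.incarnation[child] = incarnation
        self.parent_of[child] = parent
        return True


view = TopologyView({1: 0, 2: 0, 3: 1, 4: 1})
view.apply_failure(1)
view.apply_failure(1)          # duplicate report, no effect
view.apply_recovery(3, 2, 2)   # newer report arrives first
view.apply_recovery(3, 0, 1)   # older report arrives late and is ignored
print(view.parent_of[3])
```

Whatever order the two recovery reports arrive in, the view ends up with the parent named in the report with the higher incarnation number.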

18
MRNet Extension III: State Composition
if child fails
    remove failed child
    resume filtering from non-failed children
endif
if parent fails
    compute new parent list
    while failed to connect to list front
        remove list front
    send filter state to parent
endif
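
The pseudocode above could be sketched in runnable form as follows, assuming an in-memory stand-in for the network. `Node`, `try_connect`, and the `received` list are hypothetical and only mimic connecting to a parent and sending it state.

```python
class Node:
    """Toy stand-in for a TBON process."""

    def __init__(self, rank, filter_state=None):
        self.rank = rank
        self.filter_state = filter_state
        self.children = set()
        self.parent = None
        self.received = []  # filter states sent to this node by adopted children

    def on_child_failure(self, child_rank):
        # Remove the failed child; filtering continues with the remaining children.
        self.children.discard(child_rank)

    def on_parent_failure(self, candidates, try_connect):
        # New parent list, best candidate first.
        candidates = list(candidates)
        # While the connection to the list front fails, remove the list front.
        while candidates and not try_connect(candidates[0]):
            candidates.pop(0)
        if not candidates:
            return None  # no adopter reachable
        self.parent = candidates[0]
        self.parent.children.add(self.rank)
        # Send filter state to the new parent so it can compensate for lost state.
        self.parent.received.append(self.filter_state)
        return self.parent


alive = {2}
cp_a, cp_b = Node(1), Node(2)
orphan = Node(5, filter_state=9)
adopter = orphan.on_parent_failure([cp_a, cp_b], lambda node: node.rank in alive)
print(adopter.rank, cp_b.received)
```

Here the first candidate is unreachable, so the orphan falls through to the second one and hands over its filter state.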
19
Evaluation
  • Does compensation really work?
  • How responsive is failure recovery?
  • How do failures affect application performance?

20
Evaluation: Does it Work?
  • Integer equivalence over input stream
  • Same input stream with/without failures
  • Are all input elements produced? YES!
  • Is any erroneous output produced? NO!

INSPECTION PASSED!
21
Evaluation: Is it Responsive?
  • Recovery latency is a function of fan-out!
  • Only orphans actively participate in recovery
  • Adopting parents passively participate
  • How does fan-out impact recovery latency?

22
Evaluation: Is it Responsive?
128^3 = 2,097,152 processes
INSPECTION PASSED!
  • LLNL's Thunder
  • 1024x4 processors
  • 1.4 GHz Itanium2
  • Quadrics QsNetII

23
Current Work
  • Evaluate application perturbation
  • Evaluate tree reconstruction algorithms
  • TBON mechanisms for event dissemination
  • Other compensation mechanisms

INSPECTION PENDING!
24
References
  • Arnold and Miller, "A Scalable Recovery Model for
    Tree-based Overlay Networks," UW Computer
    Sciences Technical Report TR-1626, January 2008.
  • Roth, Arnold, and Miller, "MRNet: A
    Software-based Multicast/Reduction Network for
    Scalable Tools," SC 2003, Phoenix, AZ, November
    2003.

http://www.paradyn.org/mrnet