Middleware for Active Reduction Operations in Distributed Systems
By Nitin Bahadur and Gokul Nadathur
Department of Computer Sciences, University of Wisconsin-Madison
Spring 2000
Talk Outline
- Motivation and Goals
- General Architecture of the middleware
- Components of the middleware
- Providing reliability - handling of node failures
- Applications developed using the middleware
- Performance
- Conclusions and possible extensions
Motivation and Goals
- A middleware for applications that follow the master-worker paradigm
- A scalable framework for communication and for computing over client responses (reduction)
- Unicast does not scale, so use multicast
- Introducing reduction operations dynamically in clients
- A general framework for communication among clients
The Big Picture
[Figure: a Master App and three Client Apps, each running on top of its own ARTL instance.]
ART - Library Architecture
[Figure: layered architecture: Application (with application-specific callbacks), Application API, reduction functions and the framework for processing messages, Event Handler (ARTL-specific messages), ARTL Communication Layer (outgoing messages, incoming packets), Network.]
ARTL messages: 1. query from the master; 2. response from downstream nodes.
Communication Subsystem
- Connection setup
- Connect nodes as a binomial tree
- Send and receive ARTL and application messages
- Detect node failures and act accordingly
- Integrate a restarted node into the current tree structure
Why Use a Binomial Tree
[Figure: a Master App propagating a query to Client Apps, each labelled with the time step in which it is reached. Binomial tree: query propagation time 2. Unicast mechanism: query propagation time 3.]
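The propagation times in the figure follow from a simple round-based model, assuming each node can send one message per time step: over a binomial tree the number of nodes holding the query doubles every step, whereas with unicast the master must send to each client in turn. A minimal Python sketch under that assumption (the function names are hypothetical, not part of ARTL):

```python
def binomial_rounds(num_nodes: int) -> int:
    """Steps for a query to reach all nodes (master included) over a binomial tree.

    Assumes every node that already holds the query forwards it to one
    new node per step, so the informed set doubles each step.
    """
    informed, rounds = 1, 0
    while informed < num_nodes:
        informed *= 2
        rounds += 1
    return rounds


def unicast_rounds(num_nodes: int) -> int:
    """Steps when the master alone sends the query to every client, one per step."""
    return num_nodes - 1


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, binomial_rounds(n), unicast_rounds(n))
```

With four nodes in total (the master plus three clients) this yields 2 steps versus 3; the gap widens to logarithmic versus linear as nodes are added.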
Reduction
[Figure: responses flowing up the tree toward the master, with reduction performed at nodes 5 and 3.]
Example reduction operations: Min(), Max()
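Reducing at intermediate nodes means each node combines its own response with the already-reduced responses of its children and forwards a single value upstream, so the master receives one message per child rather than one per client. A minimal sketch, assuming the tree is given as a hypothetical `children` map and the reduction is an associative binary function such as `min` or `max`:

```python
from typing import Callable, Dict, List


def reduce_up(node: int,
              children: Dict[int, List[int]],
              local_value: Dict[int, float],
              combine: Callable[[float, float], float]) -> float:
    """Return the reduced response that `node` would send to its parent."""
    result = local_value[node]
    for child in children.get(node, []):
        # Each child has already reduced the responses of its own subtree.
        result = combine(result, reduce_up(child, children, local_value, combine))
    return result


if __name__ == "__main__":
    # Hypothetical eight-node binomial tree rooted at node 0.
    children = {0: [1, 2, 4], 1: [3, 5], 2: [6], 3: [7]}
    load = {0: 0.5, 1: 1.2, 2: 0.3, 3: 2.0, 4: 0.9, 5: 0.1, 6: 1.7, 7: 0.8}
    print(reduce_up(0, children, load, min), reduce_up(0, children, load, max))
```

Because `min` and `max` are associative, the result does not depend on where in the tree each partial reduction happens.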
Tree Connection Setup
Tree Setup - Phase I
[Figure: TCP connection setup]
Tree Setup - Phase II
[Figure: TCP connection setup]
Tree Setup - Phase III
[Figure: TCP connection setup]
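One way to model a phased setup like this, assuming nodes are numbered 0 (the master) to n-1 and every node already in the tree opens a TCP connection to one new node per phase, is the standard binomial-tree construction in which phase k connects node r to node r + 2^k. The numbering scheme and function names are illustrative assumptions, not necessarily ARTL's:

```python
from typing import List, Tuple


def setup_phases(num_nodes: int) -> List[List[Tuple[int, int]]]:
    """List the (parent, child) TCP connections opened in each setup phase."""
    phases = []
    step = 1  # 2**k for phase k (0-based)
    while step < num_nodes:
        # Nodes 0..step-1 are already in the tree; each adopts node r + step.
        phases.append([(r, r + step) for r in range(step) if r + step < num_nodes])
        step *= 2
    return phases


def parent_of(rank: int) -> int:
    """Parent in this binomial tree: clear the highest set bit of the rank."""
    if rank <= 0:
        raise ValueError("the master (rank 0) has no parent")
    return rank - (1 << (rank.bit_length() - 1))


if __name__ == "__main__":
    for k, phase in enumerate(setup_phases(8), start=1):
        print(f"phase {k}: {phase}")
```

With eight nodes this produces exactly three phases, and `parent_of` lets any node compute its parent locally without asking the master.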
Inter-Node Communication
[Figure: message format: ARTL header + data]
- Unicast and multicast data transmission
- When ARTL receives an application message for which no receive has been posted, it passes the message to a callback function registered by the application
- When the application explicitly posts a receive, ARTL receives the data on its behalf
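The receive rules above amount to a small dispatch decision on every incoming application message: if the application has posted a receive, the data goes to it; otherwise it goes to the registered callback. A minimal sketch with a hypothetical fixed-size header (message type, source node, payload length); the field layout, constants, and class names are assumptions, not ARTL's actual wire format:

```python
import struct
from collections import deque
from typing import Callable, Deque

# Hypothetical ARTL header: message type, source node id, payload length,
# as three unsigned 32-bit integers in network byte order.
HEADER = struct.Struct("!III")
APP_MSG = 1

Handler = Callable[[int, bytes], None]


def encapsulate(msg_type: int, src: int, data: bytes) -> bytes:
    """Prefix application data with the (hypothetical) ARTL header."""
    return HEADER.pack(msg_type, src, len(data)) + data


class Receiver:
    def __init__(self, callback: Handler) -> None:
        self.callback = callback             # registered by the application
        self.posted: Deque[Handler] = deque()  # receives posted by the application

    def post_receive(self, on_data: Handler) -> None:
        """Application explicitly asks ARTL to receive the next message."""
        self.posted.append(on_data)

    def deliver(self, packet: bytes) -> None:
        msg_type, src, length = HEADER.unpack_from(packet)
        data = packet[HEADER.size:HEADER.size + length]
        if msg_type != APP_MSG:
            return  # ARTL-internal messages (queries, responses) handled elsewhere
        if self.posted:
            self.posted.popleft()(src, data)  # satisfy the oldest posted receive
        else:
            self.callback(src, data)          # no receive posted: use the callback
```

Keeping posted receives in a FIFO queue means messages are matched to receives in the order the application posted them.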
ART - Library Architecture
[Figure: the same layered architecture, with the Event Handler passing ARTL-encapsulated messages.]
Reduction Functions
- Implemented as shared objects
- Sent to clients during the setup phase
- Each reduction function is associated with the particular response it reduces
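Associating each reduction function with the response it reduces can be pictured as a table keyed by response type, filled in when the functions arrive during setup. In ARTL the functions are shared objects (a C implementation would typically load them with `dlopen` and look up symbols with `dlsym`); this Python sketch stands in for dynamic loading with a plain registry, and the class and response-type names are hypothetical:

```python
from functools import reduce
from typing import Callable, Dict, Iterable

ReduceFn = Callable[[float, float], float]


class ReductionTable:
    def __init__(self) -> None:
        self._fns: Dict[str, ReduceFn] = {}

    def install(self, response_type: str, fn: ReduceFn) -> None:
        """Called during setup, when the master ships the reduction code."""
        self._fns[response_type] = fn

    def apply(self, response_type: str, values: Iterable[float]) -> float:
        """Reduce responses of one type with the function installed for that type."""
        fn = self._fns[response_type]  # KeyError if nothing was installed
        return reduce(fn, values)


table = ReductionTable()
table.install("min_load", min)
table.install("max_load", max)
```

Keying by response type lets a client handle several kinds of query at once, each reduced by its own function.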
Event Handler
[Figure: the Event Handler and its Thread Pool, between the Network and the Application.]
Multithreaded Architecture
- No prior knowledge about the behavior of a reduction function
- Exploit concurrency: multiple processors per node
- Static pool of threads: creating and destroying threads is expensive (Firefly RPC)
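A static pool means the worker threads are created once at startup and then repeatedly pull events (for example, incoming responses to reduce) from a shared queue, so no thread is created or destroyed per message. A minimal sketch using Python's `threading` and `queue` modules; the pool size and the representation of events as callables are assumptions:

```python
import queue
import threading
from typing import Callable, List


class StaticThreadPool:
    def __init__(self, num_threads: int) -> None:
        self.events = queue.Queue()
        self.threads: List[threading.Thread] = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(num_threads)
        ]
        for t in self.threads:
            t.start()  # created once, reused for every event

    def _worker(self) -> None:
        while True:
            job = self.events.get()
            if job is None:            # shutdown sentinel: exit this worker
                self.events.task_done()
                return
            try:
                job()                  # e.g. run a reduction function on a response
            finally:
                self.events.task_done()

    def submit(self, job: Callable[[], None]) -> None:
        self.events.put(job)

    def shutdown(self) -> None:
        # One sentinel per worker; each worker consumes exactly one and exits.
        for _ in self.threads:
            self.events.put(None)
        for t in self.threads:
            t.join()
```

In CPython the global interpreter lock limits CPU parallelism, so this shows the structure of the pool rather than the speedup a native-thread implementation could get on multiprocessor nodes.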
Crash Reconfiguration
[Figures: tree reconfiguration after a crash at depth 1 and after a crash at depth 2.]
Crash Detection
- A break in the TCP connection with a parent or child: a signal is received at the other end of the connection
- Periodic refresh messages inform the parent that the child is up and running; this is useful in WAN environments
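The refresh-message scheme can be sketched as a parent that records when it last heard from each child and suspects a child is down once no refresh has arrived within a timeout. The timeout value and class name are assumptions; the current time is passed in explicitly so the logic can be exercised without real delays:

```python
from typing import Dict, List


class RefreshMonitor:
    def __init__(self, timeout: float) -> None:
        self.timeout = timeout  # assumed to span several refresh periods
        self.last_seen: Dict[int, float] = {}

    def add_child(self, child: int, now: float) -> None:
        self.last_seen[child] = now

    def on_refresh(self, child: int, now: float) -> None:
        """Called whenever a periodic refresh message arrives from a child."""
        self.last_seen[child] = now

    def suspected_down(self, now: float) -> List[int]:
        """Children whose last refresh is older than the timeout."""
        return sorted(c for c, t in self.last_seen.items() if now - t > self.timeout)

    def remove_child(self, child: int) -> None:
        self.last_seen.pop(child, None)
```

A timeout of several refresh periods tolerates an occasionally delayed refresh, which matters on a WAN, where a failed peer may not produce an immediate connection error.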
Crash Handling
- The parent of the failed node informs the master
- All nodes are informed of the node failure
- The master recomputes the tree
- If a leaf node fails, no reconfiguration is needed
- If an intermediate node fails, some reconfiguration is required
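The leaf versus intermediate distinction can be made concrete with a child-to-parent map kept by the master. The slides do not spell out the recomputation algorithm; as one simple illustrative strategy (an assumption, not necessarily ARTL's), the master removes the failed node and re-attaches each of its orphaned children to the failed node's own parent:

```python
from typing import Dict, List, Tuple


def handle_failure(parent: Dict[int, int], failed: int) -> List[Tuple[int, int]]:
    """Update `parent` (child -> parent) in place; return new (parent, child) links.

    The master (root) has no entry in `parent` and is assumed never to fail.
    """
    grandparent = parent.pop(failed)
    orphans = sorted(c for c, p in parent.items() if p == failed)
    # Leaf failure: there are no orphans, so no other node has to reconnect.
    new_links = []
    for child in orphans:
        parent[child] = grandparent
        new_links.append((grandparent, child))
    return new_links
```

This keeps the tree connected with few new connections, but repeated failures let it drift away from a balanced binomial shape; recomputing a fresh binomial tree over the surviving nodes would restore logarithmic depth at the cost of re-establishing more connections.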
Node Restart
- The restarted node contacts the master to report its restart
- The master sends it the current state of the network and the shared object(s)
- All nodes are informed of the node restart
- The master recomputes the tree and informs the new node's parent about its new child
- The parent and child establish connections
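The restart sequence is a short protocol between the restarted node, the master, and the node's new parent. A minimal sketch of the master's side, assuming for illustration that the master keeps the tree as a binomial tree over an ordered list of live node IDs and attaches the restarted node at the next free position; the message names and this placement rule are hypothetical:

```python
from typing import List, Tuple

Message = Tuple[str, int, object]  # (kind, destination node, payload)


def binomial_parent(pos: int) -> int:
    """Parent position in a binomial tree: clear the highest set bit."""
    return pos - (1 << (pos.bit_length() - 1))


def handle_restart(live: List[int], node: int,
                   shared_objects: List[str]) -> List[Message]:
    """Master's reaction to `node` reporting a restart; `live[0]` is the master."""
    # 1. Send the restarted node the current network state and shared objects.
    msgs: List[Message] = [("STATE", node, (list(live), list(shared_objects)))]
    # 2. Inform every other live node of the restart.
    msgs += [("RESTARTED", other, node) for other in live[1:]]
    # 3. Recompute the tree: here, simply attach at the next free position.
    live.append(node)
    new_parent = live[binomial_parent(len(live) - 1)]
    # 4. Tell the parent about its new child; the two then open a connection.
    msgs.append(("NEW_CHILD", new_parent, node))
    return msgs
```

Appending at the next free position leaves every existing parent-child link untouched, so only the new parent has to open a connection.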
SysMon - A System Monitor
- Monitors the load average from /proc and displays the minimum, maximum, and average load
- Per-node load is also displayed
- Uses the ARTL reduction operations Min, Max, and Average
- Node failures are detected, and SysMon pops up an alert
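Min and Max can be reduced directly at every intermediate node, but an average cannot: averaging sub-averages gives the wrong answer when subtrees differ in size. A common workaround, sketched here as an assumption about how such a reduction could be built, is to forward a partial result (min, max, sum, count) and divide only at the master. The parser reads the first field of a `/proc/loadavg` line, which Linux defines as the 1-minute load average; the class and function names are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadSummary:
    lo: float
    hi: float
    total: float
    count: int

    @staticmethod
    def from_node(load: float) -> "LoadSummary":
        return LoadSummary(load, load, load, 1)

    def combine(self, other: "LoadSummary") -> "LoadSummary":
        """Associative merge, safe to apply at any intermediate node."""
        return LoadSummary(min(self.lo, other.lo), max(self.hi, other.hi),
                           self.total + other.total, self.count + other.count)

    @property
    def average(self) -> float:
        return self.total / self.count


def one_minute_load(loadavg_line: str) -> float:
    """Parse a line in /proc/loadavg format; the first field is the 1-minute load."""
    return float(loadavg_line.split()[0])

# On Linux, a node could obtain its line with: open("/proc/loadavg").read()
```

For example, with loads 0.5, 1.5, and 1.0, averaging 0.5 with the subtree average 1.25 gives 0.875, whereas the true mean, computed from sum and count, is 1.0.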
File Transfer Application
- Transfers a file from the master to all clients
- The file can be executed at the clients, if required:
  - execution can start immediately on receiving the file
  - execution can be delayed until all nodes have received the file
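Delayed execution requires the master to know when every client has the file, which amounts to a barrier: clients acknowledge receipt, and only after the last acknowledgement does the master tell everyone to execute. A minimal sketch of the master-side bookkeeping, with hypothetical message and class names; in a tree, the acknowledgements could themselves be reduced on the way up (for instance as a count), but here they are tracked individually:

```python
from typing import Iterable, List, Set, Tuple


class DelayedExecution:
    def __init__(self, clients: Iterable[int]) -> None:
        clients = list(clients)
        self.all_clients = sorted(set(clients))
        self.pending: Set[int] = set(clients)  # clients still missing the file
        self.released = False

    def on_ack(self, client: int) -> List[Tuple[str, int]]:
        """Record a 'file received' ack; return EXECUTE messages once all are in."""
        self.pending.discard(client)
        if self.pending or self.released:
            return []
        self.released = True
        return [("EXECUTE", c) for c in self.all_clients]
```

A real implementation would also need to drop a crashed client from `pending`, otherwise a single failure would block the barrier forever.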
File Transfer Performance
Total Startup Time vs. Number of Nodes
[Figure: total startup time vs. number of nodes; client processes started using ssh on different machines.]
Conclusions and Extensions
- A middleware for dynamic operations
- Support for crash detection, recovery, and dynamic processes
- Demonstrated near-optimal speedup using real applications
Possible extensions:
- Making the response function dynamic (active services)
- Differential scheduling in the thread scheduler for QoS
- Making dynamic code secure