Towards a Portable Fault Tolerant Component Framework for MPI

1
Towards a Portable Fault Tolerant Component
Framework for MPI
  • Hideyuki Jitsumoto and Satoshi Matsuoka
  • <jitsumo0@is.titech.ac.jp>
  • Dept. of Mathematical and Computing Sciences,
    Tokyo Institute of Technology

2
The MPI we want
  • The ideal MPI
    • Does not stop when a fault occurs
      • Scientific calculations take a long time
      • Clusters and Grids have low reliability
    • Can be used easily
    • Can be used everywhere
      • There are various OS/HW combinations (Linux, Windows / GbE,
        Myrinet, InfiniBand)

3
Related work (1/3)
  • LAM/MPI [Burns et al. 94] / OpenMPI [Gabriel et al. 03]
    • Modularized
      • Hardware-dependent code is replaceable
      • But the modules mainly cover the communication method and
        give little consideration to the fault/recovery model
    • Weak recovery protocol
      • Every failure is recovered by checkpointing/restart
      • Restart is not automatic (there is no daemon or monitor)
  • FT-MPI [Fagg et al. 99]
    • User code is modified for FT
      • Users can implement their own recovery model
      • But the coding cost for the user is high

4
Related work (2/3): conclusion
  • Current fault tolerant MPI implementations
    • Cannot be used easily
      • Users must implement an FT method adapted to each
        execution environment
    • Cannot be used everywhere
      • Only a single recovery method is available

5
Related work (3/3): comparison
  • Interoperable FT technique: has no HW-dependent FT code,
    or that code can be replaced
  • Transparent: MPI applications run unchanged
  • Various Fault/Recovery Models: deals with the various
    faults of each user environment
  • *1: MPICH-V can be adapted to various environments by
    choosing among V1, V2, V/CL and V3 as appropriate

We need the extensibility and ease of use of LAM/MPI together
with the flexibility of FT-MPI.
6
Our Goal
  • Cuckoo MPI System: a fault/recovery-model-aware,
    component-based fault tolerant MPI
  • Portable
    • components for adapting to different underlying
      computing environments
  • Flexible
    • components for handling different fault and
      recovery models
  • Transparent
    • transparent to user code: components handle the
      different execution phases and the appropriate
      recovery

7
Recovery Protocol (1/2): protocols
  • Cuckoo MPI uses the following recovery protocols (RP)
    • IGNORE: ignore the fault
    • Checkpointing/Restart, Migration
      • RESTART: restart on the same node
      • MIGRATION: migrate to a different node
    • Process Replication
      • TRANSFER: promote a replica process to the primary
        one

8
Recovery Protocol (2/2)
  • Recovery cost and the set of recoverable faults differ
    from protocol to protocol

The recovery protocol must be selected according to the kind
of fault (the fault model) so that overhead is reduced.
9
Fault Model on MPI process
  • Physical Fault
    • A fault occurred in the hardware
    • Recoverable by MIGRATION/TRANSFER
  • Network Fault
    • The last redundant path was cut, or performance
      degraded severely
    • Recoverable by IGNORE/MIGRATION/TRANSFER
  • Process Fault
    • The process ended abnormally (ABEND)
    • Recoverable by RESTART/TRANSFER
  • Etc.

Cuckoo MPI has modules that select the appropriate RP for
each fault.
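The fault-to-protocol table on this slide can be written down as a small lookup. The following is only a sketch under assumptions: the enum names, the bit-flag encoding, and the helpers rp_can_recover and rp_select are hypothetical and are not taken from Cuckoo MPI. Only the table contents come from the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Recovery protocols named on the slides, encoded as bit flags
 * (the encoding is an assumption made for this sketch). */
enum rp {
    RP_IGNORE    = 1 << 0,
    RP_RESTART   = 1 << 1,
    RP_MIGRATION = 1 << 2,
    RP_TRANSFER  = 1 << 3
};

/* Fault kinds from the fault model slide. */
enum fault { FAULT_PHYSICAL, FAULT_NETWORK, FAULT_PROCESS, FAULT_KINDS };

/* Which protocols can recover which fault, copied from the slide. */
static const unsigned recoverable_by[FAULT_KINDS] = {
    [FAULT_PHYSICAL] = RP_MIGRATION | RP_TRANSFER,
    [FAULT_NETWORK]  = RP_IGNORE | RP_MIGRATION | RP_TRANSFER,
    [FAULT_PROCESS]  = RP_RESTART | RP_TRANSFER
};

/* Hypothetical helper: nonzero if protocol p can recover fault f. */
int rp_can_recover(enum fault f, enum rp p)
{
    assert((unsigned)f < FAULT_KINDS);
    return (recoverable_by[f] & (unsigned)p) != 0;
}

/* Hypothetical helper: walk a caller-supplied preference list
 * (for example, cheapest protocol first) and return the first
 * protocol that can recover fault f, or 0 if none can. */
int rp_select(enum fault f, const enum rp *prefs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (rp_can_recover(f, prefs[i]))
            return (int)prefs[i];
    return 0;
}
```

With the preference list { RP_IGNORE, RP_RESTART, RP_MIGRATION, RP_TRANSFER }, a process fault resolves to RP_RESTART and a physical fault to RP_MIGRATION. The order of the list stands in for the recovery-cost ranking, which the slides do not quantify.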
10
Components
[Diagram: component framework. Group labels: Parallel FT Protocol, FT Daemon, Monitoring Tools Interface, RP, Fault Detector. Component labels: Special Network, Replicator, Checkpointer, Process FD, Physical FD, Network FD, Restart, Ignore, Migration.]
  • components for supporting parallel FT algorithms
  • components for adapting to different HW
  • components for defining the fault model
  • components to recover a process

11
What flexibility does the component give?
  • A method that deals effectively with communication
    frequency, bandwidth, number of processes, and so on
  • HW independence
  • The fault models that the system can deal with
  • The method of recovery (e.g., how the migration node
    is selected)

[Diagram labels: Parallel FT Protocol, FT Daemon, Monitoring Tools Interface, RP, Fault Detector]

12
Recovery Model (1/3)
[Diagram: Process, Monitor, and FT Daemon, with the protocols RESTART, MIGRATION, TRANSFER, IGNORE]
1. Invoke checking function periodically
2. Acquire node status
3. Fault detection
4. Select the appropriate recovery protocol
5. Recovery
13
Recovery Model (2/3)
Case: the process ends abnormally (ABEND) while the node is alive.
  • If (process != alive, node == alive), select the RESTART
    protocol
  • If the fault has occurred many times (e.g., twice), select
    the MIGRATION protocol
Even if a process fault occurs, the node is still alive, so we
want to use RESTART. But if the process fault occurs many times,
it may actually be a physical fault, so we then want to use
MIGRATION.
[Diagram: Monitor, FT Daemon, and Process. The FT Daemon first selects RESTART, then selects MIGRATION once the fault has occurred twice. Protocols shown: RESTART, MIGRATION, TRANSFER, IGNORE.]
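The RESTART-then-MIGRATION rule on this slide can be sketched as a small decision function. Everything named here (struct proc_state, select_rp, MIGRATION_THRESHOLD) is hypothetical. The threshold of two comes from the slide's "e.g., twice". Choosing MIGRATION when the node itself is down is an assumption based on the fault model slide, not something this slide states.

```c
#include <assert.h>
#include <stddef.h>

enum rp_choice { CHOOSE_NONE, CHOOSE_RESTART, CHOOSE_MIGRATION };

/* Hypothetical per-process view kept by the FT daemon. */
struct proc_state {
    int process_alive;  /* nonzero while the MPI process runs */
    int node_alive;     /* nonzero while its node responds */
    int fault_count;    /* process faults seen so far on this node */
};

/* "Many times (e.g., twice)" from the slide. */
#define MIGRATION_THRESHOLD 2

/* Decide what to do after one status check. A dead process on a
 * live node is restarted in place; once faults repeat, the node
 * is suspected of a physical fault and the process is migrated. */
enum rp_choice select_rp(struct proc_state *s)
{
    assert(s != NULL);
    if (s->process_alive)
        return CHOOSE_NONE;          /* nothing to recover */
    if (!s->node_alive)
        return CHOOSE_MIGRATION;     /* assumption: node is gone */
    s->fault_count++;
    if (s->fault_count >= MIGRATION_THRESHOLD)
        return CHOOSE_MIGRATION;     /* repeated fault on this node */
    return CHOOSE_RESTART;           /* first fault: cheap recovery */
}
```

Calling select_rp twice on a state with process_alive = 0 and node_alive = 1 yields CHOOSE_RESTART and then CHOOSE_MIGRATION, which matches the two arrows in the diagram.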
14
Recovery Model (3/3)
Case: there is no free node when a process migrates.
  • Make a cold-swap node hot and assign the processes to it?
  • Suspend the MPI process?
  • Assign the processes to a busy node?
[Diagram: Monitor, FT Daemon, and Process, with the protocols RESTART, MIGRATION, TRANSFER, IGNORE]
15
Impl. - MPI Process (1/3)
  • In p4mpd, all MPI communication uses only p4_sendx,
    p4_recv, and p4_message_available
    • wrap these functions (e.g., for logging or message
      drain)
  • The process handles messages from mpdman in a signal
    handler
    • add a function to the handler that parses extra
      operations (for FT)

[Diagram: MPI process stack. Labels: Application, MPI, ADI, MPICH, p4mpd, CH interface, Cuckoo IF, Cuckoo Component, FT Protocol, Cuckoo, Checkpointer]
16
Impl. - MPD/mpdman (2/3)
  • MPD/mpdman handle messages from lhs/rhs
  • Add functions to parse extra operations sent from
    • lhs mpdman to mpdman
    • rhs mpdman to mpdman
    • MPD to mpdman
    • lhs MPD to MPD
    • rhs MPD to MPD
    • mpdman to MPD
  • Add a function that reconstructs the ring of
    mpdman

[Diagram: MPD/mpdman stack. Labels: MPD/mpdman, CH interface, Cuckoo IF, Cuckoo Component, Parallel FT Protocol, Monitoring Tools, Fault Detector, Recovery Protocol]
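The slide does not show how the mpdman ring is reconstructed. As a rough illustration, the sketch below models the ring as arrays of left-hand (lhs) and right-hand (rhs) neighbour indices and bypasses a failed member by linking its two neighbours. The struct, the function names, and the convention that rhs points to index i + 1 are all assumptions. The real MPD code keeps its neighbours as socket connections rather than array indices.

```c
#include <assert.h>

#define RING_MAX 16

/* Hypothetical model of a ring: each member knows its
 * left-hand (lhs) and right-hand (rhs) neighbour. */
struct ring {
    int n;
    int lhs[RING_MAX];
    int rhs[RING_MAX];
    int alive[RING_MAX];
};

/* Build a ring 0 -> 1 -> ... -> n-1 -> 0 (following rhs). */
void ring_init(struct ring *r, int n)
{
    assert(n > 0 && n <= RING_MAX);
    r->n = n;
    for (int i = 0; i < n; i++) {
        r->lhs[i] = (i + n - 1) % n;
        r->rhs[i] = (i + 1) % n;
        r->alive[i] = 1;
    }
}

/* Reconstruct the ring after member `failed` is lost: its two
 * neighbours are connected directly to each other. */
void ring_remove(struct ring *r, int failed)
{
    assert(failed >= 0 && failed < r->n && r->alive[failed]);
    int left = r->lhs[failed];
    int right = r->rhs[failed];
    r->rhs[left] = right;
    r->lhs[right] = left;
    r->alive[failed] = 0;
}
```

In a four-member ring, removing member 2 leaves member 1 with rhs 3 and member 3 with lhs 1, so a message forwarded along rhs still visits every surviving member.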
17
Impl. (3/3)
Communication
  p4mpd
    int p_p4_sendx() {
        if (ck_rbif->send != NULL)
            return ck_rbif->send();
        else
            return p4_sendx();
    }
  Cuckoo Interface
    int init() {
        ck_rbif->send = dlsym(RB_send);
    }
  FT / Parallel FT Protocol
    int RB_send() {
        /* FT code, e.g. logging */
        res = ck_ch_send();
        /* FT code */
        return res;
    }
  CH Interface
    int ck_ch_send() {
        return p4_sendx();
    }

Extra operation parsing
  p4mpd
    void Mpd_man_msg_handler() {
        if (strncmp(buf, "cmd= ... ) ...
        ...
        else if ((strcmp(buf, "cmd=addin_") == 0) &&
                 (ck_rbif->eman != NULL))
            ck_rbif->eman();
    }
  Cuckoo Interface
    int init() {
        ck_rbif->eman = dlsym(RB_eman);
    }
  FT / Parallel FT Protocol
    int RB_eman() {
        cmd = ck_ch_getval("cmd");
        if (strcmp(cmd, "rdy") == 0) ...
    }
  CH Interface
    int ck_ch_getval() {
        return mpd_getval_r();
    }
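The wrapper pattern on this slide (call the FT component when one is loaded, otherwise fall back to the plain send) can be tried in isolation. The sketch below keeps the slide's names p_p4_sendx, RB_send, ck_ch_send and ck_rbif, but everything else is an assumption: fake_p4_sendx stands in for the real p4_sendx, whose signature is different, the two counters exist only to make the call path observable, and ck_init assigns the hook directly where the slide resolves it with dlsym.

```c
#include <assert.h>
#include <stddef.h>

/* Table of optional FT hooks; NULL means no component is loaded. */
struct ck_rbif_table {
    int (*send)(int dest, const char *msg);
};

static struct ck_rbif_table ck_rbif_storage = { NULL };
static struct ck_rbif_table *ck_rbif = &ck_rbif_storage;

static int sends_done;    /* calls that reached the transport */
static int sends_logged;  /* calls that passed through the FT hook */

/* Stand-in for p4_sendx(); simplified, hypothetical signature. */
static int fake_p4_sendx(int dest, const char *msg)
{
    assert(msg != NULL);
    (void)dest;
    sends_done++;
    return 0;
}

/* CH interface: the only path from a component to the transport. */
int ck_ch_send(int dest, const char *msg)
{
    return fake_p4_sendx(dest, msg);
}

/* FT component: FT work (here, just counting) around the send. */
int RB_send(int dest, const char *msg)
{
    sends_logged++;                 /* FT code, e.g. logging */
    return ck_ch_send(dest, msg);
}

/* Wrapper that replaces direct calls to the transport. */
int p_p4_sendx(int dest, const char *msg)
{
    if (ck_rbif->send != NULL)
        return ck_rbif->send(dest, msg);
    return fake_p4_sendx(dest, msg);
}

/* The slide fills the hook via dlsym; assigned directly here. */
void ck_init(void)
{
    ck_rbif->send = RB_send;
}
```

Before ck_init is called, p_p4_sendx goes straight to the transport. Afterwards every send passes through RB_send first, and the application code never changes. This is the transparency property the deck claims for the design.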
18
Evaluation (1/3): Facility
  • FT-MPI
    • The user must implement fault tolerance
    • The application program is modified

[Diagram: the User and Cluster Administrator implement fault tolerance in the Application, which yields the FT Application]
19
Evaluation (2/3): Facility
  • Cuckoo FTMPI
    • The user only selects the components
    • The application program is not modified

[Diagram: a component publisher (IT specialist) implements the FT components (Parallel FT Protocol, Fault Detector, Recovery Protocol). The User and Cluster Administrator select components and combine them with the Application to obtain the FT Application. Easy!]

The load on the user is reduced while flexibility is kept.
20
Evaluation (3/3) - Performance
  • Implementation is in progress.
  • I'll report the results as soon as the implementation
    and evaluation are finished.

21
Future Work
  • Apply to MPICH-2
  • Interaction with other software
    • RI2N (Boku @ Tsukuba Univ.)
    • Speculative CKPT (Yamagata @ Tokyo Institute of
      Tech.)
    • Fault Injector (Maruyama @ Tokyo Institute of
      Tech.)
    • An overview is available at
      http://www.para.tutics.tut.ac.jp/megascale/research.html
  • Dynamic FT processing (e.g., changing the
    checkpointing cycle)