Diskless Checkpointing - PowerPoint PPT Presentation

Provided by: KL7471

1
Diskless Checkpointing
  • 15 Nov 2001

2
Motivation
  • Checkpointing on Stable Storage
  • Disk access is a major bottleneck!
  • Incremental Checkpointing
  • Copy-on-write
  • Compression
  • Memory Exclusion
  • Diskless Checkpointing

3
Diskless?
  • Extra memory is available (e.g. in a NOW, a
    network of workstations)
  • Use memory instead of disk
  • Good: network bandwidth > disk bandwidth
  • Bad: memory is not stable

4
Bottom-line
  • NOW with (n+m) processors
  • The application runs on exactly n procs,
  • and should proceed as long as
  • the number of processors in the system is at
    least n, and
  • the failures occur within certain constraints

(Figure: application processors (n) + chkpnt processors (m) = available processors (n+m))
5
Overview
  • Coordinated Chkpnt (Sync-and-Stop)
  • To checkpoint,
  • Application procs: chkpnt their state in memory
  • Chkpnt procs: encode the application chkpnts and
    store the encodings in memory
  • To recover,
  • Non-failed procs: roll back
  • Replacement processors are chosen
  • Replacement procs: calculate the chkpnts of the
    failed procs using the other chkpnts and the
    encodings

6
Outline
  • Application Processor Chkpnt
  • Disk-based
  • Diskless
  • Incremental
  • Forked (or copy-on-write)
  • Optimization
  • Encoding the chkpnts
  • Parity (RAID level 5)
  • Mirroring
  • 1-Dimensional Parity
  • 2-Dimensional Parity
  • Reed-Solomon Coding
  • Optimization
  • Result

7
Application Processor Chkpnt
  • Goal
  • The processor should be able to roll back to its
    most recent chkpnt.
  • Need to tolerate failures that occur during a chkpnt
  • Make sure that each coordinated chkpnt remains
    valid until the next coordinated chkpnt has been
    completed.

8
Disk-based Chkpnt
  • To chkpnt
  • Save all values in the stack, heap, and registers
    to disk
  • To recover
  • Overwrite the address space with the stored
    checkpoint
  • Space Demands
  • 2M on disk (the previous chkpnt must stay valid
    until the new one is complete)

(M = the size of an application processor's
address space)
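The 2M disk demand follows from the validity rule on slide 7: the previous chkpnt must survive until the new one is completely written. A minimal sketch, assuming the application state can be serialized into one byte string (a real implementation saves stack, heap, and registers): write the new chkpnt to a temporary file beside the old one, then switch atomically with `os.replace`. `disk_checkpoint` and `disk_recover` are hypothetical names.

```python
import os
import tempfile


def disk_checkpoint(state: bytes, path: str) -> None:
    """Save `state` to `path`; the old chkpnt stays valid until the new one is complete."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new chkpnt beside the old one: both exist at once (2M on disk).
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(state)
            f.flush()
            os.fsync(f.fileno())  # force the new chkpnt to stable storage
        os.replace(tmp, path)     # atomic switch from the old chkpnt to the new one
    except BaseException:
        os.unlink(tmp)            # a failed write leaves the old chkpnt untouched
        raise


def disk_recover(path: str) -> bytes:
    """Return the most recent complete chkpnt."""
    with open(path, "rb") as f:
        return f.read()
```

While the temporary file is being written, both copies coexist on disk, which is where the factor of 2 comes from.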
9
Simple Diskless Chkpnt
  • To chkpnt
  • Wait until the encoding is calculated
  • Overwrite the diskless chkpnt in memory
  • To recover
  • Roll back from the in-memory chkpnt
  • Space Demands
  • Extra M in memory

(M = the size of an application processor's
address space)
10
Incremental Diskless Chkpnt
  • To chkpnt
  • Initially set all pages R_ONLY
  • On a page fault, copy the page, then set it RW
  • To recover
  • Restore all RW pages from their copies
  • Space Demands
  • Extra I in memory

(I = the incremental chkpnt size)
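A simulation of the incremental scheme, for illustration only. A real implementation would typically mark pages read-only with `mprotect()` and copy a page inside the write-fault handler; here the hypothetical class `IncrementalChkpnt` intercepts writes instead, and the 4-byte page size is an arbitrary assumption. The saved page copies are the extra I in memory.

```python
PAGE_SIZE = 4  # tiny pages for illustration (assumption)


class IncrementalChkpnt:
    """Simulated incremental diskless chkpnt over a bytearray of "memory"."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.saved = {}  # page number -> copy taken before its first write (extra I)

    def checkpoint(self) -> None:
        # Take a new chkpnt: every page becomes "read-only" again.
        self.saved.clear()

    def write(self, addr: int, value: int) -> None:
        page = addr // PAGE_SIZE
        if page not in self.saved:
            # First write since the chkpnt plays the role of the page fault:
            # copy the page, then treat it as RW for the remaining writes.
            start = page * PAGE_SIZE
            self.saved[page] = bytes(self.memory[start:start + PAGE_SIZE])
        self.memory[addr] = value

    def recover(self) -> None:
        # Roll back by restoring every page that was made RW since the chkpnt.
        for page, data in self.saved.items():
            start = page * PAGE_SIZE
            self.memory[start:start + len(data)] = data
        self.saved.clear()
```

Two writes to the same page cost only one copy, which is why I can be much smaller than M when few pages are touched.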
11
Forked Diskless Chkpnt
  • To chkpnt
  • Application clones itself (e.g. with fork())
  • To recover
  • Overwrite the state with the clone's
  • Or the clone assumes the role of the application
  • Space Demands
  • Extra 2I in memory

(I = the incremental chkpnt size)
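A POSIX-only sketch of the forked variant using `os.fork()`: the child is the clone, and the operating system's copy-on-write gives it a frozen snapshot of the parent's memory while the parent keeps computing. Here the clone merely sends its copy back through a pipe to show that the parent's later writes do not reach it; in the scheme above it would instead take part in the encoding. `forked_checkpoint` and its `keep_computing` callback are hypothetical. Note that CPython's reference counting writes to object headers, so in Python more pages get copied than the application itself modifies.

```python
import os
import pickle


def forked_checkpoint(state, keep_computing):
    """Fork a clone holding a snapshot of `state`, keep computing in the parent,
    and return the clone's (unchanged) copy. POSIX only."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child = the clone. Its memory is a copy-on-write snapshot taken at
        # fork time, so the parent's later writes never reach it.
        try:
            os.close(r)
            with os.fdopen(w, "wb") as f:
                pickle.dump(state, f)
        finally:
            os._exit(0)
    os.close(w)
    keep_computing(state)  # the parent (application) proceeds immediately
    with os.fdopen(r, "rb") as f:
        snapshot = pickle.load(f)
    os.waitpid(pid, 0)
    return snapshot
```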
12
Optimizations
  • Breaking the chkpnt into chunks
  • Efficient use of memory
  • Sending diffs (incremental)
  • Bitwise XOR of the current copy and the chkpnt copy
  • Unmodified pages need not be sent
  • Compressing diffs
  • Unmodified regions of memory XOR to zero and
    compress well
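A sketch of diff-based sending, assuming 4 KB pages and using `zlib` as a stand-in compressor (the slides do not name one): only pages whose XOR diff is non-zero are sent, and those diffs are mostly zero bytes, which compress well. `xor_bytes`, `page_diffs`, and `apply_page_diffs` are hypothetical helpers.

```python
import zlib

PAGE = 4096  # assumed page size


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR; unmodified bytes come out as zero."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return bytes(x ^ y for x, y in zip(a, b))


def page_diffs(current: bytes, chkpnt: bytes, page: int = PAGE):
    """Return (page number, compressed XOR diff) for modified pages only."""
    out = []
    for off in range(0, len(current), page):
        diff = xor_bytes(current[off:off + page], chkpnt[off:off + page])
        if any(diff):  # unmodified pages need not be sent
            out.append((off // page, zlib.compress(diff)))
    return out


def apply_page_diffs(chkpnt: bytes, diffs, page: int = PAGE) -> bytes:
    """Rebuild the current copy from the chkpnt copy and the received diffs."""
    out = bytearray(chkpnt)
    for n, blob in diffs:
        diff = zlib.decompress(blob)
        off = n * page
        out[off:off + len(diff)] = xor_bytes(bytes(out[off:off + len(diff)]), diff)
    return bytes(out)
```

Because parity is itself a XOR, a chkpnt proc holding a parity encoding can also XOR a received diff straight into its parity: if b' = b ⊕ d, then the new parity is c ⊕ d.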

13
Application Processor Chkpnt (review)
  • Simple diskless chkpnt: extra M in memory
  • Incremental diskless chkpnt: extra I in memory
  • Forked diskless chkpnt: extra 2I in memory,
    less CPU activity
  • Optimizations
  • Chkpnt into chunks, diffs, and compressed diffs

14
Encoding the chkpnts
  • Goal
  • Extra chkpnt processors should store enough
    information that the chkpnts of failed processors
    may be reconstructed.
  • Notation
  • Number of chkpnt processors (m)
  • Number of application processors (n)

15
Parity (RAID level 5, m = 1)
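  • To chkpnt, the chkpnt proc stores the bytewise
    parity c_j = b_{1,j} ⊕ b_{2,j} ⊕ ... ⊕ b_{n,j}
  • On failure of the i-th proc,
    b_{i,j} = c_j ⊕ (⊕ of all b_{k,j} with k ≠ i)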
  • Can tolerate
  • Only one processor failure
  • Remarks
  • Chkpnt processor is a bottleneck of communication
    and computation

(Figure: example with n = 4, m = 1; application processors, chkpnt processor, b_{i,j} = j-th byte of application processor i)
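The parity encoding and recovery formulas can be checked with a short sketch, assuming all chkpnts have been padded to equal length; `parity_encode` and `parity_recover` are hypothetical names, not part of the original system.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("chkpnts must be padded to equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def parity_encode(chkpnts):
    """Chkpnt proc's encoding: c_j = b_{1,j} xor b_{2,j} xor ... xor b_{n,j}."""
    return reduce(xor_bytes, chkpnts)


def parity_recover(encoding, survivors):
    """Failed proc i: b_{i,j} = c_j xor (XOR of the surviving b_{k,j})."""
    return reduce(xor_bytes, survivors, encoding)
```

For n = 4, losing proc 3 means XORing the encoding with the chkpnts of procs 1, 2, and 4; all of them must reach the single chkpnt proc, which is why it becomes the bottleneck.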
16
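Mirroring (m = n)
  • To chkpnt, application proc i's chkpnt is copied
    to chkpnt proc i: c_{i,j} = b_{i,j}
  • On failure of the i-th proc, copy it back from
    chkpnt proc i: b_{i,j} = c_{i,j}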
  • Can tolerate
  • Up to n processor failures
  • Except the failure of both an application
    processor and its checkpoint processor
  • Remarks
  • Fast, no calculation needed

(Figure: example with n = m = 4; application processors, chkpnt processors, b_{i,j} = j-th byte of application processor i)
17
1-Dimensional Parity (1 < m < n)
  • To chkpnt,
  • Application processors are partitioned into m
    groups.
  • The i-th chkpnt processor calculates the parity of
    the chkpnts in group i
  • On failure of the i-th proc,
  • Same as in Parity encoding, within its group
  • Can tolerate
  • One processor failure per group
  • Remarks
  • More efficient in communication and computation
    than Parity

(Figure: example with n = 4, m = 2; application processors, chkpnt processors, b_{i,j} = j-th byte of application processor i)
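A sketch of 1-dimensional parity. The slides do not say how processors are assigned to groups, so the contiguous partition in `group_of` is an assumption; failed chkpnts are represented by `None`, and all function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return bytes(x ^ y for x, y in zip(a, b))


def group_of(i: int, n: int, m: int) -> int:
    """Assumed partition: contiguous blocks of about n/m procs per group."""
    return i * m // n


def encode_1d(chkpnts, m):
    """Chkpnt proc g stores the parity of the chkpnts in group g."""
    n = len(chkpnts)
    return [
        reduce(xor_bytes, [c for i, c in enumerate(chkpnts) if group_of(i, n, m) == g])
        for g in range(m)
    ]


def recover_1d(chkpnts, encodings, m):
    """Rebuild failed chkpnts (None); tolerates one failure per group."""
    n = len(chkpnts)
    out = list(chkpnts)
    for i, c in enumerate(chkpnts):
        if c is not None:
            continue
        g = group_of(i, n, m)
        peers = [chkpnts[k] for k in range(n) if k != i and group_of(k, n, m) == g]
        if any(p is None for p in peers):
            raise ValueError(f"more than one failure in group {g}")
        out[i] = reduce(xor_bytes, peers, encodings[g])
    return out
```

With n = 4 and m = 2, procs 1 and 4 can fail together (different groups), but procs 1 and 2 cannot.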
18
2-Dimensional Parity
  • To chkpnt,
  • Application processors are arranged logically in
    a two-dimensional grid
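  • Each chkpnt processor calculates the parity of
    one row or one column of the grid
  • On failure of the i-th proc,
  • Same as in Parity encoding, using its row or its
    column
  • Can tolerate
  • Any two processor failures
  • Remarks
  • Each chkpnt goes to two chkpnt procs (its row and
    its column), so multicast helps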

(Figure: example with n = 4, m = 4; application processors, chkpnt processors, b_{i,j} = j-th byte of application processor i)
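A sketch of 2-dimensional parity recovery for a grid of chkpnts, using a simple peeling strategy (an assumption, not necessarily the paper's procedure): repeatedly rebuild any chkpnt that is the only one missing from a row or column whose parity survives. Lost parity procs can be marked `None` in `rows`/`cols`. Function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return bytes(x ^ y for x, y in zip(a, b))


def encode_2d(grid):
    """grid: rows of equal-length chkpnts -> (row parities, column parities)."""
    rows = [reduce(xor_bytes, row) for row in grid]
    cols = [reduce(xor_bytes, col) for col in zip(*grid)]
    return rows, cols


def recover_2d(grid, rows, cols):
    """Fill in failed chkpnts (None) by peeling: use any row or column that has
    exactly one missing chkpnt and a surviving parity (None = lost parity)."""
    g = [list(r) for r in grid]
    n_rows, n_cols = len(g), len(g[0])
    progress = True
    while progress:
        progress = False
        for r in range(n_rows):
            missing = [c for c in range(n_cols) if g[r][c] is None]
            if len(missing) == 1 and rows[r] is not None:
                others = [v for v in g[r] if v is not None]
                g[r][missing[0]] = reduce(xor_bytes, others, rows[r])
                progress = True
        for c in range(n_cols):
            missing = [r for r in range(n_rows) if g[r][c] is None]
            if len(missing) == 1 and cols[c] is not None:
                others = [g[r][c] for r in range(n_rows) if g[r][c] is not None]
                g[missing[0]][c] = reduce(xor_bytes, others, cols[c])
                progress = True
    if any(v is None for row in g for v in row):
        raise ValueError("failure pattern not recoverable")
    return g
```

If two failed application procs share a row, each is alone in its column; otherwise each is alone in its row. Either way one of the two parity directions can rebuild them.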
19
Reed-Solomon Coding (m)
  • To chkpnt,
  • Vandermonde matrix F (m × n), s.t. f_{i,j} = j^{i-1}
  • Use matrix-vector multiplication to calculate the
    encodings: c = F·b
  • To recover,
  • Use Gaussian Elimination
  • Can tolerate
  • Any m failures
  • Remarks
  • Use Galois Fields to perform arithmetic
  • Computation overhead
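A compact sketch of the Reed-Solomon scheme over GF(2^8), assuming byte-wise arithmetic with the primitive polynomial 0x11D (a common choice, not taken from the slides). `rs_encode` computes c = F·b with f_{i,j} = j^{i-1}; `rs_recover` stacks identity rows for the surviving procs with rows of F and solves the system by Gauss-Jordan elimination. Caveat: for larger m this plain Vandermonde construction is not guaranteed to give an invertible system for every failure pattern; the example uses m = 2, where it always is. All names are hypothetical.

```python
# GF(2^8) arithmetic via log/antilog tables; primitive polynomial 0x11D (assumption).
PRIM = 0x11D
EXP = [0] * 510
LOG = [0] * 256
_v = 1
for _e in range(255):
    EXP[_e] = EXP[_e + 255] = _v
    LOG[_v] = _e
    _v <<= 1
    if _v & 0x100:
        _v ^= PRIM


def gmul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def ginv(a: int) -> int:
    return EXP[255 - LOG[a]]  # a must be non-zero


def gpow(a: int, e: int) -> int:
    r = 1
    for _ in range(e):
        r = gmul(r, a)
    return r


def vandermonde(m: int, n: int):
    """F with f_{i,j} = j^(i-1), i = 1..m, j = 1..n (requires n <= 255)."""
    return [[gpow(j, i) for j in range(1, n + 1)] for i in range(m)]


def rs_encode(chkpnts, m):
    """Encoding c_i = sum_j f_{i,j} * b_j, byte by byte (addition is XOR)."""
    F = vandermonde(m, len(chkpnts))
    size = len(chkpnts[0])
    encodings = []
    for row in F:
        c = [0] * size
        for f, b in zip(row, chkpnts):
            for k in range(size):
                c[k] ^= gmul(f, b[k])
        encodings.append(bytes(c))
    return encodings


def solve(A, B):
    """Gauss-Jordan elimination over GF(2^8): A (n x n) times x = B (n byte rows)."""
    A = [list(r) for r in A]
    B = [list(r) for r in B]
    n = len(A)
    for col in range(n):
        piv = next((r for r in range(col, n) if A[r][col]), None)
        if piv is None:
            raise ValueError("singular system: cannot recover")
        A[col], A[piv] = A[piv], A[col]
        B[col], B[piv] = B[piv], B[col]
        inv = ginv(A[col][col])
        A[col] = [gmul(inv, a) for a in A[col]]
        B[col] = [gmul(inv, b) for b in B[col]]
        for r in range(n):
            f = A[r][col]
            if r != col and f:
                A[r] = [a ^ gmul(f, p) for a, p in zip(A[r], A[col])]
                B[r] = [b ^ gmul(f, p) for b, p in zip(B[r], B[col])]
    return [bytes(r) for r in B]


def rs_recover(n, m, survivors, encodings):
    """survivors: {proc j (1..n): chkpnt}; encodings: {i (1..m): c_i}."""
    F = vandermonde(m, n)
    A, B = [], []
    for j, chkpnt in survivors.items():
        A.append([1 if k == j - 1 else 0 for k in range(n)])  # identity row
        B.append(chkpnt)
    for i, c in encodings.items():
        if len(A) == n:
            break
        A.append(F[i - 1])
        B.append(c)
    if len(A) < n:
        raise ValueError("more than m failures")
    return solve(A, B)
```

The first row of F is all ones, so c_1 is exactly the plain parity encoding; the extra rows are what let the scheme survive more than one failure.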

20
Optimizations
  • Sending and calculating the encoding in
    RAID level 5-based encodings (e.g. Parity)

(a) DIRECT: chkpnt processor C1 is the bottleneck
(b) FAN-IN: log(n) steps
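A sketch of the FAN-IN idea: instead of every application proc sending its chkpnt to C1 (DIRECT), procs combine partial parities pairwise, so the reduction takes ceil(log2 n) steps; the final partial parity would still be sent to the chkpnt proc. `fan_in_parity` is a hypothetical helper that simulates the steps sequentially.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return bytes(x ^ y for x, y in zip(a, b))


def fan_in_parity(chkpnts):
    """Simulate FAN-IN: in each step, procs pair up and one partner XORs the
    other's partial parity into its own. Returns (parity, number of steps)."""
    partial = list(chkpnts)
    steps = 0
    while len(partial) > 1:
        combined = [xor_bytes(partial[k], partial[k + 1])
                    for k in range(0, len(partial) - 1, 2)]
        if len(partial) % 2:
            combined.append(partial[-1])  # the odd proc waits for the next step
        partial = combined
        steps += 1
    return partial[0], steps
```

For n = 16 this gives 4 steps, versus 16 chkpnts arriving at C1 under DIRECT.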
21
Encoding the Chkpnts (review)
  • Parity (RAID level 5, m = 1)
  • Only one failure, bottleneck
  • Mirroring (m = n)
  • Up to n failures (unless both app and chkpnt
    fail), fast
  • 1-Dimensional Parity
  • One failure per group, more efficient than Parity
  • 2-Dimensional Parity
  • Any two failures, comm overhead w/o multicast
  • Reed-Solomon Coding
  • Any m failures, computation overhead
  • DIRECT vs. FAN-IN

22
Testing Applications (1)
  • CPU-Intensive parallel programs
  • Instances that took 1.5 to 2 hrs on 16 processors
  • NBODY: N-body interactions among particles in a
    system
  • Particles are partitioned among processors
  • Location field of each particle is updated
  • Expectation
  • Poor with incremental chkpnt
  • Good with diff-based compression
  • MAT: FP matrix product of two square matrices
    (Cannon's alg.)
  • All three matrices are partitioned in square
    blocks among processors
  • In each step, each proc adds the product of its
    input submatrices to its result block and passes
    the input submatrices to its neighbors
  • Expectation
  • Good with incremental chkpnt
  • Very poor with diff-based compression

23
Testing Applications (2)
  • PSTSWM: nonlinear shallow water equations on a
    rotating sphere
  • Most pages are modified, but only a few bytes per
    page
  • Expectation
  • Poor with incremental chkpnt
  • Good with diff-based compression
  • CELL: parallel cellular automaton simulation
    program
  • Two (sparse) grids of cellular automata
    (current/next)
  • Expectation
  • Poor with incremental chkpnt
  • Good with compression
  • PCG: solves Ax = b for a large, sparse matrix A
  • The matrix is first converted to a small, dense
    format
  • Expectation
  • Good with incremental chkpnt
  • Very poor with diff-based compression

24
Diskless Checkpointing
  • 20 Nov 2001

25
Disk-based vs. Diskless Chkpnt
                  Disk-based                    Diskless
Where to chkpnt?  In stable storage             In local memory
How to recover?   Restore from stable storage   Re-calculate
Remarks           Can tolerate whole failure    Cannot tolerate whole failure
                  Low BW to stable storage      Memory is much faster
                                                Encoding (communication) overhead
26
Recalculate the lost chkpnt?
Error Detection and Correction in Digital Communication
vs. Chkpnt Recovery in Diskless Chkpnt
(X marks a lost position, i.e. a failed processor's data)

1-bit Parity (m = 1)
  Digital comm:     110010111 (right)         110000111 (detectable)
                    110010110 (detectable)    110000110 (oops)
  Chkpnt recovery:  110010111 (chkpnt)        1100X0111 (tolerable)
                    11001011X (tolerable)     1100X011X (intolerable)

Mirroring (m = n)
  Digital comm:     1100101111001011 (right)        1100101111001010 (detectable)
                    1100101100111100 (detectable)   1100101011001010 (oops)
  Chkpnt recovery:  1100101111001011 (right)        110010111100101X (tolerable)
                    11001011XXXXXXXX (tolerable)    1100101X1100101X (intolerable)
  • Remarks
  • Difference: in a chkpnt system we know which node
    failed (an erasure), whereas in digital comm the
    position of an error is unknown.
  • Some codings can be used to recover from errors
    in digital comm, too (e.g. Reed-Solomon).

27
Performance
  • Criteria
  • Latency: time between the chkpnt being initiated
    and being ready for recovery
  • Overhead: increase in execution time with chkpnt
  • Applications

App     Description                           Pattern
NBODY   N-body interactions                   Most pages modified, but only a few bytes per page
PSTSWM  Simulation of the states on a 3-D system  (same as NBODY)
CELL    Parallel cellular automaton           (same as NBODY)
MAT     FP matrix multiplication (Cannon's)   Only small parts updated, but updated in their entirety
PCG     PCG for a sparse matrix               (same as MAT)
28
Implementation
  • BASE: no chkpnt
  • DISK-FORK: disk-based chkpnt w/ fork()
  • SIMP: simple diskless
  • INC: incremental diskless
  • FORK: forked diskless
  • INC-FORK: incremental, forked diskless
  • C-SIMP: SIMP w/ diff-based compression
  • C-INC: INC w/ diff-based compression
  • C-FORK: FORK w/ diff-based compression
  • C-INC-FORK: INC-FORK w/ diff-based compression

29
Experiment Framework
  • Network of 24 Sun Sparc5 workstations connected to
    each other by a fast, switched Ethernet (5 MB/s)
  • Each workstation has
  • 96 MB of physical memory
  • 38 MB of local disk storage
  • Disks with a bandwidth of 1.7 MB/s are connected
    via Ethernet, and NFS on Ethernet achieved a
    bandwidth of 0.13 MB/s

30
(No Transcript)
31
(No Transcript)
32
Discussion
  • Latency: diskless has much lower latency than
    disk-based
  • This lowers the expected running time of the
    application in the presence of failures (small
    recovery time)
  • Overhead: comparable

33
Recommendations
  • DISK-FORK
  • If chkpnts are small
  • If the likelihood of wholesale system failures
    is high
  • C-FORK
  • If many pages are modified, but only a few bytes
    per page
  • INC-FORK
  • If only a small fraction of pages are modified

34
Reference
  • J. S. Plank, K. Li, and M. A. Puening. "Diskless
    checkpointing." IEEE Transactions on Parallel and
    Distributed Systems, 9(10):972-986, Oct. 1998