Movement-Based Checkpointing and Logging for Recovery in Mobile Computing Systems

1
Movement-Based Check-pointing and Logging for
Recovery in Mobile Computing Systems
  • Sapna E. George, Ing-Ray Chen, Ying Jin
  • Dept. of Computer Science
  • Virginia Polytechnic Institute and State University

2
Outline
  • Background
  • Problem Definition: Failure Recovery in the
    Mobile Computing Environment
  • Proposed Solution: Movement-Based Check-pointing
    and Logging
  • Performance Analysis
  • Analytic Model of the System
  • Analysis Results and Conclusions
  • Future Work

3
Background
4
Mobile Computing
  • Advances in wireless networking and portable
    device technologies are revolutionizing computing
  • Mobile Computing: a type of distributed
    computing
  • Involves hosts that may be mobile
  • Host network connectivity maintained through
    wireless communications

5
Fault-tolerance in Distributed systems
  • Check-pointing, Logging, Rollback recovery
  • Check-pointing (during failure-free operations)
  • Save system state to stable storage
  • This snapshot is called a checkpoint
  • Logging (during failure-free operations)
  • All non-deterministic events and the information
    necessary to replay these events are logged to
    the stable storage
  • In addition to checkpoints

6
Fault-tolerance in Distributed systems
  • Failure Recovery
  • Failed process rolls back to the latest
    checkpoint
  • Replays all the logged events in their original
    order
  • Recreates pre-failure state independently
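
The checkpoint, log, and replay cycle described on the two slides above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: the class name `RecoverableProcess`, the dictionary-of-counters state, and the in-memory stand-ins for stable storage are all assumptions made for the example.

```python
import copy


class RecoverableProcess:
    """Toy process whose state is a dict of counters (hypothetical example)."""

    def __init__(self):
        self.state = {}
        self.stable_checkpoint = {}  # stands in for the checkpoint on stable storage
        self.stable_log = []         # events logged since the latest checkpoint

    def apply(self, event):
        key, delta = event
        self.state[key] = self.state.get(key, 0) + delta

    def handle(self, event):
        # Log the non-deterministic event first, then apply it.
        self.stable_log.append(event)
        self.apply(event)

    def checkpoint(self):
        # Save a snapshot of the state; earlier log entries are no longer needed.
        self.stable_checkpoint = copy.deepcopy(self.state)
        self.stable_log.clear()

    def recover(self):
        # Roll back to the latest checkpoint, then replay the log in original order.
        self.state = copy.deepcopy(self.stable_checkpoint)
        for event in self.stable_log:
            self.apply(event)
```

After a simulated failure (for example, resetting `state` to an empty dict), calling `recover()` rebuilds the pre-failure state from the checkpoint and the log alone, without involving any other process.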

7
Problem Definition
  • Failure Recovery in the Mobile Computing
    Environment

8
Effects of Properties of MC Env.
  • Mobility of hosts
  • If checkpointing requires coordination, the mobile
    host (MH) must be searched for and located before
    control messages can be delivered; this increases
    communication delay
  • Data related to recovery, such as checkpoints and
    logs, may be distributed over many mobile support
    stations (MSSs); a mechanism is required for
    efficient storage, retrieval and management of
    this dispersed information

9
Effects of Properties of MC Env.
  • Low bandwidth and unreliable network connectivity
  • A recovery mechanism that requires a large number
    of messages or large size of messages imposes
    undue burden on the wireless resources and
    increases the cost of providing fault tolerance.

10
Effects of Properties of MC Env.
  • Limited battery life of host devices
  • Communication is energy intensive.
  • Recovery mechanism must keep communication (the
    number of messages and the size of messages) to a
    minimum.

11
Effects of Properties of MC Env.
  • Lack of stable storage on host devices
  • Devices are vulnerable to physical damage
  • Devices are small and are equipped with limited
    memory
  • The MH's disk cannot reliably function as the stable
    storage required to store recovery information.

12
Effects of Properties of MC Env.
  • Different types of failures
  • Voluntary disconnection and hardware failure must
    be handled differently
  • A disconnected host may reconnect after a while
    and expect to resume operations
  • An MH that is currently unreachable cannot be
    expected to participate in a checkpointing or
    recovery operation.
  • A scheme that requires synchronization or
    coordination with other MHs would either block
    until the MH reconnects or fail.

13
The Problem
  • Traditional recovery schemes suffer from many
    shortcomings when applied to the mobile computing
    environment.
  • The failure-prone nature of the environment makes
    it essential to provide some form of explicit
    recovery mechanism.

14
The Problem
  • In general, application recovery mechanisms try
    to balance
  • Recovery cost (failure-free operational cost)
  • Recovery time
  • Storage requirements for recovery related
    information

15
The Problem
  • Adaptations of traditional recovery schemes for
    the mobile computing environment
  • Do not consider mobility in the selection of
    checkpointing interval
  • Use periodic checkpointing
  • Subsequently control the proliferation of
    recovery information using techniques that merge
    logs and move the information closer to the MH.

16
Proposed Solution
  • Movement-Based Check-pointing and Logging

17
Assumed Mobile Computing System
  • A set of mobile hosts (MHs)
  • They maintain network connectivity through a
    wireless link to a static mobile support station
    (MSS)
  • An MSS handles all communications to and from MHs
    within its area of influence, known as a cell
  • Each MSS is equipped with enough stable storage
    to store the state and log information
18
Assumed Mobile Computing System
  • Interactions between the MH and the network
    infrastructure relevant to failure recovery
  • Handoff: cell boundary crossing
  • Disconnection: for power conservation
  • Reconnection: possibly in a cell different from
    the one in which it disconnected

19
Assumed Mobile Computation
  • A distributed computation: a number of processes
    executing concurrently on multiple hosts.
  • Process states
  • Normal: executing application-related
    computations, receiving user inputs or sending
    and receiving messages.
  • Save: saves its state as a checkpoint to the
    stable storage
  • Between checkpoints, the process also logs all
    events (Normal state)
  • Recovery: loads checkpoints and applies logs

20
Movement-Based Checkpointing and Logging
  • Interval between checkpoints is governed by the
    number of handoffs experienced by the MH and is
    not fixed
  • MH maintains a handoff counter which is
    incremented by 1 every time a handoff occurs.
  • When the value of the counter becomes greater
    than a threshold M, a checkpoint is taken.
  • In between checkpoints, all write events related
    to an MH are also logged to the local MSS.
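
The handoff counter rule above can be sketched as follows. This is a hedged illustration under stated assumptions: the class `MobileHost`, the MSS identifiers, and the choices to store the checkpoint at the current MSS, discard older logs, and reset the counter to zero after a checkpoint are hypothetical details that the slide does not specify. The trigger follows the slide's wording (counter greater than M).

```python
class MobileHost:
    """Toy model of an MH using movement-based checkpointing (hypothetical)."""

    def __init__(self, threshold_m, start_mss):
        self.threshold_m = threshold_m   # movement threshold M
        self.handoff_count = 0           # handoffs since the last checkpoint
        self.current_mss = start_mss
        self.checkpoint_mss = start_mss  # MSS holding the latest checkpoint
        self.logs_by_mss = {}            # MSS id -> write events logged there

    def write(self, event):
        # Between checkpoints, each write event is logged at the local MSS.
        self.logs_by_mss.setdefault(self.current_mss, []).append(event)

    def handoff(self, new_mss):
        # Every cell boundary crossing increments the counter by 1.
        self.current_mss = new_mss
        self.handoff_count += 1
        if self.handoff_count > self.threshold_m:
            self.take_checkpoint()

    def take_checkpoint(self):
        # Assumption: the checkpoint is stored at the current MSS, logs that
        # precede it become obsolete, and the counter restarts from zero.
        self.checkpoint_mss = self.current_mss
        self.logs_by_mss.clear()
        self.handoff_count = 0
```

With `threshold_m=2`, the first two handoffs only spread log entries across the visited MSSs; the third handoff pushes the counter past the threshold and triggers a checkpoint.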

21
Movement-Based Checkpointing and Logging
  • The threshold M is a configurable parameter.
    It depends on
  • User: mobility rate
  • Network: failure rate
  • Application: log arrival rate

22
Movement-Based Checkpointing and Logging
  • Thus, depending on the variability in the MH's
    mobility, the time interval between successive
    checkpoints differs.
  • Recovery: MH recovers independently without
    coordination with other MHs
  • Upon reconnection, MH informs local MSS.
  • Local MSS contacts MSS with latest checkpoint
  • Local MSS contacts all MSS storing logs
  • All data transferred to local MSS via wired
    network and to MH via wireless link
  • MH rolls back and applies logs
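
The last step of the recovery procedure, rolling back and applying the collected logs, might look like the sketch below. It assumes, purely for illustration, that a checkpoint is a dict and that each MSS's log is a list of `(key, value)` write events, supplied in the order the MH visited those MSSs. The function name `rebuild_state` is hypothetical, and the wired and wireless transfers are not modeled.

```python
def rebuild_state(checkpoint_state, logs_in_visit_order):
    """Rebuild MH state from the latest checkpoint plus per-MSS logs.

    checkpoint_state: dict fetched from the MSS holding the latest checkpoint.
    logs_in_visit_order: list of per-MSS logs, oldest MSS first; each log is
    a list of (key, value) write events in their original order.
    """
    state = dict(checkpoint_state)       # roll back to the checkpoint
    for mss_log in logs_in_visit_order:  # then replay every logged write
        for key, value in mss_log:
            state[key] = value
    return state
```

Because the replay needs only the checkpoint and the logs held by the MSSs, no other MH has to take part in the recovery.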

23
Movement-Based Checkpointing and Logging
  • The performance of this scheme depends on
    identifying the optimal movement threshold M per
    user and application.
  • Checkpoints and logs remain within an acceptable
    range of the MH's current location, which
    eliminates the need for information consolidation.
  • Ensures acceptable recovery time since M bounds
    the number of MSSs from which logs must be
    retrieved.

24
Performance Analysis
  • Analytic Model

25
Stochastic Petri-Net (SPN) Model
26
SPN Model Parameters
27
SPN Model Parameters
  • Parameter Tk: checkpoint rate of the MH
  • Parameter Ti: recovery rate of the MH (inverse
    of recovery time)
  • i: number of handoffs experienced by the MH
    since the last checkpoint and before failure.

28
Analytic Model: Recovery Time
29
Analytic Model: Recovery Time
  • Treq_rec - Time spent on recovery information
    requests
  • Nmss_logs - Number of MSSs storing logs
  • Dmss - Average hop count between MSScp and MSSrec

30
Analytic Model: Recovery Time
  • Tckp_tx - Time spent on transmitting the latest
    checkpoint to the MH
  • Tlog_tx - Time spent on transmitting the logs to
    the MH
  • Trec - Time spent to rollback to the last
    checkpoint and apply the logs

31
Analytic Model: Cost of Recovery
  • Tr - Average recovery time per failure
  • Fr - Recovery probability
  • Tc - Cost of recovery

[Equation not preserved in the transcript; its terms were labeled "No. of checkpoints before failure" and "No. of logs before failure"]
32
SPN Evaluation Parameters
  • Size of a log entry - 50B
  • Size of a checkpoint - 2000B
  • Bandwidth of wired network - 2 Mbps
  • Ratio of bandwidth of wireless to wired network
    (r) - 0.1
  • Time required to apply a log entry (Telog) -
    0.0001s
  • Time required to transmit a log entry through the
    wireless channel (Tlog_w) - 0.002s
  • Time required to transmit a checkpoint through
    the wireless channel (Tckp_w) - 0.08s
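
The two wireless transmission times listed above follow from the other parameters. The short check below reproduces them, assuming 1 Mbps = 10^6 bits per second and 8 bits per byte; the constant and function names are chosen for this example only.

```python
WIRED_BANDWIDTH_BPS = 2_000_000  # 2 Mbps wired network
WIRELESS_RATIO = 0.1             # r: wireless-to-wired bandwidth ratio
LOG_ENTRY_BYTES = 50
CHECKPOINT_BYTES = 2000


def wireless_tx_time(size_bytes):
    """Seconds needed to send size_bytes over the wireless channel."""
    wireless_bps = WIRED_BANDWIDTH_BPS * WIRELESS_RATIO
    return size_bytes * 8 / wireless_bps


t_log_w = wireless_tx_time(LOG_ENTRY_BYTES)   # time per log entry
t_ckp_w = wireless_tx_time(CHECKPOINT_BYTES)  # time per checkpoint
```

Under these assumptions the wireless channel carries 0.2 Mbps, so a 50 B log entry takes about 0.002 s and a 2000 B checkpoint about 0.08 s, matching Tlog_w and Tckp_w.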

33
Performance Analysis
  • Results and Conclusions

34
Recovery Probability vs. Recovery Time
35
Recovery Probability vs. Log Arrival Rate
36
Recovery Probability vs. Failure Rate
37
Recovery Probability and Recovery Time vs. Movement
Threshold
38
Determining Optimal Movement Threshold that
Minimizes Recovery Cost Per Failure
39
Conclusion: Proposed Scheme
  • An efficient failure recovery scheme for mobile
    computing systems based on movement-based
    checkpointing and logging
  • Movement-based checkpointing and logging scheme
    takes a checkpoint only after the mobile node has
    made M movements (mobility handoffs).
  • The value of M is governed by the failure rate,
    log arrival rate, and the mobility rate of the
    application and MH.
  • Identify the optimal movement threshold M, when
    given the failure, mobility and log arrival
    rates, to minimize the cost of recovery per
    failure.

40
Conclusion: Practical Application
  • Build a table at configuration time covering
    possible parameter values of the mobility rate
    and failure rate of the MH and log arrival rate
    of the mobile applications, and listing the
    optimal M value that would minimize the recovery
    cost per failure.
  • At runtime, based on the measured rates, the
    optimal M may be selected dynamically to minimize
    the recovery cost per failure.
  • Optimal M selected must also satisfy the
    specified recovery probability when given an
    application deadline to recover from a failure.
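
The table-driven selection described on this slide could be sketched as a nearest-neighbor lookup. Everything here is an assumption for illustration: the function names, the grid layout, and above all the table entries, which are placeholders and not results from the analysis. The recovery-probability constraint in the last bullet is not modeled.

```python
def nearest(grid, value):
    """Return the grid point closest to the measured value."""
    return min(grid, key=lambda g: abs(g - value))


def select_threshold(table, mobility_rate, failure_rate, log_rate):
    """Look up M for the tabulated rates nearest to the measured ones.

    `table` maps (mobility_rate, failure_rate, log_rate) -> M and must
    contain an entry for every combination of the rate values it uses.
    """
    mobility_grid = sorted({key[0] for key in table})
    failure_grid = sorted({key[1] for key in table})
    log_grid = sorted({key[2] for key in table})
    key = (
        nearest(mobility_grid, mobility_rate),
        nearest(failure_grid, failure_rate),
        nearest(log_grid, log_rate),
    )
    return table[key]


# Placeholder entries only; these are NOT values from the presentation.
EXAMPLE_TABLE = {
    (0.01, 0.001, 1.0): 4,
    (0.01, 0.001, 10.0): 2,
    (0.1, 0.001, 1.0): 8,
    (0.1, 0.001, 10.0): 5,
}
```

At runtime the MH would call `select_threshold(EXAMPLE_TABLE, measured_mobility, measured_failure, measured_log_rate)` whenever its measured rates change. A real table would be filled from the analytic model at configuration time.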