Movement-Based Checkpointing and Logging for Recovery in Mobile Computing Systems

1
Movement-Based Checkpointing and Logging for
Recovery in Mobile Computing Systems

Sapna E. George, Ing-Ray Chen, Ying Jin
Dept. of Computer Science
Virginia Polytechnic Institute and State University

2
Outline

Background
Problem Definition: Failure Recovery in the Mobile Computing Environment
Proposed Solution: Movement-Based Checkpointing and Logging
Performance Analysis
Analytic Model of the System
Analysis Results and Conclusions
Future Work

3
Background
4
Mobile Computing

Advances in wireless networking and portable device technologies are revolutionizing computing
Mobile computing: a type of distributed computing
Involves hosts that may be mobile
Host network connectivity is maintained through wireless communications

5
Fault Tolerance in Distributed Systems

Checkpointing, logging, and rollback recovery
Checkpointing (during failure-free operation)
Save the system state to stable storage
This snapshot is called a checkpoint
Logging (during failure-free operation)
All non-deterministic events, and the information necessary to replay them, are logged to stable storage
This is done in addition to checkpointing

6
Fault Tolerance in Distributed Systems

Failure recovery
The failed process rolls back to the latest checkpoint
Replays all the logged events in their original order
Recreates its pre-failure state independently
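The rollback-and-replay step can be illustrated with a minimal Python sketch. Purely for illustration, it assumes that the process state is a dictionary and that every logged non-deterministic event is a key/value write. `StableStorage` and `recover` are hypothetical names. Discarding the log when a new checkpoint is taken is a simplifying choice, not something these slides specify.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StableStorage:
    """Hypothetical stand-in for stable storage (illustration only)."""
    checkpoint: dict = field(default_factory=dict)  # latest saved state
    log: list = field(default_factory=list)         # events logged since that checkpoint

    def take_checkpoint(self, state: dict) -> None:
        # Save a copy of the current state; as a simplifying assumption,
        # log entries older than this checkpoint are discarded.
        self.checkpoint = dict(state)
        self.log.clear()

    def log_event(self, key: str, value: Any) -> None:
        # Assumption: every non-deterministic event is modeled as a key/value write.
        self.log.append((key, value))


def recover(storage: StableStorage) -> dict:
    """Roll back to the latest checkpoint, then replay logged events in order."""
    state = dict(storage.checkpoint)
    for key, value in storage.log:
        state[key] = value
    return state


# Usage: the recovered state matches the pre-failure state.
storage = StableStorage()
live_state = {"x": 1}
storage.take_checkpoint(live_state)
for key, value in [("x", 2), ("y", 5)]:
    live_state[key] = value        # the process updates its state...
    storage.log_event(key, value)  # ...and logs the event to stable storage
assert recover(storage) == live_state
```

Replaying in the original order matters: if two logged writes touch the same key, only the later one may survive in the recovered state.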

7
Problem Definition

Failure Recovery in the Mobile Computing
Environment

8
Effects of Mobile Computing Environment Properties

Mobility of hosts
If checkpointing requires coordination, the mobile host (MH) must first be searched for and located before control messages can be delivered; this increases communication delay
Data related to recovery, such as checkpoints and logs, may be distributed over many mobile support stations (MSSs); a mechanism is required for efficient storage, retrieval, and management of this dispersed information

9
Effects of Mobile Computing Environment Properties

Low bandwidth and unreliable network connectivity
A recovery mechanism that requires many messages, or large messages, imposes an undue burden on the wireless resources and increases the cost of providing fault tolerance.

10
Effects of Mobile Computing Environment Properties

Limited battery life of host devices
Communication is energy intensive.
A recovery mechanism must therefore keep communication (both the number and the size of messages) to a minimum.

11
Effects of Mobile Computing Environment Properties

Lack of stable storage on host devices
Devices are vulnerable to physical damage
Devices are small and have limited memory
An MH's disk therefore cannot reliably serve as the stable storage required for recovery information.

12
Effects of Mobile Computing Environment Properties

Different types of failures
Voluntary disconnection and hardware failure must be handled differently
A disconnected host may reconnect after a while and expect to resume operations
An MH that is currently unreachable cannot be expected to participate in a checkpointing or recovery operation.
A scheme that requires synchronization or coordination with other MHs would either block until the MH reconnects or fail.

13
The Problem

Traditional recovery schemes suffer from many
shortcomings when applied to the mobile computing
environment.
The failure-prone nature of the environment makes
it essential to provide some form of explicit
recovery mechanism.

14
The Problem

In general, application recovery mechanisms try
to balance
Recovery cost (failure-free operational cost)
Recovery time
Storage requirements for recovery related
information

15
The Problem

Adaptations of traditional recovery schemes for the mobile computing environment
Do not consider mobility when selecting the checkpointing interval
Use periodic checkpointing
Subsequently control the proliferation of recovery information using techniques that merge logs and move the information closer to the MH.

16
Proposed Solution

Movement-Based Checkpointing and Logging

17
Assumed Mobile Computing System

A set of mobile hosts (MHs)
They maintain network connectivity through a wireless link to a static mobile support station (MSS)
An MSS handles all communication to and from the MHs within its area of influence, known as a cell
Each MSS has enough stable storage to hold the state and log information

18
Assumed Mobile Computing System

Interactions between the MH and the network infrastructure relevant to failure recovery
Handoff: cell boundary crossing
Disconnection: for power conservation
Reconnection: possibly in a cell different from the one in which the MH disconnected

19
Assumed Mobile Computation

A distributed computation: a number of processes executing concurrently on multiple hosts.
Process states
Normal: executing application-related computations, receiving user inputs, or sending and receiving messages
Save: saving its state as a checkpoint to stable storage
Between checkpoints (in the Normal state), the process also logs all events
Recovery: loading the checkpoint and applying the logs

20
Movement-Based Checkpointing and Logging

The interval between checkpoints is not fixed; it is governed by the number of handoffs experienced by the MH.
The MH maintains a handoff counter that is incremented by 1 every time a handoff occurs.
When the counter value becomes greater than a threshold M, a checkpoint is taken.
Between checkpoints, all write events related to the MH are also logged to the local MSS.
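The handoff counter can be sketched in a few lines of Python. The sketch follows this slide's wording that a checkpoint is taken once the counter becomes greater than M. The `MobileHost` class, the initial checkpoint at the starting MSS, and resetting the counter after each checkpoint are illustrative assumptions. The sketch also records which MSSs hold log entries since the last checkpoint, which makes visible how M limits the number of MSSs involved in a recovery.

```python
from dataclasses import dataclass, field


@dataclass
class MobileHost:
    """Hypothetical MH model for movement-based checkpointing.

    A checkpoint is taken when the handoff counter becomes greater than M.
    Assumptions: an initial checkpoint exists at the starting MSS, and the
    counter is reset to 0 after every checkpoint.
    """
    M: int
    current_mss: str
    handoff_count: int = 0
    checkpoint_mss: str = ""
    log_msss: list = field(default_factory=list)  # MSSs holding logs since the last checkpoint

    def __post_init__(self) -> None:
        self.checkpoint_mss = self.current_mss
        self.log_msss = [self.current_mss]

    def handoff(self, new_mss: str) -> bool:
        """Cross into the cell of new_mss; return True if a checkpoint was taken."""
        self.current_mss = new_mss
        self.handoff_count += 1
        if self.handoff_count > self.M:
            self.checkpoint_mss = new_mss  # checkpoint stored at the new local MSS
            self.handoff_count = 0
            self.log_msss = [new_mss]      # a new log interval starts here
            return True
        self.log_msss.append(new_mss)      # later write events are logged at the new local MSS
        return False


# Usage: with M = 3, every fourth handoff triggers a checkpoint.
mh = MobileHost(M=3, current_mss="mss0")
taken = [mh.handoff(f"mss{i}") for i in range(1, 9)]
```

Under this reading of the rule, at most M + 1 MSSs hold logs for the current checkpoint interval. If the scheme instead checkpoints when the counter reaches M, the bound becomes M.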

21
Movement-Based Checkpointing and Logging

The threshold M is a configurable parameter.
It depends on:
User: mobility rate
Network: failure rate
Application: log arrival rate

22
Movement-Based Checkpointing and Logging

Thus, the time interval between successive checkpoints varies with the MH's mobility.
Recovery: the MH recovers independently, without coordinating with other MHs
Upon reconnection, the MH informs its local MSS.
The local MSS contacts the MSS holding the latest checkpoint
The local MSS contacts all MSSs storing logs
All data is transferred to the local MSS via the wired network and then to the MH via the wireless link
The MH rolls back to the checkpoint and applies the logs
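The recovery steps above can be sketched in Python under a few assumptions. Each MSS's stable storage is modeled as a dictionary keyed by MH identifier. Each log entry is assumed to carry a sequence number, so that segments gathered from several MSSs can be replayed in their original order. This slide does not say how the local MSS learns where the checkpoint and logs reside, so `recover_on_reconnect` simply takes those MSS names as arguments. All names here are hypothetical.

```python
def recover_on_reconnect(mh_id, checkpoint_mss, log_msss, checkpoints, logs):
    """Hypothetical sketch of the recovery steps performed after reconnection.

    checkpoints: {mss: {mh_id: saved_state_dict}}
    logs:        {mss: {mh_id: [(seq, key, value), ...]}}
    checkpoint_mss and log_msss are assumed to be known to the local MSS.
    """
    # Local MSS fetches the latest checkpoint from the MSS that holds it
    # (over the wired network); copy it so stored data is not modified.
    state = dict(checkpoints[checkpoint_mss][mh_id])

    # Local MSS fetches the log segments from every MSS storing logs for this MH.
    entries = []
    for mss in log_msss:
        entries.extend(logs.get(mss, {}).get(mh_id, []))

    # After the data reaches the MH over the wireless link, the MH rolls back
    # to the checkpoint and replays the logs in their original (sequence) order.
    for _seq, key, value in sorted(entries, key=lambda entry: entry[0]):
        state[key] = value
    return state


# Usage: logs are spread over two MSSs and listed out of order.
checkpoints = {"mss1": {"mh7": {"x": 1}}}
logs = {
    "mss1": {"mh7": [(1, "x", 2)]},
    "mss2": {"mh7": [(2, "y", 3), (3, "x", 4)]},
}
recovered = recover_on_reconnect("mh7", "mss1", ["mss2", "mss1"], checkpoints, logs)
```

Sorting by sequence number is one simple way to restore the original event order when the segments arrive from different MSSs. A real system could instead rely on the order in which the MSSs were visited.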

23
Movement-Based Checkpointing and Logging

The performance of this scheme depends on identifying the optimal movement threshold M for each user and application.
Checkpoints and logs remain within an acceptable range of the MH's current location, which eliminates the need for information consolidation.
Recovery time stays acceptable because M bounds the number of MSSs from which logs must be retrieved.

24
Performance Analysis

Analytic Model

25
Stochastic Petri-Net (SPN) Model
26
SPN Model Parameters
27
SPN Model Parameters

Parameter Tk: checkpoint rate of the MH
Parameter Ti: recovery rate of the MH (the inverse of the recovery time)
i: number of handoffs experienced by the MH since the last checkpoint and before the failure

28
Analytic Model Recovery Time
29
Analytic Model Recovery Time

Treq_rec: time spent on recovery-information requests
Nmss_logs: number of MSSs storing logs
Dmss: average hop count between MSScp (the MSS holding the latest checkpoint) and MSSrec (the MSS at which recovery takes place)

30
Analytic Model Recovery Time

Tckp_tx: time spent transmitting the latest checkpoint to the MH

Tlog_tx: time spent transmitting the logs to the MH

Trec: time spent rolling back to the last checkpoint and applying the logs
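The transcript does not preserve the recovery-time equation itself. As a hedged sketch only, the function below assumes that the total recovery time is the sum of the four components defined here (Treq_rec, Tckp_tx, Tlog_tx, Trec). It also assumes that log transmission and replay grow linearly with the number of log entries. Its default per-item times are the values listed under SPN Evaluation Parameters. Treq_rec is passed in directly because its dependence on Nmss_logs and Dmss cannot be recovered here, and wired transfers between MSSs are ignored. `recovery_time` is a hypothetical name.

```python
def recovery_time(t_req_rec, n_logs, t_ckp_w=0.08, t_log_w=0.002, t_elog=0.0001):
    """Hypothetical total recovery time per failure, in seconds.

    Assumptions (not taken from the original equations):
    - total = Treq_rec + Tckp_tx + Tlog_tx + Trec;
    - Treq_rec is supplied by the caller, since its dependence on
      Nmss_logs and Dmss is not given here;
    - wired transfers between MSSs and the time to load the checkpoint
      are ignored.
    Defaults are the per-item wireless and replay times from the SPN
    evaluation parameters.
    """
    t_ckp_tx = t_ckp_w            # send one checkpoint over the wireless link
    t_log_tx = n_logs * t_log_w   # send each log entry over the wireless link
    t_rec = n_logs * t_elog       # apply each log entry after rolling back
    return t_req_rec + t_ckp_tx + t_log_tx + t_rec


# Usage: 100 log entries and an assumed 0.01 s for recovery-information requests.
total = recovery_time(t_req_rec=0.01, n_logs=100)
```

Under these assumptions the wireless transfer of log entries dominates as logs accumulate, which is one reason a larger M (more handoffs, and more log entries, between checkpoints) can lengthen recovery.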

31
Analytic Model Cost of Recovery

Tr: average recovery time per failure
Fr: recovery probability
Tc: cost of recovery

No. of checkpoints before failure
No. of logs before failure
32
SPN Evaluation Parameters

Size of a log entry: 50 B
Size of a checkpoint: 2000 B
Bandwidth of the wired network: 2 Mbps
Ratio of wireless to wired network bandwidth (r): 0.1
Time required to apply a log entry (Telog): 0.0001 s
Time required to transmit a log entry through the wireless channel (Tlog_w): 0.002 s
Time required to transmit a checkpoint through the wireless channel (Tckp_w): 0.08 s
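The two wireless transmission times follow from the other parameters. With r = 0.1, the wireless bandwidth is 0.1 × 2 Mbps = 0.2 Mbps. A 50 B (400-bit) log entry therefore takes 400 / 200,000 = 0.002 s, and a 2000 B (16,000-bit) checkpoint takes 0.08 s. The short Python check below reproduces this, assuming 1 Mbps = 10^6 bit/s and 8 bits per byte.

```python
LOG_ENTRY_BYTES = 50
CHECKPOINT_BYTES = 2000
WIRED_BPS = 2_000_000     # 2 Mbps, assuming 1 Mbps = 10**6 bit/s
R = 0.1                   # ratio of wireless to wired bandwidth

wireless_bps = R * WIRED_BPS                     # 0.2 Mbps
t_log_w = LOG_ENTRY_BYTES * 8 / wireless_bps     # 400 bits over the wireless link
t_ckp_w = CHECKPOINT_BYTES * 8 / wireless_bps    # 16,000 bits over the wireless link

print(f"Tlog_w = {t_log_w:.4f} s, Tckp_w = {t_ckp_w:.4f} s")
# prints: Tlog_w = 0.0020 s, Tckp_w = 0.0800 s
```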

33
Performance Analysis

Results and Conclusions

34
Recovery Probability vs. Recovery Time
35
Recovery Probability vs. Log Arrival Rate
36
Recovery Probability vs. Failure Rate
37
Recovery Probability and Recovery Time vs. Movement Threshold
38
Determining Optimal Movement Threshold that
Minimizes Recovery Cost Per Failure
39
Conclusion: Proposed Scheme

An efficient failure recovery scheme for mobile computing systems based on movement-based checkpointing and logging
The movement-based checkpointing and logging scheme takes a checkpoint only after the mobile node has made M movements (mobility handoffs).
The value of M is governed by the failure rate, the log arrival rate, and the mobility rate of the MH and its application.
Given the failure, mobility, and log arrival rates, identify the optimal movement threshold M that minimizes the recovery cost per failure.

40
Conclusion: Practical Application

At configuration time, build a table that covers possible values of the MH's mobility rate and failure rate and of the mobile applications' log arrival rate, listing for each combination the optimal M that minimizes the recovery cost per failure.
At runtime, the optimal M may be selected dynamically from the measured rates to minimize the recovery cost per failure.
The selected M must also satisfy the specified recovery probability when an application deadline for recovering from a failure is given.
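The configuration-time table and runtime selection could look roughly like the sketch below. Several details are assumptions. For each combination of mobility, failure, and log arrival rates, the table keeps a list of candidate M values with their recovery cost per failure and recovery probability, so that the recovery-probability requirement can be applied at runtime. Measured rates are snapped to the nearest configured value. The name `select_m` is hypothetical, and the numbers in the usage example are placeholders, not results from the analysis.

```python
def nearest(values, x):
    """Return the configured value closest to the measured value x."""
    return min(sorted(values), key=lambda v: abs(v - x))


def select_m(table, measured, min_recovery_prob):
    """Hypothetical runtime selection of the movement threshold M.

    table:    {(mobility_rate, failure_rate, log_rate):
                   [(M, recovery_cost_per_failure, recovery_probability), ...]}
    measured: (mobility_rate, failure_rate, log_rate) observed at runtime.
    Returns the cheapest M whose recovery probability meets the requirement,
    or None if no configured M does.
    """
    keys = list(table)
    # Snap each measured rate to the nearest value covered by the table.
    snapped = tuple(nearest({k[i] for k in keys}, measured[i]) for i in range(3))
    feasible = [c for c in table.get(snapped, []) if c[2] >= min_recovery_prob]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c[1])[0]


# Usage with placeholder numbers (not results from the analysis).
table = {
    (0.1, 0.001, 1.0): [(1, 5.0, 0.99), (3, 4.0, 0.97), (5, 4.5, 0.90)],
    (1.0, 0.001, 1.0): [(2, 6.0, 0.99), (6, 5.0, 0.95)],
}
chosen = select_m(table, measured=(0.12, 0.0011, 0.9), min_recovery_prob=0.95)
```

Here the cheapest candidate (M = 5) is rejected because its recovery probability falls below the requirement. This reflects the last point on the slide: the selected M must satisfy the recovery probability as well as minimize cost.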