Title: Replication, Caching, Prefetching and Hoarding for Mobile Computing
2 Definitions
- Replication: To maintain multiple (consistent) copies of a data item.
  - Static replication: the number and location of copies are determined statically (at compile time or design time).
  - Dynamic replication: the number and location of copies are determined dynamically (at run time).
- Caching: To maintain a temporary copy of the data in fast (local) memory. The copy is fetched when it is first accessed.
- Prefetching: To obtain a temporary copy of the data before it is accessed (to hide access latency).
- Hoarding: To preload a copy of a data object so that the mobile client can work while it is disconnected from the network (i.e., prefetching to tolerate disconnections).
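The difference between caching, prefetching, and hoarding can be illustrated with a small sketch. The `ClientCache` class, the dictionary standing in for the server, and all method names are hypothetical and chosen only for illustration.

```python
class ClientCache:
    """Toy client-side cache over a dict-like 'server' (hypothetical names)."""

    def __init__(self, server):
        self.server = server        # stands in for the remote data store
        self.local = {}             # temporary local copies
        self.connected = True
        self.remote_fetches = 0     # counts accesses that went to the server

    def _fetch(self, key):
        if not self.connected:
            raise ConnectionError(f"cannot fetch {key!r} while disconnected")
        self.remote_fetches += 1
        self.local[key] = self.server[key]

    def read(self, key):
        # Caching: the copy is fetched when the item is first accessed.
        if key not in self.local:
            self._fetch(key)
        return self.local[key]

    def prefetch(self, keys):
        # Prefetching: obtain copies before they are accessed, to hide latency.
        for key in keys:
            if key not in self.local:
                self._fetch(key)

    def hoard(self, keys):
        # Hoarding: the same preloading, done to survive a disconnection.
        self.prefetch(keys)


server = {"a": 1, "b": 2, "c": 3}
cache = ClientCache(server)
cache.read("a")          # first access goes to the server
cache.read("a")          # second access is served from the local copy
cache.hoard(["b"])       # preload before going offline
cache.connected = False
```

After the disconnection, reads of `"a"` and `"b"` still succeed from the local copies, while a read of `"c"` fails. This is the cache miss that hoarding tries to avoid.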
3 Data Access Model
- On-Demand
- Broadcast Channel
4 Motivation
- Caching (prefetching/hoarding) at mobile clients is crucial to improve the performance of information access and database querying.
- Issues
  - Read-only data: currency guarantees
  - Read/write data: consistency in the presence of disconnected operations
  - Server load/scalability in the presence of numerous clients
5 Mobile Database Querying Requirements
- Minimize query delay
- Maximize number of queries answered per unit time (system throughput)
- Handle client disconnection
- Conserve wireless bandwidth and battery power
- Minimize server load
- Handle mobility
6 Advantages of Caching in a Mobile Environment
- Helps reduce latency caused by narrow-bandwidth wireless links
- Enables limited functionality in mobile hosts even in disconnected mode
- Helps conserve battery power by reducing the number of uplink queries
- Conserves bandwidth
7 Problems in Maintaining a Consistent Cache
- Classic solutions do not work
  - Mobile clients may be disconnected for long durations, so invalidations may be lost
  - Upon reconnection, mobile clients have to revalidate their caches (wastes energy and bandwidth)
- Need new solutions
8 Challenges to Efficient Caching Scheme
- An efficient caching scheme should take into account
  - Data access pattern
  - Data update rate
  - Communication/access cost
  - Mobility pattern of the clients
  - Connectivity characteristics
  - Disconnection frequency
  - Available bandwidth
  - Data currency requirements
  - Location-dependence of information
9 General Issues in Designing Caching Schemes
- Where to cache? How many levels of caching to use?
- What to cache (when to cache a data item, and for how long)?
- How to invalidate cached items? Who is responsible for invalidations? At what granularity are invalidations done?
- What data currency guarantees can the system provide to the user? What are the costs involved? How to charge the user?
- What is the effect of the caching scheme on query delay (response time) and system throughput (query completion rate)?
10 Classification of Cache Invalidation Schemes
- Who is in charge of invalidations?
  - Server or client (push or pull): callbacks or validation checks
- Does the server maintain per-client state information?
  - Stateless or stateful server
- How does the server send invalidation reports?
  - Synchronously or asynchronously
- What kind of information is sent in the invalidation report?
  - State based or history based
- How is information organized in invalidation reports?
  - Uncompressed or compressed
11 Cache Maintenance Schemes
- Broadcasting Invalidation Reports (Barbara, SIGMOD 94)
- Disconnected Operation in CODA (Satyanarayanan et al.)
- Hoarding (prefetching)
- AS (Asynchronous Stateful) Caching Scheme (Kahol et al., ICDCS 00)
12 Broadcasting Invalidation Reports
- Uses stateless servers and synchronous broadcasts (Barbara, SIGMOD 94).
- Clients maintain local caches and use the information in invalidation reports to update their caches.
- The server broadcasts an invalidation report every L time units; each report contains the ids of all data items that changed during the past w = kL time units.
- A query is satisfied only after the next invalidation report has been received.
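The server side of this scheme can be sketched as follows. This is a minimal illustration, not the algorithm as published: the `InvalidationServer` class, the report layout (a dictionary with a timestamp and a map from item id to update time), and the assumption that reports go out at exact multiples of L are all choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class InvalidationServer:
    """Stateless server: it tracks item update times, never per-client state."""
    L: int                                            # broadcast period
    k: int                                            # window multiplier
    last_update: dict = field(default_factory=dict)   # item id -> update time

    @property
    def w(self):
        # Window covered by each report: w = k * L time units.
        return self.k * self.L

    def update(self, item_id, now):
        self.last_update[item_id] = now

    def build_report(self, now):
        # Ids (with update times) of items changed in the interval (now - w, now].
        changed = {i: t for i, t in self.last_update.items()
                   if now - self.w < t <= now}
        return {"timestamp": now, "changed": changed}

    def next_report_time(self, now):
        # A query issued at 'now' waits until this time (assumes reports
        # are broadcast at multiples of L).
        return (now // self.L + 1) * self.L


server = InvalidationServer(L=10, k=3)
server.update("x", now=5)
server.update("y", now=27)
report_30 = server.build_report(now=30)   # window (0, 30] covers x and y
report_40 = server.build_report(now=40)   # window (10, 40] covers only y
```

A query issued at time 23 would wait until `server.next_report_time(23)`, the broadcast at time 30, which is the source of the query delay inherent in the scheme.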
13 Broadcasting IR Variations
- If a client is disconnected from the network and misses k consecutive invalidation reports, it has to discard its cache.
- Two variations
  - Timestamp Strategy (TS): invalidation reports contain the ids of data items modified over a large window (k > 1).
  - Amnesic Terminal (AT): invalidation reports contain the ids of only those data items that changed since the last broadcast (k = 1).
- TS is better when clients are sleepers (often disconnected), and AT is better when clients are workaholics (mostly connected).
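The client-side handling of the two variations might look like the sketch below. The function names, the report layout (a dictionary with `timestamp` and `changed`), and the cache layout (item id mapped to a value and the time it was cached) are assumptions for illustration only.

```python
def apply_ts_report(cache, last_report_time, report, w):
    """Timestamp Strategy: the report covers the last w = k*L time units."""
    if report["timestamp"] - last_report_time > w:
        cache.clear()                      # slept longer than the window covers
    else:
        for item_id, updated_at in report["changed"].items():
            entry = cache.get(item_id)     # entry is (value, cached_at)
            if entry is not None and entry[1] < updated_at:
                del cache[item_id]         # cached copy predates the update
    return report["timestamp"]


def apply_at_report(cache, last_report_time, report, L):
    """Amnesic Terminal: the report covers only the last L time units."""
    if report["timestamp"] - last_report_time > L:
        cache.clear()                      # at least one report was missed
    else:
        for item_id in report["changed"]:
            cache.pop(item_id, None)
    return report["timestamp"]


# A client last heard a report at time 10, slept, and hears the one at time 40.
report = {"timestamp": 40, "changed": {"y": 27}}
cache_ts = {"x": ("vx", 8), "y": ("vy", 8)}
cache_at = {"x": ("vx", 8), "y": ("vy", 8)}
new_ts = apply_ts_report(cache_ts, 10, report, w=30)   # L = 10, k = 3
new_at = apply_at_report(cache_at, 10, report, L=10)
```

With the same sleep interval, the TS client keeps `x` and drops only `y`, while the AT client has to discard its whole cache.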
14 Disconnected Operation in CODA
- Goal: COnstant Data Availability
- Mechanisms: server replication and disconnected operation
- Caching scheme (asynchronous, stateful)
  - Uses callbacks while a client is reachable from a server.
  - During disconnections, permits access to possibly stale data.
  - Upon reconnection, the client performs validity checks on each cached volume.
- Uses hoarding to improve data availability
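The callback and revalidation behaviour can be modelled with a toy sketch. This is not CODA's actual implementation: the `VolumeServer` and `Client` classes, and the use of a simple per-volume version counter for the validity check, are assumptions made for this example.

```python
class VolumeServer:
    """Stateful server: remembers which clients hold a callback per volume."""

    def __init__(self):
        self.versions = {}     # volume -> version counter
        self.callbacks = {}    # volume -> set of clients (per-client state)

    def register(self, client, volume):
        self.callbacks.setdefault(volume, set()).add(client)
        return self.versions.setdefault(volume, 0)

    def update(self, volume):
        self.versions[volume] = self.versions.get(volume, 0) + 1
        # Break callbacks; unreachable clients simply miss the notification.
        for client in self.callbacks.pop(volume, set()):
            if client.connected:
                client.break_callback(volume)


class Client:
    def __init__(self, server):
        self.server = server
        self.connected = True
        self.cached = {}       # volume -> version seen when it was cached

    def cache_volume(self, volume):
        self.cached[volume] = self.server.register(self, volume)

    def break_callback(self, volume):
        self.cached.pop(volume, None)

    def reconnect(self):
        # Validity check on each cached volume, then re-establish callbacks.
        self.connected = True
        stale = [v for v, ver in self.cached.items()
                 if self.server.versions.get(v) != ver]
        for v in stale:
            del self.cached[v]
        for v in self.cached:
            self.server.register(self, v)
        return stale


server = VolumeServer()
client = Client(server)
client.cache_volume("u")
client.cache_volume("v")
client.connected = False
server.update("v")                        # client is unreachable: no callback
stale_visible = "v" in client.cached      # possibly stale data stays accessible
stale = client.reconnect()                # volume-by-volume validation
```

The per-volume loop in `reconnect` and the `callbacks` table on the server correspond to the two costs of a stateful scheme: revalidation work after every reconnection and per-client state on the server.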
15 Drawbacks
- Drawbacks of Barbara's scheme
  - Poor delay characteristics, because a query must wait for the next invalidation report before it can be answered.
  - Poor network utilization, because queries are answered in bursts.
  - Does not support arbitrary disconnection patterns.
- Drawbacks of the CODA caching scheme
  - The server has to keep the cache state of each client (affects scalability).
  - A client has to perform a volume-by-volume validation check after each reconnection.
16 AS Caching Scheme (Kahol et al.)
- Maintains a Home Location Cache (HLC) at the home MSS (mobile support station) of a mobile client.
- An HLC contains the state of the cache at an MH (mobile host).
- Uses asynchronous transfer of invalidation reports.
- Supports arbitrary disconnection durations by maintaining, at the HLC, the timestamp of the last invalidation report destined for an MH.
17 An Example for the AS Scheme
- Each cache is associated with a cache timestamp, which is the timestamp of the last invalidation report received.
- A mobile client sends a probe message to its home MSS when it reconnects, to determine whether it missed any invalidation reports while it was disconnected.
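The probe exchange can be sketched as below. This is a simplified model and not the protocol as specified by Kahol et al.: the `HomeMSS` and `MobileHost` classes, the message shapes, and the per-item invalidation timestamps kept in the HLC are assumptions for this example.

```python
class HomeMSS:
    """Home MSS holding one Home Location Cache (HLC) per mobile host."""

    def __init__(self):
        # mh_id -> {item_id: timestamp of last invalidation, or None}
        self.hlc = {}

    def register_cached(self, mh_id, item_id):
        self.hlc.setdefault(mh_id, {})[item_id] = None

    def on_update(self, item_id, now):
        # Record the invalidation in the HLC of every MH caching the item.
        affected = []
        for mh_id, items in self.hlc.items():
            if item_id in items:
                items[item_id] = now
                affected.append(mh_id)
        return affected

    def on_probe(self, mh_id, cache_timestamp):
        # Reply with the ids invalidated after the MH's cache timestamp.
        items = self.hlc.get(mh_id, {})
        return sorted(i for i, t in items.items()
                      if t is not None and t > cache_timestamp)


class MobileHost:
    def __init__(self, mh_id, mss):
        self.mh_id, self.mss = mh_id, mss
        self.cache = {}
        self.cache_timestamp = 0   # time of the last invalidation report received

    def cache_item(self, item_id, value):
        self.cache[item_id] = value
        self.mss.register_cached(self.mh_id, item_id)

    def reconnect(self, now):
        # Probe the home MSS and drop whatever was invalidated meanwhile.
        missed = self.mss.on_probe(self.mh_id, self.cache_timestamp)
        for item_id in missed:
            self.cache.pop(item_id, None)
        self.cache_timestamp = now
        return missed


mss = HomeMSS()
mh = MobileHost("mh1", mss)
mh.cache_item("x", 1)
mh.cache_item("y", 2)
mh.cache_timestamp = 10
affected = mss.on_update("y", now=15)   # MH is disconnected; HLC remembers it
missed = mh.reconnect(now=20)
```

Because the HLC records the invalidation timestamp, the length of the disconnection does not matter: the MH drops only `y` and keeps `x`, rather than discarding its whole cache.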
18 Hoarding
- Planned and accidental disconnections are not considered failures.
- A technique to reduce the cost of cache misses during disconnection
  - Load the necessary data before disconnecting, and be ready.
- Hoarding techniques
  - User-provided information (client-initiated disconnection)
    - Explicit: the user specifies which data (files, tables) to hoard
    - Implicit: based on the specified application
  - Access-structure-based (use past history)
    - E.g., tree-based in file systems, access paths (joins) in databases
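One way to combine user-provided information with past history is sketched below. The `build_hoard_set` function, the frequency-based ranking, and the capacity limit are assumptions for illustration; the slides do not prescribe a particular selection policy.

```python
from collections import Counter


def build_hoard_set(explicit, access_log, sizes, capacity):
    """Pick items to hoard: explicit requests first, then by past access count."""
    chosen, used = [], 0
    ranked = [item for item, _ in Counter(access_log).most_common()]
    for item in list(explicit) + ranked:
        if item in chosen:
            continue
        if used + sizes[item] <= capacity:   # skip items that no longer fit
            chosen.append(item)
            used += sizes[item]
    return chosen


explicit = ["report.doc"]                     # named by the user
access_log = ["a", "b", "a", "c", "a", "b"]   # past references
sizes = {"report.doc": 4, "a": 3, "b": 5, "c": 2}
hoard = build_hoard_set(explicit, access_log, sizes, capacity=10)
```

Here the explicitly requested file is hoarded first, then the most frequently accessed item `a`; `b` does not fit in the remaining space, so the smaller `c` is taken instead.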
19 Hoarding versus Prefetching
- Both fetch data in advance, in anticipation of future use.
- Prefetching
  - Objective is to improve performance (throughput or response time).
  - A cache miss is not catastrophic.
- Hoarding
  - Objective is to fetch all needed data into the MU (mobile unit) cache prior to disconnection, to facilitate disconnected operation.
  - A cache miss is catastrophic.
  - OK to overfetch.
20 Hoarding in Database Systems
- Granularity of hoarding
  - RDBMS: ranges over tables, sets of tables, whole relations
  - OO DBMS: objects, sets of objects, or classes
- Hoard by issuing queries or materialized views
  - The user may explicitly issue hoarding queries
  - E.g., Create View with an Update-On clause (Lauzac 98)
  - OO query to describe hoarding profiles (Gruber 94)
- History of past references: both queries and data objects
- Hoard keys: an extended database organization (Badrinath 98)
  - Hoard keys are used to partition a relation into disjoint logical horizontal fragments
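The idea of partitioning a relation into disjoint horizontal fragments can be illustrated as follows. This is only a sketch: treating the hoard key as a single attribute of each row, and the `customers` relation with its `region` attribute, are assumptions for this example and may differ from the organization proposed in the cited work.

```python
from collections import defaultdict


def partition_by_hoard_key(rows, hoard_key):
    """Split a relation (list of dict rows) into disjoint horizontal fragments."""
    fragments = defaultdict(list)
    for row in rows:
        # Each row lands in exactly one fragment, chosen by its hoard-key value.
        fragments[row[hoard_key]].append(row)
    return dict(fragments)


customers = [
    {"id": 1, "name": "Ada", "region": "north"},
    {"id": 2, "name": "Bo", "region": "south"},
    {"id": 3, "name": "Cy", "region": "north"},
]
fragments = partition_by_hoard_key(customers, "region")
hoarded = fragments["north"]    # a client working in the north hoards this fragment
```

Because every row belongs to exactly one fragment, a client can hoard whole fragments without overlap, and the union of all fragments reproduces the original relation.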
21 References
- D. Barbara and T. Imielinski, "Sleepers and Workaholics: Caching Strategies in Mobile Environments," VLDB Journal, 4, 567-602, 1995.
- A. Kahol, S. Khurana, S. K. S. Gupta, and P. K. Srimani, "A Strategy to Manage Cache Consistency in a Disconnected Distributed Environment," IEEE Transactions on Parallel and Distributed Systems, 12(7), 686-700, July 2001.