Title: Locality Optimizations in OceanStore
1Locality Optimizations in OceanStore
An introduction to introspective techniques for
exploiting locality in wide area storage
utilities.
- Patrick R. Eaton
- Dennis Geels
2Agenda
- OceanStore Review
- Problem Overview
- Previous Work
- Proposed Solution
- Prefetching Algorithm
- Preliminary Results
- Future Work
3OceanStore Review
- Properties of OceanStore relevant to
introspective locality optimizations - implemented in the extremely wide area
- has many places to put any single piece of data
- cannot rely on users to make relationships among
data explicit - depends on effective locality optimizations for
improved performance - No possible way to solve exactly
4Problem Overview
- Passively observe data accesses
- data shared among multiple users
- single users accessing the network from different
physical locations - data is replicated across the network
- Optimize the location of data to provide quicker
access to users - cluster semantically related data
- replicate data to move it closer to consumers
- migrate primary replicas toward the source of
updates
5Measurable Attributes
- File Temperature
- A measure that indicates the frequency of access
to the file - A hot file is frequently accessed
- Semantic Distance (Kuenning)
- Any measure that can quantify relationships
between files on the range 0,?) - Local distance relates one instance of a file
access to another - Reference distance is an aggregate measure that
summarizes all local distances for a pair of
files - Typical measures use access order or timing
information
6Prefetching Techniques
- Automatic Prefetching (Griffoen and Appleton)
- construct a probability graph that records
accesses which follow within a lookahead period - predict a prefetch when the chance of an access
is above a tunable parameter - Context Modeling (Kroeger and Long)
- record in a trie all access sequences which have
been observed - maintain pointers to all nodes which represent
current contexts - predict a prefetch when the chance of an access
to a child of a current context is above a
probability threshold
7Our Approach
- Exploit the ideas of semantic distance to compute
relationships among data - Cluster data based on the observed relationships
- Store a summary of these relationships with the
data - Migrate (prefetch) files based on familiar
patterns in the access stream - recognize higher order correlations as in context
modeling - tolerate noise in the access stream
8Motivation for Prefetching Algorithm
A
Y
Many patterns can be predicted only by
observation of higher-order correlation--combining
several pieces of past history.
K
B
Z
A
Other patterns can only be detected through
identification and filtering of noise.
B
C
9General Prefetching Algorithm
- Update
- Record the most recent file accesses in the file
history buffer (FHB) - Each time a new file S is accessed, extract all
triples of the form (FHB(i), FHB(j)) ?S from the
FHB and update in the second-order distance table - Predict
- Each time a new file S is accessed, examine the
distance table entries of (FHB(i), S) - Prefetch files that are predicted with confidence
above a certain threshold - Problems
- O(k2) work to update distance table
- Noise infects distance table
10Optimizations to the Prefetching Algorithm
- First-order distance table
- Records files that are close, as measured by
semantic distance - Allows reverse lookup
- Use first-order distance tables to filter out
irrelevant file relationships - Update only relevant entries in the second-order
distance table - Search for predictions based on only relevant
access pairs
Indicative FHBs
11Prefetching Algorithm Example
- Update
- Extract relevant triples by intersecting the FHB
with the results from the reverse lookup in
first-order tables
- Predict
- Extract relevant doubles by intersecting the FHB
with the results from the reverse lookup in the
first-order tables - Prefetch if the second-order table predicts a
future access with sufficient confidence
12Preliminary Results (Local System)
13Future Work
- Retarget the simulations to model OceanStore
- Continue to refine the prefetching algorithm
- Examine the potential of higher order prefetching
- Combine prefetching and clustering
- Look for opportunities to test the ideas on
different workloads