Title: OceanStore: Global-Scale Persistent Storage
1. OceanStore: Global-Scale Persistent Storage
- Ying Lu
- CSCE496/896 Spring 2006
2. Credits
- Many slides are from John Kubiatowicz, University of California at Berkeley
- I have modified them and added new slides
3. Endeavour Maxims
- Personal Information Mgmt is the Killer App
- Not corporate processing, but management, analysis, aggregation, dissemination, and filtering for the individual
- Automated extraction and organization of daily activities to assist people
- Information Technology as a Utility
- Continuous service delivery, on a planetary scale, on top of a highly dynamic information base
4. OceanStore Context: Ubiquitous Computing
- Computing everywhere
- Desktop, laptop, palmtop, cars, cellphones
- Shoes? Clothing? Walls?
- Connectivity everywhere
- Rapid growth of bandwidth in the interior of the net
- Broadband to the home and office
- Wireless technologies such as CDMA, satellite, laser
- Rise of the thin-client metaphor
- Services provided by the interior of the network
- Incredibly thin clients on the leaves
- MEMS devices: sensors + CPU + wireless net in 1 mm³
- Mobile society: people move and devices are disposable
5. Questions about Information
- Where is persistent information stored?
- The 20th-century tie between location and content is outdated
- (We all survived the Feb 29th bug -- let's move on!)
- How is it protected?
- Can a disgruntled employee of your ISP sell your secrets?
- Can't trust anyone (how paranoid are you?)
- Can we make it indestructible?
- Want our data to survive "the big one"!
- Highly resistant to hackers (denial of service)
- Wide-scale disaster recovery
- Is it hard to manage?
- Worst failures are human-related
- Want automatic (introspective) diagnosis and repair
6. First Observation: Want Utility Infrastructure
- Mark Weiser from Xerox: transparent computing is the ultimate goal
- Computers should disappear into the background
- In a storage context:
- Don't want to worry about backup or obsolescence
- Need lots of resources to make data secure and highly available, BUT don't want to own them
- Outsourcing of storage is already becoming popular
- Pay a monthly fee and your data is out there
- Simple payment interface → one bill from one company
7. Second Observation: Need Wide-Scale Deployment
- Many components with geographic separation
- System not disabled by natural disasters
- Can adapt to changes in demand and regional outages
- Wide-scale use and sharing also requires wide-scale deployment
- Bandwidth is increasing rapidly, but latency is bounded by the speed of light
- Handling many people with the same system leads to economies of scale
8. OceanStore: Everyone's Data, One Big Utility
- The data is just out there
- Separate information from location
- Locality is only an optimization (an important one!)
- Wide-scale coding and replication for durability
- All information is globally identified
- Unique identifiers are hashes over names + keys
- A single uniform lookup interface replaces DNS, server location, and data location
- No centralized namespace required (such as SDSI)
9. Amusing Back-of-the-Envelope Calculation (courtesy Bill Bolotsky, Microsoft)
- How many files in the OceanStore?
- Assume 10^10 people in the world
- Say 10,000 files/person (very conservative?)
- So 10^14 files in OceanStore!
- If 1 GB files (not likely), we get about 1 mole of bytes! (checked below)
- Truly impressive number of elements
- but small relative to physical constants
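
As a quick sanity check on the arithmetic above, with the values taken straight from the bullets (a sketch, in Python):

    people = 10**10            # assumed world population
    files_per_person = 10**4   # "10,000 files/person"
    bytes_per_file = 10**9     # "1 GB files (not likely)"

    total_files = people * files_per_person      # 10**14 files
    total_bytes = total_files * bytes_per_file   # 10**23 bytes
    print(total_files)                # 100000000000000
    print(total_bytes / 6.022e23)     # ~0.17 of a mole of bytes

So 10^23 bytes lands within a factor of about six of Avogadro's number, which is the joke: small relative to physical constants.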
10. Utility-Based Infrastructure
[Figure: a map of cooperating providers -- Canadian OceanStore, Sprint, AT&T, IBM, Pac Bell]
- Service provided by confederation of companies
- Monthly fee paid to one service provider
- Companies buy and sell capacity from each other
11. Outline
- Motivation
- Properties of the OceanStore
- Specific Technologies and approaches
- Naming and Data Location
- Conflict resolution on encrypted data
- Replication and Deep archival storage
- Introspective computing for optimization and repair
- Economic models
- Conclusion
12. Ubiquitous Devices → Ubiquitous Storage
- Consumers of data move, change from one device to another, work in cafes, cars, airplanes, the office, etc.
- Properties REQUIRED for the OceanStore storage substrate:
- Strong security: data encrypted in the infrastructure; resistance to monitoring and denial-of-service attacks
- Coherence: too much data for naïve users to keep coherent by hand
- Automatic replica management and optimization: huge quantities of data cannot be managed manually
- Simple and automatic recovery from disasters: probability of failure increases with the size of the system
- Utility model: a world-scale system requires cooperation across administrative boundaries
13. OceanStore Technologies I: Naming and Data Location
- Requirements:
- System-level names should help to authenticate data
- Route to nearby data without global communication
- Don't inhibit rapid relocation of data
- OceanStore approach: two-level search with embedded routing
- Underlying namespace is flat and built from secure cryptographic hashes (160-bit SHA-1; see the sketch below)
- Search process combines a quick, probabilistic search with a slower, guaranteed search
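
To make the flat namespace concrete, here is a minimal sketch of deriving a 160-bit GUID; the name-plus-owner-key layout is an assumption for illustration, not the exact OceanStore format:

    import hashlib

    def make_guid(name: bytes, owner_public_key: bytes) -> bytes:
        # 160-bit GUID from SHA-1 over name + key; this byte layout is
        # illustrative, not OceanStore's actual wire format
        return hashlib.sha1(name + owner_public_key).digest()

    guid = make_guid(b"papers/oceanstore.pdf", b"<owner public key bytes>")
    print(guid.hex())  # 40 hex digits = 160 bits; flat, location-independent

A client holding the name and key can recompute the GUID and check it against what the infrastructure returns, which is one way system-level names help authenticate data.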
14. Universal Location Facility
- Takes a 160-bit unique identifier (GUID) and returns the nearest object that matches
15. Routing: Two-Tiered Approach
- Fast, probabilistic routing algorithm
- Entities that are accessed frequently are likely to reside close to where they are being used (ensured by introspection)
- Slower, guaranteed hierarchical routing method
- Self-optimizing
16. Probabilistic Routing Algorithm
- Self-optimizing on the depth of the attenuated Bloom filter array
- Self-protecting
[Figure: nodes n1-n4, with a Bloom filter on each node and an attenuated Bloom filter on each directed edge; a sketch follows]
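
A minimal sketch of the attenuated Bloom filter idea (filter width, hash scheme, and depth are illustrative assumptions): level 0 of the array on an outgoing edge summarizes objects one hop away along that edge, level 1 objects two hops away, and so on, so a query greedily follows the edge that matches at the shallowest depth.

    import hashlib

    M = 1024  # filter width in bits (an illustrative choice)

    def bit_positions(guid: bytes, k: int = 3):
        # k bit positions per GUID, carved out of a SHA-1 digest (assumption)
        h = hashlib.sha1(guid).digest()
        return {int.from_bytes(h[2*i:2*i+2], "big") % M for i in range(k)}

    class AttenuatedBloom:
        # one Bloom filter per distance level along a directed edge
        def __init__(self, depth: int = 3):
            self.levels = [set() for _ in range(depth)]  # sets stand in for bit arrays

        def add(self, guid: bytes, distance: int):
            self.levels[distance] |= bit_positions(guid)

        def match_depth(self, guid: bytes):
            # shallowest level claiming the GUID (false positives are possible)
            want = bit_positions(guid)
            for d, level in enumerate(self.levels):
                if want <= level:
                    return d
            return None

    def choose_edge(edges, guid):
        # forward toward the neighbor that matches at the smallest depth
        hits = [(e.match_depth(guid), name)
                for name, e in edges.items() if e.match_depth(guid) is not None]
        return min(hits)[1] if hits else None  # None: fall back to hierarchical routing

    edges = {"n2": AttenuatedBloom(), "n3": AttenuatedBloom()}
    edges["n2"].add(b"some-guid", 0)          # one hop away via n2
    print(choose_edge(edges, b"some-guid"))   # n2

The None fall-back mirrors the two-tier design: when the probabilistic search misses, the slower, guaranteed hierarchical route takes over.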
17. Hierarchical Routing Algorithm
- Based on the Plaxton scheme (sketched below)
- Every server in the system is assigned a random node-ID
- An object's root:
- Each object is mapped to the single node whose node-ID matches the object's GUID in the most bits (starting from the least significant)
- Information about the GUID (such as location) is stored at its root
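
A sketch of the incremental suffix-matching step (digit alphabet, ID length, and neighbor-table layout are illustrative): each hop forwards to a neighbor sharing at least one more trailing digit with the target GUID, so the route converges on the object's root in at most one hop per digit.

    def matched_suffix_len(a: str, b: str) -> int:
        # number of trailing digits on which two IDs agree
        return next((i for i in range(len(a)) if a[-1 - i] != b[-1 - i]), len(a))

    def next_hop(my_id: str, target: str, neighbors):
        # forward to any neighbor that fixes one more low-order digit
        shared = matched_suffix_len(my_id, target)
        for n in neighbors:
            if matched_suffix_len(n, target) > shared:
                return n
        return None  # no better neighbor: we are the target's root (or closest)

    # e.g., node 0724 routing toward an object rooted near 1324
    print(matched_suffix_len("0324", "1324"))          # 3 trailing digits match
    print(next_hop("0724", "1324", ["0324", "9876"]))  # 0324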
18. Constructing the Plaxton Mesh
[Figure: mesh-construction example over node-IDs 0324 and 1324]
19. Basic Plaxton Mesh: Incremental Suffix-Based Routing
[Figure: a multi-hop route, each hop matching one more suffix digit]
20. Use of the Plaxton Mesh: Randomization and Locality
21. OceanStore Enhancements of the Plaxton Mesh
- Documents have multiple roots (salted hash of the GUID; see the sketch below)
- Each node has multiple neighbor links
- Searches proceed along multiple paths
- Tradeoff between reliability, performance, and bandwidth?
- Dynamic node insertion and deletion algorithms
- Continuous repair and incremental optimization of links
- Self-healing, self-optimizing, self-managing
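
A sketch of the salted-hash trick for multiple roots (the salt scheme is an assumption): hashing the GUID with a handful of well-known salts yields several independent root IDs, so lookups survive the loss of any single root and can proceed along multiple paths in parallel.

    import hashlib

    def root_ids(guid: bytes, n_roots: int = 4):
        # derive n independent roots by salting the GUID; clients and
        # servers share the same well-known salts, so they agree on roots
        return [hashlib.sha1(guid + bytes([s])).digest() for s in range(n_roots)]

    for r in root_ids(bytes(20)):
        print(r.hex())  # four distinct 160-bit node-IDs to route toward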
22. OceanStore Technologies II: Rapid Update in an Untrusted Infrastructure
- Requirements:
- A scalable coherence mechanism that can operate directly on encrypted data without revealing information
- Handle Byzantine failures
- Rapid dissemination of committed information
- OceanStore approach:
- Operations-based interface using conflict resolution (sketched below)
- Modeled after Xerox Bayou → update packets include predicate/update pairs which operate on encrypted data
- User signs updates and a principal party signs commits
- Committed data multicast to clients
23. Update Model
- Concurrent updates w/o wide-area locking
- Conflict resolution
- Update serialization
- A master replica?
- Role of the primary tier of replicas:
- All updates are submitted to the primary tier of replicas, which chooses a final total order by following a Byzantine agreement protocol (sizing sketched below)
- A secondary tier of replicas:
- The result of the updates is multicast down the dissemination tree to all the secondary replicas
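
One sizing fact worth making explicit: Byzantine agreement protocols of this kind need at least 3f + 1 replicas to tolerate f arbitrarily faulty ones, which is what keeps the primary tier small. A quick sketch of the arithmetic (the example tier sizes are illustrative):

    def min_tier_size(f: int) -> int:
        # classic bound: tolerating f Byzantine replicas needs n >= 3f + 1
        return 3 * f + 1

    def max_faults(n: int) -> int:
        # largest f a primary tier of n replicas can tolerate
        return (n - 1) // 3

    for n in (4, 7, 10):
        print(n, "replicas tolerate", max_faults(n), "Byzantine failure(s)")
    # 4 -> 1, 7 -> 2, 10 -> 3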
24. Tentative Updates: Epidemic Dissemination
25. Committed Updates: Multicast Dissemination
26. Data Coding Model
- Two distinct forms of data: active and archival
- Active data in floating replicas:
- The latest version of the object
- Archival data in erasure-coded fragments:
- A permanent, read-only version of the object
- During commit, the previous version is coded with an erasure code and spread over 100s or 1000s of nodes
- Advantage: any 1/2 or 1/4 of the fragments regenerates the data (toy example below)
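
A toy sketch of what "any 1/4 of the fragments regenerates the data" means, using a Reed-Solomon-style code over a small prime field (the field, fragment count, and systematic layout are illustrative choices, not OceanStore's actual codec):

    import random

    P = 257  # prime field, large enough for byte values 0..255

    def lagrange_eval(points, x):
        # evaluate the unique degree < len(points) polynomial through
        # `points` at x, working mod P
        total = 0
        for i, (xi, yi) in enumerate(points):
            num = den = 1
            for j, (xj, _) in enumerate(points):
                if i != j:
                    num = num * (x - xj) % P
                    den = den * (xi - xj) % P
            total = (total + yi * num * pow(den, P - 2, P)) % P
        return total

    def encode(data, m):
        # systematic encoding: data bytes are the polynomial's values at
        # x = 0..n-1; the m fragments are its values at x = 0..m-1
        pts = list(enumerate(data))
        return [(x, lagrange_eval(pts, x)) for x in range(m)]

    def decode(fragments, n):
        # any n fragments determine the polynomial; re-read x = 0..n-1
        return [lagrange_eval(fragments[:n], x) for x in range(n)]

    data = list(b"hello")
    frags = encode(data, 4 * len(data))          # rate 1/4: 20 fragments, 5 needed
    survivors = random.sample(frags, len(data))  # any 5 of the 20 suffice
    assert decode(survivors, len(data)) == data
    print("recovered:", bytes(decode(survivors, len(data))).decode())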
27. Floating Replicas and Deep Archival Coding
28. Proactive Self-Maintenance
- Continuous testing and repair of information:
- Slow sweep through all information to make sure there are sufficient erasure-coded fragments (sketched below)
- Continuously reevaluate risk and redistribute data
- Slow sweep and repair of metadata/search trees
- Continuous online self-testing of HW and SW
- Detects flaky, failing, or buggy components via:
- Fault injection: triggering hardware and software error-handling paths to verify their integrity/existence
- Stress testing: pushing HW/SW components past normal operating parameters
- Scrubbing: periodic restoration of potentially decaying hardware or software state
- Automates preventive maintenance
29. OceanStore Technologies IV: Introspective Optimization
- Requirements:
- Do a reasonable job on a global-scale optimization problem
- Take advantage of locality whenever possible
- Sensitivity to limited storage and bandwidth at endpoints
- Repair of data structures, increasing of redundancy
- Stability in a chaotic environment → active feedback
- OceanStore approach:
- Introspective monitoring and analysis of relationships, to cluster information by relatedness
- Time-series analysis of user and data motion
- Rearrangement and replication in response to monitoring
- Clustered prefetching: fetch related objects
- Proactive prefetching: get data there before it is needed
- Rearrangement in response to overload and attack
30. Example: Client Introspection
- Client observer and optimizer components (a sketch follows this list)
- Greedy agents working on behalf of the client
- Watches client activity and combines it with historical info
- Performs clustering and time-series analysis
- Forwards results to the infrastructure (privacy issues!)
- Monitoring of the state of the network to adapt behaviour
- Typical actions:
- Cluster related files together
- Prefetch files that will be needed soon
- Create/destroy floating replicas
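
A sketch of the observer side (window size and threshold are made-up parameters): count how often pairs of files are accessed close together in time, and when a file is opened, suggest prefetching its frequent companions.

    from collections import Counter, deque

    WINDOW = 8        # recent accesses considered "together" (illustrative)
    MIN_COOCCUR = 3   # pair count before two files are called related (illustrative)

    class Observer:
        # greedy per-client agent: learns co-access pairs, suggests prefetches
        def __init__(self):
            self.recent = deque(maxlen=WINDOW)
            self.pairs = Counter()

        def on_access(self, f):
            for g in self.recent:
                if g != f:
                    self.pairs[frozenset((f, g))] += 1
            self.recent.append(f)

        def prefetch_candidates(self, f):
            # files that frequently co-occur with f in the access stream
            return [next(iter(p - {f})) for p, c in self.pairs.items()
                    if f in p and c >= MIN_COOCCUR]

    obs = Observer()
    for f in ["a.tex", "a.bib", "a.tex", "a.bib", "a.tex", "a.bib"]:
        obs.on_access(f)
    print(obs.prefetch_candidates("a.tex"))  # ['a.bib']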
31. OceanStore Conclusion
- The time is now for a universal data utility
- Ubiquitous computing and connectivity are (almost) here!
- A confederation of utility providers is the right model
- OceanStore holds all data, everywhere
- Local storage is a cache on global storage
- Provides security in an untrusted infrastructure
- A large-scale system has good statistical properties
- Use of introspection for performance and stability
- Quality of individual servers enhances reliability
- Exploits economies of scale to:
- Provide high availability and extreme survivability
- Lower maintenance cost:
- Self-diagnosis and repair
- Insensitivity to technology changes: just unplug one set of servers, plug in others