OceanStore: Global-Scale Persistent Storage

1
OceanStore: Global-Scale Persistent Storage
  • John Kubiatowicz
  • University of California at Berkeley

2
Context: Project Endeavour
Interdisciplinary, Technology-Centered Team
  • Alex Aiken, PL
  • Eric Brewer, OS
  • John Canny, AI
  • David Culler, OS/Arch
  • Joseph Hellerstein, DB
  • Michael Jordan, Learning
  • Anthony Joseph, OS
  • Randy Katz, Nets
  • John Kubiatowicz, Arch
  • James Landay, UI
  • Jitendra Malik, Vision
  • George Necula, PL
  • Christos Papadimitriou, Theory
  • David Patterson, Arch
  • Kris Pister, MEMS
  • Larry Rowe, MM
  • Alberto Sangiovanni-Vincentelli, CAD
  • Doug Tygar, Security
  • Robert Wilensky, DL/AI

3
Endeavour Goals
  • Enhancing human understanding
  • Help people to interact with information,
    devices, and people - exploit Moore's law growth
    in everything
  • Enable new approaches for problem solving and
    learning
  • Figure of merit: how effectively we amplify and
    leverage human intellect
  • Enabling and exploiting ubiquitous computing
  • Small devices, sensors, smart materials, cars,
    etc
  • New methods for design, construction, and
    administration of ultra-scale systems
  • Planetary-scale Information Utilities
  • Infrastructure is transparent and always active
  • Extensive use of redundancy of hardware and data
  • Devices that negotiate their interfaces
    automatically
  • Elements that tune, repair, and maintain
    themselves

4
Endeavour Maxims
  • Exploit Moore's law growth for better behavior
  • Use of excess capacity for better human
    interface
  • Personal Information Mgmt is the Killer App
  • Not corporate processing but management,
    analysis, aggregation, dissemination, filtering
    for the individual
  • Automated extraction and organization of daily
    activities to assist people
  • Time to move beyond the Desktop
  • Community computing: infer relationships among
    information, delegate control, establish
    authority
  • Information Technology as a Utility
  • Continuous service delivery, on a
    planetary-scale, on top of a highly dynamic
    information base

5
Endeavour Approach
  • Information Devices
  • Beyond desktop computers to MEMS-sensors/actuators
    with capture/display to yield enhanced activity
    spaces
  • Information Utility
  • Information Applications
  • High Speed/Collaborative Decision Making and
    Learning
  • Augmented Smart Spaces: Rooms and Vehicles
  • Design Methodology
  • User-centric Design with HW/SW Co-design
  • Formal methods for safe and trustworthy
    decomposable and reusable components
  • Fluid, Network-Centric System Software
  • Partitioning and management of state between soft
    and persistent state
  • Data processing placement and movement
  • Component discovery and negotiation
  • Flexible capture, self-organization, and re-use
    of information

6
OceanStore Context: Ubiquitous Computing
  • Computing everywhere
  • Desktop, Laptop, Palmtop
  • Cars, Cellphones
  • Shoes? Clothing? Walls?
  • Connectivity everywhere
  • Rapid growth of bandwidth in the interior of the
    net
  • Broadband to the home and office
  • Wireless technologies such as CDMA, satellite,
    laser
  • Rise of the thin-client metaphor
  • Services provided by interior of network
  • Incredibly thin clients on the leaves
  • MEMS devices -- sensors + CPU + wireless net in 1 mm^3
  • Mobile society: people move and devices are
    disposable

7
Questions about information
  • Where is persistent information stored?
  • 20th-century tie between location and content
    outdated (we all survived the Feb 29th bug --
    let's move on!)
  • In world-scale system, locality is key
  • How is it protected?
  • Can disgruntled employee of ISP sell your
    secrets?
  • Can't trust anyone (how paranoid are you?)
  • Can we make it indestructible?
  • Want our data to survive the big one!
  • Highly resistant to hackers (denial of service)
  • Wide-scale disaster recovery
  • Is it hard to manage?
  • Worst failures are human-related
  • Want automatic (introspective) diagnosis and
    repair

8
First Observation: Want Utility Infrastructure
  • Mark Weiser from Xerox: transparent computing is
    the ultimate goal
  • Computers should disappear into the background
  • In storage context
  • Don't want to worry about backup
  • Don't want to worry about obsolescence
  • Need lots of resources to make data secure and
    highly available, BUT don't want to own them
  • Outsourcing of storage already becoming popular
  • Pay monthly fee and your data is out there
  • Simple payment interface → one bill from one
    company

9
Second Observation: Need wide-scale deployment
  • Many components with geographic separation
  • System not disabled by natural disasters
  • Can adapt to changes in demand and regional
    outages
  • Gain in stability through statistics
  • Difference between thermodynamics and mechanics?
    Surprising stability of temperature and pressure
    given 10^30 molecules with highly variable
    behavior!
  • Wide-scale use and sharing also requires
    wide-scale deployment
  • Bandwidth increasing rapidly, but latency bounded
    by speed of light
  • Handling many people with same system leads to
    economies of scale

10
OceanStore: Everyone's data, One big Utility
  • The data is just out there
  • Separate information from location
  • Locality is only an optimization (an important
    one!)
  • Wide-scale coding and replication for durability
  • All information is globally identified
  • Unique identifiers are hashes over names and keys
  • Single uniform lookup interface replaces DNS,
    server location, data location
  • No centralized namespace required (such as SDSI)
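
A minimal sketch of how an identifier could be derived as a hash over a name and a key. The function name `make_guid`, the length-prefix encoding, and the example inputs are assumptions for illustration; the slides only say that identifiers are hashes over names and keys. SHA-1 is used here because it yields 160-bit digests.

```python
import hashlib

def make_guid(name: str, owner_public_key: bytes) -> bytes:
    """Return a 160-bit identifier derived from a name and a key (illustrative)."""
    h = hashlib.sha1()
    encoded = name.encode("utf-8")
    # Length-prefix the name so different (name, key) splits of the same
    # byte string do not hash to the same value.
    h.update(len(encoded).to_bytes(4, "big"))
    h.update(encoded)
    h.update(owner_public_key)
    return h.digest()

# Hypothetical name and key bytes, for illustration only.
guid = make_guid("/home/alice/thesis.tex", b"example-public-key-bytes")
print(guid.hex(), len(guid) * 8)  # 40 hex characters, 160 bits
```

Because the identifier depends on the key as well as the name, two owners can use the same name without colliding, which is one way to avoid a centralized namespace.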

11
Basic Structure: Irregular Mesh of Pools

12
Amusing back-of-the-envelope calculation (courtesy
Bill Bolotsky, Microsoft)
  • How many files in the OceanStore?
  • Assume 10^10 people in world
  • Say 10,000 files/person (very conservative?)
  • So 10^14 files in OceanStore!
  • If 1 gig files (not likely), get roughly 1 mole of bytes!
  • Truly impressive number of elements
  • but small relative to physical constants
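
The estimate above can be checked with a few lines of arithmetic. The variable names are illustrative; the inputs are the slide's own assumptions, with 1 gigabyte taken as 10^9 bytes. The total comes to 10^23 bytes, the same order of magnitude as Avogadro's number.

```python
AVOGADRO = 6.02214076e23  # particles per mole

people = 10**10            # assumed world population (from the slide)
files_per_person = 10**4   # 10,000 files per person
bytes_per_file = 10**9     # 1 gigabyte per file (the "not likely" case)

total_files = people * files_per_person
total_bytes = total_files * bytes_per_file
moles_of_bytes = total_bytes / AVOGADRO

print(total_files == 10**14)     # True
print(total_bytes == 10**23)     # True
print(round(moles_of_bytes, 2))  # 0.17
```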

13
Utility-based Infrastructure
[Figure: confederation of providers: Canadian OceanStore, Sprint, AT&T, IBM, Pac Bell]
  • Service provided by confederation of companies
  • Monthly fee paid to one service provider
  • Companies buy and sell capacity from each other

14
Outline
  • Motivation
  • Properties of the OceanStore and Assumptions
  • Specific Technologies and approaches
  • Conflict resolution on encrypted data
  • Replication and Deep archival storage
  • Naming and Data Location
  • Introspective computing for optimization and
    repair
  • Economic models
  • Conclusion

15
Ubiquitous Devices → Ubiquitous Storage
  • Consumers of data move, change from one device to
    another, work in cafes, cars, airplanes, the
    office, etc.
  • Properties REQUIRED for OceanStore storage
    substrate
  • Strong Security: data encrypted in the
    infrastructure; resistance to monitoring and
    denial of service attacks
  • Coherence: too much data for naïve users to keep
    coherent by hand
  • Automatic replica management and optimization:
    huge quantities of data cannot be managed
    manually
  • Simple and automatic recovery from disasters:
    probability of failure increases with size of
    system
  • Utility model: world-scale system requires
    cooperation across administrative boundaries

16
State of the Art?
  • Widely deployed systems: NFS, AFS (/DFS)
  • Single regions of failure, caching only at
    endpoints
  • Cleartext exposed at various levels of system
  • Compromised server → all data on server
    compromised
  • Mobile computing community: Coda, Ficus, Bayou
  • Small scale, fixed coherence mechanism
  • Not optimized to take advantage of high-bandwidth
    connections between server components
  • Cleartext also exposed at various levels of
    system
  • Web caching community: Inktomi, Akamai
  • Specialized, incremental solutions
  • Caching along client/server path, various
    bottlenecks
  • Database Community
  • Interfaces not usable by legacy applications
  • ACID update semantics not always appropriate

17
OceanStore Assumptions
  • Untrusted Infrastructure
  • The OceanStore is composed of untrusted
    components
  • Only ciphertext within the infrastructure
  • Information must not be leaked over time
  • Principal Party
  • There is one organization that is financially
    responsible for the integrity of your data
  • Mostly Well-Connected
  • Data producers and consumers are connected to a
    high-bandwidth network most of the time
  • Exploit multicast for quicker consistency when
    possible
  • Promiscuous Caching
  • Data may be cached anywhere, anytime
  • Operations Interface with Conflict Resolution
  • Applications employ an operations-oriented
    interface, rather than a file-systems interface
  • Coherence is centered around conflict resolution

18
OceanStore Technologies I: Naming and Data
Location
  • Requirements
  • System-level names should help to authenticate
    data
  • Route to nearby data without global communication
  • Don't inhibit rapid relocation of data
  • OceanStore approach: Two-level search with
    embedded routing
  • Underlying namespace is flat and built from
    secure cryptographic hashes (160-bit SHA-1)
  • Search process combines quick, probabilistic
    search with slower guaranteed search
  • Long-distance data location and routing are
    integrated
  • Every source/destination pair has multiple
    routing paths
  • Continuous, on-line optimization adapts for hot
    spots, denial of service, and inefficiencies in
    routing

19
Universal Location Facility
  • Takes 160-bit unique identifier (GUID) and
    returns the nearest object that matches

20
Some current results
  • Have a working algorithm for local search
  • Uses attenuated Bloom filters
  • Performs search by passing messages from node to
    node. All state kept in messages!
  • Updates filters through semi-chaotic passing of
    information between neighbors
  • Resembles compiler dataflow algorithm
  • Can be shown to converge
  • Have candidate for backing store index
  • Randomized data structure with locality
    properties
  • Every document has multiple roots in the
    OceanStore
  • Searches close to copy tend to find copy
    quickly
  • Redundant, insensitive to faults, and repairable
  • Investigating algorithms to continually adapt
    routing structure to adjust for faults and denial
    of service

21
OceanStore Technologies II: Rapid Update in an
Untrusted Infrastructure
  • Requirements
  • Scalable coherence mechanism which can operate
    directly on encrypted data without revealing
    information
  • Handle Byzantine failures
  • Rapid dissemination of committed information
  • OceanStore Approach
  • Operations-based interface using conflict
    resolution
  • Modeled after Xerox Bayou → update packets
    include predicate/update pairs which operate on
    encrypted data
  • Use of oblivious function techniques to perform
    this update
  • Use of incremental cryptographic techniques
  • User signs updates and principal party signs
    commits
  • Committed data multicast to clients

22
Tentative Updates: Epidemic Dissemination

23
Committed Updates: Multicast Dissemination

24
Our State of the Art
  • Have techniques for protecting metadata
  • Uses encryption and signatures to provide
    protection against substitution attacks
  • Provides secure pointer technology
  • Have a working scheme that can do some forms of
    conflict resolution directly on encrypted data
  • Uses new technique for searching on encrypted
    data.
  • Can be generalized to perform optimistic
    concurrency, but at cost in performance and
    possibly privacy
  • Byzantine assumptions for update commitment
  • Signatures on update requests from clients
  • Compromised servers are unable to produce valid
    updates
  • Uncompromised second-tier servers can make
    consistent ordering decision with respect to
    tentative commits
  • Use of threshold cryptography in inner-tier of
    servers
  • Signatures on update stream from inner-tier
  • Use of chained MACs to reduce overhead

25
OceanStore Technologies III: High-Availability
and Disaster Recovery
  • Requirements
  • Handle diverse, unstable participants in
    OceanStore
  • Mitigate denial of service attacks
  • Eliminate backup as independent (and fallible)
    technology
  • Flexible disaster recovery for everyone
  • OceanStore Approach
  • Use of erasure codes to provide stable storage
    for archival copies and snapshots of live data
  • Mobile replicas are self-contained centers for
    logging and conflict resolution
  • Version-based update for painless recovery
  • Continuous introspection repairs data structures
    and degree of redundancy

26
Floating Replicas and Deep Archival Coding
  • Floating Replicas are per-object virtual servers
  • Complete copy of data
  • Logging for updates/conflict resolution
  • Interaction with other centers to keep data
    consistent
  • May appear and disappear like bubbles
  • Erasure coded fragments provide very stable store
  • Multi-level codes spread over 1000s of nodes
  • Could lose 1/2 of nodes and still recover data
  • Archive old versions of data and checkpoints
  • Inactive data may only be in erasure-coded form

27
Floating Replica and Deep Archival Coding

28
Structure of Archival Checkpoints
[Figure labels: Checkpoint Reference (GUID); Blocks; Unit of Archival Storage. Note: each block needs a GUID.]
  • All blocks and fragments signed
  • Copy-on-write behavior
  • Older metablocks fragmented also

29
Proactive Self-Maintenance
  • Continuous testing and repair of information
  • Slow sweep through all information to make sure
    there are sufficient erasure-coded fragments
  • Continuous reevaluation of risk and redistribution
    of data
  • Slow sweep and repair of metadata/search trees
  • Continuous online self-testing of HW and SW
  • Detects flaky, failing, or buggy components via:
  • Fault injection: triggering hardware and software
    error handling paths to verify their
    integrity/existence
  • Stress testing: pushing HW/SW components past
    normal operating parameters
  • Scrubbing: periodic restoration of potentially
    decaying hardware or software state
  • Automates preventive maintenance

30
OceanStore Technologies IV: Introspective
Optimization
  • Requirements
  • Reasonable job on global-scale optimization
    problem
  • Take advantage of locality whenever possible
  • Sensitivity to limited storage and bandwidth at
    endpoints
  • Repair of data structures, increasing of
    redundancy
  • Stability in chaotic environment → Active
    Feedback
  • OceanStore Approach
  • Introspective Monitoring and analysis of
    relationships to cluster information by
    relatedness
  • Time-series analysis of user and data motion
  • Rearrangement and replication in response to
    monitoring
  • Clustered prefetching: fetch related objects
  • Proactive prefetching: get data there before
    needed
  • Rearrangement in response to overload and attack

31
Example: Client Introspection
  • Client observer and optimizer components
  • Greedy agents working on behalf of the client
  • Watches client activity/combines with historical
    info
  • Performs clustering and time-series analysis
  • Forwards results to infrastructure (privacy
    issues!)
  • Monitoring of state of network to adapt behavior
  • Typical Actions
  • cluster related files together
  • prefetch files that will be needed soon
  • Create/destroy floating replicas
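
A minimal sketch of how a client observer might turn an access history into a prefetch hint. It uses a simple "which file usually follows this one" count as a stand-in for the clustering and time-series analysis named above; the function names and the example trace are hypothetical.

```python
from collections import Counter, defaultdict

def build_successor_model(trace):
    """Count, for each file, which files were accessed immediately after it."""
    model = defaultdict(Counter)
    for current, following in zip(trace, trace[1:]):
        model[current][following] += 1
    return model

def predict_next(model, current):
    """Return the most frequent successor of `current`, or None if unseen."""
    if current not in model:
        return None
    return model[current].most_common(1)[0][0]

# Hypothetical access history observed on the client.
trace = ["a.tex", "a.bib", "a.tex", "a.bib", "fig.png", "a.tex", "a.bib"]
model = build_successor_model(trace)
print(predict_next(model, "a.tex"))     # a.bib
print(predict_next(model, "notes.md"))  # None
```

Files that repeatedly follow one another are candidates both for clustering together and for prefetching.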

32
OceanStore Technologies V: The oceanic data market
  • Properties
  • Utility providers have resources (storage and
    bandwidth)
  • Clients use resources both directly and
    indirectly
  • Use of data storage and bandwidth on demand
  • Data movement on behalf of users
  • Some customers are more important than others
  • Techniques that we are exploring (very
    preliminary)
  • Data market driven by principal party
  • Tradeoff between performance (replication) and
    cost
  • Secure signatures on data packets permit:
  • Accounting of bandwidth and CPU utilization
  • Access control policies (Bays in OceanStore
    nomenclature)
  • Use of challenge-response protocols (similar to
    zero-knowledge proofs) to demonstrate possession
    of data
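
A minimal sketch of a challenge-response check for possession of data, using a fresh random nonce as an HMAC key. This is a deliberately simple stand-in: the verifier here must hold the data itself (or precomputed nonce/answer pairs), and it is not a zero-knowledge protocol. The function names are assumptions for illustration.

```python
import hashlib
import hmac
import os

def make_challenge() -> bytes:
    """Verifier side: a fresh random nonce, so old answers cannot be replayed."""
    return os.urandom(16)

def respond(nonce: bytes, data: bytes) -> bytes:
    """Storage side: computing this requires having the full data at hand."""
    return hmac.new(nonce, data, hashlib.sha256).digest()

def verify(nonce: bytes, expected_data: bytes, response: bytes) -> bool:
    """Verifier side: recompute the answer and compare in constant time."""
    return hmac.compare_digest(respond(nonce, expected_data), response)

data = b"archival fragment contents"
nonce = make_challenge()

honest = respond(nonce, data)
cheating = respond(nonce, data[:-1])  # server that lost part of the data

print(verify(nonce, data, honest))    # True
print(verify(nonce, data, cheating))  # False
```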

33
Two-Phase Implementation
  • This term: Read-Mostly Prototype
  • Construction of data location facility
  • Initial introspective gathering of tacit info and
    adaptation
  • Initial archival techniques (use of erasure
    codes)
  • Unix file-system interface under Linux (legacy
    apps)
  • Later: Full Prototype
  • Final conflict resolution and encryption
    techniques
  • More sophisticated tacit info gathering and
    rearrangement
  • Final object interface and integration with
    Endeavour applications
  • Wide-scale deployment via NTON and Internet-2

34
OceanStore Conclusion
  • The Time is now for a Universal Data Utility
  • Ubiquitous computing and connectivity are (almost)
    here!
  • Confederation of utility providers is right model
  • OceanStore holds all data, everywhere
  • Local storage is a cache on global storage
  • Provides security in an untrusted infrastructure
  • Large scale system has good statistical
    properties
  • Use of introspection for performance and
    stability
  • Quality of individual servers enhances
    reliability
  • Exploits economies of scale to:
  • Provide high-availability and extreme
    survivability
  • Lower maintenance cost
  • Self-diagnosis and repair
  • Insensitivity to technology changes: just unplug
    one set of servers, plug in others