OceanStore: Global-Scale Persistent Storage

Slides: 18
Provided by: johnkubi
Transcript and Presenter's Notes


1
OceanStore: Global-Scale Persistent Storage
  • John Kubiatowicz

2
Ubiquitous Devices ⇒ Ubiquitous Storage
  • Consumers of data move, change from one device to
    another, work in cafes, cars, airplanes, the
    office, etc.
  • Properties REQUIRED for Endeavour storage
    substrate:
  • Strong security: data must be encrypted whenever
    it is in the infrastructure; resistance to monitoring
  • Coherence: too much data for naïve users to keep
    coherent by hand
  • Automatic replica management and optimization:
    huge quantities of data cannot be managed
    manually
  • Simple and automatic recovery from disasters:
    probability of failure increases with size of
    system
  • Utility model: world-scale system requires
    cooperation across administrative boundaries

3
Utility-based Infrastructure
[Figure: confederation of service providers: Canadian OceanStore, Sprint, AT&T, IBM, Pac Bell]
  • Service provided by confederation of companies
  • Monthly fee paid to one service provider
  • Companies buy and sell capacity from each other

4
State of the Art?
  • Widely deployed systems: NFS, AFS (/DFS)
  • Single regions of failure, caching only at
    endpoints
  • Cleartext exposed at various levels of system
  • Compromised server ⇒ all data on server
    compromised
  • Mobile computing community: Coda, Ficus, Bayou
  • Small scale, fixed coherence mechanism
  • Not optimized to take advantage of high-bandwidth
    connections between server components
  • Cleartext also exposed at various levels of
    system
  • Web caching community: Inktomi, Akamai
  • Specialized, incremental solutions
  • Caching along client/server path, various
    bottlenecks
  • Database community
  • Interfaces not usable by legacy applications
  • ACID update semantics not always appropriate

5
OceanStore Assumptions
  • Untrusted Infrastructure
  • Infrastructure is composed of untrusted
    components
  • Only ciphertext within the infrastructure
  • Must be careful to avoid leaking information
  • Mostly Well-Connected
  • Data producers and consumers are connected to a
    high-bandwidth network most of the time
  • Exploit mechanisms such as multicast for quicker
    consistency between replicas
  • Promiscuous Caching
  • Data may be cached anywhere, anytime
  • Global optimization through tacit information
    collection
  • Operations Interface with Conflict Resolution
  • Applications employ an operations-oriented
    interface, rather than a file-systems interface
  • Coherence is centered around conflict resolution
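One way to picture an operations-oriented interface with conflict resolution is the minimal sketch below. It is an illustration under stated assumptions, not the OceanStore interface: the names `Update`, `Replica`, `predicate`, `action` and `resolve` are hypothetical, and the data is kept as plaintext strings for simplicity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Update:
    """An operation: apply `action` if `predicate` holds on the current version."""
    predicate: Callable[[str], bool]
    action: Callable[[str], str]
    # Hypothetical fallback that merges the update when the predicate fails.
    resolve: Optional[Callable[[str], str]] = None

@dataclass
class Replica:
    # Every accepted update appends a new version; old versions are kept.
    versions: List[str] = field(default_factory=lambda: [""])

    def apply(self, update: Update) -> bool:
        current = self.versions[-1]
        if update.predicate(current):
            self.versions.append(update.action(current))
            return True
        if update.resolve is not None:
            # Conflict: the state changed underneath the client, so merge.
            self.versions.append(update.resolve(current))
            return True
        return False  # conflict with no resolution procedure: reject

replica = Replica()
replica.apply(Update(lambda s: s == "", lambda s: "v1"))
stale = Update(lambda s: s == "", lambda s: "v2", lambda s: s + "+merged")
replica.apply(stale)
print(replica.versions)  # ['', 'v1', 'v1+merged']
```

Shipping the operation rather than the resulting file contents is what lets a replica decide locally whether to apply, merge, or reject.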

6
OceanStore Technologies I: Naming and Data
Location
  • Requirements
  • Find nearby data without global communication
  • Don't get in the way of rapid relocation of data
  • Search should reflect locality and network
    efficiency
  • System-level names should help to authenticate
    data
  • OceanStore Technology
  • Underlying namespace is flat and built from
    cryptographic hashes (160-bit SHA-1)
  • Data location is a form of gradient-search of
    local pools of data (use of attenuated
    Bloom-filters)
  • Fallback to global, exact indexing structure in
    case data not found with local search
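How a flat 160-bit name can also help authenticate data may be easier to see in a small sketch. It assumes, purely for illustration, that a name is the SHA-1 digest of the object's bytes; the helpers `make_name` and `verify` are hypothetical and are not taken from OceanStore.

```python
import hashlib

def make_name(data: bytes) -> str:
    """Return a 160-bit name (40 hex digits) derived from the data itself."""
    return hashlib.sha1(data).hexdigest()

def verify(name: str, data: bytes) -> bool:
    """Check that data fetched from an untrusted node matches its name."""
    return make_name(data) == name

name = make_name(b"hello ocean")
print(len(name) * 4)                 # 160 (bits)
print(verify(name, b"hello ocean"))  # True
print(verify(name, b"tampered"))     # False
```

Because the name is computed from the content, any node can hand back the data and the client can still check it without trusting that node.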

7
Bloom Filters (brief aside)
  • Use multiple hash functions to hash each item
  • Use hash values to generate bit offsets
  • Combine bits of all items together
  • Bit vector is summary
  • To use summary, hash new value. Value is NOT
    in pool if any bit = 0
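The steps above can be sketched as a small Bloom filter. This is a minimal illustration, assuming salted SHA-1 as the family of hash functions and arbitrary sizes (`num_bits`, `num_hashes`); the class and method names are hypothetical.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits: int = 64, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits  # the pool summary

    def _offsets(self, item: str):
        # Derive several hash functions by salting SHA-1 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item: str) -> None:
        # Combine the bits of all items into one vector.
        for off in self._offsets(item):
            self.bits[off] = 1

    def might_contain(self, item: str) -> bool:
        # If any bit is 0 the item is definitely NOT in the pool.
        return all(self.bits[off] for off in self._offsets(item))

pool = BloomFilter()
for name in ("alpha", "beta"):
    pool.add(name)
print(pool.might_contain("alpha"))  # True: added items always match
print(pool.might_contain("gamma"))  # usually False; True would be a false positive
```

The summary never gives a false negative, which is why a 0 bit is enough to rule a pool out.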

[Figure: items in a Pool hashed into a bit-vector Pool Summary]

8
Cascaded-Pools Hierarchy
[Figure: hierarchy of pools labeled with Local Summaries and Downward Summaries]
  • Every pool has a good randomized index structure
    (such as treaps)

9
Progress Last Term
  • Sean Rhea and Westley Weimer
  • Built data location facility on simulated network
  • Uses attenuated Bloom filters
  • Performs search by passing messages from node to
    node. All state kept in messages!
  • Updates filters through semi-chaotic passing of
    information between neighbors
  • Resembles compiler dataflow algorithm
  • Can be shown to converge
  • Future?
  • Find other holographic representations of
    location
  • Whole new approach to data location?
  • Unified name service, data location, routing
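A rough sketch of search with attenuated Bloom filters follows. It is not the facility described above; it assumes each link keeps `DEPTH` filters, where level d summarizes objects d+1 hops away through that neighbor, and it uses Python integers as bit vectors. All names (`Node`, `build_filters`, `search`) are hypothetical.

```python
import hashlib

BITS, HASHES, DEPTH = 128, 3, 3  # illustrative sizes, chosen arbitrarily

def mask(item: str) -> int:
    """Bloom-filter bit pattern for one item, as an integer bit vector."""
    m = 0
    for i in range(HASHES):
        digest = hashlib.sha1(f"{i}:{item}".encode()).digest()
        m |= 1 << (int.from_bytes(digest, "big") % BITS)
    return m

class Node:
    def __init__(self):
        self.objects = set()
        self.neighbors = []  # names of adjacent nodes

    def local_filter(self) -> int:
        f = 0
        for obj in self.objects:
            f |= mask(obj)
        return f

def build_filters(nodes):
    # filters[(u, v)][d] summarizes objects d+1 hops from u via neighbor v.
    filters = {(u, v): [0] * DEPTH for u in nodes for v in nodes[u].neighbors}
    for (u, v) in filters:
        filters[(u, v)][0] = nodes[v].local_filter()
    for d in range(1, DEPTH):  # propagate summaries one hop per round
        for (u, v) in filters:
            for w in nodes[v].neighbors:
                if w != u:
                    filters[(u, v)][d] |= filters[(v, w)][d - 1]
    return filters

def search(nodes, filters, start, target):
    want = mask(target)
    path = [start]  # all search state travels with the message
    for _ in range(DEPTH + 1):
        here = path[-1]
        if target in nodes[here].objects:
            return path
        best = None  # (level, neighbor) with the shallowest matching level
        for v in nodes[here].neighbors:
            if v in path:
                continue
            for d, f in enumerate(filters[(here, v)]):
                if f & want == want:
                    if best is None or d < best[0]:
                        best = (d, v)
                    break
        if best is None:
            return None  # caller would fall back to the global index
        path.append(best[1])
    return None

nodes = {name: Node() for name in "abcd"}
for u, v in (("a", "b"), ("b", "c"), ("c", "d")):
    nodes[u].neighbors.append(v)
    nodes[v].neighbors.append(u)
nodes["d"].objects.add("report.txt")
filters = build_filters(nodes)
print(search(nodes, filters, "a", "report.txt"))  # ['a', 'b', 'c', 'd']
```

Each round of `build_filters` pushes summaries one hop further, which loosely mirrors the neighbor-to-neighbor propagation mentioned above.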

10
OceanStore Technologies II: High-Availability and
Disaster Recovery
  • Requirements
  • Handle diverse, unstable participants in
    OceanStore
  • Eliminate backup as independent (and fallible)
    technology
  • Flexible disaster recovery for everyone
  • OceanStore Technologies
  • Use of erasure codes (Tornado codes) to provide
    stable storage for archival copies and snapshots
    of live data
  • Mobile replicas are self-contained centers for
    logging and conflict resolution
  • Version-based update for painless recovery
  • Redundancy exploited to tolerate variation of
    performance from network servers (RIVERS)
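The idea behind erasure-coded storage can be shown with a deliberately simple stand-in. The sketch below is NOT a Tornado code: it is a single-parity XOR code that survives the loss of any one fragment, chosen only because it fits in a few lines. The function names are hypothetical.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> list:
    """Split data into k equal fragments (zero-padded) plus one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    frags = [padded[i * size:(i + 1) * size] for i in range(k)]
    return frags + [reduce(xor_bytes, frags)]

def decode(frags: list, length: int) -> bytes:
    """Rebuild the data when at most one fragment is missing (None)."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("single-parity code tolerates only one lost fragment")
    frags = list(frags)
    if missing:
        present = [f for f in frags if f is not None]
        # XOR of all surviving fragments equals the lost one.
        frags[missing[0]] = reduce(xor_bytes, present)
    return b"".join(frags[:-1])[:length]

data = b"archival snapshot of live data"
fragments = encode(data, k=4)
fragments[2] = None  # simulate a failed server
print(decode(fragments, len(data)) == data)  # True
```

Real erasure codes such as Reed-Solomon or Tornado codes tolerate many lost fragments; the interface (encode into n fragments, decode from a subset) is the part this sketch shares with them.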

11
Progress Last Term
  • Hakim Weatherspoon, Shelley Zhuang and Matthew
    Delco
  • Designed a storage system using erasure codes
  • Compared Reed-Solomon codes to Tornado codes
  • Over 1000-to-1 performance advantage in favor of
    Tornado codes!
  • Explored different distribution and gathering
    techniques
  • Future?
  • Can this system be turned into a generic
    replacement for standard UNIX backup?
  • Transform into underlying archival piece of
    OceanStore
  • Use of Tornado codes for Rivers-like adaptation
    to variations in latency
  • Self-repairing data structures???

12
OceanStore Technologies III: Introspective
Monitoring and Optimization
  • Requirements
  • Reasonable job on a global-scale optimization
    problem
  • Take advantage of locality whenever possible
  • Sensitivity to limited storage and bandwidth at
    endpoints
  • Stability in chaotic environment
  • OceanStore Technologies
  • Introspective monitoring and analysis of
    relationships:
  • between different pieces of data
  • between users of a given piece of data
  • Rearrangement of data in response to monitoring
  • Economic models with analogies to simulated
    annealing
  • Subproblem of Tacit Information Analysis (option
    5)

13
Progress Last Term
  • Patrick R. Eaton, Dennis Geels and Greg Mori
  • Introspective monitoring of local file system
  • Clustering of related data together
  • Identification of patterns for prefetching
  • Built filesystem simulation system in which to
    explore techniques
  • Byung Hoon Kang, Sarika Sahni and H. Wilson So in
    collaboration with Laurent El Ghaoui
  • Time-series extraction of patterns
  • Do people move predictably? Can we use this?
  • Future?
  • Kalman filters, hidden Markov models, and other
    statistical methods for automatically migrating
    data
  • More realistic traces (collaboration with Mary
    Baker?)
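As a toy illustration of finding patterns for prefetching, the sketch below counts which file tends to follow which in an access trace. This is an assumed, much simpler technique than the statistical methods listed above, and the trace and function names are made up for the example.

```python
from collections import Counter, defaultdict

def train(trace):
    """Count, for each file, which file was accessed immediately after it."""
    successors = defaultdict(Counter)
    for prev, nxt in zip(trace, trace[1:]):
        successors[prev][nxt] += 1
    return successors

def predict(successors, current):
    """Return the most frequent successor of `current`, or None if unseen."""
    counts = successors.get(current)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical access trace, not real data.
trace = ["a.h", "a.c", "Makefile", "a.h", "a.c", "b.c", "a.h", "a.c"]
model = train(trace)
print(predict(model, "a.h"))      # a.c
print(predict(model, "unknown"))  # None
```

A prefetcher built on this would fetch the predicted file while the current one is still in use.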

14
OceanStore Technologies IV: Rapid Update in an
Untrusted Infrastructure
  • Requirements
  • Scalable coherence mechanism which provides
    performance even though replicas are widely
    separated
  • Operate directly on encrypted data
  • Updates should not reveal info to untrusted
    servers
  • OceanStore Technologies
  • Operations-based interface using conflict
    resolution
  • Use of incremental cryptographic techniques: no
    time to decrypt/update/re-encrypt
  • Use of oblivious function techniques to perform
    this update (fallback to secure hardware in
    general case)
  • Use of automatic techniques to verify security
    protocols

15
Progress Last Term
  • Monica Chew, Chris Wells and David Bindel
  • Designed ECFS, the extended cryptographic
    filesystem
  • Explored metadata in an untrusted infrastructure
  • Uses encryption and signatures to provide
    protection against substitution attacks
  • Dawn Song, David Wagner, Doug Tygar
  • New technique for encrypting data in a way that
    is searchable
  • Could perform general grep functionality at
    server without revealing what you are searching
    for
  • Use in conflict resolution seems plausible
  • Future?
  • Key problem: denial of service
  • Conflict resolution interfaces
  • Computation on Encrypted data?
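The substitution attacks mentioned above (a server swapping in a different or older encrypted block) can be illustrated with a short sketch. It is not ECFS: it assumes a keyed MAC (HMAC-SHA256) in place of signatures, and the binding of name, version and ciphertext is a hypothetical layout chosen for the example.

```python
import hashlib
import hmac

def seal(key: bytes, name: str, version: int, blob: bytes) -> bytes:
    """Tag that binds an encrypted blob to its file name and version."""
    msg = name.encode() + b"\0" + str(version).encode() + b"\0" + blob
    return hmac.new(key, msg, hashlib.sha256).digest()

def check(key: bytes, name: str, version: int, blob: bytes, tag: bytes) -> bool:
    """Verify the tag; fails if the blob was moved to another name or version."""
    return hmac.compare_digest(seal(key, name, version, blob), tag)

key = b"client-secret-key"    # hypothetical key held only by the client
blob = b"\x8f\x02ciphertext"  # stands in for already-encrypted data
tag = seal(key, "/docs/plan", 3, blob)
print(check(key, "/docs/plan", 3, blob, tag))   # True
print(check(key, "/docs/other", 3, blob, tag))  # False: blob swapped to another name
print(check(key, "/docs/plan", 2, blob, tag))   # False: blob presented as another version
```

Encryption alone hides the contents; the tag is what ties a given ciphertext to the place it belongs.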

16
Grab Bag
  • Use of Archival system to handle portions of the
    Berkeley backup?
  • To get the same level of service, need 12 TB of
    spinning storage
  • Want it to be off site for disaster recovery
  • New opportunity: 100 TB of spinning storage
    (Brewster Kahle)
  • OceanStore as a software distribution technology:
    Microsoft Windows in the net?
  • Versioning mechanism for handling software
    upgrades

17
Two-Phase Implementation
  • Year I: Read-Mostly Prototype
  • Construction of data location facility
  • Initial introspective gathering of tacit info and
    adaptation
  • Initial archival techniques (use of erasure
    codes)
  • Unix file-system interface under Linux (legacy
    apps)
  • Year III: Full Prototype
  • Final conflict resolution and encryption
    techniques
  • More sophisticated tacit info gathering and
    rearrangement
  • Final object interface and integration with
    Endeavour applications
  • Wide-scale deployment via NTON and Internet-2