1
The Data Ring: Community Content Sharing
  • Serge Abiteboul (INRIA)
  • Alkis Polyzotis (UC Santa Cruz)

2
Data Sharing Communities
Data sharing community: a group of users that
share and query information within some domain
  • Examples: UCSC genome browser, SwissProt, Flickr
  • An interesting data management problem:
    ◦ Shared information is heterogeneous
    ◦ Data is distributed and dynamic
    ◦ There is no central administration
    ◦ Users are not database savvy

3
The Data Ring
  • P2P middleware system that provides:
    ◦ Monitoring
    ◦ Querying
    ◦ Other database-like services over the
      distributed information
  • Main goal: simplicity of use

4
Data abstraction in the data ring
  • Topological layer
  • Physical layer
  • External layer

5
Data abstraction in the data ring
Topological Layer
  • Declarative query services
  • Data and query model based on XML

6
Data abstraction in the data ring
Physical Layer
  • Basic service is distributed query evaluation
  • Comprises the overlay network (DHT), physical
    access structures (indices, replicas, views), and
    the catalog.
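The slide names a DHT as the overlay network but does not say which one. As a rough illustration of the key-to-peer placement a DHT provides, the sketch below uses consistent hashing on a ring. The SHA-1 hash, the 32-bit ring, and the peer and key names are illustrative assumptions, not details of the Data Ring.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a position on a 2**32 identifier ring (illustrative choice)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    """Toy DHT-style placement: a key is owned by the first peer clockwise from it."""

    def __init__(self, peers):
        self._ring = sorted((ring_position(p), p) for p in peers)
        self._positions = [pos for pos, _ in self._ring]

    def owner(self, key: str) -> str:
        i = bisect.bisect_right(self._positions, ring_position(key))
        # Wrap around to the first peer when the key lies past the last peer.
        return self._ring[i % len(self._ring)][1]

    def add_peer(self, peer: str) -> None:
        bisect.insort(self._ring, (ring_position(peer), peer))
        self._positions = [pos for pos, _ in self._ring]


ring = HashRing(["peer-a", "peer-b", "peer-c"])
print(ring.owner("catalog:genes"))  # one of the three peers, deterministically
```

Adding a peer only reassigns the keys that now fall just before it on the ring, which is one reason this kind of scheme suits a network where peers join and leave without a central administrator.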

7
Data abstraction in the data ring
External Layer
  • Provides semantically richer data models

8
Data abstraction in the data ring
  • Our focus is on the topological and physical
    layers
  • External layer is equally important and an active
    research area

9
Thesis 1: a formalism for distributed XML data and
queries

10
Distributed XML data and queries
  • What made the relational model successful?
    ◦ A logic for describing tables
    ◦ An algebra for query optimization
  • We need the equivalent for trees in a distributed
    context:
    ◦ A logic for describing distributed XML data
    ◦ An algebra for optimizing distributed XML queries

11
Desiderata for description logic
  • Seamless transition between data and services
    ◦ Important for loose data integration
  • Support for XML streams
    ◦ Streams are essential for subscription services
    ◦ They are also necessary to support recursion

12
Starting point: AXML
  • AXML: XML trees with embedded web service calls
    ◦ Seamless transition between intensional and
      extensional data
    ◦ Provides a simple mechanism for loose data
      integration
  • Core concept: XML streams
    ◦ A web service call returns a stream of elements
    ◦ Support for both push and pull semantics
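To make the AXML idea concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes a hypothetical `<call service="..."/>` element for embedded service calls and a local registry, SERVICES, standing in for remote web services. Each service returns a generator, so results are pulled one element at a time, and materialize() turns intensional data (the call) into extensional data (the returned elements). Push delivery is not shown.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator

# Hypothetical: a "service" is any function returning a stream of XML elements.
Service = Callable[[], Iterator[ET.Element]]


def get_species() -> Iterator[ET.Element]:
    """Stand-in for a remote web service; the generator yields results lazily (pull)."""
    for name in ("human", "mouse"):
        el = ET.Element("species")
        el.text = name
        yield el


# Hypothetical registry mapping service names to local stand-ins.
SERVICES: Dict[str, Service] = {"getSpecies": get_species}


def materialize(node: ET.Element, services: Dict[str, Service]) -> ET.Element:
    """Replace every <call service="..."/> under node by the elements its service returns."""
    new_children = []
    for child in list(node):
        if child.tag == "call":
            # Pull the stream element by element; results may themselves embed calls.
            for result in services[child.get("service")]():
                new_children.append(materialize(result, services))
        else:
            new_children.append(materialize(child, services))
    # Splice the materialized children back in their original order.
    for child in list(node):
        node.remove(child)
    node.extend(new_children)
    return node


doc = ET.fromstring('<db><source>UCSC</source><call service="getSpecies"/></db>')
materialize(doc, SERVICES)
print(ET.tostring(doc, encoding="unicode"))
```

A service whose results keep embedding calls to itself would make this sketch recurse without end; a real system would need to bound or schedule such evaluation.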

13
Desiderata for algebra
  • Be amenable to rewrites
  • Capture the topology of distributed computation
  • Allow seamless transition between logical and
    physical state
    ◦ Plans may need to be re-optimized in mid-flight
    ◦ It may be necessary to perform partial
      optimization
    ◦ Error recovery

14
A proposal based on AXML
  • A distributed plan is a workflow of web services,
    which is exactly an AXML tree
  • Components:
    ◦ An encoding of distributed plans in AXML
    ◦ Rewrite rules
  • A nice bonus: plans can be readily exchanged
    between nodes
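One way to read "a distributed plan is exactly an AXML tree" is sketched below. The plan encoding is hypothetical: `<eval at="peer">` means "evaluate this subplan at that peer and ship the results back", `<scan>` reads a collection, and `<select pred="...">` filters. The single rewrite rule, pushing a selection to the peer that holds its input, is a standard distributed-query optimization used here for illustration; it is not claimed to be one of the authors' rules.

```python
import xml.etree.ElementTree as ET


def push_selection(plan: ET.Element) -> ET.Element:
    """Rewrite select(eval@p(sub)) into eval@p(select(sub)), bottom-up.

    Filtering at peer p ships only matching elements instead of the whole input.
    """
    # Rewrite subplans first so nested selections are also pushed down.
    for i, child in enumerate(list(plan)):
        plan[i] = push_selection(child)
    if plan.tag == "select" and len(plan) == 1 and plan[0].tag == "eval":
        remote = plan[0]
        select = ET.Element("select", dict(plan.attrib))
        select.extend(list(remote))  # the remote subplan becomes the selection's input
        pushed = ET.Element("eval", dict(remote.attrib))
        pushed.append(select)
        return pushed
    return plan


plan = ET.fromstring(
    '<select pred="species=human">'
    '<eval at="peer-b"><scan collection="genes"/></eval>'
    "</select>"
)
rewritten = push_selection(plan)
# The plan is plain XML, so it can be shipped to another peer as text.
wire = ET.tostring(rewritten, encoding="unicode")
print(wire)
```

Because the rewritten plan is ordinary XML, serializing it is all it takes to hand it to another node, which is the bonus the slide points out.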

15
Disclaimer
  • AXML is a starting point, not a panacea
  • Bottom line: we need formalisms for distributed
    XML queries

16
Thesis 2: autonomic administration

17
Autonomic administration
  • Users are not database experts
    ◦ Typically, scientists with computer experience
  • Users are averse to too many knobs
  • No central authority is responsible for
    administration
  • Autonomic administration is a necessity, not a
    gadget

18
Facets of autonomy
  • Self-monitoring
  • Self-tuning
  • Self-healing

19
Some issues
  • System integration
  • Distribution
  • On-line tuning
  • Pro-active tuning

20
Distributed vs. local tuning
  • Distributed tuning
    ◦ Based on the global workload
    ◦ Catalog organization, replication
  • Local tuning
    ◦ Based on the local workload
    ◦ Physical design tuning
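The distributed side of this split can be illustrated with a small sketch. It assumes, hypothetically, that each peer reports per-document access counts, that a single global threshold marks a document as hot, and that a hot document gets one replica at the non-owner peer that reads it most. None of these policies comes from the slides.

```python
from collections import Counter
from typing import Dict


def global_workload(workloads: Dict[str, Counter]) -> Counter:
    """Sum the per-peer access counts into one global workload."""
    total: Counter = Counter()
    for counts in workloads.values():
        total.update(counts)
    return total


def plan_replicas(
    workloads: Dict[str, Counter], owner: Dict[str, str], threshold: int
) -> Dict[str, str]:
    """Map each globally hot document to the non-owner peer that reads it most."""
    decisions: Dict[str, str] = {}
    for doc, count in global_workload(workloads).items():
        if count < threshold:
            continue  # not hot enough to justify a replica
        readers = [
            (counts[doc], peer)
            for peer, counts in workloads.items()
            if peer != owner[doc] and counts[doc] > 0
        ]
        if readers:
            decisions[doc] = max(readers)[1]  # ties broken by peer name
    return decisions


workloads = {
    "peer-a": Counter({"genes": 40, "maps": 1}),
    "peer-b": Counter({"genes": 5}),
    "peer-c": Counter({"maps": 2}),
}
owner = {"genes": "peer-b", "maps": "peer-c"}
print(plan_replicas(workloads, owner, threshold=10))  # {'genes': 'peer-a'}
```

The decision needs counts from every peer, which is what makes it distributed tuning; a per-peer decision such as building an index needs only that peer's own counts.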

21
Data activation for files
  • A large portion of the data is expected to be in
    files
  • We need to develop query processors for data
    residing in files
  • File activation: optimize access to the file
    based on the local workload
    ◦ E.g., instantiate an index on file contents or
      materialize a relational view
  • Local tuning is essential in this context
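File activation could look roughly like the sketch below: a query processor over a CSV file that scans the file until a column has been queried often enough, then builds an in-memory index on that column. The CSV format, the equality-lookup interface, and the threshold parameter are hypothetical, and the sketch ignores index invalidation when the file changes.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterator, List


class ActivatedFile:
    """Equality lookups over a CSV file; a column gets an index once it becomes hot."""

    def __init__(self, path: str, threshold: int = 3) -> None:
        self.path = path
        self.threshold = threshold  # hypothetical: lookups before a column is indexed
        self.lookups: Dict[str, int] = defaultdict(int)  # local workload per column
        self.indexes: Dict[str, Dict[str, List[dict]]] = {}

    def _rows(self) -> Iterator[dict]:
        with open(self.path, newline="") as f:
            yield from csv.DictReader(f)

    def _build_index(self, column: str) -> None:
        index: Dict[str, List[dict]] = defaultdict(list)
        for row in self._rows():
            index[row[column]].append(row)
        self.indexes[column] = dict(index)

    def lookup(self, column: str, value: str) -> List[dict]:
        self.lookups[column] += 1
        if column not in self.indexes and self.lookups[column] >= self.threshold:
            self._build_index(column)  # activate: the column is now hot
        if column in self.indexes:
            return list(self.indexes[column].get(value, []))
        # Cold column: answer with a full scan of the file.
        return [row for row in self._rows() if row[column] == value]
```

The per-column lookup counts are the only signal used, so each peer can make this decision on its own without consulting the rest of the ring.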