Title: The Data Ring: Community Content Sharing
1. The Data Ring: Community Content Sharing
- Serge Abiteboul (INRIA)
- Alkis Polyzotis (UC Santa Cruz)
2. Data Sharing Communities
Data sharing community: a group of users that share and query information within some domain
- Examples: UCSC genome browser, SwissProt, Flickr
- Interesting data management problem
  - Shared information is heterogeneous
  - Data is distributed and dynamic
  - Lack of central administration
  - Users are not database savvy
3. The Data Ring
- P2P middleware system that provides
  - Monitoring
  - Querying
  - and other database-like services over the distributed information
- Main goal: simplicity of use
4. Data abstraction in the data ring
- Topological layer
- Physical layer
- External layer
5. Data abstraction in the data ring
Topological Layer
- Declarative query services
- Data and query model based on XML
6. Data abstraction in the data ring
Physical Layer
- Basic service is distributed query evaluation
- Comprises the overlay network (DHT), physical
access structures (indices, replicas, views), and
the catalog.
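
To illustrate how a catalog can sit on top of a DHT overlay, here is a minimal sketch of consistent hashing, a generic technique and not necessarily the scheme the Data Ring uses. The `CatalogRing` class, the peer names, and the resource name are hypothetical.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a peer name or resource name to a point on the hash ring."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class CatalogRing:
    """Hypothetical catalog: each resource is owned by the next peer clockwise."""

    def __init__(self, peers):
        self.ring = sorted((ring_position(peer), peer) for peer in peers)
        self.positions = [position for position, _ in self.ring]

    def peer_for(self, resource: str) -> str:
        # First peer whose position is strictly greater than the resource's,
        # wrapping around to the start of the ring when none is found.
        index = bisect.bisect_right(self.positions, ring_position(resource))
        return self.ring[index % len(self.ring)][1]


peers = ["peer-a", "peer-b", "peer-c"]  # assumed peer identifiers
ring = CatalogRing(peers)
owner = ring.peer_for("genes.xml")      # peer that holds the catalog entry
```

Any peer can compute `peer_for` locally, so locating a catalog entry needs no central administrator.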
7. Data abstraction in the data ring
External Layer
- Provides semantically richer data models
8. Data abstraction in the data ring
- Our focus is on the topological and physical layers
- External layer is equally important and an active research area
Topological Layer
Physical Layer
9. Thesis 1: formalism for distributed XML data and queries
10. Distributed XML data and queries
- What made the relational model successful?
  - A logic for describing tables
  - An algebra for query optimization
- We need the equivalent for trees in a distributed context
  - A logic for describing distributed XML data
  - An algebra for optimizing distributed XML queries
11. Desiderata for description logic
- Seamless transition between data and services
  - Important for loose data integration
- Support for XML streams
  - Streams are essential for subscription services
  - They are also necessary to support recursion
12. Starting point: AXML
- AXML: XML tree with embedded web service calls
  - Seamless transition between intensional and extensional data
  - Provides a simple mechanism for loose data integration
- Core concept: XML streams
  - A web service call returns a stream of elements
  - Support for both push and pull semantics
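
The idea of an XML tree with embedded service calls can be sketched in a few lines of Python. This is an illustration only, not the actual AXML syntax or runtime: the `sc` element, its `service` attribute, the `SERVICES` registry, and the `materialize` helper are all hypothetical. The embedded call stands for intensional data; invoking it pulls a stream of elements that take its place in the tree as extensional data.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator


def get_genes() -> Iterator[ET.Element]:
    """Hypothetical local stand-in for a remote web service; yields a stream of elements."""
    for name in ("BRCA1", "TP53"):
        gene = ET.Element("gene")
        gene.text = name
        yield gene


# Assumed registry mapping service names to callables.
SERVICES: Dict[str, Callable[[], Iterator[ET.Element]]] = {"getGenes": get_genes}


def materialize(node: ET.Element) -> None:
    """Replace every embedded <sc> call under node with the elements it streams."""
    # Walk children right to left so insertions do not shift pending indices.
    for index, child in reversed(list(enumerate(node))):
        if child.tag == "sc":
            results = list(SERVICES[child.get("service")]())  # pull semantics
            node.remove(child)
            for offset, result in enumerate(results):
                node.insert(index + offset, result)
        else:
            materialize(child)


doc = ET.fromstring('<catalog><title>Genes</title><sc service="getGenes"/></catalog>')
materialize(doc)
print(ET.tostring(doc, encoding="unicode"))
```

Only pull semantics are shown here; a push variant would have the service invoke a callback on the document each time a new element becomes available.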
13. Desiderata for algebra
- Be amenable to rewrites
- Capture the topology of distributed computation
- Allow seamless transition between logical and physical state
  - Plans may need to be re-optimized in mid-flight
  - It may be necessary to perform partial optimization
  - Error recovery
14. A proposal based on AXML
- A distributed plan is a workflow of web services, which is exactly an AXML tree
- Components
  - An encoding of distributed plans in AXML
  - Rewrite rules
- A nice bonus: plans can be readily exchanged between nodes
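
As a rough illustration of a plan encoded as a tree plus one rewrite rule, consider the sketch below. The operator names (`select`, `send`, `scan`), their attributes, and the rule itself are assumptions made for this example, not the encoding or algebra proposed by the authors. The rule turns select(send(x)) into send(select(x)), so the filter runs at the source peer before data is shipped.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: <select pred=...> filters its input,
# <send from=... to=...> ships its input between peers,
# <scan doc=...> reads a document stored at a peer.
PLAN = (
    '<select pred="year &gt; 2000">'
    '<send from="p2" to="p1"><scan doc="papers.xml"/></send>'
    '</select>'
)


def push_select_below_send(plan: ET.Element) -> ET.Element:
    """Rewrite select(send(x)) into send(select(x)); return other plans unchanged."""
    if plan.tag == "select" and len(plan) == 1 and plan[0].tag == "send":
        send = plan[0]
        new_select = ET.Element("select", dict(plan.attrib))
        new_select.extend(list(send))          # the select now wraps the send's input
        new_send = ET.Element("send", dict(send.attrib))
        new_send.append(new_select)
        return new_send
    return plan


rewritten = push_select_below_send(ET.fromstring(PLAN))
# The plan is itself an XML document, so it can be serialized and shipped to another node.
print(ET.tostring(rewritten, encoding="unicode"))
```

Because the plan and the rewritten plan are both plain XML trees, the same serialization that carries data between nodes can carry plans.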
- AXML is a starting point, not a panacea
- Bottom line: we need formalisms for distributed XML queries
16. Thesis 2: autonomic administration
17. Autonomic administration
- Users are not database experts
  - Typically, scientists with computer experience
  - Users are averse to too many knobs
- No central authority that is responsible for administration
- Autonomic administration is a necessity -- not a
18. Facets of autonomy
- Self-monitoring
- Self-tuning
- Self-healing
19. Some issues
- System integration
- Distribution
- On-line tuning
- Pro-active tuning
20. Distributed vs. local tuning
- Distributed tuning
  - Based on the global workload
  - Catalog organization, replication
- Local tuning
  - Based on local workload
  - Physical design tuning
21. Data activation for files
- A large portion of the data is expected to be in files
- We need to develop query processors for data residing in files
- File activation: optimize access to the file based on the local workload
  - E.g., instantiate an index on file contents or materialize a relational view
- Local tuning is essential in this context
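
File activation driven by the local workload might look like the following sketch. It assumes a CSV file and an in-memory index; the `ActivatedFile` class, the threshold of two lookups, and the sample data are hypothetical and chosen only for illustration.

```python
import csv
import os
import tempfile
from collections import Counter, defaultdict


class ActivatedFile:
    """Hypothetical wrapper: scans a CSV file and indexes a column once it is queried often."""

    def __init__(self, path: str, threshold: int = 2):
        self.path = path
        self.threshold = threshold   # arbitrary illustrative value
        self.lookups = Counter()     # local workload: lookups seen per column
        self.indexes = {}            # column -> {value -> [rows]}

    def _rows(self):
        with open(self.path, newline="") as handle:
            yield from csv.DictReader(handle)

    def lookup(self, column: str, value: str) -> list:
        if column in self.indexes:
            return self.indexes[column].get(value, [])
        self.lookups[column] += 1
        if self.lookups[column] >= self.threshold:
            # The column is popular in the local workload: instantiate an index.
            index = defaultdict(list)
            for row in self._rows():
                index[row[column]].append(row)
            self.indexes[column] = dict(index)
            return self.indexes[column].get(value, [])
        # Not yet worth indexing: answer with a full scan of the file.
        return [row for row in self._rows() if row[column] == value]


with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("gene,chromosome\nBRCA1,17\nTP53,17\nCFTR,7\n")

data = ActivatedFile(tmp.name)
first = data.lookup("chromosome", "17")   # answered by a full scan
second = data.lookup("chromosome", "7")   # second lookup triggers index creation
os.remove(tmp.name)
third = data.lookup("chromosome", "17")   # answered from the index alone
```

The sketch never refreshes the index when the file changes; a real system would also have to decide when to rebuild or drop it.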