Title: Load Shedding in XML Streams
Adding Intelligence to the Optimization in Data Integration System
Di Wang; Advisor: Prof. Murali Mani
Database Systems Research Group (DSRG), Department of Computer Science
Our Approach: Divide the Join Operation
Data Integration System
Intuitively, many join algorithms naturally have two phases: partition and probe.
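To make the two phases concrete, here is a minimal Python sketch of a partitioned (Grace-style) hash join. The function names `partition`, `probe`, and `partitioned_hash_join`, the dict-based row format, and the bucket count are illustrative assumptions, not the poster's implementation.

```python
from collections import defaultdict

def partition(rows, key, num_buckets):
    """Partition phase: scatter rows into buckets by hashing the join key."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(row[key]) % num_buckets].append(row)
    return buckets

def probe(left_buckets, right_buckets, left_key, right_key):
    """Probe phase: join each pair of same-numbered buckets in memory."""
    results = []
    for b, left_rows in left_buckets.items():
        # Build a hash table on one left bucket.
        table = defaultdict(list)
        for row in left_rows:
            table[row[left_key]].append(row)
        # Probe it with the right bucket that has the same number.
        for r in right_buckets.get(b, []):
            for l in table.get(r[right_key], []):
                results.append({**l, **r})
    return results

def partitioned_hash_join(left, right, left_key, right_key, num_buckets=4):
    left_buckets = partition(left, left_key, num_buckets)
    right_buckets = partition(right, right_key, num_buckets)
    return probe(left_buckets, right_buckets, left_key, right_key)
```

Because both inputs are hashed with the same function and bucket count, matching keys always fall into buckets with the same number, so the probe phase only ever has to hold one bucket's hash table in memory.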
Typical Data Integration System Architecture
- Application areas of data integration:
- Enterprise information integration (EII)
- Data sources on the web
- Scientific data sharing
[Figure: layers are Application (Query/Browser); Mediator with Metadata/Catalog; four Wrappers; sources: Relational Database, Complex Object Repository, Document System, Bio-Info Database]
- Many systems integrate heterogeneous databases, e.g.:
- Information Manifold
- TSIMMIS
- Garlic
- Tukwila
Rough Idea: The wrappers perform the partition phase, and the mediator performs the probe phase.
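The rough idea of splitting a join between wrappers and the mediator can be sketched as follows. This is a minimal sketch assuming hypothetical `Wrapper` and `Mediator` classes and in-memory dict rows: each wrapper only scans and hash-partitions its source (a basic operation), while the mediator builds and probes one bucket pair at a time.

```python
from collections import defaultdict

class Wrapper:
    """Hypothetical wrapper: scans its source and hash-partitions rows.
    It needs no probing capability, only scanning and hashing."""

    def __init__(self, rows):
        self.rows = rows

    def partition(self, key, num_buckets):
        # Built-in hash() is consistent within one process; wrappers on
        # separate machines would need an agreed-upon hash function.
        buckets = [[] for _ in range(num_buckets)]
        for row in self.rows:
            buckets[hash(row[key]) % num_buckets].append(row)
        return buckets

class Mediator:
    """Hypothetical mediator: performs only the probe phase."""

    def join(self, left, right, left_key, right_key, num_buckets=4):
        left_parts = left.partition(left_key, num_buckets)
        right_parts = right.partition(right_key, num_buckets)
        results = []
        for b in range(num_buckets):
            # Build an in-memory hash table for one left bucket only.
            table = defaultdict(list)
            for row in left_parts[b]:
                table[row[left_key]].append(row)
            # Probe it with the right bucket that has the same number.
            for r in right_parts[b]:
                for l in table.get(r[right_key], []):
                    results.append({**l, **r})
        return results
```

In this split, a source that cannot probe still contributes useful work (partitioning), and the mediator's per-bucket memory footprint stays small even when it is already heavily loaded.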
- Target scenarios:
- Some non-database sources cannot perform probing, only basic operations.
- The mediator/wrapper is already heavily loaded, yet it is required to perform the join.
As part of work on data integration, our focus is on system performance and optimization.
Motivation: Two classes of optimization, based on the assumptions made about the integration system:
Tightly federated data sources, smart mediator
- Assumes the optimizer can obtain accurate information about the sources, with support from the wrappers.
- The optimizer selects the best query plan based on a series of cost formulas and computations.
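As an illustration of cost-based plan selection, the sketch below enumerates left-deep join orders and picks the one with the lowest estimated cost. The cost formula (sum of intermediate result sizes under one uniform join selectivity), the cardinalities, and the function names are hypothetical; real optimizers use richer statistics and usually dynamic programming rather than exhaustive enumeration.

```python
from itertools import permutations

def plan_cost(order, card, selectivity):
    """Hypothetical cost formula for a left-deep plan: the sum of the sizes of
    all join results produced, assuming one uniform join selectivity."""
    size = card[order[0]]
    cost = 0.0
    for rel in order[1:]:
        size = size * card[rel] * selectivity  # estimated join output size
        cost += size
    return cost

def best_plan(card, selectivity):
    """Exhaustively try every join order and return the cheapest one."""
    return min(permutations(card), key=lambda order: plan_cost(order, card, selectivity))
```

With `card = {"A": 1000, "B": 10, "C": 100}` and `selectivity = 0.01`, joining the two small relations B and C first yields the lowest estimated cost. The choice is only as good as the cardinalities, which is why this class depends on wrappers supplying accurate statistics.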
Network-bound data sources, thin mediator
- Assumes that statistics about the data sources are unavailable and that data arrival is unpredictable.
- A range of adaptive techniques are used to optimize the query plan during execution.
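One well-known adaptive technique for unpredictable data arrival is the symmetric (pipelined) hash join; Tukwila's double pipelined join is a variant of it. The minimal sketch below, with a hypothetical `SymmetricHashJoin` class and dict rows, emits join results as soon as matching tuples have arrived from either side, so neither input has to be complete before output starts.

```python
from collections import defaultdict

class SymmetricHashJoin:
    """Pipelined join that keeps a hash table per input. Each arriving tuple is
    stored in its own side's table and immediately probed against the other's."""

    def __init__(self, left_key, right_key):
        self.keys = {"L": left_key, "R": right_key}
        self.tables = {"L": defaultdict(list), "R": defaultdict(list)}

    def insert(self, side, row):
        """Add a row arriving on side "L" or "R"; return any new join results."""
        other = "R" if side == "L" else "L"
        k = row[self.keys[side]]
        self.tables[side][k].append(row)
        results = []
        for match in self.tables[other].get(k, []):
            left, right = (row, match) if side == "L" else (match, row)
            results.append({**left, **right})
        return results
```

Unlike the partition/probe join, this operator never blocks waiting for one input to finish; the trade-off is that both hash tables must stay in memory.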
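Drawbacks: the thin-mediator approach could produce a blind or inefficient initial plan; the smart-mediator approach relies heavily on wrappers and incurs the "cost of costing" (the expense of computing the cost model itself).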
Question: How can we trade off the cost of cost-model computation against the inefficiency of a blind initial plan?