Rethinking Data Integration - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Rethinking Data Integration


1
Rethinking Data Integration
  • Zachary G. Ives
  • University of Pennsylvania
  • WebDB
  • July 16, 2005

2
Have We Missed the Boat Again? (With Apologies to
Jennifer Widom)
  • In the 90s, our goals were to bring semantic
    querying to the Web
  • Today data integration is finally a reality as
    EII
  • Exciting to see it commercialized by IBM, BEA, ...
  • But it has mostly reverted to our community's
    traditional focus: business databases and
    queries
  • Why aren't there any large-scale integration
    successes?
  • Yes, partly a matter of
  • But it's also a limitation of our top-down,
    schema-first mentality
  • Developing a global schema works well in business
    data
  • It doesn't scale to diverse and evolving domains!
  • Also people like to control their data!
  • Availability, guaranteed performance, ability to
    change things

3
Catching the Ever-Elusive Boat: Bottom-up
Approaches to Data Sharing
  • The WWW succeeded by avoiding consistency and
    standardization
  • Data sharing confederations without first
    standardizing on a global schema
  • A Web of loosely coupled, autonomous,
    cooperating systems
  • Map among many small schemas instead of one
    large one
  • Different schemas represent different interests
  • Schemas as building blocks, not endpoints
  • Can most easily map to semantically related
    schemas
  • A first step: peer-to-peer data management and
    mediation (Piazza, Hyperion, P2P data exchange,
    ...)
  • Pick any schema, query from it
  • Use transitive closure of all mappings to all
    relevant data
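
The "transitive closure of all mappings" idea can be illustrated with a small sketch. It treats peers as nodes and schema mappings as directed edges, then walks the graph to find every schema whose data could be reached from the chosen starting schema. The peer names and the `reachable_schemas` helper are hypothetical, and the sketch only covers reachability; a real peer data management system would also have to reformulate the query along each mapping path.

```python
from collections import deque


def reachable_schemas(mappings, start):
    """Return {schema: mapping path from start} for every schema reachable
    from `start` by following directed schema mappings (breadth-first)."""
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in mappings.get(current, ()):
            if neighbor not in paths:  # first visit gives a shortest path
                paths[neighbor] = paths[current] + [neighbor]
                queue.append(neighbor)
    return paths


# Hypothetical peers: each key has a mapping to the schemas listed for it.
mappings = {
    "lab_a": ["lab_b"],
    "lab_b": ["consortium"],
    "consortium": ["lab_c"],
    "lab_d": ["lab_a"],
}

paths = reachable_schemas(mappings, "lab_a")
for schema, path in paths.items():
    print(schema, "via", " -> ".join(path))
```

No global schema appears anywhere in this sketch: a query posed over `lab_a` reaches `lab_c` only by composing three pairwise mappings.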

4
Challenge 1: Building Data-Centric Webs for
Sharing
  • People want data control and query anonymity!
  • Focus on data interchange, not just integrated
    querying
  • Schema building blocks, analogous to component
    frameworks
  • Composable merge operators (Bernstein, Pottinger,
    ...)
  • Auto-build mediated schemas as 'containers for
    sharing'
  • Build confederations via collaborative filtering?
    (Doan)
  • Collaborative data sharing: multi-way exchange
    of data among many autonomous collaborators
    (Orchestra; see CIDR 05)
  • Updates at different sites, different schemas:
    need to reconcile the changes
  • Tolerate, not resolve, the inevitable
    conflicts, based on provenance
  • Looser consistency models
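
One way to picture "tolerate, not resolve" is the sketch below. It is not the Orchestra algorithm; it is a minimal illustration under an assumed policy in which each site ranks the sources it trusts, accepts a value when its most-trusted sources agree, and otherwise keeps all competing values along with their provenance. The `reconcile` function, the site names, and the trust scores are all hypothetical.

```python
from collections import defaultdict


def reconcile(updates, trust):
    """Reconcile published updates from one site's point of view.

    updates: list of (key, value, source) tuples.
    trust:   dict mapping source -> priority (higher = more trusted);
             sources missing from `trust` are ignored (assumed policy).
    Returns (accepted, deferred): accepted maps key -> value, deferred maps
    key -> list of (value, source) pairs that remain in conflict.
    """
    by_key = defaultdict(list)
    for key, value, source in updates:
        by_key[key].append((value, source))

    accepted, deferred = {}, {}
    for key, candidates in by_key.items():
        trusted = [(v, s) for v, s in candidates if s in trust]
        if not trusted:
            continue
        best = max(trust[s] for _, s in trusted)
        top_values = {v for v, s in trusted if trust[s] == best}
        if len(top_values) == 1:
            accepted[key] = top_values.pop()
        else:
            # Keep the conflict, with provenance, instead of forcing a winner.
            deferred[key] = sorted(trusted)
    return accepted, deferred


updates = [
    ("gene42.function", "kinase", "site_a"),
    ("gene42.function", "phosphatase", "site_b"),
    ("gene7.location", "chr1", "site_b"),
    ("gene9.name", "abc", "site_a"),
    ("gene9.name", "abd", "site_c"),
]
trust = {"site_a": 2, "site_b": 1, "site_c": 2}

accepted, deferred = reconcile(updates, trust)
print(accepted)
print(deferred)
```

Because `trust` is local to each site, two sites can run the same procedure over the same updates and end up with different accepted values, which is the looser consistency model the slide refers to.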

5
Orchestra (CIDR 05)
  • Different bioinformatics institutes, research
    groups store their data in separate warehouses
    with related, overlapping data
  • Each source is independently updated, curated
    locally
  • Updates are published periodically in some
    standard schema
  • Each site wants to import these changes, maintain
    a copy of all data
  • Individual scientists also import the data and
    changes, and would like to share their derived
    results
  • Caveat: not all sites agree on the facts! Not
    probabilistic: often, no consensus on the
    right answer!

6
Challenge 2: Seeing How the Data Connects
Together
  • New visualization methods for metadata and
    mappings
  • Most schema matching is semi-automated; we need
    user input
  • Clio is a nice first step, but doesn't scale to
    large schemas
  • Need better schema/mapping visualization
    techniques than tree viewers; maybe 3D?
  • Open a dialog with the CHI community?