Integrating data sources on the World-Wide Web - PowerPoint PPT Presentation

About This Presentation
Title:

Integrating data sources on the World-Wide Web

Description:

ABC Database RIM Spec. Integration Example: XYZ Database RIM Spec. Integration Example: ... is involved in constructing a data warehouse and other operational systems. ... – PowerPoint PPT presentation

Number of Views:26
Avg rating:3.0/5.0
Slides: 25
Provided by: RamonLa4
Category:

less

Transcript and Presenter's Notes

Title: Integrating data sources on the World-Wide Web


1
Integrating data sources on the World-Wide Web
  • Ramon Lawrence and Ken Barker
  • U. of Manitoba, U. of Calgary
  • umlawren,barker_at_cs.umanitoba.ca

2
Introduction
  • Integration of data is required when accessing
    multiple databases within an organization or on
    the WWW.
  • Our focus is automatically combining database
    schema using schema integration.
  • Schema integration requires knowledge of data
    semantics and use of metadata.

3
Motivation
  • Organizations have several database systems which
    must interoperate.
  • Users often access multiple Web databases whose
    knowledge must be integrated and presented in a
    useful form.
  • Data warehouses and OLAP systems require data
    semantics to be understood and data to be
    cleansed and summarized.

4
Background
  • Schema integration involves combining diverse
    database schema into an integrated view by
    resolving conflicts.
  • Schema conflicts include naming, structural, and
    semantic conflicts.
  • Schema integration is required for database
    interoperability, but it is currently a manual
    process.

5
Previous Work
  • Research systems
  • integrating systems by logical rules (Sheth)
  • defining global dictionaries (Castano)
  • Carnot Project using the Cyc knowledge base
  • Industrial systems and standards
  • Metadata Interchange Specification (MDIS)
  • XML, BizTalk, E-commerce portals

6
Architecture Components The Global Dictionary
  • A global dictionary (GD) provides standardized
    terms to capture data semantics.
  • Hierarchy of terms related by IS-A or Has-A links
  • Contains base set of common database concepts,
    but new concepts can be added
  • A GD term is a single, unambiguous semantic
    definition.
  • Several GD entries for a single English word are
    required if the word has multiple definitions.

7
Architecture ComponentsUsing the Global
Dictionary
  • GD terms are used to build semantic names to
    describe the semantics of schema elements.
  • Semantic names have the form
  • semantic name CT CT ,CT CN
  • CT context term, CN concept name
  • each CT and CN is a single term from the GD
  • Semantic names are included in RIM specifications
    describing a data source.

8
Architecture ComponentsThe Relational
Integration Model
  • Database metadata and semantic names are combined
    into Relational Integration Model (RIM)
    Specifications (RIM Specs)
  • contains information on a relational schema
  • organized into database, table, and field levels
  • stores semantic names to describe and integrate
    schema elements

9
Architecture ComponentsIntegrating RIM Specs
  • Each database to be integrated is described using
    a RIM specification.
  • Identical concepts in different databases are
    identified by similar semantic names.
  • Concepts with identical (or hierarchially
    related) semantic names are combined regardless
    of their physical representation in the
    individual databases.

10
Integration Architecture
  • Our integration architecture consists of two
    separate phases
  • capture process RIM specs are constructed for
    each data source independently
  • integration process RIM specs are combined using
    the integration algorithm which matches semantic
    names using the global dictionary

11
Integration ArchitectureThe Capture Process
  • Capture process involves
  • automatically extracting the schema information
    and metadata using a specification editor
  • assigning semantic names to each schema element
    (tables and fields) to capture their semantics

12
Integration ArchitectureThe Capture Process
Relational Schema
Automatic Extraction
RIM Spec
Specification Editor
DBA Lookup of terms
Global Dictionary
13
Integration ArchitectureThe Integration Process
  • Integration process involves
  • automatically identifying identical concepts by
    matching semantic names
  • constructing a global view of database concepts
    consisting of a hierarchy of concept terms
  • resolving structural differences during query
    generation and submission (e.g. a concept may be
    represented as a table in one database and a
    field (attribute) in another)

14
Integration ArchitectureThe Integration Process
.
Client
Client
Integration Site
Subtransactions
RIM spec
RIM spec
..
RDBMS
RDBMS
15
Integration Architecture Benefits
  • The benefits of the two phase architecture are
  • Dynamic integration schemas integrated as needed
  • RIM Specs are constructed only once and
    independent of each other
  • Automatic conflict resolution by integrating
    based on semantic name rather than physical
    structure
  • Users are isolated from system names and
    organization by querying through a global view
    using semantic names for concepts

16
Integration Example
  • Two claims databases to be integrated
  • ABC Company Claims_tb(claim_id, claimant,
    net_amount, paid_amount)
  • XYZ Company T_claims(id, customer, claim_amt),
    T_payments(cid, pid, amount)
  • First step is to construct RIM specs for each
    database.

17
Integration ExampleABC Database RIM Spec
18
Integration ExampleXYZ Database RIM Spec
19
Integration ExampleIntegrated View
  • Global view after integration
  • Claim
  • Id
  • Net amount
  • Customer
  • name
  • Payment
  • id
  • amount

20
Integration ExampleDiscussion
  • Important points
  • system and field names are not presented to the
    user who queries based on semantic names
  • database structure is not shown to the user
  • different physical representations for the same
    concept are combined (e.g. payment (attribute) in
    ABC with payment table in XYZ database)
  • hierarchially related concepts (customer vs.
    claimant) are combined based on their IS-A
    relationship in the global dictionary

21
Applications to the WWW
  • Integrating diverse data sources is involved in
    constructing a data warehouse and other
    operational systems.
  • The WWW is a diverse organizations of databases
    which users access.
  • Automatically integrating web data sources by a
    browser or portal reduces query complexity and
    integration of results for the user.

22
Conclusions
  • Automatic integration of database schema is
    possible by using a global dictionary of terms
    and constructing semantic names for schema
    elements.
  • Integration of data sources has applications to
    the WWW and construction of data warehouses.

23
Important Changes
  • The integration architecture is constantly being
    refined. Some notable differences in this
    presentation versus the paper
  • Our integration system uses XML to represent a
    RIM spec which is renamed as a X-Spec.
  • An integration site is used as a central portal
    for integration and management.
  • No longer using semantic distance calculations
    between terms.
  • Format of semantic name has been simplified.

24
Future Work
  • The integration architecture is involving with
    standards on XML and now captures metadata
    information in XML documents.
  • The system is being tested on sample problems,
    and a query mechanism is work-in-progress.
  • We are refining a prototype of the system called
    Unity.
Write a Comment
User Comments (0)
About PowerShow.com