AutoMed: Automatic generation of Mediator tools for heterogeneous database integration


1
AutoMed: Automatic generation of Mediator tools
for heterogeneous database integration
  • Alex Poulovassilis (Birkbeck College)
  • Joint project with Peter McBrien (Imperial
    College)
  • EPSRC Grants GR/N38107, GR/N35915

2
[Figure: one Integrated Schema and three source Schemas]
3
Background
  • In earlier work (ER97, IS98, DKE98) we
    developed a new framework to support
    transformation and integration of heterogeneous
    database schemas.
  • Our framework consisted of:
  • a new notion of schema equivalence
  • a set of primitive schema transformations which
    can be composed to define unconditional or
    conditional equivalences between schemas

4
Background
  • We represent the modelling constructs of
    higher-level data models (e.g. relational,
    object-oriented, semi-structured, XML) in terms
    of a hypergraph data model (HDM)
  • The HDM common data model provides a unifying
    semantics for such higher-level modelling
    constructs

5
Background
  • Our schema transformations allow constructs from
    different modelling languages to be mixed within
    the same intermediate schema (CAiSE99)
  • Our schema transformations are automatically
    reversible, setting up a two-way transformation
    pathway between pairs of schemas

8
  • addClass Series {p | (p,S) ∈ category}
  • addClass Doc {p | (p,D) ∈ category}
  • addClass Film {p | (p,F) ∈ category}
  • addClass Prog {p | (p,c) ∈ category}

9
  • addSubClass Film Prog
  • addSubClass Doc Prog
  • addSubClass Series Prog
  • addClass Series {p | (p,S) ∈ category}
  • addClass Doc {p | (p,D) ∈ category}
  • addClass Film {p | (p,F) ∈ category}
  • addClass Prog {p | (p,c) ∈ category}

10
  • addSubClass Film Prog
  • addSubClass Doc Prog
  • addSubClass Series Prog
  • addClass Series {p | (p,S) ∈ category}
  • addClass Doc {p | (p,D) ∈ category}
  • addClass Film {p | (p,F) ∈ category}
  • addClass Prog {p | (p,c) ∈ category}
  • delRel category {(p,F) | p ∈ Film} ∪
    {(p,D) | p ∈ Doc} ∪
    {(p,S) | p ∈ Series}

11
  • delSubClass Film Prog
  • delSubClass Doc Prog
  • delSubClass Series Prog
  • delClass Series {p | (p,S) ∈ category}
  • delClass Doc {p | (p,D) ∈ category}
  • delClass Film {p | (p,F) ∈ category}
  • delClass Prog {p | (p,c) ∈ category}
  • addRel category {(p,F) | p ∈ Film} ∪
    {(p,D) | p ∈ Doc} ∪
    {(p,S) | p ∈ Series}
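
The relationship between the forward pathway and the reverse pathway shown above can be sketched in a few lines of Python. This is only an illustration, not the AutoMed implementation: the Step record, the INVERSE table and the reverse function are hypothetical names, queries are kept as opaque strings, and the sketch assumes the reverse pathway applies the inverted steps in the opposite order.

```python
from dataclasses import dataclass

# Assumption: every primitive step has an inverse with the same construct and query.
INVERSE = {"add": "del", "del": "add"}


@dataclass(frozen=True)
class Step:
    op: str          # "add" or "del"
    kind: str        # e.g. "Class", "SubClass", "Rel"
    construct: str   # name(s) of the construct the step applies to
    query: str = ""  # defining query, kept as opaque text in this sketch


def reverse(pathway):
    """Return the reverse pathway: inverted steps, in the opposite order."""
    return [Step(INVERSE[s.op], s.kind, s.construct, s.query)
            for s in reversed(pathway)]


# A shortened version of the example pathway (Film only, for brevity).
forward = [
    Step("add", "Class", "Prog", "{p | (p,c) in category}"),
    Step("add", "Class", "Film", "{p | (p,F) in category}"),
    Step("add", "SubClass", "Film Prog"),
    Step("del", "Rel", "category", "{(p,F) | p in Film}"),
]

backward = reverse(forward)
for step in backward:
    print(step.op + step.kind, step.construct, step.query)
```

Reversing twice gives back the original pathway, which is what makes the pathway two-way.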

12
  • addConstraint subset Film Prog
  • addConstraint subset Doc Prog
  • addConstraint subset Series Prog
  • addNode Series {p | (p,S) ∈ category}
  • addNode Doc {p | (p,D) ∈ category}
  • addNode Film {p | (p,F) ∈ category}
  • addNode Prog {p | (p,c) ∈ category}
  • delEdge category {(p,F) | p ∈ Film} ∪
    {(p,D) | p ∈ Doc} ∪
    {(p,S) | p ∈ Series}
  • delNode Programme Prog
  • delNode Category {F,D,S}

13
  • delConstraint subset Film Prog
  • delConstraint subset Doc Prog
  • delConstraint subset Series Prog
  • delNode Series {p | (p,S) ∈ category}
  • delNode Doc {p | (p,D) ∈ category}
  • delNode Film {p | (p,F) ∈ category}
  • delNode Prog {p | (p,c) ∈ category}
  • addEdge category {(p,F) | p ∈ Film} ∪
    {(p,D) | p ∈ Doc} ∪
    {(p,S) | p ∈ Series}
  • addNode Programme Prog
  • addNode Category {F,D,S}

14
Query and Data Translation
  • These pathways can thus be used to automatically
    translate data and queries between schemas
    (ER99)
  • From a pathway T: S → S' we:
  • compose the queries in the add steps to derive a
    definition of each construct in S' as a view over
    S, and
  • compose the queries in the del steps to derive a
    definition of each construct in S as a view over
    S'

15
Query and Data Translation
  • Thus:
  • Prog = {p | (p,c) ∈ category}
  • Film = {p | (p,F) ∈ category}
  • Doc = {p | (p,D) ∈ category}
  • Series = {p | (p,S) ∈ category}
  • category = {(p,F) | p ∈ Film} ∪ {(p,D) | p ∈ Doc} ∪
    {(p,S) | p ∈ Series}
  • These view definitions can then be used to
    automatically translate data and queries between
    S and S'
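
As a rough illustration of how these view definitions translate data in both directions, the Python set comprehensions below mirror them one for one. The sample programme titles are invented for the example, and the round trip only holds under the assumption that every category value is F, D or S.

```python
# Hypothetical sample extent of the category relationship.
category = {("Alien", "F"), ("Planet Earth", "D"), ("Friends", "S")}

# Views derived from the add steps: classes defined over category.
prog = {p for (p, c) in category}
film = {p for (p, c) in category if c == "F"}
doc = {p for (p, c) in category if c == "D"}
series = {p for (p, c) in category if c == "S"}

# View derived from the del step: category rebuilt from the classes.
rebuilt = (
    {(p, "F") for p in film}
    | {(p, "D") for p in doc}
    | {(p, "S") for p in series}
)

print(sorted(prog))
print(rebuilt == category)
```

If category contained a value other than F, D or S, Prog would still include that programme but the rebuilt relationship would lose it, so the equivalence would become conditional on that constraint.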

16
Both-As-View integration
  • Our schema transformation pathways capture at
    least the information available from
    global-as-view (GAV) or local-as-view (LAV)
    view definitions
  • We discuss this in a forthcoming paper (ICDE03)
    and term our integration approach both-as-view
    (BAV)
  • In particular, we discuss how:
  • GAV and LAV view definitions can be derived from
    a BAV specification
  • a BAV specification can be partially derived from
    a set of GAV or LAV view definitions

17
Schema Evolution
  • Unlike GAV and LAV, our framework readily
    supports the evolution of both local and global
    schemas (CAiSE02, ICDE03)
  • The first step is to define the evolution of the
    global or local schema as a schema transformation
    pathway from the old to the new schema
  • There is then a systematic way of evolving, as
    opposed to re-generating, the transformation
    pathways and perhaps the global schema in the
    case of a local schema evolution

18
Schema Evolution
  • In particular (see CAiSE02 and ICDE03 for
    details):
  • if the evolved schema is semantically equivalent
    to the original schema, then the transformation
    network can be repaired automatically
  • if the evolved schema is a contraction of the
    original schema, the transformation network can
    again be repaired automatically
  • if the evolved schema is an extension of the
    original schema, then domain knowledge may be
    required (but again the network is evolved rather
    than regenerated)

19
The AutoMed Project
  • The aims of the AutoMed project are to
    investigate:
  • how our theoretical framework can be practically
    applied to real data integration problems
  • how much of a mediator's global query processing
    functionality can be automatically generated from
    our transformation pathways
  • evolutionary and heuristic techniques for schema
    improvement and global query optimisation

20
AutoMed Architecture
Components shown in the architecture diagram:
  • Schema and Transformation Repository
  • Schema Transformation and Integration Tool
  • Global Query Processor
  • Global Query Optimiser
  • Model Definitions Repository
  • Model Definition Tool
  • Schema Evolution Tool

21
Query Processing and Optimisation
  • We are handling query language heterogeneity by
    translation into/from a functional intermediate
    query language, IQL (Edgar Jasper)
    (BNCOD02 poster, BNCOD02 summer school paper)
  • A query Q expressed in a high-level query
    language on a global schema S is first translated
    into IQL
  • GAV view definitions are derived from the
    transformation pathways from the local schemas to
    S, and are used to reformulate the query into an
    IQL query over the local schema constructs
  • A LAV query processing approach would also be
    possible
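
The GAV reformulation step can be pictured as view unfolding: each global construct in the query is replaced by its definition over local constructs until only local names remain. The sketch below is a toy illustration in Python, not IQL and not AutoMed code. The tuple-based expression form, the unfold and evaluate functions, and the source names db1.category and db2.category are all assumptions made for the example.

```python
# Toy expressions (hypothetical, not IQL):
#   ("name", n)           a global or local construct
#   ("union", a, b)       set union of two expressions
#   ("select", a, value)  {p | (p, value) in a}


def unfold(expr, views):
    """Replace global names by their view definitions, recursively."""
    tag = expr[0]
    if tag == "name":
        name = expr[1]
        return unfold(views[name], views) if name in views else expr
    if tag == "union":
        return ("union", unfold(expr[1], views), unfold(expr[2], views))
    if tag == "select":
        return ("select", unfold(expr[1], views), expr[2])
    raise ValueError(f"unknown expression tag: {tag}")


def evaluate(expr, sources):
    """Evaluate an expression that mentions local constructs only."""
    tag = expr[0]
    if tag == "name":
        return sources[expr[1]]
    if tag == "union":
        return evaluate(expr[1], sources) | evaluate(expr[2], sources)
    if tag == "select":
        return {p for (p, c) in evaluate(expr[1], sources) if c == expr[2]}
    raise ValueError(f"unknown expression tag: {tag}")


# GAV-style views: global constructs defined over two hypothetical sources.
views = {
    "category": ("union", ("name", "db1.category"), ("name", "db2.category")),
    "Film": ("select", ("name", "category"), "F"),
}
sources = {
    "db1.category": {("Alien", "F"), ("Friends", "S")},
    "db2.category": {("Heat", "F")},
}

local_query = unfold(("name", "Film"), views)
print(local_query)
print(sorted(evaluate(local_query, sources)))
```

In a real mediator the unfolded query would be optimised and split into sub-queries for the source wrappers rather than evaluated in one place.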

22
Query Processing and Optimisation
  • Query optimisation and query evaluation then
    occur
  • Specific issues for query optimisation in AutoMed
    include:
  • optimising the view definitions derived from the
    transformation pathways, and
  • handling heterogeneous modelling constructs
    appearing within these view definitions
  • For query evaluation, wrappers will undertake
    translation of IQL sub-queries into the local
    query language, and translation of results back
    into the IQL type system. Further
    post-processing is possible.

23
XML Data Sources
  • As well as integration of structured data
    sources, we have done some preliminary work on
    translating and integrating XML data (CAiSE01)
  • We have defined a representation of XML in terms
    of the nodes, edges and constraints of the HDM
  • We capture the ordering of XML elements by an
    order node and a hyperedge to it from the edge
    representing the parent-child relationship

24
Translating XML into HDM
    <customer name="Jones">
      <account number="A14"/>
      <account number="B37"/>
    </customer>
    <customer name="Smith">
      <account number="C514"/>
      <account number="D438"/>
    </customer>

[Figure: the HDM graph for this XML, with nodes labelled root, customer, name, account, number and order]
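
One way to picture the translation on this slide is the Python sketch below, which walks the customer/account document and records element nodes, parent-child edges, and an order entry attached to each parent-child edge. It is a simplified illustration rather than the AutoMed encoding: the to_graph function and its four result collections are hypothetical, and the enclosing root element is an assumption added so that the fragment parses as a single document.

```python
import xml.etree.ElementTree as ET

# The slide's XML, wrapped in an assumed <root> element.
XML = """<root>
  <customer name="Jones"><account number="A14"/><account number="B37"/></customer>
  <customer name="Smith"><account number="C514"/><account number="D438"/></customer>
</root>"""


def to_graph(xml_text):
    nodes = {}        # instance id -> element tag
    child_edges = []  # (parent id, child id)
    order = []        # ((parent id, child id), position among siblings)
    attrs = []        # (element id, attribute name, attribute value)
    counter = 0

    def visit(elem):
        nonlocal counter
        my_id = counter
        counter += 1
        nodes[my_id] = elem.tag
        for key, value in elem.attrib.items():
            attrs.append((my_id, key, value))
        for position, child in enumerate(elem, start=1):
            edge = (my_id, visit(child))
            child_edges.append(edge)
            # The ordering hangs off the parent-child edge, not off the child node.
            order.append((edge, position))
        return my_id

    visit(ET.fromstring(xml_text))
    return nodes, child_edges, order, attrs


nodes, child_edges, order, attrs = to_graph(XML)
print(nodes)
print(order)
```

Attaching the position to the edge rather than to the child node follows the slide's description of a hyperedge from the parent-child edge to an order node.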
25
XML Data Sources
  • We have also defined a set of primitive
    transformations on XML (in terms of the
    underlying transformations on the equivalent HDM
    representation)
  • XML documents are then translated into a simple
    ER representation, which allows them to be
    integrated with each other and with other
    structured data sources
  • The above work has been implemented by Tanvir
    Faqueer
  • He is now looking at automatic or semi-automatic
    transformation and integration of the ER models
    arising from XML documents

26
Unstructured Text Sources
  • We are also working on extracting structure from
    unstructured text sources (Dean Williams)
  • The aim here is to integrate information
    extracted from unstructured text with structured
    or semi-structured information available from
    other sources
  • We are using existing IE technology (the GATE
    tool from Sheffield) for text annotation.
    Natural language and domain ontologies will be
    used to extend these annotations
  • The extracted information will be matched with
    existing information in order to derive new facts
    and perhaps new global schema constructs

27
Materialised integration
  • Finally, as well as virtual integration of data
    sources, we are also investigating using the
    AutoMed framework for materialised integration,
    i.e. a data warehousing approach
  • In particular, we are looking at incremental view
    maintenance and data lineage tracing using the
    AutoMed schema transformation pathways (Hao Fan)