George Papastefanatos1, Panos Vassiliadis2, - PowerPoint PPT Presentation

1 / 39
About This Presentation
Title:

George Papastefanatos1, Panos Vassiliadis2,

Description:

Reverse engineering of real-world ETL processes, extracted from an application ... (b) the number of activities, that have automatically been adapted by our ... – PowerPoint PPT presentation

Number of Views:45
Avg rating:3.0/5.0
Slides: 40
Provided by: non887
Category:

less

Transcript and Presenter's Notes

Title: George Papastefanatos1, Panos Vassiliadis2,


1
What-if Analysis for Data Warehouse Evolution
  • George Papastefanatos1, Panos Vassiliadis2,
  • Alkis Simitsis3 ,Yannis Vassiliou1
  • (1) National Technical University of Athens
  • gpapas,yv_at_dbnet.ece.ntua.gr
  • (2) University of Ioannina
  • pvassil_at_cs.uoi.gr
  • (3) IBM Almaden Research Center
  • asimits_at_us.ibm.com

2
Outline
  • Motivation
  • Graph-based modeling of ETL
  • Adapting ETL workflows
  • Evaluation
  • Conclusions

3
Outline
  • Motivation
  • Graph-based modeling of ETL
  • Adapting ETL workflows
  • Evaluation
  • Conclusions

4
Data Warehouse environment
WWW


5
ETL

WWW


6
Motivation
WWW


7
Evolving ETL sources
  • Schema Changes on the sources of ETL processes.
    Design constructs are
  • Added, Removed, Modified
  • ETL processes affected
  • Syntactically i.e., become invalid
  • Semantically i.e., query must conform to the
    new source database semantics
  • Adaptation of activities, queries and views
  • time-consuming task,treated in most of the cases
    manually by the administrators/developers

8
We would like to know
  • What part of the process is affected and how if
    e.g., an attribute is deleted?
  • Can we predict the impact of changes?
  • To what extent can readjustment be automated?
  • Can we perform what-if analysis for potential
    changes of source configurations?

9
Contribution
  • A general mechanism for performing what-if
    analysis for potential changes of ETL source
    configurations
  • A graph model for relations, queries, views, ETL
    activities, and their significant properties
  • A framework for annotating the graph with
    policies concerning the behavior of nodes in the
    presence of hypothetical changes.
  • A set of rules that dictate the proper actions,
    when additions, deletions or updates are
    performed to relations, attributes, and
    conditions
  • An experimental assessment of our proposal

10
Outline
  • Motivation
  • Graph-based modeling of ETL
  • Adapting ETL workflows
  • Evaluation
  • Conclusions

11
Query representation
12
Query representation
13
Outline
  • Motivation
  • Graph-based modeling of ETL
  • Adapting ETL workflows
  • Evaluation
  • Conclusions

14
Annotating the graph with adaptation rules
According to prevailing policy, the proper action
is taken ? graph transformation
15
Annotating the graph with adaptation rules
  • Assuming that a graph construct is annotated with
    a policy for a particular event (e.g., an
    activity node is tuned to deny deletions of its
    provider attributes)
  • (a) it performs the identification of the
    affected subgraph
  • (b) if the policy is appropriate, it automates
    the readjustment of the graph to fit the new
    semantics imposed by the change.

16
Query Adaptation - Example
Q SELECT EMP.Emp, EMP.Name FROM EMP
Q SELECT EMP.Emp, EMP.Name, EMP.Phone
FROM EMP
17
Algorithm Propagate changeS (PS)
18
Conflict resolution
  • Graph constructs may have contradictory policies
    for the same event

Rule
Policies defined on query graph structures are
stronger than policies defined on view graph
structures which in turn prevail on policies
defined on relation graph structures
19
Outline
  • Motivation
  • Graph-based modeling of ETL
  • Adapting ETL workflows
  • Evaluation
  • Conclusions

20
Evaluation of our framework
  • Reverse engineering of real-world ETL processes,
    extracted from an application of the Greek public
    sector
  • Monitored the changes that took place to the
    sources of the studied data warehouse
  • Performed what-if analysis for several evolution
    scenarios

21
Configuration of our setting
  • 7 ETL processes

7 source tables
53 ETL activities 3 lookup tables
9 target tables
22
Configuration of our setting
  • evolution events
  • renaming source tables,
  • renaming attributes of source tables,
  • adding and deleting attributes from source
    tables,
  • modifying the domain of attributes
  • changing the primary key of lookup tables

23
Measurements
  • For each event, we counted
  • (a) the number of activities affected both
    semantically and syntactically,
  • (b) the number of activities, that have
    automatically been adapted by our framework
    (propagate or block policies) as opposed to those
  • (c) that required administrators intervention
    (i.e., a prompt policy)

24
Adapted activities w.r.t. the ETL scenario size
25
Adapted activities w.r.t. the complexity of
activities.
26
Outline
  • Motivation
  • Graph-based modeling of ETL
  • Adapting ETL workflows
  • Evaluation
  • Conclusions

27
Summary
  • A uniform representation for modeling relations,
    queries, views and ETL activities
  • A framework for annotating the graph with
    policies concerning the behavior of nodes in the
    presence of hypothetical changes
  • A set of rules that dictate the proper adaptation
    actions, when evolution events are performed to
    ETL sources
  • An experimental assessment of our proposal

28
On-going/Future Work
  • Hecataeus A tool for visualizing and performing
    what-if analysis for several evolution scenarios.
  • SQL extensions for annotating graph constructs
    with evolution semantics
  • Patterns of evolution sequences

29
Danke schön!Questions?
http//www.cs.uoi.gr/pvassil/projects/architectur
e_graph/
30
(No Transcript)
31
Back up Slides
32
Related work
  • DB schema Evolution
  • Schema Versioning
  • DW schema Evolution
  • Materialized View Evolution
  • Evolution wrt Model Mappings

33
Scenario 1
34
Evolution Metadata
  • Metadata Repository maintaining
  • Graph constructs
  • Annotations
  • What if analysis scenarios

35
Extending SQL With Evolution Semantics
  • Used for annotating graph constructs

ON TO THEN
E.g.
SELECT Emp, NAME, AGE FROM V ON condition
addition TO V THEN propagate, ON attribute
deletion TO V.AGE THEN block
36
Hecataeus
  • A tool for visualizing and performing what-if
    analysis for several evolution scenarios

37
Adapted activities per Event
38
Evolution changes occurred on source and lookup
tables
39
Annotation of graph constructs with policies for
kinds of events
Write a Comment
User Comments (0)
About PowerShow.com