Handling Evolutions in Multidimensional Structures - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Handling Evolutions in Multidimensional Structures


1
Handling Evolutions in Multidimensional
Structures
  • Presented by Jay Kothari

M. Body, M. Miquel, Y. Bédard, and A.
Tchounikine, Handling Evolutions in
Multidimensional Structures, ICDE, 2003.
2
Overview
  • Multidimensional systems require gathering data
    from heterogeneous sources throughout time
  • Data is integrated in Multidimensional structures
    organized around several dimensions.
  • Structures are likely to vary over time

3
Problem for DW Designer
  • Do you keep the trace of the evolutions?
  • Do you map all data in a given version of the
    structure?

4
Tradeoffs for Problems
  • Do you keep the trace of the evolutions?
  • Limits the capability of comparison for analysts
  • Do you map all data in a given version of the
    structure?
  • Entails alteration and (potential) loss of data

5
Goal
  • Track history
  • Compare data mapped into static structures

6
Introduction: DW
  • DW: a single-site repository of information
    collected from multiple sources, where the
    information is organized around major subjects
  • DM requirements:
  • Must be easy for the end user to understand
  • Must maximize the efficiency of queries

7
DW Models
  • Multidimensional, or hypercube
  • Represents measurable facts and the dimensions
    that characterize the facts
  • Retail example:
  • Facts: price, amount of a purchase
  • Dimensions: product, location, time, customer
  • Location: City, State, Country
  • Star schema: models data as a simple cube
  • Hierarchical relationship is not explicit, but
    encapsulated
  • Snowflake schema: normalizes dimension tables
  • Represents hierarchies by identifying dimensions
    at various granularities
  • Galaxy: a fact constellation that allows for a
    collection of stars when multiple fact tables are
    needed
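
The retail example above can be illustrated with a minimal Python sketch of a star schema. The rows, city names, and the `roll_up` helper are hypothetical and chosen only for illustration; the point is that the City, State, Country hierarchy is implicit in the columns of one denormalized dimension table rather than stored as explicit links.

```python
from collections import defaultdict

# Hypothetical star-schema data: one denormalized location dimension table
# (the City -> State -> Country hierarchy is implicit in its columns)
# and one fact table that references it by location id.
location_dim = {
    1: {"city": "Philadelphia", "state": "PA", "country": "USA"},
    2: {"city": "Pittsburgh", "state": "PA", "country": "USA"},
    3: {"city": "Newark", "state": "NJ", "country": "USA"},
}

sales_facts = [
    {"location_id": 1, "amount": 100},
    {"location_id": 2, "amount": 50},
    {"location_id": 3, "amount": 70},
]


def roll_up(facts, dim, level):
    """Sum the amount measure at the requested level of the location hierarchy."""
    totals = defaultdict(int)
    for fact in facts:
        totals[dim[fact["location_id"]][level]] += fact["amount"]
    return dict(totals)


by_state = roll_up(sales_facts, location_dim, "state")
by_country = roll_up(sales_facts, location_dim, "country")
```

A snowflake variant would split `location_dim` into separate city, state, and country tables linked by keys, making the hierarchy explicit at the cost of extra joins.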

8
Multi-tier OLAP Architecture
  • Tier 1: Warehouse Server
  • Tier 2: Data Mart
  • Tier 3: OLAP Server
  • Tier 4: OLAP Client

9
OLAP Architecture: Warehouse
  • Tier 1: Warehouse Server
  • Tier 2: Data Mart
  • Tier 3: OLAP Server
  • Tier 4: OLAP Client

13
Considering Time
  • Multidimensional models consider that:
  • Facts are dynamic
  • Dimensions are static
  • Time is recorded as a Time dimension
  • Often, changes occur in the analysis structure

14
Considering Evolution Mapping
  • OLAP systems map data into the most recent
    analysis structure
  • Allows for temporal comparison
  • Loses data

15
3 Slowly Changing Dimensions
  • Three possible ways of handling changes in a
    structure
  • Update data structure
  • Keep all versions of members
  • All evolutions are kept inside members

16
Problems with SCDs
  • Three possible ways of handling changes in a
    structure:
  • Update data structure
  • Doesn't track history
  • Keep all versions of members
  • Comparisons across transitions cannot be made,
    since links between versions are not kept, even
    though the evolutions themselves are
  • All evolutions are kept inside members
  • Overlapping between versions may occur and cannot
    be handled
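
The three strategies and their limits can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the member attributes, the division names, and the three helper functions are all hypothetical.

```python
import copy

# Hypothetical dimension member: a department and the division it belongs to.
original = {"dept": "Dpt.Jones", "division": "Sales"}


def update_in_place(member, new_division):
    """Strategy 1: overwrite the attribute; the previous value is lost."""
    updated = copy.deepcopy(member)
    updated["division"] = new_division
    return updated


def add_version(versions, new_division, valid_from):
    """Strategy 2: append a new version row; old rows stay but are not linked."""
    latest = versions[-1]
    closed = dict(latest, valid_to=valid_from)
    new = dict(latest, division=new_division, valid_from=valid_from, valid_to=None)
    return versions[:-1] + [closed, new]


def keep_inside_member(member, new_division):
    """Strategy 3: keep the previous value in an extra attribute of the member."""
    updated = copy.deepcopy(member)
    updated["previous_division"] = member["division"]
    updated["division"] = new_division
    return updated


s1 = update_in_place(original, "R&D")
s2 = add_version([dict(original, valid_from=2001, valid_to=None)], "R&D", 2002)
s3 = keep_inside_member(original, "R&D")
```

In `s1` the history is gone, in `s2` both versions exist but nothing records that one became the other, and in `s3` only a fixed number of past values fit inside the member.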

17
Motivational Case Study
  • Studying Institution Restructuring
  • Fact table
  • Amount measure
  • Hierarchical Time dimension: year
  • Hierarchical Org. dimension: div. -> dept.

18
Dimensions and Data
19
Query 1: Total Amount by Year/Div.
20
Q2: Split Jones's Dept.
21
New Proposed Models
  • Mendelzon and Vaisman
  • End-user needs
  • Temporal multidimensional model based on
    timestamps
  • Define TOLAP, which allows the end user to choose
    the aggregation
  • Allows the end user to choose b/w temporally
    consistent or last version
  • Track links b/w transitions
  • Only allows for last version reporting
  • Eder and Koncilia
  • Mapping functions to allow conversions b/w
    structure versions
  • Store links across transitions (old and new
    member)
  • Does not consider schema evolutions nor time
    consistent presentation

22
This Approach
  • Handle Evolutions in multidimensional structures
  • Solutions to exploit knowledge on evolutions in
    order to map data in a given representation

23
Changes Occurring on a Multidimensional Schema
  • Creation and deletion of a dimension.
  • Creation and deletion of a hierarchy.
  • Creation and deletion of a level.
  • Move of a level in the hierarchical schema
    structure.
  • Creation and deletion of a measure.
24
Evolutions on Instances of Dimensions: 6 Operators
  • Creation of a dimension member.
  • Deletion of a dimension member.
  • Transformation of a member (change of an
    attribute, its name or meaning...).
  • Merging of n members into one member.
  • Splitting of one member into n members.
  • Reclassification of a member in the dimension
    structure.

25
Deducing Hierarchical Models
  • Deduced from dimension instances
  • No longer necessary to define schema structure
  • Current MD models do not support complex
    hierarchical structures; by not imposing a
    structure, the model gains genericity
  • Considers non-onto, non-covering, and multiple
    hierarchies
  • Every evolution on the schema has an impact on its
    instances
  • Accounts for evolutions on both schema and
    instances

26
Temporal Multidimensional Model
  • Definition 1: Member Version
  • Object or an abstract entity of special interest
  • Definition 2: Temporal Relationship
  • Establishes an explicit hierarchy link between two
    member versions and represents a roll-up function
  • Definition 3: Temporal Dimension
  • Directed graph where the nodes are the member
    versions of a set and the arcs are the Temporal
    Relationships
  • Definition 4: Levels in a Dimension
  • Set of member versions having the same level
    value if defined, or the same depth in the DAG of a
    Temporal Dimension
  • Shows how the temporal dimension evolved over
    time
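
Definitions 1 to 4 can be read as a small data structure. The Python sketch below is one possible encoding under stated assumptions: the class and field names, the year-granularity valid times, and the member names `Div.A`, `Dpt.Jones`, `Dpt.Paul`, and `Dpt.Bill` are hypothetical, loosely following the department split of the case study. Levels are deduced from depth in the graph, as Definition 4 allows when no level value is defined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemberVersion:
    name: str
    valid_from: int
    valid_to: Optional[int]  # exclusive end; None means still valid


@dataclass(frozen=True)
class TemporalRelationship:
    child: MemberVersion
    parent: MemberVersion
    valid_from: int
    valid_to: Optional[int]


def depth(member, relationships):
    """Depth of a member version: 0 for a root, else 1 + the deepest parent."""
    parents = [r.parent for r in relationships if r.child == member]
    if not parents:
        return 0
    return 1 + max(depth(p, relationships) for p in parents)


def levels(members, relationships):
    """Group member versions by depth, used as the level when none is defined."""
    result = {}
    for m in members:
        result.setdefault(depth(m, relationships), []).append(m.name)
    return result


# Hypothetical temporal dimension: one division and three department versions.
div_a = MemberVersion("Div.A", 2001, None)
jones = MemberVersion("Dpt.Jones", 2001, 2002)
paul = MemberVersion("Dpt.Paul", 2002, None)
bill = MemberVersion("Dpt.Bill", 2002, None)

members = [div_a, jones, paul, bill]
relationships = [
    TemporalRelationship(jones, div_a, 2001, 2002),
    TemporalRelationship(paul, div_a, 2002, None),
    TemporalRelationship(bill, div_a, 2002, None),
]

dimension_levels = levels(members, relationships)
```

Because the levels come from the instances, no separate schema structure has to be declared, which is the genericity argument made on the earlier slide.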

27
Temporal Multidimensional Model 2
  • Definition 5: Temporally Consistent Fact Table
  • Keeps links across transitions given by the
    temporal dimensions, time dimension, and set of
    measures
  • Definition 6: Confidence Factor
  • Reliability of data
  • Used to distinguish between source and mapped
    data
  • Definition 7: Mapping Relationship
  • Allows for dealing with the problem of keeping
    links between transitions
  • Example: gives the mapping, as percentages, for
    the previous department problem
  • Definition 8: Temporal Multidimensional Schema
  • Relates Temporal Dimension, Time Dimension,
    Mapping Relationships, and a Temporally
    Consistent Fact Table
  • Definition 9: Structure Version
  • Defines the valid time of a structure version and
    restricts it to its elements
  • Valid and unchanged structure over its given
    valid time
  • Example: what structures in the case study are
    valid during a given time frame (Paul and Bill)
    b/w 01/01 and 12/02
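
Definition 9 can be illustrated by cutting the timeline at every valid-time boundary, so that the structure is unchanged within each piece. The following Python sketch assumes year-granularity, half-open valid times and hypothetical member names and dates; the `structure_versions` helper is not from the paper.

```python
def structure_versions(members, horizon):
    """Cut the timeline at each valid-time boundary up to `horizon` and
    list the members that are valid throughout each resulting interval."""
    bounds = {horizon}
    for _, start, end in members:
        bounds.add(start)
        if end is not None:
            bounds.add(end)
    cuts = sorted(b for b in bounds if b <= horizon)
    versions = []
    for lo, hi in zip(cuts, cuts[1:]):
        valid = sorted(
            name
            for name, start, end in members
            if start <= lo and (end is None or end >= hi)
        )
        versions.append(((lo, hi), valid))
    return versions


# Hypothetical member versions as (name, valid_from, valid_to); valid_to is
# exclusive and None means the version is still valid.
members = [
    ("Div.A", 2001, None),
    ("Dpt.Jones", 2001, 2002),
    ("Dpt.Paul", 2002, None),
    ("Dpt.Bill", 2002, None),
]

versions = structure_versions(members, horizon=2003)
```

Under these assumed dates the split of the Jones department yields two structure versions, one containing `Dpt.Jones` and one containing `Dpt.Paul` and `Dpt.Bill`.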

28
Last set of Definitions
  • Definition 10: Temporal Mode of Presentation
  • Satisfies end-user needs of presentation of
    requests
  • Definition 11: Multiversion Fact Table
  • Calculated from temporal dimensions, Mapping
    Relationships, and the Temporally Consistent Fact
    Table
  • Definition 12: Data Aggregation
  • Computes the data aggregation of a cube based on
    the Multiversion Fact Table and the Temporal
    Relationships between Member Versions.
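
As a rough illustration of Definitions 11 and 12, the Python sketch below derives the rows of one presentation of a multiversion fact table from temporally consistent facts and mapping relationships, tagging each row with a confidence factor, and then aggregates by year. All amounts, the 25/75 split shares, and the function names are assumptions made for this example and are not taken from the paper.

```python
# Temporally consistent facts: each amount is recorded against the member
# version that was valid at the time. All values are hypothetical.
consistent_facts = [
    {"year": 2001, "dept": "Dpt.Jones", "amount": 100.0},
    {"year": 2002, "dept": "Dpt.Paul", "amount": 50.0},
    {"year": 2002, "dept": "Dpt.Bill", "amount": 70.0},
]

# Mapping relationships for the split of Dpt.Jones: (source, target) -> share.
# The 25/75 shares are assumed for illustration.
mappings = {("Dpt.Jones", "Dpt.Paul"): 0.25, ("Dpt.Jones", "Dpt.Bill"): 0.75}


def present_in_structure(facts, mappings, valid_members):
    """Express every fact in one structure version, mapping where needed."""
    rows = []
    for fact in facts:
        if fact["dept"] in valid_members:
            # The member exists in the target structure: keep the source data.
            rows.append({**fact, "confidence": "source"})
            continue
        for (src, dst), share in mappings.items():
            if src == fact["dept"] and dst in valid_members:
                rows.append({
                    "year": fact["year"],
                    "dept": dst,
                    "amount": fact["amount"] * share,
                    "confidence": "mapped",
                })
    return rows


def total_by_year(rows):
    """Aggregate the amount measure by year."""
    totals = {}
    for row in rows:
        totals[row["year"]] = totals.get(row["year"], 0.0) + row["amount"]
    return totals


last_version_rows = present_in_structure(
    consistent_facts, mappings, valid_members={"Dpt.Paul", "Dpt.Bill"}
)
yearly_totals = total_by_year(last_version_rows)
```

The 2001 amount is preserved in total but spread over the two new departments, and the `confidence` field lets an analyst tell those estimated rows from recorded ones.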

29
Evolution Operators
  • Insert
  • Inserts member version
  • Exclude
  • Excludes member versions from a dimension
  • Associate
  • Defines a new transition link b/w two versions of
    a member, and if consistent then it will be added
    to the set of Mapping Relationships
  • Reclassify
  • Changes the position of a member in the
    hierarchical structure by redefining its parents.
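
The four operators can be sketched as methods on a small class. This is a simplified, hypothetical model rather than the paper's formal definitions: version identifiers of the form `name@year`, the consistency rule in `associate`, and the member names are all assumptions. Reclassify is built here from the other three operators.

```python
class TemporalDimension:
    """Toy temporal dimension holding member versions and mapping links."""

    def __init__(self):
        self.versions = {}  # version id -> name, valid time, parent ids
        self.mappings = []  # (old version id, new version id)

    def insert(self, name, valid_from, parents=()):
        """Insert: add a new member version under the given parents."""
        vid = f"{name}@{valid_from}"
        self.versions[vid] = {
            "name": name,
            "valid_from": valid_from,
            "valid_to": None,
            "parents": list(parents),
        }
        return vid

    def exclude(self, vid, at):
        """Exclude: end the valid time of a member version."""
        self.versions[vid]["valid_to"] = at

    def associate(self, old_vid, new_vid):
        """Associate: record a transition link if it is consistent.

        Consistency is assumed here to mean that the old version has
        ended no later than the new version starts.
        """
        old, new = self.versions[old_vid], self.versions[new_vid]
        if old["valid_to"] is None or old["valid_to"] > new["valid_from"]:
            raise ValueError("old version must end before the new one starts")
        self.mappings.append((old_vid, new_vid))

    def reclassify(self, vid, new_parents, at):
        """Reclassify: exclude the old version, insert one with new parents,
        and associate the two."""
        name = self.versions[vid]["name"]
        self.exclude(vid, at)
        new_vid = self.insert(name, at, new_parents)
        self.associate(vid, new_vid)
        return new_vid


dim = TemporalDimension()
div_a = dim.insert("Div.A", 2001)
div_b = dim.insert("Div.B", 2001)
smith = dim.insert("Dpt.Smith", 2001, [div_a])
moved = dim.reclassify(smith, [div_b], 2002)
```

After the call, the old version of `Dpt.Smith` ends in 2002, a new version sits under `Div.B`, and the mapping link between the two keeps comparisons across the transition possible.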

30
Using Evo Operations
31
Logical Temporal Model
  • De facto, multidimensional systems are made up
    only of dimensions and fact tables
  • Temporal mode of representation
  • Appears as a dimension, with the confidence factor
    appearing as a measure

32
More on the Evo Ops
  • Difference b/w members and relationships in a
    dimension
  • First problem with current tools: hierarchical
    links are stored as keys in child attributes.
  • Solution: interpreting a change in the
    hierarchical structure
  • Problem: lots of changes, O(b^d)
  • -> So they create a new Reclassify operator, which
    inserts, excludes, and associates.

33
Their Implementation (architecture)
  • 3 Parts
  • Temporal Data Warehouse
  • Contains the Temporal Multidimensional Schema
  • Mapping Relationships
  • Multiversion Data Warehouse
  • TMP can be preceded and the Multiversion Fact Table
    can be inferred from the Temporally Consistent
    Fact Table and Mapping Relationships
  • OLAP cube
  • Built from the MVDW using aggregations; allows
    requests to integrate the temporal modes of
    presentation

34
Adding MetaData
  • Categories
  • Related to versions of members
  • Related to evolution of members
  • Example of Type 2

35
Table of Mapping relationships
  • Used to build the Multiversion Fact Table
  • User can obtain transformations

36
Conclusion
  • Focus on designers' needs
  • Handling most evolution operators
  • Allowing nontrivial hierarchical structures in
    dimensions, to match real-life cases
  • Big contributions
  • TMP
  • Changes in the analysis framework
  • Confidence factors
  • Still flawed