Title: Handling Evolutions in Multidimensional Structures
1Handling Evolutions in Multidimensional
Structures
M. Body, M. Miquel, Y. Bédard, and A.
Tchounikine, Handling Evolutions in
Multidimensional Structures, ICDE, 2003.
2Overview
- Multidimensional systems require gathering data from heterogeneous sources over time
- Data is integrated in multidimensional structures organized around several dimensions
- Structures are likely to vary over time
3Problem for DW Designer
- Do you keep the trace of the evolutions?
- Do you map all data in a given version of the
structure?
4Tradeoffs for Problems
- Do you keep the trace of the evolutions?
- Limits the capability of comparison for analysts
- Do you map all data in a given version of the structure?
- Entails alteration and (potential) loss of data
5Goal
- Track History
- Compare data mapped into static structures
6Introduction DW
- DW: single-site repository of information collected from multiple sources, where the information is organized around major subjects
- DM Requirements
- Must be easy for the end-user to understand
- Must maximize efficiency of the queries
7DW Models
- Multidimensional, or hypercube
- Represent measurable facts and the dimensions that characterize the facts
- Retail Example
- Facts: price, amount of a purchase
- Dimensions: product, location, time, customer
- Location: City, State, Country
- Star Schema: models data as a simple cube
- Hierarchical relationship is not explicit, but encapsulated
- Snowflake Schema: normalizes dimension tables
- Represent hierarchies by identifying dimensions in various granularities
- Galaxy: fact constellation, allows for the collection of stars when multiple fact tables are needed
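The retail example can be sketched in a few lines of Python. This is only an illustration: the cities, products, amounts and the names LOCATION, FACTS and roll_up are assumptions made up for the sketch, not data from the slides. It shows a fact table keyed by city and a roll-up along the City, State, Country hierarchy held in the dimension table.

```python
from collections import defaultdict

# Hypothetical location dimension table: city -> (state, country).
LOCATION = {
    "Austin": ("Texas", "USA"),
    "Dallas": ("Texas", "USA"),
    "Lyon": ("Rhone", "France"),
}

# Hypothetical fact rows: (product, city, year, amount of purchase).
FACTS = [
    ("pen", "Austin", 2002, 10.0),
    ("pen", "Dallas", 2002, 5.0),
    ("ink", "Lyon", 2002, 7.5),
    ("pen", "Austin", 2003, 2.5),
]

# Position of each level inside a LOCATION tuple (None means the key itself).
LEVELS = {"city": None, "state": 0, "country": 1}


def roll_up(facts, level):
    """Sum the amount measure per (location member, year) at the given level."""
    index = LEVELS[level]
    totals = defaultdict(float)
    for _product, city, year, amount in facts:
        member = city if index is None else LOCATION[city][index]
        totals[(member, year)] += amount
    return dict(totals)


by_state = roll_up(FACTS, "state")
by_country = roll_up(FACTS, "country")
```

Here the hierarchy lives inside the dimension table, as in a star schema; a snowflake schema would split LOCATION into separate city, state and country tables.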
8Multi-tier OLAP Architecture
- Tier 1 Warehouse Server
- Tier 2 Data Mart
- Tier 3 OLAP Server
- Tier 4 OLAP Client
13Considering Time
- Multidimensional models consider
- Facts are dynamic
- Dimensions are static
- Time is recorded as a Time-dimension
- Often, changes occur on analysis structure
14Considering Evolution Mapping
- OLAP systems map data into most recent analysis structure
- Allows for temporal comparison
- Loses data
153 Slowly Changing Dimensions
- Three possible ways of handling changes in a structure
- Update data structure
- Keep all versions of members
- All evolutions are kept inside members
16Problems with SCDs
- Three possible ways of handling changes in a structure
- Update data structure
- Doesn't track history
- Keep all versions of members
- Comparisons across transitions cannot be made, since links are not kept, even if evolutions are
- All evolutions are kept inside members
- Overlapping between versions may occur and cannot be handled
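A minimal sketch of the three strategies, assuming a department that moves from one division to another. The function names, the dictionary layout and the "division" attribute are hypothetical; the three functions loosely correspond to what is commonly called Type 1, Type 2 and Type 3 slowly changing dimensions.

```python
def scd_overwrite(dim, key, new_parent):
    """Update the data structure in place: the previous value is lost."""
    updated = dict(dim)
    updated[key] = {"division": new_parent}
    return updated


def scd_add_version(dim_rows, key, new_parent, change_year):
    """Keep all versions: close the current row and append a new one."""
    rows = []
    for row in dim_rows:
        if row["key"] == key and row["end"] is None:
            row = {**row, "end": change_year - 1}
        rows.append(row)
    rows.append(
        {"key": key, "division": new_parent, "start": change_year, "end": None}
    )
    return rows


def scd_keep_previous(dim, key, new_parent):
    """Keep the evolution inside the member: store old and new value together."""
    updated = dict(dim)
    updated[key] = {
        "division": new_parent,
        "previous_division": dim[key]["division"],
    }
    return updated


dim = {"d1": {"division": "A"}}
rows = [{"key": "d1", "division": "A", "start": 2001, "end": None}]

overwritten = scd_overwrite(dim, "d1", "B")
versioned = scd_add_version(rows, "d1", "B", 2003)
with_previous = scd_keep_previous(dim, "d1", "B")
```

Note that the versioned rows carry no link saying the 2003 row continues the 2001 row, which is the comparison problem listed above.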
17Motivational Case Study
- Studying Institution Restructuring
- Fact table
- Amount Measure
- Hierarchical Time measure year
- Hierarchical Org. dimension: div. > dept.
18Dimensions and Data
19Query 1 Total Amount by Year/Div.
20Q2 Split Jones's Dept
21New Proposed Models
- Mendelzon and Vaisman
- End-user needs
- Temporal multidimensional model based on timestamps
- Define TOLAP that allows the end user to choose aggregation
- Allows end-user to choose b/w temporally consistent or last version
- Track links b/w transitions
- Only allows for last version reporting
- Eder and Koncilia
- Mapping functions to allow conversions b/w structure versions
- Store links across transitions (old and new member)
- Does not consider schema evolutions nor time-consistent presentation
22This approach..
- Handle Evolutions in multidimensional structures
- Solutions to exploit knowledge on evolutions in
order to map data in a given representation
23Changes occurring on a multidimensional schema
- Creation and deletion of a dimension.
- Creation and deletion of a hierarchy.
- Creation and deletion of a level.
- Move of a level in the hierarchical schema structure.
- Creation and deletion of a measure.
24Evolutions on Instances of Dimensions 6 Operators
- Creation of a dimension member.
- Deletion of a dimension member.
- Transformation of a member (change of an attribute, its name or meaning...).
- Merging of n members into one member.
- Splitting of one member into n members.
- Reclassification of a member in the dimension
structure.
25Deducing Hierarchical Models
- Deduced from dimension instances
- No longer necessary to define schema structure
- Current MD models do not support complex hierarchical structures, and by not imposing a structure, the model gains genericity
- Consider non-onto, non-covering and multiple hierarchies
- Every evolution on schema has an impact on its instance
- Accounts for both evolutions on schema and instances
26Temporal Multidimensional Model
- Definition 1: Member Version
- Object or an abstract entity of special interest
- Definition 2: Temporal Relationship
- Establishes explicit hierarchy link between two member versions and represents a roll-up function
- Definition 3: Temporal Dimension
- Directed graph where nodes are member versions of a set and the arcs are the Temporal Relationships
- Definition 4: Levels in a dimension
- Set of member versions having the same level value if defined, or the depth in the DAG of a Temporal Dimension
- Shows how the temporal dimension evolved over time
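Definitions 1 to 4 can be illustrated with a small sketch. The class names, the year-based valid times and the member names are assumptions chosen for the example; the slides do not give the formal definitions. Levels are deduced here from the depth of each member version in the graph, counted from the top of the hierarchy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemberVersion:
    name: str
    start: int  # first valid year, inclusive (assumed granularity)
    end: int    # last valid year, inclusive


@dataclass(frozen=True)
class TemporalRelationship:
    child: MemberVersion
    parent: MemberVersion  # roll-up target of the child
    start: int
    end: int


def depth(member, relationships):
    """Longest path from a top-level member version down to `member`."""
    parents = [r.parent for r in relationships if r.child == member]
    if not parents:
        return 0
    return 1 + max(depth(p, relationships) for p in parents)


def levels(members, relationships):
    """Group member version names by their depth in the DAG."""
    result = {}
    for m in members:
        result.setdefault(depth(m, relationships), set()).add(m.name)
    return result


sales = MemberVersion("Sales", 2001, 2003)
jones = MemberVersion("Jones", 2001, 2002)
smith = MemberVersion("Smith", 2001, 2003)
rels = [
    TemporalRelationship(jones, sales, 2001, 2002),
    TemporalRelationship(smith, sales, 2001, 2003),
]
dimension_levels = levels([sales, jones, smith], rels)
```

Because levels are computed from the instances, no schema has to be declared up front, which matches the idea of deducing the hierarchical model from the dimension instances.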
27Temporal Multidimensional Model 2
- Definition 5: Temporally Consistent Fact Table
- Keeps links across transitions given by the temporal dimensions, time dimension, and set of measures
- Definition 6: Confidence Factor
- Reliability of data
- Used to distinguish between source and mapped data
- Definition 7: Mapping Relationship
- Allows for dealing with the problem of keeping links between transitions
- Example: gives the mapping from the previous department problem of percentages
- Definition 8: Temporal Multidimensional Schema
- Relates Temporal Dimension, Time Dimension, Mapping Relationships, and a Temporally Consistent Fact Table
- Definition 9: Structure Version
- Defines valid time of a structure version and restricts it to its elements
- Valid and unchanged structure over its given valid time
- Example: what structures in the case study are valid during a given time frame (Paul and Bill) b/w 01/01 and 12/02
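One way to picture a structure version is as a maximal time interval over which the set of valid member versions does not change. The sketch below derives such intervals from valid times; the function name structure_versions, the year granularity and the dates are assumptions for illustration and are not the case-study figures. Relationships are ignored to keep it short.

```python
def structure_versions(valid_times):
    """valid_times: dict mapping member name -> (start, end), inclusive years.

    Returns a list of ((start, end), members) pairs, one per interval in
    which the set of valid members stays unchanged.
    """
    # Every start, and every year following an end, is a possible change point.
    starts = {start for start, _ in valid_times.values()}
    after_ends = {end + 1 for _, end in valid_times.values()}
    points = sorted(starts | after_ends)

    versions = []
    for begin, nxt in zip(points, points[1:]):
        last = nxt - 1
        members = frozenset(
            name
            for name, (start, end) in valid_times.items()
            if start <= begin and last <= end
        )
        if members:
            versions.append(((begin, last), members))
    return versions


# Hypothetical valid times: Jones's department is replaced by Paul's and Bill's.
valid = {
    "Jones": (2001, 2002),
    "Smith": (2001, 2004),
    "Paul": (2003, 2004),
    "Bill": (2003, 2004),
}
versions = structure_versions(valid)
```

With these assumed dates the dimension has two structure versions, one before and one after the split.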
28Last set of Definitions
- Definition 10: Temporal Mode of Presentation
- Satisfies end-user needs of presentation of requests
- Definition 11: Multiversion Fact Table
- Calculated from Temporal Dimensions, Mapping Relationships, and the Temporally Consistent Fact Table
- Definition 12: Data Aggregation
- Computes DA of a cube based on Multiversion Fact Table and Temporal Relationships between Member Versions.
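The sketch below shows, under simplifying assumptions, how facts recorded against an old member could be presented in a later structure version using mapping relationships, and then aggregated. The weights (0.25 and 0.75), the two confidence labels, the member and division names, and the function names are all hypothetical; they stand in for the percentages and confidence factors mentioned in the definitions.

```python
SOURCE, MAPPED = "source", "mapped"  # simplified confidence factors (assumed)


def merge_conf(a, b):
    """A value is only 'source' if everything contributing to it is source data."""
    return MAPPED if MAPPED in (a, b) else SOURCE


def multiversion_rows(facts, version_members, mappings):
    """Present facts in one structure version.

    facts: list of (member, year, amount)
    mappings: dict old_member -> list of (new_member, weight)
    Returns {(member, year): (amount, confidence)}.
    """
    rows = {}
    for member, year, amount in facts:
        if member in version_members:
            targets = [(member, 1.0, SOURCE)]
        else:
            targets = [
                (to, weight, MAPPED)
                for to, weight in mappings.get(member, [])
                if to in version_members
            ]
        for to, weight, conf in targets:
            old_amount, old_conf = rows.get((to, year), (0.0, SOURCE))
            rows[(to, year)] = (old_amount + amount * weight,
                                merge_conf(old_conf, conf))
    return rows


def aggregate(rows, parent_of):
    """Roll member-level rows up to their parents, propagating confidence."""
    totals = {}
    for (member, year), (amount, conf) in rows.items():
        key = (parent_of[member], year)
        old_amount, old_conf = totals.get(key, (0.0, SOURCE))
        totals[key] = (old_amount + amount, merge_conf(old_conf, conf))
    return totals


facts = [
    ("Jones", 2002, 100.0),
    ("Smith", 2002, 50.0),
    ("Paul", 2003, 30.0),
    ("Bill", 2003, 80.0),
    ("Smith", 2003, 60.0),
]
mappings = {"Jones": [("Paul", 0.25), ("Bill", 0.75)]}
rows = multiversion_rows(facts, {"Paul", "Bill", "Smith"}, mappings)
totals = aggregate(rows, {"Paul": "Div A", "Bill": "Div A", "Smith": "Div B"})
```

The confidence label travels with each value, so an analyst can tell that the 2002 figures for Paul and Bill were derived by mapping rather than observed.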
29Evolution Operators
- Insert
- Inserts member version
- Exclude
- Excludes member versions from a dimension
- Associate
- Defines a new transition link b/w two versions of a member, and if consistent then it will be added to the set of Mapping Relationships
- Reclassify
- Changes the position of a member in the
hierarchical structure by redefining its parents.
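The four operators can be sketched on a toy dimension. The class TemporalDimension, its field names and the consistency check (both versions must exist) are assumptions made for this illustration; the slides do not specify the operator signatures. Reclassify is written here as a composition of Exclude, Insert and Associate.

```python
class TemporalDimension:
    """Toy dimension holding member versions and mapping relationships."""

    def __init__(self):
        self.versions = {}  # version id -> {"name", "parent", "start", "end"}
        self.mappings = []  # (from_id, to_id, weight)

    def insert(self, vid, name, parent, start):
        """Insert a member version, valid from `start` with no end yet."""
        self.versions[vid] = {
            "name": name, "parent": parent, "start": start, "end": None,
        }

    def exclude(self, vid, end):
        """Exclude a member version by closing its valid time at `end`."""
        self.versions[vid]["end"] = end

    def associate(self, from_id, to_id, weight=1.0):
        """Record a transition link if it is consistent (assumed check)."""
        if from_id not in self.versions or to_id not in self.versions:
            raise KeyError("both member versions must exist")
        self.mappings.append((from_id, to_id, weight))

    def reclassify(self, vid, new_vid, new_parent, at):
        """Move a member under a new parent starting at time `at`."""
        old = self.versions[vid]
        self.exclude(vid, at - 1)
        self.insert(new_vid, old["name"], new_parent, at)
        self.associate(vid, new_vid, 1.0)


dim = TemporalDimension()
dim.insert("divA", "Div A", None, 2001)
dim.insert("divB", "Div B", None, 2001)
dim.insert("d1", "Dept 1", "divA", 2001)
dim.reclassify("d1", "d1_v2", "divB", 2003)
```

After the call, the old version of the department is closed, a new version sits under the other division, and a mapping relationship links the two so facts can be followed across the transition.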
30Using Evo Operations
31Logical Temporal Model
- De facto multidimensional systems are only made up of dimensions and fact tables
- Temporal mode of representation
- Similar to a dimension and a confidence factor appearing as a measure
32More on the Evo Ops
- Difference b/w member and relationships in a dimension
- First problem with current tools: hierarchical links are stored as keys in child attributes.
- Solution: interpreting change in hierarchical structure
- Problem: lots of changes, O(bd)
- -> so they create a new Reclassify operator, which inserts, excludes and associates.
33Their Implementation (architecture)
- 3 Parts
- Temporal Data Warehouse
- Contains the Temporal Multidimensional Schema
- Mapping Relationships
- Multiversion Data Warehouse
- TMP can be preceeded and Multiversion fact table can be inferred from the Temporally Consistent fact table and Mapping Relationships
- OLAP cube
- Built from the MVDW using aggregations and allows requests to integrate the temporal modes of presentation
34Adding MetaData
- Categories
- Related to versions of members
- Related to evolution of members
- Example of Type 2
35Table of Mapping relationships
- Used to build the MultiVer Fact Table
- User can obtain transformations
36Conclusion
- Focus on designer's needs
- Handling most evo ops
- Allowing nontrivial hierarchical structures in dimensions to match real-life cases
- Big contribution
- TMP
- Changes in analysis framework
- Confidence factors
- Still flawed