Torben's talk - PowerPoint PPT Presentation

DAT5 Presentation, September 6th, 2000.



1
Torben's talk
  • Topics: Data Warehousing, On-Line Analytical
    Processing (OLAP), pre-aggregation/pre-computation
    for better performance
  • Note how color and animation are used

2
Extending Practical Pre-Aggregation for On-Line
Analytical Processing
  • T. B. Pedersen (1, 2), C. S. Jensen (2), and C. E.
    Dyreson (2)
  • (1) Center for Health Information Services,
    Kommunedata, www.kmd.dk
  • (2) Nykredit Center for Database Research,
    Department of Computer Science, Aalborg University,
    www.cs.auc.dk/NDB

3
Motivation
  • On-Line Analytical Processing (OLAP) systems are
    popular.
  • Based on a dimensional view of data.
  • Require fast query response times even for large
    amounts of data.
  • Pre-aggregation is used to speed up queries.
  • Full pre-aggregation (all combinations of
    aggregate levels are materialized) is infeasible:
      • It takes up too much space (200 times the size of
        the raw data).
      • It takes too long to update when data changes.
  • Practical (or partial) pre-aggregation techniques:
      • Store only selected combinations of aggregates.
      • Re-use these to compute higher-level results.
      • Provide a good trade-off between storage
        space/update time and response time.
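The following Python sketch shows this kind of re-use. It assumes a well-behaved (strict, covering, onto) hierarchy, and every address, city, county, and patient in it is hypothetical. Patient counts are stored at the City level, and County counts are then derived from those stored counts without touching the raw facts.

```python
from collections import Counter

# Hypothetical, well-behaved hierarchy: every address has exactly one city
# and every city has exactly one county (strict, covering, onto).
ADDRESS_TO_CITY = {"addr1": "cityA", "addr2": "cityA", "addr3": "cityB"}
CITY_TO_COUNTY = {"cityA": "countyX", "cityB": "countyY"}

# Hypothetical raw facts: one address per patient.
PATIENT_ADDRESS = {"p1": "addr1", "p2": "addr2", "p3": "addr3", "p4": "addr3"}


def roll_up(counts, parent_of):
    """Derive a higher-level aggregate from a stored lower-level one."""
    result = Counter()
    for value, n in counts.items():
        result[parent_of[value]] += n
    return result


# Materialized (pre-aggregated) patient counts at the City level.
city_counts = Counter(ADDRESS_TO_CITY[a] for a in PATIENT_ADDRESS.values())

# County counts are computed from the stored City counts, not the raw data.
county_counts = roll_up(city_counts, CITY_TO_COUNTY)
print(dict(county_counts))  # {'countyX': 2, 'countyY': 2}
```

This shortcut works only because every value has exactly one parent at the next level. The following slides deal with hierarchies where that assumption fails.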

4
Motivation
  • Practical pre-aggregation requires well-behaved
    hierarchies and fact-dimension relationships.
  • Hierarchies must be:
      • strict
      • covering
      • onto
  • Fact-dimension relationships must:
      • be many-to-one
      • have uniform granularity.
  • Often, these requirements are not satisfied in
    practice.
  • We present techniques and algorithms for applying
    practical pre-aggregation to general,
    non-summarizable hierarchies and fact-dimension
    relationships.

5
Patient Case Study
  • Patients are the facts in multidimensional terms.
  • Each patient has zero or more diagnoses
    (many-to-many).
  • The diagnoses may be registered at any level
    (Low-level, Family, or Group), i.e., the
    granularity is varying.
  • A diagnosis may have no children (non-onto) or
    several parents (non-strict).
  • Each patient has one address, in a city or a
    rural area (non-covering). Cities are located in
    counties.

6
Talk Overview
  • Motivation
  • Data model context
  • Hierarchy properties
  • Hierarchy transformations
  • Integration in current systems
  • Conclusion and future work

7
Data Model - Schema
  • Fact type: Patient
  • Dimension types: Diagnosis and Residence
  • There are no measures; all data are dimensions.
  • Category types: Low-level Diagnosis, Diagnosis
    Family, Diagnosis Group, Address, City, County
  • Top category types: correspond to ALL of the
    dimension.
  • Bottom category types: the lowest level in each
    dimension.
  • The category types of a dimension type form a
    lattice.

Figure: the Diagnosis dimension type (TDiagnosis, Diagnosis Group, Diagnosis Family, LL Diagnosis) and the Residence dimension type (TResidence, County, City, Address), both connected to the fact type Patient.
8
Data Model - Instances
  • Categories: instances of category types; they
    consist of dimension values.
  • Top categories contain only one value, T.
  • Dimensions: categories + a partial order on
    category values; the hierarchy may be non-strict.
  • Facts: instances of the fact type, with separate
    identity.
  • Fact-dimension relations: link facts to
    dimensions; a fact may map to several values of any
    granularity in each dimension.
  • Multidimensional object (MO): schema + dimensions +
    facts + fact-dimension relations.

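Figure: instances of the Diagnosis dimension (T; groups Cancer, PregRel, Diabetes; families Lung Canc., Ins. Dep. Diab., Diab. Preg.; low-level diagnosis Ins. Dep. Diab. Preg.) and the Residence dimension (T; counties Sydney, Outback; city Sydney; addresses 1 Central, 2 Rural), with the patients John, Jim, and Jane.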
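One possible Python encoding of these notions is sketched below under stated assumptions. A dimension value is a `(category, name)` pair, a dimension stores direct child-to-parent links, and a fact-dimension relation maps each fact to a set of values of any granularity. `Dimension`, `ancestors`, and the patient ids are hypothetical names. The diagnosis links are only loosely modelled on the figure.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """Categorized dimension values plus direct child -> parent links."""
    categories: list                             # category types, bottom-up
    parents: dict = field(default_factory=dict)  # (category, name) -> set of parents

    def ancestors(self, value):
        """All values above `value` in the partial order (transitive closure)."""
        seen, stack = set(), list(self.parents.get(value, ()))
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.add(p)
                stack.extend(self.parents.get(p, ()))
        return seen


T = ("T", "T")
diagnosis = Dimension(
    categories=["LL Diagnosis", "Diagnosis Family", "Diagnosis Group", "T"],
    parents={
        # Assumed links: the low-level value has two parents (non-strict).
        ("LL Diagnosis", "Ins. Dep. Diab. Preg."): {
            ("Diagnosis Family", "Diab. Preg."),
            ("Diagnosis Family", "Ins. Dep. Diab."),
        },
        ("Diagnosis Family", "Diab. Preg."): {("Diagnosis Group", "PregRel")},
        ("Diagnosis Family", "Ins. Dep. Diab."): {("Diagnosis Group", "Diabetes")},
        ("Diagnosis Family", "Lung Canc."): {("Diagnosis Group", "Cancer")},
        ("Diagnosis Group", "PregRel"): {T},
        ("Diagnosis Group", "Diabetes"): {T},
        ("Diagnosis Group", "Cancer"): {T},
    },
)

# Hypothetical fact-dimension relation: many-to-many, varying granularity.
has_diagnosis = {
    "p1": {("LL Diagnosis", "Ins. Dep. Diab. Preg.")},
    "p2": {("Diagnosis Family", "Lung Canc."),
           ("Diagnosis Family", "Ins. Dep. Diab.")},
    "p3": set(),  # a patient may have zero diagnoses
}
```

In this encoding the low-level diagnosis reaches `T` along two paths. That is exactly the situation in which naive re-use of stored counts counts the same patient twice.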
9
Hierarchy Properties
  • Summarizability requires that the dimension
    hierarchies are covering, onto, and strict.
  • Non-covering hierarchies occur when links between
    dimension values skip levels.
  • Non-strict hierarchies occur when one lower-level
    item has several parents.
  • Non-onto (into) hierarchies occur when the height
    of the hierarchy varies.
  • These properties cause problems when re-using
    stored counts of patients to compute new values.

Figure: the dimension instances annotated with stored patient counts; some higher-level counts cannot be derived correctly from the lower-level ones (marked "?", e.g., "2?"), and one diagnosis family is marked "No Low-level Diagnosis".
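The three properties can be checked mechanically. The sketch below assumes that each dimension's category types form a single chain, listed bottom-up, and that values are `(category, name)` pairs. The example links are assumptions read off the figures: 2 Rural is linked straight to Outback, Lung Canc. has no low-level child, and Ins. Dep. Diab. Preg. has two parents. The short names IDDP, DP, IDD, and LC stand for Ins. Dep. Diab. Preg., Diab. Preg., Ins. Dep. Diab., and Lung Canc.

```python
def is_covering(levels, parents):
    """No link skips a level: each parent is in the category directly above."""
    rank = {c: i for i, c in enumerate(levels)}
    return all(rank[p[0]] == rank[v[0]] + 1
               for v, ps in parents.items() for p in ps)


def is_onto(levels, parents):
    """Every value above the bottom category has at least one child."""
    has_child = {p for ps in parents.values() for p in ps}
    values = set(parents) | has_child
    return all(v in has_child for v in values if v[0] != levels[0])


def is_strict(parents):
    """No value has more than one parent."""
    return all(len(ps) <= 1 for ps in parents.values())


T = ("T", "T")
RES_LEVELS = ["Address", "City", "County", "T"]
residence = {
    ("Address", "1 Central"): {("City", "Sydney")},
    ("Address", "2 Rural"): {("County", "Outback")},   # skips City
    ("City", "Sydney"): {("County", "Sydney")},
    ("County", "Sydney"): {T},
    ("County", "Outback"): {T},
}

DIAG_LEVELS = ["LL", "Family", "Group", "T"]
diagnosis = {
    ("LL", "IDDP"): {("Family", "DP"), ("Family", "IDD")},  # two parents
    ("Family", "DP"): {("Group", "PregRel")},
    ("Family", "IDD"): {("Group", "Diabetes")},
    ("Family", "LC"): {("Group", "Cancer")},                # no LL child
    ("Group", "PregRel"): {T},
    ("Group", "Diabetes"): {T},
    ("Group", "Cancer"): {T},
}

for name, levels, dim in [("Residence", RES_LEVELS, residence),
                          ("Diagnosis", DIAG_LEVELS, diagnosis)]:
    print(name, is_covering(levels, dim), is_onto(levels, dim), is_strict(dim))
# Residence False True True
# Diagnosis True False False
```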
10
Hierarchy Transformations
  • The overall task is to transform non-summarizable
    hierarchies to summarizable hierarchies
    automatically.
  • A hierarchy is transformed in three steps:
      • 1) Transform the hierarchy to be covering.
      • 2) Transform the result from 1) to be onto.
      • 3) Transform the result from 2) to be strict.
  • We give an algorithm for each transformation.
    Each algorithm assumes that the previous
    algorithm(s) have been applied.

11
Non-covering Hierarchies
  • The MakeCovering algorithm starts with the bottom
    category C (Address).
  • For each parent category P of C (City and County),
    it looks for parent categories H that are higher
    than P (City < County).
  • The algorithm finds all the links L(h,c) from H
    to C that are not covered by going through P.
  • For each link L an intermediate value is inserted
    into P and linked to h and c.
  • Finally, the algorithm is applied recursively to
    P (nothing changes in this case).

Figure: the Residence dimension with C = Address, P = City, and H = County; the link L from the county Outback to the address 2 Rural skips City, so the intermediate value CityOutback is inserted into City.
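Below is a minimal Python sketch of MakeCovering. It assumes a single chain of categories (bottom-up) rather than a general lattice, so the recursion on P becomes a bottom-up loop. Two choices are assumptions rather than details from the slide. First, the skipping link from c to h is replaced, not kept alongside the new path. Second, one intermediate value is created per h and shared, named after the slide's `CityOutback`.

```python
def make_covering(levels, parents):
    """Sketch of MakeCovering for a single chain of categories (bottom-up).

    Every link from a value c to a parent h that skips the category P
    directly above c is replaced by c -> (P, P + h's name) -> h.
    """
    parents = {v: set(ps) for v, ps in parents.items()}  # work on a copy
    for i in range(len(levels) - 1):
        c_cat, p_cat = levels[i], levels[i + 1]            # C and its parent P
        for c in [v for v in parents if v[0] == c_cat]:
            for h in list(parents[c]):
                if h[0] == p_cat:
                    continue                               # already goes via P
                mid = (p_cat, p_cat + h[1])                # e.g. ("City", "CityOutback")
                parents[c].discard(h)
                parents[c].add(mid)
                parents.setdefault(mid, set()).add(h)      # shared per h
    return parents


T = ("T", "T")
levels = ["Address", "City", "County", "T"]
residence = {
    ("Address", "1 Central"): {("City", "Sydney")},
    ("Address", "2 Rural"): {("County", "Outback")},   # assumed non-covering link
    ("City", "Sydney"): {("County", "Sydney")},
    ("County", "Sydney"): {T},
    ("County", "Outback"): {T},
}

covered = make_covering(levels, residence)
print(covered[("Address", "2 Rural")])   # {('City', 'CityOutback')}
print(covered[("City", "CityOutback")])  # {('County', 'Outback')}
```

Intermediate values added to P are themselves checked on the next pass of the loop. This plays the role of the recursive call when h lies more than one level above P.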
12
Non-onto Hierarchies
  • The MakeOnto algorithm starts with the top
    category P (TDiagnosis).
  • For each child category C of P, it finds the
    values n in P with no children (none are found in
    the first two calls).
  • For each childless n in P, the algorithm inserts
    a placeholder value cn in C and links it to n.
  • Finally, the algorithm is called recursively on
    C.

Figure: the Diagnosis dimension during MakeOnto; in the third call (P = Diag. Family, C = LL Diagnosis), the family Lung Canc. (n) has no low-level child, so the placeholder value LLLC is inserted into LL Diagnosis and linked to it.
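A matching Python sketch of MakeOnto follows, under the same single-chain assumption. It walks the categories top-down and gives every childless value n a placeholder child. The placeholder's name is the child category's name prefixed to n's name, which reproduces the slide's `LLLC` for Lung Canc. when the bottom category is called `LL`. The short names follow slide 14's labels and the links are assumed. Placeholders inserted at one level are themselves childless, so they receive placeholders at the next level down; this takes the place of the recursive call.

```python
def make_onto(levels, parents):
    """Sketch of MakeOnto for a single chain of categories (bottom-up).

    Works top-down: every childless value n in category P gets a placeholder
    child (C, C + n's name) in the category C directly below.
    """
    parents = {v: set(ps) for v, ps in parents.items()}  # work on a copy
    for i in range(len(levels) - 1, 0, -1):
        p_cat, c_cat = levels[i], levels[i - 1]            # P and its child C
        has_child = {p for ps in parents.values() for p in ps}
        values = set(parents) | has_child
        for n in sorted(v for v in values if v[0] == p_cat and v not in has_child):
            parents[(c_cat, c_cat + n[1])] = {n}           # e.g. ("LL", "LLLC")
    return parents


T = ("T", "T")
DIAG_LEVELS = ["LL", "Family", "Group", "T"]
diagnosis = {
    ("LL", "IDDP"): {("Family", "DP"), ("Family", "IDD")},
    ("Family", "DP"): {("Group", "PregRel")},
    ("Family", "IDD"): {("Group", "Diabetes")},
    ("Family", "LC"): {("Group", "Cancer")},               # childless family
    ("Group", "PregRel"): {T},
    ("Group", "Diabetes"): {T},
    ("Group", "Cancer"): {T},
}

onto = make_onto(DIAG_LEVELS, diagnosis)
print(onto[("LL", "LLLC")])  # {('Family', 'LC')}
```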
13
Non-strict Hierarchies
  • Schema changes shown first
  • Idea: fuse each set of parent values into one value.
  • The MakeStrict algorithm starts with a child
    category C.
  • For each parent category P of C, it looks for
    non-strictness between C and P if P has parents.
  • If non-strictness occurs, a new category N
    (holding sets of P) is inserted between C and P.
  • Grandparents G of C are linked to the new
    category.
  • Unsafe links are removed.
  • Finally, the algorithm is called recursively on N.

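Figure: the Diagnosis dimension type during MakeStrict, with the new categories Set-of-DG and Set-of-DF added to TDiagnosis, Diag. Group, Diag. Family, and LL Diagnosis; the labels C, P, and G mark the child, parent, and grandparent categories in each call.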

14
Non-strict Hierarchies
  • Instance changes shown now.
  • The MakeStrict algorithm starts with a child
    category C.
  • For each parent category P of C, it looks for
    non-strictness between C and P if P has parents.
  • If non-strictness occurs, new fused values
    (sets of P) are inserted into N.
  • Values in grandparents G of C are linked to the
    new values.
  • Unsafe links are removed.
  • Finally, the algorithm is called recursively on N.

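Figure: the Diagnosis dimension instance after MakeStrict, with the fused values DP,IDD and LC (sets of diagnosis families) and PR,D and C (sets of diagnosis groups) between the low-level values (Ins. Dep. Diab. Preg., LLLC) and T, alongside the original groups and families.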
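Below is a minimal Python sketch of the instance-level step of MakeStrict. It assumes the input is already covering and onto and that the categories form a single chain. Fused values are named by the frozenset of their members' names, and the `Set-of-` category prefix follows the slides. The slide does not spell out which links count as "unsafe". This sketch therefore reroutes each value of C through its fused value and leaves the original P values and their links in place, which should be treated as an assumption.

```python
def make_strict(levels, parents):
    """Sketch of MakeStrict for a single chain of categories (bottom-up).

    Assumes the hierarchy is already covering and onto. Where a value in the
    current child category C has several parents in P, a new category
    "Set-of-P" of fused parent sets is inserted between C and P.
    """
    parents = {v: set(ps) for v, ps in parents.items()}  # work on a copy
    child_cat = levels[0]
    for i in range(1, len(levels) - 1):                    # P must have parents
        p_cat = levels[i]
        children = [v for v in parents if v[0] == child_cat]
        if all(len(parents[c]) <= 1 for c in children):
            child_cat = p_cat                              # strict: move up to P
            continue
        n_cat = "Set-of-" + p_cat                          # new category N
        for c in children:
            fused = frozenset(p[1] for p in parents[c])
            n = (n_cat, fused)
            parents[c] = {n}                               # reroute C -> N
            # Link the fused value to every grandparent (in G) of its members.
            for name in fused:
                parents.setdefault(n, set()).update(parents[(p_cat, name)])
        child_cat = n_cat                                  # "recurse" on N
    return parents


T = ("T", "T")
DIAG_LEVELS = ["LL", "Family", "Group", "T"]
diagnosis = {  # assumed links, already covering and onto
    ("LL", "IDDP"): {("Family", "DP"), ("Family", "IDD")},
    ("LL", "LLLC"): {("Family", "LC")},
    ("Family", "DP"): {("Group", "PregRel")},
    ("Family", "IDD"): {("Group", "Diabetes")},
    ("Family", "LC"): {("Group", "Cancer")},
    ("Group", "PregRel"): {T},
    ("Group", "Diabetes"): {T},
    ("Group", "Cancer"): {T},
}

strict = make_strict(DIAG_LEVELS, diagnosis)
dp_idd = ("Set-of-Family", frozenset({"DP", "IDD"}))       # slide's "DP,IDD"
pr_d = ("Set-of-Group", frozenset({"PregRel", "Diabetes"}))  # slide's "PR,D"
print(strict[("LL", "IDDP")] == {dp_idd}, strict[dp_idd] == {pr_d})  # True True
```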
15
Did We Solve The Problem ?
  • Now we can re-use stored values to correctly
    compute new higher-level values.

Figure: both transformed dimensions (Diagnosis with the fused values and LLLC, Residence with CityOutback) annotated with patient counts, each higher-level count now computable from the stored lower-level counts.
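Here is a small numeric illustration in Python. It uses hypothetical counts (one patient with Ins. Dep. Diab. Preg., one with the lung-cancer placeholder LLLC) and assumed links. Stored counts are rolled up to T by adding each value's count to each of its parents. On the original non-strict hierarchy the first patient reaches T twice. On the transformed hierarchy, whose fused value names follow slide 14, the count comes out right.

```python
from collections import Counter


def roll_up(counts, parents):
    """Re-use stored counts: add each value's count to each of its parents."""
    out = Counter()
    for value, n in counts.items():
        for p in parents[value]:
            out[p] += n
    return out


def count_at_top(counts, parents, steps):
    """Apply roll_up `steps` times and return the count stored at T."""
    for _ in range(steps):
        counts = roll_up(counts, parents)
    return counts["T"]


# Hypothetical stored counts: one patient per low-level diagnosis.
low_level_counts = Counter({"IDDP": 1, "LLLC": 1})

# Original (non-strict) hierarchy; IDDP has two parents. Links are assumed.
original = {
    "IDDP": ["DP", "IDD"], "LLLC": ["LC"],
    "DP": ["PregRel"], "IDD": ["Diabetes"], "LC": ["Cancer"],
    "PregRel": ["T"], "Diabetes": ["T"], "Cancer": ["T"],
}

# Transformed (strict) hierarchy with fused set values.
transformed = {
    "IDDP": ["DP,IDD"], "LLLC": ["LC"],
    "DP,IDD": ["PR,D"], "LC": ["C"],
    "PR,D": ["T"], "C": ["T"],
}

print(count_at_top(low_level_counts, original, 3))     # 3 (one patient counted twice)
print(count_at_top(low_level_counts, transformed, 3))  # 2 (correct)
```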
16
Integration in Current Systems
  • The transformations should be transparent to the
    user.
  • Modern OLAP systems have client and server
    components, communicating using, e.g., OLE DB for
    OLAP or MD-API.
  • OLAP queries consist of about 80% navigation and
    20% aggregation queries [Kimball].
  • Integration is achieved using a Query Handler
    that sends navigation queries to the original DB
    and aggregation queries to the transformed DB.
  • Small space overhead (only the hierarchies are
    stored twice).
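The routing idea can be sketched in a few lines of Python. `Query`, its `kind` field, and `QueryHandler` are hypothetical names. The two back ends are plain callables standing in for the original and transformed databases; they do not model any real OLE DB for OLAP or MD-API interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Query:
    kind: str   # "navigation" or "aggregation" (hypothetical classification)
    text: str


class QueryHandler:
    """Route queries so that the transformation stays invisible to the user."""

    def __init__(self, original_db: Callable[[Query], object],
                 transformed_db: Callable[[Query], object]):
        self.original_db = original_db          # original, user-visible hierarchies
        self.transformed_db = transformed_db    # summarizable copy with pre-aggregates

    def execute(self, query: Query):
        if query.kind == "aggregation":
            return self.transformed_db(query)   # safe to re-use stored aggregates
        if query.kind == "navigation":
            return self.original_db(query)      # users see the hierarchy they know
        raise ValueError(f"unknown query kind: {query.kind!r}")


handler = QueryHandler(original_db=lambda q: ("original", q.text),
                       transformed_db=lambda q: ("transformed", q.text))
print(handler.execute(Query("navigation", "children of Diabetes")))
# ('original', 'children of Diabetes')
print(handler.execute(Query("aggregation", "patients per Diagnosis Group")))
# ('transformed', 'patients per Diagnosis Group')
```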

17
Conclusion and Future Work
  • We have presented algorithms that automatically
    transform non-summarizable (non-covering,
    non-onto, non-strict) hierarchies to become
    summarizable.
  • The algorithms have a practical, low complexity
    and can be applied incrementally to minimize the
    cost of updates.
  • The algorithms can be applied to non-summarizable
    relationships between facts and dimensions.
  • The presented technique can be implemented in
    current OLAP systems transparently to the user.
  • Future work:
      • Transform only the part of the hierarchy that has
        been selected for materialization.
      • Take the aggregate function into account, e.g.,
        MAX and MIN are insensitive to duplicates.
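The sketch below shows why the aggregate function matters. It uses a hypothetical numeric measure, since the case study itself has no measures, on a non-strict fact-to-family mapping. A fact reached through two parents is counted twice when stored SUMs are re-used, while MAX over the stored maxima is unaffected. All names and values are hypothetical.

```python
from collections import defaultdict

measure = {"f1": 40, "f2": 25}                             # hypothetical measure per fact
fact_families = {"f1": ["famA", "famB"], "f2": ["famB"]}   # f1 has two parents
family_group = {"famA": "groupX", "famB": "groupX"}

# Stored family-level aggregates.
fam_sum, fam_max = defaultdict(int), {}
for f, fams in fact_families.items():
    for fam in fams:
        fam_sum[fam] += measure[f]
        fam_max[fam] = max(fam_max.get(fam, measure[f]), measure[f])

# Re-use the stored aggregates at the group level.
grp_sum, grp_max = defaultdict(int), {}
for fam, grp in family_group.items():
    grp_sum[grp] += fam_sum[fam]
    grp_max[grp] = max(grp_max.get(grp, fam_max[fam]), fam_max[fam])

true_sum = sum(measure.values())   # each fact counted once
true_max = max(measure.values())
print(grp_sum["groupX"], true_sum)  # 105 65  (SUM double-counts f1)
print(grp_max["groupX"], true_max)  # 40 40   (MAX is unaffected)
```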

18
Previous Work
  • Much work has focused on selecting the optimal
    subset of aggregate levels for practical
    pre-aggregation, given space and/or time
    constraints.
  • This work assumes summarizability. Thus, it
    cannot be applied to general hierarchies and
    fact-dim. relationships.
  • One paper has shown how to achieve summarizability
    for non-covering hierarchies manually, and not
    transparently to the user.
  • We give algorithms and techniques that
    automatically achieve summarizability for
    non-covering, non-onto, and non-strict
    hierarchies, transparently to the user.
  • The algorithms and techniques can also be applied
    to non-summarizable relationships between facts
    and dimensions.

19
Evaluation
  • Paper
      • Why no experiments with real data?
      • Why no performance studies?
      • Theoretical evaluation of benefits/blowup in
        dimension size?
      • It's all in the journal version (to appear).
  • Talk
      • Good idea to use animation for presenting an
        algorithm?
      • Did we solve the problem? Re-visit the problem
        definition?
      • Intelligent use of color?