Introduction to Data Warehousing - PowerPoint PPT Presentation

1 / 19
About This Presentation
Title:

Introduction to Data Warehousing

Description:

To appreciate the importance of aggregation in data warehouses ... Dimension Tables used to categorise data. often composed of one or more hierarchies ... – PowerPoint PPT presentation

Number of Views:274
Avg rating:3.0/5.0
Slides: 20
Provided by: Hel51
Category:

less

Transcript and Presenter's Notes

Title: Introduction to Data Warehousing


1
Introduction to Data Warehousing
  • Sheila Baron
  • Southampton Institute, England
  • April 2004

2
Objectives
  • To understand the differences in schema design
    objectives between data warehouses and OLTP
    systems
  • To appreciate the importance of aggregation in
    data warehouses
  • To gain experience in using Oracles aggregation
    features.

3
What is a data warehouse?
  • A data warehouse is essentially a database
    designed to support corporate decision making
  • William Inmon (1993) defines the characteristics
    of a data warehouse as
  • Subject oriented
  • Integrated
  • Time-variant
  • Non-volatile

4
Comparison of OLTP and DW system features
  • OLTP systems feature
  • Holds current data
  • Stores detailed data
  • Data is dynamic
  • Repetitive processing
  • High transaction throughput
  • Predictable pattern of usage
  • Transaction driven
  • Application oriented
  • Supports day-to-day decisions
  • Large number of clerical/operational users
  • DW systems feature
  • Holds historical data
  • Includes summarised data
  • Data is largely static
  • Ad hoc, queries
  • Less transaction throughput
  • Unpredictable usage
  • Analysis driven
  • Subject oriented
  • Supports strategic decisions
  • Serves relatively few managerial users

5
Therefore
  • OLTP system design must emphasise avoiding data
    inconsistencies caused by
  • Duplicate or calculated values
  • Conflicts between concurrent transactions
  • Decision support system design emphasises
  • Fast efficient access to large volumes of data
  • Support for complex queries
  • THESE DIFFERING OBJECTIVES ARE INHERENTLY
    INCOMPATIBLE

6
Schema Design Approaches
  • OLTP normalisation
  • many joins / link tables / compound foreign keys
  • good data integrity but poor query performance
  • DSS dimensional modelling
  • few simple joins / aggregated values
  • poor support for integrity but faster queries.

7
Typical Data Warehouse Schema Objects (from
Oracle DW Guide)
8
Typical Dimension Hierarchy
  • many customers live in a single county
  • many counties make up a single sub-region
  • many subregions exist in a single region
  • Such a hierarchy is used to DRILL DOWN
    through aggregated sales totals by region,
    sub-region, county or customer.

9
Dimensional Modeling Summary
  • Fact Tables with two types of columns
  • numeric facts (often called measurements),
  • Detail-level or aggregated. Additive.
  • Simple foreign keys to dimension tables making up
    a compound primary key
  • Dimension Tables used to categorise data
  • often composed of one or more hierarchies
  • normally descriptive, textual values
  • Several distinct dimensions, combined with facts,
    enable you to answer business questions. Commonly
    used dimensions are customers, products, and time.

10
Aggregation
  • Managers tend to view information at a high level
    of abstraction, and view more detailed figures
    only when unusual results are found
  • The process is called DRILLING DOWN
  • Example
  • Monthly sales in region X gt 100,000 for every
    month last year except April
  • Manager looks at April figures by subregion and
    finds one county is under performing
  • Looks at detailed figures for that county by
    product type to see if there are any specific
    problem areas.

11
Aggregation and Oracle
  • Aggregation is a fundamental part of data
    warehousing
  • One of the key concepts in decision support
    systems is multi-dimensional analysis examining
    the enterprise from all necessary combinations of
    dimensions
  • To improve aggregation performance and facilitate
    multi-dimensional analysis Oracle provides two
    extensions to the GROUP BY clause

12
Sample Star Schema for use in Aggregation
exercises
13
Standard GROUP BY Clause
  • SELECT cust_id, prod_id, SUM(amount_sold) "SALES
    VALUE"
  • FROM sales s, times t
  • WHERE prod_id IN (4200, 385)
  • AND cust_id IN (100, 2620, 19400)
  • AND s.time_id t.time_id
  • AND calendar_year '1999'
  • GROUP BY cust_id, prod_id
  • Produces regular aggregation rows only.  

14
ROLLUP EXTENSION
  • SELECT cust_id, prod_id, SUM(amount_sold) "SALES
    VALUE"
  • FROM sales s, times t
  • WHERE prod_id IN (4200, 385)
  • AND cust_id IN (100, 2620, 19400)
  • AND s.time_id t.time_id
  • AND calendar_year '1999'
  • GROUP BY ROLLUP (cust_id, prod_id) 
  • Produces sub-total rows aggregating across
    products for each customer and
  • Produces a grand total row

15
CUBE EXTENSION
  • SELECT cust_id, prod_id, SUM(amount_sold) "SALES
    VALUE"
  • FROM sales s, times t
  • WHERE prod_id IN (4200, 385)
  • AND cust_id IN (100, 2620, 19400)
  • AND s.time_id t.time_id
  • AND calendar_year '1999'
  • GROUP BY CUBE (cust_id, prod_id) 
  • Produces sub-total rows aggregating across all
    dimension combinations
  • Produces a grand total row.

16
Making output more readable
  • Totals calculated by the ROLLUP and CUBE
    functions do not, by default, have labels
  • Therefore it can be difficult to determine which
    total refers to what
  • To overcome this Oracle provides the GROUPING BY
    function to identify which dimensions are being
    aggregated.
  • SEE HANDOUT

17
Where to from here
  • So far we have considered only two issues in
    Data Warehousing - Schema design and aggregation.
    There are many others including firstly
  • Extraction (Transport) Transformation and Loading
    E(T)TL
  • Merging data from multiple heterogenous sources
  • Inconsistent data formats / level of detail
  • Missing / inconsistent data
  • Integrity constraint handling
  • Periodic updating of data (inc refreshing stored
    aggregated values).

18
Data warehousing issues cont
  • Also
  • Query optimisation issues
  • Tuning
  • For Oracle dimension objects, materialised
    views, Query Rewrite.
  • Physical storage issues such as partitioning
  • Parallel processing architectures
  • Data Marts.

19
Further Information
  • Key sources for further reading
  • Connolly and Begg, Database Systems, Chapters 30
    and 31
  • Oracle 9i Data Warehousing Guide
  • Ch 1 Data Warehousing Concepts
  • Ch 3 Logical Design in DWs
  • Ch 18 SQL for aggregation in DWs
Write a Comment
User Comments (0)
About PowerShow.com