Overview of Advanced Databases FT228 - PowerPoint PPT Presentation

1 / 44
About This Presentation
Title:

Overview of Advanced Databases FT228

Description:

Example Dimensional Model. Time Dimension Exercise ... of the attribute are simultaneously accessible in the same dimension record. ... – PowerPoint PPT presentation

Number of Views:15
Avg rating:3.0/5.0
Slides: 45
Provided by: brendan84
Category:

less

Transcript and Presenter's Notes

Title: Overview of Advanced Databases FT228


1
Dr. Brian Mac Namee (www.comp.dit.ie/bmacnamee)
Business Systems Intelligence1. Data
Warehousing 2
2
Entity Relationship Data Model (ERD)
  • ER Approach works by dividing the data into many
    discreet entities
  • Each entity becomes a Table in the physical
    schema
  • Why it has been so successful
  • Coupled with the concept of Normalization it
    drives all the redundancy out of the database
  • Change (or add or delete) the data at just one
    point
  • Can build very fast access methods (index)
  • Results in efficient transactional processing
  • Where is the catch?

3
ERD Where Is The Catch?
  • Lets have a look at a typical ER data model first
  • Some Observations
  • A Symmetric Model
  • All the tables look the same
  • Which table is more important ? Which is the
    largest?
  • Which tables contain numerical measurements of
    the business?
  • Which table contain nearly static descriptive
    attributes?
  • Very hard to visualize and keep it in head
  • A large number of possible connections to any two
    (or more) tables

4
A Typical OLTP Oriented ER Data Model
5
ERD Catch Continues
  • ERD and Normalization result in large number of
    tables
  • Hard to be understood by the users (db
    programmers)
  • Hard to navigate by DBMS software in an optimum
    way
  • Real value of ERD is in using tables individually
    or in pairs
  • Too complex for queries that span multiple tables
    with a large number of records

6
How to Simplify a Data Model
  • Two general methods
  • De-Normalization
  • Dimensional Modeling
  • De-Normalization
  • Reverses the effect of Normalization
  • Reintroduce redundancy while reducing the number
    of tables
  • Popular approaches Pre-Join de-normalization,
    Column Replication or Movement and Aggregation

7
Data Warehouse Design
  • The database component of a data warehouse is
    described using a technique called dimensionality
    modelling
  • Logical design technique that aims to present the
    data in a standard, intuitive form that allows
    for high-performance access
  • Uses the concepts of ER modelling with some
    important restrictions

8
Data Warehouse Design (cont)
  • Every dimensional model (DM) is composed of
  • One table with a composite primary key, called
    the fact table
  • A set of smaller tables called dimension tables
  • Each dimension table has a simple (non-composite)
    primary key that corresponds exactly to one of
    the components of the composite key in the fact
    table.

9
Data Warehouse Design
  • Forms star-like structure, which is called a star
    schema or star join
  • Star schema is a logical structure that has a
    fact table containing factual data in the centre,
    surrounded by dimension tables containing
    reference data, can be denormalised.
  • Facts are generated by events that occurred in
    the past, and are unlikely to change, regardless
    of how they are analysed

10
Data Warehouse Design (cont)
Dimension Table
Dimension Table
Fact Table
Dimension Table
Dimension Table
Star Schema
11
Data Warehouse Design (cont)
  • Bulk of data in data warehouse is in fact tables,
    which can be extremely large.
  • Important to treat fact data as read-only
    reference data that will not change over time.
  • Most useful fact tables contain one or more
    numerical measures, or facts that occur for
    each record and are numeric and additive.

12
Data Warehouse Design (cont)
  • Dimension tables usually contain descriptive
    textual information.
  • Dimension attributes are used as the constraints
    in data warehouse queries.
  • Star schemas can be used to speed up query
    performance by denormalizing reference
    information into a single dimension table.

13
Dimensional Modeling
  • Models the data around two basic concepts Facts
    Dimensions.
  • Facts
  • Facts are numeric measurements (values) that
    represent a specific business aspect or activity.
  • Facts can be computed or derived at run-time
    (metrics).
  • Examples Unit Cost, Sale Amount, Quantity Sold

14
Dimensional Modeling (cont)
  • Dimensions
  • Dimensions are qualifying characteristics that
    provide additional perspectives to a given fact.
  • Examples Date (Day, Month, Qtr, Year), Product
    (Type, Category)

15
Dimensional Modeling (cont...)
  • Every dimensional model (DM) is composed of one
    (or more) fact tables, and a set of smaller
    dimension tables.
  • Look on Fact table through one (or more)
    dimensions.
  • What is the sale amount in Consumer Product
    category, for elderly customers in the second
    quarter of 2004?
  • Forms star-like structure, which is called a
    star schema or star join.

16
Example Dimensional Model
17
Time Dimension Exercise
  • Every Data Warehouse will need Time information
  • I.e. a Time Dimension
  • Compose a generic Time Dimension Table
  • E.g. what are the different attributes you can
    use to describe 11th February, 2008.

18
Time Dimension Exercise
create table time_dimension ( date_key Number
not null, full_date Date, day_of_week
Number, day_num_in_month Number, day_num_overa
ll Number, day_name Varchar2(9), day_abbrev
Varchar2(3), weekday_flag Varchar2(1), week_n
um_in_year Number, week_num_overall
Number, week_begin_date Date, week_begin_date_
key Number, month Number, month_num_overall
Number, month_name Varchar2(9), month_abbrev
Varchar2(3), quarter Number, year
Number, yearmo Number, fiscal_month
Number, fiscal_quarter Number, fiscal_year
Number, last_day_in_month_flag
Varchar2(1), same_weekday_year_ago_date
Date, primary key (date_key))
19
Data Model Design for Data Warehouses
  • Nine-Step Methodology includes following steps
  • Choosing the subject
  • Choosing the grain
  • Identifying and conforming the dimensions
  • Choosing the facts
  • Storing pre-calculations in the fact table
  • Rounding out the dimension tables
  • Choosing the duration of the database
  • Tracking slowly changing dimensions
  • Deciding the query priorities and the query mode

20
Step 1 Choosing The Subject
  • The subject (or function) refers to the subject
    matter of a particular data mart
  • A business process is a major operational process
    in an organization
  • Typically supported by a legacy system (database)
    or an OLTP
  • Examples Orders, Invoices, Inventory etc.

21
Step 1 Choosing The Subject (cont)
  • First data mart built should be the one that is
    most likely to be
  • Delivered on time
  • Within budget
  • To answer the most commercially important
    business questions

22
Step 2 Choosing The Grain
  • Grain is the fundamental, atomic level of data to
    be represented.
  • Decide what a record of the fact table is to
    represent.
  • Grain is also termed as unit of analyses.
  • Typical grains
  • Individual Transactions
  • Daily aggregates (snapshots)
  • Monthly aggregates

23
Step 2 Choosing The Grain (cont)
  • Identify dimensions of the fact table. The grain
    decision for the fact table also determines the
    grain of each dimension table.
  • Also include time as a core dimension, which is
    always present in star schemas.
  • Sometimes grain varies for different facts within
    same business process. How?

24
Step 3 Identifying Conforming The Dimensions
  • Dimensions set the context for asking questions
    about the facts in the fact table.
  • If any dimension occurs in two data marts, they
    must be exactly the same dimension, or one must
    be a mathematical subset of the other.
  • A dimension used in more than one data mart is
    referred to as being conformed.

25
Step 3 Identifying Conforming The Dimensions
  • Choose the dimensions that apply to each fact in
    the fact table.
  • Typical dimensions time, product, customer etc.
  • Need to identify the descriptive attributes that
    explain each dimension
  • Need to determine hierarchies within each
    dimension?

26
Steps 4 5 Choosing The Facts
  • The grain of the fact table determines which
    facts can be used in the data mart.
  • Facts should be numeric and additive.
  • Example Quantity Sold, Amount etc.
  • Unusable facts include
  • Non-numeric facts
  • Non-additive facts
  • Fact at different granularity from other facts in
    table
  • Storing Pre-Calculations in the Fact Table
  • Once the facts have been selected each should be
    re-examined to determine whether there are
    opportunities to use pre-calculations.

27
Step 6 Rounding Out The Dimension Tables
  • Text descriptions are added to the dimension
    tables.
  • Text descriptions should be as intuitive and
    understandable to the users as possible.
  • Usefulness of a data mart is determined by the
    scope and nature of the attributes of the
    dimension tables.
  • (See exercise on Time Dimension)

28
Step 7 Choosing The Duration Of The Database
  • Duration measures how far back in time the fact
    table goes.
  • Very large fact tables raise at least two very
    significant data warehouse design issues.
  • Often difficult to source increasing old data.
  • It is mandatory that the old versions of the
    important dimensions be used, not the most
    current versions. Known as the Slowly Changing
    Dimension problem.

29
Step 8 Tracking Slowly Changing Dimensions
  • The slowly changing dimension problem means that
    the proper description of the old dimension data
    must be used with old fact data.
  • Often, a generalized key must be assigned to
    important dimensions in order to distinguish
    multiple snapshots of dimensions over a period of
    time.

30
Step 8 Tracking Slowly Changing Dimensions
(cont)
  • Three basic types of slowly changing dimensions
  • Where a changed dimension attribute is
    overwritten.
  • Where a changed dimension attribute causes a new
    dimension record to be created.
  • Where a changed dimension attribute causes an
    alternate attribute to be created so that both
    the old and new values of the attribute are
    simultaneously accessible in the same dimension
    record.

31
Step 9 Deciding The Query Priorities And The
Query Modes
  • Most critical physical design issues affecting
    the end-users perception includes
  • Physical sort order of the fact table on disk
  • Presence of pre-stored summaries or aggregations.
  • Additional physical design issues include
    administration, backup, indexing performance, and
    security.

32
Dimensional Modelling
  • Dimensional Modelling is a logical design
    technique that seeks to present the data in a
    standard framework that is intuitive and allows
    for high-performance access

33
Dimensional Modelling (cont)
  • Fact table
  • Consists of a multi-part primary key and usually
    contains numeric data
  • Numeric data is aggregated based on the
    multi-part primary key
  • Additive is crucial because the DW applications
    typically retrieve data based on more than one
    set of facts
  • Dimension table
  • Contains descriptive information
  • Are the entry point for queries

34
Dimensional Modelling
  • Dimensional Model
  • SELECT description, SUM(quoted_price),
    SUM(quantity),
  • SUM(unit_price) , SUM(total_comm)
  • FROM order_fact of,
  • part_dimension pd
  • WHERE of.part_nr pd.part_nr
  • GROUP BY description
  •  
  • ER-Model
  • SELECT description, SUM(quoted_price),
    SUM(quantity),
  • SUM(unit_price), SUM(total_comm)
  • FROM order o,
  • order_detail od,
  • part p,
  • customer c,
  • slsrep s
  • WHERE o.order_nr od.order_nr
  • AND p.part_nr od.part_nr
  • AND o.customer_nr c.customer_nr

35
Dimensional Modelling
36
Simple DW Example
  • Supermarket (Chain Store)
  • Business Area Sales
  • Grain Individual Purchases
  • Dimensions
  • Time
  • Product
  • Store
  • Customer
  • Employee
  • Facts
  • Total Sales
  • Number of items
  • Total Cost Value

37
Data Warehouse Design (cont)
Customer
Time
Individual Purchases
Employee
Products
Store
38
More Detailed Exercise
  • A more detailed exercise See handout

39
Exam Question Example
  • You are working on a data warehousing project for
    the examinations department at the Dublin
    Institute of Technology (DIT). The examinations
    department looks after all exam and continuous
    asessment results for all of the students within
    the Institute. The purpose the data warehousing
    project is to allow new reporting capabilities so
    that examinations department staff can examine
    grade patterns for particular courses monitor
    average grades for modules and patterns in grades
    for courses given by particular staff members
    and to help with student retention efforts.
  • Currently in the examinations departments
    transactional systems information about each
    student is indexed by a student number, and
    includes name, address, date of birth, etc.
    Similarly, information stored about the
    instructors working within the Institute is
    indexed by staff ID and includes name, address,
    department, etc. The information that needs to be
    stored about each course includes the course
    title, course code, and the weighting given to
    the continuous assessment element and exam
    element of a particular course. At the
    Institutes progression boards each year the
    continuous assessment and exam results for each
    student, for each course, are entered into the
    Institutes transactional grade storage system
    and this informaion should be trasnferred across
    to the data warehouse.
  • Design a star schema for the above scenario. (20
    marks)
  • Discuss how the star schema supports the
    reporting requirements outlined in the above
    scenario. (15 marks)

40
Different Types Of Dimensional Model
  • The star-schema can be extended in two ways
  • Snow flake model
  • Multi-star model (also know as fact
    constellation)

41
Example Star Schema

Sales Fact Table
time_key
item_key
branch_key
location_key
units_sold
dollars_sold
avg_sales
Measures
42
Example Snowflake Schema
Sales Fact Table
time_key
item_key
branch_key
location_key
units_sold
dollars_sold
avg_sales
Measures
43
Example Fact Constellation
Shipping Fact Table
time_key
Sales Fact Table
item_key
time_key
shipper_key
item_key
from_location
branch_key
to_location
location_key
dollars_cost
units_sold
units_shipped
dollars_sold
avg_sales
Measures
44
Summary
  • Over the last two lectures we have introduced
    them idea of data warehouses
  • Data warehouses evolved to address the issues of
    using transactional databases to answer new kinds
    of questions
  • IBM have a very detailed RedBook ondimensional
    modelling that is well worth looking at
  • http//www.redbooks.ibm.com/redbooks/pdfs/sg247138
    .pdf
Write a Comment
User Comments (0)
About PowerShow.com