Dimensional Modeling - PowerPoint PPT Presentation

About This Presentation
Title:

Dimensional Modeling

Description:

OD charge. Pay type. Requestor of. Owner of. Video #Video No. One ... OD Charge. OneDayCharge. ExtraDaysCharge. WeekendCharge. DaysReserved. DaysOverdue. CustID ... – PowerPoint PPT presentation

Number of Views:119
Avg rating:3.0/5.0
Slides: 23
Provided by: busin7
Learn more at: https://www.ou.edu
Category:

less

Transcript and Presenter's Notes

Title: Dimensional Modeling


1
Dimensional Modeling
  • Chapter 2

2
The Dimensional Data Model
  • An alternative to the normalized data model
  • Present information as simply as possible (easier
    to understand)
  • Return queries as quickly as possible (efficient
    for queries)
  • Track the underlying business processes (process
    focused)

3
The Dimensional Data Model
  • Contains the same information as the normalized
    model
  • Has far fewer tables
  • Grouped in coherent business categories
  • Pre-joins hierarchies and lookup tables resulting
    in fewer join paths and fewer intermediate tables
  • Normalized fact table with denormalized dimension
    tables.

4
GB Video E-R Diagram
5
GB Video Data Mart
6
Fact Table
  • Measurements associated with a specific business
    process
  • Grain level of detail of the table
  • Process events produce fact records
  • Facts (attributes) are usually
  • Numeric
  • Additive
  • Derived facts included
  • Foreign (surrogate) keys refer to dimension
    tables (entities)
  • Classification values help define subsets

7
Dimension Tables
  • Entities describing the objects of the process
  • Conformed dimensions cross processes
  • Attributes are descriptive
  • Text
  • Numeric
  • Surrogate keys
  • Less volatile than facts (1m with the fact
    table)
  • Null entries
  • Date dimensions
  • Produce by questions

8
Bus Architecture
  • An architecture that permits aggregating data
    across multiple marts
  • Conformed dimensions and attributes
  • Drill Down vs. Drill Across
  • Bus matrix

9
Keys and Surrogate Keys
  • A surrogate key is a unique identifier for data
    warehouse records that replaces source primary
    keys (business/natural keys)
  • Protect against changes in source systems
  • Allow integration from multiple sources
  • Enable rows that do not exist in source data
  • Track changes over time (e.g. new customer
    instances when addresses change)
  • Replace text keys with integers for efficiency

10
Slowly Changing Dimensions
  • Attributes in a dimension that change more slowly
    than the fact granularity
  • Type 1 Current only
  • Type 2 All history
  • Type 3 Most recent few (rare)
  • Note rapidly changing dimensions usually
    indicate the presence of a business process that
    should be tracked as a separate dimension or as a
    fact table

11
CustKey BKCustID CustName CommDist Gender HomOwn?
1552 31421 Jane Rider 3 F N
Fact Table
Date CustKey ProdKey Item Count Amount
1/7/2004 1552 95 1 1,798.00
3/2/2004 1552 37 1 27.95
5/7/2005 1552 87 2 320.26
2/21/2006 1552 2387 42 1 19.95
Dimension with a slowly changing attribute
Cust Key BKCust ID Cust Name Comm Dist Gender Hom Own? Eff End
1552 31421 Jane Rider 3 F N 1/7/2004 1/1/2006
2387 31421 Jane Rider 31 F N 1/2/2006 12/31/9999
12
Date Dimensions
  • One row for every day for which you expect to
    have data for the fact table (perhaps generated
    in a spreadsheet and imported)
  • Usually use a meaningful integer surrogate key
    (such as yyyymmdd 20060926 for Sep. 26, 2006).
    Note this order sorts correctly.
  • Include rows for missing or future dates to be
    added later.

13
Degenerate Dimensions
  • Dimensions without attributes. (Such as a
    transaction number or order number.)
  • Put the attribute value into the fact table even
    though it is not an additive fact.

14
Snowflaking (Outrigger Dimensions or Reference
Dimensions)
  • Connects entities to dimension tables rather than
    the fact table
  • Complicates coding and requires additional
    processing for retrievals
  • Makes type 2 slowly changing dimensions harder to
    maintain
  • Useful for seldom used lookups

15
MN Multivalued Dimensions
  • Fact to Dimension
  • Dimension to Dimension
  • Try to avoid these. Solutions can be very
    misleading.

16
Multivalued Dimensions
SALESREP SalesRepKey Name Address
ORDERS (FACT) SalesRepKey ProductKey SalesRepGrpKe
y CustomerKey OrderQty
SALESREP-ORDER-BRIDGE SalesRepKey SalesrepGroupKey
Weight (1/NumReps)
17
Hierarchies
  • Group data within dimensions SalesRep
  • Region
  • State
  • County
  • Neighborhood
  • Problem structures
  • Variable depth
  • Frequently changing

18
Heterogeneous Products
  • Several different kinds of entry with different
    attributes for each
  • (The sub-class problem)

19
Aggregate Dimensions
  • Dimensions that represent data at different
    levels of granularity
  • Remove a dimension
  • Roll up the hierarchy (provide a new shrunken
    dimension with new surr-key that represents
    rolled up data)

20
Junk Dimensions
  • Miscellaneous attributes that dont belong to
    another entity, usually representing processing
    levels
  • Flags
  • Categories
  • Types

21
Fact Tables
  • Transaction
  • Track processes at discrete points in time when
    they occur
  • Periodic snapshot
  • Cumulative performance over specific time
    intervals
  • Accumulating snapshot
  • Constantly updated over time. May include
    multiple dates representing stages.

22
Aggregates
  • Precalculated summary tables
  • Improve performance
  • Record data an coarser granularity
Write a Comment
User Comments (0)
About PowerShow.com