Title: Dimensional Modeling
1Dimensional Modeling
2The Dimensional Data Model
- An alternative to the normalized data model
- Present information as simply as possible (easier
to understand) - Return queries as quickly as possible (efficient
for queries) - Track the underlying business processes (process
focused)
3The Dimensional Data Model
- Contains the same information as the normalized
model - Has far fewer tables
- Grouped in coherent business categories
- Pre-joins hierarchies and lookup tables resulting
in fewer join paths and fewer intermediate tables - Normalized fact table with denormalized dimension
tables.
4GB Video E-R Diagram
5GB Video Data Mart
6Fact Table
- Measurements associated with a specific business
process - Grain level of detail of the table
- Process events produce fact records
- Facts (attributes) are usually
- Numeric
- Additive
- Derived facts included
- Foreign (surrogate) keys refer to dimension
tables (entities) - Classification values help define subsets
7Dimension Tables
- Entities describing the objects of the process
- Conformed dimensions cross processes
- Attributes are descriptive
- Text
- Numeric
- Surrogate keys
- Less volatile than facts (1m with the fact
table) - Null entries
- Date dimensions
- Produce by questions
8Bus Architecture
- An architecture that permits aggregating data
across multiple marts - Conformed dimensions and attributes
- Drill Down vs. Drill Across
- Bus matrix
9Keys and Surrogate Keys
- A surrogate key is a unique identifier for data
warehouse records that replaces source primary
keys (business/natural keys) - Protect against changes in source systems
- Allow integration from multiple sources
- Enable rows that do not exist in source data
- Track changes over time (e.g. new customer
instances when addresses change) - Replace text keys with integers for efficiency
10Slowly Changing Dimensions
- Attributes in a dimension that change more slowly
than the fact granularity - Type 1 Current only
- Type 2 All history
- Type 3 Most recent few (rare)
- Note rapidly changing dimensions usually
indicate the presence of a business process that
should be tracked as a separate dimension or as a
fact table
11CustKey BKCustID CustName CommDist Gender HomOwn?
1552 31421 Jane Rider 3 F N
Fact Table
Date CustKey ProdKey Item Count Amount
1/7/2004 1552 95 1 1,798.00
3/2/2004 1552 37 1 27.95
5/7/2005 1552 87 2 320.26
2/21/2006 1552 2387 42 1 19.95
Dimension with a slowly changing attribute
Cust Key BKCust ID Cust Name Comm Dist Gender Hom Own? Eff End
1552 31421 Jane Rider 3 F N 1/7/2004 1/1/2006
2387 31421 Jane Rider 31 F N 1/2/2006 12/31/9999
12Date Dimensions
- One row for every day for which you expect to
have data for the fact table (perhaps generated
in a spreadsheet and imported) - Usually use a meaningful integer surrogate key
(such as yyyymmdd 20060926 for Sep. 26, 2006).
Note this order sorts correctly. - Include rows for missing or future dates to be
added later.
13Degenerate Dimensions
- Dimensions without attributes. (Such as a
transaction number or order number.) - Put the attribute value into the fact table even
though it is not an additive fact.
14Snowflaking (Outrigger Dimensions or Reference
Dimensions)
- Connects entities to dimension tables rather than
the fact table - Complicates coding and requires additional
processing for retrievals - Makes type 2 slowly changing dimensions harder to
maintain - Useful for seldom used lookups
15MN Multivalued Dimensions
- Fact to Dimension
- Dimension to Dimension
- Try to avoid these. Solutions can be very
misleading.
16Multivalued Dimensions
SALESREP SalesRepKey Name Address
ORDERS (FACT) SalesRepKey ProductKey SalesRepGrpKe
y CustomerKey OrderQty
SALESREP-ORDER-BRIDGE SalesRepKey SalesrepGroupKey
Weight (1/NumReps)
17Hierarchies
- Group data within dimensions SalesRep
- Region
- State
- County
- Neighborhood
- Problem structures
- Variable depth
- Frequently changing
18Heterogeneous Products
- Several different kinds of entry with different
attributes for each - (The sub-class problem)
19Aggregate Dimensions
- Dimensions that represent data at different
levels of granularity - Remove a dimension
- Roll up the hierarchy (provide a new shrunken
dimension with new surr-key that represents
rolled up data)
20Junk Dimensions
- Miscellaneous attributes that dont belong to
another entity, usually representing processing
levels - Flags
- Categories
- Types
21Fact Tables
- Transaction
- Track processes at discrete points in time when
they occur - Periodic snapshot
- Cumulative performance over specific time
intervals - Accumulating snapshot
- Constantly updated over time. May include
multiple dates representing stages.
22Aggregates
- Precalculated summary tables
- Improve performance
- Record data an coarser granularity