Title: Dimensional Modeling Overview
1Dimensional Modeling Overview
2Agenda
- Dimensional Modeling Overview
- Dimensional Modeling Steps
- Dimensional Modeling Framework
- Retail Sales - Case Study
- Dimensional Modeling Tips
3(No Transcript)
4From Requirement to Data Design
- The requirements definition completely drives the
Data design for the DW. - Data design consists of putting together the data
structures. - A group of data elements form a data structure.
- Logical data design includes determination of the
various data elements that are needed and
combination of the data elements into structures
of data and also establishing relationships among
the data structures.
5Design Decisions
- Choosing the process.
- Selecting the subjects from the information
packages for the first set of logical structures
to be designed. - Choosing the grain.
- Determining the level of detail for the data
structures. - Identifying and conforming the dimensions.
- Choosing the business dimensions (such as
product, market, time, etc.) to be included in
the first set of structures . - Choosing the facts.
- Selecting the metrics or units of measurements
(such as product sale units, dollar sales, dollar
revenue, etc.) included in set of structures. - Choosing the duration of the database.
- Determining how far back in time you should go
for historical data.
6Dimensional Modeling Overview
- What is Dimensional Modeling
- It is a logical design technique to structure the
business dimensions and the metrics that are
analyzed along these dimensions. - A logical design technique that seeks to present
the data in a standard framework that is
intuitive and allows for high performance access - It is inherently dimensional and adheres to a
discipline that uses relational model with some
important restrictions - The fundamental idea of dimensional modeling is
that nearly every type of business data can be
represented as a kind of cube of data - The model has also proved to provide high
performance for queries and analysis
7Dimensional Modeling Overview
- Components of Dimensional Model
- Fact Table
- The fact table contains facts or measurements of
the business - Dimension Table
- The dimension tables contain textual attributes
that describe the facts
8Dimensional Modeling Overview Sample Report
Translation
Sales Rep Performance Report Sales Rep Performance Report Sales Rep Performance Report Sales Rep Performance Report Sales Rep Performance Report
Central Region    Â
   Jul-00 Aug-00
   (Dollars) (Dollars)
Chicago District Chicago District Chicago District 879 878
Adams Adams Adams 345 456
Brown Brown Brown 564 565
Frederickson Frederickson Frederickson 657 768
    Â
Minneapolis District Minneapolis District Minneapolis District 890 789
Andersen Smith Andersen Smith Andersen Smith 909 978
Central Region Total Central Region Total Central Region Total 4244 4434
Facts
Dimension
Report,row and column heading
Numeric report values
9Dimensional Modeling Overview
- Dimension Tables
- Dimension Tables are the entry points into the
data warehouse - Dimension tables are designed especially for
selection and grouping under a common head - Determine contextual background for facts
- Parameters for OLAP
- Common Dimensions
- Date
- Product
- Location/Region
- Customers
- The dimensional model to represent the
information contained in the information package,
the data structure must be represent with
Metrics, business dimensions and attributes for
each business dimension.
10Dimensional Modeling Overview
- Dimension Tables
- Dimension Table Characteristics
- Serve as report labels and query constraints
- By words
- Where clauses
- Provide Descriptive Information
- Minimal codes
- Embedded meaning as attributes
- Represent hierarchical relationships
- Let see Product business dimension example,
- When we want to analyze the fact by products.
11Dimensional Modeling Overview
Product Dimension Table with Sample Rows
Product Key
Product Desc
Size
Brand Desc
Class
CHEERIOS
0001
CHEERIOS 10 OZ
Family
10 OZ
0002
CHEERIOS 24 OZ
CHEERIOS
Family
24 OZ
0003
LUCKY CHARMS 10 OZ
LUCKY CHARMS
Kids
10 OZ
12Big Dimensions
Retail_Fact
time_Key
Store_Key
Customer_Key
Product_key
Sales_dollars
Units_sold
Customer Dimension
Customer_key
Customer_ID (natural key)
Customer_name
Customer_address
Date_of_birth
Age
Gender
Annual_Income
Number_of_children
Marital_status
Other_attributes...
13Big Dimensions
Customer Dimension
Customer_key
Customer_ID
Customer_name
Customer_address
Date_of_birth
Customer Dimension
Customer_key
Customer_ID (natural key)
Customer_name
Customer_address
Date_of_birth
Age
Gender
Annual_Income
Number_of_children
Marital_status
Other_attributes...
Customer Demographics Dimension
Customer_demographics_key
Age_Band
Gender
Income_Band
Number_of_children
Marital_status
14Dirty dimensions
- A dirty dimension is the one in which data
quality cannot be guaranteed - Data about the same customer can appear multiple
times
Customer Dimension
Customer_key
Customer_ID (natural key)
Customer_name
Customer_address
Date_of_birth
Age
Gender
Annual_Income
Number_of_children
Marital_status
Other_attributes...
15Hierarchies in Dimensions
- Multiple Hierarchies
- Dimension tables can represent multiple
hierarchies roll-ups - For example ,Store Dimension could have
- the following hierarchies
- Physical Geography
- Zip, City, County, State, Country
- Sales Organization
- District, Region, Zone
- Distribution Roll-up
- Distribution Center, Distribution Center Region
Store Dimension
Store_key
Store_description
Store_type
Zip
City
State
Sales_region
Sales_zone
Distribution Center
Distribution Center Region
16Hierarchies in Dimensions
Dimension Tables can represent multiple
hierarchical roll-ups
17Dimensional Modeling Overview
- Fact Table
- The fact table is where the numerical
measurements of the business are stored - Facts
- The detail information in a Fact tables
- For Examples Sales Quantity, Unit Sales Price,
Sales Amount etc. - Key performance indicators of the business
- Numeric in Nature
- Analyzed across the dimensions
- Multi-part key
- Foreign keys to dimension tables
- Date is always a key
Sales Fact
DATE KEY
ITEM KEY
STORE KEY
PROMOTION KEY
POS TRXN
Sales Quantity
Unit Sales Price
Sales Amount
18Dimensional Modeling Overview
- So far we have formed fact table and dimension
tables. - How should these tables be arranged in the
dimensional model? - What are the relationships and how should we mark
the relationships in the model? - The dimensional model should primarily facilitate
queries and analyses. What would be the types of
queries and analyses? - Before combining these tables in dimensional
model. What are the requirements? - The model should provide the best data access.
- The whole model must be query-centric.
- It must be optimized for queries and analysis.
- The model must show that the dimension tables
interact with the fact table. - It should structured in such a way that every
dimension can interact equally with the fact
table. - The model should allow drilling down or rolling
up along dimension hierarchies.
19- With this rqts., each of the dimension tables are
directly relates to fact table in the middle. - Such an arrangement in the dimensional model
looks like a star formation, with the fact table
at the core of a star and the dimension tables
along the spikes of the star. - The dimensional model is therefore called a STAR
schema. - See figure
20Star like Model
Dimension2
Dimension3
Fact
Dimension1
Dimensionn
21STAR schema for AUTO Sales
PRODUCT
AUTO SALES
TIME
DEALER
PAYMENT METHOD
CUSTOMER DEMO GRAPHICS
22Typical Star model
Product Dimension
Product_key
Description
Brand
Category
Date Dimension
Date_key
Day_of_week
Month
Quarter
Year
Holiday_flag
Sales Fact
Date_key
Product_key
Store_key
Dollars_sold
Units_sold
Dollars_cost
Store Dimension
Store_key
Store_name
Address
Floor_plan_type
23Star model..
- It consists of sales fact table in the middle of
schema diagram. It have 3 dimension tables of
Date, Product and Store. - The user will analyze the sales using dollar
sold, unit sold and dollar cost. - From the STAR schema structure intuitively
answers the questions for a given amount of
dollars, what was the product sold? Who was the
customer? Which store sold the product? When was
the order placed? - Constraints and filters of queries are easily
understood by looking at the star. - A common type of analysis is the drilling down
the summary numbers to get at the details at the
lower levels by filtering queries.
24Snowflake Design
Low cardinality redundant attributes moved to
sub dimension tables
25Snowflake Design
- Issues
- Only few tools optimized for snowflake schema
- When the tool is not optimized for snowflake
design - Presentation more complex
- Browsing is slower
- Problems with multiple joins
26 Snowflake Design
27Dimensional Modeling Steps
- Identify the Business Process
- A major operational process that is supported by
some kind of legacy system(s) from which data can
be collected for the purpose of the data
warehouse - Example orders, invoices, shipments, inventory,
sales - Identify the Grain
- The fundamental lowest level of data represented
in a fact table for the business process - Example individual transactions, individual
daily snapshots - Identify the Dimensions
- Choose the dimensions that will apply to each
fact table record - Identify the Facts
- Choose the measured facts that will populate each
fact table record
28Dimensional Modeling Steps
Key Inputs
Dimensional Modeling Steps
Output
Business Requirements
- Identify the business Process
- Identify the Grain
- Identify the Dimensions
- Identify the Facts
Dimensional Model
Data Reality
Resist the temptation to model data by looking
copy books alone
29Shared Dimensions Must Conform
Conformed Dimension
- Option 1 Identical dimensions with the same
keys, labels, definitions and values
DATE KEY
ITEM KEY
STORE KEY
PROMO KEY
Sales Fact
Item Key
Item Desc.
Brand Desc.
Category
..
Sales Schema
Inventory Schema
DATE KEY
ITEM KEY
STORE KEY
Inventory Fact
Item Key
Item Desc.
Brand Desc.
Category
..
30 Conformed Dimension
- Option 2 Subset of base dimension
Sales Schema
DATE KEY
ITEM KEY
STORE KEY
PROMO KEY
Sales
Item Key
Item Desc.
Brand Desc.
Category
Desc.
..
DATE KEY
Day-of-week
Week Desc
Month Desc
Item key Item Desc Brand Desc
Category Desc 0001 Cheerios 10oz
Cheerios Cereal
Forecast Schema
Month Key
Brand Key
Estimate
Sales
Brand Key
Brand Desc.
Category
Desc.
..
Month KEY
Month Desc
Brand key Brand Desc Category Desc 1001
Cheerios Cereal
31 Slowly Changing Dimensions
- Dimension attributes evolve over time
- For example, customers change their names, move,
have children, adjust their Incomes - For every dimension attribute, need to identify
Changes strategy - May use combination of strategies within same
dimension table
32 Slowly Changing Dimensions
Type 1 Overwrite the changed
attributes Original record Item Key Item
Desc Dept 12345 Sim City 3000 Educational
S/W Updated record Item Key Item
Desc Dept 12345 Sim City 3000 Strategy S/W
33 Slowly Changing Dimensions
Type 2 Add a New Dimension Record Original
record Item Key Item Desc Dept 12345 Sim City
3000 Educational S/W Additional record Item
Key Item Desc Dept 12345 Sim City
3000 Strategy S/W
34 Slowly Changing Dimensions
Type 3 Add a Prior Attribute Original
record Item Key Item Desc Dept 12345 Sim City
3000 Educational S/W Updated record Item
Key Item Desc Dept Prior Dept 12345 Sim
City 3000 Strategy S/W Educational S/W
35 Slowly Changing Dimensions
- Data Warehouse Keys ie., STAR schema keys
- All tables (facts and dimensions) should use Data
Warehouse generated surrogate keys - It is possible that the customer no., of
discontinued customers are reassigned to new
customers. We will have a problem because the
same customer no,. Could relate to the data for
the newer customer and also to the data of the
retried customer. Therefore do not use such keys
as a primary keys for dimensional tables. - The surrogate keys are simply system generated
sequence numbers. - Each row in a dimension table is identified by a
unique value of an attribute designated as the
primary key of the dimension. - Each dimension table is in 1M relationship with
central fact table. So the primary key of each
dimension table must be a foreign key in the fact
table.
36 Additive Measures
- The ability of measures to be added across all
dimensions of the fact table. - Measures could be fully additive, semi additive
or non additive - Fully Additive - The values of the attributes
summed up by simple addition, Aggregation is a
fully additive measures is done by simple
addition. Sales Quantity, Revenue - Semi Additive - Account Balance, Inventory,
number of customers (Measure of Intensity, head
counts) - Non-Additive - Profit margin (Ratios and
Percentages) i.e., Ratio or Percentages should
not be added 13, 30, etc.
37Factless Fact tables to track attendance of
students
Time Dimension
time_key
time attributes..
Student Dimension
Student_key
Student attributes..
Student Attendance Fact
time_Key
Student_Key
Course_Key
Faculty_Key
Facility_key
Course Dimension
Course_key
Course attributes..
Faculty Dimension
Faculty_key
Faculty attributes..
Facility Dimension
Facility_key
Facility attributes..
38Factless Fact tables Coverage tables
Sales Fact (revisited)
Date Dimension
Date_key
Day_of_week
Month
Quarter
Year
Holiday_flag
Product Dimension
Product_key
Description
Brand
Category
Sales Fact
Date_key
Product_key
Promotion_key
Store_key
Dollars_sold
Units_sold
Dollars_cost
Promotion Dimension
Promotion_key
Promotion_name
Discount
Event
Store Dimension
Store_key
Store_name
Address
Floor_plan_type
39Factless Fact tables Coverage tables
Date Dimension
Date_key
Day_of_week
Month
Quarter
Year
Holiday_flag
Product Dimension
Product_key
Description
Brand
Category
Sales Fact
Date_key
Product_key
Promotion_key
Store_key
Promotion Dimension
Promotion_key
Promotion_name
Discount
Event
Store Dimension
Store_key
Store_name
Address
Floor_plan_type
40(No Transcript)
41Dimensional Modeling Framework
Identify Subject Area, Grain
Conceptual Level
Identify Major Dimension Facts, Conform
Dimensions across Facts
Detail Facts with Measures
Level of detail
Logical Level
- Detail Dimensions with Hierarchies Attributes
- Slowly changing Dimensions Policies
Source-Data Model Mapping
Physical Level
Pre-calculations, Aggregates, Indexes, Data
Structures, Source-Physical Model Mapping
42Dimensional Modeling Framework
- STEP1 Choose the process
- Chose a process or subject area from the list of
subject areas identified - Examples of this could be Sales Analysis,
Strategic Sourcing, Human Resources - STEP2 Choose the Grain
- Choose the level of detail
- Every data mart / warehouse should be based on
the most granular (atomic) data that can possibly
be collected and stored.
43Dimensional Modeling Framework
- STEP3 Identifying Dimensions Dimension
Hierarchy
44Dimensional Modeling Framework
- STEP4 Choose the Measures
- Identify all the measures for the fact table
- STEP5 Conforming the dimensions
- Common dimensions across the Facts/ data marts
have to be exactly same or subset of the main
dimension table
45Dimensional Modeling Framework
- STEP6 Adding Attributes to Dimension Tables
- Enhance the depth of analysis
- Examples Customer Age, Address, Profession,
Product color, flavor, product size, packaging
type etc.
46Dimensional Modeling Framework
- STEP7 Storing Pre-calculations in the Fact
table - Calculated based on one or more base measures
- STEP8 Choosing the Duration of the Database
- Need for analyzing the data over a period of time
47Dimensional Modeling Framework
- STEP9 Track Slowly Changing Dimensions
- Certain kinds of dimension attribute changes need
to be handled differently in Data Warehouse - Type I Overwrite
- Type II - History
- Type III Add new column example -
Organizational changes
48(No Transcript)
49Retail Sales - Case Study
- Background
- Chain consists of over 100 grocery stores in five
states - Stores average 60,000 SKUs in departments such as
frozen foods, dairy etc. - Bar codes are scanned directly into the cash
registers PoS system - Items are promoted via coupons, temporary price
reductions, ads and in-store promotions - Analytical Requirements
- Need to know what is selling in store each day in
order to evaluate product movement, as well as to
see how sales are impacted by promotions - Need to understand the mix of items in a consumer
market basket
50Retail Sales - Case Study
- Identify the Business Process
- Identify the Grain
- Identify the Dimensions
- Identify the Measures
- Sales
- Transaction Item (Daily Sales)
- Date, Location, Item, Promotion
- Quantity, Price, Amount
51Retail Sales - Case Study
Resultant Sales Schema
52Retail Sales - Case Study
Time Dimension Time Dimension Time Dimension
Date Key Date Day of Week Day Number in Month Month Quarter Year Holiday Indicator
1 1/1/1999 Friday 1 January Q1 1999 Holiday
2 1/2/1999 Saturday 2 January Q1 1999 Non-Holiday
3 1/3/1999 Sunday 3 January Q1 1999 Non-Holiday
4 1/4/1999 Monday 4 January Q1 1999 Non-Holiday
Item Dimension Item Dimension Item Dimension
Itemkey Item Description SKU Number Dept Size Package Type Brand Category
1 Lasagna 6 OZ 90706287103 Grocery 6 OZ Box Cold Gourmet Frozen Foods
2 Beef Stew 6 OZ 16005393282 Grocery 6 OZ Box Cold Gourmet Frozen Foods
3 Extra Nougat 2 OZ 46817560065 Grocery 6 OZ Can Chewy Candy
53Retail Sales - Case Study
Promotion Dimension Promotion Dimension Promotion Dimension
Promo Key Promo Name Price Reduction Ad Type Media Type Promo Begin Date End Date
1 Blue Ribbon Discounts Temporary Daily Paper Paper 2000 1/1/1999 1/15/1999
2 Red Carpet Closeout Markdown Sunday Paper Paper 1000 1/3/1999 1/10/1999
3 Ad Blitz None Paper and Radio Paper and Radio 7000 1/15/1999 1/30/1999
Sales Fact Sales Fact
Date Key Item Key Store Key Promo Key POS Trxn Sales Qty Unit Sales Price Sales Amt
1 1 1 15 763457893 1 4.59 4.59
1 2 1 1 763457893 2 0.89 1.78
1 5 11 19 763457894 1 2.56 2.56
2 13 5 8 763457923 1 0.33 0.33
2 5 11 12 763457998 1 1.29 1.29
54(No Transcript)
55Dimensional Modeling Tips
- Carefully choose the labels to identify data
marts, dimension, attributes and facts - An attribute can live in one and only one
dimension, whereas a fact can be repeated in
multiple fact tables - If a single dimension appears to reside in more
than one places, several roles are probably being
played. Name the roles uniquely and treat them as
separate dimensions - A single field in the underlying source data can
have one or more logical columns associated with
it - E.g., A product attribute field may translate to
product code, product short description, and
product long description - Every fact should have a default aggregation rule
(sum, min, max, latest, semi-additive, special
algorithm, and not aggregatable) - This will serve as a requirements list for query
and report writers tools evaluations
56Thank You
- References
- Ralph Kimball
- The Data Warehouse Toolkit
- The Data Warehouse Lifecycle Toolkit