Dimensional Modeling Techniques - PowerPoint PPT Presentation

1 / 46

About This Presentation

Title:

Dimensional Modeling Techniques

Description:

Time Dimension in different. countries. Slowly Changing Dimensions ... e.g. miscellaneous flags like order type. These can be gathered into a junk dimension ... – PowerPoint PPT presentation

Number of Views:49

Avg rating:3.0/5.0

Slides: 47

Provided by: rosshig

Category:

more less

Transcript and Presenter's Notes

Title: Dimensional Modeling Techniques

1
Dimensional Modeling Techniques
2
Dimensional Models

A E-R model for an organization represents all
the business processes in the enterprise
Sales calls, order entry, shipment, billing, etc.
First step in developing dimensional model
architecture is to separate out individual
business processes
Some approaches call these subject areas

3
Info Cubes

Basic idea of dimensional modeling is that every
type of business data can be represented as a
kind of cube (hypercube) of data
Cells of the cube contain measured values (facts)
Edges of the cube define the natural dimensions
of the data

4
(No Transcript)
5
Keys

All keys should be anonymous (surrogate) data
warehouse keys
Do not use the keys used in the production
systems
Production keys can change because data is
temporary whereas data in the data warehouse is
long term
Surrogate keys take up less space usually

6
Facts

Facts are
Something that is not known in advance
An observation in the marketplace
Usually numeric
Any numeric field (especially floating point
values) is likely to be a fact especially if it
is subject to change
Dimensions usually dont change or change very
slowly
Fact tables are usually very narrow but have many
(millions, billions) of records

7
Example

Quantity sold is not floating point but
It is numeric
It is unique for each transaction
It is a measurable quantity
Standard price is a value that doesnt change
often but
It is floating point value
It does change occasionally
Choice of putting it in the fact table probably
depends on how often it changes

8
Attributes

Attributes are usually text fields and usually
describe a characteristic of a tangible thing
For example, product flavor is an attribute that
describes a product
It is not a measured quantity
It is known in advance

9
Dimensions

Attributes that describe things are organized
into dimensions
Dimensions are collections of attributes that are
highly correlated with one another
Attributes in a single dimension often come from
multiple tables in the production database
Dimensions usually have relatively few records
but are very wide

10
Retail Dimensions

Product
prod id, name, description, etc.
Customer
Customer id, name, address, email, etc.
Promotion
Prom id, name, terms, dates
Store
Store ID, location, contact, contact phone, etc.
Time
Day, week, month, year, etc.

11
Granularity

Granularity refers to the level of aggregation of
the data
Individual sales quantities vs. average
Individual product units vs. cases
Dimensions are almost always at their most
granular (atomic) level
Each record is an individual customer
Facts should also be atomic since any other level
of aggregation can be derived from atomic data
and the table can be gracefully extended
Can use a technique called aggregates to improve
performance

12
Combining Entities in Dimensions

Often there are more than one way to define
dimensions
Product dimension and store dimension
Product-Store dimension
Assume there are 1000 products and 100 stores
If all (or most) products appear in all stores
then the dimension will have up to 100,000
records
If each store only carries a few products then
the single Product-Store dimension may not only
make sense it may be interesting

13
Combining Entities in Dimensions

In general, entities should can be combined if
There is a one-to-one or one-to-many relationship
between attributes
The attributes are highly correlated
Entities shouldnt be combined if
There is a many-to-many relationship between them
The entities participate independently in
separate business process fact tables

14
Many-to-Many Attributes
There may be a many-to-manyrelationship between
ShipTo andBillTo. Since there must be a
recordfor every combination of ShipTo andBillTo
the table would become verylarge.
15
Dimensions Drilling Down

Dimension attributes take on the role of
Query constraints
Column headers in reports
Attributes can be added to SQL SELECT list and be
used in SQL GROUP BY or Order BY clauses

16
District Brand Total Dollars Total
Cost Gross Profict Atherton Clean Fast
1,233 1,058 175 Atherton More
Power 2,239 2,200 39
Every attribute can be used as a column header.
17
Dimensions Drilling Down

Drilling down involves adding another attribute
to the SQL SELECT list
Drilling Down can take advantage of hierarchical
relationships among attributes in dimensions but
can use any attributes whether they are related
or not

18
Finance and Marketing Hierarchy in Product
Dimension
The marketinghierarchy
The finance hierarchy
Other attributesunrelated to anyhierarchy
SELECT product_category,
Drill Down
SELECT product_category, product_brand
19
Time Dimension Hierarchy
Multiple hierarchies canco-exist within the
samedimension. Notice that the dimensionsare
not normalized.
20
Partial Customer Dimension
This partial customerdimension contains
threehierarchies ShipTo Address, Corporate
hierarchy, and our sales force hierarchy.
Customer Hierarchy
21
Snowflaking

Snowflaking occurs when the low cardinality
fields in dimension tables are removed into
separate tables
Normalizing the dimension tables
Snowflaking is tempting to database types who
like to normalize but it is usually a bad idea
Decreases performance
Decreases clarity of database
Prevents the use of bitmap indexes which increase
performance dramatically

22
Snowflaked Product Dimension
23
Permissible Snowflaking

Creating subdimensions is permissible under
specific circumstances
If some attributes in a dimension are at a
different granularity than the others
If we save a significant amount of storage space
(very large dimension)
If we often want to browse among the attributes
in the subdimension

24
Demographic Subdimension of Customer Dimension
Aggregated at County Level
Individual Customers
25
Verbose Attributes

Dimensions should contain readable fields that
describe the members of the dimension
Not good C001AX247
Not good GRN PAPR TOWLS 32OZ RVS
Remember these appear in reports so should be
descriptive
Attributes can be translated in the extraction
process

26
Good Attributes

Verbose
Descriptive
Complete (no missing values)
Quality assured
Indexed
Equally available
Documented

27
Time Dimension
Subdimension created in this casebecause we want
to use the sameTime Dimension in
different countries
Time dimension is critical in every data
warehouse. This dimension has adaily
granularity and has enough detail that
applications dont have to haveany hard coded
date logic.
28
Slowly Changing Dimensions

Dimensions are usually fairly static but
sometimes can change slowly
A well designed data warehouse will specify a
change strategy for every attribute of every
dimension
Three strategies for handling changing dimensions

29
Type 1 Overwrite the value

The old attribute value is overwritten so the
dimension always reflects the most recent value

Product
SKU Number Product Key
Description Department (Natural
Key) 12345 IntelliKidz 1.0
Education ABC922-Z
Assume IntelliKidz 1.0 is reclassified as
Strategy. The Type 1 responsesimply updates the
existing record
Product
SKU Number Product Key
Description Department (Natural
Key) 12345 IntelliKidz 1.0
Strategy ABC922-Z
30
Type 1 Overwrite the value

Advantage is that its quick and easy
Disadvantage is that there is no historical
record
So if the sales if IntelliKidz 1.0 takes off
because of the reclassification we have no
historical record of the reason
Correct strategy if were simply correcting a bad
value

31
Type 2 Add a Dimension Row

A new record is created and the old record is kept

Product
SKU Number
Product Key Description Department
(Natural Key)
IntelliKidz 1.0 Education
ABC922-Z
67890 IntelliKidz 1.0 Strategy
ABC922-Z

May include a effective date field but it isnt
necessary
32
Type 2 Add a Dimension Row

An example of why we cant use the natural key as
our primary key
Type 2 changes are said to perfectly partition
history because fact table records that are added
after the change will reference record 67890
We can look at the entire history using the SKU
or can look at before and after using the Product
Key
Most common approach

33
Type 3 Add a Dimension Column

Adding a column is useful if we want the ability
to see fact data as if the change never occurred
Sales district boundaries are altered but users
want to still see sales in terms of the old
boundaries

Product
Prior SKU
Number Product Key Description
Department Department (Natural Key) 12345
IntelliKidz 1.0 Strategy
Education ABC922-Z
34
Type 3 Add a Dimension Column

Useful when we want to maintain multiple
alternate realities
We can look at all the fact data by either value
of the attribute
Often occurs with an attribute which is human
applied and has limited hard effects
Rarely used

35
Rapidly Changing Dimensions

If changes to dimensions are rapid then
If the dimension is small use the Type 2 approach
If the dimension is large then create a
subdimension with the changing attributes

36
Rapidly Changing Dimensions
Demographics are convertedinto banded values
(e.g.0-12,000, 12,000-20,000,etc.) and then
we have a recordfor every combination of
demographic attributes. If we have five
attributes with 10 levels each then we have 105
100,000 records. If this gets too large
createmultiple demographic dimensions
(i.e.demographics and purchase-demogs) Whenever
a change occurs we adda record to the fact table
and sinceeach fact table record includes
thecustomer and demog keys, wellalways be able
to classify customers.
37
Degenerate Dimensions

Many dimensional designs revolve around a line
item oriented document like a sales order
Line item is the natural granularity for the fact
table
The problem is what to do with the document
number (e.g. order number)
It isnt really a fact and doesnt belong to a
dimension
The answer is to add it to the fact table as a
degenerate dimension

38
Junk Dimensions

Sometimes when all the descriptive attributes are
assigned to dimensions there are data elements
left over
e.g. miscellaneous flags like order type
These can be gathered into a junk dimension

39
Additive Facts

Whenever possible facts should be perfectly
additive
If a fact can be added across all dimensions it
is perfectly additive
Usually measures of activity are perfectly
additive
Measures of intensity are not perfectly additive
Account balances, inventory levels
These are snapshots
Some can be added across time dimension unless
you sum and divide by the time units
Some cant be added across any dimension (e.g.
temperature)

40
Textual Facts

Textual attributes should almost always be in
dimension tables
There are some specific instances when a textual
attribute can be a fact
A description of weather on an accident report
This value is unique for each record so it would
be difficult to add to a dimension since every
accident would then require a unique dimension
record

41
Design Process

Single-source or multi-source data marts
Declare the fact table grain
Very precisely define what a record in the fact
table is (Sales order line item)
Choosing the dimensions
Choosing the facts

42
Fact Table Grain

Individual transactions
Each sales transaction is a fact record
Each insurance claim transaction is a fact record
Each ATM transaction is a fact record
Very simple structure and can vary wildly in
number

43
Fact Table Grain

Snapshots
Each daily product sales total in each store is a
fact record
Each monthly account snapshot is a fact record
Can be complex with many attributes including
many non-additive attributes
In many cases snapshots can be constructed from
transaction facts so we would only create a
snapshot fact table for performance reasons
In some cases (like the second above) the
snapshot is not a simple aggregation

44
Fact Table Grain

Line items
Each line item on each order is a fact record
Each line item on each shipments invoice is a
fact record
Each coverage in each individual insurance policy
is a fact record
Represent a line item on some type of control
document (i.e. order)

Back
45
Choosing the Dimensions

Once the fact table is established choosing the
dimensions may be straightforward
Some are natural (i.e. customer for order line
items)
Some are required for the users reporting
requirements
Dimensions must be at the same granularity or
coarser then the fact table (cant have monthly
sales in the fact table and have days in the time
dimension)

Back
46
Choosing the Facts

Transaction facts are usually just the amount of
the transaction
Snapshot facts can be complex
Line item facts can contain several values (i.e.
quantities, gross amounts, adjustments, etc.)
Facts should always match the grain of the table
Aggregates can be created to increase performance

Write a Comment

User Comments (0)