Dimensional Modeling Techniques - PowerPoint PPT Presentation

1 / 46
About This Presentation
Title:

Dimensional Modeling Techniques

Description:

Time Dimension in different. countries. Slowly Changing Dimensions ... e.g. miscellaneous flags like order type. These can be gathered into a junk dimension ... – PowerPoint PPT presentation

Number of Views:49
Avg rating:3.0/5.0
Slides: 47
Provided by: rosshig
Category:

less

Transcript and Presenter's Notes

Title: Dimensional Modeling Techniques


1
Dimensional Modeling Techniques
2
Dimensional Models
  • A E-R model for an organization represents all
    the business processes in the enterprise
  • Sales calls, order entry, shipment, billing, etc.
  • First step in developing dimensional model
    architecture is to separate out individual
    business processes
  • Some approaches call these subject areas

3
Info Cubes
  • Basic idea of dimensional modeling is that every
    type of business data can be represented as a
    kind of cube (hypercube) of data
  • Cells of the cube contain measured values (facts)
  • Edges of the cube define the natural dimensions
    of the data

4
(No Transcript)
5
Keys
  • All keys should be anonymous (surrogate) data
    warehouse keys
  • Do not use the keys used in the production
    systems
  • Production keys can change because data is
    temporary whereas data in the data warehouse is
    long term
  • Surrogate keys take up less space usually

6
Facts
  • Facts are
  • Something that is not known in advance
  • An observation in the marketplace
  • Usually numeric
  • Any numeric field (especially floating point
    values) is likely to be a fact especially if it
    is subject to change
  • Dimensions usually dont change or change very
    slowly
  • Fact tables are usually very narrow but have many
    (millions, billions) of records

7
Example
  • Quantity sold is not floating point but
  • It is numeric
  • It is unique for each transaction
  • It is a measurable quantity
  • Standard price is a value that doesnt change
    often but
  • It is floating point value
  • It does change occasionally
  • Choice of putting it in the fact table probably
    depends on how often it changes

8
Attributes
  • Attributes are usually text fields and usually
    describe a characteristic of a tangible thing
  • For example, product flavor is an attribute that
    describes a product
  • It is not a measured quantity
  • It is known in advance

9
Dimensions
  • Attributes that describe things are organized
    into dimensions
  • Dimensions are collections of attributes that are
    highly correlated with one another
  • Attributes in a single dimension often come from
    multiple tables in the production database
  • Dimensions usually have relatively few records
    but are very wide

10
Retail Dimensions
  • Product
  • prod id, name, description, etc.
  • Customer
  • Customer id, name, address, email, etc.
  • Promotion
  • Prom id, name, terms, dates
  • Store
  • Store ID, location, contact, contact phone, etc.
  • Time
  • Day, week, month, year, etc.

11
Granularity
  • Granularity refers to the level of aggregation of
    the data
  • Individual sales quantities vs. average
  • Individual product units vs. cases
  • Dimensions are almost always at their most
    granular (atomic) level
  • Each record is an individual customer
  • Facts should also be atomic since any other level
    of aggregation can be derived from atomic data
    and the table can be gracefully extended
  • Can use a technique called aggregates to improve
    performance

12
Combining Entities in Dimensions
  • Often there are more than one way to define
    dimensions
  • Product dimension and store dimension
  • Product-Store dimension
  • Assume there are 1000 products and 100 stores
  • If all (or most) products appear in all stores
    then the dimension will have up to 100,000
    records
  • If each store only carries a few products then
    the single Product-Store dimension may not only
    make sense it may be interesting

13
Combining Entities in Dimensions
  • In general, entities should can be combined if
  • There is a one-to-one or one-to-many relationship
    between attributes
  • The attributes are highly correlated
  • Entities shouldnt be combined if
  • There is a many-to-many relationship between them
  • The entities participate independently in
    separate business process fact tables

14
Many-to-Many Attributes
There may be a many-to-manyrelationship between
ShipTo andBillTo. Since there must be a
recordfor every combination of ShipTo andBillTo
the table would become verylarge.
15
Dimensions Drilling Down
  • Dimension attributes take on the role of
  • Query constraints
  • Column headers in reports
  • Attributes can be added to SQL SELECT list and be
    used in SQL GROUP BY or Order BY clauses

16
District Brand Total Dollars Total
Cost Gross Profict Atherton Clean Fast
1,233 1,058 175 Atherton More
Power 2,239 2,200 39
Every attribute can be used as a column header.
17
Dimensions Drilling Down
  • Drilling down involves adding another attribute
    to the SQL SELECT list
  • Drilling Down can take advantage of hierarchical
    relationships among attributes in dimensions but
    can use any attributes whether they are related
    or not

18
Finance and Marketing Hierarchy in Product
Dimension
The marketinghierarchy
The finance hierarchy
Other attributesunrelated to anyhierarchy
SELECT product_category,
Drill Down
SELECT product_category, product_brand
19
Time Dimension Hierarchy
Multiple hierarchies canco-exist within the
samedimension. Notice that the dimensionsare
not normalized.
20
Partial Customer Dimension
This partial customerdimension contains
threehierarchies ShipTo Address, Corporate
hierarchy, and our sales force hierarchy.
Customer Hierarchy
21
Snowflaking
  • Snowflaking occurs when the low cardinality
    fields in dimension tables are removed into
    separate tables
  • Normalizing the dimension tables
  • Snowflaking is tempting to database types who
    like to normalize but it is usually a bad idea
  • Decreases performance
  • Decreases clarity of database
  • Prevents the use of bitmap indexes which increase
    performance dramatically

22
Snowflaked Product Dimension
23
Permissible Snowflaking
  • Creating subdimensions is permissible under
    specific circumstances
  • If some attributes in a dimension are at a
    different granularity than the others
  • If we save a significant amount of storage space
    (very large dimension)
  • If we often want to browse among the attributes
    in the subdimension

24
Demographic Subdimension of Customer Dimension
Aggregated at County Level
Individual Customers
25
Verbose Attributes
  • Dimensions should contain readable fields that
    describe the members of the dimension
  • Not good C001AX247
  • Not good GRN PAPR TOWLS 32OZ RVS
  • Remember these appear in reports so should be
    descriptive
  • Attributes can be translated in the extraction
    process

26
Good Attributes
  • Verbose
  • Descriptive
  • Complete (no missing values)
  • Quality assured
  • Indexed
  • Equally available
  • Documented

27
Time Dimension
Subdimension created in this casebecause we want
to use the sameTime Dimension in
different countries
Time dimension is critical in every data
warehouse. This dimension has adaily
granularity and has enough detail that
applications dont have to haveany hard coded
date logic.
28
Slowly Changing Dimensions
  • Dimensions are usually fairly static but
    sometimes can change slowly
  • A well designed data warehouse will specify a
    change strategy for every attribute of every
    dimension
  • Three strategies for handling changing dimensions

29
Type 1 Overwrite the value
  • The old attribute value is overwritten so the
    dimension always reflects the most recent value

Product
SKU Number Product Key
Description Department (Natural
Key) 12345 IntelliKidz 1.0
Education ABC922-Z
Assume IntelliKidz 1.0 is reclassified as
Strategy. The Type 1 responsesimply updates the
existing record
Product
SKU Number Product Key
Description Department (Natural
Key) 12345 IntelliKidz 1.0
Strategy ABC922-Z
30
Type 1 Overwrite the value
  • Advantage is that its quick and easy
  • Disadvantage is that there is no historical
    record
  • So if the sales if IntelliKidz 1.0 takes off
    because of the reclassification we have no
    historical record of the reason
  • Correct strategy if were simply correcting a bad
    value

31
Type 2 Add a Dimension Row
  • A new record is created and the old record is kept
  • Product
    SKU Number
  • Product Key Description Department
    (Natural Key)
  • IntelliKidz 1.0 Education
    ABC922-Z
  • 67890 IntelliKidz 1.0 Strategy
    ABC922-Z

May include a effective date field but it isnt
necessary
32
Type 2 Add a Dimension Row
  • An example of why we cant use the natural key as
    our primary key
  • Type 2 changes are said to perfectly partition
    history because fact table records that are added
    after the change will reference record 67890
  • We can look at the entire history using the SKU
    or can look at before and after using the Product
    Key
  • Most common approach

33
Type 3 Add a Dimension Column
  • Adding a column is useful if we want the ability
    to see fact data as if the change never occurred
  • Sales district boundaries are altered but users
    want to still see sales in terms of the old
    boundaries

Product
Prior SKU
Number Product Key Description
Department Department (Natural Key) 12345
IntelliKidz 1.0 Strategy
Education ABC922-Z
34
Type 3 Add a Dimension Column
  • Useful when we want to maintain multiple
    alternate realities
  • We can look at all the fact data by either value
    of the attribute
  • Often occurs with an attribute which is human
    applied and has limited hard effects
  • Rarely used

35
Rapidly Changing Dimensions
  • If changes to dimensions are rapid then
  • If the dimension is small use the Type 2 approach
  • If the dimension is large then create a
    subdimension with the changing attributes

36
Rapidly Changing Dimensions
Demographics are convertedinto banded values
(e.g.0-12,000, 12,000-20,000,etc.) and then
we have a recordfor every combination of
demographic attributes. If we have five
attributes with 10 levels each then we have 105
100,000 records. If this gets too large
createmultiple demographic dimensions
(i.e.demographics and purchase-demogs) Whenever
a change occurs we adda record to the fact table
and sinceeach fact table record includes
thecustomer and demog keys, wellalways be able
to classify customers.
37
Degenerate Dimensions
  • Many dimensional designs revolve around a line
    item oriented document like a sales order
  • Line item is the natural granularity for the fact
    table
  • The problem is what to do with the document
    number (e.g. order number)
  • It isnt really a fact and doesnt belong to a
    dimension
  • The answer is to add it to the fact table as a
    degenerate dimension

38
Junk Dimensions
  • Sometimes when all the descriptive attributes are
    assigned to dimensions there are data elements
    left over
  • e.g. miscellaneous flags like order type
  • These can be gathered into a junk dimension

39
Additive Facts
  • Whenever possible facts should be perfectly
    additive
  • If a fact can be added across all dimensions it
    is perfectly additive
  • Usually measures of activity are perfectly
    additive
  • Measures of intensity are not perfectly additive
  • Account balances, inventory levels
  • These are snapshots
  • Some can be added across time dimension unless
    you sum and divide by the time units
  • Some cant be added across any dimension (e.g.
    temperature)

40
Textual Facts
  • Textual attributes should almost always be in
    dimension tables
  • There are some specific instances when a textual
    attribute can be a fact
  • A description of weather on an accident report
  • This value is unique for each record so it would
    be difficult to add to a dimension since every
    accident would then require a unique dimension
    record

41
Design Process
  • Single-source or multi-source data marts
  • Declare the fact table grain
  • Very precisely define what a record in the fact
    table is (Sales order line item)
  • Choosing the dimensions
  • Choosing the facts

42
Fact Table Grain
  • Individual transactions
  • Each sales transaction is a fact record
  • Each insurance claim transaction is a fact record
  • Each ATM transaction is a fact record
  • Very simple structure and can vary wildly in
    number

43
Fact Table Grain
  • Snapshots
  • Each daily product sales total in each store is a
    fact record
  • Each monthly account snapshot is a fact record
  • Can be complex with many attributes including
    many non-additive attributes
  • In many cases snapshots can be constructed from
    transaction facts so we would only create a
    snapshot fact table for performance reasons
  • In some cases (like the second above) the
    snapshot is not a simple aggregation

44
Fact Table Grain
  • Line items
  • Each line item on each order is a fact record
  • Each line item on each shipments invoice is a
    fact record
  • Each coverage in each individual insurance policy
    is a fact record
  • Represent a line item on some type of control
    document (i.e. order)

Back
45
Choosing the Dimensions
  • Once the fact table is established choosing the
    dimensions may be straightforward
  • Some are natural (i.e. customer for order line
    items)
  • Some are required for the users reporting
    requirements
  • Dimensions must be at the same granularity or
    coarser then the fact table (cant have monthly
    sales in the fact table and have days in the time
    dimension)

Back
46
Choosing the Facts
  • Transaction facts are usually just the amount of
    the transaction
  • Snapshot facts can be complex
  • Line item facts can contain several values (i.e.
    quantities, gross amounts, adjustments, etc.)
  • Facts should always match the grain of the table
  • Aggregates can be created to increase performance
Write a Comment
User Comments (0)
About PowerShow.com