Title: ISQS 3358, Business Intelligence Dimensional Modeling
1ISQS 3358, Business IntelligenceDimensional
Modeling
- Zhangxi Lin
- Texas Tech University
1
2Outline
- Data Warehousing Approaches
- Dimensional Modeling
- Data Warehousing with Microsoft SQL Server 2005
- Case Adventure Works Cycles (AWC)
- Data Warehouse Design Phases
3Data Warehousing Approaches
4Data Warehouse Development Approaches
- Data warehouse development approaches
- Inmon Model EDW approach
- Kimball Model Data mart approach
- Which model is better?
- There is no one-size-fits-all strategy to data
warehousing - One alternative is the hosted warehouse
5General Data Warehouse Development Approaches
- Big bang approach
- Incremental approach
- Top-down incremental approach
- Bottom-up incremental approach
6Big Bang Approach
7Incremental Approach to Warehouse Development
- Multiple iterations
- Shorter implementations
- Validation of each phase
Increment 1
Strategy
Definition
Analysis
Design
Build
Iterative
Production
8Top-Down Approach
- Analyze requirements at the enterprise level
- Develop conceptual information model
- Identify and prioritize subject areas
- Complete a model of selected subject area
- Map to available data
- Perform a source system analysis
- Implement base technical architecture
- Establish metadata, extraction, and load
processes for the initial subject area
- Create and populate the initial subject area
data mart within the overall warehouse - framework
9Bottom-Up Approach
- Define the scope and coverage of the data
warehouse and analyze the source systems
within this scope
- Define the initial increment based on the
political pressure, assumed business benefit
and data volume
- Implement base technical architecture and
establish metadata, extraction, and load
processes as required by increment
- Create and populate the initial subject
areas within the overall warehouse framework
10Dimensional Modeling
11Dimensional Model
- Also called star schema
- Fact table is in the middle and dimensions
serving as the points on the star. - A normalized fact table plus denormalized
dimension tables - Facts
- Measurements associated with a specific business
process. - Most facts are additive (calculative) others are
semi-additive, non-additive, or descriptive (e.g.
factless fact table). - Many facts can be derived from other facts. So,
non-additive facts can be avoided by calculating
it from additive facts. - Grain
- The level of detail contained in the fact table
- The lowest level of detail is called atomic fact
table
11
12Dimensions
- The foundation of the dimensional model to
describe the objects of the business - The nouns of the DW/BI system
- Business processes (facts) are the verbs of the
business - Dimension tables link to all the business
processes. - A dimension shared across all processes is called
conformed dimension - The analysis involving data from more than one
business process is called drill-across.
12
13Data Cube
- Data cubes are multidimensional extensions of 2-D
tables, just as in geometry a cube is a
three-dimensional extension of a square. The word
cube brings to mind a 3-D object, and we can
think of a 3-D data cube as being a set of
similarly structured 2-D tables stacked on top of
one another. - Data cubes aren't restricted to just three
dimensions. Most OLAP systems can build data
cubes with many more dimensions allows up to 64
dimensions. - In practice, we often construct data cubes with
many dimensions, but we tend to look at just
three at a time. What makes data cubes so
valuable is that we can index the cube on one or
more of its dimensions.
13
14Determining Granularity
15Star Schema Model
Product Table Product_id Product_disc,...
Store Table Store_id District_id,...
Sales Fact Table Product_id Store_id Item_id Day_i
d Sales_amount Sales_units, ...
Time Table Day_id Month_id Year_id,...
Item Table Item_id Item_desc,...
15
16Snowflake Schema Model
Store Table Store_id Store_desc District_id
Product Table Product_id Product_desc
District Table District_id District_desc
Sales Fact Table Item_id Store_id Product_id Week_
id Sales_amount Sales_units
Dept Table Dept_id Dept_desc Mgr_id
Mgr Table Dept_id Mgr_id Mgr_name
Item Table Item_id Item_desc Dept_id
Time Table Week_id Period_id Year_id
16
17Snowflake Schema Model
- Direct use by some tools
- More flexible to change
- Provides for speedier data loading
- Can become large and unmanageable
- Degrades query performance
- More complex metadata
Country
State
County
City
17
18Dimensional Modeling Process
- High level dimensional model design
- Choosing business model
- Declaring the grain
- Choosing dimensions
- Identifying the facts
- Detailed dimensional model development
- Dimensional model review and validation
- IS
- Core users
- Business community
- Final design iteration
19Example Commrex Real Estate Data Warehousing
- Analytic themes
- How to encourage realtors to use the online ASP
services - Value Chain
- Listors create their account
- Listors post their real estate properties to the
web-based database services and pay listing fees - Property buyers search the website-based database
and buy properties from listors. This is the
incentive for listors to use the ASP services - Business Processes
- Listor sign up
- Listor account management
- Property data posting
- Property search
- Property database maintenance
19
20IMWs Database ERD Model
Property Listing Database
Membership Database
Property ID
M1
Listor ID
Listor ID
Listor Name
? ? ?
MM
Property Type
Property Type
? ? ?
Type Name
Address
Company ID
City
? ? ?
Subtype 1
Chapter
? ? ?
Subtype 2
Functions
Chapter
MM
? ? ?
Specializations
Feature
Subtype n
? ? ?
? ? ?
? ? ?
Company ID
Legends
Comp Name
Primary Key
? ? ?
Address
Secondary Key
? ? ?
Telephone
Link to a table
? ? ?
21IMWs Data Warehouse Dimensional Model
Property Listing Fact
Membership Dimension
Property SubType Dimension
Property ID
Listor ID
Listor ID
Listor Name
? ? ?
Prop SubType
Prop SubType
? ? ?
SubType Name
Address
Company ID
City
? ? ?
Property Type
Chapter
? ? ?
Functions
Chapter
Specializations
Feature
Property Type
Property Type Dimension
Type Name
? ? ?
? ? ?
Company ID
Legends
Comp Name
? ? ?
Company Dimension
Primary Key
Address
? ? ?
Secondary Key
Telephone
? ? ?
Link to a table
22Data Warehousing with Microsoft SQL Server 2005
23Unified Dimensional Model (UDM)
- A SQL Server 2005 technology
- A UDM is a structure that sits over the top of a
data mart and looks exactly like an OLAP system
to the end user. - Advantages
- No need for a data mart.
- Can be built over one or more OLTP systems.
- Mixed data mart and OLTP system data
- Can include data from database from other vendors
and XML-formatted data - Allows OLAP cubes to be built directly on top of
transactional data - Low latency
- Ease of creation and maintenance
24Microsoft BI Toolset
- Relational engine (RDBMS)
- T-SQL
- .NET Framework Command Language Runtime (CLR)
- SQL Server Integration Services (SSIS) ETL
- Data Transformation Pipeline (DTP)
- Data Transformation Runtime (DTR)
- SQL Server Analysis Service (SSAS) queries, ad
hoc use, OLAP, data mining - Multi-Dimensional eXpressions (MDX) a scripting
language for data retrieval from dimensional
database - Dimension design
- Cube design
- Data mining
- SQL Server Reporting Services (SSRS) ad hoc
query, report building - Microsoft Visual Studio .NET is the fundamental
tool for application development
25Structure and Components of Business Intelligence
MS SQL Server 2005
SSMS
SSIS
SSAS
BIDS
SSRS
SAS EG
SAS EM
26Understanding the Cube Designer Tabs
- Cube Structure Use this tab to modify the
architecture of a cube. - Dimension Usage Use this tab to define the
relationships between dimensions and measure
groups, and the granularity of each dimension
within each measure group. - Calculations Use this tab to examine
calculations that are defined for the cube, to
define new calculations for the whole cube or for
a subcube, to reorder existing calculations, and
to debug calculations step by step by using
breakpoints. - KPIs Use this tab to create, edit, and modify
the Key Performance Indicators (KPIs) in a cube. - Actions Use this tab to create or modify
drillthrough, reporting, and other actions for
the selected cube.. - Partitions Use this tab to create and manage the
partitions for a cube. Partitions let you store
sections of a cube in different locations with
different properties, such as aggregation
definitions. - Perspectives Use this tab to create and manage
the perspectives in a cube. A perspective is a
defined subset of a cube, and is used to reduce
the perceived complexity of a cube to the
business user. - Translations Use this tab to create and manage
translated names for cube objects, such as month
or product names. - Browser Use this tab to view data in the cube.
27Case Adventure Works Cycles (AWC)
28Case Adventure Works Cycles (AWC)
- A fictitious multinational manufacturer and
seller of bicycles and accessories - Based on Bothell, Washington, USA and has
regional sales offices in several countries - http//www.msftdwtoolkit.com/
29Basic Business Information
- Product orders by category
- Product Orders by Country/Region
- Product Orders by Sales Channel
- Customers by Sales Channel Snapshot
30AWC Business Requirements - Interview summary
- Interviewee Brian Welker, VP of Sales
- Sales to resellers 37 million last year
- 17 people report to him including 3 regional
sales managers - Previous problem Hard to get information out of
the companys system - Major analytic areas
- Sales planning
- Growth analysis
- Customer analysis
- Territory analysis
- Sales performance
- Basic sales reporting
- Price lists
- Special offers
- Customer satisfaction
- International support
- Success criteria
- Easy data access, Flexible reporting and
analyzing, All data in one place - Whats missing? A lot No indication of
business value
31Business Processes
- Purchase Orders
- Distribution Center Deliveries
- Distribution Center Inventory
- Store Deliveries
- Store Inventory
- Store Sales
32Analytic Themes
- See the Excel file AW_Analytic_Themes_List.xls
33AWCs Bus Matrix
Dimensions Dimensions Dimensions Dimensions Dimensions Dimensions Dimensions Dimensions Dimensions Dimensions Dimensions
Business Process Date Product Employee Customer (Reseller) Customer (Internet) Sales Territory Currency Channel Promotion Call Reason Facility
Sales Forecasting X X X X X X X
Orders X X X X X X X X X
Call Tracking X X X X X X X
Returns X X X X X X X X
34Prioritization Grid
High
Customer Profitability
Orders
Orders Forecast
Product Profitability
Business Value / Impact
Call Tracking
Exchange Rates
Returns
Manufacturing Costs
Feasibility
Low
High
Low
35Exercise 2 A quick walk through an SSAS
application
- Learning Objectives
- How to design a data source view with SSAS based
on an existing data warehouse - How to design and deploy a cube.
- Tasks
- Analysis Service Tutorial Lesson 1 Defining a
Data Source View within an Analysis Services
Project - Analysis Service Tutorial Lesson 2 Defining and
Deploying a Cube - Deliverable
- A Word file with the screenshot of the star
schema emailed to zhangxi.lin_at_hotmail.com - The subject of the email is ISQS 3358 Exercise
2
36Supplemental Slides Data Warehouse Design
Phases
37Data Warehouse Database Design Phases
- Phase 1 Defining the business model
- Phase 2 Defining the dimensional model
- Phase 3 Defining the physical model
37
38Phase 1 Defining the Business Model
- Performing strategic analysis
- Creating the business model
- Documenting metadata
38
39Performing Strategic Analysis
- Identify crucial business processes
- Understand business processes
- Prioritize and select the business processes to
implement
High
Business Benefit
Low
Low
High
Feasibility
39
40Creating the Business Model
- Defining business requirements
- Identifying the business measures
- Identifying the dimensions
- Identifying the grain
- Identifying the business definitions and rules
- Verifying data sources
40
41Business Requirements Drive the Design Process
- Primary input
- Secondary input
Business Requirements
Existing Metadata
Production ERD Model
Research
41
42Identifying Measuresand Dimensions
- The attribute varies continuously
- Balance
- Units Sold
- Cost
- Sales
Measures
- The attribute is perceived as constant or
discrete - Product
- Location
- Time
- Size
Dimensions
42
43Using a Business Process Matrix
Business Dimensions Business Processes Business Processes Business Processes
Business Dimensions Sales Returns Inventory
Customer
Date
Product
Channel
Promotion
Sample of business process matrix
43
44Determining Granularity
44
45Identifying Business Rules
Location Geographic proximity 0 - 1 miles 1 - 5
miles gt 5 miles
45
46Documenting Metadata
- Documenting metadata should include
- Documenting the design process
- Documenting the development process
- Providing a record of changes
- Recording enhancements over time
46
47Metadata Documentation Approaches
- Automated
- Data modeling tools
- ETL tools
- End-user tools
- Manual
47
48Phase 2 Defining the Dimensional Model
- Identify fact tables
- Translate business measures into fact tables
- Analyze source system information for additional
measures - Identify dimension tables
- Link fact tables to the dimension tables
- Model the time dimension
48
49Star Dimensional Modeling
49
50Fact Table Characteristics
- Contain numerical metrics of the business
- Can hold large volumes of data
- Can grow quickly
- Can contain base, derived,and summarized data
- Are typically additive
- Are joined to dimension tables through foreign
keys that reference primary keys in the
dimension tables
Sales Fact Table Product_id Store_id Item_id Day_i
d Sales_amount Sales_units ...
50
51Dimension Table Characteristics
- Dimension tables have the following
characteristics - Contain textual information that represents the
attributes of the business - Contain relatively static data
- Are joined to a fact table through a foreign key
reference
51
52Star DimensionalModel Characteristics
- The model is easy for users to understand.
- Primary keys represent a dimension.
- Nonforeign key columns are values.
- Facts are usually highly normalized.
- Dimensions are completely denormalized.
- Fast response to queries is provided.
- Performance is improved by reducing table joins.
- End users can express complex queries.
- Support is provided by many front-end tools.
52
53Using Time in the Data Warehouse
- Defining standards for time is critical.
- Aggregation based on time is complex.
53
54The Time Dimension
- Time is critical to the data warehouse. A
consistent representation of time is required for
extensibility.
54
55Using Data Modeling Tools
- Tools with a GUI enable definition, modeling, and
reporting. - Avoid a mix of modeling techniques caused by
- Development pressures
- Developers with lack of knowledge
- No strategy
- Determine a strategy.
- Write and publish formally.
- Make available electronically.
55
56Phase 3 Defining the Physical Model
- Why
- Huge amount of data must be effectively processed
and retrieved in realtime. - How
- Translate the dimensional design to a physical
model for implementation. - Define storage strategy for tables and indexes.
- Perform database sizing.
- Define initial indexing strategy.
- Define partitioning strategy.
- Update metadata document with physical
information.
56
57Storage and Performance Considerations
- Database sizing
- Data partitioning
- Indexing
- Star query optimization
57
58Database Sizing - Test Load Sampling
- Analyze a representative sample of the data
chosen using proven statistical methods. - Ensure that the sample reflects
- Test loads for different periods
- Day-to-day operations
- Seasonal data and worst-case scenarios
- Indexes and summaries
58
59Data Partitioning
- Breaking up of data into separate physicalunits
that can be handled independently - Types of data partitioning
- Horizontal partitioning.
- Vertical partitioning
59
60Indexing
- Indexing is used for the following reasons
- It is a huge cost saving, greatly improving
performance and scalability. - It can replace a full table scan by a quick read
of the index followed by a read of only those
disk blocks that contain the rows needed.
60
61Parallelism
Sales table
Customers table
Parallel Execution Servers
61
62Using Summary Data
- Designing summary tables offers the following
benefits - Provides fast access to precomputed data
- Reduces use of I/O, CPU, and memory
62