ISQS 3358, Business Intelligence Dimensional Modeling - PowerPoint PPT Presentation

1 / 62
About This Presentation
Title:

ISQS 3358, Business Intelligence Dimensional Modeling

Description:

ISQS 3358, Business Intelligence Dimensional Modeling Zhangxi Lin Texas Tech University * * * The Time Dimension Because online transaction data, typically the source ... – PowerPoint PPT presentation

Number of Views:368
Avg rating:3.0/5.0
Slides: 63
Provided by: Jud8176
Category:

less

Transcript and Presenter's Notes

Title: ISQS 3358, Business Intelligence Dimensional Modeling


1
ISQS 3358, Business IntelligenceDimensional
Modeling
  • Zhangxi Lin
  • Texas Tech University

1
2
Outline
  • Data Warehousing Approaches
  • Dimensional Modeling
  • Data Warehousing with Microsoft SQL Server 2005
  • Case Adventure Works Cycles (AWC)
  • Data Warehouse Design Phases

3
Data Warehousing Approaches
4
Data Warehouse Development Approaches
  • Data warehouse development approaches
  • Inmon Model EDW approach
  • Kimball Model Data mart approach
  • Which model is better?
  • There is no one-size-fits-all strategy to data
    warehousing
  • One alternative is the hosted warehouse

5
General Data Warehouse Development Approaches
  • Big bang approach
  • Incremental approach
  • Top-down incremental approach
  • Bottom-up incremental approach

6
Big Bang Approach
7
Incremental Approach to Warehouse Development
  • Multiple iterations
  • Shorter implementations
  • Validation of each phase

Increment 1
Strategy
Definition
Analysis
Design
Build
Iterative
Production
8
Top-Down Approach
  • Analyze requirements at the enterprise level
  • Develop conceptual information model
  • Identify and prioritize subject areas
  • Complete a model of selected subject area
  • Map to available data
  • Perform a source system analysis
  • Implement base technical architecture
  • Establish metadata, extraction, and load
    processes for the initial subject area
  • Create and populate the initial subject area
    data mart within the overall warehouse
  • framework

9
Bottom-Up Approach
  • Define the scope and coverage of the data
    warehouse and analyze the source systems
    within this scope
  • Define the initial increment based on the
    political pressure, assumed business benefit
    and data volume
  • Implement base technical architecture and
    establish metadata, extraction, and load
    processes as required by increment
  • Create and populate the initial subject
    areas within the overall warehouse framework

10
Dimensional Modeling
11
Dimensional Model
  • Also called star schema
  • Fact table is in the middle and dimensions
    serving as the points on the star.
  • A normalized fact table plus denormalized
    dimension tables
  • Facts
  • Measurements associated with a specific business
    process.
  • Most facts are additive (calculative) others are
    semi-additive, non-additive, or descriptive (e.g.
    factless fact table).
  • Many facts can be derived from other facts. So,
    non-additive facts can be avoided by calculating
    it from additive facts.
  • Grain
  • The level of detail contained in the fact table
  • The lowest level of detail is called atomic fact
    table

11
12
Dimensions
  • The foundation of the dimensional model to
    describe the objects of the business
  • The nouns of the DW/BI system
  • Business processes (facts) are the verbs of the
    business
  • Dimension tables link to all the business
    processes.
  • A dimension shared across all processes is called
    conformed dimension
  • The analysis involving data from more than one
    business process is called drill-across.

12
13
Data Cube
  • Data cubes are multidimensional extensions of 2-D
    tables, just as in geometry a cube is a
    three-dimensional extension of a square. The word
    cube brings to mind a 3-D object, and we can
    think of a 3-D data cube as being a set of
    similarly structured 2-D tables stacked on top of
    one another.
  • Data cubes aren't restricted to just three
    dimensions. Most OLAP systems can build data
    cubes with many more dimensions allows up to 64
    dimensions.
  • In practice, we often construct data cubes with
    many dimensions, but we tend to look at just
    three at a time. What makes data cubes so
    valuable is that we can index the cube on one or
    more of its dimensions.

13
14
Determining Granularity
15
Star Schema Model
Product Table Product_id Product_disc,...
Store Table Store_id District_id,...
Sales Fact Table Product_id Store_id Item_id Day_i
d Sales_amount Sales_units, ...
Time Table Day_id Month_id Year_id,...
Item Table Item_id Item_desc,...
15
16
Snowflake Schema Model
Store Table Store_id Store_desc District_id
Product Table Product_id Product_desc
District Table District_id District_desc
Sales Fact Table Item_id Store_id Product_id Week_
id Sales_amount Sales_units
Dept Table Dept_id Dept_desc Mgr_id
Mgr Table Dept_id Mgr_id Mgr_name
Item Table Item_id Item_desc Dept_id
Time Table Week_id Period_id Year_id
16
17
Snowflake Schema Model
  • Direct use by some tools
  • More flexible to change
  • Provides for speedier data loading
  • Can become large and unmanageable
  • Degrades query performance
  • More complex metadata

Country
State
County
City
17
18
Dimensional Modeling Process
  • High level dimensional model design
  • Choosing business model
  • Declaring the grain
  • Choosing dimensions
  • Identifying the facts
  • Detailed dimensional model development
  • Dimensional model review and validation
  • IS
  • Core users
  • Business community
  • Final design iteration

19
Example Commrex Real Estate Data Warehousing
  • Analytic themes
  • How to encourage realtors to use the online ASP
    services
  • Value Chain
  • Listors create their account
  • Listors post their real estate properties to the
    web-based database services and pay listing fees
  • Property buyers search the website-based database
    and buy properties from listors. This is the
    incentive for listors to use the ASP services
  • Business Processes
  • Listor sign up
  • Listor account management
  • Property data posting
  • Property search
  • Property database maintenance

19
20
IMWs Database ERD Model
Property Listing Database
Membership Database
Property ID
M1
Listor ID
Listor ID
Listor Name
? ? ?
MM
Property Type
Property Type
? ? ?
Type Name
Address
Company ID
City
? ? ?
Subtype 1
Chapter
? ? ?
Subtype 2
Functions
Chapter
MM
? ? ?
Specializations
Feature
Subtype n
? ? ?
? ? ?
? ? ?
Company ID
Legends
Comp Name
Primary Key
? ? ?
Address
Secondary Key
? ? ?
Telephone
Link to a table
? ? ?
21
IMWs Data Warehouse Dimensional Model
Property Listing Fact
Membership Dimension
Property SubType Dimension
Property ID
Listor ID
Listor ID
Listor Name
? ? ?
Prop SubType
Prop SubType
? ? ?
SubType Name
Address
Company ID
City
? ? ?
Property Type
Chapter
? ? ?
Functions
Chapter
Specializations
Feature
Property Type
Property Type Dimension
Type Name
? ? ?
? ? ?
Company ID
Legends
Comp Name
? ? ?
Company Dimension
Primary Key
Address
? ? ?
Secondary Key
Telephone
? ? ?
Link to a table
22
Data Warehousing with Microsoft SQL Server 2005
23
Unified Dimensional Model (UDM)
  • A SQL Server 2005 technology
  • A UDM is a structure that sits over the top of a
    data mart and looks exactly like an OLAP system
    to the end user.
  • Advantages
  • No need for a data mart.
  • Can be built over one or more OLTP systems.
  • Mixed data mart and OLTP system data
  • Can include data from database from other vendors
    and XML-formatted data
  • Allows OLAP cubes to be built directly on top of
    transactional data
  • Low latency
  • Ease of creation and maintenance

24
Microsoft BI Toolset
  • Relational engine (RDBMS)
  • T-SQL
  • .NET Framework Command Language Runtime (CLR)
  • SQL Server Integration Services (SSIS) ETL
  • Data Transformation Pipeline (DTP)
  • Data Transformation Runtime (DTR)
  • SQL Server Analysis Service (SSAS) queries, ad
    hoc use, OLAP, data mining
  • Multi-Dimensional eXpressions (MDX) a scripting
    language for data retrieval from dimensional
    database
  • Dimension design
  • Cube design
  • Data mining
  • SQL Server Reporting Services (SSRS) ad hoc
    query, report building
  • Microsoft Visual Studio .NET is the fundamental
    tool for application development

25
Structure and Components of Business Intelligence
MS SQL Server 2005
SSMS
SSIS
SSAS
BIDS
SSRS
SAS EG
SAS EM
26
Understanding the Cube Designer Tabs
  • Cube Structure Use this tab to modify the
    architecture of a cube.
  • Dimension Usage Use this tab to define the
    relationships between dimensions and measure
    groups, and the granularity of each dimension
    within each measure group.
  • Calculations Use this tab to examine
    calculations that are defined for the cube, to
    define new calculations for the whole cube or for
    a subcube, to reorder existing calculations, and
    to debug calculations step by step by using
    breakpoints.
  • KPIs Use this tab to create, edit, and modify
    the Key Performance Indicators (KPIs) in a cube.
  • Actions Use this tab to create or modify
    drillthrough, reporting, and other actions for
    the selected cube..
  • Partitions Use this tab to create and manage the
    partitions for a cube. Partitions let you store
    sections of a cube in different locations with
    different properties, such as aggregation
    definitions.
  • Perspectives Use this tab to create and manage
    the perspectives in a cube. A perspective is a
    defined subset of a cube, and is used to reduce
    the perceived complexity of a cube to the
    business user.
  • Translations Use this tab to create and manage
    translated names for cube objects, such as month
    or product names.
  • Browser Use this tab to view data in the cube.

27
Case Adventure Works Cycles (AWC)
28
Case Adventure Works Cycles (AWC)
  • A fictitious multinational manufacturer and
    seller of bicycles and accessories
  • Based on Bothell, Washington, USA and has
    regional sales offices in several countries
  • http//www.msftdwtoolkit.com/

29
Basic Business Information
  • Product orders by category
  • Product Orders by Country/Region
  • Product Orders by Sales Channel
  • Customers by Sales Channel Snapshot

30
AWC Business Requirements - Interview summary
  • Interviewee Brian Welker, VP of Sales
  • Sales to resellers 37 million last year
  • 17 people report to him including 3 regional
    sales managers
  • Previous problem Hard to get information out of
    the companys system
  • Major analytic areas
  • Sales planning
  • Growth analysis
  • Customer analysis
  • Territory analysis
  • Sales performance
  • Basic sales reporting
  • Price lists
  • Special offers
  • Customer satisfaction
  • International support
  • Success criteria
  • Easy data access, Flexible reporting and
    analyzing, All data in one place
  • Whats missing? A lot No indication of
    business value

31
Business Processes
  • Purchase Orders
  • Distribution Center Deliveries
  • Distribution Center Inventory
  • Store Deliveries
  • Store Inventory
  • Store Sales

32
Analytic Themes
  • See the Excel file AW_Analytic_Themes_List.xls

33
AWCs Bus Matrix
  Dimensions Dimensions Dimensions Dimensions Dimensions Dimensions Dimensions Dimensions Dimensions Dimensions Dimensions
Business Process Date Product Employee Customer (Reseller) Customer (Internet) Sales Territory Currency Channel Promotion Call Reason Facility
Sales Forecasting X X X X X X X        
Orders X X X X X X X X X    
Call Tracking X X X X X X       X  
Returns X X   X X X X X   X
34
Prioritization Grid
High
Customer Profitability
Orders
Orders Forecast
Product Profitability
Business Value / Impact
Call Tracking
Exchange Rates
Returns
Manufacturing Costs
Feasibility
Low
High
Low
35
Exercise 2 A quick walk through an SSAS
application
  • Learning Objectives
  • How to design a data source view with SSAS based
    on an existing data warehouse
  • How to design and deploy a cube.
  • Tasks
  • Analysis Service Tutorial Lesson 1 Defining a
    Data Source View within an Analysis Services
    Project
  • Analysis Service Tutorial Lesson 2 Defining and
    Deploying a Cube
  • Deliverable
  • A Word file with the screenshot of the star
    schema emailed to zhangxi.lin_at_hotmail.com
  • The subject of the email is ISQS 3358 Exercise
    2

36
Supplemental Slides Data Warehouse Design
Phases
37
Data Warehouse Database Design Phases
  • Phase 1 Defining the business model
  • Phase 2 Defining the dimensional model
  • Phase 3 Defining the physical model

37
38
Phase 1 Defining the Business Model
  • Performing strategic analysis
  • Creating the business model
  • Documenting metadata

38
39
Performing Strategic Analysis
  • Identify crucial business processes
  • Understand business processes
  • Prioritize and select the business processes to
    implement

High
Business Benefit
Low
Low
High
Feasibility
39
40
Creating the Business Model
  • Defining business requirements
  • Identifying the business measures
  • Identifying the dimensions
  • Identifying the grain
  • Identifying the business definitions and rules
  • Verifying data sources

40
41
Business Requirements Drive the Design Process
  • Primary input
  • Secondary input

Business Requirements
Existing Metadata
Production ERD Model
Research
41
42
Identifying Measuresand Dimensions
  • The attribute varies continuously
  • Balance
  • Units Sold
  • Cost
  • Sales

Measures
  • The attribute is perceived as constant or
    discrete
  • Product
  • Location
  • Time
  • Size

Dimensions
42
43
Using a Business Process Matrix
Business Dimensions Business Processes Business Processes Business Processes
Business Dimensions Sales Returns Inventory
Customer
Date
Product
Channel
Promotion
Sample of business process matrix
43
44
Determining Granularity
44
45
Identifying Business Rules
Location Geographic proximity 0 - 1 miles 1 - 5
miles gt 5 miles
45
46
Documenting Metadata
  • Documenting metadata should include
  • Documenting the design process
  • Documenting the development process
  • Providing a record of changes
  • Recording enhancements over time

46
47
Metadata Documentation Approaches
  • Automated
  • Data modeling tools
  • ETL tools
  • End-user tools
  • Manual

47
48
Phase 2 Defining the Dimensional Model
  • Identify fact tables
  • Translate business measures into fact tables
  • Analyze source system information for additional
    measures
  • Identify dimension tables
  • Link fact tables to the dimension tables
  • Model the time dimension

48
49
Star Dimensional Modeling
49
50
Fact Table Characteristics
  • Contain numerical metrics of the business
  • Can hold large volumes of data
  • Can grow quickly
  • Can contain base, derived,and summarized data
  • Are typically additive
  • Are joined to dimension tables through foreign
    keys that reference primary keys in the
    dimension tables

Sales Fact Table Product_id Store_id Item_id Day_i
d Sales_amount Sales_units ...
50
51
Dimension Table Characteristics
  • Dimension tables have the following
    characteristics
  • Contain textual information that represents the
    attributes of the business
  • Contain relatively static data
  • Are joined to a fact table through a foreign key
    reference

51
52
Star DimensionalModel Characteristics
  • The model is easy for users to understand.
  • Primary keys represent a dimension.
  • Nonforeign key columns are values.
  • Facts are usually highly normalized.
  • Dimensions are completely denormalized.
  • Fast response to queries is provided.
  • Performance is improved by reducing table joins.
  • End users can express complex queries.
  • Support is provided by many front-end tools.

52
53
Using Time in the Data Warehouse
  • Defining standards for time is critical.
  • Aggregation based on time is complex.

53
54
The Time Dimension
  • Time is critical to the data warehouse. A
    consistent representation of time is required for
    extensibility.

54
55
Using Data Modeling Tools
  • Tools with a GUI enable definition, modeling, and
    reporting.
  • Avoid a mix of modeling techniques caused by
  • Development pressures
  • Developers with lack of knowledge
  • No strategy
  • Determine a strategy.
  • Write and publish formally.
  • Make available electronically.

55
56
Phase 3 Defining the Physical Model
  • Why
  • Huge amount of data must be effectively processed
    and retrieved in realtime.
  • How
  • Translate the dimensional design to a physical
    model for implementation.
  • Define storage strategy for tables and indexes.
  • Perform database sizing.
  • Define initial indexing strategy.
  • Define partitioning strategy.
  • Update metadata document with physical
    information.

56
57
Storage and Performance Considerations
  • Database sizing
  • Data partitioning
  • Indexing
  • Star query optimization

57
58
Database Sizing - Test Load Sampling
  • Analyze a representative sample of the data
    chosen using proven statistical methods.
  • Ensure that the sample reflects
  • Test loads for different periods
  • Day-to-day operations
  • Seasonal data and worst-case scenarios
  • Indexes and summaries

58
59
Data Partitioning
  • Breaking up of data into separate physicalunits
    that can be handled independently
  • Types of data partitioning
  • Horizontal partitioning.
  • Vertical partitioning

59
60
Indexing
  • Indexing is used for the following reasons
  • It is a huge cost saving, greatly improving
    performance and scalability.
  • It can replace a full table scan by a quick read
    of the index followed by a read of only those
    disk blocks that contain the rows needed.

60
61
Parallelism
Sales table
Customers table
Parallel Execution Servers
61
62
Using Summary Data
  • Designing summary tables offers the following
    benefits
  • Provides fast access to precomputed data
  • Reduces use of I/O, CPU, and memory

62
Write a Comment
User Comments (0)
About PowerShow.com