Title: Zhangxi Lin
1ISQS 6339 Business IntelligenceData Warehousing
Approaches
- Zhangxi Lin
- Texas Tech University
2Outline
- Microsoft SQL Server for data warehousing
- Data warehousing methodology
- Case - Adventure Works Cycles (AWC) data
warehousing - Course Structure
3Microsoft SQL Server for Data Warehousing
4Unified Dimensional Model (UDM)
- A SQL Server 2005 technology
- A UDM is a structure that sits over the top of a
data mart and looks exactly like an OLAP system
to the end user. - Advantages
- No need for a data mart.
- Can be built over one or more OLTP systems.
- Mixed data mart and OLTP system data
- Can include data from database from other vendors
and XML-formatted data - Allows OLAP cubes to be built directly on top of
transactional data - Low latency
- Ease of creation and maintenance
5Microsoft BI Toolset
- Relational engine (RDBMS)
- T-SQL
- .NET Framework Command Language Runtime (CLR)
- SQL Server Integration Services (SSIS) ETL
- Data Transformation Pipeline (DTP)
- Data Transformation Runtime (DTR)
- SQL Server Analysis Service (SSAS) queries, ad
hoc use, OLAP, data mining - Multi-Dimensional eXpressions (MDX) a scripting
language for data retrieval from dimensional
database - Dimension design
- Cube design
- Data mining
- SQL Server Reporting Services (SSRS) ad hoc
query, report building - Microsoft Visual Studio .NET is the fundamental
tool for application development
6Structure and Components of Business Intelligence
MS SQL Server 2005
SSMS
SSIS
SSAS
BIDS
SSRS
SAS EG
SAS EM
7Understanding the Cube Designer Tabs
- Cube Structure Use this tab to modify the
architecture of a cube. - Dimension Usage Use this tab to define the
relationships between dimensions and measure
groups, and the granularity of each dimension
within each measure group. - Calculations Use this tab to examine
calculations that are defined for the cube, to
define new calculations for the whole cube or for
a subcube, to reorder existing calculations, and
to debug calculations step by step by using
breakpoints. - KPIs Use this tab to create, edit, and modify
the Key Performance Indicators (KPIs) in a cube. - Actions Use this tab to create or modify
drillthrough, reporting, and other actions for
the selected cube.. - Partitions Use this tab to create and manage the
partitions for a cube. Partitions let you store
sections of a cube in different locations with
different properties, such as aggregation
definitions. - Perspectives Use this tab to create and manage
the perspectives in a cube. A perspective is a
defined subset of a cube, and is used to reduce
the perceived complexity of a cube to the
business user. - Translations Use this tab to create and manage
translated names for cube objects, such as month
or product names. - Browser Use this tab to view data in the cube.
8Data Warehousing Methodology
9Dimensional Modeling Process
- High level dimensional model design
- Choosing business model
- Declaring the grain
- Choosing dimensions
- Identifying the facts
- Detailed dimensional model development
- Dimensional model review and validation
- IS
- Core users
- Business community
- Final design iteration
10Warehouse Development Approaches
- Big bang approach
- Incremental approach
- Top-down incremental approach
- Bottom-up incremental approach
11Big Bang Approach
12Top-Down Approach
- Analyze requirements at the enterprise level
- Develop conceptual information model
- Identify and prioritize subject areas
- Complete a model of selected subject area
- Map to available data
- Perform a source system analysis
- Implement base technical architecture
- Establish metadata, extraction, and load
processes for the initial subject area
- Create and populate the initial subject area
data mart within the overall warehouse - framework
13Bottom-Up Approach
- Define the scope and coverage of the data
warehouse and analyze the source systems
within this scope
- Define the initial increment based on the
political pressure, assumed business benefit
and data volume
- Implement base technical architecture and
establish metadata, extraction, and load
processes as required by increment
- Create and populate the initial subject
areas within the overall warehouse framework
14Incremental Approach to Warehouse Development
- Multiple iterations
- Shorter implementations
- Validation of each phase
Increment 1
Strategy
Definition
Analysis
Design
Build
Iterative
Production
15Data Warehousing Process Components
- Methodology
- Architecture
- Extraction, Transformation, and Load (ETL)
- Implementation
- Operation and Support
16Methodology
- Ensures a successful data warehouse
- Encourages incremental development
- Provides a staged approach to an enterprisewide
warehouse - Safe
- Manageable
- Proven
- Recommended
17Architecture
- Provides the planning, structure, and
standardization needed to ensure integration of
multiple components, projects, and processes
across time. - Establishes the framework, standards, and
procedures for the data warehouse at an
enterprise level. - The Data Warehousing Institute
18Extraction, Transformation, and Load (ETL)
- Effective data extract, transform and load (ETL)
processes represent the number one success factor
for your data warehouse project and can absorb up
to 70 percent of the time spent on a typical data
warehousing project. - DM Review, March 2001
19Implementation
Data Warehouse Architecture
Ex., Incremental Implementation
Implementation
Increment 1
Increment 2
Increment n
20Operation and Support
- Data access and reporting
- Refreshing warehouse data
- Monitoring
- Responding to change
21Phases of theIncremental Approach
- Strategy
- Definition
- Analysis
- Design
- Build
- Production
Increment 1
22Strategy Phase Deliverables
- Business goals and objectives
- Data warehouse purpose, objectives, and scope
- Enterprise data warehouse logical model
- Incremental milestones
- Source systems data flows
- Subject area gap analysis
23Strategy Phase Deliverables
- Data acquisition strategy
- Data quality strategy
- Metadata strategy
- Data access environment
- Training strategy
24Data Warehouse Case - AWC
25Adventure Works Cycles (AWC)
- A fictitious multinational manufacturer and
seller of bicycles and accessories - Based on Bothell, Washington, USA and has
regional sales offices in several countries - http//www.msftdwtoolkit.com/
26Basic Business Information
- Product orders by category
- Product Orders by Country/Region
- Product Orders by Sales Channel
- Customers by Sales Channel Snapshot
27AWC Business Requirements - Interview summary
- Interviewee Brian Welker, VP of Sales
- Sales to resellers 37 million last year
- 17 people report to him including 3 regional
sales managers - Previous problem Hard to get information out of
the companys system - Major analytic areas
- Sales planning
- Growth analysis
- Customer analysis
- Territory analysis
- Sales performance
- Basic sales reporting
- Price lists
- Special offers
- Customer satisfaction
- International support
- Success criteria
- Easy data access, Flexible reporting and
analyzing, All data in one place - Whats missing? A lot No indication of
business value
28Business Processes
- Purchase Orders
- Distribution Center Deliveries
- Distribution Center Inventory
- Store Deliveries
- Store Inventory
- Store Sales
29Analytic Themes
- See the Excel file AW_Analytic_Themes_List.xls
30AWCs Bus Matrix
31Prioritization Grid
High
Customer Profitability
Orders
Orders Forecast
Product Profitability
Business Value / Impact
Call Tracking
Exchange Rates
Returns
Manufacturing Costs
Feasibility
Low
High
Low
32Business Dimensional Lifecycle
Business Reqts definition
Technical Arch. Design
Product Selection Installation
Growth
Dimensional Modeling
Physical Design
ETL design Development
Deployment
Project Planning
BI Appl. Specification
BI Application Development
Maintenance
Project Management
33Sub-activities before project requirement
Enterprise (horizontal) Business Requirements Def
inition
Enterprise (business process) Business Requireme
nts Definition
Requirements Prioritization Process
Initial Project Scope
Project Planning
Project management
34Enterprise requirements definition process
Preparation
Conduct Business And IT Interview
Use Data Profiling To Research Data Sources as
Needed
Write up Interview Summaries with Analytic Themes
Identify Business Process from Analytic Themes
Build Initial Bus Matrix
Conduct Senior Management Prioritization Session
Write up the Overall Requirements
Definition Document
35ISQS 6339 Data Management Business
IntelligenceCourse Structure Review
36Outline
- Objectives
- Components of this course
- How this course is conducted
- How to do better in this course
37Objectives
- To learn the general principles of business
intelligence - To be able to design and construct a data
warehouse/data mart - To develop the data processing skills,
specifically ETL system implementation. - To be able to perform OLAP tasks
- To develop general data preparation skills for
analytic tasks
38Technical skills
- Data warehousing with Microsoft SQL Server 2005
- Data analysis with Enterprise Guide
- Basic data processing by SAS programming
39Components of the course
- Principles of data warehousing
- Dimensional data model
- Information integration and flow design in the
ETL (extraction, transformation, and loading)
process - Online analytical processing (OLAP)
- Data reporting and query techniquesÂ
- Data preparation for analytics
40How this course is conducted
- Lectures in basic principles of data warehousing
and data analysis - Students are expected to read the textbook to
enhanced the knowledge - In-class exercises
- Skill acquisition
- Homework assignments
- Review and practice what you are supposed to have
learned
41Sample Quiz for the Graduate Level BI Course
(fall 2007)
- 1. True/False questions (1 point each)
- (Five questions)
- 2. Using SSRS to create a list report in four
columns from any database table (5 points). - 3. Using SAS Enterprise Guide to accomplish the
following tasks (10 points) - Open three Cotton data warehousing tables,
location, variety and the fact table. - Create a query by linking all tables (hint to
improve the efficiency of joining tables, you may
import data into SAS format first) - By filtering the data to keep main variety
information, location, a few interested columns
that have numeric values. - Create a one-way frequency table from the
filtered data showing whatever table that makes
sense.
42Evolving Focuses of BI Courses
- ISQS 3358, Spring 2007
- Focusing on data analysis with SAS Enterprise
Guide, 5 projects based on real dataset from the
industry - Teaching effectiveness evaluation 4.00
- Main comments
- Positive Learned from projects
- Negative Not learned enough as data warehousing
was missing. Some students complained wasting of
time. - ISQS 6339, Fall 2007
- Focusing on data warehousing with Microsoft SQL
Server, plus SAS Programming and the applications
of SAS Enterprise Guide - Teaching effectiveness evaluation 4.67
- Main comments
- Positive The contents are useful
- Negative The instructor is not familiar enough
with SQL Server - ISQS 3358, Spring 2008
- Same focuses as ISQS 6339
- A different textbook with different datasets
coming with the book
43Strategies to Do Better
- You need the textbook.
- Read it and do exercises with it
- Do the in-class exercises seriously. At the
beginning you may feel difficult to follow, but
you will get better soon. Again, you need preview
of the chapters to be covered. - Always follow the pace of the exercises. Your
current results of the exercise could be the
basis of the next one. If you missed one you must
make it up. - Be focused on skill development. So you need to
spend more time practicing. Make use of Lab 363
in other time. - Contact the instructor for any questions
44Grading Policy
- Five quizzes out of six (150 points)
- In-class exercises (100 points)
- Project(s) (150 points)
- Open-book final Exam (100 points)
- The total is 500 points
- Extra bonus credit up to 30 points based on the
evaluation of the involvement in the class
activities.
45Bonus Credit
- Extra exercises for those willing to spend more
time in learning BI - Creative activities in BI learning to benefit the
class - Actively involved in class discussion and helping
others in learning
46Preview/review - Chapter 4, 5 and 6
- Main topics
- Data warehousing process and design phase
- Dimensional modeling (UMD and data mart)
- How to define a data mart with SQL Server
Management Studio (SSMS) - How to create a data mart with Business
Intelligence Development Studio (BIDS) based on a
cube template
47Questions
- What is the relationship between OLAP system and
cubes in the context of data warehousing? - Can you tell what is main difference between UMD
and data mart? - Do you know how to design a dimensional model?
- Do you know how to construct a cube from data
sources you defined? - Do you know how to create a data mart
- Using SSMS (manually defining tables, import from
files or databases) - Using BIDS based on a cube template