Title: Data Warehousing and Data Mining MIS 4133 Spring 200
1Data Warehousing and Data Mining
2Business Intelligence
- The ability of an enterprise to study past
behaviors and actions in order to understand
where the organization has been, determine its
current situation, and predict or change what
will happen in the future.
(Imhoff et al. 2003)
3Evolving Business Intelligence Technologies
Customizable Analytical Appl
Exploration
Exploration
Level of Sophistication
Data Mining
Data Mining
Data Mining
Multi-Dimensional Analysis (OLAP)
Multi-Dimensional Analysis (OLAP)
Multi-Dimensional Analysis (OLAP)
Multi-Dimensional Analysis (OLAP)
Queries, Reports EIS
Queries, Reports EIS
Queries, Reports EIS
Queries, Reports EIS
Queries, Reports EIS
Mid 1980s
Early 1990s
Mid 1990s
Late 1990s
Early 2000s
(Imhoff et al. 2003)
4Data Warehouse (DW)
- a subject-oriented, integrated, time-variant,
nonvolatile collection of data used to support
the strategic decision-making process for the
enterprise. - the central point of data integration for
business intelligence - the source of data for the data marts,
delivering a common view of enterprise data - A database that stores large amounts of
historical business data - Contains multiple databases
(Imhoff et al. 2003)
5DW Technology Components
- Batch and processing data
- Extracted from operational databases
- Cleaned up to remove redundant data, fill in
blank and missing fields, and organized into
consistent formats - Relational databases
- Where data is loaded
- Data and access reporting software
- Used by business analysts to dig into the data
- Examples OLAP, statistical modeling tools,
geographic information systems (GIS), and data
mining tools
(Data Warehouse 2002)
6DW Configuration
Timing of Advertising
Customer Segment
Year
North Territory
Southeast Territory
Southwest Territory
East Territory
1
2
3
4
Product Lines
Marketing Database
Sales Database
Customer Database
Product Database
(Haag et al. 2004)
7Overview of Data Warehousing Infrastructure
ODS operational data store ETL extract,
transform and load OLTP online transaction
processing OLAP online analytical processing
(Mailvaganam 2004)
8Definitions
- Meta data
- Data about data
- Supplies definitions of data, the calculations
used, information about where the data came from,
what was done to it, who is using it, when they
use it, what the quality metrics are for various
pieces of data, etc. - Data Mart
- Customized and/or summarized data that is derived
from the data warehouse and tailored to support
the specific analytical requirements of a given
business unit or business function - Data Mining
- Designed to give researchers and analysts the
ability to delve into the relationships of data
and events without having preconceived notions of
those relationships - Extracting data from a data warehouse in order to
analyze patterns, trends, and relationships.
9Kinds of Analyses
- Most Basic Query and Reporting
- Usually performed by functional managers
- Use predefined queries
- More Complex OLAP and Statistical Analysis
- Designed for business analysts
- Looks at data across multiple dimensions
- Provides summary info, with drill down
capabilities - Provides relationships between factors
- Most Complex Data Mining
- Analyze very large data sets
- Highlights hidden patterns
(Data Warehouse 2002)
10Cleaning Data Possible Steps?
- Define rules in advance
- Identify homonyms and synonyms
- Homonyms two or more different items with the
same identifier - Synonyms the same items with more than one
identifier - Use data-profiling software
- Highlights errors and inconsistencies
- Use fuzzy matching software
- Looks for duplicates and errors introduced by
keyboard entry - Automatic editing software
- Checks for reasonableness and data consistency
- Britains defense dept data cleansing project
cost 11 million over four years, and has saved
the Ministry of Defense 36 million
(Wheatley 2004)
11DW Uses
- CRM
- Consolidate customer data
- Identify areas of customer satisfaction and
frustration - Fraud detection
- Product repositioning analysis
- Profit center discovery
- Corporate asset management
- Etc.
(Data Warehouse 2002)
12Questions to Ask in the DW Planning Stage?
- What data is needed to make business decisions?
- Which business units will use it?
- What kind of data analysis will be done?
- How granular will the data be and what is the
oldest data to be archived in it? - What are the security requirements?
(Data Warehouse 2002)
13Reasons for Implementing
- Querying and reporting on servers / disks not
used by TPSs - A relatively small amount of knowledge of the
technical aspects of database technology is
required to write and maintain queries and
reports - Speed up the writing and maintaining of queries
and reports by technical personnel - A repository of cleaned up TPSs data
- Easier to query and report data from multiple
TPSs, from external data sources, and/or from
data that must be stored for query / report
purposes only - A repository of TPS data that contains data from
a longer span of time than can efficiently be
held in a TPS
(Greenfield 2004)
14Reasons for NOT Implementing
- DW systems store historical data that have been
generated in internal TPSs. - DW systems can complicate business processes
significantly. - DW can have a learning curve that may be too long
for impatient firms. - DW can become an exercise in data for the sake of
the data. - In certain organizations, ad hoc end user query /
reporting tools do not take.
(Greenfield 2004)
15Reasons for NOT Implementing Cont.
- Many strategic applications of DW have a short
life span and require the developers to put
together a technically inelegant system quickly. - There is a limited number of people available who
have worked with the full DW system project life
cycle. - DW systems can require a great deal of
maintenance which many organizations cannot or
will not support. - Sometimes the cost to capture data, clean it up,
and deliver it in a format and time frame that is
useful for the end users is too much of a cost to
bear.
(Greenfield 2004)
16Data Warehousing ROI
- If done correctly, then
- Cost savings
- Increases in revenues
- Increase in analysis of marketing DBs to
cross-sell products - Less computer storage on the mainframe
- Ability to identify and keep the most profitable
customers, while getting to better idea of who
they really are - Changes users jobs (faster access to more
accurate data better customer service)
(Data Warehouse 2002)
17Disadvantages of a DW
- Expense
- Time
- Money
- Operational DBs are sufficient
- Requires more IT support
(Data Warehouse 2002)
18Case Data Gets a Makeover LOreal
- Current System Problems
- Separate databases
- Inefficient data management
- Poor internal communication
- Data Warehouse Solution
- Incorporate all sales info under one roof
- Reduce wait time on some queries from 20 seconds
to three - View data from a Web browser
- Factors the development team had to deal with
- Resolve organizational semantic inconsistencies
- Ensure content conformed to meta-data
- Reassuring overwhelmed users
19Case Anatomy of a Failure
- Reasons for failure
- Bad timing (during call center expansion)
- Unrealistic time frame
- Inexperienced IS staff
- Lack of demand for a new reporting system
- No clearly defined business objective
- Lack of involvement/coordination in requirements
determination - No way to populate the warehouse
- Too many technology projects at once
- Costs cancelled project 750,000 loss of 50
of IS staff
20References
- Data Warehouse (2002). CXO Media Inc., Sept.
20, http//www.cio.com/summaries/enterprise/data/i
ndex.html accessed October 23, 2002. - Greenfield, L. (2004). The Data Warehousing
Information Center, http//www.dwinfocenter.org/
accessed November 17, 2004. - Haag, S., Cummings, M. and McCubbrey, D.J.
(2004). Management Information Systems for the
Information Age, 4th edition, McGraw-Hill
Companies, Inc. - Imhoff, C., Galemmo, N. and Geiger, J.G. (2003).
Mastering Data Warehouse Design, Wiley
Publishing, Inc., Indianapolis, Indiana. - Mailvaganam, H. (2004). Introduction to
Metadata, Data Warehousing Review,
http//www.dwreview.com/Articles/Metadata.html
accessed December 17, 2004. - Wheatley, M. (2004). Operation Clean Data,
http//www.cio.com/archive/070104/data.html
accessed January 25, 2005.
21Exercise 1 Developing a DW
- Each piece of candy represents a piece of data
- List all of the attributes of the candy
- Ex color
- Sort and count the candy for each attribute
22Exercise 1 Questions
- For each attribute, which has the most?
- For each attribute, which has the least?
- Do you have any bad data?
- If so, how do you deal with it?
23Exercise 1 Conclusion
- What did you learn?
- You should have learned
- The purpose of a data warehouse
- The importance of defining your sorting methods
- How to deal with bad data
- Data is constantly changing How should you deal
with this? - You can never have too much candy!
24Exercise 2 How do all the Systems Fit Together?
- Create a diagram that incorporates ERP, SCM, CRM
and DW