Data Warehousing and Data Mining MIS 4133 Spring 200 - PowerPoint PPT Presentation

1 / 24
About This Presentation
Title:

Data Warehousing and Data Mining MIS 4133 Spring 200

Description:

Data Warehousing and Data Mining MIS 4133 Spring 2005 Business Intelligence The ability of an enterprise to study past behaviors and actions in order to understand ... – PowerPoint PPT presentation

Number of Views:303
Avg rating:3.0/5.0
Slides: 25
Provided by: busCollin
Category:

less

Transcript and Presenter's Notes

Title: Data Warehousing and Data Mining MIS 4133 Spring 200


1
Data Warehousing and Data Mining
  • MIS 4133
  • Spring 2005

2
Business Intelligence
  • The ability of an enterprise to study past
    behaviors and actions in order to understand
    where the organization has been, determine its
    current situation, and predict or change what
    will happen in the future.

(Imhoff et al. 2003)
3
Evolving Business Intelligence Technologies
Customizable Analytical Appl
Exploration
Exploration
Level of Sophistication
Data Mining
Data Mining
Data Mining
Multi-Dimensional Analysis (OLAP)
Multi-Dimensional Analysis (OLAP)
Multi-Dimensional Analysis (OLAP)
Multi-Dimensional Analysis (OLAP)
Queries, Reports EIS
Queries, Reports EIS
Queries, Reports EIS
Queries, Reports EIS
Queries, Reports EIS
Mid 1980s
Early 1990s
Mid 1990s
Late 1990s
Early 2000s
(Imhoff et al. 2003)
4
Data Warehouse (DW)
  • a subject-oriented, integrated, time-variant,
    nonvolatile collection of data used to support
    the strategic decision-making process for the
    enterprise.
  • the central point of data integration for
    business intelligence
  • the source of data for the data marts,
    delivering a common view of enterprise data
  • A database that stores large amounts of
    historical business data
  • Contains multiple databases

(Imhoff et al. 2003)
5
DW Technology Components
  • Batch and processing data
  • Extracted from operational databases
  • Cleaned up to remove redundant data, fill in
    blank and missing fields, and organized into
    consistent formats
  • Relational databases
  • Where data is loaded
  • Data and access reporting software
  • Used by business analysts to dig into the data
  • Examples OLAP, statistical modeling tools,
    geographic information systems (GIS), and data
    mining tools

(Data Warehouse 2002)
6
DW Configuration
Timing of Advertising
Customer Segment
Year
North Territory
Southeast Territory
Southwest Territory
East Territory
1
2
3
4
Product Lines
Marketing Database
Sales Database
Customer Database
Product Database
(Haag et al. 2004)
7
Overview of Data Warehousing Infrastructure
ODS operational data store ETL extract,
transform and load OLTP online transaction
processing OLAP online analytical processing
(Mailvaganam 2004)
8
Definitions
  • Meta data
  • Data about data
  • Supplies definitions of data, the calculations
    used, information about where the data came from,
    what was done to it, who is using it, when they
    use it, what the quality metrics are for various
    pieces of data, etc.
  • Data Mart
  • Customized and/or summarized data that is derived
    from the data warehouse and tailored to support
    the specific analytical requirements of a given
    business unit or business function
  • Data Mining
  • Designed to give researchers and analysts the
    ability to delve into the relationships of data
    and events without having preconceived notions of
    those relationships
  • Extracting data from a data warehouse in order to
    analyze patterns, trends, and relationships.

9
Kinds of Analyses
  • Most Basic Query and Reporting
  • Usually performed by functional managers
  • Use predefined queries
  • More Complex OLAP and Statistical Analysis
  • Designed for business analysts
  • Looks at data across multiple dimensions
  • Provides summary info, with drill down
    capabilities
  • Provides relationships between factors
  • Most Complex Data Mining
  • Analyze very large data sets
  • Highlights hidden patterns

(Data Warehouse 2002)
10
Cleaning Data Possible Steps?
  • Define rules in advance
  • Identify homonyms and synonyms
  • Homonyms two or more different items with the
    same identifier
  • Synonyms the same items with more than one
    identifier
  • Use data-profiling software
  • Highlights errors and inconsistencies
  • Use fuzzy matching software
  • Looks for duplicates and errors introduced by
    keyboard entry
  • Automatic editing software
  • Checks for reasonableness and data consistency
  • Britains defense dept data cleansing project
    cost 11 million over four years, and has saved
    the Ministry of Defense 36 million

(Wheatley 2004)
11
DW Uses
  • CRM
  • Consolidate customer data
  • Identify areas of customer satisfaction and
    frustration
  • Fraud detection
  • Product repositioning analysis
  • Profit center discovery
  • Corporate asset management
  • Etc.

(Data Warehouse 2002)
12
Questions to Ask in the DW Planning Stage?
  • What data is needed to make business decisions?
  • Which business units will use it?
  • What kind of data analysis will be done?
  • How granular will the data be and what is the
    oldest data to be archived in it?
  • What are the security requirements?

(Data Warehouse 2002)
13
Reasons for Implementing
  • Querying and reporting on servers / disks not
    used by TPSs
  • A relatively small amount of knowledge of the
    technical aspects of database technology is
    required to write and maintain queries and
    reports
  • Speed up the writing and maintaining of queries
    and reports by technical personnel
  • A repository of cleaned up TPSs data
  • Easier to query and report data from multiple
    TPSs, from external data sources, and/or from
    data that must be stored for query / report
    purposes only
  • A repository of TPS data that contains data from
    a longer span of time than can efficiently be
    held in a TPS

(Greenfield 2004)
14
Reasons for NOT Implementing
  • DW systems store historical data that have been
    generated in internal TPSs.
  • DW systems can complicate business processes
    significantly.
  • DW can have a learning curve that may be too long
    for impatient firms.
  • DW can become an exercise in data for the sake of
    the data.
  • In certain organizations, ad hoc end user query /
    reporting tools do not take.

(Greenfield 2004)
15
Reasons for NOT Implementing Cont.
  • Many strategic applications of DW have a short
    life span and require the developers to put
    together a technically inelegant system quickly.
  • There is a limited number of people available who
    have worked with the full DW system project life
    cycle.
  • DW systems can require a great deal of
    maintenance which many organizations cannot or
    will not support.
  • Sometimes the cost to capture data, clean it up,
    and deliver it in a format and time frame that is
    useful for the end users is too much of a cost to
    bear.

(Greenfield 2004)
16
Data Warehousing ROI
  • If done correctly, then
  • Cost savings
  • Increases in revenues
  • Increase in analysis of marketing DBs to
    cross-sell products
  • Less computer storage on the mainframe
  • Ability to identify and keep the most profitable
    customers, while getting to better idea of who
    they really are
  • Changes users jobs (faster access to more
    accurate data better customer service)

(Data Warehouse 2002)
17
Disadvantages of a DW
  • Expense
  • Time
  • Money
  • Operational DBs are sufficient
  • Requires more IT support

(Data Warehouse 2002)
18
Case Data Gets a Makeover LOreal
  • Current System Problems
  • Separate databases
  • Inefficient data management
  • Poor internal communication
  • Data Warehouse Solution
  • Incorporate all sales info under one roof
  • Reduce wait time on some queries from 20 seconds
    to three
  • View data from a Web browser
  • Factors the development team had to deal with
  • Resolve organizational semantic inconsistencies
  • Ensure content conformed to meta-data
  • Reassuring overwhelmed users

19
Case Anatomy of a Failure
  • Reasons for failure
  • Bad timing (during call center expansion)
  • Unrealistic time frame
  • Inexperienced IS staff
  • Lack of demand for a new reporting system
  • No clearly defined business objective
  • Lack of involvement/coordination in requirements
    determination
  • No way to populate the warehouse
  • Too many technology projects at once
  • Costs cancelled project 750,000 loss of 50
    of IS staff

20
References
  • Data Warehouse (2002). CXO Media Inc., Sept.
    20, http//www.cio.com/summaries/enterprise/data/i
    ndex.html accessed October 23, 2002.
  • Greenfield, L. (2004). The Data Warehousing
    Information Center, http//www.dwinfocenter.org/
    accessed November 17, 2004.
  • Haag, S., Cummings, M. and McCubbrey, D.J.
    (2004). Management Information Systems for the
    Information Age, 4th edition, McGraw-Hill
    Companies, Inc.
  • Imhoff, C., Galemmo, N. and Geiger, J.G. (2003).
    Mastering Data Warehouse Design, Wiley
    Publishing, Inc., Indianapolis, Indiana.
  • Mailvaganam, H. (2004). Introduction to
    Metadata, Data Warehousing Review,
    http//www.dwreview.com/Articles/Metadata.html
    accessed December 17, 2004.
  • Wheatley, M. (2004). Operation Clean Data,
    http//www.cio.com/archive/070104/data.html
    accessed January 25, 2005.

21
Exercise 1 Developing a DW
  • Each piece of candy represents a piece of data
  • List all of the attributes of the candy
  • Ex color
  • Sort and count the candy for each attribute

22
Exercise 1 Questions
  • For each attribute, which has the most?
  • For each attribute, which has the least?
  • Do you have any bad data?
  • If so, how do you deal with it?

23
Exercise 1 Conclusion
  • What did you learn?
  • You should have learned
  • The purpose of a data warehouse
  • The importance of defining your sorting methods
  • How to deal with bad data
  • Data is constantly changing How should you deal
    with this?
  • You can never have too much candy!

24
Exercise 2 How do all the Systems Fit Together?
  • Create a diagram that incorporates ERP, SCM, CRM
    and DW
Write a Comment
User Comments (0)
About PowerShow.com