Title: Chapter 1 Why
1Chapter 1: Why and What is Data Mining?
- Note: Included in this Slide Set is both Chapter 1 material and additional material from the instructor.
2Data Mining is a subset of Business Intelligence
(BI)
3Topics to Discuss in Session 1
- What is Data Mining (DM)?
- Who uses DM?
- Why DM?
- Where DM?
- When DM?
- How DM?
- Why study DM?
4Data Mining Definition and Goal
What, Who
- Definition
- DM is the exploration and analysis of large quantities of data in order to discover meaningful patterns and rules.
- Goal
- To allow an enterprise to IMPROVE its ______ through better understanding of its ______.
- Potential for Competitive Advantage.
Synonyms for "enterprise" include corporation, firm, non-profit organization, government agency
5Foundations of Data Mining
- Data mining is the process of using raw data to infer important business relationships.
- Despite a consensus on the value of data mining, a great deal of confusion exists about what it is.
- Data Mining is a collection of powerful techniques intended for analyzing large amounts of data.
- There is no single data mining approach, but rather a set of techniques that can be used stand-alone or in combination with each other.
6Data Mining: Why now?
Why, Where, When
- So much data are being produced!
- Data are being warehoused
- Computing power is more affordable
- Competitive pressures are enormous
- Data Mining software is available
7Customer Relationship Management (CRM)
How
8Customer Relationship Management (CRM)
How
In order to form a learning relationship with its customers, an enterprise (firm) must be able to:
- Notice what its customers are doing
- Remember what it and its customers have done over time
- Learn from what it has remembered
- Act on what it has learned to make customers more profitable
9Based on Transaction Data
How
10Based on Transaction Data
How
11Identifying and Remembering Relationships is the
Key!
How
12Group Exercise 1
- Time Box: 15 minutes
- Teams of 4 or fewer
- Discuss DM situations among yourselves and pick one to report to the class
- What to report (verbally, 5 minutes max)
- Describe the DM situation
- How does it help the enterprise?
- Presentations: another 15 to 30 minutes
13Why Study Data Mining?
- Open discussion to identify the reasons
14Topics to Discuss in Session 2
- Data Mining History
- Data Warehouse
- Data Mart
15Data Mining History
- The approach has roots in practice dating back over 40 years.
- In the early 1960s, data mining was called statistical analysis, and the pioneers were statistical software companies such as SAS and SPSS.
- By the late 1980s, the traditional techniques had been augmented by new methods such as fuzzy logic, heuristics and neural networks.
16Definitions of a Data Warehouse
1. "A subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process"
- W.H. Inmon
2. "A copy of transaction data, specifically structured for query and analysis"
- Ralph Kimball
17Data Warehouse
- For organizational learning to take place, data from many sources must be gathered together and organized in a consistent and useful way; hence, Data Warehousing (DW)
- DW allows an organization (enterprise) to remember what it has noticed about its data
- Data Mining techniques make use of the data in a DW
18Data Warehouse
[Diagram: the Enterprise Database (Customers, Orders, Transactions, Vendors, Products, etc.) is copied, organized and summarized into the Data Warehouse, which feeds Data Mining.]
- Data Miners
- Farmers: they know
- Explorers (Prospectors): unpredictable
19Data Warehouse
- A data warehouse is a copy of transaction data specifically structured for querying, analysis and reporting; hence, data mining.
- Note that the data warehouse contains a copy of the transactions, which are not updated or changed later by the transaction system.
- Also note that this data is specially structured, and may have been transformed when it was copied into the data warehouse.
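The copy-and-transform idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (order_id, customer, amount_cents, timestamp), the WarehouseRow type and the load_into_warehouse helper are hypothetical names, not taken from the chapter. The sketch copies rows out of a transaction system, reshapes them for analysis, and stores them as read-only records.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)  # frozen: warehouse rows are not updated after loading
class WarehouseRow:
    order_id: int
    customer: str
    amount: float     # transformed: dollars instead of cents
    order_month: str  # transformed: "YYYY-MM", convenient for grouping


def load_into_warehouse(transactions):
    """Copy transaction dicts into immutable, analysis-friendly rows."""
    rows = []
    for t in transactions:
        ts = datetime.fromisoformat(t["timestamp"])
        rows.append(
            WarehouseRow(
                order_id=t["order_id"],
                customer=t["customer"].strip().title(),  # normalize names
                amount=t["amount_cents"] / 100,
                order_month=ts.strftime("%Y-%m"),
            )
        )
    return tuple(rows)


# Hypothetical transaction-system data, for illustration only.
transactions = [
    {"order_id": 1, "customer": " alice ", "amount_cents": 1250,
     "timestamp": "2024-01-15T10:30:00"},
    {"order_id": 2, "customer": "BOB", "amount_cents": 800,
     "timestamp": "2024-02-03T09:00:00"},
]
warehouse = load_into_warehouse(transactions)

# A later change in the transaction system does not touch the warehouse copy.
transactions[0]["amount_cents"] = 0
print(warehouse[0])
```

Because the rows are copied and frozen at load time, the warehouse keeps what was recorded even after the source record changes, which matches the "not updated or changed later" point above.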
20Data Mart
- A Data Mart is a smaller, more focused Data Warehouse: a mini-warehouse.
- A Data Mart typically reflects the business rules of a specific business unit within an enterprise.
21Data Warehouse to Data Mart
[Diagram: a Data Warehouse with three outputs, each labeled "Decision Support Information"]
22Data Warehouse and Mart
- Set of Tables: 2 or more dimensions
- Designed for Aggregation
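As a rough illustration of "2 or more dimensions" and "designed for aggregation", the Python sketch below rolls up a small fact table along one or two dimensions. The rows, the dimension names (region, product) and the aggregate helper are assumptions made for this example, not part of the slides.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, sales_amount).
# "region" and "product" are the two dimensions; sales_amount is the measure.
facts = [
    ("West", "Widgets", 100),
    ("West", "Gadgets", 50),
    ("East", "Widgets", 70),
    ("West", "Widgets", 30),
]


def aggregate(rows, key_indexes):
    """Sum the last column of each row, grouped by the chosen dimension columns."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        totals[key] += row[-1]
    return dict(totals)


by_region_product = aggregate(facts, (0, 1))  # both dimensions
by_region = aggregate(facts, (0,))            # rolled up to one dimension

print(by_region_product)
print(by_region)
```

The same fact rows answer questions at different levels of detail simply by changing which dimensions are used as the grouping key.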
23Group Exercise 2
- Time Box: 15 minutes
- Teams of 4 or fewer
- Discuss Data Warehouse to Data Mart situations among yourselves and pick one to report to the class
- What to report (verbally, 5 minutes max)
- Describe the DW to Data Mart situation
- How does it help the enterprise's business unit?
- Presentations: another 15 to 30 minutes
24Topics to Discuss in Session 3
- Data Mining Flavors
- Data Mining Examples
- Data Mining Tasks
- Data Mining's Biggest Challenge
- What does all of this mean?
25Data Mining Flavors
- Directed: Attempts to explain or categorize some particular target field such as income or response.
- Undirected: Attempts to find patterns or similarities among groups of records without the use of a particular target field or collection of predefined classes.
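The contrast between the two flavors can be sketched in Python. Everything here is an assumption for illustration: the customer records, the field names, the single-threshold rule standing in for directed methods, and the simple leader-style grouping standing in for real clustering algorithms.

```python
from collections import Counter, defaultdict

# Hypothetical customer records; "response" plays the role of the target field.
records = [
    {"age": 23, "visits": 2, "response": "no"},
    {"age": 25, "visits": 3, "response": "no"},
    {"age": 47, "visits": 9, "response": "yes"},
    {"age": 52, "visits": 8, "response": "yes"},
]


def directed_rule(rows, target, split_field, threshold):
    """Directed: report the most common target value on each side of a split."""
    sides = defaultdict(Counter)
    for r in rows:
        side = "high" if r[split_field] >= threshold else "low"
        sides[side][r[target]] += 1
    return {side: counts.most_common(1)[0][0] for side, counts in sides.items()}


def undirected_groups(rows, fields, max_gap):
    """Undirected: group records close to a group's first member; no target used."""
    groups = []
    for r in rows:
        for g in groups:
            if all(abs(r[f] - g[0][f]) <= max_gap for f in fields):
                g.append(r)
                break
        else:
            groups.append([r])  # no close group found: start a new one
    return groups


rule = directed_rule(records, "response", "age", threshold=40)
groups = undirected_groups(records, ["age", "visits"], max_gap=6)

print(rule)
print([len(g) for g in groups])
```

The directed function cannot run without naming a target field, while the undirected one never looks at "response" at all; that difference is the point of the slide.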
26Data Mining Examples in Enterprises
For Illustration Purposes Only
- US Government
- FBI: track down criminals (Local Police also)
- Treasury Dept: suspicious international funds transfers
- Phone companies
- Supermarkets and Superstores (Vons, Albertsons, Wal-Mart, Costco)
- Mail-Order, On-Line Order (L.L. Bean, Victoria's Secret, Lands' End, Amazon!)
- Financial Institutions (BofA, Wells Fargo, Charles Schwab)
- Insurance Companies (USAA, Allstate, State Farm)
- Tons of others
27Data Mining Tasks
- Classification: for example, Fr, So, Jr, Sr, ND
- Estimation: for example, household income
- Prediction: for example, predict the average credit card balance transfer amount
- Affinity Grouping: for example, people who buy X often buy Y also, with probability Z
- Clustering: similar to classification but with no predefined classes
- Description and Profiling: behavior begets an explanation, such as "Men tend to prefer Burger King; women prefer Wendy's."
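The affinity grouping statement "people who buy X often buy Y also, with probability Z" can be made concrete with a short Python sketch. The baskets and the affinity helper are hypothetical; Z is estimated here as the fraction of baskets containing X that also contain Y (often called the confidence of the rule X -> Y).

```python
def affinity(baskets, x, y):
    """Return (share of baskets containing x, share of those that also contain y)."""
    with_x = [b for b in baskets if x in b]
    if not with_x:
        return 0.0, 0.0  # x never bought: nothing to estimate
    with_both = [b for b in with_x if y in b]
    support_x = len(with_x) / len(baskets)
    confidence = len(with_both) / len(with_x)
    return support_x, confidence


# Hypothetical market baskets, for illustration only.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "jam"},
    {"milk"},
]

support, z = affinity(baskets, "bread", "butter")
print(f"bread appears in {support:.0%} of baskets; "
      f"of those, {z:.0%} also contain butter")
```

With these four baskets, bread appears in three of them and butter accompanies it in two, so Z is estimated as 2/3.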
28Data Mining's Biggest Challenge
- The largest challenge a data miner may face is the sheer volume of data in the data warehouse.
- It is quite important, then, that summary data also be available to get the analysis started.
- A major problem is that this sheer volume may mask the important relationships the data miner is interested in.
- The ability to overcome the volume and be able to interpret the data is quite important.
29What Does All of This Mean?
- On a regular basis, farmers and explorers utilize their data warehouses to give guidance for and/or answer a limitless variety of questions.
- Nothing is free, however, and the costs may be heavy.
- The value of a data warehouse and subsequent data mining is a result of the new and changed business processes it enables, along with competitive advantage.
- There are limitations, though: a Data Warehouse cannot correct problems with its data, although it may help to more clearly identify them.
30End of Chapter 1