Title: What is Data Mining
1What is Data Mining?
- Presented by
- Shane Brown
- Erin Bader
- Susan Shanlever
Chapter 1 MIS 6473 Dr. Segall January 26, 2004
2Outline of Topics - Chapter 1
- Data Mining Defined
- Using Data Mining to Solve Specific Problems
- What Data Mining Is Not
- Avoiding the Oversell
- Practical Advice before You Begin
3Some General Facts
- Organizations
- Data Cannot Be Analyzed
- Information is undervalued or underutilized
- Large Volume in DB
- Benefits of Organization
- New Patterns or Trends
4Success in the Application of Data Mining
- Four Examples
- Improved Marketing Campaigns
- Improved Operational Procedures
- Identifying Fraud
- Examining Medical Records
6Examining Medical Records
- Important to the military
- 80% not from battlefield
- Used to predict medications needed
- Because of data mining, patterns were discovered:
- Chickenpox in ages 17-19
- Cancer from same recruiting location
7What Data Mining Is NOT!
- The focus of the data mining process is to
discover hidden patterns and trends.
- Once a pattern is identified, it can be described
as a known quantity.
- Once it is discovered, the data mining process is
finished.
- Analytic approaches that search data sets on the
basis of known patterns are not doing data mining.
8What Data Mining Is NOT!
- We do not regard techniques that require
implementation of rules, predefined training
examples, or automated supervised learning to be
data mining approaches.
- The techniques are still useful, but they are not
a part of data mining.
9Analysis Versus Monitoring
- The majority of data mining applications are
focused on analyzing information that has been
previously collected.
- Data are static and represent the state of the
world in some past interval of time.
- You can review the information at your own pace,
confirming the accuracy of the data and making
considered decisions about which patterns are
important.
10Analysis Versus Monitoring
- The data do not change while the analysis is
being performed, so they are reliable and
consistent.
- The time involved in the decision-making process
is not an issue.
11Analysis Versus Monitoring
- Monitoring often involves online pattern matching
operations in which incoming data are compared
against a set of conditions or boundaries.
- Monitoring often occurs in real time and involves
the processing of data that are continually being
updated.
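The comparison of incoming data against boundaries can be sketched in a few lines of Python. This is only an illustration: the `Reading` type, the `monitor` function, the source names, and the 0 to 40 boundary are all hypothetical and do not come from the chapter.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Reading:
    source: str   # hypothetical name of the data feed
    value: float  # the incoming measurement


def monitor(readings: Iterable[Reading], low: float, high: float) -> Iterator[Reading]:
    """Yield each incoming reading that falls outside the [low, high] boundary."""
    for reading in readings:
        if reading.value < low or reading.value > high:
            yield reading


# Made-up incoming data; a real monitor would read from a live feed.
incoming = [
    Reading("sensor-a", 12.0),
    Reading("sensor-b", 48.5),
    Reading("sensor-a", -3.0),
]
alerts = list(monitor(incoming, low=0.0, high=40.0))
for alert in alerts:
    print(f"out of bounds: {alert.source} = {alert.value}")
```

Note that the boundaries are fixed before any data arrive, so nothing new is discovered; the check only matches a known condition.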
12Analysis Versus Monitoring
- Predictive models and forecasters can be used to
help identify critical values, unusual behaviors,
and criteria data.
- These systems are not usually performing data
mining since they are not discovering new
patterns or classifications.
- True data mining is difficult, but not impossible,
in these types of environments.
13Monitoring Credit Card Transactions
- Credit card companies have elaborate systems to
curb the misuse of their services and identify
purchases that do not fit the client's profile.
- Companies must distinguish between good and bad
transactions.
- There are predefined patterns for bad
transactions, for example, gasoline purchases in
a series.
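A predefined pattern such as a series of gasoline purchases might be checked roughly as follows. This is a minimal sketch under stated assumptions: the one-hour window, the limit of two purchases, the tuple layout, and the card identifiers are invented for illustration and are not taken from any real card issuer's rules.

```python
from datetime import datetime, timedelta

# Hypothetical rule parameters; the slides do not give real thresholds.
WINDOW = timedelta(hours=1)
MAX_GAS_PURCHASES = 2


def flag_gasoline_series(transactions):
    """Return card ids with more than MAX_GAS_PURCHASES gasoline purchases inside WINDOW.

    transactions: iterable of (card_id, timestamp, category) tuples.
    """
    by_card = {}
    for card_id, timestamp, category in transactions:
        if category == "gasoline":
            by_card.setdefault(card_id, []).append(timestamp)

    flagged = set()
    for card_id, times in by_card.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Move the left edge forward until the window spans at most WINDOW.
            while times[end] - times[start] > WINDOW:
                start += 1
            if end - start + 1 > MAX_GAS_PURCHASES:
                flagged.add(card_id)
                break
    return flagged


# Made-up transactions for two cards.
sample = [
    ("card-A", datetime(2004, 1, 26, 10, 0), "gasoline"),
    ("card-A", datetime(2004, 1, 26, 10, 15), "gasoline"),
    ("card-A", datetime(2004, 1, 26, 10, 40), "gasoline"),
    ("card-B", datetime(2004, 1, 26, 9, 0), "gasoline"),
    ("card-B", datetime(2004, 1, 26, 12, 0), "grocery"),
    ("card-B", datetime(2004, 1, 26, 15, 0), "gasoline"),
]
flagged = flag_gasoline_series(sample)
print(flagged)
```

Because the rule is written in advance, this is monitoring in the sense of the preceding slides, not pattern discovery.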
14Monitoring Medical Billing Fraud
- CPT unbundling
- Each medical procedure has a five-digit code
associated with it.
- The problem occurs when doctors submit a claim
that breaks up one actual procedure into several
smaller procedures, thereby charging more.
- This constitutes insurance fraud and happens often.
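One way to picture an unbundling check is a lookup from a comprehensive procedure code to the component codes it already covers. The sketch below is an assumption-laden illustration: the `BUNDLES` table, the `find_unbundled` helper, and every five-digit value are placeholders, not real CPT codes or a real payer's edit rules.

```python
# Hypothetical bundle table: one comprehensive code mapped to the component
# codes it already covers. These values are placeholders, not real CPT codes.
BUNDLES = {
    "00001": frozenset({"00002", "00003", "00004"}),
}


def find_unbundled(claim_codes):
    """Return comprehensive codes whose components all appear separately on one claim."""
    billed = set(claim_codes)
    return sorted(
        bundle
        for bundle, parts in BUNDLES.items()
        if parts <= billed and bundle not in billed
    )


# Made-up claims for illustration.
claims = {
    "claim-1": ["00002", "00003", "00004"],  # every component billed separately
    "claim-2": ["00001"],                    # billed as the single procedure
    "claim-3": ["00002", "00009"],           # only part of the bundle appears
}

suspicious = {}
for claim_id, codes in claims.items():
    hits = find_unbundled(codes)
    if hits:
        suspicious[claim_id] = hits
print(suspicious)
```

A flagged claim is only a candidate for review; whether it is actually fraud needs a human decision.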
15Marketing with Coupons
- Companies compile lists of grocery store items
that consumers are likely to purchase together:
if they buy one, they are likely to buy the other.
- When you check out and a coupon is printed, it
usually matches something that was in your cart.
- This also relates to the placement of the items
within the store (next to each other).
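How such item lists could be compiled can be sketched by counting which items show up in the same cart. The baskets below are invented, and `confidence` is a hypothetical helper name; the slides do not say which method retailers actually use, so treat this as one simple co-occurrence count and nothing more.

```python
from collections import Counter
from itertools import combinations

# Made-up shopping carts for illustration.
baskets = [
    {"chips", "salsa", "soda"},
    {"chips", "salsa"},
    {"bread", "butter"},
    {"chips", "soda"},
    {"bread", "butter", "jam"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always stored in the same order.
    pair_counts.update(combinations(sorted(basket), 2))


def confidence(bought, also):
    """Fraction of baskets containing `bought` that also contain `also`."""
    pair = tuple(sorted((bought, also)))
    return pair_counts[pair] / item_counts[bought]


for (first, second), count in pair_counts.most_common(3):
    print(first, second, count)
```

A coupon printer could then pick, for an item in the cart, the partner item with the highest `confidence` value.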
16Avoiding the Oversell
- Data mining services were projected to reach
$20 billion by the year 2000.
- Data mining is interactive discovery.
- Data mining is unique and challenging.
- It is not a silver bullet solution for all your
questions.
- The approach must be constantly refined.
17Practical Advice Before You Begin
- The field of data mining shows exceptional
promise in terms of its potential contributions
to a host of analytical applications.
- Susan will offer some cautionary words of advice
on some real-world issues that can limit the
utility of data mining engagements unless
addressed directly.
18Practical Advice Before You Begin
- Justifying the Data Mining Investment
- Expand Marketing Campaigns
- Reduce Fraud
- Improve Profits
- Based on our experiences, companies usually look
for the investment made in data mining to be
about 15-20 percent of the value of estimated
losses or expected improvements (p. 17).
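The 15-20 percent rule of thumb is simple arithmetic, sketched here for concreteness. The function name and the $1,000,000 figure are hypothetical; only the 15 and 20 percent bounds come from the slide.

```python
def mining_budget_range(estimated_value, low_percent=15, high_percent=20):
    """Return the (low, high) investment range for a given estimated loss or improvement."""
    low = estimated_value * low_percent / 100
    high = estimated_value * high_percent / 100
    return low, high


# Hypothetical example: $1,000,000 in estimated annual fraud losses.
low, high = mining_budget_range(1_000_000)
print(f"Expected data mining investment: ${low:,.0f} to ${high:,.0f}")
```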
19Practical Advice Before You Begin
- Working Efficiently: Timeliness Is a Virtue
- Results in Days or Weeks
- A Barrier to Quick Results: Lack of Access to
Data Sets
- If there are no interesting patterns found:
- Poor Selection of the Data Extracted or Analysis
- Poor Quality Control in the Original Collection
Process
20Practical Advice Before You Begin
- Establishing the Limitations of Your Data
Resources
- Access Available Data Sources
- Accurate
- Well-Coded
- Properly Maintained
- Data Does Not Need to Be Online or Interactive
- Obtain Permission to Use the Data
21Practical Advice Before You Begin
- Defining the Problem Up Front
- Find What Is of Interest or Importance
- Do the Analysis in Stages
- Avoid Promising Too Much
22Practical Advice Before You Begin
- Knowing Your Target Audience
- Keep Your Target Audience in Mind
- Degree of Detail Will Change for Different Target
Audiences
23Practical Advice Before You Begin
- Anticipating and Overcoming Institutional Inertia
- Understand It May Be Difficult for an Organization
to Act on the Results of Data Mining Analysis
- In making the decision to use data mining,
you should give consideration to the types of
data available for analysis and the types of
outcomes that will be most useful within the
context of the particular application area (p. 24).
24Questions?
25Shane's Question
- What are some example areas in which data mining
has been applied successfully? (pp. 7-12, WB)
26Erin's Question
- What are the major differences between analysis
and monitoring?
27Susan's Question
- When establishing limitations for data resources,
what are the two most important things to know?