Part I Data Mining Fundamentals Chapter 1 Data Mining: A First View

1
Part I Data Mining Fundamentals
Chapter 1 Data Mining: A First View
Jason C. H. Chen, Ph.D., Professor of MIS
School of Business Administration, Gonzaga University
Spokane, WA 99223
chen@jepson.gonzaga.edu
2
1.1 Data Mining: A Definition
3
1.1 Data Mining: A Definition
  • The process of employing one or more computer
    learning techniques to automatically analyze and
    extract knowledge from data.

4
Induction-based Learning
  • The process of forming general concept
    definitions by observing specific examples of
    concepts to be learned.

Knowledge Discovery in Databases (KDD)
  • The application of the scientific method to data
    mining. Data mining is one step of the KDD
    process.

5
Data Mining Examples
  • A telephone company used a data mining tool to
    analyze its customer data warehouse. The tool
    found about 10,000 supposedly residential
    customers who were spending over $1,000 per
    month on phone bills.
  • After further study, the company discovered that
    these customers were really small business
    owners trying to avoid paying business rates.

6
Other Data Mining Examples
  • 65% of customers who did not use the credit card
    in the last six months are 88% likely to cancel
    their accounts.
  • If age < 30 and income < 25,000 and credit
    rating < 3 and credit amount > 25,000 then the
    minimum loan term is 10 years.
  • 82% of customers who bought a new TV 27" or
    larger are 90% likely to buy an entertainment
    center within the next 4 weeks.

7
1.2 What Can Computers Learn?
8
Four Levels of Learning
  • Fact: a simple statement of truth.
  • Concept: a set of objects, symbols, or events
    grouped together because they share certain
    characteristics.
  • Procedure: a step-by-step course of action to
    achieve a goal. We use procedures in our everyday
    functioning as well as in the solution of
    difficult problems.
  • Principle: the highest level of learning.
    Principles are general truths or laws that are
    basic to other truths.

Source: Merrill and Tennyson, 1977 (p. 5 of the text)
9
Concepts
  • Computers are good at learning concepts. Concepts
    are the output of a data mining session.

Three Concept Views
  • Classical View
  • Probabilistic View
  • Exemplar View

10
Three Concept Views
  • Classical View: holds that all concepts have
    definite defining properties.
  • Probabilistic View: concepts are represented by
    properties that are probable of concept members.
  • Exemplar View: a given instance is judged to be
    an example of a particular concept if it is
    similar enough to a set of one or more known
    examples of the concept.
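
The exemplar view can be sketched as a similarity test against stored examples. The sketch below is only an illustration: the attribute names, the exemplars, the matching-attributes similarity measure, and the 0.75 threshold are all assumptions, not taken from the slides.

```python
def similarity(a, b):
    """Fraction of shared attributes on which two instances agree (0.0 to 1.0)."""
    shared = [key for key in a if key in b]
    if not shared:
        return 0.0
    matches = sum(1 for key in shared if a[key] == b[key])
    return matches / len(shared)


def is_example_of(instance, exemplars, threshold=0.75):
    """Exemplar view: accept the instance if it is similar enough
    to at least one known example of the concept."""
    return any(similarity(instance, ex) >= threshold for ex in exemplars)


# Hypothetical known examples of the concept "good credit risk".
good_credit_exemplars = [
    {"owns_home": True, "employed": True, "late_payments": False, "has_savings": True},
    {"owns_home": False, "employed": True, "late_payments": False, "has_savings": True},
]

# Agrees with the first exemplar on 3 of 4 attributes.
applicant = {"owns_home": True, "employed": True, "late_payments": False, "has_savings": False}
# Agrees with either exemplar on at most 1 of 4 attributes.
outlier = {"owns_home": False, "employed": False, "late_payments": True, "has_savings": False}

print(is_example_of(applicant, good_credit_exemplars))
print(is_example_of(outlier, good_credit_exemplars))
```

Note that no explicit definition of the concept is ever written down; membership depends entirely on the stored examples and the chosen threshold.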

11
Figure: A hierarchy of data mining strategies. Labels recovered from the figure: no output attributes; categorical/discrete (current behavior); numeric; future outcome (categorical/numeric).
12
Supervised Learning
Supervised learning is the process of building
classification models using data instances of
known origin.
  • Two purposes
  • 1. Build a learner (classification) model using
    data instances of known origin. This is an
    induction process.
  • 2. Use the model to determine the outcome of new
    instances of unknown origin. This is a deduction
    process.
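
The two purposes can be sketched with a very small learner. This is a minimal illustration, not the method used in the text: the learner simply remembers the most common outcome for each value of one attribute, and the training instances, attribute names, and outcomes are hypothetical.

```python
from collections import Counter, defaultdict


def build_model(instances, attribute, target):
    """Induction: from instances of known origin, learn the most common
    target value for each value of the chosen attribute."""
    counts = defaultdict(Counter)
    for row in instances:
        counts[row[attribute]][row[target]] += 1
    return {value: tally.most_common(1)[0][0] for value, tally in counts.items()}


def classify(model, instance, attribute, default=None):
    """Deduction: apply the learned model to an instance of unknown origin."""
    return model.get(instance[attribute], default)


# Hypothetical training instances whose outcome is already known.
training = [
    {"used_card_recently": "no", "outcome": "cancelled"},
    {"used_card_recently": "no", "outcome": "cancelled"},
    {"used_card_recently": "no", "outcome": "kept"},
    {"used_card_recently": "yes", "outcome": "kept"},
    {"used_card_recently": "yes", "outcome": "kept"},
]

model = build_model(training, "used_card_recently", "outcome")
new_customer = {"used_card_recently": "no"}  # outcome not yet known
print(classify(model, new_customer, "used_card_recently"))
```

Building the model generalizes from specific examples; applying it reasons from that generalization to a specific new case.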

13
  • Supervised Learning: A Decision Tree Example

14
Decision Tree
  • A tree structure where non-terminal nodes
    represent tests on one or more attributes and
    terminal nodes reflect decision outcomes.

Table 1.1 Hypothetical Training Data for
Disease Diagnosis
15
Figure 1.1 A decision tree for the data in
Table 1.1
16
Table 1.1 Hypothetical Training Data for
Disease Diagnosis
17
Production Rules
We can translate any decision tree into a set of
production rules. They are rules of the form IF
<antecedent conditions> THEN <consequent
conditions>.
  • IF Swollen Glands = Yes
    THEN Diagnosis = Strep Throat
  • IF Swollen Glands = No & Fever = Yes
    THEN Diagnosis = Cold
  • IF Swollen Glands = No & Fever = No
    THEN Diagnosis = Allergy
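
The translation from tree to rules can be sketched as a walk from the root to each leaf, collecting the tests along the way. The tree below is inferred from the three rules above, since Figure 1.1 itself is not in the transcript; the nested-tuple representation and the function name are assumptions made for illustration.

```python
# Internal node: (attribute, {attribute value: subtree}); leaf: a diagnosis string.
tree = ("Swollen Glands", {
    "Yes": "Strep Throat",
    "No": ("Fever", {
        "Yes": "Cold",
        "No": "Allergy",
    }),
})


def tree_to_rules(node, conditions=()):
    """Return one production rule per root-to-leaf path."""
    if isinstance(node, str):  # leaf: emit the accumulated tests as a rule
        antecedent = " & ".join(f"{attr} = {val}" for attr, val in conditions)
        return [f"IF {antecedent} THEN Diagnosis = {node}"]
    attribute, branches = node
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, conditions + ((attribute, value),)))
    return rules


for rule in tree_to_rules(tree):
    print(rule)
```

Each terminal node yields exactly one rule, so a tree with three leaves produces three rules.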

18
Unsupervised Clustering
  • A data mining method that builds models from data
    without predefined classes (see Table 1.3).
  • Data instances are grouped together based on a
    similarity scheme defined by the clustering
    system.
  • With the help of one or several evaluation
    techniques, it is up to us to decide the meaning
    of the formed clusters.
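
As a rough illustration of grouping by similarity, the sketch below uses k-means, one common clustering technique; the slides do not name a specific algorithm. The (age, trades per month) values are hypothetical and are not the contents of Table 1.3.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, move each centroid to the
    mean of its group, and repeat a fixed number of times."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [math.dist(point, c) for c in centroids]
            clusters[distances.index(min(distances))].append(point)
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(axis) / len(cluster) for axis in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep a centroid that attracted no points
        centroids = new_centroids
    return clusters, centroids


# Hypothetical investors described by (age, trades per month); no class labels.
investors = [(25, 12), (27, 10), (30, 11), (55, 2), (60, 1), (58, 3)]
clusters, centroids = kmeans(investors, centroids=[investors[0], investors[-1]])

for cluster, centroid in zip(clusters, centroids):
    print(centroid, cluster)
```

The algorithm only produces the groups. Deciding that one cluster means "young, active traders" and the other "older, infrequent traders" is the interpretation step left to the analyst.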

19
Table 1.3 Acme Investors Incorporated
20
Possible Questions
Questions for supervised learning
  • 1. Can I develop a general profile of an online
    investor? If so, what characteristics distinguish
    online investors from investors that use a
    broker?
  • 2. Can I determine if a new customer who does not
    initially open a margin account is likely to do
    so in the future?
  • 3. Can I build a model able to accurately predict
    the average number of trades per month for a new
    investor?
  • 4. What characteristics differentiate female and
    male investors?

Questions for unsupervised learning
  • 1. What attribute similarities group customers of
    Acme Investors together?
  • 2. What differences in attribute values segment
    the customer database?

21
1.3 Is Data Mining Appropriate for My Problem?
22
Data Mining or Data Query?
  • Shallow Knowledge: factual in nature. Tools
    used: DBMS/SQL.
  • Multidimensional Knowledge: also factual. Tools
    used: OLAP.
  • Hidden Knowledge: patterns or regularities in
    data that cannot be easily found. Tools used:
    data mining.
  • Deep Knowledge: knowledge stored in a database
    that can only be found if we are given some
    direction.

23
Data Mining vs. Data Query: An Example
  • Use data query if you already know, or almost
    know, what you are looking for.
  • Use data mining to find regularities in data
    that are not obvious.
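
A data query works when the question can already be stated precisely. The sketch below uses Python's built-in sqlite3 module with a hypothetical customers table (the table, column names, and rows are invented for illustration) to ask a question in the spirit of the earlier telephone company example: which residential customers have monthly bills over 1,000?

```python
import sqlite3

# In-memory database with a hypothetical customers table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, plan TEXT, monthly_bill REAL)"
)
conn.executemany(
    "INSERT INTO customers (plan, monthly_bill) VALUES (?, ?)",
    [("residential", 45.0), ("residential", 1250.0), ("business", 1800.0), ("residential", 60.0)],
)

# The analyst already knows the pattern of interest, so a query is enough.
rows = conn.execute(
    "SELECT id, monthly_bill FROM customers "
    "WHERE plan = ? AND monthly_bill > ? ORDER BY id",
    ("residential", 1000),
).fetchall()
conn.close()

print(rows)
```

The query only returns what it was asked for. Noticing that such a group exists at all, without knowing in advance which attributes and thresholds matter, is the kind of task data mining is meant for.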

24
1.4 Expert Systems or Data Mining?
25
Expert System and Knowledge Engineer
  • An expert system is a computer program that
    emulates the problem-solving skills of one or
    more human experts.
  • A knowledge engineer is a person trained to
    interact with an expert in order to capture the
    expert's knowledge.

27
1.5 A Simple Data Mining Process Model
28
Figure 1.3 - A simple data mining process model
29
Characteristics of a Data Warehouse
  • Data Warehouse
  • Definition: a subject-oriented, integrated,
    time-variant, non-updatable collection of data
    used in support of management decision-making
    processes
  • Subject-oriented: e.g., customers, patients,
    students, products
  • Integrated: consistent naming conventions,
    formats, and encoding structures across multiple
    data sources
  • Time-variant: can study trends and changes
  • Non-updatable: read-only, periodically refreshed
  • Data Mart
  • A data warehouse that is limited in scope

30
A four-step process for performing a data mining
session
  • 1. Assembling the data
  • Operational database (relational databases and
    flat files) vs. data warehouse
  • 2. Mining the data (giving the data to a mining
    tool)
  • Instances are used either for building the model
    or for testing it
  • 3. Interpreting the results
  • 4. Applying the results
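
Step 2 separates the instances used to build the model from those used to test it. A minimal sketch of such a split follows; the function name, the 30% test fraction, and the fixed random seed are illustrative assumptions, not values from the text.

```python
import random


def split_instances(instances, test_fraction=0.3, seed=0):
    """Shuffle the instances reproducibly, then set aside a fraction for
    testing and keep the rest for building the model."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for ten assembled data instances (step 1).
instances = list(range(10))
build_set, test_set = split_instances(instances)

print(len(build_set), "instances for building the model")
print(len(test_set), "instances for testing the model")
```

Keeping the test instances out of model building gives a fairer picture of how the model behaves on data it has not seen, which supports the interpretation in step 3.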

31
1.7 Data Mining Applications (p.24)
  • Fraud Detection
  • Health care
  • Business and finance
  • Scientific applications
  • Sports and gaming

32
Customer Intrinsic Value