Data Mining Techniques: Classification - PowerPoint PPT Presentation

Provided by: leeyu2

Transcript and Presenter's Notes


1
Data Mining Techniques: Classification
2
Classification
  • What is Classification?
  • Classifying tuples in a database
  • In training set E
  • each tuple consists of the same set of multiple
    attributes as the tuples in the large database W
  • additionally, each tuple has a known class
    identity
  • Derive the classification mechanism from the
    training set E, and then use this mechanism to
    classify general data (in W)

3
Learning Phase
  • Learning
  • The class label attribute is credit_rating
  • Training data are analyzed by a classification
    algorithm
  • The classifier is represented in the form of
    classification rules

4
Testing Phase
  • Testing (Classification)
  • Test data are used to estimate the accuracy of
    the classification rules
  • If the accuracy is considered acceptable, the
    rules can be applied to the classification of new
    data tuples

5
Classification by Decision Tree
A top-down decision tree generation algorithm
ID-3 and its extended version C4.5 (Quinlan93)
J.R. Quinlan, C4.5 Programs for Machine Learning,
Morgan Kaufmann, 1993
6
Decision Tree Generation
  • At start, all the training examples are at the
    root
  • Partition examples recursively based on selected
    attributes
  • Attribute Selection
  • Favoring the partitioning which makes the
    majority of examples belong to a single class
  • Tree Pruning (Overfitting Problem)
  • Aiming at removing tree branches that may lead to
    errors when classifying test data
  • Training data may contain noise,

7
Another Example
ID  Eye    Hair   Height  Oriental
1   Black  Black  Short   Yes
2   Black  White  Tall    Yes
3   Black  White  Short   Yes
4   Black  Black  Tall    Yes
5   Brown  Black  Tall    Yes
6   Brown  White  Short   Yes
7   Blue   Gold   Tall    No
8   Blue   Gold   Short   No
9   Blue   White  Tall    No
10  Blue   Black  Short   No
11  Brown  Gold   Short   No
8
Decision Tree
9
Decision Tree
10
Decision Tree Generation
  • Attribute Selection (Split Criterion)
  • Information Gain (ID3/C4.5/See5)
  • Gini Index (CART/IBM Intelligent Miner)
  • Inference Power
  • These measures are also called goodness functions
    and used to select the attribute to split at a
    tree node during the tree generation phase

11
Decision Tree Generation
  • Branching Scheme
  • Determining the tree branch to which a sample
    belongs
  • Binary vs. K-ary Splitting
  • When to stop the further splitting of a node
  • Impurity Measure
  • Labeling Rule
  • A node is labeled with the class to which most
    samples at the node belong

12
Decision Tree Generation Algorithm ID3
ID: Iterative Dichotomiser
Entropy (7.1)
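
Slide 12 names entropy as the quantity behind ID3's split criterion, but the formula itself did not survive in the transcript. The sketch below assumes the standard definitions, entropy H(S) = -Σ pj log2(pj) and gain(A) = H(S) minus the size-weighted entropy of the partitions induced by A, and applies them to the 11-row table from slide 7. The function and variable names are illustrative and do not come from the slides.

```python
import math
from collections import Counter

# Rows from the slide 7 table: (Eye, Hair, Height, Oriental)
ROWS = [
    ("Black", "Black", "Short", "Yes"),
    ("Black", "White", "Tall", "Yes"),
    ("Black", "White", "Short", "Yes"),
    ("Black", "Black", "Tall", "Yes"),
    ("Brown", "Black", "Tall", "Yes"),
    ("Brown", "White", "Short", "Yes"),
    ("Blue", "Gold", "Tall", "No"),
    ("Blue", "Gold", "Short", "No"),
    ("Blue", "White", "Tall", "No"),
    ("Blue", "Black", "Short", "No"),
    ("Brown", "Gold", "Short", "No"),
]
ATTRIBUTES = {"Eye": 0, "Hair": 1, "Height": 2}  # column positions
CLASS_INDEX = 3


def entropy(rows):
    """Entropy (in bits) of the class labels in rows."""
    counts = Counter(row[CLASS_INDEX] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, attr_index):
    """Entropy reduction obtained by partitioning rows on one attribute."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr_index], []).append(row)
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy(rows) - remainder


gains = {name: information_gain(ROWS, idx) for name, idx in ATTRIBUTES.items()}
best_attribute = max(gains, key=gains.get)
print(gains)
print("Split on:", best_attribute)
```

On this table Eye has the largest gain, because Black and Blue each isolate a single class, so it would be chosen for the root split.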
13
Decision Tree Algorithm ID3
14
Decision Tree Algorithm ID3
15
Decision Tree Algorithm ID3
16
Decision Tree Algorithm ID3
17
Decision Tree Algorithm ID3
18
Exercise 2
19
Decision Tree Generation Algorithm ID3
20
Decision Tree Generation Algorithm ID3
21
Decision Tree Generation Algorithm ID3
22
How to Use a Tree
  • Directly
  • Test the attribute values of an unknown sample
    against the tree.
  • A path is traced from the root to a leaf, which
    holds the label
  • Indirectly
  • The decision tree is converted to classification
    rules
  • One rule is created for each path from the root
    to a leaf
  • IF-THEN rules are easier for humans to understand
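
Both uses can be sketched on a small tree. The nested-dictionary tree below is an assumption made for illustration (it is consistent with the slide 7 table but is not taken from the slides), and `classify` and `tree_to_rules` are hypothetical helper names.

```python
# Assumed tree: an inner node is {attribute: {value: subtree}}, a leaf is a label.
TREE = {
    "Eye": {
        "Black": "Yes",
        "Blue": "No",
        "Brown": {"Hair": {"Black": "Yes", "White": "Yes", "Gold": "No"}},
    }
}


def classify(tree, sample):
    """Direct use: follow one path from the root to a leaf."""
    node = tree
    while isinstance(node, dict):
        attribute = next(iter(node))  # the attribute tested at this node
        node = node[attribute][sample[attribute]]
    return node


def tree_to_rules(tree, conditions=()):
    """Indirect use: emit one (conditions, label) rule per root-to-leaf path."""
    if not isinstance(tree, dict):
        return [(list(conditions), tree)]
    attribute = next(iter(tree))
    rules = []
    for value, subtree in tree[attribute].items():
        rules.extend(tree_to_rules(subtree, conditions + ((attribute, value),)))
    return rules


rules = tree_to_rules(TREE)
for conds, label in rules:
    text = " AND ".join(f"{attr} = {val}" for attr, val in conds)
    print(f"IF {text} THEN Oriental = {label}")

sample = {"Eye": "Brown", "Hair": "Gold", "Height": "Short"}
print(classify(TREE, sample))
```

The number of rules equals the number of leaves, which is why large trees are often simplified before being presented as rules.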

23
Generating Classification Rules
24
Generating Classification Rules
25
Generating Classification Rules
  • Four decision rules are generated by the tree
  • IF watch the game AND home team wins AND out
    with friends THEN beer
  • IF watch the game AND home team wins AND sitting
    at home THEN diet soda
  • IF watch the game AND home team loses AND out
    with friends THEN beer
  • IF watch the game AND home team loses AND sitting
    at home THEN milk
  • Optimization of these rules
  • IF watch the game AND out with friends THEN beer
  • IF watch the game AND home team wins AND sitting
    at home THEN diet soda
  • IF watch the game AND home team loses AND sitting
    at home THEN milk
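
The optimization can be checked mechanically: the two "out with friends" rules give the same drink whether the home team wins or loses, so that condition can be dropped. The sketch below encodes both rule sets and confirms they agree on every case. The dictionary encoding and the key names `home_team` and `location` are assumptions for illustration.

```python
from itertools import product

# Every rule also requires "watch the game"; it is common to all rules
# and is left out of the encoding.
ORIGINAL_RULES = [
    ({"home_team": "wins", "location": "out with friends"}, "beer"),
    ({"home_team": "wins", "location": "sitting at home"}, "diet soda"),
    ({"home_team": "loses", "location": "out with friends"}, "beer"),
    ({"home_team": "loses", "location": "sitting at home"}, "milk"),
]
OPTIMIZED_RULES = [
    ({"location": "out with friends"}, "beer"),
    ({"home_team": "wins", "location": "sitting at home"}, "diet soda"),
    ({"home_team": "loses", "location": "sitting at home"}, "milk"),
]


def apply_rules(rules, case):
    """Return the conclusion of the first rule whose conditions all hold."""
    for conditions, drink in rules:
        if all(case[key] == value for key, value in conditions.items()):
            return drink
    return None


cases = [
    {"home_team": team, "location": place}
    for team, place in product(
        ["wins", "loses"], ["out with friends", "sitting at home"]
    )
]
equivalent = all(
    apply_rules(ORIGINAL_RULES, c) == apply_rules(OPTIMIZED_RULES, c) for c in cases
)
print("Rule sets agree on all cases:", equivalent)
```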

26
Decision Tree Generation Algorithm ID3
  • All attributes are assumed to be categorical
    (discretized)
  • Can be modified for continuous-valued attributes
  • Dynamically define new discrete-valued attributes
    that partition the continuous attribute value
    into a discrete set of intervals
  • A ≥ V, A < V
  • Prefers attributes with many values
  • Cannot handle missing attribute values
  • Attribute dependencies are not considered in this
    algorithm
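
One common way to partition a continuous attribute is to sort its values, try the midpoint between each pair of neighbouring values as the cut point V, and keep the cut with the lowest size-weighted entropy. The sketch below assumes that approach; the slides do not state which procedure they use, and the price and action data are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (cut point V, weighted entropy) for the best split A < V / A >= V."""
    pairs = sorted(zip(values, labels))
    best_cut, best_score = None, float("inf")
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue  # no cut point exists between equal values
        cut = (v1 + v2) / 2
        left = [lab for val, lab in pairs if val < cut]
        right = [lab for val, lab in pairs if val >= cut]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score


# Hypothetical data, not taken from the slides
prices = [17.5, 17.6, 17.9, 18.1, 18.3, 18.4]
actions = ["Sell", "Sell", "Sell", "Buy", "Buy", "Buy"]

threshold, score = best_threshold(prices, actions)
print(f"Best cut: A < {threshold} vs A >= {threshold}")
```

Once V is chosen, the test A ≥ V behaves like an ordinary two-valued categorical attribute, so the rest of the tree-building procedure is unchanged.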

27
Attribute Selection in C4.5
28
Handling Continuous Attributes
29
Handling Continuous Attributes
30
Handling Continuous Attributes
31
Handling Continuous Attributes
[Figure: decision tree built from three cuts]
Root
First Cut: Price On Date T1 > 18.02 / Price On Date T1 < 18.02
Second Cut: Price On Date T > 17.84 / Price On Date T < 17.84
Third Cut: Price On Date T1 > 17.70 / Price On Date T1 < 17.70
Leaf labels: Buy, Sell, Buy, Sell
32
Exercise 3
ID  Location  Type      Miles  SF    CM   Home Price (K)
1   Urban     Detached  2      2000  50   High
2   Rural     Detached  9      2000  5    Low
3   Urban     Attached  3      1500  150  High
4   Urban     Detached  15     2500  250  High
5   Rural     Detached  30     3000  1    Low
6   Rural     Detached  3      2500  10   Medium
7   Rural     Detached  20     1800  5    Medium
8   Urban     Attached  5      1800  50   High
9   Rural     Detached  30     3000  1    Low
10  Urban     Attached  25     1200  100  Medium
SF: Square Feet
CM: No. of Homes in Community
33
Unknown Attribute Values in C4.5
Training
Testing
34
Unknown Attribute Values: Adjustment of
Attribute Selection Measure
35
Fill in Approach
36
Probability Approach
37
Probability Approach
38
Unknown Attribute Values: Partitioning
the Training Set
39
Probability Approach
40
Unknown Attribute Values: Classifying an Unseen
Case
41
Probability Approach
42
Evaluation: Coincidence Matrix
Cost: 190 (closing good account), 10 (keeping bad account open)
Decision Tree Model
Accuracy = (36 + 632) / 718 = 93.0%
Precision for Insolvent = 36/58 = 62.07%
Recall for Insolvent = 36/64 = 56.25%
F Measure = 2 × Precision × Recall / (Precision + Recall)
          = 2 × 0.6207 × 0.5625 / (0.6207 + 0.5625)
          = 0.6983 / 1.1832 = 0.59
Cost = 190 × 22 + 10 × 28 = 4,460
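
The figures on this slide can be recomputed from the four cells of the coincidence matrix. The cell counts below are inferred from the slide's own fractions (36/58, 36/64, and (36 + 632)/718) with Insolvent as the positive class; the variable names are illustrative.

```python
# Cells inferred from the slide, with "Insolvent" as the positive class.
tp = 36        # insolvent accounts predicted insolvent
fp = 58 - tp   # good accounts predicted insolvent (would be closed): 22
fn = 64 - tp   # insolvent accounts predicted good (kept open): 28
tn = 632       # good accounts predicted good

total = tp + fp + fn + tn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)

# Misclassification costs stated on the slide
COST_CLOSING_GOOD = 190
COST_KEEPING_BAD = 10
cost = COST_CLOSING_GOOD * fp + COST_KEEPING_BAD * fn

print(f"accuracy  = {accuracy:.3f}")
print(f"precision = {precision:.4f}")
print(f"recall    = {recall:.4f}")
print(f"F measure = {f_measure:.2f}")
print(f"cost      = {cost}")
```

Accuracy is high because good accounts dominate the data, while the precision and recall for Insolvent show that the minority class is predicted much less reliably.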
43
Decision Tree Generation Algorithm Gini Index
  • If a data set S contains examples from n classes,
    the gini index, gini(S), is defined as
    $\mathrm{gini}(S) = 1 - \sum_{j=1}^{n} p_j^2$,
    where $p_j$ is the relative frequency of class
    $C_j$ in S.
  • If a data set S is split into two subsets S1 and
    S2 with sizes N1 and N2 respectively, the gini
    index of the split, ginisplit(S), is defined as
    $\mathrm{gini}_{split}(S) = \frac{N_1}{N}\,\mathrm{gini}(S_1) + \frac{N_2}{N}\,\mathrm{gini}(S_2)$,
    where $N = N_1 + N_2$.

44
Decision Tree Generation Algorithm Gini Index
  • The attribute that provides the smallest
    ginisplit(S) is chosen to split the node
  • The computation cost of the gini index is less
    than that of information gain
  • IBM Intelligent Miner uses binary splits for all
    attributes
  • A ≥ V, A < V
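
A minimal sketch of the split selection, assuming the standard definitions gini(S) = 1 - Σ pj² and ginisplit(S) = (N1/N) gini(S1) + (N2/N) gini(S2). It compares two candidate binary splits on the Eye attribute of the slide 7 table; the function names and the choice of candidate splits are illustrative.

```python
from collections import Counter

# (Eye, Oriental) pairs from the slide 7 table
ROWS = [
    ("Black", "Yes"), ("Black", "Yes"), ("Black", "Yes"), ("Black", "Yes"),
    ("Brown", "Yes"), ("Brown", "Yes"),
    ("Blue", "No"), ("Blue", "No"), ("Blue", "No"), ("Blue", "No"),
    ("Brown", "No"),
]


def gini(labels):
    """Gini index of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def gini_split(left, right):
    """Size-weighted gini index of a binary split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def split_on_eye(rows, value):
    """Binary split: rows with Eye == value versus all other rows."""
    left = [label for eye, label in rows if eye == value]
    right = [label for eye, label in rows if eye != value]
    return gini_split(left, right)


split_blue = split_on_eye(ROWS, "Blue")
split_black = split_on_eye(ROWS, "Black")
print(f"ginisplit for Eye = Blue:  {split_blue:.4f}")
print(f"ginisplit for Eye = Black: {split_black:.4f}")
```

Of these two candidates, the Eye = Blue split has the smaller ginisplit and would be preferred.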

45
Decision Tree Generation Algorithm Inference
Power
  • A feature that is useful in inferring the group
    identity of a data tuple is said to have a good
    inference power to that group identity.
  • In Table 1, given attributes (features) Gender,
    Beverage, State, try to find their inference
    power to Group id

46
(No Transcript)
47
(No Transcript)
48
Naive Bayesian Classification
  • Each data sample is an n-dimensional feature
    vector
  • X = (x1, x2, ..., xn) for attributes A1, A2, ...,
    An
  • Suppose there are m classes
  • C = {C1, C2, ..., Cm}
  • The classifier assigns X to the class Ci that has
    the highest posterior probability, conditioned
    on X
  • X belongs to Ci iff P(Ci|X) > P(Cj|X) for all
    1 ≤ j ≤ m, j ≠ i

49
Naive Bayesian Classification
  • P(Ci|X) = P(X|Ci) P(Ci) / P(X)
  • P(Ci|X) = P(Ci∩X) / P(X) and P(X|Ci) = P(Ci∩X) /
    P(Ci) => P(Ci|X) P(X) = P(X|Ci) P(Ci)
  • P(Ci) = si / s
  • si is the number of training samples of class Ci
  • s is the total number of training samples
  • Assumption: the attributes are independent of
    each other given the class
  • P(X|Ci) = P(x1|Ci) × P(x2|Ci) × P(x3|Ci) × ... ×
    P(xn|Ci)
  • P(X) can be ignored, since it is the same for
    every class

50
Naive Bayesian Classification
  • Classify X = (age ≤ 30, income = medium,
    student = yes, credit-rating = fair)
  • P(buys_computer = yes) = 9/14
  • P(buys_computer = no) = 5/14
  • P(age ≤ 30 | buys_computer = yes) = 2/9
  • P(age ≤ 30 | buys_computer = no) = 3/5
  • P(income = medium | buys_computer = yes) = 4/9
  • P(income = medium | buys_computer = no) = 2/5
  • P(student = yes | buys_computer = yes) = 6/9
  • P(student = yes | buys_computer = no) = 1/5
  • P(credit-rating = fair | buys_computer = yes) = 6/9
  • P(credit-rating = fair | buys_computer = no) = 2/5
  • P(X | buys_computer = yes) = 0.044
  • P(X | buys_computer = no) = 0.019
  • P(buys_computer = yes | X) ∝ P(X | buys_computer = yes)
    P(buys_computer = yes) = 0.028
  • P(buys_computer = no | X) ∝ P(X | buys_computer = no)
    P(buys_computer = no) = 0.007
  • Since 0.028 > 0.007, X is classified as
    buys_computer = yes
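
The arithmetic on this slide can be reproduced with exact fractions. The sketch below takes the priors and conditional probabilities from the slide as given (the underlying 14-sample training table is not in the transcript) and multiplies them under the independence assumption; the variable names are illustrative.

```python
from fractions import Fraction
from math import prod

# Priors P(buys_computer = c), as listed on the slide
priors = {"yes": Fraction(9, 14), "no": Fraction(5, 14)}

# Conditionals P(x_k | buys_computer = c) for the four attribute values of X,
# in the order: age, income, student, credit-rating
conditionals = {
    "yes": [Fraction(2, 9), Fraction(4, 9), Fraction(6, 9), Fraction(6, 9)],
    "no": [Fraction(3, 5), Fraction(2, 5), Fraction(1, 5), Fraction(2, 5)],
}

# Independence assumption: P(X | c) is the product of the per-attribute terms
likelihood = {c: prod(ps) for c, ps in conditionals.items()}

# P(X) is the same for both classes, so comparing P(X | c) P(c) is enough
scores = {c: likelihood[c] * priors[c] for c in priors}
predicted = max(scores, key=scores.get)

for c in priors:
    print(f"P(X | {c}) = {float(likelihood[c]):.3f}, "
          f"P(X | {c}) P({c}) = {float(scores[c]):.3f}")
print("Predicted class:", predicted)
```

Dividing each score by their sum would recover the actual posteriors, but that step is unnecessary for picking the class.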

51
Homework Assignment