Transcript and Presenter's Notes

Title: Classification


1
Classification
  • Today: Basic Problem
  • Decision Trees

2
Classification Problem
  • Given a database D = {t1, t2, ..., tn} and a set
    of classes C = {C1, ..., Cm}, the Classification
    Problem is to define a mapping f: D → C where each
    ti is assigned to one class.
  • Actually divides D into equivalence classes.
  • Prediction is similar, but may be viewed as
    having an infinite number of classes.

3
Classification Example: Grading
  • If x > 90 then grade A.
  • If 80 < x < 90 then grade B.
  • If 70 < x < 80 then grade C.
  • If 60 < x < 70 then grade D.
  • If x < 60 then grade F.

(Decision tree figure for the grading rules: the root splits on x at 80;
the > 80 branch splits again to assign A or B, and the < 80 branch splits
further to assign C, D, or F. A runnable version of the rules follows.)
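The grading rules above are a concrete instance of a classification mapping f: D → C. A minimal sketch in Python, assuming inclusive lower bounds for each grade band (the slide leaves the exact boundary handling unstated):

```python
def grade(x):
    """Map a numeric score x to a grade class (A-F)."""
    if x >= 90:          # boundaries assumed inclusive on the lower end
        return "A"
    elif x >= 80:
        return "B"
    elif x >= 70:
        return "C"
    elif x >= 60:
        return "D"
    else:
        return "F"

print([grade(s) for s in (95, 85, 72, 61, 40)])  # ['A', 'B', 'C', 'D', 'F']
```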
4
Classification Techniques
  • Approach:
  • Create a specific model by evaluating training data
    (or using domain experts' knowledge).
  • Apply the model developed to new data.
  • Classes must be predefined
  • Most common techniques use DTs, or are based on
    distances or statistical methods.

5
Defining Classes
6
Issues in Classification
  • Missing Data
  • Ignore
  • Replace with assumed value
  • Measuring Performance
  • Classification accuracy on test data
  • Confusion matrix
  • OC Curve

7
Height Example Data
8
Classification Performance
                    Assigned positive    Assigned negative
Actually positive   True Positive        False Negative
Actually negative   False Positive       True Negative
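A minimal sketch of how these four outcomes are usually combined into performance measures. The accuracy formula matches the "classification accuracy" bullet on the previous slide; precision and recall are standard additions not named on the slide:

```python
def performance(tp, fn, tn, fp):
    """Compute common measures from confusion-matrix counts."""
    accuracy  = (tp + tn) / (tp + tn + fp + fn)      # fraction classified correctly
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # correct among assigned positives
    recall    = tp / (tp + fn) if (tp + fn) else 0.0  # positives that were found
    return accuracy, precision, recall

# Hypothetical counts, for illustration only
print(performance(tp=8, fn=2, tn=4, fp=1))
```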
9
Confusion Matrix Example
  • Using the height data example, with Output1 as the
    correct assignment and Output2 as the actual assignment

10
Operating Characteristic Curve
11
Classification Using Decision Trees
  • Partitioning based: divide the search space into
    rectangular regions.
  • A tuple is placed into a class based on the region
    within which it falls.
  • DT approaches differ in how the tree is built (DT
    Induction).
  • Internal nodes are associated with attributes, and
    arcs with values for those attributes.
  • Algorithms: ID3, C4.5, CART

12
Decision Tree
  • Given
  • D = {t1, ..., tn} where ti = <ti1, ..., tih>
  • Database schema contains {A1, A2, ..., Ah}
  • Classes C = {C1, ..., Cm}
  • A Decision or Classification Tree is a tree
    associated with D such that
  • Each internal node is labeled with an attribute, Ai
  • Each arc is labeled with a predicate which can be
    applied to the attribute at its parent
  • Each leaf node is labeled with a class, Cj
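A minimal data-structure sketch of this definition, assuming simple equality predicates on the arcs (the slide allows arbitrary predicates); the attribute name "Gender" in the usage line is hypothetical:

```python
class DTNode:
    """Internal nodes carry an attribute Ai and arcs labeled with
    values of that attribute; leaf nodes carry a class label Cj."""
    def __init__(self, attribute=None, branches=None, label=None):
        self.attribute = attribute      # Ai, for internal nodes
        self.branches = branches or {}  # arc value -> child DTNode
        self.label = label              # Cj, for leaf nodes

def classify(node, record):
    """Follow arcs from the root until a leaf is reached."""
    while node.label is None:
        node = node.branches[record[node.attribute]]
    return node.label

# Tiny usage: a one-split tree on a hypothetical attribute
tree = DTNode("Gender", {"F": DTNode(label="C1"), "M": DTNode(label="C2")})
print(classify(tree, {"Gender": "F"}))  # C1
```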

13
DT Induction
14
DT Splits Area
(Figure: the search space partitioned into rectangular regions by
Gender (M/F) and by Height.)
15
Comparing DTs
(Figure: two decision trees compared, one balanced and one deep.)
16
DT Issues
  • Choosing Splitting Attributes
  • Ordering of Splitting Attributes
  • Splits
  • Tree Structure
  • Stopping Criteria
  • Training Data
  • Pruning

17
Information/Entropy
  • Given probabilities p1, p2, ..., ps whose sum is
    1, entropy is defined as
    H(p1, p2, ..., ps) = Σ pi log(1/pi)
  • Entropy measures the amount of randomness,
    surprise, or uncertainty.
  • Goal in classification
  • no surprise
  • entropy = 0
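A short sketch of the entropy formula above. The numbers quoted in the ID3 example later (e.g. 0.4384 for class counts 4, 8, and 3 out of 15) only come out with base-10 logarithms, so that base is assumed here:

```python
from math import log10

def entropy(probabilities):
    """H = sum of p * log10(1/p) over the class probabilities."""
    return sum(p * log10(1 / p) for p in probabilities if p > 0)

# ~0.4385, the starting entropy used in the ID3 example (slide reports 0.4384)
print(round(entropy([4/15, 8/15, 3/15]), 4))
```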

18
ID3
  • Creates the tree using information theory concepts,
    trying to reduce the expected number of
    comparisons.
  • ID3 chooses the split attribute with the highest
    information gain:
    Gain(D, S) = H(D) - Σ P(Di) H(Di),
    where the split S divides D into subsets D1, ..., Ds

19
ID3 Example (Output1)
  • Starting state entropy:
  • 4/15 log(15/4) + 8/15 log(15/8) + 3/15 log(15/3)
    = 0.4384
  • Gain using gender:
  • Female: 3/9 log(9/3) + 6/9 log(9/6) = 0.2764
  • Male: 1/6 log(6/1) + 2/6 log(6/2) + 3/6 log(6/3)
    = 0.4392
  • Weighted sum: (9/15)(0.2764) + (6/15)(0.4392)
    = 0.34152
  • Gain = 0.4384 - 0.34152 = 0.09688
  • Gain using height:
  • 0.4384 - (2/15)(0.301) = 0.3983
  • Choose height as the first splitting attribute
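A numeric check of the gender figures above, as a small sketch assuming base-10 logarithms (the class counts 4/8/3, the female counts 3/6, and the male counts 1/2/3 are read off the slide's fractions):

```python
from math import log10

def entropy(counts):
    """Entropy of a class-count distribution, base-10 logs."""
    n = sum(counts)
    return sum(c / n * log10(n / c) for c in counts if c)

start    = entropy([4, 8, 3])             # ~0.4385 (slide: 0.4384)
female   = entropy([3, 6])                # ~0.2764
male     = entropy([1, 2, 3])             # ~0.4392
weighted = 9/15 * female + 6/15 * male    # ~0.3416
# gain for gender ~0.0969 (slide: 0.09688, from rounded intermediates)
print(round(start - weighted, 5))
```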

20
C4.5
  • ID3 favors attributes with a large number of
    divisions
  • Improved version of ID3
  • Missing Data
  • Continuous Data
  • Pruning
  • Rules
  • GainRatio
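C4.5's GainRatio divides the information gain by the entropy of the split proportions themselves, which penalizes attributes with many divisions. A sketch continuing the gender split from the ID3 example (9 females, 6 males, gain 0.09688), again assuming base-10 logs:

```python
from math import log10

def entropy(probs):
    return sum(p * log10(1 / p) for p in probs if p > 0)

gain_gender = 0.09688                      # gain for gender from the ID3 example
split_info  = entropy([9/15, 6/15])        # entropy of the split sizes, ~0.292
print(round(gain_gender / split_info, 3))  # GainRatio ~0.331
```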

21
CART
  • Creates a binary tree
  • Uses entropy
  • Formula to choose split point, s, for node t:
    Φ(s|t) = 2 PL PR Σj |P(Cj|tL) - P(Cj|tR)|
  • PL, PR: probability that a tuple in the training
    set will be on the left or right side of the tree.
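A sketch of the splitting measure as reconstructed above, taking P(Cj|tL) and P(Cj|tR) as within-branch class frequencies, which is one common reading (the worked example on the next slide scales the class probabilities by the full training-set size instead); the split with the largest Φ is chosen:

```python
def phi(left_counts, right_counts):
    """CART goodness of split: 2 * PL * PR * sum |P(Cj|tL) - P(Cj|tR)|.
    left_counts / right_counts give per-class tuple counts in each branch."""
    n_left, n_right = sum(left_counts), sum(right_counts)
    n = n_left + n_right
    pl, pr = n_left / n, n_right / n          # PL and PR
    diff = sum(abs(l / n_left - r / n_right)  # per-class probability gaps
               for l, r in zip(left_counts, right_counts))
    return 2 * pl * pr * diff
```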

22
CART Example
  • At the start, there are six choices for split
    point (right branch on equality)
  • P(Gender) = 2(6/15)(9/15)(2/15 + 4/15 + 3/15) = 0.224
  • P(1.6) = 0
  • P(1.7) = 2(2/15)(13/15)(0 + 8/15 + 3/15) = 0.169
  • P(1.8) = 2(5/15)(10/15)(4/15 + 6/15 + 3/15)
    = 0.385
  • P(1.9) = 2(9/15)(6/15)(4/15 + 2/15 + 3/15)
    = 0.256
  • P(2.0) = 2(12/15)(3/15)(4/15 + 8/15 + 3/15)
    = 0.32
  • Split at 1.8

23
Problem to Work On: Training Dataset
This follows an example from Quinlan's ID3
24
Output: A Decision Tree for buys_computer
(Tree figure, reconstructed as text:)
  age?
    <=30   -> student?
                no  -> buys_computer = no
                yes -> buys_computer = yes
    30..40 -> buys_computer = yes
    >40    -> credit rating?
                excellent -> buys_computer = no
                fair      -> buys_computer = yes
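The same tree written as a minimal Python function; the record is assumed to be a dict with keys "age", "student", and "credit_rating", and the age bands follow the reconstruction above:

```python
def buys_computer(record):
    """Classify a customer record with the decision tree above."""
    if record["age"] <= 30:
        return "yes" if record["student"] == "yes" else "no"
    elif record["age"] <= 40:   # the 30..40 branch
        return "yes"
    else:                       # the >40 branch
        return "yes" if record["credit_rating"] == "fair" else "no"

print(buys_computer({"age": 28, "student": "yes", "credit_rating": "fair"}))  # yes
```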
25
Bayesian Classification: Why?
  • Probabilistic learning: calculate explicit
    probabilities for a hypothesis; among the most
    practical approaches to certain types of learning
    problems
  • Incremental: each training example can
    incrementally increase/decrease the probability
    that a hypothesis is correct. Prior knowledge
    can be combined with observed data.
  • Probabilistic prediction: predict multiple
    hypotheses, weighted by their probabilities
  • Standard: even when Bayesian methods are
    computationally intractable, they can provide a
    standard of optimal decision making against which
    other methods can be measured

26
Bayesian Theorem Basics
  • Let X be a data sample whose class label is
    unknown
  • Let H be a hypothesis that X belongs to class C
  • For classification problems, determine P(H|X):
    the probability that the hypothesis holds given
    the observed data sample X
  • P(H): prior probability of hypothesis H (i.e. the
    initial probability before we observe any data;
    reflects the background knowledge)
  • P(X): probability that the sample data is observed
  • P(X|H): probability of observing the sample X,
    given that the hypothesis holds

27
Bayes Theorem (Recap)
  • Given training data X, the posterior probability
    of a hypothesis H, P(H|X), follows the Bayes
    theorem:
    P(H|X) = P(X|H) P(H) / P(X)
  • MAP (maximum a posteriori) hypothesis:
    hMAP = argmax_h P(h|X) = argmax_h P(X|h) P(h)
  • Practical difficulty: requires initial knowledge
    of many probabilities and significant computational
    cost; suffers when data is insufficient
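A tiny numeric sketch of the theorem and the MAP rule; the two hypotheses, their priors, and their likelihoods below are hypothetical, purely for illustration:

```python
# Hypothetical hypotheses with priors P(h) and likelihoods P(X|h)
prior      = {"h1": 0.7, "h2": 0.3}
likelihood = {"h1": 0.2, "h2": 0.9}

evidence  = sum(likelihood[h] * prior[h] for h in prior)             # P(X)
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}  # Bayes theorem
h_map     = max(prior, key=lambda h: likelihood[h] * prior[h])       # MAP hypothesis

print(posterior, h_map)
```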

28
Naïve Bayes Classifier
  • A simplifying assumption: attributes are
    conditionally independent
  • The probability of observing, say, two attribute
    values y1 and y2, given that the current class is
    C, is the product of the probabilities of each
    value taken separately, given the same class:
    P(y1, y2 | C) = P(y1 | C) P(y2 | C)
  • No dependence relation between attributes
  • Greatly reduces the computation cost: only count
    the class distribution.
  • Once the probability P(X|Ci) is known, assign X
    to the class with maximum P(X|Ci) P(Ci)

29
Training dataset
Class C1: buys_computer = yes
Class C2: buys_computer = no
Data sample X = (age <= 30, Income = medium,
Student = yes, Credit_rating = Fair)
30
Naïve Bayesian Classifier Example
  • Compute P(X|Ci) for each class
  • P(age <= 30 | buys_computer = yes) = 2/9 = 0.222
  • P(age <= 30 | buys_computer = no) = 3/5 = 0.6
  • P(income = medium | buys_computer = yes)
    = 4/9 = 0.444
  • P(income = medium | buys_computer = no)
    = 2/5 = 0.4
  • P(student = yes | buys_computer = yes) = 6/9
    = 0.667
  • P(student = yes | buys_computer = no)
    = 1/5 = 0.2
  • P(credit_rating = fair | buys_computer = yes)
    = 6/9 = 0.667
  • P(credit_rating = fair | buys_computer = no)
    = 2/5 = 0.4
  • X = (age <= 30, income = medium, student = yes,
    credit_rating = fair)
  • P(X|Ci): P(X | buys_computer = yes) = 0.222 x
    0.444 x 0.667 x 0.667 = 0.044
  • P(X | buys_computer = no) = 0.6 x
    0.4 x 0.2 x 0.4 = 0.019
  • Multiply by the P(Ci)s and we can conclude that
    X belongs to class buys_computer = yes
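A quick check of the computation, using the conditional probabilities listed above; the class priors are not shown on this slide, so P(yes) = 9/14 and P(no) = 5/14 from the usual 14-tuple buys_computer dataset are assumed:

```python
# Conditional probabilities P(attribute value | class) copied from the slide
p_x_given_yes = 0.222 * 0.444 * 0.667 * 0.667   # ~0.044
p_x_given_no  = 0.6   * 0.4   * 0.2   * 0.4     # ~0.019

p_yes, p_no = 9/14, 5/14                        # assumed class priors P(Ci)
score_yes = p_x_given_yes * p_yes               # ~0.028
score_no  = p_x_given_no  * p_no                # ~0.007

print("yes" if score_yes > score_no else "no")  # buys_computer = yes
```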

31
Naïve Bayesian Classifier Comments
  • Advantages
  • Easy to implement
  • Good results obtained in most of the cases
  • Disadvantages
  • Assumption of class conditional independence,
    therefore loss of accuracy
  • Practically, dependencies exist among variables
  • E.g., hospital patients: Profile (age, family
    history, etc.), Symptoms (fever, cough, etc.),
    Disease (lung cancer, diabetes, etc.)
  • Dependencies among these cannot be modeled by a
    Naïve Bayesian Classifier
  • How to deal with these dependencies?
  • Bayesian Belief Networks

32
Classification Using Distance
  • Place items in the class to which they are
    closest.
  • Must determine the distance between an item and a
    class.
  • Classes represented by
  • Centroid: central value.
  • Medoid: representative point.
  • Individual points
  • Algorithm: KNN

33
K Nearest Neighbor (KNN)
  • Training set includes classes.
  • Examine the K items nearest to the item to be
    classified.
  • The new item is placed in the class to which the
    largest number of those close items belong (see the
    sketch below).
  • O(q) for each tuple to be classified. (Here q is
    the size of the training set.)
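A minimal KNN sketch over numeric tuples, assuming Euclidean distance and simple majority voting (ties broken arbitrarily); the tiny height-like training set at the end is hypothetical:

```python
from collections import Counter
from math import dist

def knn_classify(training, item, k=3):
    """training: list of (tuple_of_numbers, class_label); item: tuple of numbers.
    Scans the whole training set, so each classification is O(q) in its size q."""
    neighbors = sorted(training, key=lambda tc: dist(tc[0], item))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1.6,), "short"), ((1.9,), "tall"), ((1.7,), "short"), ((2.0,), "tall")]
print(knn_classify(train, (1.85,), k=3))  # tall
```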

34
KNN
35
KNN Algorithm