Transcript and Presenter's Notes

Title: Classification


1
Classification
  • A task of induction to find patterns

2
Outline
  • Data and its format
  • Problem of Classification
  • Learning a classifier
  • Different approaches
  • Key issues

3
Data and its format
  • Data
  • attribute-value pairs
  • with/without class
  • Data type
  • continuous/discrete
  • nominal
  • Data format
  • Flat
  • If not flat, what should we do?

4
Sample data
5
Induction from databases
  • Inferring knowledge from data
  • The task of deduction
  • infer information that is a logical consequence
    of querying a database
  • Who conducted this class before?
  • Which courses are attended by Mary?
  • Deductive databases extending the RDBMS

6
Classification
  • It is one type of induction
  • data with class labels
  • Examples -
  • If weather is rainy then no golf
  • If
  • If

7
Different approaches
  • There exist many techniques
  • Decision trees
  • Neural networks
  • K-nearest neighbours
  • Naïve Bayesian classifiers
  • Support Vector Machines
  • Ensemble methods
  • Semi-supervised
  • and many more ...

8
A decision tree
9
Inducing a decision tree
  • There are many possible trees
  • let's try it on the golfing data
  • How to find the most compact one
  • that is consistent with the data (i.e.,
    accurate)?
  • Why the most compact?
  • Occam's razor principle
  • Issue of efficiency w.r.t. optimality
  • How to find an optimal tree?
  • Is there any need for a quick review of basic
    probability theory?

10
Information gain and entropy
  • Entropy - H(S) = -Σi pi log2 pi, where pi is the proportion
    of class i in node S
  • Information gain - the difference between the entropy of the
    node before splitting and the weighted entropy of its child
    nodes after splitting (see the sketch below)
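
A minimal sketch of both quantities in Python, assuming the data is kept as a list of attribute-to-value dicts plus a parallel list of class labels (a layout chosen here for illustration, not taken from the slides):

  from collections import Counter
  from math import log2

  def entropy(labels):
      """H(S) = -sum_i p_i * log2(p_i), where p_i is the proportion of class i in S."""
      total = len(labels)
      return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

  def information_gain(rows, labels, attribute):
      """Entropy of the node before splitting minus the weighted entropy of the
      child nodes produced by splitting on `attribute`."""
      groups = {}
      for row, label in zip(rows, labels):
          groups.setdefault(row[attribute], []).append(label)
      after = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
      return entropy(labels) - after

With the golfing data in this form, information_gain(rows, labels, "Outlook") would give the gain discussed on the following slides.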

11
Building a compact tree
  • The key to building a decision tree is deciding which
    attribute to choose in order to branch.
  • The heuristic is to choose the attribute with the
    maximum IG (see the sketch below).
  • Another way to put it: choose the split that reduces
    uncertainty as much as possible.
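
Continuing the same sketch, the heuristic is an argmax over information gain, and a rough ID3-style induction loop applies it recursively (helper names are mine, not the slides'; entropy and information_gain come from the sketch above):

  from collections import Counter

  def best_attribute(rows, labels, attributes):
      """Greedy choice: the attribute with maximum information gain."""
      return max(attributes, key=lambda a: information_gain(rows, labels, a))

  def build_tree(rows, labels, attributes):
      """Split on the best attribute until a node is pure or attributes run out."""
      if len(set(labels)) == 1 or not attributes:
          return Counter(labels).most_common(1)[0][0]        # leaf: majority class
      attr = best_attribute(rows, labels, attributes)
      groups = {}
      for row, label in zip(rows, labels):
          groups.setdefault(row[attr], ([], []))
          groups[row[attr]][0].append(row)
          groups[row[attr]][1].append(label)
      rest = [a for a in attributes if a != attr]
      return {attr: {v: build_tree(rs, ls, rest) for v, (rs, ls) in groups.items()}}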

12
Learning a decision tree
Should Outlook be chosen first? If not, which one
should be chosen?
[Decision tree figure: Outlook at the root with branches sunny, overcast, and rain; the sunny branch splits on Humidity (high / normal) and the rain branch splits on Wind (strong / weak), ending in Yes/No leaves.]
13
Issues of Decision Trees
  • Number of values of an attribute
  • Your solution?
  • When to stop
  • Data fragmentation problem
  • Any solution?
  • Mixed data types
  • Scalability

14
Rules and Tree stumps
  • Generating rules from decision trees
  • One path is a rule
  • We can do better. Why?
  • Tree stumps and 1R
  • For each attribute value, determine a default
    class (one rule per value)
  • Calculate the number of errors for each rule
  • Find the total number of errors for that attribute's
    rule set
  • For n attributes, there are n rule sets
  • Choose the rule set that has the fewest
    errors
  • Let's go back to our example data and learn a 1R
    rule (see the sketch below)
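
Under the same rows-as-dicts layout used in the earlier sketches, 1R can be written in a few lines (the return shape is my choice, not the slides'):

  from collections import Counter

  def one_r(rows, labels, attributes):
      """For each attribute build one rule per value (its majority class),
      then keep the attribute whose rule set makes the fewest errors."""
      best = None
      for attr in attributes:
          by_value = {}
          for row, label in zip(rows, labels):
              by_value.setdefault(row[attr], []).append(label)
          rules = {v: Counter(ls).most_common(1)[0][0] for v, ls in by_value.items()}
          errors = sum(label != rules[row[attr]] for row, label in zip(rows, labels))
          if best is None or errors < best[2]:
              best = (attr, rules, errors)
      return best   # (chosen attribute, {value: default class}, total errors)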

15
K-Nearest Neighbor
  • One of the most intuitive classification
    algorithms
  • An unseen instance's class is determined by its
    nearest neighbor
  • The problem is that it is sensitive to noise
  • Instead of using one neighbor, we can use k
    neighbors (see the sketch below)
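
A minimal sketch, assuming numeric feature vectors and Euclidean distance (k = 1 gives the noise-sensitive single-neighbor rule mentioned above):

  import math
  from collections import Counter

  def knn_predict(train_X, train_y, query, k=3):
      """Classify `query` by majority vote among its k nearest training points."""
      nearest = sorted(((math.dist(x, query), y) for x, y in zip(train_X, train_y)),
                       key=lambda pair: pair[0])[:k]
      return Counter(y for _, y in nearest).most_common(1)[0][0]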

16
K-NN
  • New problems
  • How large should k be?
  • lazy learning - does it learn?
  • large storage
  • A toy example (noise, majority)
  • How good is k-NN?
  • How to compare
  • Speed
  • Accuracy

17
Naïve Bayes Classifier
  • This is a direct application of Bayes' rule
  • P(C|X) = P(X|C)P(C)/P(X)
  • X - a vector of (x1, x2, ..., xn)
  • That's the best classifier we can build
  • But, there are problems
  • There are only a limited number of instances
  • How to estimate P(X|C)?
  • Your suggestions?

18
NBC (2)
  • Assume conditional independence between the xi's
  • We have
  • P(C|x) = P(x1|C) ... P(xi|C) ... P(xn|C) P(C)
  • What's missing? Is it really correct? Why?
  • An example (Golfing or not)
  • How good is it in reality?
  • Even when the assumption does not hold
  • How to update an NBC when new data streams in?
  • What if one of the P(xi|C) is 0?
  • Laplace estimator - add 1 to each count (see the sketch below)
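
A sketch of an NBC for nominal attributes with the add-one (Laplace) estimator, again assuming the rows-as-dicts layout used earlier:

  from collections import Counter, defaultdict
  from math import log

  def train_nbc(rows, labels):
      """Collect the counts needed for P(C) and P(x_i = v | C)."""
      class_counts = Counter(labels)
      value_counts = defaultdict(Counter)      # (attribute, class) -> Counter of values
      values_seen = defaultdict(set)           # attribute -> observed values
      for row, c in zip(rows, labels):
          for attr, v in row.items():
              value_counts[(attr, c)][v] += 1
              values_seen[attr].add(v)
      return class_counts, value_counts, values_seen

  def predict_nbc(model, row):
      """argmax_C  log P(C) + sum_i log P(x_i | C), with add-one smoothing."""
      class_counts, value_counts, values_seen = model
      total = sum(class_counts.values())
      best, best_score = None, float("-inf")
      for c, n_c in class_counts.items():
          score = log(n_c / total)
          for attr, v in row.items():
              count = value_counts[(attr, c)][v] + 1          # Laplace: add 1 to each count
              denom = n_c + len(values_seen[attr])            # one extra per possible value
              score += log(count / denom)
          if score > best_score:
              best, best_score = c, score
      return best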

19
No Free Lunch
  • If the goal is to obtain good generalization
    performance, there are no context-independent or
    usage-independent reasons to favor one learning
    or classification method over another.
  • http://en.wikipedia.org/wiki/No-Free-Lunch_theorems
  • What does it indicate?
  • Or is it easy to choose a good classifier for
    your application?
  • Again, there is no off-the-shelf solution for a
    reasonably challenging application.

20
Ensemble Methods
  • Motivation
  • Achieve the stability of classification
  • Model generation
  • Bagging (Bootstrap Aggregating)
  • Boosting
  • Model combination
  • Majority voting
  • Meta learning
  • Stacking (using different types of classifiers)
  • Examples (classify-ensemble.ppt)

21
AdaBoost.M1 (from the Weka Book)
Model generation
  • Assign equal weight to each training instance
  • For t iterations
  • Apply learning algorithm to weighted dataset,
  • store resulting model
  • Compute model's error e on weighted dataset
  • If e = 0 or e > 0.5
  • Terminate model generation
  • For each instance in dataset
  • If classified correctly by model
  • Multiply instance's weight by e/(1-e)
  • Normalize weight of all instances

Classification
  • Assign weight 0 to all classes
  • For each of the t models (or fewer)
  • For the class this model predicts, add -log(e/(1-e))
    to this class's weight
  • Return class with highest weight
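
A sketch of the whole procedure in Python; `learn(X, y, weights)` stands for any weak learner that returns an object with a .predict(x) method (a hypothetical interface, not a real library API):

  import math

  def adaboost_m1(X, y, learn, t=10):
      """Model generation and classification as in the pseudocode above."""
      n = len(y)
      w = [1.0 / n] * n                          # equal weight to each training instance
      models = []
      for _ in range(t):
          model = learn(X, y, w)
          wrong = [model.predict(x) != yi for x, yi in zip(X, y)]
          e = sum(wi for wi, bad in zip(w, wrong) if bad)   # weights stay normalized
          if e == 0 or e > 0.5:                  # terminate model generation
              break
          models.append((model, e))
          for i in range(n):
              if not wrong[i]:                   # classified correctly by model
                  w[i] *= e / (1 - e)
          total = sum(w)
          w = [wi / total for wi in w]           # normalize weight of all instances

      def classify(x):
          weight = {}
          for model, e in models:
              c = model.predict(x)               # the class this model predicts
              weight[c] = weight.get(c, 0.0) + math.log((1 - e) / e)
          return max(weight, key=weight.get) if weight else None

      return classify
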
22
Using many different classifiers
  • We have learned some basic and often-used
    classifiers
  • There are many more out there.
  • Regression
  • Discriminant analysis
  • Neural networks
  • Support vector machines
  • Pick the most suitable one for an application
  • Where to find all these classifiers?
  • Don't reinvent the wheel (yours may not be as round)
  • We will likely come back to classification and
    discuss support vector machines as requested

23
Assignment 3
  • Questions about classification and evaluation
    (deadline 2/14, Wednesday)
  • Manually create a decision tree for the golfing
    data (D)
  • Manually create a NBC for D
  • How would you create a 1-NN classifier for D? Discuss
    your thoughts.
  • Run your decision tree algorithm (if you don't
    want to implement your own algorithm, you can use
    an available one) on D using 10-fold cross
    validation (or leave-one-out for this particular
    D) and 5x2-fold cross validation (see the sketch after this list)
  • Discuss the differences between the above two
    evaluation methods
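
One convenient way to run both evaluations is scikit-learn (an assumption on my part; the assignment allows any available implementation). X and y below are random placeholders standing in for the encoded golfing data D:

  import numpy as np
  from sklearn.model_selection import cross_val_score, KFold, RepeatedKFold, LeaveOneOut
  from sklearn.tree import DecisionTreeClassifier

  # Placeholder stand-in for the encoded golfing data D (14 instances, 4 attributes);
  # substitute the real table here.
  rng = np.random.default_rng(0)
  X = rng.integers(0, 3, size=(14, 4))
  y = rng.integers(0, 2, size=14)

  clf = DecisionTreeClassifier(random_state=0)

  # 10-fold cross-validation (or leave-one-out, given how small D is)
  ten_fold = cross_val_score(clf, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0))
  loo = cross_val_score(clf, X, y, cv=LeaveOneOut())

  # 5x2-fold cross-validation: 2 folds, repeated 5 times with different splits
  five_by_two = cross_val_score(clf, X, y, cv=RepeatedKFold(n_splits=2, n_repeats=5, random_state=0))

  print(ten_fold.mean(), loo.mean(), five_by_two.mean())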

24
Some software for demo or for teaching
  • C4.5 at the Rulequest site
    http://www.rulequest.com/download.html
  • The free demo versions of Magnum Opus (for
    association rule mining) can be downloaded from
    the Rulequest site
  • Alphaminer (you probably will like it) at
    http://www.eti.hku.hk/alphaminer/
  • WEKA http://www.cs.waikato.ac.nz/ml/weka/

25
Classification via Neural Networks
[Figure: a perceptron, with weighted inputs summed and passed through a squashing function.]
26
What can a perceptron do?
  • Neuron as a computing device
  • To separate linearly separable points
  • Nice things about a perceptron
  • distributed representation
  • local learning
  • weight adjusting

27
Linear threshold unit
  • Basic concepts: projection, thresholding

[Figure: a weight vector W = (0.11, 0.6) and an input vector L = (0.7, 0.7); input vectors whose projection onto W exceeds the threshold 0.5 evoke output 1.]
28
E.g. 1: solution region for the AND problem
  • Find a weight vector that satisfies all the
    constraints

AND problem:
  x1 x2 | out
   0  0 |  0
   0  1 |  0
   1  0 |  0
   1  1 |  1
29
E.g. 2: Solution region for the XOR problem?
XOR problem:
  x1 x2 | out
   0  0 |  0
   0  1 |  1
   1  0 |  1
   1  1 |  0
30
Learning by error reduction
  • Perceptron learning algorithm
  • If the activation level of the output unit is 1
    when it should be 0, reduce the weight on the
    link to the ith input unit by rLi, where Li is
    the ith input value and r is a learning rate
  • If the activation level of the output unit is 0
    when it should be 1, increase the weight on the
    link to the ith input unit by rLi
  • Otherwise, do nothing
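
A small sketch of this update rule in Python; the bias weight and the {0, 1} target encoding are assumptions added to make the example runnable:

  def train_perceptron(examples, r=0.1, epochs=50):
      """Apply the error-reduction rule above over (input_vector, target) pairs."""
      n_inputs = len(examples[0][0])
      w = [0.0] * (n_inputs + 1)                             # weight 0 holds the bias
      for _ in range(epochs):
          for inputs, target in examples:
              x = [1.0] + list(inputs)                       # constant bias input
              out = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
              if out == 1 and target == 0:                   # should be 0: reduce weights by r*Li
                  w = [wi - r * xi for wi, xi in zip(w, x)]
              elif out == 0 and target == 1:                 # should be 1: increase weights by r*Li
                  w = [wi + r * xi for wi, xi in zip(w, x)]
      return w

  # The AND problem from the earlier slide is linearly separable and learnable:
  and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
  weights = train_perceptron(and_data)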

31
Multi-layer perceptrons
  • Using the chain rule, we can back-propagate the
    errors through a multi-layer perceptron (see the
    sketch below the figure).

[Figure: a multi-layer perceptron with an input layer, a hidden layer, and an output layer.]
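
A rough sketch of chain-rule back-propagation for a one-hidden-layer network with sigmoid units and squared error (NumPy; my notation, not the slide's):

  import numpy as np

  def add_bias(A):
      """Append a constant-1 column so each layer gets a bias weight."""
      return np.hstack([A, np.ones((A.shape[0], 1))])

  def train_mlp(X, y, hidden=3, r=0.5, epochs=10000, seed=0):
      """Batch gradient descent; errors are back-propagated from the output
      layer to the hidden layer via the chain rule."""
      rng = np.random.default_rng(seed)
      sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
      W1 = rng.normal(scale=0.5, size=(X.shape[1] + 1, hidden))   # input -> hidden
      W2 = rng.normal(scale=0.5, size=(hidden + 1, 1))            # hidden -> output
      Xb = add_bias(X)
      for _ in range(epochs):
          h = sigmoid(Xb @ W1)                                    # hidden activations
          hb = add_bias(h)
          out = sigmoid(hb @ W2)                                  # output activations
          d_out = (out - y) * out * (1 - out)                     # chain rule at the output
          d_hid = (d_out @ W2[:-1].T) * h * (1 - h)               # propagated to the hidden layer
          W2 -= r * hb.T @ d_out
          W1 -= r * Xb.T @ d_hid
      return W1, W2

  # XOR (the earlier slide), which a single perceptron cannot represent:
  X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
  y = np.array([[0], [1], [1], [0]], dtype=float)
  W1, W2 = train_mlp(X, y)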