More Methodology; Nearest-Neighbor Classifiers
1
More Methodology; Nearest-Neighbor Classifiers
  • Sec 4.7

2
Review: Properties of DTs
  • Axis-orthogonal, hyperrectangular,
    piecewise-constant models
  • Categorical labels
  • Non-metric

3
Separation of train and test
  • Fundamental principle (the 1st amendment of ML):
  • Don't evaluate accuracy (performance) of your
    classifier (learning system) on the same data
    used to train it!

4
Holdout data
  • Usual to hold out a separate set of data for
    testing, not used to train the classifier
  • A.k.a. test set, holdout set, evaluation set,
    etc.
  • E.g., for a classifier f:
  • acc_train(f) = (1/|D_train|) Σ_{(x,y) ∈ D_train} 1[f(x) = y]
    is training-set accuracy
  • acc_test(f) = (1/|D_test|) Σ_{(x,y) ∈ D_test} 1[f(x) = y]
    is test-set (or generalization) accuracy
    (a holdout-evaluation sketch follows below)
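For concreteness, a minimal holdout-evaluation sketch, assuming scikit-learn is available; the iris data and the variable names (Xtr, Xte, etc.) are stand-ins, not from the slides:

  # Holdout evaluation sketch: fit on one split, score on the other.
  from sklearn.datasets import load_iris
  from sklearn.model_selection import train_test_split
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_iris(return_X_y=True)
  Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

  model = DecisionTreeClassifier().fit(Xtr, ytr)
  print("train acc:", model.score(Xtr, ytr))  # accuracy on data used to fit
  print("test acc: ", model.score(Xte, yte))  # generalization accuracy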

5
Gotchas...
  • What if you're unlucky when you split data into
    train/test?
  • E.g., all train data are class A and all test are
    class B?
  • No red things show up in training data
  • Best answer: stratification
  • Try to make sure class (feature) ratios are the same
    in train/test sets (and the same as the original data)
  • Why does this work?
  • Almost as good: randomization
  • Shuffle data randomly before splitting
  • Why does this work? (A stratified-split sketch
    follows after this list.)
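A stratified-split sketch, assuming scikit-learn's train_test_split (its stratify argument enforces matching class ratios, and it shuffles internally before splitting); the data set is a stand-in:

  # Stratified train/test split: class ratios preserved in both halves.
  import numpy as np
  from sklearn.datasets import load_iris
  from sklearn.model_selection import train_test_split

  X, y = load_iris(return_X_y=True)
  Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3,
                                        stratify=y, random_state=0)
  print("full: ", np.bincount(y) / len(y))      # original class ratios
  print("train:", np.bincount(ytr) / len(ytr))  # should match closely
  print("test: ", np.bincount(yte) / len(yte))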

6
More gotchas...
  • What if your data set is small?
  • Might not be able to get perfect stratification
  • Can't get a really representative accuracy from any
    single train/test split
  • A: cross-validation (a runnable sketch follows below)

    for (i = 0; i < k; ++i)
        [Xtrain, Ytrain, Xtest, Ytest] = splitData(X, Y, N/k, i)
        model[i] = train(Xtrain, Ytrain)
        cvAccs[i] = measureAcc(model[i], Xtest, Ytest)
    avgAcc = mean(cvAccs)
    stdAcc = stddev(cvAccs)
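A runnable version of the loop above, sketched with scikit-learn; KFold and score stand in for the slide's splitData and measureAcc, and the data set is a stand-in:

  # k-fold cross-validation mirroring the pseudocode above.
  import numpy as np
  from sklearn.datasets import load_iris
  from sklearn.model_selection import KFold
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_iris(return_X_y=True)
  k = 10
  cvAccs = []
  for train_idx, test_idx in KFold(n_splits=k, shuffle=True,
                                   random_state=0).split(X):
      model = DecisionTreeClassifier().fit(X[train_idx], y[train_idx])
      cvAccs.append(model.score(X[test_idx], y[test_idx]))  # fold accuracy
  print("avgAcc:", np.mean(cvAccs), "stdAcc:", np.std(cvAccs, ddof=1))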

8
But is it really learning?
  • Now we know how well our models are performing
  • But are they really learning?
  • Maybe any classifier would do as well
  • E.g., a default classifier (pick the most likely
    class) or a random classifier (a baseline-comparison
    sketch follows after this list)
  • How can we tell if the model is learning
    anything?
  • Go back to first definitions
  • What does it mean to learn something?
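One way to check, sketched with scikit-learn's DummyClassifier playing the "default" classifier; the data set and learner are stand-ins:

  # Baseline comparison: is the learner beating the most-likely-class rule?
  from sklearn.datasets import load_iris
  from sklearn.dummy import DummyClassifier
  from sklearn.model_selection import train_test_split
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_iris(return_X_y=True)
  Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

  baseline = DummyClassifier(strategy="most_frequent").fit(Xtr, ytr)
  model = DecisionTreeClassifier().fit(Xtr, ytr)
  print("baseline acc:", baseline.score(Xte, yte))  # ~1/3 on balanced iris
  print("model acc:   ", model.score(Xte, yte))     # should be much higher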

9
The learning curve
  • Train on successively larger fractions of the data
  • Watch how accuracy (performance) changes
    (a sketch follows below)
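A minimal learning-curve sketch, assuming scikit-learn; the subsampling scheme (shuffle once, then take growing prefixes) is one reasonable choice, not the slides' prescription:

  # Learning curve: train on growing fractions, track test accuracy.
  import numpy as np
  from sklearn.datasets import load_iris
  from sklearn.model_selection import train_test_split
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_iris(return_X_y=True)
  Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

  order = np.random.default_rng(0).permutation(len(Xtr))  # shuffle once
  for pct in np.arange(0.1, 1.0, 0.1):
      idx = order[:max(2, int(pct * len(Xtr)))]  # prefix = subsample
      acc = DecisionTreeClassifier().fit(Xtr[idx], ytr[idx]).score(Xte, yte)
      print(f"{pct:.0%} of train data -> test acc {acc:.3f}")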

10
Measuring variance
  • Cross-validation helps you get a better estimate of
    accuracy for small data
  • Randomization (shuffling the data) helps guard
    against poor splits/orderings of the data
  • Learning curves help assess learning
    rate/asymptotic accuracy
  • Still one big missing component: variance
  • Definition: the variance of a classifier is the
    fraction of error due to the specific data set
    it's trained on

11
Measuring variance
  • Variance tells you how much you expect your
    classifier/performance to change when you train
    it on a new (but similar) data set
  • E.g., take 5 samplings of a data source;
    train/test 5 classifiers
  • Accuracies: 74.2, 90.3, 58.1, 80.6, 90.3
  • Mean accuracy: 78.7
  • Std dev of acc: 13.4 (reproduced in the sketch
    after this list)
  • Variance is usually a function of both the classifier
    and the data source
  • High-variance classifiers are very susceptible to
    small changes in the data
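Those two statistics can be checked with plain NumPy; ddof=1 gives the sample standard deviation, which is the one that matches 13.4:

  # Mean and sample std dev of the five accuracies quoted above.
  import numpy as np

  accs = np.array([74.2, 90.3, 58.1, 80.6, 90.3])
  print("mean acc:", accs.mean())       # 78.7
  print("std dev: ", accs.std(ddof=1))  # ~13.4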

12
Putting it all together
  • Suppose you want to measure the expected accuracy
    of your classifier, assess learning rate, and
    measure variance all at the same time?
    (A runnable sketch follows below.)

    for (i = 0; i < 10; ++i)                         // variance reps
        shuffle data
        do 10-way CV partition of data
        for each train/test partition                // xval
            for (pct = 0.1; pct <= 0.9; pct += 0.1)  // LC
                subsample pct fraction of training set
                train on subsample, test on test set
        avg across all folds of CV partition
        generate learning curve for this partition
    get mean and std across all curves
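A runnable sketch of the nested loops, assuming scikit-learn; the data set, learner, and fraction grid are stand-ins:

  # Variance reps x 10-fold CV x learning-curve fractions.
  import numpy as np
  from sklearn.datasets import load_iris
  from sklearn.model_selection import KFold
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_iris(return_X_y=True)
  pcts = np.arange(0.1, 1.0, 0.1)
  curves = []                                # one learning curve per rep
  rng = np.random.default_rng(0)

  for rep in range(10):                      # variance reps
      perm = rng.permutation(len(X))         # shuffle data
      Xs, ys = X[perm], y[perm]
      fold_curves = []
      for tr, te in KFold(n_splits=10).split(Xs):   # xval
          accs = []
          for pct in pcts:                          # LC
              n = max(2, int(pct * len(tr)))        # subsample training fold
              model = DecisionTreeClassifier().fit(Xs[tr[:n]], ys[tr[:n]])
              accs.append(model.score(Xs[te], ys[te]))
          fold_curves.append(accs)
      curves.append(np.mean(fold_curves, axis=0))   # avg across folds
  curves = np.array(curves)
  print("mean curve:", curves.mean(axis=0))         # mean across reps
  print("std curve: ", curves.std(axis=0, ddof=1))  # variance across reps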

13
Putting it all together
(Figure: learning curves on the hepatitis data)
14
5 minutes of math...
  • Decision trees are non-metric
  • They don't know anything about relations between
    instances, except the sets induced by feature splits
  • Often, we have well-defined distances between
    points
  • Idea of distance encapsulated by a metric

15
5 minutes of math...
  • Definition: a metric is a function
    d : X × X → [0, ∞) that obeys the following
    properties:
  • Identity: d(x, y) = 0 if and only if x = y
  • Symmetry: d(x, y) = d(y, x)
  • Triangle inequality: d(x, z) ≤ d(x, y) + d(y, z)

16
5 minutes of math...
  • Examples
  • Euclidean distance: d(x, y) = sqrt( Σ_i (x_i − y_i)² )

Note: omitting the square root no longer satisfies the
triangle inequality, but it preserves the ordering of
distances, so it usually won't change our results
17
5 minutes of math...
  • Examples
  • Manhattan (taxicab) distance: d(x, y) = Σ_i |x_i − y_i|
  • Distance travelled along a grid between two
    points
  • No diagonals allowed (a NumPy sketch of both
    distances follows below)
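Both distances are one-liners in NumPy; a small sketch with two arbitrary points:

  # Euclidean vs. Manhattan distance between two feature vectors.
  import numpy as np

  x = np.array([1.0, 2.0, 3.0])
  y = np.array([4.0, 0.0, 3.0])
  euclidean = np.sqrt(np.sum((x - y) ** 2))  # sqrt of sum of squared diffs
  manhattan = np.sum(np.abs(x - y))          # sum of absolute diffs, no diagonals
  print(euclidean, manhattan)                # 3.605..., 5.0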

18
5 minutes of math...
  • Examples
  • What if some attribute is categorical?

19
5 minutes of math...
  • Examples
  • What if some attribute is categorical?
  • Typical answer is 0/1 distance
  • For each attribute, add 1 if the instances differ
    in that attribute, else 0 (a mixed-distance sketch
    follows below)
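A sketch combining the two kinds of attributes into one distance; the helper mixed_distance and the attribute layout are illustrative, not from the slides:

  # Manhattan on numeric attributes plus 0/1 distance on categorical ones.
  def mixed_distance(a, b, categorical):
      """a, b: attribute tuples; categorical: set of categorical positions."""
      d = 0.0
      for i, (ai, bi) in enumerate(zip(a, b)):
          if i in categorical:
              d += 0.0 if ai == bi else 1.0  # 0/1 distance on this attribute
          else:
              d += abs(ai - bi)              # numeric contribution
      return d

  # e.g., (height_cm, color): numeric attribute 0, categorical attribute 1
  print(mixed_distance((170, "red"), (165, "blue"), categorical={1}))  # 6.0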

20
Distances in classification
  • Nearest neighbor: find the nearest instance to
    the query point in feature space, return the
    class of that instance
  • Simplest possible distance-based classifier
  • With more notation: given training data
    {(x_i, y_i)}, i = 1..N, predict ŷ = y_{i*} where
    i* = argmin_i d(x_query, x_i)
    (a sketch follows below)
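A minimal 1-NN classifier matching that description, in plain Python with Euclidean distance assumed; the data are placeholders:

  # Return the label of the closest training instance to the query.
  import math

  def nn_classify(train, query):
      """train: list of (feature_vector, label); query: feature vector."""
      def dist(a, b):
          return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
      _, label = min(train, key=lambda xy: dist(xy[0], query))
      return label

  train = [((0.0, 0.0), "A"), ((1.0, 1.0), "A"), ((5.0, 5.0), "B")]
  print(nn_classify(train, (4.0, 4.5)))  # "B": nearest instance is (5, 5)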

24
NN miscellany
  • Slight generalization: k-nearest neighbors (k-NN)
  • Find the k training instances closest to the query point
  • Vote among them for the label
  • Q: How does this affect the system?
  • Gotcha: unscaled dimensions
  • What happens if one axis is measured in microns
    and one in light-years?
  • Usual trick is to scale each axis to a [-1, 1] range
    (a scaling + voting sketch follows below)
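A sketch of the scaling trick plus k-NN voting in plain NumPy; scale_to_unit, the toy data, and k are illustrative:

  # Rescale each feature to [-1, 1], then vote among the k closest instances.
  import numpy as np
  from collections import Counter

  def scale_to_unit(X):
      """Linearly map each column of X to the [-1, 1] range."""
      lo, hi = X.min(axis=0), X.max(axis=0)
      return 2 * (X - lo) / (hi - lo) - 1, (lo, hi)

  def knn_classify(Xtr, ytr, query, k=3):
      dists = np.sqrt(((Xtr - query) ** 2).sum(axis=1))  # Euclidean distances
      nearest = np.argsort(dists)[:k]                    # k closest instances
      return Counter(ytr[nearest]).most_common(1)[0][0]  # majority vote

  # microns vs. light-years: without scaling, axis 1 would dominate entirely
  X = np.array([[1.0, 1e17], [2.0, 2e17], [9.0, 9e17], [8.0, 8e17]])
  y = np.array(["A", "A", "B", "B"])
  Xs, (lo, hi) = scale_to_unit(X)
  q = 2 * (np.array([2.5, 3e17]) - lo) / (hi - lo) - 1  # scale query the same way
  print(knn_classify(Xs, y, q, k=3))                    # "A"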