More Methodology; Nearest-Neighbor Classifiers
1
More Methodology; Nearest-Neighbor Classifiers
  • Sec 4.7

2
Review: Properties of DTs
  • Axis-orthogonal, hyperrectangular,
    piecewise-constant models
  • Categorical labels
  • Non-metric

3
Separation of train and test
  • Fundamental principle (the 1st amendment of ML):
  • Don't evaluate accuracy (performance) of your
    classifier (learning system) on the same data
    used to train it!

4
Holdout data
  • Usual to hold out a separate set of data for
    testing, not used to train the classifier
  • A.k.a. test set, holdout set, evaluation set,
    etc.
  • E.g., for a classifier f:
  • acc_train(f) = (1/|D_train|) Σ_{(x,y) ∈ D_train} 1[f(x) = y]
    is training-set accuracy
  • acc_test(f) = (1/|D_test|) Σ_{(x,y) ∈ D_test} 1[f(x) = y]
    is test-set (or generalization) accuracy
    (a holdout-evaluation sketch follows below)
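For concreteness, a minimal holdout-evaluation sketch, assuming scikit-learn is available; the iris data and the variable names (Xtr, Xte, etc.) are stand-ins, not from the slides:

  # Holdout evaluation sketch: fit on one split, score on the other.
  from sklearn.datasets import load_iris
  from sklearn.model_selection import train_test_split
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_iris(return_X_y=True)
  Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

  model = DecisionTreeClassifier().fit(Xtr, ytr)
  print("train acc:", model.score(Xtr, ytr))  # accuracy on data used to fit
  print("test acc: ", model.score(Xte, yte))  # generalization accuracy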

5
Gotchas...
  • What if you're unlucky when you split data into
    train/test?
  • E.g., all train data are class A and all test are
    class B?
  • No red things show up in training data
  • Best answer: stratification
  • Try to make sure class (feature) ratios are the same
    in train/test sets (and the same as the original data)
  • Why does this work?
  • Almost as good: randomization
  • Shuffle data randomly before splitting
  • Why does this work? (A stratified-split sketch
    follows after this list.)
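A stratified-split sketch, assuming scikit-learn's train_test_split (its stratify argument enforces matching class ratios, and it shuffles internally before splitting); the data set is a stand-in:

  # Stratified train/test split: class ratios preserved in both halves.
  import numpy as np
  from sklearn.datasets import load_iris
  from sklearn.model_selection import train_test_split

  X, y = load_iris(return_X_y=True)
  Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3,
                                        stratify=y, random_state=0)
  print("full: ", np.bincount(y) / len(y))      # original class ratios
  print("train:", np.bincount(ytr) / len(ytr))  # should match closely
  print("test: ", np.bincount(yte) / len(yte))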

6
More gotchas...
  • What if your data set is small?
  • Might not be able to get perfect stratification
  • Can't get a really representative accuracy from any
    single train/test split
  • A: cross-validation (a runnable sketch follows below)

    for (i = 0; i < k; ++i)
        [Xtrain, Ytrain, Xtest, Ytest] = splitData(X, Y, N/k, i)
        model[i] = train(Xtrain, Ytrain)
        cvAccs[i] = measureAcc(model[i], Xtest, Ytest)
    avgAcc = mean(cvAccs)
    stdAcc = stddev(cvAccs)
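A runnable version of the loop above, sketched with scikit-learn; KFold and score stand in for the slide's splitData and measureAcc, and the data set is a stand-in:

  # k-fold cross-validation mirroring the pseudocode above.
  import numpy as np
  from sklearn.datasets import load_iris
  from sklearn.model_selection import KFold
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_iris(return_X_y=True)
  k = 10
  cvAccs = []
  for train_idx, test_idx in KFold(n_splits=k, shuffle=True,
                                   random_state=0).split(X):
      model = DecisionTreeClassifier().fit(X[train_idx], y[train_idx])
      cvAccs.append(model.score(X[test_idx], y[test_idx]))  # fold accuracy
  print("avgAcc:", np.mean(cvAccs), "stdAcc:", np.std(cvAccs, ddof=1))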

8
But is it really learning?
  • Now we know how well our models are performing
  • But are they really learning?
  • Maybe any classifier would do as well
  • E.g., a default classifier (pick the most likely
    class) or a random classifier (a baseline-comparison
    sketch follows after this list)
  • How can we tell if the model is learning
    anything?
  • Go back to first definitions
  • What does it mean to learn something?
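One way to check, sketched with scikit-learn's DummyClassifier playing the "default" classifier; the data set and learner are stand-ins:

  # Baseline comparison: is the learner beating the most-likely-class rule?
  from sklearn.datasets import load_iris
  from sklearn.dummy import DummyClassifier
  from sklearn.model_selection import train_test_split
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_iris(return_X_y=True)
  Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

  baseline = DummyClassifier(strategy="most_frequent").fit(Xtr, ytr)
  model = DecisionTreeClassifier().fit(Xtr, ytr)
  print("baseline acc:", baseline.score(Xte, yte))  # ~1/3 on balanced iris
  print("model acc:   ", model.score(Xte, yte))     # should be much higher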

9
The learning curve
  • Train on successively larger fractions of the data
  • Watch how accuracy (performance) changes
    (a sketch follows below)
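A minimal learning-curve sketch, assuming scikit-learn; the subsampling scheme (shuffle once, then take growing prefixes) is one reasonable choice, not the slides' prescription:

  # Learning curve: train on growing fractions, track test accuracy.
  import numpy as np
  from sklearn.datasets import load_iris
  from sklearn.model_selection import train_test_split
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_iris(return_X_y=True)
  Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

  order = np.random.default_rng(0).permutation(len(Xtr))  # shuffle once
  for pct in np.arange(0.1, 1.0, 0.1):
      idx = order[:max(2, int(pct * len(Xtr)))]  # prefix = subsample
      acc = DecisionTreeClassifier().fit(Xtr[idx], ytr[idx]).score(Xte, yte)
      print(f"{pct:.0%} of train data -> test acc {acc:.3f}")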

10
Measuring variance
  • Cross-validation helps you get a better estimate of
    accuracy for small data
  • Randomization (shuffling the data) helps guard
    against poor splits/orderings of the data
  • Learning curves help assess learning
    rate/asymptotic accuracy
  • Still one big missing component: variance
  • Definition: the variance of a classifier is the
    fraction of error due to the specific data set
    it's trained on

11
Measuring variance
  • Variance tells you how much you expect your
    classifier/performance to change when you train
    it on a new (but similar) data set
  • E.g., take 5 samplings of a data source;
    train/test 5 classifiers
  • Accuracies: 74.2, 90.3, 58.1, 80.6, 90.3
  • Mean accuracy: 78.7
  • Std dev of acc: 13.4 (reproduced in the sketch
    after this list)
  • Variance is usually a function of both the classifier
    and the data source
  • High-variance classifiers are very susceptible to
    small changes in the data
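Those two statistics can be checked with plain NumPy; ddof=1 gives the sample standard deviation, which is the one that matches 13.4:

  # Mean and sample std dev of the five accuracies quoted above.
  import numpy as np

  accs = np.array([74.2, 90.3, 58.1, 80.6, 90.3])
  print("mean acc:", accs.mean())       # 78.7
  print("std dev: ", accs.std(ddof=1))  # ~13.4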

12
Putting it all together
  • Suppose you want to measure the expected accuracy
    of your classifier, assess learning rate, and
    measure variance all at the same time?
    (A runnable sketch follows below.)

    for (i = 0; i < 10; ++i)                         // variance reps
        shuffle data
        do 10-way CV partition of data
        for each train/test partition                // xval
            for (pct = 0.1; pct <= 0.9; pct += 0.1)  // LC
                subsample pct fraction of training set
                train on subsample, test on test set
        avg across all folds of CV partition
        generate learning curve for this partition
    get mean and std across all curves
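A runnable sketch of the nested loops, assuming scikit-learn; the data set, learner, and fraction grid are stand-ins:

  # Variance reps x 10-fold CV x learning-curve fractions.
  import numpy as np
  from sklearn.datasets import load_iris
  from sklearn.model_selection import KFold
  from sklearn.tree import DecisionTreeClassifier

  X, y = load_iris(return_X_y=True)
  pcts = np.arange(0.1, 1.0, 0.1)
  curves = []                                # one learning curve per rep
  rng = np.random.default_rng(0)

  for rep in range(10):                      # variance reps
      perm = rng.permutation(len(X))         # shuffle data
      Xs, ys = X[perm], y[perm]
      fold_curves = []
      for tr, te in KFold(n_splits=10).split(Xs):   # xval
          accs = []
          for pct in pcts:                          # LC
              n = max(2, int(pct * len(tr)))        # subsample training fold
              model = DecisionTreeClassifier().fit(Xs[tr[:n]], ys[tr[:n]])
              accs.append(model.score(Xs[te], ys[te]))
          fold_curves.append(accs)
      curves.append(np.mean(fold_curves, axis=0))   # avg across folds
  curves = np.array(curves)
  print("mean curve:", curves.mean(axis=0))         # mean across reps
  print("std curve: ", curves.std(axis=0, ddof=1))  # variance across reps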

13
Putting it all together
(Figure: learning curves on the hepatitis data)
14
5 minutes of math...
  • Decision trees are non-metric
  • They don't know anything about relations between
    instances, except the sets induced by feature splits
  • Often, we have well-defined distances between
    points
  • Idea of distance encapsulated by a metric

15
5 minutes of math...
  • Definition: a metric is a function
    d : X × X → [0, ∞) that obeys the following
    properties:
  • Identity: d(x, y) = 0 if and only if x = y
  • Symmetry: d(x, y) = d(y, x)
  • Triangle inequality: d(x, z) ≤ d(x, y) + d(y, z)

16
5 minutes of math...
  • Examples
  • Euclidean distance: d(x, y) = sqrt( Σ_i (x_i − y_i)² )

Note: omitting the square root no longer satisfies the
triangle inequality, but it preserves the ordering of
distances, so it usually won't change our results
17
5 minutes of math...
  • Examples
  • Manhattan (taxicab) distance: d(x, y) = Σ_i |x_i − y_i|
  • Distance travelled along a grid between two
    points
  • No diagonals allowed (a NumPy sketch of both
    distances follows below)
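Both distances are one-liners in NumPy; a small sketch with two arbitrary points:

  # Euclidean vs. Manhattan distance between two feature vectors.
  import numpy as np

  x = np.array([1.0, 2.0, 3.0])
  y = np.array([4.0, 0.0, 3.0])
  euclidean = np.sqrt(np.sum((x - y) ** 2))  # sqrt of sum of squared diffs
  manhattan = np.sum(np.abs(x - y))          # sum of absolute diffs, no diagonals
  print(euclidean, manhattan)                # 3.605..., 5.0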

18
5 minutes of math...
  • Examples
  • What if some attribute is categorical?

19
5 minutes of math...
  • Examples
  • What if some attribute is categorical?
  • Typical answer is 0/1 distance
  • For each attribute, add 1 if the instances differ
    in that attribute, else 0 (a mixed-distance sketch
    follows below)
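A sketch combining the two kinds of attributes into one distance; the helper mixed_distance and the attribute layout are illustrative, not from the slides:

  # Manhattan on numeric attributes plus 0/1 distance on categorical ones.
  def mixed_distance(a, b, categorical):
      """a, b: attribute tuples; categorical: set of categorical positions."""
      d = 0.0
      for i, (ai, bi) in enumerate(zip(a, b)):
          if i in categorical:
              d += 0.0 if ai == bi else 1.0  # 0/1 distance on this attribute
          else:
              d += abs(ai - bi)              # numeric contribution
      return d

  # e.g., (height_cm, color): numeric attribute 0, categorical attribute 1
  print(mixed_distance((170, "red"), (165, "blue"), categorical={1}))  # 6.0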

20
Distances in classification
  • Nearest neighbor: find the nearest instance to
    the query point in feature space, return the
    class of that instance
  • Simplest possible distance-based classifier
  • With more notation: given training data
    {(x_i, y_i)}, i = 1..N, predict ŷ = y_{i*} where
    i* = argmin_i d(x_query, x_i)
    (a sketch follows below)
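A minimal 1-NN classifier matching that description, in plain Python with Euclidean distance assumed; the data are placeholders:

  # Return the label of the closest training instance to the query.
  import math

  def nn_classify(train, query):
      """train: list of (feature_vector, label); query: feature vector."""
      def dist(a, b):
          return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
      _, label = min(train, key=lambda xy: dist(xy[0], query))
      return label

  train = [((0.0, 0.0), "A"), ((1.0, 1.0), "A"), ((5.0, 5.0), "B")]
  print(nn_classify(train, (4.0, 4.5)))  # "B": nearest instance is (5, 5)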

24
NN miscellany
  • Slight generalization: k-nearest neighbors (k-NN)
  • Find the k training instances closest to the query point
  • Vote among them for the label
  • Q: How does this affect the system?
  • Gotcha: unscaled dimensions
  • What happens if one axis is measured in microns
    and one in light-years?
  • Usual trick is to scale each axis to a [-1, 1] range
    (a scaling + voting sketch follows below)
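A sketch of the scaling trick plus k-NN voting in plain NumPy; scale_to_unit, the toy data, and k are illustrative:

  # Rescale each feature to [-1, 1], then vote among the k closest instances.
  import numpy as np
  from collections import Counter

  def scale_to_unit(X):
      """Linearly map each column of X to the [-1, 1] range."""
      lo, hi = X.min(axis=0), X.max(axis=0)
      return 2 * (X - lo) / (hi - lo) - 1, (lo, hi)

  def knn_classify(Xtr, ytr, query, k=3):
      dists = np.sqrt(((Xtr - query) ** 2).sum(axis=1))  # Euclidean distances
      nearest = np.argsort(dists)[:k]                    # k closest instances
      return Counter(ytr[nearest]).most_common(1)[0][0]  # majority vote

  # microns vs. light-years: without scaling, axis 1 would dominate entirely
  X = np.array([[1.0, 1e17], [2.0, 2e17], [9.0, 9e17], [8.0, 8e17]])
  y = np.array(["A", "A", "B", "B"])
  Xs, (lo, hi) = scale_to_unit(X)
  q = 2 * (np.array([2.5, 3e17]) - lo) / (hi - lo) - 1  # scale query the same way
  print(knn_classify(Xs, y, q, k=3))                    # "A"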