Title: K-nearest neighbor methods
1. K-nearest neighbor methods
- William Cohen
- 10-601 April 2008
2. But first...
3. Onward: multivariate linear regression
[Figure: the data matrix for regression. Each row of X is an example and each column is a feature; univariate regression uses a single feature column, multivariate regression uses many.]
4. [Figure: the regression setup, with target vector Y and design matrix X.]
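The formulas on the image-only slides here did not survive extraction; as a presumed reconstruction, the standard closed-form least-squares solution for the multivariate setup pictured above:

```latex
% Rows of X are examples, columns are features; the weight vector
% minimizing squared error solves the normal equations:
\hat{w} = (X^{\top} X)^{-1} X^{\top} Y,
\qquad
\hat{Y} = X \hat{w}
```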
7. [Figure from ACM Computing Surveys, 2002.]
9. Review of K-NN methods (so far)
10. Kernel regression
- a.k.a. locally weighted regression, locally linear regression, LOESS, ...
- What does making the kernel wider do to bias and variance? (A sketch follows below.)
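A minimal sketch of kernel (Nadaraya-Watson) regression, illustrating the bias/variance question above; the toy data, bandwidths, and function names are illustrative, not from the slides:

```python
import numpy as np

def kernel_regression(x_train, y_train, x_query, bandwidth):
    """Nadaraya-Watson kernel regression with a Gaussian kernel.
    Every training point votes on the prediction at x_query,
    weighted by its closeness; `bandwidth` controls kernel width."""
    w = np.exp(-0.5 * ((x_train - x_query) / bandwidth) ** 2)
    return np.sum(w * y_train) / np.sum(w)

# Toy data: a noisy sine curve.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 6, 100))
y = np.sin(x) + rng.normal(scale=0.3, size=100)

# A narrow kernel tracks the data closely (low bias, high variance);
# a wide kernel smooths toward the global mean (high bias, low variance).
for h in (0.1, 0.5, 2.0):
    preds = [kernel_regression(x, y, xq, h) for xq in (1.0, 3.0, 5.0)]
    print(f"bandwidth={h}: {np.round(preds, 2)}")
```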
11. BellCore's MovieRecommender
- Participants sent email to videos@bellcore.com
- The system replied with a list of 500 movies to rate on a 1-10 scale (250 random, 250 popular); only a subset needed to be rated
- A new participant P sends in rated movies via email
- The system compares P's ratings to the ratings of (a random sample of) previous users
- The most similar users are used to predict scores for P's unrated movies (more later)
- The system returns recommendations in an email message
12. Suggested Videos for John A. Jamus
- Your must-see list with predicted ratings:
- 7.0 "Alien (1979)"
- 6.5 "Blade Runner"
- 6.2 "Close Encounters Of The Third Kind (1977)"
- Your video categories with average ratings
- 6.7 "Action/Adventure"
- 6.5 "Science Fiction/Fantasy"
- 6.3 "Children/Family"
- 6.0 "Mystery/Suspense"
- 5.9 "Comedy"
- 5.8 "Drama"
13. The viewing patterns of 243 viewers were consulted. Patterns of 7 viewers were found to be most similar. Correlation with target viewer:
- 0.59 viewer-130 (unlisted@merl.com)
- 0.55 bullert, jane r (bullert@cc.bellcore.com)
- 0.51 jan_arst (jan_arst@khdld.decnet.philips.nl)
- 0.46 Ken Cross (moose@denali.EE.CORNELL.EDU)
- 0.42 rskt (rskt@cc.bellcore.com)
- 0.41 kkgg (kkgg@Athena.MIT.EDU)
- 0.41 bnn (bnn@cc.bellcore.com)
- By category, their joint ratings recommend:
- Action/Adventure
- "Excalibur" 8.0, 4 viewers
- "Apocalypse Now" 7.2, 4 viewers
- "Platoon" 8.3, 3 viewers
- Science Fiction/Fantasy
- "Total Recall" 7.2, 5 viewers
- Children/Family
- "Wizard Of Oz, The" 8.5, 4 viewers
- "Mary Poppins" 7.7, 3 viewers
- Mystery/Suspense
- "Silence Of The Lambs, The" 9.3, 3 viewers
- Comedy
- "National Lampoon's Animal House" 7.5, 4 viewers
- "Driving Miss Daisy" 7.5, 4 viewers
- "Hannah and Her Sisters" 8.0, 3 viewers
- Drama
- "It's A Wonderful Life" 8.0, 5 viewers
- "Dead Poets Society" 7.0, 5 viewers
- "Rain Man" 7.5, 4 viewers
- Correlation of predicted ratings with your actual ratings is 0.64. This number measures the system's ability to evaluate movies accurately for you: 0.15 means low ability, 0.50 means fair ability, and 0.85 means very good ability.
14. Algorithms for Collaborative Filtering 1: Memory-Based Algorithms (Breese et al., UAI '98)
- $v_{i,j}$ = vote of user $i$ on item $j$
- $I_i$ = items for which user $i$ has voted
- Mean vote for user $i$: $\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j}$
- Predicted vote for the active user $a$ is the weighted sum $\hat{v}_{a,j} = \bar{v}_a + \kappa \sum_{i=1}^{n} w(a,i)\,(v_{i,j} - \bar{v}_i)$, where the $w(a,i)$ are the weights of the $n$ most similar users and $\kappa$ is a normalizer
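A sketch of the memory-based prediction rule above, using Pearson correlation over co-rated items as the weight w(a,i) (one of the weightings Breese et al. consider); the toy data and names are invented for illustration:

```python
import numpy as np

def predict_vote(votes, active, item):
    """Predicted vote = active user's mean vote plus a
    correlation-weighted sum of other users' deviations from
    their own means. `votes` is a dict: user -> {item: vote}."""
    def mean(u):
        return np.mean(list(votes[u].values()))

    num, denom = 0.0, 0.0
    for u in votes:
        if u == active or item not in votes[u]:
            continue
        # Pearson correlation over co-rated items, as the weight w(a, i).
        common = set(votes[active]) & set(votes[u])
        if len(common) < 2:
            continue
        a = np.array([votes[active][j] for j in common], dtype=float)
        b = np.array([votes[u][j] for j in common], dtype=float)
        if a.std() == 0 or b.std() == 0:
            continue
        w = np.corrcoef(a, b)[0, 1]
        num += w * (votes[u][item] - mean(u))
        denom += abs(w)  # kappa normalizer: 1 / sum of |weights|
    return mean(active) + (num / denom if denom else 0.0)

votes = {
    "a":  {"alien": 7, "blade_runner": 6, "oz": 8},
    "u1": {"alien": 8, "blade_runner": 7, "oz": 9, "platoon": 8},
    "u2": {"alien": 3, "blade_runner": 4, "oz": 2, "platoon": 5},
}
print(predict_vote(votes, "a", "platoon"))
```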
15. Basic k-nearest neighbor classification
- Training method:
- Save the training examples
- At prediction time:
- Find the k training examples (x1,y1),...,(xk,yk) that are closest to the test example x
- Predict the most frequent class among those yi's
- Example: http://cgm.cs.mcgill.ca/soss/cs644/projects/simard/
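A minimal sketch of the basic method just described; the toy data and function name are illustrative:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Basic k-NN: find the k closest training examples and
    return the most frequent class among their labels."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to each example
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = ["neg", "neg", "neg", "pos", "pos", "pos"]
print(knn_predict(X_train, y_train, np.array([4.5, 5.0])))  # -> "pos"
```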
16. What is the decision boundary?
- For 1-NN it is the Voronoi diagram of the training points: each cell is the region closer to one training example than to any other.
17. Convergence of 1-NN
[Figure: a test point x with nearest neighbor x1; the labels y and y1 are drawn from the posteriors P(Y|x) and P(Y|x1).]
- As the training set grows, the nearest neighbor x1 approaches x, so assume P(Y|x1) ≈ P(Y|x)
- Let $y^* = \arg\max_y \Pr(y \mid x)$, the Bayes-optimal prediction; the derivation below bounds the asymptotic 1-NN error by twice the Bayes error
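The bound written out (a Cover and Hart style argument, reconstructed here since the slide's equations were images):

```latex
% The query's label y and the neighbor's label y_1 are independent
% draws, and P(Y|x_1) -> P(Y|x) as the neighbor approaches x:
\Pr(\text{1-NN errs at } x)
  = \sum_{y} \Pr(y \mid x)\bigl(1 - \Pr(y \mid x_1)\bigr)
  \approx \sum_{y} \Pr(y \mid x)\bigl(1 - \Pr(y \mid x)\bigr)
% Bound this via the largest posterior, p^* = Pr(y^* | x):
  = 1 - \sum_{y} \Pr(y \mid x)^{2}
  \le 1 - \Pr(y^{*} \mid x)^{2}
  = \bigl(1 - \Pr(y^{*} \mid x)\bigr)\bigl(1 + \Pr(y^{*} \mid x)\bigr)
  \le 2\bigl(1 - \Pr(y^{*} \mid x)\bigr)
% The right-hand side is twice the Bayes error, so asymptotically
% 1-NN errs at most twice as often as the optimal classifier.
```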
18. Basic k-nearest neighbor classification
- Training method:
- Save the training examples
- At prediction time:
- Find the k training examples (x1,y1),...,(xk,yk) that are closest to the test example x
- Predict the most frequent class among those yi's
- Improvements:
- Weighting examples from the neighborhood
- Measuring closeness
- Finding close examples in a large training set quickly
19. K-NN and irrelevant features
[Figure: training points "o" and a query point "?" along one relevant dimension.]
20. K-NN and irrelevant features
[Figure: the same points spread along an added irrelevant dimension; the query's nearest neighbors change.]
21. K-NN and irrelevant features
[Figure continued.]
22. Ways of rescaling for KNN
- Normalized L1 distance
- Scale each feature by its information gain (IG)
- Modified value difference metric (MVDM)
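The slide's formulas were images; plausible standard forms for the first two rescalings (an assumption, not a transcription):

```latex
% Normalized L1 distance: per-feature differences scaled by the
% feature's observed range, so no one feature dominates by units:
d_{L_1}(x, x') = \sum_i \frac{\lvert x_i - x'_i \rvert}{\max_i - \min_i}
% Scaling by information gain: features more predictive of the
% class get more influence on the distance:
d_{IG}(x, x') = \sum_i IG(i)\,\lvert x_i - x'_i \rvert
```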
23. Ways of rescaling for KNN
- Dot product: $x \cdot x' = \sum_i x_i x'_i$
- Cosine distance: $\cos(x, x') = \frac{x \cdot x'}{\|x\|\,\|x'\|}$
- TFIDF weights for text: for doc $j$ and feature $i$, $x_i = \mathrm{tf}_{i,j} \cdot \mathrm{idf}_i$, where $\mathrm{tf}_{i,j}$ is the number of occurrences of term $i$ in doc $j$ and $\mathrm{idf}_i = \log\left(\frac{\#\,\text{docs in corpus}}{\#\,\text{docs in corpus that contain term } i}\right)$
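A small sketch of TFIDF weighting and cosine similarity as defined above; the toy corpus is invented:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TFIDF as on the slide: x_i = tf_{i,j} * idf_i, with
    idf_i = log(#docs / #docs containing term i)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = lambda w: math.sqrt(sum(x * x for x in w.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

docs = [["alien", "space", "horror"],
        ["space", "robot", "blade"],
        ["family", "oz", "wizard"]]
vecs = tfidf_vectors(docs)
print(round(cosine(vecs[0], vecs[1]), 3))  # docs 0 and 1 share "space"
```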
24. Combining distances to neighbors
- Standard KNN: each of the k nearest neighbors gets one vote; predict $\hat{y} = \arg\max_y \lvert\{i : y_i = y\}\rvert$
- Distance-weighted KNN: each neighbor votes with a weight that grows with closeness, e.g. $w_i = 1 / d(x, x_i)^2$, and we predict $\hat{y} = \arg\max_y \sum_{i : y_i = y} w_i$
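A sketch of the distance-weighted vote, using the 1/d^2 weighting named above as an example choice:

```python
import numpy as np
from collections import defaultdict

def weighted_knn_predict(X_train, y_train, x, k=3):
    """Distance-weighted k-NN: each of the k neighbors votes with
    weight 1/d^2 instead of counting equally."""
    dists = np.linalg.norm(X_train - x, axis=1)
    votes = defaultdict(float)
    for i in np.argsort(dists)[:k]:
        votes[y_train[i]] += 1.0 / (dists[i] ** 2 + 1e-9)  # avoid divide-by-zero
    return max(votes, key=votes.get)
```

Compared with the unweighted vote, near neighbors now dominate far ones, which softens the sensitivity to the exact choice of k.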
27. William W. Cohen and Haym Hirsh (1998), "Joins that Generalize: Text Classification Using WHIRL," in KDD 1998, pp. 169-173.
30. [Figure: models M1 and M2.]
Vitor Carvalho and William W. Cohen (2008), "Ranking Users for Intelligent Message Addressing," in ECIR 2008; and current work with Vitor, me, and Ramnath Balasubramanyan.
31. Computing KNN: pros and cons
- Storage: all training examples are saved in memory
- A decision tree or linear classifier is much smaller
- Time: to classify x, you need to loop over all training examples (x',y') to compute the distance between x and x'
- However, you get predictions for every class y
- KNN is nice when there are many, many classes
- Actually, there are some tricks to speed this up, especially when the data is sparse (e.g., text)
32. Efficiently implementing KNN (for text)
- IDF is nice computationally (see the sketch below)
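One way to exploit sparsity, presumably in the spirit of this slide: an inverted index, so that scoring a query touches only the documents sharing its terms rather than looping over the whole training set. The structure and names here are an assumption, not the slide's own code:

```python
from collections import defaultdict

def build_index(docs):
    """Inverted index: term -> list of (doc_id, weight).
    With sparse text, a query only touches postings for its own terms."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for term, w in vec.items():
            index[term].append((doc_id, w))
    return index

def knn_scores(index, query_vec):
    """Accumulate dot products query . doc over shared terms only."""
    scores = defaultdict(float)
    for term, qw in query_vec.items():
        for doc_id, dw in index.get(term, ()):
            scores[doc_id] += qw * dw
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Usage with TFIDF vectors like those sketched earlier (assumed precomputed):
docs = [{"alien": 1.1, "space": 0.4}, {"space": 0.4, "robot": 1.1}]
print(knn_scores(build_index(docs), {"space": 0.4}))
```

High-IDF (rare) terms have short posting lists, which is exactly why IDF weighting is "nice computationally."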
33. Tricks with fast KNN
- K-means using r-NN:
  1. Pick k points c1 = x1, ..., ck = xk as centers
  2. For each ci, find Di = Neighborhood(ci)
  3. For each ci, let ci = mean(Di)
  4. Go to step 2
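A runnable sketch of the recipe above as reconstructed, using a radius-based (r-NN) neighborhood query for step 2; the radius and toy data are illustrative:

```python
import numpy as np

def kmeans_rnn(X, k, radius, iters=10, seed=0):
    """K-means in the style of the slide: repeatedly replace each
    center by the mean of its r-neighborhood (points within `radius`)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # step 1: pick k points
    for _ in range(iters):                                  # step 4: repeat
        for i in range(k):
            # step 2: D_i = Neighborhood(c_i), an r-NN query
            D = X[np.linalg.norm(X - centers[i], axis=1) <= radius]
            if len(D):
                centers[i] = D.mean(axis=0)                 # step 3: c_i = mean(D_i)
    return centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, (20, 2)) for m in (0, 5)])  # two toy clusters
print(kmeans_rnn(X, k=2, radius=1.5))
```

The point of the trick is that step 2 is exactly the kind of neighborhood query a fast KNN data structure accelerates.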
34. Efficiently implementing KNN
- Selective classification: given a training set and a test set, find the N test cases that you can most confidently classify
[Figure: a query document surrounded by neighboring documents dj2, dj3, dj4.]
35. Train once and select 100 test cases to classify (a sketch follows)
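A sketch of selective classification: score each test case by how lopsided its k-NN vote is and keep the N most confident. The margin measure is an assumed choice; the slides do not specify one:

```python
import numpy as np

def most_confident(X_train, y_train, X_test, n, k=5):
    """Rank test cases by the agreement of their k nearest neighbors
    and return the n most confidently classified, with predictions."""
    results = []
    for idx, x in enumerate(X_test):
        dists = np.linalg.norm(X_train - x, axis=1)
        labels = [y_train[i] for i in np.argsort(dists)[:k]]
        top = max(set(labels), key=labels.count)
        margin = labels.count(top) / k      # fraction of neighbors agreeing
        results.append((margin, idx, top))
    results.sort(reverse=True)              # most confident first
    return results[:n]                      # (confidence, test index, label)
```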