Title: K-nearest neighbor methods
1. K-nearest neighbor methods
- William Cohen
- 10-601 April 2008
2. But first...
3. Onward: multivariate linear regression
[Figure: the data matrix for regression. Each row of X is an example and each column is a feature; univariate regression uses a single feature column, multivariate regression uses many.]
4. [Figure: the regression setup, with target vector Y and design matrix X.]
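The formulas on the image-only slides here did not survive extraction; as a presumed reconstruction, the standard closed-form least-squares solution for the multivariate setup pictured above:

```latex
% Rows of X are examples, columns are features; the weight vector
% minimizing squared error solves the normal equations:
\hat{w} = (X^{\top} X)^{-1} X^{\top} Y,
\qquad
\hat{Y} = X \hat{w}
```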
7. [Figure from ACM Computing Surveys, 2002.]
9. Review of K-NN methods (so far)
10. Kernel regression
- a.k.a. locally weighted regression, locally linear regression, LOESS, ...
- What does making the kernel wider do to bias and variance? (A sketch follows below.)
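A minimal sketch of kernel (Nadaraya-Watson) regression, illustrating the bias/variance question above; the toy data, bandwidths, and function names are illustrative, not from the slides:

```python
import numpy as np

def kernel_regression(x_train, y_train, x_query, bandwidth):
    """Nadaraya-Watson kernel regression with a Gaussian kernel.
    Every training point votes on the prediction at x_query,
    weighted by its closeness; `bandwidth` controls kernel width."""
    w = np.exp(-0.5 * ((x_train - x_query) / bandwidth) ** 2)
    return np.sum(w * y_train) / np.sum(w)

# Toy data: a noisy sine curve.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 6, 100))
y = np.sin(x) + rng.normal(scale=0.3, size=100)

# A narrow kernel tracks the data closely (low bias, high variance);
# a wide kernel smooths toward the global mean (high bias, low variance).
for h in (0.1, 0.5, 2.0):
    preds = [kernel_regression(x, y, xq, h) for xq in (1.0, 3.0, 5.0)]
    print(f"bandwidth={h}: {np.round(preds, 2)}")
```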
11. BellCore's MovieRecommender
- Participants sent email to videos@bellcore.com
- The system replied with a list of 500 movies to rate on a 1-10 scale (250 random, 250 popular); only a subset needed to be rated
- A new participant P sends in rated movies via email
- The system compares P's ratings to the ratings of (a random sample of) previous users
- The most similar users are used to predict scores for P's unrated movies (more later)
- The system returns recommendations in an email message
12. Suggested Videos for John A. Jamus
- Your must-see list with predicted ratings:
- 7.0 "Alien (1979)"
- 6.5 "Blade Runner"
- 6.2 "Close Encounters Of The Third Kind (1977)"
- Your video categories with average ratings
- 6.7 "Action/Adventure"
- 6.5 "Science Fiction/Fantasy"
- 6.3 "Children/Family"
- 6.0 "Mystery/Suspense"
- 5.9 "Comedy"
- 5.8 "Drama"
13. The viewing patterns of 243 viewers were consulted. Patterns of 7 viewers were found to be most similar. Correlation with target viewer:
- 0.59 viewer-130 (unlisted@merl.com)
- 0.55 bullert, jane r (bullert@cc.bellcore.com)
- 0.51 jan_arst (jan_arst@khdld.decnet.philips.nl)
- 0.46 Ken Cross (moose@denali.EE.CORNELL.EDU)
- 0.42 rskt (rskt@cc.bellcore.com)
- 0.41 kkgg (kkgg@Athena.MIT.EDU)
- 0.41 bnn (bnn@cc.bellcore.com)
- By category, their joint ratings recommend:
- Action/Adventure
- "Excalibur" 8.0, 4 viewers
- "Apocalypse Now" 7.2, 4 viewers
- "Platoon" 8.3, 3 viewers
- Science Fiction/Fantasy
- "Total Recall" 7.2, 5 viewers
- Children/Family
- "Wizard Of Oz, The" 8.5, 4 viewers
- "Mary Poppins" 7.7, 3 viewers
- Mystery/Suspense
- "Silence Of The Lambs, The" 9.3, 3 viewers
- Comedy
- "National Lampoon's Animal House" 7.5, 4 viewers
- "Driving Miss Daisy" 7.5, 4 viewers
- "Hannah and Her Sisters" 8.0, 3 viewers
- Drama
- "It's A Wonderful Life" 8.0, 5 viewers
- "Dead Poets Society" 7.0, 5 viewers
- "Rain Man" 7.5, 4 viewers
- Correlation of predicted ratings with your actual ratings is 0.64. This number measures the system's ability to evaluate movies accurately for you: 0.15 means low ability, 0.50 means fair ability, and 0.85 means very good ability.
14. Algorithms for Collaborative Filtering 1: Memory-Based Algorithms (Breese et al., UAI '98)
- $v_{i,j}$ = vote of user $i$ on item $j$
- $I_i$ = items for which user $i$ has voted
- Mean vote for user $i$: $\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j}$
- Predicted vote for the active user $a$ is the weighted sum $\hat{v}_{a,j} = \bar{v}_a + \kappa \sum_{i=1}^{n} w(a,i)\,(v_{i,j} - \bar{v}_i)$, where the $w(a,i)$ are the weights of the $n$ most similar users and $\kappa$ is a normalizer
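A sketch of the memory-based prediction rule above, using Pearson correlation over co-rated items as the weight w(a,i) (one of the weightings Breese et al. consider); the toy data and names are invented for illustration:

```python
import numpy as np

def predict_vote(votes, active, item):
    """Predicted vote = active user's mean vote plus a
    correlation-weighted sum of other users' deviations from
    their own means. `votes` is a dict: user -> {item: vote}."""
    def mean(u):
        return np.mean(list(votes[u].values()))

    num, denom = 0.0, 0.0
    for u in votes:
        if u == active or item not in votes[u]:
            continue
        # Pearson correlation over co-rated items, as the weight w(a, i).
        common = set(votes[active]) & set(votes[u])
        if len(common) < 2:
            continue
        a = np.array([votes[active][j] for j in common], dtype=float)
        b = np.array([votes[u][j] for j in common], dtype=float)
        if a.std() == 0 or b.std() == 0:
            continue
        w = np.corrcoef(a, b)[0, 1]
        num += w * (votes[u][item] - mean(u))
        denom += abs(w)  # kappa normalizer: 1 / sum of |weights|
    return mean(active) + (num / denom if denom else 0.0)

votes = {
    "a":  {"alien": 7, "blade_runner": 6, "oz": 8},
    "u1": {"alien": 8, "blade_runner": 7, "oz": 9, "platoon": 8},
    "u2": {"alien": 3, "blade_runner": 4, "oz": 2, "platoon": 5},
}
print(predict_vote(votes, "a", "platoon"))
```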
15. Basic k-nearest neighbor classification
- Training method:
- Save the training examples
- At prediction time:
- Find the k training examples (x1,y1),...,(xk,yk) that are closest to the test example x
- Predict the most frequent class among those yi's
- Example: http://cgm.cs.mcgill.ca/soss/cs644/projects/simard/
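A minimal sketch of the basic method just described; the toy data and function name are illustrative:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Basic k-NN: find the k closest training examples and
    return the most frequent class among their labels."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to each example
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = ["neg", "neg", "neg", "pos", "pos", "pos"]
print(knn_predict(X_train, y_train, np.array([4.5, 5.0])))  # -> "pos"
```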
16. What is the decision boundary?
- For 1-NN it is the Voronoi diagram of the training points: each cell is the region closer to one training example than to any other.
17. Convergence of 1-NN
[Figure: a test point x with nearest neighbor x1; the labels y and y1 are drawn from the posteriors P(Y|x) and P(Y|x1).]
- As the training set grows, the nearest neighbor x1 approaches x, so assume P(Y|x1) ≈ P(Y|x)
- Let $y^* = \arg\max_y \Pr(y \mid x)$, the Bayes-optimal prediction; the derivation below bounds the asymptotic 1-NN error by twice the Bayes error
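The bound written out (a Cover and Hart style argument, reconstructed here since the slide's equations were images):

```latex
% The query's label y and the neighbor's label y_1 are independent
% draws, and P(Y|x_1) -> P(Y|x) as the neighbor approaches x:
\Pr(\text{1-NN errs at } x)
  = \sum_{y} \Pr(y \mid x)\bigl(1 - \Pr(y \mid x_1)\bigr)
  \approx \sum_{y} \Pr(y \mid x)\bigl(1 - \Pr(y \mid x)\bigr)
% Bound this via the largest posterior, p^* = Pr(y^* | x):
  = 1 - \sum_{y} \Pr(y \mid x)^{2}
  \le 1 - \Pr(y^{*} \mid x)^{2}
  = \bigl(1 - \Pr(y^{*} \mid x)\bigr)\bigl(1 + \Pr(y^{*} \mid x)\bigr)
  \le 2\bigl(1 - \Pr(y^{*} \mid x)\bigr)
% The right-hand side is twice the Bayes error, so asymptotically
% 1-NN errs at most twice as often as the optimal classifier.
```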
18. Basic k-nearest neighbor classification
- Training method:
- Save the training examples
- At prediction time:
- Find the k training examples (x1,y1),...,(xk,yk) that are closest to the test example x
- Predict the most frequent class among those yi's
- Improvements:
- Weighting examples from the neighborhood
- Measuring closeness
- Finding close examples in a large training set quickly
19. K-NN and irrelevant features
[Figure: training points "o" and a query point "?" along one relevant dimension.]
20. K-NN and irrelevant features
[Figure: the same points spread along an added irrelevant dimension; the query's nearest neighbors change.]
21. K-NN and irrelevant features
[Figure continued.]
22. Ways of rescaling for KNN
- Normalized L1 distance
- Scale each feature by its information gain (IG)
- Modified value difference metric (MVDM)
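The slide's formulas were images; plausible standard forms for the first two rescalings (an assumption, not a transcription):

```latex
% Normalized L1 distance: per-feature differences scaled by the
% feature's observed range, so no one feature dominates by units:
d_{L_1}(x, x') = \sum_i \frac{\lvert x_i - x'_i \rvert}{\max_i - \min_i}
% Scaling by information gain: features more predictive of the
% class get more influence on the distance:
d_{IG}(x, x') = \sum_i IG(i)\,\lvert x_i - x'_i \rvert
```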
23. Ways of rescaling for KNN
- Dot product: $x \cdot x' = \sum_i x_i x'_i$
- Cosine distance: $\cos(x, x') = \frac{x \cdot x'}{\|x\|\,\|x'\|}$
- TFIDF weights for text: for doc $j$ and feature $i$, $x_i = \mathrm{tf}_{i,j} \cdot \mathrm{idf}_i$, where $\mathrm{tf}_{i,j}$ is the number of occurrences of term $i$ in doc $j$ and $\mathrm{idf}_i = \log\left(\frac{\#\,\text{docs in corpus}}{\#\,\text{docs in corpus that contain term } i}\right)$
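A small sketch of TFIDF weighting and cosine similarity as defined above; the toy corpus is invented:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TFIDF as on the slide: x_i = tf_{i,j} * idf_i, with
    idf_i = log(#docs / #docs containing term i)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm = lambda w: math.sqrt(sum(x * x for x in w.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

docs = [["alien", "space", "horror"],
        ["space", "robot", "blade"],
        ["family", "oz", "wizard"]]
vecs = tfidf_vectors(docs)
print(round(cosine(vecs[0], vecs[1]), 3))  # docs 0 and 1 share "space"
```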
24. Combining distances to neighbors
- Standard KNN: each of the k nearest neighbors gets one vote; predict $\hat{y} = \arg\max_y \lvert\{i : y_i = y\}\rvert$
- Distance-weighted KNN: each neighbor votes with a weight that grows with closeness, e.g. $w_i = 1 / d(x, x_i)^2$, and we predict $\hat{y} = \arg\max_y \sum_{i : y_i = y} w_i$
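A sketch of the distance-weighted vote, using the 1/d^2 weighting named above as an example choice:

```python
import numpy as np
from collections import defaultdict

def weighted_knn_predict(X_train, y_train, x, k=3):
    """Distance-weighted k-NN: each of the k neighbors votes with
    weight 1/d^2 instead of counting equally."""
    dists = np.linalg.norm(X_train - x, axis=1)
    votes = defaultdict(float)
    for i in np.argsort(dists)[:k]:
        votes[y_train[i]] += 1.0 / (dists[i] ** 2 + 1e-9)  # avoid divide-by-zero
    return max(votes, key=votes.get)
```

Compared with the unweighted vote, near neighbors now dominate far ones, which softens the sensitivity to the exact choice of k.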
27. William W. Cohen and Haym Hirsh (1998), "Joins that Generalize: Text Classification Using WHIRL," in KDD 1998, pp. 169-173.
30. [Figure: models M1 and M2.]
Vitor Carvalho and William W. Cohen (2008), "Ranking Users for Intelligent Message Addressing," in ECIR 2008; and current work with Vitor, me, and Ramnath Balasubramanyan.
31. Computing KNN: pros and cons
- Storage: all training examples are saved in memory
- A decision tree or linear classifier is much smaller
- Time: to classify x, you need to loop over all training examples (x',y') to compute the distance between x and x'
- However, you get predictions for every class y
- KNN is nice when there are many, many classes
- Actually, there are some tricks to speed this up, especially when the data is sparse (e.g., text)
32. Efficiently implementing KNN (for text)
- IDF is nice computationally (see the sketch below)
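One way to exploit sparsity, presumably in the spirit of this slide: an inverted index, so that scoring a query touches only the documents sharing its terms rather than looping over the whole training set. The structure and names here are an assumption, not the slide's own code:

```python
from collections import defaultdict

def build_index(docs):
    """Inverted index: term -> list of (doc_id, weight).
    With sparse text, a query only touches postings for its own terms."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for term, w in vec.items():
            index[term].append((doc_id, w))
    return index

def knn_scores(index, query_vec):
    """Accumulate dot products query . doc over shared terms only."""
    scores = defaultdict(float)
    for term, qw in query_vec.items():
        for doc_id, dw in index.get(term, ()):
            scores[doc_id] += qw * dw
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Usage with TFIDF vectors like those sketched earlier (assumed precomputed):
docs = [{"alien": 1.1, "space": 0.4}, {"space": 0.4, "robot": 1.1}]
print(knn_scores(build_index(docs), {"space": 0.4}))
```

High-IDF (rare) terms have short posting lists, which is exactly why IDF weighting is "nice computationally."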
33. Tricks with fast KNN
- K-means using r-NN:
  1. Pick k points c1 = x1, ..., ck = xk as centers
  2. For each ci, find Di = Neighborhood(ci)
  3. For each ci, let ci = mean(Di)
  4. Go to step 2
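A runnable sketch of the recipe above as reconstructed, using a radius-based (r-NN) neighborhood query for step 2; the radius and toy data are illustrative:

```python
import numpy as np

def kmeans_rnn(X, k, radius, iters=10, seed=0):
    """K-means in the style of the slide: repeatedly replace each
    center by the mean of its r-neighborhood (points within `radius`)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # step 1: pick k points
    for _ in range(iters):                                  # step 4: repeat
        for i in range(k):
            # step 2: D_i = Neighborhood(c_i), an r-NN query
            D = X[np.linalg.norm(X - centers[i], axis=1) <= radius]
            if len(D):
                centers[i] = D.mean(axis=0)                 # step 3: c_i = mean(D_i)
    return centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, (20, 2)) for m in (0, 5)])  # two toy clusters
print(kmeans_rnn(X, k=2, radius=1.5))
```

The point of the trick is that step 2 is exactly the kind of neighborhood query a fast KNN data structure accelerates.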
34. Efficiently implementing KNN
- Selective classification: given a training set and a test set, find the N test cases that you can most confidently classify
[Figure: a query document surrounded by neighboring documents dj2, dj3, dj4.]
35. Train once and select 100 test cases to classify (a sketch follows)
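A sketch of selective classification: score each test case by how lopsided its k-NN vote is and keep the N most confident. The margin measure is an assumed choice; the slides do not specify one:

```python
import numpy as np

def most_confident(X_train, y_train, X_test, n, k=5):
    """Rank test cases by the agreement of their k nearest neighbors
    and return the n most confidently classified, with predictions."""
    results = []
    for idx, x in enumerate(X_test):
        dists = np.linalg.norm(X_train - x, axis=1)
        labels = [y_train[i] for i in np.argsort(dists)[:k]]
        top = max(set(labels), key=labels.count)
        margin = labels.count(top) / k      # fraction of neighbors agreeing
        results.append((margin, idx, top))
    results.sort(reverse=True)              # most confident first
    return results[:n]                      # (confidence, test index, label)
```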