Transcript and Presenter's Notes

Title: K-nearest neighbor methods


1
K-nearest neighbor methods
  • William Cohen
  • 10-601 April 2008

2
But first…
3
Onward: multivariate linear regression
(Figure: regression data matrices; each column is a feature and each row is an
example, shown for the multivariate and univariate cases.)
4
(Figure: Y plotted against X.)
5
(No Transcript)
6
(No Transcript)
7
ACM Computing Surveys 2002
8
(No Transcript)
9
Review of K-NN methods (so far)
10
Kernel regression
  • aka locally weighted regression, locally linear
    regression, LOESS, etc.

What does making the kernel wider do to bias and
variance?
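A minimal sketch of kernel regression (the Nadaraya-Watson form), assuming a Gaussian kernel; the bandwidth h is the kernel width, so widening it smooths more aggressively: higher bias, lower variance. The function and variable names are illustrative, not from the slides.

```python
import numpy as np

def kernel_regression(x_train, y_train, x_query, h=1.0):
    """Nadaraya-Watson estimate: kernel-weighted average of training targets."""
    # Gaussian kernel weights: nearby training points count more.
    w = np.exp(-((x_train - x_query) ** 2) / (2 * h ** 2))
    return np.sum(w * y_train) / np.sum(w)

# Wider kernel (larger h) -> smoother fit: higher bias, lower variance.
x = np.linspace(0, 10, 50)
y = np.sin(x) + 0.3 * np.random.randn(50)
print(kernel_regression(x, y, x_query=5.0, h=0.5))
print(kernel_regression(x, y, x_query=5.0, h=5.0))
```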
11
BellCore's MovieRecommender
  • Participants sent email to videos@bellcore.com
  • System replied with a list of 500 movies to rate
    on a 1-10 scale (250 random, 250 popular)
  • Only a subset needed to be rated
  • New participant P sends in rated movies via email
  • System compares ratings for P to ratings of (a
    random sample of) previous users
  • Most similar users are used to predict scores for
    unrated movies (more later)
  • System returns recommendations in an email
    message.

12
  • Suggested Videos for: John A. Jamus.
  • Your must-see list with predicted ratings:
  • 7.0 "Alien (1979)"
  • 6.5 "Blade Runner"
  • 6.2 "Close Encounters Of The Third Kind (1977)"
  • Your video categories with average ratings:
  • 6.7 "Action/Adventure"
  • 6.5 "Science Fiction/Fantasy"
  • 6.3 "Children/Family"
  • 6.0 "Mystery/Suspense"
  • 5.9 "Comedy"
  • 5.8 "Drama"

13
  • The viewing patterns of 243 viewers were
    consulted. Patterns of 7 viewers were found to be
    most similar. Correlation with target viewer:
  • 0.59 viewer-130 (unlisted@merl.com)
  • 0.55 bullert,jane r (bullert@cc.bellcore.com)
  • 0.51 jan_arst (jan_arst@khdld.decnet.philips.nl)
  • 0.46 Ken Cross (moose@denali.EE.CORNELL.EDU)
  • 0.42 rskt (rskt@cc.bellcore.com)
  • 0.41 kkgg (kkgg@Athena.MIT.EDU)
  • 0.41 bnn (bnn@cc.bellcore.com)
  • By category, their joint ratings recommend:
  • Action/Adventure
  • "Excalibur" 8.0, 4 viewers
  • "Apocalypse Now" 7.2, 4 viewers
  • "Platoon" 8.3, 3 viewers
  • Science Fiction/Fantasy
  • "Total Recall" 7.2, 5 viewers
  • Children/Family
  • "Wizard Of Oz, The" 8.5, 4 viewers
  • "Mary Poppins" 7.7, 3 viewers
  • Mystery/Suspense
  • "Silence Of The Lambs, The" 9.3, 3 viewers
  • Comedy
  • "National Lampoon's Animal House" 7.5, 4 viewers
  • "Driving Miss Daisy" 7.5, 4 viewers
  • "Hannah and Her Sisters" 8.0, 3 viewers
  • Drama
  • "It's A Wonderful Life" 8.0, 5 viewers
  • "Dead Poets Society" 7.0, 5 viewers
  • "Rain Man" 7.5, 4 viewers
  • Correlation of predicted ratings with your actual
    ratings is 0.64. This number measures the ability
    to evaluate movies accurately for you: 0.15 means
    low ability, 0.85 means very good ability, and
    0.50 means fair ability.

14
Algorithms for Collaborative Filtering 1:
Memory-Based Algorithms (Breese et al., UAI-98)
  • $v_{i,j}$ = vote of user i on item j
  • $I_i$ = set of items for which user i has voted
  • Mean vote for user i is $\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j}$
  • Predicted vote for the active user a is the weighted sum
    $\hat{v}_{a,j} = \bar{v}_a + \kappa \sum_{i=1}^{n} w(a,i)\,(v_{i,j} - \bar{v}_i)$,
    where the $w(a,i)$ are the weights of the n most similar users and
    $\kappa$ is a normalizer (a sketch follows below).
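Below is a sketch of this predicted-vote formula, assuming Pearson correlation over commonly voted items as the weight w(a,i) (one of the weightings considered by Breese et al.) and kappa = 1 / sum of |w|; the array layout and names are my own.

```python
import numpy as np

def predict_vote(votes, a, j):
    """Predicted vote for active user a on item j.
    votes: 2-D array, votes[i, k] = vote of user i on item k (np.nan if unvoted)."""
    means = np.nanmean(votes, axis=1)          # mean vote of each user
    num, denom = 0.0, 0.0
    for i in range(votes.shape[0]):
        if i == a or np.isnan(votes[i, j]):
            continue
        # Pearson correlation over items both users voted on.
        both = ~np.isnan(votes[a]) & ~np.isnan(votes[i])
        if both.sum() < 2:
            continue
        w = np.corrcoef(votes[a, both], votes[i, both])[0, 1]
        if np.isnan(w):
            continue
        num += w * (votes[i, j] - means[i])    # weighted deviation from i's mean
        denom += abs(w)                        # normalizer: kappa = 1 / sum |w|
    return means[a] + num / denom if denom else means[a]
```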
15
Basic k-nearest neighbor classification
  • Training method
  • Save the training examples
  • At prediction time
  • Find the k training examples (x1,y1), …, (xk,yk)
    that are closest to the test example x
  • Predict the most frequent class among those yi's.
  • Example: http://cgm.cs.mcgill.ca/soss/cs644/projects/simard/
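A minimal sketch of this procedure, assuming Euclidean distance and a plain majority vote:

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x, k=3):
    """Predict the most frequent class among the k nearest training examples."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every saved example
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]
```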

16
What is the decision boundary?
Voronoi diagram
17
Convergence of 1-NN
(Figure: a test point x with true label y drawn from P(Y|x), and its nearest
neighbor x1 with label y1 drawn from P(Y|x1).)
As the training set grows, the nearest neighbor x1 approaches x, so assume
P(Y|x1) and P(Y|x) are (approximately) equal, and let y* = argmax_y Pr(y|x).
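Filling in the step the slide gestures at: with y and y1 drawn (approximately) independently from the same distribution p_y = Pr(y|x), the asymptotic 1-NN error is at most twice the Bayes-optimal error (the classic Cover & Hart result):

```latex
\[
\mathrm{err}_{1\mathrm{NN}}(x) \;=\; \Pr(y \neq y_1)
\;\approx\; 1 - \sum_{y} p_y^2
\;\le\; 1 - p_{y^*}^2
\;=\; (1 - p_{y^*})(1 + p_{y^*})
\;\le\; 2\,(1 - p_{y^*})
\;=\; 2\,\mathrm{err}_{\mathrm{Bayes}}(x).
\]
```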
18
Basic k-nearest neighbor classification
  • Training method
  • Save the training examples
  • At prediction time
  • Find the k training examples (x1,y1), …, (xk,yk)
    that are closest to the test example x
  • Predict the most frequent class among those yi's.
  • Improvements
  • Weighting examples from the neighborhood
  • Measuring closeness
  • Finding close examples in a large training set
    quickly

19
K-NN and irrelevant features
(Figure: two classes of training examples and a query point "?"; using the one
relevant feature, the nearest neighbors of "?" all share a single class.)
20
K-NN and irrelevant features

(Figure: the same examples spread along an added irrelevant feature; in two
dimensions the nearest neighbors of "?" now mix both classes.)
21
K-NN and irrelevant features
(No Transcript)
22
Ways of rescaling for KNN
Normalized L1 distance
Scale features by information gain (IG)
Modified Value Difference Metric (MVDM)
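A sketch of the first two rescalings, assuming numeric features with known ranges for the normalized L1 distance and binary features for information-gain weighting; the function names are mine, not from the slides.

```python
import numpy as np

def normalized_l1(x, z, lo, hi):
    """L1 distance with each feature scaled by its observed range [lo, hi]."""
    return np.sum(np.abs(x - z) / (hi - lo))

def information_gain(X, y):
    """IG of each binary feature with the class; usable as per-feature weights."""
    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))
    H = entropy(y)
    gains = np.empty(X.shape[1])
    for i in range(X.shape[1]):
        on = X[:, i] == 1
        if on.all() or not on.any():
            gains[i] = 0.0                     # constant feature: no information
            continue
        p_on = on.mean()
        gains[i] = H - p_on * entropy(y[on]) - (1 - p_on) * entropy(y[~on])
    return gains
```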
23
Ways of rescaling for KNN
Dot product
Cosine distance
TF-IDF weights for text: for doc j, feature i,
$x_i = \mathrm{tf}_{i,j} \cdot \mathrm{idf}_i$, where $\mathrm{tf}_{i,j}$ is the
number of occurrences of term i in doc j, and $\mathrm{idf}_i$ grows with the
number of docs in the corpus and shrinks with the number of docs in the corpus
that contain term i.
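A sketch of this weighting with cosine distance over sparse term-weight dicts; the exact idf variant on the slide is not recoverable, so this assumes the common idf_i = log(N / n_i) form, with N docs in the corpus and n_i docs containing term i.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: tf*idf} dict per doc."""
    N = len(docs)
    df = Counter(t for doc in docs for t in set(doc))   # docs containing each term
    idf = {t: math.log(N / n) for t, n in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]

def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return 1.0 - dot / (nu * nv) if nu and nv else 1.0
```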
24
Combining distances to neighbors
Standard KNN
Distance-weighted KNN
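A sketch contrasting the two combination rules: standard KNN gives each neighbor one vote, while distance-weighted KNN lets closer neighbors count more. The 1/d² weighting here is one common choice; the slide's exact formula is not recoverable.

```python
import numpy as np
from collections import defaultdict

def weighted_knn(X_train, y_train, x, k=5, weighted=True):
    """Classify x by (optionally distance-weighted) votes of its k neighbors."""
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]
    votes = defaultdict(float)
    for i in idx:
        # Standard KNN: each neighbor counts once. Weighted: closer counts more.
        votes[y_train[i]] += 1.0 / (d[i] ** 2 + 1e-12) if weighted else 1.0
    return max(votes, key=votes.get)
```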
25
(No Transcript)
26
(No Transcript)
27
William W. Cohen and Haym Hirsh (1998), "Joins that
Generalize: Text Classification Using WHIRL," in
KDD-1998, pp. 169-173.
28
(No Transcript)
29
(No Transcript)
30
(Figure: results for two methods, M1 and M2.)
Vitor Carvalho and William W. Cohen (2008), "Ranking
Users for Intelligent Message Addressing," in
ECIR-2008; and current work with Vitor, me, and
Ramnath Balasubramanyan.
31
Computing KNN: pros and cons
  • Storage: all training examples are saved in
    memory
  • A decision tree or linear classifier is much
    smaller
  • Time: to classify x, you need to loop over all
    training examples (x′,y′) to compute the distance
    between x and x′.
  • However, you get predictions for every class y
  • KNN is nice when there are many, many classes
  • Actually, there are some tricks to speed this
    up, especially when data is sparse (e.g., text)
32
Efficiently implementing KNN (for text)
IDF is nice computationally
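One way the sparsity trick can work, sketched here as an inverted index over TF-IDF vectors: only documents sharing at least one term with the query are ever scored, so the loop over all training examples disappears. The data layout is my assumption, not the slide's.

```python
from collections import defaultdict

def build_index(doc_vectors):
    """doc_vectors: list of {term: weight}. Returns term -> [(doc_id, weight)]."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(doc_vectors):
        for term, w in vec.items():
            index[term].append((doc_id, w))
    return index

def top_k(index, query_vec, k=5):
    """Score only docs sharing at least one term with the query."""
    scores = defaultdict(float)
    for term, qw in query_vec.items():
        for doc_id, w in index.get(term, ()):
            scores[doc_id] += qw * w            # accumulate the sparse dot product
    return sorted(scores, key=scores.get, reverse=True)[:k]
```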
33
Tricks with fast KNN
  • K-means using r-NN (a sketch follows this list)
  • 1. Pick k points c1 = x1, …, ck = xk as centers
  • 2. For each ci, find Di = Neighborhood(ci)
  • 3. For each ci, let ci = mean(Di)
  • 4. Go to step 2.
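A sketch of these steps, assuming Neighborhood(c) is an r-NN query returning all points within radius r of c; a fast r-NN routine is what would make each iteration cheap, and the brute-force query below is just a stand-in.

```python
import numpy as np

def kmeans_rnn(X, k, r, iters=10):
    """K-means where each center is updated to the mean of its r-neighborhood."""
    centers = X[np.random.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        for i in range(k):
            # r-NN query: all points within radius r of center i (brute force here).
            D = X[np.linalg.norm(X - centers[i], axis=1) <= r]
            if len(D):
                centers[i] = D.mean(axis=0)     # move center to neighborhood mean
    return centers
```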

34
Efficiently implementing KNN
Selective classification: given a training set
and a test set, find the N test cases that you can
most confidently classify.
(Figure: candidate test documents dj2, dj3, and dj4
with their neighbors.)
35
Train once and select 100 test cases to classify
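A sketch of one way to do this, assuming confidence is the KNN vote margin between the top two classes; the slide's actual confidence measure is not recoverable, so this is an illustrative choice.

```python
import numpy as np
from collections import Counter

def knn_with_margin(X_train, y_train, x, k=5):
    """Return (predicted label, vote margin between the top two classes)."""
    idx = np.argsort(np.linalg.norm(X_train - x, axis=1))[:k]
    counts = Counter(y_train[i] for i in idx).most_common()
    margin = counts[0][1] - (counts[1][1] if len(counts) > 1 else 0)
    return counts[0][0], margin

def select_most_confident(X_train, y_train, X_test, n=100, k=5):
    """Selective classification: keep the n test points with the largest margins."""
    scored = [(i, *knn_with_margin(X_train, y_train, x, k))
              for i, x in enumerate(X_test)]
    scored.sort(key=lambda t: t[2], reverse=True)  # sort by margin, descending
    return scored[:n]                              # (test index, label, margin)
```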