Title: Learning User Preferences
1. Learning User Preferences
Jason Rennie, MIT CSAIL (jrennie@gmail.com)
Advisor: Tommi Jaakkola
2. Information Extraction
- Informal communication: e-mail, mailing lists, bulletin boards
- Issues:
  - Context switching
  - Abbreviations, shortened forms
  - Variable punctuation, formatting, grammar
3. Thesis Advertisement / Outline
- The thesis is not an end-to-end IE system
- We address some IE problems:
  - Identifying & Resolving Named Entities
  - Tracking Context
  - Learning User Preferences
4. Identifying Named Entities
- "Rialto is now open until 11pm"
- Facts/opinions are usually about a named entity
- Tools typically rely on punctuation, capitalization, formatting, grammar
- We developed a criterion to identify topic-oriented words using occurrence statistics
(Rennie & Jaakkola, SIGIR 2005)
5. Resolving Named Entities
- "They're now open until 11pm"
- What does "they" refer to?
- Clustering:
  - Group noun phrases that co-refer
  - McCallum & Wellner (2005)
  - Excellent for proper nouns
- Our contribution: better modeling of non-proper nouns (incl. pronouns)
6. Tracking Context
- "The Swordfish was fabulous"
- An indirect comment on the restaurant
- The restaurant is identified by context
- Use word statistics to find topic switches
- Contribution: a new sentence clustering algorithm
7. Learning User Preferences
- Examples:
  - "I loved Rialto last night."
  - "Overall, Oleana was worth the money"
  - "Radius wasn't bad, but wasn't great"
  - "Om was purely pretentious"
- Issues:
  - Translate text into a partial ordering or rating
  - Predict unobserved ratings
8. Preference Problems
- Single user with item features
- Multi-user, no features
  - a.k.a. Collaborative Filtering
9. Single User, Item Features
[Diagram: observed ratings]
10. Single User, Item Features
[Diagram: unknown ("?") preference scores]
11. Many Users, No Features
[Diagram: features, weights, preference scores, ratings]
12. Collaborative Filtering
- Possible goals:
  - Predict missing entries
  - Cluster users or items
- Applications:
  - Movies, books
  - Genetic interaction
  - Network routing
  - Sports performance
[Diagram: users × items rating matrix]
13. Outline
- Single User, Features
  - Loss functions, convexity, large margin
  - Loss function for ratings
- Many Users, No Features
  - Feature selection, rank, SVD
  - Regularization: tie together multiple tasks
  - Optimization: scale to large problems
- Extensions
14. This Talk: Contributions
- Implementation and systematic evaluation of loss functions for single-user prediction
- Scaling multi-user regularization to large problems (thousands of users/items)
- Analysis of optimization
- Extensions:
  - Hybrid: features + multiple users
  - Observation model, multiple ratings
15. Rating Classification
- n ordered classes
- Learn a weight vector and thresholds
[Figure: items with ratings 1-3 projected onto the weight vector w, separated by thresholds]
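A threshold model like this typically turns the score w·x into a rating by counting crossed thresholds. A sketch of that prediction rule, with the threshold symbols θ_j assumed rather than taken from the slide:

```latex
% Ordered thresholds \theta_1 \le \dots \le \theta_{n-1} divide the score
% line into n rating bins; predict by counting crossed thresholds:
\hat{y}(x) = 1 + \sum_{j=1}^{n-1} \mathbf{1}\left[ w^\top x > \theta_j \right]
```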
16. Loss Functions
[Plot: 0-1, Hinge, Logistic, Smooth Hinge and Modified Least Squares losses as functions of the margin agreement]
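These are standard margin-based penalties; a minimal numpy sketch of plausible forms, written as functions of the margin agreement z (the exact expressions in the talk may differ slightly):

```python
import numpy as np

def loss_01(z):
    # 0-1 loss: 1 on a margin violation, 0 otherwise (non-convex)
    return (z <= 0).astype(float)

def hinge(z):
    # hinge: linear penalty for z < 1, zero beyond the margin
    return np.maximum(0.0, 1.0 - z)

def logistic(z):
    # logistic: smooth, strictly positive everywhere
    return np.log1p(np.exp(-z))

def smooth_hinge(z):
    # quadratic on 0 < z < 1, linear for z <= 0, zero for z >= 1
    return np.where(z >= 1.0, 0.0,
           np.where(z <= 0.0, 0.5 - z, 0.5 * (1.0 - z) ** 2))

def modified_least_squares(z):
    # squared hinge
    return np.maximum(0.0, 1.0 - z) ** 2

z = np.linspace(-2.0, 2.0, 9)
print(hinge(z))
print(smooth_hinge(z))
```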
17. Convexity
- Convex function ⇒ no local minima
- A set is convex if every line segment between its points lies within the set
18. Convexity of Loss Functions
- 0-1 loss is not convex
  - Local minima; sensitive to small changes
- Convex bound:
  - Large-margin solution with regularization
  - Stronger guarantees
19. Proportional Odds
- McCullagh introduced the original rating model
- Linear interaction: weights · features
- Thresholds
- Maximum likelihood estimation
(McCullagh, 1980)
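McCullagh's model is usually written as a cumulative logit; a hedged rendering (sign conventions vary between texts):

```latex
% Proportional odds: one weight vector w shared across all rating levels,
% plus ordered thresholds; fit by maximum likelihood.
\log \frac{P(y \le j \mid x)}{P(y > j \mid x)} = \theta_j - w^\top x,
\qquad \theta_1 \le \theta_2 \le \dots \le \theta_{n-1}
```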
20. Immediate-Thresholds
(Shashua & Levin, 2003)
21. Some Errors are Better than Others
22. Not a Bound on the Absolute Difference
[Figure: rating scale 1-5]
23. All-Thresholds Loss
(Srebro, Rennie & Jaakkola, NIPS 2004)
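As I read the construction, all-thresholds loss charges a margin penalty at every threshold, not just the two adjacent to the true rating, so larger rating errors incur larger loss. A sketch with hypothetical thresholds (function and variable names are mine):

```python
import numpy as np

def hinge(m):
    return np.maximum(0.0, 1.0 - m)

def all_thresholds_loss(z, y, thetas, f=hinge):
    """Loss for score z and true rating y in {1, ..., len(thetas) + 1}."""
    total = 0.0
    for j, theta in enumerate(thetas, start=1):
        if j < y:
            total += f(z - theta)   # score should lie above this threshold
        else:
            total += f(theta - z)   # score should lie below this threshold
    return total

thetas = [-1.0, 0.0, 1.0]           # hypothetical thresholds for 4 rating levels
print(all_thresholds_loss(z=-2.0, y=4, thetas=thetas))  # off by 3 levels: loss 9.0
print(all_thresholds_loss(z=2.5, y=4, thetas=thetas))   # well inside correct bin: 0.0
```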
24. Experiments

Loss            Multi-Class   Imm-Thresh   All-Thresh   p-value
MLS             .7486         .7491        .6700        1.7e-18
Hinge           .7433         .7628        .6702        6.6e-17
Logistic        .7490         .7248        .6623        7.3e-22
Least Squares   1.3368        --           --           --

(Rennie & Srebro, IJCAI 2005)
25. Many Users, No Features
[Diagram: features, weights, preference scores, ratings]
26. Background: Lp-norms
- L0: number of non-zero entries; ‖⟨0, 2, 0, 3, 4⟩‖₀ = 3
- L1: sum of absolute values; ‖⟨2, −2, 1⟩‖₁ = 5
- L2: Euclidean length; ‖⟨1, −1⟩‖₂ = √2
- In general: ‖v‖ₚ = (Σᵢ |vᵢ|ᵖ)^(1/p)
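The slide's three examples, checked with numpy (np.linalg.norm accepts a general p via ord):

```python
import numpy as np

print(np.count_nonzero([0, 2, 0, 3, 4]))   # L0: 3 non-zero entries
print(np.abs([2, -2, 1]).sum())            # L1: 5
print(np.linalg.norm([1, -1]))             # L2: 1.4142... = sqrt(2)
print(np.linalg.norm([1, -1], ord=3))      # general Lp with p = 3
```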
27. Background: Feature Selection
- Objective = Loss + Regularization
- L1 regularization (encourages sparsity)
- Squared L2 regularization
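In symbols, the objective the slide alludes to is the usual penalized form; the tradeoff parameter λ is assumed notation, not taken from the slide:

```latex
J(w) = \mathrm{Loss}(w) + \lambda R(w), \qquad
R(w) = \|w\|_1 \;\;\text{(drives weights to exactly zero)}
\quad\text{or}\quad R(w) = \|w\|_2^2
```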
28. Singular Value Decomposition
- X = USVᵀ
- U, V orthogonal (rotations)
- S diagonal, non-negative
- The eigenvalues of XXᵀ = USVᵀVSUᵀ = US²Uᵀ are the squared singular values of X
- Rank = ‖s‖₀, the number of non-zero singular values
- SVD is used to obtain the least-squares low-rank approximation
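A quick numpy illustration of these facts, including the least-squares low-rank approximation obtained by truncating the SVD:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U @ diag(s) @ Vt
print(np.count_nonzero(s > 1e-10))                 # rank = # non-zero singular values

k = 2                                              # keep the top-k singular values
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # best rank-k approximation
print(np.linalg.norm(X - X_k))                     # equals sqrt(sum(s[k:] ** 2))
print(np.sqrt(np.sum(s[k:] ** 2)))
```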
29. Low-Rank Matrix Factorization
[Diagram: X (rank k) ≈ U·Vᵀ]
- Sum-squared loss, fully observed Y: use SVD to find the global optimum
- Classification error loss, partially observed Y: non-convex, no explicit solution
30. Low-Rank: a Non-Convex Set
31. Trace Norm Regularization
(Fazel et al., 2001)
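For reference, the trace norm (also called the nuclear norm) is the sum of the singular values, and Fazel et al. use it as a convex surrogate for rank:

```latex
\|X\|_{\Sigma} = \sum_i \sigma_i(X)
```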
32. Many Users, No Features
[Diagram: features (V), weights (U), preference scores (X), ratings (Y)]
33. Max-Margin Matrix Factorization
- Trace norm regularization
- All-thresholds loss
- Convex function of X and θ
- Low rank in X
(Srebro, Rennie & Jaakkola, NIPS 2004)
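Putting the slide's two ingredients together, the objective presumably has the shape below; the tradeoff constant c and the set S of observed entries are assumed notation, not copied from the slide:

```latex
\min_{X, \theta} \; \|X\|_{\Sigma} \;+\; c \sum_{(i,j) \in S}
  \mathrm{loss}\bigl( X_{ij}, Y_{ij}; \theta \bigr)
```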
34. Properties of the Trace Norm
- The factorization U√S, V√S minimizes both quantities
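The "both quantities" refers to the two Frobenius terms in the variational characterization of the trace norm; with X = USVᵀ, the choice A = U√S, B = V√S attains the minimum:

```latex
\|X\|_{\Sigma} = \min_{X = A B^\top} \tfrac{1}{2}
  \left( \|A\|_F^2 + \|B\|_F^2 \right)
```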
35. Factorized Optimization
- Factorized objective (a tight bound)
- Gradient descent: O(n³) per round
- Stationary points exist, but no local minima
(Rennie & Srebro, ICML 2005)
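A minimal sketch of gradient descent on a factorized objective, using squared loss on a fully observed Y as a stand-in for the all-thresholds loss actually used in the thesis; k, lam and step are assumed values:

```python
import numpy as np

# J(A, B) = sum((Y - A B^T)^2) + (lam / 2) * (||A||_F^2 + ||B||_F^2)
rng = np.random.default_rng(0)
Y = rng.normal(size=(20, 15))
k, lam, step = 5, 0.1, 1e-3
A = rng.normal(scale=0.1, size=(20, k))
B = rng.normal(scale=0.1, size=(15, k))

for _ in range(500):
    R = A @ B.T - Y            # residual matrix
    A, B = (A - step * (2 * R @ B + lam * A),
            B - step * (2 * R.T @ A + lam * B))

print(np.linalg.norm(A @ B.T - Y))   # should shrink toward the rank-k floor
```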
36Collaborative Prediction Results
size, sparsity EachMovie 36656x1648, 96 EachMovie 36656x1648, 96 MovieLens 6040x3952, 96 MovieLens 6040x3952, 96
Algorithm Weak Error Strong Error Weak Error Strong Error
URP .8596 .8859 .6946 .7104
Attitude .8787 .8845 .6912 .7000
MMMF .8548 .8439 .6650 .6725
URP Attitude Marlin, 2004
MMMF Rennie Srebro, 2005
37. Extensions
- Multi-user features
- Observation model:
  - Predict which restaurants a user will rate, and
  - The rating she will give
- Multiple ratings per user/restaurant
  - E.g. food, service and décor ratings
- SVD parameterization
38. Multi-User Features
- Feature parameters (V):
  - Some are fixed
  - Some are learned
- Learn weights (U) for all features
- The fixed part of V does not affect regularization
39. Observation Model
- Common assumption: ratings are observed at random
- But restaurant selection depends on geography, popularity, price, food style
- Remove this bias by modeling the observation process
40. Observation Model
- Model observation as binary classification
- Add a binary classification loss
- Tie together the rating and observation models:
X = U_X V_Xᵀ,   W = U_W V_Wᵀ
41. Multiple Ratings
- Users may provide multiple ratings
  - Service, décor, food
- Add the corresponding loss functions together
- Stack parameter matrices for regularization
42. SVD Parameterization
- Too many parameters: (UA)(A⁻¹V) is another factorization of X
- Alternative: U, S, V with U, V orthogonal and S diagonal
- Advantages:
  - Not over-parameterized
  - Exact objective (not a bound)
  - No stationary points
43. Summary
- Loss function for ratings
- Regularization for multiple users
- Scaled MMMF to large problems (e.g. > 1000×1000)
- Trace norm is widely applicable
- Extensions
Code: http://people.csail.mit.edu/jrennie/matlab
44. Thanks!
- Helen, for supporting me for 7.5 years!
- Tommi Jaakkola, for answering all my questions and directing me to the end!
- Mike Collins and Tommy Poggio for additional guidance.
- Nati Srebro & John Barnett for endless valuable discussions and ideas.
- Amir Globerson, David Sontag, Luis Ortiz, Luis Perez-Breva, Alan Qi, Patrycja Missiuro & all past members of Tommi's reading group for paper discussions, conference trips and feedback on my talks.
- Many, many others who have helped me along the way!
45. Low-Rank Optimization