Title: Recommender Systems Session B
1. Recommender Systems: Session B
- Robin Burke
- DePaul University
- Chicago, IL
2. Roadmap
- Session A: Basic Techniques I
  - Introduction
  - Knowledge Sources
  - Recommendation Types
  - Collaborative Recommendation
- Session B: Basic Techniques II
  - Content-based Recommendation
  - Knowledge-based Recommendation
- Session C: Domains and Implementation I
  - Recommendation domains
  - Example Implementation
  - Lab I
- Session D: Evaluation I
  - Evaluation
- Session E: Applications
  - User Interaction
  - Web Personalization
- Session F: Implementation II
  - Lab II
3. Content-Based Recommendation
- Collaborative recommendation
  - requires only ratings
- Content-based recommendation
  - all techniques that use properties of the items themselves
  - usually refers to techniques that only use item features
- Knowledge-based recommendation
  - a sub-type of content-based
  - in which we apply knowledge about items and how they satisfy user needs
4. Content-Based Profiling
- Suppose we have no other users
  - but we know about the features of the items rated by the user
- We can imagine building a profile based on user preferences
  - here are the kinds of things the user likes
  - here are the ones he doesn't like
- Usually called content-based recommendation
5. Recommendation Knowledge Sources Taxonomy
[Taxonomy diagram with root Recommendation Knowledge and branches: Collaborative (Opinion Profiles, Demographic Profiles); User (Opinions, Demographics, Requirements: Query, Constraints, Preferences); Content (Item Features, Context, Means-ends); Domain Knowledge (Feature Ontology, Contextual Knowledge, Domain Constraints).]
6. Content-Based Profiling
[Diagram: to find relevant items, obtain the items the user has rated (feature vectors a1 ... ak labeled Y or N), build a classifier from them, then use the classifier to predict labels for unrated items and recommend those predicted as liked. A sketch of this pipeline follows.]
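A minimal sketch of this pipeline in Python, using scikit-learn's k-nearest-neighbor classifier; the feature matrix, ratings, and unrated items are invented for illustration:

```python
# Content-based profiling: obtain rated items, build a classifier,
# predict labels for unrated items, recommend those predicted "liked".
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Rated items: rows are items, columns are binary features a1..ak.
rated_features = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
])
labels = np.array([1, 1, 0, 0])  # 1 = liked (Y), 0 = disliked (N)

# Build the classifier from the rated items.
clf = KNeighborsClassifier(n_neighbors=3).fit(rated_features, labels)

# Predict labels for unrated items; recommend the ones classified liked.
unrated_features = np.array([[1, 0, 0, 0], [0, 1, 1, 1]])
predictions = clf.predict(unrated_features)
recommendations = [i for i, p in enumerate(predictions) if p == 1]
print(recommendations)
```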
7. Origins
- Began with the earliest forms of user models
  - Grundy (Rich, 1979)
- Elaborated in information filtering
  - selecting news articles (Dumais, 1990)
- More recently, spam filtering
8. Basic Idea
- Record user ratings for items
- Generate a model of user preferences over features
- Give as recommendations other items with similar content
9. Movie Recommendation
- Predictions for unseen (target) items are computed based on their similarity (in terms of content) to items in the user profile
- E.g., given the movies in user profile Pu, recommend the most similar target items highly and the less similar ones only mildly
10. Content-Based Recommender Systems
11. Personalized Search
- How can the search engine determine the user's context?
  - Query: "Madonna and Child"
- Need to learn the user profile
  - Is the user an art historian?
  - Is the user a pop music fan?
12. Play List Generation
- Music recommendations
- A configuration problem
  - must take into account other items already in the list
- Example: Pandora
13. Algorithms
- kNN
- Naive Bayes
- Neural networks
- Any classification technique can be used
14. Naive Bayes
- p(A): probability of event A
- p(A,B): probability of event A and event B
  - joint probability
- p(A|B): probability of event A given event B
  - we know B happened
  - conditional probability
- Example
  - A is a student getting an "A" grade
  - p(A) = 20%
  - B is the event of a student coming to less than 50% of meetings
  - p(A|B) is much less than 20%
  - p(A,B) would be the probability of both things
    - how many students are in this category?
- Recommender system question
  - Li is the event that the user likes item i
  - B is the set of features associated with item i
  - estimate p(Li|B)
15. Bayes Rule
- p(A|B) = p(B|A) p(A) / p(B)
- We can always restate a conditional probability in terms of
  - the reverse condition p(B|A)
  - and two prior probabilities
    - p(A)
    - p(B)
- Often the reverse condition is easier to know
  - we can count how often a feature appears in items the user liked (see the sketch below)
  - frequentist assumption
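A minimal sketch of applying Bayes rule with frequentist estimates; all counts below are invented for illustration:

```python
# Estimate the "reverse condition" p(B|A) by counting, then apply Bayes rule.
n_items = 20
n_liked = 10               # items the user liked (event A)
n_with_feature = 8         # items carrying feature B
n_liked_with_feature = 6   # liked items carrying feature B

p_A = n_liked / n_items                       # p(A)
p_B = n_with_feature / n_items                # p(B)
p_B_given_A = n_liked_with_feature / n_liked  # counted over liked items

# Bayes rule: p(A|B) = p(B|A) * p(A) / p(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)   # 0.75
```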
16. Naive Bayes
- Probability of liking an item given its features
  - p(Li | a1, a2, ..., ak)
  - think of Li as the class for item i
- By Bayes rule
  - p(Li | a1, ..., ak) = p(a1, ..., ak | Li) p(Li) / p(a1, ..., ak)
17. Naive Assumption
- Independence
  - the features a1, a2, ..., ak are independent
  - independent means p(A,B) = p(A) p(B)
- Example
  - two coin flips: p(heads) = 0.5
  - p(heads, heads) = 0.25
- Anti-example
  - appearance of the words "Recommendation" and "Collaborative" in papers by Robin Burke
  - p("Recommendation") = 0.6
  - p("Collaborative") = 0.3
  - p("Recommendation", "Collaborative") = 0.3, not 0.18
- In general
  - this assumption is false for items and their features
  - but pretending it is true works well
18. Naive Assumption
- For joint probability
  - p(a1, ..., ak) = p(a1) p(a2) ... p(ak)
- For conditional probability
  - p(a1, ..., ak | Li) = p(a1 | Li) p(a2 | Li) ... p(ak | Li)
- Bayes' Rule becomes
  - p(Li | a1, ..., ak) = p(Li) p(a1 | Li) ... p(ak | Li) / p(a1, ..., ak)
19. Frequency Table
- Iterate through all examples (see the sketch below)
  - if the example is "liked"
    - for each feature a, add one to the cell for that feature under L
  - similarly for ~L

  feature   L    ~L
  a1
  a2
  ...
  ak
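A minimal sketch of building the frequency table; the rated examples (feature set, liked flag) are invented:

```python
# Build per-class feature counts: one cell per (feature, class).
from collections import defaultdict

examples = [
    ({"Pitt", "Willis"}, False),
    ({"Ford", "Pitt"}, True),
    ({"Ford"}, True),
    ({"Willis"}, False),
]

counts = {True: defaultdict(int), False: defaultdict(int)}
totals = {True: 0, False: 0}

for features, liked in examples:
    totals[liked] += 1
    for a in features:
        counts[liked][a] += 1

# Likelihood estimate: p(a | L) = count(a, L) / count(L); same for ~L.
print(counts[True]["Ford"] / totals[True])   # p(Ford | L) = 1.0
```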
20. Example
- Total of 20 movies
  - 10 liked
  - 10 not liked
21. Classification: MAP
- Maximum a posteriori
  - calculate the probabilities for each possible classification
  - pick the one with the highest probability
- Examples
  - "12 Monkeys": Pitt, Willis
    - p(L | 12 Monkeys) = 0.13
    - p(~L | 12 Monkeys) = 1
    - not liked
  - "Devil's Own": Ford, Pitt
    - p(L | Devil's Own) = 0.67
    - p(~L | Devil's Own) = 0.53
    - liked
22. Classification: LL
- Log likelihood
- For two possibilities (see the sketch below)
  - calculate the probabilities
  - compute ln( p(L | a1, ..., ak) / p(~L | a1, ..., ak) )
  - if > 0, then classify as liked
- Examples
  - "12 Monkeys": Pitt, Willis
    - ratio = 0.13
    - ln = -2.1
    - not liked
  - "Devil's Own": Ford, Pitt
    - p(L | Devil's Own) = 0.67
    - p(~L | Devil's Own) = 0.53
    - ratio = 1.25
    - ln = 0.22
    - liked
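A minimal sketch of both classification rules; the class priors and per-feature likelihoods are invented stand-ins for values read off a frequency table:

```python
# MAP and log-likelihood classification with a naive Bayes model.
import math

def nb_score(features, p_class, p_feature_given_class):
    # Unnormalized p(class) * product of p(a | class) over the features.
    score = p_class
    for a in features:
        score *= p_feature_given_class[a]
    return score

p_given_L    = {"Pitt": 0.1, "Willis": 0.1, "Ford": 0.5}
p_given_notL = {"Pitt": 0.5, "Willis": 0.4, "Ford": 0.2}

item = ["Pitt", "Willis"]                  # features of the target item
liked     = nb_score(item, 0.5, p_given_L)
not_liked = nb_score(item, 0.5, p_given_notL)

# MAP: pick the class with the higher posterior score.
print("liked" if liked > not_liked else "not liked")

# LL: classify as liked when ln(liked / not_liked) > 0.
print(math.log(liked / not_liked))
```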
23. Smoothing
- If a feature never appears in a class
  - p(aj | L) = 0
  - that means it will always veto the classification
- Example
  - a new movie director
  - cannot be classified as "liked"
  - because there are no liked instances in which he is a feature
- Solution
  - Laplace smoothing (see the sketch below)
  - add a small constant value to all feature counts before starting
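A minimal sketch of the smoothed estimate, using add-one (alpha = 1) smoothing:

```python
# Laplace smoothing: add a constant to every count so no feature has a
# zero likelihood that vetoes the classification.
alpha = 1.0   # smoothing constant (add-one smoothing)

def smoothed_likelihood(feature_count, class_total, n_features):
    # p(a | L) with Laplace smoothing
    return (feature_count + alpha) / (class_total + alpha * n_features)

# A brand-new director (count 0 among 10 liked items, 5 features total)
# now gets a small nonzero likelihood instead of 0.
print(smoothed_likelihood(0, 10, 5))   # ~0.067
```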
24. Naive Bayes
- Works surprisingly well
  - used in spam filtering
- Simple implementation
  - just counting and multiplying
  - requires O(|F|) space, where F is the feature set used
  - easy to update the profile
  - classification is very fast
- Learned classifier can be hard-coded
  - used in voice recognition and computer games
- Try this first
25. Neural Networks
26. Biological inspiration
[Diagram: a biological neuron with dendrites, axon, and synapses.]
The information transmission happens at the synapses.
27. How it works
- Source (pre-synaptic)
  - tiny voltage spikes travel along the axon
  - at the dendrites, neurotransmitter is released into the synapse
- Destination (post-synaptic)
  - neurotransmitter is absorbed by the dendrites
  - causes excitation or inhibition
- Signals are integrated
  - may produce spikes in the next neuron
- Connections
  - synaptic connections can be strong or weak
28. Artificial neurons
Neurons work by processing information. They receive and provide information in the form of voltage spikes.
[Diagram: the McCulloch-Pitts model: inputs x1, x2, ..., xn, each scaled by a synaptic weight w1, w2, ..., wn, are combined to produce the output y.]
29. Artificial neurons
Nonlinear generalization of the McCulloch-Pitts neuron: y = f(x, w), where y is the neuron's output, x is the vector of inputs, and w is the vector of synaptic weights.
Examples (sketched below):
- sigmoidal neuron: y = 1 / (1 + e^(-w·x))
- Gaussian neuron: y = exp(-||x - w||^2 / a^2)
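A minimal sketch of the two neuron types; the input and weight vectors are invented:

```python
# Two nonlinear generalizations of the McCulloch-Pitts neuron.
import numpy as np

def sigmoidal_neuron(x, w, b=0.0):
    # y = 1 / (1 + exp(-(w.x + b)))
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

def gaussian_neuron(x, w, a=1.0):
    # y = exp(-||x - w||^2 / a^2)
    return np.exp(-np.sum((x - w) ** 2) / a ** 2)

x = np.array([0.5, 1.0])
w = np.array([1.0, -0.5])
print(sigmoidal_neuron(x, w), gaussian_neuron(x, w))
```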
30. Artificial neural networks
[Diagram: many artificial neurons linking a layer of inputs to an output.]
An artificial neural network is composed of many artificial neurons that are linked together according to a specific network architecture. The objective of the neural network is to transform the inputs into meaningful outputs.
31. Learning with Back-Propagation
- Biological systems
  - seem to modify many synaptic connections simultaneously
  - we still don't totally understand this
- A simplification of the learning problem
  - first calculate the changes for the synaptic weights of the output neuron
  - then calculate the changes backward, starting from layer p-1, propagating the local error terms
- Still relatively complicated
  - but much simpler than the original optimization problem
32. Application to Recommender Systems
- Inputs
  - features of products
  - binary features work best
  - otherwise tricky encoding is required
- Output
  - liked / disliked neurons
33. NN Recommender
[Diagram: item features feed a network with two output neurons, Liked and Disliked.]
- Calculate the recommendation score as y_liked - y_disliked (see the sketch below)
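A minimal sketch of this scoring scheme with a tiny one-hidden-layer network; the weights are random and untrained, purely for illustration:

```python
# Score an item as y_liked - y_disliked from a two-output network.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def recommend_score(item_features, W_hidden, W_out):
    hidden = sigmoid(W_hidden @ item_features)
    y_liked, y_disliked = sigmoid(W_out @ hidden)
    return y_liked - y_disliked          # recommendation score

rng = np.random.default_rng(0)
features = np.array([1.0, 0.0, 1.0])     # binary item features
W_hidden = rng.normal(size=(4, 3))       # 3 inputs -> 4 hidden units
W_out = rng.normal(size=(2, 4))          # 4 hidden -> liked/disliked
print(recommend_score(features, W_hidden, W_out))
```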
34. Issues with ANN
- Often many iterations are needed
  - 1000s or even millions
- Overfitting can be a serious problem
- No way to diagnose or debug the network
  - must relearn
- Designing the network is an art
  - input and output coding
  - layering
- Often learning simply fails
  - system never converges
- Stability vs plasticity
  - learning is usually one-shot
  - cannot easily restart learning with new data
  - (actually many learning techniques have this problem)
35. Overfitting
- The problem of training a learner too much
  - the learner continues to improve on the training data
  - but gets worse on the real task
36. Other classification techniques
- Lots of other classification techniques have been applied to this problem
  - support vector machines
  - fuzzy sets
  - decision trees
- Essentials are the same
  - learn a decision rule over the item features
  - apply the rule to new items
37. Content-Based Recommendation
- Advantages
  - useful for large information-based sites (e.g., portals) or for domains where items have content-rich features
  - can be easily integrated with content servers
- Disadvantages
  - may miss important pragmatic relationships among items (based on usage)
    - e.g., avant-garde jazz / classical
  - not effective in small, specialized sites or sites which are not content-oriented
  - cannot achieve serendipity (novel connections)
38. Break
39. Roadmap
- Session A: Basic Techniques I
  - Introduction
  - Knowledge Sources
  - Recommendation Types
  - Collaborative Recommendation
- Session B: Basic Techniques II
  - Content-based Recommendation
  - Knowledge-based Recommendation
- Session C: Domains and Implementation I
  - Recommendation domains
  - Example Implementation
  - Lab I
- Session D: Evaluation I
  - Evaluation
- Session E: Applications
  - User Interaction
  - Web Personalization
- Session F: Implementation II
  - Lab II
40. Knowledge-Based Recommendation
- A sub-type of content-based recommendation
  - we use the features of the items
- Covers other kinds of knowledge, too
  - means-ends knowledge
    - how products satisfy user needs
  - ontological knowledge
    - what counts as similar in the product domain
  - constraints
    - what is possible in the domain and why
41. Recommendation Knowledge Sources Taxonomy
[Taxonomy diagram, repeated from slide 5: Recommendation Knowledge branches into Collaborative (Opinion Profiles, Demographic Profiles); User (Opinions, Demographics, Requirements: Query, Constraints, Preferences); Content (Item Features, Context, Means-ends); Domain Knowledge (Feature Ontology, Contextual Knowledge, Domain Constraints).]
42. Diverse Possibilities
- Utility
  - some systems concentrate on representing the user's constraints in the form of utility functions
- Similarity
  - some systems focus on detailed knowledge-based similarity calculations
- Interactivity
  - some systems use knowledge to enhance the collection of requirement information
- For our purposes
  - concentrate on case-based recommendation and constraint-based recommendation
43. Case-Based Recommendation
- Based on ideas from case-based reasoning (CBR)
- An alternative to rule-based problem-solving
- "A case-based reasoner solves new problems by adapting solutions used to solve old problems" (Riesbeck & Schank, 1987)
44. CBR: Solving Problems
[Diagram: the CBR cycle: a new problem is matched against the case database to retrieve similar cases; their solutions are adapted, reviewed, and retained back into the database.]
45. CBR System Components
- Case-base
  - database of previous cases (experience)
  - episodic memory
- Retrieval of relevant cases
  - index for cases in the library
  - matching the most similar case(s)
  - retrieving the solution(s) from these case(s)
- Adaptation of solution
  - alter the retrieved solution(s) to reflect differences between the new case and the retrieved case(s)
46. Retrieval knowledge
- Contents
  - features used to index cases
  - relative importance of features
  - what counts as similar
- Issues
  - surface vs deep similarity
47. Analogy to the catalog
- Problem
  - user need
- Case
  - product
- Retrieval
  - recommendation
48. Entree I
49. Entree II
50. Entree III
[Screenshots of the Entree restaurant recommender interface.]
51. Critiquing Dialog
- Mixed-initiative interaction
  - user offers input
  - system responds with possibilities
  - user critiques or offers additional input
- Makes preference elicitation gradual
  - rather than all-at-once with a query
  - can guide the user away from empty parts of the product space
52. CBR retrieval
- Knowledge-based nearest neighbor
  - a similarity metric defines the distance between cases
  - usually on an attribute-by-attribute basis
- Entree
  - cuisine
  - quality
  - price
  - atmosphere
53. How do we measure similarity?
- Complex multi-level comparison
  - goal sensitive
  - multiple goals
  - retrieval strategies
  - non-similarity relationships
- Can be strictly numeric (see the sketch below)
  - weighted sum of similarities of features
  - local similarities
- May involve inference
  - reasoning about the similarity of items
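A minimal sketch of a numeric global metric built as a weighted sum of local similarities; the attributes, weights, and local metrics are invented stand-ins for Entree's, with an asymmetric price metric to illustrate a directional effect:

```python
# Knowledge-based nearest neighbor: weighted sum of local similarities.
def price_similarity(target, source):
    # Asymmetric: a cheaper target is fine; a pricier one is penalized.
    if target <= source:
        return 1.0
    return max(0.0, 1.0 - (target - source) / source)

def quality_similarity(target, source):
    return 1.0 - abs(target - source) / 4.0   # quality on a 1..5 scale

WEIGHTS = {"price": 0.6, "quality": 0.4}
METRICS = {"price": price_similarity, "quality": quality_similarity}

def global_similarity(target, source):
    return sum(w * METRICS[attr](target[attr], source[attr])
               for attr, w in WEIGHTS.items())

source = {"price": 30, "quality": 4}          # restaurant the user liked
candidates = [{"price": 25, "quality": 4}, {"price": 45, "quality": 5}]
best = max(candidates, key=lambda c: global_similarity(c, source))
print(best)
```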
54. Price metric
55. Cuisine Metric
[Taxonomy diagram of cuisines: European, Asian, French, Chinese, Japanese, Nouvelle Cuisine, Vietnamese, Thai, Pacific New Wave.]
56. Metrics
- Goal-specific comparison
  - how similar is the target product to the source with respect to this goal?
- Asymmetric
  - directional effects
- A small number of general-purpose types
57. Metrics
- If they generate a true metric space
  - approaches using space-partitioning techniques
  - BSP trees, quad-trees, etc.
- Not always the case
- Hard to optimize
  - storing n^2 distances vs recalculating
- FindMe calculates similarity at retrieval time
58. Combining metrics
- Global metric
  - combination of attribute metrics
- Hierarchical combination (see the sketch below)
  - lower metrics break ties in upper ones
- Benefits
  - simple to acquire
  - easy to understand
- Somewhat inflexible
- More typical would be a weighted sum
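A minimal sketch of hierarchical combination via tuple comparison, so lower metrics only matter when upper ones tie; the metrics and data are invented:

```python
# Hierarchical combination: sort by the top metric, breaking ties with
# lower metrics. Python compares tuples element by element, which gives
# exactly this tie-breaking behavior.
def hierarchical_key(candidate, source, metrics):
    return tuple(m(candidate, source) for m in metrics)

metrics = [
    lambda c, s: 1.0 if c["cuisine"] == s["cuisine"] else 0.0,  # primary
    lambda c, s: -abs(c["price"] - s["price"]),                 # tie-breaker
]
source = {"cuisine": "French", "price": 30}
candidates = [{"cuisine": "French", "price": 50},
              {"cuisine": "French", "price": 35}]
best = max(candidates, key=lambda c: hierarchical_key(c, source, metrics))
print(best)   # both match on cuisine; price breaks the tie
```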
59. Constraint-based Recommendation
- Represent the user's needs as a set of constraints
- Try to satisfy those constraints with products
60. Example
- User needs a car
  - gas mileage > 25 mpg
  - capacity > 5 people
  - price < $18,000
- A solution would be a list of models satisfying these requirements (see the sketch below)
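A minimal sketch of constraint-based filtering for this example; the car catalog is invented:

```python
# Keep exactly the products that satisfy every constraint.
cars = [
    {"model": "A", "mpg": 28, "capacity": 7, "price": 17500},
    {"model": "B", "mpg": 22, "capacity": 7, "price": 16000},
    {"model": "C", "mpg": 30, "capacity": 6, "price": 19000},
]

constraints = [
    lambda c: c["mpg"] > 25,        # gas mileage > 25 mpg
    lambda c: c["capacity"] > 5,    # capacity > 5 people
    lambda c: c["price"] < 18000,   # price < $18,000
]

solutions = [c["model"] for c in cars if all(ok(c) for ok in constraints)]
print(solutions)   # ['A']
```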
61. Configurable Products
- Constraints are important where products are configurable
  - computers
  - travel packages
  - business services
  - (cars)
- The relationships between configurable components need to be expressed as constraints anyway
  - e.g., a GT 6800 graphics card needs a power supply > 300 W
62. Product Space
[Figure: products plotted by screen size vs weight; the region satisfying Weight < x and Screen > y contains the possible recommendations.]
63. Utility
- In order to rank products, we need a measure of utility
  - can be slack
    - how much the product exceeds the constraints
  - can be another measure
    - price is typical
  - can be a utility calculation that is a function of product attributes
    - but generally this is user-specific
    - e.g., the value of weight vs screen size
64. Product Space
[Figure: the same product space (Weight < x, Screen > y) with candidate products A, B, and C plotted.]
65. Utility
- Slack_A = (X - Weight_A) + (Size_A - Y)
  - not really commensurate
- Price_A
  - ignores product differences
- Utility_A = α(X - Weight_A) + β(Size_A - Y) + γ(X - Weight_A)(Size_A - Y)
  - usually we ignore γ and treat the utilities as independent (see the sketch below)
  - how do we know what α and β are?
    - make assumptions
    - infer from user behavior
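A minimal sketch of ranking by this utility, ignoring the interaction term γ; X, Y, the α/β weights, and the products are invented:

```python
# Rank products by a weighted slack utility over the two constraints.
X, Y = 6.0, 15.0   # constraints: weight < X lbs, screen > Y inches

def utility(product, alpha=1.0, beta=1.0):
    # Utility = alpha * weight slack + beta * screen slack (gamma ignored).
    return (alpha * (X - product["weight"])
            + beta * (product["screen"] - Y))

products = {"A": {"weight": 4.5, "screen": 15.6},
            "B": {"weight": 5.5, "screen": 17.0}}
ranked = sorted(products, key=lambda p: utility(products[p]), reverse=True)
print(ranked)   # ['B', 'A'] with these weights
```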
66. Knowledge-Based Recommendation
- Hard to generalize
- Advantages
  - no cold-start issues
  - great precision possible
  - very important in some domains
- Disadvantages
  - knowledge engineering required
    - can be substantial
  - expert opinion may not match user preferences
67. Next
- Session C
  - 15:00
- Need laptops
  - install workspace