1
RECOMMENDATION SYSTEMS
ÖZNUR KIRMEMIS
2
OUTLINE
  • INTRODUCTION
  • FORMALIZATION OF THE PROBLEM
  • APPROACHES
  • COLLABORATIVE
  • CONTENT BASED
  • HYBRID
  • CONCLUSION

3
PAPERS
  • 1. Toward the Next Generation of Recommender
    Systems: A Survey of the State-of-the-Art and
    Possible Extensions, Gediminas Adomavicius,
    Alexander Tuzhilin, IEEE Transactions on Knowledge
    and Data Engineering (June 2005)
  • 2. Content-Boosted Collaborative Filtering for
    Improved Recommendations, Prem Melville, Raymond
    J. Mooney and Ramadass Nagarajan, Proceedings of
    the Eighteenth National Conference on Artificial
    Intelligence (AAAI-2002)
  • 3. Recommendation as Classification: Using
    Social and Content-Based Information in
    Recommendation, Chumki Basu, Haym Hirsh, William
    Cohen (AAAI-1998)

4
PART 1
  • INTRODUCTION

5
Recommendations
  • We live in the information society. The quantity
    of new information available every day exceeds
    our limited processing capabilities.
  • We face far more choices than we can try,
    such as which book to read, which movie
    is worth watching, or where to have dinner
    tonight.
  • For this reason, we need something able to
    suggest only the worthwhile information to us.
  • Make search space smaller!

Items: products, web sites, blogs, news items, ...
6
Recommendations
  • Acting upon recommendations from other people is
    a normal part of life.
  • By using recommendations we can take a shortcut
    to the things we like without having to try many
    things we dislike or without having to acquire
    all the knowledge to make an informed decision.
  • Recommender systems(RS) automate this facility.
  • Recommendation systems are thus a solution for
    information overload.

7
DEFINITION OF RS
  • Programs that attempt to predict items (movies,
    music, books, news, web pages) that a user may be
    interested in, given some information about the
    user's profile

8
Recommendation Systems
  • Based on a synthesis of ideas from
  • Artificial Intelligence
  • Natural Language Processing
  • Human-Computer Interaction
  • Sociology
  • Information Retrieval
  • and the technology of the WWW

9
GENERIC RS
  • For a typical recommender system, there are three
    steps:
  • 1. The user provides some form of input to the
    system. These inputs can be both explicit and
    implicit. Ratings submitted by users are among
    explicit inputs, whereas the URLs visited by a
    user and the time spent reading a web site are among
    possible implicit inputs.
  • 2. These inputs are brought together to form a
    representation of the user's likes and dislikes.
    This representation could be as simple as a
    matrix of item ratings, or as complex as a data
    structure combining both content and rating
    information.
  • 3. The system computes recommendations using these
    user profiles.
  • Even though the steps are essentially the same
    for most recommender systems, there have been
    different approaches to both steps 2 and 3.

10
Current Examples
  • MovieLens
  • Movie recommendation
  • makes use of collaborative filtering technology
  • gathers user preferences by asking the user to
    rate movies.
  • searches for similar profiles (i.e. users that
    share the same or similar taste) and uses them to
    generate new suggestions.

11
Current Examples
  • Amazon
  • Book recommendations
  • recommends books frequently purchased by
    customers who purchased the selected book
  • customers receive text recommendations based on
    the opinions of other customers
  • LIBRA
  • Book recommendations
  • Combines a content-based approach with machine
    learning

12
Current Examples
  • Cinemax.com
  • Moviecritic: movies again
  • And much more

13
PART 2
  • FORMALIZATION
  • OF THE PROBLEM

14
Formal Model
  • Let C be the set of all users or customers and
    let S be the set of all possible items that can
    be recommended, such as books, movies, or
    restaurants.
  • S: set of items
  • C: set of customers
  • Let u be a utility function that measures the
    usefulness of item s to user c
  • Utility function $u : C \times S \to R$

15
Utility Function
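  • Utility function $u : C \times S \to R$,
  • R: a totally ordered set of ratings
  • e.g., 0-5 stars, or a real number in [0,1]
  • $u(c_1, s_1) = r_1$, $u(c_1, s_2) = r_2$, ...
  • Recommendation: for each user $c \in C$, choose the
    item $s'_c \in S$ that maximizes the user's utility:
    $s'_c = \arg\max_{s \in S} u(c, s)$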

16
USER SPACE AND ITEM SPACE
  • USER SPACE(C)
  • can be defined with a profile that includes
    various user characteristics, such as age,
    gender, income, marital status, etc.
  • ITEM SPACE(S)
  • Similarly, each element of the item space S can
    be defined with a set of characteristics.
  • Example (in a movie recommendation application):
  • S: a collection of movies,
  • each movie can be represented not only by its ID,
    but also by its title, genre, director, year of
    release, leading actors, etc.

17
UTILITY FUNCTION
  • The central problem of recommender systems is
    that the utility u is usually not defined on the
    whole $C \times S$ space, but only on some subset of it.
  • This means u needs to be extrapolated to the
    whole space $C \times S$.
  • The recommendation engine should be able to
    estimate the ratings of the nonrated item/user
    combinations and issue appropriate
    recommendations based on these predictions.

18
Example Utility Matrix
[Utility matrix: users Ayse, Ali, Veli, Hasan (rows) by movies King Kong, Garfield, Matrix, Usual Suspects (columns); rating values not shown]
  • Gathering known ratings for matrix
  • Extrapolate unknown ratings from known ratings

19
EXTRAPOLATION
  • Extrapolations from known ratings are done by either
  • specifying heuristics that define the utility
    function and empirically validating its performance, or
  • estimating the utility function that optimizes a
    certain performance criterion, such as the mean
    squared error.
  • Once the unknown ratings are estimated,
    recommendations to a user are made by selecting
    the item with the highest estimated rating
    for that user.
  • Alternatively, we can recommend the N best items
    to a user.
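
The two ideas on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `mean_squared_error` and `top_n`, the dictionary layout, and all rating values are assumptions made for the example, not something taken from the slides or the cited papers.

```python
import heapq

def mean_squared_error(known, predicted):
    """Average squared gap between known ratings and their estimates."""
    keys = known.keys() & predicted.keys()
    return sum((known[k] - predicted[k]) ** 2 for k in keys) / len(keys)

def top_n(estimates, n):
    """Return the n items with the highest estimated rating, best first."""
    return heapq.nlargest(n, estimates, key=estimates.get)

# Hypothetical known ratings and their estimates, keyed by (user, item).
known = {("Ayse", "Matrix"): 5.0, ("Ali", "Garfield"): 2.0}
predicted = {("Ayse", "Matrix"): 4.0, ("Ali", "Garfield"): 3.0}
print(mean_squared_error(known, predicted))  # 1.0

# Hypothetical estimated ratings for one user's unrated items.
estimates = {"King Kong": 3.2, "Garfield": 1.5, "Usual Suspects": 4.6}
best_item = max(estimates, key=estimates.get)  # single best item
best_two = top_n(estimates, 2)                 # N best items, N = 2
print(best_item, best_two)
```

The error is computed only over pairs for which a known rating exists, which is why the criterion can be evaluated even though most of the matrix is empty.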

20
PART 3
  • APPROACHES
  • Content Based
  • Collaborative
  • Hybrid

21
APPROACHES
  • Recommender systems are usually classified into
    the following categories, based on how
    recommendations are made:
  • Content-based recommendations
  • The user is recommended items similar to the
    ones the user preferred in the past. Similarity
    is measured between the user profile and item
    profiles, or between item profiles.
  • Collaborative recommendations
  • aim to identify users that have relevant
    interests and preferences by calculating
    similarities and dissimilarities between user
    profiles
  • The user will be recommended items that are
    preferred by other people with similar tastes and
    preferences.
  • Hybrid approaches
  • These methods combine collaborative and
    content-based methods.

22
  • CONTENT BASED METHODS

23
Content-based Methods
  • Main idea
  • recommend items to customer C similar to previous
    items rated highly by C
  • No information about similar users is needed!
  • Formalization
  • the utility u(c,s) of item s for user c is
    estimated based on
  • the utilities $u(c, s_i)$ assigned by user c to items
    $s_i \in S$ that are similar to item s.

24
Content-based Methods
  • has its roots in information retrieval and
    information filtering research.
  • The improvement over traditional information
    retrieval approaches comes from the use of user
    profiles that contain information about users'
    tastes, preferences, and needs.
  • The profiling information
  • can be obtained from users explicitly, e.g.,
    through questionnaires, or
  • implicitly, learned from their transactional
    behavior over time.
  • A machine learning algorithm can be used to induce a
    profile of the user's preferences

25
Plan of action (Item Profile, User Profile,
Prediction Mechanism)
[Figure: the items a user likes (e.g., red circles and triangles) are used to build a user profile; the user profile is matched against item profiles to recommend objects with similar content (same color, shape, ...)]
26
Item Profiles
  • For each item, create an item profile
  • Let Content(s) be an item profile,
  • a set of attributes characterizing item s.
  • movies: author, title, actor, director
  • text: set of important words in the document
  • attributes are used to determine the
    appropriateness of the item for recommendation
    purposes.

27
Item Profiles
  • How are attributes determined?
  • straightforward
  • By deciding which slots are important
  • Slots: Author, Title, Editorial Reviews, etc.
  • By processing texts
  • The importance (or informativeness) of keyword
    $k_i$ in document $d_j$ is determined with some
    weighting measure $w_{ij}$ that can be defined in
    several different ways.
  • One of the best-known measures for specifying
    keyword weights in Information Retrieval is the
    term frequency/inverse document frequency
    (TF-IDF) measure.
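
As a rough illustration of TF-IDF keyword weighting, the sketch below uses one common variant: term frequency normalized by the most frequent keyword in the document, multiplied by log(N / n_i), where N is the number of documents and n_i the number of documents containing keyword i. The function name `tf_idf`, the tokenization by whitespace, and the toy documents are assumptions made for this example; other TF-IDF variants exist.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {keyword: weight} dict per tokenized document.

    TF is the keyword count divided by the count of the most frequent
    keyword in the same document; IDF is log(N / n_i), where n_i is the
    number of documents that contain keyword i.
    """
    n_docs = len(documents)
    counts = [Counter(doc) for doc in documents]
    # Each Counter contributes each distinct word once, so this is n_i.
    doc_freq = Counter(word for c in counts for word in c)
    weights = []
    for c in counts:
        max_count = max(c.values()) if c else 1
        weights.append({
            word: (count / max_count) * math.log(n_docs / doc_freq[word])
            for word, count in c.items()
        })
    return weights

# Toy documents, made up for illustration.
docs = [
    "robots think robots learn".split(),
    "humans think".split(),
    "robots build robots".split(),
]
weights = tf_idf(docs)
print(weights[0])
```

A keyword that occurs in every document gets weight 0 under this variant, which matches the intuition that such a keyword does not help distinguish one item from another.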

28
User profiles
  • Let ContentBasedProfile(c) be the profile of user
    c containing preferences of this user. These
    profiles are obtained by analyzing the content of
    the items previously seen and constructed using
    keyword analysis techniques from information
    retrieval.
  • For example, ContentBasedProfile(c) can be
    defined as a vector of weights $(w_{c1}, \ldots, w_{ck})$,
    where each weight $w_{ci}$ denotes the
    importance of keyword $k_i$ to user c and can be
    computed from individually rated content vectors
    using a variety of techniques.

29
Prediction
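  • In content-based systems, the utility function
    u(c,s) is usually defined as
    $u(c,s) = \mathrm{score}(\mathrm{ContentBasedProfile}(c), \mathrm{Content}(s))$
  • In particular, when recommending Web pages, both
    ContentBasedProfile(c) of user c and Content(s)
    of document s can be represented as TF-IDF
    vectors $\vec{w}_c$ and $\vec{w}_s$ of keyword weights.
  • Moreover, the utility function u(c,s) is usually
    represented in the information retrieval
    literature by some scoring heuristic defined in
    terms of the vectors mentioned above, such as the
    cosine similarity measure
    $u(c,s) = \cos(\vec{w}_c, \vec{w}_s) = \frac{\sum_{i=1}^{K} w_{ci} w_{si}}{\sqrt{\sum_{i=1}^{K} w_{ci}^2} \sqrt{\sum_{i=1}^{K} w_{si}^2}}$,
    where K is the total number of keywords in the system.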
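
The cosine scoring heuristic can be sketched as follows, with the user profile and each item represented as sparse keyword-weight dictionaries. The function name `cosine`, the example keywords, and all weights are hypothetical and chosen only to make the example runnable.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse {keyword: weight} vectors."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector is similar to nothing
    return dot / (norm_u * norm_v)

# Hypothetical user profile and item profiles (keyword weights).
profile = {"robots": 0.8, "ai": 0.6}
items = {
    "book_a": {"robots": 1.0, "ai": 1.0},
    "book_b": {"cooking": 1.0},
}

# Rank items by their utility u(c, s) for this user, best first.
ranked = sorted(items, key=lambda s: cosine(profile, items[s]), reverse=True)
print(ranked)
```

Keywords missing from a dictionary are treated as weight 0, so only the keywords shared by the profile and the item contribute to the score.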

30
LIBRA: Learning Intelligent Book Recommending
Agent
  • Content-based recommender for books using
    information about titles extracted from Amazon.
  • Uses information extraction from the web to
    organize text into fields
  • Author
  • Title
  • Editorial Reviews
  • Customer Comments
  • Subject terms
  • Related authors
  • Related titles

31
EXAMPLE LIBRA System
32
Sample Extracted Information
Title: ... Computers Exceed Human Intelligence
Author: ...
Price: ...
Publication Date: ...
ISBN: ...
Related Titles: Title: ... Mind; Author: ... Moravec
Reviews: Author: ...; Text: ... we humans ...
Comments: Author: ...; Text: ...
Related Authors: ... Drexler
Subjects: ...

33
Libra Content Information
  • Libra uses this extracted information to form
    bags of words for the following slots
  • Author
  • Title
  • Description (reviews and comments)
  • Subjects
  • Related Titles
  • Related Authors

34
Libra Overview
  • User rates selected titles on a 1 to 10 scale.
  • Libra uses a naïve Bayesian text-categorization
    algorithm to learn a profile from these rated
    examples.
  • Rating 6-10: Positive
  • Rating 1-5: Negative
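
To make the idea concrete, here is a simplified naive Bayes text classifier that learns from rated examples using the positive/negative split above. It is a sketch under stated assumptions, not Libra's actual implementation: it uses a single bag of words per book instead of separate slots, applies Laplace smoothing, and all function names and training data are invented for the example.

```python
import math
from collections import Counter

def train(rated_books):
    """rated_books: list of (words, rating) pairs, rating on a 1-10 scale."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    class_counts = Counter()
    for words, rating in rated_books:
        label = "pos" if rating >= 6 else "neg"  # 6-10 positive, 1-5 negative
        class_counts[label] += 1
        word_counts[label].update(words)
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, class_counts, vocab

def log_posterior(words, label, model):
    """Unnormalized log P(label | words), with Laplace smoothing."""
    word_counts, class_counts, vocab = model
    total = sum(word_counts[label].values())
    score = math.log((class_counts[label] + 1) / (sum(class_counts.values()) + 2))
    for w in words:
        if w in vocab:  # words never seen in training are ignored
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
    return score

def predict(words, model):
    """Return 'pos' if the positive class is more probable, else 'neg'."""
    pos = log_posterior(words, "pos", model)
    neg = log_posterior(words, "neg", model)
    return "pos" if pos > neg else "neg"

# Hypothetical rated examples: (bag of words, rating).
rated = [
    ("robots intelligence future".split(), 9),
    ("intelligence machines".split(), 8),
    ("cooking pasta recipes".split(), 2),
]
model = train(rated)
print(predict("future machines".split(), model))  # pos
print(predict("pasta".split(), model))            # neg
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.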

35
LIMITATIONS(Content Based)
  • Finding the appropriate features
  • Overspecialization
  • Never recommends items outside the user's content
    profile
  • One remedy: introduce some randomness
  • e.g., genetic algorithms
  • The diversity of recommendations is often a
    desirable feature in recommender systems.
  • Items that are too similar should not be recommended,
  • e.g., a different news article describing the same
    event.

36
LIMITATIONS(Content Based)
  • Recommendations for new users
  • How to build a profile?
  • The user has to rate a sufficient number of items
    before a content-based recommender system can
    really understand the user's preferences.
    Therefore, a new user, having very few ratings,
    would not be able to get accurate
    recommendations.

37
  • COLLABORATIVE FILTERING

38
Collaborative Filtering
  • Unlike content-based recommendation methods,
    collaborative recommender systems (or
    collaborative filtering systems) try to predict
    the utility of items based on the items
    previously rated by other similar users.
  • The utility u(c,s) of item s for user c is
    estimated based on the utilities $u(c_j, s)$ assigned
    to item s by those users $c_j \in C$ who are similar
    to user c.

39
Basic Algorithm
  • Maintain a database of many users' ratings of a
    variety of items.
  • For a given user, find other similar users whose
    ratings strongly correlate with the current user.
  • Recommend items rated highly by these similar
    users, but not rated by the current user.
  • Almost all existing commercial recommenders use
    this approach (e.g. Amazon).
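
The basic algorithm can be sketched as below. This is a minimal illustration, not the method of any particular system: similarity is the Pearson correlation over co-rated items, neighbours are the users whose correlation is at least a positive threshold `min_sim`, and predictions are similarity-weighted averages. The names `pearson`, `recommend`, and `min_sim`, and all rating values, are assumptions made for the example (only the user and movie names come from the utility matrix slide).

```python
import math

def pearson(a, b):
    """Pearson correlation over the items both users rated."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def recommend(ratings, user, min_sim=0.5):
    """Rank the user's unrated items by similarity-weighted neighbour ratings.

    min_sim must be positive so that every weight in the average is positive.
    """
    neighbours = {}
    for other, their in ratings.items():
        if other != user:
            sim = pearson(ratings[user], their)
            if sim >= min_sim:
                neighbours[other] = sim
    scores = {}
    for other, sim in neighbours.items():
        for item, r in ratings[other].items():
            if item not in ratings[user]:
                num, den = scores.get(item, (0.0, 0.0))
                scores[item] = (num + sim * r, den + sim)
    predicted = {item: num / den for item, (num, den) in scores.items()}
    return sorted(predicted, key=predicted.get, reverse=True)

# Hypothetical ratings database (values made up for illustration).
ratings = {
    "Ayse": {"King Kong": 5, "Garfield": 1, "Matrix": 4},
    "Ali": {"King Kong": 4, "Garfield": 2, "Matrix": 5, "Usual Suspects": 5},
    "Veli": {"King Kong": 1, "Garfield": 5, "Matrix": 2, "Usual Suspects": 1},
}
print(recommend(ratings, "Ayse"))
```

In this toy data Ali's ratings correlate strongly with Ayse's while Veli's are perfectly anti-correlated, so only Ali's opinion of the unseen movie counts.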

40
Similar Users
  • Let $r_x$ be the vector of user x's ratings
  • Cosine similarity measure
  • $\mathrm{sim}(x, y) = \cos(r_x, r_y)$
  • Pearson correlation coefficient
  • ....
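
Written out, the two similarity measures named on this slide take the following form. The notation is an assumption made for this sketch: $S_{xy}$ denotes the set of items rated by both users, $r_{x,s}$ is user x's rating of item s, and $\bar{r}_x$ is user x's mean rating, taken here over $S_{xy}$ (other conventions for the mean exist).

```latex
% Cosine similarity between the rating vectors of users x and y
\mathrm{sim}(x, y) = \cos(r_x, r_y)
  = \frac{\sum_{s \in S_{xy}} r_{x,s}\, r_{y,s}}
         {\sqrt{\sum_{s \in S_{xy}} r_{x,s}^{2}}\;
          \sqrt{\sum_{s \in S_{xy}} r_{y,s}^{2}}}

% Pearson correlation coefficient between users x and y
\mathrm{sim}(x, y)
  = \frac{\sum_{s \in S_{xy}} (r_{x,s} - \bar{r}_x)(r_{y,s} - \bar{r}_y)}
         {\sqrt{\sum_{s \in S_{xy}} (r_{x,s} - \bar{r}_x)^{2}\;
                \sum_{s \in S_{xy}} (r_{y,s} - \bar{r}_y)^{2}}}
```

The Pearson form subtracts each user's mean first, so two users who rate on different parts of the scale but agree on relative preferences still come out as similar.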

41
Collaborative Filtering
42
LIMITATIONS(Collaborative)
  • New User Problem
  • The same problem as with content-based systems.
  • In order to make accurate recommendations, the
    system must first learn the user's preferences
    from the ratings that the user gives.
  • New Item Problem
  • New items are added regularly to recommender
    systems.
  • Collaborative systems rely solely on users'
    preferences to make recommendations.
  • Therefore, until the new item is rated by a
    substantial number of users, the recommender
    system would not be able to recommend it.
  • This is not a problem for content-based methods!
  • On the other hand, collaborative filtering works for
    any kind of item; no feature selection is needed

43
  • HYBRID METHODS

44
Hybrid Methods
  • Content-based and collaborative methods have
    complementary strengths and weaknesses.
  • Combine methods to obtain the best of both.

45
HOW TO COMBINE?
  • Implement two separate recommenders and combine
    predictions, by giving weights
  • Add content-based methods to collaborative
    filtering
  • Use content-based predictor to complete
    collaborative data.
  • Content-Boosted Collaborative Filtering for
    Improved Recommendations, Prem Melville,
    Raymond J. Mooney and Ramadass Nagarajan,
    AAAI-2002
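
The first option, running two separate recommenders and combining their predictions with weights, might look like the sketch below. The function name `combine`, the `weight` parameter, the fallback rule for items only one recommender can score, and the example predictions are all assumptions made for illustration.

```python
def combine(content_pred, collab_pred, weight=0.5):
    """Linear blend of two recommenders' predicted ratings.

    weight is the share given to the collaborative prediction. An item
    scored by only one recommender keeps that recommender's prediction.
    """
    blended = {}
    for item in content_pred.keys() | collab_pred.keys():
        if item in content_pred and item in collab_pred:
            blended[item] = (weight * collab_pred[item]
                             + (1 - weight) * content_pred[item])
        elif item in content_pred:
            blended[item] = content_pred[item]
        else:
            blended[item] = collab_pred[item]
    return blended

# Hypothetical predictions from the two recommenders.
content_pred = {"Matrix": 4.0, "Garfield": 2.0}
collab_pred = {"Matrix": 5.0}
blended = combine(content_pred, collab_pred, weight=0.75)
print(blended)
```

The fallback rule means a brand-new item with no ratings yet can still be scored from its content alone, which is one way the two methods cover each other's weaknesses.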

46
Movie Domain
  • hybrid approach in the domain of movie
    recommendation
  • the user-movie ratings from the EachMovie dataset
  • The dataset contains rating data provided by each
    user for various movies.
  • User ratings range from zero to five stars. Zero
    stars indicate extreme dislike for a movie and
    five stars indicate high praise.
  • The content information for each movie was
    collected from IMDb using a simple crawler.
  • The crawler follows the IMDb link provided for
    every movie in the EachMovie dataset and collects
    information.
  • Content information of every movie is represented
    by a set of slots (features).
  • Each slot is represented simply as a bag of
    words.
  • The slots used for the EachMovie dataset are
    movie title, director, cast, genre, plot

47
Content-Boosted CF - I
48
Content-Boosted CF - II
[Figure: User Ratings Matrix → Content-Based Predictor → Pseudo User Ratings Matrix]
  • Compute pseudo user ratings matrix
  • Full matrix approximates actual full user
    ratings matrix
  • Perform CF
  • Using Pearson correlation between pseudo user-rating
    vectors
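
The two steps above can be sketched as follows. This is a simplified illustration, not the full method of the cited paper (which includes further weighting details omitted here): missing ratings are filled in by a content-based predictor to obtain a dense pseudo ratings matrix, and Pearson correlation is then computed between the dense vectors. The predictor `genre_stub` is a hypothetical stand-in for a trained content-based model, and all ratings are made up.

```python
import math

def pseudo_ratings(ratings, items, content_predict):
    """Dense rating vector per user: actual ratings where available,
    content-based predictions everywhere else."""
    return {
        user: [rated[i] if i in rated else content_predict(user, i)
               for i in items]
        for user, rated in ratings.items()
    }

def pearson(x, y):
    """Pearson correlation between two equal-length dense vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def genre_stub(user, item):
    # Hypothetical stand-in for a trained content-based predictor.
    return 4 if item in ("King Kong", "Matrix") else 2

items = ["King Kong", "Garfield", "Matrix"]
ratings = {
    "Ayse": {"King Kong": 5, "Garfield": 1},
    "Ali": {"Garfield": 2, "Matrix": 5},
}

dense = pseudo_ratings(ratings, items, genre_stub)
sim = pearson(dense["Ayse"], dense["Ali"])
print(dense, sim)
```

In the sparse data the two users share only one rated movie, too few for a meaningful correlation; after filling, every pair of users can be compared over all items.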

49
Content-Boosted Collaborative Filtering
[Figure: system overview; labels: EachMovie, IMDb]
50
PART 4
  • CONCLUSION

51
CONCLUSION
  • Recommender systems are an important technology
    for combating information overload.
  • Collaborative filtering has problems.
  • Content-based methods address these problems (but
    have problems of their own).
  • Integrating both is best.

52
THANK YOU FOR LISTENING
QUESTIONS?