Recommender Systems - PowerPoint PPT Presentation

Provided by: AlexT155
Transcript and Presenter's Notes

1
Recommender Systems
2
Collaborative Filtering Process
3
Challenge - Sparsity
  • Active users may have purchased well under 1% of
    the items (1% of 2 million books is 20,000
    books).
  • Solution: Use sparse representations of the
    rating matrix.
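
As a rough illustration of the saving, the sketch below stores only the ratings that actually exist, in a dictionary of dictionaries, and compares the number of stored entries with the size of the equivalent dense matrix. The user names, item names, catalogue size, and the `get_rating` helper are all made up for this example.

```python
# Hypothetical catalogue size and ratings, chosen only for illustration.
NUM_ITEMS = 2_000_000

# Sparse representation: only items a user actually rated are stored.
ratings = {
    "alice": {"book_17": 4.0, "book_90210": 2.5},
    "bob": {"book_17": 3.0},
}

def get_rating(ratings, user, item, default=None):
    """Return the stored rating, or `default` if the user never rated the item."""
    return ratings.get(user, {}).get(item, default)

# Number of ratings actually held in memory.
stored = sum(len(items) for items in ratings.values())
# Number of cells a dense users-by-items matrix would need.
dense_cells = len(ratings) * NUM_ITEMS

print(stored, "stored ratings instead of", dense_cells, "dense cells")
```

A missing rating costs nothing to store; a lookup for it simply returns the default.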

4
Ratings in a hashtable
    critics = {
        'Lisa Rose': {'Lady in the Water': 2.5,
                      'Snakes on a Plane': 3.5,
                      'Just My Luck': 3.0,
                      'Superman Returns': 3.5,
                      'You, Me and Dupree': 2.5,
                      'The Night Listener': 3.0},
        'Gene Seymour': {'Lady in the Water': 3.0,
                         'Snakes on a Plane': 3.5,
                         'Just My Luck': 1.5,
                         'Superman Returns': 5.0,
                         'The Night Listener': 3.0,
                         'You, Me and Dupree': 3.5},

5
Ratings in a hashtable
        'Michael Phillips': {'Lady in the Water': 2.5,
                             'Snakes on a Plane': 3.0,
                             'Superman Returns': 3.5,
                             'The Night Listener': 4.0},
        'Claudia Puig': {'Snakes on a Plane': 3.5,
                         'Just My Luck': 3.0,
                         'The Night Listener': 4.5,
                         'Superman Returns': 4.0,
                         'You, Me and Dupree': 2.5},
        'Mick LaSalle': {'Lady in the Water': 3.0,
                         'Snakes on a Plane': 4.0,
                         'Just My Luck': 2.0,
                         'Superman Returns': 3.0,
                         'The Night Listener': 3.0,
                         'You, Me and Dupree': 2.0},

6
Ratings in a hashtable
        'Jack Matthews': {'Lady in the Water': 3.0,
                          'Snakes on a Plane': 4.0,
                          'Superman Returns': 5.0,
                          'The Night Listener': 3.0,
                          'You, Me and Dupree': 3.5},
        'Toby': {'Snakes on a Plane': 4.5,
                 'Superman Returns': 4.0,
                 'You, Me and Dupree': 1.0},
    }

7
Finding Similar Users
  • A simple way to calculate a similarity score is to
    use Euclidean distance, computed over the items
    that both people have rated.

People in preference space
8
Computing Euclidean Distance
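    from math import sqrt

    def sim_distance(prefs, person1, person2):
        # Get the list of shared items
        si = {}
        for item in prefs[person1]:
            if item in prefs[person2]:
                si[item] = 1
        # If they have no ratings in common, return 0
        if len(si) == 0: return 0
        # Add up the squared differences over the shared items
        sum_of_squares = sum(
            (prefs[person1][item] - prefs[person2][item]) ** 2
            for item in si
        )
        # Map the distance into (0, 1]; 1 means identical ratings
        return 1 / (1 + sqrt(sum_of_squares))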
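
A minimal, self-contained check of how the score behaves, assuming the square-root form of the formula shown on this slide. The ratings are a small subset of the critics table, and the helper name `euclidean_similarity` is introduced here for illustration only.

```python
from math import sqrt

# Three critics from the ratings table, restricted to a few films.
prefs = {
    "Lisa Rose": {"Lady in the Water": 2.5, "Snakes on a Plane": 3.5,
                  "Superman Returns": 3.5},
    "Gene Seymour": {"Lady in the Water": 3.0, "Snakes on a Plane": 3.5,
                     "Superman Returns": 5.0},
    "Toby": {"You, Me and Dupree": 1.0},
}

def euclidean_similarity(prefs, a, b):
    """1 / (1 + Euclidean distance) over the items both people rated."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0  # nothing in common
    sum_of_squares = sum((prefs[a][i] - prefs[b][i]) ** 2 for i in shared)
    return 1 / (1 + sqrt(sum_of_squares))

same = euclidean_similarity(prefs, "Lisa Rose", "Lisa Rose")
close = euclidean_similarity(prefs, "Lisa Rose", "Gene Seymour")
none = euclidean_similarity(prefs, "Lisa Rose", "Toby")
print(same, close, none)
```

Identical ratings give 1.0, the score shrinks toward 0 as ratings diverge, and two people with no films in common get 0.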

9
Pearson Correlation Score
  • The correlation coefficient is a measure of how
    well two sets of data fit on a straight line.

Best fit line
10
Pearson Correlation Score
  • Corrects for grade inflation.
  • E.g., Jack Matthews tends to give higher scores
    than Lisa Rose, but the line still fits because
    they have relatively similar preferences.
  • The Euclidean distance score, by contrast, would
    rate them as quite dissimilar.
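
To make the grade-inflation point concrete, the sketch below compares the two scores on made-up ratings in which one critic always scores 1.5 points higher than the other. The data and the helper names `pearson` and `euclidean_score` are assumptions for illustration, not part of the original example.

```python
from math import sqrt

# Hypothetical ratings: the "generous" critic is always 1.5 points higher.
harsh = [1.0, 2.0, 2.5, 3.0, 3.5]
generous = [r + 1.5 for r in harsh]

def pearson(xs, ys):
    """Pearson correlation of two equal-length rating lists."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x_sq = sum(x * x for x in xs)
    sum_y_sq = sum(y * y for y in ys)
    p_sum = sum(x * y for x, y in zip(xs, ys))
    num = p_sum - sum_x * sum_y / n
    den = sqrt((sum_x_sq - sum_x ** 2 / n) * (sum_y_sq - sum_y ** 2 / n))
    return num / den if den else 0

def euclidean_score(xs, ys):
    """1 / (1 + Euclidean distance) between two rating lists."""
    return 1 / (1 + sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys))))

r = pearson(harsh, generous)
d = euclidean_score(harsh, generous)
print(r, d)
```

The constant offset leaves the correlation at 1 (up to rounding), while the distance-based score drops well below 1 even though the two critics order the films identically.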

Two critics with a high correlation score.
11
Pearson Correlation Formula
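
One standard way to write the Pearson correlation for ratings $x_i$ and $y_i$ over the $n$ items both critics rated is given below, in the sum-based form that matches the sums computed in the Pearson Correlation Code slide. The symbols are assumptions chosen for this note.

```latex
r = \frac{\sum_i x_i y_i - \frac{\left(\sum_i x_i\right)\left(\sum_i y_i\right)}{n}}
         {\sqrt{\left(\sum_i x_i^2 - \frac{\left(\sum_i x_i\right)^2}{n}\right)
                \left(\sum_i y_i^2 - \frac{\left(\sum_i y_i\right)^2}{n}\right)}}
```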
12
Geometric Interpretation
  • For centered data (i.e., data that have been
    shifted by the sample mean so as to have an
    average of zero), the correlation coefficient can
    also be viewed as the cosine of the angle between
    the two vectors.
  • E.g., suppose a critic rated five movies 1, 2, 3, 5,
    and 8, respectively, and another critic rated those
    movies 0.11, 0.12, 0.13, 0.15, and 0.18.
  • These data are perfectly correlated: y = 0.10 +
    0.01x.
  • The Pearson correlation coefficient must therefore be
    exactly one.
  • Centering the data (shifting x by E(x) = 3.8 and
    y by E(y) = 0.138) yields
    x = (-2.8, -1.8, -0.8, 1.2, 4.2) and
    y = (-0.028, -0.018, -0.008, 0.012, 0.042), from
    which
    $\cos\theta = \frac{x \cdot y}{\|x\|\,\|y\|} = \frac{0.308}{\sqrt{30.8}\,\sqrt{0.00308}} = 1$,
    as expected.
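
The arithmetic above can be checked numerically. This is a small sketch using the same two rating vectors; the helper names `center` and `cosine` are introduced here for illustration.

```python
from math import sqrt

x = [1, 2, 3, 5, 8]
y = [0.11, 0.12, 0.13, 0.15, 0.18]

def center(values):
    """Shift values so that their mean is zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Cosine of the centered vectors equals the Pearson correlation.
cos_theta = cosine(center(x), center(y))
print(cos_theta)
```

Up to floating-point rounding, the result is 1, in agreement with the perfect linear relationship between the two sets of ratings.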

13
Pearson Correlation Code
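    from math import sqrt

    def sim_pearson(prefs, person1, person2):
        # Get the list of mutually rated items
        si = {}
        for item in prefs[person1]:
            if item in prefs[person2]:
                si[item] = 1
        n = len(si)
        # If they have no ratings in common, return 0
        if n == 0: return 0
        # Add up all the preferences
        sum1 = sum(prefs[person1][item] for item in si)
        sum2 = sum(prefs[person2][item] for item in si)
        # Sum up the squares
        sum1Sq = sum(prefs[person1][item] ** 2 for item in si)
        sum2Sq = sum(prefs[person2][item] ** 2 for item in si)
        # Sum up the products
        pSum = sum(prefs[person1][item] * prefs[person2][item] for item in si)
        # Final step (cut off on the slide): combine the sums using
        # the standard Pearson formula
        num = pSum - (sum1 * sum2 / n)
        den = sqrt((sum1Sq - sum1 ** 2 / n) * (sum2Sq - sum2 ** 2 / n))
        if den == 0: return 0
        return num / den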

14
Top Matches
    def topMatches(critics, person, n=5, similarity=sim_pearson):
        # Score every other critic against this person
        scores = [(similarity(critics, person, other), other)
                  for other in critics if other != person]
        # Sort so the highest scores appear first
        scores.sort()
        scores.reverse()
        return scores[0:n]

    >>> recommendations.topMatches(recommendations.critics, 'Toby', n=3)
    [(0.99124070716192991, 'Lisa Rose'),
     (0.92447345164190486, 'Mick LaSalle'),
     (0.89340514744156474, 'Claudia Puig')]

15
Recommending Items
16
Recommending Items
    def getRecommendations(prefs, person, similarity=sim_pearson):
        totals = {}
        simSums = {}
        for other in prefs:
            # Don't compare me to myself
            if other == person: continue
            sim = similarity(prefs, person, other)
            # Ignore scores of zero or lower
            if sim <= 0: continue
            for item in prefs[other]:
                # Only score movies I haven't seen yet
                if item not in prefs[person]:
                    # Similarity * Score
                    totals.setdefault(item, 0)
                    totals[item] += prefs[other][item] * sim
                    # Sum of similarities
                    simSums.setdefault(item, 0)
                    simSums[item] += sim

17
Recommending Items
        # Create the normalized list
        rankings = [(total / simSums[item], item)
                    for item, total in totals.items()]
        # Return the sorted list
        rankings.sort()
        rankings.reverse()
        return rankings

    >>> recommendations.getRecommendations(recommendations.critics, 'Toby')
    [(3.3477895267131013, 'The Night Listener'),
     (2.8325499182641614, 'Lady in the Water'),
     (2.5309807037655645, 'Just My Luck')]
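
The ranking computed by `getRecommendations` is a similarity-weighted average of other people's ratings. In symbols (the notation is an assumption introduced for this note): for a person $u$ and an unseen item $i$, summing over the other people $o$ who rated $i$ and have a positive similarity to $u$,

```latex
\hat{r}_{u,i} = \frac{\sum_{o} \mathrm{sim}(u, o)\, r_{o,i}}{\sum_{o} \mathrm{sim}(u, o)}
```

Here the numerator corresponds to `totals[item]` and the denominator to `simSums[item]`. Dividing by the sum of similarities keeps an item from ranking highly merely because many people rated it.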

18
Matching Products
  • Recall how Amazon suggests products related to
    the one a customer is viewing.

19
Transform the data
    {'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5},
     'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5}}

  • to

    {'Lady in the Water': {'Lisa Rose': 2.5, 'Gene Seymour': 3.0},
     'Snakes on a Plane': {'Lisa Rose': 3.5, 'Gene Seymour': 3.5}} etc.

    def transformPrefs(prefs):
        result = {}
        for person in prefs:
            for item in prefs[person]:
                result.setdefault(item, {})
                # Flip item and person
                result[item][person] = prefs[person][item]
        return result
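
A quick way to convince yourself that the transformation only swaps the two dictionary levels is to apply it twice and compare with the input. The sketch below uses a compact restatement under the hypothetical name `transform_prefs`, with the two-critic sample from this slide.

```python
def transform_prefs(prefs):
    """Swap the two dictionary levels: person -> item becomes item -> person."""
    result = {}
    for person, items in prefs.items():
        for item, rating in items.items():
            result.setdefault(item, {})[person] = rating
    return result

critics = {
    "Lisa Rose": {"Lady in the Water": 2.5, "Snakes on a Plane": 3.5},
    "Gene Seymour": {"Lady in the Water": 3.0, "Snakes on a Plane": 3.5},
}

movies = transform_prefs(critics)
print(movies)

# Transforming twice gets back to the original person-centric view.
round_trip = transform_prefs(movies)
```

No rating is lost or altered; only the grouping changes.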

20
Getting Similar Items
    >>> movies = recommendations.transformPrefs(recommendations.critics)
    >>> recommendations.topMatches(movies, 'Superman Returns')
    [(0.657, 'You, Me and Dupree'),
     (0.487, 'Lady in the Water'),
     (0.111, 'Snakes on a Plane'),
     (-0.179, 'The Night Listener'),
     (-0.422, 'Just My Luck')]

21
Whom to invite to a premiere?
    >>> recommendations.getRecommendations(movies, 'Just My Luck')
    [(4.0, 'Michael Phillips'),
     (3.0, 'Jack Matthews')]

  • As another example, swapping the products and
    the people, as done here, would allow an online
    retailer to search for people who might buy
    certain products.

22
Building a Cache
    def calculateSimilarItems(prefs, n=10):
        # Create a dictionary of items showing which other items they
        # are most similar to.
        result = {}
        # Invert the preference matrix to be item-centric
        itemPrefs = transformPrefs(prefs)
        for item in itemPrefs:
            # Find the most similar items to this one
            scores = topMatches(itemPrefs, item, n=n, similarity=sim_distance)
            result[item] = scores
        return result
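
One way the cache might be used: build it once, since that is the expensive step, then answer "which items are similar to X?" with a plain dictionary lookup. The sketch below is self-contained, so it restates the helpers compactly under hypothetical snake_case names and uses a three-critic subset of the ratings.

```python
from math import sqrt

critics = {
    "Lisa Rose": {"Lady in the Water": 2.5, "Snakes on a Plane": 3.5,
                  "Superman Returns": 3.5},
    "Gene Seymour": {"Lady in the Water": 3.0, "Snakes on a Plane": 3.5,
                     "Superman Returns": 5.0},
    "Toby": {"Snakes on a Plane": 4.5, "Superman Returns": 4.0},
}

def transform_prefs(prefs):
    """Swap the two dictionary levels: person -> item becomes item -> person."""
    result = {}
    for person, items in prefs.items():
        for item, rating in items.items():
            result.setdefault(item, {})[person] = rating
    return result

def sim_distance(prefs, a, b):
    """1 / (1 + Euclidean distance) over the keys rated under both a and b."""
    shared = [k for k in prefs[a] if k in prefs[b]]
    if not shared:
        return 0
    return 1 / (1 + sqrt(sum((prefs[a][k] - prefs[b][k]) ** 2 for k in shared)))

def top_matches(prefs, key, n=5, similarity=sim_distance):
    """The n entries most similar to `key`, best first."""
    scores = [(similarity(prefs, key, other), other)
              for other in prefs if other != key]
    scores.sort(reverse=True)
    return scores[:n]

def calculate_similar_items(prefs, n=10):
    """Precompute, for every item, its n most similar items."""
    item_prefs = transform_prefs(prefs)
    return {item: top_matches(item_prefs, item, n=n) for item in item_prefs}

# Build once (the expensive step) ...
item_sim = calculate_similar_items(critics, n=2)
# ... then each later query is a plain dictionary lookup.
for score, other in item_sim["Superman Returns"]:
    print(round(score, 3), other)
```

Because similarities between items tend to change more slowly than similarities between users, such a cache can usually be rebuilt on a schedule rather than on every request.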