Title: MRI: Meaningful Interpretations of Collaborative Ratings
1MRI: Meaningful Interpretations of Collaborative Ratings
- Mahashweta Das, Sihem Amer-Yahia, Cong Yu, Gautam Das
37th International Conference on Very Large Data Bases, 2011, at Seattle
2Roadmap
- Introduction
- Motivation
- Problem: MRI
  - Sub-problem: DEM
  - Sub-problem: DIM
- Data Model
- Algorithms
- Experiments
  - Quantitative
  - Qualitative
- Conclusion and Future Work
7Motivation
- Examining reviews vs. trusting the overall aggregate rating
- IMDB ratings demographic breakdown: not meaningful enough
8MRI Problem
- Examining reviews vs. trusting the overall aggregate rating
- IMDB ratings demographic breakdown: not meaningful enough
- Novel and powerful third option: Meaningful Rating Interpretation
  - Explain ratings by leveraging user and item attribute information
11MRI Sub-problem
- DEM: Meaningful Description Mining
  - Identify groups of reviewers who consistently share similar ratings on items
13MRI Sub-problem
- DIM: Meaningful Difference Mining
  - Identify groups of reviewers who consistently disagree on item ratings
16Data Model
- Collaborative rating site: <Set of Items, Set of Users, Ratings>
- Rating tuple: <item attributes, user attributes, rating>
- Group: set of ratings describable by a set of attribute values
- Notion of group based on data cube
  - OLAP literature for mining multidimensional data

ID  Title             Genre  Director          Name  Gender  Location  Rating
1   Titanic           Drama  James Cameron     Amy   Female  New York  8.5
2   Schindler's List  Drama  Steven Spielberg  John  Male    New York  7.0
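The rating-tuple and group notions above can be illustrated with a minimal Python sketch. The two rows are the ones shown on the slide; the function names `select_group` and `average_rating` are hypothetical helpers introduced only for illustration.

```python
# Each rating tuple combines item attributes, user attributes, and a rating.
# The two rows below are the ones shown on the slide.
ratings = [
    {"id": 1, "title": "Titanic", "genre": "Drama", "director": "James Cameron",
     "name": "Amy", "gender": "Female", "location": "New York", "rating": 8.5},
    {"id": 2, "title": "Schindler's List", "genre": "Drama", "director": "Steven Spielberg",
     "name": "John", "gender": "Male", "location": "New York", "rating": 7.0},
]


def select_group(tuples, description):
    """Return the rating tuples that match every (attribute, value) pair."""
    return [t for t in tuples if all(t.get(a) == v for a, v in description.items())]


def average_rating(group):
    """Average rating of a group, or None when the group is empty."""
    return sum(t["rating"] for t in group) / len(group) if group else None


# A group is the set of ratings describable by a set of attribute values.
new_york = select_group(ratings, {"location": "New York"})
female_ny = select_group(ratings, {"gender": "Female", "location": "New York"})
print(len(new_york), average_rating(new_york))
print(len(female_ny), average_rating(female_ny))
```

Describing a group as a dictionary of attribute values keeps it equivalent to a selection query condition on the rating table.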
18Data Model
- Notion of group based on data cube lattice
  - Each node in the lattice is a data cube/cuboid
  - Query condition on database
A: Gender, B: Age, C: Location, D: Occupation
Figure: 4-Dimensional Data Cube Lattice
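As a rough illustration of the 4-dimensional lattice, the sketch below enumerates every subset of the four attributes, grouped by level. The helper names `lattice_levels` and `children` are assumptions made for this example, not names from the talk.

```python
from itertools import combinations

ATTRIBUTES = ["Gender", "Age", "Location", "Occupation"]


def lattice_levels(attributes):
    """Group every subset of the attributes by how many attributes it fixes."""
    return {
        size: [frozenset(subset) for subset in combinations(attributes, size)]
        for size in range(len(attributes) + 1)
    }


def children(cuboid, attributes):
    """Cuboids one level more specific: the same attributes plus one more."""
    return [cuboid | {a} for a in attributes if a not in cuboid]


levels = lattice_levels(ATTRIBUTES)
for size, cuboids in levels.items():
    # The empty subset is the apex of the lattice (no attribute fixed).
    print(size, [", ".join(sorted(c)) or "(all)" for c in cuboids])

total_nodes = sum(len(cuboids) for cuboids in levels.values())
print("nodes in lattice:", total_nodes)
```

With four attributes the lattice has 2^4 = 16 nodes, which is why exhaustive search over sets of cuboids grows quickly once attribute values are included.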
19Data Model
Each node/data cube/cuboid in the lattice is a group
Selection Query Condition
A: Gender = Male, B: Age = Young, C: Location = CA, D: Occupation = Student
Figure: Partial Rating Lattice for a Movie (M = Male, Y = Young, CA = California, S = Student)
21Data Model
Task: Quickly identify good groups in the lattice that help users understand ratings effectively
Figure: Partial Rating Lattice for a Movie (M = Male, Y = Young, CA = California, S = Student)
23DEM: Meaningful Description Mining
- For an input item covering R_I ratings, return a set C of cuboids such that
  - description error is minimized, subject to
  - |C| ≤ k
  - coverage ≥ α
- Description Error
  - Measures how well a cuboid's average rating approximates the numerical score of each individual rating belonging to it
- Coverage
  - Measures the percentage of ratings covered by the returned cuboids
- DEM is NP-Hard (proof details in paper)
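The two measures can be sketched in Python as follows. This is one plausible formalization and an assumption of this example, not a transcription of the paper's definitions: the error of a rating is its mean absolute deviation from the averages of the cuboids that cover it, and uncovered ratings contribute nothing. The ratings are hypothetical toy data.

```python
# Hypothetical ratings for one movie (toy data, not from MovieLens).
RATINGS = [
    {"gender": "Male", "occupation": "Student", "location": "CA", "score": 4.0},
    {"gender": "Male", "occupation": "Student", "location": "NY", "score": 5.0},
    {"gender": "Female", "occupation": "Student", "location": "CA", "score": 2.0},
    {"gender": "Female", "occupation": "Engineer", "location": "CA", "score": 3.0},
    {"gender": "Male", "occupation": "Engineer", "location": "NY", "score": 1.0},
]


def covers(cuboid, rating):
    """A cuboid (dict of attribute values) covers a rating that matches all of them."""
    return all(rating[a] == v for a, v in cuboid.items())


def cuboid_average(cuboid, ratings):
    """Average score of the ratings covered by the cuboid."""
    scores = [r["score"] for r in ratings if covers(cuboid, r)]
    return sum(scores) / len(scores)


def coverage(cuboids, ratings):
    """Fraction of ratings covered by at least one cuboid."""
    return sum(any(covers(c, r) for c in cuboids) for r in ratings) / len(ratings)


def description_error(cuboids, ratings):
    """Assumed definition: sum over covered ratings of the mean |score - cuboid average|."""
    total = 0.0
    for r in ratings:
        devs = [abs(r["score"] - cuboid_average(c, ratings)) for c in cuboids if covers(c, r)]
        if devs:
            total += sum(devs) / len(devs)
    return total


C = [{"gender": "Male", "occupation": "Student"},
     {"location": "CA", "occupation": "Student"}]
print("coverage:", coverage(C, RATINGS))
print("description error:", description_error(C, RATINGS))
```

On this toy data the two cuboids cover three of the five ratings, so a coverage threshold of 80% would reject this set even though its error is small.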
24DEM Algorithms
- Exact Algorithm (E-DEM)
  - Brute force: enumerates all possible combinations of cuboids in the lattice to return the exact (i.e., optimal) set as rating descriptions
- Random Restart Hill Climbing Algorithm
  - Often fails to satisfy the coverage constraint; a large number of restarts is required
  - Need an algorithm that handles both the coverage constraint and the description error simultaneously
- Randomized Hill Exploration Algorithm (RHE-DEM)
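The brute-force idea behind the exact algorithm can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the error definition, the exclusion of the all-ratings apex cuboid, the toy ratings, and the names `candidate_cuboids` and `e_dem` are all choices made for this example.

```python
from itertools import combinations

ATTRS = ("gender", "occupation", "location")
# Hypothetical ratings for one movie (toy data).
RATINGS = [
    {"gender": "Male", "occupation": "Student", "location": "CA", "score": 4.0},
    {"gender": "Male", "occupation": "Student", "location": "NY", "score": 5.0},
    {"gender": "Female", "occupation": "Student", "location": "CA", "score": 2.0},
    {"gender": "Female", "occupation": "Engineer", "location": "CA", "score": 3.0},
    {"gender": "Male", "occupation": "Engineer", "location": "NY", "score": 1.0},
]


def covers(cuboid, rating):
    """A cuboid is a frozenset of (attribute, value) pairs."""
    return all(rating[a] == v for a, v in cuboid)


def coverage(cuboids, ratings):
    return sum(any(covers(c, r) for c in cuboids) for r in ratings) / len(ratings)


def description_error(cuboids, ratings):
    """Assumed definition: mean absolute deviation from covering cuboid averages."""
    avg = {}
    for c in cuboids:
        scores = [r["score"] for r in ratings if covers(c, r)]
        if scores:
            avg[c] = sum(scores) / len(scores)
    total = 0.0
    for r in ratings:
        devs = [abs(r["score"] - avg[c]) for c in cuboids if covers(c, r)]
        if devs:
            total += sum(devs) / len(devs)
    return total


def candidate_cuboids(ratings):
    """Every non-empty attribute-value description that matches at least one rating."""
    found = set()
    for r in ratings:
        for size in range(1, len(ATTRS) + 1):
            for attrs in combinations(ATTRS, size):
                found.add(frozenset((a, r[a]) for a in attrs))
    return sorted(found, key=sorted)  # fixed order keeps runs reproducible


def e_dem(ratings, k, alpha):
    """Try every set of at most k cuboids; keep the feasible set with least error."""
    best, best_error = None, float("inf")
    cuboids = candidate_cuboids(ratings)
    for size in range(1, k + 1):
        for chosen in combinations(cuboids, size):
            if coverage(chosen, ratings) < alpha:
                continue
            err = description_error(chosen, ratings)
            if err < best_error:
                best, best_error = chosen, err
    return best, best_error


best, best_error = e_dem(RATINGS, k=2, alpha=0.8)
print([sorted(c) for c in best], best_error)
```

The number of candidate sets grows combinatorially with k and with the number of cuboids, which is consistent with the slide's motivation for a heuristic alternative.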
25RHE-DEM Algorithm
Satisfy Coverage → Minimize Error
C = {{Male, Student}, {California, Student}}
Figure: Partial Rating Lattice for a Movie; k = 2, α = 80% (M = Male, Y = Young, CA = California, S = Student)
26RHE-DEM Algorithm
Satisfy Coverage → Minimize Error
C = {{Male, Student}, {California, Student}}
Say, C does not satisfy the Coverage Constraint
Figure: Partial Rating Lattice for a Movie; k = 2, α = 80% (M = Male, Y = Young, CA = California, S = Student)
27RHE-DEM Algorithm
Satisfy Coverage → Minimize Error
C = {{Male, Student}, {California, Student}}
C = {{Male}, {California, Student}}
C = {{Student}, {California, Student}}
Figure: Partial Rating Lattice for a Movie; k = 2, α = 80% (M = Male, Y = Young, CA = California, S = Student)
28RHE-DEM Algorithm
Satisfy Coverage ✓ → Minimize Error
C = {{Male}, {California, Student}}
Say, C satisfies the Coverage Constraint
Figure: Partial Rating Lattice for a Movie; k = 2, α = 80% (M = Male, Y = Young, CA = California, S = Student)
31RHE-DEM Algorithm
Satisfy Coverage ✓ → Minimize Error ✓
C = {{Male}, {Student}}
Figure: Partial Rating Lattice for a Movie; k = 2, α = 80% (M = Male, Y = Young, CA = California, S = Student)
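The two-phase walk-through on the preceding slides (first satisfy coverage, then minimize error) can be sketched as a hill-exploration loop. This is a loose sketch, not the authors' RHE-DEM: the neighbor definition (add or remove one attribute of one cuboid), the greedy move rule, the error definition, the toy ratings, and every function name are assumptions of this example.

```python
import random
from itertools import combinations

# Fixed attribute values of the partial lattice (M, Y, CA, S on the slides).
PROFILE = {"gender": "Male", "age": "Young", "location": "CA", "occupation": "Student"}
# Hypothetical ratings for one movie (toy data).
RATINGS = [
    {"gender": "Male", "age": "Young", "location": "CA", "occupation": "Student", "score": 4.0},
    {"gender": "Male", "age": "Young", "location": "NY", "occupation": "Student", "score": 5.0},
    {"gender": "Male", "age": "Old", "location": "NY", "occupation": "Engineer", "score": 4.0},
    {"gender": "Female", "age": "Young", "location": "CA", "occupation": "Student", "score": 2.0},
    {"gender": "Female", "age": "Old", "location": "NY", "occupation": "Student", "score": 3.0},
]


def covers(cuboid, rating):
    """A cuboid is a frozenset of attribute names; values come from PROFILE."""
    return all(rating[a] == PROFILE[a] for a in cuboid)


def coverage(C, ratings):
    return sum(any(covers(c, r) for c in C) for r in ratings) / len(ratings)


def error(C, ratings):
    """Assumed definition: mean absolute deviation from covering cuboid averages."""
    avg = {}
    for c in C:
        scores = [r["score"] for r in ratings if covers(c, r)]
        if scores:
            avg[c] = sum(scores) / len(scores)
    total = 0.0
    for r in ratings:
        devs = [abs(r["score"] - avg[c]) for c in C if covers(c, r)]
        if devs:
            total += sum(devs) / len(devs)
    return total


def neighbors(C):
    """Replace one cuboid by a lattice neighbor (one attribute removed or added)."""
    out = []
    for i, c in enumerate(C):
        moves = [c - {a} for a in sorted(c) if len(c) > 1]
        moves += [c | {a} for a in PROFILE if a not in c]
        out += [C[:i] + (m,) + C[i + 1:] for m in moves if m not in C]
    return out


def rhe_dem(ratings, k, alpha, rng, start=None):
    cuboids = [frozenset(s) for n in range(1, len(PROFILE) + 1)
               for s in combinations(sorted(PROFILE), n)]
    C = tuple(start) if start else tuple(rng.sample(cuboids, k))
    # Phase 1: climb on coverage until the constraint holds.
    while coverage(C, ratings) < alpha:
        cands = neighbors(C)
        best = max(cands, key=lambda N: coverage(N, ratings)) if cands else None
        if best is None or coverage(best, ratings) <= coverage(C, ratings):
            return None  # stuck; a caller would restart from a new random set
        C = best
    # Phase 2: descend on error, moving only to sets that keep coverage satisfied.
    while True:
        feasible = [N for N in neighbors(C) if coverage(N, ratings) >= alpha]
        if not feasible:
            return C
        best = min(feasible, key=lambda N: error(N, ratings))
        if error(best, ratings) >= error(C, ratings):
            return C
        C = best


start = (frozenset({"gender", "occupation"}), frozenset({"location", "occupation"}))
result = rhe_dem(RATINGS, k=2, alpha=0.8, rng=random.Random(0), start=start)
print([sorted(c) for c in result])
```

Passing `start` reproduces the slides' initial set {Male, Student}, {California, Student}; omitting it draws a random initial set, in which case the sketch may return `None` and a restart would be needed.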
32DIM: Meaningful Difference Mining
- For an input item covering R_I^+ and R_I^- ratings, return a set C of cuboids such that
  - difference balance is minimized, subject to
  - |C| ≤ k
  - coverage of R_I^+ ≥ α ∧ coverage of R_I^- ≥ α
- Difference Balance
  - Measures whether the positive and negative ratings are "mingled together" (high balance) or "separated apart" (low balance)
- Coverage
  - Measures the percentage of +/− ratings covered by the returned cuboids
- DIM is NP-Hard (proof details in paper)
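A small sketch may help make the balance measure concrete. It assumes one plausible reading of the description above, which is an assumption of this example rather than the paper's exact formula: balance is the fraction of (positive, negative) rating pairs that fall together in at least one returned cuboid. The ratings are hypothetical toy data.

```python
# Hypothetical ratings, already split into positive and negative (toy data).
RATINGS = [
    {"gender": "Male", "occupation": "Student", "location": "CA", "positive": True},
    {"gender": "Male", "occupation": "Student", "location": "NY", "positive": True},
    {"gender": "Female", "occupation": "Student", "location": "CA", "positive": False},
    {"gender": "Male", "occupation": "Engineer", "location": "CA", "positive": False},
    {"gender": "Female", "occupation": "Engineer", "location": "NY", "positive": True},
]


def covers(cuboid, rating):
    return all(rating[a] == v for a, v in cuboid.items())


def coverage(cuboids, ratings):
    """Fraction of the given ratings covered by at least one cuboid."""
    return sum(any(covers(c, r) for c in cuboids) for r in ratings) / len(ratings)


def balance(cuboids, ratings):
    """Assumed definition: share of (positive, negative) pairs sharing a cuboid.

    Assumes at least one positive and one negative rating.
    """
    pos = [r for r in ratings if r["positive"]]
    neg = [r for r in ratings if not r["positive"]]
    mingled = sum(
        any(covers(c, p) and covers(c, n) for c in cuboids)
        for p in pos for n in neg  # quadratic scan over all pairs
    )
    return mingled / (len(pos) * len(neg))


C = [{"gender": "Male", "occupation": "Student"},
     {"location": "CA", "occupation": "Student"}]
positives = [r for r in RATINGS if r["positive"]]
negatives = [r for r in RATINGS if not r["positive"]]
print("balance:", balance(C, RATINGS))
print("coverage+:", coverage(C, positives), "coverage-:", coverage(C, negatives))
```

Under this reading, a cuboid set whose groups each contain only positive or only negative ratings has balance 0, which matches the "separated apart" case.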
33DIM Algorithms
- Exact Algorithm (E-DIM)
- Randomized Hill Exploration Algorithm (RHE-DIM)
- Unlike DEM error, DIM balance computation is expensive
  - Quadratic computation: scanning all possible pairs of positive and negative ratings for each set of cuboids
- Introduce the concept of Fundamental Regions to aid faster balance computation
  - Partition the space of all ratings and aggregate rating tuples in each region
34DIM Algorithms: Fundamental Region
C1 = {Male, Student}, C2 = {California, Student}
Balance
Figure: Computing Balance using Fundamental Regions; set of k = 2 cuboids having 75 ratings (44+, 31−), 10 ratings (6+, 4−)
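The region idea can be sketched as follows: ratings covered by exactly the same subset of cuboids are aggregated into one region with a positive and a negative count, and balance is then computed from region counts instead of individual pairs. The balance definition (fraction of positive/negative pairs sharing a cuboid), the toy data, and the function names are assumptions of this sketch, not the paper's definitions.

```python
from collections import Counter

# Hypothetical ratings, already split into positive and negative (toy data).
RATINGS = [
    {"gender": "Male", "occupation": "Student", "location": "CA", "positive": True},
    {"gender": "Male", "occupation": "Student", "location": "NY", "positive": True},
    {"gender": "Female", "occupation": "Student", "location": "CA", "positive": False},
    {"gender": "Male", "occupation": "Engineer", "location": "CA", "positive": False},
    {"gender": "Female", "occupation": "Engineer", "location": "NY", "positive": True},
]
C = [{"gender": "Male", "occupation": "Student"},
     {"location": "CA", "occupation": "Student"}]


def covers(cuboid, rating):
    return all(rating[a] == v for a, v in cuboid.items())


def region(cuboids, rating):
    """Region signature: indices of exactly those cuboids that cover the rating."""
    return frozenset(i for i, c in enumerate(cuboids) if covers(c, rating))


def balance_naive(cuboids, ratings):
    """Quadratic scan over all (positive, negative) pairs."""
    pos = [r for r in ratings if r["positive"]]
    neg = [r for r in ratings if not r["positive"]]
    mingled = sum(any(covers(c, p) and covers(c, n) for c in cuboids)
                  for p in pos for n in neg)
    return mingled / (len(pos) * len(neg))


def balance_regions(cuboids, ratings):
    """One pass to aggregate counts per region, then combine region pairs.

    Assumes at least one positive and one negative rating.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for r in ratings:
        (pos_counts if r["positive"] else neg_counts)[region(cuboids, r)] += 1
    # Two regions share a cuboid exactly when their signatures intersect.
    mingled = sum(p * n
                  for sig_p, p in pos_counts.items()
                  for sig_n, n in neg_counts.items()
                  if sig_p & sig_n)
    n_pos, n_neg = sum(pos_counts.values()), sum(neg_counts.values())
    return mingled / (n_pos * n_neg)


print(balance_naive(C, RATINGS), balance_regions(C, RATINGS))
```

With k cuboids there are at most 2^k regions, so for small k the region-pair loop is far cheaper than scanning every pair of ratings.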
36Experiments
- Dataset
  - MovieLens: 100,000 ratings for 1682 movies by 943 users
  - Each user has 4 attributes: Gender, Age, Occupation, Location
  - Binning the movies: order movies according to number of ratings and then partition into 6 bins
  - Bin 1: movies with the fewest ratings; Bin 6: movies with the highest number of ratings
- Evaluation
  - Quantitative indicators: Efficiency, Quality, and Scalability
  - Qualitative indicator: Mechanical Turk user study
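The binning step can be sketched as below. The slide does not say how the partition is sized, so equal-sized bins are an assumption here, and the movie identifiers and rating counts are made up for illustration.

```python
def bin_items(rating_counts, num_bins=6):
    """Order items by number of ratings and split them into near-equal bins."""
    # Ties are broken by item name so the result is reproducible.
    ordered = sorted(rating_counts, key=lambda item: (rating_counts[item], item))
    base, extra = divmod(len(ordered), num_bins)
    bins, start = [], 0
    for b in range(num_bins):
        size = base + (1 if b < extra else 0)  # the first `extra` bins get one more item
        bins.append(ordered[start:start + size])
        start += size
    return bins


# Hypothetical counts: movie id -> number of ratings received.
counts = {f"m{i:02d}": n for i, n in enumerate(
    [5, 120, 33, 8, 450, 61, 17, 290, 2, 75, 510, 44], start=1)}
bins = bin_items(counts)
for number, movies in enumerate(bins, start=1):
    print(f"Bin {number}:", movies)
```

Bin 1 then holds the movies with the fewest ratings and Bin 6 those with the most, matching the ordering described on the slide.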
37Quantitative Experiments DEM
39Qualitative Experiments: User Study
- Amazon Mechanical Turk study
  - Two sets: one for description mining, one for difference mining
  - Each set: 4 randomly chosen movies, 30 independent single-user tasks
- Study 1: whether users prefer simple aggregate ratings or rating interpretations
- Study 2: whether users prefer rating interpretations by the exact algorithm or by the heuristic randomized hill exploration algorithm
42Conclusion and Future Work
- Novel problem of meaningful rating interpretation (MRI) in collaborative rating sites
  - Meaningful Description Mining
  - Meaningful Difference Mining
- Heuristic algorithmic solutions that generate rating interpretations as good as those of exact brute force in much less execution time
- Meaningful interpretations of ratings by reviewers of interest
- Additional constraints such as diversity of rating explanations
43Related Work
- Data Cubes
  - Gray et al., A relational aggregation operator generalizing group-by, cross-tab, and sub-totals, ICDE 1996
  - Sathe et al., Intelligent rollups in multidimensional OLAP data, VLDB 2001
  - Lakshmanan et al., Quotient cube: how to summarize the semantics of a data cube, VLDB 2002
  - Ramakrishnan et al., Exploratory mining in cube space, ICDM 2006
  - Wu et al., Promotion analysis in multi-dimensional space, VLDB 2009
- Clustering and Dimensionality Reduction
  - Agrawal et al., Automatic subspace clustering of high dimensional data for data mining applications, SIGMOD 1998
- Recommendation Explanation
  - Herlocker et al., Explaining collaborative filtering recommendations, CSCW 2000
  - Bilgic et al., Explaining recommendations: Satisfaction vs. promotion, IUI 2005
44Thank You
45Quantitative Experiments DIM
46Quantitative Experiments DEM, DIM