Title: Information Management on the World-Wide Web
1Information Management on the World-Wide Web
Junghoo John Cho UCLA Computer Science
2The Web and Information Galore
310 Years Ago
- Reading papers for research
- Stacks of papers
- Long wait
4With Web
5Challenges (1)
- Information overload
- Too much information, too little time
6Information Overload
- XML to Google
- 14 Million matching documents!
- XML to Amazon
- 464 matching books!
- Which one to read?
7Challenges (2)
- Hidden Web
- Not indexed by Search Engines
- Hidden from an average user
- Browse every site manually?
8Challenges (3)
9Challenges (4)
- Scattered unstructured data
- All Computer Science faculty members and graduate
students in the US?
10Projects In Our Group
- Web Archive
- Hidden Web Integration
- Page Ranking Algorithm
- User Recommendation System
11User Recommendation System
- 464 books on XML
- Which one to read?
- The one that my colleagues and friends recommend?
12Amazon's Recommendation System
- 1-5 star rating by individual users
- Books can be sorted by average user rating
13My Typical Scenario
- Sort books by their average user rating
- Browse top 20 books to decide what to read
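The scenario above amounts to a sort on the mean rating. A minimal sketch, assuming a hypothetical in-memory mapping from book title to its list of star ratings (the titles, the ratings, and the `top_books` helper are invented for illustration and are not Amazon's API):

```python
from statistics import mean

# Hypothetical data: book title -> list of 1-5 star ratings (invented).
ratings = {
    "Book A": [5],
    "Book B": [5, 5, 4, 5, 5],
    "Book C": [3, 4, 2],
}

def top_books(ratings, k=20):
    """Return up to k (title, average, count) tuples, best average first."""
    rows = [(title, mean(stars), len(stars)) for title, stars in ratings.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)[:k]

for title, avg, count in top_books(ratings):
    print(f"{title}: {avg:.2f} stars from {count} rating(s)")
```

With this data, Book A ranks first on the strength of a single rating, which is the weakness of sorting by the plain average.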
14Questions
- Is 5 star by one user better than 4.9 star by 100 users?
- Intuitively, I prefer 4.9 star by 100 users
- More reliable rating
- How much can I trust the rating of a particular person?
- How do I know that the person's rating is reliable?
15Our Approach
- Inherent quality or rating of a book
- How many users would recommend the book (i.e., give it a high rating) if all users had read the book?
- More user ratings → more information on the quality of the book
- An average user is likely to give a high rating to a high-quality book
16Probabilistic Rating Model
- How likely is it that the book has a 4-star rating?
- Rating probability distribution
[Figure: probability density vs. book rating/quality]
17Update of Rating Probability
- As more users provide ratings, we update our probability distribution
[Figure: probability density vs. book rating/quality]
18Update of Rating Probability
- As more users provide ratings, we update our probability distribution
[Figure: probability density vs. book rating/quality, after a five-star rating by a user]
19Update of Rating Probability
- As more users provide ratings, we update our probability distribution
[Figure: probability density vs. book rating/quality, after a one-star rating by a user]
20Update of Rating Probability
- As more users provide ratings, we update our probability distribution
[Figure: probability density vs. book rating/quality, after many ratings]
21Bayesian Inference Theory
- Given a user rating UR, what is the inherent
rating IR?
$$ P(IR \mid UR) = \frac{P(UR \mid IR)\, P(IR)}{P(UR)} $$
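Bayes' rule can be applied numerically on a 1-5 star grid. The following is a minimal sketch, not the system's actual model: the `likelihood` function is an assumption made up for illustration (a user most often reports the inherent rating, with the probability halving for each star of distance), and the uniform prior is likewise assumed.

```python
# Candidate inherent ratings (IR) on a 1-5 star grid.
STARS = [1, 2, 3, 4, 5]

def likelihood(ur, ir):
    """ASSUMED P(UR | IR): probability halves per star of distance from IR.
    This form is illustrative only and does not come from the slides."""
    total = sum(0.5 ** abs(s - ir) for s in STARS)
    return 0.5 ** abs(ur - ir) / total

def update(prior, ur):
    """Apply Bayes' rule: P(IR | UR) = P(UR | IR) P(IR) / P(UR)."""
    unnormalized = {ir: likelihood(ur, ir) * p for ir, p in prior.items()}
    p_ur = sum(unnormalized.values())  # P(UR), the normalizing constant
    return {ir: v / p_ur for ir, v in unnormalized.items()}

# Assumed uniform prior: no rating is favored before any user votes.
prior = {ir: 1 / len(STARS) for ir in STARS}
after_five = update(prior, 5)      # probability mass shifts toward 5 stars
after_one = update(after_five, 1)  # a one-star rating pulls it back down

for ir in STARS:
    print(ir, round(after_five[ir], 3), round(after_one[ir], 3))
```

Each call to `update` plays the role of one new user rating; repeating it over many ratings narrows the distribution around the values most consistent with the evidence.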
22User Model
- The characteristics of a user
- Sensitivity: slope of the curve
- 1: good, -1: bad, 0: not useful
23User Model
- The characteristics of a user
- Bias: average height of the curve
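One way to picture the two user characteristics is as the slope and height of a line that maps book quality to the rating a user tends to give. The sketch below assumes a linear curve centered on the middle of the 1-5 scale; the linear form, the `UserModel` class, and the example users are all hypothetical, since the slides do not specify the shape of the curve.

```python
from dataclasses import dataclass

MIDPOINT = 3.0  # center of the 1-5 star scale

@dataclass
class UserModel:
    sensitivity: float  # slope of the curve
    bias: float         # average height of the curve (before clamping)

    def expected_rating(self, quality: float) -> float:
        """ASSUMED linear curve, clamped to the 1-5 star scale."""
        raw = self.bias + self.sensitivity * (quality - MIDPOINT)
        return min(5.0, max(1.0, raw))

fair = UserModel(sensitivity=1.0, bias=3.0)         # tracks quality exactly
generous = UserModel(sensitivity=1.0, bias=4.0)     # rates one star high
random_like = UserModel(sensitivity=0.0, bias=3.0)  # rating ignores quality

for q in (1.0, 3.0, 5.0):
    print(q, fair.expected_rating(q), generous.expected_rating(q),
          random_like.expected_rating(q))
```

Under this assumption, a user with zero sensitivity gives the same rating regardless of quality, so that user's ratings say nothing about the book.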
24Iterative Model Refinement
- As more users rate a book, we get better estimates of the book's quality
- As we estimate a book's quality better, we get a better idea of a user's sensitivity and bias
25Iterative Model Refinement
[Figure: book rating estimate and user-provided rating]
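The refinement can be sketched as two alternating steps: estimate book quality from the ratings given the current user models, then re-fit each user's slope and bias given the current quality estimates. The code below is an illustrative stand-in, not the authors' algorithm: it assumes a linear user curve, uses ordinary least squares for the user step, and runs on invented ratings. In this simple form only the differences between users' biases are recoverable, not their absolute values.

```python
MID = 3.0  # center of the 1-5 star scale

# Hypothetical ratings: user -> {book: stars}. "eve" rates one star high.
ratings = {
    "ann": {"b1": 4, "b2": 3, "b3": 2},
    "bob": {"b1": 4, "b2": 3, "b3": 2},
    "eve": {"b1": 5, "b2": 4, "b3": 3},
}

def estimate_quality(ratings, users):
    """Book step: average ratings after undoing each user's bias and slope."""
    totals, weights = {}, {}
    for user, books in ratings.items():
        slope, bias = users[user]
        if abs(slope) < 1e-6:
            continue  # a flat curve carries no information about quality
        w = slope * slope  # steeper (more sensitive) users count for more
        for book, stars in books.items():
            implied = MID + (stars - bias) / slope
            totals[book] = totals.get(book, 0.0) + w * implied
            weights[book] = weights.get(book, 0.0) + w
    return {book: totals[book] / weights[book] for book in totals}

def estimate_users(ratings, quality, users):
    """User step: least-squares line through (quality - MID, stars)."""
    fitted = {}
    for user, books in ratings.items():
        pairs = [(quality[b] - MID, float(s)) for b, s in books.items() if b in quality]
        fitted[user] = users[user]  # default: keep the previous estimate
        if len(pairs) < 2:
            continue
        mx = sum(x for x, _ in pairs) / len(pairs)
        my = sum(y for _, y in pairs) / len(pairs)
        var = sum((x - mx) ** 2 for x, _ in pairs)
        if var < 1e-9:
            continue  # no spread in quality, so a slope cannot be fitted
        slope = sum((x - mx) * (y - my) for x, y in pairs) / var
        fitted[user] = (slope, my - slope * mx)
    return fitted

users = {u: (1.0, MID) for u in ratings}  # start: (slope, bias) = unbiased
for _ in range(10):
    quality = estimate_quality(ratings, users)
    users = estimate_users(ratings, quality, users)

print(quality)
print(users)
```

On this data the loop settles after the first pass, with "eve" assigned a bias about one star above the other two users.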
26Final Recommendation
- Recommend the book with the highest expected
rating
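Given a rating distribution per book, the recommendation step reduces to comparing expected values. A minimal sketch, assuming each distribution is stored as a mapping from star value to probability; the two distributions and the helper names are invented for illustration:

```python
def expected_rating(distribution):
    """Mean of a discrete rating distribution {stars: probability}."""
    return sum(stars * p for stars, p in distribution.items())

# Hypothetical rating distributions (invented numbers).
# Book A: few ratings, so probability is spread out.
# Book B: many ratings, so probability is concentrated near 5 stars.
posteriors = {
    "Book A": {1: 0.05, 2: 0.10, 3: 0.20, 4: 0.30, 5: 0.35},
    "Book B": {1: 0.00, 2: 0.00, 3: 0.05, 4: 0.15, 5: 0.80},
}

def recommend(posteriors):
    """Return the title whose distribution has the highest expected rating."""
    return max(posteriors, key=lambda title: expected_rating(posteriors[title]))

print(recommend(posteriors))
```

Because the expected value averages over the whole distribution, a book whose rating is still uncertain is penalized relative to one with a well-established high rating.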
27Initial Results
- Our system prefers a 4.9-star book by 100 people to a 5-star book by 1 user
- If a user gives random ratings, the system ignores the user's rating
- More thorough evaluation on the way
28Other Projects
- Web Archive
- Hidden Web Integration
- Page Ranking Algorithm
29Ph.D. Students on the Projects
Rob Adams
Victor Liu
30Thank You