Information Management on the World-Wide Web

Slides: 31
Provided by: Jungh1
Learn more at: http://oak.cs.ucla.edu
1
Information Management on the World-Wide Web
Junghoo John Cho UCLA Computer Science
2
The Web and Information Galore
3
10 Years Ago
  • Reading papers for research
  • Stacks of papers
  • Long wait

4
With Web
5
Challenges (1)
  • Information overload
  • Too much information, too little time

6
Information Overload
  • Query "XML" on Google
  • 14 million matching documents!
  • Query "XML" on Amazon
  • 464 matching books!
  • Which one to read?

7
Challenges (2)
  • Hidden Web
  • Not indexed by Search Engines
  • Hidden from an average user
  • Browse every site manually?


8
Challenges (3)
  • Transience

9
Challenges (4)
  • Scattered unstructured data
  • All Computer Science faculty members and graduate
    students in the US?

10
Projects In Our Group
  • Web Archive
  • Hidden Web Integration
  • Page Ranking Algorithm
  • User Recommendation System

11
User Recommendation System
  • 464 books on XML
  • Which one to read?
  • The one that my colleagues and friends recommend?

12
Amazon's Recommendation System
  • 1-5 star rating by individual users
  • Books can be sorted by "average user rating"

13
My Typical Scenario
  • Sort books by their average user rating
  • Browse top 20 books to decide what to read
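
The scenario above can be sketched in a few lines. This is only an illustration: the catalog, titles, and ratings are made up, and `average_rating` and `sort_by_average` are hypothetical helper names, not part of any real bookstore API. It shows that a plain average puts a book with a single 5-star rating ahead of a book averaging 4.9 stars over 100 ratings.

```python
# Hypothetical catalog: (title, list of 1-5 star ratings). All data is made up.
books = [
    ("Book A", [5]),                  # a single 5-star rating
    ("Book B", [5] * 90 + [4] * 10),  # 100 ratings, average 4.9
    ("Book C", [4, 3, 5, 4]),
]

def average_rating(ratings):
    """Plain arithmetic mean of the star ratings."""
    return sum(ratings) / len(ratings)

def sort_by_average(catalog):
    """Sort books from highest to lowest average rating."""
    return sorted(catalog, key=lambda book: average_rating(book[1]), reverse=True)

ranked = sort_by_average(books)
for title, ratings in ranked[:20]:  # "browse the top 20"
    print(f"{title}: {average_rating(ratings):.2f} from {len(ratings)} rating(s)")
```

The sort key ignores how many users stand behind each average, which is the weakness the following slides address.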

14
Questions
  • Is a 5-star rating by one user better than a 4.9-star
    average from 100 users?
  • Intuitively, I prefer the 4.9-star average from 100 users
  • More reliable rating
  • How much can I trust the rating of a particular
    person?
  • How do I know that the person's rating is reliable?

15
Our Approach
  • Inherent quality or rating of a book
  • How many users would recommend the book (i.e., give it
    a high rating) if all users had read it?
  • More user ratings → more information on the
    quality of the book
  • An average user is likely to give a high rating to
    a high-quality book
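
One way to make "more ratings, more information" concrete is a Beta-Bernoulli model. This is a swapped-in textbook technique, not necessarily the model used in the talk. It assumes that 4 or 5 stars counts as "recommends" and that the prior is uniform; both are assumptions made here for illustration, and `recommend_posterior` is a hypothetical name.

```python
import math

RECOMMEND_THRESHOLD = 4  # assumption: 4 or 5 stars counts as "recommends"

def recommend_posterior(ratings, prior_a=1.0, prior_b=1.0):
    """Return (mean, std) of a Beta posterior over the fraction of users who
    would recommend the book, starting from a Beta(prior_a, prior_b) prior."""
    yes = sum(1 for r in ratings if r >= RECOMMEND_THRESHOLD)
    no = len(ratings) - yes
    a, b = prior_a + yes, prior_b + no
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)

few = recommend_posterior([5])                    # one rating
many = recommend_posterior([5] * 90 + [4] * 10)   # one hundred ratings
print(f"1 rating:    mean={few[0]:.3f}, std={few[1]:.3f}")
print(f"100 ratings: mean={many[0]:.3f}, std={many[1]:.3f}")
```

The standard deviation of the posterior shrinks as ratings accumulate, so the estimate backed by 100 users is much tighter than the one backed by a single user.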

16
Probabilistic Rating Model
  • How likely is it that the book has a 4-star rating?
  • Rating probability distribution

[Figure: probability density over book rating/quality]
17
Update of Rating Probability
  • As more users provide rating, we update our
    probability distribution

[Figure: probability density over book rating/quality]
18
Update of Rating Probability
  • As more users provide rating, we update our
    probability distribution

[Figure: probability density over book rating/quality, after a five-star rating by a user]
19
Update of Rating Probability
  • As more users provide rating, we update our
    probability distribution

[Figure: probability density over book rating/quality, after a one-star rating by a user]
20
Update of Rating Probability
  • As more users provide rating, we update our
    probability distribution

[Figure: probability density over book rating/quality, after many ratings]
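
The update sequence on these slides can be sketched with a discrete grid of candidate quality values. Everything specific here is an assumption for illustration: the half-star grid, the uniform prior, and the Gaussian-shaped likelihood with `noise=1.0` are not taken from the talk.

```python
import math

GRID = [1.0 + 0.5 * i for i in range(9)]  # candidate quality values 1.0 .. 5.0

def uniform_prior():
    """Start with no preference among the candidate quality values."""
    return [1.0 / len(GRID)] * len(GRID)

def likelihood(rating, quality, noise=1.0):
    """Assumed Gaussian-shaped chance of seeing `rating` for a book of `quality`."""
    return math.exp(-((rating - quality) ** 2) / (2 * noise ** 2))

def update(dist, rating):
    """One Bayesian update: multiply by the likelihood, then renormalize."""
    unnormalized = [p * likelihood(rating, q) for p, q in zip(dist, GRID)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def mean(dist):
    """Expected quality under the current distribution."""
    return sum(p * q for p, q in zip(dist, GRID))

dist = uniform_prior()
print(f"prior mean:            {mean(dist):.2f}")
dist = update(dist, 5)
print(f"after a 5-star rating: {mean(dist):.2f}")
dist = update(dist, 1)
print(f"after a 1-star rating: {mean(dist):.2f}")
for _ in range(30):
    dist = update(dist, 4)
print(f"after many 4s:         {mean(dist):.2f}")
```

Each rating pulls the distribution toward the rated value, and after many consistent ratings the probability mass concentrates near that value.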
21
Bayesian Inference Theory
  • Given a user rating UR, what is the inherent
    rating IR?

$$P(IR \mid UR) = \frac{P(UR \mid IR)\, P(IR)}{P(UR)}$$
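
As a hedged extension of Bayes' rule to several ratings of the same book: if the ratings $UR_1, \dots, UR_n$ are assumed to be conditionally independent given the inherent rating (an assumption made here, not stated on the slide), the posterior is the prior times the product of the per-rating likelihoods, renormalized over all candidate values $IR'$.

```latex
P(IR \mid UR_1, \dots, UR_n)
  = \frac{P(IR) \prod_{i=1}^{n} P(UR_i \mid IR)}
         {\sum_{IR'} P(IR') \prod_{i=1}^{n} P(UR_i \mid IR')}
```

For $n = 1$ the denominator is $P(UR_1)$ and the expression reduces to the single-rating form of Bayes' rule.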
22
User Model
  • The characteristics of a user
  • Sensitivity: slope of the curve
  • +1 good, −1 bad, 0 not useful

23
User Model
  • The characteristics of a user
  • Bias: average height of the curve
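
A minimal sketch of such a user model, assuming the curve is a straight line: rating = bias + sensitivity * (quality - center). The linear form, the centering at 3 stars, and the name `fit_user_model` are assumptions for illustration; the slides only say that sensitivity is the slope and bias is the average height.

```python
def fit_user_model(pairs, center=3.0):
    """Least-squares fit of: rating = bias + sensitivity * (quality - center).

    `pairs` is a list of (book_quality, user_rating) tuples.
    Returns (sensitivity, bias): the slope, and the height of the line at `center`.
    """
    xs = [q - center for q, _ in pairs]
    ys = [r for _, r in pairs]
    n = len(pairs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sensitivity = sxy / sxx if sxx else 0.0
    bias = mean_y - sensitivity * mean_x
    return sensitivity, bias

# Hypothetical users, each given as (book quality, rating) pairs.
careful = fit_user_model([(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)])
generous = fit_user_model([(1, 2), (2, 3), (3, 4), (4, 5)])
indifferent = fit_user_model([(1, 3), (2, 3), (3, 3), (4, 3), (5, 3)])

for name, (s, b) in [("careful", careful), ("generous", generous),
                     ("indifferent", indifferent)]:
    print(f"{name}: sensitivity={s:.2f}, bias={b:.2f}")
```

In this sketch a user whose ratings track quality gets a slope near 1, a user who rates everything one star higher gets the same slope with a higher bias, and a user whose ratings do not depend on quality gets a slope of 0.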

24
Iterative Model Refinement
  • As more users rate a book, we get better
    estimates of the book's quality
  • As we estimate a book's quality better, we get
    a better idea of a user's sensitivity and bias

25
Iterative Model Refinement
[Figure labels: Book Rating Estimate; User-provided Rating]
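
The alternating refinement can be sketched as follows. To keep the example short it models user bias only and leaves sensitivity out, which is a simplification of what the slides describe. The additive model (rating = book quality + user bias), the data, and the name `refine` are assumptions for illustration.

```python
def refine(ratings, rounds=20):
    """Alternate between estimating book quality and user bias.

    `ratings` maps (user, book) -> stars. Returns (quality, bias) dicts.
    """
    users = {u for u, _ in ratings}
    books = {b for _, b in ratings}
    bias = {u: 0.0 for u in users}
    quality = {b: 0.0 for b in books}
    for _ in range(rounds):
        # Step 1: book quality = mean of the bias-corrected ratings of the book.
        for b in books:
            vals = [s - bias[u] for (u, bb), s in ratings.items() if bb == b]
            quality[b] = sum(vals) / len(vals)
        # Step 2: user bias = mean gap between the user's ratings and quality.
        for u in users:
            gaps = [s - quality[b] for (uu, b), s in ratings.items() if uu == u]
            bias[u] = sum(gaps) / len(gaps)
    return quality, bias

# Hypothetical data: "bob" rates everything one star higher than "ann";
# "cat" rates one star lower than "ann" and has only rated book X.
ratings = {
    ("ann", "X"): 4, ("ann", "Y"): 3,
    ("bob", "X"): 5, ("bob", "Y"): 4,
    ("cat", "X"): 3,
}
quality, bias = refine(ratings)
for b in sorted(quality):
    print(f"book {b}: estimated quality {quality[b]:.2f}")
for u in sorted(bias):
    print(f"user {u}: estimated bias {bias[u]:+.2f}")
```

With this data the raw averages of X and Y differ by only half a star, because the low rater skipped Y. After refinement the estimated gap approaches the full star on which ann and bob agree. Only differences are meaningful in this sketch: shifting every quality up and every bias down by the same amount fits the data equally well.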
26
Final Recommendation
  • Recommend the book with the highest expected
    rating
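
Given a rating distribution per book, the recommendation step reduces to an expectation and an argmax. The two distributions below are invented for illustration, and `expected_rating` and `recommend` are hypothetical helper names.

```python
def expected_rating(dist):
    """Expected value of a discrete distribution {rating: probability}."""
    return sum(rating * prob for rating, prob in dist.items())

def recommend(candidates):
    """Return the title whose rating distribution has the highest expectation."""
    return max(candidates, key=lambda title: expected_rating(candidates[title]))

# Hypothetical distributions over the inherent rating of two books.
candidates = {
    "Book A": {1: 0.15, 2: 0.15, 3: 0.20, 4: 0.20, 5: 0.30},  # spread out
    "Book B": {1: 0.00, 2: 0.00, 3: 0.05, 4: 0.25, 5: 0.70},  # sharply peaked
}

for title, dist in candidates.items():
    print(f"{title}: expected rating {expected_rating(dist):.2f}")
print("Recommend:", recommend(candidates))
```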

27
Initial Results
  • Our system prefers a 4.9-star book rated by 100 people
    to a 5-star book rated by 1 user
  • If a user gives random ratings, the system
    ignores the user's rating
  • More thorough evaluation is under way
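
The first result can be reproduced with a much simpler stand-in, the so-called Bayesian average, which adds a few pseudo-ratings at a prior mean before averaging. This is not the system described in the talk. The prior mean of 3 stars and the prior weight of 5 pseudo-ratings are arbitrary assumptions chosen for illustration.

```python
def bayesian_average(ratings, prior_mean=3.0, prior_weight=5.0):
    """Mean of the ratings after adding `prior_weight` pseudo-ratings
    at `prior_mean` (both values are illustrative assumptions)."""
    return (prior_mean * prior_weight + sum(ratings)) / (prior_weight + len(ratings))

one_user = [5]                         # a 5-star book rated by 1 user
hundred_users = [5] * 90 + [4] * 10    # a 4.9-star book rated by 100 users

score_one = bayesian_average(one_user)
score_hundred = bayesian_average(hundred_users)
print(f"5 stars from 1 user:      {score_one:.2f}")
print(f"4.9 stars from 100 users: {score_hundred:.2f}")
```

A single rating barely moves the score away from the prior, while 100 ratings dominate it, so the 4.9-star book comes out ahead.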

28
Other Projects
  • Web Archive
  • Hidden Web Integration
  • Page Ranking Algorithm

29
Ph.D. Students on the Projects
  • Rob Adams
  • Alex Ntoulas
  • Victor Liu (in Dr. Chu's group)

30
Thank You
  • Questions?