Information Management on the World-Wide Web

About This Presentation

Slides: 31

Provided by: Jungh1

Learn more at: http://oak.cs.ucla.edu

Transcript and Presenter's Notes

1
Information Management on the World-Wide Web
Junghoo John Cho UCLA Computer Science
2
The Web and Information Galore
3
10 Years Ago

Reading papers for research
Stacks of papers
Long wait

4
With the Web
5
Challenges (1)

Information overload
Too much information, too little time

6
Information Overload

Query "XML" on Google
14 million matching documents!
Query "XML" on Amazon
464 matching books!
Which one to read?

7
Challenges (2)

Hidden Web
Not indexed by Search Engines
Hidden from an average user
Browse every site manually?

8
Challenges (3)

Transience

9
Challenges (4)

Scattered unstructured data
All Computer Science faculty members and graduate
students in the US?

10
Projects In Our Group

Web Archive
Hidden Web Integration
Page Ranking Algorithm
User Recommendation System

11
User Recommendation System

464 books on XML
Which one to read?
The one that my colleagues and friends recommend?

12
Amazon's Recommendation System

1-5 star rating by individual users
Books can be sorted by average user rating

13
My Typical Scenario

Sort books by their average user rating
Browse top 20 books to decide what to read

14
Questions

Is a 5-star rating by one user better than a 4.9-star average from 100 users?
Intuitively, I prefer 4.9 stars from 100 users
More reliable rating
How much can I trust the rating of a particular person?
How do I know that the person's rating is reliable?
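
The first question can be made concrete with a small sketch. The book titles and ratings below are hypothetical, chosen only to mirror the "5 stars by one user" versus "4.9 stars by 100 users" case; sorting by plain average puts the single-rating book on top.

```python
from statistics import mean

# Hypothetical data: one book with a single 5-star rating,
# another with 100 ratings that average 4.9 stars.
ratings = {
    "Book A": [5],
    "Book B": [5] * 90 + [4] * 10,
}

def sort_by_average(ratings):
    """Return (title, average, count) tuples, highest average first."""
    rows = [(title, mean(rs), len(rs)) for title, rs in ratings.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

for title, avg, count in sort_by_average(ratings):
    print(f"{title}: {avg:.2f} stars from {count} rating(s)")
```

The average alone carries no information about how many users stand behind it, which is the gap the rest of the talk addresses.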

15
Our Approach

Inherent quality or rating of a book
How many users would recommend the book (i.e., give it a high rating) if all users had read it?
More user ratings → more information on the quality of the book
An average user is likely to give a high rating to a high-quality book
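
One way to illustrate "more ratings, more information" is a Beta-Bernoulli sketch. It is an assumption made for this example, not a model stated on the slides: each rating is reduced to "recommends" or "does not recommend", the inherent quality is the fraction of all users who would recommend the book, and the prior is a uniform Beta(1, 1). The function name `beta_posterior` is hypothetical.

```python
import math

def beta_posterior(recommend, not_recommend, prior_a=1.0, prior_b=1.0):
    """Posterior mean and standard deviation of the fraction of users who
    would recommend the book, under an assumed Beta(prior_a, prior_b) prior."""
    a = prior_a + recommend
    b = prior_b + not_recommend
    post_mean = a / (a + b)
    post_var = a * b / ((a + b) ** 2 * (a + b + 1))
    return post_mean, math.sqrt(post_var)

# One enthusiastic user versus 100 users, 98 of whom recommend the book.
one_user = beta_posterior(1, 0)
hundred_users = beta_posterior(98, 2)

print("1 rating:    mean=%.3f, std=%.3f" % one_user)
print("100 ratings: mean=%.3f, std=%.3f" % hundred_users)
```

Under these assumptions the estimate backed by 100 ratings has a much smaller standard deviation, which matches the intuition that it is the more reliable one.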

16
Probabilistic Rating Model

How likely is it that the book deserves a 4-star rating?
Rating probability distribution

Plot: probability density vs. book rating/quality
17
Update of Rating Probability

As more users provide ratings, we update our probability distribution

Plot: probability density vs. book rating/quality
18
Update of Rating Probability

As more users provide ratings, we update our probability distribution

Plot, after a five-star rating by a user: probability density vs. book rating/quality
19
Update of Rating Probability

As more users provide ratings, we update our probability distribution

Plot, after a one-star rating by a user: probability density vs. book rating/quality
20
Update of Rating Probability

As more users provide ratings, we update our probability distribution

Plot, after many ratings: probability density vs. book rating/quality
21
Bayesian Inference Theory

Given a user rating UR, what is the inherent
rating IR?

$$P(IR \mid UR) = \frac{P(UR \mid IR)\, P(IR)}{P(UR)}$$
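
A minimal sketch of this update on a discrete grid of candidate inherent ratings follows. The likelihood P(UR | IR) is not specified in the slides; the Gaussian-shaped noise with width `noise=1.0`, the flat prior, and the example ratings are all assumptions made for illustration.

```python
import math

# Candidate inherent ratings IR on a grid from 1.0 to 5.0 in steps of 0.1.
GRID = [1.0 + 0.1 * i for i in range(41)]

def likelihood(ur, ir, noise=1.0):
    """Assumed shape of P(UR | IR): Gaussian noise around the inherent
    rating, left unnormalized because the constant cancels in the update."""
    return math.exp(-0.5 * ((ur - ir) / noise) ** 2)

def update(prior, ur):
    """One Bayes step: multiply prior by likelihood, then renormalize."""
    unnormalized = [likelihood(ur, ir) * p for ir, p in zip(GRID, prior)]
    total = sum(unnormalized)  # proportional to P(UR)
    return [u / total for u in unnormalized]

def expected_rating(dist):
    """Mean of the inherent rating under the distribution."""
    return sum(ir * p for ir, p in zip(GRID, dist))

prior = [1.0 / len(GRID)] * len(GRID)   # flat prior over the grid
after_five = update(prior, 5)           # one five-star rating
after_one = update(after_five, 1)       # followed by a one-star rating

many = prior
for ur in [4, 5, 4, 4, 5, 3, 4, 5, 4, 4]:  # hypothetical ratings
    many = update(many, ur)

for name, dist in [("prior", prior), ("after 5", after_five),
                   ("after 5, 1", after_one), ("after many", many)]:
    print(f"{name}: mean={expected_rating(dist):.2f}, peak={max(dist):.3f}")
```

With these assumptions a five-star rating shifts the distribution upward, a following one-star rating pulls it back toward the middle, and many ratings make it narrower and more peaked.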
22
User Model

The characteristics of a user

Sensitivity: slope of the curve
1: good, -1: bad, 0: not useful

23
User Model

The characteristics of a user

Bias: average height of the curve
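
The two user characteristics can be sketched by fitting a straight line to one user's ratings against the book qualities. This is an illustrative reading, not the authors' stated method: the curve is assumed to be linear, sensitivity is taken as the least-squares slope, bias is taken as the user's mean rating, and the function name `fit_user` and all sample data are hypothetical.

```python
from statistics import mean

def fit_user(qualities, user_ratings):
    """Fit user_rating ~ intercept + sensitivity * quality by least squares.
    Returns (sensitivity, bias), with bias taken as the mean user rating."""
    q_bar = mean(qualities)
    r_bar = mean(user_ratings)
    cov = sum((q - q_bar) * (r - r_bar)
              for q, r in zip(qualities, user_ratings))
    var = sum((q - q_bar) ** 2 for q in qualities)
    return cov / var, r_bar

# Assumed true qualities of five books and three hypothetical users.
qualities = [1.0, 2.0, 3.0, 4.0, 5.0]
discerning = fit_user(qualities, [1, 2, 3, 4, 5])  # ratings track quality
contrarian = fit_user(qualities, [5, 4, 3, 2, 1])  # ratings run against it
generous = fit_user(qualities, [5, 5, 5, 5, 5])    # always five stars

print("discerning:", discerning)
print("contrarian:", contrarian)
print("generous:  ", generous)
```

In this sketch a user who gives every book five stars has zero sensitivity and a high bias, so that user's ratings say little about any individual book.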

24
Iterative Model Refinement

As more users rate a book, we get better estimates of the book's quality
As we estimate a book's quality better, we get a better idea of a user's sensitivity and bias
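
The alternation between the two estimates can be sketched as a simple loop. This is a deliberately simplified stand-in, not the model from the talk: it ignores sensitivity, treats bias as an additive offset (how far a user's ratings sit above or below the estimated quality on average), and uses plain averages in place of the probabilistic update. The users, books, ratings, and the function name `refine` are hypothetical.

```python
from statistics import mean

# Hypothetical data: ratings[user][book] = stars given.
ratings = {
    "alice": {"A": 5, "B": 3},
    "bob":   {"A": 4, "B": 2, "C": 4},
    "carol": {"B": 4, "C": 5},
}

def refine(ratings, rounds=20):
    """Alternate between estimating book quality and user offset."""
    books = sorted({b for r in ratings.values() for b in r})
    bias = {u: 0.0 for u in ratings}
    quality = {}
    for _ in range(rounds):
        # Step 1: book quality from offset-corrected ratings.
        for b in books:
            quality[b] = mean(r[b] - bias[u]
                              for u, r in ratings.items() if b in r)
        # Step 2: user offset from the remaining residuals.
        for u, r in ratings.items():
            bias[u] = mean(r[b] - quality[b] for b in r)
    return quality, bias

quality, bias = refine(ratings)
print("quality:", {b: round(q, 2) for b, q in quality.items()})
print("bias:   ", {u: round(v, 2) for u, v in bias.items()})
```

In this toy data bob consistently rates lower than the others on shared books, so the loop assigns him the lowest offset and lifts the quality estimates of the books he rated accordingly.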

25
Iterative Model Refinement
Book Rating Estimate
User-provided Rating
26
Final Recommendation