Efficient Computation of Personal Aggregate Queries on Blogs

1
Ka Cheung Sia¹, Junghoo Cho¹, Yun Chi², Belle L. Tseng³
¹University of California, Los Angeles; ²NEC Labs America; ³Yahoo! Inc.
ACM SIGKDD 2008

3
Motivation

Global aggregation
  Recent news stories are picked up automatically
    The Dark Knight in the week of July 18
    Olympics-related posts in the week of August 8
  Potential drawbacks
    What if I am not interested in sports at all?
    Groups of bloggers collaborate to promote advertisement videos
Personal aggregation
  Users selectively aggregate from different sources
  Needs an efficient strategy to handle a large number of users and sources
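Here is a minimal Python sketch of the contrast, built on stated assumptions. It uses a 0/1 user-by-blog trust matrix T and a blog-by-keyword endorsement matrix E, the names these slides use later. A user's personal score for a keyword is taken as the trust-weighted sum of blog endorsements, which is row i of the product TE. The toy data and the function names (global_aggregate, personal_aggregate, top_k) are hypothetical and do not come from the paper.

```python
# Hypothetical toy data: 3 users, 4 blogs, 3 keywords.
KEYWORDS = ["olympics", "dark knight", "election"]
# T[u][b] = 1 if user u subscribes to (trusts) blog b, else 0.
T = [
    [1, 1, 0, 0],  # user 0 reads only the two sports blogs
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]
# E[b][k] = how often blog b mentions keyword k in the current time window.
E = [
    [5, 0, 0],
    [3, 1, 0],
    [0, 2, 4],
    [0, 0, 6],
]

def global_aggregate(E):
    """Global aggregation: every blog counts equally (column sums of E)."""
    return [sum(row[k] for row in E) for k in range(len(E[0]))]

def personal_aggregate(T, E, user):
    """Personal aggregation: weight each blog by the user's trust (row `user` of T times E)."""
    weights = T[user]
    return [sum(w * E[b][k] for b, w in enumerate(weights)) for k in range(len(E[0]))]

def top_k(scores, k):
    """The k keywords with the highest scores."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return [KEYWORDS[j] for j in order[:k]]

print(top_k(global_aggregate(E), 1))          # ['election']
print(top_k(personal_aggregate(T, E, 0), 1))  # ['olympics']
```

Global aggregation ranks "election" first. User 0 trusts only the sports blogs, so that user gets "olympics" instead.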

8
Best of both worlds

Identify template users: typical users interested in sports / politics / technology / ...
Results for template users are precomputed
Results for individual users are combined from these partially computed results
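This is a minimal sketch of why precomputing template users pays off. It assumes each template user is a row of a template-by-blog matrix H, and each individual user is a non-negative weight vector over templates (a row of W). Because (WH)E = W(HE), the products HE can be materialized once. A user's result then becomes a weighted sum of a few template results, instead of a pass over every trusted blog. The data and the names TEMPLATE_RESULTS and combine are illustrative assumptions.

```python
def matmul(A, B):
    """Plain-Python matrix product A @ B."""
    return [[sum(a * B[i][j] for i, a in enumerate(row)) for j in range(len(B[0]))]
            for row in A]

# Hypothetical template users (rows) over 4 blogs.
H = [
    [1.0, 1.0, 0.0, 0.0],  # template 0: a "sports" reader
    [0.0, 0.0, 1.0, 1.0],  # template 1: a "politics" reader
]
# Hypothetical blog x keyword endorsement counts.
E = [
    [5, 0, 0],
    [3, 1, 0],
    [0, 2, 4],
    [0, 0, 6],
]

# Precomputed once per template user: HE.
TEMPLATE_RESULTS = matmul(H, E)

def combine(w_row, template_results):
    """Result for one user = sum over templates t of w_row[t] * template_results[t]."""
    n = len(template_results[0])
    return [sum(w * template_results[t][k] for t, w in enumerate(w_row)) for k in range(n)]

print(combine([0.5, 1.0], TEMPLATE_RESULTS))  # [4.0, 2.5, 10.0]
```

The combined result equals what the direct computation would give for the trust vector 0.5·H[0] + 1.0·H[1]. The only difference is that the work per user shrinks from the number of blogs to the number of templates.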

9
Trust matrix decomposition

The trust matrix T reflects each user's interests
Decompose T into two non-negative matrices W and H (T ≈ WH)
Non-negative Matrix Factorization (NMF)
  W: <individual users, template users> relationship
  H: <template users, blogs> relationship
User 2's trust vector is expressed as a linear combination of the trust vectors of template users 1 and 2
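The slides do not say which NMF algorithm was used. The sketch below takes one standard choice: Lee and Seung's multiplicative updates for the squared Frobenius error, written in plain Python so it has no dependencies. In the toy matrix, user 1 subscribes to the union of two interest profiles, so that user's row should come out as a combination of the two rows of H. The function names and the iteration count are assumptions for illustration.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product A @ B."""
    return [[sum(a * B[i][j] for i, a in enumerate(row)) for j in range(len(B[0]))]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(T, W, H):
    """Squared Frobenius norm of T - WH."""
    WH = matmul(W, H)
    return sum((T[i][j] - WH[i][j]) ** 2
               for i in range(len(T)) for j in range(len(T[0])))

def nmf(T, rank, iterations=300, seed=0, eps=1e-9):
    """Approximate non-negative T (users x blogs) as W (users x rank) @ H (rank x blogs)."""
    rng = random.Random(seed)
    n, m = len(T), len(T[0])
    # Strictly positive random start; multiplicative updates keep entries >= 0.
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T T) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, T), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (T H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(T, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return W, H

# Hypothetical 0/1 trust matrix: user 1 trusts the union of two interest profiles.
TOY_T = [
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]
W, H = nmf(TOY_T, rank=2)
# Row 1 of W holds user 1's weights on the two template users (rows of H).
```

In exact arithmetic, and without the small eps guard, these updates do not increase the reconstruction error. For data at the scale of the slides, a library NMF implementation would be the practical choice.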

11
Partition of trust matrix

Decomposition is useful when the matrix is dense
Real-life data is skewed
A hybrid method uses decomposition only where it is effective

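[Figure: partition of the trust matrix (2.7M subscription pairs; axes: users with more subscriptions, blogs with more subscribers) into three regions handled by 1. OTF, 2. VIEW and 3. NMF. The NMF region covers users with >30 subscriptions and feeds with >30 subscribers: 10k feeds, 24k users, 1M subscription pairs.]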
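One way to carve the dense block out of the trust matrix is to count subscriptions per user and subscribers per feed, then keep the pairs where both counts exceed 30, the threshold the slide gives. This is only a sketch. The function name is hypothetical, and so is the choice to apply both thresholds to counts over the full matrix. The remaining pairs are returned as-is rather than assigned to OTF or VIEW, because the transcript does not spell out that split.

```python
from collections import Counter

def partition_subscriptions(pairs, min_user_subs=30, min_feed_subs=30):
    """Split (user, feed) subscription pairs into a dense block and the remainder.

    A pair is in the dense block when its user has more than `min_user_subs`
    subscriptions and its feed has more than `min_feed_subs` subscribers.
    """
    user_count = Counter(u for u, _ in pairs)
    feed_count = Counter(f for _, f in pairs)
    dense, rest = [], []
    for u, f in pairs:
        if user_count[u] > min_user_subs and feed_count[f] > min_feed_subs:
            dense.append((u, f))  # candidate for NMF
        else:
            rest.append((u, f))   # left to the other strategies
    return dense, rest

# Hypothetical toy input, with thresholds lowered to 1 so the split is visible.
PAIRS = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("c", "x"), ("d", "z")]
DENSE, REST = partition_subscriptions(PAIRS, min_user_subs=1, min_feed_subs=1)
```

On the Bloglines data, the slide reports this block as about 24k users and 10k feeds, holding 1M of the 2.7M subscription pairs.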
12
Experiments

Data: Bloglines.com online RSS reader
  Trust matrix T (binary 0/1 version): subscription profiles
    91,366 users
    487,694 RSS feeds
  Endorsement matrix E: blog-keyword occurrence
    Feed content collected between Nov 2006 and Jul 2007
    Keywords filtered to nouns with high tf-idf values in entries
Platform
  Python implementation of the proposed scheme
  MySQL server on Linux with data on a RAID disk
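The following is a rough sketch of how T and E could be assembled from raw data, under several assumptions. Subscriptions arrive as (user, blog) pairs, and blog entries arrive as token lists. A term is kept as a keyword if its tf-idf exceeds a hypothetical threshold in at least one entry, where tf-idf is the raw count over entry length times log(N/df). The noun filter mentioned on the slide is not implemented here, since it would need a part-of-speech tagger. Sparse dictionaries stand in for the matrices, and all names are illustrative.

```python
import math
from collections import Counter, defaultdict

def tfidf_keywords(token_lists, threshold):
    """Terms whose tf-idf exceeds `threshold` in at least one entry.

    tf = count in entry / entry length; idf = log(N / number of entries containing the term).
    """
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    keep = set()
    for tokens in token_lists:
        for term, count in Counter(tokens).items():
            if count / len(tokens) * math.log(n / df[term]) > threshold:
                keep.add(term)
    return keep

def build_matrices(subscriptions, blog_entries, keywords):
    """Sparse T[user][blog] = 1 per subscription; E[blog][keyword] = occurrence count."""
    T = defaultdict(dict)
    for user, blog in subscriptions:
        T[user][blog] = 1
    E = defaultdict(Counter)
    for blog, tokens in blog_entries:
        for term in tokens:
            if term in keywords:
                E[blog][term] += 1
    return T, E

# Hypothetical toy input.
ENTRIES = [
    ("s1", ["olympics", "beijing", "the"]),
    ("m1", ["batman", "joker", "the"]),
    ("s2", ["olympics", "swimming", "the"]),
]
SUBSCRIPTIONS = [("u1", "s1"), ("u1", "s2"), ("u2", "m1")]
# "the" appears in every entry, so its idf is 0 and it is dropped.
KEYWORDS = tfidf_keywords([tokens for _, tokens in ENTRIES], threshold=0.1)
T, E = build_matrices(SUBSCRIPTIONS, ENTRIES, KEYWORDS)
```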

16
Approximation accuracy

How many of the top-20 items are approximated correctly by NMF?
T_i: top-20 items of user i computed by OTF
A_i: top-20 items of user i computed by NMF
About 70% approximation accuracy, higher for higher-ranked items

[Figure: correlation with rank]
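The sketch below reads the slide's question as "what fraction of T_i also appears in A_i?". The per-rank breakdown is one hypothetical way to check the claim that higher-ranked items are approximated more accurately. The transcript shows neither the exact formula nor the plot definition, and the function names are illustrative.

```python
def topk_overlap(exact, approx):
    """Fraction of the exact top-k list (T_i) that also appears in the approximate list (A_i)."""
    return len(set(exact) & set(approx)) / len(exact)

def overlap_by_rank(exact_lists, approx_lists):
    """For each rank r, the fraction of users whose exact r-th item appears in their approximate list."""
    k = len(exact_lists[0])
    hits = [0] * k
    for exact, approx in zip(exact_lists, approx_lists):
        approx_set = set(approx)
        for r, item in enumerate(exact):
            if item in approx_set:
                hits[r] += 1
    return [h / len(exact_lists) for h in hits]

print(topk_overlap(["a", "b", "c", "d"], ["a", "c", "x", "y"]))  # 0.5
```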
18
Conclusion and future work

Deliver tailored results to users through personal aggregation
Proposed a model for personal aggregate queries
Optimization by NMF and the Threshold Algorithm (Fagin et al. [2001])
A study on a real-life dataset shows that query response time can be reduced significantly with acceptable approximation accuracy
Future work
  Handle updates to the trust matrix
  Parallelism
  Better phrase extraction (e.g., opinion orientation)
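The Threshold Algorithm (Fagin et al. [2001]) finds top-k results under a monotone aggregation without scoring every item. The transcript does not show how the paper applies it, so this sketch rests on an assumption about the setup. Each trusted blog provides a keyword list sorted by endorsement score, and the user's trust weights are non-negative. A keyword's score is the weighted sum across blogs, with missing entries counting as 0. Sorted access walks the lists in parallel, and random access is a dictionary lookup. The function name and data are hypothetical.

```python
import heapq

def threshold_algorithm(lists, weights, k):
    """Top-k items under score(x) = sum over b of weights[b] * lists[b].get(x, 0).

    lists: {source: {item: non-negative score}}; weights: {source: non-negative weight}.
    Stops once the k-th best score seen is at least the threshold, an upper bound
    on the score of any item not yet seen.
    """
    sorted_lists = {b: sorted(lists[b].items(), key=lambda kv: kv[1], reverse=True)
                    for b in weights}
    seen = {}
    depth = 0
    max_len = max((len(v) for v in sorted_lists.values()), default=0)
    while depth < max_len:
        threshold = 0.0
        for b, w in weights.items():
            entries = sorted_lists[b]
            if depth < len(entries):  # an exhausted list contributes 0 to the bound
                item, score = entries[depth]  # sorted access
                threshold += w * score
                if item not in seen:
                    # Random access: compute the item's full aggregate score.
                    seen[item] = sum(wb * lists[sb].get(item, 0) for sb, wb in weights.items())
        depth += 1
        top = heapq.nlargest(k, seen.values())
        if len(top) == k and top[-1] >= threshold:
            break
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])

# Hypothetical per-blog keyword scores; the user trusts only s1 and s2.
LISTS = {
    "s1": {"olympics": 5, "batman": 1},
    "s2": {"olympics": 3, "election": 2},
    "p1": {"election": 6},
}
WEIGHTS = {"s1": 1.0, "s2": 1.0}
print(threshold_algorithm(LISTS, WEIGHTS, 2))  # [('olympics', 8.0), ('election', 2.0)]
```

Because all weights and scores are non-negative, the weighted sum is monotone, and that is what makes the early-stopping rule safe. With k=1, the search stops after reading only the first entry of each list.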