Efficient Computation of Personal Aggregate Queries on Blogs - PowerPoint PPT Presentation

About This Presentation

Description:

1. Efficient Computation of Personal Aggregate ... User-generated content in the Blogosphere and Web 2.0 services contains rich ... Proposed by Fagin et al. [2001] ...

Slides: 22
Provided by: Richa131
Learn more at: http://oak.cs.ucla.edu

Transcript and Presenter's Notes



1
Efficient Computation of Personal Aggregate
Queries on Blogs
  • Ka Cheung Sia¹, Junghoo Cho¹, Yun Chi², Belle L. Tseng³
    ¹University of California, Los Angeles; ²NEC Labs America; ³Yahoo! Inc.
    ACM SIGKDD 2008

2
(No Transcript)
3
Motivation
  • Global aggregation
    • Recent news is picked up automatically
      • The Dark Knight in the week of July 18
      • Olympics-related stories in the week of August 8
    • Potential drawbacks
      • What if I am not interested in sports at all?
      • Groups of bloggers collaborated to promote advertisement videos
  • Personal aggregation
    • Users selectively aggregate from different sources
    • Needs an efficient strategy to handle a large number of users and sources

4
(No Transcript)
5
(No Transcript)
6
(No Transcript)
7
(No Transcript)
8
Best of both worlds
  • Identify template users: typical users interested in sports / politics / technology / ...
  • Results of template users are pre-computed
  • Results of individual users are assembled by combining these pre-computed partial results
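The combination step can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that a user's aggregate score for a keyword is the weighted sum of the template users' pre-computed scores, and that the per-template weights are non-negative, for example one row of the W matrix from the decomposition. The function `personal_top_k` and the toy data are hypothetical.

```python
import heapq


def personal_top_k(user_weights, template_scores, k):
    """Combine pre-computed template-user results into one user's top-k list.

    user_weights: dict template_id -> non-negative weight (hypothetical; e.g. a row of W)
    template_scores: dict template_id -> dict keyword -> pre-computed score
    """
    combined = {}
    for template_id, weight in user_weights.items():
        if weight == 0:
            continue  # this template contributes nothing to the user
        for keyword, score in template_scores[template_id].items():
            combined[keyword] = combined.get(keyword, 0.0) + weight * score
    # Keep the k (keyword, score) pairs with the highest combined score.
    return heapq.nlargest(k, combined.items(), key=lambda kv: kv[1])


# Hypothetical pre-computed results for two template users.
template_scores = {
    "sports": {"olympics": 9.0, "nba": 4.0, "iphone": 0.5},
    "tech": {"iphone": 8.0, "linux": 3.0, "olympics": 1.0},
}
user_weights = {"sports": 0.2, "tech": 0.8}  # a mostly-tech user
result = personal_top_k(user_weights, template_scores, 2)
print(result)
```

With this approach, query time only needs the user's small weight vector. The expensive per-template aggregation is done once, ahead of time.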

9
Trust matrix decomposition
  • The trust matrix T reflects users' interests
  • Decompose T into two factor matrices W and H (T ≈ W H)
    • Non-negative Matrix Factorization (NMF)
    • W: <individual users, template users> relationship
    • H: <template users, blogs> relationship
  • User 2's trust vector is expressed as a linear combination of the trust vectors of template users 1 and 2
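The transcript does not say which NMF algorithm was used. The sketch below therefore substitutes the standard Lee and Seung multiplicative updates, which minimize the Frobenius error between T and W H. It uses plain Python lists so that it stays self-contained. The toy trust matrix is hypothetical. In it, user 3 subscribes to the blogs of both "templates", so its row of W should mix both of them, much like user 2 on the slide.

```python
import random


def matmul(A, B):
    """Multiply two dense matrices stored as lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(T, W, H):
    """||T - W H||_F, the quantity the multiplicative updates do not increase."""
    WH = matmul(W, H)
    return sum((t - x) ** 2 for rt, rx in zip(T, WH) for t, x in zip(rt, rx)) ** 0.5


def nmf(T, r, iters=200, seed=0, eps=1e-9):
    """Factor non-negative T (users x blogs) into W (users x r) and H (r x blogs)."""
    rng = random.Random(seed)
    n, m = len(T), len(T[0])
    W = [[rng.random() for _ in range(r)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(r)]
    for _ in range(iters):
        # H <- H * (W^T T) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, T)
        den = matmul(matmul(Wt, W), H)
        H = [[h * nu / (de + eps) for h, nu, de in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (T H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(T, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * nu / (de + eps) for w, nu, de in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


# Hypothetical 1-0 trust matrix: 4 users x 4 blogs.
T = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
]
W, H = nmf(T, r=2, iters=300)
print(frobenius_error(T, W, H))
```

Because every update multiplies non-negative numbers, W and H stay non-negative. This non-negativity is what lets each row of W be read as a mixture of template users.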

10
(No Transcript)
11
Partition of trust matrix
  • Decomposition is useful when the matrix is dense
  • Real-life data is skewed
  • The hybrid method uses decomposition only where it is effective

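[Figure: hybrid partition of the trust matrix (2.7M subscription pairs) into three regions: 1. OTF, 2. VIEW, 3. NMF. Axes: users with more subscriptions; blogs with more subscribers. Dense block: users with >30 subscriptions and feeds with >30 subscribers, covering 10k feeds, 24k users, and 1M subscription pairs.]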
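The thresholds in the figure can be used to carve out the dense block. This is a minimal sketch, assuming subscriptions are given as (user, feed) pairs. It only separates the block of heavy users and popular feeds from everything else. The transcript does not spell out how the remaining pairs are split between OTF and VIEW, so that step is not shown. `split_dense_block` and `density` are hypothetical helpers.

For scale, the numbers on this slide work out as follows. The block holds about 1M pairs in 24k × 10k = 240M user-feed cells, a density of roughly 0.4%. If the 2.7M pairs span all 91,366 users and 487,694 feeds listed under Experiments, they fill about 4.46 × 10^10 cells, a density of roughly 0.006%. That makes the block about 70 times denser than the whole matrix.

```python
from collections import Counter


def split_dense_block(pairs, min_user_subs=30, min_feed_subs=30):
    """Split (user, feed) pairs into a dense core and the rest.

    The core keeps pairs whose user has more than `min_user_subs` subscriptions
    and whose feed has more than `min_feed_subs` subscribers.
    """
    user_count = Counter(u for u, _ in pairs)
    feed_count = Counter(f for _, f in pairs)
    heavy_users = {u for u, c in user_count.items() if c > min_user_subs}
    popular_feeds = {f for f, c in feed_count.items() if c > min_feed_subs}
    core, rest = [], []
    for u, f in pairs:
        if u in heavy_users and f in popular_feeds:
            core.append((u, f))
        else:
            rest.append((u, f))
    return core, rest


def density(pairs):
    """Fraction of user-feed cells that hold a subscription."""
    if not pairs:
        return 0.0
    users = {u for u, _ in pairs}
    feeds = {f for _, f in pairs}
    return len(set(pairs)) / (len(users) * len(feeds))


# Toy data with thresholds lowered to 1 so that the split is visible.
pairs = [("a", "x"), ("a", "y"), ("a", "z"), ("b", "x"), ("b", "y"), ("c", "x")]
core, rest = split_dense_block(pairs, min_user_subs=1, min_feed_subs=1)
print(core, rest, density(pairs), density(core))
```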
12
Experiments
  • Bloglines.com, an online RSS reader
    • Trust matrix T (binary 1-0 version): users' subscription profiles
      • 91,366 users
      • 487,694 RSS feeds
    • Endorsement matrix E: blog-keyword occurrences
      • Feed content collected between Nov 2006 and Jul 2007
      • Keywords restricted to nouns with high tf-idf values in entries
  • Platform
    • Python implementation of the proposed scheme
    • MySQL server on Linux with data on a RAID disk
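The two input matrices could be built along the following lines. This is a minimal sketch and not the authors' pipeline. It assumes tf-idf is computed per entry as raw count × log(N / df), that a keyword is kept if its tf-idf exceeds a threshold in at least one entry, and that E counts a blog's keyword occurrences. The noun filter is omitted because it would need a part-of-speech tagger. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_trust_matrix(subscriptions):
    """Binary (1-0) trust matrix as a sparse dict: T[user][feed] = 1."""
    T = defaultdict(dict)
    for user, feed in subscriptions:
        T[user][feed] = 1
    return T


def high_tfidf_keywords(entries, min_tfidf):
    """Keywords whose tf-idf exceeds `min_tfidf` in at least one entry.

    entries: list of (blog, tokens). Noun filtering is omitted in this sketch.
    """
    n = len(entries)
    df = Counter()
    for _, tokens in entries:
        df.update(set(tokens))  # document frequency: count each term once per entry
    keep = set()
    for _, tokens in entries:
        for term, count in Counter(tokens).items():
            if count * math.log(n / df[term]) > min_tfidf:
                keep.add(term)
    return keep


def build_endorsement_matrix(entries, keywords):
    """Sparse E[blog][keyword] = occurrences of the keyword in the blog's entries."""
    E = defaultdict(Counter)
    for blog, tokens in entries:
        for term in tokens:
            if term in keywords:
                E[blog][term] += 1
    return E


T = build_trust_matrix([("u1", "b1"), ("u1", "b2"), ("u2", "b1")])
entries = [
    ("b1", ["olympics", "beijing", "the"]),
    ("b1", ["olympics", "swimming", "the"]),
    ("b2", ["iphone", "apple", "the"]),
]
keywords = high_tfidf_keywords(entries, min_tfidf=0.3)
E = build_endorsement_matrix(entries, keywords)
print(sorted(keywords), dict(E["b1"]))
```

A term that occurs in every entry, such as "the" here, has an idf of zero and is dropped.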

13
(No Transcript)
14
(No Transcript)
15
(No Transcript)
16
Approximation accuracy
  • How many of the top-20 items computed by NMF agree with those computed by OTF?
    • Ti: top-20 items of user i computed by OTF
    • Ai: top-20 items of user i computed by NMF
  • About 70% approximation accuracy, and higher accuracy for higher-ranked items

[Figure: correlation with rank]
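The accuracy measure can be sketched as follows. This assumes it is the overlap |Ti ∩ Ai| / |Ti|: the transcript defines Ti and Ai but does not give the exact formula. `accuracy_by_rank` is a hypothetical helper that shows one way to check whether higher-ranked items are approximated more accurately. It counts, for each rank position in Ti, how often that item also appears in Ai. The toy lists are made up.

```python
def overlap_accuracy(exact_top, approx_top):
    """|T_i ∩ A_i| / |T_i|: share of OTF top items also found by NMF."""
    return len(set(exact_top) & set(approx_top)) / len(exact_top)


def accuracy_by_rank(exact_lists, approx_lists):
    """For each rank r, fraction of users whose r-th OTF item appears in A_i."""
    k = len(exact_lists[0])
    hits = [0] * k
    for exact, approx in zip(exact_lists, approx_lists):
        approx_set = set(approx)
        for r, item in enumerate(exact):
            if item in approx_set:
                hits[r] += 1
    return [h / len(exact_lists) for h in hits]


# Two hypothetical users with top-4 lists instead of top-20.
exact_lists = [["a", "b", "c", "d"], ["p", "q", "r", "s"]]
approx_lists = [["a", "b", "d", "x"], ["p", "r", "y", "z"]]
print([overlap_accuracy(t, a) for t, a in zip(exact_lists, approx_lists)])
print(accuracy_by_rank(exact_lists, approx_lists))
```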
17
(No Transcript)
18
Conclusion and future work
  • Deliver tailored results to users through personal aggregation
  • Proposed a model for personal aggregate queries
  • Optimization using NMF and the Threshold Algorithm
  • A study on a real-life dataset shows that query response time can be reduced significantly with acceptable approximation accuracy
  • Handle updates to the trust matrix
  • Parallelism
  • Better phrase extraction (e.g., opinion orientation)
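The Threshold Algorithm (Fagin et al. [2001], cited in the description) fits this setting because a user's score is a non-negative weighted sum of template-user scores, and such a sum is a monotone aggregation function. This is a minimal sketch, not the authors' implementation. It assumes each template user's pre-computed result is a dict of non-negative keyword scores and that the user's weights come from W. `threshold_topk` and the toy data are hypothetical.

```python
import heapq


def threshold_topk(weights, lists, k):
    """Fagin-style Threshold Algorithm for the top-k of a weighted sum.

    weights: non-negative weight per template (e.g. one row of W).
    lists: per-template dicts keyword -> non-negative score.
    Score(x) = sum_j weights[j] * lists[j].get(x, 0).
    """
    # Sorted access: each template's keywords by descending score.
    sorted_lists = [sorted(s.items(), key=lambda kv: kv[1], reverse=True) for s in lists]
    seen = {}
    max_len = max((len(s) for s in sorted_lists), default=0)
    for depth in range(max_len):
        last = []  # score at the current depth of each list (0 once exhausted)
        for sl in sorted_lists:
            if depth < len(sl):
                keyword, score = sl[depth]
                last.append(score)
                if keyword not in seen:
                    # Random access: look the keyword up in every list.
                    seen[keyword] = sum(w * s.get(keyword, 0.0) for w, s in zip(weights, lists))
            else:
                last.append(0.0)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        # No unseen keyword can score above this bound.
        threshold = sum(w * s for w, s in zip(weights, last))
        if len(top) == k and top[-1][1] >= threshold:
            return top  # stop early: the k-th best already beats the bound
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


weights = [0.2, 0.8]  # hypothetical row of W
lists = [
    {"olympics": 9.0, "nba": 4.0, "iphone": 0.5},
    {"iphone": 8.0, "linux": 3.0, "olympics": 1.0},
]
result = threshold_topk(weights, lists, 2)
print(result)
```

The early stop is the point of the method. On long, skewed result lists it can avoid reading most entries. In the worst case it still reads every list to the end and matches a full scan.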

19
  • Thank you!
  • Q and A

20
(No Transcript)
21
(No Transcript)