Towards Scaling Fully Personalized PageRank - PowerPoint PPT Presentation



1
Towards Scaling Fully Personalized PageRank
  • Dániel Fogaras, Balázs Rácz

Computer and Automation Research Institute of the
Hungarian Academy of Sciences
Budapest University of Technology and Economics
2
Problem formulation
  • PageRank (Brin, Page, 98)
  • PV: PageRank vector; r: uniform distribution vector
  • Overall quality measure of Web pages
  • Pre-computation: evaluate PV by power iteration
  • Query: order results by PV
  • Personalized PageRank (Brin, Page, 98)
  • r: preference vector of a user, query dependent
  • PPV(r): the PV computed with preference vector r, a personalized quality measure of Web pages
  • Pre-computation: r is not known. What should be pre-computed?
  • Query: power iteration, 5 hours/query!!!
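The defining equations did not survive extraction. Below is our reading of them, in the notation of the Misc slide (PV: PageRank vector, c: constant, M: row-normalized adjacency matrix) and assuming c is the stopping probability of the random walk described in the algorithm outline. PV and PPV are row vectors.

```latex
\begin{aligned}
\mathrm{PV} &= c\,r + (1-c)\,\mathrm{PV}\,M, && r = \tfrac{1}{V}(1,\dots,1) \text{ (ordinary PageRank)}\\
\mathrm{PPV}(r) &= c\,r + (1-c)\,\mathrm{PPV}(r)\,M, && r = \text{preference vector of the user}\\
\mathrm{PPV}^{(t+1)} &= c\,r + (1-c)\,\mathrm{PPV}^{(t)}\,M, && \text{power iteration, one pass over the edges per step}
\end{aligned}
```

Because r enters the query-time power iteration, each new preference vector costs a full sequence of passes over the webgraph, which is where the per-query cost comes from.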

Towards Scaling Fully Personalized
PageRank Dániel Fogaras, Balázs Rácz
3
Preliminaries
  • Linearity
  • Full personalization
  • Pre-compute PPV(ri) for all pages
  • V² disk, V·(V+E) time, where V ≈ 10⁹, E ≈ 10¹⁰: infeasible
  • Topic-Sensitive PageRank (Haveliwala 01)
  • Linearity
  • Pre-compute PPV(ri) for a topical basis r1, …, rk, k ≈ 20
  • Query: user submits a topic by …
  • Query engine combines PPV(ri) vectors
  • Scaling Personalized Web Search (Jeh, Widom, 03)
  • Decomposition, linearity
  • Pre-compute PPV(ri) for unit vectors r1, …, rk, corresponding to k ≈ 10,000 pages
  • Query: personalization over these 10,000 pages
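All three approaches rely on linearity: PPV(Σ αi·ri) = Σ αi·PPV(ri). The sketch below checks this numerically. It assumes the equation PPV(r) = c·r + (1−c)·PPV(r)·M with a row-stochastic M (every page has at least one out-link); the 4-page graph, c = 0.15 and the function name are hypothetical.

```python
import numpy as np

def ppv_power_iteration(M, r, c=0.15, iters=200):
    """Approximate PPV(r) = c*r + (1-c)*PPV(r) @ M by power iteration.

    M is assumed row-stochastic (every vertex has at least one out-edge);
    after 200 iterations the error factor (1-c)**200 is about 1e-14.
    """
    ppv = r.copy()
    for _ in range(iters):
        ppv = c * r + (1 - c) * (ppv @ M)
    return ppv

# Hypothetical 4-page graph; A[i, j] = 1 if page i links to page j.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
M = A / A.sum(axis=1, keepdims=True)    # row-normalized adjacency matrix
V = M.shape[0]

# Pre-computation: PPV of every unit vector (full personalization).
basis = np.vstack([ppv_power_iteration(M, np.eye(V)[i]) for i in range(V)])

# Query: any preference vector is a combination of unit vectors, so by
# linearity its PPV is the same combination of the pre-computed rows.
r = np.array([0.5, 0.0, 0.3, 0.2])
combined = r @ basis
direct = ppv_power_iteration(M, r)
print(np.allclose(combined, direct))    # True
```

Topic-Sensitive PageRank uses the same identity with k ≈ 20 topic vectors as the basis instead of unit vectors; full personalization needs all V unit vectors, which is what makes the exact approach infeasible.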

4
Towards full personalization
  • Our algorithm
  • Monte Carlo simulation, not power iteration
  • Pre-compute approximate PPV(ri) for all unit vectors r1, …, rk, k = number of pages
  • Scalability: quasi-linear pre-computation, sub-linear query
  • Main points of this presentation
  • Outline of the algorithm
  • Pre-computation: external-memory, distributed
  • Query: used to increase precision
  • Error of approximation tends to zero exponentially
  • Exact vs. approximate PPV: space lower bounds

5
Outline of the Algorithm
  • Theorem (Jeh, Widom 03, F 03)
  • Random walk starts from page u
  • At each step: with probability 1−c, step to a uniformly chosen out-neighbor; with probability c, stop
  • PPV(u,v) = Pr[the walk stops at page v]
  • Monte Carlo algorithm
  • Pre-computation
  • From u, simulate N independent random walks
  • Database of fingerprints: ending vertices of the walks from all vertices
  • Query
  • PPV(u,v) ≈ (# walks u→v) / N
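The Monte Carlo pre-computation and query can be sketched in Python as follows. This is a minimal in-memory sketch, not the authors' implementation: the graph is a hypothetical dict of out-neighbor lists, every vertex is assumed to have at least one out-edge, and c = 0.15 is an illustrative value.

```python
import random

def walk_end(graph, u, c, rng):
    """One random walk from u: stop with probability c, otherwise step to a
    uniformly chosen out-neighbor; return the vertex where the walk stops."""
    v = u
    while rng.random() >= c:          # continue with probability 1 - c
        v = rng.choice(graph[v])
    return v

def precompute_fingerprints(graph, N, c=0.15, seed=0):
    """Fingerprint database: for every vertex u, the end vertices of N walks from u."""
    rng = random.Random(seed)
    return {u: [walk_end(graph, u, c, rng) for _ in range(N)] for u in graph}

def query_ppv(db, u, v):
    """PPV(u,v) ≈ (# walks u -> v) / N."""
    ends = db[u]
    return ends.count(v) / len(ends)

# Usage on a hypothetical 2-cycle 0 <-> 1; here the exact value is
# PPV(0,0) = c / (1 - (1-c)**2) ≈ 0.5405 for c = 0.15.
graph = {0: [1], 1: [0]}
db = precompute_fingerprints(graph, N=20000)
print(query_ppv(db, 0, 0))            # an estimate close to 0.54
```

The database stores only end vertices, N per vertex, so its size grows linearly in V for a fixed N; a query reads the N fingerprints of u and never touches the graph.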

6
External memory pre-computation
  • Goal: N independent random walks from each vertex
  • Input: webgraph with V ≈ 10⁹ vertices, E ≈ 10¹⁰ edges
  • V + E > memory
  • Accessing the edges
  • Edge scan: stream access
  • Edges sorted by source vertices

7
External memory pre-computation (2)
  • Goal: N independent random walks from each vertex
  • Simulate all walks together
  • One iteration = 1 edge scan:
  • Sort path ends
  • Merge with the sorted graph
  • Each walk stops with prob. c: E(# walks) = (1−c)^k·N·V after k iterations
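The iteration above (sort the path ends, then merge them with the graph sorted by source) can be mirrored in memory as follows. This is a sketch under assumptions: Python lists stand in for the on-disk files, `list.sort` stands in for an external sort, each vertex is assumed to appear as a source with at least one out-neighbor, and the function name and c = 0.15 are hypothetical.

```python
import random

def lockstep_fingerprints(sorted_adj, N, c=0.15, seed=0):
    """Advance N walks from every vertex together, one merge pass per iteration.

    sorted_adj: list of (source, out_neighbors) sorted by source, standing in
    for the edge file. Returns ((start, end) fingerprints, live-walk counts).
    """
    rng = random.Random(seed)
    walks = [(u, u) for u, _ in sorted_adj for _ in range(N)]  # (current, start)
    fingerprints = []
    active_counts = []               # live walks at the start of each iteration
    while walks:
        active_counts.append(len(walks))
        # Each live walk stops here with probability c.
        moving = []
        for cur, start in walks:
            if rng.random() < c:
                fingerprints.append((start, cur))
            else:
                moving.append((cur, start))
        # Sort path ends by current vertex ...
        moving.sort()
        # ... and merge with the graph in one sequential scan over sorted_adj.
        walks = []
        i = 0
        for src, nbrs in sorted_adj:
            while i < len(moving) and moving[i][0] == src:
                walks.append((rng.choice(nbrs), moving[i][1]))
                i += 1
        assert i == len(moving), "every vertex needs an entry with out-edges"
    return fingerprints, active_counts

fps, counts = lockstep_fingerprints([(0, [1]), (1, [0])], N=10000)
print(counts[:3])  # roughly 20000, 17000, 14450: (1-c)^k * N * V
```

Because both sides of the merge are sorted by vertex, each iteration reads the edges exactly once in order, which is what makes the scheme fit the stream-access model.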
8
Distributed indexing
  • M machines with fast local network connections
  • memory < V + E < M·(memory)
  • Parallelize for N·V walks
  • Parts of the graph in RAM
  • Remote transfers batched
  • [Figure: M = 3 machines]
  • Heuristic partition: one site to one machine (Machine 1: www.cnn.com/, Machine 2: www.yahoo.com/)
  • Uniform load balance if ordinary PR is distributed equally among the machines
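Why ordinary PR governs the load: with N walks started from every vertex, the expected number of walk visits to a vertex v is N·V·PV(v)/c, so a machine's work is roughly proportional to the PageRank mass of the pages it stores. The slide does not say how sites are assigned to machines; the sketch below swaps in a standard greedy rule (largest site first, to the least-loaded machine) as one hypothetical way to equalize PR mass while keeping each site on a single machine. The function name and the PageRank values are made up for illustration.

```python
import heapq
from urllib.parse import urlsplit

def assign_sites(pagerank, M):
    """Greedily assign whole sites to M machines, balancing total PageRank.

    pagerank: dict url -> ordinary PageRank value (hypothetical input).
    Returns (site -> machine index, PR load per machine).
    """
    site_pr = {}
    for url, pr in pagerank.items():
        site = urlsplit(url).netloc          # one site = one host name
        site_pr[site] = site_pr.get(site, 0.0) + pr
    heap = [(0.0, m) for m in range(M)]      # (current PR load, machine)
    heapq.heapify(heap)
    assignment = {}
    loads = [0.0] * M
    # Largest sites first, each to the currently least-loaded machine.
    for site, pr in sorted(site_pr.items(), key=lambda kv: kv[1], reverse=True):
        load, m = heapq.heappop(heap)
        assignment[site] = m
        loads[m] = load + pr
        heapq.heappush(heap, (loads[m], m))
    return assignment, loads

pr = {"http://www.cnn.com/": 0.30, "http://www.cnn.com/world/": 0.10,
      "http://www.yahoo.com/": 0.35, "http://www.example.org/": 0.15,
      "http://www.example.net/": 0.10}
assignment, loads = assign_sites(pr, M=2)
print(assignment, loads)  # both machines end up with PR load ≈ 0.5
```

Keeping a site on one machine exploits the fact that most links stay within a site, so most walk steps need no remote transfer.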
9
Query, increasing precision
  • Database of N·V fingerprints (path endings)
  • Query: PPV(u) ≈ empirical distribution from N samples
  • Theorem (Jeh, Widom, 03)
  • O(u): out-neighbors of u
  • Query: PPV(u) ≈ empirical distribution from N·|O(u)| samples
  • Number of fingerprints for a query:
  • F = N·(db accesses/query)
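The theorem's formula is missing from the transcript. The decomposition we take it to be is PPV(u) = c·e_u + (1−c)/|O(u)| · Σ_{v∈O(u)} PPV(v), where e_u is the unit vector of u: the walk either stops at u, or moves to a uniform out-neighbor v and then behaves like a fresh walk from v. A query can therefore combine the fingerprints of all out-neighbors, reading |O(u)| database entries and getting N·|O(u)| samples. A minimal sketch, with hypothetical names, c = 0.15, and the assumption that every vertex has an out-edge:

```python
import random
from collections import Counter

def fingerprint_db(graph, N, c, seed=0):
    """End vertices of N walks per vertex (stop w.p. c, else uniform out-step)."""
    rng = random.Random(seed)
    db = {}
    for u in graph:
        ends = []
        for _ in range(N):
            v = u
            while rng.random() >= c:
                v = rng.choice(graph[v])
            ends.append(v)
        db[u] = ends
    return db

def ppv_boosted(graph, db, u, c):
    """PPV(u) ≈ c*e_u + (1-c)/|O(u)| * sum over v in O(u) of empirical PPV(v)."""
    est = Counter({u: c})
    out = graph[u]                     # O(u): one db access per out-neighbor
    for v in out:
        weight = (1 - c) / (len(out) * len(db[v]))
        for x in db[v]:
            est[x] += weight
    return dict(est)

# Hypothetical graph where PPV(0,0) = c / (1 - (1-c)**2) ≈ 0.5405 for c = 0.15.
graph = {0: [1, 2], 1: [0], 2: [0]}
db = fingerprint_db(graph, N=20000, c=0.15)
est = ppv_boosted(graph, db, 0, c=0.15)  # uses 2 * 20000 fingerprints
print(est[0])                            # an estimate close to 0.54
```

The exact c·e_u term also removes one source of sampling noise, since the probability of stopping at u immediately is no longer estimated.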

10
Error of approximation
  • Exact: PPV(u,v)
  • Approximation from F fingerprints: P̂PV(u,v)
  • Theorem
  • If PPV(u,v) > PPV(u,w) + δ holds, then
  • Pr[P̂PV(u,v) < P̂PV(u,w)] < exp(−0.3·N·δ²)
  • Idea of the proof
  • N·(P̂PV(u,v) − P̂PV(u,w)) = #(u→v) − #(u→w)
  • is a sum of F i.i.d. random variables with values in {−1, 0, 1}
  • Bernstein's inequality
  • Error of approximation → 0 exponentially as F = (db size/vertex)·(db accesses/query) → ∞
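Read as a sizing rule, the bound says that keeping the probability of mis-ordering a pair with gap δ below ε only requires exp(−0.3·N·δ²) ≤ ε, that is N ≥ ln(1/ε)/(0.3·δ²). A small helper, hypothetical and relying only on the constant 0.3 from the theorem as stated:

```python
import math

def fingerprints_needed(eps, delta):
    """Smallest N with exp(-0.3 * N * delta**2) <= eps, per the theorem's bound."""
    return math.ceil(math.log(1 / eps) / (0.3 * delta ** 2))

print(fingerprints_needed(0.01, 0.1))  # 1536
```

The required number grows only logarithmically in 1/ε but quadratically in 1/δ, so finer distinctions between PPV values are what drive up the database size and the db accesses per query.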

11
Exact versus approximate
  • Model of computation
  • Input: graph G with V vertices
  • Pre-compute a database of size D
  • Query: respond by accessing only the db
  • Exact
  • Query: u, v, w
  • Decide if PPV(u,v) > PPV(u,w) holds
  • Approximate: for fixed ε and δ
  • Query: u, v, w
  • Decide if PPV(u,v) > PPV(u,w) holds, with error probability at most ε whenever |PPV(u,v) − PPV(u,w)| > δ

12
Lower bounds for the db size
  • For the webgraph, V ≈ 10⁹
  • Theorem 1
  • For the Exact problem, a db of size D = Ω(V²) is required in the worst case
  • Theorem 2
  • For the Approximate problem, D = Ω(V)
  • Is it possible to improve the 2nd lower bound?
  • Our algorithm uses a db of size D = O(V log V)
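To put the bounds in perspective for V ≈ 10⁹: a fingerprint presumably stores one vertex id, which takes ⌈log₂ V⌉ = 30 bits, so with the number of fingerprints per vertex treated as a constant the database is on the order of V log V bits. A back-of-the-envelope comparison that ignores all constants, including N:

```python
import math

V = 10 ** 9                              # webgraph size from the slide
bits_per_id = math.ceil(math.log2(V))    # bits to store one vertex id

exact_bound = V ** 2                     # Ω(V²) bits for Exact PPV
approx_bound = V                         # Ω(V) bits for Approximate PPV
fingerprint_db = V * bits_per_id         # O(V log V), one fingerprint per vertex

print(bits_per_id)                                           # 30
print(f"{exact_bound:.0e} {approx_bound:.0e} {fingerprint_db:.0e}")  # 1e+18 1e+09 3e+10
```

The gap between the exact lower bound and the fingerprint database is more than seven orders of magnitude, while the remaining gap to the approximate lower bound is only the log V factor (times N).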

13
Idea of the lower bound proofs
  • One-way communication complexity
  • Bit-vector probing (BVP)
  • Theorem: B ≥ m for any protocol
  • Reduction from Exact-PPV to BVP
  • Bit-vector probing:
  • Alice has a bit vector; input x = (x1, x2, …, xm)
  • Bob has a number; input 1 ≤ k ≤ m; question: xk = ?
  • Communication: B bits
  • Reduction:
  • Alice has x = (x1, x2, …, xm); graph G with V vertices, where V² = m; she pre-computes an Exact PPV database of size D
  • Bob has 1 ≤ k ≤ m; vertices u, v, w; the answer to "PPV(u,v) > PPV(u,w)?" gives xk
  • Communication: the Exact PPV db, D bits
  • Thus D ≥ B ≥ m = V²
14
Summary
  • Fully personalized PR
  • Monte Carlo method, not power iteration
  • Pre-computation
  • External-memory, distributed
  • Query
  • Increase precision with more db accesses/query
  • Error of approximation
  • Tends to zero exponentially
  • Space lower bounds
  • Quadratic for Exact PPR
  • Linear for Approximate PPR

15
Thank you!
16
Misc
  • N·P̂PV(u,v) = #(u→v) ~ Binom(N, PPV(u,v))
  • Claim (by Chernoff's bound)
  • Pr[P̂PV(u,v) > (1+δ)·PPV(u,v)] < exp(−N·PPV(u,v)·δ²/4)
  • If for a protocol Pr[right answer] ≥ (1+γ)/2, then B = Ω(m)
  • PV: PageRank vector; c: constant; M: normalized adjacency matrix
