Title: On Computing Top-t Most Influential Spatial Sites
1On Computing Top-t Most Influential Spatial Sites
- Tian Xia, Donghui Zhang, Evangelos Kanoulas, Yang Du
- Northeastern University
- Boston, USA
2Outline
- Problem Definition
- Related Work
- The New Metric minExistDNN
- Data Structures and Algorithm
- Experimental Results
- Conclusions
3Problem Definition
- Given
- a set of sites S
- a set of weighted objects O
- a spatial region Q
- an integer t.
- Top-t most influential sites query
- find t sites in Q with the largest influences.
- influence of a site s = total weight of objects
that consider s as the nearest site.
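As a minimal sketch of the definition above (not the algorithm presented later in the talk), the influence of every site can be computed by brute force and the top-t sites inside Q picked by sorting. The function names, the tuple layouts, and the tie-breaking rule are illustrative assumptions; sites outside Q are assumed to still compete for objects.

```python
from math import dist

def influences(sites, objects):
    """Map each site id to the total weight of the objects nearest to it.

    sites:   {site_id: (x, y)}
    objects: list of (x, y, weight)
    Ties go to the first site in dict order (an arbitrary choice).
    """
    totals = {sid: 0.0 for sid in sites}
    for x, y, w in objects:
        nearest = min(sites, key=lambda sid: dist(sites[sid], (x, y)))
        totals[nearest] += w
    return totals

def top_t(sites, objects, q, t):
    """Return the t most influential sites located inside q = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = q
    inf = influences(sites, objects)  # all sites compete, also those outside q
    inside = [sid for sid, (x, y) in sites.items()
              if xmin <= x <= xmax and ymin <= y <= ymax]
    return sorted(inside, key=lambda sid: inf[sid], reverse=True)[:t]
```

This scans every site for every object, which is exactly the cost the rest of the talk tries to avoid.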
4Motivation
- Which supermarket in Boston is the most
influential among residential buildings?
- Sites: supermarkets
- Objects: residential buildings
- Weight: people in a building
- Query region: Boston
- Which wireless station in Boston is the most
influential among mobile users?
5Example
- Suppose all objects have weight 1, Q is the
whole space, and t = 1.
- The most influential site is s1, with influence
3.
6Example
[Figure: sites s1-s4 and objects o1-o6 in the plane, with query region Q.]
- Now Q is the shaded rectangle and t = 2.
- Top-2 most influential sites: s4 and s2.
8Related Work
- Bi-chromatic RNN query considers two datasets,
sites and objects.
- The RNNs of a site s ∈ S are the objects that
consider s as the nearest site.
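The bichromatic RNN definition can be illustrated with a brute-force sketch. The function name and the dict-based inputs are assumptions made for illustration only; the cited solutions use pre-computation or Voronoi cells rather than this linear scan.

```python
from math import dist

def bichromatic_rnn(site_id, sites, objects):
    """Return the ids of the objects whose nearest site is site_id.

    sites:   {site_id: (x, y)}
    objects: {object_id: (x, y)}
    """
    return [oid for oid, p in objects.items()
            if min(sites, key=lambda sid: dist(sites[sid], p)) == site_id]
```

The influence of a site is then the sum of the weights of the objects in this list.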
9Related Work
- Solutions to the RNN query based on
pre-computation [KM00, YL01].
10Related Work
- Solution to the RNN query based on the Voronoi
diagram [SRAE01].
- Compute the Voronoi cell of s: the region enclosing
the locations closer to s than to any other
site.
- Query the object R-tree using the Voronoi cell.
11Related Work SRAE01
[Figure: sites s1-s4 and objects o1-o6.]
12Our Problem vs. RNN Query
- RNN query
- A single site as an input.
- Interested in the actual set of the RNNs.
- Top-t most influential sites query
- A spatial region as an input.
- Interested in the aggregate weight of RNNs.
13Straightforward Solution 1
- For each site, pre-compute its influence.
- At query time, find the sites in Q and return the
t sites with max influences.
- Drawback 1: costly maintenance upon updates.
- Drawback 2: binds a set of sites closely to a
set of objects.
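A minimal sketch of this pre-computation approach, under the assumption that influences are kept in a plain dictionary, makes the first drawback concrete: adding one site can change the nearest site of many objects. The class name, its methods, and the naive full rebuild are illustrative choices, not part of the talk.

```python
from math import dist

class PrecomputedInfluence:
    """Stores one influence value per site; queries only filter and sort."""

    def __init__(self, sites, objects):
        self.sites = dict(sites)      # {site_id: (x, y)}
        self.objects = list(objects)  # [(x, y, weight)]
        self._rebuild()

    def _rebuild(self):
        # Recompute every influence from scratch (naive on purpose).
        self.influence = {sid: 0.0 for sid in self.sites}
        for x, y, w in self.objects:
            nearest = min(self.sites, key=lambda sid: dist(self.sites[sid], (x, y)))
            self.influence[nearest] += w

    def add_site(self, sid, pos):
        self.sites[sid] = pos
        self._rebuild()  # an update touches the stored influences

    def query(self, q, t):
        xmin, ymin, xmax, ymax = q
        inside = [sid for sid, (x, y) in self.sites.items()
                  if xmin <= x <= xmax and ymin <= y <= ymax]
        return sorted(inside, key=lambda sid: self.influence[sid], reverse=True)[:t]
```

The stored values are also only valid for the one object set they were computed from, which is the second drawback.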
14Straightforward Solution 2
- An extension of the Voronoi diagram based
solution to the RNN query.
- Find all sites in Q.
- For each such site, find its RNNs by using the
Voronoi cell, and compute its influence.
- Return the t sites with max influences.
15Straightforward Solution 2
- Drawback 1: all sites in Q need to be retrieved
from the leaf nodes.
- Drawback 2: the object R-tree and the site R-tree
are browsed multiple times.
- For each site in Q, browse the site R-tree to
compute the Voronoi Cell.
- For each such Voronoi Cell, browse the object
R-tree to compute the influence.
16Features of Our Solution
- Systematically browse both trees once.
- Pruning techniques are provided based on a new
metric, minExistDNN.
- No need to compute the influences for all sites
in Q, or even to locate all sites in Q.
18Motivation
- Intuitively, if some object in Oi may consider
some site in Sj as an NN, Oi affects Sj.
- To estimate the influences of all sites in a site
MBR Sj, we need to know whether an object MBR Oi
will affect Sj.
19maxDist: A Loose Estimation
- If maxDist(O1, S1) < minDist(O1, S2), O1 does not
affect S2.
- Why is this not good enough?
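The maxDist test can be sketched for axis-aligned MBRs as follows. The condition on the slide is truncated in this transcript, so the comparison against minDist(O1, S2) is an assumption, as are the function names and the (xmin, ymin, xmax, ymax) tuple layout.

```python
from math import hypot

def min_dist(a, b):
    """Smallest distance between any point of MBR a and any point of MBR b."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)

def max_dist(a, b):
    """Largest distance between any point of MBR a and any point of MBR b."""
    dx = max(a[2] - b[0], b[2] - a[0])
    dy = max(a[3] - b[1], b[3] - a[1])
    return hypot(dx, dy)

def loosely_pruned(o1, s1, s2):
    """True if every location in o1 is closer to all of s1 than to any of s2."""
    return max_dist(o1, s1) < min_dist(o1, s2)
```

The bound is loose because maxDist measures the distance to the farthest location in S1, while one nearby site in S1 is already enough to rule out S2.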
20minMaxDist: A Tight Estimation?
- An object o does not affect S2 if there exists
S1 such that
- minMaxDist(o, S1) < minDist(o, S2)
21minMaxDist: A Tight Estimation?
- The minMaxDist test is not valid for an object
MBR O1.
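For a single object location, the minMaxDist test can be sketched in two dimensions as below. The comparison against minDist(o, S2) is assumed, since the condition is truncated in this transcript; the function names and the (xmin, ymin, xmax, ymax) layout are also illustrative.

```python
from math import hypot

def min_dist(p, r):
    """Distance from point p to the closest location of MBR r."""
    dx = max(r[0] - p[0], p[0] - r[2], 0.0)
    dy = max(r[1] - p[1], p[1] - r[3], 0.0)
    return hypot(dx, dy)

def min_max_dist(p, r):
    """Smallest radius around p that is guaranteed to contain a site of MBR r.

    Every face of an MBR touches at least one data point, so for each axis we
    take the nearer face on that axis and the farther face on the other axis.
    """
    px, py = p
    xmin, ymin, xmax, ymax = r
    near_x, far_x = (xmin, xmax) if px <= (xmin + xmax) / 2 else (xmax, xmin)
    near_y, far_y = (ymin, ymax) if py <= (ymin + ymax) / 2 else (ymax, ymin)
    return min(hypot(px - near_x, py - far_y), hypot(px - far_x, py - near_y))

def point_does_not_affect(o, s1, s2):
    """True if some site in s1 is certainly closer to o than every site in s2."""
    return min_max_dist(o, s1) < min_dist(o, s2)
```

The function takes one location, which is why the test cannot be applied directly to an object MBR: different locations inside O1 generally have different minMaxDist values.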
22A Tight Estimation?
- A metric m(O1, S1) should
- guarantee that, each location in O1 is within
m(O1, S1) of a site in S1,
- and be the smallest distance with this property.
23New Metric: minExistDNN_S1(O1)
- Definition: minExistDNN_S1(O1)
= max { minMaxDist(l, S1) | location l ∈ O1 }
- O1 does not affect S2 if there exists S1 s.t.
minExistDNN_S1(O1) < minDist(O1, S2)
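Read literally, the definition is a maximum of minMaxDist over all locations of O1. A minimal sketch, assuming two dimensions, can approximate that maximum by sampling O1 on a grid. The function names and the grid resolution n are hypothetical, and the result is only a lower bound of the true value, so it illustrates the definition rather than giving a safe pruning bound.

```python
from math import hypot

def min_max_dist(p, r):
    """2-D minMaxDist: distance from p to the second closest corner of MBR r."""
    xmin, ymin, xmax, ymax = r
    corners = [(xmin, ymin), (xmin, ymax), (xmax, ymin), (xmax, ymax)]
    return sorted(hypot(p[0] - cx, p[1] - cy) for cx, cy in corners)[1]

def min_exist_dnn_sampled(o, s, n=21):
    """Approximate max over locations l in MBR o of minMaxDist(l, s); needs n >= 2."""
    oxmin, oymin, oxmax, oymax = o
    best = 0.0
    for i in range(n):
        x = oxmin + (oxmax - oxmin) * i / (n - 1)
        for j in range(n):
            y = oymin + (oymax - oymin) * j / (n - 1)
            best = max(best, min_max_dist((x, y), s))
    return best
```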
24Examples of minExistDNN_S1(O1)
25Calculating minExistDNN_S1(O1)
- Step 1: Space partitioning
Every location l in the same partition is
associated with the same second closest corner of
S1; the distance to that corner is minMaxDist(l, S1).
26Space Partitioning
- O1 is divided into multiple sub-regions, one in
each partition.
27Calculating minExistDNN_S1(O1)
- Step 2: Choose up to 8 locations on the border of
O1 and compute their minMaxDists to S1.
- minExistDNN is the largest of these values.
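The two steps can be sketched in two dimensions as follows. The slides do not list which locations are chosen, so this sketch does not reproduce the "up to 8 locations" rule; it assumes the partitions are bounded by the two center lines of S1 and the two perpendicular bisectors of its diagonals, and it evaluates a larger candidate set: the corners of O1, the points where those four lines cross the border of O1, and the center of S1 if it lies inside O1. Inside one partition minMaxDist is the distance to a fixed corner, which is convex, so under these assumptions the maximum should be reached at one of the candidates. All names are hypothetical.

```python
from math import hypot

def min_max_dist(p, r):
    """2-D minMaxDist: distance from p to the second closest corner of MBR r."""
    xmin, ymin, xmax, ymax = r
    corners = [(xmin, ymin), (xmin, ymax), (xmax, ymin), (xmax, ymax)]
    return sorted(hypot(p[0] - cx, p[1] - cy) for cx, cy in corners)[1]

def candidate_locations(o, s):
    """Vertices of the sub-regions that the partition lines of s cut out of o."""
    oxmin, oymin, oxmax, oymax = o
    cx, cy = (s[0] + s[2]) / 2, (s[1] + s[3]) / 2
    w, h = s[2] - s[0], s[3] - s[1]
    points = [(oxmin, oymin), (oxmin, oymax), (oxmax, oymin), (oxmax, oymax)]
    # Directions of the four partition lines through the center of s.
    for dx, dy in [(1, 0), (0, 1), (h, w), (h, -w)]:
        if dx != 0:  # crossings with the vertical edges of o
            for x in (oxmin, oxmax):
                y = cy + (x - cx) * dy / dx
                if oymin <= y <= oymax:
                    points.append((x, y))
        if dy != 0:  # crossings with the horizontal edges of o
            for y in (oymin, oymax):
                x = cx + (y - cy) * dx / dy
                if oxmin <= x <= oxmax:
                    points.append((x, y))
    if oxmin <= cx <= oxmax and oymin <= cy <= oymax:
        points.append((cx, cy))
    return points

def min_exist_dnn(o, s):
    """Largest minMaxDist to s over the candidate locations of o."""
    return max(min_max_dist(p, s) for p in candidate_locations(o, s))
```

A dense grid sample of O1 is a simple way to cross-check the value returned for a given pair of MBRs.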
29Data Structure
- Two R-trees: S of sites, O of objects.
- Three queues:
- queueSIN: entries of S inside Q.
- queueSOUT: entries of S outside Q.
- queueO: entries of O.
30Data Structure
[Figure: example contents of queueSIN, queueO, and queueSOUT, with site entries S1, S2, S3 and object entry O1.]
31maxInfluence and minInfluence
- For each entry Sj in queueSIN,
- maxInfluence: total weight of entries in queueO
that affect Sj.
- minInfluence: total weight of entries in queueO
that ONLY affect Sj, divided by the number of
objects in Sj.
- queueSIN is sorted in decreasing order of
maxInfluence.
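The two bounds can be sketched as follows, assuming the "affects" relation between object entries and site entries has already been established by the pruning tests. The function name and the three dictionaries are illustrative assumptions; `count` stands for the divisor on the slide, taken here as the number of entries stored under Sj.

```python
def influence_bounds(site_entries, object_entries, affects):
    """Compute (maxInfluence, minInfluence) per site entry and the queue order.

    site_entries:   {site_entry_id: count of entries stored under it}
    object_entries: {object_entry_id: total weight}
    affects:        {object_entry_id: set of site entry ids it may affect}
    """
    bounds = {}
    for sj, count in site_entries.items():
        # Upper bound: every object entry that may affect Sj.
        max_inf = sum(w for oi, w in object_entries.items() if sj in affects[oi])
        # Lower bound: weight that can only go to Sj, shared among its entries.
        only_sj = sum(w for oi, w in object_entries.items() if affects[oi] == {sj})
        bounds[sj] = (max_inf, only_sj / count)
    order = sorted(bounds, key=lambda sj: bounds[sj][0], reverse=True)
    return bounds, order
```

For example, an object entry that affects both S1 and S2 counts toward the maxInfluence of both but toward the minInfluence of neither.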
32Algorithm Overview
- Expand an entry from one of the three queues.
- Remove the entry from the queue.
- Retrieve the referenced node, and insert the
(unpruned) entries into the same queue.
- Update maxInfluence and minInfluence if
necessary.
- If the top-t entries in queueSIN are sites, with
minInfluences ≥ maxInfluences of all remaining
entries, return them.
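The stopping condition in the last step can be sketched as a small predicate over queueSIN. The comparison is read as "minInfluence at least as large as every remaining maxInfluence", which is an assumption because the operator is missing in this transcript; the `Entry` record and the function name are hypothetical.

```python
from collections import namedtuple

# One queueSIN entry: a single site (is_site=True) or an R-tree node of sites.
Entry = namedtuple("Entry", "name is_site min_inf max_inf")

def can_return(queue_sin, t):
    """queue_sin must be sorted in decreasing order of max_inf."""
    top, rest = queue_sin[:t], queue_sin[t:]
    if not all(e.is_site for e in top):
        return False  # an unexpanded node may still hide a better site
    threshold = max((e.max_inf for e in rest), default=0.0)
    return all(e.min_inf >= threshold for e in top)
```

When the predicate is false, the algorithm keeps expanding entries, which tightens the bounds until the top-t sites separate from the rest.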
33Example
- queueSIN: S1
- queueO: O1
- queueSOUT: S3
- queueSIN: S5, S7
- queueO: O6
- queueSOUT: S9
- S6 is not affected by O1, prune S6.
- O5 does not affect S5 and S7, prune O5.
34A Pruning Case
- Expand S1.
- S2 is pruned because minExistDNN_S3(O1) < minDist(S2, O1).
35Choosing an Entry to Expand
- Expand top entries in queueSIN.
- Expand the most important Oi.
- Importance of Oi: combines the number of entries Oi affects with area(Oi).
- Expand Sj that contains the most important Oi.
36Choosing an Entry to Expand
- Estimate the probability of pruning Oi using some
Sj in queueSOUT.
- After expanding S2, O1 is likely not to affect S1.
38Experimental Setup
- Data sets
- 24,493 populated places in North America
- 9,203 cultural landmarks in North America
- R-tree page size: 1 KB.
- LRU buffer: 128 disk pages.
- t = 4.
- Compared against the solution using the Voronoi diagram.
39Selected Experimental Results
40Selected Experimental Results
42Conclusions
- We addressed a new problem: the top-t most
influential sites query.
- We proposed a new metric, minExistDNN. It can be
used to prune the search space in NN/RNN related
problems.
- We designed an algorithm that systematically
browses both R-trees once.
- Experiments showed more than an order of
magnitude improvement over the Voronoi diagram
based solution.
43Thank you!