Title: Location-Based Services
1Location-Based Services: Continuous kNN Query Processing
- Tai Do
- Data Systems Group, UCF
- Fall 2005
2Outline
- Introduction to Data Management in Mobile Computing.
- Discussion on Location-Based Services and its enabling technologies.
- In-depth discussion on Continuous kNN queries (2 recent papers: MHP05 and XMA05).
3Data Management in Mobile Computing
- Our interest: application-driven research that involves data management in mobile computing.
- Services/applications that inspire data management research:
- Location-based services, transactional services, data mining applications.
- Research problems to support these novel services efficiently:
- Spatiotemporal query processing.
- Data dissemination over limited-bandwidth channels.
- Data consistency guarantees.
- Advanced interfaces for mobile computers.
4Location-Based Services (LBS)
- Location-Based Services
- can be defined as services that integrate a mobile device's location or position with other information so as to provide added value to a user.
- Examples
- Military and government industries
- Emergency services (E911 in the US and 112 in Europe)
- Commercial sector: Advanced Traveler Information Systems (DoT), location-aware games, advertising services
- Commercial potential of LBS (S03)
- Optimistic prediction: 4B by 2002, 81.9B by 2005 (Europe only)
- Pessimistic prediction: 11M by 2002, 167M by 2005 (USA only).
- Enabling Technologies
- Mobile Positioning Methods
- Location Update Techniques
- Location-based Query Processing
5Mobile Positioning
- GPS: Global Positioning System. Accuracy: 3 meters or more.
- Cell-ID (Europe): accuracy 100 m to 3 km.
- Figure: overview of LBS applications and the level of accuracy required (SV04)
6Location Update Techniques
- Dead-Reckoning Location Update Policies (GS05)
- Periodic Updates
7Concept of Uncertainty
- Uncertainty is an inherent feature of databases that store location information.
- Sources of uncertainty
- Mobile Positioning Methods
- Location Update Techniques
- Capturing uncertainty in the model and the query language is a subject of ongoing research.
8Location-Based Queries
- Two kinds of location-based queries
- Snapshot queries: "Tell me the 3 nearest cars around me now"
- Continuous queries: "Monitor the 3 nearest restaurants around me in the next 10 minutes"
- We focus on continuous kNN (CkNN) query processing.
- Main-memory solution: Conceptual Partitioning Model, CPM (MHP05)
- Disk-based solution: Shared Execution Algorithm, SEA-CNN (XMA05)
9Common Assumptions
10SEA-CNN Overview
- Overview
- Objects are stored on disk; everything else is in memory.
- Centralized processing.
- Supports all kinds of mutability between objects and queries.
- No movement pattern; objects move in open space.
- Goal
- Minimize I/O cost, and CPU time.
- Two important features
- Incremental evaluation of queries
- Shared execution
11SEA-CNN Data Structures
12SEA-CNN Incremental Search
- Key points
- For each query q, define a search region based on the past answer and the recent movements of q and the objects.
- Only objects inside the search region are checked against q.
- Let $q.AR_{t_0}$ be the answer radius of q at time $t_0$ (q.AR = distance from q to its kth-NN object).
- At time $t_1$, the search radius of query q ($q.SR_{t_1}$) is computed as follows:
- Step 1: check whether any object moves in $q.AR_{t_0}$ during $[t_0, t_1]$. If yes, $q.SR_{t_1} = q.AR_{t_0}$. If no, $q.SR_{t_1} = 0$.
- Step 2: check whether any object that was in $q.AR_{t_0}$ moves out of $q.AR_{t_0}$ during $[t_0, t_1]$. If yes, $q.SR_{t_1}$ equals the distance from q to the furthest such object.
- Step 3: check whether q moves during $[t_0, t_1]$. If yes:
- If $q.SR_{t_1} = 0$, then $q.SR_{t_1} = q.AR_{t_0} + |q.Loc_{t_1} - q.Loc_{t_0}|$
- If $q.SR_{t_1} \neq 0$, then $q.SR_{t_1} = q.SR_{t_1} + |q.Loc_{t_1} - q.Loc_{t_0}|$
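The three steps above can be sketched in a few lines of Python. This is only an illustration, not the SEA-CNN implementation: the function name, its parameters, and the choice to measure Step 2 distances from the query's location at t0 are assumptions made for the example.

```python
import math

def search_radius(ar_t0, q_old, q_new, moved_within, moved_out_locs):
    """Hypothetical helper: search radius of a query at t1.

    ar_t0          -- answer radius of the query at t0
    q_old, q_new   -- query location at t0 and at t1
    moved_within   -- True if any object moved inside the old answer region
    moved_out_locs -- new locations of former answer objects that left the region
    """
    # Step 1: movement inside the old answer region keeps the old radius.
    sr = ar_t0 if moved_within else 0.0
    # Step 2: an answer object left the region, so reach out to the furthest one
    # (assumption: distance measured from the query's t0 location).
    if moved_out_locs:
        sr = max(math.dist(q_old, p) for p in moved_out_locs)
    # Step 3: if the query itself moved, enlarge the radius by its displacement.
    shift = math.dist(q_old, q_new)
    if shift > 0:
        sr = (ar_t0 if sr == 0 else sr) + shift
    return sr
```

A radius of 0 means the query needs no re-evaluation in this round, which is where the incremental saving comes from.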
13SEA-CNN Incremental Search (An Example)
Q1: O5 and Q1 move during $[T_0, T_1]$, so $Q1.SR_{T_1} = Q1.AR_{T_0} + |Q1.Loc_{T_1} - Q1.Loc_{T_0}|$.
Q2: O8 moves out of $Q2.AR_{T_0}$ during $[T_0, T_1]$, so $Q2.SR_{T_1} = |Q2.Loc_{T_1} - O8.Loc_{T_0}|$.
14SEA-CNN Shared Execution
- Key points
- Utilize shared execution to reduce repeated I/O operations.
- Group similar queries together. Evaluating this set of queries is reduced to a spatial join between the objects and the queries.
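To make the spatial-join idea concrete, here is a minimal in-memory sketch. It assumes a uniform grid with a hypothetical cell size `CELL`, treats each grid cell as the unit of I/O, and represents a query by its location and search radius; none of these names come from the SEA-CNN paper.

```python
import math
from collections import defaultdict

CELL = 10.0  # hypothetical grid cell side length

def cell_of(x, y):
    return (int(x // CELL), int(y // CELL))

def shared_join(objects, queries):
    """objects: {oid: (x, y)}; queries: {qid: ((x, y), search_radius)}.
    Returns ({qid: [(dist, oid), ...]}, number_of_cells_read)."""
    # Register each query with every cell that its search region may overlap.
    cell_queries = defaultdict(list)
    for qid, ((qx, qy), r) in queries.items():
        cx0, cy0 = cell_of(qx - r, qy - r)
        cx1, cy1 = cell_of(qx + r, qy + r)
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                cell_queries[(cx, cy)].append(qid)
    # Group objects by cell; each group stands in for one disk page.
    cell_objects = defaultdict(list)
    for oid, (x, y) in objects.items():
        cell_objects[cell_of(x, y)].append(oid)
    result = {qid: [] for qid in queries}
    cells_read = 0
    for cell, qids in cell_queries.items():
        if cell not in cell_objects:
            continue
        cells_read += 1  # one read serves every query registered on this cell
        for oid in cell_objects[cell]:
            for qid in qids:
                q_loc, r = queries[qid]
                d = math.dist(q_loc, objects[oid])
                if d <= r:
                    result[qid].append((d, oid))
    for hits in result.values():
        hits.sort()
    return result, cells_read
```

Two queries whose search regions fall in the same cell cause that cell to be read once rather than twice, which is the repeated I/O that shared execution aims to remove.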
15SEA-CNN Algorithm
16CPM Overview
- Overview
- Objects and queries are stored in memory.
- Centralized processing.
- Supports all kinds of mutability between objects and queries.
- No movement pattern; objects move in open space.
- Goal
- Minimize CPU time.
- Important features
- Conceptual Partitioning
- Simulates traditional kNN search (branch-and-bound search with breadth-first (or best-first) traversal)
- Roadmap
- Initial NN computation (conceptual partitioning, branch-and-bound search, breadth-first traversal)
- Handling updates
17CPM Data Structures
18CPM NN Computation (Conceptual Partitioning)
- Conceptual Partitioning
- What is CP? A partitioning of the cells into rectangles based on their proximity to the query cell. Each rectangle has a direction and a level.
- Why CP? It gives a natural processing order of the cells and facilitates NN search (only a minimal set of cells is searched).
19CPM NN Computation (Algorithm by Example)
- Search heap content (always sorted)
- H = {<c4,4, 0>, <U0, 0.1>, <L0, 0.2>, <R0, 0.8>, <D0, 0.9>}
- De-heap c4,4: do nothing.
- De-heap U0:
- Insert the cells of U0
- Insert U1
- Continue until <c3,3, 1> is de-heaped and the 1st candidate p1 is found
- best_dist = dist(p1, q) = 1.7
- Continue until c2,4 is de-heaped and p2 is found
- best_dist = dist(p2, q) = 1.3
- Terminate because the next entry in the heap has min_dist > best_dist
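The heap-driven search in this example can be approximated with a simplified sketch. Note the simplification: instead of CPM's directional rectangles (U, L, R, D), it pushes individual neighbouring cells keyed by their minimum distance to q, which yields the same kind of best-first order on a small grid. All function names and the `n_cells` bound are assumptions for illustration.

```python
import heapq
import math

def build_grid(points, delta):
    """Bucket points into square cells of side delta."""
    grid = {}
    for p in points:
        grid.setdefault((int(p[0] // delta), int(p[1] // delta)), []).append(p)
    return grid

def cell_mindist(cell, q, delta):
    """Minimum distance from point q to the square cell (i, j)."""
    i, j = cell
    dx = max(i * delta - q[0], 0.0, q[0] - (i + 1) * delta)
    dy = max(j * delta - q[1], 0.0, q[1] - (j + 1) * delta)
    return math.hypot(dx, dy)

def grid_knn(grid, q, k, delta, n_cells):
    """Best-first kNN over an n_cells x n_cells grid; q must lie inside it."""
    start = (int(q[0] // delta), int(q[1] // delta))
    heap = [(0.0, start)]   # search heap of (min_dist, cell)
    seen = {start}
    best = []               # max-heap of (-dist, point), at most k entries
    while heap:
        key, cell = heapq.heappop(heap)
        # Terminate once the next entry cannot improve the k-th best distance.
        if len(best) == k and key > -best[0][0]:
            break
        for p in grid.get(cell, []):
            d = math.dist(p, q)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
        i, j = cell
        for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if nb not in seen and 0 <= nb[0] < n_cells and 0 <= nb[1] < n_cells:
                seen.add(nb)
                heapq.heappush(heap, (cell_mindist(nb, q, delta), nb))
    return sorted((-nd, p) for nd, p in best)
```

Every cell has a neighbour closer to the query cell with an equal or smaller minimum distance, so no cell that could hold a better answer is skipped when the loop stops.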
20CPM Handling Updates
- Key Points
- Focus on moving objects and static queries. Moving queries are treated as new queries.
- Re-examine only queries whose influence regions overlap with updated cells.
- Re-compute affected queries incrementally, based on bookkeeping information, to save computation time.
21CPM Handling Updates (Algorithm by Example)
NN Re-computation Algorithm. Input: grid G, affected query q. Output: new NN for q. /* Similar to NN Computation. Utilizes the bookkeeping information in visit_list and the search heap. */
- p2 moves from c2,4 to c0,6
- c2,4 has q in its influence list, and the new dist(q, p2) exceeds the previous best_NN distance dist(q, p2) → mark q as an affected query.
- c0,6 has an empty influence list → ignore
- Re-compute the NN for q with the NN Re-computation algorithm
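The influence-list check in this example might look like the sketch below. The data layout (dictionaries keyed by cell and by query id) and the function name are assumptions. The second loop, which handles an object entering a query's influence region, is not spelled out on the slide and is included only to make the sketch complete.

```python
import math

def affected_queries(obj, new_pos, old_cell, new_cell, influence, queries):
    """Return the ids of queries that must be re-examined after obj moves.

    influence -- {cell: set of query ids whose influence region covers the cell}
    queries   -- {qid: {"pos": (x, y), "best_dist": float, "nn": set of object ids}}
    """
    affected = set()
    # Outgoing case: a current NN has moved farther away than best_dist.
    for qid in influence.get(old_cell, ()):
        q = queries[qid]
        if obj in q["nn"] and math.dist(q["pos"], new_pos) > q["best_dist"]:
            affected.add(qid)
    # Incoming case (assumption): a non-NN object is now closer than best_dist.
    for qid in influence.get(new_cell, ()):
        q = queries[qid]
        if obj not in q["nn"] and math.dist(q["pos"], new_pos) < q["best_dist"]:
            affected.add(qid)
    return affected
```

Cells with an empty influence list contribute nothing, so an update that touches only such cells costs two dictionary lookups.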
22SEA-CNN and CPM: A Comparison
- Common features between the two
- Performance metrics
- Use query processing time (or CPU time) at the centralized server as the primary metric.
- Ignore communication cost.
- Employ grid-based indexing (simple, fast maintenance).
- Keep a search region for each query to handle updates.
- Are the differences significant?
- CPM saves some computation over SEA-CNN (as shown in the CPM paper) because CPM uses an optimal search algorithm.
- However, is saving CPU time still very important?
23Summary
- Monitoring queries to support LBS has been an intensive research area in the past few years.
- The short-term research trend seems to be proposals of new, more advanced query types (our next presentation will discuss Reverse NN and Group NN).
- Long-term research could be on Moving Objects Databases. The Moving Objects Databases textbook is recommended for gaining perspective.
- Location-management perspective vs. spatio-temporal data perspective.
- Many LBS-based commercial products: Verilocation, uLocate, meetro, EarthComber, CellSpotting.
- Standards and development software: Natural Area Coding System, Mobile Location Services Reference Architecture by Sun.
- For up-to-date LBS information, try LBSZone.
24References
- B99: D. Barbara. Mobile Computing and Databases - A Survey. IEEE Transactions on Knowledge and Data Engineering, 11(1), 108-117, 1999.
- S03: http://www.wirelessdevnet.com/features/nacjan03/
- GS05: R. H. Guting, M. Schneider. Moving Objects Databases. Book.
- SV04: J. Schiller, A. Voisard. Location-Based Services. Book.
- MHP05: Kyriakos Mouratidis, Marios Hadjieleftheriou, Dimitris Papadias. Conceptual Partitioning: An Efficient Method for Continuous Nearest Neighbor Monitoring. SIGMOD 2005.
- YPK05: X. Yu, K. Pu, N. Koudas. Monitoring K-Nearest Neighbor Queries Over Moving Objects. ICDE 2005.
- XMA05: X. Xiong, M. Mokbel, W. Aref. SEA-CNN: Scalable Processing of Continuous K-Nearest Neighbor Queries in Spatio-temporal Databases. ICDE 2005.
- CDT00: Jianjun Chen, David J. DeWitt, Feng Tian, Yuan Wang. NiagaraCQ: A Scalable Continuous Query System for Internet Databases. SIGMOD 2000.
- CF02: Sirish Chandrasekaran, Michael J. Franklin. Streaming Queries over Streaming Data. VLDB 2002. (PSoup system.)
25Note
- The due date for your presentation slides is November 14, 2005.
26Aggregate NN Queries in Spatial Databases and
Location-based Services
- Tai Do
- Data Systems Group, UCF
- November 11, 2005
27Outline
- Aggregate Nearest Neighbor (ANN) queries
- Introduction to ANN.
- Solutions for Group Nearest Neighbor (GNN) Queries, a specific type of ANN.
- Solutions for Continuous Group Nearest Neighbor Queries (CGNN).
28Aggregate NN: Examples and Applications
- Applications
- Business decision making (construction of new facilities)
- Military rescue (earliest pick-up time)
- Severe weather monitoring (most dangerous area)
29Aggregate NN Definition
- What is ANN?
- A generalized form of NN search (multiple query points vs. a single query point)
- Formally
- Given $P = \{p_1, \ldots, p_N\}$ (set of data points) and $Q = \{q_1, \ldots, q_n\}$ (set of query points)
- Aggregate distance function: $adist(p, Q) = f(|pq_1|, \ldots, |pq_n|)$
- An ANN query returns the data point p with the minimum aggregate distance
- Note: AkNN is similar (find $k > 1$ data points); we focus only on ANN.
- When $f = \mathrm{sum}$, the ANN query is called a Group Nearest Neighbor (GNN) query.
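As a reference point for the methods that follow, the definition translates directly into a brute-force scan. This is a baseline sketch only; the names `adist` and `ann` mirror the notation above, and passing `f` as a Python callable is an assumption made for the example.

```python
import math

def adist(p, Q, f=sum):
    """Aggregate distance of data point p to the query set Q under f."""
    return f(math.dist(p, q) for q in Q)

def ann(P, Q, f=sum):
    """Brute-force ANN: the data point with the minimum aggregate distance."""
    return min(P, key=lambda p: adist(p, Q, f))

P = [(0, 0), (2, 0), (5, 5)]
Q = [(1, 0), (3, 0)]
print(ann(P, Q))         # f = sum gives the group nearest neighbor
print(ann(P, Q, f=max))  # f = max minimizes the distance to the farthest query point
```

The scan touches every data point, which is exactly the cost the index-based methods try to avoid.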
30Group NN Queries
- Assumptions
- Queries are in memory.
- Data points are on disk and indexed by an R-tree.
- Goal
- Minimize the extent and cost of the search (I/O and CPU time)
- Roadmap: 3 solutions
- Multiple query method
- Single point method
- Minimum bound method
31Multiple Query Method (MQM)
- Apply multiple conventional NN queries, then combine the results.
- MQM is a straightforward application of the threshold algorithm (FLN03):
- Each query point incrementally visits its NN data points (1st NN, then 2nd NN, ...)
- Compute the aggregate distance of the current NN data point
- Repeat the two steps above until we have seen the best data point.
- Main idea
- Question: how do we know that the aggregate distance of a seen data point is smaller than the aggregate distance of every unseen data point?
- Answer: predict the minimum aggregate distance of the unseen data points (in other words, use a threshold).
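The loop described above can be sketched as follows for f = sum. Distance-sorted lists stand in for incremental NN search on an R-tree, which is an assumption made to keep the example self-contained; the variable names `t`, `best_dist`, and `best_nn` follow the slides.

```python
import math

def mqm(P, Q):
    """Group NN (f = sum) via per-query NN streams and a threshold T."""
    # Stand-in for incremental NN queries: one distance-sorted list per query point.
    streams = [sorted((math.dist(p, q), p) for p in P) for q in Q]
    pos = [0] * len(Q)    # next unread position in each stream
    t = [0.0] * len(Q)    # per-query thresholds t_i
    best_dist, best_nn = math.inf, None
    i = 0
    while any(pos[j] < len(P) for j in range(len(Q))):
        if pos[i] < len(P):
            # Step 1: fetch the next NN of query point i and raise t_i.
            d, p = streams[i][pos[i]]
            pos[i] += 1
            t[i] = d
            # Step 2: update the best answer, then test against T = sum(t_i).
            agg = sum(math.dist(p, q) for q in Q)
            if agg < best_dist:
                best_dist, best_nn = agg, p
            if best_dist <= sum(t):
                break
        i = (i + 1) % len(Q)  # round-robin over the query points
    return best_nn, best_dist
```

Any unseen point lies at distance at least t_i from each q_i, so its aggregate distance is at least T; once best_dist is no larger than T, no unseen point can do better.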
32MQM An Example (1)
33MQM An Example (2)
- Initial state for query points q1 and q2: t1 = 0, t2 = 0
- T = 0, best_dist = ∞, best_NN = null
34MQM An Example (3)
- Step 1
- Find the next (1st) NN of q1: (p10, 2)
- t1 = 2
- T = t1 + t2 = 2 + 0 = 2
35MQM An Example (4)
- Step 2
- If the current aggregate distance < best_dist → update best_dist and best_NN
- If the current best aggregate distance ≤ T → stop
- Else go to the next NN of the next query point and repeat step 1
- NN lists so far: q1: (p10, 2); t1 = 2, T = 2
- p10: distance to q1 = 2, distance to q2 = 5, aggregate distance = 7
- best_dist: ∞ → 7, best_NN = p10
36MQM An Example (5)
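- Step 1
- Find the next (1st) NN of q2: (p11, 3)
- NN lists so far: q1: (p10, 2); q2: (p11, 3)
- t1 = 2, t2 = 3
- T = t1 + t2 = 2 + 3 = 5
- Seen so far: p10 with distances (2, 5), aggregate distance 7
- best_dist = 7, best_NN = p10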
37MQM An Example (6)
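- Step 2
- If the current aggregate distance < best_dist → update best_dist and best_NN
- If the current best aggregate distance ≤ T → stop
- Else go to the next NN of the next query point and repeat step 1
- NN lists so far: q1: (p10, 2); q2: (p11, 3); t1 = 2, t2 = 3, T = 5
- p10: distance to q1 = 2, distance to q2 = 5, aggregate distance = 7
- p11: distance to q1 = 3, distance to q2 = 3, aggregate distance = 6
- best_dist: 7 → 6, best_NN = p11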
38MQM An Example (7)
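- Step 1
- Find the next (2nd) NN of q1: (p11, 3)
- NN lists so far: q1: (p10, 2), (p11, 3); q2: (p11, 3)
- t1 = 3, t2 = 3
- T = t1 + t2 = 3 + 3 = 6
- Seen so far: p10 with distances (2, 5), aggregate distance 7; p11 with distances (3, 3), aggregate distance 6
- best_dist = 6, best_NN = p11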
39MQM An Example (8)
- Step 2
- If the current aggregate distance < best_dist → update best_dist and best_NN
- If the current best aggregate distance ≤ T → stop
- Else go to the next NN of the next query point and repeat step 1
- NN lists so far: q1: (p10, 2), (p11, 3); q2: (p11, 3); t1 = 3, t2 = 3, T = 6
- p10: distance to q1 = 2, distance to q2 = 5, aggregate distance = 7
- p11: distance to q1 = 3, distance to q2 = 3, aggregate distance = 6 → no update
- best_dist = 6 ≤ T = 6 → STOP
- Result: best_dist = 6, best_NN = p11
40Single Point Method (SPM)
- Problem with MQM
- Multiple accesses to the same node, and the same data point (e.g., p11) is retrieved through different queries.
- SPM processes queries with a single traversal.
- Strategy
- Compute the centroid q of Q, which is a point with small adist(q, Q)
- The GNN is a point of P near q.
- Challenges
- The computation of q.
- The range around q in which we should look for points of P before we conclude that no better GNN can be found.
41SPM Illustration
42SPM The Computation of q
43SPM Finding the range
- To define the range around q, find heuristics that can safely prune nodes in the R-tree.
- Lemma 1
- For each query point $q_i$ we have $|pq_i| + |q_iq| \ge |pq|$ (triangle inequality).
- Summing up the n inequalities:
- $\sum_i |pq_i| + \sum_i |q_iq| \ge n\,|pq| \;\Rightarrow\; adist(p, Q) \ge n\,|pq| - adist(q, Q)$ (1)
- Lemma 1 can be used for pruning intermediate nodes.
- Node N can be pruned if
- $mindist(N, q) \ge \frac{1}{n}\,\bigl(best\_dist + adist(q, Q)\bigr)$ (2)
- Because when we transform this pruning rule we have
- $n \cdot mindist(N, q) - adist(q, Q) \ge best\_dist$ (3)
- For any p in node N, $dist(p, q) \ge mindist(N, q)$, so
- $n \cdot dist(p, q) - adist(q, Q) \ge best\_dist$ (4)
- Using Lemma 1 we have $adist(p, Q) \ge best\_dist$, hence node N can be safely pruned.
44SPM Pruning Illustration
- Both N1 and N2 can be pruned
- best_dist = adist(best_NN, Q) = 9
- adist(q, Q) = 3
- (1/n)(best_dist + adist(q, Q)) = ½ (9 + 3) = 6
- mindist(N1, q) = 10 and mindist(N2, q) = 6
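The numbers in this illustration can be checked with a short sketch of the SPM pruning test. The function names are hypothetical; the rule itself is pruning condition (2), with a non-strict comparison so that N2, which sits exactly on the bound, is pruned as the slide states.

```python
import math

def adist(p, Q):
    """Sum of distances from p to every query point in Q."""
    return sum(math.dist(p, qi) for qi in Q)

def lemma1_lower_bound(p, q, Q):
    """Lemma 1: adist(p, Q) >= n * |pq| - adist(q, Q)."""
    return len(Q) * math.dist(p, q) - adist(q, Q)

def spm_can_prune(mindist_N_q, best_dist, adist_q_Q, n):
    """Node N can be pruned if mindist(N, q) >= (best_dist + adist(q, Q)) / n."""
    return mindist_N_q >= (best_dist + adist_q_Q) / n

# Values from the illustration: best_dist = 9, adist(q, Q) = 3, n = 2.
print((9 + 3) / 2)                 # prints 6.0
print(spm_can_prune(10, 9, 3, 2))  # N1: prints True
print(spm_can_prune(6, 9, 3, 2))   # N2: prints True
```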
45Minimum Bound Method (MBM)
- Like SPM, MBM performs a single query, but uses the minimum bounding rectangle M of Q (instead of a centroid q) to prune the search space.
- Is MBM obviously better than SPM? There is no clear reason; it must be evaluated through experiments.
- Strategy
- Use good heuristics to identify the qualifying nodes
46Minimum Bound Method Heuristics
- Heuristic 1: A node N cannot contain qualifying points if
- $mindist(N, M) \ge \frac{1}{n}\,best\_dist$, because for any data point p in N
- $adist(p, Q) \ge n \cdot mindist(N, M) \ge best\_dist$
- Heuristic 1 prunes N1 but not N2.
- Heuristic 2: A node N can be safely pruned if
- $\sum_i mindist(N, q_i) \ge best\_dist$
- Heuristic 2 prunes both N1 and N2.
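Both heuristics reduce to a few rectangle distance computations, sketched below. Rectangles are represented as `(xmin, ymin, xmax, ymax)` tuples and all function names are assumptions for the example; a real implementation would apply these tests to R-tree node MBRs during traversal.

```python
import math

def mindist_rect_rect(a, b):
    """Minimum distance between two axis-aligned rectangles."""
    dx = max(a[0] - b[2], 0.0, b[0] - a[2])
    dy = max(a[1] - b[3], 0.0, b[1] - a[3])
    return math.hypot(dx, dy)

def mindist_rect_point(r, p):
    """Minimum distance between a rectangle and a point."""
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)

def mbr(Q):
    """Minimum bounding rectangle M of the query points."""
    xs, ys = zip(*Q)
    return (min(xs), min(ys), max(xs), max(ys))

def heuristic1(N, Q, best_dist):
    """Cheap test: one rectangle-to-rectangle distance."""
    return mindist_rect_rect(N, mbr(Q)) >= best_dist / len(Q)

def heuristic2(N, Q, best_dist):
    """Tighter test: n rectangle-to-point distances."""
    return sum(mindist_rect_point(N, q) for q in Q) >= best_dist
```

Heuristic 2 is never weaker than Heuristic 1, because each mindist(N, q_i) is at least mindist(N, M), but it costs n distance computations per node instead of one. A plausible arrangement is to try Heuristic 1 first and fall back to Heuristic 2 only for nodes it fails to prune.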
47Performance Study
48Continuous Group NN
- Assumptions
- Both query points and data points are in memory.
- Method
- Use a grid index.
- Utilize conceptual partitioning of the space around query Q.
- Apply the Minimum Bound Method.
49Continuous GNN: Details
- $amindist(c, Q) = \sum_{q_i \in Q} mindist(c, q_i)$.
- amindist(c, Q) is a lower bound of adist(p, Q) for any data point p in cell c.
- The GNN computation is similar to the NN computation presented in the previous class.
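A minimal sketch of amindist for a square grid cell is given below, assuming cells are addressed by integer coordinates `(i, j)` and have side length `delta`; these conventions and the function names are illustrative, not taken from the paper.

```python
import math

def cell_mindist(cell, q, delta):
    """Minimum distance from query point q to the square cell (i, j) of side delta."""
    i, j = cell
    dx = max(i * delta - q[0], 0.0, q[0] - (i + 1) * delta)
    dy = max(j * delta - q[1], 0.0, q[1] - (j + 1) * delta)
    return math.hypot(dx, dy)

def amindist(cell, Q, delta):
    """Sum of per-query minimum distances: a lower bound on adist(p, Q) for p in cell."""
    return sum(cell_mindist(cell, q, delta) for q in Q)
```

Using amindist as the heap key in place of the single-query mindist lets the same best-first cell traversal serve group queries.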
50Summary
- Threshold Algorithm
- Simple, useful, and reusable.
- Aggregate Nearest Neighbor Queries in Spatial Databases
- Practical applications.
- Good heuristics are important.
- Optimal ANN search remains unsolved???
51References
- GS05: R. H. Guting, M. Schneider. Moving Objects Databases. Book.
- PTM05: D. Papadias, Y. Tao, K. Mouratidis, C. K. Hui. Aggregate Nearest Neighbor Queries in Spatial Databases. ACM Transactions on Database Systems, Vol. 30, No. 2, June 2005, pages 529-576.
- MHP05: Kyriakos Mouratidis, Marios Hadjieleftheriou, Dimitris Papadias. Conceptual Partitioning: An Efficient Method for Continuous Nearest Neighbor Monitoring. SIGMOD 2005.
- PST04: Dimitris Papadias, Qiongmao Shen, Yufei Tao, Kyriakos Mouratidis. Group Nearest Neighbor Queries. ICDE 2004.
- XZ06: Tian Xia, Donghui Zhang. Continuous Reverse Nearest Neighbor Monitoring. ICDE 2006.
- FLN03: Ronald Fagin, Amnon Lotem, Moni Naor. Optimal aggregation algorithms for middleware. Journal of Computer and System Sciences 66 (2003), 614-656.
- www.cs.fiu.edu/vagelis/classes/COP6727/slides/fagin.ppt. The animation for the MQM example comes from this presentation.