Title: Incremental Frequent Route Based Trajectory Prediction
1Incremental Frequent Route Based Trajectory Prediction
Anja Bachmann (Karlsruhe Institute of Technology)
Christian Borgelt (European Centre for Soft Computing)
Gyözö Gidofalvi (KTH Royal Institute of Technology)
2Outline
- Introduction
- Related work
- IncCCFR
  - Trajectory representation
  - Stream processing model
  - Incremental mining of Closed Contiguous Frequent Routes (CCFR)
  - CCFR-based trajectory prediction
- Empirical evaluations
3Introduction
- Congestion is a serious problem
  - Economic losses and degraded quality of life that result from increased and unpredictable travel times
  - Larger carbon footprint left behind by idling vehicles
  - More traffic accidents that directly result from the stress and fatigue of drivers stuck in congestion
- Road network expansion is not a sustainable solution
- Instead: monitor → understand → control movement and congestion
4Modern Traffic Prediction and Management System (TPMS)
- Motivated by
  - Widespread adoption of online GPS-based on-board navigation systems and location-aware mobile devices
  - Movement of an individual contains a high degree of regularity
- Use vehicle movement data as follows
  - Vehicles periodically send their location (and speed) to the TPMS
  - TPMS extracts traffic / mobility patterns from the submitted information
  - TPMS uses the traffic / mobility patterns together with the current / recent historical locations (and speeds) of the vehicles for
    - Short-term traffic prediction and management
      - Predict near-future locations of vehicles and near-future traffic conditions
      - Inform the relevant vehicles in case of an (actual / predicted) event
      - Suggest how and which vehicles to re-route in case of an event
    - Long-term traffic and transport planning
5Remaining Challenges
- Sequential pattern based trajectory prediction is difficult to adapt to capture temporal and periodic variations.
- Trajectory prediction systems model and provide knowledge about the movement of objects at a fixed level of detail, while different applications (real-time management vs. long-term planning) need different levels of detail.
- Predictions tend to be based on either historical or current information, while both types of information are relevant.
- There is no end-to-end system for the management, incremental mining, and accurate prediction of continuously evolving trajectories of moving objects.
7Related Work: Frequent Pattern Mining
- 20 years of research
- Frequent pattern types: itemsets → sequences → graphs
- Exponential search space is pruned based on the anti-monotonicity of the pattern support measure, given a minimum support threshold min_sup
- Pattern constraints
  - Maximal (lossy): pattern X is maximal if X is frequent and there does not exist another frequent pattern Y that is a proper superset of X.
  - Closed (lossless): pattern X is closed if X is frequent and there does not exist another pattern Y that is a proper superset of X and has the same support as X.
- Processing models: batch → online / stream → incremental
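The difference between the two pattern constraints can be illustrated on a toy itemset database. The sketch below is a brute-force illustration only; the function names and the example transactions are assumptions made for this example and do not come from the slides.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_sup):
    """Brute force: support of every itemset that reaches min_sup."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            sup = sum(1 for t in transactions if candidate <= t)
            if sup >= min_sup:
                result[candidate] = sup
    return result

def maximal(freq):
    """Frequent itemsets without any frequent proper superset."""
    return {x: s for x, s in freq.items() if not any(x < y for y in freq)}

def closed(freq):
    """Frequent itemsets without a proper superset of equal support."""
    return {x: s for x, s in freq.items()
            if not any(x < y and freq[y] == s for y in freq)}

# Hypothetical toy database of four transactions.
transactions = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("a")]
freq = frequent_itemsets(transactions, min_sup=2)
maximal_sets = maximal(freq)   # only {a, b, c}
closed_sets = closed(freq)     # {a}: 4, {a, b}: 3, {a, b, c}: 2
```

The maximal set alone cannot tell that {a} has support 4, while the three closed sets determine the support of every frequent itemset, which is what "lossy" and "lossless" refer to.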
8Related Work: Trajectory Prediction
- Prediction model
  - Markov model
  - Sequential rule / trajectory pattern
- Model basis / generality
  - General model for all objects
  - Type-based model for similar (types of) objects
  - Specific model for each individual object
- Definition of Regions Of Interest (ROI) for prediction
  - Application-specific ROIs (road segments, network cells, sensors, etc.)
  - Density-based ROIs
  - Grid-based ROIs
- Prediction provision
  - Sequential spatial prediction (location of next ROI)
  - Spatio-temporal prediction
- Additional movement assumptions or models: yes / no
10Trajectory Representation
- Grid G with side length g_len uniformly partitions the 2D space
  - Representation is without limitations and scales easily to different levels of detail
- Grid-based trajectory
  - start time
  - temporally annotated sequence: sequence of traversed grid cells and associated traversal times
- Modeling the stopping of objects: append a pseudo grid cell (stop) after the last (real) grid cell of each completed trip trajectory
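A minimal sketch of this representation follows. It assumes planar coordinates in meters, timestamps in seconds, and samples dense enough that consecutive samples fall into the same or an adjacent cell; the names `to_grid_trajectory` and `STOP` are hypothetical and chosen for this example.

```python
import math

STOP = "stop"  # hypothetical label for the pseudo grid cell

def to_grid_trajectory(samples, g_len, completed=True):
    """Convert (t, x, y) samples, sorted by time, into a grid-based trajectory.

    Returns (start_time, [(cell, traversal_time), ...]), where a cell is the
    integer pair (column, row) of a grid with side length g_len.
    """
    if not samples:
        raise ValueError("a trip needs at least one sample")
    entries = []  # (cell, time at which the cell was entered)
    for t, x, y in samples:
        cell = (math.floor(x / g_len), math.floor(y / g_len))
        if not entries or entries[-1][0] != cell:
            entries.append((cell, t))
    end_time = samples[-1][0]
    sequence = []
    for i, (cell, t_in) in enumerate(entries):
        t_out = entries[i + 1][1] if i + 1 < len(entries) else end_time
        sequence.append((cell, t_out - t_in))
    if completed:
        sequence.append((STOP, 0))  # marks the end of a completed trip
    return samples[0][0], sequence

# Hypothetical trip: four samples crossing from cell (0, 0) into cell (1, 0).
start, seq = to_grid_trajectory(
    [(0, 10, 10), (5, 60, 10), (9, 130, 10), (12, 140, 10)], g_len=100)
```

Changing `g_len` is all it takes to re-encode the same samples at a coarser or finer level of detail.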
11Stream Processing Model
- Temporal sliding window model: window size and window stride
[Figure: sliding window over the trip stream, labeled with window size and stride, showing completed trips and partial trips]
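The window model can be sketched as a generator that advances the window end by one stride at a time. The slide does not define exactly when a trip counts as completed or partial, so the rule used here (completed if the trip ended inside the window, partial if it has started but not yet ended at the window end) is an assumption, as are all names.

```python
def sliding_windows(trips, w_size, w_stride, t_first, t_last):
    """Yield (w_start, w_end, completed_ids, partial_ids) for each window.

    trips: iterable of (trip_id, start, end); end is None while ongoing.
    """
    trips = list(trips)
    w_end = t_first + w_size
    while w_end <= t_last:
        w_start = w_end - w_size
        # Assumed rule: a trip is completed if it ended inside the window.
        completed = [tid for tid, s, e in trips
                     if e is not None and w_start < e <= w_end]
        # Assumed rule: a trip is partial if it is still running at w_end.
        partial = [tid for tid, s, e in trips
                   if s <= w_end and (e is None or e > w_end)]
        yield w_start, w_end, completed, partial
        w_end += w_stride

# Hypothetical trips with times in minutes; trip "c" has not ended yet.
trips = [("a", 0, 30), ("b", 50, 70), ("c", 55, None)]
result = list(sliding_windows(trips, w_size=60, w_stride=30, t_first=0, t_last=120))
```

With a stride smaller than the size, consecutive windows overlap, so a completed trip such as "b" is reported in more than one window.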
12Mining of Closed Contiguous Frequent Routes
- Grow CCFRs (or patterns) in a depth-first fashion
  - Start with single grid cells
  - Recursively extend by adding one grid cell in each recursion
- Data structure
  - Simple flat array representation of the trajectories is used
  - References are kept to the current ends of the pattern occurrences in order to quickly find and group possible extensions
- Simple and fast closedness checking of contiguous patterns: direct check of possible superpatterns and their support by generating and testing all possible extensions of a given pattern
- Without limitations, annotate CCFRs with global traversal times of grid cells
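The depth-first growth and the extension-based closedness check can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: support is taken to be the number of occurrences, traversal-time annotation is omitted, and the names `mine_ccfr` and `SEP` are hypothetical.

```python
from collections import defaultdict

SEP = object()  # separator between trajectories in the flat array

def mine_ccfr(trajectories, min_sup):
    """Return {pattern tuple: support} of closed contiguous frequent routes."""
    flat = []
    for traj in trajectories:
        flat.extend(traj)
        flat.append(SEP)
    closed = {}

    def grow(pattern, ends):
        # ends: flat-array index of the last cell of every occurrence
        sup = len(ends)
        right = defaultdict(list)  # next cell -> ends of the extended pattern
        left = defaultdict(int)    # previous cell -> support of left extension
        for e in ends:
            if flat[e + 1] is not SEP:  # flat always ends with SEP
                right[flat[e + 1]].append(e + 1)
            before = e - len(pattern)
            if before >= 0 and flat[before] is not SEP:
                left[flat[before]] += 1
        # Closed if no one-cell extension on either side keeps the support.
        if all(len(v) < sup for v in right.values()) and \
           all(c < sup for c in left.values()):
            closed[pattern] = sup
        for cell, new_ends in right.items():
            if len(new_ends) >= min_sup:
                grow(pattern + (cell,), new_ends)

    singles = defaultdict(list)
    for i, cell in enumerate(flat):
        if cell is not SEP:
            singles[cell].append(i)
    for cell, ends in singles.items():
        if len(ends) >= min_sup:
            grow((cell,), ends)
    return closed

# Two toy trajectories over cells named by letters.
routes = mine_ccfr(["ABC", "ABDBC"], min_sup=2)
```

Checking only one-cell extensions is enough for contiguous patterns, because any longer superpattern contains one of them and therefore cannot occur more often.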
13Incremental CCFR Mining
- General idea from Bifet et al. for incremental closed subgraph mining
  - Weight closed patterns by their relative support and mine the weighted patterns to reproduce the original pattern set, i.e., the combined operation of weighting and mining is an idempotent operation: f(x) = f(f(x))
  - Idempotent pattern weight (ipw) of a pattern is its support minus the support of all of its super-patterns in the pattern set
- Incremental mining: combine and mine the patterns of pattern sets from non-overlapping windows to reproduce an approximation of the results
[Figure: incremental mining over windows w_{i-2}, w_{i-1}, w_i separated by the stride; labels: mine, CCFR_{i-2}, ipw_{i-2}, CCFR(i-2..i), Approx. CCFR(i-2..i)]
14Capture Temporal and Periodic Variations
- Use the same pattern weighting methodology to combine patterns from temporally relevant historical windows
- Temporal domain projections to capture periodic variations at different levels
[Figure: pattern sets CCFR_{Monday_at_9am}, CCFR_{Tuesday_at_9am}, CCFR_{Friday_at_9am} with their weights ipw_{Monday_at_9am}, ipw_{Tuesday_at_9am}, ipw_{Friday_at_9am}; label: mine]
15Faulty Support Definition and the Fix
- Example: database of two sequences, ABC and ABDBC
- min_sup = 2
- Original support definition: number of sequences that contain the pattern
  - Closed patterns and their support: AB:2 and BC:2
  - NOTE: A, B, or C alone are not closed!
  - ipw of patterns: ipw(AB) = 2 and ipw(BC) = 2
  - Mining after ipw-weighting yields patterns AB:2, BC:2 and B:4 → cannot be!
- New support definition: number of times the pattern occurs in the sequences
  - Closed patterns and their support: B:3, AB:2 and BC:2
  - ipw of patterns: ipw(B) = 3 - 2 - 2 = -1, ipw(AB) = 2 and ipw(BC) = 2
  - Mining after ipw-weighting yields patterns AB:2, BC:2 and B:3 (idempotency)
- Fix only works for directed sequences and contiguous patterns!
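The example on this slide can be replayed with a small brute-force miner over weighted sequences. The sketch assumes single-character cell names and computes the ipw from the longest pattern down, subtracting the weights already assigned to super-patterns; counting a super-pattern once per contained occurrence is an assumption that coincides with the slide's numbers. All function names are hypothetical.

```python
from collections import defaultdict

def occurrences(pattern, seq):
    """Number of (possibly overlapping) occurrences of pattern in seq."""
    m = len(pattern)
    return sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == pattern)

def mine_closed(weighted_seqs, min_sup, per_sequence=False):
    """Closed contiguous frequent patterns of [(sequence, weight), ...].

    per_sequence=True uses the original definition (sequences containing
    the pattern); False uses the new one (number of occurrences).
    """
    support = defaultdict(int)
    for seq, w in weighted_seqs:
        subs = [seq[i:j] for i in range(len(seq))
                for j in range(i + 1, len(seq) + 1)]
        for p in (set(subs) if per_sequence else subs):
            support[p] += w
    frequent = {p: s for p, s in support.items() if s >= min_sup}
    return {p: s for p, s in frequent.items()
            if not any(p != q and p in q and frequent[q] == s for q in frequent)}

def ipw(closed, per_sequence=False):
    """Idempotent pattern weights, computed longest pattern first."""
    weights = {}
    for p in sorted(closed, key=len, reverse=True):
        covered = sum(w * (1 if per_sequence else occurrences(p, q))
                      for q, w in weights.items() if p != q and p in q)
        weights[p] = closed[p] - covered
    return weights

database = [("ABC", 1), ("ABDBC", 1)]

old = mine_closed(database, 2, per_sequence=True)
old_again = mine_closed(list(ipw(old, True).items()), 2, per_sequence=True)

new = mine_closed(database, 2)
new_again = mine_closed(list(ipw(new).items()), 2)
```

Here `old_again` contains the spurious pattern B with support 4, while `new_again` equals `new`, which is the idempotency property the fix is meant to restore.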
16CCFR Based Prediction
- Given a set of CCFRs R, iteratively extend the query vector q (partial trajectory) that ends in an anchor a as follows:
  1. Find the set of best matching patterns R' ⊆ R that contain the longest contiguous suffix s of q starting from a
  2. Calculate the successor probability of the grid cells that occur in the patterns in R' directly after an occurrence of s
  3. Retrieve the neighboring cell probability of every grid cell that occurs in the trips after the anchor a
  4. Complete the successor probability distribution over the neighbors of a using the neighboring cell probabilities
  5. Extend q with the most likely successor grid cell c and reduce the prediction horizon by the global average of the traversal time of c
  6. Stop and return c if the remaining prediction horizon < 0; otherwise go to step 1.
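The loop above can be sketched in a few lines. The slide does not say how the pattern-based distribution is completed with neighboring cell probabilities, so this sketch makes a simplifying back-off assumption: if any pattern matches a suffix of q, only the pattern successors are used, otherwise the neighboring cell probabilities of the anchor are used. It also excludes already visited cells to rule out cycles and u-turns. All names and the example data are hypothetical.

```python
from collections import defaultdict

STOP = "stop"  # hypothetical label for the pseudo grid cell that ends a trip

def successors(pattern, suffix):
    """Yield the cell directly after each occurrence of suffix in pattern."""
    m = len(suffix)
    for i in range(len(pattern) - m):
        if pattern[i:i + m] == suffix:
            yield pattern[i + m]

def successor_distribution(patterns, q, neighbor_prob):
    """Steps 1 to 4 under the back-off assumption described above."""
    for start in range(len(q)):  # longest suffix first
        suffix = tuple(q[start:])
        counts = defaultdict(float)
        for pattern, support in patterns.items():
            for cell in successors(pattern, suffix):
                counts[cell] += support
        if counts:
            total = sum(counts.values())
            return {cell: n / total for cell, n in counts.items()}
    return dict(neighbor_prob.get(q[-1], {}))

def predict(patterns, neighbor_prob, traversal_time, q, horizon):
    """Extend q cell by cell until the prediction horizon is used up."""
    q = list(q)
    while True:
        dist = successor_distribution(patterns, q, neighbor_prob)
        dist = {c: p for c, p in dist.items() if c not in q}  # no cycles
        if not dist:
            return q[-1]  # nowhere to go: stay at the anchor
        cell = max(dist, key=dist.get)
        q.append(cell)
        if cell == STOP:
            return cell
        horizon -= traversal_time[cell]
        if horizon < 0:
            return cell

# Hypothetical inputs: two CCFRs with supports, neighbor probabilities of
# cell C, and a global average traversal time of 10 seconds per cell.
patterns = {("A", "B", "C"): 3, ("B", "D"): 2}
neighbor_prob = {"C": {"E": 0.6, "B": 0.4}}
traversal_time = {c: 10 for c in "ABCDE"}
predicted = predict(patterns, neighbor_prob, traversal_time, ["A", "B"], horizon=15)
```

In this run the pattern ABC first extends the query to C; no pattern continues past C, so the neighboring cell probabilities take over, the u-turn back to B is excluded, and E is returned once the horizon is exhausted.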
17Illustrative Example Trajectories and Mining
18Illustrative Example Prediction
19When Patterns Make a Difference
- Neighboring cell probabilities predict (4.1) with 57% confidence, but the patterns predict (5.2) with 100% confidence.
20When Neighboring Probabilities Fail: Avoid cycles and u-turns!
- Cases when predictions with patterns differ from
predictions with neighboring cell probabilities
- Explicitly rule out u-turns (as well as cycles)
in the prediction
22Empirical Evaluation
- Hardware: 64-bit Ubuntu 12.10 on an Intel Core 2 Quad Q8400 2.66 GHz processor with 4 GB memory
- Data set: 6-day sample of 11K taxis in Wuhan, China (85M records)
  - Outlier removal
  - Sampling gaps of more than 120 seconds delimit trips
  - Linear interpolation of trips between samples using 100-meter grid cells
  - Eliminate short trips (less than 300 seconds or 10 grid cells)
  - → 2 million trips that have an average length of 1390 seconds and 94 grid cells and refer to 2 billion grid cells
[Figure: raw sample vs. interpolated trips]
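The preprocessing steps can be sketched for a single vehicle as follows. The thresholds (120 seconds, 100 meters, 300 seconds, 10 grid cells) come from the slide; planar coordinates in meters, the 50-meter interpolation step, and all function names are assumptions. Outlier removal is not shown.

```python
import math

def split_trips(samples, max_gap=120.0):
    """Split time-sorted (t, x, y) samples wherever the gap exceeds max_gap."""
    trips, current = [], []
    for sample in samples:
        if current and sample[0] - current[-1][0] > max_gap:
            trips.append(current)
            current = []
        current.append(sample)
    if current:
        trips.append(current)
    return trips

def interpolate(trip, step=50.0):
    """Linearly insert points so that neighbors are at most step apart."""
    out = [trip[0]]
    for (t0, x0, y0), (t1, x1, y1) in zip(trip, trip[1:]):
        n = max(1, math.ceil(math.hypot(x1 - x0, y1 - y0) / step))
        for k in range(1, n + 1):
            f = k / n
            out.append((t0 + f * (t1 - t0), x0 + f * (x1 - x0), y0 + f * (y1 - y0)))
    return out

def is_long_enough(trip, g_len=100.0, min_duration=300.0, min_cells=10):
    """Keep trips lasting at least min_duration and covering min_cells cells."""
    cells = []
    for _, x, y in trip:
        cell = (math.floor(x / g_len), math.floor(y / g_len))
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return trip[-1][0] - trip[0][0] >= min_duration and len(cells) >= min_cells

# Hypothetical samples: a 340-second gap separates two short trips.
raw = [(0, 0, 0), (60, 300, 0), (400, 600, 0), (460, 900, 0)]
trips = [interpolate(t) for t in split_trips(raw)]
kept = [t for t in trips if is_long_enough(t)]
```

Interpolating at half the cell side is a simple way to make it unlikely that a traversed cell is skipped, although a diagonal segment can still clip a cell corner without producing a point inside it.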
23Evaluation Measure
24Prediction Tests
- Sliding window model: t_wsize = 60 minutes, t_wstride = 5 minutes
- Prediction horizon: up to 5 minutes
- Methods
  - global: neighboring probabilities only, based on all trips (even future ones!)
  - g o: global with cycle prevention
  - g ou: global with cycle and u-turn prevention
  - g best: best prediction of global
  - local: neighboring probabilities only, based on completed trips in the window
  - l o: local with cycle prevention
  - l ou: local with cycle and u-turn prevention
  - l best: best prediction of local
  - 60: patterns with min_sup = 60 plus neighboring probabilities, based on completed trips in the window
  - 60, 6d: same as 60 but with hour-of-day projection
  - 60, 4d: same as 60 but with hour-of-day and weekday-weekend projections
25Absolute Prediction Error
- Absolute prediction error (i.e., average grid
cell distance to the predicted and to best grid
cell) of different methods.
26Relative Prediction Error
- Relative prediction error (i.e., percentage
improvement) of different methods w.r.t. the
baseline predictor global.
27Effects of Incremental Mining
- Using 20-minute subwindows, the average prediction errors are virtually unchanged compared to method 60.
[Figure: trips during 1 hour; directly mined CCFRs vs. incrementally mined CCFRs]
28Conclusions and Future Work
- IncCCFR: a novel, incremental approach for managing, mining, and predicting the incrementally evolving trajectories of moving objects
  - Essentially a varying-order, deterministic Markov model that is based on closed contiguous frequent routes and neighboring cell probabilities
- Advantages
  - Reduced mining and storage costs
  - Ability to combine multiple temporally relevant mining results from the past to capture temporal and periodic regularities in movement
- Future work
  - Use the pattern combination approach to parallelize mining
  - Use current speed and historical CCFRs to be able to react to rare, unpredictable, sudden changes
29- Thank you for your attention!
- Q/A?