Title: Vladimir Kurbalija, Milo
1 Analysis of Constrained Time-Series Similarity Measures
- Vladimir Kurbalija, Miloš Radovanović, Zoltan Geler, Mirjana Ivanović
- Department of Mathematics and Informatics
- Faculty of Science
- University of Novi Sad
- Serbia
2 Agenda
- Introduction
- Related Work
- Experimental Evaluation
- Computational Times
- The Change of 1NN Graph
- Conclusions and Future Work
3 Time Series
- A time series (TS) consists of a sequence of values or events obtained from repeated measurements over time
- Time-series analysis (TSA) comprises methods that attempt to understand such time series
- The goal is to understand the underlying context of the data points, or to make forecasts
4 Applications and Task Types
- Applications
- stock market analysis,
- economic and sales forecasting,
- observation of natural phenomena,
- scientific and engineering experiments,
- medical treatments, etc.
- Task Types
- indexing,
- classification,
- clustering,
- prediction,
- segmentation,
- anomaly detection, etc.
5 Important Concepts
- Pre-processing transformation
- Time-series representation
- Similarity/distance measure
6 Pre-processing Transformation
- Raw time series usually contain some distortions
- The presence of distortions can seriously hamper indexing
- Some of the most common pre-processing tasks are:
- offset translation,
- amplitude scaling,
- removing linear trend,
- removing noise, etc.
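The pre-processing steps listed above can be sketched as follows. This is a minimal illustration under several assumptions:
- offset translation subtracts the mean;
- amplitude scaling additionally divides by the standard deviation (together these give z-normalization);
- trend removal subtracts a least-squares line;
- noise removal uses a simple moving average.

All function names are hypothetical.

```python
import statistics


def offset_translation(ts):
    """Subtract the mean so the series is centred around zero."""
    mean = statistics.fmean(ts)
    return [x - mean for x in ts]


def amplitude_scaling(ts):
    """Centre the series and divide by its population standard deviation."""
    centred = offset_translation(ts)
    std = statistics.pstdev(ts)
    if std == 0:
        return centred  # constant series: nothing to scale
    return [x / std for x in centred]


def remove_linear_trend(ts):
    """Fit y = a + b*t by least squares and return the residuals."""
    n = len(ts)
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(ts)
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ts))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den if den else 0.0
    intercept = y_mean - slope * t_mean
    return [y - (intercept + slope * t) for t, y in enumerate(ts)]


def moving_average(ts, w=3):
    """Smooth noise with a centred moving average of odd width w
    (points near the edges use shorter windows)."""
    half = w // 2
    out = []
    for i in range(len(ts)):
        window = ts[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


raw = [3, 5, 7, 9, 11]
print(remove_linear_trend(raw))  # [0.0, 0.0, 0.0, 0.0, 0.0]: the series is a pure trend
```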
7 Time-series Representation
- Time series are generally high-dimensional data
- Many representation techniques have been proposed:
- Discrete Fourier Transform (DFT)
- Singular Value Decomposition (SVD)
- Discrete Wavelet Transform (DWT)
- Piecewise Aggregate Approximation (PAA)
- Adaptive Piecewise Constant Approximation (APCA)
- Symbolic Aggregate approXimation (SAX)
- Indexable Piecewise Linear Approximation (IPLA)
- Spline Representation
- etc.
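One of the listed representations, Piecewise Aggregate Approximation (PAA), reduces a series of length n to w segment means. The sketch below assumes, for simplicity, that w divides n evenly. The function name `paa` is illustrative.

```python
def paa(ts, w):
    """Piecewise Aggregate Approximation: mean of each of w equal-length segments."""
    n = len(ts)
    if w <= 0 or n % w != 0:
        raise ValueError("this sketch assumes w > 0 and that w divides len(ts)")
    seg = n // w
    return [sum(ts[k * seg:(k + 1) * seg]) / seg for k in range(w)]


print(paa([1, 3, 2, 4, 10, 12, 0, 2], 4))  # [2.0, 3.0, 11.0, 1.0]
```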
8 Similarity/distance Measure
- Similarity-based retrieval is used in all of the aforementioned task types
- The distance between time series needs to be carefully defined in order to reflect the underlying (dis)similarity, based on shapes and patterns
- There are a number of distance measures:
- Lp distance (Lp), including the Euclidean distance (for p = 2)
- Dynamic Time Warping (DTW)
- Distance based on the Longest Common Subsequence (LCS)
- Edit Distance with Real Penalty (ERP)
- Edit Distance on Real sequence (EDR)
- Sequence Weighted Alignment model (Swale) [31], etc.
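The Lp distance mentioned first in the list compares two equal-length series point by point. Here is a minimal sketch, with the function name chosen for illustration:

```python
def lp_distance(a, b, p=2):
    """Lp (Minkowski) distance between two equal-length time series."""
    if len(a) != len(b):
        raise ValueError("Lp distance requires series of equal length")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


print(lp_distance([0, 0], [3, 4]))       # 5.0  (Euclidean, p = 2)
print(lp_distance([0, 0], [3, 4], p=1))  # 7.0  (Manhattan, p = 1)
```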
9 Similarity Measures
- Many of these similarity measures are based on dynamic programming (DTW, LCS, ERP, EDR, ...)
- The computational complexity of these dynamic programming algorithms is quadratic in the length of the time series
- Global constraints such as the Sakoe-Chiba band and the Itakura parallelogram can significantly speed up the calculation of similarities
- Global constraints can also improve classification accuracy
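The two global constraints named above can be pictured as boolean masks over the n-by-n dynamic-programming matrix. Only the cells marked True are evaluated. This sketch makes three assumptions:
- the series have equal length;
- the Sakoe-Chiba band has half-width r;
- the Itakura parallelogram follows one common formulation, with maximum slope 2 at both corners.

The function names are hypothetical.

```python
def sakoe_chiba_mask(n, r):
    """Cell (i, j) is allowed when it lies within r steps of the main diagonal."""
    return [[abs(i - j) <= r for j in range(n)] for i in range(n)]


def itakura_mask(n, slope=2.0):
    """Cell (i, j) is allowed when it lies inside the parallelogram bounded by
    lines of slope `slope` and 1/`slope` through both corners (assumed formulation)."""
    last = n - 1
    return [[(j <= slope * i and i <= slope * j and
              last - j <= slope * (last - i) and last - i <= slope * (last - j))
             for j in range(n)] for i in range(n)]


def allowed_cells(mask):
    """Number of matrix cells the constrained algorithm would evaluate."""
    return sum(sum(row) for row in mask)


n = 100
print(allowed_cells(sakoe_chiba_mask(n, 5)), "of", n * n)  # 1070 of 10000
print(allowed_cells(itakura_mask(n)), "of", n * n)
```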
10 Our Research
- Dynamic Time Warping (DTW) and the Longest Common Subsequence measure (LCS)
- The speed-up gained from global constraints
- The change of the 1-nearest neighbor graph with respect to the change of the constraint size
- FAP (Framework for Analysis and Prediction): http://perun.pmf.uns.ac.rs/fap/
- UCR Time Series Repository: http://www.cs.ucr.edu/~eamonn/time_series_data/
12 Euclidean Metric
- The most intuitive metric for time series and, as a consequence, very commonly used
- Very fast: its computational complexity is linear
- Very brittle: sensitive to small translations along the time axis
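Here is a toy illustration of that brittleness, using made-up series. Shifting a single spike by one time step makes the Euclidean distance to the shifted copy larger than the distance to a completely flat series. The computation is one linear pass over the points.

```python
import math


def euclidean(a, b):
    """Euclidean distance: a single linear pass over aligned points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


spike = [0, 0, 1, 0, 0]
shifted = [0, 0, 0, 1, 0]   # same shape, moved one step to the right
flat = [0, 0, 0, 0, 0]      # no spike at all

print(euclidean(spike, shifted))  # 1.4142135623730951
print(euclidean(spike, flat))     # 1.0
```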
13 Dynamic Time Warping (DTW)
- A generalization of the Euclidean measure
- Allows elastic shifting of the time axis: at some points, time "warps"
- Computes the distance by finding an optimal path through the matrix of point-wise distances between the two time series
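The optimal-path computation can be sketched with the classic dynamic-programming recurrence. This sketch assumes squared point-wise differences as the local cost and returns the square root of the accumulated cost. Other variants, such as absolute differences, are also in use.

```python
import math


def dtw(a, b):
    """Unconstrained DTW distance via dynamic programming (O(n*m) time)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of: diagonal match, vertical step, horizontal step
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return math.sqrt(D[n][m])


print(dtw([0, 0, 1, 0, 0], [0, 0, 0, 1, 0]))  # 0.0: warping aligns the two spikes
```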
14 Longest Common Subsequence (LCS)
- A different methodology
- The similarity between two time series is expressed as the length of their longest common subsequence
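For real-valued series, exact equality is rare, so two points are usually considered a match when they differ by at most a threshold ε. The sketch below assumes that matching rule; the parameter `eps` is an assumption, not a value from the slides. It also shows one common way of turning the similarity into a distance: 1 - LCS/min(n, m).

```python
def lcs_length(a, b, eps):
    """Length of the longest common subsequence, where a[i] and b[j]
    match if |a[i] - b[j]| <= eps."""
    n, m = len(a), len(b)
    # L[i][j] = LCS length of a[:i] and b[:j]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m]


def lcs_distance(a, b, eps):
    """Normalised dissimilarity in [0, 1]; 0 means the shorter series is matched in full."""
    return 1 - lcs_length(a, b, eps) / min(len(a), len(b))


print(lcs_length([0, 0, 1, 0, 0], [0, 0, 0, 1, 0], eps=0.1))  # 4
```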
15 Global Constraints
- DTW and LCS are based on dynamic programming: the algorithms search for the optimal path in the search matrix
- Global constraints narrow the region of the matrix that is searched, which results in a significant decrease in the number of performed calculations
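A Sakoe-Chiba band can be added to the DTW recurrence by skipping every cell farther than r from the diagonal. The sketch below assumes equal-length series, a band half-width r measured in points, and a squared local cost, as in common DTW variants. It also counts how many matrix cells are actually evaluated, which makes the reduction in work visible.

```python
import math


def constrained_dtw(a, b, r):
    """DTW restricted to a Sakoe-Chiba band of half-width r.
    Returns (distance, number_of_cells_evaluated).
    Assumes len(a) == len(b); otherwise the end cell may be unreachable (inf)."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    cells = 0
    for i in range(1, n + 1):
        # only columns within r of the diagonal are visited
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cells += 1
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return math.sqrt(D[n][m]), cells


x = [math.sin(k / 5) for k in range(200)]
y = [math.sin(k / 5 + 0.3) for k in range(200)]
for r in (200, 20, 2, 0):
    d, cells = constrained_dtw(x, y, r)
    print(f"r={r:3d}  cells={cells:5d}  distance={d:.3f}")
```

With r = 0 the only admissible path is the diagonal, so the result equals the Euclidean distance. Widening the band can only lower (never raise) the distance, at the cost of evaluating more cells.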
17 Quality of Similarity Measures
- The quality of similarity measures is usually evaluated indirectly
- By assessing the accuracy of different classifiers
- The simple 1-nearest neighbor classifier (1NN) gives among the best results for time-series data
- The accuracy of 1NN directly reflects the quality of a similarity measure
- We report the calculation times for unconstrained and constrained DTW and LCS
- We focus on the 1NN graph and its change with regard to the change of constraints
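A 1NN evaluation of a similarity measure can be sketched with leave-one-out, where each series receives the label of its nearest other series. The sketch assumes a precomputed symmetric distance matrix `dist` and a list of class labels. The toy matrix below is made up for illustration, and the evaluation protocol used in the experiments (e.g. train/test splits) is not reproduced here.

```python
def nearest_neighbor(dist, i):
    """Index of the series closest to series i, excluding i itself (ties -> lowest index)."""
    return min((j for j in range(len(dist)) if j != i), key=lambda j: dist[i][j])


def loo_1nn_accuracy(dist, labels):
    """Fraction of series whose nearest neighbour carries the same label."""
    hits = sum(labels[nearest_neighbor(dist, i)] == labels[i] for i in range(len(labels)))
    return hits / len(labels)


dist = [
    [0.0, 1.0, 5.0, 6.0],
    [1.0, 0.0, 4.0, 2.0],
    [5.0, 4.0, 0.0, 1.5],
    [6.0, 2.0, 1.5, 0.0],
]
labels = ["A", "A", "B", "B"]
print(loo_1nn_accuracy(dist, labels))  # 1.0
```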
18 Experimental Evaluation
- We compare the unconstrained measure with measures constrained to 75%, 50%, 25%, 20%, 15%, 10%, 5%, 1% and 0% of the time-series length
- Smaller constraints show more interesting behavior
- The experiments were conducted on 38 datasets from the UCR Time Series Repository
- The length of the time series varies from 24 to 1882, depending on the dataset
- The number of time series per dataset varies from 60 to 9236
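The constraint sizes above are relative to the series length, so each percentage has to be turned into a band width in points. The sketch assumes the band half-width is the given percentage of the length, rounded down. This rounding convention is an assumption, not taken from the experiments.

```python
CONSTRAINTS_PERCENT = [75, 50, 25, 20, 15, 10, 5, 1, 0]


def band_width(length, percent):
    """Sakoe-Chiba half-width in points for a constraint given as % of length (rounded down)."""
    return (length * percent) // 100


for length in (24, 1882):   # shortest and longest series lengths in the experiments
    print(length, [band_width(length, p) for p in CONSTRAINTS_PERCENT])
# 24 [18, 12, 6, 4, 3, 2, 1, 0, 0]
# 1882 [1411, 941, 470, 376, 282, 188, 94, 18, 0]
```

Under this convention, the 1% and 0% constraints coincide for the shortest series.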
19 Computational Times
- We measure the efficiency of calculating the distance matrix
- The distance matrix of a dataset is the matrix whose element (i, j) contains the distance between the i-th and j-th time series
- Calculating the distance matrix is a time-consuming operation
- All experiments were performed on an AMD Phenom II X4 945 with 3 GB of RAM
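For a symmetric measure, a distance matrix can be filled by computing only the upper triangle, i.e. n(n-1)/2 distance evaluations for n series. The sketch assumes a symmetric distance function passed in as the hypothetical parameter `dist_fn`, and it uses `time.perf_counter` for wall-clock timing. It is not the FAP implementation.

```python
import time


def distance_matrix(series, dist_fn):
    """Symmetric matrix M with M[i][j] = dist_fn(series[i], series[j]).
    Only the upper triangle is computed; the diagonal stays 0.0."""
    n = len(series)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            M[i][j] = M[j][i] = dist_fn(series[i], series[j])
    return M


def timed_distance_matrix(series, dist_fn):
    """Return the matrix together with the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    M = distance_matrix(series, dist_fn)
    return M, time.perf_counter() - start


def abs_diff_sum(a, b):
    """Toy symmetric distance used only for this example."""
    return float(sum(abs(x - y) for x, y in zip(a, b)))


M, secs = timed_distance_matrix([[0, 1], [1, 1], [3, 3]], abs_diff_sum)
print(M)  # [[0.0, 1.0, 5.0], [1.0, 0.0, 4.0], [5.0, 4.0, 0.0]]
```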
20 Computational Times - DTW
21 Computational Times - LCS
22 Computational Times
- Introducing global constraints in both measures significantly speeds up the computation of the distance matrix
- This is a direct consequence of the faster computation of each individual similarity
- For DTW, it is known that smaller constraint values can give more accurate classification
- Averaged over all datasets, the constraint size that gives the best accuracy is 4% of the time-series length
- The LCS measure is still not well investigated in this respect
23 The Change of 1NN Graph
- The nearest neighbor graph is a directed graph in which each time series is connected to its nearest neighbor
- Graphs are built for the unconstrained measures (DTW and LCS) and for the measures with constraints of 75%, 50%, 25%, 20%, 15%, 10%, 5%, 1% and 0% of the time-series length
- The change of the nearest neighbor graphs is tracked as the percentage of time series (nodes in the graph) whose nearest neighbor differs from their nearest neighbor under the unconstrained measure
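The change measure described above can be sketched as follows. Each 1NN graph is built as a list mapping every node to its nearest neighbor. The function then reports the percentage of nodes whose neighbor differs from the one in the unconstrained graph. The sketch assumes precomputed distance matrices and breaks ties by the lowest index. The names and toy matrices are hypothetical.

```python
def nn_graph(dist):
    """1NN graph as a list: entry i is the index of the nearest neighbour of series i."""
    n = len(dist)
    return [min((j for j in range(n) if j != i), key=lambda j: dist[i][j]) for i in range(n)]


def graph_change_percent(dist_unconstrained, dist_constrained):
    """Percentage of nodes whose nearest neighbour differs between the two graphs."""
    g0 = nn_graph(dist_unconstrained)
    g1 = nn_graph(dist_constrained)
    changed = sum(a != b for a, b in zip(g0, g1))
    return 100.0 * changed / len(g0)


d_unconstrained = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
d_constrained = [[0, 3, 2], [3, 0, 1], [2, 1, 0]]
print(round(graph_change_percent(d_unconstrained, d_constrained), 1))  # 66.7
```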
24 The Change of 1NN Graph - DTW
25 The Change of 1NN Graph - LCS
26 The Change of 1NN Graph
- Both measures behave in a similar manner as the constraint is narrowed
- The 1NN graph remains the same until the constraint is narrowed to approximately 20%; after that, the graph starts to change significantly
- All datasets (for both measures) reach high percentages of difference (over 50%) for small constraint sizes (5-10%)
- Constrained measures are therefore qualitatively different from their unconstrained counterparts
28 Conclusions
- We examined the influence of global constraints on the two most representative elastic measures for time series: DTW and LCS
- Through an extensive set of experiments, we showed that global constraints can significantly reduce computation time
- We demonstrated that the constrained measures are qualitatively different from their unconstrained counterparts
- For DTW, it is known that the constrained measures are more accurate, while for LCS this issue is still open
29 Future Work
- To investigate the accuracy of the constrained LCS measure for different constraint values
- To explore the influence of global constraints on the computation time and 1NN graphs of other elastic measures, such as ERP, EDR, Swale, etc.
- The constrained variants of these elastic measures should also be tested with respect to classification accuracy
30 Thank you for your attention
- FAP site: http://perun.pmf.uns.ac.rs/fap/