1
Analysis of Constrained Time-Series Similarity
Measures
  • Vladimir Kurbalija, Miloš Radovanović, Zoltan
    Geler, Mirjana Ivanović
  • Department of Mathematics and Informatics
  • Faculty of Science
  • University of Novi Sad
  • Serbia

2
Agenda
  • Introduction
  • Related Work
  • Experimental Evaluation
  • Computational Times
  • The Change of 1NN Graph
  • Conclusions and Future Work

3
Time Series
  • A time series (TS) consists of a sequence of values
    or events obtained from repeated measurements over
    time
  • Time-series analysis (TSA) comprises methods that
    attempt to understand such time series
  • To understand the underlying context of the data
    points, or to make forecasts

4
Applications and Task Types
  • Applications
  • stock market analysis,
  • economic and sales forecasting,
  • observation of natural phenomena,
  • scientific and engineering experiments,
  • medical treatments etc.
  • Task Types
  • indexing,
  • classification,
  • clustering,
  • prediction,
  • segmentation,
  • anomaly detection, etc.

5
Important Concepts
  • Pre-processing transformation
  • Time-series representation
  • Similarity/distance measure

6
Pre-processing Transformation
  • Raw time series usually contain some
    distortions
  • The presence of distortions can seriously
    degrade the quality of indexing
  • Some of the most common pre-processing tasks are
  • offset translation,
  • amplitude scaling,
  • removing linear trend,
  • removing noise etc.
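
The first three pre-processing tasks listed above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by the authors: the function names are hypothetical, amplitude scaling is assumed to mean division by the standard deviation, and the trend is assumed to be a least-squares line over the time index.

```python
from statistics import mean, pstdev

def offset_translation(series):
    """Shift the series so that its mean becomes zero."""
    m = mean(series)
    return [x - m for x in series]

def amplitude_scaling(series):
    """Centre the series, then divide by its standard deviation (assumed definition)."""
    centered = offset_translation(series)
    sd = pstdev(centered)
    # A constant series has zero spread; return it centred but unscaled.
    return centered if sd == 0 else [x / sd for x in centered]

def remove_linear_trend(series):
    """Subtract the least-squares line fitted against the time index 0..n-1."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = mean(series)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx if sxx else 0.0
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(series)]
```

Applying offset translation and amplitude scaling together is often called z-normalization; after it, two series with the same shape but different level and scale become directly comparable.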

7
Time-series Representation
  • Time series are generally high-dimensional data
  • Many techniques have been proposed
  • Discrete Fourier Transformation (DFT)
  • Singular Value Decomposition (SVD)
  • Discrete Wavelet Transf. (DWT)
  • Piecewise Aggregate Approximation (PAA)
  • Adaptive Piecewise Constant Approx. (APCA)
  • Symbolic Aggregate approX. (SAX)
  • Indexable Piecewise Linear Approx. (IPLA)
  • Spline Representation
  • etc.
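
As an illustration of how such representations reduce dimensionality, here is a minimal sketch of Piecewise Aggregate Approximation (PAA), which replaces each frame of the series by its mean. The function name and the handling of lengths that are not divisible by the number of segments (frames of nearly equal size) are assumptions made for this example.

```python
def paa(series, segments):
    """Reduce `series` to `segments` values, each the mean of one frame."""
    n = len(series)
    if not 1 <= segments <= n:
        raise ValueError("segments must be between 1 and len(series)")
    reduced = []
    for k in range(segments):
        # Integer frame boundaries; frames differ in size by at most one point.
        start = k * n // segments
        end = (k + 1) * n // segments
        frame = series[start:end]
        reduced.append(sum(frame) / len(frame))
    return reduced
```

For example, `paa([1, 2, 3, 4, 5, 6, 7, 8], 4)` yields `[1.5, 3.5, 5.5, 7.5]`, halving the number of dimensions.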

8
Similarity/distance Measure
  • Similarity-based retrieval is used in all the
    aforementioned task types
  • The distance between time series needs to be
    carefully defined in order to reflect the
    underlying (dis)similarity (based on shapes and
    patterns).
  • There are a number of distance measures
  • Lp distance (Lp); Euclidean distance for p = 2
  • Dynamic Time Warping (DTW)
  • distance based on Longest Common Subsequence
    (LCS)
  • Edit Distance with Real Penalty (ERP)
  • Edit Distance on Real sequence (EDR)
  • Sequence Weighted Alignment model (Swale) [31],
    etc.

9
Similarity Measures
  • Many of these similarity measures are based on
    dynamic programming (DTW, LCS, ERP, EDR...)
  • The computational complexity of dynamic
    programming algorithms is quadratic
  • The usage of global constraints such as the
    Sakoe-Chiba band and the Itakura parallelogram
    can significantly speed up the calculation of
    similarities
  • The usage of global constraints can improve the
    accuracy of classification

10
Our Research
  • Dynamic Time Warping (DTW) and Longest Common
    Subsequence measure (LCS)
  • the speed-up gained from these constraints
  • the change of the 1-nearest neighbor graph with
    respect to the change of the constraint size
  • FAP (Framework for Analysis and Prediction)
    http://perun.pmf.uns.ac.rs/fap/
  • UCR Time Series Repository
    http://www.cs.ucr.edu/~eamonn/time_series_data/

11
Agenda
  • Introduction
  • Related Work
  • Experimental Evaluation
  • Computational Times
  • The Change of 1NN Graph
  • Conclusions and Future Work

12
Euclidean Metric
  • Most intuitive metric for time series, and as a
    consequence very commonly used
  • Very fast: computational complexity is linear
  • Very brittle and sensitive to small translations
    across the time axis
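
The brittleness mentioned above can be seen on a toy example. The sketch below uses made-up series: a single spike, the same spike shifted by one time step, and a flat line.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length series; linear time."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

spike = [0, 0, 1, 0, 0, 0]
shifted = [0, 0, 0, 1, 0, 0]   # same shape, one step later
flat = [0, 0, 0, 0, 0, 0]      # no spike at all

d_shifted = euclidean(spike, shifted)   # sqrt(2)
d_flat = euclidean(spike, flat)         # 1.0
```

Under the Euclidean metric the shifted copy of the spike is farther from the original than a series with no spike at all, even though its shape is identical.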

13
Dynamic Time Warping (DTW)
  • Generalization of the Euclidean measure
  • Allows elastic shifting of the time axis, so that
    time "warps" at some points
  • Computes the distance by finding an optimal path
    in the matrix of distances of two time series
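
A minimal sketch of unconstrained DTW as a dynamic program over the matrix of pointwise distances. The slides do not state the local cost, so the squared difference with a final square root is an assumption here, as are the function and variable names.

```python
import math

def dtw(a, b):
    """Unconstrained DTW distance; fills the full len(a) x len(b) matrix."""
    n, m = len(a), len(b)
    # D[i][j] = cost of the best warping path aligning a[:i] with b[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j],      # repeat b[j-1]
                                 D[i][j - 1],      # repeat a[i-1]
                                 D[i - 1][j - 1])  # advance both
    return math.sqrt(D[n][m])

spike = [0, 0, 1, 0, 0, 0]
shifted = [0, 0, 0, 1, 0, 0]
d = dtw(spike, shifted)
```

For these two toy series the optimal path aligns the two spikes with each other, so the DTW distance is 0, whereas their Euclidean distance is sqrt(2).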

14
Longest Common Subsequence (LCS)
  • Different methodology
  • Similarity between two time series is expressed
    as a length of the longest common subsequence of
    both time series
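
For real-valued series, an LCS measure needs a rule for when two values count as equal. The sketch below assumes a matching threshold `epsilon` (a hypothetical parameter, not taken from the slides) and converts the LCS length into a distance by normalizing with the shorter length, which is one of several possible choices.

```python
def lcs_length(a, b, epsilon=0.1):
    """Length of the longest common subsequence under threshold matching."""
    n, m = len(a), len(b)
    # L[i][j] = LCS length of the prefixes a[:i] and b[:j].
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= epsilon:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m]

def lcs_distance(a, b, epsilon=0.1):
    """Turn the similarity into a distance in [0, 1] (assumed normalization)."""
    return 1.0 - lcs_length(a, b, epsilon) / min(len(a), len(b))
```

Like DTW, this fills a matrix with one cell per pair of points, so its cost is also quadratic in the series length.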

15
Global Constraints
  • DTW and LCS are based on dynamic programming:
    the algorithms search for the optimal path in the
    search matrix
  • Global constraints narrow the search path in the
    matrix which results in a significant decrease in
    the number of performed calculations
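
To make the saving concrete, here is a sketch of DTW restricted to a Sakoe-Chiba band that also counts the matrix cells it evaluates. The constraint `r` is given as a fraction of the series length; how that fraction maps to a band half-width (rounding, and widening the band for series of unequal length) is an assumption of this example, as is the squared-difference local cost.

```python
import math

def dtw_band(a, b, r):
    """DTW inside a Sakoe-Chiba band; returns (distance, cells evaluated)."""
    n, m = len(a), len(b)
    # Half-width of the band; at least |n - m| so that a path can exist.
    w = max(round(r * max(n, m)), abs(n - m))
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    cells = 0
    for i in range(1, n + 1):
        # Only cells with |i - j| <= w are visited.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
            cells += 1
    return math.sqrt(D[n][m]), cells
```

With `r = 1.0` the band covers the whole matrix and the result equals unconstrained DTW. With `r = 0.0` and equal-length series only the diagonal is visited, so the result coincides with the Euclidean distance. Because a narrower band allows fewer warping paths, the returned distance can only stay the same or grow as `r` shrinks.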

16
Agenda
  • Introduction
  • Related Work
  • Experimental Evaluation
  • Computational Times
  • The Change of 1NN Graph
  • Conclusions and Future Work

17
Quality of Similarity Measures
  • Quality of similarity measures is usually
    evaluated indirectly
  • By assessing the accuracy of different classifiers
  • The simple 1-nearest-neighbor classifier (1NN) gives
    among the best results for time-series data
  • The accuracy of 1NN directly reflects the quality
    of a similarity measure
  • We report the calculation times for unconstrained
    and constrained DTW and LCS
  • We focus on the 1NN graph and its change with
    regard to the change of constraints
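
Because 1NN has no parameters of its own, its accuracy depends only on the distance measure. The sketch below estimates that accuracy from a precomputed distance matrix using leave-one-out evaluation; the evaluation protocol and the function names are assumptions for illustration, not details given in the slides.

```python
def nearest_neighbor(M, i):
    """Index of the series closest to series i, excluding i itself."""
    candidates = (j for j in range(len(M)) if j != i)
    return min(candidates, key=lambda j: M[i][j])

def loo_1nn_accuracy(M, labels):
    """Fraction of series whose nearest neighbor carries the same class label."""
    n = len(M)
    hits = sum(labels[nearest_neighbor(M, i)] == labels[i] for i in range(n))
    return hits / n
```

Here `M[i][j]` is the distance between the i-th and j-th series under the measure being evaluated, and `labels[i]` is the class of the i-th series.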

18
Experimental Evaluation
  • The unconstrained measure and measures with the
    following constraints: 75%, 50%, 25%, 20%, 15%,
    10%, 5%, 1% and 0% of the size of the time series
  • Smaller constraints have more interesting
    behavior
  • Set of experiments was conducted on 38 datasets
    from UCR Time Series Repository
  • The length of time series varies from 24 to 1882
    depending on the data set
  • The number of time series per data set varies
    from 60 to 9236.

19
Computational Times
  • The efficiency of calculating the distance matrix
  • The distance matrix for one data set is the
    matrix where element (i,j) contains the distance
    between i-th and j-th time series
  • The calculation of the distance matrix is a
    time-consuming operation
  • All experiments are performed on AMD Phenom II X4
    945 with 3GB RAM
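
A minimal sketch of the distance-matrix computation described above. It assumes the measure is symmetric with zero self-distance, so only the upper triangle is computed; `dist` is any function of two series, and the Euclidean distance is used here only as a stand-in.

```python
import math

def euclidean(a, b):
    """Stand-in measure; any symmetric distance function can be passed instead."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_matrix(dataset, dist):
    """M[i][j] holds the distance between the i-th and j-th series."""
    n = len(dataset)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Symmetry lets each pair be evaluated once.
            M[i][j] = M[j][i] = dist(dataset[i], dataset[j])
    return M
```

For a data set of n series this makes n(n-1)/2 calls to the measure. When each call is itself quadratic in the series length, as for unconstrained DTW and LCS, the cost of the whole matrix grows quickly, which is why a faster measure translates directly into a faster matrix computation.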

20
Computational Times - DTW

21
Computational Times - LCS

22
Computational Times
  • Introduction of global constraints in both
    measures significantly speeds up the process of
    distance matrix computation
  • Direct consequence of a faster similarity measure
  • It is known for DTW that smaller values of
    constraints can give more accurate classification
  • The average constraint size, which gives the best
    accuracy, for all datasets is 4% of the
    time-series length
  • LCS measure is still not well investigated

23
The Change of 1NN Graph
  • The nearest neighbor graph is a directed graph
    where each time series is connected with its
    nearest neighbor
  • Graphs were built for the unconstrained measures
    (DTW and LCS) and for measures with the following
    constraints: 75%, 50%, 25%, 20%, 15%, 10%, 5%, 1%
    and 0% of the length of the time series
  • The change of nearest neighbor graphs is tracked
    as the percentage of time series (nodes in the
    graph) that changed their nearest neighbor
    compared to the nearest neighbor in the
    unconstrained measure
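
The tracking described above can be sketched as follows, given one distance matrix for the unconstrained measure and one for a constrained variant. The function names are hypothetical, and ties between equally distant neighbors are broken by taking the lowest index, which is an assumption of this example.

```python
def nn_graph(M):
    """For each node i, the index of its nearest neighbor (the edge i -> nn[i])."""
    n = len(M)
    return [min((j for j in range(n) if j != i), key=lambda j: M[i][j])
            for i in range(n)]

def percent_changed(M_unconstrained, M_constrained):
    """Percentage of nodes whose nearest neighbor differs between the two graphs."""
    base = nn_graph(M_unconstrained)
    other = nn_graph(M_constrained)
    changed = sum(1 for u, v in zip(base, other) if u != v)
    return 100.0 * changed / len(base)
```

A value of 0 means the constrained measure induces exactly the same 1NN graph as the unconstrained one; higher values indicate that the constraint has changed which series are considered most similar.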

24
The Change of 1NN Graph - DTW

25
The Change of 1NN Graph - LCS

26
The Change of 1NN Graph
  • Both measures behave in a similar manner when the
    constraint is narrowed
  • The 1NN graph remains the same until the size of
    the constraint is narrowed to approximately 20%,
    and after that the graph starts to change
    significantly
  • All datasets (for both measures) reach high
    percentages of difference (over 50%) for small
    constraint sizes (5%-10%)
  • Constrained measures represent qualitatively
    different measures than the unconstrained ones

27
Agenda
  • Introduction
  • Related Work
  • Experimental Evaluation
  • Computational Times
  • The Change of 1NN Graph
  • Conclusions and Future Work

28
Conclusions
  • We examined the influence of global constraints
    on the two most representative elastic measures
    for time series: DTW and LCS
  • Through an extensive set of experiments we showed
    that the usage of global constraints can
    significantly reduce the computation time
  • We demonstrated that the constrained measures are
    qualitatively different from their unconstrained
    counterparts
  • For DTW it is known that the constrained measures
    are more accurate, while for LCS this issue is
    still open.

29
Future Work
  • To investigate the accuracy of the constrained
    LCS measure for different values of constraints
  • To explore the influence of global constraints on
    the computation time and 1NN graphs of other
    elastic measures like ERP, EDR, Swale, etc.
  • The constrained variants of these elastic
    measures should also be tested with respect to
    classification accuracy

30
Thank you for your attention
  • FAP site
  • http://perun.pmf.uns.ac.rs/fap/