Title: Fair Use Agreement
1Fair Use Agreement
- This agreement covers the use of all slides on this CD-ROM; please read carefully.
- You may freely use these slides for teaching, if
- You send me an email telling me the class number/university in advance.
- My name and email address appear on the first slide (if you are using all or most of the slides), or on each slide (if you are just taking a few slides).
- You may freely use these slides for a conference presentation, if
- You send me an email telling me the conference name in advance.
- My name appears on each slide you use.
- You may not use these slides for tutorials, or in a published work (tech report/conference paper/thesis/journal, etc.). If you wish to do this, email me first; it is highly likely I will grant you permission.
- (c) Eamonn Keogh, eamonn_at_cs.ucr.edu
2Indexing Large Human-Motion Databases
- Eamonn Keogh, Themis Palpanas, Victor B. Zordan, Dimitrios Gunopulos
- University of California, Riverside
- Marc Cardle
- University of Cambridge
4Motion Capture
- records motion data from live actors
- used for data-driven animation
5Motion Capture in Games Industry
Street NBA
Madden
6Motion Capture in Movie Industry
Troy
Lord of the Rings
7Motivation
- motion capture data
- segmented in short sequences, stored in motion libraries
- composed to create long, realistic motion sequences
- important to find similar sequences
- form pool of similar sequences
- choose the most promising, to continue the motion
8Motivation
- Dynamic Time Warping (DTW)
- Considers only local adjustments in time, to match two time series
- However, sometimes global adjustments are required
- DTW is being extensively used
- uniform scaling is complementary
- combination of both techniques offers rich, high-quality result set
[Figure panels: Uniform Scaling; DTW]
10Uniform Scaling
- time series
- query, Q, length n
- candidate, C, length m ($m > n$)
- stretch Q to length p ($n \le p \le m$): $Q^p$
- $Q^p_j = Q_{j \cdot n/p}$, $1 \le j \le p$
- scaling factor, $sf = p/n$
- max scaling factor, $sf_{max} = m/n$
[Figure label: $Q^p$]
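As an illustration of the stretching step, the following minimal Python sketch builds the stretched query from Q. The slide gives only the index expression j·n/p; rounding that index up to the next integer, and the function name `stretch`, are assumptions of this sketch.

```python
def stretch(q, p):
    """Uniformly stretch the sequence q (length n) to length p >= n.

    Element j (1-based) of the result is q[ceil(j * n / p)] (1-based).
    Rounding the index up is an assumption made for this sketch.
    """
    n = len(q)
    if p < n:
        raise ValueError("p must be at least len(q)")
    # (j * n + p - 1) // p is an integer ceiling of j * n / p,
    # which avoids floating-point rounding surprises.
    return [q[(j * n + p - 1) // p - 1] for j in range(1, p + 1)]


q = [1.0, 2.0, 3.0, 4.0]
stretched = stretch(q, 6)   # n = 4, p = 6, so sf = p / n = 1.5
print(stretched)            # [1.0, 2.0, 2.0, 3.0, 4.0, 4.0]
```

With p = n the sequence is returned unchanged, which corresponds to a scaling factor of 1.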
12Problem Statement
- given
- time series, Q
- database of candidate time series, D
- find $\arg\min_p \, dist(Q^p, D)$
- $dist(Q^p, D)$ = Euclidean distance between time series
- challenges
- quickly solve the problem for two time series
- extend solution to scale up to large time series databases
13Outline
- Speeding Up Search
- Scaling Up To Large Databases
- Experimental Evaluation
- Related Work
- Conclusions
14Best Uniform Scaling Match
- brute force algorithm
- for each time series in D
- for each sf, $1 \le sf \le sf_{max}$
- compute distance between the two time series
- find the best overall match
- time complexity $O(|D|(m-n))$
- extremely expensive!
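The brute-force loop above can be sketched in Python as follows. Several details are assumptions of this sketch, not statements from the slides: the stretched query is compared against the first p points of each candidate, the raw Euclidean distance is used without any length normalization, the stretch index is rounded up, and the helper names (`stretch`, `euclidean`, `best_uniform_scaling_match`) are hypothetical.

```python
import math


def stretch(q, p):
    # Stretch q (length n) to length p; index rounding up is an assumption.
    n = len(q)
    return [q[(j * n + p - 1) // p - 1] for j in range(1, p + 1)]


def euclidean(a, b):
    # Euclidean distance between two equal-length sequences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_uniform_scaling_match(q, database):
    """Try every stretched length p in [n, m] for every candidate.

    Returns (distance, candidate_index, p) of the best match found.
    """
    n = len(q)
    best = (math.inf, None, None)
    for idx, c in enumerate(database):
        for p in range(n, len(c) + 1):
            # Assumption: compare the stretched query with the length-p prefix of c.
            d = euclidean(stretch(q, p), c[:p])
            if d < best[0]:
                best = (d, idx, p)
    return best


database = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 2.0, 3.0, 4.0, 4.0],
]
print(best_uniform_scaling_match([1.0, 2.0, 3.0, 4.0], database))  # (0.0, 1, 6)
```

Every candidate is visited for every admissible length, which is what makes the approach expensive and motivates a lower bound that can discard candidates early.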
15Lower Bounding Uniform Scaling
- lower bound distance between two time series,
- for any sf, $1 \le sf \le sf_{max}$
- desiderata
- fast to compute
- tight bound
- results in fast pruning of candidates that are guaranteed not to belong to the solution
- compute distance only for time series not pruned by lower bound
16Lower Bounding Uniform Scaling
- assume
- candidate C, length 100
- query Q, length 80
- wish to find best match for any scaling of Q between 80 and 100
[Figure: candidate C, m = 100; x-axis from 0 to 100]
17Lower Bounding Uniform Scaling
- assume
- candidate C, length 100
- query Q, length 80
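- wish to find best match for any scaling of Q between 80 and 100
- build envelopes, length 80
$U_i = \max(C_{\lfloor (i-1)m/n \rfloor + 1}, \ldots, C_{\lceil im/n \rceil})$
$L_i = \min(C_{\lfloor (i-1)m/n \rfloor + 1}, \ldots, C_{\lceil im/n \rceil})$
[Figure: envelopes U and L, n = 80; x-axis from 0 to 100]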
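A minimal Python sketch of the envelope construction follows. It takes the window for point i to run from index floor((i-1)m/n) + 1 to index ceil(im/n) of C (1-based); the bracket symbols in the slide formula were lost in extraction, so this floor/ceiling placement is an assumption, as is the function name `build_envelopes`.

```python
def build_envelopes(c, n):
    """Build upper and lower envelopes of length n from candidate c (length m).

    For each i in 1..n, the window covers the 1-based positions
    floor((i-1)*m/n) + 1 through ceil(i*m/n) of c (assumed rounding).
    """
    m = len(c)
    upper, lower = [], []
    for i in range(1, n + 1):
        start = ((i - 1) * m) // n + 1   # floor((i-1)*m/n) + 1, 1-based
        end = (i * m + n - 1) // n       # ceil(i*m/n), 1-based
        window = c[start - 1:end]        # convert to a 0-based slice
        upper.append(max(window))
        lower.append(min(window))
    return upper, lower


upper, lower = build_envelopes([1, 5, 2, 4, 3, 6], 4)
print(upper)  # [5, 5, 4, 6]
print(lower)  # [1, 2, 3, 3]
```

For the slide's setting, `build_envelopes(c, 80)` on a candidate of length 100 yields two sequences of 80 points each.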
20Lower Bounding Uniform Scaling
- assume
- candidate C, length 100
- query Q, length 80
- wish to find best match for any scaling of Q between 80 and 100
- compute lower bound
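The slide names this step without giving a formula. One common form, borrowed here from envelope-based lower bounds for time series, sums the squared distance from each query point to the envelope whenever the point falls outside it. That this is the exact formula intended by the slides is an assumption, and `envelope_lower_bound` is a hypothetical name.

```python
import math


def envelope_lower_bound(q, upper, lower):
    """Distance from query q to the band between the envelopes.

    Points inside [lower[i], upper[i]] contribute nothing; points outside
    contribute their squared distance to the nearest envelope.
    """
    total = 0.0
    for qi, ui, li in zip(q, upper, lower):
        if qi > ui:
            total += (qi - ui) ** 2
        elif qi < li:
            total += (li - qi) ** 2
    return math.sqrt(total)


upper = [5, 5, 4, 6]
lower = [1, 2, 3, 3]
print(envelope_lower_bound([0, 3, 6, 2], upper, lower))  # sqrt(1 + 0 + 4 + 1)
print(envelope_lower_bound([2, 3, 4, 5], upper, lower))  # 0.0, query lies inside the band
```

The cost is a single pass over the n query points, which matches the "fast to compute" requirement.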
21Envelope Indexing
- dimensionality of envelopes is high
[Figure annotation: 80 points]
22Envelope Indexing
- dimensionality of envelopes is high
- reduce dimensionality by approximating them
- Piecewise Constant Approximation
[Figure annotation: 8 points]
25Envelope Indexing
- dimensionality of envelopes is high
- reduce dimensionality by approximating them
- Piecewise Constant Approximation
- assume query Q, length 80
- approximated with 8 points
- compute approximation of lower bound
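The reduction from 80 points to 8 can be sketched as below. The slides do not spell out the rule, so the sketch assumes a standard construction: the query is replaced by per-segment means, the upper envelope by per-segment maxima, the lower envelope by per-segment minima, and each segment's squared gap is weighted by the segment length. Under these assumptions the reduced bound never exceeds the full envelope bound. All function names are hypothetical, and the sequence length is assumed to be divisible by the number of pieces.

```python
import math


def piecewise_mean(q, pieces):
    # Mean of each of `pieces` equal-length segments (len(q) % pieces == 0 assumed).
    w = len(q) // pieces
    return [sum(q[k * w:(k + 1) * w]) / w for k in range(pieces)]


def piecewise_envelopes(upper, lower, pieces):
    # Max of U and min of L per segment, so the reduced band encloses the original.
    w = len(upper) // pieces
    u_hat = [max(upper[k * w:(k + 1) * w]) for k in range(pieces)]
    l_hat = [min(lower[k * w:(k + 1) * w]) for k in range(pieces)]
    return u_hat, l_hat


def reduced_lower_bound(q_bar, u_hat, l_hat, n):
    # Lower bound computed from the reduced representations only.
    w = n / len(q_bar)  # number of original points represented by each segment
    total = 0.0
    for qk, uk, lk in zip(q_bar, u_hat, l_hat):
        if qk > uk:
            total += w * (qk - uk) ** 2
        elif qk < lk:
            total += w * (lk - qk) ** 2
    return math.sqrt(total)


def full_lower_bound(q, upper, lower):
    # Envelope bound on the original points, kept here for comparison.
    total = 0.0
    for qi, ui, li in zip(q, upper, lower):
        if qi > ui:
            total += (qi - ui) ** 2
        elif qi < li:
            total += (li - qi) ** 2
    return math.sqrt(total)


q = [0.0, 0.0, 4.0, 4.0]
upper = [1.0, 2.0, 2.0, 3.0]
lower = [0.0, 1.0, 1.0, 2.0]
q_bar = piecewise_mean(q, 2)                          # [0.0, 4.0]
u_hat, l_hat = piecewise_envelopes(upper, lower, 2)   # [2.0, 3.0], [0.0, 1.0]
print(reduced_lower_bound(q_bar, u_hat, l_hat, len(q)))  # sqrt(2)
print(full_lower_bound(q, upper, lower))                 # sqrt(6)
```

Only the reduced envelopes need to be stored in the index; the full series is consulted later for candidates that survive this cheaper test.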
26Algorithms for Secondary Storage
- use a multidimensional index
- VA-file -> FastScan algorithm
- R-tree -> RtreeProbe algorithm
- 2-pass algorithms
- 1. scan approximated envelopes,
- prune search space
- 2. find exact answer using original series
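The two-pass idea can be sketched generically in Python as a filter-and-refine search. This is not the FastScan or RtreeProbe algorithm: it uses a plain in-memory list in place of a VA-file or R-tree, and the names `two_pass_search`, `first_point_bound`, and `euclidean` are hypothetical. The toy bound in the usage example (difference of the first points) is chosen only because it is trivially a lower bound on the Euclidean distance.

```python
import math


def two_pass_search(query, database, lower_bound, exact_distance):
    """Return (best_distance, best_index, exact_calls).

    `lower_bound` must never exceed `exact_distance` for the result to be exact.
    """
    # Pass 1: compute the cheap bound for every candidate and sort by it.
    bounds = sorted((lower_bound(query, c), idx) for idx, c in enumerate(database))
    best_dist, best_idx, exact_calls = math.inf, None, 0
    # Pass 2: refine in order of increasing bound; stop once no remaining
    # candidate can beat the best exact distance found so far.
    for lb, idx in bounds:
        if lb >= best_dist:
            break
        d = exact_distance(query, database[idx])
        exact_calls += 1
        if d < best_dist:
            best_dist, best_idx = d, idx
    return best_dist, best_idx, exact_calls


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def first_point_bound(a, b):
    # |a[0] - b[0]| can never exceed the Euclidean distance between a and b.
    return abs(a[0] - b[0])


database = [[5, 5, 5], [0, 1, 0], [3, 0, 0], [9, 9, 9]]
print(two_pass_search([0, 0, 0], database, first_point_bound, euclidean))  # (1.0, 1, 1)
```

In this example only one of the four candidates reaches the exact distance computation; the other three are pruned by their bounds.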
28Datasets Used
- motion capture
- data from 124 sensors placed on human actors
- mixed bag
- time series coming from
- medicine, manufacturing, environmental monitoring, economics, sensor data
- experimented with time series databases of
- size 5,000 to 80,000
- time series length 64 to 1,024 points
30Main Memory Experiments
- assume database fits in memory
- measure pruning power
- fraction of times each approach calls distance function
- our technique
- 1 order of magnitude faster than CD-criterion
- 3 orders of magnitude faster than brute force
[Figure label: brute force]
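One way to write the pruning-power measure as a formula is given below. The slide says only "fraction of times each approach calls distance function"; taking the brute-force call count as the denominator is an assumption of this sketch. The `\text` command requires the amsmath package.

```latex
\[
\text{pruning power} \;=\;
\frac{\text{number of distance-function calls made by the approach}}
     {\text{number of distance-function calls made by brute force}}
\]
```

Under this reading, a value of 0.001 means the approach makes three orders of magnitude fewer distance calls than brute force.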
31Disk-Based Experiments
- comparison of
- brute force
- FastScan
- RtreeProbe
32Disk-Based Experiments
- comparison of
- FastScan
- RtreeProbe
34Case Study
36Related Work
- Dynamic Time Warping (DTW)
- [Yi & Faloutsos 00] [Keogh 02] [Zhu & Shasha 03] [Fung & Wong 03]
- Longest Common SubSequence (LCSS)
- [Das et al. 97] [Vlachos et al. 03]
- uniform scaling
- [Argyros & Ermopoulos 03]
38Conclusions
- studied utility of uniform scaling similarity matching
- applications in
- motion capture libraries, music retrieval, historical handwritten archives
- introduced first lower bounding technique for uniform scaling
- proposed indexing method for bounding envelopes
- suitable for very large time series databases
- experimentally evaluated efficiency of technique
- demonstrated quality of results with real motion capture data