Title: Visually Mining and Monitoring Massive Time Series
Lin, J., Keogh, E., Lonardi, S., Lankford, J.P., and Nystrom, D.M.
In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004.
- Amy Karlson
- V. Shiv Naga Prasad
- 15 February 2004
- CMSC 838S
Images courtesy of Jessica Lin and Eamonn Keogh
2. What are Time Series?
- Simply put: observations of a variable made over time
- Common across a wide variety of domains:
- Medicine
- Physiology
- Finance
- Microbiology
- Meteorology
- Surveillance
3. Motivation: Critical Decision Making
- Domains
- Spacecraft Launch
- Medicine
- Research Directions
- Mining Archives
- Extract rules, patterns, regularities
- Visualizing Streams
- Novel visualization and interaction techniques for:
- Query by content
- Motif discovery
- Anomaly detection
4. Some Visual Time Series Systems
- Time Searcher (Hochheiser and Shneiderman)
- Direct-manipulation pattern queries
- Example: dot.com stocks, 1999-2002
- ThemeRiver (Havre, Hetzler, Whitney, and Nowell, InfoVis 2000)
- Theme strength over time
- Spirals (Weber et al.)
- Periodic data with a known period
5. VizTree
- Construct a subsequence tree to span the space of subsequences of a given time series.
- Use this tree to collect statistics about the series.
- The size of the tree structure is independent of the length of the series.
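To make the size claim concrete: with an alphabet of a symbols and w segments per window, depth k of the tree holds at most a^k nodes, so the total node count is a geometric sum that involves only a and w, never the series length. Only the branch counts (thicknesses) change as the series grows. This is a back-of-the-envelope bound, not a figure reported in the paper; the numbers plugged in are the a = 3, w = 2 setting from the tree example slide.

$$
N(a, w) \;=\; \sum_{k=1}^{w} a^{k} \;=\; \frac{a^{w+1} - a}{a - 1},
\qquad a = 3,\; w = 2 \;\Rightarrow\; N = 3 + 9 = 12 .
$$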
6. VizTree Approach: Overview
- Place windows along the time series to obtain subsequences.
- Quantize along the time and value dimensions to obtain sequences of discrete symbols.
- Construct a subsequence tree to represent all possible such sequences.
- Collect frequencies of traversal of the branches of the subsequence tree.
- Use these frequencies for motif and anomaly detection, and for comparing time series.
7. Subsequences
Place windows along the time series to obtain
subsequences.
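A minimal sketch of the windowing step, assuming the window slides one point at a time (the slides do not state a step size). The function name `sliding_windows` is hypothetical.

```python
from typing import List, Sequence


def sliding_windows(series: Sequence[float], n: int) -> List[List[float]]:
    """Return every length-n subsequence, sliding the window one point at a time."""
    if n <= 0 or n > len(series):
        raise ValueError("window length n must be between 1 and len(series)")
    # A series of length m yields m - n + 1 overlapping windows.
    return [list(series[i:i + n]) for i in range(len(series) - n + 1)]


if __name__ == "__main__":
    for window in sliding_windows([1.0, 2.0, 4.0, 3.0, 5.0], 3):
        print(window)
```

On the five-point toy series this yields three overlapping windows of length 3.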
8. Discretization
- Subsequences are patterns.
- Take windows along the time series
- Length of window = length of subsequence.
- Discretize the range of the data: one symbol for each quantization level.
- Divide each window into segments and represent each segment with one symbol.
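In the SAX work by the same authors, each window is z-normalized and then reduced with Piecewise Aggregate Approximation (PAA): every segment is replaced by its mean before a symbol is assigned. A minimal sketch of those two steps, assuming the window length is an exact multiple of the segment count (general PAA also handles the remainder). The function names `z_normalize` and `paa` are illustrative, not taken from VizTree.

```python
import statistics
from typing import List, Sequence


def z_normalize(window: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = statistics.fmean(window)
    sigma = statistics.pstdev(window)
    if sigma < eps:  # flat window: avoid dividing by (almost) zero
        return [0.0 for _ in window]
    return [(x - mu) / sigma for x in window]


def paa(window: Sequence[float], segments: int) -> List[float]:
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    if segments <= 0 or len(window) % segments != 0:
        raise ValueError("this sketch needs len(window) to be a multiple of segments")
    size = len(window) // segments
    return [statistics.fmean(window[i:i + size]) for i in range(0, len(window), size)]


if __name__ == "__main__":
    window = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print(paa(z_normalize(window), 4))
```

For this toy window (mean 5, standard deviation 2) the four segment means come out as [-1.0, -0.5, 0.0, 1.5].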
9. Symbolic Aggregate approXimation (SAX)
Figure: one subsequence divided into segments; quantization levels map each segment to a representative symbol. Discrete version: acdcbdba
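The quantization levels in SAX are usually chosen as breakpoints that split a standard normal distribution into equal-probability regions, so that each symbol is roughly equally likely on z-normalized data. A sketch under that assumption, using a four-letter alphabet as in the acdcbdba example; the helper names are hypothetical, and `statistics.NormalDist` (Python 3.8+) supplies the inverse CDF.

```python
import bisect
import string
from statistics import NormalDist
from typing import List, Sequence


def gaussian_breakpoints(alphabet_size: int) -> List[float]:
    """Cut points splitting N(0, 1) into `alphabet_size` equal-probability regions."""
    if not 2 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be between 2 and 26")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def to_sax_word(paa_values: Sequence[float], alphabet_size: int) -> str:
    """Map each PAA coefficient to a letter: 'a' for the lowest region, then 'b', ..."""
    cuts = gaussian_breakpoints(alphabet_size)
    # bisect_right counts the cut points at or below v, which is the region index.
    return "".join(string.ascii_lowercase[bisect.bisect_right(cuts, v)] for v in paa_values)


if __name__ == "__main__":
    print([round(c, 2) for c in gaussian_breakpoints(4)])
    print(to_sax_word([-1.0, 0.3, 1.0, -0.3], 4))
```

With four symbols the cut points are about -0.67, 0 and 0.67, so the toy coefficients above map to the word acdb.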
10. Subsequence Tree: Example
- Symbols: a, b, c
- Segments per window: 2
- The tree spans the space of subsequences.
- Branching factor = number of symbols (alphabet size)
- Depth = segments per window
- Branch thickness = frequency of occurrence of the subsequence
Figure: subsequence tree whose root branches a, b, c each split again into a, b, c.
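One way to sketch the tree's statistics: every SAX word walks a path from the root, so counting every prefix of every word gives the traversal frequency of each branch, which is what the branch thickness encodes. Here a `Counter` keyed by prefix stands in for an explicit tree; `branch_counts` and `all_nodes` are hypothetical names, and the words are toy input rather than data from the paper.

```python
from collections import Counter
from itertools import product
from typing import Iterable, List


def branch_counts(words: Iterable[str]) -> Counter:
    """Count how often each branch of the subsequence tree is traversed.

    The word "ac" walks root -> "a" -> "ac", so each prefix of each word
    is one traversal of the branch that ends at that node.
    """
    counts: Counter = Counter()
    for word in words:
        for depth in range(1, len(word) + 1):
            counts[word[:depth]] += 1
    return counts


def all_nodes(alphabet: str, depth: int) -> List[str]:
    """Every node of the full tree; the count depends only on alphabet and depth."""
    return ["".join(p) for d in range(1, depth + 1) for p in product(alphabet, repeat=d)]


if __name__ == "__main__":
    words = ["ab", "ab", "ac", "ba", "ab", "cc"]  # toy input
    counts = branch_counts(words)
    print(len(all_nodes("abc", 2)))  # nodes in the full tree
    print(counts["a"], counts["ab"], counts["cc"])
```

For symbols a, b, c and two segments the full tree has 12 nodes regardless of how many windows are counted; in the toy input the branch a is traversed 4 times, ab 3 times, and cc once.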
11. VizTree Tool
Demo
12. Query by Content: Subsequence Matching
- Finding known patterns
- Chunking
- Breaking a time series into individual series
- Methods
- By time (e.g., power usage)
- By shape (e.g., heartbeats)
- Search approaches
- Exact: slow
- Approximate: fast
- Exploration
- Hypothesis Testing
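The exact-versus-approximate trade-off can be sketched as a two-stage search: a fast pass that matches symbol strings (roughly what selecting a branch does), followed by an optional exact pass on the raw values of the surviving candidates. This is an illustrative design under assumed names (`match_positions`, `refine`, `radius`), not VizTree's implementation. Symbol matching can admit windows that differ in raw values and, near bin boundaries, can miss close ones.

```python
import math
from typing import List, Sequence


def match_positions(words: Sequence[str], query: str) -> List[int]:
    """Approximate step: offsets of windows whose SAX word starts with `query`."""
    return [i for i, w in enumerate(words) if w.startswith(query)]


def refine(windows: Sequence[Sequence[float]], candidates: Sequence[int],
           pattern: Sequence[float], radius: float) -> List[int]:
    """Exact step: keep candidates whose raw window is within `radius` (Euclidean)."""
    return [i for i in candidates if math.dist(windows[i], pattern) <= radius]


if __name__ == "__main__":
    words = ["ab", "ca", "ab", "ac"]  # toy SAX words, one per window
    windows = [[0.0, 1.0], [5.0, 5.0], [0.0, 2.0], [0.0, 0.0]]
    hits = match_positions(words, "ab")
    print(hits, refine(windows, hits, [0.0, 1.0], 0.5))
```

Both windows 0 and 2 share the word ab, but only window 0 lies within distance 0.5 of the pattern (window 2 is 1.0 away).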
13. Motif Discovery
- Finding unknown patterns
- Not exact matches
- VizTree allows exploration at varying levels of precision
- E.g., cc vs. ccac
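Varying precision corresponds to looking at the thickest branches at different depths: a short prefix such as cc groups many loosely similar windows, while a full word such as ccac is more selective. A sketch of that idea with a hypothetical `top_motifs` helper on toy words:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_motifs(words: Iterable[str], depth: int, k: int = 3) -> List[Tuple[str, int]]:
    """The k most frequent prefixes of length `depth`: the thickest branches at that level."""
    counts = Counter(w[:depth] for w in words if len(w) >= depth)
    return counts.most_common(k)


if __name__ == "__main__":
    words = ["ccac", "ccac", "ccba", "abcd", "ccab"]  # toy input
    print(top_motifs(words, depth=2, k=1))  # coarse level
    print(top_motifs(words, depth=4, k=1))  # precise level
```

At depth 2 the prefix cc covers four of the five windows; at depth 4 only the two ccac windows remain together.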
14. Anomaly Detection
- Finding abnormal patterns
- Use data already seen to identify anomalies
- Anomalies appear as thin (rarely traversed) branches
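A minimal sketch of "thin branch" detection: count how often each word occurs in the data seen so far and flag the words whose share of all windows falls below a cutoff. The `max_fraction` threshold and the `thin_branches` name are assumptions for illustration; the slides do not specify how thin counts as anomalous.

```python
from collections import Counter
from typing import Iterable, List


def thin_branches(words: Iterable[str], max_fraction: float = 0.01) -> List[str]:
    """Words covering at most `max_fraction` of all windows, rarest first."""
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return []
    rare = [w for w, c in counts.items() if c / total <= max_fraction]
    return sorted(rare, key=lambda w: (counts[w], w))


if __name__ == "__main__":
    words = ["ab"] * 50 + ["ba"] * 49 + ["cc"]  # toy input with one unusual window
    print(thin_branches(words, max_fraction=0.05))
```

Only cc, seen once in 100 windows, falls under the 5% cutoff here.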
15. Comparing Series: Diff Tree
- Same parameters → same tree structure
- Compare the test branch frequencies with the reference branch frequencies
- Blue: underrepresented
- Green: overrepresented
- Red: equivalent
- Thickness: magnitude of the difference
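Because both trees share one structure, a diff tree reduces to comparing per-branch frequencies. The sketch below normalizes counts by the number of windows in each series and labels each word with the colors listed on the slide; the normalization, the tolerance `tol`, and the name `diff_tree` are assumptions, not details confirmed by the paper.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Color scheme as listed on the slide.
COLORS = {"under": "blue", "over": "green", "equivalent": "red"}


def diff_tree(reference: Iterable[str], test: Iterable[str],
              tol: float = 0.01) -> Dict[str, Tuple[str, float]]:
    """Label each word by how its share in `test` compares with its share in `reference`.

    Both series must be discretized with the same parameters so that their
    trees have the same structure. The returned difference (test share minus
    reference share) is what branch thickness would encode.
    """
    ref, tst = Counter(reference), Counter(test)
    n_ref, n_tst = sum(ref.values()), sum(tst.values())
    if n_ref == 0 or n_tst == 0:
        raise ValueError("both series need at least one window")
    result = {}
    for word in sorted(set(ref) | set(tst)):
        diff = tst[word] / n_tst - ref[word] / n_ref
        if diff > tol:
            label = "over"
        elif diff < -tol:
            label = "under"
        else:
            label = "equivalent"
        result[word] = (label, diff)
    return result


if __name__ == "__main__":
    reference = ["ab"] * 4 + ["ba"] * 4 + ["cc"] * 2
    test = ["ab"] * 6 + ["ba"] * 2 + ["cc"] * 2
    for word, (label, diff) in diff_tree(reference, test).items():
        print(word, label, COLORS[label], round(diff, 2))
```

In the toy comparison, ab is overrepresented (green), ba underrepresented (blue), and cc equivalent (red).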
16. Thoughts on VizTree (Vis.)
- Most discovery is implicit
- Manual search
- Parameter setting might be an issue
- Automation might help
- Tree visualization
- Use of screen real estate?
- Effective?
- Intuitive?
- Alternatives?
17. Thoughts on VizTree (HCI)
- Currently primarily a tool for researchers
- (Also, we might have an outdated version)
- Even so, some HCI suggestions:
- Indicate how the tree detail relates to the tree overview
- Zoom into a specific area of the time series (rather than zoom and scroll)
- Make selections in the subsequence detail relate to the subsequence overview
- Unfortunately, the least interesting patterns are the most easily accessed (branches at the root)
- Snap to branch or snap to intersection?
- Ability to turn off highlighting (undo)
18. Summary: Unique Contributions
- Fundamental support for aperiodic series
- Scalable
- Resource requirements do not grow linearly with the length of the series
- Rich visual feature set
- Global summaries
- Diff-trees between multiple series
- Local patterns and anomalies