Title: Stream Hierarchy Data Mining for Sensor Data
1Stream Hierarchy Data Mining for Sensor Data
- Margaret H. Dunham
- SMU
- Dallas, Texas 75275
- mhd_at_engr.smu.edu
- Vijay Kumar
- UMKC
- Kansas City, Missouri 64110
- kumarv_at_umkc.edu
2From Sensors to Streams: An Outline
- Data Stream Overview
- Data Stream Visualization
- Temporal Heat Map
- Data Stream Modeling
- Extensible Markov Model
- Data Stream Hierarchy
4From Sensors to Streams
- Data captured and sent by a set of sensors is
usually referred to as stream data.
- A real-time sequence of encoded signals that
contain the desired information. It is a continuous,
ordered (implicitly by arrival time, or explicitly
by timestamp or geographic coordinates)
sequence of items.
- Stream data is infinite: the data keeps coming.
5Data Stream Management Systems (DSMS)
- Software to facilitate querying and managing
stream data.
- Retrieve the most recent information from the
stream.
- Data aggregation facilitates merging together
multiple streams.
- Modeling stream data to summarize the stream.
- Visualization is needed to observe in real time the
spatial and temporal patterns and trends hidden
in the data.
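A minimal sketch of two of the DSMS tasks listed above: merging several streams and retrieving the most recent information. It assumes each stream yields `(timestamp, sensor_id, value)` tuples already ordered by timestamp. The tuple layout and the names `merge_streams` and `most_recent` are illustrative assumptions, not part of any particular DSMS.

```python
import heapq
from collections import deque


def merge_streams(*streams):
    """Lazily merge timestamp-ordered streams into one ordered stream.

    Each item is assumed to be (timestamp, sensor_id, value).
    """
    return heapq.merge(*streams, key=lambda item: item[0])


def most_recent(stream, n):
    """Return the n most recent items, holding at most n items in memory."""
    window = deque(maxlen=n)
    for item in stream:
        window.append(item)
    return list(window)


# Two hypothetical sensors reporting at different times.
sensor_a = [(1, "a", 0.1), (4, "a", 0.4)]
sensor_b = [(2, "b", 0.2), (3, "b", 0.3)]

merged = list(merge_streams(sensor_a, sensor_b))
latest = most_recent(merge_streams(sensor_a, sensor_b), 2)
print(merged)
print(latest)
```

Because the merge is lazy and the window is bounded, neither step needs to store the whole stream, which matters when the stream is infinite.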
6DSMS Problems
- Stream management development is in a state similar to
that of databases prior to the 1970s.
- Each system/researcher looks at a specific
application or system.
- No standards concerning functionality
- No standard query language
- It is unreasonable to expect that end users will access raw
data, data in the DSMS, or even data in a
summarized view.
- Domain experts need to see a higher level of
data.
7Our Proposal
- Four-level data abstraction to facilitate the
creation of actionable intelligence for domain
experts evaluating sensor data.
8From Sensors to Streams: An Outline
- Data Stream Overview
- Data Stream Visualization
- Temporal Heat Map
- Data Stream Modeling
- Extensible Markov Model
- Data Stream Hierarchy
9Assumptions for Our Research
- End User
- May not be knowledgeable concerning sensors
- Probably a Domain Expert
- May not need to see exact sensor values
- Concerned with trends and approximate values
- Need to see data from MANY sensors at one time
- Need to see data continuously in a visualization
of the stream
10Suppose There Were MANY Sensors
- Traditional line graphs would be very difficult
to read.
- Requirements for a new visualization technique:
- High level summary of data
- Handle multiple sensors at once
- Continuous
- Temporal
- Spatial
11Temporal Heat Map
- Also called Temporal Chaos Game Representation
(TCGR).
- Temporal Heat Map (THM) is a visualization
technique for streaming data derived from
multiple sensors.
- It is a two-dimensional structure similar to an
infinite table.
- Each row of the table is associated with one
sensor value.
- Each column of the table is associated with a
point in time.
- Each cell within the THM is a color
representation of the sensor value.
- Colors normalized (in our examples):
- 0 - White
- 0.5 - Blue
- 1.0 - Red
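The color scheme above can be sketched as follows. The slide gives only three anchor colors, so the linear blend between them and the normalization over all sensors together are assumptions made here for illustration. The names `value_to_rgb` and `heat_map` are hypothetical.

```python
def value_to_rgb(v):
    """Map a normalized value in [0, 1] to an (R, G, B) tuple.

    Anchors from the slide: 0 -> white, 0.5 -> blue, 1.0 -> red.
    Values in between are linearly blended (an assumption).
    """
    v = min(1.0, max(0.0, v))
    if v <= 0.5:
        t = v / 0.5                      # 0 at white, 1 at blue
        c = round(255 * (1 - t))
        return (c, c, 255)
    t = (v - 0.5) / 0.5                  # 0 at blue, 1 at red
    return (round(255 * t), 0, round(255 * (1 - t)))


def heat_map(readings):
    """Build a THM grid: one row per sensor, one column per time step.

    Values are normalized over all sensors together (an assumption).
    """
    flat = [x for row in readings for x in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0              # avoid division by zero on flat data
    return [[value_to_rgb((x - lo) / span) for x in row] for row in readings]


# Two hypothetical sensors observed at two points in time.
grid = heat_map([[0, 5], [10, 5]])
print(grid)
```

For a continuous stream, a new column would be appended for each time step; the sketch handles only a finite batch.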
12Cisco Internal VoIP Traffic Data
- Complete Stream CiscoEMM.png
- VoIP traffic data was provided by Cisco Systems
and represents logged VoIP traffic in their
Richardson, Texas facility from Mon Sep 22
12:17:32 2003 to Mon Nov 17 11:29:11 2003.
13Derwent River (UK)
Derwent Temporal Heat Map derwentrotate.png
14From Sensors to Streams: An Outline
- Data Stream Overview
- Data Stream Visualization
- Temporal Heat Map
- Data Stream Modeling
- Extensible Markov Model
- Data Stream Hierarchy
15Data Stream Modeling Requirements
- Summarization (synopsis) of data
- Use the data, NOT a SAMPLE
- Temporal and spatial
- Dynamic
- Continuous (infinite stream)
- Learn
- Forget
- Sublinear growth rate
- Clustering
16Extensible Markov Model
- Extensible Markov Model (EMM): at any time t, an EMM
consists of a Markov Chain (MC) with a designated
current node, Nn, and algorithms to modify it,
where the algorithms include:
- EMMCluster, which defines a technique for
matching between input data at time t+1 and
existing states in the MC at time t.
- EMMIncrement algorithm, which updates the MC at time
t+1 given the MC at time t and the clustering
measure result at time t+1.
- EMMDecrement algorithm, which removes nodes from
the EMM when needed.
- In addition, the EMM has associated data mining
functions such as Rare Event Detection and
Prediction.
- Jie Huang, Yu Meng, and Margaret H. Dunham,
"Extensible Markov Model," Proceedings IEEE ICDM
Conference, November 2004, pp. 371-374.
17EMM Learning
- <18,10,3,3,1,0,0>
- <17,10,2,3,1,0,0>
- <16,9,2,3,1,0,0>
- <14,8,2,3,1,0,0>
- <14,8,2,3,0,0,0>
- <18,10,3,3,1,1,0>
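The learning steps (EMMCluster followed by EMMIncrement) can be sketched on the six vectors above. This is a toy illustration under stated assumptions, not the published algorithm: it uses Euclidean distance, a fixed threshold of 1.5, and keeps the first vector of each state as its representative. The class `ToyEMM` and its method names are hypothetical, and the resulting states are not claimed to match the figure on the original slide.

```python
import math


class ToyEMM:
    """Toy EMM: states are representative vectors, links hold transition counts."""

    def __init__(self, threshold):
        self.threshold = threshold   # assumed matching radius
        self.states = []             # one representative vector per state
        self.links = {}              # (from_state, to_state) -> count
        self.current = None          # index of the current node

    def cluster(self, x):
        """EMMCluster sketch: nearest state within the threshold, else None."""
        best, best_d = None, self.threshold
        for i, rep in enumerate(self.states):
            d = math.dist(x, rep)
            if d <= best_d:
                best, best_d = i, d
        return best

    def increment(self, x):
        """EMMIncrement sketch: add a state if needed, then count the transition."""
        match = self.cluster(x)
        if match is None:
            self.states.append(tuple(x))
            match = len(self.states) - 1
        if self.current is not None:
            key = (self.current, match)
            self.links[key] = self.links.get(key, 0) + 1
        self.current = match
        return match

    def transition_probability(self, i, j):
        """Estimated probability of moving from state i to state j."""
        out = sum(n for (a, _), n in self.links.items() if a == i)
        return self.links.get((i, j), 0) / out if out else 0.0


vectors = [
    (18, 10, 3, 3, 1, 0, 0),
    (17, 10, 2, 3, 1, 0, 0),
    (16, 9, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 0, 0, 0),
    (18, 10, 3, 3, 1, 1, 0),
]

emm = ToyEMM(threshold=1.5)
sequence = [emm.increment(v) for v in vectors]
print(sequence, len(emm.states), emm.transition_probability(0, 0))
```

With these assumptions, six inputs produce fewer than six states, because nearby vectors share a state. The number of states depends entirely on the chosen threshold and distance measure.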
18EMM Forgetting
19EMM Sublinear Growth Rate
Minnesota Department of Transportation (MnDot)
20From Sensors to Streams: An Outline
- Data Stream Overview
- Data Stream Visualization
- Temporal Heat Map
- Data Stream Modeling
- Extensible Markov Model
- Data Stream Hierarchy
21Traditional DBMS Data Abstraction
- Three levels of data abstraction
- Physical
- Logical
- External
- Data is normally pulled to the user by a query
22Proposed DSMS Data Abstraction
- Abstraction
- Level 0 - Physical Level
- Raw data from sensors
- Cannot be stored
- Level 1 - DSMS
- Sensor data is merged, aggregated, and cleansed.
- DSMS queries may be processed against this data.
- Level 2 - Model
- Summarization (synopsis) of data
- Level 3 - Domain Expert
- Summary Visualization
- Data is normally pushed to the user
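The four levels and the push-based flow between them can be sketched as a small pipeline. Everything here is an illustrative assumption: the function names are hypothetical, Level 1 is reduced to dropping missing readings and averaging, and Level 2 uses a running mean as a stand-in for a real model such as the EMM.

```python
from statistics import mean


def level1_dsms(raw):
    """Level 1: cleanse (drop missing values) and aggregate one time step."""
    return {
        sensor: mean(v for v in values if v is not None)
        for sensor, values in raw.items()
        if any(v is not None for v in values)
    }


def level2_model(summary, aggregated):
    """Level 2: update a synopsis (count, running mean) per sensor."""
    for sensor, value in aggregated.items():
        n, avg = summary.get(sensor, (0, 0.0))
        summary[sensor] = (n + 1, avg + (value - avg) / (n + 1))
    return summary


def run_hierarchy(stream, on_push):
    """Consume Level 0 data once and push a Level 3 view after every step."""
    summary = {}
    for raw in stream:                       # Level 0: raw data, never stored
        aggregated = level1_dsms(raw)        # Level 1: DSMS
        level2_model(summary, aggregated)    # Level 2: model
        on_push({s: round(avg, 2) for s, (_, avg) in summary.items()})  # Level 3


# Hypothetical raw readings for two time steps.
raw_stream = [
    {"s1": [1.0, None, 3.0], "s2": [None]},
    {"s1": [4.0]},
]
pushed = []
run_hierarchy(iter(raw_stream), pushed.append)
print(pushed)
```

The caller never issues a query: `on_push` is invoked whenever new data arrives, which mirrors the push behavior of the proposed hierarchy as opposed to the pull behavior of a traditional DBMS.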
23
Hierarchy | Levels | Lowest Level | Highest Level Abstraction | Inter-level Data Migration
Memory Hierarchy | n | External Storage | Subset/Cache/Buffer | Fetch/Prefetch
DBMS Data Hierarchy | 3 | Physical Storage | External View | Fetch, Prefetch
Data Warehouse | n | Operational Data | Cube/Multidimensional View | Aggregation
Stream Hierarchy | 4 | Sensor Data | Visualization/Triggers | Automatic Push
25Stream Hierarchy Summary
- Except for the inter-level functionality
requirements, each level's functionality is
independent of the others and may differ across
implementations.
- The model used must capture time and ordering of
data, be able to both learn and forget, and use
some variation of clustering.
- Visualization at the domain expert level must
capture both time and ordering. In addition, it
should be easy to read for many sets
of sensors.