Title: AIMS: An Immersidata Management System
1 AIMS: An Immersidata Management System
- Cyrus Shahabi
- Computer Science Department
- Integrated Media Systems Center
- University of Southern California
- Los Angeles, CA 90089-0781
- shahabi_at_usc.edu
- http://infolab.usc.edu
2 Outline
- Definitions and Motivating Applications
- Immersive Data Types (focus: immersidata)
- AIMS Architecture
- Subsystems: Acquisition, Storage, and Querying
- Current Status (demo, if time permits)
- Conclusion and Future Work
3 Immersive Environments
- Immersive environments allow a user to become immersed within an augmented or virtual reality environment in order to interact with people, objects, places, and databases.
- Examples
  - Office of the Future (UNC)
  - Fire Fighter Training System (Georgia Tech)
  - Planetary Exploration (JPL)
  - Physical/Occupational Therapy System (Haifa Univ.)
  - Virtual Classroom and Office (USC IMSC)
  - Haptic Museum (USC IMSC)
  - MRE: Mission Rehearsal Exercise (USC ICT)
4 Thesis (1)
- It is absolutely critical to understand the data generated by and for immersive environments
- For example, from the data acquired from a user's interactions with an immersive environment (i.e., immersidata), we can learn about the user's behavior to:
  - Study human factor issues
  - Measure the effectiveness of the environment
  - Customize the information delivery
  - Identify pitfalls in the system
  - Better understand the user's intentions
  - Improve the system performance
- For the immersive and multimedia community!
- For the database community
  - Immersive sensors are the user interfaces of the future; as a research community we should study their generated data or we will miss the boat.
5 Example Immersive Sensor Data Streams
<Si, x, y, z, t, v>
6 Application (1): Immersive Sensor Pattern Recognition, On-Line Query and Analysis
[Figure labels: Immersive environment; Recognition System; DB of Labeled Patterns]
7 Application (1): American Sign Language (ASL) as well-defined patterns
1. User makes ASL signs w/ a glove
2. Sensor values sampled over time
3. Semantic description of hand
4. ASL signs recognized
- Recognition modules
  - SVD
  - Bayesian Classifiers
  - Neural Net
[Figure labels: Acquisition Module; Spatio-Temporal (moving sensors) Query Evaluation]
8 Application (1): ASL On-Line QA
- On-line query and analysis challenges
  - A hand sign is composed of a sequence of data samples across multiple sensor streams
  - A sequence for one sign has no fixed length (i.e., we can't tell when one sign ends and the next starts!)
- An example statement in American Sign Language (ASL)
[Figure labels: shoes; I; yellow]
- Two problems (a chicken-and-egg problem) with interdependent solutions should be addressed
  - Isolate signs
  - Recognize the isolated sign
9 Application (2): Immersive Classroom, Off-Line Query and Analysis
- Study attention performance for normal and ADHD-diagnosed children
- A classroom as a virtual environment (virtual students, a virtual teacher, desks, a blackboard, a window to the playground, doors)
- Presence of distracters
  - Paper airplane
  - Ambient classroom noise
  - Students walking
  - Cars passing outside, visible through the window
10 Application (2): IC Off-Line QA
- User, wearing an HMD, is immersed into the class
- Trackers monitor body movements and stream data to the database
- Task: pressing a button when a particular letter pattern is seen on the virtual blackboard (e.g., AX)
[Figure labels: Displayed Characters; Head sensor data; Arm sensor data; Leg sensor data; Mouse Clicks; Distracters; DB]
11 Application (2): IC Off-Line QA
- Off-line query and analysis
  - Range-sum queries
    - Sum of body movements
    - Average reaction time to the patterns
    - Number of correct hits
  - Classification and clustering
    - Use a classification technique (e.g., SVM) to differentiate between normal and ADHD-diagnosed subjects
- Distinguishing hyperactive kids from normal ones by automatically analyzing tracker data: major impact in psychotherapy, able to discriminate and specify diagnosis in a manner not possible using existing traditional methods
12 Thesis (2)
- Immersive applications in training and simulation domains share common data storage and analysis requirements (i.e., dealing w/ sensor data streams, aka immersidata)
- Hence, instead of building customized systems for the acquisition, storage, and querying needs of each immersive application, one can design a general-purpose system addressing many of the shared requirements
13 Common Data Components of Immersive Environments [ACM-ITP02]
- User (subject(s))
- Virtual Space
- Actor Objects
- Mission (task objective)
- Immersive Data Types
  - Conventional Data: user data
  - Spatio-Temporal Data: immersive space/time data
  - Immersidata: Sensor Data Streams
14 Focus: Immersidata [MIS99]
- Data acquired from a user's interaction with the immersive environment
  - Subject body positions
  - Subject recognized gestures
- Can be analyzed to learn about the user's behavior
- Specifications
  - Multidimensional: <si, x, y, z, t, v>
  - Spatio-Temporal
  - Continuous Data Streams (CDS)
  - Potentially large in size and bandwidth requirements
  - Noisy
..., <sn, xn, yn, zn, hn, pn, rn, tn>, ..., <s1, x1, y1, z1, h1, p1, r1, t1>, ...
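A minimal sketch of how one such tuple could be represented and selected by time. The class name `Sample`, its field names, and the helper `in_window` are illustrative assumptions; the slide only gives the tuple layout (sensor id, x, y, z, t, value).

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Sample:
    """One immersidata tuple (field names are assumptions, not from AIMS)."""
    sensor_id: int  # which sensor produced the reading
    x: float        # position of the sensor in space
    y: float
    z: float
    t: float        # timestamp of the reading
    v: float        # value reported by the sensor


def in_window(stream: Iterable[Sample], t_start: float, t_end: float) -> Iterator[Sample]:
    """Yield the samples whose timestamp lies in [t_start, t_end]."""
    for s in stream:
        if t_start <= s.t <= t_end:
            yield s


# Made-up readings, for illustration only.
stream = [
    Sample(1, 0.0, 0.0, 0.0, 0.00, 12.5),
    Sample(2, 0.1, 0.0, 0.2, 0.01, 40.0),
    Sample(1, 0.0, 0.1, 0.0, 0.02, 13.0),
]
recent = list(in_window(stream, 0.01, 0.02))
```

Because the generator consumes the stream one tuple at a time, the same helper works on an unbounded continuous stream as well as on a stored list.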
15 AIMS: An Immersidata Management System
[Architecture figure; labels grouped by module]
Input: Sensor Data Streams
1. Acquisition module
- DWPT basis selection for each dimension
- Transformation
2. Storage module
- Wavelets packing into disk blocks or DB BLOBs
- Immersidata storage (file-system / OR-DBMS)
3. User interaction module
- Application-specific GUI
- Pattern isolation heuristic
- Pattern matching: SVD-based measure
4. Query analysis module
- ProPolyne web services
16 Challenges of AIMS Subsystems
- Acquisition [SIGMETRICS01, ICME02]
  - Data should be filtered and transformed (similar to signals)
  - Database-friendly signal processing techniques are required
- Storage [SIGMOD03?]
  - Physical level of the storage system should be designed to store transformed data (e.g., wavelet coefficients)
  - Block allocation strategies considering query patterns
- Offline Query and Analysis [EDBT02, PODS02]
  - Approximate, progressive, and efficient polynomial analytical queries on large amounts of multidimensional data
- Online Query and Analysis [MMM03]
  - Common challenges with querying continuous data streams
  - Real-time pattern recognition on the aggregation of multiple data streams that are incrementally completing
  - Data from all streams form the meaningful data
17 1. Acquisition Module
Approaches
- INPUT: Multidimensional streams
- OUTPUT: Wavelet coefficients
- Receives multidimensional sensor streams
- In real time, selects a different basis per dimension (optimally) from the DWPT (Discrete Wavelet Packet Transform) library
- Applies a multidimensional transformation to the data (generates multi-resolution representations of the data)
- NOTE: no compression is applied; no data will be lost by this process
18 2. Storage Module
Approaches
- INPUT: Wavelet coefficients
- OUTPUT: disk blocks, metadata records
- Optimally packs related wavelet coefficients into disk blocks (to reduce future I/O cost) and stores them in the file system or within an OR-DBMS
- Records the corresponding disk block information in the DBMS (Database Management System) for future queries
19 Optimal Disk Placement for Wavelet Data: Dependency Graph (Haar wavelets)
20 Optimal Disk Placement for Wavelet Data: Tiling - Blocking (Haar wavelets)
21 3. User Interaction Module
Approaches
- INPUT: Camera/speech/tracker/immersive-sensor
- OUTPUT: application commands and queries, user profile/state and application context
- Receives data from various input devices (beyond keyboard and mouse) used by the user (e.g., for data visualization purposes)
- Understands the set of requested actions (SVD, mutual information)
- Translates actions to application-specific commands and/or database queries (takes user-profile context into account)
- Also stores a history of the user's interactions, to be mined off-line and/or on-line to extract user state/behavior and application context and so facilitate future interactions by the same user (e.g., personalization/customization)
22 4. Query Analysis Module
Approaches
- INPUT: Range and point queries
- OUTPUT: Aggregate values/Integrated events
- Transforms queries into the same wavelet domain as the data
- Performs queries efficiently (and perhaps approximately or progressively) in the wavelet domain
- Displays the correct resolution/granularity of aggregate value(s) and/or events to the user based on user profile (e.g., tolerable latency) and/or system requirements and/or data availability
- An event is tagged with space (e.g., latitude, longitude, and altitude), time, and a bag of attributes
23 AIMS Main Theme: Data Manipulation, Query and Analysis in the WAVELET Domain
- Main idea/distinction: storage is cheap and queries are ad hoc, so let's keep all the wavelet coefficients! (no data compression)
- Intuition: at data population time, we don't know which coefficients are more/less important
  - Different from the signal-processing objective of reconstructing the entire signal as well as possible
  - This has been observed by Garofalakis & Gibbons [SIGMOD02], but they proposed other ways to drop coefficients, assuming a uniform workload
- Opportunity: at query time, however, we know what is important to the pending query
24 AIMS Main Theme: QA of Wavelets
- Define a range-sum query as the dot product of a query vector and a data vector (also observed by Gilbert et al. [VLDB2001], but with no query transformation)
- Offline: multidimensional wavelet transform of the data
- At query time: lazy wavelet transform of the query vector (very fast)
- Dot product of query and data vectors in the transformed domain → exact result
- Choose high-energy query coefficients only → fast approximate result (90% accuracy by retrieving < 10% of data)
- Choose query coefficients in order of energy → progressive result
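The idea on this slide can be sketched in one dimension as follows. This is an illustration under stated assumptions, not the ProPolyne implementation: it uses an orthonormal Haar transform (so dot products are preserved), it transforms the query with a plain full transform rather than the lazy transform, and the names `haar` and `progressive_range_sum` are hypothetical.

```python
import math


def haar(vec):
    """Orthonormal Haar transform of a list whose length is a power of two."""
    out = [float(v) for v in vec]
    n = len(out)
    s = math.sqrt(2.0)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / s for i in range(half)]
        det = [(out[2 * i] - out[2 * i + 1]) / s for i in range(half)]
        out[:n] = avg + det  # summaries first, details after them
        n = half
    return out


def progressive_range_sum(data, lo, hi):
    """Yield successively better estimates of sum(data[lo..hi])."""
    query = [1.0 if lo <= i <= hi else 0.0 for i in range(len(data))]
    q_hat = haar(query)
    d_hat = haar(data)  # in a real system this is computed offline and stored
    # Visit query coefficients from highest to lowest magnitude (energy).
    order = sorted(range(len(q_hat)), key=lambda k: -abs(q_hat[k]))
    estimate = 0.0
    for k in order:
        if abs(q_hat[k]) < 1e-12:
            break  # remaining query coefficients are zero: nothing to fetch
        estimate += q_hat[k] * d_hat[k]
        yield estimate


# Made-up data vector of length 16; range [5, 12].
data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
estimates = list(progressive_range_sum(data, 5, 12))
exact = sum(data[5:13])
```

The last estimate equals the exact range sum up to floating-point error, and only the data coefficients matching nonzero query coefficients are ever read. Stopping the loop early gives the approximate answer the slide describes.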
25 Progressive Evaluation of Vector Queries
26 Current Status: ProPolyne Demonstration
27 AIMS with a Twist!
[Architecture figure; labels grouped by module]
Input: Remote Sensor Data Streams <x, y, z, t, value>, e.g., <lat, long, altitude, t, temperature>
1. Acquisition module
- DWPT basis selection for each dimension
- Transformation
2. Storage module
- Wavelets packing into disk blocks or DB BLOBs
- Sensor Data storage (file-system / DBMS)
3. User interaction module
- Application-specific GUI
- Pattern isolation heuristic
- Pattern matching: SVD-based measure
4. Query analysis module
- ProPolyne web services
28 Conclusion and Future Work
- A new application domain, immersive applications, and one of its data sets, immersidata, were introduced
- Database challenges involved in managing immersidata were discussed
  - Some direct adoption of typical database research techniques (e.g., OLAP)
  - Some modifications/extensions of current research contributions (e.g., in the area of data streams) that are not applicable immediately
- The design of AIMS, an innovative data systems architecture, was reported
- Future Work
  - I/O-efficient ways for wavelet transformation and incremental update
  - Hybrid sorting of both data and query coefficients
  - Prototypical implementation of an end-to-end application using AIMS
  - Performance evaluation
29 Application (3): Physical/Occupational Therapy, Both On-Line and Off-Line QA
- Rehabilitation research using virtual environments and gaming technologies
- Enables individuals with severe physical disabilities to use their residual motor abilities in more efficient and less fatiguing ways
- Patient watches her video projected on a 2-D virtual environment
- Video cameras track body movements
- Animated target characters are manipulated within the environment
- Patient is asked to hit the targets to score more points
- Potential data analysis tasks
  - Offline analysis of user performance in order to find specific motor disabilities
  - Online analysis of body movements to add more targets in the directions that need more exercise
31 Haptic Data Acquisition [SIGMETRICS01]
- Temporal aspect: at what rate should the values of the sensors be sampled?
  - Trade-off between accuracy and bandwidth utilization
- Fixed Sampling
  - Sampling at a constant rate; the max value of the speed is a function of system speed and/or the haptic glove
- Group Sampling
  - Intuitive grouping of sensors; a different sampling rate for each group
- Adaptive Sampling
  - Dynamic sampling within a window of the session; every sensor is sampled at an individual optimal rate
32 ProPolyne Features
- Measure can be any polynomial on any combination of attributes
  - Can support COUNT, SUM, AVERAGE
  - Also supports Covariance, Kurtosis, etc.
  - All using one set of pre-computed aggregates
- Independent of how well the data set can be compressed/approximated by wavelets
  - Because we show that range-sum queries can always be approximated well by wavelets (not always HAAR though!)
- Low update cost: $O(\log^d N)$
- Can be used for exact, approximate, and progressive range-sum query evaluation
33 Polynomial Range-Sum Queries
- Polynomial range-sum queries: $Q(R, f, I)$
  - $I$ is a finite instance of schema $F$
  - $R \subseteq \mathrm{Dom}(F)$ is the range
  - $f : \mathrm{Dom}(F) \to \mathbb{R}$ is a polynomial of degree $d$
34 Polynomial Range-Sum Queries as Vector Queries
- The data frequency distribution of $I$ is the function $\Delta_I : \mathrm{Dom}(F) \to \mathbb{Z}$ that maps a point $x$ to the number of times it occurs in $I$
- To emphasize the fact that a query is an operator on the data frequency distribution, we write $Q(R, f, \Delta_I) = \sum_{x \in R} f(x)\,\Delta_I(x)$
- Example: $\Delta(25,50) = \Delta(28,55) = \Delta(57,120) = 1$ and $\Delta(x) = 0$ otherwise
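The example distribution can be turned into a small direct (untransformed) evaluation of polynomial range-sums. This is a sketch under assumptions: the helper `range_sum` is hypothetical, reading the two attributes as (age, salary in thousands) is borrowed from the later SUM(salary) slide, and the query range is made up.

```python
from collections import Counter


def range_sum(delta, in_range, f):
    """Sum f(x) over the range, weighted by how often x occurs in the instance."""
    return sum(f(x) * count for x, count in delta.items() if in_range(x))


# The three points from the example; every other point has frequency 0.
instance = [(25, 50), (28, 55), (57, 120)]
delta = Counter(instance)  # data frequency distribution


def in_range(p):
    """Hypothetical range: 25 <= age <= 40 and 50 <= salary <= 150."""
    return 25 <= p[0] <= 40 and 50 <= p[1] <= 150


count = range_sum(delta, in_range, lambda p: 1)             # COUNT: f(x) = 1
total = range_sum(delta, in_range, lambda p: p[1])          # SUM(salary)
average = total / count                                     # AVERAGE(salary)
sum_age = range_sum(delta, in_range, lambda p: p[0])
sum_prod = range_sum(delta, in_range, lambda p: p[0] * p[1])  # degree-2 polynomial
covariance = sum_prod / count - (sum_age / count) * (total / count)
```

COUNT, SUM, AVERAGE, and covariance all reduce to range-sums of polynomials of degree at most 2 over the same distribution, which is why one set of precomputed aggregates can serve all of them.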
35 Overview of Wavelets
The H operator computes a local average of array a at every other point to produce an array of summary coefficients Ha. Example (Haar): h = [1/2, 1/2]
The G operator measures how much the values in array a vary inside each of the summarized blocks to compute an array of detail coefficients Ga. Example (Haar): g = [1/2, -1/2]
The detail coefficients are aka the wavelet coefficients of a.
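A minimal sketch of one level of the Haar H and G operators with the filters given above, plus the inverse step. The function `reconstruct` and the sample array are additions for illustration.

```python
def H(a):
    """Summary coefficients: average of each adjacent pair (h = [1/2, 1/2])."""
    return [(a[2 * i] + a[2 * i + 1]) / 2 for i in range(len(a) // 2)]


def G(a):
    """Detail coefficients: half-difference within each pair (g = [1/2, -1/2])."""
    return [(a[2 * i] - a[2 * i + 1]) / 2 for i in range(len(a) // 2)]


def reconstruct(summary, detail):
    """Invert one level: each (s, d) pair gives back (s + d, s - d)."""
    out = []
    for s, d in zip(summary, detail):
        out.extend([s + d, s - d])
    return out


a = [2, 4, 6, 6, 7, 1, 5, 5]
Ha = H(a)   # [3.0, 6.0, 4.0, 5.0]
Ga = G(a)   # [-1.0, 0.0, 3.0, 0.0]
restored = reconstruct(Ha, Ga)
```

Applying H and G again to `Ha` gives the next coarser level. Since every level can be inverted exactly, keeping all coefficients loses no data.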
36 Naive Evaluation of Vector Queries Using Wavelets
- Hence, vector queries can be computed in the wavelet-transformed space as the dot product of the transformed query vector and the transformed data vector
- Algorithm
  - Off-line transformation of the data vector (or the data distribution function, i.e., $\Delta$, to be exact)
    - $O(|I|\, l^d \log^d N)$ for sparse data, $O(|I|) = O(N^d)$ for dense data
  - Transform the query vector at submission
    - $O(N^d)$!
  - Sum up the products of the corresponding elements of the data and query vectors
    - Retrieving the elements of the data vector: $O(N^d)$!
37 Fast Evaluation of Vector Queries Using Wavelets
- Main intuitions
  - The query vector can be transformed quickly because most of the coefficients are known in advance
  - The transformed query vector has a large number of negligible (e.g., zero) values (independent of how well the data can be approximated by wavelets)
- Example: Haar filter, COUNT function on R = [5, 12] on the domain of integers from 0 to 15
[Figure labels: $GH^3a$; $H^4a$; "At each step, you know the zeros"]
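The sparsity claimed for this example can be checked with a short sketch. It uses a plain full Haar decomposition (filters h = [1/2, 1/2], g = [1/2, -1/2]) rather than the lazy transform itself, only to count how many query coefficients are nonzero. The function name `haar_coefficients` is hypothetical.

```python
def haar_coefficients(a):
    """Full Haar decomposition: [final summary] followed by details, coarse to fine."""
    details = []
    cur = list(a)
    while len(cur) > 1:
        half = len(cur) // 2
        summary = [(cur[2 * i] + cur[2 * i + 1]) / 2 for i in range(half)]
        detail = [(cur[2 * i] - cur[2 * i + 1]) / 2 for i in range(half)]
        details = detail + details  # coarser levels go in front
        cur = summary
    return cur + details


N = 16
# COUNT query on the range [5, 12]: indicator vector over the domain 0..15.
query = [1.0 if 5 <= x <= 12 else 0.0 for x in range(N)]
coeffs = haar_coefficients(query)
nonzero = sum(1 for c in coeffs if c != 0.0)
```

Detail coefficients vanish wherever a pair lies entirely inside or entirely outside the range, so at most two per level (one at each range boundary) can be nonzero. A lazy transform would compute only those boundary coefficients instead of the whole vector.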
38 Exact Evaluation of Vector Queries
Query: SUM(salary) when (25 < age < 40) and (55k < salary < 150k)
Number of wavelet coefficients: 837
Number of nonzero coordinates: 4380
39 Approximate Evaluation of Vector Queries
40 Optimal Disk Placement for Wavelet Data
- The goal is to efficiently store wavelet coefficients
  - Efficiently means fast access to stored data, low I/O complexity, and few disk accesses
- How to achieve this: create locality of reference
- Designed for wavelet overlap queries, but can be extended to polynomial range-sum queries over multidimensional data
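One way to create such locality for Haar coefficients is to tile the coefficient tree into small subtrees and store each subtree in one disk block. The sketch below is an assumption for illustration and not necessarily the placement AIMS uses. It assumes the detail coefficients are indexed in heap order (root = 1, children 2i and 2i + 1) and relies on the fact that reconstructing a point needs the coefficients on its root-to-leaf path. The names `block_of` and `path_to_root` are hypothetical.

```python
def block_of(i, h):
    """Map heap-ordered coefficient index i (root = 1) to a tile id.

    The tree is cut into bands of h levels; a tile is the subtree of
    height h hanging off a node at the top of a band.
    """
    depth = i.bit_length() - 1            # level of node i (root has depth 0)
    band = depth // h                     # which band of h levels holds the node
    tile_root = i >> (depth - band * h)   # ancestor at the top of that band
    return (band, tile_root)


def path_to_root(i):
    """Indices of node i and all of its ancestors."""
    path = []
    while i >= 1:
        path.append(i)
        i //= 2
    return path


leaf = 4095                               # a finest-level coefficient, 12 levels deep
path = path_to_root(leaf)
blocks = {block_of(i, 4) for i in path}   # tiles of height 4: up to 15 coefficients each
```

With tiles of height 4, the 12 coefficients on this path fall into 3 blocks, so the path costs 3 block reads instead of up to 12.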
41 Optimal Disk Placement for Wavelet Data: Discrete Wavelet Transform
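42 SVD Background
- The idea of SVD is based on the following theorem of linear algebra
  - If matrix $A \in \mathbb{R}^{m \times n}$, then there exist column-orthonormal matrices $U$ and $V$ such that $A = U \Sigma V^{T}$, where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$, and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_p)$ with $p = \min(m, n)$ is a diagonal matrix such that $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$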
43 Weighted-Sum SVD
- Each data sequence can be represented as a matrix, where the columns (r) are the sensors and hence their number is fixed
- The similarity metric of two data sequences is defined on square matrices
  - To eliminate the effect of the two matrices having different numbers of rows (i.e., the time dimension), each matrix is multiplied by its transpose, which gives an r x r square matrix
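The normalization step described above can be sketched as follows. Only the product of the transpose with the matrix is shown; the SVD and the weighted sum that define the similarity are not recoverable from this slide and are left out. The function name `gram` and both sample sequences are made up.

```python
def gram(seq):
    """Return A^T A for a sequence stored as a list of rows (one row per time step)."""
    r = len(seq[0])  # number of sensors (columns)
    return [[sum(row[i] * row[j] for row in seq) for j in range(r)]
            for i in range(r)]


# Two sequences over the same 3 sensors but with different durations.
short = [[1.0, 2.0, 0.0],
         [0.0, 1.0, 1.0]]
long_seq = [[1.0, 0.0, 0.0],
            [1.0, 1.0, 0.0],
            [0.0, 1.0, 1.0],
            [0.0, 0.0, 2.0]]

g_short = gram(short)     # 3 x 3, although the sequence has 2 rows
g_long = gram(long_seq)   # 3 x 3, although the sequence has 4 rows
```

Both results are r x r, so sequences of different lengths can be compared through matrices of identical shape.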
44 Weighted-Sum SVD
45 Weighted-Sum SVD
46 The Ridge-Climbing Heuristic
- Procedure
  - Compute the accumulated similarity values (ASVs) between the input sequence and all vocabulary sequences
  - Keep track of all ASVs
  - For each vocabulary sequence, check whether the ASV is monotonically increasing and whether a maximum is reached
    - Yes: put this vocabulary sequence into the candidate pool
  - Choose the vocabulary sequence from the candidate pool with the biggest maximal value
  - Isolate the recognized stream
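The procedure can be sketched as below, under one reading of it: a vocabulary sequence becomes a candidate when its ASVs climb monotonically and then drop, and the point before the drop is taken as the maximum. Computing the ASVs themselves (the SVD-based similarity) is not shown; the ASV lists, the function names `ridge_peak` and `recognize`, and all numeric values are made up for illustration.

```python
def ridge_peak(asvs):
    """Return (index, value) of the peak of a monotone climb, or None.

    None means either the values never climbed or no maximum has been
    reached yet (the sequence is still increasing).
    """
    for i in range(1, len(asvs)):
        if asvs[i] < asvs[i - 1]:
            # First decrease: the previous sample is the maximum of the climb.
            return (i - 1, asvs[i - 1]) if i - 1 > 0 else None
    return None


def recognize(asv_by_sign):
    """Pick the candidate with the largest peak; return (sign, end_index, peak)."""
    candidates = {}
    for sign, asvs in asv_by_sign.items():
        peak = ridge_peak(asvs)
        if peak is not None:
            candidates[sign] = peak
    if not candidates:
        return None
    best = max(candidates, key=lambda s: candidates[s][1])
    end_index, value = candidates[best]
    return best, end_index, value


# Hypothetical ASVs tracked over four time steps for three vocabulary signs.
asv_by_sign = {
    "like": [0.10, 0.20, 0.25, 0.30],    # still climbing: not a candidate yet
    "yellow": [0.20, 0.50, 0.80, 0.60],  # peaks at step 2
    "I": [0.30, 0.40, 0.35, 0.20],       # peaks at step 1
}
result = recognize(asv_by_sign)
```

The returned index marks where the recognized sign ends in the input, so the stream can be cut there and recognition restarted on the remainder. This is how isolating and recognizing signs are handled together.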
47 The Ridge-Climbing Heuristic
Assume the database has only three vocabulary sequences: like, yellow, and I.
Input sequence