Title: Learning Stable Multivariate Baseline Models for Outbreak Detection
1. Learning Stable Multivariate Baseline Models for Outbreak Detection
- Sajid M. Siddiqi, Byron Boots, Geoffrey J. Gordon, Artur W. Dubrawski
- The Auton Lab, School of Computer Science, Carnegie Mellon University
Presented by Robin Sabhnani from the Auton Lab
This work was partly funded by NSF grant IIS-0325581 and CDC award R01-PH000028
2. Motivation
- Large amounts of health-related data are available
- Much of this data is temporal
- Many data sources are also multivariate
3. Motivation
- When detecting anomalies, the crucial information may be hidden in the dynamics of the data as well as in the interactions between different data streams
- Our goal: learn good models for simulating baseline data, for use both in training algorithms and in detecting anomalies
- Linear Dynamical Systems are a good choice
4. Outline
- Linear Dynamical Systems
- Learning Stable Models
- Experimental Setup
- Results
- Conclusion
5. Linear Dynamical Systems (LDS)
[Figure: LDS graphical model. Hidden variables (low-dimensional) X_1, X_2, ..., X_t, X_{t+1} form a chain; each emits observed data (high-dimensional) Y_1, Y_2, ..., Y_t, Y_{t+1}.]
- Dynamics matrix $A$ models temporal evolution
- Multivariate Gaussian noise $v_t$, $w_t$ models the interaction between streams
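For reference, the model on this slide corresponds to the standard LDS equations below. The observation matrix $C$ and the covariances $Q$, $R$ are not named on the slides, and pairing $w_t$ with the state equation and $v_t$ with the observation equation is an assumed (common) convention; the state equation matches the residual $A x_t - x_{t+1}$ used later for state reconstruction.

```latex
\begin{aligned}
x_{t+1} &= A\,x_t + w_t, \qquad & w_t &\sim \mathcal{N}(0, Q) \\
y_t     &= C\,x_t + v_t, \qquad & v_t &\sim \mathcal{N}(0, R)
\end{aligned}
```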
9. Linear Dynamical Systems
- The Good
  - Linear Dynamical Systems (aka State-Space models, aka Kalman Filters) are a generalization of ARMA models and can represent a wide range of time series
  - LDS parameters can be learned from data
- The Bad
  - LDSs learned from data are often unstable
  - Simulation from an unstable LDS degenerates: the simulated values diverge
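To see what "degenerates" means in practice, here is a minimal NumPy sketch that simulates $x_{t+1} = A x_t + w_t$, $y_t = C x_t + v_t$ (an assumed standard parameterization) for one stable and one unstable dynamics matrix. The helper `simulate_lds` and all matrices are hypothetical illustrations, not the authors' code or the OTC data.

```python
import numpy as np

def simulate_lds(A, C, Q, R, x0, T, rng):
    """Sample T steps of x_{t+1} = A x_t + w_t, y_t = C x_t + v_t.

    w_t ~ N(0, Q) and v_t ~ N(0, R); returns states (T, n) and observations (T, p).
    """
    n, p = A.shape[0], C.shape[0]
    xs, ys = np.zeros((T, n)), np.zeros((T, p))
    x = np.asarray(x0, dtype=float)
    for t in range(T):
        xs[t] = x
        ys[t] = C @ x + rng.multivariate_normal(np.zeros(p), R)
        x = A @ x + rng.multivariate_normal(np.zeros(n), Q)
    return xs, ys

rng = np.random.default_rng(0)
C = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3 observed streams, 2 hidden states
Q, R = 0.01 * np.eye(2), 0.01 * np.eye(3)
x0 = [1.0, 1.0]
A_stable = np.array([[0.9, 0.2], [0.0, 0.7]])    # eigenvalues 0.9 and 0.7
A_unstable = np.array([[1.1, 0.2], [0.0, 0.7]])  # eigenvalue 1.1 lies outside the unit circle

_, y_stable = simulate_lds(A_stable, C, Q, R, x0, 200, rng)
_, y_unstable = simulate_lds(A_unstable, C, Q, R, x0, 200, rng)
print(np.abs(y_stable).max(), np.abs(y_unstable).max())
```

With an eigenvalue above 1, the simulated values grow roughly like $1.1^t$ and soon dwarf the scale of the data, whereas the stable system keeps fluctuating around zero.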
10. Stability
- Stability of an LDS depends on its dynamics matrix $A$
- Let $\lambda_1, \dots, \lambda_n$ be the eigenvalues of $A$ in decreasing order of magnitude
- $A$ is stable if $|\lambda_1| < 1$
- Constraining $|\lambda_1|$ during learning is hard, because the set of stable matrices is non-convex
- We devise an iterative optimization method that beats previous approaches in efficiency and accuracy
A Constraint Generation Approach to Learning Stable Linear Dynamical Systems, S. Siddiqi, B. Boots, G. Gordon, NIPS 2007
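A minimal NumPy sketch of the stability test on this slide, plus the kind of linear constraint a constraint-generation method can add. Based on our reading of the cited NIPS 2007 paper (treat this as an assumption): since $|\lambda_1| \le \sigma_1(A)$, requiring the top singular value $\sigma_1(A) \le 1$ is a convex but more conservative stand-in for stability. When the current estimate is unstable, its top singular vectors $u_1, v_1$ give the linear constraint $u_1^\top A v_1 \le 1$, which the current estimate violates whenever $\sigma_1 > 1$ and which is added to a quadratic program before re-solving. The QP solve and the rest of the loop are omitted, and the function names are hypothetical.

```python
import numpy as np

def spectral_radius(A):
    """|lambda_1|: the largest eigenvalue magnitude of the dynamics matrix A."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))

def is_stable(A):
    """A is stable when all eigenvalues lie strictly inside the unit circle."""
    return spectral_radius(A) < 1.0

def sigma1_cut(A):
    """Return (G, sigma_1) where <G, B> = u_1^T B v_1 is linear in B.

    For the current A, <G, A> equals sigma_1(A) >= |lambda_1|, so the
    constraint <G, B> <= 1 excludes A whenever sigma_1(A) > 1.
    """
    U, s, Vt = np.linalg.svd(A)
    G = np.outer(U[:, 0], Vt[0, :])  # u_1 v_1^T
    return G, float(s[0])

A = np.array([[1.1, 0.3], [0.0, 0.6]])  # hypothetical unstable estimate
G, sigma1 = sigma1_cut(A)
print(spectral_radius(A), is_stable(A))  # eigenvalues are 1.1 and 0.6, so not stable
print(np.sum(G * A), sigma1)             # the two numbers agree
```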
11. Stability
- Learning stable LDS models allows us to
  - Compress large temporal multivariate datasets
  - Generate realistic data sequences
  - Predict the future given some data
    - Deviations from predicted data indicate anomalies
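One simple way to turn "deviations from predicted data" into alarms is to threshold the size of the prediction residual. The sketch below assumes predictions `Y_pred` from some learned model are already available and uses a hypothetical mean-plus-k-standard-deviations rule computed over an assumed anomaly-free baseline period; neither the rule nor `flag_anomalies` comes from the slides.

```python
import numpy as np

def flag_anomalies(Y, Y_pred, n_baseline, k=3.0):
    """Return indices of time steps whose prediction residual is unusually large.

    Hypothetical rule: threshold = mean + k * std of the residual norms over
    the first n_baseline steps, which are assumed to be anomaly-free.
    """
    r = np.linalg.norm(Y - Y_pred, axis=1)  # one residual norm per time step
    base = r[:n_baseline]
    threshold = base.mean() + k * base.std()
    return np.flatnonzero(r > threshold)

T = 50
t = np.arange(T)
Y_pred = np.ones((T, 3))                 # stand-in for model predictions
Y = Y_pred + 0.1 * np.sin(t)[:, None]    # small, ordinary deviations
Y[40] += 5.0                             # injected outbreak-like spike
print(flag_anomalies(Y, Y_pred, n_baseline=30))  # [40]
```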
12. Experimental Setup
- Data
  - OTC drug sales data for 22 categories in 29 Pittsburgh zip codes over 60 days
  - Track all zip codes for the cough/cold category (multi-zipcode data)
  - Track all drug categories for the city of Pittsburgh (multi-drug data)
- Experiments
  - Learn an LDS model using the first 15 days, and
    - Simulate a sequence (qualitative task)
    - Reconstruct the state sequence (quantitative task)
    - Predict future occurrences (quantitative task)
- Algorithms
  - Constraint Generation (our method)
  - LB-1 (state-of-the-art stability algorithm)
  - Least Squares (naïve, no stability guarantees)
Subspace Identification with guaranteed stability using constrained optimization, S. Lacy and D. Bernstein, ACC 2002
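The slides do not spell out how the LDS itself is fit from the 15 training days. A common approach, and to our understanding the starting point in the cited NIPS 2007 paper, is SVD-based subspace identification: stack delayed observations into a block Hankel matrix, take its top-$n$ SVD, and read off state estimates. The sketch below is a simplified, noise-free illustration with hypothetical names (`subspace_id`, `k`), not the authors' implementation; its last step fits $A$ by unconstrained least squares, i.e. the "Least Squares" baseline, which is where a stability-constrained method would differ.

```python
import numpy as np

def subspace_id(Y, n, k):
    """Simplified SVD-based subspace identification (sketch, not the authors' code).

    Y: (T, p) observations; n: hidden state dimension; k: block rows of the Hankel matrix.
    Returns C_hat (p, n), state estimates X_hat (n, T-k+1) and least-squares A_hat (n, n).
    """
    T, p = Y.shape
    cols = T - k + 1
    # Block Hankel matrix: column j stacks y_j, y_{j+1}, ..., y_{j+k-1}
    D = np.vstack([Y[i:i + cols].T for i in range(k)])  # shape (k*p, cols)
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    C_hat = U[:p, :n]                # first block of the estimated observability matrix
    X_hat = np.diag(s[:n]) @ Vt[:n]  # states X_hat = Sigma_n V_n^T
    # Unconstrained least squares: regress x_{t+1} on x_t (no stability guarantee)
    A_hat = X_hat[:, 1:] @ np.linalg.pinv(X_hat[:, :-1])
    return C_hat, X_hat, A_hat

# Noise-free toy data from a known, stable system (hypothetical)
theta = 0.3
A_true = 0.95 * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])
C_true = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
x = np.array([1.0, 0.0])
Y = []
for _ in range(60):
    Y.append(C_true @ x)
    x = A_true @ x
Y = np.array(Y)

C_hat, X_hat, A_hat = subspace_id(Y, n=2, k=5)
print(np.linalg.eigvals(A_hat), np.linalg.eigvals(A_true))
```

On noise-free data the recovered states differ from the true ones only by an invertible change of basis, so $\hat A$ has the same eigenvalues as the true dynamics matrix. With real, noisy counts and only 15 days of data, the least-squares $\hat A$ may well come out unstable, which is the problem the stable methods address.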
13. Data Simulations
- Instability causes Least-Squares simulations to diverge
- Constraint Generation yields the most realistic simulations that are also stable
14. State Reconstruction
- Reconstruction error is obtained by computing the residual $\sum_t \|A x_t - x_{t+1}\|_2^2$, where $x_t$ are the estimated states
- Least squares has the best score by definition, since it is learned by regressing $x_{t+1}$ on $x_t$, but at the cost of instability
- Stable methods trade off reconstruction error against stability
- Constraint Generation learns the most accurate models that are also stable
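A minimal NumPy sketch of the reconstruction score on this slide and of the unconstrained least-squares estimate of $A$. The state sequence is random stand-in data and the function names are hypothetical. Because the least-squares $A$ minimizes exactly this residual, any other matrix, including any stable one, can only score equal or worse, which is the "best score by definition" point.

```python
import numpy as np

def reconstruction_residual(A, X):
    """Sum over t of ||A x_t - x_{t+1}||_2^2 for a state sequence X of shape (n, T)."""
    diff = A @ X[:, :-1] - X[:, 1:]
    return float(np.sum(diff ** 2))

def least_squares_dynamics(X):
    """Unconstrained least-squares A: regress x_{t+1} on x_t (no stability guarantee)."""
    return X[:, 1:] @ np.linalg.pinv(X[:, :-1])

rng = np.random.default_rng(1)
X = np.cumsum(rng.normal(size=(3, 15)), axis=1)  # hypothetical states: 3 dims, 15 steps
A_ls = least_squares_dynamics(X)
A_other = 0.5 * A_ls  # an arbitrary alternative matrix, for comparison
print(reconstruction_residual(A_ls, X) <= reconstruction_residual(A_other, X))  # True
```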
15. Prediction (preliminary results)
- Average prediction error: track (filter) up to time $t$, then simulate up to time $t'$ and calculate the sum of squared errors; average this over all $t$ and $t' > t$
- Stable methods yield results superior to least squares
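The evaluation on this slide can be sketched as follows, under our reading of it: run a Kalman filter up to time $t$, roll the filtered state forward with the noise-free dynamics to each later time $t'$, and average the squared error against the observations over all pairs $t' > t$. The model matrices, `Q`, `R`, the prior `m0`, `P0`, and the function names are assumptions for illustration, not the authors' evaluation code.

```python
import numpy as np

def kalman_filter(Y, A, C, Q, R, m0, P0):
    """Filtered means E[x_t | y_0..y_t] via the standard Kalman recursions."""
    m, P = m0, P0
    means = []
    for y in Y:
        # Measurement update with y_t
        S = C @ P @ C.T + R
        K = P @ C.T @ np.linalg.inv(S)
        m = m + K @ (y - C @ m)
        P = P - K @ C @ P
        means.append(m)
        # Time update: prior for x_{t+1}
        m = A @ m
        P = A @ P @ A.T + Q
    return np.array(means)

def average_prediction_error(Y, A, C, Q, R, m0, P0):
    """Filter up to t, roll the mean forward to every t' > t, and average
    ||y_t' - C A^(t'-t) x_{t|t}||^2 over all pairs (t, t')."""
    means = kalman_filter(Y, A, C, Q, R, m0, P0)
    errors = []
    for t in range(len(Y) - 1):
        x = means[t]
        for tp in range(t + 1, len(Y)):
            x = A @ x  # noise-free simulation one step further
            errors.append(np.sum((Y[tp] - C @ x) ** 2))
    return float(np.mean(errors))

# Hypothetical noise-free data from a known stable system
theta = 0.3
A = 0.9 * np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
C = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
x = np.array([1.0, -1.0])
Y = []
for _ in range(30):
    Y.append(C @ x)
    x = A @ x
Y = np.array(Y)

Q, R = 1e-6 * np.eye(2), 1e-6 * np.eye(3)
m0, P0 = np.zeros(2), np.eye(2)
err_true = average_prediction_error(Y, A, C, Q, R, m0, P0)
err_wrong = average_prediction_error(Y, 0.5 * A, C, Q, R, m0, P0)
print(err_true, err_wrong)
```

With the true model on noise-free data the error is essentially zero, while a mis-specified dynamics matrix (here $0.5A$) gives a clearly larger error even though its filter still tracks the observations.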
16. Conclusion
- Linear Dynamical Systems are effective at modeling multivariate time series data
- Stability is crucial for accurate performance
- Stable methods perform better in baseline generation and prediction on OTC data
- Constraint Generation learns a more accurate model with more realistic simulations, and does so most efficiently
- Further work is needed on the prediction accuracy metric
A Constraint Generation Approach to Learning Stable Linear Dynamical Systems, S. Siddiqi, B. Boots, G. Gordon, NIPS 2007
17. Thank You! Questions?
- Further questions to siddiqi_at_cs.cmu.edu