Title: On Detecting Disease Outbreaks: Univariate and Multivariate Monitoring
1 On Detecting Disease Outbreaks: Univariate and Multivariate Monitoring
- Inbal Yahav
- With Galit Shmueli and Thomas Lotze
- The work was partially supported by NIH grant
RFA-PH-05-126.
2 Introduction to Biosurveillance
- Natural and bioterrorist-related disease outbreaks can harm a large population at a fast rate.
- Early awareness and quick treatment can make the difference!
- We want to detect an outbreak as early as possible
- We focus on daily aggregates of pre-diagnostic data that form time series
- Multiple series within a single source or across multiple sources
3 Desired Preliminaries
Publicly available
Data
Labeled
Where?
Signature
iid...
4 Existing Preliminaries
- Insufficient, unlabeled data
- Multiple sources
5 Projects
- Project Mimic (http://www.projectmimic.com/)
  Lead: Thomas Lotze
6 Projects
- Algorithm Combination for Improved Performance in BioSurveillance Systems
7 Projects
- Directionally-Sensitive Multivariate Control Charts
8 Detecting Outbreaks
- Explainable patterns
  - Seasonality
  - Holidays
  - Day of week (DOW)
Output: residuals (assumed to be normal)
- Common methods
  - Linear regression (most commonly used)
  - 7-day differencing
  - Moving average
  - Holt-Winters (exponential smoothing)
Output: binary (outbreak / no outbreak)
- Common methods
  - Shewhart
  - CuSum
  - EWMA
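The two stages above (remove explainable patterns, then monitor the residuals) can be sketched in Python as follows. This is a minimal illustration assuming NumPy, not the presenters' implementation: the function names, the parameters `k`, `h`, and `lam`, their default values, and the one-sided (upper) form of each chart are all assumptions made for the example.

```python
import numpy as np

def seven_day_diff(y):
    """Preprocessing by 7-day differencing: r[t] = y[t] - y[t-7]."""
    y = np.asarray(y, dtype=float)
    return y[7:] - y[:-7]

def standardize(r, n_train):
    """Scale residuals by the mean/std of the first n_train points
    (assumed here to be outbreak-free)."""
    r = np.asarray(r, dtype=float)
    mu = r[:n_train].mean()
    sd = r[:n_train].std(ddof=1)
    return (r - mu) / sd

def shewhart(z, h=3.0):
    """Alert (1) on any day whose standardized residual exceeds h."""
    return (np.asarray(z, dtype=float) > h).astype(int)

def cusum(z, k=0.5, h=4.0):
    """One-sided CUSUM: accumulate excess over k, alert when the sum exceeds h."""
    s, out = 0.0, []
    for v in z:
        s = max(0.0, s + v - k)
        out.append(int(s > h))
    return np.array(out)

def ewma(z, lam=0.3, h=3.0):
    """One-sided EWMA with the asymptotic control limit h * sqrt(lam / (2 - lam))."""
    limit = h * np.sqrt(lam / (2.0 - lam))
    e, out = 0.0, []
    for v in z:
        e = lam * v + (1.0 - lam) * e
        out.append(int(e > limit))
    return np.array(out)

# Hypothetical usage on simulated daily counts with a weekly pattern.
rng = np.random.default_rng(0)
counts = np.tile([50, 48, 47, 46, 45, 20, 18], 20) + rng.poisson(5, size=140)
z = standardize(seven_day_diff(counts), n_train=60)
alerts = {"Shewhart": shewhart(z), "CuSum": cusum(z), "EWMA": ewma(z)}
```

Each chart returns one 0/1 value per day, which is the binary output the slide refers to.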
9 Algorithm Combination for Improved Performance in BioSurveillance Systems
10 Method Combination
- Idea: linear combinations of methods
- The objective is defined as the following cost function:
- Residuals combination vs. control chart combination
11 Combining Monitoring Charts
Combine outputs: Residuals → Control Charts → weighted sum
- Shewhart (weight a1): 010100011100000001000
- CUSUM (weight a2): 010110000000000001000
- EWMA (weight a3): 000001010101010001000
- Weights sum to one: Σ ai = 1
- Combined output: 0, 0.5, 0.8, 0, 0, 0, 0, 0.2, ...
- Alert if output > threshold
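A small Python sketch of this combination rule, assuming NumPy. The three binary strings are the ones shown on the slide; the weights (1/2, 1/4, 1/4) and the threshold of 0.5 are picked only for illustration, and the helper names `bits` and `combine_alerts` are hypothetical.

```python
import numpy as np

def bits(s):
    """Turn a string such as '0101' into an integer array."""
    return np.array([int(c) for c in s])

def combine_alerts(alerts, weights, threshold):
    """Weighted sum of binary chart outputs; alert where the sum exceeds threshold.

    alerts  : array of shape (n_charts, n_days) with 0/1 entries
    weights : one weight per chart, required to sum to 1
    """
    alerts = np.asarray(alerts, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    score = weights @ alerts
    return score, (score > threshold).astype(int)

shewhart_out = bits("010100011100000001000")
cusum_out = bits("010110000000000001000")
ewma_out = bits("000001010101010001000")

score, alert = combine_alerts(
    [shewhart_out, cusum_out, ewma_out],
    weights=[0.5, 0.25, 0.25],  # illustrative weights, not fitted
    threshold=0.5,              # illustrative threshold
)
```

With these weights a day is flagged only when Shewhart agrees with at least one other chart, which shows how the weights and the threshold together define the combination rule.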
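12 Residuals Combination
Raw data → Residuals → Combined Residuals → Monitor
- Residuals R1, weight a1
- Residuals R2, weight a2
- Residuals R3, weight a3
- Combined residuals: Σ ai Ri
- Weights sum to one: Σ ai = 1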
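For comparison with combining chart outputs, here is a minimal Python sketch of combining residuals before monitoring, assuming NumPy. The function name `combine_residuals` and the example numbers are hypothetical; in practice each residual series would come from a different preprocessing method applied to the same raw data.

```python
import numpy as np

def combine_residuals(residuals, weights):
    """Return the weighted sum of residual series, sum_i a_i * R_i.

    residuals : array of shape (n_methods, n_days)
    weights   : one weight per method, required to sum to 1
    """
    R = np.asarray(residuals, dtype=float)
    a = np.asarray(weights, dtype=float)
    if not np.isclose(a.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    return a @ R

# Hypothetical residuals from three preprocessing methods over two days.
combined = combine_residuals(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    weights=[0.5, 0.25, 0.25],
)
```

The combined series is then passed to a single control chart, so only one alert threshold has to be chosen.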
13 Benefits of Algorithm Combination
- Current methods capture different types of outbreak, but the outbreak signature is unknown
- Multiple testing as a solution → increases the false-alert (FA) rate exponentially
→ Combine methods
- The alert threshold depends on data characteristics (mean, variance), but those may change over time
→ Combine dynamically
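The effect of multiple testing on the false-alert rate can be illustrated with a short calculation. The sketch below makes an assumption that is not stated on the slide: the m charts are independent and each alerts falsely with probability alpha on a given day. Under that assumption the chance of at least one false alert is 1 - (1 - alpha)^m.

```python
def combined_false_alert_rate(alpha, m):
    """Probability of at least one false alert among m independent tests,
    each with false-alert probability alpha (independence is an assumption)."""
    return 1.0 - (1.0 - alpha) ** m

# Alerting whenever any one of several charts signals inflates the rate quickly.
rates = {m: combined_false_alert_rate(0.05, m) for m in (1, 3, 10)}
```

The no-false-alert probability (1 - alpha)^m shrinks geometrically in m, which is the growth the slide describes.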
14 Illustration
Inject 20 step outbreaks
15 Illustration (cont.)
Cost for different thresholds
[Figure: cost vs. threshold for CUSUM, EWMA, Shewhart, and a combination with weights Shewhart 1/2, CUSUM 1/4, EWMA 1/4]
16 Illustration (cont.)
- Different threshold
- Different combination rule
  - (ai = 1/3)
Infinite possibilities to combine → mixed-integer program (MIP)
17 Formulate as MIP
18 Multivariate Analysis: Directionally-Sensitive Multivariate Control Charts with an Application to BioSurveillance
19 Introduction
- Multiple series
  - Different diseases
  - Multiple symptoms
  - Multiple locations
  - Multiple data sources
  - Additional information such as over-the-counter (OTC) sales
- Multiple univariate testing as a solution
  - FA increases exponentially
  - Misses weak multivariate outbreak signatures
20 Introduction
- Improve detection using correlation between multiple series
- Main challenge: directional sensitivity (increase in at least one of the means)
21 Existing Methods: Non-Directional
- Multivariate Shewhart: Hotelling
-
- Alert if
- Multivariate EWMA (Lowry et al., 1992)
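The formulas for these two charts did not survive on this slide, so the Python sketch below uses their standard textbook forms rather than the slide's exact notation. It assumes NumPy, a known in-control mean `mu` and covariance `sigma`, and a smoothing constant `lam`; all names and defaults are illustrative. An alert would be raised when the statistic exceeds a chosen control limit.

```python
import numpy as np

def hotelling_t2(x, mu, sigma):
    """Hotelling statistic T^2 = (x - mu)' Sigma^{-1} (x - mu) for one day."""
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(d @ np.linalg.solve(np.asarray(sigma, dtype=float), d))

def mewma_t2(X, mu, sigma, lam=0.3):
    """Multivariate EWMA statistics for the rows of X (one row per day).

    z_t = lam * (x_t - mu) + (1 - lam) * z_{t-1}, with z_0 = 0, and
    T2_t = z_t' Sigma_z^{-1} z_t, where
    Sigma_z = lam / (2 - lam) * (1 - (1 - lam)^(2t)) * Sigma.
    """
    X = np.asarray(X, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    z = np.zeros(X.shape[1])
    stats = []
    for t, x in enumerate(X, start=1):
        z = lam * (x - mu) + (1.0 - lam) * z
        scale = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        stats.append(float(z @ np.linalg.solve(scale * sigma, z)))
    return np.array(stats)

# Hypothetical two-series example with identity covariance.
X = np.array([[1.0, 2.0], [0.0, 0.5], [3.0, -1.0]])
t2_daily = np.array([hotelling_t2(x, [0.0, 0.0], np.eye(2)) for x in X])
t2_mewma = mewma_t2(X, [0.0, 0.0], np.eye(2), lam=0.3)
```

Both statistics are quadratic forms, so they react to deviations in any direction, including decreases, which is why these charts count as non-directional.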
22 Existing Methods: Directionally Sensitive
- Hotelling (implementable)
- Follmann (1996): correction to bi-directional Hotelling
- Testik and Runger (TR, 2006): quadratic programming
- Both assume iid normal data and a known covariance matrix
- We generalize to multivariate EWMA (with/without restart)
- Which one performs better in practice?
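To make the idea of a directional correction concrete, here is a hedged Python sketch of a Follmann-style rule as we understand it: alert when the Hotelling statistic exceeds the critical value at level 2*alpha and the deviations from the mean sum to a positive number. This reading, the function name, and the choice to pass the critical value in as `crit_2alpha` are assumptions for illustration, so the details should be checked against Follmann (1996). The Testik and Runger quadratic-programming statistic is not sketched here.

```python
import numpy as np

def follmann_style_alert(x, mu, sigma, crit_2alpha):
    """Directional alert: large T^2 AND a net increase across the series.

    crit_2alpha is the upper critical value of the T^2 statistic at level
    2 * alpha; with a known covariance this is a chi-square quantile with
    p degrees of freedom (supplied by the caller in this sketch).
    """
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    t2 = float(d @ np.linalg.solve(np.asarray(sigma, dtype=float), d))
    return bool(t2 > crit_2alpha and d.sum() > 0.0)

# For p = 2 the chi-square upper tail is exp(-x / 2), so the critical value
# at level 2 * alpha = 0.10 is -2 * ln(0.10), roughly 4.605.
crit = -2.0 * np.log(0.10)
up = follmann_style_alert([2.0, 2.0], [0.0, 0.0], np.eye(2), crit)
down = follmann_style_alert([-2.0, -2.0], [0.0, 0.0], np.eye(2), crit)
```

The same T^2 value of 8 produces an alert for the increase but not for the decrease, which is the directional behavior the slide asks for.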
23 Evaluating Performance
- False alert rate
- Impact of cross-correlation (magnitude)
- Number of series
- Robustness to assumptions
- Length of training data (unknown covariance matrix)
- Autocorrelation
- Multivariate Poisson distribution
- True alert rate
24 Flavor of Results: FA
Distribution of FA as a function of training data length
(figure annotations: higher variance, higher bias)
25 Flavor of Results: TA
- We inject outbreaks of size o into a subset s of the p series
- How does the true alert rate change?
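The injection step can be sketched in Python as follows, assuming NumPy. The slide does not specify the outbreak shape or the units of o, so this example assumes an additive step of constant size over a fixed window. The function name and the `start` and `length` parameters are hypothetical.

```python
import numpy as np

def inject_step_outbreak(X, subset, size, start, length):
    """Return a copy of X with a step outbreak added to the chosen series.

    X      : array of shape (n_days, p), one column per series
    subset : indices of the series that receive the outbreak (the set s)
    size   : additive outbreak magnitude (o), in the units of X
    start  : first day of the outbreak; length : number of outbreak days
    """
    Y = np.array(X, dtype=float)  # copy, so the original data is untouched
    Y[start:start + length, list(subset)] += size
    return Y

# Hypothetical example: p = 3 series, outbreak of size 2 in series 0 and 2.
baseline = np.zeros((10, 3))
injected = inject_step_outbreak(baseline, subset=[0, 2], size=2.0, start=4, length=3)
```

Running a chart on many injected copies and counting how often it alerts during the outbreak window gives an estimate of the true alert rate.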
26 Flavor of Results: TA
27 Flavor of Results: Authentic Data
28 Summary of Results
- All four methods require at least one year of data (!!) to estimate the covariance matrix (when there are more than 5 series)
- TR's method performs slightly better for normally distributed data
- Follmann's is more robust to violations of the iid-normal assumption (autocorrelation, Poisson data)
29 Summary and Concluding Remarks
- Two projects were presented with the goal of improving performance in BioSurveillance systems
  - Method combination to preprocess and detect outbreaks in univariate data
  - Directionally-sensitive multivariate control charts: extensive performance analysis
- We aim at defining monitoring guidelines to assist CDC and other biosurveillance systems
  - When and how to monitor multivariate vs. univariate
  - When and how to combine monitoring algorithms
  - What performance can be expected in different cases
30 For Additional Information
- Inbal Yahav
- iyahav_at_rhsmith.umd.edu
- http://www.rhsmith.umd.edu/faculty/phd/inbal/