On Detecting Disease Outbreaks: Univariate and Multivariate Monitoring
1
On Detecting Disease Outbreaks: Univariate and
Multivariate Monitoring
  • Inbal Yahav
  • With Galit Shmueli and Thomas Lotze
  • The work was partially supported by NIH grant
    RFA-PH-05-126.

2
Introduction to Biosurveillance
  • Natural and bioterrorism-related disease
    outbreaks can harm a large population rapidly.
  • Early awareness and quick treatment can make the
    difference!
  • We want to detect an outbreak as early as
    possible
  • We focus on daily aggregates of pre-diagnostic
    data that form time series
  • Multiple series within a single source or across
    multiple sources

3
Desired Preliminaries
  • Publicly available data
  • Labeled data
  • Known outbreak signature (where?)
  • iid observations ...
4
Existing Preliminaries
  • Insufficient, unlabeled data
  • Multiple sources

5
Projects
  • Project Mimic (http://www.projectmimic.com/)

Lead: Thomas Lotze
6
Projects
  • Algorithm Combination for Improved Performance in
    BioSurveillance Systems

7
Projects
  • Directionally-Sensitive Multivariate Control
    Charts

8
Detecting Outbreaks
  • Explainable Patterns
  • Seasonality
  • Holidays
  • Day of Week

Output: residuals (assumed to be normal)
  • Common methods
  • Linear Regression (mostly used)
  • 7-Day Differencing
  • Moving Average
  • Holt-Winters (exponential smoothing)
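As an illustration of two of the preprocessing methods listed above, the sketch below turns daily counts into residuals by 7-day differencing and by subtracting a trailing 7-day moving average. The function names and the window length are assumptions made for this example; the slides do not say how each method was configured.

```python
def seven_day_diff(counts):
    """Residual r_t = y_t - y_{t-7}; removes a purely weekly (day-of-week) pattern."""
    return [counts[t] - counts[t - 7] for t in range(7, len(counts))]


def moving_average_residuals(counts, window=7):
    """Residual r_t = y_t - mean(y_{t-window}, ..., y_{t-1}) (trailing window)."""
    residuals = []
    for t in range(window, len(counts)):
        baseline = sum(counts[t - window:t]) / window
        residuals.append(counts[t] - baseline)
    return residuals


# Hypothetical daily counts with a weekly pattern (a peak on the first day of each week).
counts = [20, 10, 10, 10, 10, 10, 10] * 3
print(seven_day_diff(counts))            # all zeros: the weekly pattern is removed
print(moving_average_residuals(counts))  # positive on peak days, slightly negative otherwise
```

The contrast is the point of the example: differencing at lag 7 removes a stable day-of-week effect exactly, while a 7-day moving average only removes the weekly level and leaves the day-of-week shape in the residuals.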

Output: binary (outbreak / no outbreak)
  • Common methods
  • Shewhart
  • CUSUM
  • EWMA
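The three control charts listed above can be sketched on standardized residuals (assumed N(0, 1) when there is no outbreak). This is a minimal one-sided (upper) version, since only increases signal an outbreak; the parameter values h, k, lam, and L are common textbook choices assumed here, not values taken from the slides.

```python
import math


def shewhart(residuals, h=3.0):
    """Alert (1) on day t if the standardized residual exceeds h."""
    return [1 if r > h else 0 for r in residuals]


def cusum(residuals, k=0.5, h=4.0):
    """One-sided upper CUSUM: S_t = max(0, S_{t-1} + r_t - k); alert if S_t > h."""
    s, alerts = 0.0, []
    for r in residuals:
        s = max(0.0, s + r - k)
        alerts.append(1 if s > h else 0)
    return alerts


def ewma(residuals, lam=0.3, L=3.0):
    """EWMA: z_t = lam * r_t + (1 - lam) * z_{t-1}; alert if z_t > L * sigma_z,
    using the asymptotic sigma_z = sqrt(lam / (2 - lam)) for N(0, 1) residuals."""
    sigma_z = math.sqrt(lam / (2.0 - lam))
    z, alerts = 0.0, []
    for r in residuals:
        z = lam * r + (1.0 - lam) * z
        alerts.append(1 if z > L * sigma_z else 0)
    return alerts


# Hypothetical standardized residuals: five quiet days, then a sustained +1.5 shift.
residuals = [0.0] * 5 + [1.5] * 6
print(shewhart(residuals))
print(cusum(residuals))
print(ewma(residuals))
```

With these assumed parameters Shewhart never alerts on this series (the shift stays below h = 3), CUSUM alerts from index 9, and EWMA from index 10, which shows why the charts respond differently to the same outbreak.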

9
Algorithm Combination for Improved Performance in
BioSurveillance Systems
10
Method Combination
  • Idea: linear combinations of methods
  • The objective is defined as a cost function to
    be minimized

  • Two options: residuals combination vs. control
    chart combination
11
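Combining Monitoring Charts
Combine the outputs of the control charts applied to the residuals:
  • Shewhart (weight $\alpha_1$): 010100011100000001000
  • CUSUM (weight $\alpha_2$): 010110000000000001000
  • EWMA (weight $\alpha_3$): 000001010101010001000
  • Weights satisfy $\sum_i \alpha_i = 1$
  • Combined (weighted) output, e.g.: 0, 0.5, 0.8, 0, 0, 0, 0, 0.2, ...
  • Alert if the combined output > threshold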
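The control chart combination described above can be sketched as a weighted vote. The example below applies it to the three binary series shown on the slide, using the weights from the later illustration ($\alpha_{\mathrm{Shewhart}} = 1/2$, $\alpha_{\mathrm{CUSUM}} = \alpha_{\mathrm{EWMA}} = 1/4$) and a hypothetical threshold of 0.5; the function names are ours.

```python
# Binary alert series from the slide (1 = the chart alerted on that day).
outputs = {
    "Shewhart": "010100011100000001000",
    "CUSUM":    "010110000000000001000",
    "EWMA":     "000001010101010001000",
}
# Weights alpha_i (must sum to 1), as in the illustrated combination.
weights = {"Shewhart": 0.5, "CUSUM": 0.25, "EWMA": 0.25}


def combine_charts(outputs, weights):
    """Combined output on day t: sum_i alpha_i * output_i[t]."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    n_days = len(next(iter(outputs.values())))
    return [sum(weights[name] * int(series[t]) for name, series in outputs.items())
            for t in range(n_days)]


def alert_days(combined, threshold):
    """Days on which the combined output strictly exceeds the threshold."""
    return [t for t, value in enumerate(combined) if value > threshold]


combined = combine_charts(outputs, weights)
print(alert_days(combined, threshold=0.5))  # → [1, 3, 7, 9, 17]
```

Day 8 is flagged by Shewhart alone, so its combined output is exactly 0.5 and it does not pass the strict "> threshold" rule; lowering the threshold or raising the Shewhart weight changes that.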
12
Residuals Combination
  • Raw data → residuals $R_1$, $R_2$, $R_3$
  • Combined residuals: $\sum_i \alpha_i R_i$, with $\sum_i \alpha_i = 1$
  • Monitor the combined residuals
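In the residuals combination, the weighted average is taken before monitoring, so only one control chart runs on the combined series. A minimal sketch, assuming the residual series are already aligned day by day; the residual values and the Shewhart threshold of 2.0 are hypothetical.

```python
def combine_residuals(residual_series, weights):
    """Combined residual on day t: sum_i alpha_i * R_i[t], with sum_i alpha_i = 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if len({len(r) for r in residual_series}) != 1:
        raise ValueError("residual series must be aligned (same length)")
    return [sum(a * r[t] for a, r in zip(weights, residual_series))
            for t in range(len(residual_series[0]))]


# Hypothetical residuals from three preprocessing methods on the same four days.
R1 = [0.2, -0.4, 1.0, 2.4]
R2 = [0.0, -0.2, 0.6, 2.0]
R3 = [0.4, 0.0, 0.8, 3.2]
combined = combine_residuals([R1, R2, R3], [0.5, 0.25, 0.25])

# The combined series is then monitored by a single chart, e.g. Shewhart with h = 2.0.
alerts = [1 if r > 2.0 else 0 for r in combined]
print(combined, alerts)
```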
13
Benefits of Algorithm Combination
  • Current methods capture different types of
    outbreak, but outbreak signature is unknown
  • Multiple testing as a solution → the probability
    of no false alert (FA) shrinks exponentially with
    the number of tests

→ Combine methods
  • Alert threshold depends on data characteristics
    (mean, variance), but those may change over time

→ Combine dynamically
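The multiple-testing problem can be made concrete with a short calculation. Assuming the k daily tests are independent and each has false alert probability α, the probability of no false alert is $(1-\alpha)^k$, so the daily false alert rate of the whole set is $1 - (1-\alpha)^k$. Real surveillance series are correlated, so this is only an illustration.

```python
def familywise_fa_rate(alpha, k):
    """P(at least one false alert on a given day) for k independent tests,
    each with false alert probability alpha."""
    return 1.0 - (1.0 - alpha) ** k


for k in (1, 3, 10):
    print(k, round(familywise_fa_rate(0.05, k), 4))
```

With α = 0.05 this gives 0.05 for one test, about 0.1426 for three tests, and about 0.4013 for ten tests.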
14
Illustration
Inject 20 step outbreaks
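The slides do not describe how the 20 step outbreaks were generated. A minimal sketch, assuming each outbreak adds a constant `size` to `duration` consecutive days; the parameter names, the block-based placement rule, and the seed are hypothetical choices for this example.

```python
import random


def inject_step_outbreaks(counts, n_outbreaks, size, duration, seed=0):
    """Add a constant `size` to `duration` consecutive days for each outbreak.
    Returns (new_counts, labels) with labels[t] = 1 on outbreak days.
    Hypothetical placement rule: starts are aligned to blocks of 2 * duration days,
    so outbreaks never overlap and are separated by at least `duration` quiet days."""
    block = 2 * duration
    n_blocks = len(counts) // block
    if n_outbreaks > n_blocks:
        raise ValueError("series too short for that many separated outbreaks")
    rng = random.Random(seed)
    starts = sorted(b * block for b in rng.sample(range(n_blocks), n_outbreaks))
    new_counts, labels = list(counts), [0] * len(counts)
    for start in starts:
        for t in range(start, start + duration):
            new_counts[t] += size
            labels[t] = 1
    return new_counts, labels


# Hypothetical flat baseline of 700 days with 20 week-long step outbreaks of +5.
baseline = [10] * 700
injected, labels = inject_step_outbreaks(baseline, n_outbreaks=20, size=5, duration=7)
```

Keeping the labels alongside the injected series is what makes it possible to count false alerts and missed outbreak days when comparing thresholds.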
15
Illustration (cont.)
Figure: cost for different thresholds for CUSUM, EWMA,
Shewhart, and a combination with $\alpha_{\mathrm{Shewhart}} = 1/2$,
$\alpha_{\mathrm{CUSUM}} = 1/4$, $\alpha_{\mathrm{EWMA}} = 1/4$
16
Illustration (cont.)
  • Different threshold
  • Different combination rule ($\alpha_i = 1/3$)

Infinitely many ways to combine → formulate as a MIP
(mixed-integer program)
17
Formulate as MIP
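The MIP formulation itself is not reproduced in this transcript. As a much cruder stand-in, and explicitly not the authors' MIP, the sketch below exhaustively searches a coarse grid of weights and thresholds for three charts and keeps the combination with the lowest cost. The cost (1 per false alert, 5 per missed outbreak day) and the outbreak labels are hypothetical.

```python
from itertools import product

# Binary chart outputs from the combination slide (Shewhart, CUSUM, EWMA).
outputs = [[int(c) for c in s] for s in (
    "010100011100000001000",
    "010110000000000001000",
    "000001010101010001000",
)]
# Hypothetical ground truth: outbreak on days 7-9 and on day 17.
truth = [0] * 7 + [1, 1, 1] + [0] * 7 + [1] + [0] * 3


def combined_alerts(outputs, weights, threshold):
    """Alert on day t if sum_i w_i * output_i[t] > threshold."""
    return [1 if sum(w * o[t] for w, o in zip(weights, outputs)) > threshold else 0
            for t in range(len(outputs[0]))]


def cost(alerts, truth, c_fa=1.0, c_miss=5.0):
    """Hypothetical cost: c_fa per false alert plus c_miss per missed outbreak day."""
    fa = sum(1 for a, y in zip(alerts, truth) if a == 1 and y == 0)
    miss = sum(1 for a, y in zip(alerts, truth) if a == 0 and y == 1)
    return c_fa * fa + c_miss * miss


def grid_search(outputs, truth, step=0.25):
    """Try every weight vector on a grid (weights >= 0, summing to 1) and every
    threshold 0, step, ..., 1 - step; return (cost, weights, threshold) of the best."""
    n = round(1 / step)
    grid = [i * step for i in range(n + 1)]
    best = None
    for w1, w2 in product(grid, repeat=2):
        w3 = 1.0 - w1 - w2
        if w3 < -1e-9:
            continue  # infeasible: the first two weights already exceed 1
        weights = (w1, w2, max(w3, 0.0))
        for threshold in grid[:-1]:
            c = cost(combined_alerts(outputs, weights, threshold), truth)
            if best is None or c < best[0]:
                best = (c, weights, threshold)
    return best


print(grid_search(outputs, truth))
```

With these hypothetical labels the best grid point has cost 2: day 8 is flagged only by Shewhart, so catching it forces the two false alerts on days 1 and 3 that Shewhart shares with CUSUM. Grid search scales badly with the number of methods and grid resolution, which is one motivation for an exact optimization formulation.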
18
Multivariate Analysis: Directionally-Sensitive
Multivariate Control Charts with an Application
to BioSurveillance
19
Introduction
  • Multiple series
  • Different diseases
  • Multiple symptoms
  • Multiple locations
  • Multiple data sources
  • Additional information, such as OTC
    (over-the-counter) medication sales
  • Multiple univariate testing as a solution
  • Probability of no FA shrinks exponentially with
    the number of tests
  • Misses weak multivariate outbreak signatures

20
Introduction
  • Improve detection using correlation between
    multiple series
  • Main challenge: directional sensitivity
    (detect an increase in at least one of the means)

21
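Existing methods: Non-Directional
  • Multivariate Shewhart: Hotelling's $T^2$
  • Alert if $T_t^2 = (\mathbf{x}_t - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}_t - \boldsymbol{\mu}) > h$
  • Multivariate EWMA (Lowry et al., 1992)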
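A minimal NumPy sketch of the two non-directional statistics above, assuming a known mean vector μ and covariance matrix Σ. For the MEWMA it uses the asymptotic covariance $\Sigma_z = \frac{\lambda}{2-\lambda}\Sigma$; the smoothing constant λ = 0.2 and the function names are assumptions for illustration.

```python
import numpy as np


def hotelling_t2(x, mu, cov):
    """T^2 = (x - mu)^T Sigma^{-1} (x - mu) for one p-dimensional observation."""
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    return float(d @ np.linalg.solve(np.asarray(cov, dtype=float), d))


def mewma_stats(X, mu, cov, lam=0.2):
    """MEWMA: z_t = lam * (x_t - mu) + (1 - lam) * z_{t-1}; statistic
    z_t^T Sigma_z^{-1} z_t with the asymptotic Sigma_z = lam / (2 - lam) * Sigma."""
    mu = np.asarray(mu, dtype=float)
    cov_z = lam / (2.0 - lam) * np.asarray(cov, dtype=float)
    z = np.zeros(len(mu))
    stats = []
    for x in np.asarray(X, dtype=float):  # one row per day
        z = lam * (x - mu) + (1.0 - lam) * z
        stats.append(float(z @ np.linalg.solve(cov_z, z)))
    return stats


mu, cov = [0.0, 0.0], np.eye(2)
print(hotelling_t2([1, 2], mu, cov), hotelling_t2([-1, -2], mu, cov))  # both 5.0
```

Both statistics are quadratic forms, so a decrease in the means produces the same value as an equal increase. That is exactly the lack of directional sensitivity the next slide addresses.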

22
Existing methods: Directionally Sensitive
  • Hotelling (implementable)
  • Follmann (1996): correction to bi-directional
    Hotelling
  • Testik and Runger (TR, 2006): quadratic
    programming
  • Both assume iid normal data and a known
    covariance matrix
  • We generalize to multivariate EWMA (with/without
    restart)
  • Which one performs better in practice?
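Follmann's correction can be sketched as a Hotelling check plus a direction check: alert only if $T^2$ exceeds its threshold and the deviations from the mean sum to a positive value. The sketch assumes known μ and Σ; the threshold `h` is left as a parameter (under the iid-normal, known-covariance assumption, Follmann's one-sided test uses the $T^2$ critical value at level 2α, which could be obtained with `scipy.stats.chi2.ppf(1 - 2 * alpha, df=p)`). The value h = 4.0 below is purely illustrative.

```python
import numpy as np


def follmann_alert(x, mu, cov, h):
    """Directional alert: Hotelling T^2 > h AND sum_i (x_i - mu_i) > 0."""
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    t2 = float(d @ np.linalg.solve(np.asarray(cov, dtype=float), d))
    return t2 > h and float(d.sum()) > 0.0


mu, cov = [0.0, 0.0], np.eye(2)
print(follmann_alert([2, 2], mu, cov, h=4.0))    # increase: T^2 = 8 and positive sum
print(follmann_alert([-2, -2], mu, cov, h=4.0))  # same T^2, but a decrease: no alert
```

The direction check is what filters out large decreases that plain Hotelling would flag; the quadratic-programming approach of Testik and Runger instead builds the one-sided restriction into the statistic itself.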

23
Evaluating Performance
  • False alert rate
  • Impact of cross correlation (magnitude)
  • Number of series
  • Robustness to assumptions
  • Length of training data (unknown covariance
    matrix)
  • Autocorrelation
  • Multivariate Poisson Distribution
  • True alert rate

24
Flavor of Results: FA
Figure: distribution of FA as a function of training
data length (annotations: "higher variance", "higher bias")
25
Flavor of Results: TA
  • We inject outbreaks of size o into a subset s of
    the p series
  • How does the true alert rate change?
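The injection described above can be sketched for a days × series matrix. This assumes, hypothetically, that the outbreak size o is measured in standard deviations of each affected series and that the outbreak is a step lasting `duration` days; the slides do not specify either choice.

```python
import numpy as np


def inject_multivariate(X, subset, o, start, duration, sd=None):
    """Add an outbreak of size o (in units of each series' standard deviation)
    to the series indexed by `subset` on days start, ..., start + duration - 1."""
    X = np.array(X, dtype=float)  # copy; rows = days, columns = the p series
    sd = X.std(axis=0, ddof=1) if sd is None else np.asarray(sd, dtype=float)
    cols = list(subset)
    X[start:start + duration, cols] += o * sd[cols]
    return X


# Hypothetical example: p = 4 series, outbreak of size 2 SD in series 0 and 2.
X = np.zeros((10, 4))
Y = inject_multivariate(X, subset=[0, 2], o=2.0, start=3, duration=2, sd=np.ones(4))
```

Varying o, the size of s, and p in such a simulation, then recording how often each chart alerts during the outbreak window, gives the true alert rate curves the next slides summarize.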

26
Flavor of Results: TA
27
Flavor of Results: Authentic Data
28
Summary of Results
  • All four methods require at least one year of
    data (!!) to estimate the covariance matrix (when
    there are more than 5 series)
  • TR's method performs slightly better for normally
    distributed data
  • Follmann's is more robust to departures from the
    iid normal assumption (autocorrelation, Poisson data)

29
Summary and Concluding Remarks
  • Two projects were presented with the goal of
    improving performance in BioSurveillance Systems
  • Method combination to preprocess and detect
    outbreaks in univariate data
  • Directionally-Sensitive Multivariate Control
    Charts: extensive performance analysis
  • We aim to define monitoring guidelines to
    assist the CDC and other biosurveillance systems:
  • When and how to monitor multivariate vs.
    univariate
  • When and how to combine monitoring algorithms
  • What performance can be expected in different
    cases

30
For Additional Information
  • Inbal Yahav
  • iyahav_at_rhsmith.umd.edu
  • http://www.rhsmith.umd.edu/faculty/phd/inbal/