Strategies for Prospective Biosurveillance Using Multivariate Time Series

1
Strategies for Prospective Biosurveillance Using
Multivariate Time Series
  • Howard Burkom (1), Yevgeniy Elbert (2), Sean Murphy (1)
  • (1) Johns Hopkins Applied Physics Laboratory,
    National Security Technology Department
  • (2) Walter Reed Army Institute of Research
  • Tenth Biennial CDC and ATSDR Symposium on
    Statistical Methods
  • Panelist: Statistical Issues in Public Health
    Surveillance for Bioterrorism Using Multiple Data
    Streams
  • Bethesda, MD, March 2, 2005

2
Defining the Multivariate Temporal Surveillance
Problem
  • Multivariate Nature of Problem
      • Many locations
      • Multiple syndromes
      • Stratification by age, gender, other covariates
  • Surveillance Challenges
      • Defining anomalous behavior(s)
      • Hypothesis tests: both appropriate and timely
      • Avoiding excessive alerting due to multiple
        testing
      • Correlation among data streams
      • Varying noise backgrounds
      • Communication with/among users at different
        levels
      • Data reduction and visualization
  • Varying Nature of the Data
      • Trend, day-of-week, seasonal behavior depending
        on data type and grouping

3
Problem: to combine multiple evidence sources
for increased sensitivity at manageable alert
rates
[Figure: Recent Respiratory Syndrome Data, with "early cases" and "height of outbreak" marked]
4
Multivariate Hypothesis Testing
  • Parallel monitoring
      • Null hypothesis: "no outbreak of unspecified
        infection in any of hospitals 1, ..., N" (or
        counties, zip codes, ...)
      • FDR-based methods (modified Bonferroni)
  • Consensus monitoring
      • Null hypothesis: "no respiratory outbreak
        infection based on hosp. syndrome counts, clinic
        visits, OTC sales, absentees"
      • Multiple univariate methods: combining p-values
      • Fully multivariate: MSPC charts
  • General solution: system-engineered blend of
    these
  • Scan statistics paradigm useful when data permit

5
Univariate Alerting Methods
  • Data modeling: regression controls for weekly,
    holiday, seasonal effects
      • Outlier removal procedure avoids training on
        exceptional counts
      • Baseline chosen to capture recent seasonal
        behavior
      • Standardized residuals used as detection
        statistics
  • Process control method adapted for daily
    surveillance
      • Combines EWMA, Shewhart methods for sensitivity
        to gradual or sudden signals
      • Parameters modified adaptively for changing data
        behavior
      • Adaptively scaled to compute 1-sided
        probabilities for detection statistics
      • Small-count corrections for scale-independent
        alert rates
  • Outputs expressed as p-values for comparison,
    visualization
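
The process-control step on this slide can be illustrated with a minimal sketch. It assumes the inputs are already standardized residuals that are roughly standard normal when no outbreak is present. The smoothing weight `lam`, the function names, and the choice to report the smaller of the two one-sided p-values are illustrative assumptions, not the authors' exact algorithm; the adaptive scaling and small-count corrections are left out.

```python
import math


def upper_tail_p(z):
    """One-sided p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def ewma_shewhart_pvalues(residuals, lam=0.4):
    """Return one value per day from a combined EWMA/Shewhart check.

    residuals: standardized residuals (assumed ~ N(0, 1) with no outbreak).
    lam: EWMA smoothing weight in (0, 1]; 0.4 is a hypothetical choice.
    """
    pvalues = []
    ewma = 0.0
    # Asymptotic standard deviation of an EWMA of unit-variance inputs.
    ewma_sd = math.sqrt(lam / (2.0 - lam))
    for z in residuals:
        ewma = lam * z + (1.0 - lam) * ewma
        p_shewhart = upper_tail_p(z)           # reacts to sudden spikes
        p_ewma = upper_tail_p(ewma / ewma_sd)  # reacts to gradual rises
        pvalues.append(min(p_shewhart, p_ewma))
    return pvalues


if __name__ == "__main__":
    quiet = [0.1, -0.3, 0.2, 0.0]
    spike = quiet + [3.5]                # sudden one-day jump
    ramp = quiet + [1.2, 1.4, 1.6, 1.8]  # gradual rise
    print("spike:", ewma_shewhart_pvalues(spike)[-1])
    print("ramp: ", ewma_shewhart_pvalues(ramp)[-1])
```

In the ramp case no single day is extreme, yet the EWMA branch produces a small value, which is the reason for pairing the two charts. Taking the minimum of two p-values is not itself a calibrated p-value; a production method would rescale it.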

6
Parallel Hypotheses and Multiple Testing: Adapting
Standard Methods
  • P-values $p_1, \ldots, p_N$ with multiple null hypotheses;
    desired type I error rate $\alpha$
  • "no outbreak at any hospital j", $j = 1, \ldots, N$
  • Bonferroni bound: error rate is achieved with
    test $p_j < \alpha / N$, all j (conservative)
  • Simes 1986 enhancement (after Seeger, Eklund)
      • Put p-values in ascending order $P_{(1)}, \ldots, P_{(N)}$
      • Reject intersection of null hypotheses if any
        $P_{(j)} < j \alpha / N$
      • Reject null for $j < j^*$ (or use more complex
        criteria)
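
The two tests of the intersection null on this slide can be sketched in a few lines. The function names and the example p-values are hypothetical; `simes_pvalue` returns the smallest alpha at which the Simes test would reject, which is one way to express the result as a single composite p-value.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Reject the intersection null if any p_j < alpha / N."""
    n = len(pvalues)
    return any(p < alpha / n for p in pvalues)


def simes_reject(pvalues, alpha=0.05):
    """Reject the intersection null if any ordered P_(j) < j * alpha / N."""
    n = len(pvalues)
    ordered = sorted(pvalues)
    return any(p < j * alpha / n for j, p in enumerate(ordered, start=1))


def simes_pvalue(pvalues):
    """Composite p-value for the intersection null: min over j of N * P_(j) / j."""
    n = len(pvalues)
    ordered = sorted(pvalues)
    return min(1.0, min(n * p / j for j, p in enumerate(ordered, start=1)))


if __name__ == "__main__":
    # Hypothetical p-values from four hospitals on one day.
    pvals = [0.02, 0.021, 0.03, 0.30]
    print("Bonferroni rejects:", bonferroni_reject(pvals))
    print("Simes rejects:     ", simes_reject(pvals))
    print("Simes p-value:     ", simes_pvalue(pvals))
```

With these inputs no single p-value clears the Bonferroni threshold of 0.0125, but the second-smallest clears its Simes threshold of 0.025, which shows how the ordered test recovers power when several streams are moderately unusual.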

7
Parallel Hypotheses: Criteria to Control False
Alert Rate
  • Simes-Seeger-Eklund criterion
      • Gives expected alert rate near desired $\alpha$ for
        independent signals
      • Applied to control the false discovery rate (FDR)
        for many common multivariate distributions
        (Benjamini & Hochberg, 1995)
      • FDR = Exp( false alerts / all alerts )
      • Increased power over methods controlling Pr(
        single false alert )
  • Numerous FDR applications, incl. UK health
    surveillance in (Marshall et al., 2004)
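
A minimal sketch of the Benjamini-Hochberg step-up procedure shows how the ordered criterion is used to decide which individual units alert. The function name and the example p-values are hypothetical. Note that the published procedure uses a non-strict inequality, whereas the criterion on the previous slide is written with a strict one.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the indices rejected by the Benjamini-Hochberg step-up rule.

    Finds the largest rank j with P_(j) <= j * alpha / N and rejects the
    hypotheses holding the j smallest p-values.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / n:
            cutoff = rank
    return sorted(order[:cutoff])


if __name__ == "__main__":
    # Hypothetical p-values for six hospitals on one day.
    pvals = [0.001, 0.30, 0.012, 0.04, 0.75, 0.02]
    print("hospitals alerting under FDR control:", benjamini_hochberg(pvals))
```

For these inputs a plain Bonferroni threshold of 0.05 / 6 would flag only the first hospital, while the step-up rule also flags the hospitals with p-values 0.012 and 0.02.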

8
Stratification and Multiple Testing

[Diagram: Counts unstratified by age and counts stratified by age group (0-4, 5-11, ..., 71+) each pass through the EWMA-Shewhart algorithm. The unstratified counts give the aggregate p-value. The age-group p-values are combined by modified Bonferroni (FDR) into the composite p-value. The MIN of the aggregate and composite p-values is the resultant p-value.]
9
Consensus Monitoring: Multiple Univariate Methods
  • Fisher's combination rule (multiplicative)
      • Given p-values $p_1, p_2, \ldots, p_n$:
        $F = -2 \sum_{j=1}^{n} \ln p_j$
      • F is $\chi^2$ with 2n degrees of freedom, for $p_j$
        independent
      • Recommended as stand-alone method
  • Edgington's rule (additive)
      • Let S = sum of p-values $p_1, p_2, \ldots, p_n$
      • Resultant p-value:
        $\frac{S^n}{n!} - \binom{n}{1}\frac{(S-1)^n}{n!} + \binom{n}{2}\frac{(S-2)^n}{n!} - \cdots$
        ( stop when $(S - j) < 0$ )
      • Normal curve approximation formula for large n
      • Consensus method: sensitive to multiple
        near-critical values
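
Both combination rules can be written with the standard library alone. This is a sketch under the slide's independence assumption; the function names are hypothetical, and all p-values must be strictly positive for the logarithm. The Fisher tail probability uses the closed form of the chi-square survival function for even degrees of freedom.

```python
import math


def fisher_combined(pvalues):
    """Fisher's rule: F = -2 * sum(ln p_j) is chi-square with 2n d.o.f."""
    n = len(pvalues)
    half_f = -sum(math.log(p) for p in pvalues)  # equals F / 2
    # For 2n degrees of freedom:
    # P(chi2 > F) = exp(-F/2) * sum_{k=0}^{n-1} (F/2)^k / k!
    term = 1.0
    total = 1.0
    for k in range(1, n):
        term *= half_f / k
        total += term
    return min(1.0, math.exp(-half_f) * total)


def edgington_combined(pvalues):
    """Edgington's additive rule: P(sum of n uniforms <= S)."""
    n = len(pvalues)
    s = sum(pvalues)
    total = 0.0
    for j in range(n + 1):
        if s - j <= 0:  # stop once (S - j) is no longer positive
            break
        total += (-1) ** j * math.comb(n, j) * (s - j) ** n
    return min(1.0, max(0.0, total / math.factorial(n)))


if __name__ == "__main__":
    consensus = [0.08, 0.08, 0.08]  # several near-critical values
    single = [0.001, 0.5, 0.5]      # one strong signal only
    for name, pvals in (("consensus", consensus), ("single", single)):
        print(name, fisher_combined(pvals), edgington_combined(pvals))
```

The two example inputs illustrate the contrast drawn on the slide: the additive rule gives the smaller combined value when several streams are each near-critical, while the multiplicative rule responds more strongly to one very small p-value.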

10
Multiple Univariate Criteria: 2D Visualization
[Figure: 2D plot with labels "Nominal univariate criteria", "Edgington", and "Fisher"]
11
934 days of EMS Data
  • 12 time series: separate syndrome groups of
    ambulance calls
  • Poisson-like counts: negligible day-of-week,
    seasonal effects
  • EWMA-Shewhart algorithm applied to derive
    p-values
  • Each row is mean over ALL combinations

12
Multivariate Control Charts
  • $T^2$ statistic: $(X - \mu)^T S^{-1} (X - \mu)$
      • X = multivariate time series: syndromic claims,
        OTC sales, etc.
      • S = estimate of covariance matrix from baseline
        interval
      • Alert based on empirical distribution to alert
        rate
  • MCUSUM, MEWMA methods filter X seeking shorter
    average run length
  • Hawkins (1993): $T^2$ particularly bad at
    distinguishing location shifts from scale shifts
  • $T^2$ nondirectional
  • Directional statistic: $(\mu_A - \mu)^T S^{-1} (X - \mu)$, where
    $\mu_A - \mu$ is direction of change
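
The two statistics on this slide can be computed without any external library. The sketch below is illustrative only: the function names are hypothetical, the linear solve is a plain Gauss-Jordan elimination, and the baseline mean and covariance are estimated with the usual sample formulas, which the slide does not specify.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mean_and_cov(baseline):
    """Sample mean vector and covariance matrix of baseline rows."""
    n, d = len(baseline), len(baseline[0])
    mu = [sum(row[i] for row in baseline) / n for i in range(d)]
    cov = [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in baseline) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov


def t2_statistic(x, mu, cov):
    """Hotelling-type T^2 = (x - mu)^T S^{-1} (x - mu)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    weighted = solve(cov, diff)  # S^{-1} (x - mu)
    return sum(d * w for d, w in zip(diff, weighted))


def directional_statistic(x, mu, cov, mu_alt):
    """(mu_alt - mu)^T S^{-1} (x - mu) for an assumed outbreak mean mu_alt."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    weighted = solve(cov, diff)
    return sum((a - m) * w for a, m, w in zip(mu_alt, mu, weighted))


if __name__ == "__main__":
    mu, cov = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
    for x in ([3.0, 4.0], [-3.0, -4.0]):
        print(x, t2_statistic(x, mu, cov),
              directional_statistic(x, mu, cov, [1.0, 0.0]))
```

The usage lines show the nondirectional property: an increase and a decrease of the same size give the same $T^2$, while the directional statistic changes sign.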

13
MSPC Example: 2 Data Streams
14
Evaluation: Injection in Authentic and Simulated
Backgrounds
  • Background
      • Authentic: 2-8 correlated streams of daily resp
        syndrome data (23 mo.)
      • Simulated: negative binomial data with authentic
        $\mu$, modeled overdispersion with $\sigma^2 = k\mu$
  • Injections (additional attributable cases)
      • Each case: stochastic draw from point-source
        epicurve dist. (Sartwell lognormal model)
      • 100 Monte Carlo trials: single outbreak effect
        per trial
      • With and without time delays between effects
        across streams
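
The simulated background and the injected outbreak can be sketched with the standard library. Everything numeric here is a placeholder: the mean, the overdispersion factor `k`, the lognormal parameters, and the case count are hypothetical, and the helper names are not from the presentation. The negative binomial draw uses a gamma-Poisson mixture so that the variance equals k times the mean.

```python
import math
import random


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for moderate means."""
    limit = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def negative_binomial_background(mu, k, n_days, rng):
    """Daily counts with mean mu and variance k * mu (requires k > 1)."""
    shape = mu / (k - 1.0)  # gamma shape
    scale = k - 1.0         # gamma scale, so the gamma mean is mu
    return [poisson_draw(rng.gammavariate(shape, scale), rng)
            for _ in range(n_days)]


def lognormal_injection(n_cases, n_days, log_mean, log_sd, rng):
    """Epicurve: each case's onset day is an independent lognormal draw."""
    epicurve = [0] * n_days
    for _ in range(n_cases):
        day = int(rng.lognormvariate(log_mean, log_sd))
        if day < n_days:  # cases beyond the window are dropped
            epicurve[day] += 1
    return epicurve


if __name__ == "__main__":
    rng = random.Random(2005)
    background = negative_binomial_background(10.0, 3.0, 30, rng)
    outbreak = lognormal_injection(40, 30, math.log(5.0), 0.5, rng)
    injected = [b + o for b, o in zip(background, outbreak)]
    print(injected)
```

One Monte Carlo trial would run the alerting algorithm on `injected` and on `background` alone; repeating over many seeds gives the detection and false alert frequencies.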

[Figure: ROC axes. Pr(detection) = alerted signals / injected signals (sensitivity), plotted against (1 - specificity)]
15
Multivariate Comparison
Example: faint, 1-$\sigma$ peak signal with in 4
independent data streams, with differential
effect delays
16
ROC: Effects of Data Correlation
Example: faint, 2-$\sigma$ peak signal with 2 of 6
highly correlated data streams, with differential
effect delays
[Figure: ROC curves, Detection Probability vs. Daily False Alarm Probability, annotated "Degradation of multiple, univariate methods" and "Effect of strong, consistent correlation on multivariate methods"]
17
Conclusions
  • Comprehensive biosurveillance requires an
    interweaving of parallel and consensus monitoring
  • Adapted hypothesis tests can help maintain
    sensitivity at practical false alarm rates
      • But background data and cross-correlation must be
        understood
  • Parallel monitoring: FDR-like methods required
    according to scope, jurisdiction of surveillance
  • Multiple univariate
      • Fisher rule useful as stand-alone combination
        method
      • Edgington rule gives sensitivity to consensus of
        tests
  • Multivariate
      • MSPC $T^2$-based charts offer promise when
        correlation is consistent and significant, but
        their niche in routine, robust, prospective
        monitoring must be clarified

18
Backups
19
References 1
  • Testing Multiple Null Hypotheses
      • Simes, R. J. (1986). "An improved Bonferroni
        procedure for multiple tests of significance",
        Biometrika 73, 751-754.
      • Benjamini, Y., Hochberg, Y. (1995). "Controlling
        the False Discovery Rate: a Practical and
        Powerful Approach to Multiple Testing", Journal
        of the Royal Statistical Society B, 57, 289-300.
      • Hommel, G. (1988). "A stagewise rejective
        multiple test procedure based on a modified
        Bonferroni test", Biometrika 75, 383-386.
      • Miller C.J., Genovese C., Nichol R.C., Wasserman
        L., Connolly A., Reichart D., Hopkins A.,
        Schneider J., and Moore A. (2001). "Controlling the
        False Discovery Rate in Astrophysical Data
        Analysis", Astronomical Journal 122, 3492.
      • Marshall C., Best N., Bottle A., and Aylin P. (2004).
        "Statistical Issues in Prospective Monitoring of
        Health Outcomes Across Multiple Units", J. Royal
        Statist. Soc. A 167, Pt. 3, pp. 541-559.
  • Testing Single Null Hypotheses with multiple
    evidence
      • Edgington, E.S. (1972). "An Additive Method for
        Combining Probability Values from Independent
        Experiments", Journal of Psychology, Vol. 80,
        pp. 351-363.
      • Edgington, E.S. (1972). "A normal curve method
        for combining probability values from independent
        experiments", Journal of Psychology, Vol. 82,
        pp. 85-89.
      • Bauer P. and Kohne K. (1994). "Evaluation of
        Experiments with Adaptive Interim Analyses",
        Biometrics 50, 1029-1041.

20
References 2
  • Statistical Process Control
      • Hawkins, D. (1991). "Multivariate Quality
        Control Based on Regression-Adjusted Variables",
        Technometrics 33(1), 61-75.
      • Mandel, B.J. (1969). "The Regression Control Chart",
        J. Quality Technology 1(1), 1-9.
      • Williamson G.D. and VanBrackle G. (1999). "A
        study of the average run length characteristics
        of the National Notifiable Diseases Surveillance
        System", Stat Med. 1999 Dec 15; 18(23): 3309-19.
      • Lowry, C.A., Woodall, W.H. (1992). "A Multivariate
        Exponentially Weighted Moving Average Control
        Chart", Technometrics, February 1992, Vol. 34, No.
        1, 46-53.
  • Point-Source Epidemic Curves and Simulation
      • Sartwell, P.E. (1950). "The Distribution of Incubation
        Periods of Infectious Disease", Am. J. Hyg.,
        Vol. 51, pp. 310-318; reprinted in Am. J.
        Epidemiol., Vol. 141, No. 5, 1995.
      • Philippe, P. "Sartwell's Incubation Period Model
        Revisited in the Light of Dynamic Modeling", J.
        Clin. Epidemiol., Vol. 47, No. 4, 419-433.
      • Burkom H. and Rodriguez R. "Using Point-Source
        Epidemic Curves to Evaluate Alerting Algorithms
        for Biosurveillance", 2004 Proceedings of the
        American Statistical Association, Statistics in
        Government Section CD-ROM, Toronto: American
        Statistical Association (to appear).

21
MSPC 2-Stream Example: Detail of Aug. Peak
22
Effect of Combining Evidence
[Figure: Algorithm P-values, with "early cases", "height of outbreak", and "secondary event" marked]
23
Bayes Belief Net (BBN) Umbrella
  • To include evidence from disparate evidence types
      • Continuous/discrete data
      • Derived algorithm output or probabilities
      • Expert/heuristic knowledge
  • Graphical representation of conditional
    dependencies
  • Can weight statistical hypothesis test evidence
    using heuristics; not restricted to fixed
    p-value thresholds
  • Can exploit advances in data modeling,
    multivariate anomaly detection
  • Can model
      • Heuristic weighting of evidence
      • Lags in data availability or reporting
      • Missing data
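
To make the idea of conditional dependencies concrete, here is a toy three-node chain (Flu Season -> Flu -> Resp Anomaly) evaluated by brute-force enumeration. The structure is a simplification chosen for illustration and every probability in it is hypothetical; none of the numbers come from the presentation.

```python
# All probabilities below are hypothetical, chosen only for illustration.
P_SEASON = 0.25                                   # P(Flu Season)
P_FLU_GIVEN_SEASON = {True: 0.30, False: 0.02}    # P(Flu | Flu Season)
P_ANOMALY_GIVEN_FLU = {True: 0.80, False: 0.10}   # P(Resp Anomaly | Flu)


def joint(season, flu, anomaly):
    """Joint probability of one full assignment, via the chain rule."""
    p = P_SEASON if season else 1.0 - P_SEASON
    p_flu = P_FLU_GIVEN_SEASON[season]
    p *= p_flu if flu else 1.0 - p_flu
    p_anomaly = P_ANOMALY_GIVEN_FLU[flu]
    p *= p_anomaly if anomaly else 1.0 - p_anomaly
    return p


def posterior_flu(anomaly, season=None):
    """P(Flu | Resp Anomaly [, Flu Season]) by enumeration.

    season=None treats the season as unobserved and sums it out,
    which is how a missing evidence node is handled.
    """
    seasons = [True, False] if season is None else [season]
    numerator = sum(joint(s, True, anomaly) for s in seasons)
    denominator = sum(joint(s, f, anomaly)
                      for s in seasons for f in (True, False))
    return numerator / denominator


if __name__ == "__main__":
    print("anomaly, in season:     ", posterior_flu(True, season=True))
    print("anomaly, season unknown:", posterior_flu(True))
    print("anomaly, out of season: ", posterior_flu(True, season=False))
```

The same anomaly yields different posteriors depending on the seasonal context, and the query still works when the season node is unobserved.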

24
Bayes Network Elements
[Diagram: network with nodes Flu, Anthrax, Flu Season, GI Anomaly, Resp Anomaly, Sensor Alarm]
Posterior probabilities given evidence:
  P(Flu | Evidence)         P(Anthrax | Evidence)
  0.70                >>    0.0023
  0.67                >>    0.09
  0.08                >     0.005
  0.07                <     0.17
25
Structure of BBN Model for Asthma Flare-ups
[Diagram: BBN with node labels Asthma; Syndromic; Interaction; Cold/Flu Season and Irritant; Asthma Military RX; Resp Anomaly; SubFreezing Temp; Cold/Flu Season; Cold/Flu Season Start; Resp Military OV; Resp Military RX; Resp Civilian OV; Resp Civilian OTC; Pollution; Ozone; PM 2.5; AQI; Season; Allergen; Weed Pollen, Tree Pollen, Grass Pollen, Mold Spores (each of the four with Season and Level nodes)]
26
BBN Application to Asthma Flare-ups
  • Availability of practical, verifiable data
      • For "truth data": daily clinical diagnosis counts
      • For evidence: daily environmental, syndromic
        data
  • Known asthma triggers with complex interaction
      • Air quality (EPA data)
          • Concentration of particulate matter, allergens
          • Ozone levels
      • Temperature (NOAA data)
      • Viral infections (Syndromic data)
  • Evidence from combination of expert knowledge,
    historical data