Title: Population-Wide Anomaly Detection
1Population-Wide Anomaly Detection
Weng-Keen Wong (1), Gregory Cooper (2), Denver Dash (3), John Levander (2), John Dowling (2), Bill Hogan (2), Michael Wagner (2)
(1) School of Electrical Engineering and Computer Science, Oregon State University
(2) Realtime Outbreak and Disease Surveillance Laboratory, University of Pittsburgh
(3) Intel Research, Santa Clara
2Motivation
Date / Time Admitted   Age     Gender   Home Zip   Chief Complaint
Aug 1, 2005 3:02       20-30   Male     15213      Shortness of breath
Aug 1, 2005 3:07       40-50   Male     15146      Diarrhea
Aug 1, 2004 3:09       70-80   Female   15132      Fever
- Suppose you monitor Emergency Department (ED) data, which arrives in real time
- Can you specifically detect a large-scale anthrax attack?
3Model non-outbreak conditions and notice
deviations
- Traditional univariate methods, e.g. control charts, CUSUM, EWMA, time series models
- Spatial methods, e.g. the spatial scan statistic
- Multivariate methods, e.g. WSARE
2. Sat 2001-03-13 SCORE = -0.00000464 PVALUE = 0.00000000
   12.42% ( 58/467) of today's cases have 20 <= Age < 30 AND Respiratory Syndrome = True
    6.53% (653/10000) of baseline have 20 <= Age < 30 AND Respiratory Syndrome = True
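The two percentages in the rule above can be checked directly, and the gap between them can be tested with a one-sided Fisher exact test. The sketch below is only an illustration of that comparison: the function names are made up for this example, and it does not reproduce WSARE's own scoring or its randomization-based p-value.

```python
from math import comb


def percent(count, total):
    """Share of cases matching a rule, as a percentage."""
    return 100.0 * count / total


def fisher_one_sided(today_match, today_total, base_match, base_total):
    """P(at least `today_match` matches among today's cases) under the null
    hypothesis that today's cases and the baseline share one match rate.

    Uses the hypergeometric distribution: today's cases are treated as a
    draw of `today_total` records from the pooled records.
    """
    pooled_total = today_total + base_total
    pooled_match = today_match + base_match
    denom = comb(pooled_total, today_total)
    upper = min(today_total, pooled_match)
    tail = sum(
        comb(pooled_match, k) * comb(pooled_total - pooled_match, today_total - k)
        for k in range(today_match, upper + 1)
    )
    return tail / denom


today = percent(58, 467)        # share of today's cases matching the rule
baseline = percent(653, 10000)  # share of baseline cases matching the rule
p_value = fisher_one_sided(58, 467, 653, 10000)
print(f"today {today:.2f}%  baseline {baseline:.2f}%  one-sided p = {p_value:.2e}")
```

Integer arithmetic is used for the tail sum so that the very large binomial coefficients do not lose precision before the final division.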
4Model non-outbreak conditions and notice
deviations
The methods above are non-specific: they look for anything unusual in the data, not specifically for the onset of an anthrax attack.
5Population-wide ANomaly Detection and Assessment
(PANDA)
- A detector specifically for a large-scale outdoor release of inhalational anthrax
- Uses a massive causal Bayesian network
- Population-wide approach: each person in the population is represented as a subnetwork in the overall model
6Population-Wide Approach
[Figure: network diagram. Node labels: Anthrax Release, Time of Release, Location of Release. Group labels: Global nodes, Interface nodes. Below them: one Person Model for each person in the population.]
- Note the conditional independence assumptions
- Anthrax is infectious but non-contagious
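The conditional independence noted above means that, once the release hypothesis is fixed, each person's evidence contributes an independent factor to the likelihood. A minimal sketch of the resulting posterior computation follows; the hypothesis names, the prior, and the per-person likelihoods are illustrative assumptions, not PANDA's actual parameters.

```python
import math


def posterior(priors, person_likelihoods):
    """Posterior over release hypotheses given per-person evidence.

    priors: dict mapping hypothesis -> prior probability.
    person_likelihoods: one dict per person, mapping
        hypothesis -> P(that person's evidence | hypothesis).
    Persons are assumed conditionally independent given the hypothesis,
    so their likelihoods multiply (summed here in log space).
    """
    log_scores = {}
    for hyp, prior in priors.items():
        score = math.log(prior)
        for lik in person_likelihoods:
            score += math.log(lik[hyp])
        log_scores[hyp] = score
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    weights = {hyp: math.exp(s - top) for hyp, s in log_scores.items()}
    norm = sum(weights.values())
    return {hyp: w / norm for hyp, w in weights.items()}


# Hypothetical numbers: two people whose evidence is five times more
# likely under a release than under no release.
example = posterior(
    {"no release": 0.99, "release": 0.01},
    [{"no release": 0.1, "release": 0.5}] * 2,
)
```

In the real model the hypothesis space would also range over the time and location of release; a single "release" state is used here only to keep the example short.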
7Population-Wide Approach
- Structure designed by expert judgment
- Parameters obtained from census data, training
data, and expert assessments informed by
literature and experience
8Person Model (Initial Prototype)
[Figure: two person models drawn side by side beneath the nodes Anthrax Release, Location of Release, and Time Of Release. Each person model contains the nodes Gender, Age Decile, Home Zip, Other ED Disease, Anthrax Infection, Respiratory from Anthrax, Respiratory CC From Other, Respiratory CC, ED Admit from Anthrax, ED Admit from Other, Respiratory CC When Admitted, and ED Admission.]
9Person Model (Initial Prototype)
[Figure: the same two person models beneath Anthrax Release, Location of Release, and Time Of Release, now with example evidence values shown on some nodes: Gender (Female, Male), Age Decile (20-30, 50-60), Home Zip (15213, 15146), Respiratory CC When Admitted (Unknown, False), ED Admission (Yesterday, never). The remaining nodes are Other ED Disease, Anthrax Infection, Respiratory from Anthrax, Respiratory CC From Other, Respiratory CC, ED Admit from Anthrax, and ED Admit from Other.]
10Prototype is Computationally Feasible
- Aside from caching tricks, there are two main optimizations
  - Incremental updating
  - Equivalence classes
- Performance
  - On a Pentium 4 3.0 GHz machine with 2 GB RAM: 45 seconds of initialization time, then 3 seconds for each hour's worth of ED data
See: Cooper G.F., Dash D.H., Levander J.D., Wong W-K., Hogan W.R., Wagner M.M. Bayesian Biosurveillance of Disease Outbreaks. In Proceedings of the 20th Conference on UAI. Banff, Canada: AUAI Press, 2004, pp. 94-103.
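The two optimizations named on this slide can be illustrated together. In the sketch below, people with identical attributes are grouped into one equivalence class so that each class's likelihood is evaluated once and weighted by its count, and a new ED record updates the total by moving one person between classes instead of recomputing everything. The class name, method names, and the `log_lik` callback are assumptions made for this example; the cited paper describes the actual implementation.

```python
from collections import Counter


class EquivalenceClassLikelihood:
    """Log-likelihood of a population under one fixed hypothesis."""

    def __init__(self, people, log_lik):
        # people: iterable of hashable attribute tuples, one per person.
        # log_lik: function(attrs) -> log P(evidence for attrs | hypothesis).
        self.log_lik = log_lik
        self.counts = Counter(people)
        # Each class is evaluated once and weighted by its size.
        self.total = sum(n * log_lik(attrs) for attrs, n in self.counts.items())

    def move(self, old_attrs, new_attrs):
        """Incremental update: one person changes equivalence class,
        for example when that person is admitted to the ED."""
        if self.counts[old_attrs] <= 0:
            raise ValueError("no person left in the source class")
        self.counts[old_attrs] -= 1
        self.counts[new_attrs] += 1
        self.total += self.log_lik(new_attrs) - self.log_lik(old_attrs)
```

With this layout, the cost of processing a new ED record depends on the number of hypotheses rather than on the size of the population.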
11What do you gain with a population-wide approach?
- Coherent framework for
  - Incorporating background knowledge
  - Incorporating different types of evidence
  - Data fusion
  - Explanation
121. Incorporating Background Knowledge
- Limited data from actual anthrax attacks available
  - Postal attacks, 2001 (only 11 people affected, not representative of a large-scale attack)
  - Sverdlovsk, 1979
- But the literature contains studies on the characteristics of inhalational anthrax
131. Incorporating Background Knowledge
- Can coherently incorporate different types of background knowledge, e.g. for inhalational anthrax
  - Progression of symptoms
  - Incubation period
  - Spatial dispersion pattern
- Progression of symptoms and incubation period are modeled at an individual level
- The spatial dispersion pattern can be represented by its effects over individuals
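As an illustration of background knowledge expressed at the individual level, the sketch below encodes an incubation period as a discrete distribution over days and asks how likely an infected person is to show symptoms within a given time after release. The probabilities are invented for the example and are not taken from the literature or from PANDA.

```python
# Hypothetical incubation distribution: P(symptom onset on day d after
# infection). The values are placeholders chosen to sum to 1.
INCUBATION_PMF = {1: 0.05, 2: 0.15, 3: 0.25, 4: 0.25, 5: 0.15, 6: 0.10, 7: 0.05}


def prob_onset_by(days_since_release):
    """P(an infected person has developed symptoms within this many days)."""
    return sum(p for day, p in INCUBATION_PMF.items() if day <= days_since_release)


for day in (1, 3, 7):
    print(day, round(prob_onset_by(day), 2))
```

A person model could use such a table as the conditional distribution of symptom onset given infection and time of release.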
162. Incorporating Evidence
- Easily incorporate different types of evidence, e.g. spatial, temporal, demographic, symptomatic
- Easily incorporate new evidence that distinguishes an individual (or individuals) from others
  - Modify the appropriate person model
173. Data Fusion
Two data sources: ED data (sample records below) and OTC (over-the-counter) data

Date / Time Admitted   Age     Gender   Home Zip   Chief Complaint
Aug 1, 2005 3:02       20-30   Male     15213      Shortness of breath
Aug 1, 2005 3:07       40-50   Male     15146      Diarrhea
Aug 1, 2004 3:09       70-80   Female   15132      Fever
- No data are available from an actual anthrax attack that capture the correlation between these two data sources.
- By modeling the actions of individuals, and incorporating background knowledge, we can come up with a plausible model of the effects of an attack on these two data sources.
183. Data Fusion
- ED data: individual patient records, usually available in real time
- OTC data: aggregated over zip code and available daily
193. Data Fusion
By representing the data at the finest granularity (i.e. each individual), we can easily deal with the different spatial and temporal granularities of the data sources being fused.
See: Wong W-K., Cooper G.F., Dash D.H., Dowling J.N., Levander J.D., Hogan W.R., Wagner M.M. Bayesian Biosurveillance Using Multiple Data Streams. In Proceedings of the 3rd National Syndromic Surveillance Conference, 2004.
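One way to picture how an individual-level model meets coarser data is to roll per-person predictions up to the granularity at which a source is reported. The sketch below sums hypothetical per-person OTC purchase probabilities into expected counts per zip code and day, which could then be compared with the aggregated OTC data. The record layout, the `purchase_prob` callback, and all numbers are assumptions made for this example.

```python
from collections import defaultdict


def expected_otc_counts(people, purchase_prob):
    """Aggregate individual predictions to (home zip, day) totals.

    people: iterable of dicts with at least a "home_zip" key.
    purchase_prob: function(person) -> dict mapping day -> probability
        that this person buys an OTC product on that day.
    """
    totals = defaultdict(float)
    for person in people:
        for day, prob in purchase_prob(person).items():
            totals[(person["home_zip"], day)] += prob
    return dict(totals)


# Hypothetical population of three people and a flat purchase model.
population = [{"home_zip": "15213"}, {"home_zip": "15213"}, {"home_zip": "15146"}]
counts = expected_otc_counts(population, lambda person: {0: 0.1, 1: 0.2})
```

The same per-person predictions can be left unaggregated when compared with individual ED records, which is what makes the finest granularity convenient.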
204. Explanation
- Important to know why the model believes an anthrax attack is occurring
- Can find the subset of evidence E that most influences such a belief
- In PANDA, E would correspond to a group of individuals
- Identify the individuals that most contribute to the hypothesis of an attack
214. Explanation
Currently, we identify the top equivalence classes that contribute the most to the hypothesis that an attack is occurring, for example:

Gender   Age Decile   Home Zip   Respiratory Symptoms   Date Admitted
M        20-30        15213      True                   2 days ago
F        20-30        15213      True                   2 days ago
M        30-40        15213      True                   2 days ago
F        40-50        15213      True                   2 days ago

Can also use the Bayesian network to calculate the most likely location and time of release
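The slides do not say how an equivalence class's contribution is measured. One plausible choice, sketched below, is the class's total log-likelihood ratio between the attack and no-attack hypotheses, i.e. its count times the per-person log ratio. The function names, the likelihood callbacks, and the numbers are assumptions made for this example.

```python
import math


def rank_classes(class_counts, lik_attack, lik_no_attack, top=3):
    """Rank equivalence classes by how strongly they favor an attack.

    class_counts: dict mapping attribute tuple -> number of people.
    lik_attack / lik_no_attack: function(attrs) -> P(evidence | hypothesis).
    Returns the `top` (attrs, contribution) pairs, largest first.
    """
    contributions = {
        attrs: n * (math.log(lik_attack(attrs)) - math.log(lik_no_attack(attrs)))
        for attrs, n in class_counts.items()
    }
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


def most_likely(posterior):
    """Most probable value of a release variable (location or time),
    given a dict mapping each value to its posterior probability."""
    return max(posterior, key=posterior.get)
```

Classes with a positive contribution push belief toward an attack; classes with a contribution near zero carry little information either way.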
22Future Work
- More sophisticated person models
- Improved explanation capabilities
- Validation of data fusion model
- More disease models apart from anthrax
- Contagious disease models
- Combining outputs from multiple Bayesian detectors
23Thank You!
RODS Laboratory: http://rods.health.pitt.edu
Bayesian Biosurveillance: http://www.cbmi.pitt.edu/panda/
This research was supported by grants IIS-0325581
from the National Science Foundation,
F30602-01-2-0550 from the Department of Homeland
Security, and ME-01-737 from the Pennsylvania
Department of Health.