1
Population-Wide Anomaly Detection
  • Weng-Keen Wong¹, Gregory Cooper², Denver Dash³,
    John Levander², John Dowling², Bill Hogan²,
    Michael Wagner²

¹ School of Electrical Engineering and Computer Science, Oregon State University
² Realtime Outbreak and Disease Surveillance Laboratory, University of Pittsburgh
³ Intel Research, Santa Clara
2
Motivation
Date / Time Admitted   Age     Gender   Home Zip   Chief Complaint
Aug 1, 2005 3:02       20-30   Male     15213      Shortness of breath
Aug 1, 2005 3:07       40-50   Male     15146      Diarrhea
Aug 1, 2004 3:09       70-80   Female   15132      Fever
  • Suppose you monitor Emergency Department (ED)
    data that arrives in real time
  • Can you specifically detect a large-scale anthrax
    attack?

3
Model non-outbreak conditions and notice
deviations
Traditional univariate methods, e.g., control chart,
CUSUM, EWMA, time series models
Spatial methods, e.g., spatial scan statistic
Multivariate methods, e.g., WSARE
2. Sat 2001-03-13 SCORE = -0.00000464 PVALUE = 0.00000000
   12.42% ( 58/467) of today's cases have 20 <= Age < 30 AND Respiratory Syndrome = True
    6.53% (653/10000) of baseline have 20 <= Age < 30 AND Respiratory Syndrome = True
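The comparison behind a rule like the one above can be sketched as a 2x2 contingency test between today's cases and the baseline. The snippet below is a minimal illustration, assuming a one-sided Fisher's exact test on the counts shown on the slide. The function names are hypothetical, and the sketch does not attempt to reproduce the SCORE or PVALUE that WSARE reports.

```python
from math import comb

def rule_percentages(today_match, today_total, base_match, base_total):
    """Percentage of records matching the rule, today versus baseline."""
    return (100.0 * today_match / today_total,
            100.0 * base_match / base_total)

def fisher_one_sided(today_match, today_total, base_match, base_total):
    """P(at least today_match matches today) under the hypergeometric null.

    The null hypothesis is that today's records and the baseline records
    are drawn from the same pool, so matches are spread at random.
    """
    total = today_total + base_total
    matches = today_match + base_match
    upper = min(matches, today_total)
    tail = sum(
        comb(matches, k) * comb(total - matches, today_total - k)
        for k in range(today_match, upper + 1)
    )
    return tail / comb(total, today_total)

# Counts taken from the rule shown on the slide.
today_pct, base_pct = rule_percentages(58, 467, 653, 10000)
p_value = fisher_one_sided(58, 467, 653, 10000)
```

A small `p_value` here only says that the rule matches unusually often today. It says nothing about the cause, which is the limitation the next slide raises.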
4
Model non-outbreak conditions and notice
deviations
These are non-specific methods: they look for
anything unusual in the data, but not specifically
for the onset of an anthrax attack.
5
Population-wide ANomaly Detection and Assessment
(PANDA)
  • A detector specifically for a large-scale outdoor
    release of inhalational anthrax
  • Uses a massive causal Bayesian network
  • Population-wide approach: each person in the
    population is represented as a subnetwork in the
    overall model

6
Population-Wide Approach
[Figure: network structure. Labels: Global nodes, Interface nodes, Anthrax Release, Time of Release, Location of Release; one Person Model for each person in the population]
  • Note the conditional independence assumptions
  • Anthrax is infectious but non-contagious
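
One way to read the conditional independence assumption: because anthrax is non-contagious, a person's evidence depends on the release only through the interface nodes, not on any other person. As a sketch, write $I$ for the interface nodes, $E_i$ for the evidence about person $i$, and $N$ for the population size. These symbols are assumptions for illustration and do not come from the slides.

```latex
P(E_1, \dots, E_N \mid I) = \prod_{i=1}^{N} P(E_i \mid I),
\qquad
P(I \mid E_1, \dots, E_N) \propto P(I) \prod_{i=1}^{N} P(E_i \mid I)
```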

7
Population-Wide Approach
[Figure: the same network structure diagram as on slide 6]
  • Structure designed by expert judgment
  • Parameters obtained from census data, training
    data, and expert assessments informed by
    literature and experience

8
Person Model (Initial Prototype)
[Figure: two person-model subnetworks drawn side by side below the nodes Anthrax Release, Location of Release, and Time Of Release. Each person model contains the nodes: Gender, Age Decile, Home Zip, Other ED Disease, Anthrax Infection, Respiratory from Anthrax, Respiratory CC From Other, Respiratory CC, ED Admit from Anthrax, ED Admit from Other, Respiratory CC When Admitted, ED Admission]
9
Person Model (Initial Prototype)
[Figure: the person-model diagram from slide 8 with example values filled in for two people. Values shown, in order of appearance: Gender (Female, Male); Age Decile (20-30, 50-60); Home Zip (15213, 15146); Respiratory CC When Admitted (Unknown, False); ED Admission (Yesterday, never)]
10
Prototype is Computationally Feasible
  • Aside from caching tricks, there are two main
    optimizations
      • Incremental updating
      • Equivalence classes
  • Performance
      • On a P4 3.0 GHz machine with 2 GB RAM: 45 seconds of
        initialization time, then 3 seconds for each hour's
        worth of ED data

See Cooper G.F., Dash D.H., Levander J.D., Wong W-K.,
Hogan W.R., Wagner M.M. Bayesian Biosurveillance of
Disease Outbreaks. In Proceedings of the 20th Conference
on Uncertainty in Artificial Intelligence (UAI). Banff,
Canada: AUAI Press, 2004. pp. 94-103.
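
The two optimizations named above can be sketched as follows. This is a minimal illustration under stated assumptions, not PANDA's implementation: people with identical attribute values are grouped into one equivalence class so their likelihood term is computed once, and when a person is admitted the running total is adjusted instead of being recomputed. The function names and the toy probabilities are hypothetical.

```python
from collections import Counter
from math import log, isclose

def toy_loglik(person):
    """Hypothetical log-likelihood of one person's state under one fixed
    release scenario. The probabilities are made up for illustration."""
    gender, age_decile, home_zip, admitted = person
    p_admit = 0.02 if home_zip == "15213" else 0.01
    return log(p_admit if admitted else 1.0 - p_admit)

def total_loglik(counts, loglik):
    """Equivalence classes: one likelihood evaluation per distinct class,
    weighted by how many people fall in that class."""
    return sum(n * loglik(cls) for cls, n in counts.items())

def admit(counts, total, person, loglik):
    """Incremental updating: move one person to the 'admitted' class and
    adjust the running total by the difference of the two terms."""
    admitted_person = person[:3] + (True,)
    counts[person] -= 1
    counts[admitted_person] += 1
    return total - loglik(person) + loglik(admitted_person)

# 3000 people collapse into two equivalence classes.
population = ([("M", "20-30", "15213", False)] * 1000
              + [("F", "40-50", "15146", False)] * 2000)
counts = Counter(population)
initial_total = total_loglik(counts, toy_loglik)

# One ED admission arrives; update without revisiting the whole population.
total = admit(counts, initial_total, ("M", "20-30", "15213", False), toy_loglik)
```

With real census-sized populations the number of classes stays far smaller than the number of people, which is why grouping them makes the initialization pass affordable.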
11
What do you gain with a population-wide approach?
  • Coherent framework for:
      • Incorporating background knowledge
      • Incorporating different types of evidence
      • Data fusion
      • Explanation

12
1. Incorporating Background Knowledge
  • Limited data from actual anthrax attacks
    available
      • Postal attacks 2001 (only 11 people affected, not
        representative of a large-scale attack)
      • Sverdlovsk 1979
  • But literature contains studies on the
    characteristics of inhalational anthrax

13
1. Incorporating Background Knowledge
  • Can coherently incorporate different types of
    background knowledge, e.g., for inhalational
    anthrax:
      • Progression of symptoms
      • Incubation period
      • Spatial dispersion pattern

14
1. Incorporating Background Knowledge
At an individual level
15
1. Incorporating Background Knowledge
Can represent this by the effects over individuals
16
2. Incorporating Evidence
  • Easily incorporate different types of evidence,
    e.g., spatial, temporal, demographic, symptomatic
  • Easily incorporate new evidence that
    distinguishes an individual (or individuals) from
    others
      • Modify the appropriate person model

17
3. Data Fusion
[Figure: ED data (sample records below) shown alongside OTC data]
Date / Time Admitted   Age     Gender   Home Zip   Chief Complaint
Aug 1, 2005 3:02       20-30   Male     15213      Shortness of breath
Aug 1, 2005 3:07       40-50   Male     15146      Diarrhea
Aug 1, 2004 3:09       70-80   Female   15132      Fever
  • No data available during an actual anthrax attack
    that captures the correlation between these two
    data sources.
  • By modeling the actions of individuals, and
    incorporating background knowledge, we can come
    up with a plausible model of the effects of an
    attack on these two data sources.

18
3. Data Fusion
ED data: individual patient records, usually
available in real time
OTC data: aggregated over zip code and available
daily
19
3. Data Fusion
By representing at the finest granularity (i.e.,
each individual), we can easily deal with
different spatial and temporal granularities in
data fusion.
See Wong W-K., Cooper G.F., Dash D.H., Dowling
J.N., Levander J.D., Hogan W.R., Wagner M.M.
Bayesian Biosurveillance Using Multiple Data
Streams. In Proceedings of the 3rd National
Syndromic Surveillance Conference, 2004.
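
As a small illustration of why individual-level modeling helps with mixed granularity, per-person quantities can be rolled up to whatever level a data source reports. The sketch below assumes each person carries a hypothetical probability of making an OTC purchase on a given day and sums those into an expected daily count per zip code, the level at which OTC data arrives. The function name and the numbers are made up for illustration.

```python
from collections import defaultdict

def expected_zip_counts(people):
    """Sum per-person purchase probabilities into an expected daily
    count for each home zip code."""
    totals = defaultdict(float)
    for home_zip, p_purchase in people:
        totals[home_zip] += p_purchase
    return dict(totals)

# (home zip, hypothetical probability of an OTC purchase today)
people = [("15213", 0.10), ("15213", 0.30), ("15146", 0.05)]
zip_counts = expected_zip_counts(people)
```

The same per-person records can be compared one by one against ED admissions, so both data sources are explained by a single underlying model.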
20
4. Explanation
  • Important to know why the model believes an
    anthrax attack is occurring
  • Can find the subset of evidence E that most
    influences such a belief
  • In PANDA, E would correspond to a group of
    individuals
  • Identify the individuals that most contribute to
    the hypothesis of an attack
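
A minimal sketch of how such a ranking might be computed, assuming the contribution of a group is measured by its log-likelihood ratio between the attack and no-attack hypotheses. The slides do not say which measure PANDA uses, so the measure, the function name, and the probabilities below are all assumptions.

```python
from math import log

def rank_contributions(group_evidence):
    """Rank groups by count * (log P(e | attack) - log P(e | no attack)).

    group_evidence maps a group key to a tuple
    (count, log-likelihood under attack, log-likelihood under no attack).
    """
    scored = {
        group: n * (ll_attack - ll_none)
        for group, (n, ll_attack, ll_none) in group_evidence.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)

# Hypothetical groups keyed by (gender, age decile, home zip).
evidence = {
    ("M", "20-30", "15213"): (5, log(0.040), log(0.010)),
    ("F", "40-50", "15146"): (3, log(0.011), log(0.010)),
    ("F", "70-80", "15132"): (2, log(0.009), log(0.010)),
}
ranked = rank_contributions(evidence)
```

Groups with a positive score push belief toward an attack, and groups with a negative score push against it, so the head of `ranked` is a natural candidate for the explanation.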

21
4. Explanation
Currently, we identify the top equivalence
classes that contribute the most to the
hypothesis that an attack is occurring:
Gender   Age Decile   Home Zip   Respiratory Symptoms   Date Admitted
M        20-30        15213      True                   2 days ago
F        20-30        15213      True                   2 days ago
M        30-40        15213      True                   2 days ago
F        40-50        15213      True                   2 days ago
Can also use the Bayesian network to calculate
the most likely location of release and time of
release
22
Future Work
  • More sophisticated person models
  • Improved explanation capabilities
  • Validation of data fusion model
  • More disease models apart from anthrax
  • Contagious disease models
  • Combining outputs from multiple Bayesian detectors

23
Thank You!
RODS Laboratory: http://rods.health.pitt.edu
Bayesian Biosurveillance: http://www.cbmi.pitt.edu/panda/
This research was supported by grants IIS-0325581
from the National Science Foundation,
F30602-01-2-0550 from the Department of Homeland
Security, and ME-01-737 from the Pennsylvania
Department of Health.