Space-Time Scan Statistics for Early Warning Systems

1
Space-Time Scan Statistics for Early Warning Systems
  • Martin Kulldorff
  • Department of Ambulatory Care and Prevention
  • Harvard University Medical School
  • and Harvard Pilgrim Health Care

2
Content
  • Background on Disease Surveillance
  • Purely Spatial Scan Statistics: Brain Cancer in
    the United States
  • Early Warning System using a Space-Time
    Permutation Scan Statistic: Syndromic
    Surveillance in New York City
  • Various Extensions

3
Collaborators
  • Harvard Medical School: Ken Kleinman, Richard
    Platt, Katherine Yih
  • New York City Department of Health: Jessica
    Hartman, Rick Heffernan, Farzad Mostashari
  • University of Connecticut: David Gregorio, Zixing
    Fang
  • Universidade Federal de Minas Gerais: Renato
    Assunção, Luiz Duczmal

4
Importance of Early Disease Outbreak Detection
  • Eliminate health hazards
  • Warn about risk factors
  • Earlier diagnosis of new cases
  • Quarantine cases
  • Scientific research concerning treatments,
    vaccines, etc.
  • Early detection is especially critical for
    infectious diseases.

5
Disease Surveillance
  • Data Sources
    • Disease Registries
    • Reportable Diseases
    • Electronic Health Records
    • Health Insurance Claims Data
    • Vital Statistics (Mortality)
  • Types of Data
    • Diagnosed Diseases
    • Symptoms (Syndromic Surveillance)
    • Lab Test Results
    • Pharmaceutical Drug Sales

6
Disease Surveillance
  • Frequency of Analyses
    • Daily
    • Weekly
    • Monthly
    • Yearly

7
Purely Temporal Methods
  • Farrington CP, Andrews NJ, Beale AD, Catchpole MA
    (1996) A statistical algorithm for the early
    detection of outbreaks of infectious disease. J R
    Stat Soc A Stat Soc 159: 547-563.
  • Hutwagner LC, Maloney EK, Bean NH, Slutsker L,
    Martin SM (1997) Using laboratory-based
    surveillance data for prevention: An algorithm
    for detecting salmonella outbreaks. Emerg Infect
    Dis 3: 395-400.
  • Nobre FF, Stroup DF (1994) A monitoring system to
    detect changes in public health surveillance
    data. Int J Epidemiol 23: 408-418.
  • Reis B, Mandl K (2003) Time series modeling for
    syndromic surveillance. BMC Med Inform Decis Mak
    3: 2.

8
Three Important Issues
  • An outbreak may start locally.
  • Purely temporal methods can be used
    simultaneously for multiple geographical areas,
    but that leads to multiple testing.
  • Disease outbreaks may not conform to the
    pre-specified geographical areas.

9
Why Use a Scan Statistic?
  • With disease outbreaks:
    • We do not know where they will occur.
    • We do not know their geographical size.
    • We do not know when they will occur.
    • We do not know how rapidly they will emerge.

10
One-Dimensional Scan Statistic
11
The Spatial Scan Statistic
  • Create a regular or irregular grid of
    centroids covering the whole study region.
  • Create an infinite number of circles around
    each centroid, with the radius anywhere from zero
    up to a maximum so that at most 50 percent of the
    population is included.
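
With aggregated data (for example county centroids), the "infinite number of circles" collapses to a finite set, because growing the radius only changes a circle's contents when it reaches another location. A minimal Python sketch of that enumeration, assuming planar coordinates and Euclidean distance; the function name `candidate_circles` and its arguments are illustrative and not taken from SaTScan:

```python
import numpy as np

def candidate_circles(coords, population, max_fraction=0.5):
    """Return (center_index, member_indices) pairs, one per distinct circle."""
    coords = np.asarray(coords, dtype=float)
    population = np.asarray(population, dtype=float)
    limit = max_fraction * population.sum()
    circles = []
    for i in range(len(coords)):
        # Distance from centroid i to every location (itself included).
        dist = np.linalg.norm(coords - coords[i], axis=1)
        order = np.argsort(dist, kind="stable")
        cum_pop = np.cumsum(population[order])
        # Growing the radius adds locations one at a time, nearest first.
        for k in range(1, len(order) + 1):
            if cum_pop[k - 1] > limit:
                break  # circle would hold more than the allowed population
            circles.append((i, order[:k].copy()))
    return circles
```

Each returned pair can then be scored with the likelihood function described on the following slides.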

13
  • For each circle:
    • Obtain actual and expected number of cases
      inside and outside the circle.
    • Calculate likelihood function.
  • Compare circles:
    • Pick circle with highest likelihood function as
      Most Likely Cluster.
  • Inference:
    • Generate random replicas of the data set under
      the null hypothesis of no clusters (Monte Carlo
      sampling).
    • Compare most likely clusters in real and random
      data sets (likelihood ratio test).
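
The three steps above can be sketched end to end in Python. This is an illustration under stated assumptions, not the SaTScan implementation: candidate zones are passed in as lists of location indices, the expected counts are assumed to be scaled so that they sum to the total number of cases, and random replicas are drawn from a multinomial distribution with probabilities proportional to the expected counts. The function names are hypothetical.

```python
import numpy as np

def max_log_lr(cases, expected, zones):
    """Largest Poisson log likelihood ratio over the candidate zones."""
    C = cases.sum()
    best = 0.0
    for zone in zones:
        c, mu = cases[zone].sum(), expected[zone].sum()
        if c > mu:  # only zones with an excess of cases are of interest
            llr = c * np.log(c / mu)
            if c < C:
                llr += (C - c) * np.log((C - c) / (C - mu))
            best = max(best, llr)
    return best

def scan_p_value(cases, expected, zones, n_replicas=999, seed=0):
    """Return (observed statistic, Monte Carlo p-value)."""
    rng = np.random.default_rng(seed)
    cases = np.asarray(cases)
    expected = np.asarray(expected, dtype=float)
    observed = max_log_lr(cases, expected, zones)
    C = int(cases.sum())
    probs = expected / expected.sum()
    # Null hypothesis: cases fall in locations in proportion to expected counts.
    sims = [max_log_lr(rng.multinomial(C, probs), expected, zones)
            for _ in range(n_replicas)]
    rank = 1 + sum(s >= observed for s in sims)
    return observed, rank / (n_replicas + 1)
```

Because the same set of zones is scanned in the real and the random data sets, the p-value already accounts for the multiple testing across circles.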

14
Poisson Likelihood Function
  • $\left(\frac{c}{\mu}\right)^{c} \times \left(\frac{C-c}{C-\mu}\right)^{C-c}$
  • $c$ = cases in circle
  • $\mu$ = expected cases in circle
  • $C$ = total cases
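
A small Python sketch of this function, computed on the log scale to avoid overflow with large counts. Returning 0 when the circle has no excess of cases is an assumption borrowed from the space-time version of the statistic later in the talk; the function name and the numbers in the usage line are made up for illustration.

```python
import math

def poisson_log_lr(c, mu, C):
    """log of (c/mu)^c * ((C-c)/(C-mu))^(C-c); 0.0 when c does not exceed mu."""
    if c <= mu:
        return 0.0
    inside = c * math.log(c / mu)
    # The outside term vanishes when every case lies inside the circle.
    outside = (C - c) * math.log((C - c) / (C - mu)) if c < C else 0.0
    return inside + outside

# Hypothetical circle: 20 observed cases, 10 expected, 100 cases in total.
example = poisson_log_lr(20, 10, 100)
```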

15
Spatial Scan Statistic Properties
  • Adjusts for inhomogeneous population density.
  • Simultaneously tests for clusters of any size and
    any location, by using circular windows with
    continuously variable radius.
  • Accounts for multiple testing.
  • Possibility to include confounding variables,
    such as age, sex or socio-economic variables.
  • Aggregated or non-aggregated data (states,
    counties, census tracts, block groups,
    households, individuals).

16
U.S. Brain Cancer Mortality 1986-1995
                    deaths   rate (95% CI)
Children (age <20)  5,062    0.75 (0.66-0.83)
Adults (age 20+)    106,710  6.0 (5.8-6.2)
Adult Women         48,650   4.9 (4.7-5.0)
Adult Men           58,060   7.2 (7.0-7.5)
Rate: annual deaths / 100,000
17
Brain Cancer
  • Known risk factors
    • High dose ionizing radiation
    • Selected congenital and genetic disorders
    • Explain only a small percent of cases.
  • Potential risk factors
    • N-nitroso compounds?, phenols?, pesticides?,
      polycyclic aromatic hydrocarbons?, organic
      solvents?

18
Adjustments
All subsequent analyses were adjusted for:
  • Age
  • Gender
  • Ethnicity (African-American, White, Other)

19
Brain Cancer Mortality, Children 1986-1995
20
Spatial Scan Statistic, Children
21
Children: Seven Most Likely Clusters
Cluster                   Obs    Exp    RR     p
1. Carolinas              86     51     1.7    0.24
2. California             16     4.9    3.3    0.74
3. Michigan               318    250    1.3    0.74
4. S Carolina             24     10     2.5    0.79
5. Kentucky-Tenn          127    88     1.4    0.79
6. Wisconsin              10     2.4    4.1    0.98
7. Nebraska               12     3.6    3.3    0.99
22
Conclusions: Children
No statistically significant clusters detected.
Any part of the pattern seen on the original map may be due to chance.
23
What About Adults?
24
Brain Cancer Mortality, Adults 1986-1995
25
Spatial Scan Statistic, Adults
26
Spatial Scan Statistic, Women
27
Women: Most Likely Clusters
Cluster                   Obs    Exp    RR     p
1. Arkansas et al.        2830   2328   1.22   0.0001
2. Carolinas              1783   1518   1.17   0.0001
3. Oklahoma et al.        1709   1496   1.14   0.003
4. Minnesota et al.       2616   2369   1.10   0.01
10. N.J. / N.Y.           1809   2300   0.79   0.0001
11. S Texas               127    214    0.59   0.0001
12. New Mexico et al.     849    1049   0.81   0.0001
28
Spatial Scan Statistic, Men
29
Men: Most Likely Clusters
Cluster                   Obs    Exp    RR     p
1. Kentucky et al.        3295   2860   1.15   0.0001
2. Carolinas              1925   1658   1.16   0.0001
3. Arkansas et al.        1143   964    1.19   0.001
4. Washington et al.      1664   1455   1.14   0.003
5. Michigan               1251   1074   1.17   0.005
11. N.J. / N.Y.           2084   2615   0.80   0.0001
12. S Texas               157    262    0.60   0.0001
13. New Mexico et al.     1418   1680   0.84   0.0001
14. Upstate N.Y. et al.   1642   1895   0.87   0.0001
30
Conclusions: Adults
It is possible to pinpoint specific areas with higher and lower rates
that are statistically significant and unlikely to be due to chance.
The exact borders of detected clusters are uncertain.
Similar patterns for men and women.
31
Conclusion: General
The spatial scan statistic can be useful as an
addition to disease maps, in order to determine
whether the observed patterns are likely due to chance.
It is a complement to, rather than a replacement
for, regular disease maps.
32
Space-Time Scan Statistic
Use a cylindrical window, with the circular base
representing space and the height representing
time. We will only consider cylinders that reach
the present time.
33
  • For each cylinder:
    • Obtain actual and expected number of cases
      inside and outside the cylinder.
    • Calculate likelihood function.
  • Compare cylinders:
    • Pick cylinder with highest likelihood function
      as Most Likely Cluster.
  • Inference:
    • Generate random replicas of the data set under
      the null hypothesis of no clusters (Monte Carlo
      sampling).
    • Compare most likely clusters in real and random
      data sets (likelihood ratio test).

35
Space-Time Permutation Scan Statistic
  • 1. For each cylinder, calculate the expected
    number of cases, conditioning on the marginals:
  • $\mu_{st} = \left(\sum_{s} c_{st}\right) \times \left(\sum_{t} c_{st}\right) / C$
  • where $c_{st}$ = cases at time $t$ in location $s$
  • and $C$ = total number of cases
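
A short Python sketch of this calculation, assuming the data are held as a location-by-time matrix of case counts (the array layout and the function name are illustrative choices). The expected count for a whole cylinder is then the sum of the cell values it contains.

```python
import numpy as np

def expected_from_marginals(counts):
    """counts[s, t] = cases in location s at time t; returns mu[s, t]."""
    counts = np.asarray(counts, dtype=float)
    C = counts.sum()
    by_location = counts.sum(axis=1)  # total per location, summed over time
    by_time = counts.sum(axis=0)      # total per time, summed over locations
    return np.outer(by_location, by_time) / C

mu = expected_from_marginals([[1, 2], [3, 4]])
```

Only case counts enter the formula, so no population-at-risk data are needed.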

36
Space-Time Permutation Scan Statistic
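  • 2. For each cylinder, calculate
    $T_{st} = \left(\frac{c_{st}}{\mu_{st}}\right)^{c_{st}} \times \left(\frac{C-c_{st}}{C-\mu_{st}}\right)^{C-c_{st}}$ if $c_{st} > \mu_{st}$,
    and $T_{st} = 1$ otherwise.
  • 3. Test statistic: $T = \max_{st} T_{st}$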
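
Steps 2 and 3 can be sketched in Python as follows. The sketch works with log T, which has the same maximizer as T, and only considers cylinders that end on the most recent day. The candidate zones are assumed to be supplied as lists of location indices, and all names are hypothetical.

```python
import math
import numpy as np

def log_T(c, mu, C):
    """log T_st; returns 0.0 (T_st = 1) unless c exceeds mu."""
    if c <= mu:
        return 0.0
    value = c * math.log(c / mu)
    if c < C:
        value += (C - c) * math.log((C - c) / (C - mu))
    return value

def most_likely_cylinder(counts, zones, max_days=7):
    """counts[s, t]: cases per location and day; zones: lists of location indices.

    Returns (log T, zone, days) for the best cylinder ending on the last day.
    """
    counts = np.asarray(counts, dtype=float)
    C = counts.sum()
    # Expected counts per cell, conditioned on the row and column totals.
    mu = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / C
    n_days = counts.shape[1]
    best = (0.0, None, None)
    for zone in zones:
        for days in range(1, min(max_days, n_days) + 1):
            window = slice(n_days - days, n_days)  # cylinder reaches the present
            value = log_T(counts[zone, window].sum(), mu[zone, window].sum(), C)
            if value > best[0]:
                best = (value, list(zone), days)
    return best
```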
37
Space-Time Permutation Scan Statistic
  • 4. Generate random replicas of the data set
    conditioned on the marginals, by permuting the
    pairs of spatial locations and times.
  • 5. Compare test statistic in real and random data
    sets using Monte Carlo hypothesis testing (Dwass,
    1957)
  • $p = \mathrm{rank}(T_{\mathrm{real}}) / (1 + \#\mathrm{replicas})$
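
Steps 4 and 5 can be sketched in Python. Shuffling the times among the cases while leaving their locations fixed keeps both the number of cases per location and the number of cases per day unchanged, which is what conditioning on the marginals requires. The `statistic` argument is a placeholder for any function that computes the test statistic from case-level locations and times; all names are assumptions made for illustration.

```python
import numpy as np

def permutation_p_value(locations, times, statistic, n_replicas=999, seed=0):
    """Monte Carlo p-value: rank of the real statistic among the replicas."""
    rng = np.random.default_rng(seed)
    locations = np.asarray(locations)
    times = np.asarray(times)
    t_real = statistic(locations, times)
    # Each replica re-pairs the same locations with a shuffled copy of the times.
    t_sim = [statistic(locations, rng.permutation(times))
             for _ in range(n_replicas)]
    rank = 1 + sum(t >= t_real for t in t_sim)
    return rank / (1 + n_replicas)
```

With 999 replicas the smallest attainable p-value is 1 / 1000.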

38
Space-Time Permutation Scan Statistic Properties
  • Adjusts for purely geographical clusters.
  • Adjusts for purely temporal clusters.
  • Simultaneously tests for outbreaks of any size at
    any location, by using cylindrical windows with
    variable radius and height.
  • Accounts for multiple testing.
  • Aggregated or non-aggregated data (counties,
    zip-code areas, census tracts, individuals, etc).

40
Let's Try It!
  • Historic data, Nov 15, 2001 to Nov 14, 2002
  • Diarrhea, all age groups
  • Use last 30 days of data.
  • Temporal window size: 1-7 days
  • Spatial window size: 0-5 kilometers
  • Residential zip code and hospital coordinates

41
Results: Hospital Analyses
   Date    days  hosp  cases  exp    RR   p       recurrence interval
A  Nov 21  6     1     101    73.6   1.4  0.0008  1 / 3.4 years
B  Jan 11  1     1     10     2.3    4.4  0.0007  1 / 3.9 years
C  Feb 26  4     2     97     66.9   1.4  0.0018  1 / 1.5 years
D  Mar 31  2     1     38     19.2   2.0  0.0017  1 / 1.6 years
E  Nov 1   6     3     122    86.6   1.4  0.0017  1 / 1.6 years
F  Nov 2   7     3     135    98.3   1.4  0.0008  1 / 3.4 years
42
Results: Residential Analyses
   Date    days  zips  cases  exp    RR   p       recurrence interval
G  Feb 9   2     15    63     34.7   1.8  0.0005  1 / 5.5 years
H  Mar 7   2     8     63     37.3   1.7  0.0027  1 / 1.0 years
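
The recurrence intervals in the hospital and residential results are consistent with taking the reciprocal of the p-value and counting it in daily analyses, for example 1 / 0.0008 = 1250 days, or about 3.4 years. The slides do not state this rule, so the Python sketch below is an inference from the reported numbers, and the function name is hypothetical.

```python
def recurrence_interval_days(p, analyses_per_day=1):
    """Expected number of days between signals at least this strong under the null."""
    return 1.0 / (p * analyses_per_day)

# Signal A in the hospital analyses: p = 0.0008.
years_a = recurrence_interval_days(0.0008) / 365.25
# Signal G in the residential analyses: p = 0.0005.
years_g = recurrence_interval_days(0.0005) / 365.25
```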
45
Real-Time Daily Analyses
  • Starting November 1, 2003.
  • Respiratory, Fever/Flu, Diarrhea, (Vomiting)
  • Hospital (and Residential) Analyses
  • Spatial window size: 0-5 kilometers
  • Temporal window size: 1-7 days

46
Real-Time Results, Nov 24, 2003: Hospital Analysis
Syndrome     days  hosp  cases  exp    RR   p      recurrence interval
Respiratory  2     3     80     57.4   1.4  0.13   every 8 days
Fever/Flu    3     1     24     14.8   1.6  0.68   every day
Diarrhea     2     4     18     8.2    2.2  0.04   every 26 days
47
Real-Time Results, Nov 25, 2003: Hospital Analysis
Syndrome     days  hosp  cases  exp    RR   p      recurrence interval
Respiratory  7     1     45     30.4   1.5  0.46   every 2 days
Fever/Flu    1     5     50     31.5   1.6  0.04   every 23 days
Diarrhea     3     4     22     11.5   1.9  0.17   every 6 days
48
Real-Time Results, Nov 26, 2003: Hospital Analysis
Syndrome     days  hosp  cases  exp    RR   p      recurrence interval
Respiratory  5     2     233    199.4  1.1  0.63   every 2 days
Fever/Flu    7     7     299    252.1  1.2  0.05   every 22 days
Diarrhea     4     4     23     12.6   1.8  0.22   every 5 days
49
Real-Time Results, Nov 27, 2003: Hospital Analysis
Syndrome     days  hosp  cases  exp    RR   p      recurrence interval
Respiratory  1     4     41     26.9   1.5  0.45   every 2 days
Fever/Flu    6     4     181    142.9  1.3  0.03   every 36 days
Diarrhea     5     3     29     14.1   1.7  0.50   every 2 days
50
Real-Time Results, Nov 28, 2003: Hospital Analysis
Syndrome     days  hosp  cases  exp    RR   p      recurrence interval
Respiratory  2     4     98     78.8   1.2  0.82   every day
Fever/Flu    7     5     228    178.0  1.3  0.001  every 1000 days
Diarrhea     6     3     29     17.5   1.5  0.26   every 4 days
51
Real-Time Results, Nov 29, 2003: Hospital Analysis
Syndrome     days  hosp  cases  exp    RR   p      recurrence interval
Respiratory  7     2     146    123.6  1.2  0.95   every day
Fever/Flu    7     4     253    195.7  1.3  0.001  every 1000 days
Diarrhea     7     4     44     29.4   1.5  0.21   every 5 days
52
Real-Time Results, Nov 30, 2003: Hospital Analysis
Syndrome     days  hosp  cases  exp    RR   p      recurrence interval
Respiratory  1     1     19     10.7   1.8  0.69   every day
Fever/Flu    6     9     429    364.1  1.2  0.002  every 500 days
Diarrhea     1     5     12     4.4    2.7  0.06   every 17 days
53
Summary
  • Four strong diarrhea signals:
    • Two were early signals for city-wide outbreaks,
      likely due to norovirus.
    • One was an early signal for a city-wide outbreak
      among children, likely due to rotavirus.
    • One small outbreak of unknown etiology.
  • Three medium-strength diarrhea signals:
    • All during the rotavirus outbreak, possibly due
      to a shift in the geographical epicenter.
  • One real-time fever/flu signal, coinciding with
    the start of the flu season.

54
Different Data Streams
  • For example:
    • Nurses Hotline Calls
    • Regular Physician Visits
    • Emergency Department Visits
    • Ambulance Dispatches
    • Pharmaceutical Drug Sales
    • Lab Test Results

55
Multiple Data Streams
  • For each cylinder, add the Poisson log
    likelihoods of the individual data streams:
  • $T_{st} = \log T_{1,st} + \log T_{2,st} + \log T_{3,st}$
  • Test statistic: $T = \max_{st} T_{st}$
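
A minimal Python sketch of the combination for one cylinder. It assumes that each data stream contributes its own Poisson log likelihood ratio, that a stream without an excess of cases contributes nothing, and that the streams are described by hypothetical (cases, expected, total) triples.

```python
import math

def stream_log_lr(c, mu, C):
    """Poisson log likelihood ratio for one data stream; 0.0 without an excess."""
    if c <= mu:
        return 0.0
    value = c * math.log(c / mu)
    if c < C:
        value += (C - c) * math.log((C - c) / (C - mu))
    return value

def combined_log_lr(streams):
    """streams: iterable of (c, mu, C) for the same cylinder, one per stream."""
    return sum(stream_log_lr(c, mu, C) for c, mu, C in streams)

# Hypothetical cylinder: an excess in the first stream, none in the second.
combined = combined_log_lr([(20, 10, 100), (5, 10, 100)])
```

The test statistic is then the maximum of the combined value over all cylinders, evaluated the same way in the real and the random data sets.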

56
Syndromic Surveillance in Boston: Upper and Lower GI
  • Harvard Pilgrim Health Care HMO members cared for
    by Harvard Vanguard Medical Associates
  • Historical Data from Jan 1 to Dec 31, 2002
  • Mimicking Surveillance from Sept 1 to Dec 31, 2002

57
Three Data Streams
  • Telephone Calls ( 20 / day)
  • Urgent Care Visits ( 9 / day)
  • Regular Physician Visits ( 22 / day)
  • Multiple contacts by the same person removed.

58
Strongest Signal October 18
  • Recurrence Interval
    • Multiple Data Streams: < 1 / 1000 days
    • Single Data Streams:
      • Tele: < 1 / 1000 days
      • Urgent: every day
      • Regular: every day

59
October 18 Signal
  • Friday
  • Number of Cases: 5
  • Expected Cases: 0.04
  • Location: Zip Code 01740
  • Time Length: One Day

60
October 18 Signal
  • Friday
  • Number of Cases: 5
  • Expected Cases: 0.04
  • Location: Zip Code 01740
  • Time Length: One Day
  • Diagnosis: Pinworm Infestation (all 5)

61
October 18 Signal
  • Friday
  • Number of Cases: 5 (all tele)
  • Expected Cases: 0.04
  • Location: Zip Code 01740
  • Time Length: One Day
  • Diagnosis: Pinworm Infestation (all 5)
  • Same Family: Mother, Father, 3 Kids

62
Limitations
  • Space-time clusters may occur for reasons other
    than disease outbreaks.
  • Automated detection systems do not replace the
    observant eyes of physicians and other health
    workers.
  • Epidemiological investigations by public health
    departments are needed to confirm or dismiss the
    signals.

63
Scan Statistics for Irregular Shaped Clusters
  • Duczmal, Assunção. A simulated annealing strategy
    for the detection of arbitrarily shaped spatial
    clusters. Computational Statistics and Data
    Analysis, 2004.
  • Patil, Taillie. Upper level set scan statistic
    for detecting arbitrarily shaped hotspots.
    Environmental and Ecological Statistics, 2004.
  • Iyengar. Space-time clusters with flexible
    shapes. Morbidity and Mortality Weekly Report,
    2005.
  • Tango, Takahashi. A flexibly shaped spatial scan
    statistic for detecting clusters. Int J Health
    Geographics, 2005.
  • Assunção, Costa, Tavares, Ferreira. Fast
    detection of arbitrarily shaped disease clusters.
    Statistics in Medicine, 2006.

64
Probability Models
  • Poisson model (e.g. incidence, mortality)
  • Bernoulli model (e.g. case-control data)
  • Normal model (e.g. weight, blood lead levels)
  • Exponential model (e.g. survival data)
  • Ordinal model (e.g. early, medium and late stage
    cancer)
  • Space-time permutation model (when only case data
    is available)

65
Application Areas
  • Chronic Diseases
  • Infectious Diseases
  • Health Services
  • Accidents
  • Brain Imaging
  • Toxicology
  • Veterinary Medicine
  • Psychology
  • Demography
  • Criminology
  • History
  • Archeology
  • Ecology

66
Examples of Applications
  • Beato Filho, Assunção, Silva, Marinho, Reis,
    Almeida. Homicide clusters and drug traffic in
    Belo Horizonte, Minas Gerais, Brazil from 1995 to
    1999. Cadernos de Saúde Pública, 2001.
  • Pellegrini. Analise espaço-temporal da
    leptospirose no municipio do Rio de Janeiro.
    Fiocruz, 2002.
  • Andrade, Silva, Martelli, Oliveira, Morais Neto,
    Siqueira Junior, Melo, Di Fabio. Population-based
    surveillance of pediatric pneumonia: use of
    spatial analysis in an urban area of Central
    Brazil. Cadernos de Saúde Pública, 2004.
  • Ceccato. Homicide in São Paulo, Brazil: Assessing
    spatial-temporal and weather variations. J
    Environmental Psychology, 2005.
  • Simões, Mendes, Marques, Pereira, Bagagli.
    Spatial clusters of paracoccidioidomycosis in
    southeastern Brazil. Revista do Instituto de
    Medicina Tropical de São Paulo, 2005.

67
SaTScan Software
Free. Download from www.satscan.org
  • Registered users in 116 countries
  • USA
  • Canada
  • United Kingdom
  • Brazil
  • Italy
  • . . .
  • 100s. Albania, Bhutan, Burma, Fiji, Grenada,
    Guinea, Iraq, Macao, Madagascar, Malawi, Malta,
    etc

68
Future Topics
  • Irregular shaped clusters
  • Non-Euclidean neighbor definitions
  • Multivariate data
  • Multiple locations per observation
  • Computational speed

69
Acknowledgement
  • Research funded by
  • Alfred P Sloan Foundation
  • Centers for Disease Control and Prevention
  • Massachusetts Department of Health
  • National Cancer Institute
  • National Institute of Child Health and
    Development
  • National Institute of General Medical Sciences
  • Modeling Infectious Disease Agent Study (MIDAS)

70
References
  • Kulldorff. A spatial scan statistic.
    Communications in Statistics, Theory and Methods,
    26:1481-1496, 1997.
  • Fang, Kulldorff, Gregorio. Brain cancer in the
    United States 1986-1995, A Geographical Analysis.
    Neuro-Oncology, 6:179-187, 2004.
  • Kulldorff, Heffernan, Hartman, Assunção,
    Mostashari. A space-time permutation scan
    statistic for disease outbreak detection. PLoS
    Medicine, 2(3):e59, 2005.
  • Kulldorff, Mostashari, Duczmal, Yih, Kleinman,
    Platt. Multivariate spatial scan statistics for
    disease surveillance. Statistics in Medicine,
    2006, in press.
  • Kulldorff and IMS Inc. SaTScan v.7.0: Software
    for the spatial and space-time scan statistics,
    2004. Free: http://www.satscan.org/