Title: Cancer Incidence Smoothing
1. Spatial Smoothing of Public Health Data
Glen D. Johnson
New York State Department of Health, Division of Family Health,
Office of the Medical Director
and
The University at Albany School of Public Health,
Department of Environmental Health Sciences
2-3. Prostate Cancer Incidence by ZIP code, adjusted
for age and race, New York State, 1994-1998
4. Consider:

                      Area 1 (rural)     Area 2 (urban)
Population            500                50,000
Expected cases        2.5                250

Say, observed values are taken over two time frames
(SIR = observed / expected):

Time Frame A
  Observed cases      2                  250
  SIR                 2/2.5 = 0.8        250/250 = 1.0

Time Frame B
  Observed cases      3                  251
  SIR                 3/2.5 = 1.2        251/250 = 1.004

Therefore, in the rural area an increase of one case results in a
very drastic difference in the perceived relative
risk, although it is most likely due to chance.
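The arithmetic on this slide can be reproduced with a short Python sketch. The function name `sir` and the dictionary layout are illustrative choices, not part of the original slides; the numbers are the ones in the table.

```python
def sir(observed, expected):
    """Standardized incidence ratio: observed cases / expected cases."""
    return observed / expected

# Expected and observed counts taken from the slide.
expected = {"rural": 2.5, "urban": 250.0}
observed = {
    "A": {"rural": 2, "urban": 250},
    "B": {"rural": 3, "urban": 251},
}

# SIR for every time frame and area.
sirs = {
    frame: {area: sir(cases, expected[area]) for area, cases in counts.items()}
    for frame, counts in observed.items()
}

# One extra case in each area: compare how far the SIR moves.
for area in expected:
    change = sirs["B"][area] - sirs["A"][area]
    print(f"{area}: SIR A = {sirs['A'][area]:.3f}, "
          f"SIR B = {sirs['B'][area]:.3f}, change = {change:+.3f}")
```

A single additional case moves the rural SIR by 0.4 but the urban SIR by only 0.004, which is the instability that smoothing is meant to address.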
6. Smoothing Approaches
- Non-parametric, empirical
  - Increase size of areal mapping units / decrease
    resolution, say ZIP code vs. county, or
    automated zonal aggregation (covered in another
    lecture and lab)
  - Spatial filtering
  - Head-banging
- Parametric, model-based
  - Observation y_i treated as a random variable
    arising from a true, but unknown, value µ_i plus
    error ε, that is, y_i = µ_i + ε
  - Bayesian modeling
8. Spatial Filtering
17. Head-banging
18. Head-banging algorithm
Set the number of nearest neighbors, N; the maximum number
of triples, NTRIP; and the minimum angle for each
triple (say, must be at least 135°).
1. For each point i, identify a set of triples based on the
   criteria above.
2. For each of j = 1, ..., NTRIP triples, identify the
   highest and lowest endpoints, H_j and L_j
   respectively.
3. Obtain the weighted median for the high values
   (Hmed) and low values (Lmed). Weights measure
   reliability, typically the inverse of the rate's
   variance, so larger populations carry more weight.
4. If the value at point i is such that
   Lmed <= y_i <= Hmed, then y_i remains unchanged.
   Otherwise, if the cumulative weight of all endpoints
   of all triples exceeds the center point weight times
   the number of triples, then
   if y_i < Lmed, assign y_i = Lmed, or
   if y_i > Hmed, assign y_i = Hmed.
5. After estimating a new value for each point
   i = 1, ..., n, repeat steps 2-4 above until
   convergence is achieved.
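The steps above can be sketched in Python as follows. This is a minimal illustration, not the published implementation. It assumes distinct planar coordinates, caller-supplied weights, triples chosen once by largest angle, and simultaneous updates within each pass; the names `headbang`, `find_triples` and `weighted_median` are hypothetical.

```python
import numpy as np
from itertools import combinations

def weighted_median(values, weights):
    """Lower weighted median: first value whose cumulative weight reaches half."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]

def find_triples(xy, i, n_neighbors, n_trip, min_angle_deg):
    """Pairs of neighbors (a, b) forming an angle >= min_angle_deg at point i."""
    dist = np.linalg.norm(xy - xy[i], axis=1)
    nbrs = [j for j in np.argsort(dist) if j != i][:n_neighbors]
    found = []
    for a, b in combinations(nbrs, 2):
        va, vb = xy[a] - xy[i], xy[b] - xy[i]
        cos = va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))
        angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
        if angle >= min_angle_deg:
            found.append((angle, a, b))
    found.sort(key=lambda t: -t[0])  # assumption: prefer the straightest triples
    return [(a, b) for _, a, b in found[:n_trip]]

def headbang(xy, y, w, n_neighbors=8, n_trip=5, min_angle_deg=135.0, max_iter=50):
    xy = np.asarray(xy, dtype=float)
    y = np.asarray(y, dtype=float).copy()
    w = np.asarray(w, dtype=float)
    # Step 1: triples depend only on geometry, so they are found once.
    triples = [find_triples(xy, i, n_neighbors, n_trip, min_angle_deg)
               for i in range(len(y))]
    for _ in range(max_iter):
        new_y = y.copy()
        for i, trips in enumerate(triples):
            if not trips:
                continue  # no valid triple: leave the value alone
            # Step 2: high and low endpoint of each triple.
            hi = np.array([a if y[a] >= y[b] else b for a, b in trips])
            lo = np.array([b if y[a] >= y[b] else a for a, b in trips])
            # Step 3: weighted medians of the high and low endpoints.
            h_med = weighted_median(y[hi], w[hi])
            l_med = weighted_median(y[lo], w[lo])
            # Step 4: keep values inside [Lmed, Hmed]; otherwise move the
            # value only if the endpoints outweigh the center point.
            if l_med <= y[i] <= h_med:
                continue
            if w[hi].sum() + w[lo].sum() > w[i] * len(trips):
                new_y[i] = l_med if y[i] < l_med else h_med
        # Step 5: stop when a full pass changes nothing.
        if np.allclose(new_y, y):
            break
        y = new_y
    return y

# Toy data: a 5 x 5 grid of equally weighted points with one spike in the center.
gx, gy = np.meshgrid(np.arange(5), np.arange(5))
xy = np.column_stack([gx.ravel(), gy.ravel()])
rates = np.ones(25)
rates[12] = 10.0
smoothed = headbang(xy, rates, np.ones(25))
```

In this toy example the isolated spike is pulled down to the level of its neighbors, while points on the boundary that have no qualifying triple are left unchanged.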
19. A Head-banging Application: Atlas of United
States Mortality
http://www.cdc.gov/nchs/products/pubs/pubd/other/atlas/atlas.htm
20. Parametric, Model-Based Methods
21. Empirical Bayes Smoothing
Principle: For each of i = 1, ..., n locations, the
observed number of cases, y_i, is assumed to have
arisen from a Poisson probability distribution
with parameter λ_i, which is the true but unknown
rate (this is what we want to estimate). The
Poisson parameter λ_i is then assumed to have
arisen from a prior distribution, in this case
a Gamma distribution, which is characterized by a
shape parameter, α, and a scale parameter, β.
22. The smoothed rate is then obtained as the
posterior expected value,

    E[λ_i | y_i] = (y_i + α) / (n_i + β),

where α and β are estimated from the data (thus
being an empirical Bayes solution) and n_i is the
population at risk (the denominator of the raw rate
y_i / n_i). Note: as n_i increases, and therefore
y_i increases, the estimate of λ_i approaches
y_i / n_i. So estimates for areas with large
populations essentially equal the raw rate, while
the rate is pulled towards the grand mean (α / β)
as the underlying population decreases.
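As a rough illustration of this shrinkage, the sketch below computes the posterior mean (y_i + α) / (n_i + β) in Python. The slides do not say how α and β are estimated, so the sketch assumes a simple method-of-moments estimate in the style of Marshall (1991); the function name `eb_smooth` and the example counts are made up for illustration.

```python
import numpy as np

def eb_smooth(y, n):
    """Global Poisson-Gamma empirical Bayes smoothing of rates y / n.

    Assumption: alpha and beta are set by the method of moments so that the
    Gamma prior has mean alpha / beta equal to the grand mean rate.
    """
    y = np.asarray(y, dtype=float)
    n = np.asarray(n, dtype=float)
    raw = y / n
    grand_mean = y.sum() / n.sum()
    # Population-weighted variance of the raw rates around the grand mean.
    s2 = np.sum(n * (raw - grand_mean) ** 2) / n.sum()
    # Remove the part of the variance expected from Poisson noise alone.
    prior_var = s2 - grand_mean / n.mean()
    if prior_var <= 0:
        # No detectable between-area variation: shrink fully to the mean.
        return np.full_like(raw, grand_mean)
    beta = grand_mean / prior_var
    alpha = grand_mean * beta
    return (y + alpha) / (n + beta)

# Hypothetical counts and populations for four areas.
cases = np.array([8, 260, 2, 90])
pop = np.array([500, 50_000, 1_000, 20_000])
raw_rates = cases / pop
grand = cases.sum() / pop.sum()
smoothed_rates = eb_smooth(cases, pop)
for r, s, p in zip(raw_rates, smoothed_rates, pop):
    print(f"population {p:>6}: raw {r:.5f} -> smoothed {s:.5f}")
```

The area with 500 people moves most of the way to the grand mean, while the area with 50,000 people barely moves, matching the behavior described on the slide.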
23. Empirical Bayes smoothers were initially designed
to smooth unstable rates towards the grand mean,
in other words, to borrow strength from all
the data equally (Clayton and Kaldor, 1987,
Biometrics, 43:671-681). A localized version
smooths unstable rates towards a local
neighborhood mean only, therefore preserving more
of the spatial pattern (Marshall, 1991, Applied
Statistics, 40(2):283-294). Both of these
empirical Bayes smoothers can be easily applied
in GeoDa software (due to Luc Anselin).
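The difference between the global and local versions can be sketched in Python by shrinking each area towards the mean of its own neighborhood. This is only an illustration under assumptions: neighborhoods are given as hypothetical index lists, the moment estimator follows the general form in Marshall (1991), and it is not a reproduction of what GeoDa computes.

```python
import numpy as np

def local_eb_smooth(y, n, neighbors):
    """Shrink each raw rate towards the mean rate of its own neighborhood.

    neighbors[i] lists the indices of the areas adjacent to area i
    (a hypothetical input format); area i is always included as well.
    """
    y = np.asarray(y, dtype=float)
    n = np.asarray(n, dtype=float)
    out = np.empty_like(y)
    for i, nb in enumerate(neighbors):
        idx = np.array([i, *nb])
        y_loc, n_loc = y[idx], n[idx]
        local_mean = y_loc.sum() / n_loc.sum()
        s2 = np.sum(n_loc * (y_loc / n_loc - local_mean) ** 2) / n_loc.sum()
        # Moment estimate of the prior variance, truncated at zero.
        prior_var = max(s2 - local_mean / n_loc.mean(), 0.0)
        # Shrinkage weight: near 1 for large populations, near 0 for small.
        if local_mean > 0:
            weight = prior_var / (prior_var + local_mean / n[i])
        else:
            weight = 0.0
        out[i] = local_mean + weight * (y[i] / n[i] - local_mean)
    return out

# Four hypothetical areas arranged in a line: 0 - 1 - 2 - 3.
cases = np.array([8, 260, 2, 90])
pop = np.array([500, 50_000, 1_000, 20_000])
adjacency = [[1], [0, 2], [1, 3], [2]]
local_rates = local_eb_smooth(cases, pop, adjacency)
```

Because each area borrows strength only from adjacent areas, a genuinely elevated region is averaged with its own neighbors rather than with the whole map.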
24. Fully Bayesian smoothing (a very brief overview)
Increasingly used by geographic epidemiologists.
Main advantages include:
- flexibility
- obtain full distribution of possible outcomes
  - allows many ways to view the outcome
    (mean, median, percentiles)
  - inference based on actual probability
    distributions, instead of confidence intervals
25. Observations y_i for locations i = 1, ..., n are
treated as having arisen from a Poisson
distribution with a true, but unknown, rate λ_i,
so the expected number of cases is λ_i n_i. The
Poisson rate is then modeled as a function of
(possibly) covariates that vary spatially along
with y_i, plus a random effect ε. More
specifically:
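A common way to write such a model is given below as a hedged sketch. The log link, the covariate vector x_i and the coefficient vector β are assumptions based on standard disease-mapping practice, since the slide's own equation is not part of the transcript.

```latex
y_i \sim \mathrm{Poisson}(\lambda_i n_i), \qquad
\log \lambda_i = \mathbf{x}_i^{\top} \boldsymbol{\beta} + \varepsilon_i,
\qquad i = 1, \ldots, n
```

With no covariates, the linear term reduces to an intercept and all spatial structure is carried by the random effect ε_i.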
26. The random effect is distributed conditionally on
location, such that
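One widely used choice for such a conditional distribution is the intrinsic conditional autoregressive (CAR) prior, sketched below. It is shown as an assumption about the general form, not as the exact specification used in the lecture; δ_i denotes the set of neighbors of area i, m_i the number of those neighbors, and σ² a variance parameter.

```latex
\varepsilon_i \mid \varepsilon_{j \neq i} \sim
\mathrm{Normal}\!\left(
  \frac{1}{m_i} \sum_{j \in \delta_i} \varepsilon_j,\;
  \frac{\sigma^2}{m_i}
\right)
```

Under this form each area's random effect is centered on the average of its neighbors, and areas with more neighbors have a tighter conditional variance.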
All stochastic terms of the model (λ_i, ε and β if
applicable) are solved for as full posterior
distributions through an iterative method called
Markov chain Monte Carlo (MCMC).
27. Example: posterior distribution of standardized
incidence ratios, estimated as simulated values
of λ
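Once simulated values are available, the summaries mentioned earlier (mean, median, percentiles) are simple to compute. The Python sketch below uses Gamma-distributed random numbers as a stand-in for real MCMC output, so the draws, the seed and the function name `summarize` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-in for MCMC output: hypothetical posterior draws of one area's SIR,
# centered near 1.1. Real draws would come from the fitted model.
draws = rng.gamma(shape=20.0, scale=1.1 / 20.0, size=10_000)

def summarize(samples):
    """Posterior mean, median, 95% interval and P(SIR > 1) from draws."""
    lower, median, upper = np.percentile(samples, [2.5, 50, 97.5])
    return {
        "mean": float(samples.mean()),
        "median": float(median),
        "interval_95": (float(lower), float(upper)),
        # Share of draws above 1: posterior probability of elevated risk.
        "prob_above_1": float(np.mean(samples > 1.0)),
    }

summary = summarize(draws)
for name, value in summary.items():
    print(name, value)
```

The probability that the SIR exceeds 1 is read directly from the draws, which is the kind of inference on actual probability distributions that the fully Bayesian approach allows.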
28. Example Output: Posterior Kernel Densities of
Prostate Cancer Incidence (1994-98) for Some
Manhattan ZIP Codes
29. Comparison of some Urban and Rural ZIP Codes
- Upper Manhattan
- Adirondacks (Hamilton Co.)
35. Summary
- Smoothing serves to
  - help see a more meaningful underlying spatial
    pattern
  - improve estimation for
    - areas with small populations
    - rare diseases
- Want to borrow strength from neighbors on an
  as-needed basis
  - preserve stable estimates (from large populations)
  - stabilize unstable estimates (from small
    populations)
- Take nonparametric approach for simplicity
  or parametric approach for flexibility
  (which is but one application of spatial
  regression modeling)
36. References
- Spatial Filtering
  - Rushton, G. and Lolonis, P. 1996. Exploratory
    spatial analysis of birth defect rates in an
    urban population. Statistics in Medicine,
    15:717-726.
  - Talbot, T.O., Kulldorff, M., Forand, S.P. and
    Haley, V.B. 2000. Evaluation of spatial
    filters to create smoothed maps of health
    data. Statistics in Medicine, 19:2399-2408.
- Head-Banging
  - Mungiole, M., Pickle, L.W. and Hansen-Simonson,
    K. 1999. Application of a weighted
    head-banging algorithm to mortality data maps.
    Statistics in Medicine, 18:3201-3209.
37. References (continued)
- Bayesian Modeling
  - Waller, L.A. and Gotway, C.A. 2004. Applied
    Spatial Statistics for Public Health Data.
    Wiley. 494 pp.
  - Johnson, G.D. 2004. Smoothing Small Area Maps of
    Prostate Cancer Incidence in New York State
    (USA) using Fully Bayesian Hierarchical
    Modelling. Int. J. Health Geographics, 3:29
    (http://www.ij-healthgeographics.com/content/3/1/29).
  - Elliott, P., Wakefield, J.C., Best, N.G. and
    Briggs, D.J. 2000. Spatial Epidemiology:
    Methods and Applications. Oxford. 475 pp.
  - Statistics in Medicine. 2000. Vol. 19 (special
    issue on disease mapping).
  - Lawson, A. et al. 1999. Disease Mapping and
    Risk Assessment for Public Health. Wiley.
    482 pp.
38. Method and Software Sources
- Spatial Filtering: http://www.uiowa.edu/geog/health/index8.html
- Head-Banging: http://srab.cancer.gov/headbang/
- Empirical Bayes: http://geodacenter.asu.edu/ (site for GeoDa)
- Fully Bayesian Modeling: http://www.mrc-bsu.cam.ac.uk/bugs/welcome.shtml
  (site for WinBUGS)