Kevin A Henry, Ph.D - PowerPoint PPT Presentation

Title: Kevin A Henry, Ph.D


1
Estimating the accuracy of different
geographical imputation methods
  • Kevin A Henry, Ph.D
  • New Jersey Cancer Registry
  • Cancer Epidemiology Services
  • Frank Boscoe, Ph.D
  • New York State Cancer Registry

Paper Presentation NAACCR Annual Meeting, 2007,
Detroit, MI
2
Introduction
  • Geographical Imputation
  • Methods to assign a case a geographic location
    that is approximate or accurate given available
    geographic and demographic data
  • Goal of geo-imputation is to assign a case a
    location at one geographical aggregate level
    based on information from one or more known
    geographical aggregates (Boscoe 2007).
  • Assigned locations can be:
  • Area (e.g. census tract, block group)
  • Point (e.g. latitude/longitude within a census
    tract)

Geo-imputation Example: Zip code to census tract
Available Case Information
  • Zip code: 08648
  • Race: Black

[Figure: zip code 08648, Black population 3,260, split across four census tracts: 1,639 (50%), 692 (21%), 636 (19.5%) and 293 (8.9%)]
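The tract percentages in this example follow directly from the tract counts. A minimal sketch in Python, assuming the four counts are the Black population of each census tract within zip code 08648; the tract labels are hypothetical placeholders, since the slide does not name the tracts:

```python
# Black population by census tract within zip code 08648, from the slide.
# Tract labels are hypothetical; the slide shows only the counts.
tract_counts = {"tract_a": 1639, "tract_b": 692, "tract_c": 636, "tract_d": 293}


def tract_shares(counts):
    """Return each tract's share of the zip code population (shares sum to 1)."""
    total = sum(counts.values())
    return {tract: n / total for tract, n in counts.items()}


shares = tract_shares(tract_counts)
for tract, share in shares.items():
    print(f"{tract}: {share:.1%}")
```

Under this reading, a Black case known only to zip code 08648 would be assigned to the most populous tract about half the time.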
3
Introduction
  • Why should we geo-impute?
  • Studies can be biased due to the geographic
    non-randomness of ungeocoded cases or cases
    geocoded to zip code centroid (Oliver et al.
    2006).
  • Cases geocoded to a zip code centroid may not be
    located in the correct census tract.
  • Removing cases geocoded by zip code can result in
    selection bias.
  • Cases geocoded to zip code centroids can inflate
    case counts at the location where the zip
    centroid falls.
  • Should we geo-impute?

No systematic evaluation of geo-imputation has
been completed to determine which method offers
the best predictive power.
4
Study Objective
  • Examine the usefulness of geo-imputation for
    assigning census tracts to cases that have
    been previously geocoded to only a zip code
    centroid.

Study Questions
  • What census tract demographic information (e.g.
    race, age) provides the best predictive value to
    assign a case to the correct census tract?
  • Is demographic-based geo-imputation better than
    two alternatives?
  • 1) Randomly selecting a census tract within the
    zip code zone
  • 2) Using the census tract originally assigned
    to each case based on the zip code centroid
    location

5
Background: What is a zip code?
  • ZIP (Zone Improvement Plan) codes are linear
    features associated with specific roads or
    specific addresses
  • Zip code zones are created by digitizing
    boundaries around the street ranges assigned to
    each zip code

[Figure: map of a zip code zone showing the census tracts falling within it, the zip code centroid, and the street segments used for geocoding]
6
Background: New Jersey Zip Codes
  • 558 zip code zones
  • 92% of zip codes have 2 or more potential census
    tracts
  • 1 zip code has 23 potential census tracts
  • Average tracts per zip code: 6

[Figure: histogram of Census Tracts Per Zip Code; x-axis: Tract Frequency (1 to 23), y-axis: Percent (0 to 25)]
7
Methods: Study Population
  • New Jersey residents diagnosed with breast,
    prostate and colorectal cancer geocoded to a full
    street address (2000-2004, N = 96,852, NJSCR)
  • Additional study exclusions (N = 4,100)
  • No age or race
  • Invalid zip codes
  • Invalid census tracts
  • Cases geocoded to zip centroids with only one
    census tract
  • Registry Variables

[Diagram: census tracts assigned to cases in the imputed case data are compared with census tracts assigned to cases in the original case data (truth)]
8
Methods: Demographic Data
  • Creation of Census Tract Populations
  • 2000 Census block populations aggregated into zip
    codes (Tele Atlas, 2006).
  • Census tract populations created to include only
    populations within zip code.

  • 2000 SF1 Census populations included:
    - Total Population (P001001)
    - White alone (P003003)
    - Black or African Amer. alone (P003004)
    - Asian alone (P003006)
    - Hispanic or Latino (P004002)
    - Total Population by age (P012003-P012049)

[Figure: zip code 07524, census block population and total tract population, with labeled values 3,101 and 6,774]
  • Cumulative probabilities calculated for each
    tract per zip code.
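The aggregation on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the block records, zip codes and tract labels are invented, and the function names are hypothetical. Block populations are summed into (zip code, tract) cells so that each tract counts only the people living inside the zip code, and cumulative probabilities are then computed per zip code.

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical block records: (zip code, census tract, block population).
blocks = [
    ("00001", "tract_a", 100),
    ("00001", "tract_a", 300),
    ("00001", "tract_b", 600),
    ("00002", "tract_b", 250),  # tract_b straddles two zip codes
]


def tract_populations_by_zip(block_rows):
    """Sum block populations into per-zip, per-tract totals."""
    populations = defaultdict(lambda: defaultdict(int))
    for zip_code, tract, population in block_rows:
        populations[zip_code][tract] += population
    return populations


def cumulative_probabilities(tract_pops):
    """Return (tracts, cumulative shares) for the tracts of one zip code."""
    total = sum(tract_pops.values())
    if total == 0:
        raise ValueError("zip code has no population in this group")
    tracts = sorted(tract_pops)
    cumulative = list(accumulate(tract_pops[t] / total for t in tracts))
    cumulative[-1] = 1.0  # guard against floating-point drift
    return tracts, cumulative


populations = tract_populations_by_zip(blocks)
tracts, cumulative = cumulative_probabilities(populations["00001"])
print(tracts, cumulative)
```

The same two functions could be applied separately to each race or age column, giving one set of cumulative probabilities per demographic group and zip code.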

9
Method: Geo-imputation
Step 1: Calculate cumulative probabilities from census tract (CT) population
Step 2: Generate a random number (0-1) for each case

[Figure: zip code 07001 with census tracts labeled 1 to 4 and values 18.4, 32.8, 15.9 and 32.7]
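The two steps above can be sketched in Python. Several things here are assumptions rather than content from the slide: the four values for zip code 07001 are read as percent shares of four census tracts, the pairing of values to tract labels is arbitrary, and the final assignment rule (a case goes to the first tract whose cumulative probability exceeds its random number) is the natural completion of the two steps, not something the slide states.

```python
import bisect
import random
from itertools import accumulate

# Assumption: the slide's four values are percent shares of four tracts.
# The tract labels and their pairing with the values are placeholders.
shares = {"tract_1": 18.4, "tract_2": 32.8, "tract_3": 15.9, "tract_4": 32.7}


def build_cdf(shares):
    """Step 1: cumulative probabilities, renormalised so they end at 1."""
    tracts = list(shares)
    total = sum(shares.values())  # the rounded shares need not sum to 100
    cdf = list(accumulate(shares[t] / total for t in tracts))
    cdf[-1] = 1.0
    return tracts, cdf


def impute_tract(u, tracts, cdf):
    """Assumed final step: map a random number u in [0, 1) to a tract."""
    index = bisect.bisect_right(cdf, u)
    return tracts[min(index, len(tracts) - 1)]


tracts, cdf = build_cdf(shares)
rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Step 2: one random number per case, then look up its tract.
assigned = [impute_tract(rng.random(), tracts, cdf) for _ in range(5)]
print(assigned)
```

Because the lookup is a binary search on the cumulative probabilities, tracts with larger shares cover wider intervals of [0, 1) and are chosen proportionally more often.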
10
Methods: Test Samples
  • Random samples for race and age groups
    stratified by population density (quintiles).
  • Geo-imputations completed for each subset.
  • Compared imputed census tracts with the tracts
    from the original case data (truth).
  • Each imputation was run 1,000 times.
  • Results: boxplots of the mean percentage of matches.
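The evaluation loop described above can be illustrated with a small simulation. Everything in it is a hypothetical toy: one zip code with two tracts, ten cases with known true tracts, and invented function names. It repeats each imputation 1,000 times and reports the mean percent correct for population-weighted assignment and for the random-selection alternative.

```python
import random
from statistics import mean

# Hypothetical toy data: one zip code with two tracts, and ten cases whose
# true tracts are known from full street-address geocoding.
tract_population = {"tract_a": 80, "tract_b": 20}
cases = [{"true_tract": "tract_a"}] * 8 + [{"true_tract": "tract_b"}] * 2
rng = random.Random(2007)  # fixed seed so the sketch is repeatable


def impute_by_population():
    """Draw a tract with probability proportional to its population."""
    tracts = list(tract_population)
    weights = [tract_population[t] for t in tracts]
    return rng.choices(tracts, weights=weights, k=1)[0]


def impute_at_random():
    """Draw a tract uniformly, ignoring population."""
    return rng.choice(list(tract_population))


def percent_correct(assign):
    """Percent of cases whose imputed tract matches the true tract."""
    hits = sum(assign() == case["true_tract"] for case in cases)
    return 100.0 * hits / len(cases)


def mean_percent_correct(assign, runs=1000):
    """Repeat the imputation and average the match rate over all runs."""
    return mean(percent_correct(assign) for _ in range(runs))


population_rate = mean_percent_correct(impute_by_population)
random_rate = mean_percent_correct(impute_at_random)
print(f"population-weighted: {population_rate:.1f}%  random: {random_rate:.1f}%")
```

In this toy setup the expected match rates are 68% (0.8 × 0.8 + 0.2 × 0.2) for population-weighted assignment and 50% for random selection; the gap narrows as tract populations within a zip code become more even.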

11
Results
[Figure: boxplots of mean percent correct (y-axis, 10 to 35) by population per square mile by census tract (x-axis, rural to urban; labeled categories 1,133 - 2,882, 2,883 - 5,078, 5,079 - 11,579 and 11,579), with no imputation marked at 17.1]
12
Results
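[Figure: boxplots of mean percent correct (y-axis, 10 to 30) by population (x-axis: Asian, Black, Hispanic, White, Total Population). Labeled values: 26.3, 24.6, 22.2 and 22. Marked levels: Asian, White, Black and Hispanic combined (24), no imputation (17.1), random (13). Sample sizes as labeled: N = 4,000, N = 3,000, N = 25,000, N = 1,500, N = 33,500]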
13
Results
[Figure: boxplots of mean percent correct (y-axis, 10 to 30) by age group (x-axis: 40-44, 45-49, 50-54, 55-59, 60-61, 62-64, 65-66, 67-69, 70-74, 75-79, 80-84, 85). Marked levels: age combined (24.9), no imputation (17.1), random (13)]
14
Conclusion
  • Geo-imputation provides a higher match rate than
    no imputation or randomly allocating tracts.
  • Percent correct depends on population density.
  • Imputation based on race-specific population was
    slightly more accurate than imputation based on
    total population (24 vs 23.1).
  • States with larger rural populations would likely
    have better match rates than New Jersey.
  • Geographic imputation does offer some advantages
    and no serious drawbacks compared with the
    alternative of excluding ungeocoded cases from an
    analysis.

15
Thank you