Geocoding addresses from a large population-based study: Lessons learned and applied

1
Geocoding addresses from a large population-based
study: Lessons learned and applied
  • Jane McElroy
  • University of Wisconsin Comprehensive Cancer
    Center
  • Gaylord E Nelson Institute for Environmental
    Studies
  • 27 Nov 2002
  • Monitoring Population Health (PM 803)
  • Population Health and Sciences Department

2
Geocoding
  • Showing the location of mailing addresses on a
    map by converting these addresses into respective
    latitude and longitude coordinates
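Against a street-centerline authority file such as TIGER, the coordinates are usually obtained by linear interpolation along the matched street segment's address range. The sketch below illustrates that idea under simplifying assumptions: a straight segment, one side of the street, and no side offset. The `Segment` class and its fields are hypothetical and do not reflect the TIGER or ArcView record layout.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Segment:
    """Hypothetical street segment: one side's address range plus endpoints."""
    from_addr: int
    to_addr: int
    start: Tuple[float, float]  # (latitude, longitude) at from_addr
    end: Tuple[float, float]    # (latitude, longitude) at to_addr


def interpolate(seg: Segment, house_number: int) -> Tuple[float, float]:
    """Place a house number proportionally along a straight segment."""
    lo, hi = sorted((seg.from_addr, seg.to_addr))
    if not lo <= house_number <= hi:
        raise ValueError("house number outside the segment's address range")
    span = seg.to_addr - seg.from_addr
    frac = 0.0 if span == 0 else (house_number - seg.from_addr) / span
    lat = seg.start[0] + frac * (seg.end[0] - seg.start[0])
    lon = seg.start[1] + frac * (seg.end[1] - seg.start[1])
    return (lat, lon)


# Hypothetical segment numbered 500-598 running due east.
seg = Segment(500, 598, (43.07, -89.40), (43.07, -89.398))
print(interpolate(seg, 549))  # roughly halfway along the segment
```

A house number outside the segment's range raises an error rather than being extrapolated, which mirrors a failed match.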

3
Requirements
  • Authority file
      • e.g., TIGER street map (available online for 1995
        and 2000)
  • Input file
      • Data on study participants' addresses
  • GIS software to compare the two files
      • ArcView 3.2 (used for this presentation)
      • Centrus (used by the State)
      • Dynamap
      • Geographic Data Technology, Inc., etc.

4
Street matching options in ESRI's ArcView 3.2
  • US street
  • US street w/ zone
      • (zone, i.e., ZIP code or MCD)
  • ZIP+4
  • Etc.

[Screenshot: input file choices and authority file]
We used US street with zone for our study, but
intersections don't have a zone option.
5
Authority file (from Jerry Sullivan, DOA)
  • Statutory authority for addressing is vested in
    the county; however, some counties never
    exercised this power, leaving it to townships.
  • Wisconsin therefore has several dozen different
    addressing systems. These systems vary based on:
      • point of origin
      • directionality (NSEW, center out)
      • use of prefixes, or not
      • numbers per mile (100, 264, 300, 400, 1000)
      • mixing directions on a single road segment
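Because the number of addresses per mile differs between systems, the same house number can imply very different distances from the point of origin. A minimal sketch of that arithmetic, assuming numbering grows linearly with distance along the road; `miles_from_origin` is a hypothetical helper, and house number 2395 is reused from the look-up examples in this presentation.

```python
def miles_from_origin(house_number: int, numbers_per_mile: int) -> float:
    """Distance from the addressing origin implied by a house number,
    assuming numbers increase linearly along the road."""
    if numbers_per_mile <= 0:
        raise ValueError("numbers_per_mile must be positive")
    return house_number / numbers_per_mile


# House number 2395 under each numbering density listed above.
for density in (100, 264, 300, 400, 1000):
    print(density, round(miles_from_origin(2395, density), 2))
```

Under these densities, house number 2395 falls anywhere from about 2.4 to 24 miles from the origin, which is why a single statewide rule cannot place rural addresses.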

6
Authority File Limitations
  • Incomplete information
      • (e.g., missing direction prefix, very small
        streets not included)
  • Spelling mistakes
  • Ranges not provided
      • (especially true for county, state, and federal
        roads)
  • Quality varies by county
      • (due to the resources available by county to
        compile street map information)
  • Accuracy ??????

7
Input File Limitations
  • Post office standardized addresses not recorded
      • (e.g., WI Rap for Wisconsin Rapids)
  • Use of PO Box, Rural Route, RFD, or Firecodes as
    address
  • Spelling mistakes
  • Alias road name used, but official name provided
    in map
      • (e.g., County Road AA / Steed Road)
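Some of these input problems can be flagged before matching. A minimal sketch, assuming addresses are free-text strings; the pattern covers only the PO Box, rural route, and RFD forms named above and is a hypothetical screening rule, not an ArcView feature. Flagged records still deserve review rather than automatic rejection.

```python
import re

# Hypothetical screening rule: forms that usually denote a mailing
# location rather than a geocodable street address.
NON_GEOCODABLE = re.compile(
    r"\b(P\.?\s*O\.?\s*BOX|RURAL\s+ROUTE|RR|RFD)\b",
    re.IGNORECASE,
)


def looks_geocodable(address: str) -> bool:
    """Return False when the address looks like a PO Box, RR, or RFD."""
    return NON_GEOCODABLE.search(address) is None


for addr in ("543 CHICAGO ST", "PO BOX 395", "546 HWY B BX 395 E RR,3"):
    print(addr, looks_geocodable(addr))
```

The word boundaries keep street names such as "CARR ST" from being mistaken for a rural route.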

8
Breast Cancer Population-based Case-Control Study
  • Phase 1: 1988-1991, ages 20-79
  • Phase 2: 1991-1994, ages 50-79
  • Total number: 14,804
  • Cases from statewide cancer registry; controls
    from DOT driver's license list (and Health Care
    Financing Admin list, ages 65-79)
  • Response rate: 85% (cases) and 87% (controls)
9
Geocoding Flowchart
  • Mailing addresses (N=14,804)
  • Step 1: Post Office standardized addresses?
    Yes n=12,950; No n=1,854
  • Step 2: Successful match with county-level maps?
  • Step 3: Successful match using Internet mapping
    engines?
  • Step 4: Recontact of study participants for a
    better address successful?
  • Step 5: Geocoded to ZIP code centroid
  • Final results:
      • GIS software geocoded addresses n=11,311 (77%)
      • Internet mapping engines geocoded addresses
        n=3,023 (20%)
      • GIS software geocoded to ZIP code centroid
        n=470 (3.0%)
  • Other branch counts on the chart: No n=1,915;
    Yes n=276; No n=746
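The cascade above can be expressed as a loop that records which step finally placed each address. A minimal sketch, assuming each step is a function returning coordinates (or, for recontact, a better address) or None; all function names are hypothetical, Step 1 standardization is assumed to have already run, and recontact successes are tallied separately here for clarity. The final assert checks that the three outcome counts on the chart sum to N=14,804.

```python
from collections import Counter
from typing import Callable, Iterable, Optional, Tuple

Coords = Tuple[float, float]
# Hypothetical step interfaces: a matcher returns coordinates or None;
# recontact returns a better address string or None.
Matcher = Callable[[str], Optional[Coords]]
Recontact = Callable[[str], Optional[str]]


def geocode_cascade(addresses: Iterable[str], county_maps: Matcher,
                    internet_engine: Matcher, recontact: Recontact,
                    zip_centroid: Matcher) -> Counter:
    """Tally the step that finally placed each address (steps 2-5)."""
    outcomes: Counter = Counter()
    for addr in addresses:
        if county_maps(addr) is not None:              # Step 2
            outcomes["GIS software"] += 1
        elif internet_engine(addr) is not None:        # Step 3
            outcomes["Internet mapping engine"] += 1
        else:
            better = recontact(addr)                   # Step 4
            if better is not None and (county_maps(better) is not None
                                       or internet_engine(better) is not None):
                outcomes["recontact"] += 1
            elif zip_centroid(addr) is not None:       # Step 5
                outcomes["ZIP centroid"] += 1
            else:
                outcomes["not geocoded"] += 1
    return outcomes


# Toy matchers for illustration only.
def county(a: str) -> Optional[Coords]:
    return (0.0, 0.0) if a.startswith("1") else None


def internet(a: str) -> Optional[Coords]:
    return (0.0, 0.0) if a.startswith("2") else None


def ask(a: str) -> Optional[str]:
    return "1 BETTER ST" if a.startswith("3") else None


def zipc(a: str) -> Optional[Coords]:
    return (0.0, 0.0)


result = geocode_cascade(["1 A ST", "2 B ST", "3 C ST", "4 D ST"],
                         county, internet, ask, zipc)
print(result)

# The flowchart's final outcome counts account for every mailing address.
assert 11_311 + 3_023 + 470 == 14_804
```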
10
Why worry about the address quality?
[Map: study participants by county rurality]
  % rural   Counties   Participants
  0-30      12         8,222 (56%)
  31-55     14         2,679 (18%)
  56-70     14         1,843 (12%)
  71-99     17         1,350 (9%)
  100       15         704 (5%)
11
Geocoding Flowchart
  • Mailing addresses (N=14,804)
12
Geocoding Flowchart
  • Mailing addresses (N=14,804)
  • Step 1: Post Office standardized addresses?
13
Quality of Addresses from Different Sources
n=14,804
[Chart categories: already standardized; fixed with
standardization software; garbled addresses not
able to fix with software]
14
Geocoding Flowchart
  • Mailing addresses (N=14,804)
  • Step 1: Post Office standardized addresses?
  • Yes: Step 2: Successful match with county-level
    maps?
15
Study Participants Geocoding Results: Statewide
Batch
  • N=10,140 (68%) geocoded
  • Bayfield County: n=5 of 36
  • 6 geocoded in wrong county
  • Sawyer County: n=0 of 36
16
[Map labels: Lincoln, Winnebago, LaCrosse, Jefferson,
Waukesha, Milwaukee, Racine, Kenosha, and Sauk
counties]
17
Dane County: TIGER 2000 vs. 1995
18
Ashland County: TIGER 2000 vs. 1995
19
Geocoding Flowchart
  • Mailing addresses (N=14,804)
  • Step 1: Post Office standardized addresses?
  • Step 2: Successful match with county-level maps?
  • Step 3: Successful match using Internet mapping
    engines?
20
Look up address examples
  • N2395 USH 53
  • 543 CHICOG ST
  • 3223 500ST ,APT 24
  • 546 HWY B BX 395 E RR,3
  • WINTERGREEN APT 1001, 5603 JANESVILLE ROA
  • E13984 TOWN CK LK RD

21
Strategies to Improve the Match Rate: In-house
  • Look up study participants using Internet mapping
    engines by 1) their addresses, 2) their telephone
    numbers, 3) their names
  • MapQuest.com for addresses; Anywho.com for phone
    number and name
  • Add better address, intersections, or x,y
    coordinates to input file
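The look-up order on this slide (address, then telephone number, then name) can be coded so that each record keeps track of which key produced the better address. A minimal sketch; the three lookup functions are hypothetical stand-ins for manual searches on MapQuest.com or Anywho.com, not APIs of those sites.

```python
from typing import Callable, Dict, Optional, Tuple

Lookup = Callable[[str], Optional[str]]


def find_better_address(record: Dict[str, str], by_address: Lookup,
                        by_phone: Lookup, by_name: Lookup
                        ) -> Tuple[Optional[str], Optional[str]]:
    """Try lookups in the slide's order; return (address found, key used)."""
    for key, lookup in (("address", by_address), ("phone", by_phone),
                        ("name", by_name)):
        value = record.get(key)
        if value:
            found = lookup(value)
            if found:
                return found, key
    return None, None  # candidate for the "no hope" file


record = {"address": "543 CHICOG ST", "phone": "555-0100", "name": "Jane Doe"}
result = find_better_address(
    record,
    by_address=lambda a: None,               # address search fails
    by_phone=lambda p: "543 Chicago St",     # reverse phone lookup succeeds
    by_name=lambda n: None,
)
print(result)  # ('543 Chicago St', 'phone')
```

Keeping the key that succeeded makes it possible to report later how many addresses came from reverse telephone versus name lookups.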

22
Look up address examples
  • N2395 USH 53 → N2395 US HWY 53
  • 543 CHICOG ST → 543 Chicago St
  • 3223 500ST ,APT 24 → 3223 500th St
  • 546 HWY B BX 395 E RR,3 → 546 Cnty Rd B
  • WINTERGREEN APT 1001, 5603 JANESVILLE ROA → 5603
    Janesville Rd
  • E13984 TOWN CK LK RD → E13984 Town Creek Lake
    Rd

23
Look-up screen
24
Geocoding Flowchart
  • Mailing addresses (N=14,804)
  • Step 1: Post Office standardized addresses?
  • Step 2: Successful match with county-level maps?
  • Step 3: Successful match using Internet mapping
    engines?
  • Step 4: Recontact of study participants for a
    better address successful?
25
Re-contacted "no hope" participants
A clear spatial bias is indicated by the study
participants for whom we could not find better
addresses to geocode, with implications for any
type of spatial analysis.
[Map: percentage recontacted (number recontacted /
total number) by county]
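The county map on this slide plots, for each county, the number of participants recontacted divided by the total number of participants. A minimal sketch of that calculation; the function name, input lists, and toy counts are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def percent_recontacted(recontacted: List[str],
                        everyone: List[str]) -> Dict[str, float]:
    """Percentage of each county's participants who were recontacted.
    Each list holds one county name per participant."""
    totals = Counter(everyone)
    hits = Counter(recontacted)
    return {county: 100.0 * hits[county] / n for county, n in totals.items()}


# Hypothetical toy data, not study counts.
everyone = ["Dane"] * 4 + ["Sawyer"] * 2
recontacted = ["Sawyer"]
print(percent_recontacted(recontacted, everyone))  # {'Dane': 0.0, 'Sawyer': 50.0}
```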
26
Strategies to Improve the Match Rate: "No Hope"
File
  • 1. Recontact study participants by telephone
  • 2. For PO Box addresses, can contact the
    postmaster and request the street address
    associated with that PO box
      • US Title 39, Code of Federal Regulations,
        Section 265.4(a)(4)(i)
      • From PO Box Rental Form 1093, request name and
        street address of PO Box holder

27
Response and geocoding rate of study participants
recontacted, by mortality status (n=597)
28
Why obtain intersection info?
  • 196 addresses and intersections geocoded
[Chart: match rates of 50%, 34%, and 23% by address
match status and intersection match status]
  • 11% improvement in match rate with intersection
29
Why avoid county, state, and federal roads as part
of an intersection?
  • 176 addresses obtained from recontacting
    participants

30
Geocoding Flowchart
(Same flowchart as slide 9)
31
Final Geocode
[Map legend: study participants by county rurality]
  % rural   Counties   Participants
  0-30      12         8,222 (56%)
  31-55     14         2,679 (18%)
  56-70     14         1,843 (12%)
  71-99     17         1,350 (9%)
  100       15         704 (5%)
32
Things we learned 1: Post office standardize the
addresses
  • Use post office service or other standardization
    software packages (e.g., Semaphore Corp) for
    retrospective data
  • Train address gatherers to only accept geocodable
    addresses (no PO Box, RR, RFD) and enter them in
    a standardized fashion or design a screen to
    facilitate data entry
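Full standardization is best left to the post office service or dedicated software; the sketch below only shows the flavor of one step, assuming free-text input: uppercase the address, drop periods and commas, and abbreviate the street suffix. The suffix table is a small subset of USPS Publication 28 abbreviations, and directionals, secondary units, and rural routes are not handled.

```python
import re

# A small subset of USPS Publication 28 street-suffix abbreviations.
SUFFIXES = {
    "STREET": "ST", "ROAD": "RD", "AVENUE": "AVE", "DRIVE": "DR",
    "LANE": "LN", "COURT": "CT", "HIGHWAY": "HWY", "CREEK": "CRK",
    "LAKE": "LK",
}


def standardize(address: str) -> str:
    """Uppercase, strip periods and commas, collapse spaces, and
    abbreviate only the final word when it is a known suffix."""
    words = re.sub(r"[.,]", " ", address.upper()).split()
    if words and words[-1] in SUFFIXES:
        words[-1] = SUFFIXES[words[-1]]
    return " ".join(words)


print(standardize("5603 Janesville Road"))        # 5603 JANESVILLE RD
print(standardize("E13984 Town Creek Lake Rd."))  # E13984 TOWN CREEK LAKE RD
```

Only the last word is abbreviated, so words such as Creek and Lake that belong to the street name itself are left intact.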

33
Things we learned 2: Geocoding addresses
  • Geocode by county
  • Geocode both address and intersection
  • Use appropriate map by county (TIGER 1995 and
    TIGER 2000) for addresses and intersections
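One way to apply "use the appropriate map by county" is to geocode each county against both TIGER vintages and keep whichever matched more addresses. A minimal sketch, assuming match counts per county and vintage are already available; the function name and the counts in the example are hypothetical. Ties keep the first vintage listed.

```python
from typing import Dict, Tuple


def pick_vintage(matched: Dict[str, Dict[str, int]],
                 totals: Dict[str, int]) -> Dict[str, Tuple[str, float]]:
    """For each county, keep the TIGER vintage that matched more addresses.
    matched[county][vintage] is the number of addresses that vintage matched;
    the returned rate is that count divided by the county total."""
    best = {}
    for county, by_vintage in matched.items():
        vintage = max(by_vintage, key=by_vintage.get)
        best[county] = (vintage, by_vintage[vintage] / totals[county])
    return best


# Hypothetical counts, not study results.
matched = {"Dane": {"TIGER 1995": 800, "TIGER 2000": 900},
           "Ashland": {"TIGER 1995": 60, "TIGER 2000": 45}}
totals = {"Dane": 1000, "Ashland": 100}
print(pick_vintage(matched, totals))
# {'Dane': ('TIGER 2000', 0.9), 'Ashland': ('TIGER 1995', 0.6)}
```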

34
Things we learned 3: Geocoding Intersections
  • Use the study participant's street address as one
    of the parts of the intersection
  • Avoid as much as possible county, state, and
    federal roads as part of the intersection
  • Don't assume the study participant understands
    the word intersection
  • Don't use major road when asking for
    intersection information
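The advice to avoid county, state, and federal roads can be applied when choosing which cross street to geocode. A minimal sketch; the pattern uses Wisconsin-style abbreviations (USH, STH, CTH) and is illustrative and incomplete. For example, a county road recorded under a local alias such as Steed Road would not be caught.

```python
import re
from typing import List, Optional

# Illustrative, incomplete patterns for county, state, and federal roads,
# using abbreviations common in Wisconsin addresses (USH, STH, CTH).
MAJOR_ROAD = re.compile(
    r"\b(USH|STH|CTH|HWY|HIGHWAY|COUNTY\s+(ROAD|RD)|CNTY\s+RD)\b",
    re.IGNORECASE,
)


def is_major_road(street: str) -> bool:
    """True when the street name looks like a county, state, or federal road."""
    return MAJOR_ROAD.search(street) is not None


def pick_cross_street(candidates: List[str]) -> Optional[str]:
    """Return the first candidate that is not a county, state, or federal road."""
    for street in candidates:
        if not is_major_road(street):
            return street
    return None


print(pick_cross_street(["CTH B", "Town Creek Lake Rd"]))  # Town Creek Lake Rd
```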

35
Things we learned 4: Internet mapping engines
  • MapBlast.com no longer provides x,y coordinates
    online; MapTech is designing an interface to do
    the same for a fee ($150 update charge)
  • Anywho.com works well for reverse telephone
    lookups and name lookups
  • Need to establish a priori acceptance criteria
    for reverse lookups and code the decision rule
    used
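A priori acceptance criteria work best when the rule is written down before lookups start and every decision is stored with a code. A minimal sketch of such a rule, assuming acceptance requires both the surname and the 5-digit ZIP code to match; the criteria, codes, and `Candidate` fields are hypothetical, not the study's actual rules.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Candidate:
    """Hypothetical result of a reverse telephone or name lookup."""
    surname: str
    zip_code: str
    street: str


def accept_lookup(study_surname: str, study_zip: str,
                  cand: Candidate) -> Tuple[bool, str]:
    """Apply a fixed, pre-specified rule; return (accepted, decision code).
    The rule and codes are illustrative assumptions."""
    same_name = cand.surname.strip().upper() == study_surname.strip().upper()
    same_zip = cand.zip_code[:5] == study_zip[:5]
    if same_name and same_zip:
        return True, "A1: surname and ZIP match"
    if same_name:
        return False, "R1: surname matches, ZIP differs"
    if same_zip:
        return False, "R2: ZIP matches, surname differs"
    return False, "R3: neither matches"


print(accept_lookup("Doe", "53703-1234", Candidate("DOE", "53703", "543 Chicago St")))
# (True, 'A1: surname and ZIP match')
```

Storing the code alongside each accepted or rejected address lets the decision rule be reported and audited later.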

36
Things we learned 5: Costs
  • Software
      • ArcView for geocoding
      • MapTech for x,y coordinates
      • FoxPro to design screen for updating addresses
        (Excel spreadsheet works too)
      • Post office standardization software
  • Personnel
      • Geocoder(s) (with skill in geocoding software;
        we trained ours)
  • Time
      • ½ hour / county to geocode input file (done at
        least 3 times)
      • 20-30 lookups/hour with experienced geocoder
      • Study management estimate: start-up (40 hrs);
        weekly (5-10 hrs)
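These time figures translate directly into staffing estimates. A small worked sketch using the rates above; the helper names are hypothetical, 3,000 lookups is a hypothetical workload, and the 72 counties are Wisconsin's.

```python
from typing import Tuple


def lookup_hours(n_lookups: int, low_rate: int = 20,
                 high_rate: int = 30) -> Tuple[float, float]:
    """Fastest and slowest geocoder hours at 20-30 lookups per hour."""
    return n_lookups / high_rate, n_lookups / low_rate


def county_pass_hours(n_counties: int, hours_per_county: float = 0.5,
                      passes: int = 3) -> float:
    """Hours to geocode the input file county by county, repeated `passes` times."""
    return n_counties * hours_per_county * passes


print(lookup_hours(3000))     # (100.0, 150.0)
print(county_pass_hours(72))  # 108.0
```

Because the input file was geocoded at least three times, the county-pass figure is a lower bound.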

37
Questions to ask when reading a study that used
geocoding
  • What is the matching rate?
  • How is that obtained, e.g., by state, by county,
    by ZIP code?
      • e.g., 98% metropolitan area, 40% rural, 69%
        overall, which is OK, but there can be study
        implications
  • What are the accuracy criteria?
      • That is, how reliable are the locations?
  • How much fuzziness was allowed to geocode (called
    spelling and match sensitivity scores)?
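An overall match rate is a pooled figure, so it depends on how participants are split between strata. A minimal sketch with hypothetical counts chosen to reproduce the 98% / 40% / 69% example: since 0.98m + 0.40r = 0.69(m + r) implies m = r, those rates combine to 69% only when the metropolitan and rural groups are about the same size.

```python
from typing import Dict, Tuple


def match_rates(strata: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """strata maps a stratum name to (matched, total); adds a pooled 'overall' rate."""
    rates = {name: m / n for name, (m, n) in strata.items()}
    matched = sum(m for m, _ in strata.values())
    total = sum(n for _, n in strata.values())
    rates["overall"] = matched / total
    return rates


# Hypothetical counts, not from any particular study.
print(match_rates({"metropolitan": (980, 1000), "rural": (400, 1000)}))
# {'metropolitan': 0.98, 'rural': 0.4, 'overall': 0.69}
```

A study that reports only the overall rate can therefore hide a rural match rate well below it.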

38
Questions to ask when reading a study that used
geocoding cont
  • Are the un-geocoded locations randomly
    distributed over the area of analysis so there is
    not a clear spatial bias?
  • Are the un-geocoded locations a very small
    percentage of total such there is that minimal
    impact on the analysis?
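For the first question, one simple (non-spatial) check is a Pearson chi-square test of whether the share of un-geocoded addresses differs by county. A minimal stdlib sketch returning the statistic and degrees of freedom; the function name and counts are hypothetical, and it assumes every county has at least one participant and that both statuses occur. It ignores spatial adjacency, and the approximation is poor when expected counts are small (a common rule of thumb is at least 5 per cell).

```python
from typing import Dict, Tuple


def chi_square_by_county(counts: Dict[str, Tuple[int, int]]) -> Tuple[float, int]:
    """Pearson chi-square for a 2 x k table of (geocoded, not geocoded) by county.
    Returns the statistic and its degrees of freedom, k - 1."""
    geo_total = sum(g for g, _ in counts.values())
    miss_total = sum(m for _, m in counts.values())
    grand = geo_total + miss_total
    stat = 0.0
    for g, m in counts.values():
        row = g + m
        for observed, col_total in ((g, geo_total), (m, miss_total)):
            expected = row * col_total / grand
            stat += (observed - expected) ** 2 / expected
    return stat, len(counts) - 1


# Hypothetical counts: (geocoded, not geocoded) per county.
stat, df = chi_square_by_county({"A": (90, 10), "B": (30, 20)})
print(stat, df)  # 18.75 1
```

Here 18.75 exceeds 3.84, the 5% critical value for 1 degree of freedom, so the un-geocoded share would not look uniform across these two hypothetical counties.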

39
Conclusion
  • Getting the matching rate above 68% for our
    addresses is much more complicated and labor
    intensive than it seems at first glance
  • Study ramifications depend on the matching rate
  • Weakest link (at this date) is the incomplete
    authority file, especially for rural areas
  • Once locations are geocoded, very interesting
    analyses can be done (e.g., Robert's, Krieger's,
    and Rushton's analyses)

40
Acknowledgements
  • Drs. Patrick Remington, Amy Trentham-Dietz,
    Stephanie Robert, and Polly Newcomb for their
    collaboration on the design and implementation of
    the project
  • Drs. Henry Anderson, Larry Hanrahan, Russell
    Kirby, Marty Kanarek, Colin Jefcoate, and William
    Sonzogni for advice and support on this project
  • Laura Stephenson of the Wisconsin Cancer
    Reporting System for assistance with data
  • Betty Granda, Christina Kantor, Elizabeth
    Mannering, Kathy Peck, Lisa Sieczkowski, Jerry
    Phipps, John Hampton, Nicole Angresano, Mina Kim,
    and Linda Haskins for data collection and study
    management
  • Ayak Reec, Jeffrey Pearson, Indiana Strombom,
    Stephanie Holmes, LeAnn Anderson, Kwang Kim, and
    Luxme Harihan for geocoding
  • Mary Pankratz, Math Heinzel, Peter Nepokroeff,
    John Laedlein, and Gene Hafermann for technical
    support.
  • This study was supported by National Cancer
    Institute grants R01 CA47147 and U01 CA82004