Cleansing and Geocoding Spatial Data - PowerPoint PPT Presentation

1 / 15
About This Presentation
Title:

Cleansing and Geocoding Spatial Data

Description:

... and financial data warehouse at the Ohio State University Medical Center (OSUMC) ... in analyzing the quality of historical data coming from the source ... – PowerPoint PPT presentation

Number of Views:24
Avg rating:3.0/5.0
Slides: 16
Provided by: Sant110
Learn more at: http://bmi.osu.edu
Category:

less

Transcript and Presenter's Notes

Title: Cleansing and Geocoding Spatial Data


1
Cleansing and Geocoding Spatial Data
Jyoti Kamal, PhD
2
Introduction
In developing an enterprise-wide clinical and
financial data warehouse at the Ohio State
University Medical Center (OSUMC), we invested
tremendous time and effort in analyzing the
quality of historical data coming from the source
system as a first step to our commitment to data
quality. The OSUMC has developed an in house
technique and process to cleanse and geocode
spatial data in a single pass on the city, state
and zip code combination. In developing this
technique and process there were several goals in
mind.
3
Goals
Identify potential marketing regions for hospital
outreach clinic services. Identify regions by
patient demographics for direct marketing
mailings. Correlate abnormal diagnosis with
patient regional distributions. Identify
geographic correlations with other patient
trends. Identify regions by physician
referrals. Bio-surveillance Retrospective public
health surveillance.
4
Available Data
7 years worth of patient addresses 7 years worth
of patient demographics
Data Issues
Unknown integrity of data No geocoding of data
5
Cleansing Logic and Process
Purchased list of valid city, state and zip code
combinations along with zip code to county
mapping, zip code centered latitude and longitude
coordinates Data loaded into warehouse and
refreshed monthly to keep synchronized with
United States Postal Service Tool Informix Data
Stage TM used for extraction, transformation and
load.
6
Cleansing Logic and Process continued
Figure 1. Cleansing logic showing different
transform stages and output files.
7
Cleansing Logic and Process continued
Table 1. Sample of data before and after
cleansing and the additional geographic
information attached with each output record.
Changes in the cleansed data are italicized
8
Spatial Analysis
To provide analytical ease we divided the
geographic area around OSUMC in predefined rings
and sectors. To define the rings we chose
imaginary concentric circles in increments of ten
miles (larger increment after hundred miles) and
partitioned them in eight equal sectors of 45
degrees, each indicating a broad different
direction. One can, in fact, choose any
increment and angle depending on the granularity
of distribution desired. With this picture in
mind, we could write an application, which in
combination with patient zip code and the data
shown in table 2, could determine which ring and
sector the patient zip codes would fall. From
this information, in turn, it was easy to
determine what percent of the OSUMC patient
population fell in which ring sector
9
Spatial Analysis
Table 2. Sample of distance and bearing
calculations in reference to the OSUMC
10
Spatial Analysis
Figure 2. Center of these circles is the OSUMC.
The rings are drawn in increments of 10 miles and
the circles are divided in eight equal sectors of
45 degrees, each signifying a broad direction. N
North NE North East E East SE South East
S South SW South West W West NW North West
11
Results
Figure 3. Patient distribution by distance and
bearing in relationship to the OSUMC
12
Results
Figure 4. Top five services by patient count and
gender distribution for patients that are in the
age range of 40-45, live within 10 to 20 miles on
the south east direction of the OSUMC
13
Results
Figure 5. Patient distribution by in relationship
to the OSUMC for oncology services
14
Conclusions
Process was sufficient for the OSUMC Inexpensive
process Extended the application beyond patient
data to physician data Satisfied business
analysis and marketing personnel with service,
gender and age patient distribution data
15
Acknowledgments
Authors thank Dr. Joel Saltz, Dr. Hagop Mekhjian
and Maxine Moehring for their support and
encouragement. The ongoing help of the Data
warehouse team, in particular, Jennifer
Santangelo, Mike Ostrander and Israel Rosales is
greatly appreciated. Kevin Li and Mike
Ostranders work on the validation process and
OLAP application is invaluable.
Write a Comment
User Comments (0)
About PowerShow.com