Title: GIS Data Sources and Data Quality
1 GIS Data Sources and Data Quality
Glen Johnson, New York State Department of Health and School of Public Health, University at Albany
2 GIS: a marriage of geospatial objects and attributes
1. Geospatial Objects
   - vector coverage: points, lines, polygons
   - raster coverage (grid cells): images, model results
2. Attribute Data: information associated with geospatial objects
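The pairing of geospatial objects with attribute data can be pictured as a join on a shared key. The following is a minimal sketch, assuming each object carries an identifier that also appears in the attribute table; the IDs, coordinates and population figures are invented for illustration, and `join_features` is a hypothetical helper, not part of any GIS package.

```python
# Hypothetical geospatial objects: polygons keyed by an ID.
# Coordinates are made-up (x, y) pairs, not real boundaries.
geometries = {
    "A": [(0, 0), (4, 0), (4, 3), (0, 3)],
    "B": [(4, 0), (9, 0), (9, 3), (4, 3)],
}

# Hypothetical attribute table keyed by the same ID.
attributes = {
    "A": {"name": "County A", "population": 12000},
    "B": {"name": "County B", "population": 30500},
}


def join_features(geoms, attrs):
    """Pair each geometry with its attribute record via the shared key."""
    features = []
    for key, ring in geoms.items():
        features.append({
            "id": key,
            "geometry": ring,
            # An object with no attribute record gets an empty dict,
            # which makes incomplete data easy to spot later.
            "attributes": attrs.get(key, {}),
        })
    return features


features = join_features(geometries, attributes)
for feature in features:
    print(feature["id"], feature["attributes"].get("population"))
```

Keeping geometry and attributes in separate tables until the join mirrors how the two components are usually obtained from different sources.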
3 Vector data
- points (locations of residences, hospitals, hazardous waste sites, etc.)
- lines (roads, streams, etc.)
- polygons
  - artificial, for human management, such as counties, census tracts, ZIP codes
  - real, such as watershed boundaries, geology, soil, water bodies
Raster data (for continuous spatial coverage)
- remote sensing products
  - aerial photographs, orthorectified to adjust for distortion
  - satellite imagery
- grid-based model results (air pollution, soil erosion, etc.)
- digital elevation models (DEMs) for topography
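To make the vector/raster distinction concrete, here is a small sketch that samples a raster coverage at the location of a vector point. It assumes a north-up grid with square cells; the grid values, origin and 30 m cell size are all hypothetical.

```python
import math

# A tiny hypothetical raster: 3 rows x 4 columns of model values
# (e.g. a pollutant concentration); row 0 is the northernmost row.
grid = [
    [1.0, 1.5, 2.0, 2.5],
    [0.8, 1.2, 1.9, 2.2],
    [0.5, 0.9, 1.4, 1.8],
]
x_min, y_max = 500000.0, 4700000.0  # upper-left corner (assumed units: metres)
cell_size = 30.0                    # assumed square cells, 30 m on a side


def cell_value(x, y):
    """Return the raster value at map coordinate (x, y), or None if outside."""
    col = math.floor((x - x_min) / cell_size)
    row = math.floor((y_max - y) / cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


# A point object (vector) sampled against the raster coverage.
residence = (500045.0, 4699950.0)
print(cell_value(*residence))
```

The point keeps its exact coordinates, while the raster can only answer at the resolution of one cell.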
5 So where do we acquire all these data?
- Many sources, increasing every year
- Our focus is on Public Health applications
- We will highlight key sources
- You are responsible for the quality of any data
used for your GIS projects
6 Some key GIS data sources (other than what ships with commercial software or that you can purchase)
Nationwide
- Federal Geographic Data Committee (FGDC) Geospatial One-Stop: http://gos2.geodata.gov/wps/portal/gos
Statewide (a quick sample)
- New York State GIS Clearinghouse: http://www.nysgis.state.ny.us/ (though access is increasingly limited for security reasons)
- New York State Data Center: http://www.nylovesbiz.com/nysdc/download_intro.asp
7 Global (another quick sample)
- Cornell University Library: www.library.cornell.edu/olinuris/ref/maps/intldata.html
- Stanford University Library: www-sul.stanford.edu/depts/gis/web.html
Many, many URLs today; just search
8 Procuring Data
- Download
- Uncompress
- Translate (for particular software and coordinate projections)
Need metadata for information on coordinate projections
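The uncompress step and the check for projection information might look like the sketch below. It assumes the download is a zip archive that stores its projection definition in a sidecar file ending in `.prj`, as shapefile-style data commonly do; the archive is built in memory with made-up file names so the example runs without a network connection, and `find_projection_text` is a hypothetical helper.

```python
import io
import zipfile

# Build a small stand-in archive in memory so the sketch runs without a
# download. File names and contents are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("tracts.shp", b"placeholder geometry bytes")
    archive.writestr("tracts.prj", 'GEOGCS["GCS_North_American_1983"]')


def find_projection_text(zip_bytes):
    """Return the text of the first .prj file in a zip archive, or None."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".prj"):
                return archive.read(name).decode("utf-8")
    return None


projection = find_projection_text(buffer.getvalue())
if projection is None:
    print("No projection file found: check the metadata before using the data")
else:
    print("Projection definition:", projection)
```

When no projection file is present, the metadata record is the only place to learn which coordinate system the data use.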
9 If several GIS users need the same data, best to locate on a central server or database
- Avoid wasteful duplication of effort
- Central maintenance
  - updating
  - data quality assurance
If using a virtual globe environment like Google Earth, ArcExplorer, etc., much of the public-domain data are already in place
10 [Diagram; labels only]
- Source data: geographic objects and attributes
- PC or central server or database (personal or enterprise)
- Tables / reports
- Statistical and other external analyses
- Other, external data
- Maps: printed or served for more interactive viewing/analysis
11 Attribute data for public health applications include
- Population (based on census)
- Health Outcomes
- Exposure
- Environmental
12 U.S. census data source: www.census.gov
Census geography based on TIGER files (see handout)
13 US Census TIGER Files, New York State (2000)
- County: 62
- Census Tract: 4,907
- Block Group: 15,079
- Census Block: 298,506
14 Census Geography, Albany City
15 Census Geography, Albany County
16 Socio-demographic attribute data
- Census Short Form (Summary Tape File 1 in 1990; Summary File 1 in 2000)
- 100% data
- Lowest level of geography is census block
- age groups
- Sex
- Race (much more detailed in 2000)
- Ethnicity (Hispanic origin)
- Housing Units
17 Census Long Form (STF3 in 1990; SF3 in 2000)
- 1 in 6 households sampled.
- Lowest level of census geography is block group.
- Education
- Income
- Housing
- Source of water and sewer
- Commuting time
- Country of origin
- Occupation
- many other attributes (variables)
18 Health Outcomes Data
- Vital statistics: birth and death (ICD codes)
- Hospital discharge data (ICD, DRG, MDC codes): SPARCS in New York
- Cancer Incidence
- Congenital Malformations
- STDs, HIV/AIDS
- Infectious Diseases
(subject to confidentiality protection at the personal level; public-domain data often available at an aggregated level)
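Aggregation is what turns person-level health records into something that can be released and mapped. A minimal sketch, assuming each record already carries a census-tract identifier; the records, tract IDs and codes below are invented, and no real disclosure rule is implemented.

```python
from collections import Counter

# Hypothetical individual-level records (no real data): each has a
# census-tract ID and an ICD-style diagnosis code.
records = [
    {"tract": "0001", "code": "C34"},
    {"tract": "0001", "code": "C50"},
    {"tract": "0002", "code": "C34"},
    {"tract": "0001", "code": "C34"},
]


def counts_by_area(rows, area_field="tract"):
    """Collapse person-level rows into counts per geographic unit."""
    return Counter(row[area_field] for row in rows)


tract_counts = counts_by_area(records)
print(dict(tract_counts))
```

The resulting counts per tract are the attribute values that would be joined to tract polygons for mapping.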
19 Exposure Registries
- Some New York State examples
- Occupational Heavy Metals Registry
- Childhood Blood Lead Reporting System
- Radon Registry
- Volatile Organic Compound Registry
- Pesticide Registry
20 Environmental Exposure Sources (mostly from U.S. EPA and state agencies), some examples
- Toxic Release Inventory
- Inactive Hazardous Waste Sites
- Municipal landfills
- Discharges to water (SPDES)
- Household measures of radon
- Soil sample data
- Drinking water contaminants
- Air pollution modeled and measured
- Contaminants in fish
- Power plants
- Contaminants in raw and finished drinking water
21 Data Quality in GIS
22 Producer - responsible for documenting data
quality (producing metadata)
User - responsible for checking data quality,
especially with respect to the particular
application
feedback
23 Data Quality Standards
Federal Geographic Data Committee (FGDC) (www.fgdc.gov)
- established the Spatial Data Transfer Standard and the Content Standard for Digital Geospatial Metadata
In other words, provides a common set of terminology and a common structure for geospatial metadata
- fundamental data quality information to be reported
- tests to be performed
24 Fundamental aspects of data quality that apply to
both geospatial and attribute components of a GIS
- Accuracy (closeness to truth)
- Resolution (level of detail)
- Consistency (logical?)
- Completeness (degree of omission)
geospatial data quality requirements depend on
application and scale
25 Scale means different things
Map Scale
- say 1 unit on the map = 24,000 units on real land (1:24,000)
- 1:24,000 is said to be a larger-scale map than, say, 1:100,000
Measurement Scale (primary unit of observation, also known as grain or resolution)
- e.g. areal unit of aggregation (census tract, ZIP code, etc.)
- pixel size in raster (grid) image
Extent
- spatial boundary within which a study applies
- e.g. state of New York, Kings County, Adirondack Blue Line, etc.
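The map-scale arithmetic is simple enough to write down. The sketch below converts a map measurement to a ground distance and encodes the "larger scale means smaller denominator" convention; both function names are hypothetical.

```python
def ground_distance(map_distance, scale_denominator):
    """Convert a distance measured on the map to the same unit on the ground."""
    return map_distance * scale_denominator


def is_larger_scale(denominator_a, denominator_b):
    """True if a 1:denominator_a map is larger scale than a 1:denominator_b map."""
    # A larger-scale map has a larger representative fraction (1/denominator),
    # which means a smaller denominator.
    return denominator_a < denominator_b


inches_on_ground = ground_distance(1, 24_000)   # 1 inch on a 1:24,000 map
feet_on_ground = inches_on_ground / 12          # 12 inches per foot
print(feet_on_ground)
print(is_larger_scale(24_000, 100_000))
```

So one inch on a 1:24,000 map covers 2,000 feet on the ground, while the same inch on a 1:100,000 map covers more than four times that distance with correspondingly less detail.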
26 Spatial / positional accuracy is a function of map scale.
Accuracy standards employed by the U.S.G.S. for various scale maps:

Map scale    Accuracy
1:1,200      3.33 feet
1:2,400      6.67 feet
1:4,800      13.33 feet
1:10,000     27.78 feet
1:12,000     33.33 feet
1:24,000     40.00 feet
1:63,360     105.60 feet
1:100,000    166.67 feet

This means that when we see a point on a map we have its "probable" location within a certain area. The same applies to lines.
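The tabulated tolerances are consistent with a rule of 1/30 inch at map scale for scales larger than 1:20,000 and 1/50 inch for scales of 1:20,000 or smaller, which is how the U.S. National Map Accuracy Standards are usually stated. The slides do not name the rule, so treat it as an assumption; the sketch below simply reproduces the figures under it.

```python
def horizontal_tolerance_feet(scale_denominator):
    """Ground tolerance in feet for a 1:scale_denominator map."""
    # Assumed rule (not stated in the slides): 1/30 inch on the map for scales
    # larger than 1:20,000, and 1/50 inch for 1:20,000 or smaller.
    map_inches = 1 / 30 if scale_denominator < 20_000 else 1 / 50
    return scale_denominator * map_inches / 12  # inches to feet


for denominator in (1_200, 2_400, 4_800, 10_000, 12_000, 24_000, 63_360, 100_000):
    print(f"1:{denominator:,}  {horizontal_tolerance_feet(denominator):.2f} feet")
```

The change of rule at 1:20,000 explains why the tolerance rises only from 33.33 to 40.00 feet between 1:12,000 and 1:24,000 even though the denominator doubles.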
27 Spatial Accuracy of a Point Object
28 Spatial Accuracy of a Line Object
29 Digitizing Errors
Other issues of geospatial errors
30 Attribute Quality
- same issues as with non-GIS studies (must do usual data checking and cleaning)
- attribute errors become spatial errors when attributes are used for mapping (for example, incorrect entry of ZIP code or latitude/longitude)
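Checks of this kind are easy to automate before any mapping is done. The sketch below flags malformed ZIP codes and coordinates that fall outside a rough bounding box for New York State; the box limits are rounded approximations chosen for illustration, the site records are invented, and `check_site` is a hypothetical helper.

```python
import re

# Approximate bounding box for New York State in decimal degrees.
# These limits are rounded assumptions for illustration, not official values.
LAT_MIN, LAT_MAX = 40.4, 45.1
LON_MIN, LON_MAX = -79.8, -71.7

ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")  # 5-digit ZIP or ZIP+4


def check_site(site):
    """Return a list of problems found in one site record (empty if none)."""
    problems = []
    if not ZIP_PATTERN.match(site["zip"]):
        problems.append("malformed ZIP code")
    if not (LAT_MIN <= site["lat"] <= LAT_MAX and LON_MIN <= site["lon"] <= LON_MAX):
        problems.append("coordinates outside state bounding box")
    return problems


# Hypothetical records: the second has latitude and longitude swapped,
# the third has a short ZIP code and a longitude missing its minus sign.
sites = [
    {"id": 1, "zip": "12207", "lat": 42.65, "lon": -73.75},
    {"id": 2, "zip": "12207", "lat": -73.75, "lon": 42.65},
    {"id": 3, "zip": "1220", "lat": 42.65, "lon": 73.75},
]
for site in sites:
    print(site["id"], check_site(site))
```

A bounding box catches gross errors such as swapped or wrongly signed coordinates; a site placed in the wrong county but still inside the box would need a point-in-polygon test against real boundaries.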
31 EPA 1989 Toxic Release Inventory: query for New York State sites
32 EPA 1989 Toxic Release Inventory: query for New York State sites
34 Resolution
- spatial
  - nodes / line (vector format data)
  - pixel size (raster format data)
- temporal
  - incidence rates over 5 yr period vs. 1 yr period
Low resolution is not necessarily bad. Optimum resolution depends on application and, consequently, desired map scale.
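The temporal-resolution trade-off can be illustrated with incidence rates for a small area. This is a sketch under simplifying assumptions: the yearly case counts are invented, the population is held constant over the period, and `rate_per_100k` is a hypothetical helper.

```python
def rate_per_100k(cases, population, years=1):
    """Average annual incidence rate per 100,000 people."""
    return cases / (population * years) * 100_000


# Hypothetical yearly case counts for one small area with 8,000 residents.
yearly_cases = [2, 0, 5, 1, 2]
population = 8_000

one_year_rates = [rate_per_100k(c, population) for c in yearly_cases]
five_year_rate = rate_per_100k(sum(yearly_cases), population, years=len(yearly_cases))

print([round(r, 1) for r in one_year_rates])
print(round(five_year_rate, 1))
```

The single-year rates swing widely because the counts are small, while the pooled five-year rate is steadier but cannot show year-to-year change.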
35 When digitizing natural boundaries, greater
resolution generally means greater accuracy (at
the cost of greater data storage requirements and
processing times)
36 Local analysis, such as identifying buildings
in a neighborhood, may call for fine resolution
digital orthophotos
37 Can zoom in very close before actual pixel
structure emerges
39 Regional analysis, such as analyzing land cover
patterns, may be better suited for coarser
resolution satellite imagery
Southeast Pennsylvania land cover based on
30-meter resolution LANDSAT image
40 Not appropriate for local analysis, such as
identifying buildings
Philadelphia International Airport, zoomed in
from previous image
41 Optimum resolution is a balance between the objective of the analysis and data storage/processing efficiency
Consider: increasing resolution by halving the length of a pixel side results in quadrupling the data set size
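The quadrupling follows from the fact that cell count scales with the inverse square of the pixel side. A short sketch, using a hypothetical 30 km by 30 km study area and a hypothetical helper `cell_count`:

```python
import math


def cell_count(width_m, height_m, pixel_m):
    """Number of square pixels needed to cover a rectangular extent."""
    return math.ceil(width_m / pixel_m) * math.ceil(height_m / pixel_m)


# Hypothetical 30 km x 30 km study area.
coarse = cell_count(30_000, 30_000, 30)   # 30 m pixels
fine = cell_count(30_000, 30_000, 15)     # pixel side halved to 15 m
print(coarse, fine, fine / coarse)
```

Halving the pixel side again would multiply the original count by sixteen, which is why resolution should be chosen for the analysis at hand and not simply maximized.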
42 Temporal Accuracy
43 Key Points on Data Quality
- accuracy, resolution, consistency, completeness
- scale dependence
- consider spatial, temporal and attribute errors
- metadata: how complete is it? Does it exist at all?
44 An excellent, up-to-date overview of many aspects of GIS and spatial analysis that is freely accessible:
De Smith, Goodchild and Longley, 2006-2008. Geospatial Analysis - a comprehensive guide. http://www.spatialanalysisonline.com/output/