Title: Data Quality
1 Data Quality
- GIGO: garbage in, garbage out
- 'Cos it's in the computer, don't mean it's right
It's not the things you don't know that matter,
it's the things you know that aren't so.
(Will Rogers, famous Okie GIS specialist)
2 Data Quality: How good is your data?
- Scale
  - ratio of distance on a map to the equivalent distance on the earth's surface
  - primarily an output issue
- Precision or Resolution
  - the exactness of measurement or description
  - relates to both input and output
- Accuracy
  - the degree of correspondence between data and the real world
  - relates primarily to input
- Currency
  - the degree to which data represents the world at the present moment in time
- Documentation or Metadata
  - data about data
  - recording all of the above
- Standards
  - common or agreed-to ways of doing things
  - data built to standards is more valuable since it's more easily shareable
3 Scale
- ratio of distance on a map to the equivalent distance on the earth's surface
- large scale -> large detail, small area covered (1:200 or 1:2,400)
- small scale -> small detail, large area covered (1:250,000)
- a given object (e.g. a land parcel) appears larger on a large-scale map
- scale can never be constant everywhere on a map because of the map projection
  - the problem is worst for small-scale maps
  - with certain projections, scale:
    - can be true from a single point to everywhere else
    - can be true along a line, or a set of lines (e.g. Mercator)
  - on large-scale maps, adjustments are often made to achieve close to true scale everywhere
- scale representation
  - verbal (good for interpretation): "one inch equals one statute mile"
  - representative fraction (RF): 1:63,360 (good for measurement); the smaller the fraction, the smaller the scale (1:2,000,000 is smaller than 1:2,000)
  - scale bar (good if the map is enlarged/reduced)
- use them all on a map!
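Converting between an RF and a verbal scale is simple arithmetic on the RF denominator. A minimal sketch, assuming the scale is passed as its denominator (63_360 for 1:63,360); the function names are illustrative and not taken from any GIS library:

```python
# Conversions between a representative fraction (RF) and verbal scales.
# Assumption: a scale is passed as its RF denominator, e.g. 63_360 for 1:63,360.

INCHES_PER_FOOT = 12
INCHES_PER_MILE = 63_360
CM_PER_KM = 100_000


def ground_distance_inches(map_inches: float, rf_denominator: float) -> float:
    """Ground distance (inches) represented by a map distance (inches)."""
    return map_inches * rf_denominator


def verbal_scale_feet(rf_denominator: float) -> float:
    """Feet on the ground represented by one inch on the map."""
    return rf_denominator / INCHES_PER_FOOT


def verbal_scale_miles(rf_denominator: float) -> float:
    """Miles on the ground represented by one inch on the map."""
    return rf_denominator / INCHES_PER_MILE


def km_per_cm(rf_denominator: float) -> float:
    """Kilometres on the ground represented by one centimetre on the map."""
    return rf_denominator / CM_PER_KM


def is_larger_scale(a: float, b: float) -> bool:
    """True if 1:a is a larger scale than 1:b (larger fraction = smaller denominator)."""
    return a < b


if __name__ == "__main__":
    for denom in (200, 24_000, 63_360, 1_000_000, 7_500_000):
        print(f"1:{denom:,}  1 in = {verbal_scale_feet(denom):,.1f} ft = "
              f"{verbal_scale_miles(denom):.3f} mi; 1 cm = {km_per_cm(denom):g} km")
```

Working with the denominator keeps the "smaller fraction = smaller scale" rule explicit: a bigger denominator means a smaller scale.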
4 Scale Examples
- Common Scales
  - 1:200 (1" = 16.7 ft)
  - 1:2,000 (1" = 56 yards; 1 cm = 20 m)
  - 1:20,000 (5 cm = 1 km)
  - 1:24,000 (1" = 2,000 ft)
  - 1:25,000 (4 cm = 1 km; 1 cm = 0.25 km)
  - 1:50,000 (2 cm = 1 km)
  - 1:62,500 (1" = 0.986 mi; 1.6 cm = 1 km)
  - 1:63,360 (1" = 1 mile; 1 cm = 0.634 km)
  - 1:100,000 (1" = 1.58 mi; 1 cm = 1 km)
  - 1:500,000 (1" = 7.9 mi; 1 cm = 5 km)
  - 1:1,000,000 (1" = 15.8 mi; 1 cm = 10 km)
  - 1:7,500,000 (1" = 118 mi; 1 cm = 75 km)
- Large versus Small
  - large: larger than 1:12,500
  - medium: 1:13,000 to 1:126,720
  - small: 1:130,000 to 1:1,000,000
  - very small: smaller than 1:1,000,000
  - (really, relative to what's available for a given area; Maling 1989)
- Map sheet examples
  - 1:24,000: 7.5-minute USGS quads (17 by 22 inches; 6 by 8 miles)
  - 1:7,500,000: US wall map (26 by 16 inches)
  - 1:20,000,000: US on an 8.5 x 11 inch sheet
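The size classes and sheet extents on this slide can be checked with a short sketch. It assumes scales are given as RF denominators and, because the slide's ranges leave small gaps (e.g. between 1:12,500 and 1:13,000), it folds each gap into the next-smaller-scale class; that boundary handling is an assumption, not part of Maling's scheme.

```python
# Classify an RF scale using this slide's ranges (after Maling 1989) and
# compute the ground extent of a map sheet.
# Assumption: gaps between the slide's ranges go to the next-smaller-scale class.

INCHES_PER_MILE = 63_360


def scale_class(rf_denominator: float) -> str:
    """Return 'large', 'medium', 'small', or 'very small' for 1:rf_denominator."""
    if rf_denominator <= 12_500:
        return "large"
    if rf_denominator <= 126_720:
        return "medium"
    if rf_denominator <= 1_000_000:
        return "small"
    return "very small"


def sheet_coverage_miles(width_in: float, height_in: float,
                         rf_denominator: float) -> tuple[float, float]:
    """Ground extent (miles) covered by a sheet of width_in x height_in inches."""
    return (width_in * rf_denominator / INCHES_PER_MILE,
            height_in * rf_denominator / INCHES_PER_MILE)


if __name__ == "__main__":
    print(scale_class(24_000))
    w, h = sheet_coverage_miles(17, 22, 24_000)
    print(f"{w:.1f} by {h:.1f} miles")
```

For the 17 by 22 inch quad at 1:24,000 this gives roughly 6.4 by 8.3 miles, consistent with the "6 by 8 miles" quoted above.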
5 Scale and GIS Systems
- in theory, a GIS is scale independent, since output can be produced at any scale
- in practice, there is an implicit range of scales, or a maximum scale, for anticipated output, since scale also affects other issues such as:
  - what features to show
    - manholes appear only on large-scale maps
  - how features will be represented
    - a manhole is a polygon at 1:50; a city is a point at 1:1,000,000
  - decisions regarding accuracy and precision of data
    - the larger the scale, the more critical accuracy becomes
6 Precision or Resolution: it's not the same as scale or accuracy!
- Precision: the exactness of measurement or description
  - the size of the smallest feature that can be displayed, recognized, or described
  - can apply to space, time (e.g. daily versus annual), or attribute (Douglas fir v. conifer)
  - for raster data, it is the size of the pixel (resolution)
    - e.g. for NTGISC digital orthos it is 1.6 ft (half a meter)
  - raster data can be resampled by combining adjacent cells
    - this decreases resolution but saves storage
    - e.g. 1.6 ft to 3.2 ft (1/4 storage) to 6.4 ft (1/16 storage)
- resolution and scale
  - generally, moving to a larger scale allows features to be observed better and requires higher resolution
  - but, because of the human eye's ability to recognize patterns, features in a lower-resolution data set can sometimes be observed better by decreasing the scale (6.4 ft resolution shown at 1:400 rather than 1:200)
- resolution and positional accuracy
  - you can see a feature (resolution), but it may not be in the right place (accuracy)
  - higher accuracy generally costs much more to obtain than higher resolution
  - accuracy cannot be greater (but may be much less) than resolution (e.g. if pixel size is one meter, then the best accuracy possible is one meter)
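Resampling by combining adjacent cells can be sketched with NumPy by averaging non-overlapping 2 x 2 (or 4 x 4) blocks. This assumes continuous cell values such as ortho grey levels, where a block mean is reasonable; a categorical raster (e.g. land cover) would normally use a majority rule instead. The 8 x 8 test image is made up.

```python
import numpy as np


def aggregate(raster: np.ndarray, factor: int) -> np.ndarray:
    """Resample a raster by averaging non-overlapping factor x factor blocks.

    Each output cell is `factor` times coarser on a side, so the cell count
    (and, roughly, storage) drops by a factor of factor**2.
    """
    rows, cols = raster.shape
    if rows % factor or cols % factor:
        raise ValueError("raster dimensions must be divisible by factor")
    # Element [i, a, j, b] of the 4-D view is raster[i*factor + a, j*factor + b].
    blocks = raster.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


if __name__ == "__main__":
    cell_ft = 1.6                       # NTGISC ortho pixel size from the slide
    ortho = np.random.default_rng(0).integers(0, 256, size=(8, 8)).astype(float)
    for factor in (2, 4):
        coarse = aggregate(ortho, factor)
        print(f"{cell_ft * factor:.1f} ft cells: {coarse.shape}, "
              f"{coarse.size / ortho.size:.4f} of original cell count")
```

Factors of 2 and 4 reproduce the slide's 1.6 ft to 3.2 ft (1/4) and 6.4 ft (1/16) steps.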
7 Accuracy
- Positional Accuracy (sometimes called Quantitative Accuracy)
  - Spatial
    - horizontal accuracy: distance from true location
    - vertical accuracy: difference from true height
  - Temporal
    - difference from actual time and/or date
- Attribute (Thematic) Accuracy or Consistency (Qualitative Accuracy): the validity concept from experimental design/statistical inference
  - a feature is what the GIS/map purports it to be
    - a railroad is a railroad, and not a road
    - a soil sample agrees with the type mapped
- Completeness: the reliability concept from experimental design/statistical inference
  - are all instances of a feature the GIS/map claims to include, in fact, there?
  - simply put, how much data is missing?
- Logical Consistency
  - the presence of contradictory relationships in the database
    - some crimes recorded at the place of occurrence, others at the place where the report was taken
    - data for one country is for 2000, for another it's for 2001
    - annual data series not taken on the same day/month, etc. (sometimes called lineage error)
    - data uses a different source or estimation technique for different years (again, lineage)
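Attribute accuracy, completeness, and one kind of logical-consistency check can each be expressed as a simple ratio or test against reference data. A minimal sketch, assuming hypothetical inputs: pairs of (class in GIS, class found on the ground) for checked features, lists of feature IDs, and a mapping from each area to the reference year of its data.

```python
# Simple attribute-accuracy, completeness, and consistency measures.
# All inputs are hypothetical check data gathered in the field or from a reference source.


def attribute_accuracy(pairs):
    """Share of checked features whose GIS class matches the ground class."""
    matches = sum(1 for gis_class, ground_class in pairs if gis_class == ground_class)
    return matches / len(pairs)


def completeness(gis_ids, reference_ids):
    """Share of reference (real-world) features that are present in the GIS."""
    reference = set(reference_ids)
    return len(reference & set(gis_ids)) / len(reference)


def reference_years(year_by_area):
    """Distinct data years; more than one signals a logical-consistency problem."""
    return set(year_by_area.values())


if __name__ == "__main__":
    checks = [("railroad", "railroad"), ("road", "railroad"), ("road", "road")]
    print(attribute_accuracy(checks))
    print(completeness(["a", "b"], ["a", "b", "c"]))
    print(reference_years({"country A": 2000, "country B": 2001}))
```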
8 Sources of Error
- Sources
  - inherent instability of the phenomenon itself
    - e.g. random variation of most phenomena (e.g. leaf size)
  - measurement
    - e.g. surveyor or instrument error
  - model used to represent data
    - e.g. choice of spheroid, or classification systems
  - data encoding and entry
    - e.g. keying or digitizing errors
  - data processing
    - e.g. single versus double precision, algorithms used
  - propagation or cascading from one data set to another
    - e.g. using an inaccurate layer as the source for another layer
- Example for Positional Accuracy
  - choice of spheroid and datum
  - choice of map projection and its parameters
  - accuracy of measured locations (surveying) of features on earth
  - media stability (stretching, folding, wrinkling of maps, photos)
  - human drafting, digitizing, or interpretation error
  - resolution and/or accuracy of drafting/digitizing equipment
  - registration accuracy of tics
  - machine precision: coordinate rounding error in storage and manipulation
  - other unknown
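Single versus double precision matters for projected coordinates because large values such as UTM northings leave few bits for the fractional part in a 4-byte float. A minimal sketch using only the standard library; the northing value is hypothetical.

```python
import math
import struct


def as_float32(x: float) -> float:
    """Round-trip x through IEEE 754 single precision (a 4-byte float)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_spacing(x: float) -> float:
    """Gap between adjacent single-precision values near x (normal range)."""
    _, exponent = math.frexp(x)        # x = m * 2**exponent with 0.5 <= |m| < 1
    return 2.0 ** (exponent - 24)      # float32 carries a 24-bit significand


if __name__ == "__main__":
    northing = 3_650_123.456           # hypothetical UTM northing, metres
    stored = as_float32(northing)
    print(f"float32 stores {stored}, error {abs(stored - northing):.3f} m, "
          f"spacing {float32_spacing(northing)} m")
```

Near 3.6 million metres, single precision can only represent every quarter metre, so each stored coordinate may be off by up to 0.125 m before any analysis begins; double precision at the same magnitude resolves well below a micrometre.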
9 Measurement of Positional Accuracy
- usually measured by root mean square error (RMSE): the square root of the average of the squared errors
- usually expressed as a probability that no more than P% of points will be further than S distance from their true location
- loosely, we say that the RMSE tells us how far recorded points in the GIS are from their true location on the ground, on average
- more correctly, based on the normal distribution of errors, 68% of points will be RMSE distance or less from their true location, and 95% will be no more than twice this distance, provided the errors are random and not systematic (i.e. the mean of the errors is zero)
  - e.g. for NTGISC digital orthos, RMSE is 3.2 feet (one meter)
  - for USGS Digital Ortho Quads, the RMSE spec. is approx. 33 feet or 10 meters (but in reality much better)
- with GPS, height is in practice 2 or 3 times less accurate than horizontal at high precision (officially the spec is 1.5, but data collection errors affect vertical the most)
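The RMSE calculation and the "errors must not be systematic" check can be sketched as follows, assuming hypothetical pairs of recorded and surveyed (x, y) coordinates in the same ground units. Note that the 68%/95% rule of thumb strictly describes normally distributed errors along a single axis; for radial (horizontal distance) errors with equal, independent x and y errors, the share within one RMSE is closer to 63%.

```python
import math


def horizontal_errors(measured, true):
    """Distance between each recorded (x, y) and its true (x, y)."""
    return [math.hypot(mx - tx, my - ty)
            for (mx, my), (tx, ty) in zip(measured, true)]


def rmse(errors):
    """Root mean square error: square root of the mean of the squared errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def share_within(errors, distance):
    """Fraction of points whose error is at most `distance`."""
    return sum(1 for e in errors if e <= distance) / len(errors)


def mean_offset(measured, true):
    """Mean (dx, dy); values far from zero suggest systematic error."""
    n = len(measured)
    dx = sum(m[0] - t[0] for m, t in zip(measured, true)) / n
    dy = sum(m[1] - t[1] for m, t in zip(measured, true)) / n
    return dx, dy


if __name__ == "__main__":
    recorded = [(100.4, 200.1), (150.2, 249.5), (199.1, 300.8)]   # hypothetical
    surveyed = [(100.0, 200.0), (150.0, 250.0), (200.0, 300.0)]
    errs = horizontal_errors(recorded, surveyed)
    r = rmse(errs)
    print(f"RMSE {r:.2f}; within RMSE: {share_within(errs, r):.0%}; "
          f"mean offset {mean_offset(recorded, surveyed)}")
```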
10 USA National Map Accuracy Standards
- established in 1941 by the US Bureau of the Budget (now OMB) for use with US Geological Survey maps (Maling, 1989, p. 146)
- horizontal accuracy: not more than 10% of tested, well-defined points shall be more than the following distances from their true location
  - 1:62,500: 1/50th of an inch (0.02")
  - 1:24,000: 1/40th of an inch (amended to 1/50 = 0.02" in 1947)
  - 1:12,000: 1/30th of an inch (0.033")
- thus, on maps with a scale of 1:63,360 (1" = 1 mile), 90% of points should be within 105.6 feet ((63,360 x 0.02)/12) of their true location
- on USGS quads with a scale of 1:24,000 (1" = 2,000 ft), 90% of points should be within 40 feet ((24,000 x 0.02)/12) of their true location
- on a map with a scale of 1:12,000 (1" = 1,000 ft), 90% of points should be within 33 feet (1,000 x 0.033), approx. 10 meters
  - this gives rise to the loose, but often used, statement that the NMAS is 10 meters
- inadequate for the computer age
  - how many points? how are they selected?
  - how is their true location determined?
  - what about qualitative accuracy and completeness?
  - unfortunately, the new standard doesn't address all these issues either
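The worked NMAS examples follow one formula: ground tolerance in feet = RF denominator x map tolerance in inches / 12. A minimal sketch of that conversion and of the "no more than 10% of tested points may exceed it" test; the point errors are hypothetical.

```python
# NMAS-style tolerance and pass/fail test.


def nmas_tolerance_feet(rf_denominator: float, tolerance_inches: float) -> float:
    """Ground distance (feet) equivalent to the NMAS map-distance tolerance."""
    return rf_denominator * tolerance_inches / 12


def passes_nmas(errors_feet, tolerance_feet: float) -> bool:
    """True if no more than 10% of tested points exceed the tolerance."""
    exceeding = sum(1 for e in errors_feet if e > tolerance_feet)
    return exceeding <= 0.10 * len(errors_feet)


if __name__ == "__main__":
    tol = nmas_tolerance_feet(24_000, 1 / 50)       # 1:24,000 quad, 1/50 inch
    tested = [12, 25, 31, 8, 44, 19, 27, 15, 22, 36]  # hypothetical errors, feet
    print(f"tolerance {tol:.1f} ft; passes: {passes_nmas(tested, tol)}")
```

The test deliberately says nothing about how many points to use or how their true positions were obtained, which is exactly the weakness noted above.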
11 USA National Standard for Spatial Data Accuracy
- Geospatial Positioning Accuracy Standard (FGDC-STD-007)
  - Part 3, National Standard for Spatial Data Accuracy: FGDC-STD-007.3-1998
- specifies a statistic and testing methodology for positional (horizontal and vertical) accuracy of maps and digital data
- replacement for the National Map Accuracy Standard of 1941/47
- no single threshold metric to achieve (as with the old standard), but users are encouraged to establish thresholds for specific applications
- accuracy reported in ground units
- testing method compares data set point coordinate values with coordinate values from a higher-accuracy source for readily visible or recoverable ground points
- although it uses points, the principles apply to all geospatial data, including point, vector, and raster objects
- other standards for data content will adopt the NSSDA for particular spatial objects
- copies of the standard available at http://www.fgdc.gov
- the Accuracy Standard has 7 parts, of which parts 4-7 apply to specific data types
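As the NSSDA is usually summarized, accuracy is reported at 95% confidence: horizontally as 1.7308 x RMSE_r when RMSE_x equals RMSE_y (RMSE_r being the square root of RMSE_x² + RMSE_y²), approximated as 2.4477 x 0.5 x (RMSE_x + RMSE_y) when the ratio of the smaller to the larger is between 0.6 and 1.0, and vertically as 1.9600 x RMSE_z, with at least 20 check points. These factors are quoted from memory of FGDC-STD-007.3-1998, so check them against the copy at http://www.fgdc.gov before relying on this sketch.

```python
import math

MIN_CHECK_POINTS = 20   # NSSDA minimum number of check points (per the standard)


def _rmse(data, check):
    """RMSE along one axis between data values and higher-accuracy check values."""
    return math.sqrt(sum((d - c) ** 2 for d, c in zip(data, check)) / len(data))


def nssda_horizontal(data_xy, check_xy):
    """Horizontal accuracy at 95% confidence, in the data's ground units."""
    if len(data_xy) < MIN_CHECK_POINTS:
        raise ValueError("NSSDA testing needs at least 20 check points")
    rmse_x = _rmse([p[0] for p in data_xy], [p[0] for p in check_xy])
    rmse_y = _rmse([p[1] for p in data_xy], [p[1] for p in check_xy])
    if rmse_x == rmse_y:
        return 1.7308 * math.hypot(rmse_x, rmse_y)    # 1.7308 * RMSE_r
    if min(rmse_x, rmse_y) / max(rmse_x, rmse_y) < 0.6:
        # The standard treats this case differently; this sketch does not.
        raise ValueError("RMSE_x and RMSE_y too dissimilar for this approximation")
    return 2.4477 * 0.5 * (rmse_x + rmse_y)


def nssda_vertical(data_z, check_z):
    """Vertical accuracy at 95% confidence: 1.9600 * RMSE_z."""
    if len(data_z) < MIN_CHECK_POINTS:
        raise ValueError("NSSDA testing needs at least 20 check points")
    return 1.9600 * _rmse(data_z, check_z)
```

Unlike the NMAS, the function only reports a number in ground units; deciding whether that number is good enough is left to the user's application threshold.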
12 Currency: Is my data up-to-date?
- data is always relative to a specific point in time, which must be documented
- there are important applications for historical data (e.g. analyzing trends), so don't necessarily trash old data
- current data requires a specific plan for ongoing maintenance
  - may be continuous, or at pre-defined points in time
  - otherwise, data becomes outdated very quickly
- currency is not really an independent quality dimension; it is simply a factor contributing to lack of accuracy regarding:
  - consistency: some GIS features do not match those in the real world today
  - completeness: some real-world features are missing from the GIS database
Many organizations spend substantial amounts
acquiring a data set without giving any thought
to how it will be maintained.
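A maintenance plan can be as simple as recording each layer's as-of date and intended update interval, then flagging layers that have fallen behind. A minimal sketch; the `Layer` record, its fields, and the status labels are hypothetical, not a standard metadata schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Layer:
    name: str
    as_of: date                          # date the data represents (documented)
    update_interval_days: Optional[int]  # None means no maintenance plan


def currency_report(layers, today: date) -> dict:
    """Label each layer 'current', 'overdue', or 'no maintenance plan'."""
    report = {}
    for layer in layers:
        if layer.update_interval_days is None:
            report[layer.name] = "no maintenance plan"
        elif today - layer.as_of > timedelta(days=layer.update_interval_days):
            report[layer.name] = "overdue"
        else:
            report[layer.name] = "current"
    return report


if __name__ == "__main__":
    layers = [Layer("parcels", date(2023, 12, 1), 30),
              Layer("roads", date(2023, 12, 15), 30),
              Layer("soils", date(1995, 1, 1), None)]
    print(currency_report(layers, date(2024, 1, 1)))
```

An old layer with no plan is not necessarily useless (it may serve trend analysis), so the report flags rather than deletes.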
13 Summary: Resolution, Scale, Accuracy, and Storage
14 Summary: Minimum Documentation Requirements
- geodetic datum name (e.g. NAD27), which implies:
  - ellipsoid/spheroid name (earth model), e.g. Clarke 1866
  - point of origin (ties ellipsoid to earth), e.g. Meades Ranch
  - required for all GIS databases and maps
- projection name, its parameters, and its measurement units
  - required for all maps, since they are 2-D by nature
  - required for GIS if data is in X-Y projected form
- source information
  - accuracy standard(s) to which built
  - author/publisher/creator name and/or data source
  - date(s) of data collection/update, and of map/GIS creation
- cartographers demand that each map document the above, and also:
  - north arrow, map scale
  - graticule indication
    - at least four latitude/longitude tic marks, with values in degrees
    - at least four X-Y tic marks, with values and units of measurement (feet, meters, etc.)
(If GIS data is in lat/long, you must know the datum. If GIS data is in X-Y, you must know the datum and the projection info.)
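The closing rule (lat/long needs a datum; X-Y needs a datum plus projection details) lends itself to an automated check when data sets are loaded. A minimal sketch; the `Metadata` fields and the "latlong"/"xy" labels are hypothetical stand-ins for whatever metadata format is actually in use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metadata:
    coordinate_form: str                   # hypothetical labels: "latlong" or "xy"
    datum: Optional[str] = None            # e.g. "NAD27"
    projection: Optional[str] = None
    projection_params: Optional[dict] = None
    units: Optional[str] = None            # e.g. "feet", "meters"
    accuracy_standard: Optional[str] = None
    source: Optional[str] = None
    collection_date: Optional[str] = None


def missing_items(md: Metadata) -> list:
    """List the minimum documentation items this record lacks."""
    if md.coordinate_form not in ("latlong", "xy"):
        raise ValueError("coordinate_form must be 'latlong' or 'xy'")
    missing = []
    if not md.datum:
        missing.append("datum")                     # needed in every case
    if md.coordinate_form == "xy":
        for field in ("projection", "projection_params", "units"):
            if not getattr(md, field):
                missing.append(field)
    for field in ("accuracy_standard", "source", "collection_date"):
        if not getattr(md, field):
            missing.append(field)
    return missing
```

For example, `missing_items(Metadata("xy", datum="NAD83"))` reports the projection, its parameters, the units, and all of the source information as missing.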