Data Quality - PowerPoint PPT Presentation

About This Presentation
Title:

Data Quality

Slides: 15
Provided by: ronb159

Transcript and Presenter's Notes

1
Data Quality
  • GIGO: garbage in, garbage out
  • 'Cos it's in the computer, don't mean it's right

"It's not the things you don't know that matter,
it's the things you know that aren't so."
Will Rogers, Famous Okie GI specialist
2
Data Quality: How good is your data?
  • Scale
    • ratio of distance on a map to the equivalent
      distance on the earth's surface
    • Primarily an output issue
  • Precision or Resolution
    • the exactness of measurement or description
    • Relates to both input and output
  • Accuracy
    • the degree of correspondence between data and the
      real world
    • Relates primarily to input
  • Currency
    • the degree to which data represents the world at
      the present moment in time
  • Documentation or Metadata
    • data about data
    • Recording all of the above
  • Standards
    • Common or agreed-to ways of doing things
    • Data built to standards is more valuable since
      it's more easily shareable

3
Scale
  • ratio of distance on a map to the equivalent
    distance on the earth's surface
  • Large scale --> large detail, small area covered
    (1:200 or 1:2,400)
  • Small scale --> small detail, large area covered
    (1:250,000)
  • A given object (e.g. land parcel) appears larger
    on a large scale map
  • scale can never be constant everywhere on a map
    'cos of map projection
    • problem is worst for small scale maps & certain
      projections (e.g. Mercator)
    • can be true from a single point to everywhere
    • can be true along a line, or a set of lines
    • on large scale maps, adjustments are often made to
      achieve close to true scale everywhere
  • scale representation
    • verbal (good for interpretation): "one inch
      equals one statute mile"
    • representative fraction (RF): 1:63,360 (good for
      measurement); smaller fraction = smaller scale,
      so 1:2,000,000 is smaller than 1:2,000
    • scale bar (good if enlarged/reduced)

use them all on a map!
4
Scale Examples
  • Common Scales
    • 1:200 (1" = 16.7 ft)
    • 1:2,000 (1" = 56 yards; 1 cm = 20 m)
    • 1:20,000 (5 cm = 1 km)
    • 1:24,000 (1" = 2,000 ft)
    • 1:25,000 (1 cm = .25 km)
    • 1:50,000 (2 cm = 1 km)
    • 1:62,500 (1.6 cm = 1 km; 1" = .986 mi)
    • 1:63,360 (1" = 1 mile; 1 cm = .634 km)
    • 1:100,000 (1" = 1.58 mi; 1 cm = 1 km)
    • 1:500,000 (1" = 7.9 mi; 1 cm = 5 km)
    • 1:1,000,000 (1" = 15.8 mi; 1 cm = 10 km)
    • 1:7,500,000 (1" = 118 mi; 1 cm = 75 km)
  • Large versus Small
    • large: above 1:12,500
    • medium: 1:13,000 - 1:126,720
    • small: 1:130,000 - 1:1,000,000
    • very small: below 1:1,000,000
    • (really, relative to what's available for a
      given area; Maling 1989)
  • Map sheet examples
    • 1:24,000: 7.5 minute USGS Quads
      (17 by 22 inches; 6 by 8 miles)
    • 1:7,500,000: US wall map
      (26 by 16 inches)
    • 1:20,000,000: US on 8.5 x 11 inch paper
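
The equivalences listed for the common scales follow directly from the representative fraction. A minimal Python sketch of that arithmetic (the function names are illustrative and not part of the slides):

```python
INCHES_PER_FOOT = 12
INCHES_PER_MILE = 63_360   # 5,280 ft x 12 in
CM_PER_KM = 100_000

def feet_per_map_inch(denominator: float) -> float:
    """Ground distance in feet shown by one inch on a 1:denominator map."""
    return denominator / INCHES_PER_FOOT

def miles_per_map_inch(denominator: float) -> float:
    """Ground distance in statute miles shown by one inch on the map."""
    return denominator / INCHES_PER_MILE

def km_per_map_cm(denominator: float) -> float:
    """Ground distance in kilometers shown by one centimeter on the map."""
    return denominator / CM_PER_KM

for d in (200, 24_000, 63_360, 7_500_000):
    print(f"1:{d:,}  1 in = {feet_per_map_inch(d):,.1f} ft "
          f"= {miles_per_map_inch(d):.3f} mi;  1 cm = {km_per_map_cm(d):g} km")
```

Because the representative fraction is unitless, the same denominator works for any pair of matching units, which is why it is the form best suited to measurement.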

5
Scale and GIS Systems
  • in theory, a GIS is scale independent since
    output can be produced at any scale
  • in practice, there is an implicit range of scales
    or maximum scale for anticipated output, since
    scale also affects other issues such as:
    • what features to show
      • manholes only on large scale maps
    • how features will be represented
      • manhole: a polygon at 1:50; cities: a point at
        1:1,000,000
    • decisions regarding accuracy and precision of
      data
      • the larger the scale, the more critical is
        accuracy

6
Precision or Resolution: it's not the same as
scale or accuracy!
  • Precision: the exactness of measurement or
    description
    • the size of the smallest feature which can be
      displayed, recognized, or described
    • Can apply to space, time (e.g. daily versus
      annual), or attribute (Douglas fir v. conifer)
  • for raster data, it is the size of the pixel
    (resolution)
    • e.g. for NTGISC digital orthos it is 1.6 ft (half
      meter)
    • raster data can be resampled by combining
      adjacent cells
      • this decreases resolution but saves storage
      • e.g. 1.6 ft to 3.2 ft (1/4 storage) to 6.4 ft
        (1/16 storage)
  • resolution and scale
    • generally, increasing to larger scale allows
      features to be observed better and requires
      higher resolution
    • but, because of the human eye's ability to
      recognize patterns, features in a lower
      resolution data set can sometimes be observed
      better by decreasing the scale (6.4 ft
      resolution shown at 1:400 rather than 1:200)
  • resolution and positional accuracy
    • you can see a feature (resolution), but it may
      not be in the right place (accuracy)
    • higher accuracy generally costs much more to
      obtain than higher resolution
    • accuracy cannot be greater (but may be much less)
      than resolution (e.g. if pixel size is one meter,
      then best accuracy possible is one meter)
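
The storage saving from resampling can be illustrated with a small pure-Python sketch. It combines each 2 x 2 block of cells by averaging, which is only one possible rule (an assumption here; the slides do not say how cells are combined), and the grid values are made up for illustration:

```python
def aggregate(grid, factor=2):
    """Combine factor x factor blocks of cells into one cell by averaging.

    Assumes a rectangular grid (list of equal-length rows) whose
    dimensions are exact multiples of `factor`.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [grid[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

def cell_count(grid):
    """Number of cells that have to be stored."""
    return sum(len(row) for row in grid)

# Hypothetical 4 x 4 raster, e.g. 1.6 ft cells
fine = [[1, 3, 5, 7],
        [1, 3, 5, 7],
        [2, 2, 8, 8],
        [2, 2, 8, 8]]
coarse = aggregate(fine)                       # 3.2 ft cells
print(coarse)                                  # [[2.0, 6.0], [2.0, 8.0]]
print(cell_count(coarse) / cell_count(fine))   # 0.25
```

Doubling the cell size halves the number of cells in each direction, so storage drops to 1/4; doubling again gives 1/16. The detail inside each block is lost and cannot be recovered from the coarser grid.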

7
Accuracy
  • Positional Accuracy (sometimes called
    Quantitative Accuracy)
    • Spatial
      • horizontal accuracy: distance from true location
      • vertical accuracy: difference from true height
    • Temporal
      • difference from actual time and/or date
  • Attribute (Thematic) Accuracy or Consistency
    (Qualitative Accuracy): the validity concept from
    experimental design/statistical inference
    • a feature is what the GIS/map purports it to be
    • a railroad is a railroad, and not a road
    • a soil sample agrees with the type mapped
  • Completeness: the reliability concept from
    experimental design/statistical inference
    • Are all instances of a feature the GIS/map claims
      to include, in fact, there?
    • Simply put, how much data is missing?
  • Logical Consistency
    • The presence of contradictory relationships in
      the database
    • Some crimes recorded at place of occurrence,
      others at place where report taken
    • Data for one country is for 2000, for another it's
      for 2001
    • Annual data series not taken on same day/month,
      etc. (sometimes called lineage error)
    • Data uses different source or estimation
      technique for different years (again, lineage)

8
Sources of Error
  • Sources
    • Inherent instability of the phenomenon itself
      • e.g. random variation of most phenomena (e.g.
        leaf size)
    • Measurement
      • e.g. surveyor or instrument error
    • Model used to represent data
      • e.g. choice of spheroid, or classification
        systems
    • Data encoding and entry
      • e.g. keying or digitizing errors
    • Data processing
      • e.g. single versus double precision, algorithms
        used
    • Propagation or cascading from one data set to
      another
      • e.g. using an inaccurate layer as source for another
        layer
  • Example for Positional Accuracy
    • choice of spheroid and datum
    • choice of map projection and its parameters
    • accuracy of measured locations (surveying) of
      features on earth
    • media stability (stretching, folding, wrinkling
      of maps, photos)
    • human drafting, digitizing or interpretation
      error
    • resolution and/or accuracy of drafting/digitizing
      equipment
    • registration accuracy of tics
    • machine precision: coordinate rounding error in
      storage and manipulation
    • other unknown
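
The single versus double precision issue can be made concrete with a short sketch. It round-trips a coordinate through 4-byte (single precision) storage using Python's standard `struct` module; the easting value is a made-up example, not taken from any real data set:

```python
import struct

def as_float32(value: float) -> float:
    """Round-trip a Python float (double precision) through 4-byte storage."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

x = 2_345_678.12           # hypothetical easting in feet
stored = as_float32(x)     # what a single precision field would hold
print(stored)              # 2345678.0
print(abs(stored - x))     # rounding error of roughly 0.12 ft
```

A single precision number carries about seven significant digits, so at coordinate magnitudes in the millions the stored values can only move in steps of 0.25. Double precision storage avoids this for typical projected coordinates.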

9
Measurement of Positional Accuracy
  • usually measured by root mean square error (RMSE): the
    square root of the average of the squared errors
  • Usually expressed as a probability that no more
    than P% of points will be further than S distance
    from their true location.
  • Loosely, we say that the RMSE tells us how far
    recorded points in the GIS are from their true
    location on the ground, on average.
  • More correctly, based on the normal distribution
    of errors, 68% of points will be RMSE distance or
    less from their true location, and 95% will be no
    more than twice this distance, provided the
    errors are random and not systematic (i.e. the
    mean of the errors is zero)
    • e.g. for NTGISC digital orthos RMSE is 3.2 feet
      (one meter)
    • for USGS Digital Ortho Quads the RMSE spec. is
      approx. 33 feet or 10 meters (but in reality
      much better)
  • with GPS, height is in practice 2 or 3 times less
    accurate than horizontal at high precision
    (officially the spec is 1.5, but data
    collection errors affect vertical the most)
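
The RMSE definition above can be written out in a few lines. This is a minimal sketch; the error values are invented for illustration and the helper names are not from the slides:

```python
import math

def rmse(errors):
    """Root mean square error: square root of the mean of the squared errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def share_within(errors, distance):
    """Fraction of errors whose absolute size is no more than `distance`."""
    return sum(abs(e) <= distance for e in errors) / len(errors)

# Hypothetical signed errors (recorded minus true position), in feet
errors = [0.5, -1.2, 0.8, -0.3, 2.1, -0.9, 0.1, -1.6]
r = rmse(errors)
print(f"RMSE = {r:.2f} ft")
print(f"share within 1 x RMSE: {share_within(errors, r):.0%}")
print(f"share within 2 x RMSE: {share_within(errors, 2 * r):.0%}")
```

With only a handful of points the shares will not match 68% and 95% exactly; those figures describe normally distributed errors with a mean of zero.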

10
USA National Map Accuracy Standards
  • established in 1941 by the US Bureau of the
    Budget (now OMB) for use with US Geological
    Survey maps (Maling, 1989, p. 146)
  • horizontal accuracy: not more than 10% of tested,
    well defined points shall be more than the
    following distances from their true location
    • 1:62,500: 1/50th of an inch (.02)
    • 1:24,000: 1/40th of an inch (amended to
      1/50 = .02 in 1947)
    • 1:12,000: 1/30th of an inch (.033)
  • Thus, on maps with a scale of 1:63,360 (1" = 1
    mile), 90% of points should be within 105.6 feet
    ((63,360 x .02)/12) of their true location.
  • on USGS quads with a scale of 1:24,000
    (1" = 2,000 ft), 90% of points should be within 40
    feet ((24,000 x .02)/12) of their true location.
  • on a map with a scale of 1:12,000 (1" = 1,000 ft),
    90% of points should be within 33 feet (1,000 x
    .033), approx. 10 meters
    • gives rise to the loose, but often used,
      statement that the NMAS is 10 meters
  • Inadequate for the computer age
    • how many points? how to select them?
    • how to determine their true location?
    • what about qualitative accuracy and completeness?
    • Unfortunately, the new standard doesn't
      address all these issues either
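
The ground tolerances quoted above come from multiplying the map tolerance by the scale denominator. A minimal sketch of that calculation (the function name is illustrative):

```python
def nmas_ground_tolerance_ft(scale_denominator: float,
                             map_tolerance_in: float) -> float:
    """Ground distance in feet for a tolerance measured in inches on the map."""
    return scale_denominator * map_tolerance_in / 12

# (scale denominator, map tolerance in inches) pairs taken from the slide
for denom, tol in ((63_360, 0.02), (24_000, 0.02), (12_000, 1 / 30)):
    ground = nmas_ground_tolerance_ft(denom, tol)
    print(f"1:{denom:,}  {tol:.3f} in on the map = {ground:.1f} ft on the ground")
```

Using the exact fraction 1/30 gives 33.3 ft at 1:12,000; the slide's figure of 33 ft comes from rounding the tolerance to .033 inch first.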

11
USA National Standard for Spatial Data Accuracy
  • Geospatial Positioning Accuracy Standard
    (FGDC-STD-007)
    • Part 3, National Standard for Spatial Data
      Accuracy (NSSDA), FGDC-STD-007.3-1998
  • specifies a statistic and testing methodology
    for positional (horizontal and vertical) accuracy
    of maps and digital data
  • replacement for National Map Accuracy Standard
    of 1941/47
  • no single threshold metric to achieve (as with
    old Standard), but users are encouraged to establish
    thresholds for specific applications
  • accuracy reported in ground units
  • testing method compares data set point coordinate
    values with coordinate values from a higher
    accuracy source for readily visible or
    recoverable ground points
  • although it uses points, principles apply to all
    geospatial data including point, vector and
    raster objects
  • other standards for data content will adopt NSSDA
    for particular spatial objects
  • copies of the standard available at
    http://www.fgdc.gov
  • Accuracy Standard has 7 parts, of which parts 4-7
    apply to specific data types
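
The testing method described above can be sketched as follows. The check points are invented, and the multipliers 1.7308 (horizontal, applicable when the x and y errors are of similar size) and 1.9600 (vertical) are the 95% confidence factors as I understand them from FGDC-STD-007.3-1998; treat them as an assumption to verify against the standard itself:

```python
import math

def radial_rmse(test_points, check_points):
    """Horizontal RMSE between data set points and higher accuracy check points."""
    squared = [(xt - xc) ** 2 + (yt - yc) ** 2
               for (xt, yt), (xc, yc) in zip(test_points, check_points)]
    return math.sqrt(sum(squared) / len(squared))

def nssda_horizontal(rmse_r: float) -> float:
    """Horizontal accuracy at 95% confidence (assumed factor 1.7308)."""
    return 1.7308 * rmse_r

def nssda_vertical(rmse_z: float) -> float:
    """Vertical accuracy at 95% confidence (assumed factor 1.9600)."""
    return 1.9600 * rmse_z

# Hypothetical coordinates in ground units (meters)
data_set = [(3.0, 4.0), (13.0, 14.0)]    # positions recorded in the GIS
reference = [(0.0, 0.0), (10.0, 10.0)]   # higher accuracy source
r = radial_rmse(data_set, reference)
print(f"RMSE_r = {r:.2f} m, reported accuracy = {nssda_horizontal(r):.2f} m")
```

A real test would use many more points than this (the standard calls for a minimum number of well distributed check points), and the result is reported in ground units rather than as a pass or fail.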

12
Currency: Is my data up-to-date?
  • data is always relative to a specific point in
    time, which must be documented.
  • there are important applications for historical
    data (e.g. analyzing trends), so don't
    necessarily trash old data
  • current data requires a specific plan for
    on-going maintenance
    • may be continuous, or at pre-defined points in
      time.
    • otherwise, data becomes outdated very quickly
  • currency is not really an independent quality
    dimension; it is simply a factor contributing to
    lack of accuracy regarding
    • consistency: some GIS features do not match
      those in the real world today
    • completeness: some real world features are
      missing from the GIS database

Many organizations spend substantial amounts
acquiring a data set without giving any thought
to how it will be maintained.
13
Summary: Resolution, Scale, Accuracy & Storage
14
Summary: Minimum Documentation Requirements
  • geodetic datum name (e.g. NAD27), which implies
    • ellipsoid/spheroid name (earth model), e.g. Clarke
      1866
    • point of origin (ties ellipsoid to earth), e.g.
      Meades Ranch
    • required for all GIS databases and maps
  • projection name and its parameters and its
    measurement units
    • Required for all maps since 2-D by nature
    • Required for GIS if data is in X-Y projected
      form
  • Source information
    • accuracy standard(s) to which built
    • author/publisher/creator name and/or data source
    • date(s) of data collection/update, and of map/GIS
      creation
  • Cartographers demand that each map document the above,
    plus:
    • north arrow, map scale
    • graticule indication
      • at least four latitude/longitude tic marks, with
        values in degrees
      • at least four X-Y tic marks, with values and
        units of measurement (feet, meters, etc.)

If GIS data is in lat/long, must know datum. If GIS
data is in XY, must know datum and projection info.
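
The closing rule lends itself to a simple automated check. The sketch below uses a hypothetical record layout (the class and field names are illustrative only, not a metadata standard) and reports which required items are missing:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SpatialReference:
    coordinate_type: str               # "latlong" or "xy"
    datum: Optional[str] = None        # e.g. "NAD27"
    projection: Optional[str] = None   # projection name plus parameters
    units: Optional[str] = None        # e.g. "feet", "meters"

def missing_items(ref: SpatialReference) -> List[str]:
    """List the documentation items still needed to use the coordinates."""
    missing = []
    if not ref.datum:
        missing.append("datum")            # needed for lat/long and XY alike
    if ref.coordinate_type == "xy":
        if not ref.projection:
            missing.append("projection")   # XY data also needs projection info
        if not ref.units:
            missing.append("units")
    return missing

print(missing_items(SpatialReference("latlong", datum="NAD27")))  # []
print(missing_items(SpatialReference("xy", datum="NAD27")))       # ['projection', 'units']
```

Source, date and accuracy information would be checked in the same way; they are left out here to keep the sketch short.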