Uncertainty, error, and quality control - PowerPoint PPT Presentation

Provided by: hongj


1
Uncertainty, error, and quality control
  • Lecture 13
  • Feb 25, 2004

2
GIS is not perfect
  • A GIS cannot perfectly represent the world for
    many reasons, including:
  • The world is too complex and detailed.
  • The data structures or models (raster, vector, or
    TIN) used by a GIS to represent the world are not
    discriminating or flexible enough.
  • We make decisions (how to categorize data, how to
    define zones) that are not always fully informed
    or justified.
  • It is impossible to make a perfect representation
    of the world, so uncertainty is inevitable.
  • Uncertainty degrades the quality of a spatial
    representation.

3
A Conceptual View of Uncertainty
[Figure: Real World → Conception → Measurement and Representation → Data conversion and Analysis → Result, with error propagating through each step]
4
1. Uncertainty in the conception of geographic phenomena
  • Many spatial objects are not well defined, or
    their definition is to some extent arbitrary, so
    that people can reasonably disagree about whether
    a particular object is x or not. There are at
    least four types of conceptual uncertainty:
  • Spatial uncertainty
  • Vagueness
  • Ambiguity
  • Regionalization problems

5
  • Spatial uncertainty: Spatial uncertainty occurs
    when objects do not have a discrete, well-defined
    extent. They may have indistinct boundaries
    (where exactly does a wetland end?), they may
    have impacts that extend beyond their boundaries
    (should an oil spill be defined by the dispersion
    of pollutants or by the area of environmental
    damage?), or they may simply be statistical
    entities. The attributes ascribed to spatial
    objects may also be subjective; for example, the
    spatial distributions of poverty and biodiversity
    depend on human interpretations of what these
    things mean.
  • Vagueness (obscureness): Vagueness occurs when
    the criteria that define an object as x are not
    explicit or rigorous. For example, in a land cover
    analysis, how many oaks (or what proportion of
    oaks) must be found in a tract of land to qualify
    it as oak woodland? What incidence of crime (or
    resident criminals) defines a high-crime
    neighborhood?

6
  • Ambiguity: Ambiguity occurs when y is used as a
    substitute, or indicator, for x because x is not
    available. The link between direct indicators and
    the phenomena for which they substitute is
    straightforward and fairly unambiguous: soil
    nutrient levels (y) are a direct indicator of
    crop yields (x). Indirect indicators tend to be
    more ambiguous and opaque: wetlands (y) are an
    indirect indicator of animal species diversity
    (x). Of course, indicators are not simply direct
    or indirect; they occupy a continuum. The more
    indirect they are, the greater the ambiguity and
    the less certain it is that an object being
    approximated using y really is x.
  • Regionalization problems: Regional geography is
    largely founded on the creation of a mosaic of
    zones that make it easy to portray spatial data
    distributions. A uniform zone is defined by the
    extent of a common characteristic, such as
    climate, landform, or soil type. Functional zones
    are areas that delimit the extent of influence of
    a facility or feature, for example, how far people
    travel to a shopping center or the geographic
    extent of support for a football team.
    Regionalization problems occur because zones are
    artificial. In the development of climate zones,
    for instance, experts may disagree on what
    combination of characteristics defines a zone,
    how these characteristics should be weighted to
    create a composite indicator, and what the
    minimum size threshold for a zone is. This should
    not be surprising: spatial distributions tend to
    change gradually, while zones imply that there
    are sharp boundaries between them.

7
2.1 Uncertainty in the measurement of geographic phenomena
  • Error occurs in the physical measurement of
    objects. This error creates further uncertainty
    about the true nature of spatial objects.
  • Physical measurement error
  • Digitizing error
  • Error caused by combining data sets with
    different lineages

8
  • Physical measurement error: Instruments and
    procedures used to make physical measurements are
    not perfectly accurate. For example, a survey of
    Mount Everest might find its height to be 8,850
    meters, with an accuracy of plus or minus 5
    meters. In addition, the earth is not a
    perfectly stable platform from which to make
    measurements. Seismic motion, continental drift,
    and the wobbling of the earth's axis cause
    physical measurements to be inexact. GPS
    positioning error and remote sensing error are
    further examples.
  • Digitizing error: A great deal of spatial data has
    been digitized from paper maps. Digitizing, or
    the electronic tracing of paper maps, is prone to
    human error. Lines may be drawn too far, not far
    enough, or missed entirely. Errors caused by
    digitizing mistakes can be partially, but not
    completely, fixed by software. Additional error
    occurs because adjacent data digitized from
    different maps may not align correctly. This
    problem can also be partially corrected through a
    software technique called rubbersheeting.
  • Error caused by combining data sets with
    different lineages: Data sets produced by
    different agencies or vendors may not match
    because different processes were used to capture
    or automate the data. For example, buildings in
    one data set may appear on the opposite side of
    the street in another data set. Error may also be
    caused by combining sample and population data or
    by using sample estimates that are not robust at
    fine scales. "Lifestyle" data are derived from
    shopping surveys and provide business and service
    planners with up-to-date socioeconomic data not
    found in traditional data sources like the
    census. Yet the methods by which lifestyle data
    are gathered and aggregated to zones or are
    compared to census data may not be scientifically
    rigorous.

9
Accuracy and Precision
  • Precision: a measure of repeatability (how
    closely repeated measurements agree with one
    another)
  • Accuracy: a measure of how close a measurement
    is to reality, i.e., the difference between
    reality and our measurement

[Figure: shooting a target; four numbered panels (1 to 4) contrasting accuracy and precision]
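A small sketch of the distinction, assuming the 8,850 m Everest figure from the previous slide is the reference value and using two sets of invented repeated readings: precision is summarized as the spread of the readings around their own mean, accuracy as the bias and root-mean-square error relative to the reference.

```python
import math
import statistics


def precision(readings):
    """Spread of repeated readings around their own mean (sample standard deviation)."""
    return statistics.stdev(readings)


def bias(readings, true_value):
    """Systematic offset of the mean reading from the reference value."""
    return statistics.mean(readings) - true_value


def rmse(readings, true_value):
    """Root-mean-square error against the reference value (combines bias and spread)."""
    return math.sqrt(sum((r - true_value) ** 2 for r in readings) / len(readings))


TRUE_HEIGHT = 8850.0  # reference value in meters, treated as exact for this illustration

# Hypothetical repeated surveys, invented for illustration only.
tight_but_offset = [8861.0, 8862.0, 8860.0, 8861.0, 8861.0]     # precise, not accurate
centered_but_spread = [8840.0, 8860.0, 8845.0, 8855.0, 8850.0]  # accurate on average, not precise

for name, data in [("tight_but_offset", tight_but_offset),
                   ("centered_but_spread", centered_but_spread)]:
    print(f"{name}: precision={precision(data):.2f} m, "
          f"bias={bias(data, TRUE_HEIGHT):+.2f} m, rmse={rmse(data, TRUE_HEIGHT):.2f} m")
```

The first set clusters tightly but sits about 11 m too high; the second is centered on the reference but scattered, which is the situation the target panels depict.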
10
Digitizing Error
  • Any digitized map requires considerable
    post-processing:
  • Check for missing features
  • Connect lines
  • Remove spurious polygons
  • Some of these steps can be automated
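The "connect lines" step is commonly automated by snapping: line endpoints that fall within a tolerance of an existing node are moved onto it. A minimal sketch, assuming lines are lists of (x, y) tuples and a hypothetical tolerance in map units; real GIS software uses spatial indexes and richer topology rules than this O(n²) loop.

```python
import math

Point = tuple[float, float]


def snap_endpoints(lines: list[list[Point]], tolerance: float) -> list[list[Point]]:
    """Move each line endpoint onto the first previously recorded endpoint within `tolerance`.

    Only the first and last vertices are snapped; interior vertices are left alone.
    Endpoints with no nearby node are recorded as new nodes.
    """
    nodes: list[Point] = []  # endpoints accepted so far
    snapped = []
    for line in lines:
        new_line = list(line)
        for idx in (0, -1):
            p = new_line[idx]
            match = next((n for n in nodes if math.dist(p, n) <= tolerance), None)
            if match is None:
                nodes.append(p)
            else:
                new_line[idx] = match
        snapped.append(new_line)
    return snapped


# Two digitized lines meant to meet at (10, 10); the second starts slightly off.
lines = [[(0.0, 0.0), (10.0, 10.0)], [(10.3, 9.8), (20.0, 10.0)]]
print(snap_endpoints(lines, tolerance=0.5))
```

With a 0.5-unit tolerance the second line's start (about 0.36 units away) is pulled onto (10.0, 10.0); with a much smaller tolerance the gap would remain, which is why the tolerance itself is a source of uncertainty.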
11
2.2 Uncertainty in the representation of geographic phenomena
  • Representation is closely related to
    measurement. Representation is not just an input
    to analysis but sometimes also its outcome. For
    this reason, we consider representation
    separately from measurement. The world is
    infinitely complex, but computer systems are
    finite; representation is all about the choices
    that are made in capturing knowledge about the
    world.
  • Uncertainty in the earth model: ellipsoid models,
    datums, projection types
  • Uncertainty in the raster data model (structure)
  • Uncertainty in the vector data model (structure)

12
  • Uncertainty in the raster data structure: The
    raster structure partitions space into square
    cells of equal size (also called pixels). Spatial
    objects x, y, and z emerge from cell
    classification, in which Cell A1 is classified as
    x, Cell A2 as y, Cell A3 as z, and so on, until
    all cells are evaluated. A spatial object x can
    be defined as a set of contiguous cells
    classified as x. Commonly, a cell is not purely
    one thing or another, but might contain some x,
    some y, and maybe a bit of z within its area.
    These impure cells are termed "mixels." Because a
    cell can hold only one value, a mixel must be
    classified as if it were all one thing or
    another. Therefore, the raster structure may
    distort the shape of spatial objects.
  • Uncertainty in the vector data structure:
    Socioeconomic data (facts about people, houses,
    and households) are often best represented as
    points. For various reasons (to protect privacy,
    to limit data volume), data are usually
    aggregated and reported at a zonal level, such as
    census tracts or ZIP Codes. This distorts the
    data in two ways: first, it gives them a
    spatially inappropriate representation (polygons
    instead of points); second, it forces the data
    into zones whose boundaries may not respect
    natural distribution patterns.

13
Error in raster
  • Because of the distortions introduced by
    flattening the Earth, cells in a raster can never
    all be exactly equal in size on the Earth's
    surface.
  • When information is represented in raster form,
    all detail about variation within cells is lost,
    and instead each cell is given a single value,
    commonly by one of three rules: largest share,
    central point (e.g., USGS DEM), or mean value
    (e.g., remote sensing imagery).
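As an illustration of the first point, a raster stored in latitude and longitude (equivalently, a plate carrée projection) draws every cell as the same square, yet the ground area of those cells shrinks toward the poles. The sketch assumes a spherical Earth with a mean radius of 6,371,008.8 m (an ellipsoid would shift the numbers slightly) and uses the spherical formula A = R² · Δλ · (sin φ_north − sin φ_south).

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean radius; spherical-Earth assumption


def cell_area_km2(lat_south_deg: float, lat_north_deg: float, lon_width_deg: float) -> float:
    """Ground area of a latitude/longitude cell on a sphere, in square kilometers."""
    dlon = math.radians(lon_width_deg)
    band = math.sin(math.radians(lat_north_deg)) - math.sin(math.radians(lat_south_deg))
    return EARTH_RADIUS_M ** 2 * dlon * band / 1e6  # m^2 -> km^2


for lat in (0, 30, 60, 80):
    print(f"1 x 1 degree cell at {lat}-{lat + 1} N: {cell_area_km2(lat, lat + 1, 1.0):,.0f} km^2")
```

A 1° × 1° cell between 60° N and 61° N covers roughly half the ground area of one at the equator, even though both occupy one cell in the grid.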

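[Figure: cell values assigned by the largest-share, mean-value, and central-point rules]
Mean value (area-weighted) for three cells covered by values 8 and 6 in different proportions:
$$8 \times \tfrac{1}{6} + 6 \times \tfrac{5}{6} = 6.33, \qquad 8 \times \tfrac{3}{4} + 6 \times \tfrac{1}{4} = 7.5, \qquad 8 \times \tfrac{1}{7} + 6 \times \tfrac{6}{7} = 6.29$$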
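Two of the three rules can be sketched for a single cell described as (value, area fraction) pairs, an input format chosen here for illustration. The mean-value rule reproduces the area-weighted averages above; the central-point rule is left out because it needs the cell's geometry to know which value lies under the center.

```python
def largest_share(parts: list[tuple[float, float]]) -> float:
    """Value of the sub-area covering the largest fraction of the cell."""
    return max(parts, key=lambda part: part[1])[0]


def mean_value(parts: list[tuple[float, float]]) -> float:
    """Area-weighted mean of the sub-area values (fractions are assumed to sum to 1)."""
    return sum(value * fraction for value, fraction in parts)


# The three example cells: values 8 and 6 in different proportions.
cells = [
    [(8, 1 / 6), (6, 5 / 6)],
    [(8, 3 / 4), (6, 1 / 4)],
    [(8, 1 / 7), (6, 6 / 7)],
]
for parts in cells:
    print(f"largest share = {largest_share(parts)}, mean value = {mean_value(parts):.2f}")
```

The same three cells come out as 6, 8, 6 under largest share but 6.33, 7.5, 6.29 under mean value, so the choice of rule is itself a source of representation error.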
14
Map representation error
15
3. Uncertainty in the data conversion and analysis of geographic phenomena
  • Uncertainties in data lead to uncertainties in
    the results of analysis. Data conversion and
    spatial analysis methods can create further
    uncertainty.
  • Data conversion error
  • Georeferencing and resampling (nearest neighbor,
    bilinear, cubic convolution)
  • Projection and datum conversions
  • The ecological fallacy
  • The modifiable areal unit problem (MAUP)
  • Classification errors
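To make two of the resampling options concrete, here is a minimal sketch of nearest-neighbor and bilinear interpolation at a fractional (row, column) position; cubic convolution is omitted for brevity. The 2 × 2 grid, the convention that values sit at cell centers with integer indices, and the function names are assumptions for illustration.

```python
import math


def nearest(grid: list[list[float]], row: float, col: float) -> float:
    """Nearest neighbor: take the value of the closest cell center (rounding half up)."""
    r = min(max(math.floor(row + 0.5), 0), len(grid) - 1)
    c = min(max(math.floor(col + 0.5), 0), len(grid[0]) - 1)
    return grid[r][c]


def bilinear(grid: list[list[float]], row: float, col: float) -> float:
    """Bilinear: distance-weighted blend of the four surrounding cell centers.

    Assumes a grid of at least 2 x 2 and a position inside it.
    """
    r0 = min(max(math.floor(row), 0), len(grid) - 2)
    c0 = min(max(math.floor(col), 0), len(grid[0]) - 2)
    dr, dc = row - r0, col - c0
    top = grid[r0][c0] * (1 - dc) + grid[r0][c0 + 1] * dc
    bottom = grid[r0 + 1][c0] * (1 - dc) + grid[r0 + 1][c0 + 1] * dc
    return top * (1 - dr) + bottom * dr


elevation = [[10.0, 20.0], [30.0, 40.0]]  # hypothetical grid of cell-center values
print(nearest(elevation, 0.4, 0.7))   # 20.0: the closest center is (0, 1)
print(bilinear(elevation, 0.4, 0.7))  # ~25.0: a blended value found in no input cell
```

Nearest neighbor keeps original values (useful for categorical data) but shifts their positions; bilinear produces smoother surfaces but invents intermediate values, so each choice introduces a different kind of conversion error.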

16
  • The ecological fallacy: The ecological fallacy is
    the mistake of assuming that an overall
    characteristic of a zone is also a characteristic
    of any location or individual within the zone.
  • The Modifiable Areal Unit Problem (MAUP): The
    results of data analysis are influenced by the
    number and sizes of the zones used to organize
    the data. The MAUP has at least three aspects:
  • The number, sizes, and shapes of zones affect the
    results of analysis.
  • The number of ways in which fine-scale zones can
    be aggregated into larger units is often great.
  • There are usually no objective criteria for
    choosing one zoning scheme over another.
  • - An example of the influence of the number of
    zones on analysis is the 1950 study by Yule and
    Kendall, which found that the correlation between
    wheat and potato yields in England changed from
    low to high as the data were grouped into fewer
    and fewer zones (starting with 48 and ending with
    2).
  • - An example of the influence of zone shape is
    gerrymandering, in which voting district
    boundaries are manipulated in order to engineer a
    desired election outcome.

17
(No Transcript)
18
Zone shape change
19
Classification error and quality check
20
Selecting ROIs (regions of interest): Alfalfa, Cotton, Grass, Fallow
21
[Figure: background ETM image, 7/15/01; top IKONOS image, Oct 2000; classification result]
22
Confusion Matrix
[Table: confusion matrix of classification results versus ground truth]
23
Basics of the Confusion Matrix
  • Producer's accuracy is the probability that the
    classifier has labeled an image pixel as Class A,
    given that the ground truth is Class A.
  • User's accuracy is the probability that a pixel
    is Class A, given that the classifier has labeled
    the pixel as Class A.
  • Overall accuracy is the total classification
    accuracy: the proportion of all pixels that are
    correctly classified.
  • The kappa coefficient is another overall measure;
    it is often considered more useful because it
    adjusts overall accuracy for the agreement
    expected by chance.
  • Errors of commission are pixels that belong to
    another class but are labeled as belonging to
    the class of interest.
  • Errors of omission are pixels that belong to the
    ground-truth class but that the classifier has
    failed to assign to that class.
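The measures above can all be computed from one matrix. A minimal sketch, assuming rows hold the classification result and columns the ground truth, with pixel counts for the four ROI classes invented for illustration; kappa here is Cohen's kappa, with chance agreement estimated from the row and column totals.

```python
CLASSES = ["Alfalfa", "Cotton", "Grass", "Fallow"]

# Hypothetical pixel counts, invented for illustration.
# Rows = classification result, columns = ground truth.
MATRIX = [
    [50, 3, 2, 0],
    [4, 45, 1, 0],
    [6, 2, 40, 2],
    [0, 0, 7, 38],
]


def accuracy_report(matrix):
    """Return per-class producer's and user's accuracy, overall accuracy, and kappa."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]                              # pixels labeled as each class
    col_sums = [sum(matrix[r][c] for r in range(k)) for c in range(k)]  # ground-truth pixels per class
    diag = [matrix[i][i] for i in range(k)]

    producer = [diag[i] / col_sums[i] for i in range(k)]  # P(labeled A | truth is A)
    user = [diag[i] / row_sums[i] for i in range(k)]      # P(truth is A | labeled A)
    overall = sum(diag) / total
    expected = sum(row_sums[i] * col_sums[i] for i in range(k)) / total ** 2  # chance agreement
    kappa = (overall - expected) / (1 - expected)
    return producer, user, overall, kappa


producer, user, overall, kappa = accuracy_report(MATRIX)
for name, pa, ua in zip(CLASSES, producer, user):
    # Omission error = 1 - producer's accuracy; commission error = 1 - user's accuracy.
    print(f"{name:8s} producer={pa:.3f} (omission {1 - pa:.3f})  "
          f"user={ua:.3f} (commission {1 - ua:.3f})")
print(f"overall={overall:.3f} kappa={kappa:.3f}")
```

With these counts overall accuracy is 0.865 but kappa is about 0.82, because some of the raw agreement would be expected by chance alone.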

24
4. Error Propagation
  • The errors in the input will propagate to the
    output of the operation.
  • Error propagation measures the impact of error
    (uncertainty) in data on the results of GIS
    operations.

[Figure: Real World → Conception → Measurement and Representation → Data conversion and Analysis → Result, with error propagating through each step]
25
Quantitative error propagation
The output uncertainty is a function of the input
errors (uncertainties), assuming that the errors in
the variables x1, x2, …, xn are independent and
random.
1. One variable
26
Quantitative error propagation (cont.)
2. Multiple variables: variables in additive or
subtractive relations
3. Multiple variables: variables in power-law
relations
27
4. Multiple variables: variables in multiplication
or division relations
5. Multiple variables: the errors of the variables
are correlated
If you are interested in learning more about error
propagation, please see this document:
http://www.uottawa.ca/academic/arts/geographie/lpcweb/web6102/download/error_prop.pdf
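For reference, the usual first-order (linearized) textbook formulas for the five cases listed above are given below, for an output u with standard error σ_u computed from inputs with standard errors σ_i. Cases 1 to 4 assume independent random errors; case 5 adds the covariance terms σ_ij = ρ_ij σ_i σ_j.

$$
\begin{aligned}
&\text{1. } u = f(x): && \sigma_u = \left|\frac{df}{dx}\right|\sigma_x \\
&\text{2. } u = x_1 \pm x_2 \pm \cdots \pm x_n: && \sigma_u = \sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2} \\
&\text{3. } u = c\,x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}: && \frac{\sigma_u}{|u|} = \sqrt{\sum_{i=1}^{n} \left(a_i \frac{\sigma_i}{x_i}\right)^2} \\
&\text{4. } u = x_1 x_2 \ \text{or}\ u = x_1 / x_2: && \frac{\sigma_u}{|u|} = \sqrt{\left(\frac{\sigma_1}{x_1}\right)^2 + \left(\frac{\sigma_2}{x_2}\right)^2} \\
&\text{5. correlated errors:} && \sigma_u^2 = \sum_{i=1}^{n} \left(\frac{\partial u}{\partial x_i}\right)^2 \sigma_i^2 + 2\sum_{i<j} \frac{\partial u}{\partial x_i}\frac{\partial u}{\partial x_j}\,\sigma_{ij}
\end{aligned}
$$

Case 4 is the special case of case 3 with exponents of ±1, and cases 1 to 4 all follow from case 5 by setting every σ_ij to zero.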
28
Two Examples
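  • Suppose maps at three scales, 1:30,000 (map 1),
    1:50,000 (map 2), and 1:250,000 (map 3), are used
    for a final map or analysis. If we assume this is
    an additive process, the map function can be
    written as u = map1 + map2 + map3, and with
    independent errors the positional errors combine
    as follows:
$$\sigma_1 = 7.5\ \text{m}, \quad \sigma_2 = 12.5\ \text{m}, \quad \sigma_3 = 62.5\ \text{m}, \qquad \sigma_u = \sqrt{\sigma_1^2 + \sigma_2^2 + \sigma_3^2} = \sqrt{7.5^2 + 12.5^2 + 62.5^2} \approx 64.2\ \text{m}$$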
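The arithmetic in this example can be reproduced in a few lines. The per-map errors of 7.5 m, 12.5 m, and 62.5 m are consistent with a positional error of 0.25 mm at map scale (for instance 7.5 m / 30,000 = 0.25 mm); that 0.25 mm figure is inferred from the numbers and is an assumption of this sketch. The `rho` parameter is an added illustration of how a common error correlation changes the result, from root-sum-of-squares (rho = 0) to a straight sum (rho = 1).

```python
import math

# Assumed positional error on the source maps: 0.25 mm at map scale.
# Inferred from the example's numbers (7.5 m / 30,000 = 0.25 mm), not stated there.
MAP_ERROR_M = 0.25e-3

SCALES = {"map1": 30_000, "map2": 50_000, "map3": 250_000}  # scale denominators


def ground_error(scale_denominator):
    """Convert a map-scale positional error into meters on the ground."""
    return MAP_ERROR_M * scale_denominator


def additive_error(sigmas, rho=0.0):
    """Standard error of u = x1 + x2 + ... with a common pairwise error correlation rho.

    rho = 0 gives the independent (root-sum-of-squares) case; rho = 1 makes errors add linearly.
    """
    variance = sum(s * s for s in sigmas)
    for i in range(len(sigmas)):
        for j in range(i + 1, len(sigmas)):
            variance += 2 * rho * sigmas[i] * sigmas[j]
    return math.sqrt(variance)


sigmas = [ground_error(d) for d in SCALES.values()]
for name, s in zip(SCALES, sigmas):
    print(f"{name}: {s:.1f} m")
print(f"independent errors:          {additive_error(sigmas):.1f} m")
print(f"perfectly correlated errors: {additive_error(sigmas, rho=1.0):.1f} m")
```

The combined error is dominated by the coarsest map: with independent errors it is about 64.2 m, barely more than map 3's 62.5 m on its own, while perfectly correlated errors would add up to 82.5 m.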
29
[Figure: correlated versus uncorrelated errors]
30
Living with uncertainty
  • Uncertainty is inevitable, and it is easier to
    find than to eliminate.
  • Use metadata to document the uncertainty.
  • Use sensitivity analysis to find the impacts of
    input uncertainty on output.
  • Rely on multiple sources of data.
  • Be honest and informative in reporting the
    results of GIS analysis.
  • The US Federal Geographic Data Committee (FGDC)
    lists five components of data quality: attribute
    accuracy, positional accuracy, logical
    consistency, completeness, and lineage (for
    details, see www.fgdc.gov).
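One common way to carry out a sensitivity analysis is Monte Carlo simulation: perturb the inputs according to their assumed error, rerun the operation many times, and look at the spread of the outputs. A minimal sketch, assuming a percent slope computed from two hypothetical DEM elevations 30 m apart, each with an independent Gaussian error of 2 m; for this linear case the simulated spread should agree with the first-order additive formula, which gives a useful check.

```python
import math
import random
import statistics


def slope_percent(z_upper, z_lower, run_m):
    """Percent slope between two elevation samples a horizontal distance run_m apart."""
    return 100.0 * (z_upper - z_lower) / run_m


def monte_carlo_slope(z_upper, z_lower, run_m, sigma_z, trials=20_000, seed=42):
    """Perturb both elevations with independent Gaussian error and collect the resulting slopes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        slope_percent(z_upper + rng.gauss(0, sigma_z), z_lower + rng.gauss(0, sigma_z), run_m)
        for _ in range(trials)
    ]


# Hypothetical inputs: two DEM cells 30 m apart, elevation error of 2 m (one standard deviation).
samples = monte_carlo_slope(z_upper=112.0, z_lower=100.0, run_m=30.0, sigma_z=2.0)
simulated_sd = statistics.stdev(samples)
# First-order prediction for the difference of two independent errors: sqrt(2) * sigma_z / run.
predicted_sd = 100.0 * math.sqrt(2) * 2.0 / 30.0

print(f"nominal slope: {slope_percent(112.0, 100.0, 30.0):.1f}%")
print(f"simulated sd: {simulated_sd:.2f} points, first-order prediction: {predicted_sd:.2f} points")
```

A nominal 40% slope carries a standard error of roughly 9.4 percentage points under these assumptions. The same loop works for operations with no closed-form error formula, which is where simulation is most useful.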

31
Main references
  • Paul A. Longley et al., 2001, Geographic
    Information Systems and Science, John Wiley &
    Sons.
  • ESRI, www.esri.com
  • http://www.uottawa.ca/academic/arts/geographie/lpcweb/web6102/download/error_prop.pdf
  • FGDC, www.fgdc.gov