Title: Image Classification: Accuracy Assessment
1Image Classification Accuracy Assessment
Reorganized by Jwan M Aldoski
Department of Civil Engineering, Faculty of Engineering, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor Darul Ehsan, Malaysia.
2Where in the World?
3Learning objectives
- Remote sensing science concepts
  - Rationale and technique for post-classification smoothing
  - Errors of omission vs. commission
  - Accuracy assessment
  - Sampling methods
  - Measures
  - Fuzzy accuracy assessment
- Math concepts
  - Calculating accuracy measures: overall accuracy, producer's accuracy, user's accuracy, and kappa coefficient
- Skills
  - Interpreting the contingency matrix and accuracy assessment measures
4Post-classification smoothing
- Most classifications have a problem with "salt and pepper", i.e., single or small groups of misclassified pixels, because classifiers are point operations that treat each pixel independently of its neighbors
- Salt and pepper may be real. The decision on whether to filter/eliminate depends on the choice of the minimum mapping unit: does it equal a single pixel or an aggregation of pixels?
- Majority filtering: replaces the central pixel with the majority class in a specified neighborhood (e.g., a 3 x 3 window). Con: alters edges
- Eliminate: clumps like pixels and replaces clumps under a size threshold with the majority class in the local neighborhood. Pro: doesn't alter edges
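The majority filtering bullet can be illustrated with a minimal Python sketch. It is not the ERDAS IMAGINE implementation: the function name `majority_filter`, leaving border pixels unchanged, and keeping the original class on ties are all assumptions made for illustration. The 5 x 5 grid is the one used in the example slides.

```python
from collections import Counter

def majority_filter(grid, size=3):
    """Replace each interior pixel with the most common class in its
    size x size window. Border pixels are left unchanged (an assumption
    of this sketch; real packages handle edges in different ways)."""
    half = size // 2
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            window = [grid[rr][cc]
                      for rr in range(r - half, r + half + 1)
                      for cc in range(c - half, c + half + 1)]
            counts = Counter(window).most_common()
            top_count = counts[0][1]
            tied = [cls for cls, n in counts if n == top_count]
            # On a tie, keep the original class (assumption of this sketch).
            out[r][c] = grid[r][c] if grid[r][c] in tied else counts[0][0]
    return out

grid = [
    [6, 6, 6, 6, 6],
    [2, 6, 6, 2, 6],
    [2, 6, 2, 6, 6],
    [2, 8, 2, 6, 6],
    [2, 2, 2, 2, 2],
]
smoothed = majority_filter(grid)
print(smoothed[2][2])  # center pixel: class 6 is the majority in its window
```

Because every pixel is recomputed from its neighbors, pixels along class boundaries can also flip, which is the "alters edges" drawback noted in the bullet.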
5Example: Majority filtering
Input:
6 6 6 6 6
2 6 6 2 6
2 6 2 6 6
2 8 2 6 6
2 2 2 2 2
3x3 window (centered on row 3, column 3):
6 6 2
6 2 6
8 2 6
Class 6 = majority in window
Example from ERDAS IMAGINE Field Guide, 5th ed.
6Example: reduce single pixel salt and pepper
Input:
6 6 6 6 6
2 6 6 2 6
2 6 2 6 6
2 8 2 6 6
2 2 2 2 2
After majority filtering:
6 6 6 6 6
2 6 6 6 6
2 2 6 6 6
2 2 2 6 6
2 2 2 2 2
7Example: altered edge
Input:
6 6 6 6 6
2 6 6 2 6
2 6 2 6 6
2 8 2 6 6
2 2 2 2 2
After majority filtering:
6 6 6 6 6
2 6 6 6 6
2 2 6 6 6
2 2 2 6 6
2 2 2 2 2
9Example: ERDAS Eliminate, no altered edge
Input:
6 6 6 6 6
2 6 6 2 6
2 6 2 6 6
2 8 2 6 6
2 2 2 2 2
After Eliminate:
6 6 6 6 6
2 6 6 2 6
2 6 2 6 6
2 2 2 6 6
2 2 2 2 2
Small clump eliminated
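The clump-and-eliminate idea can be sketched in Python as follows. This is an illustrative sieve, not the ERDAS Eliminate algorithm: 8-connected clumping, the `min_size` threshold, and filling with the most common bordering class are assumptions of this sketch. On the example grid, only the single-pixel clump of class 8 falls under the threshold and is replaced by class 2.

```python
from collections import Counter

# Offsets of the 8 neighbors of a pixel (8-connectivity is an assumption).
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def eliminate_small_clumps(grid, min_size=2):
    """Replace every clump smaller than min_size pixels with the most
    common class among the pixels bordering that clump."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0]:
                continue
            # Flood fill to collect the clump containing (r0, c0).
            cls = grid[r0][c0]
            clump, stack = [], [(r0, c0)]
            seen[r0][c0] = True
            while stack:
                r, c = stack.pop()
                clump.append((r, c))
                for dr, dc in NEIGHBORS:
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and not seen[rr][cc] and grid[rr][cc] == cls):
                        seen[rr][cc] = True
                        stack.append((rr, cc))
            if len(clump) >= min_size:
                continue
            members = set(clump)
            border = Counter(
                grid[r + dr][c + dc]
                for r, c in clump
                for dr, dc in NEIGHBORS
                if 0 <= r + dr < rows and 0 <= c + dc < cols
                and (r + dr, c + dc) not in members
            )
            if border:
                fill = border.most_common(1)[0][0]
                for r, c in clump:
                    out[r][c] = fill
    return out

grid = [
    [6, 6, 6, 6, 6],
    [2, 6, 6, 2, 6],
    [2, 6, 2, 6, 6],
    [2, 8, 2, 6, 6],
    [2, 2, 2, 2, 2],
]
sieved = eliminate_small_clumps(grid, min_size=2)
for row in sieved:
    print(*row)
```

Unlike the majority filter, pixels that belong to clumps at or above the threshold are never touched, so class edges are preserved.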
10Accuracy Assessment
- Always assess the accuracy of the final thematic map! How good is it?
- Various techniques assess the accuracy of the classified output by comparing the true identity of land cover derived from reference data (observed) vs. the classified (predicted) identity for a random sample of pixels
- The accuracy assessment is the means of communicating the map's reliability to the user of the map and should be included in the metadata documentation
11Accuracy Assessment
- R.S. classification accuracy is usually assessed and communicated through a contingency table, sometimes referred to as a confusion matrix
- Contingency table: m x m matrix, where m = # of land cover classes
- Columns usually represent the reference data
- Rows usually represent the remotely sensed classification results (i.e., thematic or information classes)
12Accuracy Assessment Contingency Matrix
13Accuracy Assessment
- Sampling approaches (to reduce analyst bias):
- Simple random sampling: every pixel has an equal chance of being selected
- Stratified random sampling: the # of points is stratified to the distribution of thematic layer classes (larger classes get more points)
- Equalized random sampling: each class has an equal number of random points
- Sample size: at least 30 samples per land cover class
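The three sampling approaches can be contrasted with a short Python sketch. The function name `sample_pixels`, its parameters, and the toy class map are hypothetical and chosen only for illustration; the map is assumed to be a small 2-D list of class labels.

```python
import random
from collections import defaultdict

def sample_pixels(class_map, n_total, method="simple", seed=0):
    """Pick (row, col) sample locations from a classified map.
    method: 'simple', 'stratified' (proportional to class area), or
    'equalized' (same number of points per class)."""
    rng = random.Random(seed)
    pixels = [(r, c) for r, row in enumerate(class_map) for c in range(len(row))]
    if method == "simple":
        return rng.sample(pixels, n_total)
    by_class = defaultdict(list)
    for r, c in pixels:
        by_class[class_map[r][c]].append((r, c))
    chosen = []
    for cls in sorted(by_class):
        members = by_class[cls]
        if method == "stratified":
            quota = round(n_total * len(members) / len(pixels))
        elif method == "equalized":
            quota = n_total // len(by_class)
        else:
            raise ValueError(f"unknown method: {method}")
        chosen.extend(rng.sample(members, min(quota, len(members))))
    return chosen

# Toy map (hypothetical): 80 pixels of class 1 and 20 pixels of class 2.
class_map = [[1] * 10 for _ in range(8)] + [[2] * 10 for _ in range(2)]

def count_by_class(points):
    counts = defaultdict(int)
    for r, c in points:
        counts[class_map[r][c]] += 1
    return dict(counts)

stratified_counts = count_by_class(sample_pixels(class_map, 20, "stratified"))
equalized_counts = count_by_class(sample_pixels(class_map, 20, "equalized"))
print(stratified_counts, equalized_counts)
```

With proportional stratification a rare class can end up with far fewer than the 30 samples recommended above, which is one reason equalized sampling is sometimes preferred.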
14How good is good?
- How accurate should the classified map be?
- General rule of thumb is 85% accuracy
- It really depends on how much risk you are willing to accept if the map is wrong
- Are you more interested in the overall accuracy of the final map or in quantifying the ability to accurately identify and map individual classes?
- Which is more acceptable: overestimation or underestimation?
15How good is good? Example
- USGS-NPS National Vegetation Classification standard
- Horizontal positional locations meet National Map Accuracy Standards
- Thematic accuracy >80% per class
- Minimum Mapping Unit of 0.5 ha
- http://biology.usgs.gov/npsveg/aa/indexdoc.html
16A whole set of field reference points can be developed using some sort of random allocation, but due to travel/access constraints only a subset of points is actually visited, resulting in a distribution that is not truly random.
17Accuracy Assessment Issues
- What constitutes reference data?
- higher spatial resolution imagery (with visual interpretation)
- "ground truth": GPSed field plots
- existing GIS maps
- Reference data can be polygons or points
18Accuracy Assessment Issues
- Problem with mixed pixels: one possibility is to sample only homogeneous regions (e.g., a 3x3 window), but this introduces a subtle bias
- If smoothing was undertaken, then accuracy should be assessed on that basis, i.e., at the scale of the minimum mapping unit (MMU)
- If a filter is used, it should be stated in the metadata
- Ideally, the % of the overall map that so qualifies should be quantified, e.g., 75% of the map is composed of homogeneous regions greater than 3x3 in size, thus 75% of the map is assessed and 25% is not
19Errors of Omission vs. Commission
- Error of Omission: pixels in class 1 erroneously assigned to class 2. From the class 1 perspective, these pixels should have been classified as class 1 but were omitted
- Error of Commission: pixels in class 2 erroneously assigned to class 1. From the class 1 perspective, these pixels should not have been classified as class 1 but were included
20Errors of Omission vs. Commission from a Class 2 perspective
- Omission error: pixels in Class 2 erroneously assigned to Class 1
- Commission error: pixels in Class 1 erroneously assigned to Class 2
[Figure: histograms of Class 1 and Class 2; x-axis: Digital Number; y-axis: # of pixels]
21Accuracy Assessment Measures
- Overall accuracy: divide the total correct (sum of the major diagonal) by the total number of sampled pixels. Can be misleading; individual categories should also be judged
- Producer's accuracy: measure of omission error. Total number correct in a category divided by the total in that category as derived from the reference data. Measure of underestimation
- User's accuracy: measure of commission error. Total number correct in a category divided by the total that were classified in that category. Measure of overestimation
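The three measures can be computed directly from a contingency matrix. The sketch below is a minimal Python illustration, assuming rows hold the classified map and columns hold the reference data; the function name `accuracy_measures` is hypothetical. The input is the 245-point seagrass matrix from the case study later in this deck.

```python
def accuracy_measures(matrix):
    """matrix[i][j] = number of samples mapped as class i (rows) whose
    reference class is j (columns). Returns overall accuracy plus
    per-class producer's and user's accuracy as fractions."""
    n = sum(sum(row) for row in matrix)
    m = len(matrix)
    diagonal = [matrix[i][i] for i in range(m)]
    row_totals = [sum(row) for row in matrix]        # totals as classified
    col_totals = [sum(col) for col in zip(*matrix)]  # totals in reference data
    overall = sum(diagonal) / n
    producers = [diagonal[i] / col_totals[i] for i in range(m)]  # omission side
    users = [diagonal[i] / row_totals[i] for i in range(m)]      # commission side
    return overall, producers, users

# Rows: map (absent, present); columns: reference (absent, present).
seagrass = [[67, 32],
            [10, 136]]
overall, producers, users = accuracy_measures(seagrass)
print(round(overall * 100), [round(p * 100) for p in producers],
      [round(u * 100) for u in users])
```

The low user's accuracy for the first class shows why overall accuracy alone can mislead: a third of the points mapped as "absent" actually had seagrass.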
22Accuracy Assessment Contingency Matrix
[Figure: contingency matrix, labeled "Reference Data"]
23Accuracy Assessment Measures
24Accuracy Assessment Measures
25Accuracy Assessment Measures
26Accuracy Assessment Measures
- Kappa coefficient: measures the difference between the observed agreement of two maps and the agreement that is contributed by chance alone
- A Kappa coefficient of 90% may be interpreted as a classification 90% better than would be expected by random assignment of classes
- What's a good Kappa? General range: K < 0.4: poor; 0.4 < K < 0.75: good; K > 0.75: excellent
- Allows for statistical comparisons between matrices (Z statistic); useful in comparing different classification approaches to objectively decide which gives the best results
- Alternative statistic: Tau coefficient
27Kappa coefficient
$$\hat{K} = \frac{n \sum_{i} x_{ii} - \sum_{i} (x_{i+} \, x_{+i})}{n^{2} - \sum_{i} (x_{i+} \, x_{+i})}$$
where
- $\sum_{i}$ = sum across all rows in the matrix
- $x_{ii}$ = diagonal
- $x_{i+}$ = marginal row total (row i)
- $x_{+i}$ = marginal column total (column i)
- $n$ = # of observations
Takes into account the off-diagonal elements of the contingency matrix (errors of omission and commission)
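28Kappa coefficient: Example
$$\sum_{i} x_{ii} = 308 + 279 + 372 + 26 + 10 + 93 + 176 + 48 = 1312$$
$$\sum_{i} (x_{i+} \, x_{+i}) = (348 \times 315) + (295 \times 305) + (379 \times 408) + (27 \times 29) + (18 \times 13) + (99 \times 97) + (194 \times 189) + (51 \times 55) = 404{,}318$$
$$\hat{K} = \frac{1411 \,(1312) - 404{,}318}{(1411)^{2} - 404{,}318} = \frac{1{,}851{,}232 - 404{,}318}{1{,}990{,}921 - 404{,}318} = \frac{1{,}446{,}914}{1{,}586{,}603} = 0.912$$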
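The worked example can be checked with a few lines of Python. This is a minimal sketch of the Khat formula from the previous slide; the function names are hypothetical, and the inputs are the diagonal and marginal totals listed in the example.

```python
def kappa_from_totals(diagonal, row_totals, col_totals):
    """Khat from the matrix diagonal and the marginal row/column totals."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must sum to the same n")
    chance = sum(r * c for r, c in zip(row_totals, col_totals))
    return (n * sum(diagonal) - chance) / (n * n - chance)

def kappa(matrix):
    """Khat from a full contingency matrix (rows: map, columns: reference)."""
    diagonal = [matrix[i][i] for i in range(len(matrix))]
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    return kappa_from_totals(diagonal, row_totals, col_totals)

diagonal = [308, 279, 372, 26, 10, 93, 176, 48]
row_totals = [348, 295, 379, 27, 18, 99, 194, 51]
col_totals = [315, 305, 408, 29, 13, 97, 189, 55]
k_example = kappa_from_totals(diagonal, row_totals, col_totals)
print(round(k_example, 3))
```

When the full matrix is available, `kappa(matrix)` derives the same totals itself, so the two entry points give identical results.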
29Accuracy Assessment Measures
30Case Study: Multi-scale segmentation approach to mapping seagrass habitats using airborne digital camera imaging
- Richard G. Lathrop¹, Scott Haag¹,², and Paul Montesano¹
- ¹Center for Remote Sensing & Spatial Analysis
- Rutgers University
- New Brunswick, NJ 08901-8551
- ²Jacques Cousteau National Estuarine Research Reserve
- 130 Great Bay Blvd
- Tuckerton, NJ 08087
31Method > Field Surveys
- All transect endpoints and individual check points were first mapped onscreen in the GIS.
- Endpoints were then loaded into a GPS (+/- 3 meters) for navigation on the water.
- A total of 245 points were collected.
32Method > Field Surveys
- For each field reference point, the following data were collected:
- GPS location (UTM)
- Time
- Date
- SAV species presence/dominance: Zostera marina, Ruppia maritima, or macroalgae
- Depth (meters)
- % cover (10% intervals), determined by visual estimation
- Blade height of the 5 tallest seagrass blades
- Shoot density (# of shoots per 1/9 m² quadrat that was extracted and counted on the boat)
- Distribution (patchy/uniform)
- Substrate (mud/sand)
- Additional comments
33Results > Accuracy Assessment
                      Reference         Reference
GIS Map               Seagrass Absent   Seagrass Present   User's Accuracy
Seagrass Absent       67                32                 68%
Seagrass Present      10                136                93%
Producer's Accuracy   87%               81%                83% (overall)
- The resulting maps were compared with the 245 field reference points.
- All 245 reference points were used to support the interpretation in some fashion and so cannot truly be considered a completely independent validation
- The overall accuracy was 83% and the Kappa statistic was 56.5%, which can be considered a moderate degree of agreement between the two data sets.
34Results > Accuracy Assessment
                      Reference         Reference
GIS Map               Seagrass Absent   Seagrass Present   User's Accuracy
Seagrass Absent       14                3                  82%
Seagrass Present      9                 15                 62%
Producer's Accuracy   61%               83%                71% (overall)
- The resulting maps were also compared with an independent set of 41 bottom sampling points collected as part of a seagrass-sediment study conducted during the summer of 2003 (Smith and Friedman, 2004).
- The overall accuracy was 70.7% and the Kappa statistic was 43%, which can be considered a moderate degree of agreement between the two data sets.
35SAV Accuracy Assessment Issues
- Matching the spatial scale of field reference data with the scale of mapping
- Ensuring comparison of "apples to apples"
- Spatial accuracy of ground truth point locations
- Temporal coincidence of ground truth and image
36Fuzzy Accuracy Assessment
- The real world is messy: natural vegetation communities are a continuum of states, often with one grading into the next
- R.S. classified maps generally break up land cover/vegetation into discrete either/or classes
- How to quantify this messy world? R.S. classified maps still have some error while still having great utility
- Fuzzy Accuracy Assessment doesn't quantify errors as binary correct or incorrect but attempts to evaluate the severity of the error
37Fuzzy Accuracy Assessment
- Fuzzy rating: severity of error, or conversely the similarity between map classes, is defined from a user standpoint
- Fuzzy rating can be developed quantitatively based on the deviation from a defined class based on a % difference (i.e., within +/- so many %)
- Fuzzy set matrix: the fuzzy rating between each map class and every other class is developed into a fuzzy set matrix
For more info, see Gopal & Woodcock, 1994.
38Fuzzy Accuracy Assessment
Level   Description
5       Absolutely right: exact match
4       Good: minor differences; species dominance or composition is very similar
3       Acceptable error: mapped class does not match; types have structural or ecological similarity or similar species
2       Understandable but wrong: general similarity in structure, but species/ecological conditions are not similar
1       Absolutely wrong: no conditions or structural similarity
39Fuzzy Accuracy Assessment
- Each user could redefine the fuzzy set matrix on an application-by-application basis to determine what percentage of each map class is acceptable and the magnitude of the errors within each map class
- Traditional map accuracy measures can be calculated at different levels of error
- Exact: only level 5 (MAX)
- Acceptable: levels 5, 4, 3 (RIGHT)
- Example from USFS
Label   Sites   MAX (5 only)   RIGHT (3, 4, 5)
CON     88      71 (81%)       82 (93%)
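The MAX and RIGHT measures reduce to counting sites at or above a rating threshold. The Python sketch below assumes each site carries one fuzzy rating (1 to 5) for its map label. The individual ratings in `con_ratings` are hypothetical: only the totals (88 sites, 71 at level 5, 82 at level 3 or better) come from the USFS example, and the split among levels 4, 3, 2, and 1 is invented for illustration.

```python
def fuzzy_accuracy(ratings):
    """ratings: fuzzy rating (1-5) of the map label at each reference site.
    MAX counts exact matches (level 5); RIGHT counts levels 3, 4, and 5."""
    n = len(ratings)
    max_count = sum(1 for r in ratings if r == 5)
    right_count = sum(1 for r in ratings if r >= 3)
    return {
        "sites": n,
        "MAX": max_count,
        "MAX_pct": 100 * max_count / n,
        "RIGHT": right_count,
        "RIGHT_pct": 100 * right_count / n,
    }

# Hypothetical per-site ratings that reproduce the CON totals above.
con_ratings = [5] * 71 + [4] * 6 + [3] * 5 + [2] * 4 + [1] * 2
con = fuzzy_accuracy(con_ratings)
print(con["MAX"], round(con["MAX_pct"]), con["RIGHT"], round(con["RIGHT_pct"]))
```

Moving the RIGHT threshold (for example, accepting only levels 4 and 5) is how a user could tailor the measure to a specific application.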
40Fuzzy Accuracy Assessment: example from USFS
Confusion Matrix based on Levels 3, 4, 5 as Correct
Label   Sites   CON   MIX   HDW   SHB   HEB   NFO   Total
CON     88      X     0     1     5     0     0     6
MIX     14      2     X     1     1     0     0     4
HDW     6       1     1     X     0     0     0     2
SHB     8       1     0     0     X     0     0     1
HEB     1       0     0     0     1     X     0     1
NFO     4       3     0     0     3     0     X     6
Total   121     7     1     2     10    0     0     20
41Fuzzy Accuracy Assessment
- Ability to evaluate the magnitude or seriousness of errors
- Difference Table: error within each map class based on its magnitude, with error magnitude calculated by measuring the difference between the fuzzy rating of each ground reference point and the highest rank assigned to all other possible map classes
- All points that are Exact matches have Difference values ≥ 0; all mismatches are negative. Values of -1 to 4 generally correspond to correct map labels. Values of -2 to -4 correspond to map errors, with -4 representing a more serious error than -1
42Fuzzy Accuracy Assessment: Difference Table example from USFS
                Mismatches            Matches
Label   Sites   -4   -3   -2   -1     0   1   2    3    4
CON     88      4    2    0    11     3   0   12   23   33
Higher positive values indicate that pure conditions are well mapped, while lower negative values show pure conditions to be poorly mapped. Mixed or transitional conditions, where a greater number of class types are likely to be considered acceptable, will fall more in the middle
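A Difference value can be computed per site and the table row summarized as below. This Python sketch assumes the Difference is the map label's fuzzy rating minus the highest rating given to any other class at that site; the function names and the single-site ratings are hypothetical, while the `con_row` counts are the CON row of the table.

```python
def difference_value(site_ratings, map_label):
    """Fuzzy rating of the map label minus the highest rating among all
    other classes at the same reference site."""
    others = [rating for cls, rating in site_ratings.items() if cls != map_label]
    return site_ratings[map_label] - max(others)

def summarize(diff_counts):
    """Collapse a Difference Table row into exact, acceptable, and error counts."""
    exact = sum(n for d, n in diff_counts.items() if d >= 0)
    acceptable = sum(n for d, n in diff_counts.items() if d >= -1)
    errors = sum(n for d, n in diff_counts.items() if d <= -2)
    return exact, acceptable, errors

# Hypothetical site: conifer fits best, mixed is acceptable, shrub is wrong.
site = {"CON": 5, "MIX": 3, "SHB": 1}
d_con = difference_value(site, "CON")
d_shb = difference_value(site, "SHB")

# CON row of the Difference Table.
con_row = {-4: 4, -3: 2, -2: 0, -1: 11, 0: 3, 1: 0, 2: 12, 3: 23, 4: 33}
exact, acceptable, errors = summarize(con_row)
print(d_con, d_shb, exact, acceptable, errors)
```

The summary ties back to the earlier slides: the exact count equals MAX (71), the acceptable count equals RIGHT (82), and the error count equals the CON row total in the confusion matrix (6).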
43Fuzzy Accuracy Assessment
- Ambiguity Table: tallies map classes that characterize a reference site as well as the actual map label
- Useful in identifying subtle confusion between map classes and may be useful in identifying additional map classes to be considered
- Example from USFS
Label   Sites   CON   MIX   HDW   SHB   HEB   NFO   Total
CON     88      X     11    6     15    0     0     32
15 out of 88 reference sites mapped as conifer could have been equally well labeled as shrub
44Alternative Ways of Quantifying Accuracy: Ratio Estimation
- Method of statistically adjusting for over- or underestimation
- Randomly allocate test areas; determine area from map and reference data
- Ratio estimation uses the ratio of Reference/Map area to adjust the mapped area estimate
- Uses the estimate of the variance to develop confidence levels for land cover type area
Shiver & Borders, 1996. Sampling Techniques for Forest Resource Inventory, Wiley, NY, NY. Pp.
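A ratio estimate and an approximate confidence interval can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used in the NJ study: it uses the textbook ratio-of-sums estimator, ignores the finite population correction, and uses a normal-approximation multiplier of 1.96 for the 95% interval. The function name and all test-area values are hypothetical.

```python
import math

def ratio_estimate(map_areas, ref_areas, mapped_total, z=1.96):
    """Adjust a mapped area total with the Reference/Map ratio observed
    on randomly allocated test areas.
    Returns (adjusted_total, confidence_half_width)."""
    n = len(map_areas)
    ratio = sum(ref_areas) / sum(map_areas)
    adjusted = ratio * mapped_total
    # Residual variance of the reference areas around the ratio line.
    s2 = sum((y - ratio * x) ** 2 for x, y in zip(map_areas, ref_areas)) / (n - 1)
    mean_map = sum(map_areas) / n
    se_ratio = math.sqrt(s2 / n) / mean_map
    return adjusted, z * se_ratio * mapped_total

# Hypothetical test areas (acres of one land cover type per test area).
map_areas = [10.0, 20.0, 30.0, 40.0]
ref_areas = [12.0, 18.0, 33.0, 44.0]
adjusted, half_width = ratio_estimate(map_areas, ref_areas, mapped_total=1000.0)
print(round(adjusted, 1), round(half_width, 1))
```

A ratio above 1 means the map underestimates the class area relative to the reference data, so the adjusted total is larger than the mapped total.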
45Example: NJ 2000 Land Use Update. Comparison of urban/transitional land use as determined by photo-interpretation of 1 m B&W photography vs. 10 m SPOT PAN
[Figure panels: 1 m B&W; 10 m SPOT PAN]
46Above 1-to-1 line: underestimate
Below 1-to-1 line: overestimate
47Example: NJ Land Use Change
Land Use Change Category   Mapped Estimate (acres)   Statistically Adjusted Estimate with 95% CI (acres)
Urban                      73,191                    77,941 +/- 17,922
Transitional/Barren        20,861                    16,082 +/- 7,053
Total Urban + Barren       94,052                    89,876 +/- 16,528
48Case Study: Sub-pixel Un-mixing
Urban/suburban mixed pixels: varying proportions of developed surface, lawn, and trees
[Figure: 30m TM pixel grid on IKONOS image]
49Objective: Sub-pixel Unmixing
False color composite image: R = Forest, G = Lawn, B = Impervious
[Figure panels: Impervious Surface Estimation; Woody Estimation; Grass Estimation]
50Validation Data
- For homogeneous 90m x 90m test areas
  - interpreted DOQ
  - DOQ pixels scaled to match TM
- For selected sub-areas
  - IKONOS multi-spectral image
  - 3 key indicator land use classified map: impervious surface, lawn, and forest
  - IKONOS pixels scaled to match TM
51Egg Harbor City
[Figure panels: Landsat SOM-LVQ; Landsat LMM]
52Hammonton
[Figure panels: Landsat SOM-LVQ]
53Root Mean Square Error: 90m x 90m test plots
          Impervious   Grass   Tree
IKONOS    7.4          8.2     7.1
LMM       10.8         13.6    20.7
SOM_LVQ   12.0         10.3    11.0
Egg Harbor City
          Impervious   Lawn    Urban Tree
IKONOS    5.6          5.8     6.1
LMM       7.7          12.5    19.6
SOM_LVQ   6.8          6.0     5.0
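For reference, a root mean square error like those tabulated above compares estimated and reference cover values plot by plot. The Python sketch below shows the calculation; the plot values are hypothetical and are not taken from the study.

```python
import math

def rmse(estimated, reference):
    """Root mean square error between paired estimates and reference values."""
    if not estimated or len(estimated) != len(reference):
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical impervious surface cover for four test plots.
landsat_impervious = [20.0, 55.0, 80.0, 10.0]
reference_impervious = [25.0, 50.0, 90.0, 10.0]
plot_rmse = rmse(landsat_impervious, reference_impervious)
print(round(plot_rmse, 2))
```

Because errors are squared before averaging, a single badly estimated plot raises RMSE more than several small misses of the same total size.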
54SOM-LVQ vs. IKONOS: study sub-area comparison, 3x3 TM pixel zonal
[Figure: panels for Hammonton and Egg Harbor City; rows: Impervious, Grass, Trees. Panel RMSE values, in extraction order: 13.5, 17.6, 15.0, 14.4, 21.6, 17.6]
55Comparison of Landsat TM vs. NJDEP IS estimates
56Summary of Results
- Impervious surface estimation compares favorably to DOQ and IKONOS
- RMSE of 10 to 15 for impervious surface
- RMSE of 12 to 22 for grass and tree cover
- Shows a strong linear relationship with IKONOS in impervious surface and grass estimation
- Greater variability in forest fraction due to variability in canopy shadowing and understory
- 1. Majority filter (removes salt and pepper) and/or Eliminate (clumps like pixels).
- 2. Sampling methods for reference points
- 3. Contingency matrix and accuracy assessment measures: overall accuracy, producer's accuracy, user's accuracy, and kappa coefficient.
- 4. Fuzzy accuracy assessment: fuzzy rating, fuzzy set matrix, and ratio estimators.
58Thank you