Title: OUTLINE
2. OUTLINE
- Definitions of error
- Errors associated with log-log regressions
- Relationship between log errors and relative errors
- Empirical error statistics
- The lognormal assumption
- Predicted vs. measured chlorophyll
- Mistakes not to make
- Adjusting for the distribution of chlorophyll
3. Error Definitions
The zeroth-order definition of error is the log error, $d = \log_{10}(\mathrm{Chl}_{pred}) - \log_{10}(\mathrm{Chl}_{meas})$. The OC4/OC3M polynomials minimize the mean square of $d$.
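4. Error Definitions
Log error: $d = \log_{10}(\mathrm{Chl}_{pred}) - \log_{10}(\mathrm{Chl}_{meas})$
Relative error: $relerr = (\mathrm{Chl}_{pred} - \mathrm{Chl}_{meas}) / \mathrm{Chl}_{meas}$. The percentage error, $relerr \times 100$, is often desired.
The log error and relative error are directly related: $d = \log_{10}(1 + relerr)$, or equivalently $relerr = 10^{d} - 1$.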
5. For every point in NOMAD, you can calculate $d_i$ and $relerr_i$ ($i = 1, \ldots, N$):
$d_i = \log_{10}(\mathrm{Chl}_{pred,i}) - \log_{10}(\mathrm{Chl}_{meas,i})$
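The per-point calculation can be sketched as follows. This is a minimal illustration, assuming the log error is the base-10 log of predicted over measured chlorophyll and the relative error is (predicted - measured) / measured. The function names and the three sample pairs are hypothetical, not NOMAD values.

```python
import math

def log_error(predicted, measured):
    """Log error d: base-10 log of predicted minus base-10 log of measured."""
    return math.log10(predicted) - math.log10(measured)

def relative_error(predicted, measured):
    """Relative error as a fraction of the measured value."""
    return (predicted - measured) / measured

# Hypothetical (measured, predicted) chlorophyll pairs in mg m^-3.
pairs = [(0.10, 0.12), (1.00, 0.80), (5.00, 10.0)]

d = [log_error(p, m) for m, p in pairs]
relerr = [relative_error(p, m) for m, p in pairs]

for (m, p), di, ri in zip(pairs, d, relerr):
    # Point by point, the two measures are linked by relerr_i = 10**d_i - 1.
    print(f"meas={m:5.2f} pred={p:5.2f} d={di:+.3f} relerr={100 * ri:+.1f}%")
```

Both quantities come from the same pair of numbers, so either one can be recovered from the other.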
6. Characterize the distribution of log errors in terms of their mean, standard deviation, and root-mean-square.

Table 1. Error statistics for two chlorophyll algorithms.
algorithm      N     bias    stdev    RMSE
OC4.v4      2208   -0.047    0.252   0.256
OC4.v5      2208    0.000    0.245   0.245

The histogram of $d_i$ is symmetric and normally distributed.
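A short sketch of these three statistics follows, assuming "bias" means the mean of the log errors and the standard deviation is taken about that mean. The divisor used in the source (N or N - 1) is not stated; N is assumed here, which makes little difference at N = 2208. The sample values of d are hypothetical.

```python
import math

def log_error_stats(d):
    """Return (bias, stdev, rmse) of a list of log errors."""
    n = len(d)
    bias = sum(d) / n                               # mean log error
    var = sum((x - bias) ** 2 for x in d) / n       # population variance (divisor N)
    rmse = math.sqrt(sum(x * x for x in d) / n)     # root-mean-square log error
    return bias, math.sqrt(var), rmse

# Hypothetical log errors, not NOMAD values.
d = [0.10, -0.30, 0.25, -0.05, -0.20]
bias, stdev, rmse = log_error_stats(d)

# With divisor N the three are tied together: rmse**2 == bias**2 + stdev**2.
print(f"bias={bias:+.3f} stdev={stdev:.3f} rmse={rmse:.3f}")
```

The same identity is consistent with Table 1: sqrt(0.047^2 + 0.252^2) is about 0.256 for OC4.v4, and with zero bias the RMSE of OC4.v5 equals its standard deviation.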
8. In the same manner that you derived statistics of $d_i$, you can also derive the statistics of $relerr_i$. Or, if the $d_i$ errors are normally distributed, you can derive the statistics of the relative error from the log error statistics (see text).

Table 2. Statistics of the percentage errors associated with the chlorophyll algorithms relative to the NOMAD data.
                  Empirical            Lognormal
relerr (%)     OC4.v4   OC4.v5     OC4.v4   OC4.v5
mean               6       17          6       17
median            -7        4        -10        0
std dev           66       68         67       72
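The accompanying text with the conversion is not reproduced in these slides. As a sketch, if d is normally distributed then 10^d is lognormal, and the standard lognormal moment formulas give the relative error statistics. Applied to the bias and stdev values of Table 1, they reproduce the Lognormal columns of Table 2. The function name below is hypothetical.

```python
import math

LN10 = math.log(10.0)

def relerr_stats_from_log_stats(bias, stdev):
    """Percentage-error mean, median, std dev, assuming d ~ Normal(bias, stdev)."""
    mu = bias * LN10        # mean of ln(pred / meas)
    sigma = stdev * LN10    # std dev of ln(pred / meas)
    mean_ratio = math.exp(mu + 0.5 * sigma ** 2)      # E[pred / meas]
    mean = mean_ratio - 1.0
    median = math.exp(mu) - 1.0
    std = mean_ratio * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return 100.0 * mean, 100.0 * median, 100.0 * std

# Bias and stdev of the log errors from Table 1.
v4 = relerr_stats_from_log_stats(-0.047, 0.252)
v5 = relerr_stats_from_log_stats(0.000, 0.245)

print("OC4.v4 mean, median, std dev (%):", [round(x) for x in v4])
print("OC4.v5 mean, median, std dev (%):", [round(x) for x in v5])
```

A zero-bias algorithm in log space (OC4.v5) still has a positive mean relative error, because positive relative errors are unbounded while negative ones are not.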
10. In this plot, the measured and predicted (algorithm) chlorophylls are both represented on the vertical axis and plotted against the variable X = log(max Rrs / Rrs555). In this case, the R² statistic is a measure of the performance of the algorithm.
11. In the case where the predicted chlorophyll is plotted against the measured chlorophyll, the R² statistic is not a measure of the performance of the algorithm. Such plots are useful to reveal systematic errors in the algorithm, but the R² statistic is misleading. Conditions for good agreement are a slope of 1 and an intercept of 0.
Plot statistics: n = 2189, intercept = -0.330, slope = 0.614, R² = 0.808, rms = 0.395, bias = -0.235.
12. Here is an example of a better performing algorithm, even though the R² statistic is lower (0.795) compared with the one shown on the previous slide (0.808).
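A small sketch of why R² can mislead in a predicted-vs-measured regression. The synthetic predictions below fall exactly on a line with the slope and intercept quoted on slide 11, so R² is 1, yet the log errors are large. The data are made up for illustration, and the function name is hypothetical.

```python
import math

def regression_and_errors(log_meas, log_pred):
    """Least-squares fit of log_pred on log_meas, plus log-error bias and rms."""
    n = len(log_meas)
    mx = sum(log_meas) / n
    my = sum(log_pred) / n
    sxx = sum((x - mx) ** 2 for x in log_meas)
    syy = sum((y - my) ** 2 for y in log_pred)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_meas, log_pred))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)                      # squared correlation
    d = [y - x for x, y in zip(log_meas, log_pred)]  # log errors
    bias = sum(d) / n
    rms = math.sqrt(sum(e * e for e in d) / n)
    return slope, intercept, r2, bias, rms

# Synthetic log10 chlorophyll values spanning two decades.
log_meas = [-1.0, -0.5, 0.0, 0.5, 1.0]
# Predictions lying exactly on the line pred = -0.330 + 0.614 * meas.
log_pred = [-0.330 + 0.614 * x for x in log_meas]

slope, intercept, r2, bias, rms = regression_and_errors(log_meas, log_pred)
print(f"slope={slope:.3f} int={intercept:.3f} R2={r2:.3f} bias={bias:.3f} rms={rms:.3f}")
```

R² only measures how tightly the points follow some straight line. The slope, intercept, bias, and rms show whether that line is the 1:1 line.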
14. Mistakes not to Make
- In a regression of predicted vs. measured, the R² is not a measure of the performance of your algorithm.
- 100 × (1 - R²) is not the relative error in chlorophyll.
- 100 × $d_i$ is not the relative error in chlorophyll.
15. Mistakes not to Make (continued)
- Don't plot relative (or percentage) errors on linear scales unless the errors are small. Negative errors cannot be < -100% (if the predicted chlorophyll is > 0), but positive errors can be arbitrarily large. This gives the false impression that the errors are highly skewed, whereas a plot of the log errors is often symmetric.
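A quick numerical illustration of these points, assuming the relation relerr = 10^d - 1 between log error and relative error. The chosen values of d are arbitrary.

```python
def pct_error(d):
    """Percentage error corresponding to a base-10 log error d."""
    return 100.0 * (10.0 ** d - 1.0)

# Log errors that are symmetric about zero...
for d in (-1.0, -0.3, 0.3, 1.0):
    # ...map to percentage errors that are strongly asymmetric,
    # and 100 * d is far from the actual percentage error.
    print(f"d={d:+.1f}  100*d={100 * d:+6.0f}  percentage error={pct_error(d):+7.1f}%")
```

A log error of +0.3 is roughly a factor-of-two overestimate (about +100%), while -0.3 is roughly a factor-of-two underestimate (about -50%).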
17. The performance measures derived from NOMAD or any database are influenced by the distribution of the stations in the database. Often there's an overabundance of high-chlorophyll stations.

Tables 1 and 2. Error statistics associated with the OC4 chlorophyll algorithms based on the NOMAD data.
                  Empirical            Lognormal
               OC4.v4   OC4.v5     OC4.v4   OC4.v5
RMSE            0.256    0.245
mean (%)           6       17          6       17
std dev (%)       66       68         67       72
18. Distribution of chlorophyll in the SeaWiFS climatology (1997-2005) (red) compared with that in NOMAD (blue). Axis: chlorophyll, mg m^-3.
19. Log errors ($d_i$) for OC4.v4 vs. measured chlorophyll. The red curve is the SeaWiFS distribution shown on the previous slide.
20. Blue and green: cumulative RMSEs for all stations having chlorophyll less than the in situ Chl value on the axis. Brown: cumulative distribution of the SeaWiFS chlorophyll climatology (1997-2005).
21. Table 3. Cumulative error statistics of the chlorophyll algorithms relative to the NOMAD data below thresholds.
               Chl < 0.2 mg m^-3     Chl < 1.0 mg m^-3     All NOMAD data
               (65% of ocean)        (93% of ocean)
               OC4.v4   OC4.v5       OC4.v4   OC4.v5       OC4.v4   OC4.v5
RMSE            0.149    0.166        0.201    0.206        0.256    0.245
mean (%)          10       21           11       28            6       17
std dev (%)       42       50           60       66           66       68
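The cumulative statistics can be sketched as below: keep only the stations whose measured chlorophyll is under a threshold, then compute the RMSE of their log errors. The thresholds follow Table 3, but the station values and log errors are hypothetical and the function name is an assumption.

```python
import math

def cumulative_rmse(measured, d, threshold):
    """RMSE of log errors over stations with measured chlorophyll below threshold."""
    selected = [di for m, di in zip(measured, d) if m < threshold]
    if not selected:
        return None  # no stations fall below this threshold
    return math.sqrt(sum(x * x for x in selected) / len(selected))

# Hypothetical measured chlorophyll (mg m^-3) and log errors, not NOMAD values.
measured = [0.05, 0.15, 0.60, 3.00]
d = [0.10, -0.10, 0.20, -0.40]

for threshold in (0.2, 1.0, math.inf):
    print(f"Chl < {threshold}: RMSE = {cumulative_rmse(measured, d, threshold):.3f}")
```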
22. One way to adjust for differences in the distributions of the data (vs. real world) is to bin the data and characterize errors within bins. Then weight the statistics according to the frequency of the global real-world distribution.
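One possible form of this adjustment is sketched below. It assumes three bins split at 0.2 and 1.0 mg m^-3, with weights 0.65, 0.28, and 0.07 derived from the ocean fractions quoted in Table 3 (65% below 0.2, 93% below 1.0). The choice of bins, the weighting of mean square errors, the sample data, and the function name are all assumptions for illustration, not the method used in the source.

```python
import math

def weighted_rmse(measured, predicted, edges, weights):
    """RMSE of log errors with per-bin mean square errors weighted by `weights`.

    edges   : ascending bin boundaries in measured chlorophyll (mg m^-3)
    weights : real-world frequency of each bin (len(edges) + 1 values)
    """
    sums = [0.0] * len(weights)
    counts = [0] * len(weights)
    for m, p in zip(measured, predicted):
        d = math.log10(p) - math.log10(m)
        k = sum(m >= e for e in edges)   # index of the bin containing m
        sums[k] += d * d
        counts[k] += 1
    # Empty bins are skipped and the remaining weights renormalized.
    used = sum(w for w, n in zip(weights, counts) if n > 0)
    mse = sum(w * s / n for w, s, n in zip(weights, sums, counts) if n > 0) / used
    return math.sqrt(mse)

# Hypothetical stations: accurate at low chlorophyll, a factor of 10 off at high.
measured = [0.1, 0.5, 5.0]
predicted = [0.1, 0.5, 50.0]

result = weighted_rmse(measured, predicted, edges=[0.2, 1.0], weights=[0.65, 0.28, 0.07])
print(f"weighted RMSE = {result:.3f}")   # compare with the unweighted sqrt(1/3)
```

In this toy case the unweighted RMSE is dominated by the single high-chlorophyll station, while the weighted value reflects how rarely such waters occur globally.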
24. Other performance criteria
- How quick is the algorithm when applied to a satellite image?
- How often does the algorithm fail to converge to a solution?
- How sensitive is the algorithm to errors in the water-leaving radiance (atmospheric correction)?
- Others?
25. Conclusions / Comments
- The operational chlorophyll algorithms used by SeaWiFS (OC4) and MODIS (OC3M) are not accurate to within 35%. (This is a myth.)
- The log error is the basic measure of performance. Its statistics can easily be converted to relative error statistics.
- The OC4 and OC3M algorithms have a 60-70% uncertainty.
- Given that chlorophyll varies by 4 orders of magnitude (a factor of 10,000) globally, that ain't bad!