Title: From: McCune, B.
Tables, Figures, and Equations
From McCune, B. and J. B. Grace. 2002. Analysis
of Ecological Communities. MjM Software Design,
Gleneden Beach, Oregon. http://www.pcord.com
Basic procedure
1. Calculate the dissimilarity matrix $\Delta$ (elements $\delta_{ij}$) among sample units in the original p-dimensional space.
2. Assign sample units to a starting configuration in the k-space (define an initial X). Usually the starting locations (scores on axes) are assigned with a random number generator. Alternatively, the starting configuration can be scores from another ordination.
3. Normalize X by subtracting the axis mean for each axis l and dividing by the overall standard deviation of the scores.
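Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration, not PC-ORD's implementation. It assumes Sørensen dissimilarity (the measure used for Table 16.6) for $\Delta$ and uniform random starting scores. It also takes the "overall standard deviation" to be the root mean square of all centered scores, using divisor n·k. The function names are hypothetical.

```python
import math
import random


def sorensen(a, b):
    """Sørensen dissimilarity between two sample units (lists of species abundances)."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator  # raises ZeroDivisionError if both units are empty


def dissimilarity_matrix(data):
    """Step 1: n x n matrix Delta of dissimilarities in the original species space."""
    n = len(data)
    return [[sorensen(data[i], data[j]) for j in range(n)] for i in range(n)]


def random_start(n, k, seed=None):
    """Step 2: random starting scores for n sample units on k axes."""
    rng = random.Random(seed)
    return [[rng.uniform(0.0, 1.0) for _ in range(k)] for _ in range(n)]


def normalize(X):
    """Step 3: subtract each axis mean, then divide every score by the overall SD."""
    n, k = len(X), len(X[0])
    means = [sum(row[l] for row in X) / n for l in range(k)]
    centered = [[row[l] - means[l] for l in range(k)] for row in X]
    # Assumed definition: overall SD = sqrt(sum of squared centered scores / (n * k))
    sd = math.sqrt(sum(v * v for row in centered for v in row) / (n * k))
    return [[v / sd for v in row] for row in centered]
```

After `normalize`, every axis has mean zero and the scores have an overall standard deviation of one. This removes arbitrary shifts and magnifications of the configuration.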
4. Calculate D (elements $d_{ij}$), containing Euclidean distances between sample units in k-space.
5. Rank the elements of $\Delta$ (the original dissimilarities) in ascending order.
6. Put the elements of D in the same order as $\Delta$.
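Below is a sketch of steps 4 to 6. It assumes that $\Delta$ (the original dissimilarities) and D (the ordination distances) are stored as full symmetric n × n lists, and that each pair of sample units is used once (strict lower triangle). Ties in $\Delta$ simply keep their original pair order here, because this section does not say how ties are handled. The function names are hypothetical.

```python
import math


def ordination_distances(X):
    """Step 4: Euclidean distances D between sample units in k-space."""
    n = len(X)
    return [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]


def pairs_ranked_by_delta(delta):
    """Step 5: pairs (i, j), i > j, sorted by ascending original dissimilarity."""
    n = len(delta)
    pairs = [(i, j) for i in range(n) for j in range(i)]  # strict lower triangle
    return sorted(pairs, key=lambda p: delta[p[0]][p[1]])  # stable sort: ties keep pair order


def distances_in_delta_order(D, ranked_pairs):
    """Step 6: the elements of D listed in the same order as Delta."""
    return [D[i][j] for i, j in ranked_pairs]
```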
7. Calculate $\hat{D}$ (containing elements $\hat{d}_{ij}$, these being the result of replacing elements of D that do not satisfy the monotonicity constraint).
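Step 7 amounts to a least-squares monotone (isotonic) regression of the ordination distances on the rank order of $\Delta$. One standard way to compute it is the pool-adjacent-violators algorithm, sketched below. This is an assumption about the method, since the section does not say how PC-ORD computes $\hat{D}$.

```python
def monotone_regression(d):
    """Step 7 sketch: least-squares nondecreasing fit (pool-adjacent-violators).

    d -- ordination distances d_ij listed in ascending order of the original
         dissimilarities (the output of step 6). Returns the d-hat_ij in that order.
    """
    blocks = []  # each block is [sum of values, count]; its fitted value is sum / count
    for value in d:
        blocks.append([value, 1])
        # Pool adjacent blocks while their means violate the monotonicity constraint.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    dhat = []
    for total, count in blocks:
        dhat.extend([total / count] * count)
    return dhat
```

For example, `monotone_regression([1, 3, 2, 4])` returns `[1.0, 2.5, 2.5, 4.0]`. The pair 3, 2 violates monotonicity, so both values are replaced by their mean. Elements that already satisfy the constraint are left unchanged.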
Figure 16.1. Moving a point to achieve
monotonicity in a plot of distance in the
original p-dimensional space against distances in
the k-dimensional ordination space.
8. Calculate raw stress, $S^*$:
$$S^* = \sum_{i}\sum_{j}\left(d_{ij} - \hat{d}_{ij}\right)^2$$
Note that $S^*$ measures the departure from monotonicity. If $S^* = 0$, the relationship is perfectly monotonic.
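Raw stress $S^*$ can be sketched as follows, with each pair counted once. Summing over both triangles of the matrix would simply double $S^*$. The function name is hypothetical.

```python
def raw_stress(d, dhat):
    """Step 8: raw stress S* = sum of squared differences, each pair counted once.

    d    -- ordination distances d_ij (any consistent pair order)
    dhat -- monotone-fitted values d-hat_ij in the same order
    """
    return sum((a - b) ** 2 for a, b in zip(d, dhat))
```

For instance, `raw_stress([1, 3, 2, 4], [1, 2.5, 2.5, 4])` gives 0.5. Doubling every value gives 2.0. Magnifying the configuration by a factor of 2 therefore multiplies $S^*$ by 4, even though the fit is no better or worse.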
Figure 16.2. Plot of distance in ordination
space ($d_{ij}$, horizontal axis) vs. dissimilarity in the
original p-dimensional space ($\delta_{ij}$, vertical
axis), analogous to Fig. 16.1. Points are
labeled with the ranked distance (dissimilarity)
in the original space.
9. Because raw stress would be altered if the configuration of points were magnified or reduced, it is desirable to standardize ("normalize") stress. Kruskal's stress formula one:
$$S = \frac{\sum_{i}\sum_{j}\left(d_{ij} - \hat{d}_{ij}\right)^2}{\sum_{i}\sum_{j} d_{ij}^2}$$
PC-ORD reports stress as $S_R$, the square root of the scaled stress (analogous to a standard deviation), multiplied by 100 to rescale the result to a range of zero to 100:
$$S_R = 100\sqrt{S}$$
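Kruskal's stress formula two is
$$S = \frac{\sum_{i}\sum_{j}\left(d_{ij} - \hat{d}_{ij}\right)^2}{\sum_{i}\sum_{j}\left(d_{ij} - \bar{d}\right)^2}$$
where $\bar{d}$ is the mean of the $d_{ij}$. The two formulas yield very similar configurations.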
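The two standardized formulas and PC-ORD's reported value $S_R$ can be sketched as follows. The sketch assumes that formula one divides raw stress by the sum of squared ordination distances, and that formula two divides it by the sum of squared deviations of those distances from their mean. Each pair is counted once. Numerator and denominator scale by the same factor, so magnifying the configuration leaves the result unchanged. The function names are hypothetical.

```python
import math


def _sum_sq_residuals(d, dhat):
    # Raw stress S*: squared departures from monotonicity, each pair counted once
    return sum((a - b) ** 2 for a, b in zip(d, dhat))


def stress_formula_one(d, dhat):
    """Kruskal's formula one, in ratio form: S* / sum(d_ij^2)."""
    return _sum_sq_residuals(d, dhat) / sum(a * a for a in d)


def stress_formula_two(d, dhat):
    """Kruskal's formula two: S* / sum((d_ij - mean d)^2)."""
    mean_d = sum(d) / len(d)
    return _sum_sq_residuals(d, dhat) / sum((a - mean_d) ** 2 for a in d)


def pcord_stress(scaled_stress):
    """S_R = 100 * sqrt(S), on a 0-100 scale."""
    return 100.0 * math.sqrt(scaled_stress)
```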
10. Now we try to minimize S by changing the configuration of the sample units in the k-space. For the method of steepest descent, we first calculate the "negative gradient of stress" for each point. The gradient vector for point h (a vector of length k containing the movement of point h in each dimension l) has elements $g_{hl}$, each obtained by summing over all pairs of points i and j. Here i indexes one of the n points, j indexes another of the n points, l indexes a particular dimension, and h indexes a third point: the point of interest, which is being moved on dimension l. So $g_{hl}$ indicates a shift for point h along dimension l. The terms $\delta^{hi}$ and $\delta^{hj}$ are Kronecker deltas, which have the value 1 if the indices h and i (or h and j) are equal and 0 otherwise. If the strict lower triangles of the distance matrices have been used, the Kronecker deltas can be omitted.
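The gradient is described above in words only. Below is a sketch of one consistent version, derived here rather than taken from the book. It differentiates stress formula one, written as the ratio $S = S^*/T^*$ with $T^* = \sum d_{ij}^2$, with respect to each coordinate $x_{hl}$ while holding the fitted values $\hat{d}_{ij}$ fixed. Summing only over the strict lower triangle (i > j) makes the Kronecker deltas implicit: each pair adds a term to point i and the opposite term to point j. The square-root form of stress changes the result only by a positive factor. That factor affects step length but not direction. The names and the dict layout for `dhat` are hypothetical.

```python
import math


def _pairs(n):
    # Strict lower triangle: each unordered pair (i, j) with i > j appears once
    return [(i, j) for i in range(n) for j in range(i)]


def scaled_stress(X, dhat):
    """S = S* / T* for configuration X; dhat maps (i, j), i > j, to d-hat_ij."""
    pairs = _pairs(len(X))
    d = {p: math.dist(X[p[0]], X[p[1]]) for p in pairs}
    s_star = sum((d[p] - dhat[p]) ** 2 for p in pairs)
    t_star = sum(d[p] ** 2 for p in pairs)
    return s_star / t_star


def negative_gradient(X, dhat):
    """g[h][l] = -dS/dx_hl, the direction in which point h should move on axis l."""
    n, k = len(X), len(X[0])
    pairs = _pairs(n)
    d = {p: math.dist(X[p[0]], X[p[1]]) for p in pairs}
    t_star = sum(d[p] ** 2 for p in pairs)
    S = sum((d[p] - dhat[p]) ** 2 for p in pairs) / t_star
    g = [[0.0] * k for _ in range(n)]
    for i, j in pairs:
        if d[(i, j)] == 0.0:
            continue  # derivative undefined for coincident points; skip the pair
        coef = 2.0 * ((d[(i, j)] - dhat[(i, j)]) - S * d[(i, j)]) / (t_star * d[(i, j)])
        for l in range(k):
            term = coef * (X[i][l] - X[j][l])  # this pair's contribution to dS/dx_il
            g[i][l] -= term  # point i (Kronecker delta_hi = 1)
            g[j][l] += term  # point j (delta_hj = 1) gets the opposite sign
    return g
```

Because every pair pushes its two points in opposite directions, the gradient components on each axis sum to zero. The configuration as a whole does not drift.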
11. The amount of movement in the direction of the negative gradient is set by the step length, α, which is normally about 0.2 initially. The step length is recalculated after each step so that it gets smaller as reductions in stress become smaller.
12. Iterate (go to step 3) until either (a) a set maximum number of iterations is reached or (b) a criterion of stability is met.
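Steps 10 to 12 can be sketched as a generic steepest-descent loop. The step-length rule here is a hypothetical simplification: α starts at 0.2 and is halved whenever a step fails to lower stress. It is not the update used by Kruskal or PC-ORD. The stability criterion is assumed to be an improvement in stress smaller than `tol`. In a full NMS run, `stress_fn` and `neg_grad_fn` would wrap steps 3 to 9 and the gradient of step 10, with $\hat{D}$ recomputed at each iteration.

```python
def steepest_descent(stress_fn, neg_grad_fn, X, alpha=0.2, max_iter=500, tol=1e-10):
    """Sketch of steps 10-12: move X along the negative gradient until a stopping rule fires.

    stress_fn(X)   -> stress of configuration X (lower is better)
    neg_grad_fn(X) -> n x k list of negative-gradient components g_hl
    Hypothetical step-length rule: halve alpha whenever a step fails to lower stress.
    """
    stress = stress_fn(X)
    history = [stress]
    for _ in range(max_iter):  # stopping rule (a): maximum number of iterations
        g = neg_grad_fn(X)
        trial = [[x + alpha * gl for x, gl in zip(row, g_row)] for row, g_row in zip(X, g)]
        trial_stress = stress_fn(trial)
        if trial_stress < stress:
            improvement = stress - trial_stress
            X, stress = trial, trial_stress
            history.append(stress)
            if improvement < tol:  # stopping rule (b): stress has stabilized
                break
        else:
            alpha *= 0.5  # shorter steps once progress stalls
            if alpha < 1e-12:
                break
    return X, history
```

As a stand-in check, take `stress_fn = lambda X: sum((x - 1.0) ** 2 for row in X for x in row)` together with its negative gradient. With these, the loop drives both coordinates of `[[0.0, 3.0]]` to within 0.001 of 1, and stress falls at every accepted step.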
At the stopping point, the stress should be acceptably small. One way to assess this is by calculating the relative magnitude of the gradient vector g, mag(g). Mather suggested that mag(g) should reach a value of 2-5% of its initial value for an arbitrary configuration. This is often not achievable.
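The formula for mag(g) is not reproduced in this section. The sketch below assumes the relative magnitude is the root-mean-square length of the gradient divided by the root-mean-square length of the configuration. The helper names are illustrative, as is the 5% default, which takes the upper end of Mather's 2-5% range.

```python
import math


def relative_magnitude(g, X):
    """Assumed definition: RMS length of the gradient divided by RMS length of the configuration."""
    n = len(X)
    rms_g = math.sqrt(sum(v * v for row in g for v in row) / n)
    rms_x = math.sqrt(sum(v * v for row in X for v in row) / n)
    return rms_g / rms_x


def meets_mather_criterion(mag_final, mag_initial, fraction=0.05):
    """True if mag(g) has fallen to `fraction` of its initial value (5% by default)."""
    return mag_final <= fraction * mag_initial
```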
Randomization (Monte Carlo) test:
$$p = \frac{1 + n}{1 + N}$$
where n = number of randomized runs with final stress less than or equal to the observed minimum stress, and N = number of randomized runs.
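The p value follows directly from the formula. The function name is hypothetical, and the stress values in the example below are made up for illustration.

```python
def randomization_p(observed_min_stress, randomized_final_stresses):
    """p = (1 + n) / (1 + N) for the randomization (Monte Carlo) test."""
    # n: randomized runs whose final stress is at or below the observed minimum stress
    n = sum(1 for s in randomized_final_stresses if s <= observed_min_stress)
    N = len(randomized_final_stresses)
    return (1 + n) / (1 + N)
```

For example, `randomization_p(10.0, [12.0, 15.0, 9.5, 20.0])` returns 0.4, because one of the four randomized runs reached a final stress at or below 10.0. Adding 1 to both counts treats the observed data as one of the possible outcomes, so p can never be zero.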
Figure 16.3. A scree plot shows stress as a
function of dimensionality of the gradient model.
"Stress" is an inverse measure of fit to the
data. The "randomized" data are analyzed as a
null model for comparison. See also Table 16.1.
Choosing the best solution
1. Select an appropriate number of dimensions.
2. Seek low stress.
Figure 16.4. Dependence of stress on sample
size, illustrated by subsampling rows of a matrix
of 50 sample units by 29 species.
Figure 16.5. Dependence of final stress on
progressive removal of rare species from a data
set of 50 sample units and 29 species. For
example, when only species occurring in 19 or
more of the sample units were retained, 16
species remained in the data set and the final
stress was about 15.
3. Use a randomization (Monte Carlo) test.
- Helpful but not foolproof.
- The most common problems are:
  - A. Strong outliers caused by one or two extremely high values can result in randomizations with final stress values similar to those of the real data.
  - B. The same can be true for data dominated by a single super-abundant species.
  - C. With very small data sets (e.g., fewer than 10 SUs), the randomization test can be too conservative.
  - D. If a data set contains very many zeros, then most randomizations will produce multiple empty sample units, making the application of many distance measures impossible.
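Problem D can be illustrated with a short sketch. It assumes that the randomization shuffles values independently within each species column; this is an assumption, because the shuffling scheme is not described in this section. Shuffling preserves each species' total, but it can leave a sample unit with no species at all. The Sørensen dissimilarity between two empty units is then 0/0. The function names are hypothetical.

```python
import random


def shuffle_within_columns(data, rng):
    """Randomize a sample-unit x species matrix by shuffling each species column independently."""
    columns = [list(col) for col in zip(*data)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def empty_units(data):
    """Indices of sample units (rows) that contain no species."""
    return [i for i, row in enumerate(data) if sum(row) == 0]


def sorensen_or_none(a, b):
    """Sørensen dissimilarity, or None when it is undefined (both units empty)."""
    denominator = sum(x + y for x, y in zip(a, b))
    if denominator == 0:
        return None
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator
```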
4. Avoid unstable solutions.
Figure 16.7. NMS seeks a stable solution.
Instability is the standard deviation in stress
over the preceding 10 steps (iterations); step
length is the rate at which NMS moves down the
path of steepest descent; it is based on mag(g),
the magnitude of the gradient vector (see text).
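Instability as defined in this caption can be computed from the stress history of a run. The caption does not say whether the sample (n − 1) or population standard deviation is used, so the sample version is assumed here. The function name is hypothetical.

```python
import statistics


def instability(stress_history, window=10):
    """Standard deviation of stress over the last `window` iterations (None if too short)."""
    recent = stress_history[-window:]
    if len(recent) < 2:
        return None
    return statistics.stdev(recent)  # sample SD assumed (n - 1 divisor)
```

A run that has settled on a solution, as in Fig. 16.9, gives values near zero. Persistent oscillation like that in Fig. 16.8 keeps the value large.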
Figure 16.8. NMS finds an unstable solution;
normally this would be unacceptable instability.
In this case it was induced by overfitting the
model (fitting it to more dimensions than
required). The plotted variables are described
in Figure 16.7.
Figure 16.9. NMS finds a fairly stable solution,
ending with a periodic but low level of
instability. The instability is so slight that
the stress curve appears nearly flat after about
45 steps. The plotted variables are described in
Figure 16.7.
Example
Table 16.5. Abundance of six species in each of
five sample units.
Table 16.6. Sørensen distances among the five
sample units from Table 16.5.
Figure 16.10. Migration of points from the
starting configuration through 20 steps to the
final configuration using NMS. Each point in the
ordination represents one of five sample units.
The first five graphs show the migration of
individual points (arrows indicate starting
points), followed by the graph of the decline in
stress as the migration proceeds.
Figure 16.10. (cont.) Migration of points from
the starting configuration through 20 steps to
the final configuration using NMS. Each point in
the ordination represents one of five sample
units.
Table 16.7. Focusing parameters for fitting new
points to an existing NMS ordination. The units
for interval width and range are always in the
original axis units.
Figure 16.11. Successive approximation of the
score providing the best fit in predictive-mode
NMS. The method illustrated is for one axis at a
time. Multiple axes can be fit similarly by
simultaneously varying two or more sets of
coordinates.
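The successive approximation in Fig. 16.11 can be sketched for one axis at a time. The sketch evaluates a fit function on an evenly spaced grid across a starting range, then narrows the range around the best score and repeats. The fit function, the grid size (`n_points`) and the number of refinement rounds are hypothetical stand-ins for the focusing parameters of Table 16.7, whose values are not reproduced here. In the output of Fig. 16.12 the fit is reported as a stress value, with lower values indicating a better fit.

```python
def successive_approximation(fit, low, high, n_points=11, n_rounds=6):
    """Find the score on one axis that minimizes fit(score) by repeated grid refinement.

    fit      -- callable returning a badness-of-fit value (e.g. stress) for a score
    low/high -- initial search range on the axis, in original axis units
    n_points, n_rounds -- hypothetical focusing parameters
    """
    best = low
    for _ in range(n_rounds):
        width = (high - low) / (n_points - 1)
        grid = [low + i * width for i in range(n_points)]
        best = min(grid, key=fit)
        # Focus the next, finer grid on the interval around the current best score.
        low, high = best - width, best + width
    return best
```

With 11 grid points, each round shrinks the search interval to one fifth of its previous width. Six rounds starting from a range of 100 axis units therefore locate a smooth optimum to within a few thousandths of a unit.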
Table 16.8. Flags for poor fit of new points
added to an existing NMS ordination.
The following options were selected:
  Distance measure: Sørensen
  Method for fitting: one axis at a time
  Flag for poor fit: autoselect (2 stand. dev. > mean stress)
  Extrapolation limit, % of axis length: 5.000
  Calibration data set: D\FHM\DATA\94\SEModel.wk1
Contents of data matrices:
  85 Plots and 87 Species in calibration data set
  34 plots and 140 species in test data set
  85 scores from calibration data set
  Title from file with calibration scores: SEModel.wk1, rotated 155 degrees clockwise
  63 species in test data set are not present in calibration data

AXIS SUMMARY STATISTICS
--------------------------------------------------------------------
                                                   Axis 1    Axis 2
Mean stress                                        49.833    50.251
Standard deviation of stress                        0.351     0.391
Stress used to screen poor fit (flag 1)            50.536    51.033
Cutoff score, low end-of-axis warning flag (2)     -6.473   -12.895
Cutoff score, high end-of-axis warning flag (3)   106.473   112.895
Minimum score from calibration data                 6.559     1.619
Maximum score from calibration data                93.441    98.381
Range in scores from calibration data              86.881    96.761
--------------------------------------------------------------------

BEST-FIT SCORES ON EACH AXIS
--------------------------------------------------------------------
                    -------- Axis 1 --------   -------- Axis 2 --------
Item (SU)           Score      Fit     Flag    Score      Fit     Flag
--------------------------------------------------------------------
3008281            25.673    49.713     0      90.253    50.085     0
3008367            44.440    50.058     0      74.190    50.582     0
3008372            36.099    50.141     0      80.964    50.437     0
etc.
3108526             2.563    49.505     0       8.199    50.007     0
3108564            50.000    49.953     0      87.543    50.491     0
3108651             0.825    49.327     0     114.830    49.714     3
3108684             4.995    49.328     0     111.347    50.148     0
3108732           121.416    48.633     3     -17.539    48.957     2
3208432            14.379    49.840     0      16.714    50.243     0
3208436            56.429    49.988     0      66.062    50.480     0
Figure 16.12. Example output (PC-ORD) from the
one-axis-at-a-time method of fitting new points
to an NMS ordination.
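The flags in Fig. 16.12 can be reproduced from the axis summary statistics. The poor-fit cutoff matches mean stress plus two standard deviations (49.833 + 2 × 0.351 ≈ 50.536 on axis 1), which is consistent with the autoselect option. The sketch takes the end-of-axis cutoffs as given, because the output does not show how they are derived from the extrapolation limit. The order in which flags are checked is an assumption, and the function names are hypothetical.

```python
def poor_fit_cutoff(mean_stress, sd_stress, n_sd=2.0):
    """Stress above which a new point is flagged for poor fit (flag 1)."""
    return mean_stress + n_sd * sd_stress


def fit_flag(score, fit, fit_cutoff, low_cutoff, high_cutoff):
    """Return 0 (no flag), 1 (poor fit), 2 (below low end of axis) or 3 (above high end).

    Checking the end-of-axis flags before the poor-fit flag is an assumption.
    """
    if score < low_cutoff:
        return 2
    if score > high_cutoff:
        return 3
    if fit > fit_cutoff:
        return 1
    return 0
```

For SU 3108732, `fit_flag(121.416, 48.633, 50.536, -6.473, 106.473)` returns 3 on axis 1. `fit_flag(-17.539, 48.957, 51.033, -12.895, 112.895)` returns 2 on axis 2. Both match the output.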