Title: introduction to numerical methods in community ecology
1 introduction to numerical methods in community ecology
2 species and environmental data
classification: define groups
Indicator Species Analysis, MRPP, Discriminant Analysis
3 introduction to ordination
- Ordination: positioning of species or samples in order along an abstracted gradient
[Figure: samples A-F positioned along a LATENT GRADIENT]
Why would you want to do this?
4 why ordination?
- reduced dimensionality/data reduction (so that similarity in the reduced space reflects full-dimensional ecological similarity)
- ecological data often do not meet the assumptions of traditional statistics
- explore community responses to environmental gradients
5 environmental gradient response
- who lives with whom and why?
- how are environmental gradients associated with the ordinal arrangement of sites?
6 confusing vocabulary? eigenanalysis
- techniques resulting in a linear reduction in dimensionality; closely related to singular value decomposition (SVD)
- PCA, CA, DCA, CCA
- eigenvalue: strength of an axis
- eigenvector: set of sample scores
7 confusing vocabulary? Three Cs
- Canonical: refers to the simultaneous analysis of two or more related data matrices
- Correspondence: repeated weighted averaging of site scores to yield species scores, and vice versa
- Correlation: test of the relationship between two matrices
McCune and Grace 2002
8 confusing vocabulary? site, sample, and species scores
- Site/sample scores: the coordinate along an ordination axis specifying the location of a site/sample
- Species scores: the coordinate along an ordination axis specifying the location of a species
Mike Palmer 2004
9 history of ordination and methods
- 1954 (Goodall): Principal Components Analysis (PCA)
- 1957 (Bray and Curtis): Polar Ordination (BC)
- 1964 (Kruskal): Nonmetric Multidimensional Scaling (NMS, NMDS)
- 1973 (Hill): Correspondence Analysis (CA)
- 1980 (Hill and Gauch): the very popular Detrended Correspondence Analysis (DCA)
- 1986 (ter Braak): Canonical Correspondence Analysis (CCA)
- 2000s: revival of NMS, probably the current method of choice in peer-reviewed work
10 direct vs. indirect
- choice depends on the data collected/available
- Direct: requires an environmental data matrix for ordination (samples × environmental variables)
- Indirect: ordination independent of environmental variables (samples × species only)
11 direct gradient analysis
- requires a full set of environmental variables
- mapping of species into a measured environmental space
- requires knowledge of the relevant and important variables (somewhat subjective?)
- weighted averaging/CCA? (CCA is a slightly different bird because it does not assume the relative importance of environmental variables)
12 indirect gradient analysis
- no environmental data required, but they can be overlaid after the fact
- gradients are inferred from the species data
- no assumptions regarding the specific species-environment responses in the initial ordination
13 indirect vs. direct
Does your dataset include relevant and dependable weights of species and environmental variables?
- no: indirect gradient analysis (current methods of choice)
- yes: direct gradient analysis (often best to confirm results with an indirect method)
14 weighted averaging (Whittaker 1967)
- direct gradient analysis
- a set of previously assigned species weights is used to calculate scores for the sites; the calculation is a weighted average over the species present in the sample unit
15 weighted averaging (Whittaker 1967)
- Algorithm: $v_i = \dfrac{\sum_j a_{ij} w_j}{\sum_j a_{ij}}$
- where
  - $v_i$ = ordination score for site i
  - $a_{ij}$ = abundance of species j at site i
  - $w_j$ = weight of species j
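The weighted-averaging formula can be sketched in a few lines of Python. This is a minimal illustration that assumes abundances are stored as one list per site and that the species weights were assigned beforehand; the function name `wa_site_scores` and the example weights are hypothetical.

```python
def wa_site_scores(abundance, weights):
    """Weighted-averaging site scores: v_i = sum_j(a_ij * w_j) / sum_j(a_ij).

    abundance: list of rows, one per site, holding species abundances a_ij.
    weights:   list of previously assigned species weights w_j.
    """
    scores = []
    for row in abundance:
        total = sum(row)
        if total == 0:
            # No species present: the weighted average is undefined.
            scores.append(None)
        else:
            scores.append(sum(a * w for a, w in zip(row, weights)) / total)
    return scores


# Hypothetical example: three species weighted 1 (dry), 2 (mesic), 3 (wet).
print(wa_site_scores([[4, 0, 0], [0, 2, 2]], [1.0, 2.0, 3.0]))
```

The first site contains only the "dry" species and scores 1.0; the second is split evenly between the mesic and wet species and scores 2.5.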
16 weighted averaging (Whittaker 1967)
- Assumptions: species weights are known
- Advantages: simple; easy to use, understand, and communicate (good for regulators and nonscientists)
- Disadvantages: focuses on a single gradient; a species' optimum may occur outside the sampled range
17 Bray-Curtis/Sørensen (Bray and Curtis 1957)
- indirect gradient analysis method
- polar ordination: two endpoint samples are used as the poles of the natural gradient
18 Bray-Curtis/Sørensen (Bray and Curtis 1957)
- Algorithm (method of scaling)
- end points (poles) are selected
- the distance matrix is used to position sites along axes with respect to the poles
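A minimal sketch of one polar-ordination axis, assuming Bray-Curtis dissimilarity and the commonly used law-of-cosines projection of each sample onto the line joining the two poles. This is one geometric formulation, not necessarily the exact procedure used in class; the function names and the example data are hypothetical, and choosing the poles is left to the user.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity (equals Sørensen on presence/absence data)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0


def polar_axis(samples, pole_a, pole_b):
    """Score every sample on the axis from pole_a (score 0) to pole_b (score L)."""
    length = bray_curtis(samples[pole_a], samples[pole_b])  # distance between poles
    scores = []
    for s in samples:
        da = bray_curtis(s, samples[pole_a])
        db = bray_curtis(s, samples[pole_b])
        # Project the sample onto the pole-to-pole line (law of cosines).
        scores.append((length ** 2 + da ** 2 - db ** 2) / (2 * length))
    return scores


samples = [[10, 0, 0], [5, 5, 0], [0, 0, 10]]  # hypothetical sites x species
print(polar_axis(samples, pole_a=0, pole_b=2))
```

Because the scores depend only on distances to the two poles, picking different endpoints can reorder the sites, which is the endpoint-dependence listed among the disadvantages.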
19 Bray-Curtis/Sørensen (Bray and Curtis 1957)
- Assumptions: the end points are the true gradient ends?
- Advantages
  - no assumption of linearity among species
  - flexible distance measure
- Disadvantages
  - dependent on endpoint selection
  - sites are ordered according to their relation to the endpoints
20 Bray-Curtis/Sørensen (Bray and Curtis 1957)
21 principal components analysis (PCA) (Pearson 1901; Goodall 1954)
- indirect gradient analysis
- performed to reduce data to a smaller set of synthetic variables
- based on the principle that the strongest covariation among variables emerges in the first few axes, or components
- ideal technique for data with approximately linear relationships among variables (infrequent with ecological data)
22 principal components analysis (PCA) (Pearson 1901; Goodall 1954)
- Algorithm
- Calculate the variance/covariance matrix (cross-products among centered columns), which is related to the correlation matrix
- Calculate eigenvalues
- Determine eigenvectors
- Find scores for each site (centered data matrix × matrix of eigenvectors)
- Calculate the loading matrix (eigenvector matrix × square root of the eigenvalue matrix), which gives sensitivities to changes in principal components
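The PCA steps can be illustrated for the first axis with a dependency-free sketch. To avoid a linear-algebra library, it substitutes power iteration for a full eigendecomposition, so it only finds the dominant eigenvalue/eigenvector pair, and it assumes the all-ones start vector is not orthogonal to that eigenvector. The function name `pca_first_axis` and the toy data are hypothetical.

```python
import math


def pca_first_axis(data, iters=500):
    """First principal component of a sites x variables matrix (list of rows)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    # Variance/covariance matrix: cross-products among centered columns / (n - 1).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration: repeatedly multiply by cov and renormalize; this
    # converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # all variances are zero: no axis to extract
        v = [x / norm for x in w]
    # Rayleigh quotient v'Cv gives the eigenvalue for the unit eigenvector v.
    eigval = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]  # data x eigenvector
    loadings = [x * math.sqrt(eigval) for x in v]  # eigenvector x sqrt(eigenvalue)
    return eigval, v, scores, loadings


eigval, vec, scores, loadings = pca_first_axis([[1, 2], [2, 4], [3, 6]])
print(eigval, scores)
```

In the toy data the second variable is exactly twice the first, so all variance falls on one axis (eigenvalue 5) and the sites spread symmetrically around zero.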
23 principal components analysis (PCA) (Pearson 1901; Goodall 1954)
- Assumptions
  - component variables are normally distributed and linearly related
  - component variables are uncorrelated (infrequently true for ecological data)
  - linear, monotonic response of species to environmental gradients
24 principal components analysis (PCA) (Pearson 1901; Goodall 1954)
- Disadvantages
  - linear response model
  - often difficult to interpret
  - even moderately heterogeneous data sets will be severely distorted by PCA
  - horseshoe effect
  - implicit use of Euclidean distance
  - strongly affected by outliers
25 linear responses?
26 Horseshoe effect of principal components analysis (PCA)
- The second axis is curved and twisted relative to the first and does not represent a true secondary gradient; the effect occurs with very long gradients
28 principal components analysis (PCA) (Pearson 1901; Goodall 1954)
29 reciprocal averaging (RA)/correspondence analysis (CA) (Hill 1973)
- similar to PCA, but implicitly uses the chi-square distance measure
- the double weighting of species (rows) and stands (columns) distinguishes CA from PCA
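The chi-square distance that CA uses implicitly can be written out directly. This sketch assumes a sites x species table (sites as rows, unlike the species-as-rows orientation mentioned above) and the common definition in which each site is converted to a profile and squared profile differences are weighted by grand total / species total; constant scaling factors differ between textbooks. The function name is hypothetical, and species with a zero total would cause a division by zero.

```python
import math


def chi_square_distance(table, i, k):
    """Chi-square distance between sites i and k of a sites x species table.

    Each site is converted to a profile (row / row total); squared profile
    differences are weighted by grand total / species total, so species with
    small totals (rare species) receive large weights.
    """
    total = sum(sum(row) for row in table)
    col_totals = [sum(row[j] for row in table) for j in range(len(table[0]))]
    ri, rk = sum(table[i]), sum(table[k])
    return math.sqrt(sum(
        (total / cj) * (table[i][j] / ri - table[k][j] / rk) ** 2
        for j, cj in enumerate(col_totals)
    ))


print(chi_square_distance([[1, 1], [2, 0]], 0, 1))  # sqrt(4/3)
```

In the example the second species has a total of only 1, so its profile difference is weighted three times as heavily as the first species'. This weighting is why samples with many rare species can look exaggeratedly distinct.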
30 reciprocal averaging (RA)/correspondence analysis (CA)
- Algorithm (eigenanalysis problem)
1. Assign arbitrary positions to the sites along the axis
2. For each species, take the weighted average of the site scores, using abundance values as weights (species scores)
3. Calculate site scores by weighted averaging of the species scores from step 2 (locating each site on the environmental gradient)
4. Center and standardize the site scores
5. Return to step 2 until convergence
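The reciprocal averaging loop translates almost line for line into code. This is a minimal sketch of the first CA axis only, assuming a sites x species table in which every site and species has a nonzero total, and a starting vector (0, 1, 2, ...) that is not orthogonal to the first axis; the function names are hypothetical. Centering with site totals as weights removes the trivial solution, and once the scores converge, the factor by which one round shrinks them is the eigenvalue.

```python
import math


def _standardize(x, weights, total):
    """Center x on its weighted mean and scale it to unit weighted standard deviation."""
    mean = sum(w * v for w, v in zip(weights, x)) / total
    centered = [v - mean for v in x]
    sd = math.sqrt(sum(w * v * v for w, v in zip(weights, centered)) / total)
    return [v / sd for v in centered], sd


def reciprocal_averaging(table, iters=1000, tol=1e-10):
    """First CA axis of a sites x species table; returns (eigenvalue, site, species scores)."""
    n, m = len(table), len(table[0])
    site_tot = [sum(row) for row in table]
    sp_tot = [sum(table[i][j] for i in range(n)) for j in range(m)]
    grand = sum(site_tot)
    # Step 1: arbitrary starting site scores (0, 1, 2, ...), standardized.
    x, _ = _standardize([float(i) for i in range(n)], site_tot, grand)
    eig = 0.0
    for _ in range(iters):
        # Step 2: species scores = abundance-weighted averages of site scores.
        y = [sum(table[i][j] * x[i] for i in range(n)) / sp_tot[j] for j in range(m)]
        # Step 3: site scores = abundance-weighted averages of species scores.
        x_new = [sum(table[i][j] * y[j] for j in range(m)) / site_tot[i] for i in range(n)]
        # Step 4: center and standardize. Because x had unit SD, the SD before
        # rescaling is the shrinkage factor, which equals the eigenvalue at convergence.
        x_new, eig = _standardize(x_new, site_tot, grand)
        converged = max(abs(a - b) for a, b in zip(x, x_new)) < tol
        x = x_new
        # Step 5: repeat from step 2 until the site scores stop changing.
        if converged:
            break
    return eig, x, y


eig, sites, species = reciprocal_averaging([[2, 1], [1, 2]])
print(eig, sites, species)
```

For the 2 × 2 table [[2, 1], [1, 2]] the eigenvalue comes out as 1/9, the squared phi coefficient of that table, which is what CA should give for a table with a single nontrivial axis.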
31 reciprocal averaging (RA)/correspondence analysis (CA) (Hill 1973)
- Assumptions
- homogeneous distribution of sites along gradients
- equal tolerances of species to the environment
- assumptions regarding species optima (homogeneity and independence of abundance)
- Advantages
- very effective for detecting major environmental gradients
McCune and Grace 2002
32 reciprocal averaging (RA)/correspondence analysis (CA) (Hill 1973)
- Disadvantages
- arch effect on second and higher axes
- compression of the axis ends
- the chi-square distance measure exaggerates the distinctiveness of samples with numerous rare species
- best applied when beta diversity is low
33 arch effect of correspondence analysis (CA) (Hill 1973)
- The second axis is an arched function of the first axis, due to the unimodal distribution of species along gradients
34 correspondence analysis (CA) (Hill 1973)
35 detrended correspondence analysis (DCA) (Hill and Gauch 1980)
- Indirect gradient analysis
- Similar to CA but with two corrections
- One of the more popular methods, but subject to
recent criticism
36 detrended correspondence analysis (DCA) (Hill and Gauch 1980)
- Solve eigenanalysis problem (like CA)
- Detrend axes
- Rescale axes
37 detrended correspondence analysis (DCA) (Hill and Gauch 1980)
- rescaling corrects axis-end compression so that species turn over at a uniform rate along the gradient (is this appropriate?)
- accomplished by Hill's method: the axes are scaled so that species have unit dispersion along axes (beta diversity per unit gradient length is constant)
McCune and Grace 2002
38 Detrending by DCA (Hill and Gauch 1980)
- arch effect removed by detrending the data
- the first axis is divided into segments
- within each segment, the average axis-2 score is set to 0
McCune and Grace 2002
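Detrending by segments can be illustrated with a simplified, after-the-fact version: split axis 1 into equal-width segments and subtract each segment's mean axis-2 score. This is only a sketch of the idea. Full DCA detrends inside the iterative eigenanalysis and includes refinements not shown here, and the function name, the unweighted means, and the default of 4 segments are hypothetical choices. Changing `n_segments` changes the output, which mirrors the "number of segments affects results" criticism.

```python
def detrend_by_segments(axis1, axis2, n_segments=4):
    """Set the mean axis-2 score to 0 within each equal-width segment of axis 1."""
    lo, hi = min(axis1), max(axis1)
    width = (hi - lo) / n_segments
    # Segment index of each site; the maximum value goes into the last segment.
    seg = [min(int((a - lo) / width), n_segments - 1) if width > 0 else 0
           for a in axis1]
    sums = [0.0] * n_segments
    counts = [0] * n_segments
    for s, b in zip(seg, axis2):
        sums[s] += b
        counts[s] += 1
    means = [sums[k] / counts[k] if counts[k] else 0.0 for k in range(n_segments)]
    # Subtract the segment mean so every segment is centered on 0.
    return [b - means[s] for s, b in zip(seg, axis2)]


print(detrend_by_segments([0, 1, 2, 3], [1, 3, 5, 5], n_segments=2))
```

With two segments, sites at axis-1 positions 0 and 1 share a mean axis-2 score of 2, and sites at 2 and 3 share a mean of 5, so the upward trend along axis 1 is removed.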
39 detrended correspondence analysis (DCA) (Hill and Gauch 1980)
McCune and Grace 2002
40 detrended correspondence analysis (DCA) (Hill and Gauch 1980)
- Assumptions
- unimodal species response
- similar maxima and dispersions of species
- Advantages
- popular, well-known
- Disadvantages
- criticized for being a fix of a fix
- number of segments affects results (demonstrated in lab)
- implicit use of the chi-square distance measure
41 detrended correspondence analysis (DCA) (Hill and Gauch 1980)
42 canonical correspondence analysis (CCA) (ter Braak 1986)
- direct gradient analysis; an example of constrained ordination
- the relevant environmental variables are known, but not their relative degree of influence
- axes are constrained to be linear functions of the measured environmental variables
43 canonical correspondence analysis (CCA) (ter Braak 1986)
- 3 scores reported
- species scores
- sample scores as weighted averages of species (WA)
- sample scores as linear combinations (LC) of environmental variables
- Which to interpret? Good question!
- Palmer (1993) recommended using the LC scores because they are best fitted to the environmental variables and are logically most consistent with the premise of CCA.
- McCune (1997) argued that WA scores are more robust because LC scores may be more sensitive to noise in the environmental data.
44 canonical correspondence analysis (CCA) (ter Braak 1986)
- same as CA, but before the species scores are recalculated, the sample weighted average (WA) scores are regressed on the environmental variables and replaced by the fitted values, the Linear Combination (LC) scores
Figure from Dean Urban 2004
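The extra CCA step, turning WA scores into LC scores, is a weighted least-squares regression. To keep the sketch short it assumes a single environmental variable; real CCA regresses on all measured variables at once (weighted multiple regression), with site totals as the weights. The function name `lc_scores` is hypothetical, and the sketch assumes the environmental variable is not constant.

```python
def lc_scores(wa_scores, env, site_totals):
    """LC site scores: fitted values from a weighted least-squares regression of
    WA site scores on ONE environmental variable, weighted by site totals.

    Simplification: real CCA uses weighted multiple regression on all
    measured environmental variables.
    """
    w_sum = sum(site_totals)
    mean_e = sum(w * e for w, e in zip(site_totals, env)) / w_sum
    mean_s = sum(w * s for w, s in zip(site_totals, wa_scores)) / w_sum
    sxy = sum(w * (e - mean_e) * (s - mean_s)
              for w, e, s in zip(site_totals, env, wa_scores))
    sxx = sum(w * (e - mean_e) ** 2 for w, e in zip(site_totals, env))
    slope = sxy / sxx
    intercept = mean_s - slope * mean_e
    # Fitted values: site scores forced to be a linear function of env.
    return [intercept + slope * e for e in env]


print(lc_scores([1, 2, 4], [0, 1, 2], [1, 1, 1]))
```

The fitted values lie exactly on a straight line in the environmental variable, which is what "axes constrained to be linear functions of measured environmental variables" means in practice. Within one iteration, the gap between WA and LC scores is the part of the community pattern the environmental data cannot explain.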
45 canonical correspondence analysis (CCA) (ter Braak 1986)
- Assumptions
- unimodal species response
- influential environmental variables are recorded
- Advantages
- ignores community structure that is unrelated to the environmental variables
- Disadvantages
- lots of outputs; difficult to know what is meaningful
- unimodal species response
46 canonical correspondence analysis (CCA) (ter Braak 1986)
47 non-metric multidimensional scaling (NMS) (Kruskal 1964)
- scaling method (similar to Bray-Curtis polar ordination)
- iterative optimization based on the rank order of dissimilarities
48 non-metric multidimensional scaling (NMS) (Kruskal 1964)
- Algorithm
- calculate the distance matrix
- assign random starting positions
- iteratively tweak positions to minimize the mismatch between the rank order of distances in the original, fully dimensional distance matrix and the distances in the reduced-dimension configuration
- Remember, the point is to find the best positions of your sites on a to-be-determined number of axes so that they duplicate the original distance matrix as closely as possible
49 non-metric multidimensional scaling (NMS) (Kruskal 1964)
- Stress: departure from monotonicity between the original distances and the reduced, ordinated distances
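Stress can be made concrete with Kruskal's stress formula 1. The sketch sorts site pairs by their original dissimilarity, fits the best nondecreasing sequence to the ordinated distances with the pool-adjacent-violators algorithm (monotone regression), and measures the departure from that fit. It assumes both inputs are flat lists of pairwise distances in the same pair order and ignores ties in the original dissimilarities; the function names are hypothetical. Software packages may report stress on a different scale (for example, multiplied by 100).

```python
import math


def isotonic_fit(values):
    """Least-squares nondecreasing fit (pool-adjacent-violators algorithm)."""
    blocks = []  # each block: [sum of values, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while a block's mean exceeds the next block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def kruskal_stress1(original, ordinated):
    """Kruskal stress-1: sqrt(sum (d - dhat)^2 / sum d^2)."""
    # Order pairs by their original (full-dimensional) dissimilarity.
    order = sorted(range(len(original)), key=lambda k: original[k])
    d = [ordinated[k] for k in order]
    dhat = isotonic_fit(d)  # monotone "disparities"
    num = sum((a - b) ** 2 for a, b in zip(d, dhat))
    den = sum(a * a for a in d)
    return math.sqrt(num / den)


print(kruskal_stress1([1, 2, 3], [0.5, 1.0, 2.0]))  # ranks preserved: 0.0
print(kruskal_stress1([1, 2, 3], [1.0, 3.0, 2.0]))  # one rank swap: sqrt(0.5 / 14)
```

Only the rank order of the original dissimilarities enters the calculation, which is why NMS can be used with almost any distance measure.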
50 non-metric multidimensional scaling (NMS) (Kruskal 1964)
- Assumptions?
- Advantages
- any distance measure (sort of; we'll demonstrate this in lab)
- no assumption of linearity between variables
- ranked distances good for reducing stress
- Disadvantages
- computationally intensive
- local vs. global minima
- solution dependent on the number of axes selected
51 non-metric multidimensional scaling (NMS) (Kruskal 1964)
52 choosing the right method
- as demonstrated, each method has advantages and disadvantages
- consider method assumptions: can you be comfortable making these statements about your data?
- run several different methods: do the results generally correspond?
53 choosing the right method
- run several different methods: do the results generally correspond?
- compare a direct and an indirect gradient method
Wentworth's theory
54 brief look at outputs
- more comprehensive discussion during lab
- lots of outputs and differences between methods
55 another look at outputs: biplots
- closeness of samples (sites) on ordination plots indicates similarity
- vectors illustrate relationships of environmental variables to species; the relative length of a vector indicates the relative strength of the relationship, and its orientation indicates the direction
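One common convention for drawing an environmental vector on an ordination plot is to use the variable's correlation with each axis as the vector's end point, so longer vectors mean stronger relationships. The sketch below assumes that convention (programs differ in how they scale vectors), and its function names and data are hypothetical. It also assumes neither the variable nor the axis scores are constant.

```python
import math


def pearson(x, y):
    """Pearson correlation; assumes neither x nor y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def biplot_vector(env, axis1_scores, axis2_scores):
    """End point and length of one environmental vector (correlation with each axis)."""
    r1 = pearson(env, axis1_scores)
    r2 = pearson(env, axis2_scores)
    return r1, r2, math.hypot(r1, r2)  # (x end, y end, vector length)


# Hypothetical: an environmental variable measured at 3 sites with known axis scores.
print(biplot_vector([1, 2, 3], [2, 4, 6], [1, 0, 1]))
```

Here the variable tracks axis 1 perfectly and is uncorrelated with axis 2, so its vector points straight along axis 1 with length 1.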
56 some references
- Dean Urban (2004). ENV358 lecture notes. http://www.nicholas.duke.edu/landscape/classes/env358
- McCune and Grace (2002). Analysis of Ecological Communities. http://home.centurytel.net/mjm/
- Mike Palmer (OK State). The Ordination Website. http://www.okstate.edu/artsci/botany/ordinate