Vamsi - PowerPoint PPT Presentation

About This Presentation

Title:

Vamsi

Slides: 43

Provided by: VamsiBala

Learn more at: http://webpages.math.luc.edu

Transcript and Presenter's Notes

1
Canonical Correspondence Analysis (CCA) and Other
Techniques

Vamsi
Sundus
Shawnalee

2
What is CCA?

Commonly used by researchers trying to
understand the relationship between community
composition and environmental factors.
Or, more generally, comparing/testing one
multivariate dataset against a second one.
Like DECORANA (the last presentation), it is based
on correspondence analysis (an ordination
technique).

3
CCA Purpose?

To incorporate environmental data into the
ordination so that a better final ordination
diagram can be created.

4
What's needed (Part I)

Dependent matrix: contains the data to be ordinated,
usually composed of population estimates for a
number of species.
Environmental matrix: describes environmental
conditions. Must contain the same number of rows
(observations) as the species data, but must have
fewer columns than the number of observations.

5
Problems

Just like correspondence analysis, an arching
effect may be found resulting in the second
ordination axis being a distortion of the
first.
We eliminated this previously using a detrended
technique.

7
DCCA

In the same manner, CCA has a detrended variant,
detrended canonical correspondence analysis (DCCA),
that uses essentially the same algorithm to detrend
the second ordination axis and eliminate the arch
effect.

8
Complicated

Canonical correspondence analysis can be
considered a form of direct ordination,
although it is much more complicated than
conventional examples of direct ordination, being
a hybrid of direct and indirect ordination.

9
What's needed (Part II)

Data must be collected from the same place at the
same time.
Autoregressive error?
If not collected together --> error of
pseudoreplication.

10
Pseudoreplication (Reteaching)

I forgot.
Let's say we want to observe the effects of a
drug on estrus (monthly period cycle).
Let n = 100, with n1 = 50 and n2 = 50, so n = n1 + n2.
Trt A, Trt B
Have all mice in same room.

11
Problems with this design

Inherent in this design are problems
Chemical cues for setting cycle.
One mouse influences the next.
Like in colleges.
Pseudoreplication: data that appear independent
but are not.

12
Back to CCA

End of digression.

13
Canonical

Definition
Whenever used in this field (multivariate
analysis), it means something is being optimized
against some other constraint.

14
The Steps

The only major difference between (regular)
correspondence analysis and canonical
correspondence analysis is the addition of two steps.

15
Step 1 - CA

Start with a random weighting for each species. It is
fine to use a scale from 0.0 to 100.0 in whatever
increments are needed.
In our case, we'll use (0, 50, 100) for species (A, B, C).
Use this formula for the nth species rank.

16
Step 2 - CA

Use the starter weights (which are essentially
arbitrary) to compute a weighting for each of
the years.

Year  Counts A  Counts B  Counts C         Y1
   1       100         0         0  -->    0.0
   2        90        10         0  -->    5.0
   3        80        20         5  -->   14.3
   4        60        35        10  -->   26.2
   5        50        50        20  -->   37.5
   6        40        60        30  -->   46.2
   7        20        30        40  -->   61.1
   8         5        20        60  -->   82.4
   9         0        10        75  -->   94.1
  10         0         0        90  -->  100.0
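
The Y1 column is consistent with a count-weighted average of the species weights (0, 50, 100). A minimal Python sketch, assuming that reading of the table; the function name year_scores is illustrative and not from the slides:

```python
# Starting species weights for A, B, C (Step 1).
species_weights = [0.0, 50.0, 100.0]

# Counts of species A, B, C in years 1..10 (from the table).
counts = [
    (100, 0, 0), (90, 10, 0), (80, 20, 5), (60, 35, 10), (50, 50, 20),
    (40, 60, 30), (20, 30, 40), (5, 20, 60), (0, 10, 75), (0, 0, 90),
]

def year_scores(counts, weights):
    """Score each year as the average species weight, weighted by counts."""
    scores = []
    for row in counts:
        total = sum(row)
        scores.append(sum(c * w for c, w in zip(row, weights)) / total)
    return scores

y1 = year_scores(counts, species_weights)
print([round(y, 1) for y in y1])
```

For example, year 3 gives (80*0 + 20*50 + 5*100) / 105 = 14.3, matching the table.
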
17
Step 3

We can now calculate a new weighting for each
species using these new year weightings.
For species A, take the average of the year
weightings, weighted by A's count in each year.
Calculate similarly for B, C.

                                        A      B      C
S10 (old weightings for species)        0     50    100
S1a (new calculated weightings)      19.1   43.9   78.5
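
The new species weightings can be reproduced by averaging the year weightings with each species' counts as weights. A short sketch under that assumption; the helper names are illustrative:

```python
# Counts of species A, B, C in years 1..10 (from the Step 2 table).
counts = [
    (100, 0, 0), (90, 10, 0), (80, 20, 5), (60, 35, 10), (50, 50, 20),
    (40, 60, 30), (20, 30, 40), (5, 20, 60), (0, 10, 75), (0, 0, 90),
]

def weighted_avg(values, weights):
    """Average of values, weighted by weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Year weightings from the starting species weights (0, 50, 100), unrounded.
year_w = [weighted_avg([0.0, 50.0, 100.0], row) for row in counts]

# New weighting for each species: year weightings averaged by that species' counts.
new_species_w = [
    weighted_avg(year_w, [row[j] for row in counts]) for j in range(3)
]
print([round(w, 1) for w in new_species_w])
```

Rounded to one decimal, the three values agree with the S1a row (19.1, 43.9, 78.5).
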
18
Step 4

These new weightings for each species, though,
aren't that useful as they stand, so we need to
rescale them back to 0 to 100, instead of the
current 19.1 to 78.5.
To do this, simply rescale linearly so that the
smallest value maps to 0 and the largest to 100.

S1a 19.1 43.9 78.5
19
Step 4 cont.

So, after computing the rescaled values, we find
the following:

          A       B       C
S10       0      50     100
S1a    19.1    43.9    78.5
S1b    0.00   41.75  100.00
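
The S1b row is what a linear (min-max) rescaling of S1a produces. A minimal sketch, assuming that is the rescaling the slides intend; rescale_0_100 is an illustrative name:

```python
def rescale_0_100(values):
    """Linearly map the smallest value to 0 and the largest to 100."""
    lo, hi = min(values), max(values)
    return [100.0 * ((v - lo) / (hi - lo)) for v in values]

s1a = [19.1, 43.9, 78.5]
s1b = rescale_0_100(s1a)
print([round(v, 2) for v in s1b])
```

The middle value is 100 * (43.9 - 19.1) / (78.5 - 19.1) = 41.75.
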
20
Step 5

This is now one cycle of the CA completed.
Weightings for each year are recalculated using
the new, rescaled weightings for the species.
Eventually a stable pattern will emerge,
typically after 10-20 iterations.
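
Steps 2 to 5 can be put in a loop. The sketch below assumes the weighted-average and min-max rescaling readings of the earlier steps; the stopping tolerance and the function names are choices made here, not values from the slides:

```python
counts = [
    (100, 0, 0), (90, 10, 0), (80, 20, 5), (60, 35, 10), (50, 50, 20),
    (40, 60, 30), (20, 30, 40), (5, 20, 60), (0, 10, 75), (0, 0, 90),
]

def weighted_avg(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def rescale_0_100(values):
    lo, hi = min(values), max(values)
    return [100.0 * ((v - lo) / (hi - lo)) for v in values]

def reciprocal_averaging(counts, species_w, tol=1e-6, max_iter=100):
    """Repeat Steps 2-4 until the species weightings stop changing."""
    n_species = len(species_w)
    for iteration in range(1, max_iter + 1):
        # Step 2: year weightings from the current species weightings.
        year_w = [weighted_avg(species_w, row) for row in counts]
        # Step 3: new species weightings from the year weightings.
        raw = [weighted_avg(year_w, [row[j] for row in counts])
               for j in range(n_species)]
        # Step 4: rescale back to 0..100.
        new_w = rescale_0_100(raw)
        change = max(abs(a - b) for a, b in zip(new_w, species_w))
        species_w = new_w
        if change < tol:
            break
    return species_w, year_w, iteration

final_w, final_years, n_iter = reciprocal_averaging(counts, [0.0, 50.0, 100.0])
print([round(w, 2) for w in final_w], n_iter)
```

Because A and C are pinned to 0 and 100 by the rescaling, only the weighting of B moves between cycles, and it settles a little below its first-cycle value of 41.75.
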

21
CA vs. CCA

CA
1. Start with arbitrary but unequal site scores.
2. Calculate species scores as weighted average of
site scores.
3. Calculate new site scores as weighted average of
species scores.
4. Standardize.
5. Stop if acceptable, otherwise iterate from step 2.

CCA
1. Start with arbitrary but unequal site scores.
2. Calculate species scores as weighted average of
site scores.
3. Calculate new site scores as weighted average of
species scores.
4. Perform multiple regression of site scores on
environmental variables.
5. Use multiple regression to derive new predicted
values.
6. Standardize.
7. Stop if acceptable, else iterate from step 2.
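
The CCA loop can be sketched for the simplest case of a single environmental variable, where the multiple regression reduces to a weighted simple regression. Everything below is an illustration under stated assumptions: the env values are hypothetical (the slides give no environmental data), site totals are used as regression weights, and a fixed number of passes replaces the "stop if acceptable" check.

```python
counts = [
    (100, 0, 0), (90, 10, 0), (80, 20, 5), (60, 35, 10), (50, 50, 20),
    (40, 60, 30), (20, 30, 40), (5, 20, 60), (0, 10, 75), (0, 0, 90),
]
# Hypothetical environmental variable, one value per site (made up for illustration).
env = [1.0, 1.5, 2.1, 3.0, 3.8, 4.1, 5.5, 6.9, 7.4, 8.0]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def standardize(scores, weights):
    """Weighted mean 0 and weighted variance 1."""
    mean = weighted_mean(scores, weights)
    var = weighted_mean([(s - mean) ** 2 for s in scores], weights)
    return [(s - mean) / var ** 0.5 for s in scores]

def regress_on_env(site_scores, env, weights):
    """Weighted regression of site scores on env; returns the fitted values."""
    e_mean = weighted_mean(env, weights)
    s_mean = weighted_mean(site_scores, weights)
    cov = weighted_mean(
        [(e - e_mean) * (s - s_mean) for e, s in zip(env, site_scores)], weights)
    var = weighted_mean([(e - e_mean) ** 2 for e in env], weights)
    return [s_mean + (cov / var) * (e - e_mean) for e in env]

def cca_one_axis(counts, env, n_passes=20):
    totals = [sum(row) for row in counts]
    n_species = len(counts[0])
    site_scores = [float(i) for i in range(len(counts))]      # step 1
    for _ in range(n_passes):
        species_scores = [                                     # step 2
            weighted_mean(site_scores, [row[j] for row in counts])
            for j in range(n_species)]
        site_scores = [weighted_mean(species_scores, row)      # step 3
                       for row in counts]
        site_scores = regress_on_env(site_scores, env, totals)  # steps 4-5
        site_scores = standardize(site_scores, totals)         # step 6
    return site_scores, species_scores

site_scores, species_scores = cca_one_axis(counts, env)
print([round(s, 2) for s in site_scores])
```

With only one environmental variable, the fitted site scores are always a linear function of that variable, so after standardizing they are just a rescaled copy of it and the loop settles after the first pass. With several variables, the regression step would need a proper multiple-regression solver, which is where the constrained axis becomes non-trivial.
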
23
Other Techniques

There are many other techniques that are
available for multivariate analysis.
COR
CVA
FA
MDS
MRPP
MANCOVA
MANOVA
NMS
NMDS
Procrustes Rotation
RDA
PRC

24
COR

Canonical Correlation Analysis
Similar to CCA.
Continuation of the progression from bivariate to
multiple linear regression.
Bivariate: 1 independent variable to explain 1 dependent
Multivariate: n independent variables to explain 1
dependent
Canonical: n independent variables to explain m dependent

25
COR (cont.)

Major difference is in the limitations.
COR: (number of species + number of environmental
variables) < number of sites.
CCA has a weaker requirement:
(number of environmental variables alone) < number
of observations.
Both result in similar outputs. CCA is preferred
(its limit on the allowable number of variables
is easier to meet).
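
The two limits can be written as simple checks. This is only a restatement of the inequalities above; the function names and the example numbers are hypothetical:

```python
def cor_allowed(n_species, n_env, n_sites):
    """COR: species plus environmental variables must be fewer than sites."""
    return n_species + n_env < n_sites

def cca_allowed(n_env, n_sites):
    """CCA: only the environmental variables must be fewer than sites."""
    return n_env < n_sites

# Hypothetical survey: 30 species, 5 environmental variables, 20 sites.
print(cor_allowed(30, 5, 20))  # COR limit is not met
print(cca_allowed(5, 20))      # CCA limit is met
```

A survey with more species than sites, which is common in community data, fails the COR limit but can still meet the CCA limit.
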

26
CVA

Canonical Variates Analysis
Purpose: generate a score for each individual
which, using a one-way ANOVA by category, would
return the highest possible F value.
Maximizes between-group variance relative to
within-group variance --> hence canonical.
Limitations: multivariate normality; categories
need to be known a priori.
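
The F value that CVA tries to maximize is the ordinary one-way ANOVA statistic computed on the scores. The sketch below only evaluates F for a given set of scores grouped by category; it does not perform the CVA optimization itself, and the example scores are made up:

```python
def one_way_f(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Two hypothetical sets of scores for the same individuals in two categories.
well_separated = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
overlapping = [[1.0, 4.0, 5.0], [2.0, 3.0, 6.0]]
print(one_way_f(well_separated), one_way_f(overlapping))
```

A scoring that separates the categories cleanly gives a much larger F than one where the categories overlap, which is the quantity the canonical variate is chosen to maximize.
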

27
FA

Factor Analysis is used as a synonym for PCA
(principal component analysis) in the US.
How it began:
School students' scores in Classics, French,
English, Math, Discrimination of Pitch, and Music.
Abilities in each were attributed to a smaller
number of fundamental skills (factors).
Derive absolute parameter estimates.

28
FA (cont.)

Fn = value of nth factor
Lambda jn = loading of variable j on factor n
ej = residual for variable j
P = number of variables
M = number of factors
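
The equation these symbols belong to is not in the transcript. The standard common factor model is consistent with the definitions above and is given here as a likely reading, not as the slide's exact formula; the name x_j for the observed value of variable j is an assumption:

```latex
x_j = \sum_{n=1}^{M} \lambda_{jn} F_n + e_j, \qquad j = 1, \dots, P
```

Each of the P observed variables is modelled as a weighted sum of the M factor values plus a residual specific to that variable.
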
29
FA (cont)

FA becomes an eigenvector problem, hence it is
similar to PCA (eigenanalysis of correlation matrix).
The results are difficult to interpret and
based on assumptions that are probably invalid.
FA is not worth the time necessary to understand
and perform it. (Hills 1977)

30
MDS

Multidimensional Scaling
Takes square matrix of distances between
individuals and recreates maps
Discussed previously

31
MRPP

Multiresponse Permutation Procedure
Assesses the probability that two or more groups
consisting of multivariate data differ
Different from normal multivariate ANOVA in that
it is non-parametric --> can be used on biological
data without worrying about multivariate
normality.
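
A small sketch of the idea for two groups, assuming Euclidean distance and group-size weighting for the test statistic (one common choice; other weightings exist). Instead of random permutations it enumerates every possible regrouping, which is only practical for tiny data sets. The points are hypothetical:

```python
from itertools import combinations
from math import dist

def mean_pairwise_distance(points):
    pairs = list(combinations(points, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def mrpp_delta(group_a, group_b):
    """Size-weighted average of the within-group mean distances."""
    n_a, n_b = len(group_a), len(group_b)
    return (n_a * mean_pairwise_distance(group_a)
            + n_b * mean_pairwise_distance(group_b)) / (n_a + n_b)

def mrpp_exact_p(group_a, group_b):
    """Share of all regroupings whose delta is as small as the observed one."""
    pooled = group_a + group_b
    observed = mrpp_delta(group_a, group_b)
    indices = range(len(pooled))
    as_small = total = 0
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        if mrpp_delta(a, b) <= observed + 1e-12:
            as_small += 1
    return observed, as_small / total

# Hypothetical two-variable observations for two clearly separated groups.
group_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
group_b = [(10, 10), (10, 11), (11, 10), (11, 11)]
delta, p_value = mrpp_exact_p(group_a, group_b)
print(round(delta, 3), round(p_value, 4))
```

No distributional assumption enters the calculation: the observed grouping is judged only against the other ways the same observations could have been grouped.
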

32
MANCOVA

Multivariate Analysis of Covariance
Multivariate equivalent of ANCOVA
Assumption of normality
Lacks a non-parametric equivalent, though

33
MANOVA

Multivariate ANOVA
Analogous to univariate ANOVA --> provides an estimate
of the probability that the observed pattern
arises from random data.
Each mean is treated as a coordinate in
multivariate space.
Used specifically in assessing whether an
overall response has occurred, but will not
identify which variables contributed to the
treatment effect if significance is found.
Requires normality; otherwise, use MRPP.

34
NMS, NMDS

Non-metric multidimensional scaling
Ordinal scaling
Square distance matrix --> map reconstructed
Differs from other multivariate techniques

35
NMS, NMDS (cont)

Differs from other multivariate techniques:
Uses only one distance measure, derived from
ranked differences between individuals.
So, it can be used with non-normal, discontinuous or
questionable distributions.
Ordination axes will differ according to how
many axes are requested.
Where two or more ordination axes are requested,
the first axis need not be more important than
the second or higher axes --> axis numbering is
arbitrary.
There is a lot of subjectivity in the choice
of axes, hence the technique is not used that often.

36
Procrustes Rotation

Compares two different ordinations applied to the
same data.
Has an m2 statistic (residual sum of squares) to
assess the fit after Procrustes operations have been
applied.
No significance test
No clear guidelines to interpret m2 values
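
A minimal sketch of the m2 statistic for two-dimensional ordinations. It assumes both configurations are centered and scaled to unit sum of squares, and that only rotation and rescaling are allowed (no reflection); the point sets and function names are illustrative:

```python
from math import cos, sin, radians

def center_and_scale(points):
    """Center on the centroid and scale to unit sum of squares."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    norm = sum(x * x + y * y for x, y in centered) ** 0.5
    return [(x / norm, y / norm) for x, y in centered]

def procrustes_m2(config_a, config_b):
    """Residual sum of squares after the best rotation and scaling of b onto a."""
    a = center_and_scale(config_a)
    b = center_and_scale(config_b)
    dot = sum(xa * xb + ya * yb for (xa, ya), (xb, yb) in zip(a, b))
    cross = sum(xa * yb - ya * xb for (xa, ya), (xb, yb) in zip(a, b))
    return 1.0 - (dot * dot + cross * cross)

# Hypothetical ordination of four samples.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
# The same configuration rotated by 30 degrees, enlarged and shifted.
t = radians(30)
moved = [(2.5 * (x * cos(t) - y * sin(t)) + 3.0,
          2.5 * (x * sin(t) + y * cos(t)) - 1.0) for x, y in square]
# A different configuration of the same four samples.
other = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (2.0, 1.0)]

print(round(procrustes_m2(square, moved), 6))
print(round(procrustes_m2(square, other), 6))
```

With this scaling m2 runs from 0 (the two ordinations match exactly after rotation) towards 1 (no agreement), which illustrates the interpretation problem noted above: the value is easy to compute, but there is no agreed threshold for what counts as a good match.
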

37
Procrustes Rotation

The name is derived from Greek mythology:
an innkeeper who ensured all his customers fitted
his bed perfectly by stretching them or
chopping their feet off.

38
RDA

Redundancy Analysis
Derivative of PCA with a bonus feature:
values entered into the analysis aren't the original
data but the best-fit values estimated from a multiple
linear regression between each variable and a
second matrix of environmental data.
Thus, this is a canonical version of PCA,
constrained to optimally correlate with another
dataset.
Interpretation is by biplot.
Collinearity, which is likely in biological data,
makes canonical coefficients unreliable.
RDA is the technique that underlies PRC.
39
PRC

Principal response curves
New technique (1999)
Derived from RDA and specifically intended to help
interpret planned experiments on biological
communities.
Two treatments, one of which is a control
Repeated sampling
<not enough details>

42
END
