Vamsi - PowerPoint PPT Presentation

About This Presentation
Title:

Vamsi

Description:

Canonical Correspondence Analysis (CCA) And Other techniques Vamsi Sundus Shawnalee NMS, NMDS Non-metric multidimensional scaling Ordinal scaling Square distance ... – PowerPoint PPT presentation

Slides: 43
Provided by: VamsiBala

Transcript and Presenter's Notes


1
Canonical Correspondence Analysis (CCA) and Other
Techniques
  • Vamsi
  • Sundus
  • Shawnalee

2
What is CCA?
  • Commonly used by researchers trying to
    understand the relationship between community
    composition and environmental factors.
  • Or, more generally, comparing/testing one
    multivariate dataset against a second one.
  • Like DECORANA (the last presentation), it is
    based on correspondence analysis (an ordination
    technique).

3
CCA Purpose?
  • To incorporate environmental data into the
    ordination so that a better final ordination
    diagram can be created.

4
What's needed (Part I)
  1. Dependent matrix: contains the data to be
    ordinated, usually composed of population
    estimates for a number of species.
  2. Environmental matrix: describes environmental
    conditions. Must contain the same number of rows
    (observations) as the species data, but must have
    fewer columns than the number of observations.

5
Problems
  • Just like correspondence analysis, an arching
    effect may be found resulting in the second
    ordination axis being a distortion of the
    first.
  • We eliminated this previously using a detrended
    technique.

7
DCCA
  • In the same manner, CCA has a detrended version,
    detrended canonical correspondence analysis
    (DCCA), that uses essentially the same algorithm
    to detrend the second ordination axis and
    eliminate the arch effect.

8
Complicated
  • Canonical correspondence analysis can be
    considered a form of direct ordination, although
    it is much more complicated than conventional
    examples of direct ordination, being a hybrid of
    direct and indirect ordination.

9
What's needed (Part II)
  • Data must be collected from the same place at the
    same time.
  • Autoregressive error?
  • If not collected together → error of
    pseudoreplication.

10
Pseudoreplication (Reteaching)
  • I forgot.
  • Let's say we want to observe the effects of a
    drug on estrus (monthly period cycle).
  • Let n = 100, with n1 = 50 and n2 = 50, so that
    n = n1 + n2.
  • Trt A, Trt B
  • Have all mice in same room.

11
Problems with this design
  • Inherent in this design are problems
  • Chemical cues for setting cycle.
  • One mouse influences the next.
  • Like in colleges.
  • Pseudoreplication: data that appear independent
    but are not.

12
Back to CCA
  • End of digression.

13
Canonical
  • Definition
  • Whenever used in this field (multivariate
    analysis), means something is being optimized
    against some other constraint.

14
The Steps
  • The only major difference between (regular)
    correspondence analysis and canonical
    correspondence analysis is the addition of two
    steps.

15
Step 1 - CA
  • Start with a random weighting. It is fine to use
    values from 0.0 to 100.0 in whatever increments
    are needed.
  • In our case, we'll use (0, 50, 100) for (A, B, C).
  • Use this formula for nth species rank

16
Step 2 - CA
  • Use the starter weights (which are arbitrary
    essentially) and compute a weighting for each of
    the years

Year   Count A   Count B   Count C         Y1
 1       100        0         0     -->    0.0
 2        90       10         0     -->    5.0
 3        80       20         5     -->   14.3
 4        60       35        10     -->   26.2
 5        50       50        20     -->   37.5
 6        40       60        30     -->   46.2
 7        20       30        40     -->   61.1
 8         5       20        60     -->   82.4
 9         0       10        75     -->   94.1
10         0        0        90     -->  100.0
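
The Y1 column can be reproduced as a count-weighted average of the starter species weights. The following is a minimal sketch, assuming the three count columns are species A, B and C in that order (the function name is only illustrative):

```python
# Counts per year for species A, B, C (taken from the table above).
counts = [
    (100, 0, 0), (90, 10, 0), (80, 20, 5), (60, 35, 10), (50, 50, 20),
    (40, 60, 30), (20, 30, 40), (5, 20, 60), (0, 10, 75), (0, 0, 90),
]
species_weights = (0.0, 50.0, 100.0)  # starter weights for A, B, C


def year_weighting(row, weights):
    """Average of the species weights, weighted by the counts in one year."""
    return sum(c * w for c, w in zip(row, weights)) / sum(row)


y1 = [year_weighting(row, species_weights) for row in counts]
print([round(v, 1) for v in y1])
# -> [0.0, 5.0, 14.3, 26.2, 37.5, 46.2, 61.1, 82.4, 94.1, 100.0]
```

For example, year 3 gives (80*0 + 20*50 + 5*100) / 105 = 14.3.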
17
Step 3
  • We can now calculate a new weighting for each
    species using these new year weightings.
  • Calculate similarly for B, C

          A      B      C
S10       0     50    100     (old weightings for species)
S1a    19.1   43.9   78.5     (new calculated weightings for species)
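
The new species weightings appear to be count-weighted averages of the year weightings, this time taken down each species column. A minimal sketch under that assumption (names are illustrative, and the year weightings are recomputed here so the block stands alone):

```python
# Counts per year for species A, B, C (from the Step 2 table).
counts = [
    (100, 0, 0), (90, 10, 0), (80, 20, 5), (60, 35, 10), (50, 50, 20),
    (40, 60, 30), (20, 30, 40), (5, 20, 60), (0, 10, 75), (0, 0, 90),
]
old_weights = (0.0, 50.0, 100.0)

# Year weightings: count-weighted average of the old species weights.
y1 = [sum(c * w for c, w in zip(row, old_weights)) / sum(row) for row in counts]


def species_weighting(j, counts, year_scores):
    """Average of the year weightings, weighted by the counts of species j."""
    column = [row[j] for row in counts]
    return sum(c * y for c, y in zip(column, year_scores)) / sum(column)


new_weights = [species_weighting(j, counts, y1) for j in range(3)]
print([round(w, 1) for w in new_weights])
# -> [19.1, 43.9, 78.5]
```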
18
Step 4
  • These new weightings for each species aren't
    that useful as they stand, so we need to rescale
    them back to the range 0 to 100, instead of the
    current 19.1 to 78.5.
  • To do this, simply use a linear rescaling: the
    smallest value maps to 0 and the largest to 100.

S1a   19.1   43.9   78.5
19
Step 4 cont.
  • So, after computing the rescaled values, we find
    the following

S10    0       50       100
S1a   19.1     43.9      78.5
S1b    0.00    41.75    100.00
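
The rescaled row is consistent with a simple min-max rescaling of the rounded S1a values. A small sketch, assuming that is the method intended (the `rescale` helper is hypothetical):

```python
def rescale(values, lo=0.0, hi=100.0):
    """Linearly map the smallest value to lo and the largest to hi."""
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


s1a = [19.1, 43.9, 78.5]  # new species weightings before rescaling
s1b = rescale(s1a)
print([round(v, 2) for v in s1b])
# -> [0.0, 41.75, 100.0]
```

The middle value is (43.9 - 19.1) / (78.5 - 19.1) * 100 = 41.75.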
20
Step 5
  • This is now one cycle of the CA completed.
  • Weightings for each year are recalculated using
    the new, rescaled weightings for the species.
  • Eventually a stable pattern will emerge.
  • This typically takes 10-20 iterations.
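
Putting Steps 2 to 5 together, the whole cycle can be repeated until the species weightings stop changing. This is only a sketch of that loop for the example data; the tolerance, the iteration cap and the function names are assumptions, not part of the slides:

```python
# Counts per year for species A, B, C (from the Step 2 table).
counts = [
    (100, 0, 0), (90, 10, 0), (80, 20, 5), (60, 35, 10), (50, 50, 20),
    (40, 60, 30), (20, 30, 40), (5, 20, 60), (0, 10, 75), (0, 0, 90),
]


def weighted_average(abundances, scores):
    return sum(a * s for a, s in zip(abundances, scores)) / sum(abundances)


def rescale(values, lo=0.0, hi=100.0):
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


def ca_first_axis(counts, species_scores, tol=1e-6, max_iter=1000):
    """Repeat the weighted-averaging cycle until the species scores are stable."""
    columns = [[row[j] for row in counts] for j in range(len(counts[0]))]
    for iteration in range(1, max_iter + 1):
        # Step 2: year (site) weightings from the current species weightings.
        site_scores = [weighted_average(row, species_scores) for row in counts]
        # Steps 3-4: new species weightings, rescaled back to 0..100.
        new_scores = rescale([weighted_average(col, site_scores) for col in columns])
        change = max(abs(a - b) for a, b in zip(new_scores, species_scores))
        species_scores = new_scores
        if change < tol:  # Step 5: stop once a stable pattern has emerged.
            break
    return species_scores, site_scores, iteration


species, sites, n_iter = ca_first_axis(counts, [0.0, 50.0, 100.0])
print([round(s, 2) for s in species], n_iter)
```

Because the scores are rescaled every cycle, A stays at 0 and C at 100 in this example; only the position of B between them changes from one iteration to the next.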

21
CA vs. CCA

CA
  1. Start with arbitrary but unequal site scores.
  2. Calculate species scores as weighted average of
    site scores.
  3. Calculate new site scores as weighted average of
    species scores.
  4. Standardize.
  5. Stop if acceptable; otherwise iterate from
    step 2.

CCA
  1. Start with arbitrary but unequal site scores.
  2. Calculate species scores as weighted average of
    site scores.
  3. Calculate new site scores as weighted average of
    species scores.
  4. Perform multiple regression of site scores on
    environmental variables. Use the regression to
    derive new predicted values.
  5. Standardize.
  6. Stop if acceptable; otherwise iterate from
    step 2.
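
The extra regression step can be illustrated with a deliberately simplified sketch. It assumes a single environmental variable (the year number from the earlier example, used here purely for illustration), ordinary unweighted least squares, the 0-100 rescaling from the CA example as the "standardize" step, and a fixed number of iterations instead of a stopping test. Full CCA implementations typically use a weighted multiple regression, so treat this as a picture of the loop rather than a reference implementation:

```python
# Counts per site (year) for species A, B, C, reused from the CA example.
counts = [
    (100, 0, 0), (90, 10, 0), (80, 20, 5), (60, 35, 10), (50, 50, 20),
    (40, 60, 30), (20, 30, 40), (5, 20, 60), (0, 10, 75), (0, 0, 90),
]
env = list(range(1, 11))  # assumption: year number as the only environmental variable


def weighted_average(abundances, scores):
    return sum(a * s for a, s in zip(abundances, scores)) / sum(abundances)


def rescale(values, lo=0.0, hi=100.0):
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


def regress_predict(y, x):
    """Fit y = a + b*x by ordinary least squares and return the fitted values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return [a + b * xi for xi in x]


def cca_first_axis(counts, env, n_iter=20):
    n_sites = len(counts)
    columns = [[row[j] for row in counts] for j in range(len(counts[0]))]
    site_scores = [100.0 * i / (n_sites - 1) for i in range(n_sites)]  # step 1
    for _ in range(n_iter):
        species_scores = [weighted_average(col, site_scores) for col in columns]  # step 2
        site_scores = [weighted_average(row, species_scores) for row in counts]   # step 3
        site_scores = regress_predict(site_scores, env)                           # step 4
        site_scores = rescale(site_scores)                                        # step 5
    return species_scores, site_scores


species_scores, site_scores = cca_first_axis(counts, env)
print([round(s, 1) for s in site_scores])
```

With only one environmental variable the constrained site scores are forced to be a linear function of it, which is why they end up evenly spaced here; with several variables the regression step selects the best linear combination.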
23
Other Techniques
  • There are many other techniques that are
    available for multivariate analysis.
  • COR
  • CVA
  • FA
  • MDS
  • MRPP
  • MANCOVA
  • MANOVA
  • NMS
  • NMDS
  • Procrustes Rotation
  • RDA
  • PRC

24
COR
  • Canonical Correlation Analysis
  • Similar to CCA.
  • Continuation of the progression from bivariate to
    multiple linear regression.
  • Bivariate: 1 independent variable to explain 1
    dependent variable
  • Multiple: n independent variables to explain 1
    dependent variable
  • Canonical: n independent variables to explain m
    dependent variables

25
COR (cont.)
  • Major difference is in the limitations:
  • (Number of species + number of environmental
    variables) < number of sites. // COR
  • Weaker requirement for CCA:
  • (Number of environmental variables alone) <
    number of observations. // CCA
  • Both result in similar outputs. CCA is preferred
    (its limit on the allowable number of variables
    is easier to meet).
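
The two limits can be written as simple checks. This is a sketch only; the function names and the example dimensions (30 species, 5 environmental variables, 20 sites) are hypothetical:

```python
def cor_is_allowed(n_species, n_env, n_sites):
    """COR: species plus environmental variables must be fewer than sites."""
    return n_species + n_env < n_sites


def cca_is_allowed(n_env, n_sites):
    """CCA: only the environmental variables must be fewer than observations."""
    return n_env < n_sites


# Hypothetical survey: 30 species, 5 environmental variables, 20 sites.
print(cor_is_allowed(30, 5, 20))  # -> False
print(cca_is_allowed(5, 20))      # -> True
```

A typical community data set has more species than sites, which is why the COR condition often fails where the CCA condition still holds.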

26
CVA
  • Canonical Variates Analysis
  • Purpose: generate a score for each individual
    which, in a one-way ANOVA by category, would
    return the highest possible F value.
  • Maximize variance within dataset → hence
    canonical.
  • Limitations: multivariate normality; categories
    need to be known a priori.

27
FA
  • Factor Analysis is used as a synonym for PCA
    (principal component analysis) in the US.
  • How it began:
  • School students' scores in Classics, French,
    English, Math, Discrimination of Pitch, and Music.
  • Abilities in each are due to a smaller number of
    fundamental skills (factors).
  • Derive absolute parameter estimates.

28
FA (cont.)
  • F_n = value of the nth factor
  • λ_jn = loading of variable j on factor n
  • e_j = residual for variable j
  • P = number of variables
  • M = number of factors
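
The equation itself did not survive in the transcript. The standard factor model that matches these symbols would read as follows, where X_j (the observed value of variable j) is a symbol assumed here and does not appear in the transcript:

```latex
X_j = \sum_{n=1}^{M} \lambda_{jn} F_n + e_j, \qquad j = 1, \dots, P
```

Each of the P observed variables is modelled as a weighted sum of the M factors plus a variable-specific residual.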
29
FA (cont)
  • FA becomes an eigenvector problem, hence similar
    to PCA (eigenanalysis of the correlation matrix).
  • The results are difficult to interpret and
    based on assumptions that are probably invalid.
  • FA is not worth the time necessary to understand
    and perform it. (Hills 1977)

30
MDS
  • Multidimensional Scaling
  • Takes square matrix of distances between
    individuals and recreates maps
  • Discussed previously

31
MRPP
  • Multiresponse Permutation Procedure
  • Assesses the probability that two or more groups
    consisting of multivariate data differ
  • Different from normal multivariate ANOVA in that
    it is non-parametric → can be used on biological
    data without worrying about multivariate
    normality
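
To give a feel for how a permutation procedure of this kind works, here is a rough sketch. It assumes Euclidean distance, weights each group by its share of the observations (one common choice), and uses made-up 2-D data for two groups; none of these details come from the slides:

```python
import math
import random


def mrpp_delta(points, labels):
    """Group-size-weighted mean of the average within-group distance."""
    n = len(points)
    delta = 0.0
    for group in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == group]
        pairs = [(a, b) for i, a in enumerate(members) for b in members[i + 1:]]
        mean_dist = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
        delta += len(members) / n * mean_dist
    return delta


def mrpp_p_value(points, labels, n_perm=999, seed=0):
    """Share of random relabelings whose delta is as small as the observed one."""
    rng = random.Random(seed)
    observed = mrpp_delta(points, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mrpp_delta(points, shuffled) <= observed + 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: two well-separated groups of four observations each.
points = [(1, 2), (2, 1), (1, 1), (2, 2), (8, 9), (9, 8), (8, 8), (9, 9)]
labels = ["a"] * 4 + ["b"] * 4
delta, p = mrpp_p_value(points, labels)
print(round(delta, 3), p)
```

A small p-value means that few random regroupings are as internally tight as the real groups. No distributional assumption is involved, which is the point made on the slide.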

32
MANCOVA
  • Multivariate Analysis of Covariance
  • Multivariate equivalent of ANCOVA
  • Assumption of normality
  • Lacks non-parametric test though

33
MANOVA
  • Multivariate ANOVA
  • Analogous to univariate ANOVA: provides an
    estimate of the probability that the observed
    pattern arises from random data.
  • Each mean is treated as a coordinate in
    multivariate space.
  • Used specifically in assessing whether an
    overall response has occurred, but will not
    identify which variables contributed to the
    treatment effect if significance is found.
  • Requires normality; otherwise use MRPP.

34
NMS, NMDS
  • Non-metric multidimensional scaling
  • Ordinal scaling
  • Square distance matrix → map reconstructed
  • Differs from other multivariate techniques

35
NMS, NMDS (cont)
  • Differs from other multivariate techniques
  • Uses only one distance measure derived from
    ranked differences between individuals.
  • So, can be used with non-normal, discontinuous or
    questionable distributions.
  • Ordination axes will differ according to how
    many axes are requested.
  • Where two or more ordination axes are requested,
    the first axis need not be more important than
    the second or higher axes → axis numbering is
    arbitrary.
  • There is a lot of subjectivity in the choice of
    axes, hence the technique is not used that often.

36
Procrustes Rotation
  • Compares two different ordinations applied to the
    same data.
  • Has an m² statistic (residual sum of squares) to
    assess the fit after the Procrustes operations
    have been applied.
  • No significance test
  • No clear guidelines to interpret m² values
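
For two-dimensional ordinations the m² statistic can be sketched in a few lines. The version below assumes both configurations are centered and scaled to unit sum of squares, and that only rotation and rescaling are allowed (no reflection, which full Procrustes routines usually permit). The example coordinates are hypothetical:

```python
import math


def standardize(config):
    """Center a list of (x, y) points and scale it to unit sum of squares."""
    n = len(config)
    cx = sum(x for x, _ in config) / n
    cy = sum(y for _, y in config) / n
    centered = [(x - cx, y - cy) for x, y in config]
    norm = math.sqrt(sum(x * x + y * y for x, y in centered))
    return [(x / norm, y / norm) for x, y in centered]


def procrustes_m2(config_a, config_b):
    """Residual sum of squares after the best rotation and scaling of b onto a."""
    a, b = standardize(config_a), standardize(config_b)
    dot = sum(xa * xb + ya * yb for (xa, ya), (xb, yb) in zip(a, b))
    cross = sum(xa * yb - ya * xb for (xa, ya), (xb, yb) in zip(a, b))
    return 1.0 - (dot * dot + cross * cross)


# Hypothetical ordination of four sites.
first = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 3.0)]

# The same configuration rotated by 30 degrees, doubled in size and shifted.
angle = math.radians(30)
second = [
    (2 * (x * math.cos(angle) - y * math.sin(angle)) + 5,
     2 * (x * math.sin(angle) + y * math.cos(angle)) - 1)
    for x, y in first
]

# A genuinely different configuration (one site moved).
third = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 3.0)]

print(round(procrustes_m2(first, second), 6))
print(round(procrustes_m2(first, third), 4))
```

An m² near 0 means the two ordinations differ only by rotation, size and position; larger values mean a poorer match, although, as noted above, there is no clear guideline for what counts as large.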

37
Procrustes Rotation
  • Name is derived from Greek mythology.
  • Procrustes was an innkeeper who ensured all his
    customers fitted his bed perfectly by stretching
    them or chopping their feet off.

38
RDA
  • Redundancy Analysis
  • Derivative of PCA with a bonus feature
  • Values entered into the analysis aren't the
    original data but the best-fit values estimated
    from a multiple linear regression between each
    variable and a second matrix of environmental
    data.
  • Thus, this is a canonical version of PCA
  • Constrained to optimally correlate with another
    dataset.
  • Interpretation is by biplot
  • Collinearity, which is likely in biological data,
    makes canonical coefficients unreliable.
  • RDA is the technique that underlies PRC

39
PRC
  • Principal response curves
  • 1999, New technique
  • Derived from RDA and specifically intended to
    help interpret planned experiments on biological
    communities.
  • Two treatments, one is a control
  • Repeated sampling
  • <not enough details>

42
END