Title: Correlation
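1 Correlation
2 The sample covariance matrix
$$S = (s_{ij})_{p \times p}$$
where
$$s_{ij} = \frac{1}{n-1}\sum_{k=1}^{n}(x_{ki}-\bar{x}_i)(x_{kj}-\bar{x}_j)$$
3 The sample correlation matrix
$$R = (r_{ij})_{p \times p}$$
where
$$r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}\,s_{jj}}}$$
4 Note
$$R = D^{-1} S D^{-1}$$
where
$$D = \operatorname{diag}\left(\sqrt{s_{11}}, \dots, \sqrt{s_{pp}}\right)$$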
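The sample covariance and correlation matrices can be computed in a few lines. The following is a minimal sketch, assuming an n x p data matrix with observations in rows and the divisor n - 1; the function name `sample_cov_corr` and the random data are illustrative only and do not come from the slides.

```python
import numpy as np

def sample_cov_corr(X):
    """Return (S, R) for an n x p data matrix X (rows = observations)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    centered = X - X.mean(axis=0)            # subtract column means
    S = centered.T @ centered / (n - 1)      # assumption: divisor n - 1
    d = np.sqrt(np.diag(S))                  # sample standard deviations
    R = S / np.outer(d, d)                   # r_ij = s_ij / sqrt(s_ii * s_jj)
    return S, R

# Made-up data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
S, R = sample_cov_corr(X)
```

Dividing S elementwise by the outer product of the standard deviations is the same operation as pre- and post-multiplying S by the inverse of the diagonal matrix of standard deviations.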
5 Tests for Independence and Non-zero Correlation
6 Tests for Independence
Test for zero correlation (independence between
two variables)
The test statistic
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}$$
where r is the sample correlation between the two variables.
If independence is true then the test statistic t
will have a t-distribution with $\nu = n - 2$
degrees of freedom.
The test is to reject independence if
$$|t| > t_{\alpha/2}(n-2)$$
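As a hedged illustration of the test for zero correlation, the sketch below computes the t statistic, its n - 2 degrees of freedom, and a two-sided decision. The function name `zero_correlation_test`, the default level 0.05, and the inputs r = 0.5, n = 27 are assumptions chosen for the example, not values from the slides; `scipy.stats.t` supplies the critical value.

```python
import math
from scipy import stats

def zero_correlation_test(r, n, alpha=0.05):
    """Two-sided t test of H0: rho = 0 from a sample correlation r."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df)    # upper alpha/2 point
    p_value = 2.0 * stats.t.sf(abs(t), df)         # two-sided p-value
    reject = bool(abs(t) > t_crit)
    return t, df, t_crit, p_value, reject

# Illustrative inputs only
t, df, t_crit, p_value, reject = zero_correlation_test(r=0.5, n=27)
```

With these inputs the statistic is 0.5 * 5 / sqrt(0.75), which exceeds the 5% two-sided critical value on 25 degrees of freedom, so independence would be rejected.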
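7 Test for non-zero correlation ($H_0: \rho = \rho_0$)
The test statistic
$$z = \sqrt{n-3}\left[\tfrac{1}{2}\ln\frac{1+r}{1-r} - \tfrac{1}{2}\ln\frac{1+\rho_0}{1-\rho_0}\right]$$
If $H_0$ is true the test statistic z will have
approximately a Standard Normal distribution.
We then reject $H_0$ if
$$|z| > z_{\alpha/2}$$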
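The test of a specified non-zero correlation is commonly carried out with Fisher's z-transformation, and the sketch below assumes that form of the statistic. The name `fisher_z_test` and the inputs are hypothetical; `math.atanh(r)` equals (1/2) ln((1 + r)/(1 - r)), and the Normal quantile comes from the standard library.

```python
import math
from statistics import NormalDist

def fisher_z_test(r, rho0, n, alpha=0.05):
    """Approximate two-sided z test of H0: rho = rho0 (Fisher's z)."""
    z = math.sqrt(n - 3) * (math.atanh(r) - math.atanh(rho0))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, z_crit, p_value, abs(z) > z_crit

# Illustrative inputs only
z, z_crit, p_value, reject = fisher_z_test(r=0.6, rho0=0.3, n=28)
```

Here the statistic falls just short of the 5% critical value, so a sample correlation of 0.6 from 28 observations would not, under these assumptions, contradict a true correlation of 0.3.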
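8 Partial Correlation
9 Recall
Suppose
$$\mathbf{x} = \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix}$$
has p-variate Normal distribution
with mean vector
$$\boldsymbol{\mu} = \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix}$$
and covariance matrix
$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$
Then the conditional distribution of $\mathbf{x}_i$ given $\mathbf{x}_j$
is $q_i$-variate Normal distribution
with mean vector
$$\boldsymbol{\mu}_{i \cdot j} = \boldsymbol{\mu}_i + \Sigma_{ij}\Sigma_{jj}^{-1}(\mathbf{x}_j - \boldsymbol{\mu}_j)$$
and covariance matrix
$$\Sigma_{ii \cdot j} = \Sigma_{ii} - \Sigma_{ij}\Sigma_{jj}^{-1}\Sigma_{ji}$$
10
With $\mathbf{x}_1 = (x_1, \dots, x_q)'$,
$$\Sigma_{22 \cdot 1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$$
is called the matrix of partial variances and
covariances.
Its (i, j) element $\sigma_{ij \cdot 1, \dots, q}$
is called the partial covariance (variance if
i = j) between $x_i$ and $x_j$ given $x_1, \dots, x_q$.
$$\rho_{ij \cdot 1, \dots, q} = \frac{\sigma_{ij \cdot 1, \dots, q}}{\sqrt{\sigma_{ii \cdot 1, \dots, q}\,\sigma_{jj \cdot 1, \dots, q}}}$$
is called the partial correlation between $x_i$ and
$x_j$ given $x_1, \dots, x_q$.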
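11
Let
$$S = \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix}$$
denote the sample covariance matrix, partitioned so that
$S_{11}$ is the q × q block for $x_1, \dots, x_q$.
Let
$$S_{22 \cdot 1} = S_{22} - S_{21}S_{11}^{-1}S_{12}$$
Its (i, j) element $s_{ij \cdot 1, \dots, q}$
is called the sample partial covariance (variance
if i = j) between $x_i$ and $x_j$ given $x_1, \dots, x_q$.
12
Also
$$r_{ij \cdot 1, \dots, q} = \frac{s_{ij \cdot 1, \dots, q}}{\sqrt{s_{ii \cdot 1, \dots, q}\,s_{jj \cdot 1, \dots, q}}}$$
is called the sample partial correlation between
$x_i$ and $x_j$ given $x_1, \dots, x_q$.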
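The sample partial covariances and correlations can be obtained from a partitioned covariance (or correlation) matrix. The sketch below assumes the conditioning variables occupy the first q rows and columns; the function name `partial_cov_corr` and the 3 x 3 correlation matrix are made up for illustration.

```python
import numpy as np

def partial_cov_corr(S, q):
    """Partial covariance and correlation of the last variables
    given the first q variables of the covariance matrix S."""
    S = np.asarray(S, dtype=float)
    S11, S12 = S[:q, :q], S[:q, q:]
    S21, S22 = S[q:, :q], S[q:, q:]
    # S22.1 = S22 - S21 S11^{-1} S12 (solve avoids an explicit inverse)
    S22_1 = S22 - S21 @ np.linalg.solve(S11, S12)
    d = np.sqrt(np.diag(S22_1))
    return S22_1, S22_1 / np.outer(d, d)

# Illustrative correlation matrix: r12 = 0.5, r13 = 0.4, r23 = 0.6
R3 = np.array([[1.0, 0.5, 0.4],
               [0.5, 1.0, 0.6],
               [0.4, 0.6, 1.0]])
S22_1, R_partial = partial_cov_corr(R3, q=1)
```

For three variables this reproduces the familiar closed form (r23 - r12 r13) / sqrt((1 - r12^2)(1 - r13^2)) for the correlation between the second and third variables given the first.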
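13 Test for zero partial correlation
(conditional independence between two variables
given a set of p independent variables)
The test statistic
$$t = \frac{r_{ij \cdot x_1, \dots, x_p}\sqrt{n-p-2}}{\sqrt{1-r_{ij \cdot x_1, \dots, x_p}^{2}}}$$
where $r_{ij \cdot x_1, \dots, x_p}$ is
the sample partial correlation between $y_i$ and $y_j$ given
$x_1, \dots, x_p$.
If independence is true then the test statistic t
will have a t-distribution with $\nu = n - p - 2$
degrees of freedom.
The test is to reject independence if
$$|t| > t_{\alpha/2}(n-p-2)$$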
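The test for zero partial correlation differs from the simple test only in its degrees of freedom, n - p - 2, where p is the number of conditioning variables. The sketch below makes that explicit; `zero_partial_correlation_test` and its inputs are hypothetical, and setting p = 0 recovers the ordinary test for zero correlation.

```python
import math
from scipy import stats

def zero_partial_correlation_test(r_partial, n, p, alpha=0.05):
    """Two-sided t test of zero partial correlation given p variables."""
    df = n - p - 2                      # one df lost per conditioning variable
    t = r_partial * math.sqrt(df) / math.sqrt(1.0 - r_partial ** 2)
    t_crit = stats.t.ppf(1.0 - alpha / 2.0, df)
    p_value = 2.0 * stats.t.sf(abs(t), df)
    return t, df, t_crit, p_value, bool(abs(t) > t_crit)

# Illustrative inputs only: partial correlation 0.5 given 3 variables
t, df, t_crit, p_value, reject = zero_partial_correlation_test(0.5, n=30, p=3)
```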
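14 Test for non-zero partial correlation ($H_0: \rho_{ij \cdot x_1, \dots, x_p} = \rho_0$)
The test statistic
$$z = \sqrt{n-p-3}\left[\tfrac{1}{2}\ln\frac{1+r_{ij \cdot x_1, \dots, x_p}}{1-r_{ij \cdot x_1, \dots, x_p}} - \tfrac{1}{2}\ln\frac{1+\rho_0}{1-\rho_0}\right]$$
If $H_0$ is true the test statistic z will have
approximately a Standard Normal distribution.
We then reject $H_0$ if
$$|z| > z_{\alpha/2}$$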
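15 The Multiple Correlation Coefficient
Testing independence between a single variable
and a group of variables
16 Definition
Suppose
$$\begin{bmatrix} y \\ \mathbf{x} \end{bmatrix}$$
has (p + 1)-variate Normal distribution
with mean vector
$$\begin{bmatrix} \mu_y \\ \boldsymbol{\mu}_x \end{bmatrix}$$
and covariance matrix
$$\begin{bmatrix} \sigma_{yy} & \boldsymbol{\sigma}_{xy}' \\ \boldsymbol{\sigma}_{xy} & \Sigma_{xx} \end{bmatrix}$$
We are interested in whether the variable y is
independent of the vector $\mathbf{x}$.
The multiple correlation coefficient is the
maximum correlation between y and a linear
combination of the components of $\mathbf{x}$.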
17 Derivation
Consider the vector
$$\begin{bmatrix} y \\ \mathbf{a}'\mathbf{x} \end{bmatrix}$$
This vector has a bivariate Normal distribution
with mean vector
$$\begin{bmatrix} \mu_y \\ \mathbf{a}'\boldsymbol{\mu}_x \end{bmatrix}$$
and covariance matrix
$$\begin{bmatrix} \sigma_{yy} & \mathbf{a}'\boldsymbol{\sigma}_{xy} \\ \mathbf{a}'\boldsymbol{\sigma}_{xy} & \mathbf{a}'\Sigma_{xx}\mathbf{a} \end{bmatrix}$$
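18
The multiple correlation coefficient is the
maximum correlation between y and $\mathbf{a}'\mathbf{x}$.
The correlation between y and $\mathbf{a}'\mathbf{x}$ is
$$\rho(\mathbf{a}) = \frac{\mathbf{a}'\boldsymbol{\sigma}_{xy}}{\sqrt{\sigma_{yy}\,\mathbf{a}'\Sigma_{xx}\mathbf{a}}}$$
Thus we want to choose $\mathbf{a}$ to maximize $\rho(\mathbf{a})$.
Equivalently, we maximize
$$\rho^{2}(\mathbf{a}) = \frac{(\mathbf{a}'\boldsymbol{\sigma}_{xy})^{2}}{\sigma_{yy}\,\mathbf{a}'\Sigma_{xx}\mathbf{a}}$$
19 Note
Setting the derivative with respect to $\mathbf{a}$ to zero gives
$$\mathbf{a} = k\,\Sigma_{xx}^{-1}\boldsymbol{\sigma}_{xy}$$
for any constant k > 0.
20
The multiple correlation coefficient is
$$\rho_{y \cdot \mathbf{x}} = \sqrt{\frac{\boldsymbol{\sigma}_{xy}'\Sigma_{xx}^{-1}\boldsymbol{\sigma}_{xy}}{\sigma_{yy}}}$$
independent of the value of k.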
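21
We are interested in whether the variable y is
independent of the vector $\mathbf{x}$.
The sample multiple correlation coefficient
Let
$$S = \begin{bmatrix} s_{yy} & \mathbf{s}_{xy}' \\ \mathbf{s}_{xy} & S_{xx} \end{bmatrix}$$
denote the sample covariance matrix of y and $\mathbf{x}$.
Then the sample multiple correlation coefficient
is
$$r_{y \cdot \mathbf{x}} = \sqrt{\frac{\mathbf{s}_{xy}'S_{xx}^{-1}\mathbf{s}_{xy}}{s_{yy}}}$$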
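The sample multiple correlation coefficient can be computed from the partitioned sample covariance matrix of y and the group of variables. The sketch below assumes y is a length-n vector and X an n x p matrix; `multiple_corr` and the simulated data are illustrative, not taken from the slides. It also returns the maximizing weights so the "maximum correlation" interpretation can be checked numerically.

```python
import numpy as np

def multiple_corr(y, X):
    """Sample multiple correlation of y with the columns of X,
    plus the weight vector a = S_xx^{-1} s_xy that attains it."""
    S = np.cov(np.column_stack([y, X]), rowvar=False)
    s_yy, s_xy, S_xx = S[0, 0], S[1:, 0], S[1:, 1:]
    a = np.linalg.solve(S_xx, s_xy)
    r_multiple = np.sqrt(s_xy @ a / s_yy)
    return r_multiple, a

# Simulated data, purely for illustration
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -0.5, 0.0]) + rng.normal(size=60)
r_multiple, a = multiple_corr(y, X)

# The ordinary correlation between y and the combination X @ a
r_check = np.corrcoef(y, X @ a)[0, 1]
```

The value `r_check` agrees with `r_multiple`, and any other weight vector gives a correlation no larger in absolute value.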
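22 Testing for independence between y and $\mathbf{x}$
The test statistic
$$F = \frac{n-p-1}{p} \cdot \frac{r_{y \cdot \mathbf{x}}^{2}}{1-r_{y \cdot \mathbf{x}}^{2}}$$
If independence is true then the test statistic F
will have an F-distribution with $\nu_1 = p$ degrees
of freedom in the numerator and $\nu_2 = n - p - 1$
degrees of freedom in the denominator.
The test is to reject independence if
$$F > F_{\alpha}(p,\, n-p-1)$$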
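A hedged sketch of the F test for independence between y and the group of p variables follows, assuming the usual statistic ((n - p - 1)/p) R^2/(1 - R^2). The function name `multiple_corr_f_test` and the inputs are made up for the example; `scipy.stats.f` provides the critical value and p-value.

```python
from scipy import stats

def multiple_corr_f_test(r_multiple, n, p, alpha=0.05):
    """F test of independence between y and p variables."""
    df1, df2 = p, n - p - 1
    r2 = r_multiple ** 2
    F = (df2 / df1) * r2 / (1.0 - r2)
    F_crit = stats.f.ppf(1.0 - alpha, df1, df2)   # upper alpha point
    p_value = stats.f.sf(F, df1, df2)
    return F, df1, df2, F_crit, p_value, bool(F > F_crit)

# Illustrative inputs only
F, df1, df2, F_crit, p_value, reject = multiple_corr_f_test(0.5, n=34, p=3)
```

With a multiple correlation of 0.5 from 34 observations on 3 predictors, F = 10 * 0.25 / 0.75 on (3, 30) degrees of freedom, which exceeds the 5% critical value.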
23 Canonical Correlation Analysis
24 The problem
- Quite often, when one has collected data on
several variables, the variables are grouped into
two (or more) sets and the researcher is
interested in whether one set of variables is
independent of the other set.
In addition, if it is found that the two sets of
variates are dependent, it is then important to
describe and understand the nature of this
dependence.
The appropriate statistical procedure in this
case is called Canonical Correlation Analysis.
25 Canonical Correlation: An Example
- In the following study the researcher was
interested in whether specific instructions on
how to relax when taking tests and how to
increase motivation would affect performance on
standardized achievement tests:
- Reading,
- Language and
- Mathematics
26
- A group of 65 third- and fourth-grade students
were rated after the instruction and immediately
prior to taking the Scholastic Achievement tests on
- how relaxed they were (X1) and
- how motivated they were (X2).
In addition, data were collected on the three
achievement tests
- Reading (Y1),
- Language (Y2) and
- Mathematics (Y3).
The data were tabulated on the next page
27 (data table not transcribed)
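28 Definition (Canonical variates and Canonical
correlations)
Let
$$\mathbf{x} = \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix}$$
have p-variate Normal distribution
with
$$\boldsymbol{\mu} = \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix}$$
and
$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$
Let
$$U_1 = \mathbf{a}_1'\mathbf{x}_1$$
and
$$V_1 = \mathbf{b}_1'\mathbf{x}_2$$
be such that $U_1$ and $V_1$ have achieved the maximum
correlation $\phi_1$.
Then $U_1$ and $V_1$ are called the first pair of
canonical variates and $\phi_1$ is called the first
canonical correlation coefficient.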
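29 Derivation (1st pair of Canonical variates and
Canonical correlation)
Now
$$\begin{bmatrix} U_1 \\ V_1 \end{bmatrix} = \begin{bmatrix} \mathbf{a}'\mathbf{x}_1 \\ \mathbf{b}'\mathbf{x}_2 \end{bmatrix}$$
has covariance matrix
$$\begin{bmatrix} \mathbf{a}'\Sigma_{11}\mathbf{a} & \mathbf{a}'\Sigma_{12}\mathbf{b} \\ \mathbf{b}'\Sigma_{21}\mathbf{a} & \mathbf{b}'\Sigma_{22}\mathbf{b} \end{bmatrix}$$
Thus the correlation between $U_1$ and $V_1$ is
$$\phi = \frac{\mathbf{a}'\Sigma_{12}\mathbf{b}}{\sqrt{\mathbf{a}'\Sigma_{11}\mathbf{a}\;\mathbf{b}'\Sigma_{22}\mathbf{b}}}$$
hence
$$\phi^{2} = \frac{(\mathbf{a}'\Sigma_{12}\mathbf{b})^{2}}{\mathbf{a}'\Sigma_{11}\mathbf{a}\;\mathbf{b}'\Sigma_{22}\mathbf{b}}$$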
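31
Thus we want to choose $\mathbf{a}$ and $\mathbf{b}$
so that
$$\phi = \frac{\mathbf{a}'\Sigma_{12}\mathbf{b}}{\sqrt{\mathbf{a}'\Sigma_{11}\mathbf{a}\;\mathbf{b}'\Sigma_{22}\mathbf{b}}}$$
is at a maximum
or
$$\phi^{2} = \frac{(\mathbf{a}'\Sigma_{12}\mathbf{b})^{2}}{\mathbf{a}'\Sigma_{11}\mathbf{a}\;\mathbf{b}'\Sigma_{22}\mathbf{b}}$$
is at a maximum.
Let
$$V = \ln\phi^{2} = 2\ln(\mathbf{a}'\Sigma_{12}\mathbf{b}) - \ln(\mathbf{a}'\Sigma_{11}\mathbf{a}) - \ln(\mathbf{b}'\Sigma_{22}\mathbf{b})$$
32 Computing derivatives
$$\frac{\partial V}{\partial \mathbf{a}} = \frac{2\,\Sigma_{12}\mathbf{b}}{\mathbf{a}'\Sigma_{12}\mathbf{b}} - \frac{2\,\Sigma_{11}\mathbf{a}}{\mathbf{a}'\Sigma_{11}\mathbf{a}} = \mathbf{0}$$
and
$$\frac{\partial V}{\partial \mathbf{b}} = \frac{2\,\Sigma_{21}\mathbf{a}}{\mathbf{a}'\Sigma_{12}\mathbf{b}} - \frac{2\,\Sigma_{22}\mathbf{b}}{\mathbf{b}'\Sigma_{22}\mathbf{b}} = \mathbf{0}$$
33
Thus
$$\Sigma_{11}^{-1}\Sigma_{12}\mathbf{b} = \frac{\mathbf{a}'\Sigma_{12}\mathbf{b}}{\mathbf{a}'\Sigma_{11}\mathbf{a}}\,\mathbf{a}, \qquad \Sigma_{22}^{-1}\Sigma_{21}\mathbf{a} = \frac{\mathbf{a}'\Sigma_{12}\mathbf{b}}{\mathbf{b}'\Sigma_{22}\mathbf{b}}\,\mathbf{b}$$
and substituting the second equation into the first gives
$$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\,\mathbf{a} = k\,\mathbf{a}, \qquad k = \frac{(\mathbf{a}'\Sigma_{12}\mathbf{b})^{2}}{\mathbf{a}'\Sigma_{11}\mathbf{a}\;\mathbf{b}'\Sigma_{22}\mathbf{b}} = \phi^{2}$$
This shows that $\mathbf{a}$
is an eigenvector of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$.
Since $\phi^{2}$ is to be maximized,
k is the largest eigenvalue of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$
and $\mathbf{a}$
is the eigenvector associated with the largest
eigenvalue.
34
Also
$$\mathbf{b} \propto \Sigma_{22}^{-1}\Sigma_{21}\,\mathbf{a}$$
and
$$\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\,\mathbf{b} = k\,\mathbf{b}$$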
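35 Summary
The first pair of canonical variates
$$U_1 = \mathbf{a}_1'\mathbf{x}_1, \qquad V_1 = \mathbf{b}_1'\mathbf{x}_2$$
are found by finding $\mathbf{a}_1$, $\mathbf{b}_1$, eigenvectors of the matrices
$$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \quad \text{and} \quad \Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$$
associated with the largest eigenvalue (same for
both matrices).
The largest eigenvalue of the two matrices is the
square of the first canonical correlation
coefficient $\phi_1$.
36 Note
The matrices
$$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \quad \text{and} \quad \Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$$
have exactly the same non-zero eigenvalues.
Proof
Let
$$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\,\mathbf{a} = \lambda\,\mathbf{a}, \qquad \lambda \neq 0$$
then, premultiplying by $\Sigma_{22}^{-1}\Sigma_{21}$,
$$\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\left(\Sigma_{22}^{-1}\Sigma_{21}\,\mathbf{a}\right) = \lambda\left(\Sigma_{22}^{-1}\Sigma_{21}\,\mathbf{a}\right)$$
and
$\mathbf{b} = \Sigma_{22}^{-1}\Sigma_{21}\,\mathbf{a}$ is an eigenvector of the second matrix with the same eigenvalue $\lambda$.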
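The eigenvalue characterization of the first canonical pair can be checked numerically. The following is a sketch under stated assumptions: the covariance matrix is partitioned with the first set of variables in the first `q1` rows and columns, the function name `first_canonical_pair` is hypothetical, and the data are simulated (two variables in one set, three in the other, with dependence induced artificially).

```python
import numpy as np

def first_canonical_pair(S, q1):
    """First canonical weights (a, b) and correlation from a
    covariance matrix S partitioned after the first q1 variables."""
    S = np.asarray(S, dtype=float)
    S11, S12 = S[:q1, :q1], S[:q1, q1:]
    S21, S22 = S[q1:, :q1], S[q1:, q1:]
    # The two matrices whose largest eigenvalue is phi_1 squared
    M_a = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S21)
    M_b = np.linalg.solve(S22, S21) @ np.linalg.solve(S11, S12)
    vals_a, vecs_a = np.linalg.eig(M_a)
    vals_b, vecs_b = np.linalg.eig(M_b)
    ia, ib = np.argmax(vals_a.real), np.argmax(vals_b.real)
    a, b = vecs_a[:, ia].real, vecs_b[:, ib].real
    if a @ S12 @ b < 0:          # eigenvector signs are arbitrary
        b = -b
    lam_a, lam_b = vals_a[ia].real, vals_b[ib].real
    return a, b, np.sqrt(lam_a), lam_a, lam_b

# Simulated data, purely for illustration
rng = np.random.default_rng(1)
Z = rng.normal(size=(200, 5))
Z[:, 2] += 0.8 * Z[:, 0]         # make the two sets dependent
S = np.cov(Z, rowvar=False)
a, b, phi1, lam_a, lam_b = first_canonical_pair(S, q1=2)

# Sample correlation between the first pair of canonical variates
r_uv = np.corrcoef(Z[:, :2] @ a, Z[:, 2:] @ b)[0, 1]
```

The largest eigenvalues of the two matrices coincide, and the correlation between the resulting variates equals the square root of that eigenvalue.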
37 The remaining canonical variates and canonical
correlation coefficients
The second pair of canonical variates
$$U_2 = \mathbf{a}_2'\mathbf{x}_1, \qquad V_2 = \mathbf{b}_2'\mathbf{x}_2$$
are found by finding $\mathbf{a}_2$, $\mathbf{b}_2$ so that
1. $(U_2, V_2)$ are independent of $(U_1, V_1)$.
2. The correlation between $U_2$ and $V_2$ is maximized.
The correlation, $\phi_2$, between $U_2$ and $V_2$ is called
the second canonical correlation coefficient.
38
The ith pair of canonical variates
$$U_i = \mathbf{a}_i'\mathbf{x}_1, \qquad V_i = \mathbf{b}_i'\mathbf{x}_2$$
are found by finding $\mathbf{a}_i$, $\mathbf{b}_i$ so that
1. $(U_i, V_i)$ are independent of $(U_1, V_1), \dots,$
$(U_{i-1}, V_{i-1})$.
2. The correlation between $U_i$ and $V_i$ is maximized.
The correlation, $\phi_i$, between $U_i$ and $V_i$ is called
the ith canonical correlation coefficient.
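39 Derivation (2nd pair of Canonical variates and
Canonical correlation)
Now
$$\begin{bmatrix} U_1 \\ V_1 \\ U_2 \\ V_2 \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1'\mathbf{x}_1 \\ \mathbf{b}_1'\mathbf{x}_2 \\ \mathbf{a}_2'\mathbf{x}_1 \\ \mathbf{b}_2'\mathbf{x}_2 \end{bmatrix}$$
has covariance matrix
$$\begin{bmatrix}
\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1 & \mathbf{a}_1'\Sigma_{12}\mathbf{b}_1 & \mathbf{a}_1'\Sigma_{11}\mathbf{a}_2 & \mathbf{a}_1'\Sigma_{12}\mathbf{b}_2 \\
\mathbf{b}_1'\Sigma_{21}\mathbf{a}_1 & \mathbf{b}_1'\Sigma_{22}\mathbf{b}_1 & \mathbf{b}_1'\Sigma_{21}\mathbf{a}_2 & \mathbf{b}_1'\Sigma_{22}\mathbf{b}_2 \\
\mathbf{a}_2'\Sigma_{11}\mathbf{a}_1 & \mathbf{a}_2'\Sigma_{12}\mathbf{b}_1 & \mathbf{a}_2'\Sigma_{11}\mathbf{a}_2 & \mathbf{a}_2'\Sigma_{12}\mathbf{b}_2 \\
\mathbf{b}_2'\Sigma_{21}\mathbf{a}_1 & \mathbf{b}_2'\Sigma_{22}\mathbf{b}_1 & \mathbf{b}_2'\Sigma_{21}\mathbf{a}_2 & \mathbf{b}_2'\Sigma_{22}\mathbf{b}_2
\end{bmatrix}$$
40
Now independence of $(U_2, V_2)$ and $(U_1, V_1)$ requires
$$\mathbf{a}_1'\Sigma_{11}\mathbf{a}_2 = 0, \quad \mathbf{a}_1'\Sigma_{12}\mathbf{b}_2 = 0, \quad \mathbf{b}_1'\Sigma_{21}\mathbf{a}_2 = 0, \quad \mathbf{b}_1'\Sigma_{22}\mathbf{b}_2 = 0$$
and maximizing
$$\phi_2 = \frac{\mathbf{a}_2'\Sigma_{12}\mathbf{b}_2}{\sqrt{\mathbf{a}_2'\Sigma_{11}\mathbf{a}_2\;\mathbf{b}_2'\Sigma_{22}\mathbf{b}_2}}$$
is equivalent to maximizing
$$\mathbf{a}_2'\Sigma_{12}\mathbf{b}_2$$
subject to
$$\mathbf{a}_2'\Sigma_{11}\mathbf{a}_2 = 1, \qquad \mathbf{b}_2'\Sigma_{22}\mathbf{b}_2 = 1$$
and the four restrictions above.
Using the Lagrange multiplier technique, let
$$V = \mathbf{a}_2'\Sigma_{12}\mathbf{b}_2 - \tfrac{\lambda_1}{2}(\mathbf{a}_2'\Sigma_{11}\mathbf{a}_2 - 1) - \tfrac{\lambda_2}{2}(\mathbf{b}_2'\Sigma_{22}\mathbf{b}_2 - 1) - \gamma_1\mathbf{a}_1'\Sigma_{11}\mathbf{a}_2 - \gamma_2\mathbf{a}_1'\Sigma_{12}\mathbf{b}_2 - \gamma_3\mathbf{b}_1'\Sigma_{21}\mathbf{a}_2 - \gamma_4\mathbf{b}_1'\Sigma_{22}\mathbf{b}_2$$
41
Now
$$\frac{\partial V}{\partial \mathbf{a}_2} = \Sigma_{12}\mathbf{b}_2 - \lambda_1\Sigma_{11}\mathbf{a}_2 - \gamma_1\Sigma_{11}\mathbf{a}_1 - \gamma_3\Sigma_{12}\mathbf{b}_1 = \mathbf{0}$$
and
$$\frac{\partial V}{\partial \mathbf{b}_2} = \Sigma_{21}\mathbf{a}_2 - \lambda_2\Sigma_{22}\mathbf{b}_2 - \gamma_2\Sigma_{21}\mathbf{a}_1 - \gamma_4\Sigma_{22}\mathbf{b}_1 = \mathbf{0}$$
also setting the derivatives of V with respect to $\lambda_1, \lambda_2, \gamma_1, \dots, \gamma_4$ to zero
gives the restrictions.
42
These equations can be used to show that $\mathbf{a}_2$, $\mathbf{b}_2$
are eigenvectors of the matrices
$$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \quad \text{and} \quad \Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$$
associated with the 2nd largest eigenvalue (same
for both matrices).
The 2nd largest eigenvalue of the two matrices
is the square of the 2nd canonical correlation
coefficient $\phi_2$.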
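43 Continuing
Coefficients for the ith pair of canonical
variates, $\mathbf{a}_i$ and $\mathbf{b}_i$,
are eigenvectors of the matrices
$$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \quad \text{and} \quad \Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$$
associated with the ith largest eigenvalue (same
for both matrices).
The ith largest eigenvalue of the two matrices
is the square of the ith canonical correlation
coefficient $\phi_i$.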
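All canonical pairs can be extracted at once from the eigen-decomposition. The sketch below is one way to do this, not necessarily the procedure used for the example in these slides: it assumes distinct, non-zero canonical correlations, scales each variate to unit variance, and uses the hypothetical name `canonical_analysis` with simulated data shaped like the example (2 variables in one set, 3 in the other).

```python
import numpy as np

def canonical_analysis(S, q1):
    """Canonical weights A (columns a_i), B (columns b_i) and
    canonical correlations, largest first, from covariance matrix S."""
    S = np.asarray(S, dtype=float)
    S11, S12 = S[:q1, :q1], S[:q1, q1:]
    S21, S22 = S[q1:, :q1], S[q1:, q1:]
    M = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S21)
    vals, vecs = np.linalg.eig(M)
    vals, vecs = vals.real, vecs.real
    k = min(q1, S.shape[0] - q1)               # number of canonical pairs
    order = np.argsort(vals)[::-1][:k]         # eigenvalues, descending
    lam = np.clip(vals[order], 0.0, 1.0)
    A = vecs[:, order]
    B = np.linalg.solve(S22, S21 @ A)          # b_i proportional to S22^{-1} S21 a_i
    A = A / np.sqrt(np.diag(A.T @ S11 @ A))    # unit-variance U_i
    B = B / np.sqrt(np.diag(B.T @ S22 @ B))    # unit-variance V_i
    return A, B, np.sqrt(lam)

# Simulated data, purely for illustration
rng = np.random.default_rng(4)
Z = rng.normal(size=(200, 5))
Z[:, 2] += 0.8 * Z[:, 0]
Z[:, 3] += 0.5 * Z[:, 1]
S = np.cov(Z, rowvar=False)
A, B, phi = canonical_analysis(S, q1=2)

# Covariances among the canonical variates
cov_UU = A.T @ S[:2, :2] @ A
cov_VV = B.T @ S[2:, 2:] @ B
cov_UV = A.T @ S[:2, 2:] @ B
```

With this scaling, `cov_UU` and `cov_VV` are identity matrices and `cov_UV` is diagonal with the canonical correlations on the diagonal, which is the zero-covariance requirement between different pairs expressed in matrix form.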
44 Example
- Variables
- Relaxation score (X1)
- Motivation score (X2).
- Reading (Y1),
- Language (Y2) and
- Mathematics (Y3).
45 Summary Statistics
46 Canonical Correlation Statistics
47 continued
48 Summary
- U1 = 0.197 Relax + 0.979 Mot
- V1 = 0.504 Read + 0.900 Lang + 0.565 Math
- $\phi_1$ = .592

- U2 = 0.980 Relax + 0.203 Mot
- V2 = 0.391 Math - 0.361 Read - 0.354 Lang
- $\phi_2$ = .159