Correlation - PowerPoint PPT Presentation

Slides: 49

Provided by: lave9

Transcript and Presenter's Notes

1
Correlation
2
The sample covariance matrix
where
3
The sample correlation matrix
where
4
Note
where
5
Tests for Independence and Non-zero Correlation
6
Tests for Independence
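Test for zero correlation (independence between
two variables)
The test statistic
$t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}$
where $r$ is the sample correlation coefficient and $n$ is the sample size.
If independence is true then the test statistic t
will have a t-distribution with $\nu = n - 2$
degrees of freedom.
The test is to reject independence if
$|t| > t_{\alpha/2}(n-2)$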
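As a minimal sketch of this test, the Python below computes the sample correlation r and the statistic, assuming the usual form t = r sqrt(n - 2) / sqrt(1 - r^2) (the formula image is missing from the transcript). The function names and the toy data are illustrative only. The critical value is supplied by the caller because the Python standard library has no t-distribution quantile function.

```python
import math

def sample_correlation(x, y):
    """Pearson sample correlation coefficient r of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def zero_correlation_t(r, n):
    """t statistic for H0: rho = 0 (assumed form), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def reject_independence(r, n, t_crit):
    """Two-sided test: reject when |t| exceeds the caller-supplied critical value."""
    return abs(zero_correlation_t(r, n)) > t_crit

# Toy data (hypothetical, not taken from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = sample_correlation(x, y)
t = zero_correlation_t(r, len(x))
print(r, t)
```

The critical value t_{alpha/2}(n - 2) would come from a t table or a statistics package and be passed as `t_crit`.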
7
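Test for non-zero correlation ($H_0\colon \rho = \rho_0$)
The test statistic
$z = \sqrt{n-3}\left[\tfrac{1}{2}\ln\dfrac{1+r}{1-r} - \tfrac{1}{2}\ln\dfrac{1+\rho_0}{1-\rho_0}\right]$
If H0 is true the test statistic z will have
approximately a Standard Normal distribution.
We then reject H0 if
$|z| > z_{\alpha/2}$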
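The following is a small sketch of this test, assuming the statistic is the standard Fisher z-transform, z = sqrt(n - 3) [atanh(r) - atanh(rho0)], which the transcript does not show explicitly. The function names and the numbers in the example are hypothetical.

```python
import math
from statistics import NormalDist

def fisher_z_statistic(r, rho0, n):
    """Approximate standard-normal statistic for H0: rho = rho0.

    math.atanh(r) equals 0.5 * ln((1 + r) / (1 - r)), the Fisher transform.
    """
    return math.sqrt(n - 3) * (math.atanh(r) - math.atanh(rho0))

def reject_h0(r, rho0, n, alpha=0.05):
    """Two-sided test at level alpha using the standard normal quantile."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(fisher_z_statistic(r, rho0, n)) > z_crit

# Hypothetical example: r = 0.8 observed from n = 28 cases, testing rho0 = 0.5
z = fisher_z_statistic(0.8, 0.5, 28)
print(z, reject_h0(0.8, 0.5, 28))
```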
8
Partial Correlation

Conditional Independence

9
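Recall
$\vec{x} = \begin{bmatrix}\vec{x}_1\\ \vec{x}_2\end{bmatrix}$ has p-variate Normal distribution
with mean vector $\vec{\mu} = \begin{bmatrix}\vec{\mu}_1\\ \vec{\mu}_2\end{bmatrix}$
and covariance matrix $\Sigma = \begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix}$
Then the conditional distribution of $\vec{x}_i$ given $\vec{x}_j$
is $q_i$-variate Normal distribution
with mean vector $\vec{\mu}_{i\cdot j} = \vec{\mu}_i + \Sigma_{ij}\Sigma_{jj}^{-1}(\vec{x}_j - \vec{\mu}_j)$
and covariance matrix $\Sigma_{ii\cdot j} = \Sigma_{ii} - \Sigma_{ij}\Sigma_{jj}^{-1}\Sigma_{ji}$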
10
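$\Sigma_{2\cdot 1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ is called the matrix of partial variances and
covariances.
Its (i, j) element $\sigma_{ij\cdot 1,\dots,q}$ is called the partial covariance (variance if i = j)
between xi and xj given x1, ..., xq.
$\rho_{ij\cdot 1,\dots,q} = \dfrac{\sigma_{ij\cdot 1,\dots,q}}{\sqrt{\sigma_{ii\cdot 1,\dots,q}\,\sigma_{jj\cdot 1,\dots,q}}}$ is called the partial correlation between xi and
xj given x1, ..., xq.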
11
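Let
$S = \begin{bmatrix}S_{11} & S_{12}\\ S_{21} & S_{22}\end{bmatrix}$
denote the sample covariance matrix.
Let
$S_{2\cdot 1} = S_{22} - S_{21}S_{11}^{-1}S_{12}$
The (i, j) element $s_{ij\cdot 1,\dots,q}$ of $S_{2\cdot 1}$ is called the sample partial covariance (variance
if i = j) between xi and xj given x1, ..., xq.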
12
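Also
$r_{ij\cdot 1,\dots,q} = \dfrac{s_{ij\cdot 1,\dots,q}}{\sqrt{s_{ii\cdot 1,\dots,q}\,s_{jj\cdot 1,\dots,q}}}$
is called the sample partial correlation between
xi and xj given x1, ..., xq.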
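To make the definition concrete, here is a minimal sketch for the simplest case of conditioning on a single variable (q = 1). It assumes the sample partial covariance is the Schur complement s_ij - s_i1 s_1j / s_11, which is the scalar form of S22 - S21 S11^{-1} S12. The covariance matrix below is made up for illustration, and the function names are hypothetical.

```python
import math

def partial_cov_given_first(S, i, j):
    """Sample partial covariance of variables i and j given variable 0.

    Scalar Schur complement: s_ij - s_i0 * s_0j / s_00.
    """
    return S[i][j] - S[i][0] * S[0][j] / S[0][0]

def partial_corr_given_first(S, i, j):
    """Sample partial correlation of variables i and j given variable 0."""
    sij = partial_cov_given_first(S, i, j)
    sii = partial_cov_given_first(S, i, i)
    sjj = partial_cov_given_first(S, j, j)
    return sij / math.sqrt(sii * sjj)

# Hypothetical 3 x 3 sample covariance matrix (variable 0 is the conditioning variable)
S = [[4.0, 2.0, 2.0],
     [2.0, 3.0, 2.0],
     [2.0, 2.0, 5.0]]
r_12_given_0 = partial_corr_given_first(S, 1, 2)
print(r_12_given_0)
```

Conditioning on more than one variable would replace the scalar division by a matrix inverse of the conditioning block.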
13
Test for zero partial correlation
(conditional independence between two variables
given a set of p independent variables)
The test statistic
$t = \dfrac{r_{ij\cdot x_1,\dots,x_p}\sqrt{n-p-2}}{\sqrt{1-r_{ij\cdot x_1,\dots,x_p}^{2}}}$
where $r_{ij\cdot x_1,\dots,x_p}$ is the partial correlation between yi and yj given
x1, ..., xp.
If independence is true then the test statistic t
will have a t-distribution with $\nu = n - p - 2$
degrees of freedom.
The test is to reject independence if
$|t| > t_{\alpha/2}(n-p-2)$
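A short sketch of the statistic for this test, assuming the form t = r sqrt(n - p - 2) / sqrt(1 - r^2) with r the sample partial correlation (the formula itself is missing from the transcript; the degrees of freedom n - p - 2 are as stated on the slide). The function name and the example values are hypothetical.

```python
import math

def partial_correlation_t(r_partial, n, p):
    """Return (t, df) for H0: zero partial correlation given p variables."""
    df = n - p - 2
    if df <= 0:
        raise ValueError("need n > p + 2 observations")
    t = r_partial * math.sqrt(df) / math.sqrt(1 - r_partial ** 2)
    return t, df

# Hypothetical example: partial correlation 0.5, n = 30 cases, p = 3 conditioning variables
t, df = partial_correlation_t(0.5, n=30, p=3)
print(t, df)
```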
14
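Test for non-zero partial correlation ($H_0\colon \rho_{ij\cdot x_1,\dots,x_p} = \rho_0$)
The test statistic
$z = \sqrt{n-p-3}\left[\tfrac{1}{2}\ln\dfrac{1+r_{ij\cdot x_1,\dots,x_p}}{1-r_{ij\cdot x_1,\dots,x_p}} - \tfrac{1}{2}\ln\dfrac{1+\rho_0}{1-\rho_0}\right]$
If H0 is true the test statistic z will have
approximately a Standard Normal distribution.
We then reject H0 if
$|z| > z_{\alpha/2}$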
15
The Multiple Correlation Coefficient
Testing independence between a single variable
and a group of variables
16
Definition
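$\begin{bmatrix} y\\ \vec{x}\end{bmatrix}$ has (p + 1)-variate Normal distribution
with mean vector $\begin{bmatrix}\mu_y\\ \vec{\mu}_x\end{bmatrix}$
and covariance matrix $\begin{bmatrix}\sigma_{yy} & \vec{\sigma}_{xy}'\\ \vec{\sigma}_{xy} & \Sigma_{xx}\end{bmatrix}$
We are interested in whether the variable y is
independent of the vector $\vec{x}$.
The multiple correlation coefficient is the
maximum correlation between y and a linear
combination of the components of $\vec{x}$.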
17
Derivation
The vector $\begin{bmatrix} y\\ \vec{a}'\vec{x}\end{bmatrix}$ has a bivariate Normal distribution
with mean vector $\begin{bmatrix}\mu_y\\ \vec{a}'\vec{\mu}_x\end{bmatrix}$
and covariance matrix $\begin{bmatrix}\sigma_{yy} & \vec{a}'\vec{\sigma}_{xy}\\ \vec{a}'\vec{\sigma}_{xy} & \vec{a}'\Sigma_{xx}\vec{a}\end{bmatrix}$
18
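The multiple correlation coefficient is the
maximum correlation between y and $\vec{a}'\vec{x}$.
The correlation between y and $\vec{a}'\vec{x}$ is
$\rho(\vec{a}) = \dfrac{\vec{a}'\vec{\sigma}_{xy}}{\sqrt{\sigma_{yy}\,\vec{a}'\Sigma_{xx}\vec{a}}}$
Thus we want to choose $\vec{a}$ to maximize $\rho(\vec{a})$.
Equivalently, maximize
$\rho^{2}(\vec{a}) = \dfrac{(\vec{a}'\vec{\sigma}_{xy})^{2}}{\sigma_{yy}\,\vec{a}'\Sigma_{xx}\vec{a}}$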
19
Note
20
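The multiple correlation coefficient
$\rho_{y\cdot x} = \sqrt{\dfrac{\vec{\sigma}_{xy}'\Sigma_{xx}^{-1}\vec{\sigma}_{xy}}{\sigma_{yy}}}$
is independent of the value of k, the arbitrary positive scale factor in the maximizing vector $\vec{a} = k\,\Sigma_{xx}^{-1}\vec{\sigma}_{xy}$.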
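A numerical check can illustrate this. The sketch below assumes (as in the standard derivation, not shown in the transcript) that the correlation between y and a'x is a'sigma_xy / sqrt(sigma_yy a'Sigma_xx a) and that the maximizing vector is a = k Sigma_xx^{-1} sigma_xy for k > 0. The covariance values are invented and the code handles only a two-component x.

```python
import math

def corr_with_combination(a, sigma_yy, sigma_xy, Sigma_xx):
    """Correlation between y and a'x for a two-component x."""
    num = a[0] * sigma_xy[0] + a[1] * sigma_xy[1]
    quad = (a[0] * (Sigma_xx[0][0] * a[0] + Sigma_xx[0][1] * a[1])
            + a[1] * (Sigma_xx[1][0] * a[0] + Sigma_xx[1][1] * a[1]))
    return num / math.sqrt(sigma_yy * quad)

def maximizing_direction(sigma_xy, Sigma_xx, k=1.0):
    """k * Sigma_xx^{-1} sigma_xy, using the explicit 2 x 2 inverse."""
    (p, q), (r, s) = Sigma_xx
    det = p * s - q * r
    return [k * (s * sigma_xy[0] - q * sigma_xy[1]) / det,
            k * (-r * sigma_xy[0] + p * sigma_xy[1]) / det]

# Hypothetical population quantities
sigma_yy = 4.0
sigma_xy = [2.0, 2.0]
Sigma_xx = [[3.0, 2.0], [2.0, 5.0]]

rho_k1 = corr_with_combination(maximizing_direction(sigma_xy, Sigma_xx, 1.0),
                               sigma_yy, sigma_xy, Sigma_xx)
rho_k7 = corr_with_combination(maximizing_direction(sigma_xy, Sigma_xx, 7.0),
                               sigma_yy, sigma_xy, Sigma_xx)
# Any other direction, here a = (1, 0), should not beat the maximizer
rho_other = corr_with_combination([1.0, 0.0], sigma_yy, sigma_xy, Sigma_xx)
print(rho_k1, rho_k7, rho_other)
```

Rescaling a by k multiplies the numerator and the denominator by the same factor, which is why the two values agree.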
21
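The sample multiple correlation coefficient
We are interested in whether the variable y is
independent of the vector $\vec{x}$.
Let $S = \begin{bmatrix}s_{yy} & \vec{s}_{xy}'\\ \vec{s}_{xy} & S_{xx}\end{bmatrix}$ denote the sample covariance matrix.
Then the sample multiple correlation coefficient
is
$r_{y\cdot x} = \sqrt{\dfrac{\vec{s}_{xy}'S_{xx}^{-1}\vec{s}_{xy}}{s_{yy}}}$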
22
Testing for independence between y and $\vec{x}$
The test statistic
$F = \dfrac{r_{y\cdot x}^{2}/p}{(1-r_{y\cdot x}^{2})/(n-p-1)}$
If independence is true then the test statistic F
will have an F-distribution with $\nu_1 = p$ degrees
of freedom in the numerator and $\nu_2 = n - p - 1$
degrees of freedom in the denominator.
The test is to reject independence if
$F > F_{\alpha}(p,\, n-p-1)$
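The sketch below strings the two steps together: it computes the squared sample multiple correlation as s_xy' S_xx^{-1} s_xy / s_yy and then the statistic F = (r^2 / p) / ((1 - r^2) / (n - p - 1)). Both formulas are the standard ones and are assumed here, since the transcript omits them. The helper `solve`, the other function names and the numbers are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def sample_multiple_r2(s_yy, s_xy, S_xx):
    """Squared sample multiple correlation: s_xy' S_xx^{-1} s_xy / s_yy."""
    w = solve(S_xx, s_xy)
    return sum(si * wi for si, wi in zip(s_xy, w)) / s_yy

def independence_f(r2, n, p):
    """Return (F, df1, df2) for testing independence of y and x."""
    df1, df2 = p, n - p - 1
    return (r2 / df1) / ((1 - r2) / df2), df1, df2

# Hypothetical sample quantities with p = 2 predictors and n = 25 cases
r2 = sample_multiple_r2(4.0, [2.0, 2.0], [[3.0, 2.0], [2.0, 5.0]])
F, df1, df2 = independence_f(r2, n=25, p=2)
print(r2, F, df1, df2)
```

The resulting F would be compared with the upper-alpha critical value of the F distribution with (df1, df2) degrees of freedom, taken from a table or a statistics package.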
23
Canonical Correlation Analysis
24
The problem

Quite often one has collected data on
several variables.

The variables are grouped into two (or more) sets
of variables and the researcher is interested in
whether one set of variables is independent of
the other set.
In addition if it is found that the two sets of
variates are dependent, it is then important to
describe and understand the nature of this
dependence.
The appropriate statistical procedure in this
case is called Canonical Correlation Analysis.
25
Canonical Correlation: An Example

In the following study the researcher was
interested in whether specific instructions on
how to relax when taking tests and how to
increase motivation would affect performance on
standardized achievement tests:

Reading,
Language and
Mathematics

A group of 65 third- and fourth-grade students
were rated after the instruction and immediately
prior to taking the Scholastic Achievement tests on

how relaxed they were (X1) and
how motivated they were (X2).

In addition, data were collected on the three
achievement tests

Reading (Y1),
Language (Y2) and
Mathematics (Y3).

The data were tabulated on the next page
27
(No Transcript)
28
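Definition (canonical variates and canonical
correlations)
Let $\vec{x} = \begin{bmatrix}\vec{x}_1\\ \vec{x}_2\end{bmatrix}$ have p-variate Normal distribution
with mean vector $\vec{\mu} = \begin{bmatrix}\vec{\mu}_1\\ \vec{\mu}_2\end{bmatrix}$
and covariance matrix $\Sigma = \begin{bmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix}$
Let
$U_1 = \vec{a}_1'\vec{x}_1$
and
$V_1 = \vec{b}_1'\vec{x}_2$
be such that U1 and V1 have achieved the maximum
correlation $\phi_1$.
Then U1 and V1 are called the first pair of
canonical variates and $\phi_1$ is called the first
canonical correlation coefficient.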
29
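Derivation (1st pair of canonical variates and
canonical correlation)
Now
$\begin{bmatrix}U_1\\ V_1\end{bmatrix} = \begin{bmatrix}\vec{a}'\vec{x}_1\\ \vec{b}'\vec{x}_2\end{bmatrix}$
has covariance matrix
$\begin{bmatrix}\vec{a}'\Sigma_{11}\vec{a} & \vec{a}'\Sigma_{12}\vec{b}\\ \vec{b}'\Sigma_{21}\vec{a} & \vec{b}'\Sigma_{22}\vec{b}\end{bmatrix}$
Thus
$\operatorname{Var}(U_1) = \vec{a}'\Sigma_{11}\vec{a},\quad \operatorname{Var}(V_1) = \vec{b}'\Sigma_{22}\vec{b},\quad \operatorname{Cov}(U_1, V_1) = \vec{a}'\Sigma_{12}\vec{b}$
30
Derivation (1st pair of canonical variates and
canonical correlation), continued
hence the correlation between U1 and V1 is
$\phi = \dfrac{\vec{a}'\Sigma_{12}\vec{b}}{\sqrt{\vec{a}'\Sigma_{11}\vec{a}}\,\sqrt{\vec{b}'\Sigma_{22}\vec{b}}}$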
31
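Thus we want to choose $\vec{a}$ and $\vec{b}$
so that
$\phi = \dfrac{\vec{a}'\Sigma_{12}\vec{b}}{\sqrt{\vec{a}'\Sigma_{11}\vec{a}}\,\sqrt{\vec{b}'\Sigma_{22}\vec{b}}}$
is at a maximum
or
$\phi^{2} = \dfrac{(\vec{a}'\Sigma_{12}\vec{b})^{2}}{(\vec{a}'\Sigma_{11}\vec{a})(\vec{b}'\Sigma_{22}\vec{b})}$
is at a maximum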
Let
32
Computing derivatives
and
33
Thus
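This shows that
$\vec{a}$ is an eigenvector of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$,
k is the largest eigenvalue of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$
and
$\vec{a}$ is the eigenvector associated with the largest
eigenvalue.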
34
Also
and
35
Summary
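The first pair of canonical variates
$U_1 = \vec{a}_1'\vec{x}_1$, $V_1 = \vec{b}_1'\vec{x}_2$
are found by finding $\vec{a}_1$, $\vec{b}_1$, eigenvectors of the matrices
$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ (respectively),
associated with the largest eigenvalue (same for
both matrices).
The largest eigenvalue of the two matrices is the
square of the first canonical correlation
coefficient $\phi_1$.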
36
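Note
$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$
have exactly the same non-zero eigenvalues.
Proof
If $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\,\vec{a} = \lambda\,\vec{a}$ with $\lambda \neq 0$,
then premultiplying by $\Sigma_{22}^{-1}\Sigma_{21}$ gives
$\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\,(\Sigma_{22}^{-1}\Sigma_{21}\vec{a}) = \lambda\,(\Sigma_{22}^{-1}\Sigma_{21}\vec{a})$
and so $\vec{b} = \Sigma_{22}^{-1}\Sigma_{21}\vec{a}$ is an eigenvector of $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ with the same eigenvalue $\lambda$.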
37
The remaining canonical variates and canonical
correlation coefficients
The second pair of canonical variates
$U_2 = \vec{a}_2'\vec{x}_1$, $V_2 = \vec{b}_2'\vec{x}_2$
are found by finding $\vec{a}_2$, $\vec{b}_2$ so that
1. (U2,V2) are independent of (U1,V1).
2. The correlation between U2 and V2 is maximized.
The correlation, $\phi_2$, between U2 and V2 is called
the second canonical correlation coefficient.
38
The ith pair of canonical variates
$U_i = \vec{a}_i'\vec{x}_1$, $V_i = \vec{b}_i'\vec{x}_2$
are found by finding $\vec{a}_i$, $\vec{b}_i$ so that
1. (Ui,Vi) are independent of (U1,V1), ...,
(Ui-1,Vi-1).
2. The correlation between Ui and Vi is maximized.
The correlation, $\phi_i$, between Ui and Vi is called
the ith canonical correlation coefficient.
39
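Derivation (2nd pair of canonical variates and
canonical correlation)
Now
$(U_1, V_1, U_2, V_2)' = (\vec{a}_1'\vec{x}_1,\ \vec{b}_1'\vec{x}_2,\ \vec{a}_2'\vec{x}_1,\ \vec{b}_2'\vec{x}_2)'$
has covariance matrix
$\begin{bmatrix}
\vec{a}_1'\Sigma_{11}\vec{a}_1 & \vec{a}_1'\Sigma_{12}\vec{b}_1 & \vec{a}_1'\Sigma_{11}\vec{a}_2 & \vec{a}_1'\Sigma_{12}\vec{b}_2\\
\vec{b}_1'\Sigma_{21}\vec{a}_1 & \vec{b}_1'\Sigma_{22}\vec{b}_1 & \vec{b}_1'\Sigma_{21}\vec{a}_2 & \vec{b}_1'\Sigma_{22}\vec{b}_2\\
\vec{a}_2'\Sigma_{11}\vec{a}_1 & \vec{a}_2'\Sigma_{12}\vec{b}_1 & \vec{a}_2'\Sigma_{11}\vec{a}_2 & \vec{a}_2'\Sigma_{12}\vec{b}_2\\
\vec{b}_2'\Sigma_{21}\vec{a}_1 & \vec{b}_2'\Sigma_{22}\vec{b}_1 & \vec{b}_2'\Sigma_{21}\vec{a}_2 & \vec{b}_2'\Sigma_{22}\vec{b}_2
\end{bmatrix}$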
40
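Now independence of (U2,V2) and (U1,V1) requires
$\vec{a}_1'\Sigma_{11}\vec{a}_2 = 0,\quad \vec{a}_1'\Sigma_{12}\vec{b}_2 = 0,\quad \vec{b}_1'\Sigma_{21}\vec{a}_2 = 0,\quad \vec{b}_1'\Sigma_{22}\vec{b}_2 = 0$
and maximizing
$\phi_2 = \dfrac{\vec{a}_2'\Sigma_{12}\vec{b}_2}{\sqrt{\vec{a}_2'\Sigma_{11}\vec{a}_2}\,\sqrt{\vec{b}_2'\Sigma_{22}\vec{b}_2}}$
is equivalent to maximizing
$\vec{a}_2'\Sigma_{12}\vec{b}_2$
subject to
$\vec{a}_2'\Sigma_{11}\vec{a}_2 = 1,\quad \vec{b}_2'\Sigma_{22}\vec{b}_2 = 1$ and the four zero-covariance conditions.
Using the Lagrange multiplier technique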
41
Now
and
also
gives the restrictions
42
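These equations can be used to show that $\vec{a}_2$ and $\vec{b}_2$
are eigenvectors of the matrices
$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ (respectively),
associated with the 2nd largest eigenvalue (same
for both matrices).
The 2nd largest eigenvalue of the two matrices
is the square of the 2nd canonical correlation
coefficient $\phi_2$.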
43
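Continuing
Coefficients for the ith pair of canonical
variates, $\vec{a}_i$ and $\vec{b}_i$,
are eigenvectors of the matrices
$\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ (respectively),
associated with the ith largest eigenvalue (same
for both matrices).
The ith largest eigenvalue of the two matrices
is the square of the ith canonical correlation
coefficient $\phi_i$.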
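The eigenvalue characterization can be sketched in a few lines for the special case where both sets contain two variables. The code assumes the matrix in question is Sigma11^{-1} Sigma12 Sigma22^{-1} Sigma21 (the standard choice, since the transcript omits the matrix itself) and takes square roots of its eigenvalues. The helper names and the covariance blocks are invented for illustration. A general implementation would use a linear algebra library rather than these 2 x 2 helpers.

```python
import math

def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(p, q):
    """Product of two 2 x 2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose2(m):
    """Transpose of a 2 x 2 matrix."""
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

def eig2(m):
    """Eigenvalues of a 2 x 2 matrix with real eigenvalues, largest first."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return [(tr + disc) / 2, (tr - disc) / 2]

def canonical_correlations(s11, s12, s22):
    """Square roots of the eigenvalues of S11^{-1} S12 S22^{-1} S21."""
    s21 = transpose2(s12)
    m = mul2(mul2(inv2(s11), s12), mul2(inv2(s22), s21))
    return [math.sqrt(max(lam, 0.0)) for lam in eig2(m)]

# Hypothetical covariance blocks: each set is standardized and uncorrelated within itself
identity = [[1.0, 0.0], [0.0, 1.0]]
cross = [[0.6, 0.0], [0.0, 0.3]]
phis = canonical_correlations(identity, cross, identity)
print(phis)
```

In this toy case the cross-covariance block is diagonal, so the canonical correlations are simply the absolute values of its diagonal entries.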
44
Example