Title: Factor Analysis
Factor Analysis
- An alternative technique for studying correlation
and covariance structure
Let $\vec{x} = (x_1, x_2, \dots, x_p)'$
have a p-variate Normal distribution
with mean vector $\vec{\mu} = (\mu_1, \mu_2, \dots, \mu_p)'$.
The Factor Analysis Model
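Let $F_1, F_2, \dots, F_k$ denote independent standard
normal observations (the factors).
Let $e_1, e_2, \dots, e_p$ denote independent normal
random variables with mean 0 and $\operatorname{var}(e_i) = \psi_i$.
Suppose that there exist constants $\ell_{ij}$ (the
loadings) such that
$$
\begin{aligned}
x_1 &= \mu_1 + \ell_{11}F_1 + \ell_{12}F_2 + \cdots + \ell_{1k}F_k + e_1 \\
x_2 &= \mu_2 + \ell_{21}F_1 + \ell_{22}F_2 + \cdots + \ell_{2k}F_k + e_2 \\
&\;\;\vdots \\
x_p &= \mu_p + \ell_{p1}F_1 + \ell_{p2}F_2 + \cdots + \ell_{pk}F_k + e_p
\end{aligned}
$$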
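A minimal simulation sketch of the model above, assuming NumPy is available. The means, loadings, specific variances, and sample size are hypothetical values chosen for illustration (they are not from the slides). The sketch draws the factors and the specific terms, builds each observation as mean plus loaded factors plus specific term, and compares the sample covariance with the covariance the model implies, $LL' + \operatorname{diag}(\psi_1, \dots, \psi_p)$.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical example values: p = 4 variables, k = 2 factors.
mu = np.array([1.0, 2.0, 3.0, 4.0])             # mean vector
L = np.array([[0.9, 0.0],
              [0.8, 0.3],
              [0.2, 0.7],
              [0.0, 0.6]])                       # loadings l_ij
psi = np.array([0.19, 0.27, 0.47, 0.64])        # specific variances var(e_i)

n = 50_000
F = rng.standard_normal((n, 2))                  # common factors, independent N(0, 1)
e = rng.standard_normal((n, 4)) * np.sqrt(psi)   # specific terms, independent N(0, psi_i)
X = mu + F @ L.T + e                             # row r holds one draw of (x_1, ..., x_p)

implied_cov = L @ L.T + np.diag(psi)             # covariance implied by the model
sample_cov = np.cov(X, rowvar=False)             # covariance estimated from the draws
max_gap = np.abs(sample_cov - implied_cov).max()
print("largest absolute gap:", max_gap)
```

With a large simulated sample the gap should be small, which is a quick sanity check that the loadings and specific variances fully determine the covariance structure.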
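Factor Analysis Model
$$\vec{x} = \vec{\mu} + L\vec{F} + \vec{e}$$
where $L = (\ell_{ij})$ is the $p \times k$ matrix of loadings, $\vec{F} = (F_1, \dots, F_k)'$
and $\vec{e} = (e_1, \dots, e_p)'$.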
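Note
$$\operatorname{var}(x_i) = \ell_{i1}^2 + \ell_{i2}^2 + \cdots + \ell_{ik}^2 + \psi_i$$
hence the variance of $x_i$ splits into
$$h_i^2 = \ell_{i1}^2 + \ell_{i2}^2 + \cdots + \ell_{ik}^2 \quad \text{(the communality)}$$
i.e. the component of variance of $x_i$ that is due
to the common factors $F_1, F_2, \dots, F_k$,
and
$$\psi_i \quad \text{(the specific variance)}$$
i.e. the component of variance of $x_i$ that is
specific only to that observation.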
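To make the two variance components concrete, here is a small sketch, assuming NumPy and a hypothetical loading matrix and set of specific variances (illustrative numbers, not from the slides). For each variable it sums the squared loadings in its row to get the part due to the common factors, then adds the specific variance to get the total.

```python
import numpy as np

# Hypothetical loadings (rows = variables, columns = factors) and specific variances.
L = np.array([[0.9, 0.0],
              [0.8, 0.3],
              [0.2, 0.7],
              [0.0, 0.6]])
psi = np.array([0.19, 0.27, 0.47, 0.64])

common_part = (L ** 2).sum(axis=1)        # variance of x_i due to the common factors
total_var = common_part + psi             # var(x_i) = common part + specific part
share_common = common_part / total_var    # fraction of var(x_i) explained by the factors

for i, (c, s, t) in enumerate(zip(common_part, psi, total_var), start=1):
    print(f"x{i}: common = {c:.2f}, specific = {s:.2f}, total = {t:.2f}")
```

In this made-up example every variable has total variance 1, so the common part can be read directly as the proportion of variance explained by the factors.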
$F_1, F_2, \dots, F_k$ are called the common factors.
$e_1, e_2, \dots, e_p$ are called the specific factors.
$\ell_{ij} = \operatorname{cov}(x_i, F_j)$, which is also the correlation between $x_i$ and $F_j$
if $\operatorname{var}(x_i) = 1$.
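Rotating Factors
Recall the factor analysis model
$$\vec{x} = \vec{\mu} + L\vec{F} + \vec{e}, \qquad L = (\ell_{ij}) \text{ the } p \times k \text{ matrix of loadings}$$
This gives rise to the vector $\vec{x}$
having covariance
matrix
$$\Sigma = LL' + \Psi, \qquad \Psi = \operatorname{diag}(\psi_1, \dots, \psi_p)$$
Let $P$ be any orthogonal matrix, then
$$\vec{x} = \vec{\mu} + (LP)(P'\vec{F}) + \vec{e} = \vec{\mu} + L^{*}\vec{F}^{*} + \vec{e}$$
and
$$\Sigma = LPP'L' + \Psi = L^{*}L^{*\prime} + \Psi$$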
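Hence if
$$\vec{x} = \vec{\mu} + L\vec{F} + \vec{e}$$
with
$$\Sigma = LL' + \Psi$$
is a factor analysis model
then so also is
$$\vec{x} = \vec{\mu} + L^{*}\vec{F}^{*} + \vec{e}$$
with
$$L^{*} = LP, \qquad \vec{F}^{*} = P'\vec{F}, \qquad \Sigma = L^{*}L^{*\prime} + \Psi$$
where $P$ is any orthogonal matrix.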
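The invariance under an orthogonal matrix can be checked numerically. The sketch below assumes NumPy, a hypothetical loading matrix, and a 30-degree plane rotation as the orthogonal matrix (all illustrative choices). It verifies that the rotated loadings reproduce the same common-factor covariance and the same row sums of squared loadings, even though the individual loadings change.

```python
import numpy as np

# Hypothetical loadings: 4 variables, 2 factors.
L = np.array([[0.9, 0.0],
              [0.8, 0.3],
              [0.2, 0.7],
              [0.0, 0.6]])

theta = np.deg2rad(30.0)
P = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # a 2 x 2 rotation, so P P' = I

L_star = L @ P                                    # loadings after rotation

is_orthogonal = np.allclose(P @ P.T, np.eye(2))
same_common_cov = np.allclose(L @ L.T, L_star @ L_star.T)
same_row_sums = np.allclose((L ** 2).sum(axis=1), (L_star ** 2).sum(axis=1))
print(is_orthogonal, same_common_cov, same_row_sums)
```

Because the data cannot distinguish the original loadings from the rotated ones, the choice among them has to be made on other grounds, such as interpretability.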
Rotating the Factors
The process of exploring other models through
orthogonal transformations of the factors is
called rotating the factors.
There are many techniques for rotating the factors:
- VARIMAX
- Quartimax
- Equimax
VARIMAX rotation attempts to have each individual
variable load highly on only a subset of the factors.
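One way to see what VARIMAX does is the sketch below. It assumes NumPy and uses a commonly seen SVD-based iteration that increases the raw varimax criterion (the sum, over factors, of the variance of the squared loadings); this is one possible implementation, not necessarily the one used for these slides. The function names `varimax` and `varimax_criterion` and the example loadings are hypothetical.

```python
import numpy as np

def varimax_criterion(L):
    """Raw varimax criterion: sum over factors of the variance of squared loadings."""
    return float((L ** 2).var(axis=0).sum())

def varimax(L, max_iter=100, tol=1e-8):
    """Return (rotated loadings, orthogonal matrix R) with rotated = L @ R."""
    p, k = L.shape
    R = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        LR = L @ R
        # Proportional to the gradient of the criterion at the current rotation.
        G = LR ** 3 - LR * ((LR ** 2).sum(axis=0) / p)
        U, s, Vt = np.linalg.svd(L.T @ G)
        R = U @ Vt                       # orthogonal polar factor of L'G
        current = s.sum()
        if previous != 0.0 and current / previous < 1.0 + tol:
            break                        # negligible improvement: stop iterating
        previous = current
    return L @ R, R

# Hypothetical example: clean loadings blurred by a 30-degree rotation.
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.6]])
theta = np.deg2rad(30.0)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
mixed = simple @ mix

rotated, R = varimax(mixed)
print("criterion before:", varimax_criterion(mixed))
print("criterion after: ", varimax_criterion(rotated))
print(np.round(rotated, 3))
```

The rotated loadings may come back with columns swapped or signs flipped relative to the clean pattern; both are still valid rotations and describe the same model.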
Extracting the Factors
Several methods exist; we consider two:
- Principal Component Method
- Maximum Likelihood Method
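Principal Component Method
Recall
$$\Sigma = \lambda_1\vec{a}_1\vec{a}_1' + \lambda_2\vec{a}_2\vec{a}_2' + \cdots + \lambda_p\vec{a}_p\vec{a}_p'$$
where
$\vec{a}_1, \vec{a}_2, \dots, \vec{a}_p$ are eigenvectors of $\Sigma$ of length 1 and
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$ are eigenvalues of $\Sigma$.
Hence
$$\Sigma = \left[\sqrt{\lambda_1}\,\vec{a}_1, \dots, \sqrt{\lambda_p}\,\vec{a}_p\right]\left[\sqrt{\lambda_1}\,\vec{a}_1, \dots, \sqrt{\lambda_p}\,\vec{a}_p\right]' = LL'$$
Thus
$$L = \left[\sqrt{\lambda_1}\,\vec{a}_1, \dots, \sqrt{\lambda_p}\,\vec{a}_p\right], \qquad \Psi = 0$$
This is the Principal Component Solution with p
factors.
Note: The specific variances, $\psi_i$, are all zero.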
The objective in Factor Analysis is to explain
the correlation structure in the data vector with
as few factors as necessary.
It may happen that the later (smallest) eigenvalues of $\Sigma$
are small.
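In addition let
$$\psi_i = \sigma_{ii} - \sum_{j=1}^{k} \ell_{ij}^2, \qquad i = 1, \dots, p$$
In this case
$$\Sigma \approx LL' + \Psi$$
where
$$L = (\ell_{ij}) = \left[\sqrt{\lambda_1}\,\vec{a}_1, \dots, \sqrt{\lambda_k}\,\vec{a}_k\right], \qquad \Psi = \operatorname{diag}(\psi_1, \dots, \psi_p)$$
with $\lambda_1 \ge \cdots \ge \lambda_k$ the $k$ largest eigenvalues of $\Sigma$ and $\vec{a}_1, \dots, \vec{a}_k$ the corresponding eigenvectors of length 1.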
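The principal component extraction can be sketched in a few lines, assuming NumPy. The sketch takes the loadings to be the k leading unit-length eigenvectors of the covariance matrix, each scaled by the square root of its eigenvalue, and sets each specific variance to what is left on the diagonal. The function name `principal_component_factors` and the example matrix are hypothetical.

```python
import numpy as np

def principal_component_factors(S, k):
    """Return (loadings, specific variances) for a k-factor principal component solution."""
    eigvals, eigvecs = np.linalg.eigh(S)        # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest eigenvalues
    L = eigvecs[:, order] * np.sqrt(eigvals[order])   # column j = sqrt(lambda_j) * a_j
    psi = np.diag(S) - (L ** 2).sum(axis=1)     # leftover variance on the diagonal
    return L, psi

# Hypothetical covariance matrix built from made-up loadings and specific variances.
true_L = np.array([[0.9, 0.0], [0.8, 0.3], [0.2, 0.7], [0.0, 0.6]])
S = true_L @ true_L.T + np.diag([0.19, 0.27, 0.47, 0.64])

L_full, psi_full = principal_component_factors(S, k=4)   # p factors: exact, psi = 0
L_two, psi_two = principal_component_factors(S, k=2)     # 2 factors: approximation
print(np.round(psi_full, 12))
print(np.round(L_two, 3), np.round(psi_two, 3))
```

With k equal to the number of variables the covariance matrix is reproduced exactly and the specific variances vanish; with fewer factors only the diagonal is reproduced exactly and the off-diagonal entries are approximated.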
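Maximum Likelihood Estimation
Let $\vec{x}_1, \vec{x}_2, \dots, \vec{x}_n$
denote a sample from $N_p(\vec{\mu}, \Sigma)$
where $\Sigma = LL' + \Psi$.
The joint density of $\vec{x}_1, \vec{x}_2, \dots, \vec{x}_n$
is
$$f(\vec{x}_1, \dots, \vec{x}_n) = (2\pi)^{-np/2}\,|\Sigma|^{-n/2}\exp\left\{-\tfrac{1}{2}\sum_{i=1}^{n}(\vec{x}_i - \vec{\mu})'\Sigma^{-1}(\vec{x}_i - \vec{\mu})\right\}$$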
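The Likelihood function is
$$\mathcal{L}(\vec{\mu}, L, \Psi) = (2\pi)^{-np/2}\,|LL' + \Psi|^{-n/2}\exp\left\{-\tfrac{1}{2}\sum_{i=1}^{n}(\vec{x}_i - \vec{\mu})'(LL' + \Psi)^{-1}(\vec{x}_i - \vec{\mu})\right\}$$
The maximum likelihood estimates $\hat{\vec{\mu}}, \hat{L}, \hat{\Psi}$
are obtained by numerical maximization of $\mathcal{L}(\vec{\mu}, L, \Psi)$.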
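A numerical maximizer needs a function that evaluates the likelihood for candidate parameter values. The sketch below, assuming NumPy, evaluates the log of the multivariate normal likelihood with covariance $LL' + \Psi$, where $\Psi$ is diagonal; working on the log scale avoids numerical underflow and has the same maximizer. The function name `factor_log_likelihood` and the tiny data set are hypothetical, and the optimizer itself is left out.

```python
import numpy as np

def factor_log_likelihood(X, mu, L, psi):
    """Log-likelihood of the rows of X under N_p(mu, L L' + diag(psi))."""
    n, p = X.shape
    Sigma = L @ L.T + np.diag(psi)
    sign, logdet = np.linalg.slogdet(Sigma)      # stable log-determinant
    if sign <= 0:
        raise ValueError("L L' + diag(psi) must be positive definite")
    centered = X - mu
    # Sum over observations of (x - mu)' Sigma^{-1} (x - mu), without forming the inverse.
    quad = np.sum(centered * np.linalg.solve(Sigma, centered.T).T)
    return -0.5 * (n * p * np.log(2.0 * np.pi) + n * logdet + quad)

# Tiny hypothetical check with p = 1, k = 1, so Sigma = 1*1 + 1 = 2.
X = np.array([[0.0], [1.0]])
value = factor_log_likelihood(X, mu=np.array([0.0]),
                              L=np.array([[1.0]]), psi=np.array([1.0]))
print(value)
```

In practice the negative of this function, viewed as a function of the entries of the loading matrix and the specific variances, would be handed to a general-purpose numerical optimizer, with the specific variances constrained to stay positive.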
Example: Olympic Decathlon Scores
Data was collected for n = 160 starts (139
athletes) for the ten decathlon events (100-m
run, Long Jump, Shot Put, High Jump, 400-m run,
110-m hurdles, Discus, Pole Vault, Javelin,
1500-m run). The sample correlation matrix is
given on the next slide.
Correlation Matrix
Identification of the factors