Title: Exploratory Factor Analysis
1 Chapter 15
- Exploratory Factor Analysis
- Field (2005)
2 What is factor analysis?
- Factor analysis (and principal component analysis, which you know already from MANOVA) is a technique for identifying groups or clusters of variables underlying a set of measures.
- Those variables are called 'factors' or 'latent variables', since they are not directly observable, e.g., 'intelligence'.
- A 'latent variable' is a variable that cannot be directly measured, but is assumed to be related to several variables that can be measured. (Glossary, p. 736)
3 What is factor analysis used for?
- Factor analysis has 3 main uses:
- To understand the structure of a set of variables, e.g., intelligence
- To construct a questionnaire to measure an underlying variable
- To reduce a large data set to a more manageable size
4 Where is factor analysis used?
- Factor analysis is popular in the social sciences, economics, and psychology.
- In personality psychology: finding personality 'traits' such as 'extraversion-introversion', 'neuroticism', etc. Questionnaires are typically based on such factors.
- In economics: finding latent variables underlying productivity, profits, and workforce.
5 The most basic data basis: the R-matrix
- An R-matrix is simply a correlation matrix with Pearson r-coefficients between pairs of variables as the off-diagonal elements.
- In factor analysis, one tries to find latent variables that underlie clusters of correlations in such an R-matrix.
6 Example: What makes a person popular?
These measures all tap different aspects of the 'popularity' of a person. Are there a few underlying factors that can account for them?
Factor 1: sociability
Factor 2: consideration to others
7 Graphical representations of factors
- Factors can be visualized as axes along which we can plot variables.
- The coordinates of a variable along each axis represent the strength of the relationship between that variable and each factor. In our example, we have 2 underlying factors.
- The axis line ranges from -1 to 1, which is the range of possible correlations r.
- The position of a variable depends on its correlation coefficients with the 2 factors.
8 2-D factor plot
[Figure: variables plotted in the plane spanned by the two factor axes, 'Sociability' (horizontal) and 'Consideration' (vertical), each running from -1 to 1. 'Talk 1', 'Soc Skills', and 'Interest' lie close to the Sociability axis; 'Talk 2', 'Selfish', and 'Liar' lie close to the Consideration axis.]
The coordinate of a variable along a classification axis is called its 'factor loading'. It is the Pearson correlation r between a factor and a variable.
In this 2-dimensional factor plot, there are only 2 latent variables. Variables load high either on 'Sociability' or on 'Consideration to others'. With 3 factors, we would need a 3-D factor plot; with more than 3 factors, no graphical factor plots are available any more.
9 Mathematical representation of factors
- The factors can be represented by a linear equation:
- Yi = b1X1 + b2X2 + ... + bnXn + εi
- Factori = b1 Variable1 + b2 Variable2 + ... + bn Variablen + εi
- Note: b0, the intercept, is missing, since the axes intersect at 0. The b's represent the factor loadings.
10 Mathematical representation of factors
- Describing the factors in terms of the underlying variables:
- Yi = b1X1 + b2X2 + ... + bnXn + εi
- Sociabilityi = b1 Talk1i + b2 SocSkillsi + b3 Interesti + b4 Talk2i + b5 Selfishi + b6 Liari + εi
- Considerationi = b1 Talk1i + b2 SocSkillsi + b3 Interesti + b4 Talk2i + b5 Selfishi + b6 Liari + εi
- Note: Both equations are identical in form. They include all variables measured. The values of b will differ, though.
11 Mathematical representation of factors
- Yi = b1X1 + b2X2 + ... + bnXn + εi
- In the 2 equations for 'Sociability' and 'Consideration', we can substitute the factor loadings for the b-coefficients:
- Sociabilityi = .87 Talk1i + .96 SocSkillsi + .92 Interesti + .00 Talk2i - .10 Selfishi + .09 Liari + εi
- Considerationi = .01 Talk1i - .03 SocSkillsi + .04 Interesti + .82 Talk2i + .75 Selfishi + .70 Liari + εi
- Note: The first 3 variables load high on 'Sociability', the last 3 on 'Consideration'. The b's are, however, NOT the correlations from the R-matrix.
12 Factor matrix or component matrix
- The factor loadings (b-values) can be arranged in a matrix A, called the 'factor matrix' or 'component matrix'.
- Each column represents one factor.
- Each row represents one variable.
- Ideally, variables that load high on one factor should load low on the other.
13 Factors: statistical or real-world phenomena?
- It is a matter of debate whether factors are real-world phenomena, e.g., traits, abilities, social classes, etc., or whether they have only statistical reality.
- We will learn the statistics of factor analysis here. We should keep in mind that a factor may be nothing more than a statistical correlation; if we find one, it depends on our ingenuity to give it a proper interpretation.
14 Pattern matrix vs. structure matrix
- Generally, factor loadings inform us about the relative contribution a variable makes to a factor.
- But what are factor loadings, exactly? Are they correlations between a variable and a factor, or are they regression coefficients (b)?
- When factors are unrelated, they can be rotated orthogonally, and the correlation coefficient is the same as the regression coefficient.
- When factors are underlyingly related, they can be rotated obliquely. Then there are two kinds of factor loadings: the correlation coefficients between each variable and each factor (in the factor structure matrix) and the regression coefficients (b) for each variable on each factor (in the factor pattern matrix). The two coefficients have different interpretations.
15 Factor scores
- Knowing the scores of an individual on the variables, we can also calculate the factor scores for that person.
- A simplistic method for that is the weighted average: the scores of subject i are weighted by the regression coefficients of the regression equations.
- Sociabilityi = .87 Talk1i + .96 SocSkillsi + .92 Interesti + .00 Talk2i - .10 Selfishi + .09 Liari + εi
- Sociabilityi = (.87 x 4) + (.96 x 9) + (.92 x 8) + (.00 x 6) + (-.10 x 8) + (.09 x 6) = 19.22
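The weighted-average computation above is just a sum of products; a small Python sketch reproducing the slide's numbers:

```python
# Loadings for 'Sociability' (Talk1, SocSkills, Interest, Talk2, Selfish, Liar)
# and the raw scores of subject i, both taken from the slide.
loadings = [0.87, 0.96, 0.92, 0.00, -0.10, 0.09]
raw_scores = [4, 9, 8, 6, 8, 6]

# The weighted-average factor score is the sum of loading x score.
sociability = sum(b * x for b, x in zip(loadings, raw_scores))
print(round(sociability, 2))  # 19.22, as on the slide
```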
16 Factor scores
- Knowing the scores of an individual on the variables, we can also calculate the factor scores for that person.
- A simplistic method for that is the weighted average: the scores of subject i are weighted by the regression coefficients of the regression equations.
- Considerationi = .01 Talk1i - .03 SocSkillsi + .04 Interesti + .82 Talk2i + .75 Selfishi + .70 Liari + εi
- Considerationi = (.01 x 4) + (-.03 x 9) + (.04 x 8) + (.82 x 6) + (.75 x 8) + (.70 x 6) = 15.21
17 The problem with weighted averages
- When variables are measured on different measurement scales, the resulting factor scores cannot be compared.
- Therefore, a better method is the regression method, which takes into account the initial correlations between the variables.
18 The regression method
- In the regression method, the regression coefficients b are replaced by factor score coefficients.
- In order to obtain the factor score coefficient matrix B, the factor matrix A is 'divided' by the R-matrix of the original correlations.
- Matrices are divided by multiplying by the inverse, R⁻¹.
19 Obtaining the factor score coefficient matrix B with the regression method: B = R⁻¹A
[Figure: the inverse correlation matrix R⁻¹ is multiplied by the factor loading matrix A, yielding B with one column of coefficients per factor (Factor 1, Factor 2).]
- Note: The factor score coefficients in the factor score coefficient matrix B preserve the pattern of the original factor loadings.
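A minimal NumPy sketch of B = R⁻¹A, using small hypothetical R and A matrices (not the slides' actual values):

```python
import numpy as np

# Hypothetical 3-variable, 2-factor example (not the slides' actual matrices).
R = np.array([[1.0, 0.6, 0.3],   # correlation matrix of the variables
              [0.6, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
A = np.array([[0.9, 0.1],        # factor loading matrix (variables x factors)
              [0.8, 0.2],
              [0.2, 0.7]])

# B = R^-1 A; numerically it is preferable to solve R B = A than to invert R.
B = np.linalg.solve(R, A)
print(np.round(B, 3))  # one column of factor score coefficients per factor
```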
20 Factor score coefficients in the regression equation
- The factor score coefficients (matrix B) are inserted into the regression equation instead of the b-regression coefficients:
- Sociability = .343 Talk1 + .376 SocSkills + .362 Interest + .000 Talk2 - .037 Selfish + .039 Liar
- Sociability = (.343 x 4) + (.376 x 9) + (.362 x 8) + (.000 x 6) + (-.037 x 8) + (.039 x 6) = 7.59
- This is the factor score for subject i on 'Sociability'.
21 Factor score coefficients in the regression equation
- The factor score coefficients (matrix B) are inserted into the regression equation instead of the b-regression coefficients:
- Consideration = .006 Talk1 - .020 SocSkills + .020 Interest + .473 Talk2 + .473 Selfish + .405 Liar
- Consideration = (.006 x 4) - (.020 x 9) + (.020 x 8) + (.473 x 6) + (.473 x 8) + (.405 x 6) = 8.768
- This is the factor score for subject i on 'Consideration'.
22 Comparison
- Factor score coefficients (regression method):
- Sociabilityi = 7.59
- Considerationi = 8.768
- The factor scores of subject i on the two factors 'Sociability' and 'Consideration' are very similar.
- Factor scores obtained by the regression method have a mean of 0 and a variance equal to the squared multiple correlation between the estimated factor scores and the true factor values.
- Weighted average:
- Sociabilityi = 19.22
- Considerationi = 15.21
- The factor scores of subject i on the two factors are quite different: sociability is higher than consideration.
→ The regression method provides more accurate factor scores than the weighted average method does. However, the scores can correlate not only with the factor under consideration but also with other factors.
23 Other methods that adjust the regression method
- Bartlett method
- produces unbiased factor scores that correlate only with their own factor,
- but factor scores can still correlate with each other.
- Anderson-Rubin method
- modifies the Bartlett method and produces uncorrelated and standardized factor scores (mean = 0, SD = 1).
No matter which method for calculating the factor scores we choose, in general, factor scores represent a composite score for each individual on a particular factor. The Anderson-Rubin method is advised.
24 Two uses of factor scores
- Reduction of a large set of data into a smaller subset of measurement variables.
- The factor scores then tell us each individual's score on those variables.
- With these scores, further computations can be conducted (e.g., a t-test between males and females on sociability).
- Overcoming collinearity
- When 2 predictors (in a multiple regression analysis) are highly correlated, we can run a factor analysis and combine the 2 variables into a single factor.
- The regression is rerun with this new factor as predictor.
- Multicollinearity will have vanished.
25 Discovering factors
- There are different methods for discovering factors, the main one being principal component analysis (PCA).
- Besides choosing a discovery method, we also have to judge the importance of the factors and interpret them.
26 Choosing a method for factor discovery
Before choosing a method, we have to decide what it is that we want to do with our factor analysis:
- Testing a specific hypothesis?
- → Confirmatory factor analysis
- Confirmatory factor analysis (CFA) seeks to determine whether the number of factors and the loadings of measured (indicator) variables on them conform to what is expected on the basis of pre-established theory.
- Exploring your data?
- → Exploratory factor analysis
- Do you want to generalize to the population?
- → inferential method, or
- Do you only want to describe your sample?
- → descriptive method
- Principal component analysis and principal factor analysis find factors for a particular sample.
http://faculty.chass.ncsu.edu/garson/PA765/factor.htm
27 Communality
- The common variance h² of a variable is the part of its reliable (non-error) variance that it shares with other variables. This variance component is called 'communality', as opposed to the 'specific' variance, which is characteristic of only this variable.
- In terms of factor analysis, communality is the proportion of a variable's variance explained by the factor structure.
- In factor analysis, we are primarily interested in the common variance.
http://www.siu.edu/epse1/pohlmann/factglos/
28 Communality
Paradoxical situation: Before running a factor analysis, we have to know how much common variance a variable has. However, we can find this out only by running a factor analysis!
- Solution 1: Principal component analysis
- Assume that all variance is common variance. We then run a principal component analysis on the original data.
- Solution 2: Factor analysis
- Estimate the amount of common variance for each variable.
- The most popular method is to use the squared multiple correlation (SMC) of each variable with all others.
- Once the factors are extracted, new communalities can be calculated that represent the multiple correlation between each variable and the factors.
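A small NumPy sketch of the SMC estimate; it uses the standard identity SMC_i = 1 - 1/(R⁻¹)_ii, with a hypothetical 3-variable R-matrix:

```python
import numpy as np

# Hypothetical 3-variable R-matrix (the SAQ's would be 23 x 23).
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.4],
              [0.3, 0.4, 1.0]])

# Squared multiple correlation of each variable with all others, read off
# the diagonal of the inverse correlation matrix: SMC_i = 1 - 1/(R^-1)_ii.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
print(np.round(smc, 3))  # initial communality estimates for factor analysis
```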
29 Differences between PCA and FA
- Principal component analysis (PCA)
- decomposes the original data into a set of linear variates
- it only identifies linear components in the data
- Here we will only be concerned with PCA!
- Factor analysis (FA)
- derives a mathematical model from which factors are estimated
- only factor analysis is said to be capable of truly identifying underlying factors
Both PCA and FA yield similar results for large numbers of variables (>30) with high communality (>.70), whereas they differ for smaller numbers of variables (<20) with low communality (<.40).
30 Theory behind principal component analysis
PCA is similar to MANOVA and discriminant analysis.
- MANOVA
- The sum of squares and cross-products matrix (SSCP) represents the variance and covariance of the multiple variables.
- MANOVA tries to find linear combinations of the dependent variables that can discriminate groups of subjects.
- The eigenvectors of the SSCP represent these linear variates.
- PCA
- The correlation matrix is similar to the SSCP in that it is an averaged and standardized version (ranging from -1 to 1) of the SSCP.
- In PCA, the linear variates (eigenvectors) are calculated from the correlation matrix. The number of variates always equals the number of variables measured.
31 Theory behind principal component analysis - continued
- PCA
- The linear components of the correlation matrix are calculated by determining the eigenvalues of the matrix. From the eigenvalues, eigenvectors are calculated whose elements represent the loading of a particular variable on a particular factor (the b-values).
- The eigenvalues also indicate the substantive importance of their associated eigenvectors.
- MANOVA
- The elements of the eigenvectors are the weights of each variable on the variate. These are the factor loadings.
- The largest eigenvalue is a single indicator of the importance of each variate. The idea is to only retain factors with large eigenvalues.
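A minimal NumPy sketch of the extraction step described above: eigen-decomposing the correlation matrix of simulated data and scaling the eigenvectors into loadings. All data here are simulated, not the chapter's example:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated raw data: 100 subjects x 4 variables, with some built-in correlation.
X = rng.normal(size=(100, 4))
X[:, 1] += X[:, 0]
R = np.corrcoef(X.T)  # 4 x 4 correlation matrix

# Extraction: eigen-decompose R (eigh handles symmetric matrices; it returns
# eigenvalues in ascending order, so we re-sort by decreasing importance).
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Factor loadings: scale each eigenvector by the square root of its eigenvalue.
loadings = eigenvectors * np.sqrt(eigenvalues)
print(np.round(eigenvalues, 2))  # as many eigenvalues as variables (here 4)
```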
32 Principal component analysis (PCA) - Summary
- By far the most common form of factor analysis, PCA seeks a linear combination of variables such that the maximum variance is extracted from the variables. It then removes this variance and seeks a second linear combination which explains the maximum proportion of the remaining variance, and so on. This is called the principal axis method and results in orthogonal (uncorrelated) factors. PCA analyzes total (common and unique) variance.
http://faculty.chass.ncsu.edu/garson/PA765/factor.htm
33 Factor extraction: the scree plot
Scree: 'the loose stones or debris at the base of a hill or cliff'
- How many factors should be extracted?
- → We should only retain factors with large eigenvalues.
- What is the statistical criterion for a substantive factor?
- → The scree plot graphs each eigenvalue (Y-axis) against the factor with which it is associated (X-axis). By graphing the eigenvalues, the relative importance of each factor becomes apparent. Only factors with eigenvalues > 1 should be retained.
[Figure: scree plot; the top of the hill corresponds to high eigenvalues, the base of the hill to the 'scree'.]
http://janda.org/workshop/factor%20analysis/SPSS%20run/SPSS08.htm
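A minimal matplotlib sketch of a scree plot; the eigenvalues below are hypothetical placeholders, not the chapter's output:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical eigenvalues, sorted in decreasing order (e.g., from np.linalg.eigh).
eigenvalues = np.array([7.3, 1.7, 1.3, 1.2, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4])

plt.plot(np.arange(1, len(eigenvalues) + 1), eigenvalues, "o-")
plt.axhline(1.0, linestyle="--")  # Kaiser's eigenvalue-greater-than-1 cut-off
plt.xlabel("Component number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
```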
34 Criteria for selecting factors
- 1. Scree plot: retain only those factors that are not 'scree' (Cattell). However, the scree-plot method is only a rule of thumb.
- 2. Kaiser's criterion: retain only factors whose eigenvalues are > 1.
- 3. Jolliffe's criterion: eigenvalues > .7. SPSS uses Kaiser's criterion by default.
- In principal component analysis we start with a communality of 1. Then we have to discard some information, since otherwise we could not reduce the data. Therefore, after PCA the communality will always be < 1.
35 Improving interpretation: Factor rotation
- After the factors have been extracted, we can determine how high the variables load on them.
- Variables usually load high on one factor and low on the others. In order to get rid of the low factor loadings as far as possible, the factors, represented as axes, can be rotated so that each axis cuts through the variable cloud that loads highest on its factor.
- If factor rotation maintains an orthogonal relation between the factors (a 90° angle, meaning 'independence'), we speak of orthogonal rotation. The factors remain uncorrelated.
- If factor rotation is done separately for each factor (resulting in a non-90° angle), we speak of oblique rotation. The factors are allowed to correlate with each other.
36 Example of 2 factors: Classifying university lecturers
- The demographic group of university lecturers is characterized by two underlying factors:
- First factor: alcoholism
- Second factor: achievement
- Sets of variables load high on either of the 2 factors.
37 Factor rotation
[Figure: two panels showing rotation of the factor axes (Factor 1: Alcohol, Factor 2: Achievement), with θ as the rotation angle. Left panel: orthogonal rotation, both axes rotate by the same angle and stay at 90°. Right panel: oblique rotation, the axes rotate by different angles.]
38 Example of 2 factors: Classifying university lecturers
- Which method should we choose, orthogonal or oblique rotation, for the two factors?
- 1. For theoretical reasons, we might choose oblique rotation, since achievement and alcohol seem to be correlated in real life.
- 2. For statistical reasons, oblique rotation might be preferable as well. Note that for the 2nd graph on the previous slide, orthogonal rotation would not cut through the 2nd variable cloud.
- Both kinds of rotation should be run. If an oblique factor relation turns out, the orthogonal rotation should be discarded. However, oblique rotation should always be theoretically motivated.
39 Oblique rotation: the factor transformation matrix
- In oblique rotation, each factor can be rotated by a different angle. These angles are represented in a factor transformation matrix.
- A factor transformation matrix Λ is a square matrix containing sines and cosines of the angles of axis rotation (θ). It has as many rows and columns as there are factors; in our example, 2 x 2.
- The matrix of unrotated factor loadings A is multiplied by Λ to obtain a matrix of rotated factor loadings.
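For the two-factor orthogonal case (a single shared angle θ), a minimal NumPy sketch of rotating a loading matrix; the angle is hypothetical, and the loadings reuse a few values from the popularity example:

```python
import numpy as np

theta = np.deg2rad(30)  # hypothetical rotation angle

# Orthogonal transformation matrix for 2 factors (one shared angle theta).
Lambda = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])

# A few unrotated loadings from the popularity example (variables x factors).
A = np.array([[0.87,  0.01],
              [0.96, -0.03],
              [0.00,  0.82]])

# Rotated loadings: post-multiply the unrotated loading matrix by Lambda.
A_rotated = A @ Lambda
print(np.round(A_rotated, 2))
```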
40 Choosing a method of orthogonal factor rotation
- SPSS has 3 methods of orthogonal rotation:
- Quartimax: maximizes the spread of factor loadings for a variable across all factors, i.e., it maximizes the variance of the rows of the factor matrix. Easy to interpret because it concentrates on the variables and how they load on factors. Recommended for beginners: easy variable interpretation.
- Varimax: maximizes the dispersion of loadings within factors, i.e., it attempts to load a smaller number of variables highly onto each factor so that factor clusters become easier to interpret. Recommended overall: good factor interpretation.
- Equamax: a hybrid of Quartimax and Varimax.
41 Choosing a method of oblique factor rotation
- SPSS has 2 methods of oblique rotation:
- Direct oblimin: the degree to which factors are allowed to correlate depends on the value of a constant, delta. By default, delta is 0 in SPSS, so that high correlations are ruled out.
- Promax: a faster version for very large data sets.
42 Orthogonal vs. oblique rotation in psychology
- Some advice:
- If there are theoretical reasons to assume that the factors are inter-related, oblique rotation should be used.
- There is good reason to believe that in the human psychological domain there are no orthogonal factors at all. Somehow, everything depends on everything else.
43 Substantive importance of factor loadings
- When you have found a factor structure, you have to decide which variables make up which factors. The factor loadings tell us this. They can be tested for significance.
A loading with an absolute value > .3 is considered important. However, the significance depends on the sample size. The loadings in the table can be considered significant at the 0.001 level (1-tailed).
Note: in large samples, even small factor loadings can be meaningful.
44 Substantive importance of factor loadings
- The amount of variance in a factor accounted for by a variable can be found by squaring the variable's factor loading (R²).
- Only factor loadings > .40 (i.e., R² = .16, 16% of variance explained) should be considered meaningful.
45 Research example: The 'SPSS Anxiety Questionnaire' (SAQ)
- One use of factor analysis is constructing questionnaires.
- The SAQ is meant to measure students' anxiety towards SPSS, using 23 questions.
- The questionnaire can be used to predict individuals' anxiety towards learning SPSS.
- Furthermore, the factor structure behind 'anxiety about SPSS' shall be explored: which latent variables contribute to anxiety about SPSS?
46 The SAQ
47 The SAQ data (using SAQ.sav)
- There are 23 questions (q01-q23), organized in columns.
- There are n = 2571 subjects, organized in rows.
- The questions are rated on a 5-point Likert scale.
48 Initial considerations: sample size
- The reliability of factor analysis depends on the sample size.
- As a rule of thumb, there should be 10-15 subjects per variable.
- The stability of a factor solution depends on:
- 1. Absolute sample size
- 2. Magnitude of the factor loadings (>.6)
- 3. Communalities (>.6; the higher the better)
- The KMO measure is the ratio of the squared correlation between variables to the squared partial correlation between variables. It ranges from 0 to 1. Values between .7 and .8 are good; they suggest that a factor analysis is appropriate.
- KMO = Kaiser-Meyer-Olkin measure of sampling adequacy
49 Data screening
- The variables in the questionnaire should intercorrelate if they measure the same thing. Questions that tap the same sub-variable, e.g., worry, intrusive thoughts, or physiological arousal, should be highly correlated.
- Questions that do not intercorrelate with the others should not be entered into the factor analysis.
- If questions correlate too highly, extreme multicollinearity or even singularity (perfectly correlated variables) results.
- → Too low and too high intercorrelations should be avoided.
- Finally, the variables should be roughly normally distributed.
50 Running the analysis (using SAQ.sav)
- Analyze → Data Reduction → Factor ...
Main dialog box: transfer all questions to the Variables window.
51 Descriptives
- Univariate descriptives: means and SD for each variable.
- Coefficients: produces the R-matrix. Significance levels: of each correlation.
- Determinant: for checking multicollinearity and singularity; the determinant should be > .00001.
- KMO and Bartlett's test of sphericity.
- Reproduced: the correlation matrix based on the model.
- Anti-image: in the anti-image matrix, the relation between two variables is given with the influence of all other variables eliminated.
52 Extraction
- Method: choose Principal components.
- Other options: analyze the correlation matrix OR the covariance matrix.
- Two plots can be displayed: the unrotated factor solution and the scree plot.
- Extract: eigenvalues over 1 (Kaiser's recommendation) or over .7 (Jolliffe's recommendation).
53 Rotation
- Method: choose Varimax.
- Normally, 25 iterations are enough; here, however, we have a huge sample.
- Displaying the rotated solution helps interpret the final rotated analysis.
54 Scores
- Save as variables: the factor scores for each subject will be saved in the data editor.
- Best method of obtaining factor scores: Anderson-Rubin.
- Display factor score coefficient matrix: produces matrix B with the b-values.
55 Options
- Subjects with missing data for any variable are excluded.
- Variables are sorted by the size of their factor loadings.
- Variables with too small loadings should not be displayed.
56 Run the factor analysis
Then rerun it, this time changing the rotation to oblique rotation: choose 'Direct Oblimin' this time.
The output will be the same except for the rotation.
57 Interpreting output from SPSS
- Preliminary analysis
- data screening
- assumption testing
- sampling adequacy
58 'Univariate Descriptives' with mean, SD, and n of the sample
59 Correlation matrix R
Selected output for Q1-5 and Q19-23; labels of the questions omitted.
These are the Pearson correlation coefficients between all pairs of variables.
These are the significance levels for all correlations. Note: they are almost all significant!
Determinant = .0005271 > .00001 → OK!
60 Scanning the correlation matrix
1. Look for a single variable with many low correlations (p > .05) → none!
2. Then scan the correlation coefficients for values > .9 → none! → no problem with multicollinearity.
All questions seem to be fine!
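A small Python sketch of this screening step, flagging off-diagonal correlations above .9 in a hypothetical R-matrix (with the real SAQ data, R would be 23 x 23):

```python
import numpy as np

def scan_r_matrix(R, high=0.9):
    """Flag off-diagonal correlations above `high` (multicollinearity risk)."""
    p = R.shape[0]
    return [(i, j, R[i, j])
            for i in range(p) for j in range(i + 1, p)
            if abs(R[i, j]) > high]

# Hypothetical 3 x 3 R-matrix with one deliberately excessive correlation.
R = np.array([[1.00, 0.95, 0.30],
              [0.95, 1.00, 0.40],
              [0.30, 0.40, 1.00]])
print(scan_r_matrix(R))  # [(0, 1, 0.95)] -> variables 0 and 1 overlap too much
```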
61 Inverse of the correlation matrix, R⁻¹ - for your attention...
62 Bartlett's test of sphericity; KMO statistics
KMO measures > .9 are superb! KMO measures the ratio of the squared correlation between variables to the squared partial correlation between variables.
KMO measures for individual variables are produced on the diagonal of the anti-image correlation matrix.
→ The KMO measures give us a hint at which variables should be excluded from the factor analysis.
Bartlett's test tests whether the R-matrix is an identity matrix (a matrix with only 1's on the diagonal and 0's off-diagonal). However, we want correlated variables, so the off-diagonal elements should NOT be 0. Thus, the test should be significant, i.e., the R-matrix should NOT be an identity matrix.
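Bartlett's test can be computed directly from the determinant of R; a sketch using the standard χ² approximation, with a hypothetical R and n rather than the SAQ output:

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's test that R is an identity matrix (i.e., no correlations).

    Uses the standard statistic chi2 = -[(n-1) - (2p+5)/6] * ln|R|
    with p(p-1)/2 df (R: p x p correlation matrix, n: sample size).
    """
    p = R.shape[0]
    statistic = -((n - 1) - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2.0
    return statistic, chi2.sf(statistic, df)

# Hypothetical 3-variable example; for the SAQ, R is 23 x 23 and n = 2571.
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
print(bartlett_sphericity(R, n=100))  # significant -> R is NOT an identity matrix
```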
63 (2nd part of the) Anti-image matrices
Underlined in red are the KMO measures for the individual variables; they are all high.
The off-diagonal numbers are the partial correlations between variables. They should all be very small, which they are.
[Table: anti-image correlation matrix for Q1-Q5 and Q19-Q23.]
64 Factor extraction
Before extraction, there are as many factors as there are variables: n = 23.
The initial eigenvalues and explained variances are ordered in decreasing magnitude.
The table lists the eigenvalues before extraction, after extraction, and after rotation.
Rotation optimizes the factor structure (Varimax): the relative importance of the factors is equalized, so the explained variance of the 4 factors is more similar after rotation.
Only the 4 factors with an eigenvalue > 1 are retained (Kaiser's criterion).
65 Communalities
- Communality is the proportion of common variance within a variable.
- The table shows the communalities before and after extraction. E.g., 43.5% of the variance in Q1 is common, shared variance.
- Initially, communality is assumed to be 1 ('all variance is common'). After extraction, the true communalities can be judged better.
Before extraction, there are as many factors as there are variables (n = 23), so all variance is explained by the factors and each communality is 1 (no data reduction yet). After extraction, some of the factors are retained, others are dismissed. This leads to a welcome data reduction. Now the amount of variance in each variable explained by the retained factors is the communality.
66 Component matrix
- The component matrix shows the factor loadings of each variable before rotation.
- Before rotation, most variables load highest on the first factor, which can therefore explain a large amount of variance (31.7%).
- SPSS has already extracted 4 components (factors).
- How can we decide how many factors to retain? → scree plot
Loadings < .4 are suppressed, hence the blank spaces.
67 Scree plot
- The curve inflects after 2 and again after 4 factors.
- Since we have a huge sample, eigenvalues > 1 can still be interpreted well, so retaining 4 factors is justified.
- However, it would also be possible to retain just 2.
68 Reproduced correlations
The first half of the 'Reproduced correlations' table contains the correlation coefficients between all of the questions based on the factor model. It contains the "explained" correlations among the variables. The diagonal contains the communalities after extraction for each variable. → Compare with the 'Communalities' table.
69 Reproduced correlations and residuals
The correlations in the reproduced matrix correspond to those in the original R-matrix; however, they differ because they now stem from the model rather than from the data. To assess the fit of the model to the data, we can determine the differences between the observed and the model correlations.
Example for Q01 x Q02: residual = r(observed) - r(from model) = (-.099) - (-.112) = .013, or 1.3E-02.
It is these residuals that are given in the 2nd half of the 'Reproduced correlations' table.
70 2nd half of 'Reproduced correlations': residuals
Example from the previous slide: residual = r(observed) - r(from model) = (-.099) - (-.112) = .013, or 1.3E-02.
For a good model, the residuals should be small. In the footnote below the table, SPSS tells us that only 35% of all residuals are > .05. (A good model should have less than 50% of its residuals > .05.)
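A minimal NumPy sketch of how reproduced correlations and residuals come about under an orthogonal factor model; the loading matrix and the 'observed' R are made up for illustration:

```python
import numpy as np

# Hypothetical unrotated loading matrix A (4 variables x 2 orthogonal factors).
A = np.array([[0.8, 0.1],
              [0.7, 0.2],
              [0.1, 0.9],
              [0.2, 0.8]])

# Fake an 'observed' R-matrix as the model matrix plus a little noise.
rng = np.random.default_rng(1)
noise = rng.normal(scale=0.02, size=(4, 4))
noise = (noise + noise.T) / 2.0
R_observed = A @ A.T + noise
np.fill_diagonal(R_observed, 1.0)

# Under an orthogonal factor model, the reproduced R-matrix is A A^T;
# its diagonal holds the communalities after extraction.
R_model = A @ A.T
residuals = R_observed - R_model
off_diag = residuals[np.triu_indices_from(residuals, k=1)]
print(f"{np.mean(np.abs(off_diag) > 0.05):.0%} of residuals exceed .05")
```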
71 Rotated component matrix: orthogonal rotation
The rotated component matrix contains the same information as the component matrix, only that it is calculated after orthogonal rotation (here with Varimax).
Loadings < .4 are suppressed, hence the blank spaces.
72 Comparing the component matrix with the rotated component matrix
Before rotation, most questions loaded highly on the first extracted factor and much lower on the following ones.
After rotation, all 4 extracted factors have a couple of questions loading highly on them.
Q12 loads equally high on factors 1 and 2!
Q12: "People try to tell you that SPSS makes statistics easier to understand but it doesn't."
73 Looking at the content of the Qs
- In order to interpret the factors, we have to look at the content of the questions that load highly on them.
74 Looking at the content of the Qs
75 Looking at the content of the Qs
76 Looking at the content of the Qs
77 4 subscales of the SAQ
- Now the question arises whether:
- 1. the SAQ does not measure what it claims to ('SPSS anxiety') but some related constructs, or
- 2. these four constructs are sub-components of SPSS anxiety.
- → The factor analysis does not tell us.
78 Component or factor transformation matrix
This matrix tells us to what degree the factors were rotated to obtain a solution. If no rotation were necessary, the matrix would be an identity matrix (1's on the diagonal, 0's at all off-diagonal positions). If orthogonal rotation were completely satisfactory, the matrix would be symmetrical, i.e., the numbers above and below the diagonal would be the same. If they aren't → try oblique rotation (which seems appropriate here).
→ The matrix is hard to interpret, and as beginners we are advised to ignore it...
79 Oblique rotation
While in orthogonal rotation we have only one matrix, the factor matrix, in oblique rotation the factor matrix is split up into the pattern matrix and the structure matrix.
- Structure matrix
- takes into account the relationship between factors
- → should be used as a check on the pattern matrix
- → should also be reported
- Pattern matrix
- contains the factor loadings and is interpreted like the factor matrix
- → is easier to interpret
- → should be reported
80 Oblique rotation: pattern matrix
The pattern matrix gives us the unique contribution of each variable to a factor. The same 4 patterns seem to have emerged:
F1: 'Fear of statistics'
F2: 'Fear of peer evaluation'
F3: 'Fear of computers'
F4: 'Fear of mathematics'
81 Oblique rotation: structure matrix
In the structure matrix, the shared variance is not ignored. Now several variables load highly onto more than 1 factor.
Factors 1 and 3, 'fear of statistics' and 'fear of computers', go together. Also F4, 'fear of math', is related.
Factors 3 and 4, 'fear of computers' and 'fear of math', go together.
Note: Factor 3, 'fear of computers', appears twice, each time together with a different factor.
82 Oblique rotation: component correlation matrix
The component correlation matrix contains the correlation coefficients between the factors. F2, 'fear of peer evaluation', has little relation to the others, but F1, F3, and F4, 'fear of stats, computers, and maths', are somewhat interrelated.
→ Independence of the factors cannot be upheld, given the correlations between the factors; the content of the factors points the same way: 'fear of stats', 'fear of computers', and 'fear of maths' all have a similar meaning. → Oblique rotation is the more sensible choice.
83 Factors, statistically and conceptually
- The factor analysis has extracted 4 factors, 3 of which are correlated with each other, while one is rather independent. An oblique rotation is more sensible given the interrelation between 3 of the factors.
- How does that match the interpretation of the factors?
- The three correlated factors,
- fear of stats, fear of math, fear of computers,
- are also conceptually closely related, whereas the 4th factor, 'fear of negative peer evaluation', being socially based, is also conceptually different.
- Hence, the statistics and the meaning of the factors go along with each other rather nicely.
84 Factor scores - Matrix B
- The factor score matrix is the one from which the factor scores and the covariance matrix of the factor scores are calculated.
- If you are not particularly interested in the math of it, you are forgiven if you ignore it.
'The factor matrix presents the loadings, by which the existence of a pattern for the variables can be ascertained. The factor score matrix gives a score for each case (...) on these patterns.'
http://www.hawaii.edu/powerkills/UFA.HTM
85 Component score covariance matrix
This matrix tells us about the relationship between the factor scores. It is an unstandardized correlation matrix. If the factor scores are totally uncorrelated, it should be an identity matrix (all diagonal elements 1 and all off-diagonal elements 0). This seems to be met here: the off-diagonal numbers are all very small.
- (Note: SPSS 10 doesn't have it.)
86 Case summaries of factor scores
- The scores that we asked SPSS to calculate are uncorrelated because they are based on the Anderson-Rubin method, which explicitly prevents correlations.
- The individual scores for each subject on all 4 components are listed in the data editor under FAC1_1, FAC2_1, FAC3_1, FAC4_1.
- We can look at them with the aid of case summaries for the first 10 cases (otherwise, the output would be too voluminous).
87 Case summaries of factor scores
- Analyze → Report → Case Summaries...
Transfer the variables FAC1_1, FAC2_1, FAC3_1, and FAC4_1 to the Variables window.
Limit the number of cases to 10.
88 Output: case summaries of factor scores
- With the factor scores, you can compare a single individual's fear of math, stats, computers, and peer evaluation. E.g., subject 9 scores highly on all 4 factors, in particular on factor 3.
- Also, the factor scores for all 4 factors can be added so that a sum score can be derived.
- Note that factor scores are standardized.
89 Interim summary
- The SAQ has 4 underlying factors, which we can identify as fear of:
- stats, maths, computers, peer evaluation
- Oblique rotation is to be preferred since three of the four factors are inter-related, statistically as well as conceptually.
- The use of factor analysis here is purely exploratory. It helps you understand what factors underlie large data sets.
- Informed decisions may follow from such an exploratory factor analysis, e.g., with respect to working out a better questionnaire.
90 Reliability analysis
- Here, factor analysis has been used to validate a questionnaire. Therefore, it is necessary to know how reliable the scales are.
- There are various ways of testing reliability:
- 1. Test-retest reliability: do subjects achieve similar scores when the SAQ is administered again, some time later?
- 2. Split-half reliability: split the scale in half and administer both halves to one subject; if the scale is consistent, the subject should obtain 2 similar scores.
- A generalized form of split-half reliability is:
- 3. Cronbach's alpha (α), which splits the data in two in every possible way and computes the correlation coefficient for each split.
91 Cronbach's alpha, α
α = (N² × average Cov) / (Σ s²item + Σ Covitem)
- Numerator: the average covariance between items multiplied by the squared number of items.
- Denominator: the sum of all item variances plus the sum of all item covariances.
- For all items on the scale, we calculate the variance (s²) and the covariance between the item and every other item on the scale. Hence, we need a variance-covariance matrix of all items.
- In this matrix, the diagonal elements are the items' variances and the off-diagonal elements are the covariances between pairs of items.
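A minimal Python sketch implementing exactly this formula from a subjects-by-items score matrix; the Likert data below are made up:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha from an (n subjects x N items) score matrix.

    Implements the slide's formula: alpha = N^2 * mean(Cov) /
    (sum of item variances + sum of item covariances), where mean(Cov)
    averages the off-diagonal elements of the variance-covariance matrix.
    """
    cov = np.cov(items, rowvar=False)           # N x N variance-covariance matrix
    n_items = cov.shape[0]
    mean_cov = cov[~np.eye(n_items, dtype=bool)].mean()
    return n_items ** 2 * mean_cov / cov.sum()  # cov.sum() = sum of var + cov

# Hypothetical Likert scores for 6 subjects on 4 items. Reverse-phrased items
# must be recoded first (e.g., q03 = 6 - q03, as on the next slides).
data = np.array([[4, 5, 4, 3],
                 [2, 2, 3, 2],
                 [5, 4, 4, 5],
                 [3, 3, 2, 3],
                 [4, 4, 5, 4],
                 [1, 2, 1, 2]])
print(round(cronbach_alpha(data), 3))
```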
92 Interpreting Cronbach's α
- Reliability values of .7 - .8 are acceptable for α.
- If there are subscales, α should be determined for every subscale separately.
- Reverse-phrased items (Q03: 'Standard deviations excite me!') have to be reversed back: SD → SA and D → A. Otherwise the sum of covariances will decrease only because the item has a negative value.
Reversed scores: SD (6-1=5), D (6-2=4), N (6-3=3), A (6-4=2), SA (6-5=1)
Numerically, the reversal is done by adding 1 to the maximum score, here 5+1=6, and then subtracting each original score from 6.
93 Reversal using SPSS
Create a new target variable q03; the expression 6 - q03 yields the reversal.
If you don't want to change the original SAQ.sav file, there is another one, called SAQ(Item 3 reversed).sav.
Confirm with OK. Now q03 is reversed.
94 Reliability analysis in SPSS
- We will determine the reliability for each of the 4 subscales from the orthogonal rotation separately.
95 Reliability analysis in SPSS: subscale 1, fear of computers
- Analyze → Scale → Reliability Analysis
Transfer all items of one factor to the Items window, here subscale 1 (items 6, 7, 10, 13, 14, 15, 18). Proceed likewise with the other 3 subscales.
Leave the default setting for Model: Cronbach's Alpha. Tick 'List item labels'.
96 Reliability analysis in SPSS
'Scale if item deleted' tests whether alpha decreases if one item is deleted. In a reliable test, this should not matter much. We therefore still expect a high alpha (>.8) if an item is deleted.
For a basic reliability check, just tick 'Scale if item deleted' and 'Correlations'. Click OK.
97 Subscale 1, 'fear of computers': correlations
Correlation matrix for subscale 1. The variables mostly show correlations > .3 with each other.
98 Subscale 1, 'fear of computers': item-total statistics
The individual 'alpha if item deleted' values should not be greater than the overall α, since then deleting that item would improve reliability! None of the items here affect reliability substantially.
The corrected item-total correlations give us the correlations between each item and the total score of the questionnaire. In a reliable scale, all items should correlate with the total. Values should not be < .3.
'Alpha if item deleted' is > .7 for all items. Overall α = .8234 (Cronbach's α). → Subscale 1 is reliable.
99 Reliability analysis in SPSS: subscale 2, fear of statistics. Correlation matrix and item-total statistics.
Subscale 2, as well, has a good Cronbach's α = .758. The individual 'alpha if item deleted' values are not higher than the overall α. → Subscale 2 is reliable!
100 Reliability analysis in SPSS: subscale 3, fear of maths. Correlation matrix and item-total statistics.
Subscale 3, as well, has a good Cronbach's α = .8194. The individual 'alpha if item deleted' values are not higher than the overall α. → Subscale 3 is reliable.
101 Reliability analysis in SPSS: subscale 4, fear of negative peer evaluation. Correlation matrix and item-total statistics.
Subscale 4, however, has a poor Cronbach's α = .5699. The individual 'alpha if item deleted' values are not higher than the overall α. → Subscale 4 is not reliable.
What's wrong with this subscale? It might have too heterogeneous questions, which decreases internal consistency. Q23 has a low item-total correlation. → This subscale should be rethought!