Exploratory Factor Analysis

Transcript and Presenter's Notes

Title: Exploratory Factor Analysis

1
Chapter 15
  • Exploratory Factor Analysis
  • Field (2005)

2
What is factor analysis?
  • Factor analysis (and principal component
    analysis, which you know already from MANOVA) is
    a technique for identifying groups or clusters of
    variables underlying a set of measures.
  • Those variables are called 'factors', or 'latent
    variables' since they are not directly
    observable, e.g., 'intelligence'.
  • A 'latent variable' is a variable that cannot be
    directly measured, but is assumed to be related
    to several variables that can be measured.
    (Glossary, p. 736)

3
What is factor analysis used for?
  • Factor analysis has 3 main uses
  • To understand the structure of a set of
    variables, e.g., intelligence
  • To construct a questionnaire to measure an
    underlying variable
  • To reduce a large data set to a more manageable
    size

4
Where is factor analysis used?
  • Factor analysis is popular in the social
    sciences, economics, and psychology
  • In personality psychology: finding personality
    'traits' such as 'extraversion-introversion',
    'neuroticism', etc. Questionnaires are typically
    based on such factors
  • In economics: finding latent variables
    underlying productivity, profits, and workforce

5
The most basic data basis: the R-matrix
  • An R-matrix is simply a correlation matrix with
    Pearson r-coefficients between pairs of variables
    as the off-diagonal elements.
  • In factor analysis one tries to find latent
    variables that underlie clusters of correlations
    in such an R-matrix.
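
A minimal numpy sketch of how such an R-matrix is computed from raw scores; the data here are random placeholders, not the popularity measures discussed next:

```python
import numpy as np

# Placeholder scores: rows = subjects, columns = measured variables
X = np.random.default_rng(42).normal(size=(100, 6))

# The R-matrix: Pearson r between every pair of variables,
# 1s on the diagonal, correlations off the diagonal
R = np.corrcoef(X, rowvar=False)
print(R.round(2))
```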

6
Example: What makes a person popular?
These measures all tap different aspects
of the 'popularity' of a person. Are there a few
underlying factors that can account for them?
Factor 1: sociability
Factor 2: consideration to others
7
Graphical representations of factors
  • Factors can be visualized as axes along which we
    can plot variables.
  • The coordinates of variables along each axis
    represent the strength of the relationship
    between that variable and each factor. In our
    example, we have 2 underlying factors.
  • The axis line ranges from -1 to 1, which is the
    range of possible correlations r.
  • The position of a variable depends on its
    correlation coefficients with the 2 factors.

8
2-D Factor plot
[Figure: a two-dimensional factor plot with both
axes running from -1 to 1. 'Talk 1', 'Soc Skills',
and 'Interest' cluster along the 'Sociability'
axis; 'Talk 2', 'Selfish', and 'Liar' cluster
along the 'Consideration' axis.]
The coordinate of a variable along a
classification axis is called its 'factor
loading'. It is the Pearson correlation r between
a factor and a variable.
In this 2-dimensional factor plot, there are only
2 latent variables. Variables either load high on
'Sociability' or on 'Consideration to others'.
With 3 factors, we would have a 3D factor plot.
With >3 factors, no graphical factor plots are
available any more.
9
Mathematical representation of factors
  • The factors can be represented by a linear
    equation:
  • Yi = b1X1i + b2X2i + ... + bnXni + εi
  • Factori = b1 Variable1i + b2 Variable2i + ... +
    bn Variableni + εi
  • Note: b0, the intercept, is missing, since the
    axes intersect at 0. The b's represent the factor
    loadings.

10
Mathematical representation of factors
  • Describing the factors in terms of the underlying
    variables:
  • Yi = b1X1i + b2X2i + ... + bnXni + εi
  • Sociabilityi = b1 Talk1i + b2 SocSkillsi +
    b3 Interesti + b4 Talk2i + b5 Selfishi +
    b6 Liari + εi
  • Considerationi = b1 Talk1i + b2 SocSkillsi +
    b3 Interesti + b4 Talk2i + b5 Selfishi +
    b6 Liari + εi
  • Note: Both equations are identical in form. They
    include all variables measured. The values of b
    will differ, though.

11
Mathematical representation of factors
  • Yi = b1X1i + b2X2i + ... + bnXni + εi
  • In the 2 equations for 'Sociability' and
    'Consideration', we can substitute the factor
    loadings for the b-coefficients:
  • Sociabilityi = .87 Talk1i + .96 SocSkillsi +
    .92 Interesti + .00 Talk2i - .10 Selfishi +
    .09 Liari + εi
  • Considerationi = .01 Talk1i - .03 SocSkillsi +
    .04 Interesti + .82 Talk2i + .75 Selfishi +
    .70 Liari + εi
  • Note: The first 3 variables load high on
    'Sociability', the last 3 on 'Consideration'.
    The b's are, however, NOT the correlations from
    the R-matrix.

12
Factor matrix or component matrix
  • The factor-loadings (b-values) can be arranged in
    a matrix A, called the 'Factor matrix' or
    'component matrix'.
  • Each column represents one factor
  • Each row represents one variable
  • Ideally, variables that load high on 1 factor
    should load low on the other.

[Matrix A: rows = the six variables, columns = the
two factors, entries = factor loadings.]
13
Factors statistical or real-world phenomena?
  • It is a matter of debate whether factors are
    real-world phenomena, e.g., traits, abilities,
    social classes, etc., or if they have only
    statistical reality.
  • We will learn the statistics of factor analysis
    here. We should keep in mind that factors may be
    nothing more than statistical regularities; if we
    find some, it depends on our ingenuity to give
    them a proper interpretation.

14
Pattern matrix vs. Structure matrix
  • Generally, factor loadings inform us about the
    relative contribution a variable makes to a
    factor.
  • But what are factor loadings, exactly?
  • Are they correlations between a variable and a
    factor or are they regression coefficients (b)?
  • When factors are unrelated, they can be rotated
    orthogonally, and the correlation coefficient is
    the same as the regression coefficient.
  • When the underlying factors are correlated, they
    can be rotated obliquely. Then there are two
    kinds of factor loadings: the correlation
    coefficients between each variable and each
    factor (in the factor structure matrix) and the
    regression coefficients (b) of each variable on
    each factor (in the factor pattern matrix). The
    two coefficients have different interpretations.

15
Factor scores
  • Knowing the scores of an individual on the
    variables, we can also calculate the factor
    scores for that person.
  • A simplistic method for that is the weighted
    average.
  • The scores of subject i are weighted by the
    regression coefficients of the regression
    equations:
  • Sociabilityi = .87 Talk1i + .96 SocSkillsi +
    .92 Interesti + .00 Talk2i - .10 Selfishi +
    .09 Liari + εi
  • Sociabilityi = (.87 x 4) + (.96 x 9) + (.92 x 8)
    + (.00 x 6) + (-.10 x 8) + (.09 x 6)
  • = 19.22

16
Factor scores
  • Knowing the scores of an individual on the
    variables, we can also calculate the factor
    scores for that person.
  • A simplistic method for that is the weighted
    average.
  • The scores of subject i are weighted by the
    regression coefficients of the regression
    equations:
  • Considerationi = .01 Talk1i - .03 SocSkillsi +
    .04 Interesti + .82 Talk2i + .75 Selfishi +
    .70 Liari + εi
  • Considerationi = (.01 x 4) + (-.03 x 9) +
    (.04 x 8) + (.82 x 6) + (.75 x 8) + (.70 x 6)
  • = 15.21

17
The problem with weighted averages
  • When variables are measured on different
    measurement scales, the resulting factor scores
    cannot be compared.
  • Therefore, a better method is the regression
    method, which takes into account the initial
    correlations between the variables.

18
The regression method
  • In the regression method, the regression
    coefficients b are substituted by factor score
    coefficients.
  • In order to obtain the factor score coefficient
    matrix B, the factor matrix A is 'divided' by the
    R-matrix of the original correlations.
  • A matrix is 'divided' by multiplying with its
    inverse, here R-1, so B = R-1 A.

19
Obtaining the factor score coefficient matrix (B)
with the regression method: B = R-1 A
[Figure: the inverse correlation matrix R-1
multiplied by the factor matrix A yields B, with
one column per factor.]
  • Note: The factor score coefficients in the factor
    score coefficient matrix B preserve the pattern
    of the original factor loadings.
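
A numpy sketch of this matrix step; A holds the popularity loadings from slide 11, while R is built from placeholder data, so the resulting coefficients are illustrative only:

```python
import numpy as np

# Placeholder R-matrix (in practice: the observed correlations)
X = np.random.default_rng(1).normal(size=(200, 6))
R = np.corrcoef(X, rowvar=False)

# Factor matrix A: loadings of the six variables on the two factors
A = np.array([[.87, .01], [.96, -.03], [.92, .04],
              [.00, .82], [-.10, .75], [.09, .70]])

# Regression method: B = R^-1 A (solve() avoids forming the inverse explicitly)
B = np.linalg.solve(R, A)
print(B.round(3))  # factor score coefficient matrix, one column per factor
```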
20
Factor score coefficients in the regression equation
  • The factor score coefficients (from matrix B) are
    inserted into the regression equation instead of
    the b-regression coefficients:
  • Sociability = .343 Talk1 + .376 SocSkills +
    .362 Interest + .000 Talk2 - .037 Selfish +
    .039 Liar
  • Sociability = (.343 x 4) + (.376 x 9) +
    (.362 x 8) + (.000 x 6) + (-.037 x 8) +
    (.039 x 6)
  • = 7.59, the factor score for subject i on
    'Sociability'
21
Factor score coefficients in the regression equation
  • The factor score coefficients (from matrix B) are
    inserted into the regression equation instead of
    the b-regression coefficients:
  • Consideration = .006 Talk1 - .020 SocSkills +
    .020 Interest + .473 Talk2 + .473 Selfish +
    .405 Liar
  • Consideration = (.006 x 4) - (.020 x 9) +
    (.020 x 8) + (.473 x 6) + (.473 x 8) +
    (.405 x 6)
  • = 8.768, the factor score for subject i on
    'Consideration'
22
Comparison
  • Regression method (factor score coefficients):
  • Sociabilityi = 7.59
  • Considerationi = 8.768
  • The factor scores of subject i on the two factors
    'sociability' and 'consideration' are very
    similar.
  • Factor scores obtained with the regression method
    have a mean of 0 and a variance equal to the
    squared multiple correlation between the
    estimated factor scores and the true factor
    values.
  • Weighted average:
  • Sociabilityi = 19.22
  • Considerationi = 15.21
  • The factor scores of subject i on the two factors
    'sociability' and 'consideration' are quite
    different: sociability is higher than
    consideration.

→ The regression method provides more accurate
factor scores than the weighted average method
does. However, the scores can correlate not only
with the factor under consideration but also with
another factor.
23
Other methods that adjust the regression method
  • Bartlett method
  • produces unbiased factor scores that correlate
    only with their own factor,
  • but the factor scores can still correlate with
    each other.
  • Anderson-Rubin method
  • modifies the Bartlett method and produces
    uncorrelated and standardized factor scores
    (mean = 0, SD = 1)

No matter which method for calculating the factor
scores we choose, in general, factor scores
represent a composite score for each individual
on a particular factor. The Anderson-Rubin method
is advised.
24
Two uses of factor scores
  • Reduction of a large set of data into a smaller
    subset of measurement variables.
  • Then factor scores tell us each individual's
    score on those variables.
  • With these scores further computations can be
    conducted
  • (e.g., a t-test between males and females on
    sociability)
  • Overcoming collinearity
  • When 2 predictors in a multiple regression
    analysis are highly correlated, we can run a
    factor analysis and combine the 2 variables into
    a single factor.
  • The regression is rerun with this new factor as
    predictor.
  • Multicollinearity will have vanished.

25
Discovering factors
  • There are different methods for discovering
    factors, the main one being 'principal components
    analysis (PCA)'.
  • Besides choosing a discovery method, we also have
    to judge the importance of factors and interpret
    them.

26
Choosing a method for factor discovery
Before choosing a method, we have to decide what
it is that we want to do with our factor analysis:
  • Testing a specific hypothesis?
  • → Confirmatory factor analysis
  • Confirmatory factor analysis (CFA) seeks to
    determine if the number of factors and the
    loadings of measured (indicator) variables on
    them conform to what is expected on the basis of
    pre-established theory.
  • Exploring your data?
  • → Exploratory factor analysis
  • Do you want to generalize to the population?
  • → inferential method, or
  • Do you only want to describe your sample?
  • → descriptive method
  • Principal component analysis and principal factor
    analysis find factors for a particular sample.

http://faculty.chass.ncsu.edu/garson/PA765/factor.htm
27
Communality
  • The common variance h2 of a variable is the part
    of the reliable (non-error) variance that it
    shares with other variables. This variance part
    is called 'communality' as opposed to the
    'specific' variance which is characteristic of
    only this variable.
  • In terms of factor analysis, it is the proportion
    of a variable's variance explained by a factor
    structure.
  • In factor analysis, we are primarily interested
    in the common variance.

http://www.siu.edu/epse1/pohlmann/factglos/
28
Communality
A paradoxical situation: before running a factor
analysis, we have to know how much common variance
a variable has. However, we can find this out only
by running a factor analysis!
  • Solution 1: Principal component analysis
  • Assume that all variance is common variance. We
    then run a principal component analysis on the
    original data.
  • Solution 2: Factor analysis
  • Estimate the amount of common variance for each
    variable.
  • The most popular method is to use the squared
    multiple correlation (SMC) of each variable with
    all others (sketched in the code below).
  • Once the factors are extracted, new communalities
    can be calculated that represent the multiple
    correlation between each variable and the
    factors.
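
A small numpy sketch (with placeholder data) of the SMC estimate mentioned in Solution 2; it uses the standard identity SMC_i = 1 - 1/(R^-1)_ii:

```python
import numpy as np

# Placeholder data for the R-matrix
X = np.random.default_rng(7).normal(size=(500, 6))
R = np.corrcoef(X, rowvar=False)

# Squared multiple correlation of each variable with all the others,
# used as an initial communality estimate: SMC_i = 1 - 1/(R^-1)_ii
smc = 1 - 1 / np.diag(np.linalg.inv(R))
print(smc.round(3))
```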

29
Differences between PCA and FA
  • Principal component analysis (PCA)
  • decomposes the original data into a set of linear
    variates
  • it only identifies linear components in the data
  • Here we will only be concerned with PCA!
  • Factor analysis (FA)
  • derives a mathematical model from which factors
    are estimated
  • only factor analysis is said to be capable of
    truly identifying underlying factors

Both PCA and FA yield similar results for large
numbers of variables (>30) with high communality
(>.70), whereas they differ for smaller numbers of
variables (<20) with low communality (<.40).
30
Theory behind principal component analysis
PCA is similar to MANOVA and discriminant analysis.
  • MANOVA
  • The Sum of Squares and Cross-Products matrix
    (SSCP) represents the variance and covariance of
    the multiple variables
  • MANOVA tries to find linear combinations of the
    dependent variables that can discriminate groups
    of subjects
  • The eigenvectors of the SSCP represent these
    linear variates
  • PCA
  • The correlation matrix is similar to the SSCP in
    that it is an averaged and standardized version
    (from -1 to 1) of the SSCP.
  • In PCA, the linear variates (eigenvectors) are
    calculated from the correlation matrix. The
    number of variates is always the number of
    variables measured.

31
Theory behind principal component analysis
(continued)
  • PCA
  • The linear components of the correlation matrix
    are calculated by determining the eigenvalues of
    the matrix. From the eigenvalues, eigenvectors
    are calculated whose elements represent the
    loading of a particular variable on a particular
    factor (the b-values).
  • The eigenvalues also indicate the substantive
    importance of their associated eigenvectors.
  • MANOVA
  • The elements of the eigenvectors are the weights
    of each variable on the variate. These are the
    factor loadings.
  • The largest eigenvalue is a single indicator of
    the importance of each variate. The idea is to
    only retain factors with large eigenvalues.

32
Principal component analysis (PCA) - Summary -
  • By far the most common form of factor analysis,
    PCA seeks a linear combination of variables such
    that the maximum variance is extracted from the
    variables. It then removes this variance and
    seeks a second linear combination which explains
    the maximum proportion of the remaining variance,
    and so on. This is called the principal axis
    method and results in orthogonal (uncorrelated)
    factors. PCA analyzes total (common and unique)
    variance.

http://faculty.chass.ncsu.edu/garson/PA765/factor.htm
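
As a sketch of what happens under the hood, the following numpy snippet extracts components by eigendecomposing a correlation matrix built from placeholder data, and applies the eigenvalue-greater-than-1 retention rule discussed on the next slides:

```python
import numpy as np

# Placeholder data and its correlation matrix
X = np.random.default_rng(3).normal(size=(300, 8))
R = np.corrcoef(X, rowvar=False)

# Eigendecomposition (eigh, since R is symmetric), largest eigenvalue first
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Loadings: eigenvectors scaled by the square roots of their eigenvalues
loadings = eigenvectors * np.sqrt(eigenvalues)

print(eigenvalues.round(2))     # substantive importance of each component
print((eigenvalues > 1).sum())  # how many components the >1 rule would keep
```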
33
Factor extraction: the scree plot
Scree: 'the loose stones or debris at the base of
a hill or cliff'
  • How many factors should be extracted?
  • → We should only retain factors with large
    eigenvalues
  • What is the statistical criterion for a
    substantive factor?
  • → The scree plot graphs each eigenvalue (Y-axis)
    against the factor with which it is associated
    (X-axis). By graphing the eigenvalues, the
    relative importance of each factor becomes
    apparent. Only factors with eigenvalues > 1
    should be retained.

[Figure: scree plot; the top of the hill holds the
high eigenvalues, the base of the hill is the
'scree'.]
http://janda.org/workshop/factor%20analysis/SPSS%20run/SPSS08.htm
34
Criteria for selecting factors
  • 1. Scree plot (Cattell): retain only those
    factors that are not 'scree'. However, the
    scree-plot method is only a rule of thumb.
  • 2. Kaiser's criterion: retain only factors whose
    eigenvalues are > 1.
  • 3. Jolliffe's criterion: eigenvalues > .7. SPSS
    lets you use either cut-off.
  • In principal component analysis, we start with a
    communality of 1. Then we have to discard some
    information, since otherwise we could not reduce
    the data. Therefore, after PCA the communality
    will always be < 1.

35
Improving interpretation: Factor rotation
  • After the factors have been extracted, we can
    determine how high the variables load on them.
  • Variables usually load high on one factor and low
    on the others. In order to get rid of the low
    factor loadings as much as possible, the factors,
    represented as axes, can be rotated so that each
    axis cuts through the cloud of variables that
    load highest on that factor.
  • If factor rotation maintains an orthogonal
    relation between the factors (90° angle, meaning
    'independence'), we speak of orthogonal rotation.
    The factors remain uncorrelated.
  • If factor rotation is done separately for each
    factor (resulting in a non-90° angle), we speak
    of oblique rotation. The factors are allowed to
    correlate with each other.

36
Example of 2 factors: Classifying university lecturers
  • The demographic group of university lecturers is
    characterized by two underlying factors:
  • First factor: alcoholism
  • Second factor: achievement
  • Sets of variables load high on either of the 2
    factors.

37
Factor rotation
[Figure: two factor plots for Factor 1 'Alcohol'
and Factor 2 'Achievement'. In orthogonal rotation
both axes are rotated by the same angle θ and stay
at 90°; in oblique rotation the axes are rotated
by different angles. θ is the rotation angle.]
38
Example of 2 factors: Classifying university lecturers
  • Which method should we choose for the two
    factors: orthogonal or oblique rotation?
  • 1. For theoretical reasons, we might choose
    oblique rotation, since achievement and alcohol
    seem to be correlated in real life.
  • 2. For statistical reasons, oblique rotation
    might be preferable as well. Note that for the
    2nd graph on the previous slide, orthogonal
    rotation would not cut through the 2nd variable
    cloud.
  • Both kinds of rotation should be run. If an
    oblique relation between the factors turns out,
    the orthogonal rotation should be discarded.
    However, oblique rotation should always be
    theoretically motivated.

39
Oblique rotation: the factor transformation matrix
  • In oblique rotation, each factor can be rotated
    by a different angle. These angles are
    represented in a factor transformation matrix.
  • A factor transformation matrix is a square matrix
    containing sines and cosines of the angles of
    axis rotation (θ). The square matrix has as many
    rows and columns as there are factors (in our
    example, 2 x 2).
  • The matrix is multiplied by the matrix of
    unrotated factor loadings, A, to obtain a matrix
    of rotated factor loadings.
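
A numpy sketch of this multiplication for the two-factor case, with a hypothetical rotation angle and the popularity loadings as the unrotated matrix A:

```python
import numpy as np

theta = np.deg2rad(30)  # hypothetical rotation angle

# Orthogonal factor transformation matrix for two factors
Lambda = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])

# Unrotated factor loadings A (popularity example)
A = np.array([[.87, .01], [.96, -.03], [.92, .04],
              [.00, .82], [-.10, .75], [.09, .70]])

# Rotated loadings: unrotated loadings times the transformation matrix
A_rotated = A @ Lambda
print(A_rotated.round(2))
```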

40
Choosing a method of orthogonal factor rotation
  • SPSS has 3 methods of orthogonal rotation:
  • Quartimax: maximizes the spread of factor
    loadings for a variable across all factors, i.e.,
    it maximizes the variance of the rows of the
    factor matrix. Easy to interpret because it
    concentrates on the variables and how they load
    on factors. (Recommended for beginners: easy
    variable interpretation.)
  • Varimax: maximizes the dispersion of loadings
    within factors, i.e., it attempts to load a
    smaller number of variables highly onto each
    factor so that factor clusters become easier to
    interpret. (Recommended overall: good factor
    interpretation; see the sketch below.)
  • Equamax: a hybrid of Quartimax and Varimax.
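
For the curious, here is a compact SVD-based varimax sketch in numpy; this is a common textbook formulation of the algorithm, not SPSS's exact routine:

```python
import numpy as np

def varimax(A, max_iter=100, tol=1e-6):
    """Rotate a loadings matrix A (variables x factors) toward simple structure."""
    p, k = A.shape
    R = np.eye(k)   # accumulated rotation matrix
    d = 0.0
    for _ in range(max_iter):
        L = A @ R
        # SVD of the gradient of the varimax criterion
        u, s, vt = np.linalg.svd(
            A.T @ (L**3 - L @ np.diag((L**2).sum(axis=0)) / p))
        R = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):   # stop when the criterion stalls
            break
        d = d_new
    return A @ R

A = np.array([[.87, .01], [.96, -.03], [.92, .04],
              [.00, .82], [-.10, .75], [.09, .70]])
print(varimax(A).round(2))
```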
41
Choosing a method of oblique factor rotation
  • SPSS has 2 methods of oblique rotation:
  • Direct oblimin: the degree to which factors are
    allowed to correlate depends on the value of a
    constant, delta. By default, delta is 0 in SPSS,
    so that high correlations are ruled out.
  • Promax: a faster version for very large data
    sets.

42
Orthogonal vs. Oblique rotation in psychology
  • Some advice:
  • If there are theoretical reasons to assume that
    any factors are inter-related, oblique rotation
    should be used.
  • There is good reason to believe that in the human
    psychological domain there are no orthogonal
    factors at all. Somehow, everything depends on
    everything else.

43
Substantive importance of factor loadings
  • When you have found a factor structure, you have
    to decide which variables make up which factor.
    The factor loadings tell us this. They can be
    tested for significance.

A loading with an absolute value > .3 is
considered important. However, significance
depends on the sample size. The loadings in the
table can be considered significant at the 0.001
level (1-tailed).
Note: in large samples, small factor loadings can
still be meaningful.

44
Substantive importance of factor loadings
  • The amount of variance in a factor accounted for
    by a variable can be found by squaring the
    variable's factor loading (R2).
  • Only factor loadings with an absolute value > .4
    (i.e., R2 = .16, or 16% of variance explained)
    should be considered meaningful.


45
Research example: The 'SPSS Anxiety Questionnaire' (SAQ)
  • One use of factor analysis is constructing
    questionnaires.
  • With the SAQ, students' anxiety towards SPSS
    shall be measured, using 23 questions.
  • The questionnaire can be used to predict
    individuals' anxiety towards learning SPSS.
  • Furthermore, the factor structure behind 'anxiety
    to use SPSS' shall be explored: which latent
    variables contribute to anxiety about SPSS?

46
The SAQ
47
The SAQ data (using SAQ.sav)
  • There are 23 questions (q01-q23), organized in
    columns.
  • There are n = 2571 subjects, organized in rows.
  • The questions are rated on a 5-point Likert
    scale.
48
Initial considerations sample size
  • The reliability of factor analysis relies on
    sample size.
  • As a rule of thumb, there should be 10-15
    subjects per variable.
  • The stability of a factor solution depends on:
  • 1. Absolute sample size
  • 2. Magnitude of the factor loadings (>.6)
  • 3. Communalities (>.6; the higher the better)
  • The KMO measure is the ratio of the squared
    correlation between variables to the squared
    partial correlation between variables. It ranges
    from 0 to 1. Values between .7 and .8 are good.
    They suggest a factor analysis. (It is computed
    in the sketch below.)
  • KMO = Kaiser-Meyer-Olkin measure of sampling
    adequacy
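
The KMO statistic can be sketched directly from the definition above; partial correlations are obtained from the inverse of the R-matrix. The data are placeholders, so the resulting value is uninformative (around .5):

```python
import numpy as np

def kmo(R):
    """Kaiser-Meyer-Olkin measure from a correlation matrix R (a sketch)."""
    R_inv = np.linalg.inv(R)
    # Partial correlations, from the inverse of the correlation matrix
    d = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
    P = -R_inv / d
    off = ~np.eye(len(R), dtype=bool)  # off-diagonal mask
    r2 = (R[off] ** 2).sum()
    p2 = (P[off] ** 2).sum()
    return r2 / (r2 + p2)

X = np.random.default_rng(0).normal(size=(1000, 5))
print(round(kmo(np.corrcoef(X, rowvar=False)), 3))
```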

49
Data screening
  • The variables in the questionnaire should
    intercorrelate if they measure the same thing.
    Questions that tap the same sub-variable, e.g.,
    worry, intrusive thoughts, or physiological
    arousal, should be highly correlated.
  • If there are questions that are not
    intercorrelated with others, they should not be
    entered into the factor analysis.
  • If questions correlate too highly, extreme
    multicollinearity or even singularity (perfectly
    correlated variables) results.
  • → Too low and too high intercorrelations should
    be avoided.
  • Finally, variables should be roughly normally
    distributed.

50
Running the analysis (using SAQ.sav)
  • Analyze → Data Reduction → Factor...

Main dialog box: transfer all questions to the
Variables window.
51
Descriptives
  • Tick everything. The options produce:
  • means and SDs for each variable
  • the R-matrix and the significance values of each
    correlation
  • the determinant, for checking multicollinearity
    and singularity (it should be > .00001)
  • the KMO measure and Bartlett's test
  • the reproduced correlation matrix based on the
    model
  • the anti-image matrix, which gives the relation
    between two variables with the influence of all
    other variables eliminated
52
Extraction
  • Choose: Principal components.
  • Other options: analyze the correlation matrix OR
    the covariance matrix.
  • Two plots can be displayed: the unrotated factor
    solution and the scree plot.
  • Eigenvalue cut-off: Kaiser's (>1) or Jolliffe's
    (>.7) recommendation.
53
Rotation
  • Choose: Varimax.
  • Normally, 25 iterations are enough. However,
    here we have a huge sample.
  • Displaying the rotated solution helps interpret
    the final rotated analysis.
54
Scores
  • Factor scores for each subject will be saved in
    the data editor.
  • Best method of obtaining factor scores:
    Anderson-Rubin.
  • The factor score coefficient matrix option
    produces matrix B with the b-values.
55
Options
  • Subjects with missing data for any variable are
    excluded.
  • Variables are sorted by the size of their factor
    loadings.
  • Loadings that are too small should not be
    displayed (suppress small absolute values).
56
Run the factor analysis. Then rerun it, this time
changing the rotation to the oblique rotation
'Direct Oblimin'.
  • Choose 'Direct Oblimin' this time.
  • The output will be the same except for the
    rotation.
57
Interpreting output from SPSS
  • Preliminary analysis
  • data screening
  • assumption testing
  • sampling adequacy

58
'Univariate Descriptives' with mean, SD, and n of
the sample
59
Correlation Matrix R
Selected output for questions 1-5 and 19-23;
labels of questions omitted.
These are the Pearson correlation coefficients
between all pairs of variables.
These are the significance levels for all
correlations. Note: they are almost all
significant!
Determinant = .0005271 > .00001 → OK!
60
Scanning the Correlation Matrix
1. Look for variables with many low correlations
(p > .05) → none!
2. Then scan the correlation coefficients for
values > .9 → none! → no problem with
multicollinearity.
All questions seem to be fine!
61
Inverse of the correlation matrix, R-1 (for your
attention...)
62
Bartlett's test of sphericity and KMO statistics
KMO measures > .9 are superb! KMO measures the
ratio of the squared correlation between variables
to the squared partial correlation between
variables.
KMO measures for the individual variables are
produced on the diagonal of the anti-image
correlation matrix.
→ The KMO measures give us a hint about which
variables should be excluded from the factor
analysis.
Bartlett's test tests whether the R-matrix is an
identity matrix (a matrix with only 1s on the
diagonal and 0s off the diagonal). However, we
want to have correlated variables, so the
off-diagonal elements should NOT be 0. Thus, the
test should be significant, i.e., the R-matrix
should NOT be an identity matrix.
63
(2nd part of the) Anti-Image Matrices
[Table: anti-image correlation matrix for
Q1-Q23.]
The KMO measures for the individual variables are
on the diagonal (underlined in red): they are all
high.
The off-diagonal numbers are the partial
correlations between variables. They should all be
very small, which they are.
64
Factor extraction
[Table: total variance explained, with eigenvalues
before extraction, after extraction, and after
rotation.]
Before extraction, there are as many factors as
there are variables, n = 23. Initial eigenvalues
and explained variances are ordered in decreasing
magnitude.
Rotation (Varimax) optimizes the factor structure
and equalizes the relative importance of the
factors: the explained variance of the 4 factors
is more similar after rotation.
Only 4 factors with an eigenvalue > 1 are retained
(Kaiser's criterion).
65
Communalities
[Table: communalities before and after
extraction.]
E.g., 43.5% of the variance in Q1 is common,
shared variance.
  • Communality is the proportion of common variance
    within a variable.
  • Initially, communality is assumed to be 1 ('all
    variance is common'). After extraction, the true
    communalities can be judged better.

Before extraction, there are as many factors as
there are variables, n = 23, so that all variance
is explained by the factors and communality is 1
(no data reduction yet). After extraction, some of
the factors are retained, others are dismissed.
This leads to a welcome data reduction. Now the
amount of variation in each variable explained by
the retained factors is the communality.
66
Component matrix
  • The component matrix shows the factor loadings of
    each variable before rotation.
  • Before rotation, most variables load highest on
    the first factor, which can therefore explain a
    high amount of variation (31.7%).
  • SPSS has already extracted 4 components
    (factors).
  • How can we decide how many factors we should
    retain?
  • → scree plot

Loadings < .4 are suppressed, hence the blank
spaces.
67
Scree plot
  • After 2 or after 4 factors, the curve inflects.
  • Since we have a huge sample, eigenvalues > 1 can
    still be well interpreted, so retaining 4 factors
    is justified.
  • However, it is also possible to retain just 2.
68
Reproduced correlations
The first half of the reproduced-correlations
table contains the correlation coefficients
between all of the questions based on the factor
model. It contains the 'explained' correlations
among the variables. The diagonal contains the
communalities after extraction for each variable.
→ Compare with the 'Communalities' table.
69
Reproduced correlations and residuals
The correlations in the reproduced matrix
correspond to those in the original R-matrix;
however, they differ, since they now stem from the
model rather than from the data. To assess the fit
of the model to the data, we can determine the
differences between the observed and the model
correlations:
residual = r(observed) - r(from model)
Example for Q01 x Q02:
residual = (-.099) - (-.112) = .013, or 1.3E-02
It is these residuals which are given in the 2nd
half of the 'Reproduced correlations' matrix (the
residuals).
70
2nd half of 'Reproduced correlations': residuals
Explanation from the previous slide:
residual = r(observed) - r(from model)
residual Q1-Q2 = (-.099) - (-.112) = .013, or
1.3E-02
For a good model, the residuals should be small.
In the footnote below the table, SPSS tells us
that only 35% of all residuals are > .05. (A good
model should have less than 50% of the residuals
> .05.)
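
A numpy sketch of the reproduced correlations and residuals; for orthogonal factors the model correlations are A times A-transposed (so the diagonal holds the communalities), and the "observed" R here is placeholder data:

```python
import numpy as np

# Loadings of the retained factors (popularity example)
A = np.array([[.87, .01], [.96, -.03], [.92, .04],
              [.00, .82], [-.10, .75], [.09, .70]])

# Reproduced correlations from the model; diagonal = communalities
R_model = A @ A.T

# Placeholder "observed" R-matrix; in practice, the data's correlations
X = np.random.default_rng(5).normal(size=(400, 6))
R_observed = np.corrcoef(X, rowvar=False)

# Residuals and the share of large ones (compare SPSS's footnote)
residuals = R_observed - R_model
off = ~np.eye(6, dtype=bool)
print((np.abs(residuals[off]) > .05).mean())
```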
71
Rotated component matrix: orthogonal rotation
The rotated component matrix contains the same
information as the component matrix, only that it
is calculated after orthogonal rotation (here with
Varimax).
Loadings < .4 are suppressed, hence the blank
spaces.
72
Comparing the component with the Rotated
component matrix
Before rotation, most questions loaded highly on
the first extracted factor and much lower on the
following ones.
After rotation, all 4 extracted factors have a
couple of questions loading highly on them.
Q12 loads equally high on factors 1 and 2!
Q12: 'People try to tell you that SPSS makes
statistics easier to understand but it doesn't.'
73
Looking at the content of the Qs
  • In order to interpret the factors, we have to
    look at the content of the Qs that load highly on
    them

74
Looking at the content of the Qs
75
Looking at the content of the Qs
76
Looking at the content of the Qs
77
4 subscales of the SAQ
  • Now the question arises whether:
  • 1. the SAQ does not measure what it says ('SPSS
    anxiety') but some related constructs, or
  • 2. these four constructs are sub-components of
    SPSS anxiety.
  • → The factor analysis does not tell us.
78
Component or Factor transformation matrix
This matrix tells us by what degree the factors
were rotated in order to obtain a solution. If no
rotation were necessary, the matrix would be an
identity matrix (1s on the diagonal, 0s at all
off-diagonal positions). If orthogonal rotation
were completely satisfactory, the matrix would be
symmetrical, i.e., all numbers above and below the
diagonal would be the same. If they aren't →
try oblique rotation (which seems appropriate
here).
→ The matrix is hard to interpret, and as
beginners we are advised to ignore it...
79
Oblique rotation
While in orthogonal rotation we have only one
matrix, the factor matrix, in oblique rotation the
factor matrix is split up into the pattern matrix
and the structure matrix.
  • Structure matrix
  • takes into account the relationship between
    factors
  • → should be used as a check on the pattern matrix
  • → should also be reported
  • Pattern matrix
  • contains the factor loadings and is interpreted
    like the factor matrix
  • → is easier to interpret
  • → should be reported

80
Oblique rotation pattern matrix
The pattern matrix gives us the unique
contribution of each variable to a factor. The
same 4 factors seem to have emerged:
F1: 'Fear of statistics'
F2: 'Fear of peer evaluation'
F3: 'Fear of computers'
F4: 'Fear of mathematics'
81
Oblique rotation structure matrix
In the structure matrix, the shared variance is
not ignored. Now several variables load highly
onto more than 1 factor.
Factors 1 and 3, 'fear of statistics' and 'fear of
computers', go together. Also, F4, 'fear of
maths', is related.
Factors 3 and 4, 'fear of computers' and 'fear of
maths', go together.
Note: Factor 3, 'fear of computers', appears
twice, each time together with a different factor.
82
Oblique rotation Component correlation matrix
The component correlation matrix contains the
correlation coefficients between the factors. F2,
'fear of peer evaluation', has little relation
with the others, but F1, F3, and F4, 'fear of
stats, computers, and maths', are somewhat
interrelated.
→ Independence of the factors cannot be upheld,
given the correlations between the factors; the
contents of the factors 'fear of stats, computers,
and maths' also have a similar meaning.
→ Oblique rotation is more sensible.
83
Factors statistically and conceptually
  • The Factor Analysis has extracted 4 factors, 3 of
    which are correlated with each other, one of
    which is rather independent. An oblique rotation
    is more sensible given the interrelation between
    3 factors.
  • How does that match the interpretation of the
    factors?
  • The three correlated factors,
  • fear of stats, fear of math, fear of computers,
  • are also conceptually closely related, whereas
    the 4th factor, 'fear of negative peer
    evaluation', being socially based, is also
    conceptually different.
  • Hence, the statistics and the meaning of the
    factors go along with each other rather nicely.

84
Factor scores- Matrix B
  • The factor scores Matrix is the one from which
    the factor scores and the covariance matrix of
    factor scores is calculated.
  • If you are not particularly interested in the
    math of it, you are forgiven if you ignore it.

'The factor matrix presents the loadings, by
which the existence of a pattern for the
variables can be ascertained. The factor score
matrix gives a score for each case (...) on these
patterns.'
http://www.hawaii.edu/powerkills/UFA.HTM
85
Component Score Covariance Matrix
This matrix tells us the relationship between the
factor scores. It is an unstandardized correlation
matrix. If the factor scores are totally
uncorrelated, it should be an identity matrix
(with all diagonal elements 1 and all off-diagonal
elements 0). This seems to be met here: the
off-diagonal numbers are all very small.
  • (Note: SPSS 10 doesn't have it.)

86
Case summaries of Factor scores
  • The scores that we asked SPSS to calculate are
    uncorrelated because they are based on the
    Anderson-Rubin method, which explicitly prevents
    correlations.
  • The individual scores for each subject on all 4
    components are listed in the data editor under
    FAC1_1, FAC2_1, FAC3_1, FAC4_1.
  • We can look at them with the aid of case
    summaries for the first 10 cases (otherwise, the
    output will be too voluminous).

87
Case summaries of Factor scores
  • Analyze → Report → Case Summaries...

Transfer the variables FAC1_1, FAC2_1, FAC3_1, and
FAC4_1 to the Variables window.
Limit the number of cases to 10.
88
Output Case summaries of Factor scores
  • With the factor scores, you can compare a single
    individual's fear of math, stats, computers, and
    peer evaluation. E.g., subject 9 scores highly on
    all 4 factors, in particular on factor 3.
  • Also, the factor scores for all 4 factors can be
    added, and a sum score can be derived.
  • Note that factor scores are standardized.

89
Interim summary
  • The SAQ has 4 underlying factors, which we can
    identify as fear of:
  • stats, maths, computers, peer evaluation
  • Oblique rotation is to be preferred, since three
    of the four factors are inter-related,
    statistically as well as conceptually
  • The use of factor analysis here is purely
    exploratory. It helps you understand what factors
    underlie large data sets
  • Informed decisions may follow from such an
    exploratory factor analysis, e.g., with respect
    to working out a better questionnaire.

90
Reliability analysis
  • Here, factor analysis has been used to validate a
    questionnaire. Therefore, it is necessary to know
    how reliable its scales are.
  • There are various ways of testing reliability:
  • 1. Test-retest reliability: do subjects achieve
    similar scores when the SAQ is administered
    again, some time later?
  • 2. Split-half reliability: split the scale in
    half and administer both halves to a subject; if
    the scale is consistent, the subject should
    obtain 2 similar scores.
  • A generalized form of split-half reliability is
  • 3. Cronbach's alpha (α), which splits the data in
    two in every possible way and computes the
    correlation coefficient for each split.

91
Cronbach's alpha, α
α = (N² x average Cov) / (Σ s²item + Σ Covitem)
The numerator is the average covariance between
items multiplied by the squared number of items
(N²); the denominator is the sum of the item
variances plus the sum of the item covariances.
  • For all items on the scale, we calculate the
    variance (s²) and the covariance between the item
    and every other item on the scale. Hence, we need
    a variance-covariance matrix of all items.
  • In this matrix, the diagonal elements are the
    items' variances and the off-diagonal elements
    are the covariances with the other items.
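
A numpy sketch implementing exactly this variance-covariance formula; the data are synthetic, with some shared variance induced so that alpha is non-trivial:

```python
import numpy as np

def cronbach_alpha(items):
    """Alpha from an (n_subjects x n_items) array, via the slide's formula."""
    C = np.cov(items, rowvar=False)   # variance-covariance matrix of the items
    n = C.shape[0]
    off = ~np.eye(n, dtype=bool)
    avg_cov = C[off].mean()           # average inter-item covariance
    # alpha = N^2 * average covariance / (sum of variances + sum of covariances)
    return n**2 * avg_cov / (np.diag(C).sum() + C[off].sum())

rng = np.random.default_rng(9)
data = rng.normal(size=(100, 7))
data += data[:, [0]] * 0.8            # induce shared variance across items
print(round(cronbach_alpha(data), 3))
```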
92
Interpreting Cronbach's α
  • Reliability values of .7-.8 are acceptable for α.
  • If there are subscales, α should be determined
    for every subscale separately.
  • Reverse-phrased items (Q03: 'Standard deviations
    excite me!') have to be reversed back: SD ↔ SA
    and D ↔ A. Otherwise, the sum of covariances will
    decrease only because the item has a negative
    relation with the others.

SD (6-1=5), D (6-2=4), N (6-3=3), A (6-4=2),
SA (6-5=1)
Numerically, reversal is done by adding 1 to the
maximum score, here 5+1=6. Then subtract each
original score from 6.
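
The reversal itself is one line; a tiny sketch on hypothetical ratings:

```python
import numpy as np

q03 = np.array([1, 4, 5, 2, 3])  # hypothetical original 5-point ratings
q03_reversed = 6 - q03           # max score + 1 = 6, minus each original score
print(q03_reversed)              # [5 2 1 4 3]
```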
93
Reversal using SPSS
  • Transform → Compute...

'6 - q03' yields the reversal.
Create a new target variable q03.
If you don't want to change the original SAQ.sav
file, there is another one, called 'SAQ(Item 3
reversed).sav'.
Confirm with OK. Now q03 is reversed.
94
Reliability analysis on SPSS
  • We will determine reliability for each of the 4
    subscales from orthogonal rotation separately

95
Reliability analysis on SPSS: subscale 1, Fear of
computers
  • Analyze → Scale → Reliability Analysis

Transfer all items of one factor to the Items
window, here subscale 1 (items 6, 7, 10, 13, 14,
15, 18). Proceed alike with the other 3 subscales.
Leave the default setting for the model,
Cronbach's Alpha. Tick 'List item labels'.
96
Reliability analysis on SPSS
'Scale if item deleted' tests whether alpha
decreases if one item is deleted. In a reliable
test, this should not matter much. We therefore
still expect a high alpha (>.8) if an item is
deleted.
For a basic reliability check, just tick 'Scale if
item deleted' and 'Correlations'.
Click OK.
97
Subscale 1, 'fear of computers': Correlations
Correlation matrix for subscale 1. The variables
mostly show correlations > .3 with each other.
98
Subscale 1, 'fear of computers': Item-total statistics
The individual α values should not be greater than
the overall α, since then deleting that item would
improve reliability! None of the items here affect
reliability substantially.
The corrected item-total correlations give us the
correlations between each item and the total score
from the questionnaire. In a reliable scale, all
items should correlate with the total. Values
should not be < .3.
Cronbach's α is > .7 for all items. Overall
Cronbach's α = .8234.
→ Subscale 1 is reliable.
99
Reliability analysis on SPSS: subscale 2, Fear of
statistics. Correlation matrix and item-total
statistics.
Subscale 2, as well, has a good Cronbach's α =
.758. The individual α's are not higher than the
overall α.
→ Subscale 2 is reliable!
100
Reliability analysis on SPSS: subscale 3, Fear of
maths. Correlation matrix and item-total
statistics.
Subscale 3, as well, has a good Cronbach's α =
.8194. The individual α's are not higher than the
overall α.
→ Subscale 3 is reliable.
101
Reliability analysis on SPSS: subscale 4, Fear of
negative peer evaluation. Correlation matrix and
item-total statistics.
Subscale 4, however, has a poor Cronbach's α =
.5699. The individual α's are not higher than the
overall α.
→ Subscale 4 is not reliable.
What's wrong with this subscale? It might have too
heterogeneous questions, which decreases internal
consistency. Q23 has a low item-total correlation.
→ This subscale should be rethought!