Title: Statistical Modelling (Special Topic: SEM)
1 Statistical Modelling (Special Topic: SEM)
- Bidin Yatim, PhD
- Associate Professor in Statistics, College of Art and Science, UUM
- PhD Applied Statistics (Exeter, UK)
- MSc Industrial Maths (Aston, UK)
- BSc Maths Stats (Nottingham, UK)
2 Main Focus
- Relationship analysis
- Awareness of the fact that some relationships/models are meaningful and some are not.
- Meaningful relationships/models normally have a theoretical basis (underlying theory) and exhibit causality (cause and effect).
- For those cause-and-effect relationships, SEM provides a formal way of analysing them.
3 Agenda
- Part I: SEM, the Basics
- SEM nomenclature / terminology
- SEM-related models
- Part II: Modeling and Computing
- How to draw a model using AMOS.
- How to run the AMOS model and evaluate several key components of the AMOS graphics and text output, including overall model fit and test statistics for individual path coefficients.
- How to modify and respecify a non-fitting model.
- Part III: SEM and Its Applications
4 Part One
- SEM: The Basics
- http://58.26.137.12/byatim/
5 An Overview: SEM References
- www.statsoft.com/textbook/stsepath.html
- Chapter 1 of Structural Equation Modeling with AMOS: Basic Concepts, Applications, and Programming, Barbara M. Byrne
6 Welcome to SEM: The Musical
Lyrics by Alan Reifman (May be sung to the tune of "Matchmaker," Bock/Harnick, from Fiddler on the Roof)
- SEM, SEM, it can be sung,
  You'll be amazed, at what we've sprung,
  We hope you'll learn more 'bout this stats technique,
  Through songs of which you're among,
  SEM, SEM, we like to run,
  It takes awhile, but we get it done,
  We hope you'll learn of the steps that we take,
  And take home from this, some fun
7 SEM
- Is a statistical methodology for the analysis of a structural theory bearing on some phenomenon, using a confirmatory (hypothesis-testing) approach. Most other multivariate procedures are descriptive/exploratory in nature.
- The theory represents causal processes that generate observations on multiple variables.
8 SEM
- The term conveys 2 important aspects of the procedure:
- The causal processes under study are represented by a series of structural equations, and
- These structural equations can be modeled pictorially to enable a clearer conceptualization of the theory under study.
- The model can be tested simultaneously to determine the extent to which it is consistent with the data. If the goodness of fit is adequate, the model is not rejected; otherwise the hypothesized relations are rejected.
9 SEM: A Note
- SEM is a very general, very powerful and very popular multivariate analysis technique.
- It provides a comprehensive method for the quantification and testing of theories.
- It has been applied in econometrics, psychology, sociology, political science, education, market research, medical research, etc.
- Also known as
- covariance structure analysis,
- covariance structure modeling,
- latent variable modeling,
- confirmatory factor analysis,
- linear structural relationships, and
- analysis of covariance structures.
10 SEM is
- a family of statistical techniques which incorporates and integrates
- Path analysis
- Linear regression
- Factor analysis
11 SEM
- serves purposes similar to multiple regression, but in a more powerful way which takes into account the modeling of interactions, nonlinearities, correlated independents, measurement error, correlated error terms, multiple latent independents each measured by multiple indicators, and one or more latent dependents also each with multiple indicators.
- may be used as a more powerful alternative to multiple regression, path analysis, factor analysis, time series analysis, and analysis of covariance. These procedures are special cases of SEM; or, put another way, SEM
- is an extension of the general linear model (GLM) of which multiple regression is a part.
12 Advantages of SEM compared to multiple regression
- more flexible assumptions (particularly allowing interpretation even in the face of multicollinearity),
- use of confirmatory factor analysis to reduce measurement error by having multiple indicators per latent variable,
- the attraction of SEM's graphical modeling interface,
- the desirability of testing models overall rather than coefficients individually,
- the ability to
- test models with multiple dependents,
- model mediating variables,
- model error terms,
- test coefficients across multiple between-subjects groups, and
- handle difficult data (time series with autocorrelated error, non-normal data, incomplete data).
13 Major applications of structural equation modeling
- causal modeling, or path analysis, which hypothesizes causal relationships among variables and tests the causal models with a linear equation system. Causal models can involve either manifest variables, latent variables, or both
- confirmatory factor analysis, an extension of factor analysis in which specific hypotheses about the structure of the factor loadings and intercorrelations are tested
- regression models, in which regression weights may be constrained to be equal to each other, or to specified numerical values
- covariance structure models, which hypothesize that a covariance matrix has a particular form. For example, you can test the hypothesis that a set of variables all have equal variances with this procedure
- correlation structure models, which hypothesize that a correlation matrix has a particular form.
14 Aims and Objectives
- By the end of this course you should
- Have a working knowledge of the principles behind causality.
- Understand the basic steps to building a model of the phenomenon of interest.
- Be able to construct/interpret path diagrams.
- Understand the basic principles of how models are tested using SEM.
- Be able to test model adequacy using SEM.
- Be able to use AMOS intelligently.
15 SEM: Another Note
- Assumption 1: you are familiar with the basic logic of statistical reasoning as described in Elementary Concepts.
- Assumption 2: you are familiar with the concepts of variance, covariance, correlation and regression analysis; if not, you are advised to read the Basic Statistics.
- It is highly desirable that you have some background in factor analysis before attempting to use structural modeling.
16 Introduction to SEM
- How Useful is Statistical Model?
- The Basic Idea Behind SEM
- Causality (Cause-and-Effect Relationship)
- SEM Nomenclature/Terminologies
- SEM related Statistical Models
17 How Useful is a Statistical Model?
- "All models are wrong, but some are useful"
- G.E.P. Box
- SEM models can never be accepted (as absolute truth); they can only fail to be rejected.
- This leads researchers to provisionally accept a given model.
- While models that fit the data well can only be provisionally accepted, models that do not fit the data well can be absolutely rejected.
18 The Basic Idea Behind SEM
- In a Distribution Theory course you are taught that, if you multiply every number in a list by some constant K, you multiply the mean of the numbers by K. Similarly, you multiply the standard deviation by the absolute value of K.
- Suppose you have the list of numbers 1, 2, 3, having a mean of 2 and a standard deviation of 1. Suppose also you take these 3 numbers and multiply them by 4. Then the mean would become 8, the standard deviation would become 4, and the variance thus 16.
19 The Basic Idea Behind SEM
- The point is, if you have a set of numbers X related to another set of numbers Y by the equation Y = 4X, then the variance of Y must be 16 times that of X, so you can test the hypothesis that Y and X are related by the equation Y = 4X indirectly by comparing the variances of the Y and X variables. This idea generalizes, in various ways, to several variables inter-related by a group of linear equations. The rules become more complex, the calculations more difficult, but the basic message remains the same: you can test whether variables are interrelated through a set of linear relationships by examining the variances and covariances of the variables.
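The scaling rule can be checked numerically. The following is a minimal sketch using only the Python standard library; the list 1, 2, 3 and the constant 4 come from the example above, and the variable names are illustrative.

```python
from statistics import mean, stdev, variance

K = 4                      # the constant in Y = 4X
x = [1, 2, 3]              # the list from the example
y = [K * v for v in x]     # every number multiplied by K

# Sample statistics (n - 1 in the denominator) before and after scaling
print("X:", mean(x), stdev(x), variance(x))
print("Y:", mean(y), stdev(y), variance(y))

# If Y = K * X holds, the variance ratio must equal K squared
ratio = variance(y) / variance(x)
print("variance ratio:", ratio)
```

A variance ratio far from K squared in real data would count as evidence against the hypothesized relation Y = 4X.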
20 The Basic Idea Behind SEM
- Statisticians have developed procedures for testing whether a set of variances and covariances in a covariance matrix fits a specified structure. The way SEM works is as follows:
- You state the way that you (the theory) believe the variables are inter-related, often with the use of a path diagram.
- You (AMOS) work out, via some complex internal rules, what the implications of this are for the variances and covariances of the variables.
- You test whether the variances and covariances fit this model of them.
- Results of the statistical testing, and also parameter estimates and standard errors for the numerical coefficients in the linear equations, are reported.
- On the basis of this information, you decide whether the model seems like a good fit to your data.
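As a rough illustration of these steps, the sketch below takes the simplest possible model, Y = bX + e with e uncorrelated with X, and compares the covariance matrix it implies with the sample covariance matrix. The data, the value b = 4 and the error variance 0.5 are hypothetical and fixed by hand; real SEM software such as AMOS estimates the parameters by minimizing a discrepancy function, which this sketch does not attempt.

```python
def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Sample covariance (n - 1 denominator); cov(a, a) is the variance."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def sample_cov_matrix(x, y):
    return [[cov(x, x), cov(x, y)],
            [cov(y, x), cov(y, y)]]

def implied_cov_matrix(b, var_x, var_e):
    """Covariance matrix implied by Y = b*X + e, with e independent of X."""
    return [[var_x,     b * var_x],
            [b * var_x, b * b * var_x + var_e]]

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [4.5, 7.5, 12.5, 15.5, 20.0]

S = sample_cov_matrix(x, y)                   # what the data say
sigma = implied_cov_matrix(4.0, S[0][0], 0.5) # what the model says

# Residual matrix: entries near zero mean the model reproduces the data well
residuals = [[S[i][j] - sigma[i][j] for j in range(2)] for i in range(2)]
for row in residuals:
    print(row)
```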
21 A Simple SEM
- SEM is an attempt to model causal relations between variables by including all variables that are known to have some involvement in the process of interest.
- Example: testing the effect of a drug on some psychological disorder (e.g. obsessive compulsive disorder, OCD).
22 Causality
Causality has a theoretical basis.
[Diagram of example causal relationships. Node labels: Education, Success in Life; Price, Demand, Supply; Windows of Opportunity for Crime, Unemployment Rate, No. of Crimes]
23 Cause and Effect
- Philosophers have had a great deal to say about the conditions necessary to infer causality.
- Cause and effect should occur close together in time,
- the cause should occur before an effect is observed, and
- the cause should never occur without the presence of the effect.
24 John Stuart Mill (1865) described three conditions necessary to infer cause:
- Cause has to precede effect
- Cause and effect must be related
- All other explanations of the cause-effect
relationship must be ruled out.
25 To verify the third criterion, Mill proposed the
- method of agreement, which states that an effect is present when the cause is present;
- method of difference, which states that when the cause is absent the effect will be absent also; and
- method of concomitant variation, which states that when the above relationships are observed, causal inference will be made stronger because most other interpretations of the cause-effect relationship will have been ruled out.
26 Example
- If we wanted to say that me talking about causality causes boredom, we would have to satisfy the following conditions:
- (1) I talk about causality before boredom occurs.
- (2) Whenever I talk about causality, boredom occurs shortly afterwards.
- (3) The correlation between boredom and my talking about causality must be strong (e.g. on 4 out of 4 occasions when I talk about causality, boredom is observed).
- (4) When the cause is absent the effect is absent: when I don't talk about causality, no boredom is observed.
- (5) The manipulation of the cause leads to an associated change in the effect. So, if we manipulated whether someone is listening to me talking about causality or to my cat mewing, the effect elicited should change according to the manipulation.
- This final manipulation serves to rule out external variables that might affect the cause-effect relationship.
27 Continue
- In situations in which the cause cannot be manipulated, we cannot make causal attributions about our variables. Statistically speaking, this means that when we analyze data from non-experimental situations we cannot conclude anything about cause and effect.
- Structural Equation Modeling (SEM) is an attempt to provide a flexible framework within which causal models can be built.
28 Statistical Modeling
A statistical model DOES NOT necessarily have a theoretical basis. It may be interpreted as either sense or nonsense.
[Diagram of example relationships. Node labels: Weight, Heart Disease; Income, Smoking; No. of Road Accidents, No. of Newspaper Readers]
29 SEM Related Statistical Models
- General Linear Model (GLM)
- Regression Model
- Time Series Model
- Log-linear Model
- Mixed Models
- Survival Models
- Many more
All these statistical models may or may not have a theoretical basis.
30 [Path diagram. Labels: Exogenous Latent Variable / Construct; Endogenous Latent Variable; Exogenous Latent Variable; Indicators (three sets)]
31 SEM Nomenclature
- Independent variables, which are assumed to be measured without error, are called exogenous or upstream variables.
- Dependent or mediating variables are called endogenous or downstream variables.
- Manifest or observed variables, or indicators, are directly measured by researchers.
- Latent or unobserved variables are not directly measured but are inferred from the relationships or correlations among measured variables in the analysis. Examples: self-concept, motivation, powerlessness, anomie, verbal ability, capitalism, social class.
32 SEM Nomenclature (cont.)
- SEM illustrates relationships among observed and unobserved variables using path diagrams.
- Ovals or circles represent latent variables.
- Rectangles or squares represent measured variables.
- Residuals are always unobserved, so they are represented by ovals or circles.
33 SEM Definition
- SEM is an extension of the general linear model (GLM) that enables a researcher to test a set of regression equations simultaneously.
- SEM consists of TWO components:
- Structural Model
- illustrates the relationships among the latent constructs or endogenous variables
- Measurement Model
- represents how the constructs are related to their indicators or manifest variables
34 Example
In psychology, the theory postulates that:
[Path diagram. Ability / Intelligence (exogenous latent construct), Aspirations (endogenous latent construct), Achievement (endogenous latent construct)]
35 Full Latent Variable Model
[Path diagram with three latent variables and their indicators]
- Ability: Academic Skill (x1), Interpersonal Skill (x2), Communication Skill (x3)
- Aspiration: Family Status (y1), Father's Occupation (y2), Peers' Influence (y3)
- Achievement: Personal Actualization (y4), Professional Status (y5), Social Status (y6)
36 Example: ONE Latent (unobserved) Exogenous Variable, TWO Latent (unobserved) Endogenous Variables
[Path diagram with the Structural Model and the Measurement Model marked]
37 Structural Model
- The structural model allows for certain relationships among the latent variables, depicted by lines or arrows (in a path diagram).
- In the path diagram, we specified that Ability and Achievement were related in a specific way. That is, intelligence had some influence on later achievement.
- Thus, one result from the structural model is an indication of the extent to which these a priori hypothesized relationships are supported by our sample data.
38 Structural Model (Cont.)
- The structural equation addresses the following questions:
- Are Ability and Achievement related?
- Exactly how strong is the influence of Ability on Achievement?
- Could there be other latent variables that we need to consider to get a better understanding of the influence on Achievement?
40 Mathematical Form of Structural Model
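A sketch of one possible form, written in the common LISREL-style notation (the notation is an assumption). Here ξ1 stands for Ability, η1 for Aspiration and η2 for Achievement; the γ and β symbols are path coefficients and the ζ symbols are structural disturbances. The sketch assumes paths from Ability to both endogenous constructs and from Aspiration to Achievement.

```latex
\begin{aligned}
\eta_1 &= \gamma_{11}\,\xi_1 + \zeta_1 \\
\eta_2 &= \beta_{21}\,\eta_1 + \gamma_{21}\,\xi_1 + \zeta_2
\end{aligned}
```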
41 Measurement Model
- Specifies the relationship between the latent variables and the observed variables
- Answers the questions:
- To what extent are the observed variables actually measuring the hypothesized latent variables?
- Which observed variable is the best measure of a particular latent variable?
- To what extent are the observed variables actually measuring something other than the hypothesized latent variable?
- Uses Exploratory Factor Analysis (EFA) or Confirmatory Factor Analysis (CFA) to determine the significant observed variables related to each of the latent variables
42 Exploratory FA (EFA)
- In EFA the factor structure or theory about a phenomenon is NOT KNOWN.
- For example, the researcher is interested in measuring the achievement of personnel.
- Suppose he has no knowledge (or very little theory) regarding
- the factors that contribute to achievement
- the no. of indicators of each factor
- which indicators represent which factor
- In such a case, the researcher may collect data and explore for a factor or theory which can explain the correlations among the indicators.
43 Confirmatory FA (CFA)
- In CFA the precise factor structure or theory about a phenomenon is KNOWN or specified a priori.
- For example, a researcher is interested in measuring consumer preference for a product.
- Suppose that based on previous research it is hypothesized (the theory) that the construct or factor measuring consumer preference is
- a one-dimensional construct with 7 indicators or items as its measures
- The obvious question is
- How well do the empirical data conform to the theory of consumer preferences? Or
- How well do the data fit the model?
- In such a case, CFA is used for empirical confirmation or testing of the theory
44 Using Factor Analysis
[Diagram: factor loadings of Ability on Academic Skill (x1), Interpersonal Skill (x2), Communication Skill (x3)]
45 Using Factor Analysis
[Diagram: factor loadings of Aspiration on Family Status (y1), Father's Occupation (y2), Peers' Influence (y3)]
46 Using Factor Analysis
[Diagram: factor loadings of Achievement on Personal Actualisation (y4), Professional Status (y5), Social Status (y6)]
47 Measurement Model (Cont.)
- The relationships between the observed variables and the latent variables are described by factor loadings.
- Factor loadings provide information about the extent to which a given observed variable is able to measure the latent variable. They serve as validity coefficients.
- Measurement error is defined as that portion of an observed variable that is measuring something other than what the latent variable is hypothesized to measure. It serves as a measure of reliability.
48 Measurement Model (Cont.)
- Measurement error could be the result of
- An unobserved variable that is measuring some other latent variable
- Unreliability
- A second-order factor
49 Mathematical Form of Measurement Model
How the latent (unobservable) exogenous variable is related to its indicators or manifest/observed variables x1, x2, x3
50 Measurement Model (cont.)
How the TWO latent (unobservable) constructs or endogenous variables are related to their indicators or manifest variables y1, ..., y6
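A sketch of one possible form, again in LISREL-style notation (an assumption). ξ1 is Ability, η1 is Aspiration and η2 is Achievement; the λ symbols are factor loadings, and δ and ε are measurement errors. The assignment of indicators to constructs follows the factor-loading slides (x1 to x3 for Ability, y1 to y3 for Aspiration, y4 to y6 for Achievement).

```latex
\begin{aligned}
x_i &= \lambda^{x}_{i1}\,\xi_1 + \delta_i,      & i &= 1, 2, 3 \\
y_j &= \lambda^{y}_{j1}\,\eta_1 + \varepsilon_j, & j &= 1, 2, 3 \\
y_j &= \lambda^{y}_{j2}\,\eta_2 + \varepsilon_j, & j &= 4, 5, 6
\end{aligned}
```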
53 Reliability
- Definition: the extent to which a variable or set of variables is consistent in what it is intended to measure.
- If multiple measurements are taken, reliable measures will all be consistent in their values.
- It is the degree to which the observed variable measures the true value and is error free.
- It is different from validity.
54 True Score and Measurement Error
- True score: a component which indicates where the subject actually stands on the variable (statement) of interest
- Measurement error: a component which indicates the inaccuracies when measuring true scores due to fallibility of the survey instrument, response scales, data entry or respondent error
55 Reliability
- The degree to which scores are free from random measurement error
- Reliability measures
- Internal Consistency Reliability
- Test-retest Reliability
- Alternate Forms Reliability
56 Reliability
- Levels of reliability
- 0.90 Excellent
- 0.80 Very Good
- 0.70 Adequate
- < 0.70 Poor
57 Example: Reliability of Observed Variables
- Cronbach's alpha was computed for all variables

Variable     No. of items   Reliability
Variable1    10             .91
Variable2    10             .87
Variable3    10             .58
Variable4    10             .70
Variable5    12             .72
Variable6    12             .80
Variable7    12             .80
Variable8    12             .87
Variable9    10             .84
Variable10    7             .71
Variable11    4             .48
58 Summated Scale Reliability
- When reliability involves multiple scaled items, reliability must be measured on a summated scale.
- A summated rating scale is a short list of statements, questions or other items that the subject responds to.
- A summated scale is a sum of responses from a list of statements to create an overall score.
59 Reliability Coefficient (1)
- There are several ways to measure reliability, which will be discussed later.
- The measurement is normally called the reliability coefficient.
- This coefficient is the percent of variance in an observed variable that is accounted for by the true scores of the underlying construct.
60 Reliability Coefficient (2)
- Imagine you have collected 2 scores from a survey:
- True and observed scores of customer satisfaction
- You compute the correlation between the scores
- The square of the correlation coefficient will be your reliability coefficient, which is
- The total variance explained in the observed scores by the true score, or
- The percent of variance in observed scores that is accounted for by true scores.
61 Types of Reliability
- Test-retest
- Assessed by administering the same instrument to the same sample of respondents at two points in time, and computing the correlation between the two sets of scores.
- Internal consistency reliability
- The extent to which individual items that constitute a test correlate with one another or with the test total. In short, it measures how consistently respondents respond to the items within a scale.
62 Types of Reliability (2)
- For example, if the first half of an instrument is educational items which correlate highly among themselves and the second half is political items which correlate highly among themselves, the instrument would have high internal consistency anyway, even though these are two distinct dimensions.
- Note that measures of internal consistency are often called measures of "internal consistency reliability" or even "reliability", but this merges the distinct concepts of internal consistency and reliability, which do not necessarily go together.
- How do we solve this problem?
- The most commonly used internal consistency reliability measure is Cronbach's alpha.
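A minimal sketch of Cronbach's alpha, using the usual formula alpha = k/(k - 1) * (1 - sum of item variances / variance of the summated score), where k is the number of items. The function name and the response data are hypothetical and serve only as an illustration.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` holds one list per item; each inner list contains the scores
    of the same respondents, in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance(scores) for scores in items]
    # Summated scale: add up each respondent's answers across all items
    totals = [sum(answers) for answers in zip(*items)]
    return (k / (k - 1)) * (1 - sum(item_variances) / variance(totals))

# Hypothetical responses of 5 people to 3 items on a 1-5 scale
item1 = [2, 4, 3, 5, 1]
item2 = [3, 5, 3, 4, 2]
item3 = [2, 5, 4, 5, 1]

alpha = cronbach_alpha([item1, item2, item3])
print(round(alpha, 3))
```

The result can then be read against the levels listed earlier (0.90 excellent, 0.80 very good, 0.70 adequate).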
63 Validity
- Definition: the extent to which an item or set of items correctly represents the construct of study; the degree to which it is free from any systematic or non-random error
- Validity deals with
- How well the construct is defined by the item/s (WHAT should be measured)
- While reliability deals with
- How consistently the item/s is/are measuring the construct (HOW it is measured)
64 Validity
- Whether the scores measure what they are supposed to measure
- Types of validity
- Construct Validity (SEM Confirmatory Factor Analysis helps to establish construct validity)
- Criterion-Related Validity (correlation with an external standard)
- Convergent Validity / Discriminant Validity (can be determined through SEM Confirmatory Factor Analysis)
65 Examples
- Example 1: How happy are you?
- This example is about validity: whether the measure accurately represents what it is supposed to measure
- Example 2: How happy are you when you are smoking? Ask this question repeatedly of the same subject or multiple subjects and see how consistent their answers are.
- This example is about reliability (sometimes I'd like to call it consistency)
66 I Am an Indicator
Lyrics by Alan Reifman (May be sung to the tune of "The Entertainer," Billy Joel)
- I am an indicator, a latent construct I represent,
  I'm measurable, sometimes pleasurable,
  A manifestation of what is meant,
  I am an indicator, I usually come in a multiple set,
  With other signs of the same construct, you may instruct,
  I'm correlated with my co-indicators, you can bet,
  I am an indicator, from my presence the construct is inferred,
  I'm tap-able, the construct is not palpable,
  The distinction should not be blurred
67 At Least Three
Lyrics by Alan Reifman (May be sung to the tune of "Think of Me," Lloyd Webber/Hart/Stilgoe, from Phantom of the Opera)
- At least three, indicators are urged,
  For each latent construct shown,
  At least three, indicators should help,
  Avoid output where you groan,
  With less than three, your construct sure will be, locally unidentified,
  Though the model might still run, you could have a rough ride
68 Total, Direct and Indirect Effects
- There is a direct effect between two latent variables when a single directed line or arrow connects them.
- There is an indirect effect between two variables when the second latent variable is connected to the first latent variable through one or more other latent variables.
- The total effect between two latent variables is the sum of any direct effect and all indirect effects that connect them.
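A small sketch of how these effects combine. It uses the standard path-analysis rule, not stated on the slide, that an indirect effect is the product of the coefficients along its path. The path coefficients below are hypothetical values for the Ability, Aspiration and Achievement example.

```python
# Hypothetical standardized path coefficients (illustration only)
paths = {
    ("Ability", "Aspiration"): 0.5,
    ("Aspiration", "Achievement"): 0.4,
    ("Ability", "Achievement"): 0.3,
}

# Direct effect: the single arrow from Ability to Achievement
direct = paths[("Ability", "Achievement")]

# Indirect effect: Ability -> Aspiration -> Achievement,
# the product of the coefficients along the path
indirect = paths[("Ability", "Aspiration")] * paths[("Aspiration", "Achievement")]

# Total effect: direct effect plus all indirect effects
total = direct + indirect

print("direct:", direct)
print("indirect:", indirect)
print("total:", total)
```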
69 Example: Direct and Indirect
[Path diagram. Ability / Intelligence (exogenous latent construct), Aspirations (endogenous latent construct), Achievement (endogenous latent construct)]
70 Semantics
- Types of measurement scale
- Metric and Non-metric
- Correlation coefficient
- Correlation and Covariance Matrix
- Standardized and Un-standardized Estimates
71 Types of Measurement Scale
- There are 4 types of measurement scale in a scale instrument
- Nominal
- Ordinal
- Interval
- Ratio
- Some other common scales, like Likert scales, Semantic Differential Scales, Dichotomous Scales, etc., can be categorized into the 4 above
- This is important as assumptions in SEM rely on what we know on this page
72 Metric and Non-metric Scales
- Metric scales are quantitative data where the parameters of the scale form a continuum
- Interval or Ratio scale data
- Non-metric scales are qualitative data: attributes, characteristics or categorical properties that identify or describe a subject or object
- Possibly Nominal or Ordinal scale data
- But metric and non-metric scales can sometimes be misused or abused. How?
73 VARIABLE SCALES
- SEM in general assumes observed variables are measured on a linear continuous scale.
- Dichotomous and ordinal variables cause problems because correlations/covariances tend to be truncated. These scores are not normally distributed and responses to individual items may not be very reliable.
74 Correlation
- Perhaps the most basic semantic
- Definition: the linear relationship of two variables
- The strength of the relationship is determined by the correlation coefficient and r² (explained later)
- There are 2 common types of correlation coefficient
- Pearson Product Moment Correlation (Interval)
- Spearman Rank Correlation (Ordinal)
- The former is the one we will use in this course
75 Correlation Matrix (1)
- The correlation matrix of n random variables X1, ..., Xn is the n × n matrix whose (i, j) entry is corr(Xi, Xj)
- If the measures of correlation used are product-moment coefficients, the correlation matrix is the same as the covariance matrix of the standardized random variables Xi/SD(Xi) for i = 1, ..., n
- Consequently it is necessarily a non-negative definite matrix (an important assumption)
- The correlation matrix is symmetric because the correlation between Xi and Xj is the same as the correlation between Xj and Xi
76 Correlation Matrix (2)
Correlation coefficients, with the p-value beneath each coefficient

          A1       A2       A3       A4       A5       A6       A7       B1       B2       B3
A1   1.00000  0.65579  0.46296  0.58812  0.62082  0.62629  0.64288  0.34385  0.57904  0.56353
               <.0001   <.0001   <.0001   <.0001   <.0001   <.0001   0.0004   <.0001   <.0001
A2   0.65579  1.00000  0.45951  0.66297  0.72727  0.77384  0.76693  0.40987  0.67796  0.59493
      <.0001            <.0001   <.0001   <.0001   <.0001   <.0001   <.0001   <.0001   <.0001
A3   0.46296  0.45951  1.00000  0.51913  0.46652  0.45752  0.44520  0.33407  0.35833  0.33623
      <.0001   <.0001            <.0001   <.0001   <.0001   <.0001   0.0006   0.0002   0.0006
A4   0.55812  0.66297  0.51913  1.00000  0.69905  0.64969  0.59358  0.34148  0.58859  0.44284
      <.0001   <.0001   <.0001            <.0001   <.0001   <.0001   0.0004   <.0001   <.0001
A5   0.62082  0.72727  0.46652  0.69905  1.00000  0.67281  0.66939  0.31277  0.63133  0.54744
      <.0001   <.0001   <.0001   <.0001            <.0001   <.0001   0.0014   <.0001   <.0001
A6   0.62629  0.77384  0.45752  0.64969  0.67281  1.00000  0.86014  0.40483  0.66758  0.56944
      <.0001   <.0001   <.0001   <.0001   <.0001            <.0001   <.0001   <.0001   <.0001
A7   0.64288  0.76693  0.44520  0.59358  0.66939  0.86014  1.00000  0.39913  0.68141  0.62075
      <.0001   <.0001   <.0001   <.0001   <.0001   <.0001            <.0001   <.0001   <.0001
B1   0.34385  0.40987  0.33407  0.34148  0.31277  0.40483  0.39913  1.00000  0.58187  0.62583
      0.0004   <.0001   0.0006   0.0004   0.0014   <.0001   <.0001            <.0001   <.0001
B2   0.57904  0.67796  0.35833  0.58859  0.63133  0.66758  0.68141  0.58187  1.00000  0.85335
      <.0001   <.0001   0.0002   <.0001   <.0001   <.0001   <.0001   <.0001            <.0001
B3   0.56353  0.59493  0.33623  0.44284  0.54744  0.56944  0.62075  0.62583  0.85335  1.00000
      <.0001   <.0001   0.0006   <.0001   <.0001   <.0001   <.0001   <.0001   <.0001
77 Correlation Matrix (3)
- So we say that
- If the input matrix used is the Covariance Matrix, the estimated coefficients for the parameters are unstandardized estimates
- If the input matrix used is the Correlation Matrix, the estimated coefficients for the parameters are standardized estimates
- So what?
78 Covariance
- The covariance between two variables equals the correlation times the product of the variables' standard deviations. The covariance of a variable with itself is the variable's variance.
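The relation cov(X, Y) = r × SD(X) × SD(Y) can be verified on a small data set. This is a minimal sketch with hypothetical scores and hand-written helper functions; because r is itself computed from the covariance, the check mainly illustrates how the three quantities fit together.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    """Sample covariance (n - 1 denominator)."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def correlation(a, b):
    """Pearson product-moment correlation."""
    return covariance(a, b) / sqrt(covariance(a, a) * covariance(b, b))

# Hypothetical scores, for illustration only
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

sd_x = sqrt(covariance(x, x))   # covariance of X with itself is its variance
sd_y = sqrt(covariance(y, y))
r = correlation(x, y)

cov_xy = covariance(x, y)
rebuilt = r * sd_x * sd_y       # correlation times the product of the SDs
print(cov_xy, rebuilt)
```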
79 Correlation Matrix (4)
- Therefore, when we want to test a theory, we use the variance-covariance matrix
- (to validate the causal relationships among constructs)
- When we just want to explain the pattern of the relationships, we use the correlation matrix
- (Theory testing is not required)
80 Factors Affecting Correlation/Covariance Coefficient
- Type of scale and range of values
- Pearson correlation is the basis for analysis in regression, path analysis, factor analysis and SEM. Hence data must be in metric form.
- There must be enough variation in scores to allow a correlation relationship to manifest.
- Linearity
- The Pearson correlation coefficient measures the degree of linear relationship between two variables, hence the need to test linearity.
- Sample size
- SEM requires a big sample size. Rule of thumb: 10-20 times the number of variables. Ding, Velicer and Harlow (1995): 100-150. Boomsma (1982, 1983): 400. Hu, Bentler and Kano (1992): in some cases 5000 is still insufficient. Schumacker and Lomax (1999): many articles use 250-500. Bentler and Chou (1987): for normal data, 5 subjects per variable is sufficient.
81 Covariance
Lyrics by Alan Reifman (May be sung to the tune of "Aquarius," Rado/Ragni/MacDermot, from Hair, also popularized by the Fifth Dimension)
- You draw paths to show relationships,
  You hope align with the known r's,
  Your model will guide the tracings,
  From constructs near to constructs far,
  You will compare this with the data's covariance,
  The data's covariance...
  Covariance! Covariance!
  Similar to correlation,
  With the variables unstandardized,
  Does each known covariance match up with,
  The one the model tracings will derive?
  Covariance! Covariance!
82 SEM Assumptions
- Sample Size
- a good rule of thumb is > 15 cases per predictor/indicator (James Stevens, Applied Multivariate Statistics for the Social Sciences)
- Model with TWO factors: recommended sample size > 100
- Model with FOUR factors: recommended sample size > 200
83 SEM Assumptions (cont.)
- Sample Size
- Consequences of using smaller samples
- convergence failures (the software cannot reach a satisfactory solution),
- improper solutions (including negative error variance estimates for measured variables),
- lowered accuracy of parameter estimates and, in particular, standard errors
- SEM program standard errors are computed under the assumption of large sample sizes.
84 SEM Assumptions (cont.)
- Normality
- Many SEM estimation procedures assume multivariate normal distributions
- Lack of univariate normality occurs when the skew index is > 3.0 and the kurtosis index is > 10.
- Multivariate normality can be assessed by indices of multivariate skew or kurtosis
- Non-normal distributions can sometimes be corrected by transforming variables
85 SEM Assumptions (cont.)
- Multicollinearity
- Occurs when intercorrelations among some variables are so high that certain mathematical operations are impossible or results are unstable because denominators are close to 0.
- Bivariate correlations > 0.85
- Multiple correlations > 0.90
- May cause a non-positive definite/singular covariance matrix
- May be due to inclusion of individual and composite variables
- Detection: Tolerance = 1 - R² < 0.10; Variance Inflation Factor (VIF) = 1/(1 - R²) > 10
- Can be corrected by eliminating or combining redundant variables
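The two detection rules can be written as a short check. In this sketch R² is assumed to be the squared multiple correlation obtained by regressing one variable on all the others; the function names and the example R² values are hypothetical.

```python
def tolerance(r_squared):
    """Tolerance = 1 - R^2."""
    return 1.0 - r_squared

def vif(r_squared):
    """Variance Inflation Factor = 1 / (1 - R^2)."""
    if r_squared >= 1.0:
        raise ValueError("R^2 of 1 means perfect collinearity; VIF is undefined")
    return 1.0 / (1.0 - r_squared)

def flags_multicollinearity(r_squared):
    """Apply the slide's cutoffs: tolerance < 0.10 or VIF > 10."""
    return tolerance(r_squared) < 0.10 or vif(r_squared) > 10

# Hypothetical R^2 values for two variables
for r2 in (0.50, 0.95):
    print(r2, tolerance(r2), vif(r2), flags_multicollinearity(r2))
```

Since VIF is the reciprocal of tolerance, the two cutoffs flag the same variables apart from the exact boundary.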
86 SEM Assumptions (cont.)
- Outliers
- Univariate outliers: more than three SDs away from the mean
- Detection: by inspecting frequency distributions and univariate measures of skewness and kurtosis
- Multivariate outliers: may have extreme scores on two or more variables, or their configurations of scores may be unusual
- Detection: by inspecting indices of multivariate skewness and kurtosis. Mahalanobis distance squared is distributed as chi-square with df equal to the number of variables.
- Can be remedied by correcting errors, by dropping these cases, or by transforming the variables
87 VIOLATIONS OF ASSUMPTIONS (1)
- The best known distribution with no kurtosis is the multi-normal.
- Leptokurtic (more peaked) distributions result in too many rejections of H0 based on the chi-square statistic.
- Platykurtic distributions will lead to estimates of chi-square that are too low.
88 VIOLATIONS OF ASSUMPTIONS (2)
- High degrees of skewness lead to excessively large chi-square estimates.
- In small samples (N < 100), the chi-square statistic tends to be too large.
89 SEM, Oh, SEM
- Lyrics by Alan Reifman, dedicated to Peter Westfall (article of his) (May be sung to the tune of "Galveston," Jimmy Webb, popularized by Glen Campbell)
  Ultimately, SEM,
  Your LVs cannot be measured,
  Which gives the critics some displeasure,
  There's nothing physical to grab on,
  When you run SEM,
  SEM, Oh, SEM,
  You make many an assumption,
  Is it recklessness or gumption?
  Assume the e's uncorrelated...
  When you run SEM,
  I can see the critics' point of view, now,
  They're saying the models aren't unique,
  That, we must willingly acknowledge,
  In response to the critique, if we want to keep on using...
  SEM, Oh, SEM...
90 Model Identification (Identified Equations)
- Identification refers to the idea that there is at least one unique solution for each parameter estimate in a SEM model.
- Models in which there is only one possible solution for each parameter estimate are said to be just-identified.
- Models for which there are an infinite number of possible parameter estimate values are said to be underidentified.
- Finally, models that have more than one possible solution (but one best or optimal solution) for each parameter estimate are considered overidentified.
91 Model Identification (Identified Equations)
- Underidentification
- empirical underidentification or
- structural underidentification
- Empirical underidentification occurs when a parameter estimate that establishes model identification has a very small (close to zero) estimate.
- A path coefficient whose value is estimated as being close to zero may be treated as zero by the SEM program's matrix inversion algorithm. If that path coefficient is necessary to identify the model, the model thus becomes underidentified.
- Remedy for empirical underidentification: collect more data or respecify the model
- Remedy for structural underidentification: respecify the model
92 Examples of Identified Model
- Case 1: Let's say we have an equation
- x + 2y = 7
- Question: Is this equation/model identified?
- Answer: No, it is underidentified because there are an infinite number of solutions for x and y (e.g., x = 5 and y = 1, or x = 3 and y = 2). These values are therefore underidentified because there are fewer "knowns" than "unknowns."
- Case 2: Let's say we have a set of equations
- x + 2y = 7
- 3x - y = 7
- Question: Is this set of equations/model identified?
- Answer: Yes, it is a just-identified model as there are as many knowns as unknowns. There is one best pair of values (x = 3, y = 2).
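The two cases can be checked with a few lines of code. This is a minimal sketch using Cramer's rule for a 2 × 2 system; the function name is hypothetical, and a zero determinant is used as the signal that the equations do not pin down a unique solution.

```python
def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule.

    Returns (x, y), or None when the determinant is zero and the
    system has no unique solution.
    """
    det = a1 * b2 - b1 * a2
    if det == 0:
        return None
    x = (c1 * b2 - b1 * c2) / det
    y = (a1 * c2 - c1 * a2) / det
    return x, y

# Case 2: x + 2y = 7 and 3x - y = 7 (two knowns, two unknowns)
print(solve_2x2(1, 2, 7, 3, -1, 7))

# Case 1: x + 2y = 7 alone; many (x, y) pairs satisfy it
candidates = [(5, 1), (3, 2), (7, 0)]
print([x + 2 * y == 7 for x, y in candidates])

# Repeating the same equation adds no new information: still no unique solution
print(solve_2x2(1, 2, 7, 1, 2, 7))
```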