Statistical Modelling (Special Topic: SEM)

Bidin Yatim, PhD, Associate Professor in Statistics, College of Art and Science, UUM. PhD Applied Statistics (Exeter, UK)

1
Statistical Modelling (Special Topic: SEM)

Bidin Yatim, PhD
Associate Professor in Statistics
College of Art and Science, UUM
PhD Applied Statistics (Exeter, UK)
MSc Industrial Maths (Aston, UK)
BSc Maths Stats (Nottingham, UK)

2
Main Focus

Relationship Analysis
Awareness of the fact that some relationships /
models are meaningful and some are not.
Meaningful relationships / models normally have a
theoretical basis (underlying theory) and exhibit
causality (cause and effect).
For those cause-and-effect relationships, SEM
provides a formal way of analysing them.

3
Agenda

Part I: SEM, the Basics
SEM Nomenclature / Terminologies
SEM-related Models
Part II: Modeling and Computing
How to draw a model using AMOS.
How to run the AMOS model and evaluate several
key components of the AMOS graphics and text
output, including overall model fit and test
statistics for individual path coefficients.
How to modify and respecify a non-fitting model.
Part III: SEM and Its Applications

4
Part One

SEM: The Basics
http://58.26.137.12/byatim/

5
An Overview: SEM References

www.statsoft.com/textbook/stsepath.html
Chapter 1 of Structural Equation Modeling with
AMOS: Basic Concepts, Applications, and
Programming, by Barbara M. Byrne

6
Welcome to SEM: The Musical
Lyrics by Alan Reifman
(May be sung to the tune of "Matchmaker,"
Bock/Harnick, from Fiddler on the Roof)

SEM, SEM, it can be sung,
You'll be amazed, at what we've sprung,
We hope you'll learn more 'bout this stats technique,
Through songs of which you're among,
SEM, SEM, we like to run,
It takes awhile, but we get it done,
We hope you'll learn of the steps that we take,
And take home from this, some fun

7
SEM

Is a statistical methodology for the analysis of a
structural theory that bears on some phenomenon,
using a confirmatory (hypothesis-testing)
approach. Most other multivariate procedures are
descriptive / exploratory in nature.
The theory represents causal processes that
generate observations on multiple variables.

8
SEM

conveys 2 important aspects of the procedure:
The causal processes under study are represented
by a series of structural equations, and
These structural equations can be modeled
pictorially to enable a clearer conceptualization
of the theory under study.
The model can be tested simultaneously to
determine the extent to which it is consistent
with the data. If the goodness of fit is adequate,
the model is not rejected; otherwise the
hypothesized relations are rejected.

9
SEM A Note

SEM is a very general, very powerful and very
popular multivariate analysis technique.
It provides a comprehensive method for the
quantification and testing of theories.
It has been applied in econometrics, psychology,
sociology, political science, education, market
and medical research etc.
Also known as:
covariance structure analysis,
covariance structure modeling,
latent variable modeling,
confirmatory factor analysis,
linear structural relationship and
analysis of covariance structures.

10
SEM is

a family of statistical techniques which
incorporates and integrates
Path analysis
Linear regression
Factor analysis

11
SEM

serves purposes similar to multiple regression,
but in a more powerful way which takes into
account the modeling
of interactions, nonlinearities, correlated
independents, measurement error, correlated error
terms, multiple latent independents each measured
by multiple indicators, and one or more latent
dependents also each with multiple indicators.
may be used as a more powerful alternative to
multiple regression, path analysis, factor
analysis, time series analysis, and analysis of
covariance. These procedures are special cases of
SEM, or,
is an extension of the general linear model (GLM)
of which multiple regression is a part.

12
Advantages of SEM compared to multiple regression

more flexible assumptions (particularly allowing
interpretation even in the face of
multicollinearity),
use of confirmatory factor analysis to reduce
measurement error by having multiple indicators
per latent variable,
the attraction of SEM's graphical modeling
interface, the desirability of testing models
overall rather than coefficients individually,
the ability to
test models with multiple dependents,
model mediating variables,
model error terms,
test coefficients across multiple
between-subjects groups, and
handle difficult data (time series with
autocorrelated error, non-normal data, incomplete
data).

13
Major applications of structural equation modeling

causal modeling, or path analysis - hypothesizes
causal relationships among variables and tests
the causal models with a linear equation system.
Causal models can involve either manifest
variables, latent variables, or both
confirmatory factor analysis - extension of
factor analysis in which specific hypotheses
about the structure of the factor loadings and
intercorrelations are tested
regression models, in which regression weights
may be constrained to be equal to each other, or
to specified numerical values
covariance structure models, which hypothesize
that a covariance matrix has a particular form.
For example, you can test the hypothesis that a
set of variables all have equal variances with
this procedure
correlation structure models, which hypothesize
that a correlation matrix has a particular form.

14
Aims and Objectives

By the end of this course you should
Have a working knowledge of the principles behind
causality.
Understand the basic steps to building a model of
the phenomenon of interest.
Be able to construct/ interpret path diagrams.
Understand the basic principles of how models are
tested using SEM.
Be able to test model adequacy using SEM.
Be able to use AMOS intelligently.

15
SEM Another Note

Assumption 1: you are familiar with the basic
logic of statistical reasoning as described in
Elementary Concepts.
Assumption 2: you are familiar with the concepts
of variance, covariance, correlation and
regression analysis; if not, you are advised to
read Basic Statistics.
It is highly desirable that you have some
background in factor analysis before attempting
to use structural modeling.

16
Introduction to SEM

How Useful is a Statistical Model?
The Basic Idea Behind SEM
Causality (Cause-and-Effect Relationship)
SEM Nomenclature/Terminologies
SEM related Statistical Models

17
How Useful is a Statistical Model?

"All models are wrong, but some are useful."
(G. E. P. Box)
SEM models can never be accepted (as absolute
truth); they can only fail to be rejected.
This leads researchers to provisionally accept a
given model.
While models that fit the data well can only be
provisionally accepted, models that do not fit
the data well can be absolutely rejected.

18
The Basic Idea Behind SEM

In a Distribution Theory course you are taught
that, if you multiply every number in a list by
some constant K, you multiply the mean of the
numbers by K. Similarly, you multiply the
standard deviation by the absolute value of K.
Suppose you have the list of numbers 1, 2, 3,
having a mean of 2 and a standard deviation of
1. Suppose also you take these 3 numbers and
multiply them by 4. Then the mean would become 8,
and the standard deviation would become 4, the
variance thus 16.
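
The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch; note that the standard deviation of 1 quoted on the slide is the sample standard deviation (n - 1 in the denominator), which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev, variance

numbers = [1, 2, 3]
K = 4
scaled = [K * x for x in numbers]   # every number multiplied by the constant K

# Mean and sample standard deviation before scaling
print(mean(numbers), stdev(numbers))

# After scaling: the mean is multiplied by K, the SD by |K|
print(mean(scaled), stdev(scaled))

# The variance is the squared SD, so it is multiplied by K squared
print(variance(scaled))
```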

19
The Basic Idea Behind SEM

The point is, if you have a set of numbers X
related to another set of numbers Y by the
equation Y = 4X, then the variance of Y must be
16 times that of X, so you can test the
hypothesis that Y and X are related by the
equation Y = 4X indirectly by comparing the
variances of the Y and X variables. This idea
generalizes, in various ways, to several
variables inter-related by a group of linear
equations. The rules become more complex, the
calculations more difficult, but the basic
message remains the same -- you can test whether
variables are interrelated through a set of
linear relationships by examining the variances
and covariances of the variables.
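
As a worked statement of this idea, the scalar rule and its matrix generalization can be written as follows. The symbols $a$, $B$, $\Sigma_x$ and $\Sigma_y$ are notation introduced here for illustration; they do not appear on the slide.

```latex
\begin{aligned}
\operatorname{Var}(aX) &= a^{2}\operatorname{Var}(X),
  \qquad \operatorname{Cov}(X, aX) = a\operatorname{Var}(X) \\
Y = 4X \;&\Rightarrow\; \operatorname{Var}(Y) = 16\operatorname{Var}(X) \\
y = Bx \;&\Rightarrow\; \Sigma_y = B\,\Sigma_x B^{\top}
\end{aligned}
```

The last line is the several-variable version: a set of linear equations linking the variables fixes the form that their covariance matrix must take.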

20
The Basic Idea Behind SEM

Statisticians have developed procedures for
testing whether a set of variances and
covariances in a covariance matrix fits a
specified structure. The way SEM works is as
follows:
You state the way that you (the theory) believe
the variables are inter-related, often with the
use of a path diagram.
You (AMOS) work out, via some complex internal
rules, what the implications of this are for the
variances and covariances of the variables.
You test whether the variances and covariances
fit this model of them.
Results of the statistical testing, and also
parameter estimates and standard errors for the
numerical coefficients in the linear equations
are reported.
On the basis of this information, you decide
whether the model seems like a good fit to your
data.
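
The steps above can be sketched for the smallest possible case: one observed x, one observed y, and the hypothesized equation y = b*x + e. The sketch below is not what AMOS does internally; it assumes the standard maximum-likelihood discrepancy function F = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p, and the sample covariance matrix `S` and all parameter values are hypothetical.

```python
import math

def implied_cov(b, phi, psi):
    """Covariance matrix of (x, y) implied by y = b*x + e,
    with Var(x) = phi and Var(e) = psi."""
    return [[phi, b * phi],
            [b * phi, b * b * phi + psi]]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d],
            [-m[1][0] / d, m[0][0] / d]]

def trace_of_product(a, b):
    """Trace of the matrix product a @ b for 2x2 matrices."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))

def ml_discrepancy(sample, implied):
    """ML discrepancy between sample and implied covariance matrices.
    It is 0 when the two matrices are identical and positive otherwise."""
    p = 2  # number of observed variables
    return (math.log(det2(implied)) + trace_of_product(sample, inv2(implied))
            - math.log(det2(sample)) - p)

S = [[1.0, 4.0], [4.0, 17.0]]                      # hypothetical sample covariances
fit_b4 = ml_discrepancy(S, implied_cov(4.0, 1.0, 1.0))  # hypothesis y = 4x + e
fit_b1 = ml_discrepancy(S, implied_cov(1.0, 1.0, 1.0))  # hypothesis y = x + e
print(fit_b4, fit_b1)
```

A discrepancy near zero means the hypothesized structure reproduces the observed variances and covariances; a large value means it does not.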

21
A Simple SEM

SEM is an attempt to model causal relations
between variables by including all variables that
are known to have some involvement in the process
of interest.
Example: testing the effect of a drug on some
psychological disorder (e.g. obsessive compulsive
disorder, OCD).

22
Causality
Causality has a theoretical basis
[Diagrams of example causal relationships:
Education and Success in Life;
Price, Demand and Supply;
Unemployment Rate, Windows of Opportunity for
Crime and No. of Crimes]
23
Cause and Effect

Philosophers have had a great deal to say about
the conditions necessary to infer causality.
Cause and effect
should occur close together in time,
cause should occur before an effect is observed,
and
the cause should never occur without the presence
of the effect.

24
John Stuart Mill (1865) described three
conditions necessary to infer cause

Cause has to precede effect
Cause and effect must be related
All other explanations of the cause-effect
relationship must be ruled out.

25
To verify the third criterion, Mill proposed the

method of agreement which states that an effect
is present when the cause is present
method of difference which states that when the
cause is absent the effect will be absent also
and
method of concomitant variation which states that
when the above relationships are observed, causal
inference will be made stronger because most
other interpretations of the cause-effect
relationship will have been ruled out.

26
Example

If we wanted to say that my talking about
causality causes boredom, we would have to
satisfy the following conditions:
(1) I talk about causality before boredom
occurs.
(2) Whenever I talk about causality, boredom
occurs shortly afterwards.
(3) The correlation between boredom and my
talking about causality must be strong (e.g. on 4
out of 4 occasions when I talk about causality,
boredom is observed).
(4) When cause is absent, effect is absent:
when I don't talk about causality, no boredom is
observed.
(5) The manipulation of cause leads to an
associated change in effect. So, if we
manipulated whether someone is listening to me
talking about causality or to my cat mewing,
the effect elicited should change according to
the manipulation.
This final manipulation serves to rule out
external variables that might affect the
cause-effect relationship.

27
Continue

In situations in which cause cannot be
manipulated, we cannot make causal attributions
about our variables. Statistically speaking, this
means that when we analyze data from
non-experimental situations we cannot conclude
anything about cause and effect.
Structural Equation Modeling (SEM) is an attempt
to provide a flexible framework within which
causal models can be built.

28
Statistical Modeling
A statistical model DOES NOT necessarily have a
theoretical basis. It may be interpreted as
either making sense or being nonsense.
[Diagram of example relationships among Weight,
Heart Disease, Income, Smoking, No. of Road
Accidents and No. of Newspaper Readers]
29
SEM Related Statistical Models

General Linear Model (GLM)
Regression Model
Time Series Model
Log-linear Model
Mixed Models
Survival Models
Many more

All these Statistical Models may or may not have
theoretical basis
30
[Path diagram: two exogenous latent variables /
constructs and one endogenous latent variable,
each with its own set of indicators]
31
SEM Nomenclature

Independent variables, which are assumed to be
measured without error, are called exogenous or
upstream variables
Dependent or mediating variables are called
endogenous or downstream variables.
Manifest or observed variables or indicators are
directly measured by researchers
Latent or unobserved variables are not directly
measured but are inferred from the relationships
or correlations among measured variables in the
analysis. Examples: self-concept, motivation,
powerlessness, anomie, verbal ability,
capitalism, social class.

32
SEM Nomenclature (cont.)

SEM illustrates relationships among observed and
unobserved variables using path diagrams.
Ovals or circles represent latent variables,
Rectangles or squares represent measured
variables.
Residuals are always unobserved, so they are
represented by ovals or circles.

33
SEM Definition

SEM is an extension of the general linear model
(GLM) that enables a researcher to test a set of
regression equations simultaneously.
SEM consists of TWO components
Structural Model
illustrates the relationships among the latent
constructs or endogenous variables
Measurement Model
represents how the constructs are related to
their indicators or manifest variables

34
Example
In psychology, the theory postulates that:
[Path diagram linking three latent constructs:
Ability / Intelligence (exogenous latent construct),
Aspirations (endogenous latent construct),
Achievement (endogenous latent construct)]
35
Full Latent Variable Model
[Path diagram. Latent variables and their
indicators:
Ability: Academic Skill (x1), Interpersonal Skill
(x2), Communication Skill (x3)
Aspiration: Family Status (y1), Father's
Occupation (y2), Peers' Influence (y3)
Achievement: Personal Actualization (y4),
Professional Status (y5), Social Status (y6)]
36
Example: ONE latent (unobserved) exogenous
variable, TWO latent (unobserved) endogenous
variables
[Path diagram with the Structural Model and the
Measurement Model marked]
37
Structural Model

The structural model allows for certain
relationships among the latent variables,
depicted by lines or arrows (in a path diagram)
In the path diagram, we specified that Ability
and Achievement were related in a specific way.
That is, intelligence had some influence on later
achievement.
Thus, one result from the structural model is an
indication of the extent to which these a priori
hypothesized relationships are supported by our
sample data.

38
Structural Model (Cont.)

The structural equation addresses the following
questions
Are Ability and Achievement related?
Exactly how strong is the influence of Ability on
Achievement?
Could there be other latent variables that we
need to consider to get a better understanding of
the influence on Achievement?

40
Mathematical Form of Structural Model
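
The equations on this slide did not survive in the transcript. As a sketch only, assuming the usual LISREL-style notation (the symbols below are assumptions, not taken from the slide), and assuming that Ability ($\xi_1$) influences both Aspiration ($\eta_1$) and Achievement ($\eta_2$) and that Aspiration influences Achievement, the structural model could be written as:

```latex
\begin{aligned}
\eta_1 &= \gamma_{11}\,\xi_1 + \zeta_1 \\
\eta_2 &= \beta_{21}\,\eta_1 + \gamma_{21}\,\xi_1 + \zeta_2
\end{aligned}
```

Here $\gamma_{11}$, $\gamma_{21}$ and $\beta_{21}$ would be the path coefficients, and $\zeta_1$, $\zeta_2$ the disturbance (residual) terms of the two endogenous constructs.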
41
Measurement Model

Specifying the relationship between the latent
variables and the observed variables
Answers the questions
To what extent are the observed variables
actually measuring the hypothesized latent
variables?
Which observed variable is the best measure of a
particular latent variable?
To what extent are the observed variables
actually measuring something other than the
hypothesized latent variable?
Using Exploratory Factor Analysis (EFA) or
Confirmatory Factor Analysis (CFA) to determine
the significant observed variables related to
each of the latent variables

42
Exploratory FA (EFA)

In EFA the factor structure or theory about a
phenomenon is NOT KNOWN.
For example, the researcher is interested in
measuring the achievement of personnel.
Suppose he has no knowledge (or very little
theory) regarding
the factors that contribute to achievement
the number of indicators of each factor
which indicators represent which factor
In such a case, the researcher may collect data
and explore for a factor or theory which can
explain the correlations among the indicators.

43
Confirmatory FA (CFA)

In CFA the precise factor structure or theory
about a phenomenon is KNOWN or specified a priori.
For example, a researcher is interested in
measuring consumer preference for a product.
Suppose that based on previous research it is
hypothesized (the theory) that a construct or
factor to measure consumer preference is
a one-dimensional construct with 7 indicators or
items as its measures
The obvious question is
How well do the empirical data conform to the
theory of consumer preferences? Or
How well do the data fit the model?
In such a case, CFA is used to do empirical
confirmation or testing of the theory

44
Using Factor Analysis
[Diagram: factor loadings of Academic Skill (x1),
Interpersonal Skill (x2) and Communication Skill
(x3) on Ability]
45
Using Factor Analysis
[Diagram: factor loadings of Family Status (y1),
Father's Occupation (y2) and Peers' Influence
(y3) on Aspiration]
46
Using Factor Analysis
[Diagram: factor loadings of Personal
Actualisation (y4), Professional Status (y5) and
Social Status (y6) on Achievement]
47
Measurement Model (Cont.)

The relationships between the observed variables
and the latent variables are described by factor
loadings
Factor loadings provide information about the
extent to which a given observed variable is able
to measure the latent variable. They serve as
validity coefficients.
Measurement error is defined as that portion of
an observed variable that is measuring something
other than what the latent variable is
hypothesized to measure. It serves as a measure
of reliability.

48
Measurement Model (Cont.)

Measurement error could be the result of
An observed variable that is measuring some
other latent variable
Unreliability
A second-order factor

49
Mathematical Form of Measurement Model
How the latent (unobservable) exogenous variable
is related to its indicators, the
manifest / observed variables x1, x2, x3
50
Measurement Model (cont.)
How the TWO latent (unobservable) endogenous
constructs are related to their indicators, the
manifest variables y1, ..., y6
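
The measurement equations themselves are missing from the transcript. A hedged sketch in conventional LISREL-style notation follows; the symbols $\lambda$, $\delta$ and $\varepsilon$ are assumptions, and the assignment of indicators to constructs follows the factor-analysis slides (x1 to x3 on Ability, y1 to y3 on Aspiration, y4 to y6 on Achievement).

```latex
\begin{aligned}
x_i &= \lambda^{x}_{i}\,\xi_1 + \delta_i, & i &= 1, 2, 3 && \text{(Ability)} \\
y_j &= \lambda^{y}_{j}\,\eta_1 + \varepsilon_j, & j &= 1, 2, 3 && \text{(Aspiration)} \\
y_j &= \lambda^{y}_{j}\,\eta_2 + \varepsilon_j, & j &= 4, 5, 6 && \text{(Achievement)}
\end{aligned}
```

Each $\lambda$ is a factor loading, and each $\delta_i$ or $\varepsilon_j$ is the measurement error of the corresponding indicator.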
53
Reliability

Definition: the extent to which a variable or set
of variables is consistent in what it is intended
to measure.
If multiple measurements are taken, reliable
measures will all be consistent in their values.
It is the degree to which the observed variable
measures the true value and is error free.
It is different from validity

54
True Score and Measurement Error

True score: a component which indicates where the
subject actually stands on the variable
(statement) of interest.
Measurement error: a component which indicates
the inaccuracies when measuring true scores, due
to fallibility of the survey instrument, response
scales, data entry or respondent error.

55
Reliability

The degree to which scores are free from random
measurement error
Reliability measures
Internal Consistency Reliability
Test-retest Reliability
Alternate Forms Reliability

56
Reliability

Levels of Reliability
0.90 Excellent
0.80 Very Good
0.70 Adequate
< 0.70 Poor

57
Example Reliability of Observed Variables

Cronbach's alpha was computed for all the
variables.

Variable     No. of items   Reliability
Variable1    10             .91
Variable2    10             .87
Variable3    10             .58
Variable4    10             .70
Variable5    12             .72
Variable6    12             .80
Variable7    12             .80
Variable8    12             .87
Variable9    10             .84
Variable10    7             .71
Variable11    4             .48

58
Summated Scale Reliability

When reliability involves multiple scaled items,
reliability must be measured in a summated scale.
A summated rating scale is a short list of
statements, questions or other items that the
subject responds to.
A summated scale is a sum of responses from a
list of statements to create an overall score.

59
Reliability coefficient (1)

There are several ways to measure reliability
which will be discussed later.
The measurement is normally called the
reliability coefficient.
This coefficient is the percent of variance in an
observed variable that is accounted for by the
true scores of the underlying construct.

60
Reliability Coefficient (2)

Imagine you have collected 2 scores from a
survey: the true and observed scores of customer
satisfaction.
You compute the correlation between the scores.
The square of the correlation coefficient will be
your reliability coefficient, which is
the total variance explained in the observed
scores by the true scores, or
the percent of variance in observed scores that
is accounted for by true scores.
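
A small numerical sketch of this definition follows. The scores are made up for illustration, and the errors were chosen to be uncorrelated with the true scores, which is the classical assumption under which the squared correlation equals the ratio of true-score variance to observed-score variance. In practice true scores are never observed, so this only illustrates the concept.

```python
import math

def pearson_r(a, b):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cross / math.sqrt(ss_a * ss_b)

def sum_of_squares(a):
    """Sum of squared deviations from the mean."""
    m = sum(a) / len(a)
    return sum((x - m) ** 2 for x in a)

# Hypothetical true scores and measurement errors (errors average to zero
# and are uncorrelated with the true scores by construction)
true_scores = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
errors = [0.5, -1.0, 0.5, 0.5, -1.0, 0.5]
observed = [t + e for t, e in zip(true_scores, errors)]

reliability = pearson_r(true_scores, observed) ** 2
variance_ratio = sum_of_squares(true_scores) / sum_of_squares(observed)
print(round(reliability, 4), round(variance_ratio, 4))
```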

61
Types of Reliability

Test-retest
Assessed by administering the same instrument to
the same sample of respondents at two points in
time, and computing the correlation between the
two sets of scores.
Internal consistency reliability
The extent to which individual items that
constitute a test correlate with one another or
with the test total. In short, it measures how
consistently respondents respond to the items
within a scale.

62
Types of Reliability (2)

For example, if the first half of an instrument
consists of educational items which correlate
highly among themselves, and the second half of
political items which correlate highly among
themselves, the instrument would have high
internal consistency anyway, even though these
are two distinct dimensions.
Note that measures of internal consistency are
often called measures of "internal consistency
reliability" or even "reliability", but this
merges the distinct concepts of internal
consistency and reliability, which do not
necessarily go together.
How do we solve this problem?
The most commonly used measure of internal
consistency reliability is Cronbach's Alpha.
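
For reference, Cronbach's alpha can be computed from item scores with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the summated score). The sketch below uses made-up responses from five respondents to three items; the function name and data are illustrative assumptions, not part of the course material.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a summated scale.

    items: one list of scores per item, with respondents in the same
    order in every list.
    """
    k = len(items)
    # Summated score of each respondent across all items
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical responses: 3 items answered by 5 respondents
responses = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 4],
    [4, 2, 5, 3, 3],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))

# Items that move together perfectly give the maximum value of alpha
perfect = cronbach_alpha([[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [1, 2, 3, 4, 5]])
print(round(perfect, 3))
```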

63
Validity

Definition: the extent to which an item or set of
items correctly represents the construct of
study, i.e. the degree to which it is free from
any systematic or non-random error.
Validity deals with
how well the construct is defined by the item/s
(WHAT should be measured),
while Reliability deals with
how consistently the item/s is/are measuring the
construct (HOW it is measured).

64
Validity

Whether the scores measure what they are supposed
to measure
Types of validity
Construct Validity (SEM Confirmatory Factor
Analysis helps to establish construct validity)
Criterion-Related Validity (Correlation with an
external standard)
Convergent Validity/ Discriminant Validity (Can
be determined through SEM Confirmatory Factor
Analysis)

65
Examples

Example 1: How happy are you?
This example is about validity: whether the
measure accurately represents what it is supposed
to measure.
Example 2: How happy are you when you are
smoking? Ask this question repeatedly of the same
subject, or of multiple subjects, and see how
consistent the answers are.
This example is about reliability (sometimes I'd
like to call it consistency).

66
I Am an Indicator
Lyrics by Alan Reifman
(May be sung to the tune of "The Entertainer," Billy Joel)

I am an indicator, a latent construct I represent,
I'm measurable, sometimes pleasurable,
A manifestation of what is meant,
I am an indicator, I usually come in a multiple set,
With other signs of the same construct, you may instruct,
I'm correlated with my co-indicators, you can bet,
I am an indicator, from my presence the construct is inferred,
I'm tap-able, the construct is not palpable,
The distinction should not be blurred

67
At Least Three
Lyrics by Alan Reifman
(May be sung to the tune of "Think of Me," Lloyd
Webber/Hart/Stilgoe, from Phantom of the Opera)

At least three, indicators are urged,
For each latent construct shown,
At least three, indicators should help,
Avoid output where you groan,
With less than three, your construct sure will be, locally unidentified,
Though the model might still run, you could have a rough ride

68
Total, Direct and Indirect Effects

There is a direct effect between two latent
variables when a single directed line or arrow
connects them
There is an indirect effect between two variables
when the second latent variable is connected to
the first latent variable through one or more
other latent variables
The total effect between two latent variables is
the sum of any direct effect and all indirect
effects that connect them.
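
A minimal sketch of this bookkeeping, assuming the usual path-analysis rule that the effect along an indirect route is the product of the coefficients on that route. The coefficient values are hypothetical and the path layout (Ability to Achievement directly, and Ability to Aspiration to Achievement indirectly) is assumed from the example model.

```python
import math

def total_effect(direct, indirect_routes):
    """Total effect = direct effect + sum of all indirect effects.

    direct: coefficient on the single arrow joining the two variables
            (0.0 if there is no such arrow).
    indirect_routes: one list of path coefficients per indirect route;
            the effect along a route is the product of its coefficients.
    """
    indirect = sum(math.prod(route) for route in indirect_routes)
    return direct + indirect

# Hypothetical standardized path coefficients
ability_to_achievement = 0.40      # direct path
ability_to_aspiration = 0.50       # first leg of the indirect route
aspiration_to_achievement = 0.30   # second leg of the indirect route

effect = total_effect(
    ability_to_achievement,
    [[ability_to_aspiration, aspiration_to_achievement]],
)
print(round(effect, 2))
```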

69
Example: Direct and Indirect Effects
[Path diagram linking three latent constructs:
Ability / Intelligence (exogenous latent construct),
Aspirations (endogenous latent construct),
Achievement (endogenous latent construct)]
70
Semantics

Types of measurement scale
Metric and Non-metric
Correlation coefficient
Correlation and Covariance Matrix
Standardized and Un-standardized Estimates

71
Types of Measurement Scale

There are 4 types of measurement scale in a scale
instrument:
Nominal
Ordinal
Interval
Ratio
Some other common scales, like Likert scales,
Semantic Differential scales, Dichotomous scales
etc., can be categorized into the 4 above.
This is important, as the assumptions of SEM rely
on what we know on this page.

72
Metric and Non-metric Scales

Metric scales carry quantitative data, where the
scale is a continuum:
Interval or Ratio scale data
Non-metric scales carry qualitative data:
attributes, characteristics or categorical
properties that identify or describe a subject or
object:
Possibly Nominal or Ordinal scale data
But metric and non-metric scales can sometimes be
misused or abused. How?

73
VARIABLE SCALES

SEM in general assumes observed variables are
measured on a linear continuous scale
Dichotomous and ordinal variables cause problems
because correlations /covariances tend to be
truncated. These scores are not normally
distributed and responses to individual items may
not be very reliable.

74
Correlation

Perhaps the most basic semantic
Definition: the linear relationship between two
variables
The strength of the relationship is determined by
the correlation coefficient and r² (explained
later)
There are 2 common types of correlation
coefficient:
Pearson Product Moment Correlation (Interval)
Spearman Rank Correlation (Ordinal)
The former is the one we will use in this course

75
Correlation Matrix (1)

The correlation matrix of n random variables
$X_1, \dots, X_n$ is the $n \times n$ matrix whose
$(i, j)$ entry is $\operatorname{corr}(X_i, X_j)$.
If the measures of correlation used are
product-moment coefficients, the correlation
matrix is the same as the covariance matrix of
the standardized random variables
$X_i / \mathrm{SD}(X_i)$ for $i = 1, \dots, n$.
Consequently it is necessarily a non-negative
definite matrix (an important assumption).
The correlation matrix is symmetric because the
correlation between $X_i$ and $X_j$ is the same as
the correlation between $X_j$ and $X_i$.

76
Correlation Matrix (2)
Correlation coefficients for items A1-A7 and B1-B3; the second line of each row gives the associated p-value.

    A1       A2       A3       A4       A5       A6       A7       B1       B2       B3
A1  1.0000   0.65579  0.46296  0.58812  0.62082  0.62629  0.64288  0.34385  0.57904  0.56353
             <.0001   <.0001   <.0001   <.0001   <.0001   <.0001   0.0004   <.0001   <.0001
A2  0.65579  1.00000  0.45951  0.66297  0.72727  0.77384  0.76693  0.40987  0.67796  0.59493
    <.0001            <.0001   <.0001   <.0001   <.0001   <.0001   <.0001   <.0001   <.0001
A3  0.46296  0.45951  1.00000  0.51913  0.46652  0.45752  0.44520  0.33407  0.35833  0.33623
    <.0001   <.0001            <.0001   <.0001   <.0001   <.0001   0.0006   0.0002   0.0006
A4  0.55812  0.66297  0.51913  1.00000  0.69905  0.64969  0.59358  0.34148  0.58859  0.44284
    <.0001   <.0001   <.0001            <.0001   <.0001   <.0001   0.0004   <.0001   <.0001
A5  0.62082  0.72727  0.46652  0.69905  1.00000  0.67281  0.66939  0.31277  0.63133  0.54744
    <.0001   <.0001   <.0001   <.0001            <.0001   <.0001   <.0014   <.0001   <.0001
A6  0.62629  0.77384  0.45752  0.64969  0.67281  1.00000  0.86014  0.40483  0.66758  0.56944
    <.0001   <.0001   <.0001   <.0001   <.0001            <.0001   <.0001   <.0001   <.0001
A7  0.64288  0.76693  0.44520  0.59358  0.66939  0.86014  1.00000  0.39913  0.68141  0.62075
    <.0001   <.0001   <.0001   <.0001   <.0001   <.0001            <.0001   <.0001   <.0001
B1  0.34385  0.40987  0.33407  0.34148  0.31277  0.40483  0.39913  1.00000  0.58187  0.62583
    <.0004   <.0001   <.0006   <.0004   <.0014   <.0001   <.0001            <.0001   <.0001
B2  0.57904  0.67796  0.35833  0.58859  0.63133  0.66758  0.68141  0.58187  1.00000  0.85335
    <.0001   <.0001   <.0002   <.0001   <.0001   <.0001   <.0001   <.0001            <.0001
B3  0.56353  0.59493  0.33623  0.44284  0.54744  0.56944  0.62075  0.62583  0.85335  1.00000
    <.0001   <.0001   <.0006   <.0001   <.0001   <.0001   <.0001   <.0001   <.000
77
Correlation Matrix (3)

So we say that
If the input matrix used is the covariance
matrix, the estimated parameters are
unstandardized estimates.
If the input matrix used is the correlation
matrix, the estimated parameters are
standardized estimates.
So what?

78
Covariance

The covariance between two variables equals the
correlation times the product of the variables'
standard deviations. The covariance of a
variable with itself is the variable's variance
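
These identities are easy to verify numerically. The sketch below uses two short made-up lists, computes the correlation as the covariance of the standardized scores, and then checks that covariance = correlation x SD x SD and that the covariance of a variable with itself is its variance. Sample (n - 1) formulas are assumed throughout.

```python
from statistics import mean, stdev

def covariance(a, b):
    """Sample covariance of two equal-length lists (n - 1 denominator)."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)

def standardize(a):
    """Convert scores to z-scores using the sample SD."""
    m, s = mean(a), stdev(a)
    return [(x - m) / s for x in a]

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

# Correlation = covariance of the standardized variables
r = covariance(standardize(x), standardize(y))

# Covariance = correlation times the product of the standard deviations
cov_xy = covariance(x, y)
print(cov_xy, r * stdev(x) * stdev(y))

# Covariance of a variable with itself = its variance
print(covariance(x, x), stdev(x) ** 2)
```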

79
Correlation Matrix (4)

Therefore, when we want to test a theory, we use
the variance-covariance matrix
(to validate the causal relationships among
constructs).
When we just want to explain the pattern of the
relationships, we use the correlation matrix
(theory testing is not required).

80
Factors Affecting Correlation / Covariance
Coefficient

Type of scale and range of values
Pearson correlation is the basis for analysis in
regression, path analysis, factor analysis and
SEM. Hence data must be in metric form.
There must be enough variation in scores to allow
a correlation relationship to manifest.
Linearity
The Pearson correlation coefficient measures the
degree of linear relationship between two
variables, hence the need to test linearity.
Sample size
SEM requires a big sample size.
Rule of thumb: 10-20 times the number of
variables.
Ding, Velicer and Harlow (1995): 100-150.
Boomsma (1982, 1983): 400.
Hu, Bentler and Kano (1992): in some cases 5000
is still insufficient.
Schumacker and Lomax (1999): many articles use
250-500.
Bentler and Chou (1987): for normal data, 5
subjects per variable is sufficient.

81
Covariance
Lyrics by Alan Reifman
(May be sung to the tune of "Aquarius,"
Rado/Ragni/MacDermot, from Hair, also popularized
by the Fifth Dimension)

You draw paths to show relationships,
You hope align with the known r's,
Your model will guide the tracings,
From constructs near to constructs far,
You will compare this with the data's covariance,
The data's covariance...
Covariance!
Covariance!
Similar to correlation,
With the variables unstandardized,
Does each known covariance match up with,
The one the model tracings will derive?
Covariance!
Covariance!

82
SEM Assumptions

Sample Size
A good rule of thumb is > 15 cases per predictor /
indicator (James Stevens, Applied Multivariate
Statistics for the Social Sciences)
Model with TWO factors:
recommended sample size > 100
Model with FOUR factors:
recommended sample size > 200

83
SEM Assumptions (cont.)

Sample Size
Consequences of using smaller samples
convergence failures (the software cannot reach a
satisfactory solution),
improper solutions (including negative error
variance estimates for measured variables),
lowered accuracy of parameter estimates and, in
particular, standard errors
SEM program standard errors are computed under
the assumption of large sample sizes.

84
SEM Assumptions (cont.)

Normality
Many SEM estimation procedures assume
multivariate normal distributions
Lack of univariate normality occurs when the skew
index is > 3.0 and the kurtosis index is > 10.
Departures from multivariate normality can be
detected by indices of multivariate skew or
kurtosis
Non-normal distributions can sometimes be
corrected by transforming variables

85
SEM Assumptions (cont.)

Multicollinearity
Occurs when intercorrelations among some
variables are so high that certain mathematical
operations are impossible or results are unstable
because denominators are close to 0.
Bivariate correlations > 0.85
Multiple correlations > 0.90
May cause a non-positive definite / singular
covariance matrix
May be due to inclusion of individual and
composite variables
Detection: Tolerance = $1 - R^2 < 0.10$, or
Variance Inflation Factor (VIF) = $1 / (1 - R^2) > 10$
Can be corrected by eliminating or combining
redundant variables
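
The two detection rules can be written as a short helper. This is a sketch using the cut-offs quoted above; `r_squared` is assumed to be the squared multiple correlation of one variable regressed on all the others, obtained elsewhere, and the function names are illustrative.

```python
def tolerance(r_squared):
    """Tolerance = 1 - R^2 of a variable regressed on the other variables."""
    return 1.0 - r_squared

def vif(r_squared):
    """Variance Inflation Factor = 1 / (1 - R^2), the reciprocal of tolerance."""
    return 1.0 / (1.0 - r_squared)

def flags_multicollinearity(r_squared, tol_cutoff=0.10, vif_cutoff=10.0):
    """True if either rule of thumb signals multicollinearity."""
    return tolerance(r_squared) < tol_cutoff or vif(r_squared) > vif_cutoff

# Hypothetical R^2 values for two variables
for r2 in (0.50, 0.95):
    print(r2, round(tolerance(r2), 2), round(vif(r2), 2),
          flags_multicollinearity(r2))
```

Because VIF is the reciprocal of tolerance, the two cut-offs (tolerance below 0.10, VIF above 10) describe the same boundary.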

86
SEM Assumptions (cont.)

Outliers
Univariate outliers: cases more than three SDs
away from the mean
Detection by inspecting frequency distributions
and univariate measures of skewness and kurtosis
Multivariate outliers may have extreme scores on
two or more variables, or their configurations of
scores may be unusual
Detection by inspecting indices of multivariate
skewness and kurtosis. Mahalanobis Distance
squared is distributed as chi square with df
equal to the number of variables.
Can be remedied by correcting errors, by
dropping these cases or by transforming the
variables
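
For two variables, the squared Mahalanobis distance can be computed by hand, as in the sketch below. The data are hypothetical: seven cases follow a roughly linear pattern and the last case breaks it without being extreme on either variable alone. Comparing each distance against a chi-square critical value with df = 2 is left to a table or a statistics package.

```python
from statistics import mean

def mahalanobis_squared(xs, ys):
    """Squared Mahalanobis distance of every case from the centroid,
    for two variables, using the sample covariance matrix."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2   # determinant of the 2x2 covariance matrix
    distances = []
    for x, y in zip(xs, ys):
        dx, dy = x - mean_x, y - mean_y
        # d' S^-1 d written out for the 2x2 case
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

# Hypothetical scores on two variables; the last case has an unusual
# combination of scores
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9, 7.0, 2.0]

d2 = mahalanobis_squared(xs, ys)
most_unusual = d2.index(max(d2))
print([round(d, 2) for d in d2], most_unusual)
```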

87
VIOLATIONS OF ASSUMPTIONS(1)

The best known distribution with no kurtosis is
the multi-normal.
Leptokurtic (more peaked) distributions result in
too many rejections of H0 based on the Chi square
statistic.
Platykurtic distributions will lead to too low
estimates of Chi Square.

88
VIOLATIONS OF ASSUMPTIONS (2)

High degrees of skewness lead to excessively
large Chi square estimates.
In small samples (N < 100), the Chi square
statistic tends to be too large.

89
SEM, Oh, SEM

Lyrics by Alan Reifman, dedicated to Peter
Westfall (article of his)
(May be sung to the tune of "Galveston," Jimmy
Webb, popularized by Glen Campbell)

Ultimately, SEM,
Your LVs cannot be measured,
Which gives the critics some displeasure,
There's nothing physical to grab on,
When you run SEM,
SEM, Oh, SEM,
You make many an assumption,
Is it recklessness or gumption?
Assume the e's uncorrelated...
When you run SEM,
I can see the critics' point of view, now,
They're saying the models aren't unique,
That, we must willingly acknowledge,
In response to the critique, if we want to keep on using...
SEM, Oh, SEM...

90
Model Identification (Identified Equations)

Identification refers to the idea that there is
at least one unique solution for each parameter
estimate in a SEM model.
Models in which there is only one possible
solution for each parameter estimate are said to
be just-identified.
Models for which there are an infinite number of
possible parameter estimate values are said to be
underidentified.
Finally, models that have more than one possible
solution (but one best or optimal solution) for
each parameter estimate are considered
overidentified.

91
Model Identification (Identified Equations)

Underidentification
empirical underidentification or
structural underidentification
Empirical underidentification occurs when a
parameter estimate that establishes model
identification has a very small (close to zero)
estimate.
A path coefficient whose value is estimated as
being close to zero may be treated as zero by the
SEM program's matrix inversion algorithm. If that
path coefficient is necessary to identify the
model, the model thus becomes underidentified.
Remedy for Empirical underidentification -
collect more data or respecify the model
Remedy for Structural underidentification -
respecify the model

92
Examples of Identified Model

Case 1: Let us say we have an equation
x + 2y = 7
Question: Is this equation / model identified?
Answer: No, it is underidentified because
there are an infinite number of solutions for x
and y (e.g., x = 5 and y = 1, or x = 3
and y = 2). These values are therefore
underidentified because there are fewer "knowns"
than "unknowns."
Case 2: Let us say we have a set of equations
x + 2y = 7
3x - y = 7
Question: Is this set of equations / model
identified?
Answer: Yes, it is a just-identified model, as
there are as many knowns as unknowns. There is
one best pair of values (x = 3, y = 2).
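
The two cases can be reproduced in a few lines. The sketch below solves Case 2 with Cramer's rule and classifies a model by counting knowns against unknowns. The helper names are hypothetical, and the count of p(p + 1)/2 distinct variances and covariances for p observed variables is the usual SEM counting convention rather than something stated on the slide. Counting is a necessary check only; it does not by itself guarantee identification.

```python
def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule.
    Returns None when the equations do not determine a unique solution."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y

def classify(n_knowns, n_unknowns):
    """Counting rule: compare the number of knowns with the unknowns."""
    if n_knowns < n_unknowns:
        return "underidentified"
    if n_knowns == n_unknowns:
        return "just-identified"
    return "overidentified"

def n_knowns_for(p):
    """Distinct variances and covariances among p observed variables."""
    return p * (p + 1) // 2

# Case 1: one equation, two unknowns
print(classify(1, 2))

# Case 2: x + 2y = 7 and 3x - y = 7
print(classify(2, 2), solve_2x2(1, 2, 7, 3, -1, 7))

# Nine indicators, as in the example model, give this many knowns
print(n_knowns_for(9))
```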