Statistical Modelling (Special Topic: SEM) - PowerPoint PPT Presentation

Statistical Modelling (Special Topic: SEM) Bidin Yatim, PhD Associate Professor in Statistics College of Art and Science UUM. Phd Applied Statistics (Exeter, UK) – PowerPoint PPT presentation


1
Statistical Modelling (Special Topic: SEM)
  • Bidin Yatim, PhD, Associate Professor in
    Statistics
  • College of Art and Science, UUM. PhD Applied
    Statistics (Exeter, UK), MSc Industrial Maths
    (Aston, UK), BSc Maths Stats (Nottingham, UK)

2
Main Focus
  • Relationship Analysis
  • Awareness of the fact that some relationships /
    models are meaningful and some are not.
  • Meaningful relationships / models normally have a
    theoretical basis (underlying theory) and exhibit
    causality, or cause and effect.
  • For those cause-and-effect relationships, SEM
    provides a formal way of analysing them

3
Agenda
  • Part I: SEM, the Basics
    • SEM Nomenclature / Terminologies
    • SEM-related Models
  • Part II: Modeling and Computing
    • How to draw a model using AMOS.
    • How to run the AMOS model and evaluate several
      key components of the AMOS graphics and text
      output, including overall model fit and test
      statistics for individual path coefficients.
    • How to modify and respecify a non-fitting model.
  • Part III: SEM and Its Applications

4
Part One
  • SEM: The Basics
  • http://58.26.137.12/byatim/

5
An Overview: SEM References
  • www.statsoft.com/textbook/stsepath.html
  • Chapter 1 of Structural Equation Modeling with
    AMOS: Basic Concepts, Applications, and
    Programming, by Barbara M. Byrne

6
Welcome to SEM: The Musical
Lyrics by Alan Reifman
(May be sung to the tune of "Matchmaker,"
Bock/Harnick, from Fiddler on the Roof)
  • SEM, SEM, it can be sung,
    You'll be amazed, at what we've sprung,
    We hope you'll learn more 'bout this stats technique,
    Through songs of which you're among,
    SEM, SEM, we like to run,
    It takes awhile, but we get it done,
    We hope you'll learn of the steps that we take,
    And take home from this, some fun

7
SEM
  • Is a statistical methodology for the analysis of a
    structural theory that bears on some phenomenon,
    using a confirmatory (hypothesis-testing)
    approach. Most other multivariate procedures are
    descriptive or exploratory in nature.
  • The theory represents causal processes that
    generate observations on multiple variables.

8
SEM
  • The term conveys two important aspects of the
    procedure:
  • The causal processes under study are represented
    by a series of structural equations, and
  • These structural equations can be modeled
    pictorially to enable a clearer conceptualization
    of the theory under study.
  • The model can be tested simultaneously to
    determine the extent to which it is consistent
    with the data. If the goodness of fit is adequate,
    the model is not rejected; otherwise the
    hypothesized relations are rejected.

9
SEM A Note
  • SEM is a very general, very powerful and very
    popular multivariate analysis technique.
  • It provides a comprehensive method for the
    quantification and testing of theories.
  • It has been applied in econometrics, psychology,
    sociology, political science, education, market
    and medical research, etc.
  • Also known as:
    • covariance structure analysis,
    • covariance structure modeling,
    • latent variable modeling,
    • confirmatory factor analysis,
    • linear structural relationships, and
    • analysis of covariance structures.

10
SEM is
  • a family of statistical techniques which
    incorporates and integrates
  • Path analysis
  • Linear regression
  • Factor analysis

11
SEM
  • serves purposes similar to multiple regression,
    but in a more powerful way that takes into
    account the modeling of interactions,
    nonlinearities, correlated independents,
    measurement error, correlated error terms,
    multiple latent independents each measured by
    multiple indicators, and one or more latent
    dependents also each with multiple indicators.
  • may be used as a more powerful alternative to
    multiple regression, path analysis, factor
    analysis, time series analysis, and analysis of
    covariance. These procedures can be viewed as
    special cases of SEM.
  • put another way, is an extension of the general
    linear model (GLM), of which multiple regression
    is a part.

12
Advantages of SEM compared to multiple regression
  • more flexible assumptions (particularly allowing
    interpretation even in the face of
    multicollinearity),
  • use of confirmatory factor analysis to reduce
    measurement error by having multiple indicators
    per latent variable,
  • the attraction of SEM's graphical modeling
    interface,
  • the desirability of testing models overall rather
    than coefficients individually,
  • the ability to:
    • test models with multiple dependents,
    • model mediating variables,
    • model error terms,
    • test coefficients across multiple
      between-subjects groups, and
    • handle difficult data (time series with
      autocorrelated error, non-normal data, incomplete
      data).

13
Major applications of structural equation modeling
  1. causal modeling, or path analysis - hypothesizes
    causal relationships among variables and tests
    the causal models with a linear equation system.
    Causal models can involve either manifest
    variables, latent variables, or both
  2. confirmatory factor analysis - extension of
    factor analysis in which specific hypotheses
    about the structure of the factor loadings and
    intercorrelations are tested
  3. regression models, in which regression weights
    may be constrained to be equal to each other, or
    to specified numerical values
  4. covariance structure models, which hypothesize
    that a covariance matrix has a particular form.
    For example, you can test the hypothesis that a
    set of variables all have equal variances with
    this procedure
  5. correlation structure models, which hypothesize
    that a correlation matrix has a particular form.

14
Aims and Objectives
  • By the end of this course you should
  • Have a working knowledge of the principles behind
    causality.
  • Understand the basic steps to building a model of
    the phenomenon of interest.
  • Be able to construct/ interpret path diagrams.
  • Understand the basic principles of how models are
    tested using SEM.
  • Be able to test a model's adequacy using SEM.
  • Be able to use AMOS intelligently.

15
SEM Another Note
  • Assumption 1: you are familiar with the basic
    logic of statistical reasoning as described in
    Elementary Concepts.
  • Assumption 2: you are familiar with the concepts
    of variance, covariance, correlation and
    regression analysis; if not, you are advised to
    read the Basic Statistics.
  • It is highly desirable that you have some
    background in factor analysis before attempting
    to use structural modeling.

16
Introduction to SEM
  • How Useful Is a Statistical Model?
  • The Basic Idea Behind SEM
  • Causality (Cause-and-Effect Relationship)
  • SEM Nomenclature/Terminologies
  • SEM related Statistical Models

17
How Useful Is a Statistical Model?
  • "All models are wrong, but some are useful"
    (G.E.P. Box)
  • SEM models can never be accepted (as absolute
    truth); they can only fail to be rejected.
  • This leads researchers to provisionally accept a
    given model.
  • While models that fit the data well can only be
    provisionally accepted, models that do not fit
    the data well can be absolutely rejected.

18
The Basic Idea Behind SEM
  • In a Distribution Theory course you are taught
    that if you multiply every number in a list by
    some constant K, you multiply the mean of the
    numbers by K. Similarly, you multiply the
    standard deviation by the absolute value of K.
  • Suppose you have the list of numbers 1, 2, 3,
    which has a mean of 2 and a standard deviation of
    1. Suppose also you take these 3 numbers and
    multiply them by 4. Then the mean would become 8,
    the standard deviation would become 4, and the
    variance thus 16.
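
The arithmetic on this slide can be checked in a few lines. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative only, and the standard deviation is the sample one (n - 1 denominator), which is what gives 1 for the list 1, 2, 3.

```python
import statistics

numbers = [1, 2, 3]
k = 4
scaled = [k * x for x in numbers]              # [4, 8, 12]

mean_before = statistics.mean(numbers)         # mean of the original list
sd_before = statistics.stdev(numbers)          # sample standard deviation
mean_after = statistics.mean(scaled)           # mean is multiplied by k
sd_after = statistics.stdev(scaled)            # SD is multiplied by |k|
var_after = statistics.variance(scaled)        # variance is multiplied by k**2

print(mean_before, sd_before, mean_after, sd_after, var_after)
```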

19
The Basic Idea Behind SEM
  • The point is, if you have a set of numbers X
    related to another set of numbers Y by the
    equation Y = 4X, then the variance of Y must be
    16 times that of X, so you can test the
    hypothesis that Y and X are related by the
    equation Y = 4X indirectly by comparing the
    variances of the Y and X variables. This idea
    generalizes, in various ways, to several
    variables inter-related by a group of linear
    equations. The rules become more complex, the
    calculations more difficult, but the basic
    message remains the same -- you can test whether
    variables are interrelated through a set of
    linear relationships by examining the variances
    and covariances of the variables.
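
As a rough illustration of testing a relationship through variances and covariances, the sketch below uses NumPy with made-up numbers (the arrays `x` and `y_other` are hypothetical, not data from the course). Note that a variance ratio of 16 is a necessary consequence of Y = 4X, not proof of it.

```python
import numpy as np

# Hypothetical X values, chosen only for illustration.
x = np.array([2.0, 5.0, 1.0, 7.0, 4.0, 9.0])
y = 4 * x                                  # Y follows Y = 4X exactly

cov_matrix = np.cov(x, y)                  # 2 x 2 sample covariance matrix of (X, Y)
var_x = cov_matrix[0, 0]
cov_xy = cov_matrix[0, 1]
var_y = cov_matrix[1, 1]

# Implications of Y = 4X: Var(Y) = 16 Var(X) and Cov(X, Y) = 4 Var(X).
variance_ratio = var_y / var_x
covariance_ratio = cov_xy / var_x

# A second, hypothetical Y that does not follow Y = 4X.
y_other = np.array([3.0, 1.0, 8.0, 2.0, 9.0, 4.0])
other_ratio = np.var(y_other, ddof=1) / var_x

print(variance_ratio, covariance_ratio, other_ratio)
```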

20
The Basic Idea Behind SEM
  • Statisticians have developed procedures for
    testing whether a set of variances and
    covariances in a covariance matrix fits a
    specified structure. The way SEM works is as
    follows:
  • You state the way that you (the theory) believe
    the variables are inter-related, often with the
    use of a path diagram.
  • You (AMOS) work out, via some complex internal
    rules, what the implications of this are for the
    variances and covariances of the variables.
  • You test whether the variances and covariances
    fit this model of them.
  • Results of the statistical testing, and also
    parameter estimates and standard errors for the
    numerical coefficients in the linear equations
    are reported.
  • On the basis of this information, you decide
    whether the model seems like a good fit to your
    data.
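
The steps above can be sketched numerically. This is only an illustration under stated assumptions: the model is a single equation Y = bX + e, the sample covariance matrix `S` and sample size `n` are invented, and the parameter values are fixed by hand rather than estimated (a program such as AMOS would search for the values that minimise the discrepancy). The fit function used is the standard maximum-likelihood discrepancy F = ln|Σ| + tr(SΣ⁻¹) - ln|S| - p.

```python
import numpy as np

def implied_cov(b, var_x, var_e):
    """Covariance matrix of (X, Y) implied by Y = b*X + e, with e independent of X."""
    return np.array([[var_x, b * var_x],
                     [b * var_x, b ** 2 * var_x + var_e]])

def ml_discrepancy(S, Sigma):
    """Maximum-likelihood fit function: ln|Sigma| + tr(S Sigma^-1) - ln|S| - p."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - p

# Hypothetical sample covariance matrix and sample size.
S = np.array([[1.0, 4.0],
              [4.0, 17.0]])
n = 200

# Model values that reproduce S exactly give a discrepancy of zero.
f_good = ml_discrepancy(S, implied_cov(b=4.0, var_x=1.0, var_e=1.0))

# A poorly specified model (b = 1) implies a different matrix, so F > 0.
f_bad = ml_discrepancy(S, implied_cov(b=1.0, var_x=1.0, var_e=1.0))

# (n - 1) * F is the usual chi-square test statistic for model fit.
chi_square_bad = (n - 1) * f_bad

print(f_good, f_bad, chi_square_bad)
```

A small discrepancy means the variances and covariances implied by the model are close to the observed ones; a large chi-square relative to its degrees of freedom leads to rejection of the model.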

21
A Simple SEM
  • SEM is an attempt to model causal relations
    between variables by including all variables that
    are known to have some involvement in the process
    of interest
  • Example: testing the effect of a drug on some
    psychological disorder (e.g. obsessive compulsive
    disorder, OCD)

22
Causality
Causality has a theoretical basis
[Diagram of example causal relationships among: Education and Success in Life; Price, Demand and Supply; Unemployment Rate, Windows of Opportunity for Crime and No. of Crimes]
23
Cause and Effect
  • Philosophers have had a great deal to say about
    the conditions necessary to infer causality:
  • cause and effect should occur close together in
    time,
  • the cause should occur before an effect is
    observed, and
  • the cause should never occur without the presence
    of the effect.

24
John Stuart Mill (1865) described three
conditions necessary to infer cause
  • Cause has to precede effect
  • Cause and effect must be related
  • All other explanations of the cause-effect
    relationship must be ruled out.

25
To verify the third criterion, Mill proposed the
  • method of agreement which states that an effect
    is present when the cause is present
  • method of difference which states that when the
    cause is absent the effect will be absent also
    and
  • method of concomitant variation which states that
    when the above relationships are observed, causal
    inference will be made stronger because most
    other interpretations of the cause-effect
    relationship will have been ruled out.

26
Example
  • If we wanted to say that me talking about
    causality causes boredom, we would have to
    satisfy the following conditions
  • (1) I talk about causality before boredom
    occurs.
  • (2) Whenever I talk about causality, boredom
    occurs shortly afterwards.
  • (3) The correlation between boredom and my
    talking about causality must be strong (e.g. 4
    out of 4 occasions when I talk about causality
    boredom is observed)
  • (4) When cause is absent effect is absent:
    when I don't talk about causality no boredom is
    observed.
  • (5) The manipulation of cause leads to an
    associated change in effect. So, if we
    manipulated whether someone is listening to me
    talking about causality or to my cat mewing,
    the effect elicited should change according to
    the manipulation.
  • This final manipulation serves to rule out
    external variables that might affect the
    cause-effect relationship.

27
Continue
  • In situations in which cause cannot be
    manipulated we cannot make causal attributions
    about our variables. Statistically speaking, this
    means that when we analyze data from
    non-experimental situations we cannot conclude
    anything about cause and effect.
  • Structural Equation Modeling (SEM) is an attempt
    to provide a flexible framework within which
    causal models can be built.

28
Statistical Modeling
A Statistical Model DOES NOT necessarily have a
theoretical basis. It may make sense or be
nonsense.
[Diagram of example relationships among: Weight, Heart Disease, Income, Smoking, No. of Road Accidents, No. of Newspaper Readers]
29
SEM Related Statistical Models
  • General Linear Model (GLM)
  • Regression Model
  • Time Series Model
  • Log-linear Model
  • Mixed Models
  • Survival Models
  • Many more

All these Statistical Models may or may not have
a theoretical basis
30
[Path diagram: two exogenous latent variables (constructs) and one endogenous latent variable, each with its own indicators]
31
SEM Nomenclature
  • Independent variables, which are assumed to be
    measured without error, are called exogenous or
    upstream variables
  • Dependent or mediating variables are called
    endogenous or downstream variables. 
  • Manifest or observed variables or indicators are
    directly measured by researchers
  • Latent or unobserved variables are not directly
    measured but are inferred from the relationships
    or correlations among measured variables in the
    analysis. Examples: self-concept, motivation,
    powerlessness, anomie, verbal ability,
    capitalism, social class.

32
SEM Nomenclature (cont.)
  • SEM illustrates relationships among observed and
    unobserved variables using path diagrams.
  • Ovals or circles represent latent variables,
  • Rectangles or squares represent measured
    variables.
  • Residuals are always unobserved, so they are
    represented by ovals or circles. 

33
SEM Definition
  • SEM is an extension of the general linear model
    (GLM) that enables a researcher to test a set of
    regression equations simultaneously.
  • SEM consists of TWO components:
  • Structural Model
    • illustrates the relationships among the latent
      constructs or endogenous variables
  • Measurement Model
    • represents how the constructs are related to
      their indicators or manifest variables

34
Example
In psychology, the theory postulates that:
[Path diagram: Ability / Intelligence (exogenous latent construct), Aspirations (endogenous latent construct), Achievement (endogenous latent construct)]
35
Full Latent Variable Model
[Path diagram: Ability measured by Academic Skill (x1), Interpersonal Skill (x2) and Communication Skill (x3); Aspiration measured by Family Status (y1), Father's Occupation (y2) and Peers' Influence (y3); Achievement measured by Personal Actualization (y4), Professional Status (y5) and Social Status (y6)]
36
Example: ONE Latent (unobserved) Exogenous
Variable, TWO Latent (unobserved) Endogenous
Variables
[Path diagram with the Structural Model and Measurement Model parts labelled]
37
Structural Model
  • The structural model allows for certain
    relationships among the latent variables,
    depicted by lines or arrows (in a path diagram)
  • In the path diagram, we specified that Ability
    and Achievement were related in a specific way.
    That is, intelligence had some influence on later
    achievement.
  • Thus, one result from the structural model is an
    indication of the extent to which these a priori
    hypothesized relationships are supported by our
    sample data.

38
Structural Model (Cont.)
  • The structural equation addresses the following
    questions
  • Are Ability and Achievement related?
  • Exactly how strong is the influence of Ability on
    Achievement?
  • Could there be other latent variables that we
    need to consider to get a better understanding of
    the influence on Achievement?

39
Example: ONE Latent (unobserved) Exogenous
Variable, TWO Latent (unobserved) Endogenous
Variables
[Path diagram with the Structural Model and Measurement Model parts labelled]
40
Mathematical Form of Structural Model
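
The equations on this slide did not survive in the transcript. As a hedged sketch only, a structural model for this example can be written in common LISREL-style notation as below. The symbols (ξ1 for Ability, η1 for Aspiration, η2 for Achievement, γ and β for path coefficients, ζ for disturbances) and the assumed arrows (Ability → Aspiration, Ability → Achievement, Aspiration → Achievement) are assumptions, not taken from the slide.

```latex
\begin{aligned}
\eta_1 &= \gamma_{11}\,\xi_1 + \zeta_1 \\
\eta_2 &= \beta_{21}\,\eta_1 + \gamma_{21}\,\xi_1 + \zeta_2
\end{aligned}
```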
41
Measurement Model
  • Specifying the relationship between the latent
    variables and the observed variables
  • Answers the questions
  • To what extent are the observed variables
    actually measuring the hypothesized latent
    variables?
  • Which observed variable is the best measure of a
    particular latent variable?
  • To what extent are the observed variables
    actually measuring something other than the
    hypothesized latent variable?
  • Using Exploratory Factor Analysis (EFA) or
    Confirmatory Factor Analysis (CFA) to determine
    the significant observed variables related to
    each of the latent variables

42
Exploratory FA (EFA)
  • In EFA the factor structure or theory about a
    phenomenon is NOT KNOWN.
  • For example, the researcher is interested in
    measuring the achievement of personnel.
  • Suppose he has no knowledge (or very little
    theory) regarding:
    • the factors that contribute to achievement
    • the no. of indicators of each factor
    • which indicators represent which factor
  • In such a case, the researcher may collect data
    and explore for a factor or theory which can
    explain the correlations among the indicators.

43
Confirmatory FA (CFA)
  • In CFA the precise factor structure or theory
    about a phenomenon is KNOWN or specified a priori.
  • For example, a researcher is interested in
    measuring consumer preference for a product.
  • Suppose that based on previous research it is
    hypothesized (the theory) that the construct or
    factor measuring consumer preference is
  • a one-dimensional construct with 7 indicators or
    items as its measures
  • The obvious question is:
  • How well do the empirical data conform to the
    theory of consumer preferences? Or
  • How well do the data fit the model?
  • In such a case, CFA is used for empirical
    confirmation or testing of the theory

44
Using Factor Analysis
[Diagram: factor loadings of Academic Skill (x1), Interpersonal Skill (x2) and Communication Skill (x3) on Ability]
45
Using Factor Analysis
[Diagram: factor loadings of Family Status (y1), Father's Occupation (y2) and Peers' Influence (y3) on Aspiration]
46
Using Factor Analysis
[Diagram: factor loadings of Personal Actualisation (y4), Professional Status (y5) and Social Status (y6) on Achievement]
47
Measurement Model (Cont.)
  • The relationships between the observed variables
    and the latent variables are described by factor
    loadings
  • Factor loadings provide information about the
    extent to which a given observed variable is able
    to measure the latent variable. They serve as
    validity coefficients.
  • Measurement error is defined as that portion of
    an observed variable that is measuring something
    other than what the latent variable is
    hypothesized to measure. It serves as a measure
    of reliability.

48
Measurement Model (Cont.)
  • Measurement error could be the result of
  • An observed variable that is measuring some
    other latent variable
  • Unreliability
  • A second-order factor

49
Mathematical Form of Measurement Model
How the latent (unobservable) exogenous variable
is related to its indicators or
manifest/observed variables x1, x2, x3
50
Measurement Model (cont.)
How the TWO latent (unobservable) endogenous
constructs are related
to their indicators or manifest variables y1, ..., y6
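
The measurement equations themselves are missing from the transcript. A hedged sketch in common LISREL-style notation follows; the symbols (λ for factor loadings, δ and ε for measurement errors, ξ1 for Ability, η1 for Aspiration, η2 for Achievement) are assumptions, while the assignment of indicators to constructs follows the factor-loading diagrams earlier in the deck.

```latex
\begin{aligned}
x_i &= \lambda^{x}_{i1}\,\xi_1 + \delta_i, & i &= 1, 2, 3 \\
y_j &= \lambda^{y}_{j1}\,\eta_1 + \varepsilon_j, & j &= 1, 2, 3 \\
y_j &= \lambda^{y}_{j2}\,\eta_2 + \varepsilon_j, & j &= 4, 5, 6
\end{aligned}
```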
51
Full Latent Variable Model
[Path diagram: Ability measured by Academic Skill (x1), Interpersonal Skill (x2) and Communication Skill (x3); Aspiration measured by Family Status (y1), Father's Occupation (y2) and Peers' Influence (y3); Achievement measured by Personal Actualization (y4), Professional Status (y5) and Social Status (y6)]
52
Example: ONE Latent (unobserved) Exogenous
Variable, TWO Latent (unobserved) Endogenous
Variables
[Path diagram with the Structural Model and Measurement Model parts labelled]
53
Reliability
  • Definition: the extent to which a variable or set
    of variables is consistent in
    what it is intended to measure
  • If multiple measurements are taken, the reliable
    measures will all be consistent in their values
  • It is the degree to which the observed variable
    measures the true value and is error free
  • It is different from validity

54
True Score and Measurement Error
  • True score: a component which indicates where the
    subject actually stands on the variable
    (statement) of interest
  • Measurement error: a component which indicates
    the inaccuracies when measuring true scores due
    to fallibility of the survey instrument, response
    scales, data entry or respondent error

55
Reliability
  • The degree to which scores are free from random
    measurement error
  • Reliability measures
  • Internal Consistency Reliability
  • Test-retest Reliability
  • Alternate Forms Reliability

56
Reliability
  • Levels of Reliability
  • 0.90 Excellent
  • 0.80 Very Good
  • 0.70 Adequate
  • < 0.70 Poor

57
Example: Reliability of Observed Variables
  • Cronbach's alpha was computed for all the
    variables:

    Variable     No. of items   Reliability
    Variable1    10             .91
    Variable2    10             .87
    Variable3    10             .58
    Variable4    10             .70
    Variable5    12             .72
    Variable6    12             .80
    Variable7    12             .80
    Variable8    12             .87
    Variable9    10             .84
    Variable10    7             .71
    Variable11    4             .48

58
Summated Scale Reliability
  • When reliability involves multiple scaled items,
    reliability must be measured on a summated scale.
  • A summated rating scale is a short list of
    statements, questions or other items that the
    subject responds to.
  • A summated score is the sum of responses to a
    list of statements, giving an overall score.

59
Reliability coefficient (1)
  • There are several ways to measure reliability
    which will be discussed later.
  • The measurement is normally called the
    reliability coefficient.
  • This coefficient is the percent of variance in an
    observed variable that is accounted for by the
    true scores of the underlying construct.

60
Reliability Coefficient (2)
  • Imagine you have collected 2 scores from a
    survey: the true and observed scores of customer
    satisfaction
  • You compute the correlation between the scores
  • The square of the correlation coefficient will be
    your reliability coefficient, which is
  • the total variance explained in the observed
    scores by the true scores, or
  • the percent of variance in observed scores that
    is accounted for by true scores.
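
A small numerical sketch of this idea, using NumPy and invented scores (in practice true scores are never observed, so this is purely illustrative). The hypothetical errors are constructed to have mean zero and to be uncorrelated with the true scores, in which case the squared correlation equals the share of observed variance due to true scores.

```python
import numpy as np

# Hypothetical true scores and measurement errors (illustration only).
true_scores = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
errors = np.array([1.0, -2.0, 2.0, -2.0, 1.0])   # mean zero, uncorrelated with true scores
observed_scores = true_scores + errors

# Reliability coefficient as the squared true-observed correlation.
r = np.corrcoef(true_scores, observed_scores)[0, 1]
reliability = r ** 2

# The same quantity as a ratio of variances: Var(true) / Var(observed).
variance_share = true_scores.var() / observed_scores.var()

print(reliability, variance_share)
```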

61
Types of Reliability
  • Test-retest reliability
    • Assessed by administering the same instrument to
      the same sample of respondents at two points in
      time, and computing the correlation between the
      two sets of scores.
  • Internal consistency reliability
    • The extent to which individual items that
      constitute a test correlate with one another or
      with the test total. In short, it measures how
      consistently respondents respond to the items
      within a scale.

62
Types of Reliability (2)
  • For example, if the first half of an instrument
    is educational items which correlate highly among
    themselves and the second half is political items
    which correlate highly among themselves, the
    instrument would have high internal consistency
    anyway, even though they are two distinct
    dimensions
  • Note that measures of internal consistency are
    often called measures of internal consistency
    reliability or even reliability, but this
    merges the distinct concepts of internal
    consistency and reliability, which do not
    necessarily go together
  • How do we solve this problem?
  • The most commonly used internal consistency
    reliability measure is Cronbach's Alpha
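
For reference, Cronbach's alpha can be computed from item scores with the usual formula α = k/(k - 1) × (1 - Σ item variances / variance of the total score). The sketch below uses NumPy; the function name `cronbach_alpha` and the small response matrix are hypothetical and are not the data behind the reliability table shown earlier.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha; rows are respondents, columns are scale items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                               # number of items
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the summated score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses of 5 people to a 3-item scale.
responses = np.array([
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 3],
])
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Because alpha depends only on item and total-score variances, a scale that mixes two distinct dimensions can still score highly, which is the caution raised on this slide.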

63
Validity
  • Definition: the extent to which an item or set of
    items correctly represents the construct under
    study, i.e. the degree to which it is free from
    any systematic or non-random error
  • Validity deals with
  • How well the construct is defined by the item/s
    (what should be measured)
  • While Reliability deals with
  • How consistent the item/s is/are measuring the
    construct (HOW it is measured)

64
Validity
  • Whether the scores measure what they are supposed
    to measure
  • Types of validity
  • Construct Validity (SEM Confirmatory Factor
    Analysis helps to establish construct validity)
  • Criterion-Related Validity (Correlation with an
    external standard)
  • Convergent Validity/ Discriminant Validity (Can
    be determined through SEM Confirmatory Factor
    Analysis)

65
Examples
  • Example 1: How happy are you?
  • This example is about validity: whether the
    measure accurately represents what it is supposed
    to measure
  • Example 2: How happy are you when you are
    smoking? Ask this question repeatedly of the same
    subject or of multiple subjects and see how
    consistent their answers are.
  • This example is about reliability (sometimes I'd
    like to call it consistency)

66
I Am an Indicator
Lyrics by Alan Reifman
(May be sung to the tune of "The Entertainer," Billy Joel)
  • I am an indicator, a latent construct I represent,
    I'm measurable, sometimes pleasurable,
    A manifestation of what is meant,
    I am an indicator, I usually come in a multiple set,
    With other signs of the same construct, you may instruct,
    I'm correlated with my co-indicators, you can bet,
    I am an indicator, from my presence the construct is inferred,
    I'm tap-able, the construct is not palpable,
    The distinction should not be blurred

67
At Least Three
Lyrics by Alan Reifman
(May be sung to the tune of "Think of Me," Lloyd
Webber/Hart/Stilgoe, from Phantom of the Opera)
  • At least three, indicators are urged,
    For each latent construct shown,
    At least three, indicators should help,
    Avoid output where you groan,
    With less than three, your construct sure will be, locally unidentified,
    Though the model might still run, you could have a rough ride

68
Total, Direct and Indirect Effects
  • There is a direct effect between two latent
    variables when a single directed line or arrow
    connects them
  • There is an indirect effect between two variables
    when the second latent variable is connected to
    the first latent variable through one or more
    other latent variables
  • The total effect between two latent variables is
    the sum of any direct effect and all indirect
    effects that connect them.
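
In path-analytic terms, an indirect effect along a chain of arrows is the product of the path coefficients on that chain, and the total effect adds the direct effect. The sketch below applies this to the Ability, Aspiration, Achievement example; the coefficient values are hypothetical and the arrow structure is assumed.

```python
# Hypothetical standardized path coefficients (not taken from the slides).
ability_to_aspiration = 0.5
aspiration_to_achievement = 0.4
ability_to_achievement = 0.3

# Direct effect: the single arrow Ability -> Achievement.
direct_effect = ability_to_achievement

# Indirect effect: product of the paths Ability -> Aspiration -> Achievement.
indirect_effect = ability_to_aspiration * aspiration_to_achievement

# Total effect: direct effect plus all indirect effects.
total_effect = direct_effect + indirect_effect

print(direct_effect, indirect_effect, total_effect)
```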

69
Example: Direct and Indirect
[Path diagram: Ability / Intelligence (exogenous latent construct), Aspirations (endogenous latent construct), Achievement (endogenous latent construct)]
70
Semantics
  • Types of measurement scale
  • Metric and Non-metric
  • Correlation coefficient
  • Correlation and Covariance Matrix
  • Standardized and Un-standardized Estimates

71
Types of Measurement Scale
  • There are 4 types of measurement scale in a scale
    instrument:
    • Nominal Scale
    • Ordinal
    • Interval Scales
    • Ratio
  • Some other common scales, like Likert scales,
    Semantic Differential Scales, Dichotomous Scales
    etc., can be categorized into the 4 above
  • This is important, as the assumptions of SEM rely
    on what we know from this slide

72
Metric and Non-metric Scales
  • Metric scales hold quantitative data, where the
    values of the scale lie on a continuum
  • Interval or Ratio scale data
  • Non-metric scales hold qualitative data:
    attributes, characteristics or categorical
    properties that identify or describe a subject or
    object
  • Possibly Nominal or Ordinal scale data
  • But metric and non-metric scales can
    sometimes be misused or abused. How?

73
VARIABLE SCALES
  • SEM in general assumes observed variables are
    measured on a linear continuous scale
  • Dichotomous and ordinal variables cause problems
    because correlations /covariances tend to be
    truncated. These scores are not normally
    distributed and responses to individual items may
    not be very reliable.

74
Correlation
  • Perhaps the most basic semantic
  • Definition: the linear relationship between two
    variables
  • The strength of the relationship is given by the
    correlation coefficient and r² (explained later)
  • There are 2 common types of correlation
    coefficient:
    • Pearson Product Moment Correlation (Interval)
    • Spearman Rank Correlation (Ordinal)
  • The former is the one we will use in this course

75
Correlation Matrix (1)
  • The correlation matrix of n random variables
    X1, ..., Xn is the n × n matrix whose (i, j) entry
    is corr(Xi, Xj)
  • If the measures of correlation used are
    product-moment coefficients, the correlation
    matrix is the same as the covariance matrix of
    the standardized random variables Xi/SD(Xi) for
    i = 1, ..., n
  • Consequently it is necessarily a non-negative
    definite matrix (an important assumption)
  • The correlation matrix is symmetric because the
    correlation between Xi and Xj is the same as the
    correlation between Xj and Xi
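
These three properties can be verified on any data set. The following NumPy sketch uses a small invented data matrix (rows are cases, columns are variables); the names are illustrative only.

```python
import numpy as np

# Hypothetical data: 5 cases measured on 3 variables.
data = np.array([
    [2.0, 1.0, 5.0],
    [4.0, 3.0, 4.0],
    [6.0, 2.0, 8.0],
    [8.0, 6.0, 6.0],
    [10.0, 5.0, 9.0],
])

corr = np.corrcoef(data, rowvar=False)             # 3 x 3 correlation matrix

# Covariance matrix of the standardized variables Xi / SD(Xi).
standardized = data / data.std(axis=0, ddof=1)
cov_of_standardized = np.cov(standardized, rowvar=False)

is_symmetric = np.allclose(corr, corr.T)
eigenvalues = np.linalg.eigvalsh(corr)             # all >= 0 for a non-negative definite matrix

print(np.allclose(corr, cov_of_standardized), is_symmetric, eigenvalues.min())
```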

76
Correlation Matrix (2)
(Each cell: correlation coefficient, with its p-value beneath)
      A1       A2       A3       A4       A5       A6       A7       B1       B2       B3
A1    1.00000  0.65579  0.46296  0.58812  0.62082  0.62629  0.64288  0.34385  0.57904  0.56353
               <.0001   <.0001   <.0001   <.0001   <.0001   <.0001   0.0004   <.0001   <.0001
A2    0.65579  1.00000  0.45951  0.66297  0.72727  0.77384  0.76693  0.40987  0.67796  0.59493
      <.0001            <.0001   <.0001   <.0001   <.0001   <.0001   <.0001   <.0001   <.0001
A3    0.46296  0.45951  1.00000  0.51913  0.46652  0.45752  0.44520  0.33407  0.35833  0.33623
      <.0001   <.0001            <.0001   <.0001   <.0001   <.0001   0.0006   0.0002   0.0006
A4    0.55812  0.66297  0.51913  1.00000  0.69905  0.64969  0.59358  0.34148  0.58859  0.44284
      <.0001   <.0001   <.0001            <.0001   <.0001   <.0001   0.0004   <.0001   <.0001
A5    0.62082  0.72727  0.46652  0.69905  1.00000  0.67281  0.66939  0.31277  0.63133  0.54744
      <.0001   <.0001   <.0001   <.0001            <.0001   <.0001   0.0014   <.0001   <.0001
A6    0.62629  0.77384  0.45752  0.64969  0.67281  1.00000  0.86014  0.40483  0.66758  0.56944
      <.0001   <.0001   <.0001   <.0001   <.0001            <.0001   <.0001   <.0001   <.0001
A7    0.64288  0.76693  0.44520  0.59358  0.66939  0.86014  1.00000  0.39913  0.68141  0.62075
      <.0001   <.0001   <.0001   <.0001   <.0001   <.0001            <.0001   <.0001   <.0001
B1    0.34385  0.40987  0.33407  0.34148  0.31277  0.40483  0.39913  1.00000  0.58187  0.62583
      0.0004   <.0001   0.0006   0.0004   0.0014   <.0001   <.0001            <.0001   <.0001
B2    0.57904  0.67796  0.35833  0.58859  0.63133  0.66758  0.68141  0.58187  1.00000  0.85335
      <.0001   <.0001   0.0002   <.0001   <.0001   <.0001   <.0001   <.0001            <.0001
B3    0.56353  0.59493  0.33623  0.44284  0.54744  0.56944  0.62075  0.62583  0.85335  1.00000
      <.0001   <.0001   0.0006   <.0001   <.0001   <.0001   <.0001   <.0001   <.0001
77
Correlation Matrix (3)
  • So we say that
  • If the input matrix used is the Covariance
    Matrix the estimated coefficients in the
    parameters measured are unstandardized estimates
  • If the input matrix used is the Correlation
    Matrix the estimated coefficients in the
    parameters measured are the standardized
    estimates
  • So what?

78
Covariance
  • The covariance between two variables equals the
    correlation times the product of the variables'
    standard deviations.  The covariance of a
    variable with itself is the variable's variance

79
Correlation Matrix (4)
  • Therefore when we want to test a theory, we use
    variance-covariance matrix
  • (to validate the causal relationships among
    constructs)
  • When we just want to explain the pattern of the
    relationships then we use correlation matrix
  • (Theory testing is not required)

80
Factors Affecting Correlation / Covariance
Coefficients
  • Type of scale and range of values
    • Pearson correlation is the basis for analysis in
      regression, path, factor analysis and SEM. Hence
      data must be in metric form.
    • There must be enough variation in scores to allow
      a correlation relationship to manifest.
  • Linearity
    • The Pearson correlation coefficient measures the
      degree of linear relationship between two
      variables, hence the need to test linearity.
  • Sample size
    • SEM requires a big sample size. Rule of thumb:
      10-20 times the number of variables.
    • Ding, Velicer and Harlow (1995): 100-150.
    • Boomsma (1982, 1983): 400.
    • Hu, Bentler and Kano (1992): in some cases 5000
      is still insufficient.
    • Schumacker and Lomax (1999): many articles use
      250-500.
    • Bentler and Chou (1987): for normal data, 5
      subjects per variable is sufficient.

81
Covariance
Lyrics by Alan Reifman (May be sung to the tune
of "Aquarius," Rado/Ragni/MacDermot, from Hair,
also popularized by the Fifth Dimension)
  • You draw paths to show relationships, You hope
    align with the known r's, Your model will guide
    the tracings, From constructs near to constructs
    far, You will compare this with the data's
    covariance, The data's covariance...
    Covariance! Covariance! Similar to correlation,
    With the variables unstandardized, Does each
    known covariance match up with, The one the
    model tracings will derive? Covariance!
    Covariance!

82
SEM Assumptions
  • Sample Size
  • A good rule of thumb is > 15 cases per predictor /
    indicator (James Stevens, Applied Multivariate
    Statistics for the Social Sciences)
  • Model with TWO factors:
    recommended sample size > 100
  • Model with FOUR factors:
    recommended sample size > 200

83
SEM Assumptions (cont.)
  • Sample Size
  • Consequences of using smaller samples
  • convergence failures (the software cannot reach a
    satisfactory solution),
  • improper solutions (including negative error
    variance estimates for measured variables),
  • lowered accuracy of parameter estimates and, in
    particular, standard errors
  • SEM program standard errors are computed under
    the assumption of large sample sizes. 

84
SEM Assumptions (cont.)
  • Normality
  • Many SEM estimation procedures assume
    multivariate normal distributions
  • Lack of univariate normality is indicated when
    the skew index is > 3.0 and the kurtosis index
    is > 10.
  • Multivariate non-normality can be detected by
    indices of multivariate skew or kurtosis
  • Non-normal distributions can sometimes be
    corrected by transforming variables
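A rough sketch of the univariate screening step follows. It assumes the skew index is the standardized third moment and the kurtosis index is the standardized fourth moment minus 3 (excess kurtosis); software packages differ on these definitions, so treat both the formulas and the function names as assumptions. The cutoffs 3.0 and 10 are the ones quoted on this slide.

```python
def central_moment(values, order):
    # Population-style central moment (divides by n).
    m = sum(values) / len(values)
    return sum((v - m) ** order for v in values) / len(values)

def skew_index(values):
    m2 = central_moment(values, 2)
    return central_moment(values, 3) / m2 ** 1.5

def kurtosis_index(values):
    # Excess kurtosis: 0 for a normal distribution (assumed definition).
    m2 = central_moment(values, 2)
    return central_moment(values, 4) / m2 ** 2 - 3.0

def normality_flags(values, skew_cutoff=3.0, kurt_cutoff=10.0):
    # Returns (skew_exceeds_cutoff, kurtosis_exceeds_cutoff).
    return (abs(skew_index(values)) > skew_cutoff,
            abs(kurtosis_index(values)) > kurt_cutoff)

# Illustrative, perfectly symmetric data.
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
skew = skew_index(scores)
kurt = kurtosis_index(scores)
flags = normality_flags(scores)
print(skew, kurt, flags)
```

The two flags are returned separately so that each index can be judged on its own.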

85
SEM Assumptions (cont.)
  • Multicollinearity
  • Occurs when intercorrelations among some
    variables are so high that certain mathematical
    operations are impossible or results are unstable
    because denominators are close to 0.
  • Bivariate correlations > 0.85
  • Multiple correlations > 0.90
  • May cause a non-positive definite / singular
    covariance matrix
  • May be due to inclusion of individual and
    composite variables
  • Detection: Tolerance = 1 - R² < 0.10
  • Detection: Variance Inflation Factor (VIF)
    = 1/(1 - R²) > 10
  • Can be corrected by eliminating or combining
    redundant variables
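The two detection formulas above can be sketched as follows. Here R² is taken to be the squared multiple correlation of one variable regressed on all the others; the function names are illustrative, and the cutoffs are the ones quoted on the slide.

```python
def tolerance(r_squared):
    # Tolerance = 1 - R^2
    return 1.0 - r_squared

def vif(r_squared):
    # Variance Inflation Factor = 1 / (1 - R^2); undefined when R^2 = 1.
    if r_squared >= 1.0:
        raise ValueError("R^2 of 1 means the variable is perfectly redundant")
    return 1.0 / (1.0 - r_squared)

def is_multicollinear(r_squared, tol_cutoff=0.10, vif_cutoff=10.0):
    # The two rules are algebraically equivalent; both are checked for clarity.
    return tolerance(r_squared) < tol_cutoff or vif(r_squared) > vif_cutoff

# Illustrative values only.
print(tolerance(0.5), vif(0.5), is_multicollinear(0.5))
print(tolerance(0.95), vif(0.95), is_multicollinear(0.95))
```

Because VIF is the reciprocal of tolerance, a tolerance below 0.10 and a VIF above 10 flag the same variables.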

86
SEM Assumptions (cont.)
  • Outliers
  • Univariate outliers more than three SDs away from
    the mean
  • Detection by inspecting frequency distributions
    and univariate measures of skewness and kurtosis
  • Multivariate outliers may have extreme scores on
    two or more variables, or their configurations
    of scores may be unusual
  • Detection by inspecting indices of multivariate
    skewness and kurtosis. Mahalanobis distance
    squared is distributed as chi-square with df
    equal to the number of variables.
  • Can be remedied by correcting errors, by
    dropping these cases, or by transforming the
    variables
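As a minimal sketch of the Mahalanobis check, the code below handles the two-variable case only, so the 2x2 covariance matrix can be inverted by hand. With df = 2 the chi-square upper-tail probability has the closed form exp(-d²/2), which avoids any statistics library. The data and function names are invented for illustration.

```python
import math

def mahalanobis_sq_2d(points):
    # Squared Mahalanobis distance of each (x, y) point from the centroid,
    # using the sample covariance matrix (n - 1 denominator).
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for px, py in points:
        dx, dy = px - mx, py - my
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

def chi2_pvalue_df2(d2):
    # Upper-tail probability of a chi-square variable with 2 df.
    return math.exp(-d2 / 2.0)

# Illustrative data: the last case breaks the pattern of the others.
points = [(1, 1.0), (2, 2.1), (3, 2.9), (4, 4.2),
          (5, 5.1), (6, 5.8), (7, 7.1), (8, 1.0)]
d2 = mahalanobis_sq_2d(points)
for point, dist in zip(points, d2):
    print(point, round(dist, 3), round(chi2_pvalue_df2(dist), 4))
```

The last case has by far the largest distance even though neither of its scores is extreme on its own, which is the "unusual configuration" situation described above. In practice a conservative cutoff (for example p < .001, a common convention not stated on the slide) would be applied to a much larger sample.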

87
VIOLATIONS OF ASSUMPTIONS(1)
  • The best-known distribution with no excess
    kurtosis is the multivariate normal.
  • Leptokurtic (more peaked) distributions result in
    too many rejections of H0 based on the chi-square
    statistic.
  • Platykurtic distributions will lead to chi-square
    estimates that are too low.

88
VIOLATIONS OF ASSUMPTIONS (2)
  • High degrees of skewness lead to excessively
    large chi-square estimates.
  • In small samples (N < 100), the chi-square
    statistic tends to be too large.

89
SEM, Oh, SEM
  • Lyrics by Alan Reifman, dedicated to Peter
    Westfall (article of his) (May be sung to the
    tune of "Galveston," Jimmy Webb, popularized by
    Glen Campbell)
  • Ultimately, SEM, Your LVs cannot be measured,
    Which gives the critics some displeasure,
    There's nothing physical to grab on, When you
    run SEM, SEM, Oh, SEM, You make many an
    assumption, Is it recklessness or gumption?
    Assume the e's uncorrelated... When you run
    SEM, I can see the critics' point of view, now,
    They're saying the models aren't unique, That,
    we must willingly acknowledge, In response to
    the critique, if we want to keep on using...
    SEM, Oh, SEM...

90
Model Identification (Identified Equations)
  • Identification refers to the idea that there is
    at least one unique solution for each parameter
    estimate in an SEM model.
  • Models in which there is only one possible
    solution for each parameter estimate are said to
    be just-identified.
  • Models for which there are an infinite number of
    possible parameter estimate values are said to be
    underidentified.
  • Finally, models that have more than one possible
    solution (but one best or optimal solution) for
    each parameter estimate are considered
    overidentified. 

91
Model Identification (Identified Equations)
  • Underidentification
  • empirical underidentification or
  • structural underidentification
  • Empirical underidentification occurs when a
    parameter estimate that establishes model
    identification has a very small (close to zero)
    estimate.
  • A path coefficient whose value is estimated as
    being close to zero may be treated as zero by the
    SEM program's matrix inversion algorithm. If that
    path coefficient is necessary to identify the
    model, the model thus becomes underidentified. 
  • Remedy for Empirical underidentification -
    collect more data or respecify the model
  • Remedy for Structural underidentification -
    respecify the model

92
Examples of Identified Model
  • Case 1: Let us say we have an equation
  • x + 2y = 7
  • Question: Is this equation / model identified?
  • Answer: No, it is underidentified because
    there are an infinite number of solutions for x
    and y (e.g., x = 5 and y = 1, or x = 3
    and y = 2). These values are therefore
    underidentified because there are fewer "knowns"
    than "unknowns."
  • Case 2: Let us say we have a set of equations
  • x + 2y = 7
  • 3x - y = 7
  • Question: Is this equation / model identified?
  • Answer: Yes, it is a just-identified model, as
    there are as many knowns as unknowns. There is
    one best pair of values (x = 3, y = 2).
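The two cases can be reproduced with a short sketch. The helper below solves a pair of linear equations by Cramer's rule; its name and argument order are chosen for illustration only.

```python
def solve_2x2(a1, b1, c1, a2, b2, c2):
    # Solves a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule.
    # Returns None when the determinant is zero (no unique solution).
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y

# Case 1: one equation, two unknowns. Any y gives a valid x = 7 - 2y,
# so there is no unique solution (underidentified).
case1_solutions = [(7 - 2 * y, y) for y in (1, 2, 3)]
print(case1_solutions)

# Case 2: two equations, two unknowns (just-identified).
case2_solution = solve_2x2(1, 2, 7, 3, -1, 7)
print(case2_solution)
```

Case 1 yields as many (x, y) pairs as there are values tried for y, while Case 2 yields the single pair x = 3, y = 2.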