Title: SEM for small samples
1 SEM for small samples
ESSEC-HEC Research Workshop Series on PLS
(Partial Least Squares) Developments
2 Orange juice example (J. Pagès)
X1 = Physico-chemical, X2 = Sensorial, X = [X1, X2], Y = Hedonic
3 Structural Equation Modeling: the PLS approach of Herman Wold
- Study of a system of linear relationships between latent variables.
- Each latent variable is described by a set of manifest variables, or summarizes them.
- Variables can be numerical, ordinal or nominal (no need for normality assumptions).
- The number of observations can be small compared to the number of variables.
4 Orange juice example on a homogeneous group of judges
[Path diagram: three latent variables, each linked to its manifest variables by outer weights]
Physico-chemical (weights w11, w12, ..., w19): Glucose, Fructose, Saccharose, Sweetening power, pH before processing, pH after centrifugation, Titer, Citric acid, Vitamin C
Sensorial (weights w21, w22, ..., w27): Smell intensity, Odor typicity, Pulp, Taste intensity, Acidity, Bitterness, Sweetness
Hedonic (weights w32, w33, ..., w3,96): Judge 2, Judge 3, ..., Judge 96
5 A SEM tree
For good blocks, all methods give almost the
same results.
6 Results
- When all blocks are good, all the methods give practically the same results.
  M. Tenenhaus, Component-based SEM, Total Quality Management, 2008.
- For all data, PLS and SEM yield highly correlated LV scores.
  M. Tenenhaus, SEM for small samples, HEC Working paper, 2008.
Data structures are stronger than statistical methods.
7 PLS algorithm (Mode A, Centroid scheme)
Outer estimates: Y1 = X1 w1, Y2 = X2 w2, Y3 = X3 w3
Inner estimates: Z1 = Y2 + Y3, Z2 = Y1 + Y3, Z3 = Y1 + Y2
Block 1, Physico-chemical (Glucose, Fructose, Saccharose, Sweetening power, pH before processing, pH after centrifugation, Titer, Citric acid, Vitamin C):
w11 = Cor(glucose, Z1), w12 = Cor(fructose, Z1), ..., w19 = Cor(vitamin C, Z1)
Block 2, Sensorial (Smell intensity, Odor typicity, Pulp, Taste intensity, Acidity, Bitterness, Sweetness):
w21 = Cor(smell int., Z2), w22 = Cor(odor typ., Z2), ..., w27 = Cor(sweetness, Z2)
Block 3, Hedonic (Judge 2, Judge 3, ..., Judge 96):
w32 = Cor(judge 2, Z3), w33 = Cor(judge 3, Z3), ..., w3,96 = Cor(judge 96, Z3)
Iterate until convergence.
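The iteration on this slide (outer estimates Y = Xw, inner estimates Z built from the connected Y, weights updated as correlations with Z) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the XLSTAT-PLSPM implementation: the function names, the starting weights, the stopping rule and the synthetic data are assumptions, and the general centroid scheme is written with the signs of the correlations between connected latent variables (which reduces to plain sums such as Z1 = Y2 + Y3 when those correlations are positive).

```python
import numpy as np

def standardize(X):
    """Center each column (or a single vector) and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pls_mode_a_centroid(blocks, adjacency, max_iter=500, tol=1e-10):
    """Sketch of PLS path modelling, Mode A, centroid scheme.

    blocks    : list of (n x p_j) arrays, one block of MVs per latent variable
    adjacency : symmetric 0/1 matrix, 1 where two latent variables are linked
    Returns the list of outer weight vectors and the (n x J) matrix of scores.
    """
    X = [standardize(B) for B in blocks]
    adjacency = np.asarray(adjacency, dtype=float)
    n = X[0].shape[0]
    w = [np.ones(B.shape[1]) for B in X]      # arbitrary starting weights (assumption)
    for _ in range(max_iter):
        # Outer estimates: Y_j = X_j w_j, standardized
        Y = np.column_stack([standardize(B @ wj) for B, wj in zip(X, w)])
        # Inner estimates: Z_j = sum over linked k of sign(cor(Y_j, Y_k)) * Y_k
        signs = np.sign(np.corrcoef(Y, rowvar=False)) * adjacency
        Z = Y @ signs.T
        # Mode A update: w_jh = Cor(x_jh, Z_j)
        w_new = [B.T @ standardize(Z[:, j]) / n for j, B in enumerate(X)]
        change = max(np.max(np.abs(a - b)) for a, b in zip(w_new, w))
        w = w_new
        if change < tol:
            break
    Y = np.column_stack([standardize(B @ wj) for B, wj in zip(X, w)])
    return w, Y

# Synthetic illustration (NOT the orange juice data): three blocks driven by
# one common factor, with all three latent variables linked to each other.
rng = np.random.default_rng(0)
factor = rng.normal(size=(30, 1))
demo_blocks = [factor + 0.3 * rng.normal(size=(30, p)) for p in (4, 3, 5)]
demo_links = np.array([[0, 1, 1],
                       [1, 0, 1],
                       [1, 1, 0]])
weights, lv_scores = pls_mode_a_centroid(demo_blocks, demo_links)
print(np.round(np.corrcoef(lv_scores, rowvar=False), 3))
```

With the fully connected `demo_links` matrix, the inner estimates have the same form as on the slide: each Z is built from the two other latent variable scores.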
8 SPECIAL CASES OF PLS PATH MODELLING
- Principal component analysis
- Multiple factor analysis
- Canonical correlation analysis
- Redundancy analysis
- PLS regression
- Generalized canonical correlation analysis (Horst)
- Generalized canonical correlation analysis (Carroll)
- Multiple co-inertia analysis (Chessel & Hanafi)
- etc.
9 Use of XLSTAT-PLSPM
10 Outer weight w
Non-significant variables are in red
11 Outer weight w
12 Correlation MV-LV
13 Correlation MV-LV
14 Use of XLSTAT-PLSPM
Latent variables
                   Physico-chemical   Sensorial   Hedonic
-----------------------------------------------------------
Fruivita refr.           0.917           0.964      1.253
Tropicana refr.          0.630           1.378      0.946
Tropicana r.t.           1.120           0.462      0.742
-----------------------------------------------------------
Pampryl refr.           -0.176          -0.570     -0.747
Joker r.t.              -1.680          -0.852     -0.991
Pampryl r.t.            -0.810          -1.381     -1.203
15 Use of XLSTAT-PLSPM
16 Model estimation by PLS: inner model and correlations
[Path diagram: each manifest variable is linked to its latent variable (ξ1 Physico-chemical, ξ2 Sensorial, ξ3 Hedonic) and labelled with its MV-LV correlation.]
MV-LV correlations, Physico-chemical block (in diagram order): -.89, .93, -.89, .1, .95, -.19, .94, -.97, -.98
MV-LV correlations, Sensorial block (in diagram order): .41, .98, .71, .97, -.64, -.93, -.95
MV-LV correlations, Hedonic block (Judge 2, Judge 3, ..., Judge 96): all > 0
Inner model:
Physico-chemical -> Sensorial: .820 (t = 2.864)
Physico-chemical -> Hedonic: .306 (t = 1.522)
Sensorial -> Hedonic: .713 (t = 3.546)
R² = 0.96
Non-significant variables are in red.
17 Estimation of the inner model by PLS regression
The correlation between the physico-chemical and the sensorial variables can be taken into account by using PLS regression.
R² = 0.946
[Bar chart: CoeffCS1(hedonic) for Physico-chemical and Sensorial; vertical axis from 0.00 to 0.70]
Validation of PLS regression by jack-knife
18 Use of the PLS option of XLSTAT-PLSPM
Physico-chemical has no direct effect on Hedonic,
but a strong indirect effect.
19 Direct, indirect and total effects
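As an illustration of how direct, indirect and total effects relate, the sketch below refits the inner model by ordinary least squares on the PLS latent variable scores listed on the earlier XLSTAT-PLSPM slide (six juices, three latent variables). It assumes the recursive model Physico-chemical -> Sensorial -> Hedonic with an additional direct path Physico-chemical -> Hedonic; the helper `ols` and all variable names are illustrative. On these scores the fit should come out close to the path coefficients reported for the PLS model (.820, .306 and .713), so the indirect effect is about .820 x .713 ≈ .585 and the total effect about .306 + .585 ≈ .891.

```python
import numpy as np

# PLS latent variable scores copied from the XLSTAT-PLSPM slide
# (columns: Physico-chemical, Sensorial, Hedonic).
scores = np.array([
    [ 0.917,  0.964,  1.253],   # Fruivita refr.
    [ 0.630,  1.378,  0.946],   # Tropicana refr.
    [ 1.120,  0.462,  0.742],   # Tropicana r.t.
    [-0.176, -0.570, -0.747],   # Pampryl refr.
    [-1.680, -0.852, -0.991],   # Joker r.t.
    [-0.810, -1.381, -1.203],   # Pampryl r.t.
])

def ols(y, X):
    """Least-squares coefficients of y on the columns of X.
    No intercept is fitted because the scores are already centered."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

physico, sensorial, hedonic = scores.T

# Path Physico-chemical -> Sensorial
a = ols(sensorial, physico[:, None])[0]
# Paths Physico-chemical -> Hedonic (direct) and Sensorial -> Hedonic
direct, b = ols(hedonic, np.column_stack([physico, sensorial]))

indirect = a * b            # effect transmitted through Sensorial
total = direct + indirect   # direct + indirect

print(f"Physico-chemical -> Sensorial : {a:.3f}")
print(f"Sensorial -> Hedonic          : {b:.3f}")
print(f"Direct effect on Hedonic      : {direct:.3f}")
print(f"Indirect effect on Hedonic    : {indirect:.3f}")
print(f"Total effect on Hedonic       : {total:.3f}")
```

The decomposition shows why a weak direct path is compatible with a strong overall influence: most of the effect of Physico-chemical on Hedonic passes through Sensorial.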
20 Covariance-based Structural Equation Modeling
Latent variables
Structural model (inner model)
21 Structural Equation Modeling
Measurement model (outer model)
MV (manifest variables)
LV (latent variables): exogenous and endogenous
22 Structural Equation Modeling
MV covariance matrix
23 Covariance-based SEM
ULS algorithm (Unweighted Least Squares)
S = observed covariance matrix for MVs
Goodness-of-fit Index (Jöreskog & Sörbom)
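The formulas for this slide did not survive extraction. As a hedged reminder of the usual definitions: ULS chooses the model parameters so that the model-implied covariance matrix Σ is as close as possible to S in the sense of the sum of squared residuals, and the Jöreskog and Sörbom goodness-of-fit index for ULS is commonly written as GFI = 1 - ||S - Σ||² / ||S||². The sketch below only evaluates these two quantities for a given pair (S, Σ); it does not perform the minimization, and the function names and the toy matrices are assumptions.

```python
import numpy as np

def uls_discrepancy(S, Sigma):
    """Sum of squared entries of the residual matrix S - Sigma."""
    R = np.asarray(S, dtype=float) - np.asarray(Sigma, dtype=float)
    return float(np.sum(R ** 2))

def gfi_uls(S, Sigma):
    """GFI for ULS: 1 - ||S - Sigma||^2 / ||S||^2."""
    S = np.asarray(S, dtype=float)
    return 1.0 - uls_discrepancy(S, Sigma) / float(np.sum(S ** 2))

# Toy example (made-up matrices): the model ignores the covariance of 0.5.
S_toy = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
Sigma_toy = np.eye(2)
print(gfi_uls(S_toy, Sigma_toy))   # squared residuals 0.5 over ||S||^2 = 2.5
print(gfi_uls(S_toy, S_toy))       # perfect reproduction of S
```

A GFI of 1 means the model reproduces S exactly; values such as the .903 or .973 quoted on later slides measure how much of the squared content of S the fitted model accounts for.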
24 Use of AMOS 6.0: Method ULS
This is a computational trick: residual variances are passed to errors and can always be computed afterwards.
25 Covariance-based SEM
ULS algorithm with McDonald's constraints
S = observed covariance matrix for MVs
Goodness-of-fit Index (Jöreskog & Sörbom)
26 Use of AMOS 6.0 - Method ULS - Measurement residual variances = 0
27 Results
GFI = .903
Outer LV estimates: McDonald's 2nd idea
30 Model estimation by SEM-ULS: inner model and correlations
[Path diagram: each manifest variable is linked to its latent variable (ξ1 Physico-chemical, ξ2 Sensorial, ξ3 Hedonic) and labelled with its MV-LV correlation.]
MV-LV correlations, Physico-chemical block (in diagram order): -.76, .89, -.77, .22, 1, -.08, 1.00, -.87, -.88
MV-LV correlations, Sensorial block (in diagram order): .26, .94, .66, 1, -.56, -.94, -.97
MV-LV correlations, Hedonic block (Judge 2, Judge 3, ..., Judge 96): all > 0
Inner model:
Physico-chemical -> Sensorial: .79 (P = .01)
Physico-chemical -> Hedonic: .22 (P = .35)
Sensorial -> Hedonic: .64 (P = .05)
R² = 0.96
Non-significant variables are in red. Constrained weights are in blue.
31 Use of SEM-ULS: latent variable estimates (scores)
Latent variables
                   Physico-chemical   Sensorial   Hedonic
-----------------------------------------------------------
Fruivita refr.           0.915           0.866      1.141
Tropicana refr.          0.526           1.270      0.868
Tropicana r.t.           0.832           0.422      0.672
-----------------------------------------------------------
Pampryl refr.           -0.158          -0.526     -0.686
Joker r.t.              -1.740          -0.774     -0.867
Pampryl r.t.            -0.375          -1.258     -1.127
32 Comparison between the PLS and SEM-ULS scores
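The comparison plot for this slide is not available in the transcript, but the two score tables are. A simple way to compare them, sketched below, is to compute the Pearson correlation between the PLS and the SEM-ULS scores of each latent variable. The array and variable names are illustrative; the numbers are copied from the two score tables.

```python
import numpy as np

lv_names = ["Physico-chemical", "Sensorial", "Hedonic"]

# Rows: Fruivita refr., Tropicana refr., Tropicana r.t.,
#       Pampryl refr., Joker r.t., Pampryl r.t.
pls_scores = np.array([
    [ 0.917,  0.964,  1.253],
    [ 0.630,  1.378,  0.946],
    [ 1.120,  0.462,  0.742],
    [-0.176, -0.570, -0.747],
    [-1.680, -0.852, -0.991],
    [-0.810, -1.381, -1.203],
])
sem_uls_scores = np.array([
    [ 0.915,  0.866,  1.141],
    [ 0.526,  1.270,  0.868],
    [ 0.832,  0.422,  0.672],
    [-0.158, -0.526, -0.686],
    [-1.740, -0.774, -0.867],
    [-0.375, -1.258, -1.127],
])

# Correlation between the two sets of scores, one latent variable at a time
score_corr = [
    np.corrcoef(pls_scores[:, j], sem_uls_scores[:, j])[0, 1]
    for j in range(len(lv_names))
]
for name, r in zip(lv_names, score_corr):
    print(f"{name:17s} cor(PLS, SEM-ULS) = {r:.3f}")
```

Correlations close to 1 are what the earlier claim "PLS and SEM yield highly correlated LV scores" would lead one to expect for this data set.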
33 Path analysis on scores with AMOS
Bootstrap validation
34 Direct, indirect and total effects
35 Conclusion 1: SEM-ULS > PLS
- When mode A is chosen, outer LV estimates using covariance-based SEM (ULS or ML) or component-based SEM (PLS) are always very close.
- It is possible to mimic PLS with covariance-based SEM software (McDonald, 1996; Tenenhaus, 2001).
- Covariance-based SEM allows constraints to be imposed on the model parameters. This is impossible with PLS.
36 Conclusion 2: PLS > SEM-ULS
- When SEM-ULS does not converge or does not give an admissible solution, PLS is an attractive alternative.
- PLS offers many optimization criteria for the LV search (but rigorous proofs are still to be found).
- PLS still works when the number of MVs is very high and the number of cases very small (for example 38 MVs and 6 cases).
- PLS makes it much easier to use formative LVs than SEM-ULS does.
37 Second particular case: multi-block data analysis
38 Sensory analysis of 21 Loire Red Wines (J. Pagès)
X1 = Smell at rest, X2 = View, X3 = Smell after shaking, X4 = Tasting
39 PCA of each block: correlation loadings
40 PCA of each block with AMOS: correlation loadings
GFI = .301
41 Multi-block data analysis: Confirmatory Factor Analysis
GFI = .849
42 First dimension
Using MVs with significant loadings
43 First global score
2nd order CFA
GFI = .973
44 Validation of the first dimension
Correlations
            Rest1    View    Shaking1   Tasting1
Rest1       1
View        .621     1
Shaking1    .865     .762    1
Tasting1    .682     .813    .895       1
Score1      .813     .920    .942       .944
45 Second dimension
46 Second global score
GFI = .905
47 Validation of the second dimension
Correlations
            Rest2    Shaking2   Tasting2
Rest2       1
Shaking2    .789     1
Tasting2    .782     .803       1
Score2      .944     .904       .928
48 Mapping of the correlations with the global scores
49 Correlation with global quality
New result: not obtained with other multi-block data analysis methods, nor with factor analysis of the whole data set.
50 Wine visualization in the global score space: wines marked by Appellation
51 Wine visualization in the global score space: wines marked by Soil
52 DAM = Dampierre-sur-Loire
53 Cuvée Lisagathe 1995
54 Final conclusion
- "All the proofs of a pudding are in the eating, not in the cooking."
William Camden (1623)