Title: A simulation study of Pathmox with nonnormal data
1A simulation study of Pathmox with non-normal
data
- Gastón Sánchez, Tomàs Aluja-Banet
Based on the Ph.D. thesis of Gastón Sánchez, "PATHMOX Approach:
Segmentation Trees in PLS-PM"
Laboratory of Information Analysis and Modeling
Universitat Politècnica de Catalunya
2Outline
- Heterogeneity in PLS-PM
- PATHMOX Approach
- Simulation Studies
- Conclusions
3Heterogeneity
Standard application of PLS Path Modeling: one model
for the whole population
We assume the same model for all individuals
(i.e., we assume that the satisfaction
model for mobile phone operators is the same
regardless of whether the individual is a teenager or an
adult)
4Heterogeneity
Different sets of individuals with particular
behavior
5Assignable sources of heterogeneity
Heterogeneity with observed segmentation variables
- Segments defined by observable segmentation variables
- Group information (age, gender, ethnicity, religion, etc.)
- Number of segments based on the levels (categories) of the segmentation variables
- Multi-group approaches
Heterogeneity without segmentation variables
- Unknown variables causing heterogeneity
- No group information
- Unknown number of segments
- Cluster-based / latent class approaches
6Heterogeneity in PATHMOX
Where do the data come from?
Survey data
Segmentation variables:
- Socio-demographic
- Psycho-demographic
- Consumption
It is interesting to detect segments with
different PLS-PM models.
We search for groups of individuals in which
the relation between image and satisfaction (for
instance) differs from one group to
another. The external segmentation variables are used
to define these groups.
7Heterogeneity in PATHMOX
How to use the segmentation variables?
For each segmentation variable, we can define
binary partitions
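[Figure: global PLS path model. Latent variable ξ1 (indicators X11, ..., X1k) and latent variable ξ2 (indicators X21, ..., X2q) are linked to the endogenous latent variable η3 (indicators Y31, Y32, ..., Y3p).]
Segmentation variables: Z1, ..., Zk, ..., Zm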
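The binary partitions of one segmentation variable can be enumerated mechanically. The sketch below is only an illustration for a nominal variable with distinct levels; the function name `binary_partitions` and the example levels are hypothetical and do not come from the slides.

```python
from itertools import combinations


def binary_partitions(levels):
    """Return every split of `levels` into two non-empty groups.

    Assumes the levels are distinct and unordered (nominal variable).
    The first level is always kept on the left side, so each split
    is listed exactly once.
    """
    levels = list(levels)
    first, rest = levels[0], levels[1:]
    partitions = []
    for size in range(len(rest) + 1):
        for extra in combinations(rest, size):
            left = (first,) + extra
            right = tuple(level for level in rest if level not in extra)
            if right:  # skip the trivial split with an empty right side
                partitions.append((left, right))
    return partitions


# Hypothetical segmentation variable with three categories
for left, right in binary_partitions(["teenager", "adult", "senior"]):
    print(left, "|", right)
```

A nominal variable with k levels yields 2^(k-1) - 1 candidate splits, so the number of partitions to test grows quickly with the number of categories.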
8The PATHMOX Approach
Segmentation Tree of PLS Path Models
[Figure: segmentation tree with the root node (global model), child nodes, and leaf nodes.]
MOX > MOXEXELOA: from Nahuatl (Aztec language),
meaning "divide into groups"
9Split criterion
Parent Model: A + B
Child Models: A and B
10Hypothesis test
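Equality of Coefficients
$H_0$: $\beta_{A1} = \beta_{B1} = \beta_1$ and $\beta_{A2} = \beta_{B2} = \beta_2$ (parent model, A + B)
vs.
$H_1$: separate coefficients $\beta_{A1}, \beta_{B1}$ and $\beta_{A2}, \beta_{B2}$ (child models A and B)
$\zeta_1 \sim N(0, \sigma^2 I_n)$, $\zeta_2 \sim N(0, \sigma^2 I_n)$
Assuming a common variance
F-statistic for multiple regression (Lebart et
al., 1979)
F distribution with $p_1 + p_2$ and $2n - 2(p_1 + p_2)$ d.f.
Best split: the one with the most significant F-statistic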
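To make the comparison of a parent model against two child models concrete, here is a minimal sketch of a Chow-type F statistic for a single regression equation. It is a simplification, not the PATHMOX implementation: the slides stack two endogenous equations (hence p1 + p2 and 2n - 2(p1 + p2) d.f.), whereas this sketch uses one equation with p predictors and n_A + n_B - 2p denominator d.f. The function names and the simulated data are hypothetical.

```python
import numpy as np


def sse(X, y):
    """Sum of squared residuals of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def split_f_statistic(X_a, y_a, X_b, y_b):
    """F statistic for H0: segments A and B share the same coefficients.

    Assumes a common residual variance in both segments.
    """
    p = X_a.shape[1]
    n = X_a.shape[0] + X_b.shape[0]
    # H0: one pooled (parent) model; H1: one model per child node
    sse_h0 = sse(np.vstack([X_a, X_b]), np.concatenate([y_a, y_b]))
    sse_h1 = sse(X_a, y_a) + sse(X_b, y_b)
    df1, df2 = p, n - 2 * p
    f_stat = ((sse_h0 - sse_h1) / df1) / (sse_h1 / df2)
    return f_stat, df1, df2


# Illustrative data (hypothetical values): two predictors, 200 cases per segment
rng = np.random.default_rng(0)
X_a = rng.normal(size=(200, 2))
X_b = rng.normal(size=(200, 2))
beta_a = np.array([0.5, 0.3])
y_a = X_a @ beta_a + rng.normal(scale=0.5, size=200)
y_b_same = X_b @ beta_a + rng.normal(scale=0.5, size=200)
y_b_diff = X_b @ np.array([0.9, 0.1]) + rng.normal(scale=0.5, size=200)

f_same, df1, df2 = split_f_statistic(X_a, y_a, X_b, y_b_same)
f_diff, _, _ = split_f_statistic(X_a, y_a, X_b, y_b_diff)
print(f"equal coefficients:     F = {f_same:.2f} on ({df1}, {df2}) d.f.")
print(f"different coefficients: F = {f_diff:.2f} on ({df1}, {df2}) d.f.")
```

For all binary splits of the same node the degrees of freedom are identical, so ranking candidate splits by F is equivalent to ranking them by p-value. If a p-value is needed, the upper tail of the F distribution can be used, for example `scipy.stats.f.sf(f_stat, df1, df2)`.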
11Stopping criterion
1. Minimum number of elements inside a node
2. p-value > threshold
(p-value > α)
3. A specified maximum depth level for tree growth
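The three stopping rules can be combined into a single check before a node is split. This is only a sketch: the function name `stop_reason` and the default values for `min_size`, `alpha`, and `max_depth` are hypothetical, since the slides do not give the thresholds that were used.

```python
from typing import Optional


def stop_reason(n_left: int, n_right: int, p_value: float, depth: int,
                min_size: int = 50, alpha: float = 0.05,
                max_depth: int = 3) -> Optional[str]:
    """Return why a node should not be split, or None if the split is kept.

    n_left, n_right: sizes of the two candidate child nodes
    p_value: p-value of the best split's F statistic
    depth: depth of the node being split (root = 0)
    """
    if min(n_left, n_right) < min_size:
        return "child node too small"
    if p_value > alpha:
        return "split not significant"
    if depth >= max_depth:
        return "maximum depth reached"
    return None


print(stop_reason(n_left=120, n_right=30, p_value=0.001, depth=1))
print(stop_reason(n_left=120, n_right=80, p_value=0.20, depth=1))
print(stop_reason(n_left=120, n_right=80, p_value=0.001, depth=1))
```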
12Simulation studies
Sensitivity of the F-test with respect to:
- Skewness of data
- Distance between path coefficients
- Sample Sizes
- Levels of noise of the endogenous term
- Levels of noise of indicators
- Unbalanced segments
- Variances of endogenous residuals
13Experimental conditions
[Figure: path diagrams of Node A (fixed inner model) and Node B. In both nodes, latent variables ξ1 (indicators x11, x12, x13) and ξ2 (indicators x21, x22, x23) are linked to ξ3 (indicators x31, x32, x33) through the path coefficients β31 and β32. λjk denote loadings, εjk indicator errors, and ζ the residual of the inner model.]
Simulation according to the following
experimental conditions
- Data distributions: Normal (Sánchez & Aluja,
PLS07), Non-normal
- Path coefficients
- Sample size: 100, 200, 500, 1000
- Balancing proportions: 60%, 70%, 80%, 90%
- Noise levels of ζ: 10, 30, 50
- Noise levels of ε: 10, 30, 50
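As a rough illustration of how such conditions can be enumerated, the sketch below crosses the four numeric factors listed above. Two points are assumptions and not stated in the slides: that the factors were fully crossed, and that a balancing proportion gives the share of cases in the larger segment. The names `conditions` and `segment_sizes` are hypothetical.

```python
from itertools import product

sample_sizes = [100, 200, 500, 1000]
balancing = [60, 70, 80, 90]   # percent of cases in the larger segment (assumed meaning)
noise_lv = [10, 30, 50]        # noise levels of the inner residual (zeta)
noise_mv = [10, 30, 50]        # noise levels of the indicators (epsilon)

# Full crossing of the four factors (an assumption about the design)
conditions = [
    {"n": n, "balance": b, "noise_lv": z, "noise_mv": e}
    for n, b, z, e in product(sample_sizes, balancing, noise_lv, noise_mv)
]


def segment_sizes(n, balance_pct):
    """Split n cases into two segments according to a balancing percentage."""
    n_large = n * balance_pct // 100
    return n_large, n - n_large


print(len(conditions), "conditions per distribution and pair of path coefficients")
print(segment_sizes(500, 70))
```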
14Path coefficients
9 Pairs of path coefficients
Gradually changed values
Fixed values
15Data distributions
Examples of Distributions
Beta (6,6)
Beta (9,4)
Beta (9,1)
16Symmetric distribution Beta (6,6)
17Moderate skew distribution Beta (9,4)
18High skew distribution Beta (9,1)
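The labels "symmetric", "moderate skew", and "high skew" can be checked against the theoretical skewness of a Beta(a, b) distribution, 2(b - a)√(a + b + 1) / ((a + b + 2)√(ab)). The sketch below evaluates this formula for the three distributions and compares it with the skewness of simulated draws; the helper names, the seed, and the number of draws are illustrative choices, not part of the study.

```python
import math
import random


def beta_skewness(a, b):
    """Theoretical skewness of a Beta(a, b) distribution."""
    return 2 * (b - a) * math.sqrt(a + b + 1) / ((a + b + 2) * math.sqrt(a * b))


def sample_skewness(values):
    """Moment-based sample skewness m3 / m2^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


random.seed(1)
for a, b in [(6, 6), (9, 4), (9, 1)]:
    draws = [random.betavariate(a, b) for _ in range(20000)]
    print(f"Beta({a},{b}): theoretical {beta_skewness(a, b):+.3f}, "
          f"simulated {sample_skewness(draws):+.3f}")
```

The theoretical values are 0 for Beta(6,6), about -0.42 for Beta(9,4), and about -1.47 for Beta(9,1), so the two non-symmetric cases are skewed to the left.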
19Global results
20Influence of β distance by distribution
21Influence of sample size by distribution
22Influence of noise of LVs
23Influence of noise of MVs
24Unbalanced Segments (normal data)
25Unbalanced Segments Beta (9,4)
26Influence of different variances
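Different variances of the endogenous error terms
[Figure: the same inner model in both nodes, with latent variables ξ1, η1, and η2, residuals ζ1 and ζ2, and path coefficients 0.7, 0.8, and 0.5.]
$\zeta_1 \sim N(0, \sigma_1^2 I_n)$, $\zeta_2 \sim N(0, \sigma_2^2 I_n)$
Four types of different variances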
27Influence of different variances
28Conclusions
- Non-normality of the distributions does not affect the
results of the test statistic.
- Splits with unbalanced child nodes deliver
less sensitive p-values of the statistic; the
F-statistic favors balanced splits.
- Unequal variances of the endogenous latent variables
make the test statistic, and hence the tree,
less reliable.
- The F-test is used as a data mining tool to discover
unexpected segments, by ordering the splits for a
given node.
29References
- Cassel, C., Hackl, P., Westlund, A.H. (1999) Robustness of partial least squares method for estimating latent variable quality structures. Journal of Applied Statistics, 26(4): 435-446.
- Chin, W. (2000) Frequently Asked Questions: PLS & PLS-Graph. http://disc-nt.cba.uh.edu/chin/plsfaq/plsfaq.htm.
- Chin, W. (2003) A Permutation Based Procedure for Multi-Group Comparison of PLS Models. In M. Vilares, M. Tenenhaus, P. Coelho, V. Esposito Vinzi, A. Morineau (Eds), Proceedings of the PLS03 Intl. Symposium, Decisia, 33-43.
- Chin, W.W., Marcolin, B.L., Newsted, P.R. (2003) A Partial Least Squares Latent Variable Approach for Measuring Interaction Effects: Results from a Monte Carlo Simulation Study and Voice Mail Emotion/Adoption Study. Information Systems Research, 14(2): 189-217.
- Chow, G. (1960) Tests of Equality between Sets of Coefficients in Two Linear Regressions. Econometrica, 28(3): 591-605.
- Dillon, W.R., Kumar, A. (1994) Latent Structure and Other Mixture Models in Marketing: An Integrative Survey and Overview. In R.P. Bagozzi (Ed.), Advanced Methods of Marketing Research, Blackwell, 295-351.
- Esposito Vinzi, V., Trinchera, L., Squillacciotti, S., Tenenhaus, M. (2008) REBUS-PLS: A Response-based procedure for detecting unit segments in PLS Path Modelling. Applied Stochastic Models in Business and Industry, 24(5): 439-459.
- Goodhue, D., Lewis, W., Thompson, R. (2006) PLS, Small Sample Size, and Statistical Power in MIS Research. In Proceedings of the 39th Hawaii International Conference on System Sciences (HICSS06), Track 8.
- Hahn, C., Johnson, M.D., Herrmann, A., Huber, F. (2002) Capturing Customer Heterogeneity Using a Finite Mixture PLS Approach. Schmalenbach Business Review, 54: 243-269.
- Henseler, J. (2007) A New and Simple Approach to Multi-Group Analysis in PLS Path Modeling. In H. Martens, T. Naes (Eds), Proceedings of the PLS07 Intl. Symposium, Matforsk, Ås, Norway, 104-107.
30References
- Jedidi, K., Jagpal, H.S., DeSarbo, W.S. (1997) Finite-Mixture Structural Equation Models for Response-Based Segmentation and Unobserved Heterogeneity. Marketing Science, 16(1): 39-59.
- Lebart, L., Morineau, A., Fénelon, J.P. (1979) Traitement des données statistiques. Paris: Dunod.
- Lohmöller, J.B. (1989) Latent Variable Path Modeling with Partial Least Squares. Heidelberg: Physica-Verlag.
- Lubke, G.H., Muthén, B. (2005) Investigating Population Heterogeneity with Factor Mixture Models. Psychological Methods, 10(1): 21-39.
- Palumbo, F., Romano, R. (2008) Possibilistic PLS Path Modeling: A New Approach to the Multigroup Comparison. In P. Brito (Ed), Proceedings in Computational Statistics, Heidelberg: Physica-Verlag, 303-314.
- Ringle, C., Schlittgen, R. (2007) A Genetic Algorithm Segmentation Approach for Uncovering and Separating Groups of Data in PLS-PM. In H. Martens, T. Naes (Eds), Proceedings of the PLS07 Intl. Symposium, Matforsk, Ås, Norway, 75-78.
- Sánchez, G., Aluja, T. (2007) A Simulation Study of PATHMOX (PLS Path Modeling Segmentation Tree) Sensitivity. In H. Martens, T. Naes (Eds), Proceedings of the PLS07 Intl. Symposium, Matforsk, Ås, Norway, 33-36.
- Serch, O. (2008) Sistema de Visualització de models PLS-PM. Projecte Final de Carrera. Facultat d'Informàtica de Barcelona, Universitat Politècnica de Catalunya. January 2008.
- Squillacciotti, S. (2005) Prediction oriented classification in PLS Path Modelling. In T. Aluja, J. Casanovas, V. Esposito, A. Morineau, M. Tenenhaus (Eds), Proceedings of the PLS05 Intl. Symposium, SPAD Test&Go, 499-506.
- Tenenhaus, M., Esposito Vinzi, V., Chatelin, Y.M., Lauro, C. (2005) PLS path modeling. Computational Statistics & Data Analysis, 48: 159-205.