Title: Estimating Interaction Effects Using Multiple Regression
1Estimating Interaction Effects Using Multiple
Regression
- Herman Aguinis, Ph.D.
- Mehalchin Term Professor of Management
- The Business School
- University of Colorado at Denver
- www.cudenver.edu/haguinis
2Overview
- What is an Interaction Effect?
- The "So What" Question: Importance of Interaction Effects for Theory and Practice
- Estimating Interaction Effects Using Moderated Multiple Regression (MMR)
- Problems with MMR
- Aguinis, Beaty, Boik, & Pierce (2005, J. of Applied Psychology)
- The "Now What" Question: Addressing problems with MMR
- Some Conclusions
3What is an Interaction Effect?
- The relationship between X and Y depends on Z (i.e., a moderator)
- [Diagram: two path diagrams in which Z moderates the X-Y relationship]
- Other terms used:
  - Population control variable (Gaylord & Carroll, 1948)
  - Subgrouping variable (Frederiksen & Melville, 1954)
  - Predictability variable (Ghiselli, 1956)
  - Referent variable (Toops, 1959)
  - Modifier variable (Grooms & Endler, 1960)
  - Homologizer variable (Johnson, 1966)
4Importance of Interaction Effects: Theory
- Going beyond main effects
- We typically say "it depends"
- More complex models
- "If we want to know how well we are doing in the biological, psychological, and social sciences, an index that will serve us well is how far we have advanced in our understanding of the moderator variables of our field" (Hall & Rosenthal, 1991, p. 447)
5Importance of Interaction Effects: Practice
- For example, personnel selection
- Test bias: The relationship between a test and a criterion depends on gender or ethnicity
- No bias exists if the regression equations relating the test and the criterion are indistinguishable for the groups in question (Standards, 1999, p. 79)
- In other words, bias exists when the X-Y relationship differs depending on the value of Z (e.g., 1 = Female, 0 = Male)
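6Illustration of Gender as a Moderator in Personnel Selection
[Figure: regression lines of Job Performance (Y) on Test Scores (X) for Women, Men, and a Common line, with predicted values Ywomen, Ycommon, and Ymen marked at a given test score X]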
7Importance of Interaction Effects: Practice
- Management in General
- Does an intervention work similarly well for, for example, Cantonese and American employees working in Hong Kong? (categorical moderator)
- Example: Performance management system regarding teaching at a university in Hong Kong. Would the same evaluation methods lead to similar employee (i.e., faculty) satisfaction, or does this depend on the national origin of faculty members?
8Estimating Interaction Effects
- Moderated Multiple Regression (MMR)
- $Y = a + b_1 X + b_2 Z + b_3 XZ$,
- where Y = criterion (continuous variable)
- X = predictor (typically continuous)
- Z = moderator (continuous or categorical)
- XZ = product term carrying information about the moderating effect (i.e., interaction between X and Z)
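The MMR model can be estimated by ordinary least squares once the product term XZ is added as a column of the design matrix. The following is a minimal sketch on simulated data, assuming NumPy is available. The function name `fit_mmr`, the sample size, and the population coefficients are illustrative assumptions, not values from the presentation.

```python
import numpy as np

def fit_mmr(x, z, y):
    """Estimate Y = a + b1*X + b2*Z + b3*XZ by ordinary least squares.

    Returns the coefficient vector [a, b1, b2, b3].
    """
    x, z, y = (np.asarray(v, dtype=float) for v in (x, z, y))
    # Design matrix columns: intercept, predictor, moderator, product term.
    design = np.column_stack([np.ones_like(x), x, z, x * z])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

# Simulated example (hypothetical population values).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)              # continuous predictor
z = rng.integers(0, 2, size=n)      # binary moderator, dummy coded 0/1
y = 1.0 + 0.5 * x + 0.3 * z + 0.4 * x * z + rng.normal(scale=0.5, size=n)

a, b1, b2, b3 = fit_mmr(x, z, y)
print(f"a={a:.2f}, b1={b1:.2f}, b2={b2:.2f}, b3={b3:.2f}")
```

With a dummy-coded Z, the estimate of b3 is the difference between the two groups' slopes of Y on X.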
9Statistical Significance Test
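- Model 1: $Y = a + b_1 X + b_2 Z$
- Model 2: $Y = a + b_1 X + b_2 Z + b_3 XZ$
- $H_0$: Models 1 and 2 explain the same proportion of variance in Y in the population
- $H_0: \beta_3 = 0$ (using a t-statistic)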
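The two ways of testing the moderating effect, comparing the models with and without the product term and testing the product-term coefficient directly, can be sketched as follows. This is an illustration under stated assumptions: it uses NumPy, simulated data with made-up population values, and hypothetical helper names (`ols`, `moderator_tests`). Because only one term is added, the F statistic for the change in R² equals the square of the t statistic for b3.

```python
import numpy as np

def ols(design, y):
    """Return coefficients, residual sum of squares, and residual df."""
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coefs
    return coefs, float(resid @ resid), len(y) - design.shape[1]

def moderator_tests(x, z, y):
    """Compare the model without XZ to the model with XZ."""
    x, z, y = (np.asarray(v, dtype=float) for v in (x, z, y))
    ones = np.ones_like(x)
    m1 = np.column_stack([ones, x, z])          # no product term
    m2 = np.column_stack([ones, x, z, x * z])   # with product term
    _, sse1, _ = ols(m1, y)
    coefs2, sse2, df2 = ols(m2, y)
    sst = float(((y - y.mean()) ** 2).sum())
    r2_1, r2_2 = 1 - sse1 / sst, 1 - sse2 / sst
    # F test for the change in R^2 when one term (XZ) is added.
    f_change = (r2_2 - r2_1) / ((1 - r2_2) / df2)
    # t test for b3: estimate divided by its standard error.
    cov = (sse2 / df2) * np.linalg.inv(m2.T @ m2)
    t_b3 = coefs2[3] / np.sqrt(cov[3, 3])
    return {"r2_model1": r2_1, "r2_model2": r2_2,
            "f_change": f_change, "t_b3": t_b3, "df": (1, df2)}

# Simulated data (hypothetical population values).
rng = np.random.default_rng(42)
n = 300
x = rng.normal(size=n)
z = rng.integers(0, 2, size=n)
y = 0.5 * x + 0.2 * z + 0.3 * x * z + rng.normal(size=n)

res = moderator_tests(x, z, y)
print(res)
```

The statistics would then be compared with critical values of the F distribution with (1, N - 4) degrees of freedom or the t distribution with N - 4 degrees of freedom.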
10Estimating Interaction Effects Using Moderated
Multiple Regression (MMR)
- For example:
- Personnel selection: Y = measure of performance, X = test score, Z = gender
- Additional research areas: training, turnover, performance appraisal, return on investment, mentoring, self-efficacy, job satisfaction, organizational commitment, and career development, among others
11Interpreting Interactions (Z is continuous)
- $Y = a + b_1 X + b_2 Z + b_3 XZ$,
- $b_3 = 2$ means that a one-unit increase in X (Z) increases the slope of Y on Z (Y on X) by 2 points
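This interpretation follows from regrouping the terms of the MMR equation. A short derivation, using only the symbols already defined:

```latex
\begin{align*}
Y &= a + b_1 X + b_2 Z + b_3 XZ \\
  &= (a + b_2 Z) + (b_1 + b_3 Z)\,X
\end{align*}
```

The slope of Y on X at a given value of Z is therefore $b_1 + b_3 Z$. Moving from Z to Z + 1 changes that slope by exactly $b_3$, so with $b_3 = 2$ the slope rises by 2. Grouping the terms around Z instead gives the symmetric statement for the slope of Y on Z.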
12Interpreting Interactions (Z is binary, dummy coded)
- $Y = a + b_1 X + b_2 Z + b_3 XZ$,
- $b_3$ = estimated difference in the slope of Y on X between the group coded as 1 and the group coded as 0.
- $b_2$ = estimated difference in Y scores between a member of the group coded as 1 and a member of the group coded as 0, assuming the scores on X are 0.
- $b_1$ = estimated slope of Y on X for members of the group coded as 0.
- $a$ = estimated Y score for members of the group coded as 0, assuming the scores on X are 0.
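Substituting the two dummy codes into the equation gives one regression line per group, which makes the role of each coefficient explicit:

```latex
\begin{align*}
Z = 0:&\quad Y = a + b_1 X \\
Z = 1:&\quad Y = (a + b_2) + (b_1 + b_3)\,X
\end{align*}
```

Here $a$ and $b_1$ are the intercept and slope for the group coded 0, while $b_2$ and $b_3$ are the differences in intercept and slope for the group coded 1.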
13Pervasive Use of MMR in the Organizational
Sciences
- Recent review: MMR was used in over 600 attempts to detect moderating effects of categorical variables in AMJ, JAP, and PP between 1977-1998 (Aguinis, Beaty, Boik, & Pierce, 2005, JAP)
14Selected Research on MMR
- Aguinis (2004, Regression Analysis for Categorical Moderators, Guilford Press)
- Aguinis, Beaty, Boik, and Pierce (2005, J. of Applied Psychology)
- Aguinis, Boik, and Pierce (2001, Organizational Research Methods)
- Aguinis, Petersen, and Pierce (1999, Organizational Research Methods)
- Aguinis and Pierce (1998, Organizational Research Methods)
- Aguinis and Pierce (1998, Ed. Psychological Measurement)
- Aguinis and Stone-Romero (1997, J. of Applied Psychology)
- Aguinis, Bommer, and Pierce (1996, Ed. Psychological Measurement)
- Aguinis (1995, J. of Management)
15Methodology: Monte Carlo Simulations
- Research question: Does MMR do a good job at estimating moderating effects?
- Difficulty: We don't know the population
- Solution: Monte Carlo methodology
  - Create a population
  - Generate random samples
  - Perform MMR analyses on samples
  - Compare population versus samples
  - Assess hits and misses
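The Monte Carlo steps listed above can be sketched in a few lines. This is an illustration only, assuming NumPy: the population model, its coefficients, the sample size, the number of samples, and the function names are all hypothetical, and the critical value 1.96 is a large-sample normal approximation to the t critical value.

```python
import numpy as np

def mmr_t_b3(x, z, y):
    """t statistic for the product-term coefficient b3 in the MMR model."""
    design = np.column_stack([np.ones_like(x), x, z, x * z])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coefs
    df = len(y) - design.shape[1]
    cov = (resid @ resid / df) * np.linalg.inv(design.T @ design)
    return coefs[3] / np.sqrt(cov[3, 3])

def rejection_rate(b3, n, n_samples=1000, crit=1.96, seed=0):
    """Proportion of random samples in which H0: b3 = 0 is rejected."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_samples):
        # Draw a random sample from a population with a known b3.
        x = rng.normal(size=n)
        z = rng.integers(0, 2, size=n).astype(float)
        y = 0.3 * x + 0.2 * z + b3 * x * z + rng.normal(size=n)
        # Run MMR on the sample and record the decision.
        if abs(mmr_t_b3(x, z, y)) > crit:
            rejections += 1
    return rejections / n_samples

# Moderator present in the population: rejections are hits (power).
power = rejection_rate(b3=0.3, n=200)
# No moderator in the population: rejections are false alarms (Type I errors).
type1 = rejection_rate(b3=0.0, n=200)
print(f"power={power:.2f}, Type I error rate={type1:.2f}")
```

Because the population value of b3 is set by the simulator, each sample-level decision can be scored as a hit or a miss, which is not possible with field data.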
16Problems with MMR
- We don't find moderators
- If we find them, they are small
- Why should we care?
  - Theory: Failure to find support for correct hypotheses (derailment of theory advancement process; model misspecification)
  - Practice: Erroneous decision making (e.g., over- and under-prediction of performance, implementation of ineffective interventions)
  - Ethical implications
  - Legal implications
17Some Culprits for Erroneous Estimation of
Moderating Effects
- Small total sample size
- Unequal sample size across moderator-based groups
- Range restriction (i.e., truncation) in predictor variable X
- Scale coarseness
- Violation of homogeneity of error variance assumption
- Unreliability of measurement
- Artificial dichotomization/polychotomization of continuous variables
- Interactive effects
18Unequal Sample Size Across Moderator-based
Subgroups
- Applies to categorical moderators (e.g., gender, national origin)
- In many research situations, n1 ≠ n2
- Two studies examined this issue (Aguinis & Stone-Romero, 1997; Stone-Romero, Alliger, & Aguinis, 1994) (see also Aguinis, 1995)
- Conclusion: n1 needs to be (.3 × n2) or larger to detect medium moderating effects
19Truncation in Predictor X
- Non-random sampling
- Pervasive in field settings (systematic in personnel selection/test validation research, (X, Y | X > x))
- Aguinis and Stone-Romero (1997) (categorical moderator); McClelland and Judd (1993) (continuous moderator)
- Truncation has a dramatic impact on power
  - N = 300, medium moderating effect: power = .81
  - Same conditions, truncation = .80: power = .51
- Conclusion: Even mild levels of truncation can have a substantial detrimental effect on power
20Violation of Homogeneity of Error Variance
Assumption
- Applies to categorical moderators
- Error variance (the variance in Y that remains after predicting Y from X) is assumed to be equal across subgroups (e.g., women, men)
- Distinct from homoscedasticity assumption
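For a subgroup k, the error variance in question can be written in terms of that subgroup's criterion variance and predictor-criterion correlation. The subscripted symbols below are notation assumed here for illustration:

```latex
\sigma^2_{e(k)} = \sigma^2_{Y(k)}\left(1 - \rho^2_{XY(k)}\right)
```

The assumption holds when this quantity is the same in every subgroup, for example $\sigma^2_{e(\text{women})} = \sigma^2_{e(\text{men})}$. Two groups can differ in it either because their Y variances differ or because X predicts Y better in one group than in the other.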
21Regression of Homoscedastic Data
[Figure: panels for Total Sample, Women, and Men]
22Regression for Subgroups
[Figure: panels for Women and Men]
23Artificial polychotomization of continuous variables
- Median split and other common methods for simplifying the data before conducting ANOVAs
- Cohen (1983) showed this practice is inappropriate
- In the context of MMR, some have used a median split procedure on continuous predictor Z and compared correlations across groups
- MMR always performs better than comparing artificially-created subgroups (Stone-Romero & Anderson, 1994)
- Conclusion: Do not polychotomize truly continuous predictors
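The information lost through a median split can be seen in a small simulation. The sketch below is an illustration of the general point about dichotomization, not a reproduction of any cited study: it uses only the Python standard library, and the variable names, sample size, and population slope are made up. For normally distributed data, a median split is expected to shrink a correlation to roughly 0.80 of its original size.

```python
import random
from math import sqrt

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    return sab / sqrt(saa * sbb)

def median_split(values):
    """Recode a continuous variable as 0/1 at its median."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 0:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    else:
        median = ordered[mid]
    return [1 if v > median else 0 for v in values]

random.seed(1)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]        # continuous variable
y = [0.5 * zi + random.gauss(0, 1) for zi in z]   # criterion related to z

r_continuous = pearson_r(z, y)
r_split = pearson_r(median_split(z), y)
print(f"continuous r={r_continuous:.3f}, median-split r={r_split:.3f}")
```

The drop in the correlation reflects discarded information about Z, which is one reason to keep a truly continuous variable continuous.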
24Interactions Among Artifacts
- Concurrent manipulation of truncation, N, n1 and n2, and moderating effect magnitude (Aguinis & Stone-Romero, JAP, 1997).
- Results: Methodological artifacts have interactive effects on power.
- Even if conditions are conducive to high power regarding one factor (e.g., N), unfavorable conditions regarding other factors (e.g., truncation) will lead to low power.
- Conclusion: Relying on a single strategy (e.g., increase N) to improve power will not be successful if other methodological and statistical artifacts
25Aguinis, Beaty, Boik, & Pierce (2005, JAP)
- Q1: What is the size of observed moderating effects of categorical variables in published research?
- Q2: What would the size of moderating effects of categorical variables be in published research under conditions of perfect reliability?
- Q3: What is the a priori power of MMR to detect moderating effects of categorical variables in published research?
- Q4: Do MMR tests reported in published research have sufficient statistical power to detect moderating effects conventionally defined as small, medium, and large?
26Method
- Review of all articles published from 1969 to 1998 in Academy of Management Journal (AMJ), Journal of Applied Psychology (JAP), and Personnel Psychology (PP)
- Criteria for study inclusion:
  - At least one MMR analysis
  - The MMR analysis included a continuous criterion Y, a continuous predictor X, and a categorical moderator Z
27Effect Size and Power Computation
- Total of 636 MMR analyses
- Moderator sample sizes available for 507 (79.72%)
- Moderator group sample sizes and predictor-criterion rs available for 261 (41.04%)
- Effect sizes and power computation based on 261 MMR analyses for which ns and rs were available. We used SD information when available, and assumed homogeneity of error variance when this information was not available
28Results (I)
- Frequency of MMR Use over Time
29Q1: Size of Observed Effects (I)
- Effect size metric: $f^2$
- Median $f^2$ = .002
- Mean (SD) = .009 (.025)
- 95% CI = .0089 to .0091
- 25th percentile = .0004
- 75th percentile = .0053
- Effect size values over time: r(261) = .15, p < .05
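For orientation, the conventional definition of f² for a moderating effect compares the R² of the model with the product term against the R² of the model without it. The sketch below assumes that conventional (Cohen) definition; the computation used in the reviewed study may differ, for instance by adjusting for unequal error variances. The function name and example values are hypothetical.

```python
def f_squared(r2_without_product, r2_with_product):
    """Conventional f^2 for the variance explained by the product term.

    f^2 = (R2_with - R2_without) / (1 - R2_with)
    """
    if not 0 <= r2_without_product <= r2_with_product < 1:
        raise ValueError("expected 0 <= R2 without <= R2 with < 1")
    return (r2_with_product - r2_without_product) / (1 - r2_with_product)

# Hypothetical example: adding XZ raises R^2 from .200 to .202.
example = f_squared(0.200, 0.202)
print(f"f^2 = {example:.4f}")
```

On this metric, an f² of .002 means the product term adds very little explained variance relative to what remains unexplained.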
30Q1: Size of Observed Effects (II)
                 AMJ (k = 6)    JAP (k = 236)   PP (k = 19)
Mean (SD)        .040 (.047)    .007 (.024)     .017 (.025)
Median           .025           .002            .006
- F(2, 258) = 4.97, p = .008, $\eta^2$ = .04
- Tukey HSD tests: AMJ > JAP and PP > JAP
31Q1: Size of Observed Effects (III)
                 Gender (k = 63)   Ethnicity (k = 45)   Other (k = 153)
Mean (SD)        .005 (.011)       .002 (.002)          .013 (.031)
Median           .002              .001                 .002
- F(2, 258) = 8.71, p < .001, $\eta^2$ = .06
- Tukey HSD tests: Other > Ethnicity
32Q1: Size of Observed Effects (IV)
                 Personnel Selection (k = 20)   Other (k = 241)
Mean (SD)        .010 (.023)                    .009 (.025)
Median           .001                           .002
- t(259) = -.226, ns
                 Work Attitudes (k = 96)        Other (k = 165)
Mean (SD)        .005 (.015)                    .011 (.029)
Median           .002                           .002
- t(259) = -0.95, ns
33Q2: Construct-level Effects (I)
- Median $f^2$ = .003
  - Increase of .001 over median observed effect size
- Mean = .017
  - Increase of .008 over mean observed effect size
34Q3: Statistical Power (I)
35Q3: Statistical Power (II)
36Q4: Power to Detect Small, Medium, and Large Effects
- Small $f^2$ (.02): mean power = .84; 72% of tests would have a power of .80 or higher
- Medium $f^2$ (.15): mean power = .98
- Large $f^2$ (.35): mean power = 1.0
37Some Conclusions
- We expected effect size to be small, but not so small (i.e., median of .002)
- Computation of construct-level effect sizes did not improve things by much (i.e., median of .003)
- More encouraging results:
  - None of the 95% CIs around the mean effect size for the various comparisons included zero
  - Effect sizes have increased over time
  - Given the observed sample sizes, mean power is sufficient to detect effects of .02
  - 72% of studies had sufficient power to detect an effect of .02
38Some Implications
- Are theories in dozens of research domains incorrect in hypothesizing moderators?
- Are hundreds of researchers in dozens of disparate domains wrong and population moderating effects so small?
- Could be, but... more likely, methodological artifacts decrease the observed effect sizes substantially vis-à-vis their population counterparts
- More attention needs to be paid to design and analysis issues that decrease observed effect sizes
- Conventional definitions of effect size ($f^2$) for moderators should probably be revised
39The Now What Question
- Before data are collected:
  - Larger sample size
  - More reliable measures
  - Avoid truncated samples
  - Use non-coarse scales (e.g., program by Aguinis, Bommer, & Pierce, 1996, Ed. Psych. Measurement)
  - Equalize sample size across moderator-based subgroups
  - Use computer programs in the public domain to estimate sample size needed for desired power level
  - Gather information on research design trade-offs
- Easier said than done!
40Tools to Improve Moderating Effect Estimation
(Aguinis, 2004)
- Scale coarseness
  - Aguinis, Bommer, and Pierce (1996, Educational and Psychological Measurement)
- Homogeneity of error variance
  - Aguinis, Petersen, and Pierce (1999, Organizational Research Methods)
- Power estimation and research design trade-offs
  - Aguinis, Pierce, and Stone-Romero (1994, Educational and Psychological Measurement)
  - Aguinis and Pierce (1998, Educational and Psychological Measurement)
  - Aguinis, Boik, and Pierce (2001, Organizational Research Methods)
41Assessment of Assumption Compliance
- DeShon and Alexander's (1996) 1.5 rule of thumb
- Bartlett's homogeneity test
- Test statistic M, where:
  - k = number of sub-groups
  - $n_k$ = number of observations in each sub-group
  - $s^2$ = sub-group variance on the criterion
  - v = degrees of freedom on which $s^2$ is based
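Both checks can be sketched with the standard library alone. The code below assumes the standard textbook form of Bartlett's statistic with its usual small-sample correction, and reads the 1.5 rule of thumb as flagging a ratio of largest to smallest subgroup error variance above 1.5. Function names and input values are hypothetical, and the exact expression shown on the slide may differ in notation.

```python
from math import log

def error_variance_ratio(variances):
    """Largest subgroup variance divided by the smallest."""
    return max(variances) / min(variances)

def bartlett_m(variances, dfs):
    """Bartlett's statistic with the usual small-sample correction.

    variances: subgroup variances s2_k
    dfs: degrees of freedom v_k on which each s2_k is based
    """
    k = len(variances)
    total_df = sum(dfs)
    # Pooled variance: df-weighted average of the subgroup variances.
    pooled = sum(v * s2 for v, s2 in zip(dfs, variances)) / total_df
    raw = total_df * log(pooled) - sum(v * log(s2) for v, s2 in zip(dfs, variances))
    correction = 1 + (sum(1 / v for v in dfs) - 1 / total_df) / (3 * (k - 1))
    return raw / correction

# Hypothetical two-group example.
variances = [1.0, 4.0]
dfs = [50, 50]
ratio = error_variance_ratio(variances)
m = bartlett_m(variances, dfs)
print(f"variance ratio={ratio:.2f}, Bartlett M={m:.2f}")
```

Under the null hypothesis of equal variances, M is referred to a chi-square distribution with k - 1 degrees of freedom (critical value 3.841 for two groups at the .05 level).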
42Homogeneity is not Met... Now What?
- Use alternatives to MMR:
  - Alexander and colleagues' normalized-t approximation
  - OR James's second-order approximation
44Program ALTMMR
- Calculates:
  - Error variance ratio (highest if more than 2 subgroups)
  - Bartlett's M
  - James's J
  - Alexander's A
- Uses sample descriptive data:
  - $n_k$, $s_x$, $s_y$, $r_{xy}$
- User sets p = .05 or .01 (for all but James's statistic)
45Program ALTMMR
- Described in detail in Aguinis (2004)
- Available at www.cudenver.edu/haguinis/ (click on MMR icon on left side of page)
- Executable on-line or locally
46Power Estimation
- Program POWER
  - Aguinis, Pierce, and Stone-Romero (1994, Ed. Psych. Measurement)
- Program MMRPWR
  - Aguinis and Pierce (1998, Ed. Psych. Measurement)
- Program MMRPOWER
  - Aguinis, Boik, and Pierce (2001, Organizational Research Methods)
47Program MMRPOWER
- Problems/Challenges regarding POWER and MMRPWR programs:
  - Based on extrapolation from simulations: range of values is limited
  - Absence of factors known to affect power of MMR (e.g., unreliability)
- Theoretical approximation to power
48Program MMRPOWER
- Described in detail in Aguinis (2004)
- Available at www.cudenver.edu/haguinis/ (click on MMR icon on left side of page)
- Executable on-line or locally
50Some Conclusions
- Observed moderating effects are very small
- MMR is a low power test for detecting effect sizes as typically observed
- Researchers are not aware of problems with MMR
- Implications for theory and practice
- User-friendly programs are available and allow researchers to improve moderating effect estimation
- Using these tools will allow researchers to make more informed decisions regarding the operation of moderating effects