Title: Analysis of Variance
1. Analysis of Variance
2. One-Factor (One-Way) ANOVA
- ANOVA represents a set of techniques designed to investigate relationships by testing group differences. If group differences are found, we can say that there is a relationship between the IV and the DV.
- This assumes you have had some previous exposure to ANOVA.
- Path model
- Surprise! What does the path model look like?
3. Doing Normal Statistics
One-way ANOVA (dummy coding, IV is nm)
[Path diagram: x1, x2, x3 -> Y (m)]
5. Anova with 2 Groups
- Anova can be used to test group differences among 2 or more groups.
- The independent t test and the 2-group Anova are equivalent procedures: F = t^2.
- The 7-step procedure is the same except for the use of a different test statistic (F) and a correspondingly different rejection rule and critical value.
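The equivalence F = t^2 can be checked numerically. The following is a minimal sketch in plain Python, using the digits scores for the two groups from the SPSS data block on slide 7; the function names are illustrative and are not part of the original slides.

```python
from math import sqrt

# Digits scores for Grp 1 and Grp 2, copied from the SPSS data block on slide 7.
grp1 = [5, 6, 7, 4, 4, 3, 5, 7, 7, 6]
grp2 = [7, 7, 8, 6, 7, 7, 8, 5, 6, 7]


def mean(xs):
    return sum(xs) / len(xs)


def ss(xs):
    """Sum of squared deviations from the group mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def pooled_t(a, b):
    """Independent-samples t statistic using the pooled variance."""
    df = len(a) + len(b) - 2
    pooled_var = (ss(a) + ss(b)) / df
    se = sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se


def oneway_f(*groups):
    """One-way ANOVA F statistic: MS between divided by MS within."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(ss(g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


t = pooled_t(grp1, grp2)
f = oneway_f(grp1, grp2)
print(f"t = {t:.4f}, t squared = {t * t:.4f}, F = {f:.4f}")
```

The equality holds for the pooled-variance t test; the unequal-variance (Welch) form of t does not in general square to the ANOVA F.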
6. Analyses
- Runs for independent t, 2-group anova, and a surprise.
- Compare outputs
- Path model for 2-group situation
- Anova model: X = u + A(i) + e
- Another familiar model: Y = B0 + B1X1 + e
- (Compare models and terms. Substitute values for x1.)
- Overview conclusion
- Given indicator coding for grpi where 0 = Alc and 1 = Placebo, how do you interpret the intercept and slope of the regression of digits on grpi?
7. (SPSS syntax)
Data list Free / Grp digits.
Begin data
1 5 1 6 1 7 1 4 1 4 1 3 1 5 1 7 1 7 1 6
2 7 2 7 2 8 2 6 2 7 2 7 2 8 2 5 2 6 2 7
End data.
T-TEST GROUPS=Grp(1 2)
 /VARIABLES=digits
 /CRITERIA=CI(.90) .
EXAMINE VARIABLES=digits BY Grp
 /PLOT=BOXPLOT /STATISTICS=NONE /NOTOTAL.
* Now run the 2 sample problem using oneway anova.
UNIANOVA digits BY Grp
 /METHOD = SSTYPE(3)
 /INTERCEPT = INCLUDE
 /PLOT = PROFILE( Grp )
 /CRITERIA = ALPHA(.05)
 /DESIGN = Grp .
* Recode Grp as indicator variable.
Recode grp (1=0) (2=1) into grpi .
* Warning: recoding a variable into the same variable will destroy the original.
list /cases=to 12.
Execute.
* scatterplot.
GRAPH /SCATTERPLOT(BIVAR)=grpi WITH digits /MISSING=LISTWISE .
GRAPH /SCATTERPLOT(BIVAR)=grpi WITH digits /MISSING=LISTWISE .
GRAPH /BAR(SIMPLE)=MEAN(digits) BY grpi .
REGRESSION
 /DESCRIPTIVES MEAN STDDEV CORR SIG N
 /STATISTICS COEFF OUTS R ANOVA
 /DEPENDENT digits
 /METHOD=ENTER Grpi
 /SCATTERPLOT=(*SDRESID ,*ZPRED ) .
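To see what the regression of digits on grpi estimates, the sketch below fits the simple regression by hand in plain Python. It uses the digits scores from the data block above and the recode from the program (Grp 1 becomes 0 = Alc, Grp 2 becomes 1 = Placebo); the helper name simple_regression is illustrative only.

```python
# Digits scores from the SPSS data block above, split by the recoded indicator:
# grpi = 0 for Grp 1 (Alc) and grpi = 1 for Grp 2 (Placebo).
alc = [5, 6, 7, 4, 4, 3, 5, 7, 7, 6]
placebo = [7, 7, 8, 6, 7, 7, 8, 5, 6, 7]

grpi = [0] * len(alc) + [1] * len(placebo)
digits = alc + placebo


def simple_regression(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = simple_regression(grpi, digits)
mean_alc = sum(alc) / len(alc)
mean_placebo = sum(placebo) / len(placebo)

print(f"intercept = {intercept:.2f} (mean of the group coded 0 = {mean_alc:.2f})")
print(f"slope     = {slope:.2f} (difference in means = {mean_placebo - mean_alc:.2f})")
```

With indicator coding, the intercept is the mean of the group coded 0 and the slope is the difference between the two group means, so the t test of the slope is the same test as the 2-group comparison.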
8. Regression and ANOVA
- Anova is really a special case of multiple regression. (The underlying matrix-algebra-based computational formulas are identical.)
- The grouping variables are identified using special dummy variable coding schemes.
- Two groups, as we saw, use one variable coded 0 or 1 depending on the group.
- Three groups require 2 variables.
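The rule that k groups need k - 1 coded variables can be sketched in a few lines of Python. The two functions below are hypothetical helpers written for this illustration (they are not SPSS or SAS features); the group labels are the ones used in this deck's examples. Effect coding is included as the common alternative in which the reference group is coded -1 on every variable.

```python
def indicator_codes(levels, reference):
    """Map each level to k-1 indicator (0/1) codes; the reference level is all zeros."""
    others = [lv for lv in levels if lv != reference]
    return {lv: [int(lv == other) for other in others] for lv in levels}


def effect_codes(levels, reference):
    """Same as indicator coding, except the reference level is -1 on every variable."""
    others = [lv for lv in levels if lv != reference]
    return {
        lv: [-1] * len(others) if lv == reference else [int(lv == other) for other in others]
        for lv in levels
    }


two_groups = indicator_codes(["Alc", "Placebo"], reference="Alc")
three_groups = indicator_codes(["Placebo", "LoVC", "HiVC"], reference="Placebo")
three_effect = effect_codes(["Placebo", "LoVC", "HiVC"], reference="Placebo")

print(two_groups)    # one coded variable for two groups
print(three_groups)  # two coded variables for three groups
print(three_effect)
```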
9. Dummy Variables (and the Dummies who use them)
- White is the reference category
- By use of indicator coding (shown) and effect coding (not shown), the levels of one or more IVs can be included as predictors in a regression model
- By getting R^2 values for full and reduced models, one can get all the needed sums of squares to get the terms in an ANOVA source table
- In terms of underlying models, regression and ANOVA are the same
- All are subsumed under THE GENERAL LINEAR MODEL
- Which is actually a big multivariate multiple regression model
- ANOVA techniques and formulas came about because people like Fisher were thinking in terms of agricultural treatments applied to plots, and using regression models was not convenient, so special ANOVA methods were developed
- There are some who think that the regression approach is better and have made it easier!!
- One is guess who?
11.
- This Week's Citation Classic
- http://www.garfield.library.upenn.edu/classics1982/A1982PB23900001.pdf
- The current status
- http://books.google.com/books?id=fuq94a8C0ioC&printsec=frontcover&dq=multiple+regression+cohen&client=firefox-a#PPP1,M1
- Keeping in mind that Anova and multiple regression are part of the same analytic model, we will revert to the traditional (and familiar) Anova methods
- We'll see the connection clearly again when we get to ANCOVA.
12. More Anova
- One factor with 3 or more levels, with post hoc tests
- Two or more factors, including interactions
- Repeated measures Anova
- Mixed Anova: between-subjects and within-subjects factors
- Analysis of Covariance
- We've already done this (almost).
- Add a covariate to Anova
- Actually just adding a continuous regression predictor
13. One-Way Anova
- Green, L25: Example concerning change from baseline in number of days of cold symptoms for three vitamin C groups: 1 = Placebo, 2 = LoVC, 3 = Hi dose VC.
- Treatment structure box
- Path diagram
- F test of omnibus H0 of no group differences
- Post hoc tests to find out which group has the most reduction in cold symptoms
- Regression approach
- SAS output for comparison
14. (SPSS syntax)
Data list free / group diff.
Begin data
1 12 1 -2 1 9 1 3 1 3 1 0 1 3 1 2 1 4 1 1
2 -2 2 -3 2 3 2 -2 2 0 2 -4 2 -3 2 5 2 -9 2 -6
3 6 3 -7 3 -6 3 -6 3 -6 3 -4 3 -2 3 -6 3 6 3 5
End data.
execute.
* Graph the data with boxplot.
EXAMINE VARIABLES=diff BY group
 /PLOT=BOXPLOT /STATISTICS=NONE /NOTOTAL.
UNIANOVA diff BY group
 /METHOD = SSTYPE(3)
 /INTERCEPT = INCLUDE
 /POSTHOC = group ( TUKEY LSD QREGW )
 /EMMEANS = TABLES(group)
 /PRINT = DESCRIPTIVE ETASQ HOMOGENEITY
 /CRITERIA = ALPHA(.05)
 /DESIGN = group .
* Run as regression using indicator vars.
Compute LoVC = 0.
If (group = 2) LoVC = 1.
Compute HiVC = 0.
If (group = 3) HiVC = 1.
Execute.
list.
REGRESSION
 /MISSING LISTWISE
 /STATISTICS COEFF OUTS R ANOVA ZPP
 /CRITERIA=PIN(.05) POUT(.10)
 /NOORIGIN
 /DEPENDENT diff
 /METHOD=ENTER LoVC HiVC
 /SCATTERPLOT=(*SDRESID ,*ZPRED )
 /RESIDUALS HIST(ZRESID) .
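The claim that the regression on indicator variables and the one-way Anova are the same analysis can be checked by hand. The sketch below, in plain Python, fits diff on LoVC and HiVC by least squares, forms F from R^2, and compares it with the Anova F from sums of squares. The data are copied from the data block above; the helper functions solve and ols are written for this illustration and stand in for what REGRESSION does internally.

```python
# diff scores copied from the data block above (group 1 = Placebo, 2 = LoVC, 3 = HiVC).
placebo = [12, -2, 9, 3, 3, 0, 3, 2, 4, 1]
lovc = [-2, -3, 3, -2, 0, -4, -3, 5, -9, -6]
hivc = [6, -7, -6, -6, -6, -4, -2, -6, 6, 5]
groups = [placebo, lovc, hivc]
y = placebo + lovc + hivc
# Design matrix rows [intercept, LoVC, HiVC], matching the COMPUTE/IF steps.
X = [[1, 0, 0]] * 10 + [[1, 1, 0]] * 10 + [[1, 0, 1]] * 10


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


n, k = len(y), len(groups)
grand = sum(y) / n

# Regression route: R^2, then F = (R^2 / df1) / ((1 - R^2) / df2).
b = ols(X, y)
fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_tot = sum((yi - grand) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot
f_reg = (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Anova route: F = MS between / MS within.
means = [sum(g) / len(g) for g in groups]
ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
f_anova = (ss_between / (k - 1)) / (ss_within / (n - k))

print("coefficients:", [round(v, 3) for v in b])
print(f"R^2 = {r2:.4f}, F (regression) = {f_reg:.4f}, F (anova) = {f_anova:.4f}")
```

The coefficients reproduce the group means: the intercept is the Placebo mean, and each slope is that group's mean minus the Placebo mean.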
15. (SAS program)
DATA VitC;
  INPUT group diff @@;  /* @@ reads several cases per data line */
DATALINES;
1 12 1 -2 1 9 1 3 1 3 1 0 1 3 1 2 1 4 1 1
2 -2 2 -3 2 3 2 -2 2 0 2 -4 2 -3 2 5 2 -9 2 -6
3 6 3 -7 3 -6 3 -6 3 -6 3 -4 3 -2 3 -6 3 6 3 5
;
Proc print data=vitc (obs=5); Run;
Proc boxplot data=vitc;
  plot diff*group;
run;
Proc Univariate data=vitc normal plot;  /* May be more than you want. */
  var diff;
  class group;
run;
Proc GLM data=vitc;  /* Green L25 vit c data. */
  class group;
  model diff = group / ss3;
  means group / hovtest lsd tukey regwq;
run; quit;
16. 2-Factor Anova
- Two IVs giving two main effects and an interaction.
- Interaction is usually the most interesting result
- Problems/complexity come in testing simple effects and groups to explain the interaction.
- Always use UNIANOVA in SPSS and GLM in SAS because these routines handle unbalanced designs correctly.
- Brief coverage
17. Green, L26: Gender, Note Taking, Freshman GPA
- Note-taking instructions are given daily at the start of spring semester with random assignment to Method 1, Method 2, and Control groups. DV is change from fall to spring GPA.
- Treatment structure box and path model
- Research Questions
- Main effects
- Are there gender differences in freshman GPA improvement, overall (regardless of treatment)?
- Are there differences in Method, regardless of gender?
- Interaction? Are the methods different in effectiveness for males vs. females?
18. Doing Normal Statistics
Two-way ANOVA (dummy coding, all IVs nm)
[Path diagram: x1, x2, and x1 x2 (marked "?") -> Y (m)]
19. Results
- J:\PSYCH\MARTY\4123\GreenSalkind5Dat\Lesson 26\Lesson 26 Data File 2.sav
- J:\PSYCH\MARTY\4123\AV2FGL26D2
- Data arrangement
- A graphic would be nice: 2x3 box for means and marginal means
- Comparison of outputs from SAS and SPSS
- Limited coverage
- Interaction testing deserves more time than we have
- Methods of probing interaction by testing appropriate cell differences are illustrated in both analyses.
- The general issue is simple effect tests and follow-ups
- These are covered in QM2
- SAS and SPSS programs follow
20. (SPSS syntax)
Data list free / gender method gpaimpr.
Begin data
1 1 .25 2 2 1.00 1 3 .10 1 1 .20 2 2 .50 1 3 .15
1 1 .30 2 2 .80 1 3 .30 1 1 .30 2 2 .60 1 3 .20
1 1 .50 2 2 .60 1 3 .10 1 1 .40 2 2 .50 1 3 .20
1 1 .80 2 2 .80 1 3 .30 1 1 .50 2 2 .60 1 3 .40
1 1 .10 2 2 .40 1 3 .00 1 1 .00 2 2 .60 1 3 -.10
2 1 .10 1 2 .30 2 3 -.10 2 1 .00 1 2 .20 2 3 .00
2 1 .00 1 2 .25 2 3 .10 2 1 .40 1 2 .00 2 3 .40
2 1 .50 1 2 .60 2 3 .25 2 1 .20 1 2 .50 2 3 .00
2 1 .00 1 2 .20 2 3 .10 2 1 .00 1 2 .10 2 3 .10
2 1 .30 1 2 .50 2 3 .20 2 1 .20 1 2 .40 2 3 .00
End data.
EXAMINE VARIABLES=gpaimpr BY method BY gender
 /PLOT=BOXPLOT /STATISTICS=NONE /NOTOTAL.
GRAPH /LINE(MULTIPLE)=MEAN(gpaimpr) BY method BY gender .
UNIANOVA gpaimpr BY gender method
 /METHOD = SSTYPE(3)
 /INTERCEPT = INCLUDE
 /POSTHOC = method ( TUKEY )
 /PLOT = PROFILE( gender*method )
 /EMMEANS = TABLES(gender)
 /EMMEANS = TABLES(method)
 /EMMEANS = TABLES(gender*method) compare (gender) ADJ(sidak)
 /EMMEANS = TABLES(gender*method) compare (method) ADJ(sidak)
 /PRINT = ETASQ OPOWER HOMOGENEITY
 /CRITERIA = ALPHA(.05)
 /DESIGN = gender method gender*method .
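The 2x3 box of cell means and marginal means, and the three F ratios, can be worked out by hand for these data. The sketch below is plain Python with the (gender, method, gpaimpr) triples copied from the data block above. It relies on the fact that this data set has equal n in every cell (the code checks this); the simple sums-of-squares formulas used here do not apply to unbalanced designs, which is where UNIANOVA and GLM are needed.

```python
from collections import defaultdict

# (gender, method, gpaimpr) triples copied from the data block above.
RAW = """
1 1 .25 2 2 1.00 1 3 .10 1 1 .20 2 2 .50 1 3 .15
1 1 .30 2 2 .80 1 3 .30 1 1 .30 2 2 .60 1 3 .20
1 1 .50 2 2 .60 1 3 .10 1 1 .40 2 2 .50 1 3 .20
1 1 .80 2 2 .80 1 3 .30 1 1 .50 2 2 .60 1 3 .40
1 1 .10 2 2 .40 1 3 .00 1 1 .00 2 2 .60 1 3 -.10
2 1 .10 1 2 .30 2 3 -.10 2 1 .00 1 2 .20 2 3 .00
2 1 .00 1 2 .25 2 3 .10 2 1 .40 1 2 .00 2 3 .40
2 1 .50 1 2 .60 2 3 .25 2 1 .20 1 2 .50 2 3 .00
2 1 .00 1 2 .20 2 3 .10 2 1 .00 1 2 .10 2 3 .10
2 1 .30 1 2 .50 2 3 .20 2 1 .20 1 2 .40 2 3 .00
"""
tok = RAW.split()
rows = [(int(tok[i]), int(tok[i + 1]), float(tok[i + 2])) for i in range(0, len(tok), 3)]


def mean(xs):
    return sum(xs) / len(xs)


cells = defaultdict(list)
for g, m, v in rows:
    cells[(g, m)].append(v)

genders = sorted({g for g, _, _ in rows})
methods = sorted({m for _, m, _ in rows})
n = len(cells[(genders[0], methods[0])])
# The formulas below are valid only when every cell has the same n.
assert all(len(v) == n for v in cells.values())

cell_mean = {key: mean(v) for key, v in cells.items()}
gender_mean = {g: mean([v for gg, _, v in rows if gg == g]) for g in genders}
method_mean = {m: mean([v for _, mm, v in rows if mm == m]) for m in methods}
grand = mean([v for _, _, v in rows])

# Balanced-design sums of squares.
ss_gender = n * len(methods) * sum((gender_mean[g] - grand) ** 2 for g in genders)
ss_method = n * len(genders) * sum((method_mean[m] - grand) ** 2 for m in methods)
ss_cells = n * sum((cm - grand) ** 2 for cm in cell_mean.values())
ss_inter = ss_cells - ss_gender - ss_method
ss_within = sum((v - cell_mean[(g, m)]) ** 2 for g, m, v in rows)
ss_total = sum((v - grand) ** 2 for _, _, v in rows)

df_gender = len(genders) - 1
df_method = len(methods) - 1
ms_within = ss_within / (len(rows) - len(cells))
f_gender = (ss_gender / df_gender) / ms_within
f_method = (ss_method / df_method) / ms_within
f_inter = (ss_inter / (df_gender * df_method)) / ms_within

# 2x3 box of cell means with marginal means.
print("gender   " + "  ".join(f"method {m}" for m in methods) + "  marginal")
for g in genders:
    line = "  ".join(f"{cell_mean[(g, m)]:8.3f}" for m in methods)
    print(f"{g:<8} {line}  {gender_mean[g]:8.3f}")
margins = "  ".join(f"{method_mean[m]:8.3f}" for m in methods)
print(f"{'marginal':<8} {margins}  {grand:8.3f}")
print(f"F gender = {f_gender:.3f}, F method = {f_method:.3f}, F interaction = {f_inter:.3f}")
```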
21. (SAS program)
DATA L262;
  INPUT gender method gpaimpr @@;  /* @@ reads several cases per data line */
  condition = catx('-', gender, method);  /* Creates condition variable coded for all groups. */
DATALINES;
1 1 .25 2 2 1.00 1 3 .10 1 1 .20 2 2 .50 1 3 .15
1 1 .30 2 2 .80 1 3 .30 1 1 .30 2 2 .60 1 3 .20
1 1 .50 2 2 .60 1 3 .10 1 1 .40 2 2 .50 1 3 .20
1 1 .80 2 2 .80 1 3 .30 1 1 .50 2 2 .60 1 3 .40
1 1 .10 2 2 .40 1 3 .00 1 1 .00 2 2 .60 1 3 -.10
2 1 .10 1 2 .30 2 3 -.10 2 1 .00 1 2 .20 2 3 .00
2 1 .00 1 2 .25 2 3 .10 2 1 .40 1 2 .00 2 3 .40
2 1 .50 1 2 .60 2 3 .25 2 1 .20 1 2 .50 2 3 .00
2 1 .00 1 2 .20 2 3 .10 2 1 .00 1 2 .10 2 3 .10
2 1 .30 1 2 .50 2 3 .20 2 1 .20 1 2 .40 2 3 .00
;
Proc print data=l262 (obs=5); Run;
Proc boxplot data=L262;  /* What's going on? */
  plot gpaimpr*condition;
run;
Proc sort; by condition; run;
Proc print data=l262 (obs=30); Run;
Proc boxplot data=L262;
  plot gpaimpr*condition;
run;
Proc GLM data=L262;  /* Green L26 D2 GPA Improvement data. */
  Title "Green L26 D2 GPA Improvement";
  class gender method;
  /* Assumed: the bar operator (gender | method) was lost in extraction; it gives both main effects plus the interaction. */
  model gpaimpr = gender | method / ss3;
  lsmeans gender | method / PDIFF ADJUST=SIDAK;
  /* Must use GLM and LSMeans for unbalanced data!! */
  /* Watch out. This will give all possible pairwise comparisons. What's the problem? */
run; quit;
/* Plot interactions. Based on CS, p. 216-217. First, get the needed means out, then plot them. */
Proc Means Data=L262 Nway noprint;  /* NWay restricts output to just the cell means. */
  Class gender method;
  Var gpaimpr;
  Output out=Means mean=M_GPAIMP;
run;
SYMBOL1 V=SQUARE COLOR=BLACK I=JOIN;
SYMBOL2 V=CIRCLE COLOR=BLACK I=JOIN;
PROC GPLOT DATA=Means;
  TITLE "INTERACTION PLOT";
  PLOT M_GpaImp*METHOD=GENDER;
  Plot M_GpaImp*Gender=Method;
RUN;
22. Analysis of Covariance
- Extension of the Anova model by adding a control variable to the model: a method of statistical control
- Conceptually similar to partial correlation in the sense that Anova is conducted on the IVs and DV after removing the part of the relationship predicted by the control variable (think about residuals)
- Green, L27
- Revisit the vitamin C example, except now we have PREDAYS, a measure of the base rate of cold symptoms the first year.
- The vitamin C treatment program is applied during the second year; the main DV is DAYS, the number of days with cold symptoms the second year.
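In regression terms, the ANCOVA for this example adds the covariate as one more predictor alongside the group indicators. The following is a sketch of that model, assuming the same indicator coding used in the one-way analysis (LoVC and HiVC, with Placebo as the reference group) and the B-coefficient notation of the earlier regression model; it is an illustration, not output from the course materials.

```latex
\[
\mathrm{DAYS} = B_0 + B_1\,\mathrm{LoVC} + B_2\,\mathrm{HiVC} + B_3\,\mathrm{PREDAYS} + e
\]
```

Here B_1 and B_2 are the differences from Placebo for cases with the same PREDAYS value (the adjusted group differences), and B_3 is the slope of DAYS on PREDAYS, which this model takes to be the same in all three groups.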
23. Doing Normal Statistics
ANCOVA
[Path diagram: X (IV) and Z (cov) -> Y (DV)]