Title: Doing ANOVA and t-tests
1 Doing ANOVA and t-tests
- LISA short course by Ciro Velasco-Cruz
2 ONE SAMPLE t TEST
Example: In a study, 15 lobsters were randomly selected from recent
catches along a certain region of the Maine shoreline. The lobsters
were weighed to the nearest ounce, with results:
26 14 18 13 22 15 24 21 29 10 12 31 19 16 21
Suppose that, for research purposes, the mean lobster weight needs to
equal 15 ounces. It is known that lobster weight is normally
distributed, with both mean and standard deviation unknown.
3 SAS for coding
- The data step
    data lobsters_w;
      input type weigth @@;
      datalines;
    1 26 1 14 1 18 1 13 1 22
    1 15 1 24 1 21 1 29 1 10
    1 12 1 31 1 19 1 16 1 21
    ;
4 SAS for coding
- Exploratory data analysis
    proc means data=lobsters_w mean std max min median;
      var weigth;
    run;
    proc boxplot data=lobsters_w;
      title 'BoxPlot for one sample t-test example';
      plot (weigth)*type / cframe   = vligb
                           cboxes   = dagr
                           cboxfill = ywh;
      inset mean max min / cfill  = white
                           header = "Summary"
                           ctext  = red;
    run;
5 SAS OUTPUT
The SAS System
The MEANS Procedure
Analysis Variable: weigth
Mean        Std Dev    Maximum     Minimum     Median
19.4000000  6.2655521  31.0000000  10.0000000  19.0000000
6 SAS OUTPUT
[Figure not reproduced: box plot produced by the PROC BOXPLOT step on the previous slide.]
7 SAS coding
- Data analysis
    proc ttest data=lobsters_w h0=15;
      title 'One sample t test example';
      var weigth;
    run;
8 SAS OUTPUT
One sample t test example
The TTEST Procedure
Statistics
Variable  N   Lower CL Mean  Mean  Upper CL Mean  Lower CL Std Dev  Std Dev  Upper CL Std Dev  Std Err  Minimum  Maximum
weigth    15  15.93          19.4  22.87          4.5872            6.2656   9.8814            1.6178   10       31
T-Tests
Variable  DF  t Value  Pr > |t|
weigth    14  2.72     0.0166
Conclusion: Since the p-value (0.0166) is < 0.05, we reject the null
hypothesis that the mean = 15 at the 5% level of significance.
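As a cross-check of the PROC TTEST output above, the t statistic can be recomputed by hand as t = (mean - 15) / (std dev / sqrt(n)). The sketch below does this in Python with the standard library only; Python is not part of the original SAS material, and the variable names are illustrative assumptions. It stops at the t statistic and degrees of freedom and does not compute the p-value.

```python
import math
import statistics

# Lobster weights (ounces) from the one sample t test example.
weights = [26, 14, 18, 13, 22, 15, 24, 21, 29, 10, 12, 31, 19, 16, 21]
mu0 = 15  # mean under the null hypothesis (h0=15 in PROC TTEST)

n = len(weights)
mean = statistics.mean(weights)
std_dev = statistics.stdev(weights)   # sample std dev, n - 1 in the denominator
std_err = std_dev / math.sqrt(n)      # standard error of the mean
t_value = (mean - mu0) / std_err
df = n - 1

print(f"mean={mean:.2f} std_dev={std_dev:.4f} std_err={std_err:.4f}")
print(f"t={t_value:.2f} with {df} degrees of freedom")
```

The values can be compared with the Mean, Std Dev, Std Err, t Value and DF columns of the SAS tables.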
9 Two Sample t-test example
An animal scientist is interested in comparing two different topical
treatments (A, B) against osteoarthritis in the leg joints of horses.
Seven horses with the illness are available at the animal clinic. For
each horse it is randomly determined which of the front legs receives
treatment A and which receives treatment B. After four weeks of
treatment, each horse's mobility is measured. Assuming that these are
two independent samples, we can perform a two-sample t test.
10 SAS data step
    data horses;
      input trt horse mobility @@;
      cards;
    1 1 48.2 1 2 44.6 1 3 49.7 1 4 40.5
    1 5 54.6 1 6 47.1 1 7 46.8 2 1 41.5
    2 2 40.1 2 3 44.0 2 4 41.2 2 5 49.8
    2 6 41.7 2 7 51.4
    ;
11 SAS E.D.A.
    proc means data=horses mean std max min median;
      class trt;
      var mobility;
    run;
    proc boxplot data=horses;
      title 'BoxPlot for two sample t-test example';
      plot (mobility)*trt / cframe   = vligb
                            cboxes   = dagr
                            cboxfill = ywh;
      insetgroup mean max min q1 q2 q3 / header = 'Summary by Treatment'
                                         ctext  = red;
    run;
12 SAS OUTPUT
The MEANS Procedure
Analysis Variable: mobility
trt  N Obs  Mean        Std Dev    Maximum     Minimum     Median
1    7      47.3571429  4.3523393  54.6000000  40.5000000  47.1000000
2    7      44.2428571  4.5199031  51.4000000  40.1000000  41.7000000
13 SAS OUTPUT
[Figure not reproduced: box plot produced by the PROC BOXPLOT step above.]
14 SAS t test
    proc ttest data=horses;
      title 'Two sample t test example';
      class trt;
      var mobility;
    run;
15 SAS OUTPUT
Two sample t test example
The TTEST Procedure
Statistics
Variable  trt         N  Lower CL Mean  Mean    Upper CL Mean  Lower CL Std Dev  Std Dev  Upper CL Std Dev  Std Err  Minimum  Maximum
mobility  1           7  43.332         47.357  51.382         2.8046            4.3523   9.5841            1.645    40.5     54.6
mobility  2           7  40.063         44.243  48.423         2.9126            4.5199   9.9531            1.7084   40.1     51.4
mobility  Diff (1-2)     -2.053         3.1143  8.2816         3.1816            4.4369   7.3242            2.3716
T-Tests
Variable  Method         Variances  DF  t Value  Pr > |t|
mobility  Pooled         Equal      12  1.31     0.2137
mobility  Satterthwaite  Unequal    12  1.31     0.2137
Equality of Variances
Variable  Method    Num DF  Den DF  F Value  Pr > F
mobility  Folded F  6       6       1.08     0.9293
16 Conclusion
- About variances: Since the p-value (0.9293) is larger than 5%, we do
  not reject the hypothesis that the variances are equal.
- About means: Since the p-value for this test (0.2137) is also larger
  than 5%, we do not reject the hypothesis that the means are equal.
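The pooled t statistic and the folded F statistic behind these conclusions can be recomputed from the raw data. The following is a minimal Python sketch (standard library only; not part of the original SAS material, names are illustrative) that treats the two treatments as independent samples, as the slides do at this point. It reproduces the test statistics only, not the p-values.

```python
import math
import statistics

# Mobility by treatment, in horse order 1..7, from the data step.
trt_a = [48.2, 44.6, 49.7, 40.5, 54.6, 47.1, 46.8]
trt_b = [41.5, 40.1, 44.0, 41.2, 49.8, 41.7, 51.4]

n1, n2 = len(trt_a), len(trt_b)
var1 = statistics.variance(trt_a)  # sample variances (n - 1 denominator)
var2 = statistics.variance(trt_b)

# Pooled (equal variances) two-sample t statistic.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = (statistics.mean(trt_a) - statistics.mean(trt_b)) / std_err
df_pooled = n1 + n2 - 2

# Folded F: larger sample variance over the smaller one.
folded_f = max(var1, var2) / min(var1, var2)

print(f"t={t_pooled:.2f} (df={df_pooled}), std_err={std_err:.4f}, folded F={folded_f:.2f}")
```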
17 Paired t test example
- Let's consider the last example. Since treatments A and B were both
  measured on the same horse, the measurements of mobility are not
  independent within horses. The right way to analyze the data is
  therefore a paired t test.
- Idea: we look at the difference between the responses from
  treatments A and B:
- $D_i = Y_{iA} - Y_{iB}$
18 SAS paired test
    proc ttest data=newhorses;
      paired MobilityA*MobilityB;
    run;
The SAS System
The TTEST Procedure
Statistics
Difference             N  Lower CL Mean  Mean    Upper CL Mean  Lower CL Std Dev  Std Dev  Upper CL Std Dev  Std Err  Minimum  Maximum
MobilityA - MobilityB  7  -0.729         3.1143  6.9571         2.6775            4.1551   9.1498            1.5705   -4.6     6.7
T-Tests
Difference             DF  t Value  Pr > |t|
MobilityA - MobilityB  6   1.98     0.0946
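The paired analysis is a one-sample t test on the within-horse differences D_i. A short Python sketch of that calculation follows (standard library only; not part of the original SAS material, and the list names are illustrative). It assumes the horse order 1..7 of the earlier data step, so that the values pair up by horse.

```python
import math
import statistics

# Mobility under treatments A and B, paired by horse (horses 1..7).
mobility_a = [48.2, 44.6, 49.7, 40.5, 54.6, 47.1, 46.8]
mobility_b = [41.5, 40.1, 44.0, 41.2, 49.8, 41.7, 51.4]

# D_i = Y_iA - Y_iB for each horse.
diffs = [a - b for a, b in zip(mobility_a, mobility_b)]

n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)
std_err = sd_d / math.sqrt(n)
t_paired = mean_d / std_err   # H0: mean difference = 0
df = n - 1

print(f"mean diff={mean_d:.4f} sd={sd_d:.4f} std_err={std_err:.4f}")
print(f"t={t_paired:.2f} with {df} degrees of freedom")
```

The mean difference is the same as in the two-sample analysis, but the standard error and the degrees of freedom differ because the test is now based on the seven differences.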
19 But why is it happening?
20 One Way ANOVA
An experiment was conducted to study the growth of plant tissue in the
presence of hormone solutions containing various growth-inhibiting
substances. For each solution, 10 independent tissue cultures were
prepared and the growth of the plant tissue was recorded in mm. This
experiment has one factor with 5 levels, each with 10 replications.
21 SAS data step
    data peasection;
      input trtmnt growth @@;
      label trtmnt = "1=Control 2=Sol.1 3=Sol.2 4=Mixture 5=Sol.3";
      datalines;
    1 7.84 1 8.69 1 8.11 1 8.35 1 7.74 1 7.69 1 7.98 1 7.64 1 8.57 1 8.32
    2 6.78 2 6.69 2 6.95 2 6.64 2 6.41 2 6.69 2 6.72 2 6.57 2 6.67 2 7.07
    3 6.79 3 6.79 3 6.79 3 6.61 3 6.43 3 6.69 3 6.57 3 6.49 3 7.05 3 6.72
    4 6.64 4 6.57 4 6.78 4 6.48 4 6.54 4 6.36 4 6.67 4 6.26 4 6.67 4 6.68
    5 7.31 5 7.65 5 7.26 5 7.39 5 6.98 5 7.46 5 7.32 5 7.13 5 7.07 5 7.25
    ;
22 SAS coding
    proc boxplot data=peasection;
      title 'BoxPlot for one-way ANOVA example';
      plot growth*trtmnt / cframe   = vligb
                           cboxes   = dagr
                           cboxfill = ywh;
      insetgroup mean stddev q1 q2 q3 / header = 'Summary by Treatment'
                                        ctext  = red;
    run;
23 SAS output
[Figure not reproduced: box plot produced by the PROC BOXPLOT step above.]
24 SAS glm anyway
    proc glm data=peasection;
      class trtmnt;
      model growth = trtmnt;
      lsmeans trtmnt / pdiff adjust=tukey;
      contrast 'our first contrast with contrast' trtmnt -1 0 -1 0 2;
      estimate 'our first contrast with estimate' trtmnt -1 0 -1 0 2;
      output out=residuals p=yhat r=res;
    run;
25 SAS output
The GLM Procedure
Dependent Variable: growth
Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model            4   16.11827200     4.02956800   74.32    <.0001
Error            45  2.43972000      0.05421600
Corrected Total  49  18.55799200
Source  DF  Type I SS    Mean Square  F Value  Pr > F
trtmnt  4   16.11827200  4.02956800   74.32    <.0001
Source  DF  Type III SS  Mean Square  F Value  Pr > F
trtmnt  4   16.11827200  4.02956800   74.32    <.0001
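The ANOVA table above splits the corrected total sum of squares into a between-treatment (Model) part and a within-treatment (Error) part, and F is the ratio of the two mean squares. A minimal Python sketch of that decomposition for the pea section data follows (standard library only; not part of the original SAS material, names are illustrative). It does not compute the p-value.

```python
import statistics

# Growth (mm) by treatment, transcribed from the data step.
growth = {
    1: [7.84, 8.69, 8.11, 8.35, 7.74, 7.69, 7.98, 7.64, 8.57, 8.32],
    2: [6.78, 6.69, 6.95, 6.64, 6.41, 6.69, 6.72, 6.57, 6.67, 7.07],
    3: [6.79, 6.79, 6.79, 6.61, 6.43, 6.69, 6.57, 6.49, 7.05, 6.72],
    4: [6.64, 6.57, 6.78, 6.48, 6.54, 6.36, 6.67, 6.26, 6.67, 6.68],
    5: [7.31, 7.65, 7.26, 7.39, 6.98, 7.46, 7.32, 7.13, 7.07, 7.25],
}

all_values = [y for ys in growth.values() for y in ys]
grand_mean = statistics.mean(all_values)
group_means = {trt: statistics.mean(ys) for trt, ys in growth.items()}

# Between-treatment (Model) and within-treatment (Error) sums of squares.
ss_model = sum(len(ys) * (group_means[trt] - grand_mean) ** 2
               for trt, ys in growth.items())
ss_error = sum((y - group_means[trt]) ** 2
               for trt, ys in growth.items() for y in ys)

df_model = len(growth) - 1
df_error = len(all_values) - len(growth)
f_value = (ss_model / df_model) / (ss_error / df_error)

print(f"SS model={ss_model:.5f} (df={df_model}), SS error={ss_error:.5f} (df={df_error})")
print(f"F={f_value:.2f}")
```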
26 SAS output
Least Squares Means
Adjustment for Multiple Comparisons: Tukey
trtmnt  growth LSMEAN  LSMEAN Number
1       8.09300000     1
2       6.71900000     2
3       6.69300000     3
4       6.56500000     4
5       7.28200000     5
Least Squares Means for effect trtmnt
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: growth
i/j  1       2       3       4       5
1            <.0001  <.0001  <.0001  <.0001
2    <.0001          0.9991  0.5812  <.0001
3    <.0001  0.9991          0.7346  <.0001
4    <.0001  0.5812  0.7346          <.0001
5    <.0001  <.0001  <.0001  <.0001
Contrast                          DF  Contrast SS  Mean Square  F Value  Pr > F
our first contrast with contrast  1   0.08214000   0.08214000   1.52     0.2248
Parameter                         Estimate     Standard Error  t Value  Pr > |t|
our first contrast with estimate  -0.22200000  0.18035964      -1.23    0.2248
Note that -(8.093 + 6.693) + 2(7.282) = -0.222
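The CONTRAST and ESTIMATE lines above can be rebuilt from the LS means and the error mean square alone. The sketch below (Python, standard library only; not part of the original SAS material) uses the usual formulas for a balanced one-way layout: estimate = sum of c_i * mean_i, standard error = sqrt(MSE * sum(c_i^2) / n), and contrast SS = estimate^2 / (sum(c_i^2) / n). The inputs are copied from the SAS output; the names are illustrative.

```python
import math

# LS means by treatment and the error mean square, from the GLM output.
ls_means = {1: 8.093, 2: 6.719, 3: 6.693, 4: 6.565, 5: 7.282}
mse = 0.054216
n_per_group = 10

# Contrast coefficients used in the CONTRAST / ESTIMATE statements.
coeffs = {1: -1, 2: 0, 3: -1, 4: 0, 5: 2}

estimate = sum(coeffs[trt] * ls_means[trt] for trt in ls_means)
scale = sum(c ** 2 for c in coeffs.values()) / n_per_group
std_err = math.sqrt(mse * scale)
t_value = estimate / std_err

contrast_ss = estimate ** 2 / scale   # 1 degree of freedom
f_value = contrast_ss / mse           # equals t_value squared

print(f"estimate={estimate:.3f} std_err={std_err:.5f} t={t_value:.2f}")
print(f"contrast SS={contrast_ss:.5f} F={f_value:.2f}")
```

Because the contrast has one degree of freedom, F is the square of t, which is why both statements report the same p-value.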
27 Remedies
- Transform the response
  - $\log(\mathrm{Var}(y)) = C_0 + q \log(\mathrm{mean}(y))$
  - $g(y) = y^{1 - q/2}$ if $q \neq 2$
  - $g(y) = \log(y)$ if $q = 2$ and $y > 0$
  - $g(y) = \log(y + \mathrm{shift})$ if $q = 2$ and some $y < 0$
- Use an analysis for Gaussian data with unequal variances:
  Satterthwaite's approximation or Welch (for one-way ANOVA)
28 SAS E.D.A.
    proc means data=peasection noprint;
      var growth;
      by trtmnt;
      output out=varmeans var=vargro mean=meangro;
    run;
    data varmeans;
      set varmeans;
      vargro  = log(vargro);
      meangro = log(meangro);
    run;
    proc gplot data=varmeans;
      plot vargro*meangro;
    run;
    proc reg data=varmeans;
      model vargro = meangro;
    run;
29 SAS output
[Figure not reproduced: plot of vargro against meangro from the PROC GPLOT step.]
30 SAS regression
Descriptive statistics by treatment for pea section growth data
The REG Procedure
Model: MODEL1
Dependent Variable: vargro
Root MSE        0.24863   R-Square  0.8990
Dependent Mean  -3.14165  Adj R-Sq  0.8654
Coeff Var       -7.91405
Parameter Estimates
Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept  1   -17.58795           2.79721         -6.29    0.0081
meangro    1   7.39762             1.43125         5.17     0.0141
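The slope of this regression is the q of the Remedies slide, and the suggested power for the transformation is 1 - q/2. A Python sketch of the same steps follows (standard library only; not part of the original SAS material, names are illustrative): compute the variance and mean of growth per treatment, take natural logs, fit a least-squares line, and derive the exponent.

```python
import math
import statistics

# Growth (mm) by treatment, transcribed from the data step.
growth = {
    1: [7.84, 8.69, 8.11, 8.35, 7.74, 7.69, 7.98, 7.64, 8.57, 8.32],
    2: [6.78, 6.69, 6.95, 6.64, 6.41, 6.69, 6.72, 6.57, 6.67, 7.07],
    3: [6.79, 6.79, 6.79, 6.61, 6.43, 6.69, 6.57, 6.49, 7.05, 6.72],
    4: [6.64, 6.57, 6.78, 6.48, 6.54, 6.36, 6.67, 6.26, 6.67, 6.68],
    5: [7.31, 7.65, 7.26, 7.39, 6.98, 7.46, 7.32, 7.13, 7.07, 7.25],
}

# One (log mean, log variance) point per treatment.
log_mean = [math.log(statistics.mean(ys)) for ys in growth.values()]
log_var = [math.log(statistics.variance(ys)) for ys in growth.values()]

# Simple least-squares fit: log_var = intercept + q * log_mean.
x_bar = statistics.mean(log_mean)
y_bar = statistics.mean(log_var)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log_mean, log_var))
sxx = sum((x - x_bar) ** 2 for x in log_mean)
q = sxy / sxx
intercept = y_bar - q * x_bar

exponent = 1 - q / 2   # power for g(y) = y ** exponent when q != 2

print(f"q={q:.5f} intercept={intercept:.5f} exponent={exponent:.5f}")
```

With only five points the fit is rough, so the exponent is best read as a guide to the kind of transformation rather than a precise value.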
31 SAS transformation and analysis code
    data trans;
      set peasection;
      yt = growth**(-2.69881);
    run;
    proc glm data=trans;
      class trtmnt;
      model yt = trtmnt;
      means trtmnt / hovtest=levene(type=square);
      output out=resi r=res;
    run;
    proc boxplot data=resi;
      title 'BoxPlot for one-way ANOVA example';
      plot res*trtmnt / cframe   = vligb
                        cboxes   = dagr
                        cboxfill = ywh;
      insetgroup mean stddev q1 q2 q3 / header = 'Summary by Treatment'
                                        ctext  = red;
    run;
32 SAS output
The GLM Procedure
Dependent Variable: yt
Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model            4   0.00004922      0.00001231   72.52    <.0001
Error            45  0.00000764      0.00000017
Corrected Total  49  0.00005686
Source  DF  Type I SS   Mean Square  F Value  Pr > F
trtmnt  4   0.00004922  0.00001231   72.52    <.0001
Source  DF  Type III SS  Mean Square  F Value  Pr > F
trtmnt  4   0.00004922   0.00001231   72.52    <.0001
Levene's Test for Homogeneity of yt Variance
ANOVA of Squared Deviations from Group Means
Source  DF  Sum of Squares  Mean Square  F Value  Pr > F
trtmnt  4   3.2E-14         8E-15        0.21     0.9297
Error   45  1.69E-12        3.75E-14
33 SAS output
[Figure not reproduced: box plot of the residuals from the PROC BOXPLOT step above.]
34 Two-way ANOVA fixed factors
An educational researcher was interested in the factors noise and
solitude as they affect study conditions. Each subject in an
experiment was asked to study an essay on American history for 15
minutes and was then tested on a 25-item quiz, the number of correct
items being the score. The subjects differed, however, in the
conditions under which they were allowed to study:
- Factor Solitude with 2 levels: alone and not alone (w/ stooge)
- Factor Noise with 3 levels: no noise, soft background music, and
  loud rock and roll music
There are 3 replications of each treatment combination.
35 SAS data step
    data QuizScores;
      input Solitude $ Noise $ Score @@;
      datalines;
    Alone  None 10  Alone  None  6  Alone  None 14
    Alone  Soft 21  Alone  Soft 21  Alone  Soft 16
    Alone  Loud  5  Alone  Loud 15  Alone  Loud  7
    Stooge None  6  Stooge None 11  Stooge None  1
    Stooge Soft  6  Stooge Soft 17  Stooge Soft 13
    Stooge Loud  1  Stooge Loud  2  Stooge Loud  6
    ;
36 SAS E.D.A.
    proc boxplot data=quizscores;
      title 'BoxPlot for two-way ANOVA example';
      plot score*noise(solitude) / cframe   = vligb
                                   cboxes   = dagr
                                   cboxfill = ywh;
      inset mean max min / pos = tm header = 'The overall summary';
      insetgroup mean stddev q1 q2 q3 / header = 'Summary by Treatment'
                                        ctext  = red;
    run;
    /* BY-group processing needs sorted input; the data lines are not */
    /* in ascending order of noise, so sort a copy first.             */
    proc sort data=quizscores out=quizsorted;
      by solitude noise;
    run;
    proc means data=quizsorted noprint;
      by solitude noise;
      var score;
      output out=meanquizscore mean=meanquiz;
    run;
    symbol i=j;
    symbol2 i=j;
    proc gplot data=meanquizscore;
      plot meanquiz*Noise=solitude;
      plot meanquiz*solitude=noise;
    run;
37 SAS output
38 SAS output
39 SAS output
[Figures not reproduced: output of the PROC BOXPLOT and PROC GPLOT steps above.]
40 SAS output
    proc glm data=quizscores;
      class solitude noise;
      model score = solitude|noise;
    run;
The GLM Procedure
Dependent Variable: Score
Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model            5   471.1111111     94.2222222   4.90     0.0113
Error            12  230.6666667     19.2222222
Corrected Total  17  701.7777778
Source          DF  Type I SS    Mean Square  F Value  Pr > F
Solitude        1   150.2222222  150.2222222  7.82     0.0162
Noise           2   312.4444444  156.2222222  8.13     0.0059
Solitude*Noise  2   8.4444444    4.2222222    0.22     0.8060
Source          DF  Type III SS  Mean Square  F Value  Pr > F
Solitude        1   150.2222222  150.2222222  7.82     0.0162
Noise           2   312.4444444  156.2222222  8.13     0.0059
Solitude*Noise  2   8.4444444    4.2222222    0.22     0.8060
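For a balanced design such as this one, the two-way sums of squares can be computed directly from marginal and cell means, which is also why the Type I and Type III tables agree. A minimal Python sketch follows (standard library only; not part of the original SAS material, names are illustrative). The interaction sum of squares is obtained by subtraction, which is only valid because every cell has the same number of replicates.

```python
import statistics
from collections import defaultdict

# Quiz scores by (Solitude, Noise) cell, from the data step.
scores = {
    ("Alone", "None"): [10, 6, 14], ("Alone", "Soft"): [21, 21, 16],
    ("Alone", "Loud"): [5, 15, 7], ("Stooge", "None"): [6, 11, 1],
    ("Stooge", "Soft"): [6, 17, 13], ("Stooge", "Loud"): [1, 2, 6],
}

all_scores = [y for ys in scores.values() for y in ys]
grand_mean = statistics.mean(all_scores)


def marginal_ss(index):
    """Sum of squares and df for one factor (0 = Solitude, 1 = Noise)."""
    levels = defaultdict(list)
    for key, ys in scores.items():
        levels[key[index]].extend(ys)
    ss = sum(len(ys) * (statistics.mean(ys) - grand_mean) ** 2
             for ys in levels.values())
    return ss, len(levels) - 1


ss_solitude, df_solitude = marginal_ss(0)
ss_noise, df_noise = marginal_ss(1)

# Between-cell SS; interaction is what the two main effects leave over.
ss_cells = sum(len(ys) * (statistics.mean(ys) - grand_mean) ** 2
               for ys in scores.values())
ss_inter = ss_cells - ss_solitude - ss_noise
df_inter = df_solitude * df_noise

ss_error = sum((y - statistics.mean(ys)) ** 2
               for ys in scores.values() for y in ys)
df_error = len(all_scores) - len(scores)
mse = ss_error / df_error

f_solitude = (ss_solitude / df_solitude) / mse
f_noise = (ss_noise / df_noise) / mse
f_inter = (ss_inter / df_inter) / mse

print(f"Solitude F={f_solitude:.2f}, Noise F={f_noise:.2f}, interaction F={f_inter:.2f}")
```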
41 Slices
- In this example the interaction was not significant. But what should
  we do if it were?
- There is a way to deal with this problem: SLICES.
- Since main effects could be either significant or not in the
  presence of interaction, we need to test how they change at a given
  level of a treatment.
- In SAS, we use the following statement to obtain the slices:
    lsmeans interaction / slice=treatment;
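To make the idea of a slice concrete, the sketch below tests the Noise effect separately within each level of Solitude for the quiz data, using the error mean square from the full two-way model as the denominator. This is a hand calculation in Python (standard library only; not part of the original SAS material) under the assumption that a slice F test is the between-cell mean square within one level divided by the overall MSE; names are illustrative and no p-values are computed.

```python
import statistics

# Quiz scores by (Solitude, Noise) cell, from the data step.
scores = {
    ("Alone", "None"): [10, 6, 14], ("Alone", "Soft"): [21, 21, 16],
    ("Alone", "Loud"): [5, 15, 7], ("Stooge", "None"): [6, 11, 1],
    ("Stooge", "Soft"): [6, 17, 13], ("Stooge", "Loud"): [1, 2, 6],
}

# Error mean square from the full model (within-cell variation).
n_total = sum(len(ys) for ys in scores.values())
ss_error = sum((y - statistics.mean(ys)) ** 2
               for ys in scores.values() for y in ys)
mse = ss_error / (n_total - len(scores))


def noise_slice(solitude_level):
    """SS, df and F for the Noise effect at one level of Solitude."""
    cells = [ys for (sol, _noise), ys in scores.items() if sol == solitude_level]
    level_mean = statistics.mean([y for ys in cells for y in ys])
    ss = sum(len(ys) * (statistics.mean(ys) - level_mean) ** 2 for ys in cells)
    df = len(cells) - 1
    return ss, df, (ss / df) / mse


ss_alone, df_alone, f_alone = noise_slice("Alone")
ss_stooge, df_stooge, f_stooge = noise_slice("Stooge")

print(f"Noise at Alone:  SS={ss_alone:.3f} df={df_alone} F={f_alone:.2f}")
print(f"Noise at Stooge: SS={ss_stooge:.3f} df={df_stooge} F={f_stooge:.2f}")
```

The two slice sums of squares add up to SS(Noise) + SS(Solitude*Noise) from the two-way table, which is a quick way to check the arithmetic.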
42 SAS two-way ANOVA random factor
An experiment was performed to examine the effect of time (Aging) on
the strength of cement. From a large number of mixes, three cement
mixes were randomly selected and six specimens were produced from each
mix. After two days, three randomly selected specimens from each mix
were tested for strength with a load test; the other three specimens
were tested after seven days. This is a two-way classification with
factors Cement Mix (3 levels) and Time (2 levels). The levels of
factor Time were predetermined. The three levels of Cement Mix were
randomly selected from a large number of mixes, thus the Cement Mix
factor is random.
43 SAS data input
    data YieldLoads;
      input Aging $ Mix Load @@;
      datalines;
    2-Days 1 574   2-Days 1 564   2-Days 1 550
    2-Days 2 524   2-Days 2 573   2-Days 2 551
    2-Days 3 576   2-Days 3 540   2-Days 3 592
    7-Days 1 1092  7-Days 1 1086  7-Days 1 1065
    7-Days 2 1028  7-Days 2 1073  7-Days 2 998
    7-Days 3 1066  7-Days 3 1045  7-Days 3 1055
    ;
44 SAS code
    proc glm data=yieldloads;
      class aging mix;
      model load = aging mix aging*mix;
      random mix aging*mix / test;
    run;
OR USING
    proc mixed data=yieldloads;
      class aging mix;
      model load = aging;
      random mix mix*aging;
    run;
45 SAS output
Source     Type III Expected Mean Square
Aging      Var(Error) + 3 Var(Aging*Mix) + Q(Aging)
Mix        Var(Error) + 3 Var(Aging*Mix) + 6 Var(Mix)
Aging*Mix  Var(Error) + 3 Var(Aging*Mix)
The GLM Procedure
Tests of Hypotheses for Mixed Model Analysis of Variance
Dependent Variable: Load
Source                DF  Type III SS  Mean Square  F Value  Pr > F
Aging                 1   1107072      1107072      1965.80  0.0005
Mix                   2   2957.444444  1478.722222  2.63     0.2758
Error: MS(Aging*Mix)  2   1126.333333  563.166667
Source            DF  Type III SS  Mean Square  F Value  Pr > F
Aging*Mix         2   1126.333333  563.166667   1.06     0.3774
Error: MS(Error)  12  6386.666667  532.222222
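The expected mean squares above decide the denominator of each F test: Aging and Mix are tested against MS(Aging*Mix), while Aging*Mix is tested against MS(Error). A Python sketch of those ratios for the cement data follows (standard library only; not part of the original SAS material, names are illustrative). As in the fixed-effects case, the interaction sum of squares is obtained by subtraction, which relies on the design being balanced.

```python
import statistics
from collections import defaultdict

# Load by (Aging, Mix) cell, from the data step.
loads = {
    ("2-Days", 1): [574, 564, 550], ("2-Days", 2): [524, 573, 551],
    ("2-Days", 3): [576, 540, 592], ("7-Days", 1): [1092, 1086, 1065],
    ("7-Days", 2): [1028, 1073, 998], ("7-Days", 3): [1066, 1045, 1055],
}

all_loads = [y for ys in loads.values() for y in ys]
grand_mean = statistics.mean(all_loads)


def marginal_ss(index):
    """Sum of squares and df for one factor (0 = Aging, 1 = Mix)."""
    levels = defaultdict(list)
    for key, ys in loads.items():
        levels[key[index]].extend(ys)
    ss = sum(len(ys) * (statistics.mean(ys) - grand_mean) ** 2
             for ys in levels.values())
    return ss, len(levels) - 1


ss_aging, df_aging = marginal_ss(0)
ss_mix, df_mix = marginal_ss(1)
ss_cells = sum(len(ys) * (statistics.mean(ys) - grand_mean) ** 2
               for ys in loads.values())
ss_inter, df_inter = ss_cells - ss_aging - ss_mix, df_aging * df_mix
ss_error = sum((y - statistics.mean(ys)) ** 2
               for ys in loads.values() for y in ys)
df_error = len(all_loads) - len(loads)

ms_inter = ss_inter / df_inter
ms_error = ss_error / df_error

# Denominators follow the expected mean squares for the mixed model.
f_aging = (ss_aging / df_aging) / ms_inter
f_mix = (ss_mix / df_mix) / ms_inter
f_inter = ms_inter / ms_error

print(f"Aging F={f_aging:.2f}, Mix F={f_mix:.2f}, Aging*Mix F={f_inter:.2f}")
```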
46 Question
- Option 1. Go back and complete the SLICE part
- or
- Option 2. Go ahead to the MANOVA
- ?
47 MANOVA example
A researcher randomly assigns 33 subjects to one of three groups:
- G1 receives technical dietary information interactively from an
  on-line website.
- G2 receives the same information from a nurse practitioner.
- G3 receives the information from a video tape made by the same nurse
  practitioner.
The researcher looks at three different ratings of the presentation
(difficulty, usefulness and importance) to determine if there is a
difference in the modes of presentation. In particular, the researcher
is interested in whether the interactive website is superior, because
that is the most cost-effective way of delivering the information.
48 SAS code
    proc glm data=manovaex;
      class group;
      model useful difficulty importance = group;
      contrast '1 vs 2&3' group 2 -1 -1;
      contrast '2 vs 3'   group 0 1 -1;
      manova h=_all_;
    run;
Note: go to the manova.sas example.