Doing ANOVA and t-tests

Provided by: ericv1
1
Doing ANOVA and t-tests
  • LISA short course by Ciro Velasco-Cruz

2
One-sample t test
Example: In a study, 15 lobsters were randomly selected
from recent catches along a certain region of the Maine
shoreline. The lobsters were weighed to the nearest
ounce, with results:
26 14 18 13 22 15 24 21 29 10 12 31 19 16 21
Suppose that, for research purposes, the mean lobster
weight needs to equal 15 ounces. Lobster weight is known
to be normally distributed, with both mean and standard
deviation unknown.
3
SAS for coding
  • The data step
  • data lobsters_w;
  • input type weigth @@;
  • datalines;
  • 1 26 1 14 1 18 1 13 1 22
  • 1 15 1 24 1 21 1 29 1 10
  • 1 12 1 31 1 19 1 16 1 21
  • ;

4
SAS for coding
  • Exploratory data analysis
  • proc means data=lobsters_w mean std max min
    median;
  • var weigth;
  • run;
  • proc boxplot data=lobsters_w;
  • title 'BoxPlot for one sample t-test example';
  • plot (weigth)*type / cframe = vligb
  • cboxes = dagr
  • cboxfill = ywh;
  • inset mean max min / CFILL = WHITE
  • header = "Summary"
  • CTEXT = RED;
  • run;

5
SAS OUTPUT
The SAS System

The MEANS Procedure

Analysis Variable: weigth

Mean         Std Dev     Maximum      Minimum      Median
19.4000000   6.2655521   31.0000000   10.0000000   19.0000000
 
6
SAS OUTPUT
7
SAS coding
  • Data analysis
  • proc ttest data=lobsters_w h0=15;
  • title 'One sample t test example';
  • var weigth;
  • run;

8
SAS OUTPUT

One sample t test example

The TTEST Procedure
Statistics

Variable   N    Lower CL Mean   Mean   Upper CL Mean   Lower CL Std Dev   Std Dev   Upper CL Std Dev   Std Err   Minimum   Maximum
weigth     15   15.93           19.4   22.87           4.5872             6.2656    9.8814             1.6178    10        31

T-Tests

Variable   DF   t Value   Pr > |t|
weigth     14   2.72      0.0166
Conclusion: Since the p-value (0.0166) is < 0.05, we reject
the null hypothesis that the mean = 15 at the 5%
level of significance.
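
The t value reported by proc ttest can be checked by hand from the 15 weights. The following is a minimal sketch in Python (standard library only); Python is not part of the original SAS material and is used here only as a cross-check. It computes the sample mean, standard deviation, standard error, and t statistic for H0: mean = 15, which can be compared with the Mean, Std Dev, Std Err, and t Value columns above.

```python
import math
import statistics

# Lobster weights (ounces) from the example
weights = [26, 14, 18, 13, 22, 15, 24, 21, 29, 10, 12, 31, 19, 16, 21]
mu0 = 15  # hypothesized mean, the h0=15 option in proc ttest

n = len(weights)
mean = statistics.mean(weights)
sd = statistics.stdev(weights)        # sample standard deviation (divisor n - 1)
std_err = sd / math.sqrt(n)           # standard error of the mean
t_value = (mean - mu0) / std_err      # one-sample t statistic
df = n - 1                            # degrees of freedom

print(f"mean={mean:.4f} sd={sd:.4f} se={std_err:.4f} t={t_value:.2f} df={df}")
```

The p-value itself needs the t distribution with 14 degrees of freedom, which the standard library does not provide, so it is not recomputed here.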
9
Two-sample t-test example
An animal scientist is interested in comparing
two different topical treatments (A, B) against
osteoarthritis in the leg joints of horses. Seven
horses with the illness are available at the
animal clinic. For each horse, it is randomly
determined which of the front legs receives
treatment A and which receives treatment B. After
four weeks of treatment, each horse's mobility is
measured. If we assume that the two treatments give
two independent samples, we can perform our tests.
10
SAS data step
  • data horses;
  • input trt horse mobility @@;
  • cards;
  • 1 1 48.2 1 2 44.6 1 3 49.7 1 4 40.5
  • 1 5 54.6 1 6 47.1 1 7 46.8 2 1 41.5
  • 2 2 40.1 2 3 44.0 2 4 41.2 2 5 49.8
  • 2 6 41.7 2 7 51.4
  • ;

11
SAS E.D.A.
  • proc means data=horses mean std max min median;
  • class trt;
  • var mobility;
  • run;
  • proc boxplot data=horses;
  • title 'BoxPlot for two sample t-test example';
  • plot (mobility)*trt / cframe = vligb
  • cboxes = dagr
  • cboxfill = ywh;
  • insetgroup mean max min q1 q2 q3 / header =
    'Summary by Treatment' ctext = red;
  • run;

12
SAS OUTPUT
The MEANS Procedure

Analysis Variable: mobility

trt   N Obs   Mean         Std Dev     Maximum      Minimum      Median
1     7       47.3571429   4.3523393   54.6000000   40.5000000   47.1000000
2     7       44.2428571   4.5199031   51.4000000   40.1000000   41.7000000

13
SAS OUTPUT
14
SAS t test
proc ttest data=horses;
title 'Two sample t test example';
class trt;
var mobility;
run;
15
SAS OUTPUT
Two sample t test example

The TTEST Procedure

Statistics

Variable   trt          N   Lower CL Mean   Mean     Upper CL Mean   Lower CL Std Dev   Std Dev   Upper CL Std Dev   Std Err   Minimum   Maximum
mobility   1            7   43.332          47.357   51.382          2.8046             4.3523    9.5841             1.645     40.5      54.6
mobility   2            7   40.063          44.243   48.423          2.9126             4.5199    9.9531             1.7084    40.1      51.4
mobility   Diff (1-2)       -2.053          3.1143   8.2816          3.1816             4.4369    7.3242             2.3716

T-Tests

Variable   Method          Variances   DF   t Value   Pr > |t|
mobility   Pooled          Equal       12   1.31      0.2137
mobility   Satterthwaite   Unequal     12   1.31      0.2137

Equality of Variances

Variable   Method     Num DF   Den DF   F Value   Pr > F
mobility   Folded F   6        6        1.08      0.9293

16
Conclusion
  • About variances: Since the p-value (0.9293) is larger
    than 5%, we do not reject the hypothesis that the
    variances are equal.
  • About means: Since the p-value for this test (0.2137)
    is also larger than 5%, we do not reject the
    hypothesis that the means are equal.
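
The pooled t statistic and the folded F statistic in the output can be reproduced from the raw mobility scores. This is a minimal Python sketch (standard library only, not part of the original SAS code); the variable names are chosen here for illustration. It pools the two sample variances, forms the standard error of the difference in means, and divides.

```python
import math
import statistics

# Mobility scores by treatment, listed in horse order 1..7
trt1 = [48.2, 44.6, 49.7, 40.5, 54.6, 47.1, 46.8]
trt2 = [41.5, 40.1, 44.0, 41.2, 49.8, 41.7, 51.4]

n1, n2 = len(trt1), len(trt2)
var1 = statistics.variance(trt1)   # sample variances (divisor n - 1)
var2 = statistics.variance(trt2)

# Pooled variance: df-weighted average of the two sample variances
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

diff = statistics.mean(trt1) - statistics.mean(trt2)
t_pooled = diff / std_err                       # pooled two-sample t statistic
f_folded = max(var1, var2) / min(var1, var2)    # folded F: larger variance over smaller

print(f"diff={diff:.4f} se={std_err:.4f} t={t_pooled:.2f} F={f_folded:.2f}")
```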

17
Paired t test example
  • Let's reconsider the last example. Treatments
    A and B were both measured on the same horse, so
    the mobility measurements are not independent
    within horses. The right way to analyze the
    data is therefore a paired t test.
  • Idea: we look at the difference between the
    responses to treatments A and B:
  • D_i = Y_iA - Y_iB

18
SAS paired test
proc ttest data=newhorses;
paired MobilityA*MobilityB;
run;


The SAS System
The TTEST Procedure
 
Statistics

Difference              N   Lower CL Mean   Mean     Upper CL Mean   Lower CL Std Dev   Std Dev   Upper CL Std Dev   Std Err   Minimum   Maximum
MobilityA - MobilityB   7   -0.729          3.1143   6.9571          2.6775             4.1551    9.1498             1.5705    -4.6      6.7

T-Tests

Difference              DF   t Value   Pr > |t|
MobilityA - MobilityB   6    1.98      0.0946
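
The paired analysis is a one-sample t test on the within-horse differences D_i = Y_iA - Y_iB. As a hand check, here is a minimal Python sketch (standard library only, not part of the original SAS code), assuming treatment A is trt 1 and treatment B is trt 2 in the earlier data step, which is what the matching mean difference of 3.1143 suggests.

```python
import math
import statistics

# Mobility per horse (horses 1..7); A = trt 1 and B = trt 2 is an assumption
mobility_a = [48.2, 44.6, 49.7, 40.5, 54.6, 47.1, 46.8]
mobility_b = [41.5, 40.1, 44.0, 41.2, 49.8, 41.7, 51.4]

# Within-horse differences D_i = Y_iA - Y_iB
diffs = [a - b for a, b in zip(mobility_a, mobility_b)]

n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)       # standard deviation of the differences
std_err = sd_d / math.sqrt(n)
t_paired = mean_d / std_err          # paired t statistic, df = n - 1

print(f"mean={mean_d:.4f} sd={sd_d:.4f} se={std_err:.4f} t={t_paired:.2f}")
```

The mean difference is the same as in the two-sample analysis, but the standard error is 1.5705 instead of 2.3716, so the t value is larger.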
19
But why is it happening?
20
One Way Anova
An experiment was conducted to study the growth
of plant tissue in the presence of hormone
solutions containing various growth-inhibiting
substances. For each solution, 10 independent
tissue cultures were prepared and the growth of
the plant tissue was recorded in mm. This
experiment has one factor with 5 levels. Each
level has 10 replications.
21
SAS data step
data peasection;
input trtmnt growth @@;
label trtmnt = "1='Control' 2='Sol.1' 3='Sol.2' 4='Mixture' 5='Sol.3'";
datalines;
1 7.84 1 8.69 1 8.11 1 8.35 1 7.74 1 7.69 1 7.98 1 7.64 1 8.57 1 8.32
2 6.78 2 6.69 2 6.95 2 6.64 2 6.41 2 6.69 2 6.72 2 6.57 2 6.67 2 7.07
3 6.79 3 6.79 3 6.79 3 6.61 3 6.43 3 6.69 3 6.57 3 6.49 3 7.05 3 6.72
4 6.64 4 6.57 4 6.78 4 6.48 4 6.54 4 6.36 4 6.67 4 6.26 4 6.67 4 6.68
5 7.31 5 7.65 5 7.26 5 7.39 5 6.98 5 7.46 5 7.32 5 7.13 5 7.07 5 7.25
;
22
SAS coding
proc boxplot data=peasection;
title 'BoxPlot for one-way ANOVA example';
plot growth*trtmnt / cframe = vligb cboxes = dagr cboxfill = ywh;
insetgroup mean stddev q1 q2 q3 / header = 'Summary by Treatment' ctext = red;
run;
23
SAS output
24
SAS glm anyway
proc glm data=peasection;
class trtmnt;
model growth = trtmnt;
lsmeans trtmnt / pdiff adjust=tukey;
contrast 'our first contrast with contrast' trtmnt -1 0 -1 0 2;
estimate 'our first contrast with estimate' trtmnt -1 0 -1 0 2;
output out=residuals p=yhat r=res;
run;
25
SAS output
The GLM Procedure
 
Dependent Variable growth

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             4    16.11827200      4.02956800    74.32     <.0001
Error             45   2.43972000       0.05421600
Corrected Total   49   18.55799200

Source   DF   Type I SS     Mean Square   F Value   Pr > F
trtmnt   4    16.11827200   4.02956800    74.32     <.0001

Source   DF   Type III SS   Mean Square   F Value   Pr > F
trtmnt   4    16.11827200   4.02956800    74.32     <.0001
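
The sums of squares and the F value in the ANOVA table follow directly from the 50 growth measurements. The sketch below is a hand check in Python (standard library only, not part of the original SAS code): the model sum of squares measures how far the treatment means are from the grand mean, the error sum of squares measures spread within treatments, and F is the ratio of their mean squares.

```python
# Growth (mm) by treatment, copied from the data step
growth = {
    1: [7.84, 8.69, 8.11, 8.35, 7.74, 7.69, 7.98, 7.64, 8.57, 8.32],
    2: [6.78, 6.69, 6.95, 6.64, 6.41, 6.69, 6.72, 6.57, 6.67, 7.07],
    3: [6.79, 6.79, 6.79, 6.61, 6.43, 6.69, 6.57, 6.49, 7.05, 6.72],
    4: [6.64, 6.57, 6.78, 6.48, 6.54, 6.36, 6.67, 6.26, 6.67, 6.68],
    5: [7.31, 7.65, 7.26, 7.39, 6.98, 7.46, 7.32, 7.13, 7.07, 7.25],
}

def mean(values):
    return sum(values) / len(values)

all_values = [y for ys in growth.values() for y in ys]
grand_mean = mean(all_values)

# Between-treatment (model) and within-treatment (error) sums of squares
ss_model = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in growth.values())
ss_error = sum((y - mean(ys)) ** 2 for ys in growth.values() for y in ys)

df_model = len(growth) - 1                 # 5 treatments - 1 = 4
df_error = len(all_values) - len(growth)   # 50 observations - 5 = 45

ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_value = ms_model / ms_error

print(f"SS model={ss_model:.6f} SS error={ss_error:.6f} F={f_value:.2f}")
```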
 
 
26
SAS output

Least Squares Means
Adjustment for Multiple Comparisons: Tukey

trtmnt   growth LSMEAN   LSMEAN Number
1        8.09300000      1
2        6.71900000      2
3        6.69300000      3
4        6.56500000      4
5        7.28200000      5

Least Squares Means for effect trtmnt
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: growth

i/j   1        2        3        4        5
1              <.0001   <.0001   <.0001   <.0001
2     <.0001            0.9991   0.5812   <.0001
3     <.0001   0.9991            0.7346   <.0001
4     <.0001   0.5812   0.7346            <.0001
5     <.0001   <.0001   <.0001   <.0001

Contrast                           DF   Contrast SS   Mean Square   F Value   Pr > F
our first contrast with contrast   1    0.08214000    0.08214000    1.52      0.2248

Parameter                          Estimate      Standard Error   t Value   Pr > |t|
our first contrast with estimate   -0.22200000   0.18035964       -1.23     0.2248

Note that -(8.093 + 6.693) + 2(7.282) = -0.222.
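
The contrast and estimate lines can be reproduced from the least squares means, the error mean square (0.054216), and the 10 replications per treatment. This is a minimal Python sketch (not part of the original SAS code) using the textbook formulas for a contrast in a balanced one-way layout: estimate = sum of c_i times mean_i, standard error = sqrt(MSE * sum(c_i^2) / n), and contrast SS = estimate^2 / (sum(c_i^2) / n).

```python
import math

# Least squares means by treatment, from the LSMEANS output
lsmeans = {1: 8.093, 2: 6.719, 3: 6.693, 4: 6.565, 5: 7.282}
# Contrast coefficients used in the contrast/estimate statements
coef = {1: -1, 2: 0, 3: -1, 4: 0, 5: 2}

mse = 0.054216      # error mean square from the ANOVA table
n_per_group = 10    # replications per treatment (balanced design)

sum_c2 = sum(c ** 2 for c in coef.values())            # 1 + 1 + 4 = 6

estimate = sum(coef[k] * lsmeans[k] for k in lsmeans)
std_err = math.sqrt(mse * sum_c2 / n_per_group)
t_value = estimate / std_err

contrast_ss = estimate ** 2 / (sum_c2 / n_per_group)
f_value = contrast_ss / mse        # equals t_value squared for a 1-df contrast

print(f"estimate={estimate:.3f} se={std_err:.5f} t={t_value:.2f} "
      f"SS={contrast_ss:.5f} F={f_value:.2f}")
```

Because the contrast has one degree of freedom, F is the square of t, which is why both lines of output report the same p-value, 0.2248.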

 
 
27
Remedies
  • Transform the response
  • log(var(y)) = C0 + q*log(mean(y))
  • g(y) = y^(1-q/2) if q is different from 2
  • g(y) = log(y) if q = 2 and y > 0
  • g(y) = log(y + shift) if q = 2 and some y < 0
  • Use an analysis for Gaussian data with unequal
    variances: Satterthwaite's approximation or Welch's
    test (for one-way ANOVA)

28
SAS E.D.A.
proc means data=peasection noprint;
var growth;
by trtmnt;
output out=varmeans var=vargro mean=meangro;
run;
data varmeans;
set varmeans;
vargro = log(vargro);
meangro = log(meangro);
proc gplot data=varmeans;
plot vargro*meangro;
run;
proc reg data=varmeans;
model vargro = meangro;
run;
29
SAS output
30
SAS regression
Descriptive statistics by treatment for pea section growth data

The REG Procedure
Model: MODEL1
Dependent Variable: vargro

Root MSE         0.24863    R-Square   0.8990
Dependent Mean   -3.14165   Adj R-Sq   0.8654
Coeff Var        -7.91405

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   1    -17.58795            2.79721          -6.29     0.0081
meangro     1    7.39762              1.43125          5.17      0.0141
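
The slope of this regression is the q in the remedy log(var(y)) = C0 + q*log(mean(y)), and the suggested power transformation uses the exponent 1 - q/2. The sketch below is a hand check in Python (standard library only, not part of the original SAS code): it computes the log variance and log mean for each treatment, fits the least squares line, and derives the exponent.

```python
import math
import statistics

# Growth (mm) by treatment, copied from the data step
growth = {
    1: [7.84, 8.69, 8.11, 8.35, 7.74, 7.69, 7.98, 7.64, 8.57, 8.32],
    2: [6.78, 6.69, 6.95, 6.64, 6.41, 6.69, 6.72, 6.57, 6.67, 7.07],
    3: [6.79, 6.79, 6.79, 6.61, 6.43, 6.69, 6.57, 6.49, 7.05, 6.72],
    4: [6.64, 6.57, 6.78, 6.48, 6.54, 6.36, 6.67, 6.26, 6.67, 6.68],
    5: [7.31, 7.65, 7.26, 7.39, 6.98, 7.46, 7.32, 7.13, 7.07, 7.25],
}

# One (log mean, log variance) point per treatment
log_mean = [math.log(statistics.mean(ys)) for ys in growth.values()]
log_var = [math.log(statistics.variance(ys)) for ys in growth.values()]

# Simple least squares fit of log_var on log_mean
x_bar = statistics.mean(log_mean)
y_bar = statistics.mean(log_var)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log_mean, log_var))
sxx = sum((x - x_bar) ** 2 for x in log_mean)

q = sxy / sxx                    # slope: the q of the variance-mean relation
intercept = y_bar - q * x_bar
power = 1 - q / 2                # exponent of the transformation g(y) = y^(1-q/2)

print(f"q={q:.5f} intercept={intercept:.5f} power={power:.5f}")
```

With q close to 7.4, the exponent is close to -2.7, so the transformed response is growth raised to roughly the power -2.7.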
 
 
31
SAS transformation and analysis code
data trans;
set peasection;
yt = growth**(-2.69881);
proc glm data=trans;
class trtmnt;
model yt = trtmnt;
means trtmnt / hovtest=levene(type=square);
output out=resi r=res;
run;
proc boxplot data=resi;
title 'BoxPlot for one-way ANOVA example';
plot res*trtmnt / cframe = vligb cboxes = dagr cboxfill = ywh;
insetgroup mean stddev q1 q2 q3 / header = 'Summary by Treatment' ctext = red;
run;
32
SAS output

 
The GLM Procedure
 
Dependent Variable: yt

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             4    0.00004922       0.00001231    72.52     <.0001
Error             45   0.00000764       0.00000017
Corrected Total   49   0.00005686

Source   DF   Type I SS    Mean Square   F Value   Pr > F
trtmnt   4    0.00004922   0.00001231    72.52     <.0001

Source   DF   Type III SS   Mean Square   F Value   Pr > F
trtmnt   4    0.00004922    0.00001231    72.52     <.0001

Levene's Test for Homogeneity of yt Variance
ANOVA of Squared Deviations from Group Means

Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
trtmnt   4    3.2E-14          8E-15         0.21      0.9297
Error    45   1.69E-12         3.75E-14
33
SAS output
34
Two-way ANOVA fixed factors
An educational researcher was interested in the
factors noise and solitude as they affect study
conditions. Each subject in an experiment was
asked to study an essay on American history for
15 minutes and was then tested on a 25-item quiz,
the number of correct items being the score. The
subjects differed, however, in the conditions
under which they were allowed to study.
Factor Solitude has 2 levels: alone and not alone
(with a stooge).
Factor Noise has 3 levels: no noise, soft
background music, and loud rock and roll music.
There are 3 replications of each treatment
combination.
35
SAS data step
data QuizScores;
input Solitude $ Noise $ Score @@;
datalines;
Alone None 10 Alone None 6 Alone None 14
Alone Soft 21 Alone Soft 21 Alone Soft 16
Alone Loud 5 Alone Loud 15 Alone Loud 7
Stooge None 6 Stooge None 11 Stooge None 1
Stooge Soft 6 Stooge Soft 17 Stooge Soft 13
Stooge Loud 1 Stooge Loud 2 Stooge Loud 6
;
36
SAS E.D.A
proc boxplot data=quizscores;
title 'BoxPlot for two-way ANOVA example';
plot score*noise(solitude) / cframe = vligb cboxes = dagr cboxfill = ywh;
inset mean max min / pos = tm header = 'The overall summary';
insetgroup mean stddev q1 q2 q3 / header = 'Summary by Treatment' ctext = red;
run;
proc means data=quizscores noprint;
by solitude noise;
var score;
output out=meanquizscore mean=meanquiz;
run;
symbol i=j;
symbol2 i=j;
proc gplot data=meanquizscore;
plot meanquiz*Noise=solitude;
plot meanquiz*solitude=noise;
run;
37
SAS output
38
SAS output
39
SAS output
40
SAS output

proc glm data=quizscores;
class solitude noise;
model score = solitude|noise;
run;

The GLM Procedure

Dependent Variable: Score

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             5    471.1111111      94.2222222    4.90      0.0113
Error             12   230.6666667      19.2222222
Corrected Total   17   701.7777778

Source           DF   Type I SS     Mean Square   F Value   Pr > F
Solitude         1    150.2222222   150.2222222   7.82      0.0162
Noise            2    312.4444444   156.2222222   8.13      0.0059
Solitude*Noise   2    8.4444444     4.2222222     0.22      0.8060

Source           DF   Type III SS   Mean Square   F Value   Pr > F
Solitude         1    150.2222222   150.2222222   7.82      0.0162
Noise            2    312.4444444   156.2222222   8.13      0.0059
Solitude*Noise   2    8.4444444     4.2222222     0.22      0.8060
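
Because the design is balanced (3 scores in every cell), the sums of squares in the table can be computed directly from cell and marginal means. The following is a minimal Python sketch (standard library only, not part of the original SAS code); the helper names `mean` and `marginal_ss` are introduced here for illustration. In a balanced design the interaction sum of squares is what remains of the model sum of squares after removing the two main effects.

```python
# Quiz scores by (solitude, noise) cell, 3 replications per cell
scores = {
    ("Alone", "None"): [10, 6, 14],
    ("Alone", "Soft"): [21, 21, 16],
    ("Alone", "Loud"): [5, 15, 7],
    ("Stooge", "None"): [6, 11, 1],
    ("Stooge", "Soft"): [6, 17, 13],
    ("Stooge", "Loud"): [1, 2, 6],
}

def mean(values):
    return sum(values) / len(values)

all_scores = [y for ys in scores.values() for y in ys]
grand_mean = mean(all_scores)

def marginal_ss(position):
    """Sum of squares for the factor stored at `position` of the cell key."""
    total = 0.0
    for level in {key[position] for key in scores}:
        values = [y for key, ys in scores.items() if key[position] == level for y in ys]
        total += len(values) * (mean(values) - grand_mean) ** 2
    return total

ss_model = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in scores.values())
ss_solitude = marginal_ss(0)
ss_noise = marginal_ss(1)
ss_interaction = ss_model - ss_solitude - ss_noise   # valid for balanced data
ss_error = sum((y - mean(ys)) ** 2 for ys in scores.values() for y in ys)

df_error = len(all_scores) - len(scores)             # 18 - 6 = 12
ms_error = ss_error / df_error

f_solitude = (ss_solitude / 1) / ms_error            # 2 levels -> 1 df
f_noise = (ss_noise / 2) / ms_error                  # 3 levels -> 2 df
f_interaction = (ss_interaction / 2) / ms_error      # 1 * 2 = 2 df

print(f"F solitude={f_solitude:.2f} noise={f_noise:.2f} "
      f"interaction={f_interaction:.2f}")
```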

41
Slices
  • In this example the interaction was not significant.
    But what should we do if it were?
  • There is a way to deal with this problem:
    SLICES.
  • Since main effects may or may not be significant
    in the presence of interaction, we need to
    test how the effect of one factor changes at a
    given level of the other factor.
  • In SAS, we use the following statement to obtain
    the slices:
  • lsmeans interaction / slice=treatment;

42
SAS two way ANOVA random factor
An experiment was performed to examine the effect
of time (Aging) on the strength of cement. From a
large number of mixes, three cement mixes were
randomly selected and six specimens were produced
from each mix. After two days, three randomly
selected specimens from each mix were tested for
strength with a load test, and the other three
specimens were tested after seven days. This is
a two-way classification with factors Cement Mix
(three levels) and Time (2 levels). The levels of
factor Time were predetermined. The three levels
of Cement Mix were randomly selected from a
large number of mixes, so the Cement Mix factor is
random.
43
SAS data input
data YieldLoads;
input Aging $ Mix Load @@;
datalines;
2-Days 1 574 2-Days 1 564 2-Days 1 550
2-Days 2 524 2-Days 2 573 2-Days 2 551
2-Days 3 576 2-Days 3 540 2-Days 3 592
7-Days 1 1092 7-Days 1 1086 7-Days 1 1065
7-Days 2 1028 7-Days 2 1073 7-Days 2 998
7-Days 3 1066 7-Days 3 1045 7-Days 3 1055
;
44
SAS code
proc glm data=yieldloads;
class aging mix;
model load = aging mix aging*mix;
random mix aging*mix / test;
run;

OR USING

proc mixed data=yieldloads;
class aging mix;
model load = aging;
random mix mix*aging;
run;
45
SAS output
Source      Type III Expected Mean Square
Aging       Var(Error) + 3 Var(Aging*Mix) + Q(Aging)
Mix         Var(Error) + 3 Var(Aging*Mix) + 6 Var(Mix)
Aging*Mix   Var(Error) + 3 Var(Aging*Mix)

The GLM Procedure
Tests of Hypotheses for Mixed Model Analysis of Variance
Dependent Variable: Load

Source                  DF   Type III SS   Mean Square   F Value   Pr > F
Aging                   1    1107072       1107072       1965.80   0.0005
Mix                     2    2957.444444   1478.722222   2.63      0.2758
Error: MS(Aging*Mix)    2    1126.333333   563.166667

Source                  DF   Type III SS   Mean Square   F Value   Pr > F
Aging*Mix               2    1126.333333   563.166667    1.06      0.3774
Error: MS(Error)        12   6386.666667   532.222222
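
The expected mean squares explain which denominator each F test uses: Aging and Mix are tested against MS(Aging*Mix), while the interaction is tested against MS(Error). A minimal Python sketch of that arithmetic (not part of the original SAS code), using the mean squares printed in the table:

```python
# Mean squares copied from the GLM output above
ms = {
    "Aging": 1107072.0,
    "Mix": 1478.722222,
    "Aging*Mix": 563.166667,
    "Error": 532.222222,
}

# With Mix random, the main effects are tested against the interaction
# mean square, because their expected mean squares contain Var(Aging*Mix).
f_aging = ms["Aging"] / ms["Aging*Mix"]
f_mix = ms["Mix"] / ms["Aging*Mix"]

# The interaction itself is tested against the residual error
f_interaction = ms["Aging*Mix"] / ms["Error"]

print(f"F aging={f_aging:.2f} mix={f_mix:.2f} interaction={f_interaction:.2f}")
```

Testing Aging against MS(Error) instead, as a fixed-effects analysis would, gives a different F value with 12 rather than 2 denominator degrees of freedom.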
46
Question
  • Option 1. Go back and complete SLICE part
  • or
  • Option 2. Go ahead to the MANOVA
  • ?

47
MANOVA example
A researcher randomly assigns 33 subjects to one
of three groups:
G1 receives technical dietary information
interactively from an on-line website.
G2 receives the same information from a nurse
practitioner.
G3 receives the information from a video tape made
by the same nurse practitioner.
The researcher looks at three different ratings of
the presentation (difficulty, usefulness, and
importance) to determine if there is a difference
in the modes of presentation. In particular, the
researcher is interested in whether the interactive
website is superior, because that is the most
cost-effective way of delivering the information.
48
SAS code
proc glm data=manovaex;
class group;
model useful difficulty importance = group;
contrast '1 vs 23' group 2 -1 -1;
contrast '2 vs 3' group 0 1 -1;
manova h=_all_;
run;
Note: go to the manova.sas example.