Title: Comparison of two samples
1. Comparison of two samples
- F-test and two-sample t-test
- (both tests assume normally distributed data)
2. I have two samples and I want to know whether they differ from each other
- Ten plants have been watered with normal water and ten with water enriched with nutrients, and after one year I measure their weight (or the number of stomata on their leaves)
- I have x individuals of one species and y individuals of another one, and I want to know whether their pistil lengths differ from each other
3. There are two possibilities
- The samples (their parent populations/distributions) can differ from each other in variance, in mean, or in both.
Even two samples from one population will differ in both variance and mean. I am therefore interested in whether the two samples differ so much that it is improbable that they were taken from the same population.
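The point that two samples from one population never agree exactly can be illustrated with a small simulation. This is only a sketch: the population (normal, mean 50, standard deviation 5), the sample size of 10 and the random seed are hypothetical values chosen for illustration, not taken from the slides.

```python
import random
import statistics

# Hypothetical population: normal, mean 50, standard deviation 5.
rng = random.Random(1)  # fixed seed so the run is repeatable

def draw_sample(n):
    """Draw n observations from the same normal population."""
    return [rng.gauss(50.0, 5.0) for _ in range(n)]

sample_a = draw_sample(10)
sample_b = draw_sample(10)

mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)

# Both samples come from one population, yet their statistics differ.
print(f"means:     {mean_a:.2f} vs {mean_b:.2f}")
print(f"variances: {var_a:.2f} vs {var_b:.2f}")
```

The tests on the following slides ask whether such differences are larger than this kind of sampling noise would plausibly produce.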
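4. F-test - test of homogeneity of variance
$H_0\colon \sigma_1^2 = \sigma_2^2$, alternative $H_A\colon \sigma_1^2 \ne \sigma_2^2$
$F = s_1^2 / s_2^2$, where we suppose $s_1^2 \ge s_2^2$ (the larger sample variance is placed in the numerator)
numerator d.f. $= n_1 - 1$, denominator d.f. $= n_2 - 1$
The critical value for a two-tailed test at the 5% level is thus the 97.5% quantile of the F distribution.
5. Alternatively, I multiply the area of the tail beyond the observed F by two to obtain the P-value.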
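6. Example
Two-tailed test of the variance ratio for the hypotheses $H_0\colon \sigma_1^2 = \sigma_2^2$ and $H_A\colon \sigma_1^2 \ne \sigma_2^2$.
Data represent numbers of moths captured during one night by eleven traps of type 1 or by eight traps of type 2.
Type 1 trap: $n_1 = 11$, d.f. $= 10$, $SS_1 = 218.73$ moth$^2$, $s_1^2 = 21.87$ moth$^2$
Type 2 trap: $n_2 = 8$, d.f. $= 7$, $SS_2 = 107.50$ moth$^2$, $s_2^2 = 15.36$ moth$^2$
$F = 21.87 / 15.36 = 1.42$
The critical value depends on two degrees of freedom: $F_{0.05(2),10,7} = 4.76$.
Since $1.42 < 4.76$, we do not reject $H_0$.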
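The variance-ratio test for the moth data can be sketched from the summary statistics alone (sums of squares 218.73 and 107.50 with 10 and 7 degrees of freedom). The P-value is obtained here by numerically integrating the F density with Simpson's rule. That is an assumption of this sketch, made to keep it within the Python standard library; a statistics package would normally be used instead. The density helper assumes more than 2 numerator degrees of freedom.

```python
import math

# Summary statistics of the moth example.
ss1, df1 = 218.73, 10   # type 1 traps, n1 = 11
ss2, df2 = 107.50, 7    # type 2 traps, n2 = 8
var1, var2 = ss1 / df1, ss2 / df2

# Put the larger variance in the numerator so that F >= 1.
if var1 >= var2:
    f_stat, df_num, df_den = var1 / var2, df1, df2
else:
    f_stat, df_num, df_den = var2 / var1, df2, df1

def f_pdf(x, d1, d2):
    """Density of the F distribution (sketch; assumes d1 > 2)."""
    if x <= 0:
        return 0.0
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_pdf = ((d1 / 2) * math.log(d1 / d2) + (d1 / 2 - 1) * math.log(x)
               - ((d1 + d2) / 2) * math.log1p(d1 * x / d2) - log_beta)
    return math.exp(log_pdf)

def simpson(func, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

# Area of the upper tail beyond the observed F, doubled for a two-tailed test.
upper_tail = 1.0 - simpson(lambda x: f_pdf(x, df_num, df_den), 0.0, f_stat)
p_value = min(1.0, 2.0 * upper_tail)

print(f"F = {f_stat:.2f}, two-tailed P = {p_value:.3f}")
```

The computed F can be compared with the tabulated critical value 4.76, or the P-value with 0.05. Both routes lead to the same decision.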
7. If I conclude that the variances do not differ from each other, I can compute the pooled variance
$s_p^2 = (SS_1 + SS_2) / (\nu_1 + \nu_2)$
For moths: $s_p^2 = (218.73 + 107.50) / (10 + 7) = 19.19$ moth$^2$.
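The pooled variance is just the combined sum of squares divided by the combined degrees of freedom. A minimal sketch, with a hypothetical helper name `pooled_variance`:

```python
def pooled_variance(ss1, df1, ss2, df2):
    """Pooled variance from two sums of squares and their degrees of freedom."""
    return (ss1 + ss2) / (df1 + df2)

# Moth example: SS1 = 218.73 (10 d.f.), SS2 = 107.50 (7 d.f.)
sp2_moths = pooled_variance(218.73, 10, 107.50, 7)
print(f"{sp2_moths:.2f} moth^2")
```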
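8. But we compare means more often than variances
We test the null hypothesis $H_0\colon \mu_1 = \mu_2$ against the alternative $H_A\colon \mu_1 \ne \mu_2$.
Classic t-test: $t = (\bar{X}_1 - \bar{X}_2) / s_{\bar{X}_1 - \bar{X}_2}$
- numerator: mean difference
- denominator: standard error of the mean difference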
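9. The standard error of the mean difference is computed with the help of the pooled variance $s_p^2$ (I suppose homogeneity of variances):
$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2 / n_1 + s_p^2 / n_2}$
The resulting formula is then
$t = (\bar{X}_1 - \bar{X}_2) / \sqrt{s_p^2 / n_1 + s_p^2 / n_2}$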
10. Assumptions of the t-test are
- normality of the data
- homogeneity of variances
11. Note: the standard error decreases (and the power of the test increases) as the number of observations in the groups grows. For a constant total number of observations, the error is smallest when the groups are of the same size.
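The effect of group sizes on the standard error can be checked numerically. In this sketch the pooled variance (1.0) and the total number of observations (20) are hypothetical values; only the formula $\sqrt{s_p^2/n_1 + s_p^2/n_2}$ comes from the t-test itself.

```python
import math

def se_mean_difference(sp2, n1, n2):
    """Standard error of the difference of two means under a pooled variance."""
    return math.sqrt(sp2 / n1 + sp2 / n2)

sp2 = 1.0    # hypothetical pooled variance
total = 20   # hypothetical total number of observations

# Standard error for every split of the 20 observations (at least 2 per group).
se_by_n1 = {n1: se_mean_difference(sp2, n1, total - n1)
            for n1 in range(2, total - 1)}
best_n1 = min(se_by_n1, key=se_by_n1.get)

print(f"smallest SE at n1 = {best_n1}, n2 = {total - best_n1}")
print(f"SE 10+10: {se_by_n1[10]:.3f}, SE 4+16: {se_by_n1[4]:.3f}")
```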
12. The number of degrees of freedom is the sum of the degrees of freedom of both samples, thus $\nu = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$.
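13. Two-sample t-test for the two-tailed hypotheses $H_0\colon \mu_1 = \mu_2$ and $H_A\colon \mu_1 \ne \mu_2$ (which can also be written as $H_0\colon \mu_1 - \mu_2 = 0$ and $H_A\colon \mu_1 - \mu_2 \ne 0$).
Data are times of sedimentation (in minutes) for human blood after administration of two different medicines (B, G).
Medicine B: 8.8, 8.4, 7.9, 8.7, 9.1, 9.6
Medicine G: 9.9, 9.0, 11.1, 9.6, 8.7, 10.4, 9.5
$n_1 = 6$, $n_2 = 7$
$df_1 = 5$, $df_2 = 6$
$\bar{X}_1 = 8.75$ min, $\bar{X}_2 = 9.74$ min
$SS_1 = 1.6950$ min$^2$, $SS_2 = 4.0171$ min$^2$
$s_p^2 = (1.6950 + 4.0171) / (5 + 6) = 0.5193$ min$^2$
$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{0.5193 / 6 + 0.5193 / 7} = 0.40$ min
$t = (8.75 - 9.74) / 0.40 = -2.475$
$t_{0.05(2),\nu} = t_{0.05(2),11} = 2.201$
Since $|t| = 2.475 > 2.201$, we reject $H_0$. $0.02 < P(|t| \ge 2.475) < 0.05$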
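The worked example can be reproduced step by step from the raw sedimentation times. The sketch below uses only the Python standard library and compares the result with the tabulated critical value 2.201 quoted in the example. The variable and function names are illustrative.

```python
import math
import statistics

medicine_b = [8.8, 8.4, 7.9, 8.7, 9.1, 9.6]
medicine_g = [9.9, 9.0, 11.1, 9.6, 8.7, 10.4, 9.5]

def sum_of_squares(values):
    """Sum of squared deviations from the sample mean."""
    m = statistics.mean(values)
    return sum((v - m) ** 2 for v in values)

n1, n2 = len(medicine_b), len(medicine_g)
df = (n1 - 1) + (n2 - 1)

# Pooled variance and standard error of the mean difference.
sp2 = (sum_of_squares(medicine_b) + sum_of_squares(medicine_g)) / df
se_diff = math.sqrt(sp2 / n1 + sp2 / n2)

t_stat = (statistics.mean(medicine_b) - statistics.mean(medicine_g)) / se_diff

T_CRITICAL = 2.201  # tabulated two-tailed 5% critical value for 11 d.f.
reject_h0 = abs(t_stat) >= T_CRITICAL

print(f"sp2 = {sp2:.4f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Without intermediate rounding the statistic comes out at about -2.476, slightly different from the 2.475 obtained from the rounded values 0.99 and 0.40. The conclusion is unchanged.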
14. Nowadays we rather compute the area of the tail directly and, because it is a two-tailed test, multiply the result by 2.
The area of the tail is 0.0154, which means that P = 0.0308.
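The tail area for t = 2.475 with 11 degrees of freedom can be approximated by integrating the density of the t distribution. Simpson's rule is used here only to stay within the Python standard library; it is a choice of this sketch, and statistical software computes the same area directly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail_area(t, df, n=2000):
    """P(T >= |t|): 0.5 minus the Simpson integral of the density on [0, |t|]."""
    b = abs(t)
    h = b / n
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

area = upper_tail_area(2.475, 11)
p_two_tailed = 2 * area  # two-tailed test: both tails count

print(f"tail area = {area:.4f}, P = {p_two_tailed:.4f}")
```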
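15. If homogeneity of variances is violated, Welch's approximate t can be used
$t' = (\bar{X}_1 - \bar{X}_2) / \sqrt{s_1^2 / n_1 + s_2^2 / n_2}$
with an approximate number of degrees of freedom
$\nu' = (s_1^2 / n_1 + s_2^2 / n_2)^2 / [ (s_1^2 / n_1)^2 / (n_1 - 1) + (s_2^2 / n_2)^2 / (n_2 - 1) ]$
Several other approximations exist.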
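As an illustration, Welch's statistic and its approximate degrees of freedom (the Welch-Satterthwaite form) can be computed for the sedimentation data from the earlier example. Applying it to these data is only for demonstration, since the slides analyse them with the classic pooled-variance test. The function name `welch_t` is hypothetical.

```python
import statistics

def welch_t(sample1, sample2):
    """Welch's approximate t and its approximate degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1  # s1^2 / n1
    v2 = statistics.variance(sample2) / n2  # s2^2 / n2
    t_prime = (statistics.mean(sample1) - statistics.mean(sample2)) / (v1 + v2) ** 0.5
    df_prime = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_prime, df_prime

medicine_b = [8.8, 8.4, 7.9, 8.7, 9.1, 9.6]
medicine_g = [9.9, 9.0, 11.1, 9.6, 8.7, 10.4, 9.5]

t_prime, df_prime = welch_t(medicine_b, medicine_g)
print(f"t' = {t_prime:.3f}, approximate d.f. = {df_prime:.2f}")
```

The approximate degrees of freedom are generally not an integer and never exceed $n_1 + n_2 - 2$.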
16. The same number of observations in both groups is not an assumption of the t-test
- But the robustness of the test against violation of homogeneity of variance decreases with strongly unbalanced numbers of observations (and the test for homogeneity will be desperately weak)
- The power of the test also decreases with unbalanced group sizes
17. Violation of data normality
- Group means, not the original values, enter the t-test formula, so it is the means that are expected to have a normal distribution
- Central limit theorem: they will, if they are based on a large number of observations
- With an increasing number of observations, not only the power of the test increases, but also its robustness to violation of normality (compare with normality tests!)
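The central limit theorem argument can be illustrated by simulation. In this sketch the original values come from a strongly skewed exponential distribution, and the means are based on 30 observations each. The distribution, the sample size, the number of replicates and the seed are all hypothetical choices.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

def skewness(values):
    """Sample skewness (third standardized moment); 0 for a symmetric sample."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# 5000 original values from a skewed (exponential) population.
raw = [rng.expovariate(1.0) for _ in range(5000)]

# 5000 means, each based on 30 observations from the same population.
means_n30 = [statistics.mean([rng.expovariate(1.0) for _ in range(30)])
             for _ in range(5000)]

skew_raw = skewness(raw)
skew_means = skewness(means_n30)
print(f"skewness of original values: {skew_raw:.2f}")
print(f"skewness of means (n = 30):  {skew_means:.2f}")
```

The means are much closer to symmetric than the original values, which is why the t-test tolerates non-normal data better as the groups grow.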
18. As with the one-sample t-test, a one-tailed test can be carried out here too
- Two-tailed test: I test the null hypothesis $H_0\colon \mu_1 = \mu_2$ against the alternative $H_A\colon \mu_1 \ne \mu_2$
- One-tailed test: I test the null hypothesis $H_0\colon \mu_1 \ge \mu_2$ against the alternative $H_A\colon \mu_1 < \mu_2$ (or vice versa)
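How the choice of alternative changes the P-value can be sketched with the statistic from the sedimentation example (t = -2.475, 11 d.f.). The labels "two-sided", "less" and "greater" are naming assumptions of this sketch, and the distribution function is again approximated with Simpson's rule to avoid external libraries.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(t, df, n=2000):
    """P(T <= t), using symmetry and Simpson integration on [0, |t|]."""
    b = abs(t)
    h = b / n
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3
    return 0.5 + central if t >= 0 else 0.5 - central

def p_value(t, df, alternative):
    """P-value for H_A: mu1 != mu2, mu1 < mu2 ('less') or mu1 > mu2 ('greater')."""
    cdf = t_cdf(t, df)
    if alternative == "two-sided":
        return 2 * min(cdf, 1 - cdf)
    if alternative == "less":
        return cdf
    if alternative == "greater":
        return 1 - cdf
    raise ValueError(f"unknown alternative: {alternative}")

for alt in ("two-sided", "less", "greater"):
    print(f"{alt:>9}: P = {p_value(-2.475, 11, alt):.4f}")
```

A one-tailed test gives half the two-tailed P-value only when the observed difference lies in the direction stated by the alternative. In the opposite direction the P-value is close to 1.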
19. DISTINGUISH BETWEEN
- one-tailed and two-tailed tests: the difference lies in how the null hypothesis is formulated
- the one-sample t-test and the two-sample (paired) one: the difference lies in the experimental or observational set-up