Title: MARE 250
1Hypothesis Testing III
MARE 250 Dr. Jason Turner
2To ASSUME is to make an
However...
Four assumptions for t-test hypothesis testing:
1. Random Samples
2. Independent Samples
3. Normal Populations (or large samples)
4. Variances (std. dev.) are equal
3When do I do the what now?
- If all 4 assumptions are met: conduct a pooled t-test. You can pool the samples because the variances are assumed to be equal.
- If the samples are not independent: conduct a paired t-test.
- If the variances (std. dev.) are not equal: conduct a non-pooled t-test.
- If the data are not normal or the sample size is small: conduct a non-parametric test (Mann-Whitney).
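The decision rules above can be sketched as a small function. This is only an illustration: the function name, the argument names, and the order in which the assumptions are checked are assumptions made for the example, not something the slides specify.

```python
def choose_two_sample_test(independent, normal_or_large, equal_variances):
    """Return the test suggested by the checklist for two samples.

    Random sampling is taken for granted here; none of these tests
    repairs a non-random sample.
    """
    if not independent:
        return "paired t-test"
    if not normal_or_large:
        return "Mann-Whitney test"
    if not equal_variances:
        return "non-pooled t-test"
    return "pooled t-test"


# All four assumptions met
print(choose_two_sample_test(True, True, True))
# Independent, normal, but unequal variances
print(choose_two_sample_test(True, True, False))
```

Checking independence first reflects the idea that paired data call for a paired analysis regardless of the other answers; a different ordering would be just as easy to encode.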
4Significance Level
The probability of making a TYPE I Error (rejection of a true null hypothesis) is called the significance level (α) of a hypothesis test.
TYPE II Error Probability (β): the probability of nonrejection of a false null hypothesis.
For a fixed sample size, the smaller we specify the significance level (α), the larger will be the probability (β) of not rejecting a false null hypothesis.
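The trade-off between α and β can be made concrete with a rough sketch. It assumes a simpler setting than the two-sample t-test: a one-sided, one-sample z-test with known standard deviation, where β has a closed form. The function name and the example numbers (difference 1.0, standard deviation 2.0, n = 20) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha, delta, sigma, n):
    """Beta for a one-sided z-test of H0: mu = mu0 vs Ha: mu > mu0.

    delta is the true amount by which mu exceeds mu0, sigma the known
    standard deviation, n the sample size.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)          # rejection cutoff in z units
    return z.cdf(z_crit - delta * sqrt(n) / sigma)


# Same sample size, progressively stricter significance levels
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, delta=1.0, sigma=2.0, n=20)
    print(f"alpha = {alpha:.2f}  beta = {beta:.3f}")
```

With the sample size held fixed, each smaller α prints a larger β.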
5                        If H0 is true    If H0 is false
If H0 is rejected        TYPE I ERROR     No Error
If H0 is not rejected    No Error         TYPE II ERROR
6I have the POWER!!!
The power of a hypothesis test is the probability of not making a TYPE II error, that is, of rejecting a false null hypothesis and so finding evidence to support the alternative hypothesis.
POWER = 1 - β
Produce a power curve
7We need more POWER!!!
For a fixed significance level, increasing the sample size increases the power.
Therefore, you can run a test to determine if your sample size HAS THE POWER!!!
By using a sufficiently large sample size, we can obtain a hypothesis test with as much power as we want.
8Power - the probability of being able to detect an effect of a given size
Sample size - the number of observations in each sample
Difference (effect) - the difference between µ for one population and µ for the other
9Increasing the power of the test
There are four factors that can increase the power of a two-sample t-test:
1. Larger effect size (difference) - The greater the real difference between µ for the two populations, the more likely it is that the sample means will also be different.
2. Higher α-level (the level of significance) - If you choose a higher value for α, you increase the probability of rejecting the null hypothesis, and thus the power of the test. (However, you also increase your chance of type I error.)
3. Less variability - When the standard deviation is smaller, smaller differences can be detected.
4. Larger sample sizes - The more observations there are in your samples, the more confident you can be that the sample means represent µ for the two populations. Thus, the test will be more sensitive to smaller differences.
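A quick way to see the four factors at work is to change one at a time and recompute the power. The sketch below uses a normal approximation to a two-sided, two-sample test with equal sample sizes and a common standard deviation; it is not the exact t-based calculation, and the function name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(diff, sigma, n, alpha=0.05):
    """Approximate power of a two-sided two-sample test (n per sample)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(diff) / (sigma * sqrt(2 / n))   # difference in standard-error units
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


baseline = approx_power(diff=1.0, sigma=2.0, n=20)

# Change exactly one factor relative to the baseline each time
scenarios = {
    "larger difference": approx_power(diff=2.0, sigma=2.0, n=20),
    "higher alpha":      approx_power(diff=1.0, sigma=2.0, n=20, alpha=0.10),
    "less variability":  approx_power(diff=1.0, sigma=1.0, n=20),
    "larger samples":    approx_power(diff=1.0, sigma=2.0, n=40),
}

print(f"baseline power: {baseline:.3f}")
for name, power in scenarios.items():
    print(f"{name:>18}: {power:.3f}")
```

Every scenario should print a power above the baseline, matching the four factors in the list.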
10Increasing the power of the test
The most practical way to increase power is often to increase the sample size.
However, you can also try to decrease the standard deviation by making improvements in your process or measurement.
11Sample size
Increasing the size of your samples increases the power of your test.
You want enough observations in your samples to achieve adequate power, but not so many that you waste time and money on unnecessary sampling.
If you provide the power that you want the test to have and the difference you want it to be able to detect, MINITAB will calculate how large your samples must be.
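The kind of calculation involved can be sketched with the textbook normal-approximation formula, n = 2((z(1-α/2) + z(power))·σ/difference)² per sample. This is not MINITAB's exact procedure, and a t-based calculation can return a slightly larger n. The function name and the example values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(diff, sigma, power=0.80, alpha=0.05):
    """Approximate n per sample for a two-sided two-sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # cutoff for the significance level
    z_power = z.inv_cdf(power)           # quantile for the target power
    n = 2 * ((z_alpha + z_power) * sigma / diff) ** 2
    return ceil(n)                       # round up to whole observations


# Detect a difference of 1.0 when sigma = 2.0, with 80% power at alpha = 0.05
print(sample_size_per_group(diff=1.0, sigma=2.0))  # 63
```

Doubling the difference you want to detect cuts the required sample size to about a quarter, which is why the target difference matters as much as the target power.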
12Data Transformations
MARE 250 Dr. Jason Turner
13Data Transformations
One advantage of using parametric statistics is that it makes it much easier to describe your data.
If you have established that your data follow a normal distribution, you can be sure that a particular set of measurements can be properly described by its mean and standard deviation.
If your data are not normally distributed, you cannot use any of the tests that assume normality (e.g. ANOVA, t-test, regression analysis).
14Data Transformations
If your data are not normally distributed, it is often possible to normalize them by transforming them.
Transforming data to allow you to use parametric statistics is completely legitimate.
People often feel uncomfortable when they transform data because it seems like it artificially improves their results, but this is only because they feel comfortable with linear or arithmetic scales.
However, there is no reason not to use other scales (e.g. logarithms, square roots, reciprocals or angles) where appropriate (See Chapter 13).
15Data Transformations
Different transformations work for different data types:
- Logarithms - Growth rates are often exponential and log transforms will often normalize them. Log transforms are particularly appropriate if the variance increases with the mean.
- Reciprocal - If a log transform does not normalize your data you could try a reciprocal (1/x) transformation. This is often used for enzyme reaction rate data.
- Square root - This transform is often of value when the data are counts, e.g. urchins, Honu. Carrying out a square root transform will bring data with a Poisson distribution closer to a normal distribution.
- Arcsine - This transformation is also known as the angular transformation and is especially useful for percentages and proportions.
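The four transformations above can be written as one-line functions. This is a minimal sketch: the base-10 logarithm and the asin(sqrt(p)) form of the arcsine transformation are assumed (they are common choices, but the slides do not state which form is intended), and the sample values are made up for illustration.

```python
from math import asin, log10, sqrt


def log_transform(x):
    """Base-10 log; requires x > 0."""
    return log10(x)


def reciprocal_transform(x):
    """1/x; requires x != 0."""
    return 1 / x


def sqrt_transform(x):
    """Square root; requires x >= 0 (e.g. counts)."""
    return sqrt(x)


def arcsine_transform(p):
    """Angular transform asin(sqrt(p)) in radians; p is a proportion in [0, 1]."""
    return asin(sqrt(p))


# Hypothetical data, for illustration only
counts = [0, 1, 4, 9, 16]
proportions = [0.05, 0.25, 0.50, 0.95]

print([sqrt_transform(c) for c in counts])
print([round(arcsine_transform(p), 3) for p in proportions])
```

Percentages need to be divided by 100 before the arcsine transform, and zeros need special handling (for example adding a small constant) before a log or reciprocal transform.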
16Which Transformation?
Johnson Transformation is useful when the collected data are non-normal, but you want to apply a methodology that requires a normal distribution.
MINITAB selects one optimal distribution function from the Johnson distribution system to transform the data to a normal distribution.
17Johnson Transformation
18Johnson Transformation
Determining the optimal transformation
Johnson Transformation uses an algorithm to determine an optimal transformation from the three families of distribution SB, SL, and SU, where B, L, and U refer to the variable being bounded, lognormal, and unbounded.
Bounded - the variable is of finite size, which is the case if it cannot exceed some finite limit. (Otherwise it is unbounded.)
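For reference, the three Johnson families are usually written as the functions below. This is a sketch of the transformation formulas only: it does not reproduce how MINITAB fits the parameters or chooses among the families. The parameter names (gamma, eta, epsilon, lam) follow the common textbook notation, and the values in the example are arbitrary.

```python
from math import asinh, log


def johnson_sb(x, gamma, eta, epsilon, lam):
    """Bounded family: requires epsilon < x < epsilon + lam."""
    return gamma + eta * log((x - epsilon) / (lam + epsilon - x))


def johnson_sl(x, gamma, eta, epsilon):
    """Lognormal family: requires x > epsilon."""
    return gamma + eta * log(x - epsilon)


def johnson_su(x, gamma, eta, epsilon, lam):
    """Unbounded family: defined for every real x (lam > 0)."""
    return gamma + eta * asinh((x - epsilon) / lam)


# Arbitrary parameters: with gamma=0, eta=1, epsilon=0, lam=1 the SB
# transform stretches values bounded in (0, 1) onto the whole real line.
bounded_values = [0.1, 0.5, 0.9]
print([round(johnson_sb(v, 0, 1, 0, 1), 3) for v in bounded_values])
```

In practice the parameters are estimated from the data, and the family whose transformed values look most normal is kept.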
19How To?
STAT > Quality Tools > Johnson Transformation
Enter what variable is to be transformed, and what the new transformed variable will be called.
MINITAB places the transformed data into a new column in your MINITAB datasheet; you can copy this into Excel and save it FOREVER.