Title: Introduction to statistics
1Introduction to statistics
- Programme Bioinformatics
- Master Grid Computing
- April 2007
Prof dr AHC van Kampen Bioinformatics
Laboratory KEBB, AMC
2Descriptive Statistics
3Describing data
4Quartile
- In descriptive statistics, a quartile is any of the three values that divide the sorted data set into four equal parts, so that each part represents 1/4 of the sample or population.
- first quartile (designated Q1): lower quartile; cuts off the lowest 25% of the data (25th percentile)
- second quartile (designated Q2): median; cuts the data set in half (50th percentile)
- third quartile (designated Q3): upper quartile; cuts off the highest 25% of the data, or the lowest 75% (75th percentile)
- The difference between the upper and lower quartiles is called the interquartile range.
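A quick way to compute these quartiles in practice (a minimal sketch, assuming Python with numpy available; the data values are invented for illustration):

import numpy as np

data = np.array([2, 4, 4, 5, 6, 7, 8, 9, 12])     # hypothetical sorted sample
q1, q2, q3 = np.percentile(data, [25, 50, 75])    # first, second (median), third quartile
iqr = q3 - q1                                     # interquartile range
print(q1, q2, q3, iqr)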
5Variance, S.D. of a Sample
variance: s² = Σ(x_i - x̄)² / (n - 1)
Degrees of freedom: df = n - 1
Standard deviation: s = sqrt(s²)
In statistics, the term degrees of freedom (df)
is a measure of the number of independent pieces
of information on which the precision of a
parameter estimate is based
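A minimal sketch of these formulas in Python (numpy assumed; the measurements are invented for illustration):

import numpy as np

x = np.array([4.9, 5.1, 5.0, 4.8, 5.2])      # hypothetical measurements
n = len(x)
var = np.sum((x - x.mean())**2) / (n - 1)    # sample variance, df = n - 1
sd = np.sqrt(var)                            # sample standard deviation
# numpy computes the same with ddof=1 (divide by n - 1 rather than n)
assert np.isclose(var, x.var(ddof=1))
print(var, sd)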
6Skewness
7Box-whisker plots
8Distributions
- Normal, binomial, Poisson, hypergeometric, t-distribution, chi-square
- What parameters describe their shapes
- How these distributions can be useful
9Normal distribution
10The Normal Distribution
- Also called a Gaussian distribution
- Centered around the mean μ with a width determined by the standard deviation σ
- Total area under the curve = 1.0
11A Normal Distribution . . .
- For a mean of 5 and a standard deviation of 1
12What Does a Normal Distribution Describe?
- Imagine that you go to the lab and very carefully measure out 5 ml of liquid and weigh it.
- Imagine repeating this process many times.
- You won't get the same answer every time, but if you make a lot of measurements, a histogram of your measurements will approach the appearance of a normal distribution.
13What Does a Normal Distribution Describe?
- Any situation in which the exact value of a continuous variable is altered randomly from trial to trial.
- The random uncertainty or random error.
14How Do You Use The Normal Distribution?
- Use the area UNDER the normal distribution.
- For example, the area under the curve between x = a and x = b is the probability that your next measurement of x will fall between a and b.
15A normal distribution with a mean of 75 and a standard deviation of 10. The shaded area contains 95% of the area and extends from 55.4 to 94.6. For all normal distributions, 95% of the area is within 1.96 standard deviations of the mean.
16How Do You Get μ and σ?
- To draw a normal distribution you must know μ and σ.
- If you made an infinite number of measurements, their mean would be μ and their standard deviation would be σ.
- In practice, you have a finite number of measurements with mean x̄ and standard deviation s.
- For now, μ and σ will be given.
- Later we'll use x̄ and s to estimate μ and σ.
17The Standard Normal Distribution
- It is tedious to integrate a new normal distribution for every population, so use a standard normal distribution with standard tabulated areas.
- Convert your measurement x to a standard score (z-score): z = (x - μ) / σ
- Use the standard normal distribution: μ = 0 and σ = 1
- (areas tabulated in any statistics text book)
The z-score indicates the number of standard deviations that value x is away from the mean μ.
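The z-transform and the tabulated areas can be reproduced with scipy (a sketch, assuming scipy is installed; norm.cdf gives the area to the left of z):

from scipy.stats import norm

mu, sigma = 75, 10                       # the example distribution from slide 15
z = (55.4 - mu) / sigma                  # z-score of x = 55.4, about -1.96
area = norm.cdf(1.96) - norm.cdf(-1.96)  # area within 1.96 SDs of the mean
print(z, area)                           # ~0.95, matching the 95% rule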
18(No Transcript)
19Probability density function
z-transform
green curve is standard normal distribution
20Cumulative distribution functions
21Exercises 1
If scores are normally distributed with a mean of 30 and a standard deviation of 5, what percent of the scores is (a) greater than 30? (b) greater than 37? (c) between 28 and 34?
What proportion of a normal distribution is within one standard deviation of the mean?
What proportion is more than 1.8 standard deviations from the mean?
A test is normally distributed with a mean of 40 and a standard deviation of 7. What value would be needed to be in the 85th percentile?
Stat tables: http://www.statsoft.com/textbook/sttable.html
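One way to check your answers to these exercises (a sketch assuming Python with scipy; sf is the upper-tail area, ppf the inverse of the cdf):

from scipy.stats import norm

print(norm.sf(30, loc=30, scale=5))                # (a) P(X > 30) = 0.5
print(norm.sf(37, loc=30, scale=5))                # (b) P(X > 37), z = 1.4
print(norm.cdf(34, 30, 5) - norm.cdf(28, 30, 5))   # (c) P(28 < X < 34)
print(norm.ppf(0.85, loc=40, scale=7))             # score at the 85th percentile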
22Binomial distribution
23What Does the Binomial Distribution Describe?
- yes/no experiments (two possible outcomes)
- The probability of getting all tails if you throw a coin three times
- The probability of getting all male puppies in a litter of 8
- The probability of getting two defective batteries in a package of six
24Exercise 2
- What is the probability of getting one 2 when
you roll six dice?
25The Binomial Distribution
- The probability of getting the result of interest k times out of n, if the overall probability of the result is p:
P(X = k) = C(n, k) p^k (1 - p)^(n - k)
- Note that here, k is a discrete variable (integer values only)
C(n, k) = n! / (k!(n - k)!) is the binomial coefficient
26Binomial Distribution
- n = 6 (number of dice rolled)
- p = 1/6 (probability of rolling a 2)
- k = 0, 1, 2, 3, 4, 5, 6 (number of 2s out of 6)
P(k = 1) = C(6, 1) (1/6)^1 (5/6)^5 ≈ 0.402
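The 0.402 can be verified with scipy's binomial distribution (a sketch; scipy assumed):

from scipy.stats import binom

print(binom.pmf(1, n=6, p=1/6))   # P(exactly one 2 in six dice) ~ 0.402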
27Binomial Distribution
- n = 8 (number of puppies in litter)
- p = 1/2 (probability of any pup being male)
- k = 0, 1, 2, ..., 8 (number of males out of 8)
28The Shape of the Binomial Distribution
- Shape is determined by values of n and p
- Only truly symmetric if p = 0.5
- Approaches the normal distribution if n is large, unless p is very small
- Mean number of successes is np
- Variance of the distribution is Var(X) = n p (1 - p)
29Exercise 3
- While you are in the bathroom, your little
brother claims to have rolled a Yahtzee in 6s
(five dice all 6s) in one roll of the five dice.
How justified would you be in beating him up for
cheating?
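A quick sanity check in plain Python (no claim about the brother's honesty):

p = (1/6)**5   # five independent dice all showing a 6
print(p)       # ~0.00013, about 1 roll in 7776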
30Poisson distribution
P_n(μ) = μ^n e^(-μ) / n! = probability of getting n counts (n = 0, 1, 2, ...); μ = average of the distribution.
variance = mean = μ
31Poisson distribution
Figure: randomly placed dots over 50 scale divisions, on average μ = 1 dot per interval; the histogram of counts n follows P_n(μ), the probability of getting n counts with μ the average of the distribution.
32Exercise 4
P_n(μ) = probability of getting n counts; μ = average of the distribution.
Average number of phone calls in 1 hour = 2.1. What is the probability of getting 4 calls?
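A one-line check with scipy's Poisson distribution (a sketch; scipy assumed):

from scipy.stats import poisson

print(poisson.pmf(4, mu=2.1))   # P(exactly 4 calls) with an average of 2.1, ~0.099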
33Exercise 5
P_n(μ) = probability of getting a discrete value n; μ = average of the distribution.
Average number of phone calls in 1 hour = 2.1. What is the probability of getting 0 calls? Does this simplify the formula?
34Hypergeometric distribution
35Hypergeometric Distribution
- Suppose that we have an urn with N balls in it; of these, m are white and the others are black.
- Then k balls are drawn from the urn without replacement, and of these X are observed to be white.
- X is a random variable following the hypergeometric distribution.
Example: N = 20 balls with m = n = 10 (10 white, 10 black); draw k = 10 balls and observe X = 6 white.
36Hypergeometric Distribution
P(X = x) = C(m, x) C(N - m, k - x) / C(N, k), where C(a, b) is the binomial coefficient.
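The same pmf is available in scipy; note that its parameter names differ from the slide's (M = population size, n = number of white balls, N = number drawn). A sketch for the urn example above:

from scipy.stats import hypergeom

print(hypergeom.pmf(6, M=20, n=10, N=10))   # P(X = 6 white in 10 draws) ~ 0.24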
37Fisher's Exact Test
- We often want to ask whether there are more white balls in the sample than expected by chance:
P(X ≥ x)
- If this probability is small, it is less likely that we got the result by chance.
38Hypergeometric example
- Extract a cluster of 36 samples from a leukemia microarray dataset
- Whole dataset: 47 ALL, 25 AML
- Extracted: 29 ALL, 7 AML
- Is this sample enriched for ALL samples?
Pr(extracted ALL ≥ 29) ≈ 0.006
- Conclusion: This cluster is significantly enriched with ALL samples.
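The enrichment p-value can be reproduced with scipy's hypergeometric tail (a sketch; sf(28) gives P(X ≥ 29)):

from scipy.stats import hypergeom

p = hypergeom.sf(28, M=72, n=47, N=36)   # draw 36 of 72 samples, 47 of which are ALL
print(p)                                 # ~0.006, matching the slide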
39Sampling Distribution
- Every time we take a random sample and calculate a statistic, the value of the statistic changes (remember, a statistic is a random variable).
- If we continue to take random samples and calculate a given statistic over time, we will build up a distribution of values for the statistic. This distribution is referred to as a sampling distribution.
- A sampling distribution is a distribution that describes the chance fluctuations of a statistic calculated from a random sample.
40Sampling Distribution of the Mean
- The probability distribution of x̄ is called the sampling distribution of the mean.
- The distribution of x̄, for a given sample size n, describes the variability of sample averages around the population mean μ.
41Sampling Distribution of the Mean
- If a random sample of size n is taken from a normal population having mean μ and variance σ², then x̄ is a random variable which is also normally distributed, with mean μ and variance σ²/n.
- Further, Z = (x̄ - μ) / (σ/√n) is a standard normal random variable.
42Sampling Distribution of the Mean
Figure (four panels): (1) the original population, N(100, 5); the sampling distributions of the mean for increasing sample size: (2) N(100, 3.54), since 5/sqrt(2) = 3.54; (3) N(100, 1.58); (4) N(100, 1).
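The shrinking spread can be checked by simulation (a sketch, assuming numpy; 100,000 replicates per sample size):

import numpy as np

rng = np.random.default_rng(0)
for n in (2, 10, 25):
    means = rng.normal(100, 5, size=(100_000, n)).mean(axis=1)
    print(n, means.std())   # approaches 5/sqrt(n): 3.54, 1.58, 1.0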
43Sampling Distribution of the Mean
- Example: A manufacturer of steel rods claims that the length of his bars follows a normal distribution with a mean of 30 cm and a standard deviation of 0.5 cm.
- (a) Assuming that the claim is true, what is the probability that a given bar will exceed 30.1 cm?
- (b) Assuming the claim is true, what is the probability that the mean of 10 randomly chosen bars will exceed 30.1 cm?
- (c) Assuming the claim is true, what is the probability that the mean of 100 randomly chosen bars will exceed 30.1 cm?
44Sampling Distribution of the Mean
- Example: A manufacturer of steel rods claims that the length of his bars follows a normal distribution with a mean of 30 cm and a standard deviation of 0.5 cm.
- (a) Assuming that the claim is true, what is the probability that a given bar will exceed 30.1 cm? (z = (30.1 - 30)/0.5 = 0.2 → p = 0.42)
- (b) Assuming the claim is true, what is the probability that the mean of 10 randomly chosen bars will exceed 30.1 cm? (z = (30.1 - 30)/(0.5/sqrt(10)) = 0.63 → p = 0.26)
- (c) Assuming the claim is true, what is the probability that the mean of 100 randomly chosen bars will exceed 30.1 cm? (z = (30.1 - 30)/(0.5/sqrt(100)) = 2 → p = 0.02)
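These three answers follow the same pattern, so a short loop reproduces them (a sketch; scipy assumed):

from math import sqrt
from scipy.stats import norm

mu, sigma, x = 30, 0.5, 30.1
for n in (1, 10, 100):
    se = sigma / sqrt(n)                     # standard error of the mean
    print(n, norm.sf(x, loc=mu, scale=se))   # upper-tail p: 0.42, 0.26, 0.02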
45Sampling Distribution of the Mean
Figure: the shaded upper-tail areas 0.42 (n = 1), 0.26 (n = 10), and 0.02 (n = 100).
46Inference on Population Mean
- Example: Suppose that it is very important to our manufacturing process that we detect a deviation in the bar mean of 0.1 cm or more.
- Will sampling one bar allow us to detect a shift of 0.1 cm in the population mean?
- Will sampling ten bars allow us to detect a shift of 0.1 cm in the population mean?
- Will sampling one hundred bars allow us to detect a shift of 0.1 cm in the population mean?
47Inference on Population Mean
48Inference on Population Mean
49Inference on Population Mean
50Properties of Sample Mean as Estimator of
Population Mean
- Expected value of the sample mean is the population mean: E(x̄) = μ (x̄ is UNBIASED)
- Among UNBIASED estimators, the mean has the SMALLEST variance
- Variance: Var(x̄) = σ²/n, so the standard error is σ/sqrt(n)
- As n increases, the standard error of x̄ decreases.
51(No Transcript)
52When the Population is Normal Sampling
Distribution is Also Normal
Figure: the population distribution has central tendency μ and variation σ = 10; the sampling distributions of x̄ are centered at μ, with σ_x̄ = 2.5 for n = 16 and σ_x̄ = 5 for n = 4.
53Central Limit Theorem
As the sample size gets large enough, the sampling distribution becomes almost normal regardless of the shape of the population.
54When The Population is Not Normal
Figure: the population distribution (not normal) has central tendency μ = 50 and variation σ = 10; the sampling distributions of x̄ are approximately normal, with σ_x̄ = 1.8 for n = 30 and σ_x̄ = 5 for n = 4.
55Central Limit Theorem
- As the sample size increases, the sampling distribution of the sample mean approaches the normal distribution with mean μ and variance σ²/n.
56Example Sampling Distribution
Figure: a sampling distribution with mean 8 (x̄ from 7.8 to 8.2), standardized to the normal distribution with μ = 0 and σ = 1; the two areas of 0.1915 on either side of Z = 0 sum to 0.3830.
57Central Limit Theorem
- Rule of thumb: the normal approximation for x̄ will be good if n > 30. If n < 30, the approximation is only good if the population from which you are sampling is not too different from normal.
- Otherwise: t-distribution
58t-Distribution
- So far, we have been assuming that we knew the value of σ. This may be true if one has a large amount of experience with a certain process.
- However, it is often true that one is estimating σ along with μ from the same set of data.
59t-Distribution
- To allow for such a situation, we will consider the t statistic
t = (x̄ - μ) / (s/√n)
- which follows a t-distribution.
(s/√n = standard error of the mean)
60t-Distribution
Figure: t-distribution curves for n = 3 and n = 6 degrees of freedom; as n → ∞, t(n) approaches the standard normal Z.
61t-Distribution
- If x̄ is the mean of a random sample of size n taken from a normal population having mean μ and variance σ², then
t = (x̄ - μ) / (s/√n)
- is a random variable following the t-distribution with parameter ν = n - 1, where ν is the degrees of freedom.
62t-Distribution
- The t-distribution has been tabulated.
- t_a represents the t-value that has an area of a to the right of it.
- Note, due to symmetry, t_(1-a) = -t_a
Figure: t_.05, t_.20, t_.80, and t_.95 marked on the t-distribution.
63Example t-Distribution
- The resistivity of batches of electrolyte follows a normal distribution. We sample 5 batches and get the following readings: 1400, 1450, 1375, 1500, 1550.
- Does this data support or refute a population average of 1400?
64Example t-Distribution
Figure: two-tailed test with p = 0.025 in each tail; the "refute" regions lie beyond the critical values ±2.78 (t_.025 with 4 df), with "support" in between. The computed statistic t = 1.71 falls in the support region.
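The whole test can be run in one call (a sketch; scipy assumed; note that ttest_1samp reports a two-sided p-value):

from scipy import stats

readings = [1400, 1450, 1375, 1500, 1550]
t, p = stats.ttest_1samp(readings, popmean=1400)
print(t, p)   # t ~ 1.71, p ~ 0.16: t does not exceed 2.78, so the claim is supported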
65Sampling Distribution of the Variance
- The probability distribution of S² is called the sampling distribution of the variance.
- The distribution of S², for a given sample size n, describes the variability of sample variances around the population variance σ².
66Sampling Distribution S2
- If S² is the variance of a random sample of size n taken from a normal population having variance σ², then the statistic
χ² = (n - 1) S² / σ²
- has a chi-squared distribution with ν = n - 1 degrees of freedom.
67Chi-Squared Distribution
Figure: chi-squared distribution curves for n = 3, n = 6, and n = 11 degrees of freedom.
68Introduction to Hypothesis Testing
69Nonstatistical Hypothesis Testing
- A criminal trial is an example of hypothesis testing without the statistics.
- In a trial a jury must decide between two hypotheses. The null hypothesis is
H0: The defendant is innocent
- The alternative hypothesis or research hypothesis is
H1: The defendant is guilty
- The jury does not know which hypothesis is true. They must make a decision on the basis of evidence presented.
70Nonstatistical Hypothesis Testing
- In the language of statistics, convicting the defendant is called rejecting the null hypothesis in favor of the alternative hypothesis. That is, the jury is saying that there is enough evidence to conclude that the defendant is guilty (i.e., there is enough evidence to support the alternative hypothesis).
- If the jury acquits, it is stating that there is not enough evidence to support the alternative hypothesis. Notice that the jury is not saying that the defendant is innocent, only that there is not enough evidence to support the alternative hypothesis. That is why we never say that we accept the null hypothesis.
71Nonstatistical Hypothesis Testing
- There are two possible errors.
- A Type I error occurs when we reject a true null hypothesis. That is, a Type I error occurs when the jury convicts an innocent person.
- A Type II error occurs when we don't reject a false null hypothesis. That occurs when a guilty defendant is acquitted.
72Nonstatistical Hypothesis Testing
- The probability of a Type I error is denoted α.
- The probability of a Type II error is β.
- The two probabilities are inversely related. Decreasing one increases the other.
73Nonstatistical Hypothesis Testing
- In the (US) system, Type I errors are regarded as more serious. We try to avoid convicting innocent people. We are more willing to acquit guilty people.
- We arrange to make α small by requiring the prosecution to prove its case and instructing the jury to find the defendant guilty only if there is evidence beyond a reasonable doubt.
74Nonstatistical Hypothesis Testing
- The critical concepts are these:
- 1. There are two hypotheses, the null and the alternative hypotheses.
- 2. The procedure begins with the assumption that the null hypothesis is true.
- 3. The goal is to determine whether there is enough evidence to infer that the alternative hypothesis is true.
- 4. There are two possible decisions:
- Conclude that there is enough evidence to support the alternative hypothesis.
- Conclude that there is not enough evidence to support the alternative hypothesis.
75Nonstatistical Hypothesis Testing
- 5. Two possible errors can be made:
- Type I error: Reject a true null hypothesis
- Type II error: Do not reject a false null hypothesis
- P(Type I error) = α
- P(Type II error) = β
76Introduction
- Hypothesis testing is a procedure for making inferences about a population.
- Hypothesis testing allows us to determine whether enough statistical evidence exists to conclude that a belief (i.e. hypothesis) about a parameter is supported by the data.
77Concepts of Hypothesis Testing (1)
- There are two hypotheses. One is called the null
hypothesis and the other the alternative or
research hypothesis. The usual notation is - H0 the null hypothesis
- H1 the alternative or research hypothesis
- The null hypothesis (H0) will always state that
the parameter equals the value specified in the
alternative hypothesis (H1)
78Concepts of Hypothesis Testing
- Consider an example: mean demand for computers during assembly lead time. Rather than estimate the mean demand, our operations manager wants to know whether the mean is different from 350 units. We can rephrase this request into a test of the hypothesis:
- H0: μ = 350
- Thus, our research hypothesis becomes:
- H1: μ ≠ 350
This is what we are interested in determining.
79Concepts of Hypothesis Testing
- The testing procedure begins with the assumption that the null hypothesis is true.
- Thus, until we have further statistical evidence, we will assume:
- H0: μ = 350 (assumed to be TRUE)
80Concepts of Hypothesis Testing
- The goal of the process is to determine whether there is enough evidence to infer that the alternative hypothesis is true.
- That is, is there sufficient statistical information to determine if this statement,
- H1: μ ≠ 350, is true?
This is what we are interested in determining.
81Concepts of Hypothesis Testing
- There are two possible decisions that can be made:
- Conclude that there is enough evidence to support the alternative hypothesis (also stated as rejecting the null hypothesis in favor of the alternative)
- Conclude that there is not enough evidence to support the alternative hypothesis (also stated as not rejecting the null hypothesis in favor of the alternative)
- NOTE: we do not say that we accept the null hypothesis.
82Concepts of Hypothesis Testing
- Once the null and alternative hypotheses are stated, the next step is to randomly sample the population and calculate a test statistic (in this example, the sample mean).
- If the test statistic's value is inconsistent with the null hypothesis, we reject the null hypothesis and infer that the alternative hypothesis is true.
- For example, if we're trying to decide whether the mean is not equal to 350, a large value of x̄ (say, 600) would provide enough evidence. If x̄ is close to 350 (say, 355) we could not say that this provides a great deal of evidence to infer that the population mean is different than 350.
83Concepts of Hypothesis Testing
- Two possible errors can be made in any test:
- A Type I error occurs when we reject a true null hypothesis, and
- A Type II error occurs when we don't reject a false null hypothesis.
- There are probabilities associated with each type of error:
- P(Type I error) = α
- P(Type II error) = β
- α is called the significance level.
84Types of Errors
- A Type I error occurs when we reject a true null hypothesis (i.e. reject H0 when it is TRUE)
- A Type II error occurs when we don't reject a false null hypothesis (i.e. do NOT reject H0 when it is FALSE)
85Types of Errors
- Back to our example, we would commit a Type I error if we reject H0 when it is TRUE:
- We reject H0 (μ = 350) in favor of H1 (μ ≠ 350) when in fact the real value of μ is 350.
- We would commit a Type II error in the case where we do NOT reject H0 when it is FALSE:
- We believe H0 is correct (μ = 350), when in fact the real value of μ is something other than 350.
86Recap
- The null hypothesis must specify a single value of the parameter (e.g. μ = ___)
- Assume the null hypothesis is TRUE.
- Sample from the population, and build a statistic related to the parameter hypothesized (e.g. the sample mean, x̄)
- Compare the statistic with the value specified in the first step
87Example
- A department store manager determines that a new billing system will be cost-effective only if the mean monthly account is more than 170.
- A random sample of 400 monthly accounts is drawn, for which the sample mean is 178. The accounts are approximately normally distributed with a standard deviation of 65.
- Can we conclude that the new system will be cost-effective?
88Example
- The system will be cost effective if the mean account balance for all customers is greater than 170.
- We express this belief as our research hypothesis, that is:
- H1: μ > 170 (this is what we want to determine)
- Thus, our null hypothesis becomes:
- H0: μ = 170 (this specifies a single value for the parameter of interest)
89Example
- What we want to show:
- H1: μ > 170
- H0: μ = 170 (we'll assume this is true)
- We know:
- n = 400
- x̄ = 178
- σ = 65
- Hmm. What to do next?!
90Example
- To test our hypotheses, we can use two different approaches:
- The rejection region approach (typically used when computing statistics manually), and
- The p-value approach (which is generally used with a computer and statistical software).
- We will explore both in turn.
91Example. Rejection Region
- The rejection region is a range of values such that if the test statistic falls into that range, we decide to reject the null hypothesis in favor of the alternative hypothesis.
There is a critical value of x̄ above which we reject H0.
92Example
- It seems reasonable to reject the null hypothesis in favor of the alternative if the value of the sample mean is large relative to 170, that is, if x̄ is greater than the critical value.
P(x̄ > critical value) is also P(rejecting H0 given that H0 is true) = P(Type I error) = α.
93Example
- All that's left to do is calculate the critical value and compare it to 178.
We can calculate it based on any level of significance (α) we want.
94Example
- At a 5% significance level (i.e. α = 0.05), we get: critical value = 170 + 1.645 × 65/sqrt(400).
- Solving, we compute a critical value of 175.34.
- Since our sample mean (178) is greater than the critical value we calculated (175.34), we reject the null hypothesis in favor of H1, i.e. that μ > 170, and conclude that it is cost effective to install the new billing system.
95Example The Big Picture
Figure: the sampling distribution of x̄ under H0, with the critical value 175.34 marked; the observed mean 178 lies beyond it, so we reject H0 in favor of H1.
96Standardized Test Statistic
- An easier method is to use the standardized test statistic z = (x̄ - μ) / (σ/√n) and compare its result to z_α (rejection region: z > z_α).
- Since z = 2.46 > 1.645 (= z.05), we reject H0 in favor of H1.
97p-Value
- The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed, given that the null hypothesis is true.
- In the case of our department store example, what is the probability of observing a sample mean at least as extreme as the one already observed (i.e. x̄ = 178), given that the null hypothesis (H0: μ = 170) is true?
p-value = P(x̄ > 178 | μ = 170) = P(Z > 2.46) = .0069
98Interpreting the p-value
- The smaller the p-value, the more statistical evidence exists to support the alternative hypothesis.
- We observe a p-value of .0069, hence there is evidence to support H1: μ > 170.
99Interpreting the p-value
Figure: p-value scale: below .01, overwhelming evidence (highly significant); .01 to .05, strong evidence (significant); .05 to .10, weak evidence (not significant); above .10, no evidence (not significant). Our p = .0069 falls in the overwhelming-evidence range.
100Interpreting the p-value
- Compare the p-value with the selected value of the significance level α:
- If the p-value is less than α, we judge the p-value to be small enough to reject the null hypothesis.
- If the p-value is greater than α, we do not reject the null hypothesis.
- Since p-value = .0069 < α = .05, we reject H0 in favor of H1.
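Both approaches for this example can be reproduced in a few lines (a sketch; scipy assumed):

from math import sqrt
from scipy.stats import norm

mu0, sigma, n, xbar, alpha = 170, 65, 400, 178, 0.05
se = sigma / sqrt(n)                   # 3.25
crit = mu0 + norm.ppf(1 - alpha) * se  # rejection region: xbar > 175.34
z = (xbar - mu0) / se                  # standardized test statistic, 2.46
p = norm.sf(z)                         # one-tail p-value ~ 0.0069
print(crit, z, p)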
101Another example
- The objective of the study is to draw a conclusion about the mean payment period. Thus, the parameter to be tested is the population mean. We want to know whether there is enough statistical evidence to show that the population mean is less than 22 days. Thus, the alternative hypothesis is:
- H1: μ < 22
- The null hypothesis is:
- H0: μ = 22
102Another example
- The test statistic is z = (x̄ - μ) / (σ/√n).
- We wish to reject the null hypothesis in favor of the alternative only if the sample mean, and hence the value of the test statistic, is small enough. As a result we locate the rejection region in the left tail of the sampling distribution.
- We set the significance level at 10%.
103Another example
- Rejection region: z < -z.10 = -1.28
- Assume the sample yields a test statistic of z = -.91; then
- p-value = P(Z < -.91) = .5 - .3186 = .1814
Conclusion: There is not enough evidence to infer that the mean is less than 22.
104One- and Two-Tail Testing
- The department store example was a one-tail test, because the rejection region is located in only one tail of the sampling distribution.
- More correctly, this was an example of a right-tail test.
105One- and Two-Tail Testing
- The payment period example is a left-tail test because the rejection region was located in the left tail of the sampling distribution.
106Right-Tail Testing
- Calculate the critical value of the mean and compare it against the observed value of the sample mean x̄.
107Left-Tail Testing
- Calculate the critical value of the mean and compare it against the observed value of the sample mean x̄.
108Two-Tail Testing
- Two-tail testing is used when we want to test a research hypothesis that a parameter is not equal (≠) to some value.
109Example
- KPN argues that its rates are such that customers won't see a difference in their phone bills between them and their competitors. They calculate the mean and standard deviation for all their customers at 17.09 and 3.87 (respectively).
- They then sample 100 customers at random and recalculate a monthly phone bill based on competitors' rates.
- What we want to show is whether or not:
- H1: μ ≠ 17.09. We do this by assuming that:
- H0: μ = 17.09
110Example
- The rejection region is set up so we can reject the null hypothesis when the test statistic is large or when it is small.
- That is, we set up a two-tail rejection region. The total area in the rejection region must sum to α, so we divide this probability by 2.
Figure: rejection regions in both tails, where the test statistic is small or where it is large.
111Example
- At a 5% significance level (i.e. α = .05), we have α/2 = .025. Thus, z.025 = 1.96 and our rejection region is:
- z < -1.96 or z > 1.96
Figure: standard normal curve with -z.025 and z.025 marked on either side of 0.
112Example
- From the data, we calculate x̄ = 17.55.
- Using our standardized test statistic z = (x̄ - μ) / (σ/√n) = (17.55 - 17.09) / (3.87/√100), we find that z = 1.19.
- Since z = 1.19 is not greater than 1.96, nor less than -1.96, we cannot reject the null hypothesis in favor of H1. That is, there is insufficient evidence to infer that there is a difference between the bills of KPN and the competitor.
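The two-tail computation in Python (a sketch; scipy assumed):

from math import sqrt
from scipy.stats import norm

mu0, sigma, n, xbar = 17.09, 3.87, 100, 17.55
z = (xbar - mu0) / (sigma / sqrt(n))   # 1.19
p = 2 * norm.sf(abs(z))                # two-tail p-value ~ 0.23
print(z, p)                            # |z| < 1.96, so H0 is not rejected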
113Summary of One- and Two-Tail Tests
114Probability of a Type II Error
- It is important that we understand the relationship between Type I and Type II errors; that is, how the probability of a Type II error is calculated and its interpretation.
- Recall the previous example:
- H0: μ = 170
- H1: μ > 170
- At a significance level of 5% we rejected H0 in favor of H1, since our sample mean (178) was greater than the critical value (175.34).
115Probability of a Type II Error
- A Type II error occurs when a false null hypothesis is not rejected.
- In our example this means that if x̄ is less than 175.34 (our critical value) we will not reject our null hypothesis, which means that we will not install the new billing system.
- Thus, we can see that:
- β = P(x̄ < 175.34, given that the null hypothesis is false)
116Example
- β = P(x̄ < 175.34, given that the null hypothesis is false)
- The condition only tells us that the mean ≠ 170. We need to compute β for some new value of μ. For example, suppose the mean account balance needs to be 180 in order to cost-justify the new billing system:
- β = P(x̄ < 175.34, given that μ = 180)
117Example
Figure: two sampling distributions, one centered at our original hypothesis (μ = 170) and one at our new assumption (μ = 180); β is the area under the second curve below 175.34.
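The shaded β area can be computed directly (a sketch; scipy assumed):

from math import sqrt
from scipy.stats import norm

se = 65 / sqrt(400)                         # 3.25, as before
beta = norm.cdf(175.34, loc=180, scale=se)  # P(xbar < 175.34 | mu = 180)
print(beta)                                 # ~0.08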
118Effects on β of Changing α
- Decreasing the significance level α increases the value of β, and vice versa.
- Consider this diagram again. Shifting the critical value line to the right (to decrease α) will mean a larger area under the lower curve for β (and vice versa).
119Judging the Test
- A statistical test of hypothesis is effectively defined by the significance level (α) and the sample size (n), both of which are selected by the statistics practitioner.
- Therefore, if the probability of a Type II error (β) is judged to be too large, we can reduce it by:
- increasing α, and/or
- increasing the sample size, n.
120Judging the Test
- For example, suppose we increased n from a sample size of 400 account balances to 1,000.
- The probability of a Type II error (β) goes to a negligible level while α remains at 5%.
- The power of a test is defined as 1 .
- It represents the probability of rejecting the
null hypothesis when it is false.
122Error Rates and Power (H0 and H1: null and alternative hypotheses)
123Factors Affecting Power
- Increasing overall sample size increases power
- Having unequal group sizes usually reduces power
- Larger size of effect being tested increases power
- Setting a lower significance level decreases power
- Violations of assumptions underlying the test often decrease power substantially
124Exercises
- Exercises: see the Word document
125The t-test
126Recall t distribution.
- Take a random sample of size n from a N(μ, σ²) population.
- Z = (x̄ - μ) / (σ/√n) has a standard normal distribution.
- Consider t = (x̄ - μ) / (S/√n).
- This is approximately normal if n is large.
- If n is small, S is not expected to be close to σ; S introduces additional variability. Thus this statistic will be more variable than a standard normal random variable.
- This statistic follows a t distribution with n - 1 degrees of freedom.
127The t distribution.
Figure: red = t with 1 d.f., green = t with 5 d.f., yellow = t with 10 d.f., blue = standard normal.
The t distribution is similar in shape to the
normal distribution, but is more spread out. As
the degrees of freedom go to infinity the t
distribution approaches the standard normal
distribution.
128Confidence Intervals.
- Suppose that the population is normally distributed with mean μ and variance σ². Then:
- If σ is known, a 100(1-α)% confidence interval for μ is x̄ ± z_(α/2) σ/√n.
- If σ is not known, a 100(1-α)% confidence interval for μ is x̄ ± t_(α/2, n-1) s/√n.
129Overview of the t-test
- The t-test is used to help make decisions about population values.
- There are two main forms of the t-test, one for a single sample and one for two samples.
- The one-sample t-test is used to test whether a population has a specific mean value.
- The two-sample t-test is used to test whether population means are equal, e.g., do training and control groups have the same mean?
130One-sample t-test
- We can use a confidence interval to test or decide whether a population mean has a given value.
- For example, suppose we want to test whether the mean height of women at USF is less than 68 inches.
- We randomly sample 50 women students at USF.
- We find that their mean height is 63.05 inches.
- The SD of height in the sample is 5.75 inches.
- Then we find the standard error of the mean by dividing the SD by sqrt(N): 5.75/sqrt(50) = .81.
- The critical value of t with (50 - 1) df is 2.01 (find this in a t-table).
- Our confidence interval is, therefore, 63.05 plus/minus 2.01 × .81 = 63.05 plus/minus 1.63.
131One-sample t-test example
Take a sample, set a confidence interval around
the sample mean. Does the interval contain the
hypothesized value?
132One-sample t-test Example
The sample mean is roughly six standard
deviations (St. Errors) from the hypothesized
population mean. If the population mean is
really 68 inches, it is very, very unlikely that
we would find a sample with a mean as small as
63.05 inches.
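The same test from the summary statistics (a sketch; scipy assumed; scipy's ttest_1samp needs raw data, so here the interval and t statistic are computed by hand):

from math import sqrt
from scipy.stats import t

n, xbar, sd, mu0 = 50, 63.05, 5.75, 68
se = sd / sqrt(n)                            # ~0.81
tcrit = t.ppf(0.975, df=n - 1)               # ~2.01
ci = (xbar - tcrit * se, xbar + tcrit * se)  # 63.05 +/- 1.63
tstat = (xbar - mu0) / se                    # ~ -6.1 standard errors from 68
print(ci, tstat)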
133Two-sample t-test
- Used when we have two groups, e.g.:
- Experimental vs. control group
- Males vs. females
- New training vs. old training method
- Tests whether group population means are the same.
- Can be "means are just same or different" (nondirectional)
- or can predict one group higher (directional).
134Sampling Distribution of Mean Differences
- Suppose we sample 2 groups of size 50 at random from USF.
- We measure the height of each person and find the mean for each group.
- Then we subtract the mean for group 1 from the mean for group 2. Suppose we do this over and over.
- We will then have a sampling distribution of mean differences.
- If the two groups are sampled at random from one population, the mean of the differences in the long run will be zero, because the mean for both groups will be the same.
- The standard deviation of the sampling distribution will be SE_diff = sqrt(SE1² + SE2²): the standard error of the difference is the root of the sum of squared standard errors of the mean.
135Example of the Standard Error of the Difference
in Means
Suppose that at USF the mean height is 68 inches
and the standard deviation of height is 6 inches.
Suppose we sampled people 100 at a time into two
groups. We would expect that the average mean
difference would be zero. What would the
standard deviation of the distribution of
differences be?
The standard error for each group mean is .6, for
the difference in means, it is .85.
136Estimating the Standard Error of Mean Differences
The USF scenario we just worked was based on population information. We generally don't have population values; we usually estimate population values with sample data. All this says is that we replace the population error variance with the appropriate sample estimators.
137Pooled Standard Error
We can use the simple formula SE_diff = sqrt(s1²/n + s2²/n) when the sample sizes for the two groups are equal. When the sample sizes are not equal across groups, we find the pooled standard error. The pooled standard error is a weighted average, where the weights are the groups' degrees of freedom.
138Back to the Two-Sample t
The formula for the two-sample t-test for independent samples is t = (x̄1 - x̄2) / SE_diff. This says we find the value of t by taking the difference in the two sample means and dividing by the standard error of the difference in means.
139Example of the two-sample t, Empathy by College
Major
Suppose we have a professionally developed test of empathy. The test has people view film clips and guess what the people in the clips are feeling. Scores come from comparing what people guess to what the people in the films said they felt at the time. We want to know whether Psychology majors have higher scores on average on this test than do Physics majors. No direction, we just want to know if there is a difference. So we find some (N = 15) of each major and give each the test.
140Empathy Scores
141Empathy
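The empathy comparison would be run as an independent-samples t-test; since the slide's data table did not survive transcription, the scores below are invented purely to show the call (a sketch; scipy assumed):

from scipy import stats

psych   = [24, 28, 31, 25, 27, 30, 29, 26, 32, 28, 27, 25, 30, 29, 31]  # hypothetical
physics = [22, 25, 27, 24, 23, 26, 28, 21, 25, 24, 26, 23, 27, 25, 24]  # hypothetical
t, p = stats.ttest_ind(psych, physics)   # two-sided, independent samples, N = 15 each
print(t, p)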
142Exercise
- Exercises t-test: see the Word document
143Chi-square
144Background
- 1. Suppose there are n observations.
- 2. Each observation falls into a cell (or class).
- 3. Observed frequencies in each cell: O1, O2, O3, ..., Ok. The sum of the observed frequencies is n.
- 4. Expected, or theoretical, frequencies: E1, E2, E3, ..., Ek.
145Goal
- 1. Compare the observed frequencies with the expected frequencies.
- 2. Decide whether the observed frequencies seem to agree or seem to disagree with the expected frequencies.
Methodology: use a chi-square statistic, χ² = Σ (O_i - E_i)² / E_i
- Small values of χ²: observed frequencies close to expected frequencies.
- Large values of χ²: observed frequencies do not agree with expected frequencies.
146Sampling Distribution of χ²
- When n is large and all expected frequencies are greater than or equal to 5, then χ² has approximately a chi-square distribution.
- Recall the properties of the chi-square distribution:
- 1. χ² is nonnegative in value; it is zero or positively valued.
- 2. χ² is not symmetrical; it is skewed to the right.
- 3. χ² is distributed so as to form a family of distributions, a separate distribution for each different number of degrees of freedom.
147Critical values for chi-square
- 1. See Table.
- 2. Identified by degrees of freedom (df) and the area under the curve to the right of the critical value.
- 3. χ²(df, α) = critical value of a chi-square distribution with df degrees of freedom and area α to the right.
- 4. The chi-square distribution is not symmetrical; critical values associated with right and left tails are given separately.
148Example: Find χ²(16, 0.05).
From the table: χ²(16, 0.05) = 26.3
149Testing Procedure
- 1. H0: The probabilities p1, p2, ..., pk are correct. Ha: At least two probabilities are incorrect.
- 2. Test statistic: χ² = Σ (O_i - E_i)² / E_i
- 3. Use a one-tailed critical region: the right-hand tail.
- 4. Degrees of freedom: df = k - 1.
- 5. Expected frequencies: E_i = n p_i.
- 6. To ensure a good approximation to the chi-square distribution, each expected frequency should be at least 5.
150Example: A market research firm conducted a consumer-preference experiment to determine which of 5 new breakfast cereals was the most appealing to adults. A sample of 100 consumers tried each cereal and indicated the cereal he or she preferred. The results are given in the following table.
Is there any evidence to suggest the consumers had a preference for one cereal, or did they indicate each cereal was equally likely to be selected? Use α = 0.05.
151Solution:
- If no preference was shown, we expect the 100 consumers to be equally distributed among the 5 cereals. Thus, if no preference is given, we expect (100)(0.2) = 20 consumers in each class.
- 1. The Set-up:
- a. Population parameter of concern: preference for each cereal, the probability that a particular cereal is selected.
- b. The null and alternative hypotheses:
H0: There was no preference shown (equally distributed).
Ha: There was a preference shown (not equally distributed).
- 2. The Hypothesis Test Criteria:
- a. Assumptions: The 100 consumers represent a random sample.
- b. Test statistic: χ² with df = k - 1 = 5 - 1 = 4
- c. Level of significance: α = 0.05
1523. The Sample Evidence:
- a. Sample information: table given in the statement of the problem.
- b. Calculate the value of the test statistic: χ² = 3.2
1534. The Probability Distribution (Classical Approach):
- a. Critical value: χ²(k - 1, 0.05) = χ²(4, 0.05) = 9.49
- b. χ² = 3.2 is not in the critical region.
4. The Probability Distribution (p-Value Approach):
- a. The p-value: using a computer, P = 0.5429.
- b. The p-value is larger than the level of significance, α.
5. The Results:
- a. Decision: Fail to reject H0.
- b. Conclusion: At the 0.05 level of significance, there is no evidence to suggest the consumers showed a preference for any one cereal.
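The goodness-of-fit test in scipy (a sketch; the observed counts below are hypothetical, chosen only so that they sum to 100 and give the slide's chi-square of 3.2, since the original table was not transcribed):

from scipy.stats import chisquare

observed = [26, 16, 22, 18, 18]           # hypothetical counts for 100 consumers
result = chisquare(observed)              # expected defaults to equal counts (20 each)
print(result.statistic, result.pvalue)    # 3.2, p ~ 0.52 (the slide reports 0.5429)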
154r × c Contingency Table
- r = number of rows; c = number of columns.
- Used to test the independence of the row factor and the column factor.
- Degrees of freedom: df = (r - 1)(c - 1).
- n = grand total.
- 5. Expected frequency in the ith row and jth column: E_i,j = (R_i × C_j) / n. Each E_i,j should be at least 5.
- 6. R1, R2, ..., Rr and C1, C2, ..., Cc are the marginal totals.
155Contingency table showing sample results and expected values
1564. The Probability Distribution (Classical Approach):
- a. Critical value: χ²(4, 0.01) = 13.3
- b. χ² is in the critical region.
4. The Probability Distribution (p-Value Approach):
- a. The p-value: by computer, P = 0.0068.
- b. The p-value is smaller than the level of significance, α.
5. The Results:
- a. Decision: Reject H0.
- b. Conclusion: There is evidence to suggest that opinion on tax reform and political party are not independent.
157ANOVA
158From t to F
- In the independent-samples t test, you learned how to use the t distribution to test the hypothesis of no difference between two population means.
- Suppose, however, that we wish to know about the relative effect of three or more different treatments?
159From t to F
- We could use the t test to make comparisons among each possible combination of two means.
- However, this method is inadequate in several ways:
- It is tedious to compare all possible combinations of groups.
- Any statistic that is based on only part of the evidence (as is the case when any two groups are compared) is less stable than one based on all of the evidence.
- There are so many comparisons that some will be significant by chance.
160From t to F
- What we need is some kind of survey test that will tell us whether there is any significant difference anywhere in an array of categories.
- If it tells us no, there will be no point in searching further.
- Such an overall test of significance is the F test, or the analysis of variance (ANOVA).
161The logic of ANOVA
- Hypothesis testing in ANOVA is about whether the means of the samples differ more than you would expect if the null hypothesis were true.
- This question about means is answered by analyzing variances.
- Among other reasons, you focus on variances because when you want to know how several means differ, you are asking about the variances among those means.
162Two Sources of Variability
- In ANOVA, an estimate of variability between groups is compared with variability within groups.
- Between-group variation is the variation among the means of the different treatment conditions, due to chance (random sampling error) and treatment effects, if any exist.
- Within-group variation is the variation due to chance (random sampling error) among individuals given the same treatment.
163Variability Between Groups
- There is a lot of variability from one mean to the next.
- Large differences between means probably are not due to chance.
- It is difficult to imagine that all six groups are random samples taken from the same population.
- The null hypothesis is rejected, indicating a treatment effect in at least one of the groups.
164Variability Within Groups
- Same amount of variability between group means.
- However, there is more variability within each group.
- The larger the variability within each group, the less confident we can be that we are dealing with samples drawn from different populations.
165The F Ratio
166Two Sources of Variability
167Two Sources of Variability
168The F Ratio
F = MS_between / MS_within (the ratio of mean squares between to mean squares within)
169The F Ratio
MS_between = SS_between / df_between and MS_within = SS_within / df_within: each mean square is a sum of squares divided by its degrees of freedom.
170The F Ratio
SS_total = SS_between + SS_within; df_total = df_between + df_within.
171The F Ratio SS Between
SS_between = Σ(T²/n_group) - G²/N: find each group total T, square it, and divide by the number of subjects in the group; then subtract the squared grand total G² (add all of the scores together, then square the total) divided by the total number of subjects N.
172The F Ratio SS Within
SS_within = ΣX² - Σ(T²/n_group): square each individual score and then add up all of the squared scores; then subtract each squared group total divided by the number of subjects in that group.
173The F Ratio SS Total
SS_total = ΣX² - G²/N: square each score, then add all of the squared scores together; subtract the squared grand total (add all of the scores together, then square the total) divided by the total number of subjects.
174An Example ANOVA
- A study compared the intensity of pain among three groups of treatment.
- Determine the significance of the difference among groups, using the .05 level of significance.
Treatment 1: 7, 6, 5, 6
Treatment 2: 12, 8, 9, 11
Treatment 3: 8, 10, 12, 10
175An Example ANOVA
- State the research hypothesis: Do ratings of the intensity of pain differ for the three treatments?
- State the statistical hypothesis: H0: μ1 = μ2 = μ3
- In testing the hypothesis of no difference
between two means, a distinction was made between
directional and nondirectional alternative
hypotheses. - Such a distinction no longer makes sense when the
number of means exceeds two. - A directional test is possible only in situations
where there are only two ways (directions) that
the null hypothesis could be false. - H0 may be false in any number of ways.
- Two or more group means may be alike and the
remainder differ, all may be different, and so on.
177Degrees of Freedom
178An Example ANOVA
179An Example ANOVA
180An Example ANOVA
- Calculate the test statistic. Grand Total = 104, N = 12, so ΣX² = 964 and G²/N = 104²/12 = 901.33.
181An Example ANOVA
- SS_between = (24² + 40² + 40²)/4 - 901.33 = 42.67; SS_within = 964 - 944 = 20; SS_total = 964 - 901.33 = 62.67.
182An Example ANOVA
183An Example ANOVA
- Determine if your result is significant: Reject H0, since 9.61 > 4.26.
- Interpret your results: There is a significant difference between the treatments.
- ANOVA Summary Table (in the literature, the ANOVA results are often summarized in a table):
Source           df    SS      MS      F
Between Groups    2    42.67   21.34   9.61
Within Groups     9    20.00    2.22
Total            11    62.67
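The whole analysis in one call (a sketch; scipy assumed), using the pain data from the earlier slide:

from scipy.stats import f_oneway

t1 = [7, 6, 5, 6]
t2 = [12, 8, 9, 11]
t3 = [8, 10, 12, 10]
F, p = f_oneway(t1, t2, t3)
print(F, p)   # F ~ 9.61 > 4.26 (the .05 critical value for df = 2, 9), so reject H0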
184After the F Test
- When an F turns out to be significant, we know, with some degree of confidence, that there is a real difference somewhere among our means.
- But if there are more than two groups, we don't know where that difference is.
- Post hoc tests have been designed for doing pair-wise comparisons after a significant F is obtained.
185Exercise 6 ANOVA
- A psychologist interested in artistic preference randomly assigns a group of 15 subjects to one of three conditions in which they view a series of unfamiliar abstract paintings.
- The 5 participants in the "famous" condition are led to believe that these are each famous paintings.
- The 5 participants in the "critically acclaimed" condition are led to believe that these are paintings that are not famous but are highly thought of by a group of professional art critics.
- The 5 in the control condition are given no special information about the paintings.
- Does what people are told about paintings make a difference in how well they are liked? Use the .01 level of significance.
186Linear and non-linear models
187Review linear regression
- Simplest form: fit a straight line through data points (x_i, y_i), i = 1, ..., n, with n > 2:
- y = ax + b
- x: predictor
- y: predicted value (outcome)
- a: slope
- b: y-axis intercept
- Goal: determine parameters a and b
188Review linear regression
Find values for a and b such that the sum of squared errors R = Σ(y_i - (a x_i + b))² is minimized.
189Review linear regression
Predicted values: ŷ = ax + b; measurements: y; minimize R = Σ(y_i - ŷ_i)².
A minimum of a function (R) is characterized by a zero first derivative with respect to the parameters: ∂R/∂a = 0 and ∂R/∂b = 0.
190Intermezzo minimum of function
191Review linear regression
A minimum of a function (R) is characterized by a zero first derivative with respect to the parameters → this provides the parameter values for the model function.
192Review linear regression
a = (n Σx_i y_i - Σx_i Σy_i) / (n Σx_i² - (Σx_i)²) and b = ȳ - a x̄
Explicit expressions for parameters a and b!
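These explicit expressions are easy to verify numerically (a sketch, assuming numpy; the x, y data are invented for illustration):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # hypothetical predictor
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])   # hypothetical outcome
n = len(x)
a = (n * np.sum(x * y) - x.sum() * y.sum()) / (n * np.sum(x**2) - x.sum()**2)
b = y.mean() - a * x.mean()
print(a, b)
print(np.polyfit(x, y, 1))   # numpy's least-squares fit gives the same (a, b)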
193Linear and nonlinear models 1
- (Non)linear refers to linearity in the parameters (α, β, γ).
- Examples of linear models:
- y = α + βx (linear)
- y = α + βx + γx² (polynomial)
- y = α + β log(x) (log)
194Example
y varies linearly with α for fixed x
195Example
y varies linearly with α for fixed x
196Linear and nonlinear models 2
- y = β0 + β1x1 + β2x2 + ε
- linear model (in the parameters)
- y is a linear combination of the x's
- y = β0 + β1(1/x1) + β2x2 + ε
- y is not a linear combination of the x's, but the model is linear in the parameters
- We can use MLR if the variables are transformed: x1' = 1/x1, x2' = x2, giving y = β0 + β1x1' + β2x2' + ε
197Linear and nonlinear models 3
- Some models cannot be linearized and must be solved with nonlinear regression techniques.
198Linear and nonlinear models 4
- Nonlinear model: at least one of the derivatives of the function with respect to the parameters depends on at least one of the parameters (thus, the slope at fixed x is not constant).
y = β log(x): dy/dβ = log(x) → linear model
y = β0 + β1x1 + β2x2: dy/dβ1 = x1 → linear model
y = e^(βx): dy/dβ = x e^(βx) depends on β → nonlinear model
199Significance testing and multiple testing
correction
200Multiple testing
- Say that you perform a statistical test with a 0.05 threshold, but you repeat the test on twenty different observations.
- Assume that all of the observations are explainable by the null hypothesis.
- What is the chance that at least one of the observations will receive a p-value less than 0.05?
201Multiple testing
- Say that you perform a statistical test with a 0.05 threshold, but you repeat the test on twenty different observations. Assuming that all of the observations are explainable by the null hypothesis, what is the chance that at least one of the observations will receive a p-value less than 0.05?
- Pr(making a mistake) = 0.05
- Pr(not making a mistake) = 0.95
- Pr(not making any mistake) = 0.95^20 = 0.358
- Pr(making at least one mistake) = 1 - 0.358 = 0.642
- There is a 64.2% chance of making at least one mistake.
202Percentage sugar in candy (process 1) vs. percentage sugar in candy (process 2): no true difference.
A statistical test (alpha = 0.05) on 100 candy bars from each process has a 5% chance of finding a difference (e.g. p = 0.003).
Suppose the company is required to do an expensive tuning of process 2 if a difference is found. They are willing to accept a Type 1 error of 5%; thus only a 5% chance of making the wrong decision.
203Percentage sugar in candy (process 1) vs. percentage sugar in candy (process 2): no true difference.
Day 1: statistical test (alpha = 0.05)
Day 2: statistical test (alpha = 0.05)
...
Day 20: statistical test (alpha = 0.05)
Chance of 64.2% of finding at least one significant difference: the overall Type 1 error is 64.2%.
204Bonferroni correction
- Assume that individual tests are independent.
- Divide the desired p-value threshold by the number of tests performed.
- For the previous example: 0.05 / 20 = 0.0025.
- Pr(making a mistake) = 0.0025
- Pr(not making a mistake) = 0.9975
- Pr(not making any mistake) = 0.9975^20 = 0.9512
- Pr(making at least one mistake) = 1 - 0.9512 = 0.0488
- meaning that the probability of one of the total number of tests being wrongfully called significantly different is of magnitude alpha (0.0488).
- This is also known as correcting for the Family-Wise Error (FWE). It is clear, though, that this greatly increases the beta error (false negatives): many tests that should show an effect fall below the corrected threshold.
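Both the uncorrected and Bonferroni-corrected family-wise error rates are one-liners in plain Python:

alpha, m = 0.05, 20
print(1 - (1 - alpha)**m)       # uncorrected FWE: 0.642
threshold = alpha / m           # Bonferroni per-test threshold: 0.0025
print(1 - (1 - threshold)**m)   # corrected FWE: ~0.0488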
205Percentage sugar in candy (process 1) vs. percentage sugar in candy (process 2): no difference.