Title: Statistical Inference
1 Biostatistics Academic Preview Session 3
Statistical Inference
2 Outline
- The what and why of statistical inference
- Statistical estimation and confidence intervals
- Statistical significance tests
3 Statistical Inference
4 Statistical Inference
- Estimation: the process by which sample data are used to indicate the value of an unknown quantity in the population. Results can be expressed as
- Point estimate
- Confidence intervals
- Significance tests
- P-values
5 Statistical Estimation
6 Confidence intervals (cont)
- A confidence interval is a range of values in which a population parameter may lie.
- The level of confidence is the probability that the method used to build the interval produces a range containing the population parameter.
7 90% and 95% confidence intervals for µ
8 Confidence intervals (cont)
- Formula
- Estimate ± margin of error
- Steps
- Calculate the sample statistic to use as an estimate of the population parameter
- Calculate the margin of error based on the distribution of the sample statistic
- Calculate the lower (LL) and the upper (UL) limits of the confidence interval
- Write out the confidence interval
- LL < population parameter < UL, or (LL, UL)
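As a rough illustration of the four steps, the Python sketch below builds an interval as estimate ± margin of error, where the margin is a standard normal critical value times the standard error. The function name `confidence_interval`, the large-sample normal assumption, and the numbers 50.0 and 2.0 are assumptions made for illustration; they do not come from the slides.

```python
from statistics import NormalDist

def confidence_interval(estimate, std_error, confidence=0.95):
    """Return (LL, UL) for estimate +/- margin, assuming a normal sampling distribution."""
    # Critical value that leaves (1 - confidence) / 2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Margin of error based on the distribution of the sample statistic
    margin = z_star * std_error
    # Lower and upper limits of the interval
    return estimate - margin, estimate + margin

# Hypothetical estimate and standard error, for illustration only
ll, ul = confidence_interval(estimate=50.0, std_error=2.0)
print(f"({ll:.2f}, {ul:.2f})")
```

For small samples a t critical value would normally replace the normal one; the structure of the calculation stays the same.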
9 Understanding Confidence Intervals
- A confidence interval for a population parameter is a set of plausible values of the parameter that could have generated the observed data as a likely outcome
- The level of confidence tells the probability the method produced an interval that includes the unknown parameter
- Narrow widths and high confidence levels are desirable, but these two things affect each other
- Higher confidence → wider confidence limits (all else held constant)
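The trade-off between confidence and width can be seen numerically. The sketch below assumes a normal sampling distribution and a hypothetical standard error of 1.0 (neither is taken from the slides) and compares interval widths at three confidence levels.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided standard normal critical value for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

std_error = 1.0  # hypothetical standard error, held constant across levels
widths = {}
for level in (0.90, 0.95, 0.99):
    # Width of the interval = 2 * margin of error
    widths[level] = 2 * z_critical(level) * std_error
    print(f"{level:.0%} interval width: {widths[level]:.2f}")
```

With everything else held constant, the width grows as the confidence level rises, because the critical value grows.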
10 Case Study 1
- Construct a 95% confidence interval for the mean birth weight in grams
- Construct a 95% confidence interval for the percent of women who deliver low birth weight babies
11 Case Study 1 Example 1
- 95% confidence interval for the mean birth weight in grams
- Calculate the sample mean: x̄ = 2944.7
- Calculate the margin of error: m = 104.6
- Calculate the lower and upper limits of the confidence interval
- LL = 2944.7 - 104.6 = 2840.1
- UL = 2944.7 + 104.6 = 3049.3
- Confidence interval
- (2840.1, 3049.3)
- 2840.1 < µ < 3049.3
I am 95% confident that the mean birth weight for women in Springfield, Massachusetts in 1986 is between 2840.1 and 3049.3 grams.
12 Case Study 1 Example 2
- 95% confidence interval for the percent of women who deliver low birth weight babies
- Calculate the sample proportion: p̂ = 0.312
- Calculate the margin of error: m = 0.066
- Calculate the lower and upper limits of the confidence interval
- LL = 0.312 - 0.066 = 0.246
- UL = 0.312 + 0.066 = 0.378
- Confidence interval: (0.246, 0.378)
I am 95% confident that the proportion of women giving birth to low birth weight babies in Springfield, Massachusetts in 1986 is between 0.246 and 0.378.
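The slide reports the margin of error without showing how it is obtained. One common choice for a proportion is the large-sample (Wald) margin, z* × sqrt(p̂(1 - p̂)/n); whether the slides used this formula is an assumption. The sample size n = 189 below is also an assumption: the slides do not state n, and this value is chosen only because it roughly reproduces the reported margin.

```python
from math import sqrt
from statistics import NormalDist

def proportion_margin(p_hat, n, confidence=0.95):
    """Large-sample (Wald) margin of error for a sample proportion."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.312  # sample proportion from the slide
n = 189        # ASSUMPTION: sample size is not given in the slides

m = proportion_margin(p_hat, n)
ll, ul = p_hat - m, p_hat + m
print(f"m = {m:.3f}, CI = ({ll:.3f}, {ul:.3f})")
```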
13 Case study 1
- In order to determine if
- There is a difference in the mean birth weight between smokers and non-smokers
- There is a difference in the proportion of low birth weight babies between white women and black women
- We need to calculate a confidence interval around the difference in the estimates between the two groups in the aforementioned examples
14 Confidence intervals (cont)
- Formula
- [Estimate (group 1) - estimate (group 2)] ± margin of error
- Steps
- Calculate the sample statistics to use as an estimate of the population parameter
- Calculate the margin of error based on the distributions of the sample statistics
- Calculate the lower (LL) and the upper (UL) limits of the confidence interval
- Write out the confidence interval
- LL < population parameter < UL, or (LL, UL)
15 Case Study 1 Example 3
- 95% confidence interval for the difference in the percent of women who deliver low birth weight babies between whites and blacks
- Calculate the difference in sample proportions (white - black): -.1835
- Calculate the margin of error: m = .2085
- Calculate the lower and upper limits of the confidence interval
- LL = -.1835 - .2085 = -.392
- UL = -.1835 + .2085 = .025
- Confidence interval: (-.392, .025)
I am 95% confident that the difference in the proportion of white women and black women giving birth to low birth weight babies in Springfield, Massachusetts in 1986 is between -.392 and .025.
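A quick way to read an interval for a difference is to check whether it contains 0: if it does, the data are compatible with no difference between the groups at that confidence level. The sketch below redoes the arithmetic with the slide's numbers; the helper names `difference_interval` and `contains_zero` are hypothetical.

```python
def difference_interval(diff, margin):
    """Return (LL, UL) for a difference in estimates +/- margin of error."""
    return diff - margin, diff + margin

def contains_zero(interval):
    """True when 0 lies between the lower and upper limits (inclusive)."""
    lower, upper = interval
    return lower <= 0 <= upper

# Difference and margin of error as reported on the slide
ci = difference_interval(-0.1835, 0.2085)
print(tuple(round(limit, 3) for limit in ci), contains_zero(ci))
```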
16 Significance/Hypothesis Testing
- Statistical inference that allows one to test a claim about a population parameter.
- Using information from your study sample, you can test any desired claim.
The next set of slides is drafted from the following reference: Elementary Statistics by Larson and Farber, 2nd edition.
17 A statistical hypothesis is a claim about a population.
Alternative hypothesis Ha: contains a statement of inequality such as <, ≠ or >
Null hypothesis H0: contains a statement of equality such as ≥, = or ≤
H0 and Ha are complements: each says of the other, "If I am false, you are true."
18 Writing Hypotheses
Write the claim about the population. Then, write its complement. Either hypothesis, the null or the alternative, can represent the claim.
- A hospital claims its ambulance response time is less than 10 minutes.
H0: µ ≥ 10 min
Ha: µ < 10 min (claim)
- A consumer magazine claims the proportion of cell phone calls made during evenings and weekends is at most 60%.
H0: p ≤ 0.60 (claim)
Ha: p > 0.60
19 Hypothesis Test Strategy
- Begin by assuming the equality condition in the null hypothesis is true. This is regardless of whether the claim is represented by the null hypothesis or by the alternative hypothesis.
- Collect data from a random sample taken from the population and calculate the necessary sample statistics.
- If the sample statistic has a low probability of being drawn from a population in which the null hypothesis is true, you will reject H0. (As a consequence, you will support the alternative hypothesis.)
- If the probability is not low enough, fail to reject H0.
20 Underlying Rationale of Hypotheses Testing
- If, under a given assumption, the probability of getting the observed sample is exceptionally small, we conclude that the assumption is probably not correct.
- When testing a claim, we make an assumption (null hypothesis) that contains equality. We then compare the assumption and the sample results and we form one of the following conclusions:
21 Underlying Rationale of Hypotheses Testing
- If the sample results can easily occur when the assumption (null hypothesis) is true, we attribute the relatively small discrepancy between the assumption and the sample results to chance.
- If the sample results cannot easily occur when that assumption (null hypothesis) is true, we explain the relatively large discrepancy between the assumption and the sample by concluding that the assumption is not true.
22 Errors and Level of Significance
                      Actual Truth of H0
Decision              H0 True             H0 False
Do not reject H0      Correct decision    Type II Error
Reject H0             Type I Error        Correct decision
A type I error: the null hypothesis is actually true but the decision is to reject it.
A type II error: the null hypothesis is actually false but the decision is to not reject it.
Level of significance, α: the maximum probability of committing a type I error.
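The meaning of α as a type I error rate can be checked by simulation. The sketch below is an illustration under stated assumptions, not part of the original slides: it repeatedly draws samples from a population where H0 (µ = 0, with known σ = 1) is true, runs a two-tail z-test at α = .05, and counts how often H0 is wrongly rejected. The sample size, number of trials and seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=30, trials=5000, seed=1):
    """Estimate how often a two-tail z-test rejects a true H0: mu = 0."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: data come from N(0, 1)
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sample_mean = sum(sample) / n
        # Standardize using the known sigma = 1
        z = sample_mean / (1.0 / sqrt(n))
        if abs(z) > z_star:
            rejections += 1  # a type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"Observed rejection rate under a true H0: {rate:.3f}")
```

The observed rate should land close to α, since every rejection here is a type I error.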
23 Types of Hypothesis Tests
Right-tail test: Ha: µ > value
Left-tail test: Ha: µ < value
Two-tail test: Ha: µ ≠ value
Hypothesis tests can also be one-sample or two-sample.
24 P-Values
The P-value is the probability, assuming the null hypothesis is true, of obtaining a sample statistic with a value as extreme or more extreme than the one determined by the sample data.
One-tail test: P-value = the area in the tail beyond the test statistic
Two-tail test: if z is negative, P-value = twice the area in the left tail; if z is positive, twice the area in the right tail
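The tail areas described above can be computed from the standard normal distribution. The following is a minimal sketch; the function name `p_value_from_z`, the `tail` argument, and the example value z = -1.96 are hypothetical and chosen for illustration.

```python
from statistics import NormalDist

def p_value_from_z(z, tail="two"):
    """P-value for a standardized test statistic z under a standard normal."""
    std_normal = NormalDist()
    if tail == "left":
        return std_normal.cdf(z)            # area to the left of z
    if tail == "right":
        return 1 - std_normal.cdf(z)        # area to the right of z
    if tail == "two":
        return 2 * std_normal.cdf(-abs(z))  # twice the area in the nearer tail
    raise ValueError("tail must be 'left', 'right' or 'two'")

# Hypothetical test statistic
for tail in ("left", "right", "two"):
    print(tail, round(p_value_from_z(-1.96, tail), 4))
```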
25 Test Decisions with P-values
The decision about whether there is enough evidence to reject the null hypothesis can be made by comparing the P-value to the value of α, the level of significance of the test.
If P ≤ α, reject the null hypothesis.
If P > α, fail to reject the null hypothesis.
26 Understanding p-values
- A p-value tells you the chance of getting a statistic as extreme or more extreme than the one calculated for the sample, assuming the null hypothesis is true
- A p-value measures the strength of evidence against the null hypothesis
- The smaller the p-value, the more convincing the evidence is against the null hypothesis
- A p-value is not
- The probability that the null hypothesis is true/false
- The effect size
- The significance of results
27 Interpreting the Decision
Decision: Reject H0
- Claim is H0: There is enough evidence to reject the claim.
- Claim is Ha: There is enough evidence to support the claim.
Decision: Fail to reject H0
- Claim is H0: There is not enough evidence to reject the claim.
- Claim is Ha: There is not enough evidence to support the claim.
28 Steps in a Hypothesis Test
1. Write the null and alternative hypothesis
Write H0 and Ha as mathematical statements.
Remember that H0 always contains the equality symbol (=, ≤ or ≥).
2. State the level of significance
This is the maximum probability of rejecting the
null hypothesis when it is actually true. (Making
a type I error.)
3. Identify the sampling distribution
The sampling distribution is the distribution for
the test statistic assuming that the equality
condition in H0 is true and that the experiment
is repeated an infinite number of times.
29 Steps in a Hypothesis Test (cont)
4. Find the test statistic
Perform the calculations to standardize your
sample statistic.
5. Calculate the P-value for the test statistic
This is the probability of obtaining your test
statistic or one that is more extreme from the
sampling distribution.
30 Steps in a Hypothesis Test (cont)
6. Make your decision
If the P-value is less than or equal to α (the level of significance), reject H0. If the P-value is greater than α, fail to reject H0.
7. Interpret your decision
- If the claim is the null hypothesis, you will either reject the claim or determine there is not enough evidence to reject the claim.
- If the claim is the alternative hypothesis, you will either support the claim or determine there is not enough evidence to support the claim.
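The seven steps can be traced in a short program. The sketch below is one possible illustration, assuming a large-sample, left-tail z-test for a mean in which the claim is the alternative hypothesis (in the spirit of the ambulance response time claim). The function name and the summary statistics (mean 9.2, standard deviation 2.4, n = 36) are hypothetical and are not data from the slides.

```python
from math import sqrt
from statistics import NormalDist

def left_tail_z_test(sample_mean, sample_sd, n, mu0, alpha=0.05):
    """Large-sample test of H0: mu >= mu0 against Ha: mu < mu0 (the claim)."""
    # Steps 1-2: the hypotheses are stated above; alpha is passed in.
    # Step 3: the standardized sample mean is treated as standard normal.
    sampling_dist = NormalDist()
    # Step 4: standardize under the equality condition mu = mu0.
    z = (sample_mean - mu0) / (sample_sd / sqrt(n))
    # Step 5: left-tail P-value.
    p_value = sampling_dist.cdf(z)
    # Step 6: decision.
    reject = p_value <= alpha
    # Step 7: interpretation when the claim is Ha.
    if reject:
        meaning = "There is enough evidence to support the claim."
    else:
        meaning = "There is not enough evidence to support the claim."
    return z, p_value, reject, meaning

# Hypothetical summary statistics, for illustration only
z, p_value, reject, meaning = left_tail_z_test(9.2, 2.4, 36, mu0=10)
print(f"z = {z:.2f}, P = {p_value:.4f}, reject H0: {reject}")
print(meaning)
```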
31 Case study 1
- Perform a hypothesis test to determine if the proportion of women giving birth to low birth weight babies is less than .50.
- Perform a hypothesis test to determine if there is a difference in the mean birth weight between smokers and non-smokers.
32 Case Study 1 Example 4
- Step 1: Write out the null and alternative hypothesis
- Ho: p = .50
- Ha: p < .50
- Step 2: Level of significance: α = .05
- Step 3: Skip (normal distribution)
- Step 4: Test statistic: z = -5.16
- Step 5: p-value < .0001
- Step 6: Decision: we reject Ho because p-value < alpha
- Step 7: Conclusion: There is strong evidence to suggest that the proportion of mothers giving birth to low birth weight babies in Springfield, Massachusetts in 1986 is less than .50
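The slide does not show the calculation behind the test statistic. A plausible reconstruction, offered only as a sketch, is the one-sample z-test for a proportion, using the sample proportion 0.312 from Example 2 and the null value .50. The sample size n = 189 is an assumption (the slides do not state it), so the result may differ slightly from the reported statistic.

```python
from math import sqrt
from statistics import NormalDist

p_hat = 0.312  # sample proportion reported in Example 2
p0 = 0.50      # value of p under the null hypothesis
alpha = 0.05   # level of significance
n = 189        # ASSUMPTION: sample size is not given in the slides

# The standard error uses p0 because the equality condition in Ho is assumed true
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Left-tail test (Ha: p < .50), so the P-value is the area to the left of z
p_value = NormalDist().cdf(z)

decision = "reject Ho" if p_value <= alpha else "fail to reject Ho"
print(f"z = {z:.2f}, P-value < .0001: {p_value < 0.0001}, decision: {decision}")
```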
33 Case Study 1 Example 5
- Step 1: Write out the null and alternative hypothesis
- Ho: µsmokers - µnon-smokers = 0
- Ha: µsmokers - µnon-smokers ≠ 0
- Step 2: Level of significance: α = .05
- Step 3: Skip (t-distribution)
- Step 4: Test statistic: t = -2.634
- Step 5: p-value = .0092
- Step 6: Decision: we reject Ho because p-value < alpha
- Step 7: Conclusion: There is strong evidence to suggest that there is a difference in the mean birth weight between smokers and non-smokers for women in Springfield, Massachusetts in 1986.
34 Next session
- Results of our in-class survey