Title: Sampling and estimation 2
1. Sampling and estimation 2
- Tron Anders Moger
- 27.09.2006
2. Confidence intervals (rep.)
- Assume that $X_1, \dots, X_n$ are a random sample from a normal distribution with mean $\mu$ and known variance $\sigma^2$
- Recall that the sample mean $\bar{X}$ has expected value $\mu$ and variance $\sigma^2/n$
- The interval $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ is called a 95% confidence interval for $\mu$
- Means that an interval constructed this way will contain the population mean 95% of the time
- Often interpreted as if we are 95% certain that the population mean lies in this interval
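As a minimal sketch of the interval above, the following Python snippet computes $\bar{X} \pm z\,\sigma/\sqrt{n}$ with the standard library only. The sample values and the value of sigma are hypothetical, and `normal_ci` is an illustrative helper name, not something from the lecture.

```python
from math import sqrt
from statistics import NormalDist, mean

def normal_ci(sample, sigma, confidence=0.95):
    """Confidence interval for the mean of a normal sample with known sigma."""
    n = len(sample)
    # Upper quantile of the standard normal; about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    centre = mean(sample)
    return centre - half_width, centre + half_width

# Hypothetical data and sigma, for illustration only.
sample = [4.8, 5.1, 5.3, 4.9, 5.4, 5.0, 5.2, 4.7, 5.1]
low, high = normal_ci(sample, sigma=0.3)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The half-width depends only on sigma, n and the confidence level, not on the observed values.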
3. Hypothesis testing (rep.)
- Have a data sample
- Would like to test if there is evidence that a parameter value calculated from the data is different from the value in a null hypothesis H0
- If so, means that H0 is rejected in favour of some alternative H1
- Have to construct a test statistic
- It must
  - Have a higher probability for extreme values under H1 than under H0
  - Have a known distribution under H0 (when H0 is simple)
4. Two important quantities
- P-value: the probability of the observed value, or something more extreme, assuming the null hypothesis is true
- Significance level $\alpha$: the threshold for the p-value below which we reject H0
- If the value of the test statistic is too extreme, then H0 is rejected
- Rejecting when P-value < 0.05: We want the probability of seeing a difference at least this large by chance alone, when H0 is true, to be below 5%, or, equivalently
- We want to be 95% sure that we do not reject H0 when it is true in reality
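To make the two quantities concrete, here is a small sketch that turns a test statistic into a two-sided p-value and compares it with the significance level. It assumes the test statistic is standard normal under H0; the observed values of z are hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. the two-sided p-value."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05  # significance level
for z in (1.0, 1.96, 2.5):  # hypothetical observed test statistics
    p = two_sided_p_value(z)
    print(f"z = {z:4.2f}  p-value = {p:.4f}  reject H0: {p < alpha}")
```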
5. Note
- There is an asymmetry between H0 and H1: in fact, if the data is inconclusive, we end up not rejecting H0.
- If H0 is true, the probability of rejecting H0 is (say) 5%. That DOES NOT MEAN we are 95% certain that H0 is true!
- How much evidence we have for choosing H1 over H0 depends entirely on how much more probable rejection is if H1 is true.
6. Errors of types I and II
- The above can be seen as a decision rule for H0 or H1.
- For any such rule we can compute (if both H0 and H1 are simple hypotheses):

             H0 true                                   H1 true
Accept H0    $1 - \alpha$                              TYPE II error (probability $\beta$)
Reject H0    TYPE I error (significance level $\alpha$)   Power $1 - \beta$
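The four cells of such a table can be computed directly when both hypotheses are simple. The sketch below does this for a hypothetical setup (normal data with known sigma, H0: mu = 0 against H1: mu = 1, and the rule "reject H0 if the sample mean exceeds a cutoff"); all numbers are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical setup: H0: mu = 0 versus H1: mu = 1, sigma = 2, n = 16.
mu0, mu1, sigma, n = 0.0, 1.0, 2.0, 16
se = sigma / sqrt(n)  # standard deviation of the sample mean
cutoff = 0.9          # decision rule: reject H0 if the sample mean > cutoff

alpha = 1 - NormalDist(mu0, se).cdf(cutoff)  # P(reject H0 | H0 true), type I error
beta = NormalDist(mu1, se).cdf(cutoff)       # P(accept H0 | H1 true), type II error

print(f"Accept H0: 1 - alpha = {1 - alpha:.4f}   beta = {beta:.4f}")
print(f"Reject H0: alpha = {alpha:.4f}       power = {1 - beta:.4f}")
```

Moving the cutoff trades one error for the other: a larger cutoff lowers alpha but raises beta.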
7. Sample size computations
- For a sample from a normal population with known variance, the width of the confidence interval for the mean (at a given confidence level) depends only on the sample size.
- So we can compute the necessary sample size to match a required accuracy
- Note: If the variance is unknown, it must somehow be estimated beforehand to do the computation
- Works also for population proportion estimation, giving an inequality for the required sample size
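A hedged sketch of the computation: requiring the half-width $z_{\alpha/2}\,\sigma/\sqrt{n}$ to be at most a margin $E$ gives $n \ge (z_{\alpha/2}\,\sigma/E)^2$, and for a proportion $n \ge z_{\alpha/2}^2\,P(1-P)/E^2$, which is largest at $P = 0.5$. The function names and the numbers used below are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin (known sigma)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)

def sample_size_for_proportion(margin, confidence=0.95, p=0.5):
    """Smallest n with z * sqrt(p(1-p)/n) <= margin; p = 0.5 is the worst case."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

# Hypothetical requirements, for illustration only.
print(sample_size_for_mean(sigma=10, margin=2))
print(sample_size_for_proportion(margin=0.05))
```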
8. Power computations
- If you reject H0, you know very little about the evidence for H1 versus H0 unless you study the power of the test.
- The power is the probability of rejecting H0 given that a hypothesis in H1 is true, i.e. 1 minus the probability $\beta$ of a type II error ($1 - \beta$).
- Thus it is a function of the possible hypotheses in H1.
- We would like our tests to have as high power as possible.
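To illustrate power as a function of the hypotheses in H1, the sketch below evaluates the power of a two-sided z-test of H0: mu = mu0 at several alternative means. It assumes normal data with known sigma; the helper name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided_z(mu, mu0, sigma, n, alpha=0.05):
    """P(reject H0: mean = mu0) when the true mean is mu (known sigma)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under the true mean, the z statistic is N(shift, 1).
    shift = (mu - mu0) / (sigma / sqrt(n))
    upper = 1 - std_normal.cdf(z_crit - shift)  # P(Z > z_crit)
    lower = std_normal.cdf(-z_crit - shift)     # P(Z < -z_crit)
    return upper + lower

# Hypothetical alternatives, sigma = 1, n = 25.
for mu in (0.0, 0.2, 0.5, 1.0):
    print(f"mu = {mu:.1f}  power = {power_two_sided_z(mu, 0.0, 1.0, 25):.3f}")
```

At mu = mu0 the function returns the significance level, and the power grows as the true mean moves away from mu0 or as n increases.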
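9. Example 1: Normal distribution with unknown variance
- Assume $X_1, \dots, X_n \sim N(\mu, \sigma^2)$, with sample mean $\bar{X}$ and sample standard deviation $s$
- Then $\dfrac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}$
- Thus $P\left(-t_{n-1,\alpha/2} < \dfrac{\bar{X} - \mu}{s/\sqrt{n}} < t_{n-1,\alpha/2}\right) = 1 - \alpha$
- So a confidence interval for $\mu$, with significance $\alpha$ (confidence level $1 - \alpha$), is given by $\bar{X} \pm t_{n-1,\alpha/2}\, s/\sqrt{n}$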
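10. Example 1 (Hypothesis testing)
- Hypotheses: $H_0: \mu = \mu_0$, $H_1: \mu \neq \mu_0$
- Test statistic under H0: $t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$
- Reject H0 if $t < -t_{n-1,\alpha/2}$ or if $t > t_{n-1,\alpha/2}$
- Alternatively, the p-value for the test can be computed (if $t > 0$) as the $\alpha$ such that $t = t_{n-1,\alpha/2}$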
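11. Example 1 (cont.)
- Hypotheses (one-sided), e.g. $H_0: \mu \le \mu_0$, $H_1: \mu > \mu_0$
- Test statistic assuming $\mu = \mu_0$: $t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$
- Reject H0 if $t > t_{n-1,\alpha}$
- Alternatively, the p-value for the test can be computed as the $\alpha$ such that $t = t_{n-1,\alpha}$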
12. Energy intake in kJ

SUBJECT  INTAKE
1        5260
2        5470
3        5640
4        6180
5        6390
6        6515
7        6805
8        7515
9        7515
10       8230
11       8770

Recommended energy intake: 7725 kJ. Want to test if it applies to the 11 women.
H0: $\mu$ (mean energy intake) $= 7725$   H1: $\mu \neq 7725$
13. From Explore in SPSS
14. Test result
- $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{6753.6 - 7725}{1142.1/\sqrt{11}} = -2.821$
- This quantity is t-distributed with 10 degrees of freedom (number of subjects - 1)
- Choose significance level $\alpha = 0.05$
- From table 8 p. 870 in the book, $t_{11-1,\,0.05/2} = 2.228$
- If H0 is true, the interval (-2.228, 2.228) covers 95% of the distribution
- Reject H0 since the test statistic is outside the interval, or, equivalently, because $|t| > t_{10,\,0.025}$
- Can't find the exact p-value from the table
- Could have had $\alpha = 0.01$ or 0.1, but 0.05 is most common
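The test can be reproduced from the 11 intake values with a few lines of Python. This is a sketch using only the standard library, so the critical value for 10 degrees of freedom is entered by hand from a t table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Energy intake in kJ for the 11 women.
intake = [5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770]
mu0 = 7725  # recommended intake, the value under H0

n = len(intake)
x_bar = mean(intake)
s = stdev(intake)  # sample standard deviation (divides by n - 1)
t_stat = (x_bar - mu0) / (s / sqrt(n))

t_crit = 2.228  # t table value for 10 degrees of freedom, two-sided 5% level
reject_h0 = abs(t_stat) > t_crit

print(f"mean = {x_bar:.1f}, sd = {s:.1f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```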
15. In SPSS: Analyze - Compare Means - One-Sample T Test
Test variable: intake
Test value: 7725
16. Differences between means
- Assume $X_1, \dots, X_{n_x} \sim N(\mu_x, \sigma_x^2)$ and $Y_1, \dots, Y_{n_y} \sim N(\mu_y, \sigma_y^2)$, all data independent
- We would like to study the difference $\mu_x - \mu_y$
- Three different cases:
  - Matched pairs
  - Unknown but equal population variances
  - Unknown and possibly different pop. variances
17. Matched pairs
- Common situation: Several measurements on each individual, or on closely related objects
- These measurements will not be independent (why?)
- Generally a problem in statistics, but simple if you only have two measurements
- The key is to use the difference between the two measurements in each pair, instead of each measurement separately
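18. Example 2: Matched pairs
- In practice, the basis is the hypothesis that $\mu_x - \mu_y = 0$
- Set $D_i = X_i - Y_i$ and $\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i$
- We get $\dfrac{\bar{D} - (\mu_x - \mu_y)}{s_D/\sqrt{n}} \sim t_{n-1}$
- Where $s_D^2 = \frac{1}{n-1}\sum_{i=1}^{n} (D_i - \bar{D})^2$
- Confidence interval for $\mu_x - \mu_y$: $\bar{D} \pm t_{n-1,\alpha/2}\, s_D/\sqrt{n}$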
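19. Example 2 (Hypothesis testing)
- Hypotheses: $H_0: \mu_x - \mu_y = 0$, $H_1: \mu_x - \mu_y \neq 0$
- Test statistic: $t = \dfrac{\bar{D}}{s_D/\sqrt{n}} \sim t_{n-1}$ under H0, where $\bar{D}$ and $s_D$ are the mean and standard deviation of the pairwise differences
- Reject H0 if $t < -t_{n-1,\alpha/2}$ or if $t > t_{n-1,\alpha/2}$
Matched pairs T test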
20. Example: Energy intake (kJ)

SUBJECT  PREMENST  POSTMENS
1        5260.0    3910.0
2        5470.0    4220.0
3        5640.0    3885.0
4        6180.0    5160.0
5        6390.0    5645.0
6        6515.0    4680.0
7        6805.0    5265.0
8        7515.0    5975.0
9        7515.0    6790.0
10       8230.0    6900.0
11       8770.0    7335.0
Number of cases read: 11    Number of cases listed: 11

Want to test if energy intake is different before and after menstruation.
H0: $\mu_{premenst} = \mu_{postmenst}$   H1: $\mu_{premenst} \neq \mu_{postmenst}$
21. Confidence interval and p-values for paired t-tests in SPSS
- Analyze - Compare Means - Paired-Samples T Test.
- Click on the two variables you want to test, and move them to Paired variables
- Conclusion: Reject H0 at the 5% significance level
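The paired test can also be checked by hand from the pre- and postmenstrual intake values. The sketch below uses the standard library only; the critical value 2.228 (10 degrees of freedom, two-sided 5% level) is taken from a t table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Energy intake in kJ for the same 11 women, before and after menstruation.
premenst = [5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770]
postmenst = [3910, 4220, 3885, 5160, 5645, 4680, 5265, 5975, 6790, 6900, 7335]

# Work with the within-pair differences, then run a one-sample t-test on them.
diffs = [pre - post for pre, post in zip(premenst, postmenst)]
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)
t_stat = d_bar / (s_d / sqrt(n))

t_crit = 2.228  # t table value for 10 degrees of freedom, two-sided 5% level
half_width = t_crit * s_d / sqrt(n)
ci = (d_bar - half_width, d_bar + half_width)

print(f"mean difference = {d_bar:.1f}, t = {t_stat:.2f}")
print(f"95% CI for the mean difference: ({ci[0]:.1f}, {ci[1]:.1f})")
```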
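22. Example 3: Unknown but equal population variances
- We get $\dfrac{\bar{X} - \bar{Y} - (\mu_x - \mu_y)}{s_p\sqrt{\frac{1}{n_x} + \frac{1}{n_y}}} \sim t_{n_x+n_y-2}$
- where $s_p^2 = \dfrac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$ is the pooled variance estimate
- Confidence interval for $\mu_x - \mu_y$: $\bar{X} - \bar{Y} \pm t_{n_x+n_y-2,\alpha/2}\, s_p\sqrt{\frac{1}{n_x} + \frac{1}{n_y}}$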
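23. Example 3 (Hypothesis testing)
- Hypotheses: $H_0: \mu_x = \mu_y$, $H_1: \mu_x \neq \mu_y$
- Test statistic: $t = \dfrac{\bar{X} - \bar{Y}}{s_p\sqrt{\frac{1}{n_x} + \frac{1}{n_y}}} \sim t_{n_x+n_y-2}$ under H0, with $s_p^2$ the pooled variance estimate
- Reject H0 if $t < -t_{n_x+n_y-2,\alpha/2}$ or if $t > t_{n_x+n_y-2,\alpha/2}$
T test with equal variances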
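A minimal sketch of the equal-variance test statistic, using the usual pooled variance estimate. The function name `pooled_t` and the two samples are hypothetical; the resulting statistic would be compared with a t table value for the returned degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y):
    """Two-sample t statistic assuming equal population variances."""
    nx, ny = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2  # statistic and degrees of freedom

# Hypothetical samples, for illustration only.
x = [6.1, 7.0, 7.5, 8.2, 9.0]
y = [8.8, 9.2, 9.7, 10.5, 11.9, 12.8]
t_stat, df = pooled_t(x, y)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```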
24. Assumptions
- Independence: All observations are independent. Achieved by taking random samples of individuals; for the paired t-test, independence is achieved by using the difference between measurements
- Normally distributed data (check histograms, tests for normal distribution, Q-Q plots)
- Equal variances or standard deviations in the groups
- Assumptions can be checked in histograms, box plots etc. (or tests for normality)
- What if the variances are unequal?
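25. Example 4: Unknown and possibly unequal population variances
- We get $\dfrac{\bar{X} - \bar{Y} - (\mu_x - \mu_y)}{\sqrt{s_x^2/n_x + s_y^2/n_y}} \sim t_\nu$ (approximately)
- where $\nu = \dfrac{\left(s_x^2/n_x + s_y^2/n_y\right)^2}{\frac{(s_x^2/n_x)^2}{n_x - 1} + \frac{(s_y^2/n_y)^2}{n_y - 1}}$
- Conf. interval for $\mu_x - \mu_y$: $\bar{X} - \bar{Y} \pm t_{\nu,\alpha/2}\sqrt{s_x^2/n_x + s_y^2/n_y}$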
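26. Example 4 (Hypothesis testing)
- Hypotheses: $H_0: \mu_x = \mu_y$, $H_1: \mu_x \neq \mu_y$
- Test statistic: $t = \dfrac{\bar{X} - \bar{Y}}{\sqrt{s_x^2/n_x + s_y^2/n_y}} \sim t_\nu$ (approximately) under H0, with $\nu$ the approximate degrees of freedom
- Reject H0 if $t < -t_{\nu,\alpha/2}$ or if $t > t_{\nu,\alpha/2}$
T test with unequal variances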
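For the unequal-variance case, a hedged sketch of the test statistic together with the Welch-Satterthwaite approximation to the degrees of freedom, which is the approximation commonly used for this test. The function name `welch_t` and the samples are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Two-sample t statistic without assuming equal variances."""
    vx = variance(x) / len(x)  # estimated variance of the mean of x
    vy = variance(y) / len(y)  # estimated variance of the mean of y
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximate degrees of freedom (usually not an integer).
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Hypothetical samples, for illustration only.
x = [6.1, 7.0, 7.5, 8.2, 9.0]
y = [8.8, 9.2, 9.7, 10.5, 11.9, 12.8]
t_stat, df = welch_t(x, y)
print(f"t = {t_stat:.3f} with approximately {df:.1f} degrees of freedom")
```

With equal sample sizes and equal sample variances the statistic and the degrees of freedom coincide with the pooled version; otherwise the degrees of freedom are smaller.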
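27. Example 5: The variance of a normal distribution
- Assume $X_1, \dots, X_n \sim N(\mu, \sigma^2)$
- Then $\dfrac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$
- Thus $P\left(\chi^2_{n-1,1-\alpha/2} < \dfrac{(n-1)s^2}{\sigma^2} < \chi^2_{n-1,\alpha/2}\right) = 1 - \alpha$, where $\chi^2_{n-1,\alpha}$ is the point with upper tail probability $\alpha$
- Confidence interval for $\sigma^2$: $\left(\dfrac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}},\ \dfrac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}\right)$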
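28. Example 5: Comparing variances for normal distributions
- Assume $X_1, \dots, X_{n_x} \sim N(\mu_x, \sigma_x^2)$ and $Y_1, \dots, Y_{n_y} \sim N(\mu_y, \sigma_y^2)$
- We get $\dfrac{s_x^2/\sigma_x^2}{s_y^2/\sigma_y^2} \sim F_{n_x-1,\,n_y-1}$
- $F_{n_x-1,\,n_y-1}$ is an F distribution with $n_x - 1$ and $n_y - 1$ degrees of freedom
- We can use this exactly as before to obtain a confidence interval for $\sigma_x^2/\sigma_y^2$ and for testing for example if $\sigma_x^2 = \sigma_y^2$
- Note: The assumption of normality is crucial!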
29. Example: Energy expenditure in two groups, lean and obese

ID   GROUP  ENERGY
1    0      6.13
2    0      7.05
....
12   0      10.15
13   0      10.88
14   1      8.79
15   1      9.19
....
21   1      11.85
22   1      12.79
Number of cases read: 22    Number of cases listed: 22

Want to test if there is any difference.
H0: $\mu_{lean} = \mu_{obese}$   H1: $\mu_{lean} \neq \mu_{obese}$
30. In SPSS
- Analyze - Compare Means - Independent-Samples T Test
- Move Energy to Test variable
- Move Group to Grouping variable
- Click Define Groups and write 0 and 1 for the two groups
31. Output
- If the p-value for Levene's test for equality of variances is above 0.05: Read first line (Equal variances assumed)
- Otherwise: Read second line (Equal variances not assumed)
32. Conclusion
- The observed mean for the lean was 8.1, and for the obese 10.3 (mean difference -2.2, 95% confidence interval for the difference (-3.4, -1.1))
- The difference between the groups was significant at the 5% level (since the CI does not include the value 0)
- The p-value was 0.001.
- H0 is rejected
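33. Example 6: Population proportions
- Assume $X \sim \text{Bin}(n, P)$, so that $\hat{P} = X/n$ is a frequency.
- Then $\dfrac{\hat{P} - P}{\sqrt{P(1-P)/n}} \sim N(0, 1)$ (approximately, for large n)
- Thus $\dfrac{\hat{P} - P}{\sqrt{\hat{P}(1-\hat{P})/n}} \sim N(0, 1)$ (approximately, for large n)
- Thus $P\left(-z_{\alpha/2} < \dfrac{\hat{P} - P}{\sqrt{\hat{P}(1-\hat{P})/n}} < z_{\alpha/2}\right) \approx 1 - \alpha$
- Confidence interval for $P$: $\hat{P} \pm z_{\alpha/2}\sqrt{\hat{P}(1-\hat{P})/n}$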
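34. Example 6 (Hypothesis testing)
- Hypotheses: $H_0: P = P_0$, $H_1: P \neq P_0$
- Test statistic: $z = \dfrac{\hat{P} - P_0}{\sqrt{P_0(1-P_0)/n}} \sim N(0, 1)$ under H0, for large n, where $\hat{P}$ is the observed proportion
- Reject H0 if $z < -z_{\alpha/2}$ or if $z > z_{\alpha/2}$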
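A short sketch of the large-sample test and confidence interval for a single proportion, using the normal approximation. The function name and the counts (60 successes in 100 trials, tested against 0.5) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0, alpha=0.05):
    """Large-sample z-test of H0: P = p0, plus a confidence interval for P."""
    p_hat = successes / n
    # The test uses the standard error under H0.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # The interval uses the standard error at the observed proportion.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    return z, p_value, (p_hat - half_width, p_hat + half_width)

# Hypothetical data: 60 successes in 100 trials, H0: P = 0.5.
z, p_value, ci = proportion_z_test(60, 100, 0.5)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```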
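35. Example 7: Differences between population proportions
- Assume $X_1 \sim \text{Bin}(n_1, P_1)$ and $X_2 \sim \text{Bin}(n_2, P_2)$, so that $\hat{P}_1 = X_1/n_1$ and $\hat{P}_2 = X_2/n_2$ are frequencies
- Then $\dfrac{\hat{P}_1 - \hat{P}_2 - (P_1 - P_2)}{\sqrt{\frac{\hat{P}_1(1-\hat{P}_1)}{n_1} + \frac{\hat{P}_2(1-\hat{P}_2)}{n_2}}} \sim N(0, 1)$ (approximately)
- Confidence interval for $P_1 - P_2$: $\hat{P}_1 - \hat{P}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{P}_1(1-\hat{P}_1)}{n_1} + \frac{\hat{P}_2(1-\hat{P}_2)}{n_2}}$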
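36. Example 7 (Hypothesis testing)
- Hypotheses: $H_0: P_1 = P_2$, $H_1: P_1 \neq P_2$
- Test statistic: $z = \dfrac{\hat{P}_1 - \hat{P}_2}{\sqrt{\hat{P}_0(1-\hat{P}_0)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, approximately $N(0, 1)$ under H0
- where $\hat{P}_0 = \dfrac{n_1\hat{P}_1 + n_2\hat{P}_2}{n_1 + n_2}$ is the pooled proportion and $\hat{P}_1$, $\hat{P}_2$ are the observed proportions
- Reject H0 if $|z| > z_{\alpha/2}$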
37. Example: Spontaneous abortions among nurses
- Spontaneous abortions among nurses helping with operations and other nurses
- Want to test if there is a difference between the proportions of abortions in the two groups
- H0: $P_{op.nurses} = P_{others}$   H1: $P_{op.nurses} \neq P_{others}$
38. Calculation
- $\hat{P}_1 = 0.278$, $\hat{P}_2 = 0.088$, $n_1 = 36$, $n_2 = 34$
- $z \approx 2.04$
- P-value: 0.0414 = 4.1%, reject H0 at the 5% significance level (can't do this in SPSS)
- 95% confidence interval for $P_1 - P_2$
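Since the calculation is done by hand, a short Python sketch may help to check it. It assumes the underlying counts were 10 of 36 and 3 of 34, which reproduce the rounded proportions 0.278 and 0.088 but are not stated on the slide, and it uses the normal approximation with a pooled proportion for the test and an unpooled standard error for the interval.

```python
from math import sqrt
from statistics import NormalDist

# Assumed counts: 10 of 36 and 3 of 34 match the rounded
# proportions 0.278 and 0.088; the counts themselves are an assumption.
x1, n1 = 10, 36
x2, n2 = 3, 34
p1, p2 = x1 / n1, x2 / n2

# Test of H0: P1 = P2 using the pooled proportion.
p0 = (x1 + x2) / (n1 + n2)
z = (p1 - p2) / sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Approximate 95% confidence interval for P1 - P2 (unpooled standard error).
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_crit = NormalDist().inv_cdf(0.975)
ci = (p1 - p2 - z_crit * se, p1 - p2 + z_crit * se)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print(f"95% CI for P1 - P2: ({ci[0]:.3f}, {ci[1]:.3f})")
```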
39. Next week
- Next lecture will be about modelling relationships between continuous variables
- Linear regression