Sampling and estimation 2

Provided by: uio

1
Sampling and estimation 2

Tron Anders Moger
27.09.2006

2
Confidence intervals (rep.)

Assume that $X_1, \dots, X_n$ are a random sample from a normal distribution
Recall that $\bar X$ has expected value $\mu$ and variance $\sigma^2/n$
The interval $\bar X \pm 1.96\,\sigma/\sqrt{n}$ is called a 95% confidence interval for $\mu$
Means that intervals constructed this way will contain the population mean 95% of the time
Often interpreted as if we are 95% certain that the population mean lies in this interval
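
The interval above can be computed directly. The following is a minimal sketch, assuming the standard deviation is known; the function name `z_confidence_interval` and the four data values are hypothetical and only serve as illustration.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_confidence_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean of a normal sample with known sigma."""
    n = len(sample)
    # Upper alpha/2 point of N(0, 1); about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    centre = mean(sample)
    return centre - half_width, centre + half_width


# Hypothetical data: four observations, sigma assumed known and equal to 2.
low, high = z_confidence_interval([9.0, 10.0, 11.0, 10.0], sigma=2.0)
print(round(low, 2), round(high, 2))
```

Since the half-width is $1.96\,\sigma/\sqrt{n}$, quadrupling the sample size halves the width of the interval.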

3
Hypothesis testing (rep.)

Have a data sample
Would like to test if there is evidence that a
parameter value calculated from the data is
different from the value in a null hypothesis H0
If so, means that H0 is rejected in favour of
some alternative H1
Have to construct a test statistic
It must:
Have a higher probability for extreme values under H1 than under H0
Have a known distribution under H0 (when H0 is simple)

4
Two important quantities

P-value: the probability of the observed value, or something more extreme, assuming the null hypothesis is true
Significance level $\alpha$: the threshold for the P-value below which we reject H0
If the value of the test statistic is too extreme, then H0 is rejected
With $\alpha = 0.05$: we want the probability of a difference this extreme arising by chance under H0 to be below 5%, or, equivalently,
we want to be 95% sure that we do not reject H0 when it is true in reality
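
As an illustration of the two quantities, the sketch below computes a two-sided P-value and compares it with the significance level. It assumes, purely for the example, that the test statistic is standard normal under H0; the function names are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """P(|Z| >= |z|) for Z ~ N(0, 1): the observed value or something more extreme."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def reject_h0(z, alpha=0.05):
    """Reject H0 when the P-value falls below the significance level alpha."""
    return two_sided_p_value(z) < alpha


print(two_sided_p_value(1.96), reject_h0(2.5), reject_h0(1.0))
```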

5
Note

There is an asymmetry between H0 and H1: in fact, if the data is inconclusive, we end up not rejecting H0.
If H0 is true, the probability of rejecting H0 is (say) 5%. That DOES NOT MEAN we are 95% certain that H0 is true!
How much evidence we have for choosing H1 over H0 depends entirely on how much more probable rejection is if H1 is true.

6
Errors of types I and II

The above can be seen as a decision rule for H0
or H1.
For any such rule we can compute (if both H0 and
H1 are simple hypotheses)

             H0 true                    H1 true
Accept H0    1 - α                      TYPE II error (β)
Reject H0    TYPE I error               Power (1 - β)
             (significance level α)

7
Sample size computations

For a sample from a normal population with known variance, the width of the confidence interval for the mean depends only on the sample size (for a fixed confidence level).
So we can compute the sample size needed to reach a required accuracy
Note: If the variance is unknown, it must somehow be estimated beforehand to do the computation
Works also for population proportion estimation, giving an inequality for the required sample size
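
A minimal sketch of both computations, assuming the requirement is stated as a maximum half-width (margin) of the confidence interval. The function names are hypothetical. For proportions the sketch uses the worst case $P(1-P) \le 1/4$, which is where the inequality comes from.

```python
from math import ceil
from statistics import NormalDist


def n_for_mean(sigma, margin, confidence=0.95):
    """Smallest n such that z * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)


def n_for_proportion(margin, confidence=0.95):
    """Smallest n that guarantees the margin for any P, using P(1 - P) <= 1/4."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * 0.25 / margin ** 2)


print(n_for_mean(sigma=10, margin=2))   # hypothetical sigma and margin
print(n_for_proportion(margin=0.05))
```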

8
Power computations

If you reject H0, you know very little about the evidence for H1 versus H0 unless you study the power of the test.
The power is the probability of rejecting H0 given that a hypothesis in H1 is true, i.e. 1 minus the probability β of a type II error (1 - β).
Thus it is a function of the possible hypotheses in H1.
We would like our tests to have as high power as possible.
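
To make the idea of power as a function of the alternative concrete, here is a sketch for one specific case chosen for illustration: a two-sided test of a normal mean with known standard deviation. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """P(reject H0: mu = mu0) when the true mean is mu1, with sigma known."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under mu1 the test statistic is N(shift, 1).
    shift = (mu1 - mu0) * sqrt(n) / sigma
    lower_tail = std_normal.cdf(-z_crit - shift)
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    return lower_tail + upper_tail


# Hypothetical setting: sigma = 5, n = 49, alternatives further and further from mu0 = 0.
for mu1 in (0, 1, 2, 3):
    print(mu1, round(power_two_sided_z(0, mu1, sigma=5, n=49), 3))
```

At `mu1 = mu0` the function returns the significance level, and the power grows as the alternative moves away from H0.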

9
Example 1 Normal distribution with unknown
variance

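Assume $X_1, \dots, X_n \sim N(\mu, \sigma^2)$, independent
Then $\frac{\bar X - \mu}{s/\sqrt{n}} \sim t_{n-1}$
Thus $P\left(-t_{n-1,\alpha/2} < \frac{\bar X - \mu}{s/\sqrt{n}} < t_{n-1,\alpha/2}\right) = 1 - \alpha$
So a confidence interval for $\mu$, with $\alpha$ significance, is given by $\bar X \pm t_{n-1,\alpha/2}\, s/\sqrt{n}$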

10
Example 1 (Hypothesis testing)

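Hypotheses: $H_0: \mu = \mu_0$, $H_1: \mu \neq \mu_0$
Test statistic under H0: $t = \frac{\bar X - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$
Reject H0 if $t < -t_{n-1,\alpha/2}$ or if $t > t_{n-1,\alpha/2}$
Alternatively, the p-value for the test can be computed (if $t > 0$) as the $\alpha$ such that $t_{n-1,\alpha/2} = t$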

11
Example 1 (cont.)

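Hypotheses: one-sided, for example $H_0: \mu \le \mu_0$, $H_1: \mu > \mu_0$
Test statistic assuming $\mu = \mu_0$: $t = \frac{\bar X - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$
Reject H0 if $t > t_{n-1,\alpha}$
Alternatively, the p-value for the test can be computed as the $\alpha$ such that $t_{n-1,\alpha} = t$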

12
Energy intake in kJ

SUBJECT   INTAKE
 1        5260
 2        5470
 3        5640
 4        6180
 5        6390
 6        6515
 7        6805
 8        7515
 9        7515
10        8230
11        8770

Recommended energy intake: 7725 kJ
Want to test if it applies to the 11 women
H0: $\mu$ (mean energy intake) = 7725
H1: $\mu \neq 7725$
13
From Explore in SPSS
14
Test result

The test statistic $t = \frac{\bar x - 7725}{s/\sqrt{11}} = \frac{6753.6 - 7725}{1142.1/\sqrt{11}} \approx -2.82$ is t-distributed with 10 degrees of freedom (number of subjects - 1) under H0
Choose significance level $\alpha = 0.05$
From table 8 p.870 in the book, $t_{11-1,\,0.05/2} = 2.228$
If H0 is true, the interval (-2.228, 2.228) covers 95% of the distribution
Reject H0 since the test statistic is outside the interval, or, equivalently, because $|t| > 2.228$
Can't find the exact p-value from the table
Could have had $\alpha = 0.01$ or 0.1, but 0.05 is most common
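
The calculation can be reproduced from the eleven intake values listed earlier. This is a sketch using only the Python standard library; the critical value 2.228 is the usual tabulated two-sided 5% point of the t distribution with 10 degrees of freedom, entered by hand since the standard library has no t quantile function. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Energy intake in kJ for the 11 women, from the table in this presentation.
intake = [5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770]
mu0 = 7725  # recommended intake under H0


def one_sample_t(sample, mu0):
    """t statistic (x_bar - mu0) / (s / sqrt(n)) for a one-sample t test."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))


T_CRIT_10_DF = 2.228  # tabulated value, two-sided alpha = 0.05, 10 df
t_stat = one_sample_t(intake, mu0)
reject = abs(t_stat) > T_CRIT_10_DF
print(round(mean(intake), 1), round(stdev(intake), 1), round(t_stat, 2), reject)
```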

15
In SPSS: Analyze - Compare Means - One-Sample T Test
Test variable: intake
Test value: 7725
16
Differences between means

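Assume $X_1, \dots, X_{n_x} \sim N(\mu_x, \sigma_x^2)$ and $Y_1, \dots, Y_{n_y} \sim N(\mu_y, \sigma_y^2)$, all data independent
We would like to study the difference $\mu_x - \mu_y$
Three different cases:
Matched pairs
Unknown but equal population variances
Unknown and possibly different pop. variances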

17
Matched pairs

Common situation: Several measurements on each individual, or on closely related objects
These measurements will not be independent (why?)
Generally a problem in statistics, but simple if you only have two measurements
The key is to use the differences between the paired measurements, instead of each mean separately

18
Example 2 Matched pairs

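In practice, the basis is that $\mu_x - \mu_y = 0$
Set $D_i = X_i - Y_i$ and $\bar D = \frac{1}{n}\sum_{i=1}^{n} D_i$
We get $\frac{\bar D - (\mu_x - \mu_y)}{S_D/\sqrt{n}} \sim t_{n-1}$
Where $S_D^2 = \frac{1}{n-1}\sum_{i=1}^{n} (D_i - \bar D)^2$
Confidence interval for $\mu_x - \mu_y$: $\bar D \pm t_{n-1,\alpha/2}\, S_D/\sqrt{n}$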

19
Example 2 (Hypothesis testing)

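Hypotheses: $H_0: \mu_x - \mu_y = 0$, $H_1: \mu_x - \mu_y \neq 0$
Test statistic: $t = \frac{\bar D}{S_D/\sqrt{n}} \sim t_{n-1}$ under H0
Reject H0 if $t < -t_{n-1,\alpha/2}$ or if $t > t_{n-1,\alpha/2}$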

Matched pairs T test
20
Example Energy intake kJ

SUBJECT PREMENST POSTMENS
1 5260.0 3910.0
2 5470.0 4220.0
3 5640.0 3885.0
4 6180.0 5160.0
5 6390.0 5645.0
6 6515.0 4680.0
7 6805.0 5265.0
8 7515.0 5975.0
9 7515.0 6790.0
10 8230.0 6900.0
11 8770.0 7335.0
Number of cases read: 11    Number of cases listed: 11

Want to test if energy intake is different before and after menstruation.
H0: $\mu_{premenst} = \mu_{postmenst}$
H1: $\mu_{premenst} \neq \mu_{postmenst}$
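
The paired test on these data can be sketched with the standard library alone. The data are copied from the table above; the critical value 2.228 (two-sided 5%, 10 degrees of freedom) is taken from a standard t table and entered by hand.

```python
from math import sqrt
from statistics import mean, stdev

# Energy intake in kJ, from the PREMENST and POSTMENS columns above.
pre = [5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770]
post = [3910, 4220, 3885, 5160, 5645, 4680, 5265, 5975, 6790, 6900, 7335]

# The paired test works on the within-subject differences only.
diffs = [x - y for x, y in zip(pre, post)]
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)

t_stat = d_bar / (s_d / sqrt(n))   # compare with t, n - 1 = 10 df
T_CRIT_10_DF = 2.228               # tabulated, two-sided alpha = 0.05
half_width = T_CRIT_10_DF * s_d / sqrt(n)
ci = (d_bar - half_width, d_bar + half_width)

print(round(d_bar, 1), round(t_stat, 2), [round(v) for v in ci])
```

The confidence interval for the mean difference lies well above 0, in line with rejecting H0.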
21
Confidence interval and p-values for paired
t-tests in SPSS

Analyze - Compare Means - Paired-Samples T Test.
Click on the two variables you want to test, and move them to Paired variables
Conclusion: Reject H0 on 5% sig. level

22
Example 3 Unknown but equal population variances

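We get $\frac{\bar X - \bar Y - (\mu_x - \mu_y)}{s_p\sqrt{1/n_x + 1/n_y}} \sim t_{n_x+n_y-2}$
where $s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$
Confidence interval for $\mu_x - \mu_y$: $\bar X - \bar Y \pm t_{n_x+n_y-2,\alpha/2}\, s_p\sqrt{1/n_x + 1/n_y}$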

23
Example 3 (Hypothesis testing)

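Hypotheses: $H_0: \mu_x = \mu_y$, $H_1: \mu_x \neq \mu_y$
Test statistic: $t = \frac{\bar X - \bar Y}{s_p\sqrt{1/n_x + 1/n_y}} \sim t_{n_x+n_y-2}$ under H0
Reject H0 if $t < -t_{n_x+n_y-2,\alpha/2}$ or if $t > t_{n_x+n_y-2,\alpha/2}$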

T test with equal variances
24
Assumptions

Independence: All observations are independent. Achieved by taking random samples of individuals; for the paired t-test, independence is achieved by using the differences between measurements
Normally distributed data (check histograms, tests for normal distribution, Q-Q plots)
Equal variances or standard deviations in the groups
Assumptions can be checked in histograms, box plots etc. (or tests for normality)
What if the variances are unequal?

25
Example 4 Unknown and possibly unequal
population variances

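We get $\frac{\bar X - \bar Y - (\mu_x - \mu_y)}{\sqrt{s_x^2/n_x + s_y^2/n_y}} \sim t_{\nu}$ (approximately)
where $\nu = \frac{\left(s_x^2/n_x + s_y^2/n_y\right)^2}{\frac{(s_x^2/n_x)^2}{n_x - 1} + \frac{(s_y^2/n_y)^2}{n_y - 1}}$
Conf. interval for $\mu_x - \mu_y$: $\bar X - \bar Y \pm t_{\nu,\alpha/2}\sqrt{s_x^2/n_x + s_y^2/n_y}$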

26
Example 4 (Hypothesis testing)

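Hypotheses: $H_0: \mu_x = \mu_y$, $H_1: \mu_x \neq \mu_y$
Test statistic: $t = \frac{\bar X - \bar Y}{\sqrt{s_x^2/n_x + s_y^2/n_y}} \sim t_{\nu}$ (approximately) under H0
Reject H0 if $t < -t_{\nu,\alpha/2}$ or if $t > t_{\nu,\alpha/2}$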

T test with unequal variances
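
The two versions of the two-sample test differ only in the standard error and the degrees of freedom. The sketch below computes both, using the pooled variance for the equal-variance case and the usual Satterthwaite approximation for the degrees of freedom in the unequal-variance case. The function names and the two small samples are hypothetical, chosen only so the arithmetic is easy to follow.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """t statistic and degrees of freedom assuming equal population variances."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2


def unequal_variance_t(x, y):
    """t statistic and approximate degrees of freedom without assuming equal variances."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Hypothetical samples, not data from this presentation.
x = [1, 2, 3, 4, 5]
y = [4, 6, 8]
print(pooled_t(x, y))
print(unequal_variance_t(x, y))
```

The resulting statistic is compared with the t distribution with the returned degrees of freedom, which is generally not an integer in the unequal-variance case.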
27
Example 5 The variance of a normal distribution

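Assume $X_1, \dots, X_n \sim N(\mu, \sigma^2)$, independent
Then $\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$
Thus $P\left(\chi^2_{n-1,1-\alpha/2} < \frac{(n-1)s^2}{\sigma^2} < \chi^2_{n-1,\alpha/2}\right) = 1 - \alpha$, where $\chi^2_{n-1,\alpha}$ is the point with upper-tail probability $\alpha$
Confidence interval for $\sigma^2$: $\left(\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}\right)$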

28
Example 5 Comparing variances for normal
distributions

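Assume $X_1, \dots, X_{n_x} \sim N(\mu_x, \sigma_x^2)$ and $Y_1, \dots, Y_{n_y} \sim N(\mu_y, \sigma_y^2)$, all independent
We get $\frac{s_x^2/\sigma_x^2}{s_y^2/\sigma_y^2} \sim F_{n_x-1,\,n_y-1}$
$F_{n_x-1,\,n_y-1}$ is an F distribution with $n_x - 1$ and $n_y - 1$ degrees of freedom
We can use this exactly as before to obtain a confidence interval for $\sigma_x^2/\sigma_y^2$ and for testing, for example, if $\sigma_x^2 = \sigma_y^2$
Note: The assumption of normality is crucial!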

29
ID   GROUP   ENERGY
 1   0        6.13
 2   0        7.05
....
12   0       10.15
13   0       10.88
14   1        8.79
15   1        9.19
....
21   1       11.85
22   1       12.79
Number of cases read: 22    Number of cases listed: 22

Example: Energy expenditure in two groups, lean and obese. Want to test if there is any difference.
H0: $\mu_{lean} = \mu_{obese}$
H1: $\mu_{lean} \neq \mu_{obese}$
30
In SPSS

Analyze - Compare Means - Independent-Samples T Test
Move Energy to Test-variable
Move Group to Grouping variable
Click Define Groups and write 0 and 1 for the two groups

31
Output
If the significance of Levene's test for equality of variances is above 0.05: Read first line (Equal variances assumed)
Otherwise: Read second line (Equal variances not assumed)
32
Conclusion

The observed mean for the lean was 8.1, and for the obese 10.3 (mean difference -2.2, 95% confidence interval for the difference (-3.4, -1.1))
The difference between the groups was significant at the 5% level (since the CI does not include the value 0)
The p-value was 0.001.
H0 is rejected

33
Example 6 Population proportions

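Assume $X \sim \mathrm{Bin}(n, P)$, so that $\hat P = X/n$ is a frequency.
Then $\frac{\hat P - P}{\sqrt{P(1-P)/n}} \sim N(0, 1)$ (approximately, for large n)
Thus $\frac{\hat P - P}{\sqrt{\hat P(1-\hat P)/n}} \sim N(0, 1)$ (approximately, for large n)
Thus $P\left(-Z_{\alpha/2} < \frac{\hat P - P}{\sqrt{\hat P(1-\hat P)/n}} < Z_{\alpha/2}\right) \approx 1 - \alpha$
Confidence interval for P: $\hat P \pm Z_{\alpha/2}\sqrt{\hat P(1-\hat P)/n}$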
34
Example 6 (Hypothesis testing)

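Hypotheses: $H_0: P = P_0$, $H_1: P \neq P_0$
Test statistic: $Z = \frac{\hat P - P_0}{\sqrt{P_0(1-P_0)/n}} \sim N(0, 1)$ under H0, for large n
Reject H0 if $Z < -Z_{\alpha/2}$ or if $Z > Z_{\alpha/2}$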
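
A minimal sketch of the large-sample interval and test for one proportion, using only the standard library. The function names and the counts (40 successes in 100 trials, tested against $P_0 = 0.5$) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Large-sample confidence interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def proportion_z_test(successes, n, p0):
    """Z statistic and two-sided p-value for H0: P = p0 (standard error uses p0)."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 40 successes out of 100.
print(proportion_ci(40, 100))
print(proportion_z_test(40, 100, p0=0.5))
```

Note that the interval uses the estimated proportion in the standard error, while the test uses the hypothesised value $P_0$.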

35
Example 7 Differences between population
proportions

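Assume $X_1 \sim \mathrm{Bin}(n_1, P_1)$ and $X_2 \sim \mathrm{Bin}(n_2, P_2)$, so that $\hat P_1 = X_1/n_1$ and $\hat P_2 = X_2/n_2$ are frequencies
Then $\frac{\hat P_1 - \hat P_2 - (P_1 - P_2)}{\sqrt{\hat P_1(1-\hat P_1)/n_1 + \hat P_2(1-\hat P_2)/n_2}} \sim N(0, 1)$ (approximately)
Confidence interval for $P_1 - P_2$: $\hat P_1 - \hat P_2 \pm Z_{\alpha/2}\sqrt{\hat P_1(1-\hat P_1)/n_1 + \hat P_2(1-\hat P_2)/n_2}$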
36
Example 7 (Hypothesis testing)

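Hypotheses: $H_0: P_1 = P_2$, $H_1: P_1 \neq P_2$
Test statistic: $Z = \frac{\hat P_1 - \hat P_2}{\sqrt{\hat P_0(1-\hat P_0)\left(1/n_1 + 1/n_2\right)}} \sim N(0, 1)$ (approximately) under H0
where $\hat P_0 = \frac{X_1 + X_2}{n_1 + n_2}$
Reject H0 if $|Z| > Z_{\alpha/2}$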
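
The test for two proportions can be sketched in the same way, with the pooled estimate of the common proportion in the standard error. The function name and the counts below are hypothetical and are not the data of any example in this presentation.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Z statistic and two-sided p-value for H0: P1 = P2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p0 = (x1 + x2) / (n1 + n2)  # pooled estimate under H0
    z = (p1 - p2) / sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 30 of 100 in group 1 and 20 of 100 in group 2.
z, p_value = two_proportion_z_test(30, 100, 20, 100)
print(round(z, 3), round(p_value, 3))
```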

37
Spontaneous abortions among nurses helping with operations and other nurses
Want to test if there is a difference between the proportions of abortions in the two groups
H0: $P_{op.nurses} = P_{others}$
H1: $P_{op.nurses} \neq P_{others}$

38
Calculation