Sampling and estimation 2

Provided by: uio


1
Sampling and estimation 2
  • Tron Anders Moger
  • 27.09.2006

2
Confidence intervals (rep.)
  • Assume that $X_1, \ldots, X_n$ are a random sample from a
    normal distribution with mean $\mu$ and variance $\sigma^2$
  • Recall that the sample mean $\bar{X}$ has expected value $\mu$
    and variance $\sigma^2/n$
  • The interval $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$ is called a 95%
    confidence interval for $\mu$
  • Means that intervals constructed this way will contain the
    population mean 95% of the time
  • Often interpreted as if we are 95% certain that
    the population mean lies in this interval
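
The interval on this slide can be computed directly when the standard deviation is treated as known. The following is a minimal sketch using only the Python standard library; the function name, the data and the value of sigma are hypothetical and chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_confidence_interval(sample, sigma, level=0.95):
    """Confidence interval for the mean of a normal sample with known sigma."""
    n = len(sample)
    # Two-sided standard normal quantile (about 1.96 for level = 0.95).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / sqrt(n)
    centre = fmean(sample)
    return centre - half_width, centre + half_width


# Hypothetical measurements with an assumed known sigma of 2.0.
data = [9.1, 10.4, 11.2, 8.7, 10.6, 9.9, 10.8, 9.3, 10.1]
low, high = z_confidence_interval(data, sigma=2.0)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

The half-width depends only on sigma, n and the confidence level, not on the observed values.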

3
Hypothesis testing (rep.)
  • Have a data sample
  • Would like to test if there is evidence that a
    parameter value calculated from the data is
    different from the value in a null hypothesis H0
  • If so, H0 is rejected in favour of
    some alternative H1
  • Have to construct a test statistic
  • It must:
  • Have a higher probability for extreme values
    under H1 than under H0
  • Have a known distribution under H0 (when H0 is simple)

4
Two important quantities
  • P-value: the probability of the observed value
    or something more extreme,
    assuming the null hypothesis is true
  • Significance level $\alpha$: the threshold for the P-value
    below which we reject H0
  • If the value of the test statistic is too
    extreme, then H0 is rejected
  • P-value < 0.05: We want the probability of a difference
    at least this large arising by chance alone under H0 to be below
    5%, or, equivalently
  • We want to be 95% sure that we do not reject H0
    when it is true in reality

5
Note
  • There is an asymmetry between H0 and H1: if the
    data are inconclusive, we end up not
    rejecting H0.
  • If H0 is true, the probability of rejecting H0 is
    (say) 5%. That DOES NOT MEAN we are 95% certain
    that H0 is true!
  • How much evidence we have for choosing H1 over H0
    depends entirely on how much more probable
    rejection is if H1 is true.

6
Errors of types I and II
  • The above can be seen as a decision rule for H0
    or H1.
  • For any such rule we can compute (if both H0 and
    H1 are simple hypotheses):

                H0 true                                     H1 true
    Accept H0   $1-\alpha$                                  TYPE II error: $\beta$
    Reject H0   TYPE I error: significance level $\alpha$   Power: $1-\beta$

7
Sample size computations
  • For a sample from a normal population with known
    variance, the width of the confidence interval for
    the mean (at a given confidence level) depends only
    on the sample size.
  • So we can compute the necessary sample size to
    match a required accuracy
  • Note: If the variance is unknown, it must somehow
    be estimated beforehand to do the computation
  • Works also for population proportion estimation,
    giving an inequality for the required sample size
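
As a rough illustration of these computations, the sketch below solves $z_{\alpha/2}\,\sigma/\sqrt{n} \le E$ for n, where E is the required half-width of the interval. For a proportion it uses the bound $P(1-P) \le 1/4$, which is one common way to obtain an inequality for n. The function names and the example numbers are assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, level=0.95):
    """Smallest n such that z * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return ceil((z * sigma / margin) ** 2)


def sample_size_for_proportion(margin, level=0.95):
    """Conservative n for a proportion, using P * (1 - P) <= 1/4."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return ceil(z ** 2 / (4 * margin ** 2))


# Hypothetical requirements: sigma = 1000 with half-width 500,
# and a proportion estimated to within 0.05.
print(sample_size_for_mean(sigma=1000, margin=500))
print(sample_size_for_proportion(margin=0.05))
```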

8
Power computations
  • If you reject H0, you know very little about the
    evidence for H1 versus H0 unless you study the
    power of the test.
  • The power is the probability of rejecting
    H0 given that a hypothesis in H1 is true, i.e. 1 minus
    the probability $\beta$ of a type II error ($1-\beta$).
  • Thus it is a function of the possible hypotheses
    in H1.
  • We would like our tests to have as high power as
    possible.
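
To make the idea of power as a function of the alternative concrete, here is a small sketch for the simplest case: a two-sided z test of H0: mu = mu0 with known sigma. The slides do not specify this setting, so the test, the function name and the example numbers are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the two-sided z test of H0: mu = mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under the alternative the test statistic is N(shift, 1).
    shift = (mu1 - mu0) / (sigma / sqrt(n))
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


# Power grows with the sample size for a fixed (hypothetical) alternative.
for n in (10, 32, 50):
    print(n, round(z_test_power(mu0=0.0, mu1=0.5, sigma=1.0, n=n), 3))
```

When mu1 equals mu0 the function returns alpha, the type I error probability, as expected.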

9
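Example 1: Normal distribution with unknown
variance
  • Assume $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$
  • Then $\frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}$
  • Thus $P\left(-t_{n-1,\alpha/2} < \frac{\bar{X} - \mu}{s/\sqrt{n}} < t_{n-1,\alpha/2}\right) = 1 - \alpha$
  • So a confidence interval for $\mu$, with
    significance $\alpha$, is given by $\bar{X} \pm t_{n-1,\alpha/2}\, s/\sqrt{n}$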

10
Example 1 (Hypothesis testing)
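  • Hypotheses: H0: $\mu = \mu_0$, H1: $\mu \neq \mu_0$
  • Test statistic: $t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$ under H0
  • Reject H0 if $t < -t_{n-1,\alpha/2}$ or if $t > t_{n-1,\alpha/2}$
  • Alternatively, the p-value for the test can be
    computed (if $t > 0$) as the $\alpha$ such that $t = t_{n-1,\alpha/2}$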

11
Example 1 (cont.)
  • Hypotheses
  • Test statistic assuming
  • Reject H0 if
  • Alternatively, the p-value for the test can be
    computed as the such that

12
Energy intake in kJ
  • SUBJECT INTAKE
  • 1 5260
  • 2 5470
  • 3 5640
  • 4 6180
  • 5 6390
  • 6 6515
  • 7 6805
  • 8 7515
  • 9 7515
  • 10 8230
  • 11 8770

Recommended energy intake: 7725 kJ. Want to test if
it applies to the 11 women. H0: $\mu$ (mean energy
intake) $= 7725$; H1: $\mu \neq 7725$
13
From Explore in SPSS
14
Test result
  • The test statistic $t = (\bar{x} - 7725)/(s/\sqrt{n}) \approx -2.82$
    is t-distributed with 10 degrees of
    freedom (number of subjects - 1) under H0
  • Choose significance level $\alpha = 0.05$
  • From table 8 p.870 in the book,
    $t_{11-1,\,0.05/2} = 2.228$
  • If H0 is true, the interval (-2.228, 2.228)
    covers 95% of the distribution
  • Reject H0 since the test statistic is outside the
    interval
  • Can't find the exact p-value from the table
  • Could have had $\alpha = 0.01$ or $0.1$, but 0.05 is most
    common
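
The test can be reproduced from the 11 intake values without SPSS. This is a sketch using only the Python standard library; since the standard library has no t distribution, the critical value for 10 degrees of freedom (2.228) is hard-coded from a standard t table, which is an assumption of this example.

```python
from math import sqrt
from statistics import mean, stdev

# Daily energy intake in kJ for the 11 women (values from the data slide).
intake = [5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770]
mu0 = 7725  # recommended intake under H0

n = len(intake)
x_bar = mean(intake)
s = stdev(intake)  # sample standard deviation (n - 1 in the denominator)
standard_error = s / sqrt(n)
t_stat = (x_bar - mu0) / standard_error

# Assumption: two-sided 5% critical value for 10 degrees of freedom,
# taken from a standard t table.
t_crit = 2.228
ci = (x_bar - t_crit * standard_error, x_bar + t_crit * standard_error)
reject_h0 = abs(t_stat) > t_crit

print(f"mean = {x_bar:.1f}, sd = {s:.1f}, t = {t_stat:.3f}")
print(f"95% CI: ({ci[0]:.0f}, {ci[1]:.0f}), reject H0: {reject_h0}")
```

The confidence interval lies entirely below 7725, which agrees with rejecting H0 at the 5% level.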

15
In SPSS: Analyze - Compare Means - One-Sample T
Test. Test variable: intake. Test value: 7725
16
Differences between means
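  • Assume $X_1, \ldots, X_{n_x} \sim N(\mu_x, \sigma_x^2)$ and
    $Y_1, \ldots, Y_{n_y} \sim N(\mu_y, \sigma_y^2)$, all data
    independent
  • We would like to study the difference $\mu_x - \mu_y$
  • Three different cases:
  • Matched pairs
  • Unknown but equal population variances
  • Unknown and possibly different pop. variances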

17
Matched pairs
  • Common situation: Several measurements on each
    individual, or on closely related objects
  • These measurements will not be independent (why?)
  • Generally a problem in statistics, but simple if
    you only have two measurements
  • The key is to use the differences between the paired
    measurements, instead of each mean separately

18
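Example 2: Matched pairs
  • In practice, the basis is that $\mu_x - \mu_y = 0$
  • Set $D_i = X_i - Y_i$ and $\bar{D} = \bar{X} - \bar{Y}$
  • We get $\frac{\bar{D} - (\mu_x - \mu_y)}{s_D/\sqrt{n}} \sim t_{n-1}$
  • Where $s_D^2 = \frac{1}{n-1} \sum_{i=1}^{n} (D_i - \bar{D})^2$
  • Confidence interval for $\mu_x - \mu_y$: $\bar{D} \pm t_{n-1,\alpha/2}\, s_D/\sqrt{n}$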

19
Example 2 (Hypothesis testing)
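  • Hypotheses: H0: $\mu_x = \mu_y$, H1: $\mu_x \neq \mu_y$
  • Test statistic: $t = \frac{\bar{D}}{s_D/\sqrt{n}} \sim t_{n-1}$ under H0
  • Reject H0 if $t < -t_{n-1,\alpha/2}$ or if $t > t_{n-1,\alpha/2}$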

Matched pairs T test
20
Example Energy intake kJ
  • SUBJECT PREMENST POSTMENS
  • 1 5260.0 3910.0
  • 2 5470.0 4220.0
  • 3 5640.0 3885.0
  • 4 6180.0 5160.0
  • 5 6390.0 5645.0
  • 6 6515.0 4680.0
  • 7 6805.0 5265.0
  • 8 7515.0 5975.0
  • 9 7515.0 6790.0
  • 10 8230.0 6900.0
  • 11 8770.0 7335.0
  • Number of cases read: 11. Number of cases
    listed: 11

Want to test if energy intake is different
before and after menstruation. H0: $\mu_{\mathrm{premenst}} =
\mu_{\mathrm{postmenst}}$; H1: $\mu_{\mathrm{premenst}} \neq \mu_{\mathrm{postmenst}}$
21
Confidence interval and p-values for paired
t-tests in SPSS
  • Analyze - Compare Means - Paired-Samples T
    Test.
  • Click on the two variables you want to test, and
    move them to Paired variables
  • Conclusion: Reject H0 at the 5% significance level
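
For readers without SPSS, the paired test on the pre- and postmenstrual intake values can be sketched with the Python standard library. As an assumption of this example, the critical value 2.228 (two-sided 5%, 10 degrees of freedom) is hard-coded from a standard t table.

```python
from math import sqrt
from statistics import mean, stdev

# Energy intake in kJ for the same 11 women (values from the data slide).
premenst = [5260, 5470, 5640, 6180, 6390, 6515, 6805, 7515, 7515, 8230, 8770]
postmenst = [3910, 4220, 3885, 5160, 5645, 4680, 5265, 5975, 6790, 6900, 7335]

# A paired test is a one-sample t test on the within-subject differences.
diffs = [pre - post for pre, post in zip(premenst, postmenst)]
n = len(diffs)
d_bar = mean(diffs)
standard_error = stdev(diffs) / sqrt(n)
t_stat = d_bar / standard_error

# Assumption: critical value for 10 degrees of freedom from a t table.
t_crit = 2.228
ci = (d_bar - t_crit * standard_error, d_bar + t_crit * standard_error)

print(f"mean difference = {d_bar:.1f}, t = {t_stat:.2f}")
print(f"95% CI for the difference: ({ci[0]:.0f}, {ci[1]:.0f})")
```

The test statistic is far outside (-2.228, 2.228) and the interval excludes 0, in line with the conclusion above.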

22
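Example 3: Unknown but equal population variances
  • We get $\frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{s_p \sqrt{1/n_x + 1/n_y}} \sim t_{n_x+n_y-2}$
  • where $s_p^2 = \frac{(n_x - 1) s_x^2 + (n_y - 1) s_y^2}{n_x + n_y - 2}$
  • Confidence interval for $\mu_x - \mu_y$: $(\bar{X} - \bar{Y}) \pm t_{n_x+n_y-2,\alpha/2}\, s_p \sqrt{1/n_x + 1/n_y}$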

23
Example 3 (Hypothesis testing)
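  • Hypotheses: H0: $\mu_x = \mu_y$, H1: $\mu_x \neq \mu_y$
  • Test statistic: $t = \frac{\bar{X} - \bar{Y}}{s_p \sqrt{1/n_x + 1/n_y}} \sim t_{n_x+n_y-2}$ under H0
  • Reject H0 if $t < -t_{n_x+n_y-2,\alpha/2}$ or if $t > t_{n_x+n_y-2,\alpha/2}$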

T test with equal variances
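
A minimal sketch of the equal-variance two-sample t statistic, using the usual pooled variance estimate. The function name and the tiny data set are hypothetical and only illustrate the arithmetic.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(x, y):
    """Return (t, degrees of freedom) for the equal-variance two-sample t test."""
    n_x, n_y = len(x), len(y)
    df = n_x + n_y - 2
    # Pooled variance: weighted average of the two sample variances.
    s2_pooled = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / df
    standard_error = sqrt(s2_pooled * (1 / n_x + 1 / n_y))
    return (mean(x) - mean(y)) / standard_error, df


# Hypothetical measurements, for illustration only.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 4.0, 6.0]
t_stat, df = pooled_t_statistic(group_a, group_b)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The statistic is then compared with the t table value for the returned degrees of freedom.
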
24
Assumptions
  • Independence: All observations are independent.
    Achieved by taking random samples of individuals;
    for the paired t-test, independence is achieved by
    using the difference between measurements
  • Normally distributed data (check histograms,
    tests for normal distribution, Q-Q plots)
  • Equal variances or standard deviations in the
    groups
  • Assumptions can be checked with histograms, box
    plots etc. (or tests for normality)
  • What if the variances are unequal?

25
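Example 4: Unknown and possibly unequal
population variances
  • We get $\frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{s_x^2/n_x + s_y^2/n_y}} \sim t_{\nu}$ (approximately)
  • where $\nu = \frac{(s_x^2/n_x + s_y^2/n_y)^2}{(s_x^2/n_x)^2/(n_x - 1) + (s_y^2/n_y)^2/(n_y - 1)}$
  • Conf. interval for $\mu_x - \mu_y$: $(\bar{X} - \bar{Y}) \pm t_{\nu,\alpha/2} \sqrt{s_x^2/n_x + s_y^2/n_y}$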

26
Example 4 (Hypothesis testing)
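  • Hypotheses: H0: $\mu_x = \mu_y$, H1: $\mu_x \neq \mu_y$
  • Test statistic: $t = \frac{\bar{X} - \bar{Y}}{\sqrt{s_x^2/n_x + s_y^2/n_y}} \sim t_{\nu}$ (approximately) under H0
  • Reject H0 if $t < -t_{\nu,\alpha/2}$ or if $t > t_{\nu,\alpha/2}$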

T test with unequal variances
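
When the variances may differ, a common choice is Welch's statistic with Satterthwaite's approximate degrees of freedom. Whether the original slide used exactly this approximation is an assumption; the function name and data below are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_statistic(x, y):
    """Return (t, approximate degrees of freedom) without assuming equal variances."""
    n_x, n_y = len(x), len(y)
    # Estimated variance of each sample mean.
    v_x, v_y = variance(x) / n_x, variance(y) / n_y
    t = (mean(x) - mean(y)) / sqrt(v_x + v_y)
    # Satterthwaite approximation; usually not an integer.
    df = (v_x + v_y) ** 2 / (v_x ** 2 / (n_x - 1) + v_y ** 2 / (n_y - 1))
    return t, df


# Hypothetical measurements, for illustration only.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 4.0, 6.0]
t_stat, df = welch_t_statistic(group_a, group_b)
print(f"t = {t_stat:.3f} on about {df:.2f} degrees of freedom")
```

With equal group sizes the statistic coincides with the pooled one; only the degrees of freedom shrink.
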
27
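Example 5: The variance of a normal distribution
  • Assume $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$
  • Then $\frac{(n-1) s^2}{\sigma^2} \sim \chi^2_{n-1}$
  • Thus $P\left(\chi^2_{n-1,1-\alpha/2} < \frac{(n-1) s^2}{\sigma^2} < \chi^2_{n-1,\alpha/2}\right) = 1 - \alpha$
  • Confidence interval for $\sigma^2$: $\left(\frac{(n-1) s^2}{\chi^2_{n-1,\alpha/2}},\ \frac{(n-1) s^2}{\chi^2_{n-1,1-\alpha/2}}\right)$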

28
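Example 5: Comparing variances for normal
distributions
  • Assume $X_1, \ldots, X_{n_x} \sim N(\mu_x, \sigma_x^2)$ and
    $Y_1, \ldots, Y_{n_y} \sim N(\mu_y, \sigma_y^2)$, all data independent
  • We get $\frac{s_x^2/\sigma_x^2}{s_y^2/\sigma_y^2} \sim F_{n_x-1,\,n_y-1}$
  • $F_{n_x-1,\,n_y-1}$ is an F distribution with $n_x-1$ and
    $n_y-1$ degrees of freedom
  • We can use this exactly as before to obtain a
    confidence interval for $\sigma_x^2/\sigma_y^2$ and for testing,
    for example, if $\sigma_x^2 = \sigma_y^2$
  • Note: The assumption of normality is crucial!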

29
ID   GROUP   ENERGY
 1     0      6.13
 2     0      7.05
....
12     0     10.15
13     0     10.88
14     1      8.79
15     1      9.19
....
21     1     11.85
22     1     12.79
Number of cases read: 22. Number of cases listed: 22

Example: Energy expenditure in two groups, lean
and obese. Want to test if there is any
difference. H0: $\mu_{\mathrm{lean}} = \mu_{\mathrm{obese}}$; H1: $\mu_{\mathrm{lean}} \neq \mu_{\mathrm{obese}}$
30
In SPSS
  • Analyze - Compare Means - Independent-Samples
    T Test
  • Move Energy to Test variable
  • Move Group to Grouping variable
  • Click Define Groups and write 0 and 1 for the two groups

31
Output
If the significance (Sig.) of Levene's test for equality of
variances is above 0.05: read the first line (Equal variances
assumed). Otherwise: read the second line (Equal
variances not assumed)
32
Conclusion
  • The observed mean for the lean was 8.1, and for
    the obese 10.3 (mean difference -2.2, 95%
    confidence interval for the difference (-3.4,
    -1.1))
  • The difference between the groups was significant
    at the 5% level (since the CI does not include the
    value 0)
  • The p-value was 0.001.
  • H0 is rejected

33
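Example 6: Population proportions
  • Assume $X \sim \mathrm{Bin}(n, P)$, so that
    $\hat{P} = X/n$ is a frequency.
  • Then $\frac{\hat{P} - P}{\sqrt{P(1-P)/n}} \sim N(0, 1)$ (approximately, for large n)
  • Thus $\frac{\hat{P} - P}{\sqrt{\hat{P}(1-\hat{P})/n}} \sim N(0, 1)$ (approximately, for large n)
  • Thus $P\left(-z_{\alpha/2} < \frac{\hat{P} - P}{\sqrt{\hat{P}(1-\hat{P})/n}} < z_{\alpha/2}\right) \approx 1 - \alpha$
  • Confidence interval for P: $\hat{P} \pm z_{\alpha/2} \sqrt{\hat{P}(1-\hat{P})/n}$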
34
Example 6 (Hypothesis testing)
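  • Hypotheses: H0: $P = P_0$, H1: $P \neq P_0$
  • Test statistic: $z = \frac{\hat{P} - P_0}{\sqrt{P_0(1-P_0)/n}} \sim N(0, 1)$
    under H0, for large n
  • Reject H0 if $z < -z_{\alpha/2}$
    or if $z > z_{\alpha/2}$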
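
The large-sample interval and test for a single proportion fit in a few lines. This is a sketch under the normal approximation; the function name and the counts (60 successes in 100 trials, tested against 0.5) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0, alpha=0.05):
    """Large-sample CI for P and two-sided z test of H0: P = p0."""
    std_normal = NormalDist()
    p_hat = successes / n
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # The interval uses the estimated standard error.
    half_width = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    # The test statistic uses the standard error under H0.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return (p_hat - half_width, p_hat + half_width), z, p_value


# Hypothetical counts, for illustration only.
ci, z, p_value = proportion_z_test(successes=60, n=100, p0=0.5)
print(f"CI = ({ci[0]:.3f}, {ci[1]:.3f}), z = {z:.2f}, p = {p_value:.4f}")
```

Note that the interval and the test use different standard errors, so they can occasionally disagree near the boundary.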

35
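Example 7: Differences between population
proportions
  • Assume $X_1 \sim \mathrm{Bin}(n_1, P_1)$ and
    $X_2 \sim \mathrm{Bin}(n_2, P_2)$, so that $\hat{P}_1 = X_1/n_1$ and $\hat{P}_2 = X_2/n_2$ are
    frequencies
  • Then $\frac{(\hat{P}_1 - \hat{P}_2) - (P_1 - P_2)}{\sqrt{\hat{P}_1(1-\hat{P}_1)/n_1 + \hat{P}_2(1-\hat{P}_2)/n_2}} \sim N(0, 1)$ (approximately)
  • Confidence interval for $P_1 - P_2$: $(\hat{P}_1 - \hat{P}_2) \pm z_{\alpha/2} \sqrt{\hat{P}_1(1-\hat{P}_1)/n_1 + \hat{P}_2(1-\hat{P}_2)/n_2}$
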
36
Example 7 (Hypothesis testing)
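  • Hypotheses: H0: $P_1 = P_2$, H1: $P_1 \neq P_2$
  • Test statistic: $z = \frac{\hat{P}_1 - \hat{P}_2}{\sqrt{\hat{P}_0(1-\hat{P}_0)(1/n_1 + 1/n_2)}}$
  • where $\hat{P}_0 = \frac{n_1 \hat{P}_1 + n_2 \hat{P}_2}{n_1 + n_2}$
  • Reject H0 if $|z| > z_{\alpha/2}$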

37
  • Spontaneous abortions among nurses helping with
    operations and other nurses
  • Want to test if there is a difference between the
    proportions of abortions in the two groups
  • H0: $P_{\mathrm{op.nurses}} = P_{\mathrm{others}}$; H1: $P_{\mathrm{op.nurses}} \neq P_{\mathrm{others}}$

38
Calculation
  • $\hat{P}_1 = 0.278$, $\hat{P}_2 = 0.088$, $n_1 = 36$, $n_2 = 34$
  • $z \approx 2.04$
  • P-value = 0.0414 = 4.1%, reject H0 at the 5% significance level
    (can't do this in SPSS)
  • 95% confidence interval for $P_1 - P_2$
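
Since SPSS does not offer this test directly, the calculation can be sketched by hand. The counts 10 out of 36 and 3 out of 34 are an assumption, inferred from the rounded proportions 0.278 and 0.088; the pooled standard error is used for the test and the unpooled one for the interval.

```python
from math import sqrt
from statistics import NormalDist

# Assumption: counts inferred from the rounded proportions on the slide
# (10/36 = 0.278 and 3/34 = 0.088).
x1, n1 = 10, 36  # nurses helping with operations
x2, n2 = 3, 34   # other nurses

p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)

# Test statistic with the pooled standard error under H0: P1 = P2.
se_h0 = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_h0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# 95% confidence interval with the unpooled standard error.
se_ci = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_crit = NormalDist().inv_cdf(0.975)
ci = (p1 - p2 - z_crit * se_ci, p1 - p2 + z_crit * se_ci)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print(f"95% CI for P1 - P2: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The p-value comes out close to the 0.0414 quoted above; small differences are expected if z is rounded to 2.04 before a table lookup.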

39
Next week
  • Next lecture will be about modelling
    relationships between continuous variables
  • Linear regression