Title: Biostatistics and Computer Applications
1Biostatistics and Computer Applications
- Parameter Estimation
- Confidence interval
- Sample size estimation
- Inference for variance
- SAS Programming
- 1/2/2003
2Recap (Hypothesis test)
- Steps in hypothesis test
- Regions of acceptance and rejection
- One tailed and two-tailed test
- Type I error and Type II error
- One sample hypothesis test
- Two sample independent test
- Two paired sample test
3Recap (Region of acceptance and rejection)
[Figure: sampling distribution of X-bar with the corresponding z axis. H0 is accepted for z between -1.96 and 1.96 and rejected in both tails.]
4Recap (Hypothesis test)
- One sample
- Two samples (independent)
- Paired t-test
5Interval estimation
- According to the sampling distribution, set an
interval $[L_1, L_2]$ for a parameter $\theta$
of a population such that the
probability that $\theta$ is within the interval is
$1-\alpha$, i.e. $P(L_1 \le \theta \le L_2) = 1-\alpha$.
- $L_1$ and $L_2$ are the parameter's
confidence limits (CL), $[L_1, L_2]$ is the confidence
interval (CI), and $1-\alpha$ is the confidence
coefficient.
6Confidence Interval
- A Confidence Interval is a range (or an interval)
of values that is likely to contain the true
value of the population parameter (e.g., mean,
standard deviation).
- Influenced by the degree of confidence ($1-\alpha$)
- Balance between precision (as reflected in the
width of the CI) and reliability (as expressed by
the degree of confidence). Common choices are
95% and 99%.
7Calculating the Confidence Interval (CI) of a
Mean
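- Z distribution: $z = (\bar{x} - \mu)/\sigma_{\bar{x}} \sim N(0, 1)$, so $P(-1.96 \le z \le 1.96) = 0.95$
- Considering the sample mean: $P(\bar{x} - 1.96\,\sigma_{\bar{x}} \le \mu \le \bar{x} + 1.96\,\sigma_{\bar{x}}) = 0.95$
- In general, the $1-\alpha$ confidence interval is $\bar{x} \pm z_{\alpha}\sigma_{\bar{x}}$, where $z_{\alpha}$ is the two-tailed critical value (1.96 for $\alpha = 0.05$)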
8Interpreting the Confidence Interval
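- We have $\bar{x}$ and do not know $\mu$, but we have
$1-\alpha$ confidence to say that the interval
$[\bar{x} - z_{\alpha}\sigma_{\bar{x}},\ \bar{x} + z_{\alpha}\sigma_{\bar{x}}]$
includes $\mu$ (with $z_{\alpha}$ the two-tailed critical value).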
9Example
- Sugar packing machine. Calculate the 95%
confidence interval for $\mu$.
As this CI does not include 100, we reject
$H_0$: $\mu = 100$ at the 0.05 level.
10What the Confidence Interval Does Not Mean!
- We cannot state that there is a 95% chance that
the true population mean is contained within any
particular observed confidence interval, because
the population mean is a parameter, or a
fixed value, and therefore is either inside or
outside of the estimated interval. It cannot be
inside an interval 95% of the time.
- There is no uncertainty about the sample
statistics (mean, SD, etc.). We are 100% sure that
we calculated them correctly.
Interpretation: We do not know the population
mean, but we can be sure that on average 95 out
of 100 CIs similarly obtained would include the
population mean. If we repeat this procedure 100
times, the intervals constructed in this manner
will include the true mean ($\mu$) about 95 times.
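The repeated-sampling interpretation can be checked with a small simulation. This is only a sketch: the population values (mu = 100, sigma = 2), the sample size, the number of trials and the function name `coverage` are assumptions chosen for illustration, and sigma is treated as known so the z interval applies.

```python
import random
from statistics import NormalDist, mean

def coverage(mu=100.0, sigma=2.0, n=25, trials=2000, conf=0.95, seed=1):
    """Fraction of simulated z-based CIs that contain the true mean mu."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # two-tailed critical value
    half_width = z * sigma / n ** 0.5              # z * standard error
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        # mu is fixed; it is the interval that changes from sample to sample
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

print(f"share of intervals containing mu: {coverage():.3f}")
```

The printed share should be close to 0.95, while any single interval either contains mu or does not.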
11Relation between confidence intervals and
hypothesis test (significance test)
- If the 95% confidence interval does not contain
$\mu_0$, then the null hypothesis would be rejected
at the 0.05 level.
- Conversely, if the 95% confidence interval does
contain $\mu_0$, then the null hypothesis is accepted
at the 0.05 level.
12Another example
- Light level at the floor of a tree canopy. The 95%
confidence interval of the population mean $\mu$.
- As this CI includes 3.0, we accept $H_0$: $\mu = 3.0$
at the $\alpha = 0.05$ level.
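13Confidence Interval (CI) for $\mu_1 - \mu_2$
For the z distribution, the $(1-\alpha)$ confidence interval
for $(\mu_1 - \mu_2)$ is $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha}\sigma_{\bar{x}_1 - \bar{x}_2}$.
For the t distribution, the $(1-\alpha)$ confidence interval
for $(\mu_1 - \mu_2)$ is $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha}(df)\, s_{\bar{x}_1 - \bar{x}_2}$.
For paired samples, the $(1-\alpha)$ confidence interval
for $\mu_d$ is $\bar{d} \pm t_{\alpha}(n-1)\, s_{\bar{d}}$.
Here $z_{\alpha}$ and $t_{\alpha}$ are two-tailed critical values.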
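14Confidence Interval (CI) for $\mu_1 - \mu_2$
Calculation of the standard error and degrees of freedom is the same
as in the hypothesis test. If the interval
$[L_1, L_2]$ includes 0, we accept $H_0$: $\mu_1 = \mu_2$
at the $\alpha$ level; otherwise, we reject
$H_0$.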
15Example of Confidence Interval (CI) for $\mu_1 - \mu_2$
Example 1: $n_1 = n_2 = 200$. Example 2: two viruses,
tobacco leaves, dead spots.
16One-sided Confidence Interval
- Only one confidence limit is calculated.
- Lower limit only: the confidence interval is $[\bar{x} - z\,\sigma_{\bar{x}},\ +\infty)$.
- Upper limit only: the confidence interval is $(-\infty,\ \bar{x} + z\,\sigma_{\bar{x}}]$.
- Use the critical z or t value for $1-\alpha$
rather than $1-\alpha/2$. Thus, for the z distribution at 95% confidence,
we use 1.645 instead of 1.96.
- The relationship between confidence interval and
hypothesis test is the same. If $\mu_0$ is included in
the confidence interval for one mean, or 0 is
included in the confidence interval for two
means, accept $H_0$; otherwise, reject $H_0$.
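The two critical values quoted for the z distribution can be reproduced from the standard normal quantile function. A minimal sketch using Python's standard library; the helper name `critical_z` is an assumption made for this illustration.

```python
from statistics import NormalDist

def critical_z(alpha, two_sided=True):
    """Critical z value: alpha split over two tails, or all in one tail."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)

print(round(critical_z(0.05, two_sided=True), 3))   # 1.96
print(round(critical_z(0.05, two_sided=False), 3))  # 1.645
```

Because the one-sided value is smaller, the single confidence limit sits closer to the sample mean than either limit of the two-sided interval.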
17Sample size determination
- As the standard error decreases, the confidence
interval becomes narrower and we have a more precise
estimate.
- We can decrease the standard error by increasing
the sample size n.
- But increasing n increases cost and other
expenses.
- So we need a sample size n that guarantees the
precision of the parameter estimate from the sample
with a given confidence.
18Sample size determination
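- One sample mean: $n = t_{\alpha}^2 s^2 / d^2$, where $d$ is the largest acceptable difference between the sample mean and the population mean and $t_{\alpha}(df)$ is the two-tailed critical value
- Because df depends on n, we first use $z_{\alpha}$ as $t_{\alpha}$ to calculate n; if n < 30, we
recalculate n using $t_{\alpha}(df)$.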
19Sample size determination
- As variance increases, sample size increases
- As significance level decreases, sample size
increases
- As the difference between means increases, sample size
decreases
20Examples
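- We measured a certain variable with s = 5.5 units.
In order to get a sample mean that differs from the
population mean by no more than 1 unit with 99%
confidence, how many individuals do we need to
sample?
- s = 5.5, $\alpha$ = 0.01, $z_{0.01}$ = 2.58
- $n = z_{0.01}^2 s^2 / d^2 = 2.58^2 \times 5.5^2 / 1^2 \approx 201.4$, so we need at least 202 individuals.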
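The arithmetic for this example can be sketched as follows, assuming the usual z-based formula n = z² s² / d² (the slide's own formula did not survive extraction, so the formula and the function name `sample_size_one_mean` are assumptions). The critical value 2.58 is the rounded table value given on the slide.

```python
import math

def sample_size_one_mean(s, d, z):
    """Smallest n with z * s / sqrt(n) <= d, i.e. n = (z * s / d)^2 rounded up."""
    return math.ceil((z * s / d) ** 2)

n = sample_size_one_mean(s=5.5, d=1.0, z=2.58)
print(n)  # 202
```

Since the result is well above 30, no recalculation with a t critical value is needed here.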
21Sample Size Estimation for Comparison of Two Means
- Two independent samples
- Paired samples
22Example of Sample Size Estimation
- Two treatments: A: 24, 20, 29, 25 kg; B: 18, 24,
15, 19 kg. Is there any difference between the two treatments?
If we want the difference of sample means to differ
from the difference of population means by
less than 4 kg with 95% confidence, how many
individuals do we need to measure?
23Statistical inference for variance
24One Sample Chi-Square Test
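- We have a sample with n and $s^2$, and a known population variance $\sigma_0^2$.
- $H_0$: $\sigma^2 = \sigma_0^2$ vs $H_A$: $\sigma^2 \ne \sigma_0^2$
- Test statistic is
- $\chi^2 = (n-1)s^2/\sigma_0^2 \sim \chi^2(n-1)$
- (if n > 30, a normal approximation is used)
- Reject $H_0$ if $\chi^2 > \chi^2_{\alpha/2}(n-1)$ or $\chi^2 < \chi^2_{1-\alpha/2}(n-1)$
- $(1-\alpha)$ confidence interval for $\sigma^2$:
- $L_1 = (n-1)s^2/\chi^2_{\alpha/2}$, $L_2 = (n-1)s^2/\chi^2_{1-\alpha/2}$
where $\chi^2_{\alpha/2}(n-1)$ and $\chi^2_{1-\alpha/2}(n-1)$ are the upper-tail and lower-tail critical values with n-1 degrees of freedom.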
25Example One Sample Chi-Square Test
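- Packing machine: $\sigma_0^2 = 2$. Sample: n = 10, $s^2 = 2.5$. Is
the sample variance significantly different from
2? What is the 95% confidence interval for $\sigma^2$?
- $H_0$: $\sigma^2 = \sigma_0^2$ vs $H_A$: $\sigma^2 \ne \sigma_0^2$
- $\chi^2_{0.025}(9) = 19.02$, $\chi^2_{0.975}(9) = 2.70$
- $\chi^2 = (n-1)s^2/\sigma_0^2 = 9 \times 2.5/2 = 11.25$
- Accept $H_0$, as $2.70 < \chi^2 = 11.25 < 19.02$
- 95% confidence interval for $\sigma^2$ (not symmetric):
- $L_1 = (n-1)s^2/\chi^2_{\alpha/2} = 22.5/19.02 = 1.18$
- $L_2 = (n-1)s^2/\chi^2_{1-\alpha/2} = 22.5/2.70 = 8.33$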
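The packing machine numbers can be reproduced with a few lines of arithmetic. In this sketch the two critical values (2.70 and 19.02 for 9 degrees of freedom) are passed in as table lookups, because Python's standard library has no chi-square quantile function; the function name and its parameter names are assumptions for illustration.

```python
def chi_square_variance_test(n, s2, sigma0_sq, chi2_upper, chi2_lower):
    """One-sample variance test and CI, given tabulated critical values."""
    df = n - 1
    stat = df * s2 / sigma0_sq                      # (n-1) s^2 / sigma_0^2
    reject = stat > chi2_upper or stat < chi2_lower
    # Dividing by the larger critical value gives the lower limit
    ci = (df * s2 / chi2_upper, df * s2 / chi2_lower)
    return stat, reject, ci

stat, reject, (l1, l2) = chi_square_variance_test(
    n=10, s2=2.5, sigma0_sq=2.0, chi2_upper=19.02, chi2_lower=2.70
)
print(stat, reject, round(l1, 2), round(l2, 2))  # 11.25 False 1.18 8.33
```

The interval is not centered on the sample variance 2.5, which is why the slide notes that it is not symmetric.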
26Two Samples Variance (F Test)
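- We have two samples: $n_1$, $s_1^2$ and $n_2$, $s_2^2$.
- $H_0$: $\sigma_1^2 \le \sigma_2^2$ vs $H_A$: $\sigma_1^2 > \sigma_2^2$
- Test statistic is (always put the larger $s^2$ as $s_1^2$)
- $F = s_1^2/s_2^2$
- Reject $H_0$ if
- $F > F_{\alpha}(n_1-1, n_2-1)$
- If $n_1$ and $n_2$ > 100, then use the z test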
27Example Two Samples Variance
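- Test if the variance of boys' height is larger
than girls' (data not real).
- $n_1 = 10$, $s_1^2 = 22.15$; $n_2 = 8$, $s_2^2 = 4.11$
- $H_0$: $\sigma_1^2 \le \sigma_2^2$ vs $H_A$: $\sigma_1^2 > \sigma_2^2$
- $F_{0.05}(9, 7) = 3.68$
- $F = s_1^2/s_2^2 = 22.15/4.11 = 5.39$
- Reject $H_0$ as $F > F_{0.05}$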
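The same comparison in code, as a minimal sketch: the critical value 3.68 is the tabulated F value for (9, 7) degrees of freedom quoted on the slide, and the function name `variance_ratio_test` is an assumption for illustration.

```python
def variance_ratio_test(s1_sq, s2_sq, f_crit):
    """One-sided F test: larger sample variance over smaller, against a table value."""
    f = s1_sq / s2_sq
    return f, f > f_crit

f, reject = variance_ratio_test(s1_sq=22.15, s2_sq=4.11, f_crit=3.68)
print(round(f, 2), reject)  # 5.39 True
```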
28Multiple variances (Bartlett test)
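- Draw k independent samples from normally
distributed populations, with sample sizes $n_i$ and variances $s_i^2$.
- $H_0$: all $\sigma_i^2$ are the same vs $H_A$: at least one $\sigma_i^2$
differs from the others
- Test statistic is $\chi^2 = \left[(N-k)\ln s_p^2 - \sum (n_i-1)\ln s_i^2\right]/C$, where $N = \sum n_i$, $s_p^2 = \sum (n_i-1)s_i^2/(N-k)$ and $C = 1 + \frac{1}{3(k-1)}\left[\sum \frac{1}{n_i-1} - \frac{1}{N-k}\right]$
- Reject $H_0$ if
- $\chi^2 > \chi^2_{\alpha}(k-1)$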
29Example Multiple variances (Bartlett test)
k = 5, n = 20
30Statistical Inference - Proportions
- One sample
- Hypothesis test
- Confidence interval
- Two Sample
- Hypothesis test
- Confidence interval
31One sample Tests Binomial Proportion
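Test if the sample proportion $\hat{p}$ is
different from a prescribed value $p_0$ and estimate
the confidence interval for p.
$H_0$: $p = p_0$ vs $H_A$: $p \ne p_0$
Test statistic: $z = (\hat{p} - p_0)/\sigma_p$, with $\sigma_p = \sqrt{p_0(1-p_0)/n}$
Confidence interval: $L_1 = \hat{p} - z_{\alpha}\sigma_p$, $L_2 = \hat{p} + z_{\alpha}\sigma_p$
If $H_0$ is rejected: $L_1 = \hat{p} - z_{\alpha}s_p$, $L_2 = \hat{p} + z_{\alpha}s_p$, with $s_p = \sqrt{\hat{p}(1-\hat{p})/n}$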
32Example
Suppose that there is an equal chance that a
child is male or female. We find in a sample of
114 workers at a pesticide plant (each with only one
child) that 66 of the children are female. Is
this evidence that the working conditions change
the proportion of males and females ($p_0 = 0.5$)?
Data: n = 114, $\hat{p}$ = 66/114 = 0.5789
$H_0$: $p = p_0$ vs $H_A$: $p \ne p_0$
The critical value for a two-sided $\alpha$ = 0.05 test is 1.96.
Since the test statistic, z = 1.69, is smaller than the critical value, we accept
$H_0$.
95% confidence interval: $L_1 = 0.5789 - 1.96 \times 0.046 = 0.49$,
$L_2 = 0.5789 + 1.96 \times 0.046 = 0.67$, which includes 0.50.
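The pesticide plant example can be recomputed step by step. A minimal sketch, assuming the standard one-sample z test for a proportion (standard error under H0 from p0, standard error for the interval from the sample proportion); the function name `one_sample_proportion` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion(x, n, p0, alpha=0.05):
    """Two-sided z test of H0: p = p0, plus a (1 - alpha) CI for p."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)      # standard error under H0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # 1.96 for alpha = 0.05
    se = sqrt(p_hat * (1 - p_hat) / n)              # standard error from the sample
    ci = (p_hat - z_crit * se, p_hat + z_crit * se)
    return p_hat, z, abs(z) > z_crit, ci

p_hat, z, reject, (l1, l2) = one_sample_proportion(x=66, n=114, p0=0.5)
print(round(p_hat, 4), round(z, 2), reject)  # 0.5789 1.69 False
print(round(l1, 2), round(l2, 2))
```

The interval contains 0.50, which agrees with the decision not to reject H0.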
33Hypothesis Testing for 2 Sample Proportions
The hypothesis that the two populations are the
same is addressed by the hypotheses
$H_0$: $p_1 = p_2$ vs $H_A$: $p_1 \ne p_2$
Test statistic: (same rules
as for one sample)
34Hypothesis Testing for 2 Sample Proportions
$(1-\alpha)$ confidence interval for $(p_1 - p_2)$ if $H_0$
is rejected
35Example for 2 Sample Proportions
Seat belt safety study. We can test $H_0$: $p_1 = p_2$,
but we first need a common estimate (under the
null).
36Example for 2 Sample Proportions
Since z < 1.96, we fail to reject $H_0$ and
conclude that the observed difference is not
statistically significant at the 0.05 level.
95% confidence interval for $(p_1 - p_2)$: this interval
includes 0, which confirms that $H_0$ should be accepted.
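The two-sample procedure (common estimate under the null, z statistic, then a confidence interval for the difference) can be sketched in code. The seat belt study's counts did not survive extraction, so the counts below are purely hypothetical, and the formulas are the standard pooled z test and unpooled interval rather than a transcription of the slide.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_proportions(x1, n1, x2, n2, alpha=0.05):
    """Two-sided z test of H0: p1 = p2 and a (1 - alpha) CI for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                  # common estimate under H0
    se_null = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_null
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled, for the CI
    diff = p1 - p2
    return z, abs(z) > z_crit, (diff - z_crit * se, diff + z_crit * se)

# Hypothetical counts, NOT the seat belt study data
z, reject, (l1, l2) = two_sample_proportions(x1=10, n1=100, x2=15, n2=100)
print(round(z, 3), reject, round(l1, 3), round(l2, 3))
```

With these made-up counts |z| is below 1.96 and the interval contains 0, the same pattern as in the slide's conclusion.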
37Sample Size
1-sample Proportion 2-sample Proportion
38Example for sample Size
Example: We know the probability of a purple-flowered
plant in the F2 generation is p = 0.75. We want a
sample proportion between 0.74 and 0.76 with 95% confidence;
how many plants do we need to sample?
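A sketch of the calculation for this example, assuming the usual one-sample formula n = z² p (1 - p) / d² with margin d = 0.01 and z = 1.96 (the slide's own formula was lost in extraction, so the formula and the function name are assumptions).

```python
import math

def sample_size_proportion(p, d, z=1.96):
    """n = z^2 * p * (1 - p) / d^2, rounded up to a whole plant."""
    n = z ** 2 * p * (1 - p) / d ** 2
    # round() first so floating-point noise cannot push an exact value up by one
    return math.ceil(round(n, 6))

print(sample_size_proportion(p=0.75, d=0.01))  # 7203
```

The required sample is large because the margin is only one percentage point; doubling d to 0.02 cuts n to about a quarter.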
39SAS Programming
- PROC TTEST
- TTEST performs t tests for one sample, two
samples, and paired observations. The one-sample
t test compares the mean of the sample to a given
number. The two-sample t test compares the mean
of the first sample minus the mean of the second
sample to a given number. The paired observations
t test compares the mean of the differences in
the observations to a given number.
40PROC TTEST
- PROC TTEST options
- CLASS variable
- VAR variables
- PAIRED x1*x2
41PROC TTEST
- PROC TTEST options
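- Options
- ALPHA=p: sets the significance level, giving 100(1-p)% confidence intervals
- CI=EQUAL: equal-tailed confidence interval for the standard deviation
- COCHRAN: Cochran and Cox approximation for the unequal-variance t test
- DATA=SAS-data-set: data set name
- H0=m: tests against m instead of 0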
42PROC TTEST
- CLASS variable: specifies the variable that separates the
whole data set into two parts; the variable can
only have two levels, for two independent samples.
No CLASS statement is needed for the one-sample and
paired t-tests.
43PROC TTEST
- PAIRED x1*x2: tests whether $\mu_d = 0$ for d = x1 - x2 (paired
t-test).
- VAR variables: specifies the variables to be analyzed.
These variables should not be in the PAIRED
statement.