Title: Inference Concerning Proportions
Chapter 8
- Inference Concerning Proportions
Inference for a Single Proportion (p)
- Goal: Estimate the proportion of individuals in a population with a certain characteristic (p). This is equivalent to estimating a binomial probability.
- Sample: Take an SRS of n individuals from the population and observe the number X that have the characteristic. The sample proportion is $\hat{p} = X/n$ and has the following sampling properties:
  - Mean: $\mu_{\hat{p}} = p$
  - Standard deviation: $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$
  - Approximately normal for large n
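The sampling properties above can be checked by simulation. The sketch below treats each of the n sampled individuals as an independent Bernoulli trial. This is an assumption, and it approximates an SRS well only when the population is large. The true proportion p = 0.15 is hypothetical and chosen only for the demo, and `simulate_phat` is an illustrative helper, not a library function.

```python
import math
import random
import statistics

def simulate_phat(p: float, n: int, reps: int, seed: int = 1) -> list[float]:
    """Illustrative helper: return `reps` simulated sample proportions X/n."""
    rng = random.Random(seed)  # seeded so the demo is reproducible
    phats = []
    for _ in range(reps):
        # X = number of the n individuals that have the characteristic
        x = sum(1 for _ in range(n) if rng.random() < p)
        phats.append(x / n)
    return phats

p, n = 0.15, 126  # hypothetical true proportion and sample size
phats = simulate_phat(p, n, reps=5000)

# Compare the simulated mean and SD of p_hat with p and sqrt(p(1-p)/n)
print(round(statistics.mean(phats), 3), round(p, 3))
print(round(statistics.stdev(phats), 4), round(math.sqrt(p * (1 - p) / n), 4))
```

The two columns printed should agree closely. A histogram of `phats` would also look roughly bell-shaped, which illustrates the "approximately normal" property.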
Large-Sample Confidence Interval for p
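- Take an SRS of size n from a population where p is the true (unknown) proportion of successes.
- Observe X successes.
- Set confidence level C and choose z such that $P(-z \le Z \le z) = C$ (C = 90% ⇒ z = 1.645; C = 95% ⇒ z = 1.96; C = 99% ⇒ z = 2.576).
- Confidence interval: $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$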
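The z values listed on this slide are the (1 + C)/2 quantiles of the standard normal distribution. The central area C leaves (1 - C)/2 in each tail. Below is a minimal sketch using Python's standard-library `statistics.NormalDist`. `z_critical` is an illustrative helper name.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Illustrative helper: z with P(-z <= Z <= z) = confidence for standard normal Z."""
    # Central area C leaves (1 - C) / 2 in each tail, so z is the (1 + C) / 2 quantile
    return NormalDist().inv_cdf((1 + confidence) / 2)

for c in (0.90, 0.95, 0.99):
    print(c, round(z_critical(c), 3))
```

This reproduces the three values on the slide: 1.645, 1.96 and 2.576.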
Example - Ginkgo and Acetaz for AMS
- Study Goal: Measure the effect of Ginkgo and Acetazolamide on the occurrence of Acute Mountain Sickness (AMS) in Himalayan trekkers.
- Parameter: p = true proportion of all trekkers receiving Ginkgo+Acetazolamide (G+A) who would suffer from AMS.
- Sample Data: n = 126 trekkers received G+A; X = 18 suffered from AMS.
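Applying the large-sample interval to these data is simple arithmetic. The sketch below assumes a 95% confidence level, because the slide as extracted does not state one. `wald_ci` is an illustrative helper name for the large-sample interval.

```python
import math
from statistics import NormalDist

def wald_ci(x: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Illustrative helper: large-sample interval p_hat +/- z * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error of p_hat
    return p_hat - z * se, p_hat + z * se

# Ginkgo + Acetazolamide data from the slide: n = 126, X = 18
low, high = wald_ci(18, 126)
print(f"p_hat = {18 / 126:.4f}, 95% CI = ({low:.4f}, {high:.4f})")
```

With these data, p-hat = 18/126 ≈ 0.143, and the 95% interval is roughly (0.082, 0.204).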
Wilson's Plus 4 Method
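- For moderate to small sample sizes, large-sample methods may not work well with respect to coverage probabilities.
- Simple approach that works well in practice (n ≥ 10).
- Pretend you have 4 extra individuals: 2 successes and 2 failures.
- Compute the estimated sample proportion in light of the new data, as well as its standard error: $\tilde{p} = (X+2)/(n+4)$ and $SE_{\tilde{p}} = \sqrt{\tilde{p}(1-\tilde{p})/(n+4)}$. The interval is $\tilde{p} \pm z \cdot SE_{\tilde{p}}$.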
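The plus 4 recipe translates directly into code. In this sketch, `plus_four_ci` is an illustrative helper. Clipping the limits to [0, 1] is an added assumption and not part of the recipe on the slide. The demo uses hypothetical data with X = 0 successes out of n = 20. In that case the large-sample interval collapses to (0, 0) because p-hat = 0 gives a standard error of 0.

```python
import math
from statistics import NormalDist

def plus_four_ci(x: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Illustrative helper: add 2 successes and 2 failures, then use the usual z interval."""
    p_tilde = (x + 2) / (n + 4)
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    # Assumption: clip to [0, 1], since a proportion cannot lie outside that range
    return max(0.0, p_tilde - z * se), min(1.0, p_tilde + z * se)

# Hypothetical data (not from the slides): 0 successes in 20 trials
print(plus_four_ci(0, 20))
```

The plus 4 interval here is about (0, 0.194). It still has positive width even though no successes were observed.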
Example - Lister's Tests with Antiseptic
- Experiments with antiseptic in patients with upper-limb amputations (Joseph Lister, circa 1870).
- n = 12 patients received antiseptic; X = 1 died.
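With n = 12 and X = 1, this is the small-sample setting in which the plus 4 method is recommended. The sketch below computes both intervals at an assumed 95% level. The limits are deliberately left unclipped so that the raw values are visible. `wald_ci` and `plus_four_ci` are illustrative helper names.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # z for 95% confidence (assumed level)

def wald_ci(x, n, z=Z95):
    """Illustrative helper: large-sample interval, unclipped."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def plus_four_ci(x, n, z=Z95):
    """Illustrative helper: plus 4 interval (2 extra successes, 2 extra failures), unclipped."""
    p_tilde = (x + 2) / (n + 4)
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde - z * se, p_tilde + z * se

# Lister data from the slide: n = 12 patients, X = 1 death
x, n = 1, 12
print("Large-sample:", [round(v, 4) for v in wald_ci(x, n)])
print("Plus 4:      ", [round(v, 4) for v in plus_four_ci(x, n)])
```

The large-sample lower limit is about -0.073, which is impossible for a proportion. The plus 4 interval is centred at 3/16 = 0.1875 and runs from about -0.004 to 0.379. Its lower limit would in practice be reported as 0.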
Significance Test for a Proportion
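- Goal: Test whether a proportion (p) equals some null value $p_0$: $H_0: p = p_0$.
- Test statistic: $z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}$, compared with the standard normal distribution.
- The large-sample test works well when $np_0$ and $n(1-p_0)$ are both > 10.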
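Here is a minimal sketch of the large-sample test. It uses the standard error computed under $H_0$ (with $p_0$, not $\hat{p}$). `one_prop_z_test` and its `alternative` values are illustrative choices, not a library API. The demo numbers (60 successes in 100 trials, $p_0 = 0.5$) are hypothetical.

```python
import math
from statistics import NormalDist

def one_prop_z_test(x: int, n: int, p0: float, alternative: str = "two-sided") -> tuple[float, float]:
    """Illustrative helper: large-sample z test of H0: p = p0; returns (z, P-value)."""
    # Rule of thumb from the slide: n*p0 and n*(1 - p0) should both exceed 10
    if n * p0 <= 10 or n * (1 - p0) <= 10:
        raise ValueError("large-sample conditions n*p0 > 10 and n*(1-p0) > 10 not met")
    p_hat = x / n
    se0 = math.sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se0
    std = NormalDist()
    if alternative == "less":        # Ha: p < p0
        p_value = std.cdf(z)
    elif alternative == "greater":   # Ha: p > p0
        p_value = 1 - std.cdf(z)
    else:                            # Ha: p != p0
        p_value = 2 * (1 - std.cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, H0: p = 0.5
print(one_prop_z_test(60, 100, 0.5))
```

Here z = 2.0 and the two-sided P-value is about 0.046.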
Ginkgo and Acetaz for AMS
- Can we claim that the incidence rate of AMS is less than 25% for trekkers receiving G+A?
- $H_0: p = 0.25$, $H_a: p < 0.25$
- Strong evidence that the incidence rate is below 25% (p < 0.25).
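The arithmetic behind this conclusion can be sketched as follows. The data (n = 126, X = 18) come from the earlier Ginkgo slide. The P-value is lower-tailed because $H_a: p < 0.25$.

```python
import math
from statistics import NormalDist

# Ginkgo + Acetazolamide data: n = 126, X = 18; H0: p = 0.25 vs Ha: p < 0.25
x, n, p0 = 18, 126, 0.25
p_hat = x / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # SE uses p0, not p_hat, under H0
p_value = NormalDist().cdf(z)                     # lower-tail P-value for Ha: p < 0.25

print(f"n*p0 = {n * p0}, n*(1-p0) = {n * (1 - p0)}")  # both exceed 10
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

The large-sample conditions hold (31.5 and 94.5). z is about -2.78, with a one-sided P-value of roughly 0.003, which matches the "strong evidence" conclusion.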
Comparing Two Population Proportions
- Goal: Compare two populations/treatments with respect to a nominal (binary) outcome.
- Sampling Design: Independent vs. Dependent Samples
- Methods based on large vs. small samples
- Contingency tables used to summarize data
- Measures of Association: Absolute Risk, Relative Risk, Odds Ratio
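The three measures of association can be computed from the two sample proportions. This sketch interprets "absolute risk" as the difference $\hat{p}_1 - \hat{p}_2$, which matches the parameter of primary interest used later in the chapter. It uses the usual ratio definitions for relative risk and odds ratio. `association_measures` is an illustrative helper. The counts are hypothetical, not data from these slides.

```python
def association_measures(x1: int, n1: int, x2: int, n2: int) -> dict[str, float]:
    """Illustrative helper: risk difference, relative risk and odds ratio for two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    return {
        "risk_difference": p1 - p2,                       # absolute difference in proportions
        "relative_risk": p1 / p2,                         # ratio of proportions (needs x2 > 0)
        "odds_ratio": (p1 / (1 - p1)) / (p2 / (1 - p2)),  # ratio of odds (needs 0 < p1, p2 < 1)
    }

# Hypothetical counts: 30 of 100 in group 1 and 15 of 100 in group 2 have the outcome
print(association_measures(30, 100, 15, 100))
```

For these counts, the risk difference is 0.15, the relative risk is 2.0, and the odds ratio is about 2.43.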
Contingency Tables
- Tables representing all combinations of levels of the explanatory and response variables
- Numbers in the table represent counts of the number of cases in each cell
- Row and column totals are called marginal counts
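Here is a small sketch of how raw case-level records become a contingency table with marginal counts. It uses `collections.Counter`, which returns 0 for cells with no cases. The six records are hypothetical and serve only to show the mechanics.

```python
from collections import Counter

# Hypothetical raw records, one per case: (explanatory level, response level)
records = [
    ("Group 1", "Success"), ("Group 1", "Failure"), ("Group 1", "Success"),
    ("Group 2", "Failure"), ("Group 2", "Failure"), ("Group 2", "Success"),
]

cells = Counter(records)  # count of cases in each (row, column) cell
rows = sorted({r for r, _ in records})
cols = sorted({c for _, c in records})
row_totals = {r: sum(cells[(r, c)] for c in cols) for r in rows}  # marginal counts
col_totals = {c: sum(cells[(r, c)] for r in rows) for c in cols}

print("", *cols, "Total", sep="\t")
for r in rows:
    print(r, *(cells[(r, c)] for c in cols), row_totals[r], sep="\t")
print("Total", *(col_totals[c] for c in cols), len(records), sep="\t")
```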
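2x2 Tables - Notation
             Success     Failure                Total
Group 1      X1          n1 - X1                n1
Group 2      X2          n2 - X2                n2
Total        X1 + X2     (n1 + n2) - (X1 + X2)  n1 + n2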
Example - Firm Type/Product Quality
- Groups: Not Integrated (Weave only) vs. Vertically Integrated (Spin and Weave) Cotton Textile Producers
- Outcomes: High Quality (High Count) vs. Low Quality (Low Count)
Source: Temin (1988)
Notation
- Proportion in Population 1 with the characteristic of interest: $p_1$
- Sample size from Population 1: $n_1$
- Number of individuals in Sample 1 with the characteristic of interest: $X_1$
- Sample proportion from Sample 1 with the characteristic of interest: $\hat{p}_1 = X_1/n_1$
- Similar notation for Population/Sample 2
Example - Cotton Textile Producers
- $p_1$: True proportion of all non-integrated firms that would produce high quality output
- $p_2$: True proportion of all vertically integrated firms that would produce high quality output
Notation (Continued)
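- Parameter of Primary Interest: $p_1 - p_2$, the difference in the 2 population proportions with the characteristic (2 other measures given below)
- Estimator: $\hat{p}_1 - \hat{p}_2$
- Standard Error: $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}$, estimated by $SE = \sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$
- Pooled Estimated Standard Error when $p_1 = p_2 = p$: $SE_p = \sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}$, where $\hat{p} = (X_1 + X_2)/(n_1 + n_2)$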
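The two standard-error formulas differ only in which proportion estimates are plugged in. `unpooled_se` and `pooled_se` are illustrative helper names, and the counts in the demo are hypothetical.

```python
import math

def unpooled_se(x1: int, n1: int, x2: int, n2: int) -> float:
    """Illustrative helper: estimated SE of p1_hat - p2_hat using each group's own proportion."""
    p1, p2 = x1 / n1, x2 / n2
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def pooled_se(x1: int, n1: int, x2: int, n2: int) -> float:
    """Illustrative helper: estimated SE assuming p1 = p2 = p, with p_hat = (X1 + X2) / (n1 + n2)."""
    p = (x1 + x2) / (n1 + n2)
    return math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

# Hypothetical counts: 30 of 100 vs 15 of 100
print(round(unpooled_se(30, 100, 15, 100), 4), round(pooled_se(30, 100, 15, 100), 4))
```

The pooled version is the one used when $H_0: p_1 = p_2$ is assumed true. When the two sample proportions happen to be equal, the two formulas give the same value.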
Cotton Textile Producers (Continued)
- Parameter of Primary Interest: $p_1 - p_2$, the difference in the 2 population proportions that produce high quality output
- Estimator
- Standard Error (and its estimate)
- Pooled Estimated Standard Error when $p_1 = p_2 = p$
Confidence Interval for p1 - p2 (Wilson's Estimate)
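- The method adds one success and one failure to each group to improve the coverage rate under certain conditions: $\tilde{p}_i = (X_i + 1)/(n_i + 2)$ for $i = 1, 2$.
- The confidence interval is of the form $(\tilde{p}_1 - \tilde{p}_2) \pm z\sqrt{\tilde{p}_1(1-\tilde{p}_1)/(n_1+2) + \tilde{p}_2(1-\tilde{p}_2)/(n_2+2)}$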
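Here is a minimal sketch of this interval for $p_1 - p_2$. `plus_four_diff_ci` is an illustrative helper, the 95% default is an assumed level, and the demo counts are hypothetical.

```python
import math
from statistics import NormalDist

def plus_four_diff_ci(x1: int, n1: int, x2: int, n2: int, confidence: float = 0.95) -> tuple[float, float]:
    """Illustrative helper: interval for p1 - p2 after adding 1 success and 1 failure to each group."""
    p1 = (x1 + 1) / (n1 + 2)
    p2 = (x2 + 1) / (n2 + 2)
    se = math.sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Hypothetical counts: 30 of 100 vs 15 of 100
print(plus_four_diff_ci(30, 100, 15, 100))
```

For these counts, the interval is roughly (0.033, 0.261). It lies entirely above 0, so it would point to $p_1 > p_2$.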
Example - Cotton Textile Production
95% Confidence Interval for $p_1 - p_2$
This provides evidence that non-integrated producers are more likely to produce high quality output ($p_1 - p_2 > 0$).
Significance Tests for p1 - p2
- Deciding whether $p_1 = p_2$ can be done by interpreting plausible values of $p_1 - p_2$ from the confidence interval:
  - If the entire interval is positive, conclude $p_1 > p_2$ ($p_1 - p_2 > 0$)
  - If the entire interval is negative, conclude $p_1 < p_2$ ($p_1 - p_2 < 0$)
  - If the interval contains 0, do not conclude that $p_1 \ne p_2$
- Alternatively, we can conduct a significance test:
  - $H_0: p_1 = p_2$; $H_a: p_1 \ne p_2$ (2-sided) or $H_a: p_1 > p_2$ (1-sided)
  - Test statistic: $z_{obs} = (\hat{p}_1 - \hat{p}_2)/SE_p$, using the pooled estimated standard error
  - P-value: $2P(Z \ge |z_{obs}|)$ (2-sided); $P(Z \ge z_{obs})$ (1-sided)
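Here is a minimal sketch of the two-proportion z test, using the pooled standard error under $H_0$. `two_prop_z_test` and its `alternative` strings are illustrative choices, not a library API. The demo counts are hypothetical.

```python
import math
from statistics import NormalDist

def two_prop_z_test(x1: int, n1: int, x2: int, n2: int, alternative: str = "two-sided") -> tuple[float, float]:
    """Illustrative helper: z test of H0: p1 = p2 with the pooled SE; returns (z_obs, P-value)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # common proportion estimated under H0
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z_obs = (p1_hat - p2_hat) / se_pooled
    std = NormalDist()
    if alternative == "greater":      # Ha: p1 > p2
        p_value = 1 - std.cdf(z_obs)
    elif alternative == "two-sided":  # Ha: p1 != p2
        p_value = 2 * (1 - std.cdf(abs(z_obs)))
    else:
        raise ValueError("alternative must be 'two-sided' or 'greater'")
    return z_obs, p_value

# Hypothetical counts: 30 of 100 vs 15 of 100
print(two_prop_z_test(30, 100, 15, 100))
print(two_prop_z_test(30, 100, 15, 100, alternative="greater"))
```

Here $z_{obs}$ is about 2.54. The 1-sided P-value is half the 2-sided one, about 0.006 versus 0.011.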
Example - Cotton Textile Production
Again, there is strong evidence that non-integrated firms are more likely to produce high quality output than integrated firms.