Title: Introduction to Formal Inference
Chapter 6
- Introduction to Formal Inference
- Part III: One- and Two-Sample Inference for Means
6.3 Inference for Means
- Recap
- When we have a large sample size (large n) we can compute (1-α)100% confidence intervals as
  - $\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$ if σ is known
  - -OR- $\bar{x} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}$ if σ is unknown
- This is because of the CLT: for large enough n,
  - $\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ is approximately N(0, 1)
6.3 Inference for Means
- What happens when we don't have a large enough sample size?
- If σ is known, we generally use the same interval we did for large n
- If σ is unknown, then can we still say
  - $\frac{\bar{X}-\mu}{S/\sqrt{n}} \approx N(0, 1)$?
- Unfortunately, NO you CAN'T!!
6.3 Inference for Means
- Note: Formulas (presented in this class) for confidence intervals based on small samples are valid ONLY for
  - iid random variables
  - that follow a Normal distribution
- In other words, we still must assume that the random variables are normal, BUT we can't use a normal distribution to compute the confidence interval, because the standardized sample mean $\frac{\bar{X}-\mu}{S/\sqrt{n}}$ is no longer normal
6.3 Inference for Means
- Fact: If $X_1, \ldots, X_n$ are iid $N(\mu, \sigma^2)$, then the random variable
  - $T = \frac{\bar{X}-\mu}{S/\sqrt{n}}$
- follows a Student t-distribution with n-1 degrees of freedom
- Note: A t-distribution with ∞ degrees of freedom corresponds to the standard normal distribution. Just like the normal distribution, we will use a table to obtain its quantiles (Table B.4)
6.3 Inference for Means
- The t-distribution depends on the degrees of freedom (often labeled ν); each row of the table gives values associated with $T_\nu$ for a given ν, and the columns give the quantiles Q(p)
- Recall: Q(p) satisfies $P(T_\nu < Q(p)) = p$
- Example: if ν = 7, then $P(T_7 < 1.895) = 0.95$, where $T_7$ has the t-distribution with 7 degrees of freedom
- The t-distribution is symmetric, so probabilities in one tail are the same as in the other tail, and we can use the same tricks as with the normal distribution
- Example: for the $t_7$ distribution, we can also say
  - $P(T_7 \le -1.895) = P(T_7 > 1.895) = 1 - P(T_7 < 1.895) = 0.05$
  - or
  - $P(T_7 > 1.895) = 1 - P(T_7 < 1.895) = 0.05$
  - so $1.895 = t_{7,\alpha/2}$ when α = .10
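- A quick sketch of these facts checked in Python with scipy (the quantile and tail probabilities quoted above are table values rounded to three decimals):

```python
from scipy import stats

# 1.895 is the 0.95 quantile of the t-distribution with 7 df
print(stats.t.ppf(0.95, df=7))    # ~1.895
print(stats.t.cdf(1.895, df=7))   # P(T7 < 1.895) ~ 0.95

# symmetry: P(T7 <= -1.895) = P(T7 > 1.895) = 1 - P(T7 < 1.895) ~ 0.05
print(stats.t.cdf(-1.895, df=7))  # ~0.05
print(stats.t.sf(1.895, df=7))    # upper-tail probability, ~0.05
```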
6.3 Inference for Means
- The t-table
- This table gives the probability of being less than t, expressed through quantiles, i.e. $P(T_\nu < t)$
- So to look up $t_{\alpha/2}$ we will need to use the fact that α = 1 - p for any of the quantiles Q(p)
- The table contains only positive values; to find probabilities associated with a negative value, we will need to convert it to a positive value (as in the previous example)
6.3 Inference for Means
- When constructing two-sided confidence intervals we will be interested in finding $t_{\nu,\alpha/2}$, where $t_{\nu,\alpha/2}$ is such that $P(T > t_{\nu,\alpha/2}) = \alpha/2$
- It may be easier to think in terms of 1 - α/2:
  - $P(T > t_{\nu,\alpha/2}) = \alpha/2$ is the same as $P(T < t_{\nu,\alpha/2}) = 1 - \alpha/2$, so $t_{\nu,\alpha/2} = Q(1 - \alpha/2)$
- Example: if ν = 15 and α = .05, then we find α/2 = .025. So we want $t_{15,.975}$ and look in row 15 under column Q(.975), giving $t_{15,.975} = 2.131$.
- For one-sided intervals, we use $t_{\nu,1-\alpha}$ and thus we don't divide the value of α in half.
6.3 Inference for Means
- Use Table B.4 to answer the following questions
- What value would you use if you have a sample of 10 values and want to construct a two-sided 95% CI?
  - $t_{9,0.025}$ (the Q(0.975) quantile with 9 df) = 2.262
- What if you have a sample of 20 values?
  - $t_{19,0.025}$ (Q(0.975) with 19 df) = 2.093
- A sample of 40 values?
  - $t_{39,0.025}$ (Q(0.975) with 39 df) = 2.022
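- The same lookups can be done in software instead of Table B.4; a minimal sketch:

```python
from scipy import stats

# two-sided 95% CI -> alpha = 0.05, so we need Q(1 - alpha/2) = Q(0.975) with n-1 df
for n in (10, 20, 40):
    t_crit = stats.t.ppf(0.975, df=n - 1)
    print(f"n = {n:2d}, df = {n - 1:2d}, t = {t_crit:.3f}")
# roughly 2.262, 2.093, 2.02 -- matching the table lookups above
```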
6.3 Inference for Means
- Using Table B.4 to get p-values
- Recall: a p-value is the probability of being as extreme or more extreme than the value we got from our observed data (i.e. a p-value is a probability)
- The t-table only has a few probabilities on it: Q(.9), Q(.95), Q(.975), Q(.99), Q(.995), Q(.999), Q(.9995)
- These correspond to the α values 0.10, 0.05, 0.025, 0.01, 0.005, 0.001, 0.0005
- P-values from the t-distribution table will therefore be in the form of inequalities, e.g. 0.05 < p-value < 0.10
6.3 Inference for Means
- P-values: Find p-values for the following test statistics given the associated sample sizes
- t = 2.25 and n = 10, so we know ν = 9, and
  - 1.833 < 2.25 < 2.262, i.e. Q(.95) < 2.25 < Q(.975)
  - (IT FLIPS!) .025 < p-value < .05
- t = 1.15 and n = 20, so we know ν = 19
  - 1.15 < 1.328, i.e. 1.15 < Q(.9)
  - (IT FLIPS!) p-value > .1
- t = 3.15 and n = 35, so we know ν = 34 (use 30 or 40)
  - Using 30: 2.750 < 3.15 < 3.385, i.e. Q(.995) < 3.15 < Q(.999)
  - (IT FLIPS!) .001 < p-value < .005
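- The table can only bracket these p-values; software gives the exact tail probability. A sketch checking the three cases above (one-tailed):

```python
from scipy import stats

# one-tailed p-value P(T_v > t) for each test statistic and its degrees of freedom
cases = [(2.25, 9), (1.15, 19), (3.15, 34)]
for t_stat, df in cases:
    p = stats.t.sf(t_stat, df=df)   # upper-tail probability
    print(f"t = {t_stat}, df = {df}, one-tailed p-value = {p:.4f}")
# each exact value is consistent with the bracket read off Table B.4
# (e.g. for t = 2.25 with 9 df, the p-value lands between .025 and .05)
```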
6.3 Inference for Means
- P-values: Notes
- P-values from the t-table will always be in the form of an inequality
- If you have a two-sided hypothesis, you still need to multiply by 2
- The interval you find in terms of quantiles (i.e. Q(.95) to Q(.975)) will need to flip when you put it in terms of probabilities (i.e. .025 to .05), since α = 1 - p
- The table goes backwards across the top, so bigger p-values go with smaller test statistics and vice versa
6.3 Inference for Means
- Small-n Two-Sided CI for µ with Unknown σ²
- If $X_1, \ldots, X_n$ are iid $N(\mu, \sigma^2)$ and n is small, a (1-α)100% CI for µ is
  - $\bar{x} \pm t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}}$
- Small-n Test Statistic for Testing $H_0{:}\ \mu = \mu_0$
- If $X_1, \ldots, X_n$ are iid $N(\mu, \sigma^2)$ and n is small, the test statistic is
  - $T = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, which follows a $t_{n-1}$ distribution under $H_0$
6.3 Inference for Means
- Example 4: An engineer who works for Consumer Reports is interested in the performance of various cars. Suppose one way to measure reliability is to determine the total cost of maintenance/repairs for a particular make and model during the first four years of operation. The engineer randomly selects 20 people that own a particular make and model and determines that the average cost of maintenance/repairs is $2,300 and the standard deviation is $400.
- Give a 99% CI for the true average cost
- Test the claim that the true average cost has changed from $2,200 using α = .01.
6.3 Inference for Means
- A 99% CI for the true average cost
- α = .01, so α/2 = .005
- n = 20 is not large enough, so we must use the t-distribution with 19 df
- $t_{.005,19} = 2.861$
- $2300 \pm 2.861\,\frac{400}{\sqrt{20}} = 2300 \pm 255.90 = (2044.10,\ 2555.90)$
- We are 99% confident that the true average cost of maintenance/repairs is between $2,044.10 and $2,555.90.
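- A sketch reproducing this interval from the summary statistics:

```python
import math
from scipy import stats

n, xbar, s = 20, 2300, 400
alpha = 0.01
t_crit = stats.t.ppf(1 - alpha / 2, n - 1)   # t_{.005,19} ~ 2.861
half_width = t_crit * s / math.sqrt(n)       # ~ 255.90
print(xbar - half_width, xbar + half_width)  # ~ (2044.10, 2555.90)
```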
6.3 Inference for Means
- Test the claim that the true average cost has changed from $2,200.
- Step 1: $H_0{:}\ \mu = 2200$ vs $H_A{:}\ \mu \neq 2200$
- Step 2: n = 20, so we have $t = \frac{2300 - 2200}{400/\sqrt{20}} = 1.118$, where $t \sim t_{19}$ under $H_0$
- Step 3: p-value = $2P(T_{19} > 1.118)$
  - since $P(T_{19} > 1.118) > .10$, the p-value > .20
- Step 4: p-value > .20 > α = .01, so we Fail to Reject (FTR) $H_0$
- Step 5: There is not sufficient evidence to conclude that the average cost is not $2,200.
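- A sketch of the same test from the summary statistics; software gives the exact p-value, which the table can only bracket as > .20:

```python
import math
from scipy import stats

n, xbar, s, mu0 = 20, 2300, 400, 2200
t_stat = (xbar - mu0) / (s / math.sqrt(n))      # ~ 1.118
p_value = 2 * stats.t.sf(abs(t_stat), n - 1)    # two-sided p-value, roughly 0.28
print(t_stat, p_value)                          # well above alpha = 0.01, so FTR H0
```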
6.3 Inference for Means
- Example: A special type of hybrid corn was planted on eight different plots. The plots produced yield values (in bushels) of 140, 70, 39, 110, 134, 104, 100 and 125. Assume the yields follow a normal distribution.
- Note: the sample mean is 102.75 and the sample variance is 1151.071.
- a) Give a 95% CI for the true average yield of this type of hybrid corn.
- b) Test $H_0{:}\ \mu = 100$ vs $H_A{:}\ \mu \neq 100$ and give the p-value.
6.3 Inference for Means
- Give a 95% CI for the true average yield of this type of hybrid corn
- $102.75 \pm t_{.025,7}\,\frac{\sqrt{1151.071}}{\sqrt{8}} = 102.75 \pm 2.365\,(11.995) \approx (74.381,\ 131.119)$
- I am 95% confident that the true average yield of this type of hybrid corn is between 74.381 bushels and 131.119 bushels.
6.3 Inference for Means
- Test $H_0{:}\ \mu = 100$ vs $H_A{:}\ \mu \neq 100$ and give the p-value.
- Step 1: $H_0{:}\ \mu = 100$ vs $H_A{:}\ \mu \neq 100$
- Step 2: n = 8, so we have $t = \frac{102.75 - 100}{\sqrt{1151.071/8}} = 0.22926$, where $t \sim t_7$ under $H_0$
- Step 3: p-value = $2P(T_7 > 0.22926)$
  - since $P(T_7 > 0.22926) > .10$, the p-value > .20
- Step 4: p-value > .20 > α = .05, so we FTR $H_0$
- Step 5: There is not sufficient evidence to conclude that the average yield is not 100 bushels.
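- Because the raw yields are given here, a sketch can use scipy's one-sample t routine directly on the data:

```python
import numpy as np
from scipy import stats

yields = np.array([140, 70, 39, 110, 134, 104, 100, 125])
n = len(yields)
print(yields.mean(), yields.var(ddof=1))        # 102.75, ~1151.07

# a) two-sided 95% CI: xbar +/- t_{7,.025} * s / sqrt(n)
t_crit = stats.t.ppf(0.975, n - 1)              # ~2.365
half = t_crit * yields.std(ddof=1) / np.sqrt(n)
print(yields.mean() - half, yields.mean() + half)   # ~ (74.4, 131.1)

# b) test H0: mu = 100 vs HA: mu != 100
t_stat, p_value = stats.ttest_1samp(yields, popmean=100)
print(t_stat, p_value)                          # t ~ 0.229, p-value far above alpha
```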
6.3 Inference for Means
- Example: Suppose the weights of 10 newly minted U.S. pennies were recorded. The table below contains the data.
- Note: the sample mean is 3.00 and the sample variance is 0.000175.
6.3 Inference for Means
- Assuming the weights follow a normal distribution, give a 90% CI for the true average weight of a newly minted penny.
- We are 90% confident that the true average weight of a newly minted penny is between 2.9919 g and 3.0081 g.
6.3 Inference for Means
- Now suppose the weights of 40 newly minted U.S.
pennies were measured. The table below contains
the data.
6.3 Inference for Means
- Give a 90% CI for the true average weight of a newly minted penny.
- We are 90% confident that the true average weight of a newly minted penny is between 2.9949 g and 3.0051 g.
6.3 Comparison of Two Means
- Up until now, we have only considered a single population at a time
- Often we want to compare (consider the difference between) two population means, i.e. µ1 - µ2
- E.g. suppose the engineer at Consumer Reports wants to compare average repair costs between the Honda Civic and Toyota Corolla
- We must consider two cases: independent populations and dependent populations
6.3.2 Comparison of Two Means (Dependent)
- If two populations are dependent, then we can't consider them individually
- Called Paired Data
  - Before-and-after studies
  - Studies on twins (twin A and twin B) or couples
  - Two measurements on one item
- Combine observations by looking at the differences within pairs of observations and do the analysis on the average difference, $\bar{d}$
- Uses the same methodologies (large/small sample sizes) as before, with $\bar{d}$, $s_d$, and $\mu_d$ in place of $\bar{x}$, $s$, and µ
6.3.2 Comparison of Two Means (Dependent)
- A group of engineering students took resistance measurements on n = 5 different resistors using two different resistance meters
6.3.2 Comparison of Two Means (Dependent)
- We want to compare the mean resistance reading of meter 1 to that of meter 2
- The data are clearly dependent (since each measurement is on the same resistor): paired data
- Take the difference between the two measurements for each resistor; this is our random quantity of interest
- Now the measurements on different subjects are independent (since the resistors are different), so we can proceed as we have before
6.3.2 Comparison of Two Means (Dependent)
- n < 25, so we have to use small-sample methods
- What is a 90% CI for the mean difference in resistance measurements for meter 2 and meter 1?
- $\bar{d} \pm t_{4,.05}\,\frac{s_d}{\sqrt{5}} = (11.878,\ 12.922)$
6.3.2 Comparison of Two Means (Dependent)
- How do we interpret this?
- We are 90% confident that the true mean difference in resistance measurements for meter 2 and meter 1 is between 11.878 and 12.922.
- We estimate that the 2nd meter tends to read between 11.88 and 12.92 units higher than the 1st meter, on average.
6.3.2 Comparison of Two Means (Dependent)
- Note: If n is large, we proceed with our large-sample methodology, i.e. we appeal to the CLT to get that the distribution of $\bar{D}$ is approximately normal with mean $\mu_d$ and variance $\sigma_d^2/n$. Using this, we can build confidence intervals and test hypotheses in the same fashion as for a single mean.
6.3.3 Comparison of Two Means (Independent)
- If your samples are independent, we can consider each population separately and compare the means of the two groups
- What could we use to estimate µ1 - µ2? The natural estimator is $\bar{X}_1 - \bar{X}_2$
- Note that $E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$
- What is the variance of our estimator? $\mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
6.3.3 Comparison of Two Means (Independent)
- Large-Sample Two-Sided CI for the Difference of Means From Two INDEPENDENT Populations ($n_1 \ge 25$ and $n_2 \ge 25$)
- If $X_{11}, \ldots, X_{1n_1}$ are iid with mean µ1 and variance $\sigma_1^2$, and
- $X_{21}, \ldots, X_{2n_2}$ are iid with mean µ2 and variance $\sigma_2^2$,
- then a (1-α)100% CI for µ1 - µ2 is
  - $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
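- A minimal sketch of this interval as a Python function (the function name is illustrative; σ1 and σ2 may be replaced by s1 and s2 when unknown, as noted on the next slide):

```python
import math
from scipy import stats

def two_sample_large_n_ci(xbar1, sd1, n1, xbar2, sd2, n2, alpha=0.05):
    """(1-alpha)100% CI for mu1 - mu2 from two independent large samples.
    sd1, sd2 may be known sigmas or sample standard deviations."""
    z = stats.norm.ppf(1 - alpha / 2)
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    diff = xbar1 - xbar2
    return diff - z * se, diff + z * se
```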
6.3.3 Comparison of Two Means (Independent)
- Large-Sample Test Statistic for Testing
- $H_0{:}\ \mu_1 - \mu_2 = \delta_0$ (the hypothesized difference) with Two INDEPENDENT Populations ($n_1 \ge 25$ and $n_2 \ge 25$)
- If $X_{11}, \ldots, X_{1n_1}$ are iid with mean µ1 and variance $\sigma_1^2$, and
- $X_{21}, \ldots, X_{2n_2}$ are iid with mean µ2 and variance $\sigma_2^2$,
- then
  - $Z = \frac{(\bar{x}_1 - \bar{x}_2) - \delta_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
6.3.3 Comparison of Two Means (Independent)
- Similar to the previous large-sample formulas, since both samples are assumed to be large
- In each formula, σ1 and σ2 can be replaced with s1 and s2 when the population standard deviations are unknown
6.3.3 Comparison of Two Means (Independent)
- Example Two brands of golf balls, Brand A and
Brand B, are to be compared with respect to
driving distance. Suppose we randomly sample 25
balls from each brand and test them using an
automatic driving device known to give normally
distributed distances with a standard deviation
of 15 yards. The mean distance for golf ball A
is 300 yards and the mean distance for golf ball
B is 320 yards.
6.3.3 Comparison of Two Means (Independent)
- Construct a two-sided 99% CI for the difference in mean driving distances for the two types of golf balls.
- $(\bar{x}_B - \bar{x}_A) \pm z_{.005}\sqrt{\frac{15^2}{25} + \frac{15^2}{25}} = 20 \pm 2.576\,(4.243) = (9.07,\ 30.93)$
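- A sketch reproducing this interval for µB - µA (σ = 15 yards is treated as known for both brands):

```python
import math
from scipy import stats

n_a = n_b = 25
xbar_a, xbar_b, sigma = 300, 320, 15

z = stats.norm.ppf(0.995)                          # 99% CI -> z_{.005} ~ 2.576
se = math.sqrt(sigma**2 / n_a + sigma**2 / n_b)    # ~ 4.243
diff = xbar_b - xbar_a
print(diff - z * se, diff + z * se)                # ~ (9.07, 30.93) yards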
6.3.3 Comparison of Two Means (Independent)
- How do we interpret this interval?
- We are 99% confident that the true difference in mean driving distances between golf ball B and golf ball A is between 9.07 and 30.93 yards.
- -or-
- We are 99% confident that, on average, golf ball B travels between 9.07 and 30.93 yards farther than golf ball A.
6.3.3 Comparison of Two Means (Independent)
- Given a 1% level of significance (i.e. α = .01), do the two brands of golf balls differ?
- Note: this is equivalent to asking "is µB - µA = 0?" (it should look like a hypothesis test!)
- If we test $H_0{:}\ \mu_B - \mu_A = 0$ vs $H_A{:}\ \mu_B - \mu_A \neq 0$: since 0 is not inside the 99% CI from a), 0 is not a plausible value for the difference in means
  - so the p-value < α = 0.01
  - we would reject $H_0$ and conclude that the mean driving distances differ for brands A and B (i.e. the mean difference is not 0)
6.3.3 Comparison of Two Means (Independent)
- Example Two varieties of apples, Granny Smith
and Macintosh, were analyzed for their potassium
content in milligrams. A sample of 100 Granny
Smith apples resulted in a mean of 0.30 mg and a
sd of 0.07 mg. A sample of 150 Macintosh apples
resulted in a mean of 0.27 mg and a sd of 0.05
mg. Is there clear evidence to conclude that one
variety has more potassium than another variety?
If so, which variety contains more potassium?
6.3.3 Comparison of Two Means (Independent)
- Step 1: $H_0{:}\ \mu_G - \mu_M = 0$ vs $H_A{:}\ \mu_G - \mu_M \neq 0$
- Step 2: $z = \frac{0.30 - 0.27}{\sqrt{\frac{0.07^2}{100} + \frac{0.05^2}{150}}} = 3.702$
- Step 3: p-value = $2P(Z \le -3.702) \approx 0$
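- A sketch of Steps 2-3 from the summary statistics:

```python
import math
from scipy import stats

n_g, xbar_g, s_g = 100, 0.30, 0.07    # Granny Smith
n_m, xbar_m, s_m = 150, 0.27, 0.05    # Macintosh

se = math.sqrt(s_g**2 / n_g + s_m**2 / n_m)
z = (xbar_g - xbar_m) / se            # ~ 3.702
p_value = 2 * stats.norm.sf(abs(z))   # two-sided p-value, ~ 0.0002
print(z, p_value)
```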
6.3.3 Comparison of Two Means (Independent)
- Step 4: Since the p-value ≈ 0 < α = 0.05 (used since no significance level was specified), we Reject $H_0$
- Step 5: At the α = 0.05 level of significance, there is significant evidence to conclude that $\mu_G - \mu_M \neq 0$, i.e. there is significant evidence of a difference between the varieties.
- Technically, we didn't do the proper test to determine which one has more potassium, but since z > 0, we could reasonably conclude that Granny Smith apples have the higher potassium content.
6.3.4 Small-sample Comparison of Two Means (Independent)
- Similar to the one-sample methodology: for small n, two-sample methodology involving at least one small sample relies on the assumption of normality and results in simply changing our reference distribution from a Normal to a t-distribution
- Again, the formulas differ depending on whether the samples are independent or dependent
6.3.4 Small-sample Comparison of Two Means (Independent)
- Small-Sample Two-Sided CI for the Difference of Means From Two Independent Populations ($n_1 < 25$ or $n_2 < 25$)
- If $X_{11}, \ldots, X_{1n_1}$ are iid $N(\mu_1, \sigma_1^2)$ and
- $X_{21}, \ldots, X_{2n_2}$ are iid $N(\mu_2, \sigma_2^2)$,
- then a (1-α)100% CI for µ1 - µ2 is
  - $(\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2,\,\alpha/2}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
6.3.4 Small-sample Comparison of Two Means (Independent)
- Notes
- df = (n1 - 1) + (n2 - 1) = n1 + n2 - 2
- We now use a pooled variance:
  - $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
- Since both samples are small, we don't want an over-inflated variance to be used. This pooled variance is like a weighted average of the two sample variances.
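- A minimal sketch of the pooled-variance interval as a Python function (the function name is illustrative):

```python
import math
from scipy import stats

def pooled_two_sample_ci(xbar1, s1, n1, xbar2, s2, n2, alpha=0.05):
    """(1-alpha)100% CI for mu1 - mu2 from two small independent normal samples,
    assuming equal population variances (pooled estimate)."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled variance
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - t_crit * se, diff + t_crit * se
```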
6.3.4 Small-sample Comparison of Two Means (Independent)
- Small-Sample Test Statistic for Testing
- $H_0{:}\ \mu_1 - \mu_2 = \delta_0$ with Two Independent Populations ($n_1 < 25$ or $n_2 < 25$)
- If $X_{11}, \ldots, X_{1n_1}$ are iid $N(\mu_1, \sigma_1^2)$ and
- $X_{21}, \ldots, X_{2n_2}$ are iid $N(\mu_2, \sigma_2^2)$,
- then
  - $T = \frac{(\bar{x}_1 - \bar{x}_2) - \delta_0}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, which follows a $t_{n_1+n_2-2}$ distribution under $H_0$
6.3.4 Small-sample Comparison of Two Means (Independent)
- Note: Your textbook gives two formulas for the degrees of freedom. One formula is used when we can assume $\sigma_1^2 = \sigma_2^2$ (which we will generally do). The other formula, called the Satterthwaite approximation, will not be covered in this class.
6.3.4 Small-sample Comparison of Two Means (Independent)
- Example Two different types of brake lining were
tested for differences in wear. Twelve cars were
used, six for each type of brake lining. A
sample of each brand was tested with the results
(listed in hundreds of miles) given in the
following table. Assuming the populations are
independent and normally distributed with equal
variances, test for evidence that true average
brake lining wear is better for brand A than for
brand B. Use an α = .10 level of significance.
6.3.4 Small-sample Comparison of Two Means (Independent)
- Brand A: 42, 58, 64, 40, 47, 50
- Brand B: 48, 40, 30, 44, 54, 38
- Step 1: $H_0{:}\ \mu_A - \mu_B = 0$ vs $H_A{:}\ \mu_A - \mu_B > 0$
- Step 2: $n_A < 25$ and $n_B < 25$, so we use the pooled small-sample t statistic
6.3.4 Small-sample Comparison of Two Means (Independent)
- Step 2 (cont'd): $\bar{x}_A = 50.17$, $\bar{x}_B = 42.33$, $s_p^2 = \frac{5(86.57) + 5(69.47)}{10} = 78.02$, so $t = \frac{50.17 - 42.33}{\sqrt{78.02}\sqrt{\frac{1}{6}+\frac{1}{6}}} \approx 1.5361$ with 10 df
- Step 3: p-value: Q(.9) < 1.5361 < Q(.95), so .05 < p-value < .10
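- A sketch checking Steps 2-3 with the raw data; scipy's equal-variance two-sample t test reports a two-sided p-value, which is halved here because the alternative is one-sided (and the observed t is positive):

```python
import numpy as np
from scipy import stats

brand_a = np.array([42, 58, 64, 40, 47, 50])
brand_b = np.array([48, 40, 30, 44, 54, 38])

# pooled (equal-variance) two-sample t test
t_stat, p_two_sided = stats.ttest_ind(brand_a, brand_b, equal_var=True)
p_one_sided = p_two_sided / 2          # HA: mu_A - mu_B > 0
print(t_stat, p_one_sided)             # t ~ 1.536, p-value between .05 and .10
```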
6.3.4 Small-sample Comparison of Two Means (Independent)
- Step 4: since .05 < p-value < .10, the p-value is less than α = .10, so we Reject $H_0$
- Step 5: There is significant evidence to conclude, at the α = .10 level of significance, that the true average brake lining wear is greater for brand A than for brand B.
1 Sample or 2?
[Decision flowchart: first decide whether you have 1 sample or 2 samples. With 2 samples, decide whether they are independent or dependent; for dependent (paired) data, find d-bar and s_d and work with µd, then ask whether n is large to choose between the normal and t procedures; for independent samples, use the appropriate two-sample formula from the sections above. With 1 sample, likewise ask whether n is large.]
6.6 Prediction Intervals
- Recall: C.I.s are meant to bracket µ, the population mean:
  - $\bar{x} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}$
  - and $\bar{x} \pm t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}}$
- What if we wanted to try to predict one more individual observed value? We have currently observed n values and want to predict observation n+1.
- Again, we will want to use an interval to express the uncertainty
6.6 Prediction Intervals
- Prediction Intervals from a Normal distribution
- Assume we are sampling from a normal distribution
- $\bar{X}$ and S are based on a sample of size n, and $X_{n+1}$ is a single additional observation not yet in the sample
- For $X_1, \ldots, X_n, X_{n+1}$ iid $N(\mu, \sigma^2)$,
  - $\frac{X_{n+1} - \bar{X}}{S\sqrt{1 + \frac{1}{n}}} \sim t_{n-1}$
6.6 Prediction Intervals
- These facts give us the following formula: a two-sided (1-α)100% prediction interval for $x_{n+1}$ is
  - $\bar{x} \pm t_{n-1,\alpha/2}\; s\sqrt{1 + \frac{1}{n}}$
- Note that the sample mean and sample standard deviation come from the first n observations only
6.6 Prediction Intervals
- Example (cont'd): Suppose we randomly choose 1 more car of the same make and model as the previous 20 that we used to infer the average cost of maintenance.
- What is the 95% prediction interval for the cost of maintenance of a single additional randomly chosen car of this make and model?
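- A sketch of the calculation, using the same summary statistics as Example 4 (n = 20, x-bar = $2,300, s = $400); the numbers below are recomputed here rather than copied from the slides:

```python
import math
from scipy import stats

n, xbar, s = 20, 2300, 400
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, n - 1)    # t_{19,.025} ~ 2.093
half = t_crit * s * math.sqrt(1 + 1 / n)      # note the extra "1 +" compared with a CI
print(xbar - half, xbar + half)               # roughly (1442, 3158)
```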
6.6 Prediction Intervals
- Note
- This goes to show that prediction intervals are wider than CIs by a factor of $\sqrt{n+1}$.
- This shouldn't be too surprising, since there is more uncertainty involved in predicting the next value than in estimating the population mean.
6.6 Prediction Intervals
- How do we interpret this?
- To say (a, b) is a (1-α)100% prediction interval for a single additional observation is to say that, in obtaining this interval, I have used a method of sample selection and calculation that would work in about (1-α)100% of repeated applications.