Title: Unit IV: Introduction To Inferential Statistics
1Unit IV Introduction To Inferential Statistics
- Sampling Distributions
- Confidence Intervals (Mean and Proportion)
- Hypothesis Testing
2What is Inferential Statistics?
- The branch of Statistics which allows us to draw
conclusions and/or make decisions concerning a
population based only on sample data
3Inferential Statistics
- Sample statistics (known) -> Inference ->
Population parameters (unknown, but can be
estimated from sample evidence)
4SAMPLING DISTRIBUTIONS
- What is a Sampling Distribution?
- Sampling Distribution of the Mean
- Central Limit Theorem
- Sampling Distribution of Proportions
5What is a Sampling Distribution? I
- Recall
- A statistic is defined as a numerical quantity
calculated from a sample
- Some points to remember
- A random sample should represent the population
well, so sample statistics from a random sample
should provide reasonable estimates of population
parameters
- All sample statistics have some error in
estimating population parameters
- Because sample measurements are observed values
of random variables, the value of a sample
statistic will vary in a random manner from
sample to sample. In other words, since sample
statistics are random variables, they possess
probability distributions
- A larger sample provides more information than a
smaller sample, so a statistic from a large sample
should have less error than a statistic from a
small sample
6What is a Sampling Distribution? II
- The sampling distribution of a sample statistic
calculated from a sample of n measurements is the
probability distribution of the statistic.
- The probability distribution of a statistic is
called its sampling distribution.
7Definitions
- An estimator of a population parameter is a
sample statistic used to estimate or predict the
population parameter.
- An estimate of a parameter is a particular
numerical value of a sample statistic obtained
through sampling.
- A point estimate is a single value used as an
estimate of a population parameter.
8Estimators
9Exercise
- Random samples of size 2 are drawn, without
replacement, from the finite population which
consists of the numbers 4, 5, 6 and 7.
- Find the mean and variance of this population.
- Find the probability distribution of the mean
for random samples of size 2 drawn without
replacement.
- Find the mean of the probability distribution.
- Find the variance of the probability distribution
of means.
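One way to check answers to this exercise is to enumerate every possible sample. The following is a minimal Python sketch (the variable and function names are illustrative, not from the slides); it uses exact fractions and assumes the variance asked for is the population variance, dividing by N.

```python
from fractions import Fraction
from itertools import combinations

population = [4, 5, 6, 7]

def mean(values):
    """Exact arithmetic mean of a list of integers."""
    return Fraction(sum(values), len(values))

def variance(values):
    """Population variance (divides by N, not N - 1)."""
    m = mean(values)
    return sum((Fraction(v) - m) ** 2 for v in values) / len(values)

pop_mean = mean(population)
pop_var = variance(population)

# Every sample of size 2 drawn without replacement (order ignored).
samples = list(combinations(population, 2))

# Probability distribution of the sample mean: each sample is equally likely.
distribution = {}
for s in samples:
    m = mean(list(s))
    distribution[m] = distribution.get(m, 0) + Fraction(1, len(samples))

mean_of_means = sum(m * p for m, p in distribution.items())
var_of_means = sum((m - mean_of_means) ** 2 * p for m, p in distribution.items())

for value, prob in sorted(distribution.items()):
    print(f"P(mean = {float(value)}) = {prob}")
print("population mean:", pop_mean, "population variance:", pop_var)
print("mean of sample means:", mean_of_means, "variance:", var_of_means)
```

Under these assumptions the population mean is 11/2 and the variance 5/4; the mean of the sample means is also 11/2 and their variance is 5/12, which agrees with the finite-population formula (σ²/n)(N − n)/(N − 1).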
10Sampling Distribution of the Mean
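- For the sampling distribution of $\bar{X}$
- The mean
- Is denoted by $\mu_{\bar{X}}$
- Is always equal to the population mean µ
- Since $E(\bar{X}) = \mu$, $\bar{X}$ is called an unbiased
estimator of µ
- The standard deviation
- Is denoted by $\sigma_{\bar{X}}$
- Is equal to $\sigma/\sqrt{n}$ when sampling with
replacement, or when sampling without replacement
if the sample size is small relative to the
population
- Is equal to $(\sigma/\sqrt{n})\sqrt{(N-n)/(N-1)}$ when
sampling without replacement from a finite
population of size N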
11The Central Limit Theorem
- Sometimes the population or the sample size may
be too large, making it extremely tedious to
list all the possible samples.
- When many samples are taken from the same
population, the distribution of values of the
sample mean is centred around the population
mean (regardless of sample size)
- As the sample size increases, the mean of the
means is closer to the population mean
- The standard deviation of the sample mean
decreases as the sample size increases
- The distribution of the sample mean becomes more
symmetrical as the sample size gets larger and
becomes approximately normal for large sample
sizes
12The Central Limit Theorem (contd)
- The central limit theorem therefore states that
- If the sample size (n) is large enough, the
sample mean ($\bar{X}$) has an approximately normal
distribution with mean µ and standard deviation
$\sigma/\sqrt{n}$, regardless of the population
distribution.
- A large enough sample is where n ≥ 30
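The theorem can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings that do not come from the slides: a skewed exponential population with mean 1 and standard deviation 1, samples of size 30, and 5000 repetitions with a fixed seed.

```python
import random
import statistics

def simulate_sample_means(n, replications, rng):
    """Draw `replications` samples of size n from an exponential population
    (mean 1, standard deviation 1) and return the list of sample means."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(replications)
    ]

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 30
sample_means = simulate_sample_means(n, 5000, rng)

observed_mean = statistics.fmean(sample_means)   # should be near mu = 1
observed_sd = statistics.pstdev(sample_means)    # should be near sigma / sqrt(n)
theoretical_sd = 1.0 / n ** 0.5

print(f"mean of sample means: {observed_mean:.3f}")
print(f"sd of sample means:   {observed_sd:.3f} (theory {theoretical_sd:.3f})")
```

Even though the population is strongly skewed, the simulated sample means should centre near 1 with a spread close to 1/√30; plotting a histogram of `sample_means` would show the roughly bell-shaped curve.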
13CLT The Effect of Sample Size I
14CLT The Effect of Sample Size II
15Finding Probabilities for Sampling Distributions
- Step 1 Standardize the values to be found using
- For the Mean: $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$
- or, For the Proportion: $Z = (\hat{p} - p)/\sqrt{p(1-p)/n}$
- Step 2 Find probabilities as usual using the
standardized values
16Example I Sampling Distribution of the Mean
- A manufacturer of automobile batteries claims
that the distribution of the lengths of life of
its battery has a mean of 54 months and a
standard deviation of 6 months. Suppose a
consumer group decides to check the claim by
purchasing a sample of 50 of these batteries and
subjecting them to tests that determine battery
life.
- Assuming that the manufacturer's claim is true,
describe the sampling distribution of the mean
lifetime of a sample of 50 batteries.
- Assuming that the manufacturer's claim is true,
what is the probability that the consumer group's
sample has a mean life of 52 or fewer months?
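A possible way to work this example numerically is sketched below, assuming the central limit theorem applies (n = 50 is above the usual n ≥ 30 guideline). It uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 54      # claimed mean battery life, months
sigma = 6    # claimed standard deviation, months
n = 50       # sample size

# Sampling distribution of the mean: approximately normal,
# centred on mu with standard error sigma / sqrt(n).
standard_error = sigma / sqrt(n)
sampling_dist = NormalDist(mu=mu, sigma=standard_error)

z = (52 - mu) / standard_error
p_52_or_less = sampling_dist.cdf(52)

print(f"standard error = {standard_error:.4f}")
print(f"z = {z:.3f}, P(mean <= 52) = {p_52_or_less:.4f}")
```

The standard error comes out at about 0.85 months and the probability at roughly 0.009, so a sample mean of 52 months or less would be unusual if the claim were true.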
17Example II Sampling Distribution of the Mean
- Certain light bulbs manufactured by a company
have a mean lifetime of 800 hours and a standard
deviation of 60 hours. Find the probability that
a random sample of 64 light bulbs taken from a
production batch will have a mean lifetime of
- Less than 785 hours
- More than 820 hours
- Between 800 and 810 hours
- Between 770 and 830 hours
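The four probabilities can be checked with a short script. This is a sketch that assumes the sample mean is approximately normal with mean 800 and standard error 60/√64 = 7.5; the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 60, 64
standard_error = sigma / sqrt(n)          # 60 / 8 = 7.5 hours
xbar = NormalDist(mu=mu, sigma=standard_error)

p_less_785 = xbar.cdf(785)                # z = -2
p_more_820 = 1 - xbar.cdf(820)            # z = 2.67
p_800_to_810 = xbar.cdf(810) - xbar.cdf(800)
p_770_to_830 = xbar.cdf(830) - xbar.cdf(770)

for label, p in [
    ("less than 785", p_less_785),
    ("more than 820", p_more_820),
    ("between 800 and 810", p_800_to_810),
    ("between 770 and 830", p_770_to_830),
]:
    print(f"P({label}) = {p:.4f}")
```

The results should be close to 0.0228, 0.0038, 0.4088 and 0.9999 respectively; values read from a printed Z table may differ slightly in the last digit.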
18Sampling Distribution of the Proportion
- The sample proportion is the fraction of
successes in n binomial trials. It is the number
of successes, X, divided by the number of trials,
n.
- Sample Proportion: $\hat{p} = X/n$
- As the sample size, n, increases, the sampling
distribution of $\hat{p}$ approaches a normal
distribution with mean p and standard deviation
$\sqrt{p(1-p)/n}$
19Example Sampling Distribution of the Proportion
- In recent years, convertible sports coupes have
become very popular in Japan. Toyota is
currently shipping Celicas to Los Angeles, where
a customiser does a roof lift and ships them back
to Japan. Suppose that 25% of all Japanese in a
given income and lifestyle category are
interested in buying Celica convertibles. A
random sample of 100 Japanese consumers in the
category of interest is to be selected. What is
the probability that at least 20% of those in the
sample will express an interest in a Celica
convertible?
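A sketch of the calculation follows, taking the population proportion as p = 0.25 and using the normal approximation to the sampling distribution of the sample proportion. No continuity correction is applied; that is a simplifying assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist

p = 0.25          # population proportion interested in the convertible
n = 100           # sample size
threshold = 0.20  # at least 20 of 100 in the sample

# Standard deviation of the sample proportion: sqrt(p(1 - p) / n)
standard_error = sqrt(p * (1 - p) / n)
p_hat_dist = NormalDist(mu=p, sigma=standard_error)

z = (threshold - p) / standard_error
prob_at_least_20_percent = 1 - p_hat_dist.cdf(threshold)

print(f"standard error = {standard_error:.4f}, z = {z:.3f}")
print(f"P(p_hat >= 0.20) = {prob_at_least_20_percent:.4f}")
```

With these inputs the standard error is about 0.0433, z is about -1.15, and the probability is roughly 0.876.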
20CONFIDENCE INTERVALS
- Estimation and Estimators
- What is a Confidence Interval?
- Confidence Intervals for Means
- Confidence Intervals for Proportions
21Estimation
- There are two types
- Point Estimation
- Interval Estimation
- Estimation: the act of estimating a specific value
in the population from the sample
- Estimator: a specific statistic used to estimate
the value of the parameter
- Estimate: a specific value calculated from the
sample and used as the estimate of the parameter
22Types of Estimates
- A point estimate is a single number.
- A confidence interval provides a range of values
for estimating a particular population parameter;
it therefore provides additional information
about variability.
[Figure: an interval running from the Lower
Confidence Limit to the Upper Confidence Limit,
with the Point Estimate between them; its length
is the width of the confidence interval]
23Deficiencies of Point Estimation
- A specific point estimate is not likely to be
exact because it is one among many possible point
estimates
- It provides no assessment of the probability that
a sample point estimate value is reasonably close
to the parameter being estimated
- An interval estimate provides more information
about a population characteristic than does a
point estimate
24What is a Confidence Interval?
- A confidence interval is a range of guesses at a
population value
- Means
- Proportions
- It is a type of interval estimation
- The confidence level is the chance (probability)
that the range of values captures the true
population value (or will contain the unknown
population parameter)
- The general formula for a confidence interval is
Point Estimate ± (Critical Value)(Standard
Deviation of the Point Estimate)
- The standard deviation of the point estimate is
also known as the standard error
25Confidence Level
- Confidence
- A number between 0% and 100% that reflects the
probability that the interval estimate will
include the parameter
- A high confidence is desired (between 90% and
99.7%)
- Higher confidence levels require wider intervals
26Estimation Process
[Figure: a random sample is drawn from a population
whose mean, µ, is unknown; the sample mean is
$\bar{X} = 50$]
27Confidence Interval for µ
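- Confidence Interval Estimate
- $\bar{X} \pm Z\,\sigma/\sqrt{n}$
- or $\bar{X} - Z\,\sigma/\sqrt{n} \le \mu \le \bar{X} + Z\,\sigma/\sqrt{n}$
- where $\bar{X}$ is the point estimate
- Z is the normal distribution critical value
for a probability of α/2 in each tail
- $\sigma/\sqrt{n}$ is the standard error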
28Finding the Critical Value, Z
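- Consider a 95% confidence interval: the critical
values are Z = -1.96 and Z = 1.96
[Figure: normal curve. In Z units the axis shows
-1.96, 0 and 1.96; in X units these correspond to
the Lower Confidence Limit, the Point Estimate and
the Upper Confidence Limit]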
29Common Levels of Confidence
- Commonly used confidence levels are 90%, 95%, and
99%
Confidence Level   Confidence Coefficient, 1 - α   Z value
80%                0.80                            1.28
90%                0.90                            1.645
95%                0.95                            1.96
98%                0.98                            2.33
99%                0.99                            2.58
99.8%              0.998                           3.09
99.9%              0.999                           3.29
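Critical values like these can also be computed rather than looked up. The sketch below assumes a two-sided interval, so a probability of α/2 is left in each tail; `critical_z` is an illustrative helper name, and `NormalDist.inv_cdf` comes from the Python standard library.

```python
from statistics import NormalDist

def critical_z(confidence_level):
    """Two-sided normal critical value for a confidence level given as a
    fraction (for example 0.95 for 95%)."""
    alpha = 1 - confidence_level
    # Leave alpha/2 in the upper tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%} confidence: Z = {critical_z(level):.3f}")
```

Printed tables round these values, so a tabulated Z may differ from the computed one in the last digit.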
30Steps in Calculating Confidence Intervals for
Means
- Step 1 Calculate the mean for the sample
- Step 2 Calculate the standard error: the square
root of the variance divided by the sample size
- Step 3 Calculate the critical value
- Step 4 Apply the formula
31Example - Confidence Interval for the Population
Means I
- A sample of 35 circuits has a mean resistance of
2.20 ohms. We know from past testing that the
population standard deviation is 0.35 ohms.
- Determine a 95% confidence interval for the true
mean resistance of the population.
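A minimal sketch of this calculation is given below, assuming the population standard deviation is known so that the normal (Z) interval applies. The function name `z_confidence_interval` is illustrative and not part of any library.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence):
    """Return (lower, upper) limits of a Z interval for the population mean
    when the population standard deviation sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    margin = z * sigma / sqrt(n)                        # Z times standard error
    return sample_mean - margin, sample_mean + margin

lower, upper = z_confidence_interval(sample_mean=2.20, sigma=0.35, n=35,
                                     confidence=0.95)
print(f"95% CI for mean resistance: ({lower:.3f}, {upper:.3f}) ohms")
```

For the circuit data this gives an interval of roughly 2.084 to 2.316 ohms.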
32Example - Confidence Interval for the Population
Means II
- The real estate assessor for Kingston wants to
study various characteristics of single-family
houses in the parish. A random sample of 70
houses reveals the following
- Area of the house in square feet: x-bar = 1759, s
= 380.
- Construct a 99% confidence interval estimate of
the population mean area of the house.
33Example Confidence Interval for Means III
- A publishing company has just published a new
college textbook. Before the company decides the
price at which to sell this textbook, it wants to
know the average price of all such textbooks in
the market. The research department at the
company took a sample of 36 comparable textbooks
and collected information on their prices. This
information produces a mean price of $70.50 for
this sample. It is known that the standard
deviation of the prices of all such textbooks is
$4.50.
- What is the point estimate of the mean price of
all such textbooks?
- Construct a 90% confidence interval for the mean
price of all such college textbooks.
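The textbook example can be followed step by step: sample mean, standard error, critical value, then the formula. The sketch below assumes a 90% confidence level and a known population standard deviation of 4.50; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: the sample mean is the point estimate of the population mean.
point_estimate = 70.50
sigma = 4.50   # known population standard deviation
n = 36

# Step 2: standard error = sigma / sqrt(n)
standard_error = sigma / sqrt(n)

# Step 3: critical value leaving alpha/2 = 0.05 in each tail
confidence = 0.90
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Step 4: point estimate +/- (critical value)(standard error)
margin_of_error = z * standard_error
interval = (point_estimate - margin_of_error, point_estimate + margin_of_error)

print(f"point estimate = {point_estimate:.2f}")
print(f"standard error = {standard_error:.2f}, Z = {z:.3f}")
print(f"90% CI: ({interval[0]:.2f}, {interval[1]:.2f})")
```

The standard error is 0.75, and the resulting interval is roughly 69.27 to 71.73.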
34Confidence Intervals for the Population
Proportion, p
- An interval estimate for the population
proportion (p) can be calculated by adding an
allowance for uncertainty to the sample
proportion
35Confidence Intervals for the Population
Proportion, p
(continued)
- Recall that the distribution of the sample
proportion is approximately normal if the sample
size is large, with standard deviation
$\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$
- We will estimate this with sample data:
$\sqrt{\hat{p}(1-\hat{p})/n}$
36Confidence Intervals for the Population
Proportion, p
(continued)
- Upper and lower confidence limits for the
population proportion are calculated with the
formula $\hat{p} \pm Z\sqrt{\hat{p}(1-\hat{p})/n}$
- where
- $\hat{p}$ is the sample proportion
- n is the sample size
- Z is the normal distribution critical value for
a probability of α/2 in each tail
37Example Confidence Interval for Proportions I
- A random sample of 100 people shows that 25 are
left-handed.
- Form a 95% confidence interval for the true
proportion of left-handers
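A sketch of the interval for the left-handers example is shown below, assuming the large-sample normal approximation and estimating the standard error from the sample proportion. The helper name `proportion_confidence_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence):
    """Large-sample (normal approximation) interval for a population
    proportion, using the sample proportion to estimate the standard error."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = proportion_confidence_interval(successes=25, n=100,
                                              confidence=0.95)
print(f"95% CI for the proportion: ({lower:.4f}, {upper:.4f})")
```

With 25 left-handers out of 100 the interval is roughly 0.165 to 0.335.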
38Example Confidence Interval for Proportions II
- The real estate assessor for Kingston wants to
study various characteristics of single-family
houses in the parish. A random sample of 70
houses reveals the following
- 42 houses have central air-conditioning
- Set up a 95% confidence interval estimate of the
population proportion of houses that have central
air-conditioning
39Sampling Error
- The required sample size can be found to reach a
desired margin of error (e) with a specified
level of confidence (1 - α)
- The margin of error is also called sampling error
- the amount of imprecision in the estimate of the
population parameter
- the amount added and subtracted to the point
estimate to form the confidence interval
40Determining Sample Size
For the Mean
Sampling error (margin of error): $e = Z\sigma/\sqrt{n}$
41Determining Sample Size
(continued)
For the Mean
Now solve $e = Z\sigma/\sqrt{n}$ for n to get $n = Z^2\sigma^2/e^2$
42Determining Sample Size
(continued)
- To determine the required sample size for the
mean, you must know
- The desired level of confidence (1 - α), which
determines the critical Z value
- The acceptable sampling error, e
- The standard deviation, σ
43Example - Sample Size Determination (Mean) I
- If σ = 45, what sample size is needed to estimate
the mean within ±5 with 90% confidence?
$n = Z^2\sigma^2/e^2 = (1.645)^2(45)^2/5^2 = 219.19$
So the required sample size is n = 220
(Always round up)
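The sample-size calculation for the mean can be sketched as a small function. It assumes the population standard deviation is known and always rounds up to the next whole observation; `required_sample_size_for_mean` is an illustrative name.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size_for_mean(sigma, e, confidence):
    """Smallest n for which the margin of error Z * sigma / sqrt(n)
    does not exceed e at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / e) ** 2)   # always round up

n_required = required_sample_size_for_mean(sigma=45, e=5, confidence=0.90)
print("required sample size:", n_required)
```

For a standard deviation of 45, a margin of 5 and 90% confidence this returns 220.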
44Example - Sample Size Determination (Mean) II
- A department store wishes to estimate, with a
confidence level of 98% and a maximum error of
$5, the true mean value of purchases per month of
its customers. Determine the minimum size of
the sample that is required to ensure this, given
that the standard deviation is $15.
45Example - Sample Size Determination (Mean) III
- An alumni association wants to estimate the mean
debt of this year's university graduates. It is
known that the population standard deviation of
debts of this year's college graduates is
$11,800. How large a sample should be selected
so that the estimate with a 99% confidence level
is within $800 of the population mean?
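When the ratio of the standard deviation to the error is large, as in this example, the answer is sensitive to how the critical value is rounded. The sketch below compares an unrounded Z with the tabulated value 2.58; both the helper name and the comparison are illustrative additions, not part of the original slides.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(z, sigma, e):
    """n = (Z * sigma / e)^2, rounded up to a whole observation."""
    return ceil((z * sigma / e) ** 2)

sigma = 11_800   # population standard deviation of debts
e = 800          # acceptable margin of error
confidence = 0.99

z_exact = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 2.576
z_table = 2.58                                            # two-decimal table value

n_exact = sample_size_for_mean(z_exact, sigma, e)
n_table = sample_size_for_mean(z_table, sigma, e)

print(f"Z = {z_exact:.4f} gives n = {n_exact}")
print(f"Z = {z_table:.2f} gives n = {n_table}")
```

The unrounded critical value gives n = 1444, while Z = 2.58 gives n = 1449. Either answer follows the same formula; the larger one is the more conservative choice.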
46Determining Sample Size
(continued)
For the Proportion
Sampling error (margin of error): $e = Z\sqrt{p(1-p)/n}$
Now solve for n to get $n = Z^2 p(1-p)/e^2$
47Example - Sample Size Determination (Proportion) I
- How large a sample would be necessary to estimate
the true proportion of defectives in a large
population within ±3%, with 95% confidence?
- (Assume a pilot sample yields $\hat{p}$ = 0.12)
48Example - Sample Size Determination (Proportion)
II
- A consumer agency wants to estimate the
proportion of all drivers who wear seatbelts
while driving. Assume that a preliminary study
has shown that 76% of drivers wear seatbelts
while driving. How large should the sample be so
that the 99% confidence interval for the
population proportion has a maximum error of 0.03?
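A sketch of the sample-size calculation for a proportion follows. It assumes a preliminary estimate of the proportion is available (0.76 for the seatbelt study) and uses an unrounded critical value; `required_sample_size_for_proportion` is an illustrative name.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size_for_proportion(p_estimate, e, confidence):
    """Smallest n for which Z * sqrt(p(1 - p) / n) does not exceed e,
    using a preliminary estimate of the proportion p."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p_estimate * (1 - p_estimate) / e ** 2)

n_seatbelts = required_sample_size_for_proportion(0.76, 0.03, 0.99)
n_defectives = required_sample_size_for_proportion(0.12, 0.03, 0.95)
print("seatbelt study:", n_seatbelts)
print("defectives study (pilot estimate 0.12):", n_defectives)
```

With the unrounded critical value the seatbelt study needs 1345 drivers; using the tabulated Z = 2.58 instead would give a slightly larger answer. The same function applied to the earlier defectives example (pilot estimate 0.12, margin 0.03, 95% confidence) gives 451.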
49Example - Sample Size Determination (Proportion)
III
- A preliminary sample of 200 parts produced by a
new machine showed that 7% of them are defective.
How large a sample should the company select so
that the 95% confidence interval for p is within
0.02 of the population proportion?