Title: Sampling Distributions
1Sampling Distributions
2Overview
- To frame our discussion, consider
3Outline
4Population
- Population: the complete set of objects that you want to study
- Parameter: the measurement of a characteristic of an entire population
5Sample
- Sample: a subset of subjects that is the focus of one's study
- Statistic: a number calculated on sample data, quantifying a characteristic of the sample
6Sampling (1)
- Random Sampling
- Subjects are chosen from the population at random.
- Stratified Random Sampling
- The population is divided into groups (strata), then random sampling is applied within each group.
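The two schemes above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the population of tagged employees, the function names, and the fixed seed are all assumptions made for the example.

```python
import random
from collections import defaultdict

def simple_random_sample(population, k, rng):
    """Choose k subjects from the population at random, without replacement."""
    return rng.sample(population, k)

def stratified_random_sample(population, stratum_of, k_per_stratum, rng):
    """Divide the population into strata, then sample at random within each."""
    strata = defaultdict(list)
    for subject in population:
        strata[stratum_of(subject)].append(subject)
    return {name: rng.sample(members, min(k_per_stratum, len(members)))
            for name, members in strata.items()}

# Hypothetical population: 12 employees, each tagged with a team name.
population = [(i, "dev" if i % 3 else "qa") for i in range(12)]
rng = random.Random(0)  # fixed seed only so the sketch is repeatable

srs = simple_random_sample(population, 4, rng)
stratified = stratified_random_sample(population, lambda s: s[1], 2, rng)
print(srs)
print(stratified)
```

The simple random sample may by chance contain no "qa" members at all, while the stratified sample always draws from both teams.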
7Sampling (2)
- Convenience Sampling
- The most convenient persons are chosen.
- Quota Sampling
- Subjects from various portions of the population
are chosen.
8Randomization
- Statistical methods require observations from independent random variables. Randomization is used to meet this requirement.
- Randomization applies to the allocation of objects, subjects, and the order of treatments.
9Why Randomization?
- By random assignment you try to keep the results
from being biased by sources of variation over
which you have no control.
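As a rough sketch of random assignment, the snippet below shuffles a list of subjects before allocating them to treatments and also draws a random treatment order. The subject labels, treatment names, and the `randomize` helper are hypothetical, chosen only for illustration.

```python
import random

def randomize(subjects, treatments, rng):
    """Randomly allocate subjects to treatments in near-equal group sizes."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # a random order removes any systematic pattern
    return {subject: treatments[i % len(treatments)]
            for i, subject in enumerate(shuffled)}

rng = random.Random(42)  # seed fixed only to make the sketch repeatable
subjects = [f"team-{i}" for i in range(8)]  # hypothetical subjects
allocation = randomize(subjects, ["old process", "new process"], rng)
treatment_order = rng.sample(["A", "B", "C"], k=3)  # random order of treatments
print(allocation)
print(treatment_order)
```

Because the allocation depends only on the shuffle, uncontrolled sources of variation are equally likely to land in either group.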
10Sample Size
- The larger the variability in the population, the larger the sample needed.
- The size of the sample impacts our ability to generalize, since larger samples reduce sampling error.
11Context
- Take a random sample of n observations from a population P. Compute the mean for the sample. How well does the sample mean estimate the population mean?
- Notice we generate statistics as estimates of parameters.
12Sampling Distribution - Mean (σ known)
- If a random sample of size n is taken from a population having a mean µ and variance $\sigma^2$, then the sample mean $\bar{X}$ is a random variable whose distribution has a mean of µ and variance $\sigma^2/n$.
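A small simulation can make this result concrete. It relies on the standard fact that the sample mean has mean µ and variance σ²/n. The Uniform(0, 1) population, the sample size, and the number of repetitions are assumptions picked for the sketch.

```python
import random
import statistics

rng = random.Random(1)
n, reps = 25, 20_000

# Hypothetical population: Uniform(0, 1), so mu = 0.5 and sigma^2 = 1/12.
mu, sigma2 = 0.5, 1 / 12

# Draw many samples of size n and record each sample mean.
sample_means = [statistics.fmean(rng.random() for _ in range(n))
                for _ in range(reps)]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

print(f"mean of sample means:     {mean_of_means:.4f} (theory {mu:.4f})")
print(f"variance of sample means: {var_of_means:.5f} (theory {sigma2 / n:.5f})")
```

The simulated values should land close to µ and σ²/n, and the agreement tightens as the number of repetitions grows.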
13Normal Population Distribution
Let $X_1, \ldots, X_n$ be a random sample from a normal distribution with mean value µ and standard deviation σ. Then for any n, $\bar{X}$ is normally distributed.
14Central Limit Theorem
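- If $\bar{X}$ is the mean of a sample of size n taken from a population having mean µ and variance $\sigma^2$, then $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is a random variable whose distribution function approaches that of the standard normal distribution as $n \to \infty$.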
15Notes
- The Central Limit Theorem holds regardless of the shape of the population distribution, provided the population variance is finite.
- As a rule of thumb, the sampling distribution of the mean is approximately normal when n > 30.
- If the population from which you are sampling has a normal distribution, then the sampling distribution of the mean is a normal distribution.
http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html
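To see the Central Limit Theorem at work on a clearly non-normal population, the sketch below standardizes sample means drawn from an exponential population and checks how often they fall within 1.96 of zero. The exponential population, n = 40, and the seed are assumptions made for the illustration.

```python
import math
import random
import statistics

rng = random.Random(7)
n, reps = 40, 20_000
mu = sigma = 1.0  # Exponential(rate=1) population: mean 1, standard deviation 1

def standardized_mean():
    """Draw one sample of size n and return Z = (xbar - mu) / (sigma / sqrt(n))."""
    xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    return (xbar - mu) / (sigma / math.sqrt(n))

z_values = [standardized_mean() for _ in range(reps)]
inside = sum(abs(z) < 1.96 for z in z_values) / reps
print(f"share of Z within +/-1.96: {inside:.3f} (standard normal: 0.950)")
```

Even though the population is strongly skewed, the share should sit near the 0.95 expected under a standard normal distribution.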
16Problem 1
- Company records indicate that the time spent preparing for a code inspection is normally distributed with a mean of 55 minutes and a standard deviation of 15 minutes.
- What is the probability an employee spends more than 75 minutes preparing for a review?
17Solution - Problem 1
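One way to work Problem 1, offered as a sketch: a single employee's time is normal with mean 55 and standard deviation 15, so the question reduces to the upper tail beyond z = (75 − 55)/15. The variable names are illustrative.

```python
from statistics import NormalDist

prep_time = NormalDist(mu=55, sigma=15)  # minutes, from the problem statement

z = (75 - prep_time.mean) / prep_time.stdev
p_more_than_75 = 1 - prep_time.cdf(75)   # upper-tail probability

print(f"z = {z:.2f}, P(X > 75) = {p_more_than_75:.4f}")
```

This gives z of about 1.33 and a probability of about 0.091.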
18Problem 2
- Company records indicate that the time spent preparing for a code inspection is normally distributed with a mean of 55 minutes and a standard deviation of 15 minutes.
- What is the probability that the average time for the review team of 6 people exceeds 75 minutes?
19Solution - Problem 2
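One way to work Problem 2, offered as a sketch: because the population is normal, the mean of 6 independent times is normal with mean 55 and standard deviation 15/√6. The sketch assumes the six team members behave like a random sample.

```python
import math
from statistics import NormalDist

mu, sigma, n = 55, 15, 6

standard_error = sigma / math.sqrt(n)   # standard deviation of the sample mean
z = (75 - mu) / standard_error
p_mean_exceeds_75 = 1 - NormalDist().cdf(z)

print(f"standard error = {standard_error:.2f}, z = {z:.2f}, "
      f"P(mean > 75) = {p_mean_exceeds_75:.5f}")
```

The z value is a little above 3.2, so the probability is tiny (roughly 0.0005), far smaller than for a single employee.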
20Problem 3
- A group of women project leaders for CompuCorp is
considering filing a sex-discrimination suit
against the corporation. A recent report stated
that the average salary for project leads at the
company is 128,000 with a standard deviation of
8,500. A random sample of 65 women taken from
the 350 female project leads at the company had
an average income of 125,000. If the population
of female project leads is assumed to have the
same mean and standard deviation as all project
leads, what is the probability of observing a
sample average of 125,000 or less?
21Solution - Problem 3
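One way to approach Problem 3, offered as a sketch: with n = 65 the sample mean is approximately normal, so the lower-tail probability follows from a z score. Since 65 is a sizeable fraction of the 350 female project leads, the sketch also shows the result with a finite population correction. Whether to apply that correction is a judgment call and is treated here as an assumption.

```python
import math
from statistics import NormalDist

mu, sigma = 128_000, 8_500   # assumed population mean and standard deviation
n, N = 65, 350               # sample size and size of the finite population
xbar = 125_000               # observed sample average

se = sigma / math.sqrt(n)
p_plain = NormalDist().cdf((xbar - mu) / se)

# Optional finite population correction, since n/N is well above 5%.
fpc = math.sqrt((N - n) / (N - 1))
p_fpc = NormalDist().cdf((xbar - mu) / (se * fpc))

print(f"without correction: P(mean <= 125,000) = {p_plain:.4f}")
print(f"with correction:    P(mean <= 125,000) = {p_fpc:.4f}")
```

Either way the probability is small: roughly 0.002 without the correction and under 0.001 with it.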
22Sampling Distribution - Mean (σ unknown)
- If $\bar{X}$ is the mean of a random sample of size n taken from a normal population having a mean of µ and variance $\sigma^2$, and $s^2$ is the variance of the sample, then $t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ is a random variable having the t distribution with the parameter $\nu = n - 1$.
23Notes
- The parameter ν is referred to as the degrees of freedom.
- The t distribution is similar to the normal distribution.
- Notice the requirement of sampling from a normal population.
- N(0,1) is a good approximation for the t distribution when n ≥ 30.
24Problem 4
- The CEO submitted a white paper indicating a few
changes in the software development process are
in order. His statements include a claim that
the average effort devoted to unit testing on
projects is 7.8 person-months. You collect a
random sample of 75 effort logs from projects
and determine the average effort for unit testing
was 7.5 person-months with a standard deviation
of 1.75 person-months. Do the data you
collected support or refute the CEO's claim?
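A possible line of attack on Problem 4, as a sketch: compute the t statistic for the claimed mean of 7.8 and, because n = 75 is large, approximate the t distribution with N(0,1). The two-sided test and the conventional 5% threshold are assumptions, since the problem does not state them.

```python
import math
from statistics import NormalDist

claimed_mean = 7.8             # CEO's claim, person-months
n, xbar, s = 75, 7.5, 1.75     # sample size, sample mean, sample standard deviation

t = (xbar - claimed_mean) / (s / math.sqrt(n))
df = n - 1

# Normal approximation to the t distribution (reasonable for large n).
p_two_sided = 2 * NormalDist().cdf(-abs(t))

print(f"t = {t:.3f} with {df} degrees of freedom, "
      f"approximate two-sided p-value = {p_two_sided:.3f}")
```

The statistic is about −1.48 and the approximate p-value is around 0.14, so under these assumptions the sample would not refute the CEO's figure at the 5% level.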
25Hypothesis Testing
- For many applications, we are concerned that the
standard deviation or variance of the population
exceeds some specified value.
26Sampling Distribution - Variance
- If $S^2$ is the variance of a random sample of size n taken from a normal population having the variance $\sigma^2$, then $\chi^2 = \dfrac{(n-1)S^2}{\sigma^2}$ is a random variable having the chi-square distribution with the parameter $\nu = n - 1$.
27Problem 5
- You produced an algorithm for determining the area of polygonal regions. Your data regarding the area of a sample of 50 cm² regions are:
- 51.2, 47.5, 50.8, 51.5, 51.3, 49.5, 51.1, 50.7, 46.7, 49.2, 52.1, 48.3, 51.6, 49.2, 51.5
- Assuming the area of the polygonal regions is normally distributed, determine σ² with 95% confidence.
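A sketch of one way to work Problem 5: since (n − 1)S²/σ² follows a chi-square distribution with n − 1 = 14 degrees of freedom, a 95% interval for σ² divides (n − 1)S² by the upper and lower 2.5% points. The two critical values are hard-coded from a standard chi-square table rather than computed, and should be checked against whichever table you use.

```python
import statistics

areas = [51.2, 47.5, 50.8, 51.5, 51.3, 49.5, 51.1, 50.7,
         46.7, 49.2, 52.1, 48.3, 51.6, 49.2, 51.5]

n = len(areas)
s2 = statistics.variance(areas)  # sample variance (divides by n - 1)

# Chi-square table values for 14 degrees of freedom (assumed from a table).
CHI2_UPPER_025 = 26.119  # point with 2.5% of the area to its right
CHI2_LOWER_025 = 5.629   # point with 2.5% of the area to its left

lower = (n - 1) * s2 / CHI2_UPPER_025
upper = (n - 1) * s2 / CHI2_LOWER_025

print(f"s^2 = {s2:.3f}; 95% interval for sigma^2: ({lower:.2f}, {upper:.2f})")
```

The sample variance comes out near 2.73 and the interval is roughly 1.46 to 6.78. The interval is not symmetric about s² because the chi-square distribution is skewed.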
28Problem 6
- A production manager must maintain the standard
deviation of the diameter of hard disk media to
less than 2 mm. A random sample of 26 disks
reveals a standard deviation of 1.85 mm. If the
disk diameters are normally distributed, do the
data indicate that the standard deviation is less
than 2 mm? Use α = .05.
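A sketch of one way to set up Problem 6 as a lower-tailed chi-square test of H0: σ = 2 against H1: σ < 2. The hypotheses are an interpretation of the problem statement, and the critical value is hard-coded from a standard chi-square table for 25 degrees of freedom.

```python
n, s, sigma0 = 26, 1.85, 2.0

# Test statistic: (n - 1) * s^2 / sigma0^2, chi-square with n - 1 df under H0.
chi2 = (n - 1) * s**2 / sigma0**2

# Table value (assumed): 5% of the chi-square(25) area lies below this point.
CHI2_CRIT_LOWER_05 = 14.611

reject_h0 = chi2 < CHI2_CRIT_LOWER_05
print(f"chi-square = {chi2:.2f}, critical value = {CHI2_CRIT_LOWER_05}, "
      f"reject H0: {reject_h0}")
```

The statistic is about 21.39, which is not below the critical value, so under this setup the data do not show that the standard deviation is less than 2 mm.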
29Independent Samples - Variance
- The F distribution allows us to look at the ratio
of variances from two independent random samples.
Using the F statistic we can determine whether
the two samples come from populations that have
similar variances.
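30
- Suppose we have two normal populations. Taking a random sample from each population, we want to compare the variances of the samples. We do so using the F-statistic $F = \dfrac{S_1^2}{S_2^2}$.
- If the two populations have equal variances, this random variable has the F distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.
31
- Notice the F-distribution is not symmetric.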
32Problem 7
- A company orders components from two different suppliers. A sample of 10 components from Supplier 1 and 16 components from Supplier 2 is chosen and tested. From testing we determine the standard deviation for Supplier 1 components is $s_1 = 4.31$, while the standard deviation for Supplier 2 components is $s_2 = 5.01$. Are the two variances sufficiently different at the .01 level?
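A sketch of how the F statistic for Problem 7 might be computed. Placing the larger sample variance in the numerator is a common convention, adopted here as an assumption, so that only the upper tail of the F table is needed.

```python
s1, n1 = 4.31, 10   # Supplier 1: sample standard deviation and sample size
s2, n2 = 5.01, 16   # Supplier 2: sample standard deviation and sample size

# Larger sample variance on top (Supplier 2 here), so F >= 1.
f_stat = s2**2 / s1**2
df_num, df_den = n2 - 1, n1 - 1

print(f"F = {f_stat:.3f} with ({df_num}, {df_den}) degrees of freedom")
```

The ratio is about 1.35 with (15, 9) degrees of freedom. Tabulated upper-tail critical values for these degrees of freedom at the .01 level are far larger (on the order of 5 to 6, to be confirmed against an F table), so the sample variances do not appear to differ significantly.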
33Left and Right Tails