Title: Chapter 8 Fundamental Sampling Distributions and Data Distributions
- Wen-Hsiang Lu
- Department of Computer Science and Information Engineering
- National Cheng Kung University
- 2007/05/24
8.1 Random Sampling
- Outcome of a statistical experiment
- Numerical value: the total value of a pair of dice tossed
- Descriptive representation: blood types in a blood test
- Sampling from distributions or populations
- Sample mean and sample variance
- The use of high-speed computers enhances the use of formal statistical inference with graphical techniques.
Random Sampling
- Definition 8.1: A population consists of the totality of the observations with which we are concerned.
- Finite size: 600 students classified according to blood type form a population of size 600.
- Infinite size: measurements of the atmospheric pressure; some finite populations are so large that they are treated as infinite.
- Each observation in a population is a value of a random variable $X$ having some probability distribution $f(x)$.
- If one is inspecting items coming off an assembly line for defects, then each observation in the population might be a value 0 or 1 of the binomial random variable $X$ with probability distribution $b(x; 1, p) = p^x q^{1-x}$, $x = 0, 1$, with $q = 1 - p$, where 0 indicates a nondefective item and 1 indicates a defective item.
Random Sampling
- Sometimes it is impossible or impractical to observe the entire set of observations that make up the population.
- Definition 8.2: A sample is a subset of a population.
- If inferences from the sample to the population are to be valid, we must obtain representative samples.
- Bias: erroneous inferences result from selecting the most convenient members of the population.
- Random sample: observations are made independently and at random.
Random Sampling
- Definition 8.3: Let $X_1, X_2, \ldots, X_n$ be $n$ independent random variables, each having the same probability distribution $f(x)$. We then define $X_1, X_2, \ldots, X_n$ to be a random sample of size $n$ from the population $f(x)$ and write its joint probability distribution as $f(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n)$.
- If we assume the population of battery lives to be normal, the possible values of any $X_i$, $i = 1, 2, \ldots, 8$, will be precisely the same as those in the original population, and hence $X_i$ has the same normal distribution as $X$.
8.2 Some Important Statistics
- Definition 8.4: Any function of the random variables constituting a random sample is called a statistic.
- Definition 8.5: If $X_1, X_2, \ldots, X_n$ represent a random sample of size $n$, then the sample mean is defined by the statistic $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
- Definition 8.6: If $X_1, X_2, \ldots, X_n$ represent a random sample of size $n$, then the sample variance is defined by the statistic $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$.
Some Important Statistics
- Example 8.1: A comparison of coffee prices at 4 randomly selected grocery stores in San Diego showed increases from the previous month of 12, 15, 17, and 20 cents for a 1-pound bag. Find the variance of this random sample of price increases.
- Solution
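The computation follows directly from the definitions of the sample mean and the sample variance (divisor $n - 1$). A minimal Python sketch, using only the standard library; the variable names are illustrative:

```python
from statistics import mean, variance

# Price increases in cents, as given in Example 8.1
increases = [12, 15, 17, 20]

x_bar = mean(increases)      # sample mean: (12 + 15 + 17 + 20) / 4
s2 = variance(increases)     # sample variance with divisor n - 1

print(f"sample mean = {x_bar}, sample variance = {s2:.4f}")
```

The squared deviations from the mean of 16 are 16, 1, 1, and 16, so the variance is 34/3.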
Some Important Statistics
- Theorem 8.1: If $S^2$ is the variance of a random sample of size $n$, we may write $S^2 = \frac{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}{n(n-1)}$.
- Proof
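A standard derivation, sketched here from the definitions of the sample mean and sample variance, expands the square and then substitutes $\bar{X} = \sum X_i / n$ (all sums run over $i = 1, \ldots, n$):

```latex
\begin{aligned}
S^2 &= \frac{1}{n-1}\sum (X_i - \bar{X})^2
     = \frac{1}{n-1}\left[\sum X_i^2 - 2\bar{X}\sum X_i + n\bar{X}^2\right] \\
    &= \frac{1}{n-1}\left[\sum X_i^2 - n\bar{X}^2\right]
     = \frac{1}{n-1}\left[\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{n}\right] \\
    &= \frac{n\sum X_i^2 - \left(\sum X_i\right)^2}{n(n-1)}.
\end{aligned}
```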
Some Important Statistics
- Definition 8.7: The sample standard deviation, denoted by $S$, is the positive square root of the sample variance.
- Example 8.2: Find the variance of the data 3, 4, 5, 6, 6, and 7, representing the number of trout caught by a random sample of 6 fishermen.
- Solution
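One way to carry out this calculation is with the computational form of the sample variance, $s^2 = [n\sum x_i^2 - (\sum x_i)^2] / [n(n-1)]$. The sketch below is illustrative; the function name `sample_variance` is not from the slides.

```python
def sample_variance(xs):
    """Sample variance via the computational formula (divisor n - 1)."""
    n = len(xs)
    sum_x = sum(xs)                      # sum of the observations
    sum_x2 = sum(x * x for x in xs)      # sum of the squared observations
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

# Number of trout caught by each of the 6 fishermen (Example 8.2)
trout = [3, 4, 5, 6, 6, 7]

s2 = sample_variance(trout)   # (6 * 171 - 31**2) / (6 * 5) = 65 / 30
s = s2 ** 0.5                 # sample standard deviation

print(f"s^2 = {s2:.4f}, s = {s:.4f}")
```

Here $\sum x_i = 31$ and $\sum x_i^2 = 171$, so $s^2 = 13/6$.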
8.3 Data Displays and Graphical Methods
- Motivation: use creative displays to extract information about the properties of a data set.
- Stem-and-leaf plots give the viewer a look at the symmetry of the data.
- Normal probability plots and quantile plots are used to check whether the data follow a normal distribution.
- Statistical analysis can be characterized as the process of drawing conclusions about system variability.
- Statistics provide single measures, whereas a graphical display adds information in the form of a picture.
Box and Whisker Plot or Boxplot
- A box-and-whisker plot encloses the interquartile range of the data in a box with the median displayed inside.
- Interquartile range: the range between the 75th percentile (upper quartile) and the 25th percentile (lower quartile).
- A boxplot gives the viewer information about outliers, which represent rare events.
- Example 8.3: Nicotine content was measured in a random sample of 40 cigarettes. The data are displayed in the accompanying table.
- Mild outliers: 0.72, 0.85, and 2.55
Box and Whisker Plot or Boxplot
- Example 8.4: Consider the following data, consisting of 30 samples measuring the thickness of paint can ears. Figure 8.2 depicts a box-and-whisker plot for this asymmetric set of data.
Quantile Plot
- Quantile plot
- Compare samples of data
- Draw distinctions
- Depict the cumulative distribution function
- Definition 8.8: A quantile of a sample, $q(f)$, is a value for which a specified fraction $f$ of the data values is less than or equal to $q(f)$.
- Sample median: $q(0.5)$; 75th percentile: $q(0.75)$; 25th percentile: $q(0.25)$
Quantile Plot
- In Figure 8.3, the quantile plot shows all observations.
- Large clusters: slopes near zero
- Sparse data: steeper slopes
- E.g.
- Sparse data: 28-30
- High density: 36-38
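Normal Quantile-Quantile Plot
- Approximation of the quantile of a normal distribution: $q_{\mu,\sigma}(f) = \mu + \sigma\{4.91[f^{0.14} - (1-f)^{0.14}]\}$
- Definition 8.9: The normal quantile-quantile plot is a plot of $y_{(i)}$ (the ordered observations) against $q_{0,1}(f_i)$, where $f_i = \frac{i - 3/8}{n + 1/4}$.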
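The coordinates of a normal quantile-quantile plot can be sketched in a few lines. This sketch assumes the plotting positions $f_i = (i - 3/8)/(n + 1/4)$ and the approximation $q_{0,1}(f) = 4.91[f^{0.14} - (1-f)^{0.14}]$ for the standard normal quantile; the function names and the `sample` values are hypothetical. The exact quantile from Python's `statistics.NormalDist` is printed alongside for comparison.

```python
from statistics import NormalDist

def plotting_positions(n):
    """Fractions f_i = (i - 3/8) / (n + 1/4) for i = 1..n (assumed convention)."""
    return [(i - 3 / 8) / (n + 1 / 4) for i in range(1, n + 1)]

def approx_normal_quantile(f, mu=0.0, sigma=1.0):
    """Approximate normal quantile: mu + sigma * 4.91 * (f^0.14 - (1 - f)^0.14)."""
    return mu + sigma * 4.91 * (f ** 0.14 - (1 - f) ** 0.14)

def normal_qq_points(data):
    """Pairs (approximate standard normal quantile, ordered observation)."""
    ys = sorted(data)
    fs = plotting_positions(len(ys))
    return [(approx_normal_quantile(f), y) for f, y in zip(fs, ys)]

# Hypothetical sample, for illustration only
sample = [4.2, 5.1, 3.8, 4.9, 5.6, 4.4, 5.0]

for q, y in normal_qq_points(sample):
    exact = NormalDist().inv_cdf(NormalDist().cdf(q))  # round trip, equals q
    print(f"q = {q:7.3f}   y = {y}")

# How close is the approximation to the exact quantile at f = 0.975?
print(approx_normal_quantile(0.975), NormalDist().inv_cdf(0.975))
```

If the points fall close to a straight line, a normal model is plausible; marked curvature or stray tail points suggest otherwise.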
Normal Quantile-Quantile Plot
- Construct a normal quantile-quantile plot and draw conclusions regarding whether or not it is reasonable to assume that the two samples are from the same $N(\mu, \sigma)$ distribution.
- Solution
- The points are far from a straight line.
- Station 1 reflects a few values in the lower tail of the distribution and several in the upper tail.
- It is unlikely that the two samples come from the same normal distribution.
8.4 Sampling Distribution
- Statistical inference is concerned with generalizations and predictions.
- E.g., based on the opinions of several people interviewed on the street, we might claim that in a forthcoming election 60% of the eligible voters in the city of Detroit favor a certain candidate.
- Definition 8.10: The probability distribution of a statistic is called a sampling distribution.
- E.g., the probability distribution of $\bar{X}$ is called the sampling distribution of the mean.
- The sampling distribution of a statistic depends on the size of the population, the size of the samples, and the method of choosing the samples.
8.5 Sampling Distribution of Means
- Suppose that a random sample of $n$ observations is taken from a normal population with mean $\mu$ and variance $\sigma^2$.
- By the reproductive property of the normal distribution established in Theorem 7.11, the sample mean $\bar{X}$ also has a normal distribution.
- Theorem 7.11: If $X_1, X_2, \ldots, X_n$ are independent random variables having normal distributions with means $\mu_1, \mu_2, \ldots, \mu_n$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, respectively, then the random variable $Y = a_1X_1 + a_2X_2 + \cdots + a_nX_n$ has a normal distribution with mean $\mu_Y = a_1\mu_1 + a_2\mu_2 + \cdots + a_n\mu_n$ and variance $\sigma_Y^2 = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \cdots + a_n^2\sigma_n^2$.
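Applying Theorem 7.11 to the sample mean amounts to taking every coefficient equal to $1/n$, with a common mean $\mu$ and common variance $\sigma^2$. A short sketch of that substitution:

```latex
\bar{X} = \frac{1}{n}X_1 + \frac{1}{n}X_2 + \cdots + \frac{1}{n}X_n
\quad\Longrightarrow\quad
\mu_{\bar{X}} = \sum_{i=1}^{n} \frac{1}{n}\,\mu = \mu,
\qquad
\sigma_{\bar{X}}^2 = \sum_{i=1}^{n} \frac{1}{n^2}\,\sigma^2 = \frac{\sigma^2}{n}.
```

So $\bar{X}$ is normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.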
Sampling Distribution of Means
- Theorem 8.2 (Central Limit Theorem): If $\bar{X}$ is the mean of a random sample of size $n$ taken from a population with mean $\mu$ and finite variance $\sigma^2$, then the limiting form of the distribution of $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, as $n \to \infty$, is the standard normal distribution $n(z; 0, 1)$.
- The normal approximation for $\bar{X}$ will generally be good if $n \ge 30$.
- If $n < 30$, the approximation is good only if the population is not too different from a normal distribution.
- If the population is known to be normal, the sampling distribution of $\bar{X}$ will follow a normal distribution exactly, no matter how small the size of the samples.
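The Central Limit Theorem can be illustrated with a small simulation. The sketch below is not from the slides: it assumes an exponential population with rate 1 (so $\mu = 1$ and $\sigma = 1$), draws many samples of size $n = 30$, standardizes each sample mean, and checks how often $|Z| < 1.96$, which should happen roughly 95% of the time if the normal approximation is good.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def standardized_means(n, reps, draw, mu, sigma):
    """Return reps values of Z = (x_bar - mu) / (sigma / sqrt(n))."""
    zs = []
    for _ in range(reps):
        x_bar = sum(draw() for _ in range(n)) / n
        zs.append((x_bar - mu) / (sigma / n ** 0.5))
    return zs

# Assumed population: exponential with rate 1 (clearly non-normal, skewed)
zs = standardized_means(
    n=30, reps=5000,
    draw=lambda: random.expovariate(1.0),
    mu=1.0, sigma=1.0,
)

mean_z = sum(zs) / len(zs)
frac_within = sum(abs(z) < 1.96 for z in zs) / len(zs)

print(f"mean of Z = {mean_z:.3f}, fraction with |Z| < 1.96 = {frac_within:.3f}")
```

Repeating the experiment with a much smaller `n` typically shows a visibly worse match for a skewed population like this one.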
Sampling Distribution of Means
- Example 8.6: An electric firm manufactures light bulbs that have a mean life of 800 hours and a standard deviation of 40 hours. Find the probability that a random sample of 16 bulbs will have an average life of less than 775 hours.
- Solution
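A minimal sketch of the calculation, assuming the sampling distribution of $\bar{X}$ can be treated as normal with mean $\mu = 800$ and standard deviation $\sigma/\sqrt{n} = 40/\sqrt{16} = 10$ (with $n = 16$ this relies on the bulb lifetimes themselves being roughly normal):

```python
from statistics import NormalDist

mu, sigma, n = 800.0, 40.0, 16

se = sigma / n ** 0.5        # standard deviation of the sample mean: 10 hours
z = (775.0 - mu) / se        # standardized value: -2.5
p = NormalDist().cdf(z)      # P(X_bar < 775) = P(Z < -2.5)

print(f"z = {z}, P(X_bar < 775) = {p:.4f}")
```

The probability is about 0.0062, so an average this low would be a rare event.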
Sampling Distribution of Means
- Example 8.7: An engineer conjectures that the population mean diameter of a certain component part is 5.0 millimeters. An experiment is conducted in which 100 parts produced by the process are selected randomly and the diameter is measured on each. It is known that the population standard deviation is $\sigma = 0.1$. The experiment indicates a sample average diameter of $\bar{x} = 5.027$ millimeters. Does this sample information appear to support or refute the engineer's conjecture?
- Solution
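One way to judge the conjecture is to ask how likely it is, if $\mu = 5.0$, to see a sample mean at least as far from 5.0 as 5.027. The sketch below assumes the normal approximation from the Central Limit Theorem ($n = 100$) and uses a two-sided probability; the variable names are illustrative.

```python
from statistics import NormalDist

mu0, sigma, n = 5.0, 0.1, 100
x_bar = 5.027

se = sigma / n ** 0.5                 # standard deviation of the sample mean: 0.01
z = (x_bar - mu0) / se                # about 2.7
# P(|X_bar - 5.0| >= 0.027) = 2 * P(Z >= z)
p_two_sided = 2 * (1 - NormalDist().cdf(z))

print(f"z = {z:.2f}, two-sided probability = {p_two_sided:.4f}")
```

A probability of roughly 0.007 means such a deviation would occur by chance only about 7 times in 1000, which speaks against the conjecture.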
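Sampling Distribution of the Difference Between Two Averages
- Theorem 8.3: If independent samples of size $n_1$ and $n_2$ are drawn at random from two populations, discrete or continuous, with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then the sampling distribution of the difference of means, $\bar{X}_1 - \bar{X}_2$, is approximately normally distributed with mean and variance given by $\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$ and $\sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.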
Sampling Distribution of the Difference Between Two Averages
- Example 8.8: Two independent experiments are being run in which two different types of paint are compared. Eighteen specimens are painted using type A, and the drying time, in hours, is recorded for each. The same is done with type B. The population standard deviations are both known to be 1.0. Assuming that the mean drying time is equal for the two types of paint, find
- Solution
Sampling Distribution of the Difference Between Two Averages
- Example 8.9: The television picture tubes of manufacturer A have a mean lifetime of 6.5 years and a standard deviation of 0.9 year, while those of manufacturer B have a mean lifetime of 6.0 years and a standard deviation of 0.8 year. What is the probability that a random sample of 36 tubes from manufacturer A will have a mean lifetime that is at least 1 year more than the mean lifetime of a sample of 49 tubes from manufacturer B?
- Solution
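A minimal sketch of the calculation, assuming the difference $\bar{X}_A - \bar{X}_B$ is approximately normal with mean $\mu_A - \mu_B$ and variance $\sigma_A^2/n_A + \sigma_B^2/n_B$, as in Theorem 8.3:

```python
from statistics import NormalDist

mu_a, sigma_a, n_a = 6.5, 0.9, 36   # manufacturer A
mu_b, sigma_b, n_b = 6.0, 0.8, 49   # manufacturer B

mean_diff = mu_a - mu_b                                   # 0.5 year
sd_diff = (sigma_a ** 2 / n_a + sigma_b ** 2 / n_b) ** 0.5  # about 0.189

z = (1.0 - mean_diff) / sd_diff      # standardized value of a 1-year difference
p = 1 - NormalDist().cdf(z)          # P(X_bar_A - X_bar_B >= 1.0)

print(f"sd = {sd_diff:.4f}, z = {z:.2f}, probability = {p:.4f}")
```

The result is about 0.004, so a 1-year advantage in sample means would be very unusual.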
Sampling Distribution of $S^2$
- Suppose a random sample of size $n$ is taken from a normal population with mean $\mu$ and variance $\sigma^2$, and the sample variance $S^2$ is computed.
- Corollary: If $X_1, X_2, \ldots, X_n$ are independent random variables having identical normal distributions with mean $\mu$ and variance $\sigma^2$, then the random variable $Y = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2$ has a chi-squared distribution with $v = n$ degrees of freedom.
Sampling Distribution of $S^2$
- Theorem 8.4: If $S^2$ is the variance of a random sample of size $n$ taken from a normal population having the variance $\sigma^2$, then the statistic $\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2}$ has a chi-squared distribution with $v = n - 1$ degrees of freedom.
- It is customary to let $\chi^2_\alpha$ represent the $\chi^2$ value above which we find an area of $\alpha$. This is illustrated by the shaded region in Figure 8.10.
- Table A.5
Sampling Distribution of $S^2$
- Example 8.10: A manufacturer of car batteries guarantees that his batteries will last, on average, 3 years with a standard deviation of 1 year. If five of these batteries have lifetimes of 1.9, 2.4, 3.0, 3.5, and 4.2 years, should the manufacturer still be convinced that his batteries have a standard deviation of 1 year? Assume that the battery lifetime follows a normal distribution.
- Solution
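A sketch of the check: compute $s^2$, form $\chi^2 = (n-1)s^2/\sigma^2$ with the claimed $\sigma^2 = 1$, and see whether it falls in the central 95% of a chi-squared distribution with 4 degrees of freedom. The two critical values below are typed in from a standard chi-squared table ($\chi^2_{0.975} = 0.484$ and $\chi^2_{0.025} = 11.143$ for $v = 4$); using the central 95% as the yardstick is an assumption of this sketch.

```python
from statistics import variance

lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]   # years, from Example 8.10
sigma2_claimed = 1.0                     # claimed variance (sd of 1 year)

n = len(lifetimes)
s2 = variance(lifetimes)                 # sample variance, divisor n - 1
chi2 = (n - 1) * s2 / sigma2_claimed     # chi-squared statistic, v = n - 1 = 4

# Table values for v = 4 degrees of freedom (assumed 95% central region)
CHI2_LOWER, CHI2_UPPER = 0.484, 11.143
plausible = CHI2_LOWER <= chi2 <= CHI2_UPPER

print(f"s^2 = {s2:.3f}, chi-squared = {chi2:.2f}, within central 95%: {plausible}")
```

The statistic is 3.26, well inside the interval, so the data give no reason to doubt a standard deviation of 1 year.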
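Degrees of Freedom as a Measure of Sample Information
- Comparison
- Theorem 7.12: $\sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2}$ has a $\chi^2$ distribution with $n$ degrees of freedom.
- Theorem 8.4: $\sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2} = \frac{(n-1)S^2}{\sigma^2}$ has a $\chi^2$ distribution with $n - 1$ degrees of freedom. (When $\mu$ is not known, a degree of freedom is lost in the estimation of $\mu$, i.e., when $\mu$ is replaced by $\bar{X}$.)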
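The lost degree of freedom can be seen from a standard decomposition of the sum of squares about $\mu$, sketched here for a normal random sample:

```latex
\sum_{i=1}^{n} (X_i - \mu)^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2
\quad\Longrightarrow\quad
\underbrace{\sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2}}_{\chi^2,\; n \text{ d.f.}}
= \underbrace{\frac{(n-1)S^2}{\sigma^2}}_{\chi^2,\; n-1 \text{ d.f.}}
+ \underbrace{\frac{(\bar{X} - \mu)^2}{\sigma^2/n}}_{\chi^2,\; 1 \text{ d.f.}}
```

The last term is the square of a standard normal variable, which accounts for exactly one degree of freedom.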
t-Distribution
- Central Limit Theorem (Theorem 8.2)
- $\sigma$ might not be known.
- Consider $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$.
- In developing the sampling distribution of $T$, we shall assume that our random sample was selected from a normal population.
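t-Distribution
- Theorem 8.5: Let $Z$ be a standard normal random variable and $V$ a chi-squared random variable with $v$ degrees of freedom. If $Z$ and $V$ are independent, then the distribution of the random variable $T$, where $T = \frac{Z}{\sqrt{V/v}}$, is given by the density function $h(t) = \frac{\Gamma[(v+1)/2]}{\Gamma(v/2)\sqrt{\pi v}} \left(1 + \frac{t^2}{v}\right)^{-(v+1)/2}$, $-\infty < t < \infty$. This is known as the t-distribution with $v$ degrees of freedom.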
t-Distribution
- Corollary: Let $X_1, X_2, \ldots, X_n$ be independent random variables that are all normal with mean $\mu$ and standard deviation $\sigma$. Let $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$. Then the random variable $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a t-distribution with $v = n - 1$ degrees of freedom.
- Student t-distribution
- The probability distribution of $T$ was first published in 1908 in a paper by W. S. Gosset.
- Gosset was employed by an Irish brewery that disallowed publication of research by its staff.
- He published his work secretly under the name "Student."
t-Distribution
- $T$ is similar to $Z$: symmetric about $\mu = 0$ and bell-shaped.
- Difference between $T$ and $Z$: the variance of $T$ is greater than 1 and depends on $n$.
- $T$ and $Z$ are the same as $n \to \infty$.
t-Distribution
- The t-value with 10 degrees of freedom leaving an area of 0.025 to the right is $t = 2.228$.
- The t-distribution is symmetric about 0: $t_{1-\alpha} = -t_\alpha$.
- Example 8.11: The t-value with $v = 14$ degrees of freedom that leaves an area of 0.025 to the left, and therefore an area of 0.975 to the right, is $t_{0.975} = -t_{0.025} = -2.145$.
- Example 8.12: $P(-t_{0.025} < T < t_{0.05}) = 1 - 0.05 - 0.025 = 0.925$
t-Distribution
- Example 8.13: Find $k$ such that $P(k < T < -1.761) = 0.045$, for a random sample of size 15 selected from a normal distribution, where $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$.
- Solution
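A sketch of the reasoning, with $v = 15 - 1 = 14$ degrees of freedom and two values quoted from a standard t-table ($t_{0.05} = 1.761$ and $t_{0.005} = 2.977$ for $v = 14$). Since $-1.761 = -t_{0.05}$, write $k = -t_\alpha$ and use the symmetry of the t-distribution:

```latex
\begin{aligned}
P(-t_\alpha < T < -t_{0.05}) &= P(T < -t_{0.05}) - P(T < -t_\alpha) = 0.05 - \alpha = 0.045 \\
\alpha &= 0.005 \\
k &= -t_{0.005} = -2.977 .
\end{aligned}
```

So $P(-2.977 < T < -1.761) = 0.045$.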
t-Distribution
- Exactly 95% of the values of a t-distribution with $v = n - 1$ degrees of freedom lie between $-t_{0.025}$ and $t_{0.025}$.
- A t-value that falls below $-t_{0.025}$ or above $t_{0.025}$ would tend to make us believe that either a very rare event has taken place or perhaps our assumption about $\mu$ is in error.
- Example 8.14: An engineer claims that the population mean of a process is 500 grams. To check this claim he samples 25 batches each month. If the computed t-value falls between $-t_{0.05}$ and $t_{0.05}$, he is satisfied with his claim. What conclusion should he draw from a sample that has a mean of $\bar{x}$ grams and a sample standard deviation of $s = 40$ grams? Assume the distribution of yields to be approximately normal.
- Solution
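The decision rule can be sketched as a small function. Two values here are assumptions rather than facts from the slides: the sample mean `X_BAR_ASSUMED = 518.0` is a purely illustrative value, since no numeric sample mean appears in the text above, and `T_CRIT = 1.711` is $t_{0.05}$ for $v = 24$ degrees of freedom as read from a standard t-table.

```python
def t_statistic(x_bar, mu0, s, n):
    """t = (x_bar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (x_bar - mu0) / (s / n ** 0.5)

MU0 = 500.0              # claimed population mean (grams)
S = 40.0                 # sample standard deviation (grams)
N = 25                   # batches sampled, so v = 24 degrees of freedom
T_CRIT = 1.711           # t_0.05 for v = 24, from a standard t-table
X_BAR_ASSUMED = 518.0    # ASSUMED sample mean, for illustration only

t = t_statistic(X_BAR_ASSUMED, MU0, S, N)
satisfied = -T_CRIT < t < T_CRIT   # the engineer's rule

print(f"t = {t:.2f}, within (-{T_CRIT}, {T_CRIT}): {satisfied}")
```

With the assumed mean, $t = 2.25$ lies above $t_{0.05}$, which would suggest the process mean is larger than claimed; a different sample mean can of course lead to a different conclusion.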
t-Distribution
- The t-distribution is used extensively in problems that deal with
- Inference about the population mean
- Comparative samples (two sample means)
- The use of $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ requires that $X_1, X_2, \ldots, X_n$ be normal.
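F-Distribution
- The F-distribution finds enormous application in comparing sample variances.
- Theorem 8.6: Let $U$ and $V$ be two independent random variables having chi-squared distributions with $v_1$ and $v_2$ degrees of freedom, respectively. Then the distribution of the random variable $F = \frac{U/v_1}{V/v_2}$ is given by the density $h(f) = \frac{\Gamma[(v_1+v_2)/2]\,(v_1/v_2)^{v_1/2}}{\Gamma(v_1/2)\,\Gamma(v_2/2)} \cdot \frac{f^{v_1/2 - 1}}{(1 + v_1 f/v_2)^{(v_1+v_2)/2}}$ for $f > 0$, and $h(f) = 0$ otherwise. This is known as the F-distribution with $v_1$ and $v_2$ degrees of freedom.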
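F-Distribution
- Theorem 8.7: Writing $f_\alpha(v_1, v_2)$ for $f_\alpha$ with $v_1$ and $v_2$ degrees of freedom, we obtain $f_{1-\alpha}(v_1, v_2) = \frac{1}{f_\alpha(v_2, v_1)}$.
- E.g., the f-value with 6 and 10 degrees of freedom, leaving an area of 0.95 to the right, is $f_{0.95}(6, 10) = \frac{1}{f_{0.05}(10, 6)} = \frac{1}{4.06} = 0.246$.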
F-Distribution
- Table lookup: $f_{0.05}(10, 6) = 4.06$
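F-Distribution with Two Sample Variances
- Suppose that random samples of size $n_1$ and $n_2$ are selected from two normal populations with variances $\sigma_1^2$ and $\sigma_2^2$. Let $\chi_1^2 = \frac{(n_1-1)S_1^2}{\sigma_1^2}$ and $\chi_2^2 = \frac{(n_2-1)S_2^2}{\sigma_2^2}$ be random variables having chi-squared distributions with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of freedom. Using Theorem 8.6, we obtain the following result.
- Theorem 8.8: If $S_1^2$ and $S_2^2$ are the variances of independent random samples of size $n_1$ and $n_2$ taken from normal populations with variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then $F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$ has an F-distribution with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$ degrees of freedom.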
F-Distribution
- Suppose we wish to determine whether the population means are equivalent.
- The normal distribution applies nicely to the two-sample situation.
- But what about three samples?
- The F-distribution is called the variance ratio distribution.
- Whether the differences among sample averages could have occurred by chance depends on the variability within samples, as quantified by $s_A^2$, $s_B^2$, and $s_C^2$.
- The notion of the important components of variability is best seen through some simple graphics.
Analysis of Variance with F-Distribution
- Two key sources of variability
- Variability within samples
- Variability between samples
- If the variability within samples is considerably
larger than the variability between samples,
there will be considerable overlap in the sample
data and a signal that the data could all have
come from a common distribution.
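The comparison of the two sources of variability can be sketched as a ratio of mean squares, in the style of a one-way analysis of variance. This is an illustration rather than the slides' own procedure: the function name `variance_ratio` and both data sets are hypothetical, and the sketch assumes independent samples from normal populations with a common variance.

```python
from statistics import mean, variance

def variance_ratio(samples):
    """Between-sample mean square divided by within-sample mean square."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total

    # Variability between samples: spread of the sample means about the grand mean
    ms_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples) / (k - 1)

    # Variability within samples: pooled sample variances
    ms_within = sum((len(s) - 1) * variance(s) for s in samples) / (n_total - k)

    return ms_between / ms_within

# Hypothetical data: heavily overlapping samples (large within-sample spread)
overlapping = [[1, 5, 9], [2, 6, 10], [3, 7, 11]]
# Hypothetical data: well-separated samples (small within-sample spread)
separated = [[1, 2, 3], [11, 12, 13], [21, 22, 23]]

print(variance_ratio(overlapping))   # small ratio
print(variance_ratio(separated))     # large ratio
```

A small ratio matches the situation described above, where the samples overlap and could plausibly share a common distribution; a large ratio points to real differences among the means.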
Exercise
- 1, 14, 17, 29, 41, 51, 59