Title: The Normal
Chapter 7 The Normal Probability Distribution
Section 7.1 Properties of the Normal Distribution
Recall that a continuous random variable is a random
variable with an infinite, uncountable number of possible
values. To find probabilities for continuous random
variables, we do not use probability distribution
functions (as we did for discrete random
variables). Instead, we use probability
density functions (pdfs).
Probability Density Function
A probability density function is an equation
used to compute probabilities of continuous
random variables. It must satisfy the following
two properties:
- The area under the graph of the equation over all
possible values of the random variable must equal
one.
- The graph of the equation must be greater than or
equal to zero for all possible values of the
random variable. That is, the graph of the
equation must lie on or above the horizontal axis
for all possible values of the random variable.
[Figure: example graphs, panels labeled Discrete and Continuous]
The area under the graph of a density function
over some interval represents the probability of
observing a value of the random variable in that
interval.
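The two density properties and the "area equals probability" idea can be checked numerically. The following is a minimal sketch, assuming a normal curve with mean 0 and standard deviation 1 as the example density (a choice made here for illustration); the helper name `area_under_pdf` is hypothetical, and the trapezoid rule is only one of several ways to approximate an area.

```python
from statistics import NormalDist

def area_under_pdf(pdf, a, b, steps=10_000):
    """Approximate the area under pdf between a and b with the trapezoid rule."""
    width = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * width)
    return total * width

# Assumed example density: normal curve with mean 0 and standard deviation 1.
density = NormalDist(mu=0, sigma=1).pdf

# The interval [-10, 10] stands in for "all possible values" (tails beyond it are negligible).
total_area = area_under_pdf(density, -10, 10)

# The probability of observing a value between 0 and 1 is the area over that interval.
prob_0_to_1 = area_under_pdf(density, 0, 1)

print(round(total_area, 4))
print(round(prob_0_to_1, 4))
```

The first printed value should be essentially 1 (property 1), and the second is the probability assigned to the interval from 0 to 1.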
A continuous random variable is normally
distributed, or has a normal probability
distribution, if its relative frequency histogram
has the shape of a normal
curve (bell-shaped and symmetric).
Properties of the Normal Density Curve
- It is symmetric about its mean, $\mu$.
- The highest point occurs at $x = \mu$.
- The area under the curve is one.
- The area under the curve to the right of $\mu$
equals the area under the curve to the left of $\mu$,
which equals ½.
- As x increases without bound (gets larger and
larger), the graph approaches, but never equals,
zero. As x decreases without bound (gets larger
and larger in the negative direction), the graph
approaches, but never equals, zero.
- It satisfies the Empirical Rule.
[Figure: normal curve with area ½ on each side of the mean; horizontal axis x]
The Empirical Rule
- Approximately 68% of the area under the normal
curve is between $x = \mu - \sigma$ and $x = \mu + \sigma$.
- Approximately 95% of the area under the normal
curve is between $x = \mu - 2\sigma$ and $x = \mu + 2\sigma$.
- Approximately 99.7% of the area under the normal
curve is between $x = \mu - 3\sigma$ and $x = \mu + 3\sigma$.
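The Empirical Rule percentages can be reproduced with Python's standard library. This is a small sketch, assuming a hypothetical normal distribution with mean 100 and standard deviation 15; any other mean and positive standard deviation should give the same three areas.

```python
from statistics import NormalDist

# Hypothetical normal distribution; the values of mu and sigma are arbitrary.
mu, sigma = 100, 15
X = NormalDist(mu, sigma)

def area_within(k):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    return X.cdf(mu + k * sigma) - X.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```

The printed areas round to 68%, 95%, and 99.7%, which is why the rule is stated with the word "approximately".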
See p. 42
The Area under a Normal Curve
Suppose a random variable X is normally
distributed with mean $\mu$ and standard
deviation $\sigma$. Notation: $X \sim N(\mu, \sigma)$. The area
under the normal curve for any range of values of
the random variable X represents either
- the proportion of the population with the
characteristics described by the range, or
- the probability that a randomly selected
individual from the population will have the
characteristics described by the range.
e.g., 30% of ----- are between a and b
e.g., X has a 0.3 probability of being between a and b
Standardizing a Normal Random Variable
Suppose the random variable X is normally distributed
with mean $\mu$ and standard deviation $\sigma$. Then the
random variable
$$Z = \frac{X - \mu}{\sigma}$$
is normally distributed
with mean $\mu = 0$ and standard deviation $\sigma = 1$.
The random variable Z is said to have the
standard normal distribution. Notation: $Z \sim N(0, 1)$
See p. 43
For any given x, we can calculate the associated
Z-score using the formula above.
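As a small illustration of standardizing, the sketch below computes $Z = (X - \mu)/\sigma$ for a few values. The function name `z_score` and the numbers (a mean of 50 and a standard deviation of 10) are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def z_score(x, mu, sigma):
    """Standardize x: the number of standard deviations x lies from the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical example: X ~ N(50, 10)
print(z_score(65, 50, 10))   # 1.5
print(z_score(50, 50, 10))   # 0.0
print(z_score(40, 50, 10))   # -1.0
```

A positive Z-score means the value lies above the mean, a negative one means it lies below, and a value equal to the mean has a Z-score of 0.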
Section 7.2 The Standard Normal Distribution
$Z \sim N(0, 1)$
Properties of the Standard Normal Curve
- It is symmetric about its mean, $\mu = 0$.
- The highest point occurs at $\mu = 0$.
- The area under the curve is one. This
characteristic is required in order to satisfy
the requirement that the sum of all probabilities
in a legitimate probability distribution equals
1.
- The area under the curve to the right of $\mu = 0$
equals the area under the curve to the left of
$\mu = 0$, which equals ½.
- As z increases without bound (gets larger and
larger), the graph approaches, but never equals,
zero. As z decreases without bound (gets larger
and larger in the negative direction), the graph
approaches, but never equals, zero.
- It satisfies the Empirical Rule.
[Figure: standard normal curve with area ½ on each side of 0; horizontal axis z]
The Empirical Rule
- Approximately 0.68 = 68% of the area under the
standard normal curve is between -1 and 1.
- Approximately 0.95 = 95% of the area under the
standard normal curve is between -2 and 2.
- Approximately 0.997 = 99.7% of the area under
the standard normal curve is between -3 and 3.
Notation for the Probability of a Standard Normal
Random Variable
- $P(a < Z < b)$ represents the
probability that a standard normal random variable is
between a and b.
- $P(Z > a)$ represents
the probability that a standard normal random variable
is greater than a.
- $P(Z < b)$ represents
the probability that a standard normal random variable
is less than b.
The notation $z_\alpha$ (pronounced "z sub alpha") is the
Z-score such that the area under the standard
normal curve to the right of $z_\alpha$ is $\alpha$.
[Figure: standard normal curve with area $\alpha$ shaded to the right of $z_\alpha$]
Table II at the back of the text is referred to
as a Z-table. It tabulates the area to the left
of a given Z-score.
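Z = -1.33: $P(Z < -1.33) = 0.0918$, so the area to the right is $\alpha = 1 - 0.0918 = 0.9082$
Z = 1.33: $P(Z < 1.33) = 0.9082$, so the area to the right is $\alpha = 1 - 0.9082 = 0.0918$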
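The Z-table lookups can be mirrored in code. The sketch below uses `statistics.NormalDist` from the Python standard library as a stand-in for Table II; the helper names `area_left`, `area_right`, and `z_alpha` are hypothetical, and 0.05 is just an example value of $\alpha$.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_left(z):
    """Area under the standard normal curve to the left of z (what a Z-table lists)."""
    return Z.cdf(z)

def area_right(z):
    """Area to the right of z, obtained as 1 minus the area to the left."""
    return 1 - Z.cdf(z)

def z_alpha(alpha):
    """Z-score with area alpha to its right."""
    return Z.inv_cdf(1 - alpha)

print(round(area_left(-1.33), 4))   # 0.0918
print(round(area_right(-1.33), 4))  # 0.9082
print(round(area_left(1.33), 4))    # 0.9082
print(round(area_right(1.33), 4))   # 0.0918
print(round(z_alpha(0.05), 3))      # 1.645
```

By symmetry, the area to the left of -1.33 equals the area to the right of 1.33, which is why the same two numbers appear in both lookups.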
Normal Curves on Board
Section 7.3 Applications of the Normal Distribution
Finding the Area under any Normal Curve
1. Draw a normal curve with the desired area
shaded.
2. Convert the values of X to Z-scores, using
$Z = \frac{X - \mu}{\sigma}$.
3. Draw a standard normal curve with the desired
area shaded.
4. Find the area under the
standard normal curve. This is the area under
the normal curve drawn in Step 1.
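Steps 2 and 4 of this procedure can be sketched in code (the drawing steps are left to paper). The example assumes a hypothetical variable with mean 70 and standard deviation 5 and asks for the area between 65 and 80; the function name `normal_area` is also an illustrative choice.

```python
from statistics import NormalDist

def normal_area(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma), found by converting to Z-scores."""
    z_a = (a - mu) / sigma          # Step 2: convert each X value to a Z-score
    z_b = (b - mu) / sigma
    standard = NormalDist()         # standard normal curve, N(0, 1)
    return standard.cdf(z_b) - standard.cdf(z_a)   # Step 4: area under N(0, 1)

# Hypothetical example: X ~ N(70, 5), area between 65 and 80
p = normal_area(65, 80, 70, 5)
print(round(p, 4))
```

Here the Z-scores are -1 and 2, so the result is the area under the standard normal curve between those two values.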
Procedure for Finding the Value of a Normal
Random Variable Corresponding to a Specified
Proportion or Probability
1. Draw a standard normal curve with the area
corresponding to the proportion or probability
shaded.
2. Use the Z-table to find the Z-score that
corresponds to the shaded area.
3. Obtain the normal value from the fact that
$X = \mu + Z\sigma$.
We will take a look at some examples.
[Figure: standard normal curve with a shaded area of given probability; horizontal axis z]
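The reverse procedure (from an area back to a value of X) can be sketched the same way. The example assumes a hypothetical variable with mean 70 and standard deviation 5 and finds the value with 90% of the area to its left; `inv_cdf` plays the role of reading the Z-table backwards, and the function name is an illustrative choice.

```python
from statistics import NormalDist

def value_for_left_area(area, mu, sigma):
    """Value x of X ~ N(mu, sigma) with the given area to its left."""
    z = NormalDist().inv_cdf(area)   # Step 2: Z-score for the shaded area
    return mu + z * sigma            # Step 3: X = mu + Z * sigma

# Hypothetical example: X ~ N(70, 5), 90th percentile
x90 = value_for_left_area(0.90, 70, 5)
print(round(x90, 2))
```

If the problem gives an area to the right instead, subtract it from 1 first, because the table (and `inv_cdf`) work with areas to the left.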
Normal Curves on Board
Section 7.4 Assessing Normality
Suppose that we obtain a simple random sample
from a population whose distribution is unknown.
Many of the statistical tests that we perform on
small data sets (sample size less than 30)
require that the population from which the sample
is drawn be normally distributed. So, how do we
know if a data set comes from a normal
distribution?
We will use a normal probability plot to answer
the above question. This plot is also called a
normal quantile plot. A normal probability plot
plots observed data versus normal scores. A
normal score is the expected Z-score of the data
value if the distribution of the random variable
is normal. If sample data are taken from a
population that is normally distributed, a normal
probability plot of the actual values versus the
expected Z-scores will be approximately linear.
(Fat pencil test)
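To make the idea of a normal score concrete, here is a rough sketch that pairs sorted data with expected Z-scores. It assumes Blom's approximation, $(i - 0.375)/(n + 0.25)$, for the area to the left of the i-th smallest value; that formula is an assumption made here, not necessarily what the book or JMP uses. The data values are made up, and the correlation is only a numeric stand-in for judging linearity by eye.

```python
from statistics import NormalDist, mean

def normal_scores(n):
    """Expected Z-scores for n ordered values (Blom's approximation, an assumption)."""
    standard = NormalDist()
    return [standard.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def correlation(xs, ys):
    """Pearson correlation, used here as a rough measure of how linear the plot is."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical sample, made up for illustration
data = sorted([4.1, 5.3, 4.8, 6.0, 5.1, 4.6, 5.6, 5.0])
scores = normal_scores(len(data))

# Each pair is one point on the normal probability plot.
for x, z in zip(data, scores):
    print(x, round(z, 2))

r = correlation(data, scores)
print(round(r, 3))
```

A correlation close to 1 corresponds to points that would fit under a "fat pencil"; clearly curved patterns give lower values.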
The book explains in detail how to draw
a normal probability plot manually. We will not do this
by hand. We will use JMP to draw these plots for
us.
How to Obtain a Normal Probability Plot from JMP
1. Click on Analyze and then Distribution.
2. Select a column heading for Y columns. Click
OK. You will obtain a histogram and other
output.
3. On the output screen, click the red down
triangle and select Normal Quantile
Plot. This will yield a plot that can be used
to test for normality.
Example: Use the following normal probability
plots to assess whether the sample data could
have come from a population that is normally
distributed.
[Figures: normal probability plots, each labeled either Normal or Evidence NOT Normal]
Section 7.5 Sampling Distributions and the
Central Limit Theorem
In general, a sampling distribution of a
statistic is a probability distribution (such as
the normal distribution) for all possible values
of the statistic computed from a sample of size
n. The sampling distribution of the sample mean
is a probability distribution of all possible
values of the random variable $\bar{x}$ computed from a
sample of size n from a population with mean $\mu$
and standard deviation $\sigma$.
The idea behind obtaining the sampling
distribution of the mean is as follows:
1. Obtain a simple random sample of size n.
2. Compute the sample mean.
3. Assuming that we are sampling from a finite
population, repeat Steps 1 and 2 until all
simple random samples of size n have been
obtained.
If the population size is N = 100 and the sample size is
n = 5, what is the number of possible samples of size
5?
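One way to answer the counting question is with combinations. The sketch assumes that the order within a sample does not matter and that sampling is without replacement, as is usual for simple random samples.

```python
from math import comb

N, n = 100, 5  # population size and sample size from the question

# Number of distinct samples of size n, ignoring order, without replacement
number_of_samples = comb(N, n)
print(number_of_samples)  # 75287520
```

With more than 75 million possible samples, listing every sample mean by hand is clearly impractical, which motivates the general results that follow.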
Since each sample of size n will have an observed
value of $\bar{x}$, and not all observed values will
be exactly the same, $\bar{x}$ is a random variable.
Since $\bar{x}$ is a random variable, we can ask
the following questions: What is E($\bar{x}$)?
What is Var($\bar{x}$)? What is the
distribution of $\bar{x}$?
The Mean and Standard Deviation of the Sampling
Distribution of $\bar{x}$
Suppose that a simple
random sample of size n is drawn from a
population with mean $\mu$ and standard deviation $\sigma$.
The sampling distribution of $\bar{x}$ will have
mean $\mu_{\bar{x}} = \mu$ and standard deviation
$\sigma_{\bar{x}} = \sigma/\sqrt{n}$. The
standard deviation of the sampling distribution
of $\bar{x}$, $\sigma_{\bar{x}}$, is called the standard error of
the mean.
Now we have answered the questions:
What is E($\bar{x}$)? The population mean: $E(\bar{x}) = \mu$.
What is Var($\bar{x}$)? The population variance divided by the sample size: $\mathrm{Var}(\bar{x}) = \sigma^2/n$.
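These two results can be checked by brute force on a tiny example. The sketch below assumes a made-up population of six values and lists every ordered sample of size 2 drawn with replacement, so that the draws are independent; that sampling scheme is an assumption made for this illustration.

```python
from itertools import product
from statistics import fmean, pvariance

population = [1, 2, 3, 4, 5, 6]   # hypothetical population
n = 2                             # sample size

mu = fmean(population)            # population mean
sigma2 = pvariance(population)    # population variance

# Every ordered sample of size n, drawn with replacement (independent draws)
sample_means = [fmean(sample) for sample in product(population, repeat=n)]

mean_of_means = fmean(sample_means)
var_of_means = pvariance(sample_means)

print(mu, mean_of_means)          # the two means agree
print(sigma2 / n, var_of_means)   # variance of the sample mean equals sigma^2 / n
```

Taking the square root of the last value gives the standard error of the mean, $\sigma/\sqrt{n}$, for this small population.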
What about the distribution of $\bar{x}$?
What happens if the distribution of X is not
normal?
CENTRAL LIMIT THEOREM
Suppose a random variable
X has population mean $\mu$ and standard deviation
$\sigma$, and that a random sample of size n is taken
from this population. Then the sampling
distribution of $\bar{x}$
becomes approximately normal as the sample size
n increases. The mean of the distribution is
$\mu_{\bar{x}} = \mu$ and the standard deviation is
$\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
Let us visualize this.
[Figure: histograms of sample means from 100 random draws, for n = 5, n = 25, and n = 100]
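A simulation gives a rough feel for the theorem. The sketch below assumes a skewed population (exponential with mean 1 and standard deviation 1) and 2000 repeated samples per sample size; both choices, the fixed seed, and the function name are assumptions made here, not the settings behind the figure.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(7)  # fixed seed so the run is repeatable

def simulate_sample_means(n, draws=2000):
    """Means of `draws` samples of size n from an exponential population (mean 1, sd 1)."""
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(draws)]

results = {}
for n in (5, 25, 100):
    means = simulate_sample_means(n)
    results[n] = (fmean(means), pstdev(means))
    # Compare the simulated spread with the theoretical standard error sigma / sqrt(n)
    print(n, round(results[n][0], 3), round(results[n][1], 3), round(1 / n ** 0.5, 3))
```

The simulated means stay near 1 while their spread shrinks roughly like $1/\sqrt{n}$; a histogram of each list of means would look more bell-shaped as n grows.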
When is n large enough to assume normality?
- The size of n depends on how close to normal the
original population is.
- If the population is normal, n = 1 is large
enough.
- As a rule of thumb, we will use $n \ge 30$ as
sufficiently large.
Hence, when $n \ge 30$, the sampling distribution of
$\bar{x}$ will be approximately normal.
Example: The length of human pregnancies is
approximately normally distributed with a mean of
266 days and a standard deviation of 16 days.
1. What is the probability that a randomly selected
pregnancy lasts less than 260 days?
2. What is the probability that a random sample of
20 pregnancies has a mean gestation period of
260 days or less?
3. What is the probability that a random sample of
50 pregnancies has a mean gestation period of
260 days or less?
4. What might you conclude if a random sample of 50
pregnancies resulted in a mean gestation period
of 260 days or less?
1. What is the probability that a randomly selected
pregnancy lasts less than 260 days?
2. What is the probability that a random sample of
20 pregnancies has a mean gestation period of
260 days or less?
3. What is the probability that a random sample
of 50 pregnancies has a mean gestation period of
260 days or less?
4. What might you
conclude if a random sample of 50 pregnancies
resulted in a mean gestation period of 260 days
or less?
We might conclude that the population from which
the 50 pregnancies were drawn has a mean
gestation period less than 266 days. (Only a 0.4%
chance)
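The probabilities in this example can be computed directly. The sketch below takes the stated mean of 266 days and standard deviation of 16 days and uses the standard error $\sigma/\sqrt{n}$ for sample means; the function name is a hypothetical choice, and results read from a Z-table may differ slightly because Z-scores are rounded to two decimals there.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 266, 16   # mean and standard deviation of pregnancy length, in days

def prob_mean_at_most(x, n):
    """P(sample mean <= x) for samples of size n; n = 1 is a single pregnancy."""
    standard_error = sigma / sqrt(n)
    return NormalDist(mu, standard_error).cdf(x)

p1 = prob_mean_at_most(260, 1)     # question 1: one pregnancy
p20 = prob_mean_at_most(260, 20)   # question 2: mean of 20 pregnancies
p50 = prob_mean_at_most(260, 50)   # question 3: mean of 50 pregnancies

print(round(p1, 4), round(p20, 4), round(p50, 4))
```

The probability shrinks as n grows because the sample mean varies less than a single observation; the last value is about 0.004, the 0.4% chance behind the conclusion for question 4.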
Section 7.6 The Normal Approximation to the
Binomial Probability Distribution
Criteria for a Binomial Probability Experiment
A probability experiment is said to be a binomial
experiment if all of the following are true:
- The experiment is performed n independent
times. Each repetition of the experiment is
called a trial. Independence means that the outcome of one
trial will not affect the outcome of the other
trials.
- For each trial, there are two mutually exclusive
outcomes, success or failure.
- The probability of success, p, is the same for
each trial of the experiment.
When we were dealing with probabilities for the
binomial distribution, we only set up an
expression, because the computation is mathematically
tedious. However, we now have a way to
approximate those probabilities.
See p. 101
As the number of trials n in a binomial
experiment increases, the probability
distribution of the random variable X becomes
more nearly symmetric and bell-shaped. As a
general rule of thumb, if $np > 5$ and $nq > 5$
(where $q = 1 - p$), then
the probability distribution will be
approximately symmetric and bell-shaped
(or, as in the textbook, $npq \ge 10$).
The Normal Approximation to the Binomial
Probability Distribution
If $np > 5$ and $nq > 5$,
then the binomial random variable X is
approximately normally distributed with mean
$\mu_X = np$ and standard deviation $\sigma_X = \sqrt{npq}$.
What is the major difference between a binomial
random variable and a normal random variable? A
binomial random variable is a discrete random
variable, and a normal random variable is a
continuous random variable.
Therefore, because we are using a continuous
density function to approximate a discrete
probability, we must apply a correction for
continuity. The continuity correction says that
we add 0.5 to and subtract 0.5 from each value of x,
so that the single value x is represented by the
interval from x - 0.5 to x + 0.5.
Continuity Correction
$P(X = x) \approx P(x - 0.5 < X < x + 0.5)$
[Figure: area under the normal curve over the interval centered at x]
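A quick numerical comparison shows what the continuity correction buys. The sketch below assumes hypothetical values n = 50, p = 0.4, and x = 20 (so np and nq are both well above 5) and compares the exact binomial probability of exactly x successes with the normal area from x - 0.5 to x + 0.5; the function names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_pmf(x, n, p):
    """Exact P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def normal_approx_pmf(x, n, p):
    """Approximate P(X = x) by the normal area between x - 0.5 and x + 0.5."""
    approx = NormalDist(n * p, sqrt(n * p * (1 - p)))   # mean np, sd sqrt(npq)
    return approx.cdf(x + 0.5) - approx.cdf(x - 0.5)

# Hypothetical values chosen so that np and nq are both well above 5
n, p, x = 50, 0.4, 20
exact = binomial_pmf(x, n, p)
approx = normal_approx_pmf(x, n, p)

print(round(exact, 4), round(approx, 4))
```

Without the correction, the normal curve would assign probability 0 to the single point x, so widening the point to an interval of width 1 is what makes the approximation usable.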
Exact vs. Approximation
If $n = 200$ and $p = 0.1$, then Exact =
0.6146 and Approx = 0.6256.
Example: Suppose a softball player safely
reaches base 45% of the time. Assuming at-bats
are independent events, use the normal
approximation to the binomial to approximate the
probability that, in the next 100 at-bats,
$X \sim \mathrm{Bin}(100, 0.45)$:
1. The player reaches base safely exactly 50 times.
2. The player reaches base safely 60 or more times.
3. The player reaches base safely 50 or fewer times.
4. The player reaches base safely between 60 and 90
times, inclusive.
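1. The player reaches base safely exactly 50 times.
[Figure: normal curve with x-axis marks at 45 and 50; z-axis marks at 0, 0.91, and 1.11]
2. The player reaches base safely 60 or more times.
[Figure: normal curve with x-axis marks at 45 and 60; z-axis marks at 0 and 2.92]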
3. The player reaches base safely 50 or fewer
times.
4. The player reaches base safely between 60 and
90 times, inclusive.
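The four softball probabilities can be approximated as follows. This sketch applies the normal approximation with the continuity correction to the stated n = 100 and p = 0.45; the variable names are illustrative, and values read from a Z-table may differ slightly because Z-scores are rounded there.

```python
from math import sqrt
from statistics import NormalDist

n, p = 100, 0.45
# Normal approximation: mean np = 45, standard deviation sqrt(npq)
approx = NormalDist(n * p, sqrt(n * p * (1 - p)))

# Each whole number x is widened to the interval from x - 0.5 to x + 0.5.
exactly_50 = approx.cdf(50.5) - approx.cdf(49.5)      # 1. exactly 50 times
at_least_60 = 1 - approx.cdf(59.5)                    # 2. 60 or more times
at_most_50 = approx.cdf(50.5)                         # 3. 50 or fewer times
between_60_90 = approx.cdf(90.5) - approx.cdf(59.5)   # 4. 60 to 90, inclusive

for value in (exactly_50, at_least_60, at_most_50, between_60_90):
    print(round(value, 4))
```

Parts 2 and 4 come out almost identical because the area beyond 90.5 is negligible for a curve centered at 45.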