Title: Normal Distribution
1. Normal Distribution
- Recall how we describe a distribution of quantitative (continuous) data:
- plot the data (stemplot or histogram)
- look for the overall pattern (shape, peaks, gaps) and departures from it (possible outliers)
- calculate appropriate numerical measures of center and spread (5-number summary and/or mean and s.d.)
- then we may ask "can the distribution be described by a specific model?" (one of the more common models for symmetric, single-peaked distributions is the normal distribution having a certain mean and standard deviation)
- can we imagine a density curve fitting fairly closely over the histogram of the data?
- a density curve is a curve that is always on or above the horizontal axis (height ≥ 0) and whose total area under the curve is 1
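The two defining properties of a density curve can be checked numerically. The sketch below is only an illustration: it assumes a normal curve with mean 0 and s.d. 1 (an arbitrary choice, not from the text) and uses a simple midpoint-rule helper, `area_under_curve`, which is a name made up for this example.

```python
from statistics import NormalDist


def area_under_curve(density, lo, hi, steps=100_000):
    """Approximate the area under `density` on [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(density(lo + (i + 0.5) * width) for i in range(steps)) * width


# Illustrative normal curve; the mean and s.d. here are assumptions.
curve = NormalDist(mu=0.0, sigma=1.0)

# Integrate over mean +/- 10 s.d.; the area outside this range is negligible.
total_area = area_under_curve(curve.pdf, -10.0, 10.0)

# Smallest height of the curve on a grid from -10 to 10 in steps of 0.1.
lowest_height = min(curve.pdf(x / 10) for x in range(-100, 101))

print(f"total area    = {total_area:.6f}")
print(f"lowest height = {lowest_height:.3e}")
```

The total area should come out very close to 1, and the lowest height should never be negative.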
2.
- An important property of a density curve is that areas under the curve correspond to relative frequencies (see Figures 1.25a and 1.25b below).
- From the figures: relative frequency = 287/947 = .303; area = .293.
- Note that the relative frequency of vocabulary scores ≤ 6 is roughly equal to the area under the density curve to the left of 6.
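A rough way to see the same idea without the vocabulary data is to simulate it. This sketch does not use the data behind Figure 1.25: the model N(50, 10), the sample size, the seed, and the cutoff are all hypothetical values picked for illustration.

```python
import random
from statistics import NormalDist

# Hypothetical model and simulated sample (not the vocabulary scores).
model = NormalDist(mu=50.0, sigma=10.0)
rng = random.Random(1)
sample = [rng.gauss(model.mean, model.stdev) for _ in range(10_000)]

cutoff = 45.0

# Relative frequency: share of the simulated observations at or below the cutoff.
rel_freq = sum(x <= cutoff for x in sample) / len(sample)

# Area under the density curve to the left of the cutoff.
area = model.cdf(cutoff)

print(f"relative frequency = {rel_freq:.3f}")
print(f"area under curve   = {area:.3f}")
```

The two numbers should be close but not identical, just as .303 and .293 are close in the figures.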
3.
- We can describe the shape, center and spread of a density curve in the same way we describe data. For example, the median of a density curve is the equal-areas point: the point on the horizontal axis that divides the area under the density curve into two equal parts (.5 each). The mean of the density curve is the balance point: the point on the horizontal axis where the curve would balance if it were made of a solid material. (See Figures 1.26b and 1.27 below.)
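The equal-areas point and the balance point can be computed for any density curve. The sketch below assumes a right-skewed exponential curve, chosen only as an example (it is not one of the curves in the figures); `density` and `integrate` are helper names made up here.

```python
import math


def density(x, rate=1.0):
    """Right-skewed example density (exponential); an illustrative assumption."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def integrate(f, lo, hi, steps=200_000):
    """Midpoint-rule approximation of the area under f on [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


UPPER = 50.0  # the area beyond this point is negligible for rate = 1

# Mean = balance point = integral of x * density(x).
mean = integrate(lambda x: x * density(x), 0.0, UPPER)

# Median = equal-areas point: bisect until the area to the left is 0.5.
lo, hi = 0.0, UPPER
for _ in range(40):
    mid = (lo + hi) / 2
    if integrate(density, 0.0, mid, steps=20_000) < 0.5:
        lo = mid
    else:
        hi = mid
median = (lo + hi) / 2

print(f"median (equal-areas point) = {median:.4f}")
print(f"mean (balance point)       = {mean:.4f}")
```

For this skewed curve the mean lands to the right of the median, pulled toward the long tail. For a symmetric curve such as the normal, the two points coincide.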
4.
- For a normal density curve we see the characteristic bell-shaped, symmetric curve with a single peak (at the mean value μ), spread out according to the standard deviation (σ). See Figure 1.28 for a picture of μ and σ.
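5.
- The 68-95-99.7 Rule describes the relationship between μ and σ: in any normal distribution, about 68% of the observations fall within σ of μ, about 95% within 2σ, and about 99.7% within 3σ. See Figure 1.29. Go over Examples 1.25-1.26 on pages 59-61.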
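The three percentages in the rule can be checked with software. This is a minimal sketch using Python's standard-library `NormalDist`; it works with the standard normal curve, since the areas within k standard deviations of the mean are the same for every normal distribution.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, s.d. 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
areas = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within {k} s.d. of the mean: {area:.4f}")
# within 1 s.d. of the mean: 0.6827
# within 2 s.d. of the mean: 0.9545
# within 3 s.d. of the mean: 0.9973
```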
6.
- How many different normal curves are there? Answer: one for every combination of values of μ and σ, but they are all alike except for their μ and σ. We take advantage of this with a process called standardization, which reduces all normal distributions to one we call the Standard Normal Distribution.
- Denote a normal distribution with mean μ and standard deviation σ by N(μ,σ). Let X be the variable whose distribution is N(μ,σ). We may standardize any value of X by subtracting μ and dividing by σ. This rewrites any normal variable as a variable called Z, whose values represent the number of standard deviations X is away from its mean. The standardized value is sometimes called a z-score.
- If X is N(μ,σ), then Z is N(0,1), where Z = (X - μ)/σ.
- We can find areas under Z from Table A, and these areas equal the corresponding areas under X.
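Standardization can be sketched in a few lines. The values μ = 100, σ = 15, and x = 120 below are hypothetical, chosen only to show that the area to the left of x under N(μ,σ) matches the area to the left of its z-score under N(0,1). Software stands in for Table A here.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean mu."""
    return (x - mu) / sigma


# Hypothetical values for illustration only.
mu, sigma = 100.0, 15.0
x = 120.0

z = z_score(x, mu, sigma)

area_under_x = NormalDist(mu, sigma).cdf(x)  # area to the left of x under N(mu, sigma)
area_under_z = NormalDist(0.0, 1.0).cdf(z)   # area to the left of z under N(0, 1)

print(f"z-score      = {z:.4f}")
print(f"area under X = {area_under_x:.4f}")
print(f"area under Z = {area_under_z:.4f}")
```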
7.
- Consider Example 1.25. Let X = height (inches) of a young woman aged 18-24 years. Then X is N(64.5", 2.5").
- What proportion of these women's heights are between 62" and 67"?
- What proportion are above 67"? Below 72"?
- What proportion of these women's heights are between 61" and 66"? NOTE: This cannot be solved by the 68-95-99.7 rule.
- What proportion are below 64.5"? Below 68"?
- What proportion are between 58" and 60"?
- Etc., etc., etc.
- What height represents the 90th percentile for women of this age?
- All problems of this type are solvable by sketching the picture, standardizing, and doing the appropriate arithmetic to get the final answer. The last question above is what I call a "backwards problem", since you're solving for an X value while knowing an area.
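The questions above can also be answered with software instead of Table A. The sketch below uses Python's standard-library `NormalDist` with the N(64.5, 2.5) model from the slide; `proportion_between` is a helper name made up for this example. It is a check on hand work, not a replacement for sketching the picture.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)  # X is N(64.5", 2.5")


def proportion_between(dist, low, high):
    """Area under the normal curve between low and high."""
    return dist.cdf(high) - dist.cdf(low)


between_62_67 = proportion_between(heights, 62, 67)
above_67 = 1 - heights.cdf(67)
below_72 = heights.cdf(72)
between_61_66 = proportion_between(heights, 61, 66)
below_64_5 = heights.cdf(64.5)
below_68 = heights.cdf(68)
between_58_60 = proportion_between(heights, 58, 60)

# "Backwards problem": the area (0.90) is known, solve for the X value.
percentile_90 = heights.inv_cdf(0.90)

print(f'between 62" and 67": {between_62_67:.4f}')
print(f'above 67":           {above_67:.4f}')
print(f'below 72":           {below_72:.4f}')
print(f'between 61" and 66": {between_61_66:.4f}')
print(f'below 64.5":         {below_64_5:.4f}')
print(f'below 68":           {below_68:.4f}')
print(f'between 58" and 60": {between_58_60:.4f}')
print(f'90th percentile:     {percentile_90:.2f}"')
```

Answers worked from a printed table may differ slightly in the last digit because of rounding.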
8.
- We've seen examples of data that seem to fit the normal model, and examples of data that don't seem to fit. Because normality is an important property of data for specific types of analyses we'll do later, it is important to be able to decide whether a dataset is normal or not. A histogram is one way, but a better graphical method is the normal quantile plot.
- A simple description of how to draw a normal quantile plot is given on page 68, but for us a normal quantile plot is always going to be drawn by software. It will allow us to assess the normality of our data in the following sense:
- if the data points fall along the straight line (and within the bands on the plot), then the data can be treated as normal. Systematic deviations from the line indicate non-normal distributions; outliers often appear as points far away from the pattern of the points.
- when the horizontal axis shows z-scores, the y-intercept of the line corresponds to the mean of the normal distribution and the slope of the line corresponds to the standard deviation of the normal distribution
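To see why the intercept and slope of the line estimate the mean and s.d., here is a rough sketch of the coordinates behind a normal quantile plot. Several things are assumptions: the data are simulated from N(110, 10) with a fixed seed, the plotting positions (i - 0.5)/n are one common convention (course software may use a different one), and the line is an ordinary least-squares fit.

```python
import random
from statistics import NormalDist, mean

# Simulated, hypothetical data: 500 draws from N(110, 10).
rng = random.Random(7)
data = sorted(rng.gauss(110.0, 10.0) for _ in range(500))

n = len(data)
std_normal = NormalDist()

# z-score expected for the i-th smallest observation, using (i - 0.5) / n
# as the plotting position (an assumed convention).
z_scores = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Least-squares line through the points (z-score, observed value).
z_bar, x_bar = mean(z_scores), mean(data)
slope = (
    sum((z - z_bar) * (x - x_bar) for z, x in zip(z_scores, data))
    / sum((z - z_bar) ** 2 for z in z_scores)
)
intercept = x_bar - slope * z_bar

print(f"y-intercept (estimates the mean) = {intercept:.2f}")
print(f"slope (estimates the s.d.)       = {slope:.2f}")
```

Plotting `data` against `z_scores` gives the normal quantile plot itself. For data that really are normal the points hug the fitted line.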
9. Normal quantile plot of CO2 (data in Table 1.6 on page 26)
Notice the systematic failure of the points to fall on the line, especially at the low end where the data are piled up. Also, note the outliers at the high end. Conclusion: Not normal.
10. Normal quantile plot of the IQ scores of 78 seventh-grade students (data in Table 1.9 on page 29)
Notice that the data points follow the line fairly well, though there is a slight curve at the low-middle, indicating more data than would be expected for a normal distribution. The y-intercept is around 110 (mean approx. 110) and the slope is around 10 (s.d. approx. 10). Conclusion: Normal.
11.
- Read section 1.3, paying careful attention to the examples (especially 1.25-1.32). Work through the examples yourself to make sure you understand how they are done!
- Work problems 1.108-1.110, 1.113 (applet), 1.114-1.117, 1.119, 1.120-1.139, 1.140-1.142, 1.143, 1.144, 1.148.
- Try some of the Chapter 1 exercises (p. 78ff). By the time of Test 1, be sure you've worked as many of the exercises in this chapter as you need to feel comfortable with the material.
- Don't forget the quizzes and homeworks on the StatsPortal.