Statistics 221 - PowerPoint PPT Presentation


Provided by: margaret1


1
Statistics 221

Chapter 6 - Part A
Continuous Probability Distributions

2
Continuous variables and their probabilities

Recall that the outcome of an experiment, a
random variable (x), can be classified as being
either discrete or continuous depending on the
type of data that an experiment is designed to
capture.
A discrete random variable is usually an integer
value and may assume either a finite number of
values or an infinite sequence of values.
Examples include number of children born or
number of customers arriving.
A continuous random variable can be any real
number in an interval or collection of intervals.
Examples include weights, distance, and time.

3
Probability distributions of discrete variables

Recall that in the last chapter we discussed
experiments that capture discrete outcomes (each
individual outcome being a random variable).
To obtain the probability of each outcome, we
used formulas (such as the binomial or Poisson
formulas) to calculate the probability
of each outcome.
After obtaining these probabilities, we created a
probability distribution.

4
Probability distributions of continuous variables

In this chapter, we do the same for
experiments that capture outcomes that are
classified as continuous variables, but we use a
different approach.
To determine the probabilities of continuous
variables, we must rely on the area =
probability premise.
The following example will compare methodologies
for calculating the probabilities of discrete
vs. continuous variables.

5
Example: Calculating the probability of
continuous variables

Assume that all students take a placement test
which has a maximum score of 10 and a minimum
score of 0.
Assume that hundreds of students have taken the
test over the years and based on historical data
and the assumption that the past is an indicator
of the future, we develop a frequency
distribution that shows the expected probability
that a randomly-selected student will get a
particular score.
That frequency distribution is on the next slide.

6
This is a probability distribution of a discrete
variable

The bar height expresses the probability of each
outcome (x) occurring. But it's not actually the
bar's height that expresses the probability; it's
the bar's area as a percentage of the total area
of all the bars.

7
The probability distribution of a continuous
variable

Now let's assume that test scores don't have to
be integer values but can be any real number
value from 0 to 10 (e.g., 1.23456, 6.667, 9.750,
etc.). In other words, score is now a continuous
variable instead of a discrete variable.
Now if we create a frequency distribution based
on a data set of scores, where each score can be
any real number that falls in the interval from 0
to 10, it might look like the image on the next
slide.

8
[Figure: histogram of AP Scores, with Scores from 0 to 10 on the horizontal axis]
All the bars adjacent to each other make this
frequency distribution look like a hump, which
we call the bell-shaped curve.
9
The probability distribution of a continuous
variable

Similar to a discrete probability distribution:
There is (in theory) a bar for each possible
score (each possible value of x).
The probability of each possible score is still
represented by the area of that score's bar as a
percentage of the total bar area.
When you plot all the bars on the chart, they
form the total bar area.
The total bar area = 1.0, meaning 100%.

10
The probability distribution of a continuous
variable

But in contrast to a discrete probability
distribution:
The top edge of the bar area looks like a smooth
curve instead of a jagged stair-step formation.
Because there is an infinite number of possible
scores, there is an infinite number of bars in
the bar area, so any one bar has a width of 0 (its
bar is really just a line).
Therefore, each bar's area is (theoretically) 0.
Since probability is represented by area, the
probability of getting any one specific score is
(theoretically) 0.

11
Calculating probabilities of continuous
variables

Finding the probability of getting any particular
x (e.g., test score) when x is a continuous
variable is accomplished by finding the area of
an interval under the curve line.
Let's say you want to find the probability of
getting a score of 5 on the test.
But we just learned that the probability of any
one specific score is 0. Therefore,
P(x = 5.0) = 0
Therefore, we must approximate P(x = 5) by
finding P(4.9 < x < 5.1).

12
[Figure: histogram of AP Scores, with Scores from 0 to 10 on the horizontal axis and the area between 4.9 and 5.1 shaded yellow]
We find the probability that 4.9 < x < 5.1 by
finding the percentage that the area in yellow
is of the total bar area.
The total bar area = 100% or 1.0. What percentage
is the yellow area of the total bar area? That's
the probability that 4.9 < x < 5.1.
13
How do we find this area?

If our probability distribution had a flat top,
it would be easy, because the area would be a
rectangle, and we could find the area by
multiplying the width times the height.
A probability distribution with a flat top all
the way across is called a uniform distribution:
every outcome value has the same chance of
occurrence (like rolling a single die).
Before we answer the question of what is
P(4.9 < x < 5.1), let's answer an easier question.

14
Let's say class length is a continuous variable
with a uniform distribution.
Every ending time between 50 and 52 minutes is
equally probable.
15
What's the probability that class will last
longer than 51.5 minutes?

To calculate probabilities of continuous
variables, we calculate the area of the bars as
a percentage of the total area under the curve
line.
Since the area from 51.5 to 52 is ¼ of the total
area, the probability that x > 51.5 is 25%.

P(x > 51.5) = width × height = .5 × .5 = .25
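The width-times-height calculation for the class-length example can be sketched in Python. This is only an illustration; the helper name uniform_prob is made up for this sketch and does not come from the slides.

```python
def uniform_prob(low, high, a, b):
    """P(a < x < b) when x is uniform on [low, high] (hypothetical helper)."""
    # Keep only the part of (a, b) that lies inside the distribution's range.
    a, b = max(a, low), min(b, high)
    if b <= a:
        return 0.0
    # A flat top means a constant height; total area (high - low) * height = 1.
    height = 1.0 / (high - low)
    # Rectangle area = width * height.
    return (b - a) * height


# Class length is uniform between 50 and 52 minutes.
p_longer = uniform_prob(50, 52, 51.5, 52)
print(p_longer)  # 0.25
```

The same helper answers any interval question for this distribution, for example uniform_prob(50, 52, 50, 51) for a class that ends within the first minute of the range.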
16
The normal distribution (does not have a flat
top)

But for many variables, the probability
distribution does not have a flat top but
instead looks like a bell curve.

A distribution that has a symmetric bell shape
is called a normal distribution.
17
Characteristics of the normal distribution

Many variables are known to have this shape of
distribution (heights, weights, test scores,
rainfall, etc.)
The mean is at the highest point of the curve.
The mean, median, and mode are equal.

18
Characteristics of the normal distribution

The distribution is symmetric: 50% of the
possible outcome values lie to the left of the
mean and 50% of the possible outcome values lie
to the right of the mean.
The tails extend to infinity in both directions
but never actually touch the horizontal axis.

19
Characteristics of the normal distribution

The standard deviation determines how wide the
curve is. A distribution curve with a low
standard deviation will be more pointed and
narrower than a distribution curve with a high
standard deviation; a high standard deviation
indicates more variation in the underlying data set.

20
Characteristics of the normal distribution

The total area (of the bars) under the curve line
is 100%.
Recall the empirical rule, which states that
68% of the area/possible outcome values will be
within 1 std. deviation of the mean,
95% of the area/possible outcome values will
be within 2 std. deviations of the mean, and
99.7% of the area/possible outcome values will
be within 3 std. deviations of the mean.
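The empirical rule can be checked numerically. The sketch below assumes Python 3.8 or later, whose standard library provides statistics.NormalDist; the helper name area_within is made up for this illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Area under the normal curve within k std. deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    # Prints roughly 68%, 95%, and 99.7%, matching the empirical rule.
    print(f"within {k} std. deviation(s): {area_within(k):.1%}")
```

Because every normal distribution maps onto the standard one, the same three percentages hold for any mean and standard deviation.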

21
The formula for finding an area under a curve
when the curve line is not flat

If the frequency distribution is uniform, you
can just multiply the height times the width to
get the area of a bar.
But when the frequency distribution is normal (as
many are), the height of the curve at each x is
given by this probability density function, and
the area of an interval under the curve must be
found from it:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Where
μ = the mean, σ = std. deviation, π = 3.14159,
e = 2.71828

22
If we were to use the probability density function

We would not simply solve for f(x) at x = 4.9 and
x = 5.1, because f(x) gives only the height of the
curve at a single point.
Instead, we would add up (integrate) f(x) over
every x from 4.9 to 5.1 to get the area under the
curve line in between 4.9 and 5.1.
That area would be expressed as a percentage of
the total area under the curve.
Since area = probability, if that area was, say,
12%, then there would be a 12% chance that a
randomly selected student would get a score that
was greater than 4.9 and also less than 5.1.
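To make the idea of an area under the density curve concrete, here is a rough Python sketch that adds up many thin rectangles between 4.9 and 5.1 and compares the total with the cumulative area reported by the standard library. The slides do not give a mean or standard deviation for the test scores, so MU = 5.0 and SIGMA = 2.0 are assumptions chosen purely for illustration, and the function names are made up.

```python
import math
from statistics import NormalDist

# ASSUMED values for illustration only; the slides do not state them.
MU, SIGMA = 5.0, 2.0


def density(x, mu=MU, sigma=SIGMA):
    """Height of the normal curve at x (the probability density function)."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))


def area_between(a, b, steps=1000):
    """Approximate the area under the curve from a to b with thin rectangles."""
    width = (b - a) / steps
    # Use the curve height at the midpoint of each thin rectangle.
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width


approx = area_between(4.9, 5.1)
scores = NormalDist(MU, SIGMA)
exact = scores.cdf(5.1) - scores.cdf(4.9)
print(f"rectangles: {approx:.4f}, cumulative areas: {exact:.4f}")
```

With these assumed parameters the two numbers agree to several decimal places, which is why in practice a table or a built-in function replaces the integration.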

23
Example 1

The sitting height (from seat to top of head) of
drivers must be considered in the design of a new
car model. Men have sitting heights that are
normally distributed with a mean of 36.0 inches and
a standard deviation of 1.4 inches. Engineers have
provided plans that can accommodate men with
sitting heights up to 38.8 inches but taller men
cannot fit. If a man is randomly selected, find
the probability that he has a sitting height less
than 38.8 inches. Based on that result, is the
current engineering design feasible?

24
What is P(x < 38.8)?

This question can be simplified down to: what is
the probability that a randomly selected male
individual will have a sitting height (x) that
is less than 38.8 inches,

given that μ = 36 and σ = 1.4 inches?

Recall that probability can be found by finding
the area of an interval under a probability
distribution curve.

25
1. Draw a picture to visualize the
area/probability you're trying to find. Place μ
and x on the x-axis.
[Figure: normal curve with μ = 36.0 and x = 38.8 marked; the area (p) to the left of x is the unknown]
26
2. Transform your x-value into a z-value. A
z-value is the standardized score on a
standardized distribution.

The distribution of the population of interest is
mapped to the standard normal distribution.
27
This is the standard normal distribution. It has
(by definition) a mean of 0 and a standard
deviation of 1.
When you calculate a z-score for your x (38.8),
you are, in essence, mapping or transforming the
μ of your distribution (36) to 0 and mapping the
σ of your distribution (1.4) to 1.
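The mapping to the standard normal distribution can be illustrated with a short Python sketch. It assumes Python 3.8 or later (for statistics.NormalDist) and uses the sitting-height values from Example 1; the variable names are made up for this illustration.

```python
from statistics import NormalDist

# Sitting heights from Example 1: mean 36.0 inches, std. deviation 1.4 inches.
heights = NormalDist(mu=36.0, sigma=1.4)
# The standard normal distribution: mean 0, std. deviation 1.
standard = NormalDist()

x = 38.8
# Standardize: distance from the mean measured in std. deviations.
z = (x - heights.mean) / heights.stdev

# Area to the left of x on the original curve ...
p_original = heights.cdf(x)
# ... equals the area to the left of z on the standard curve.
p_standard = standard.cdf(z)

print(f"z = {z:.2f}")
print(f"area left of x: {p_original:.4f}, area left of z: {p_standard:.4f}")
```

This equality is what allows a single z-table to serve every normal distribution, whatever its mean and standard deviation.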
28
The calculation of z

Recall that z expresses the distance between μ
and x as a number of σs.
Once we transform x to a z-score, we can use the
z-tables to look up the area under the curve in
the interval on the left side of the z-line. That
area equals the p-value: the probability that x < 38.8.

z = (x - μ) / σ
z = (38.8 - 36.0) / 1.4
z = 2.00
29
3. Use z to look up p
When z = 2.00, p = .9772
30
4. Refer to the drawing and write in p
97.72% of the area is to the left of 38.8, so
P(x < 38.8) = p = .9772
[Figure: normal curve with μ = 36.0, x = 38.8 (z = 2.00), and the area to the left of x shaded]
31
5. Make a conclusion statement

P(x < 38.8) = .9772
97.72% of men have sitting heights of 38.8 inches
or less, and therefore 2.28% of men are going to
be too tall to fit into this car.
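The five steps of Example 1 can also be traced in a few lines of Python. This sketch replaces the z-table lookup with the error function from the standard math module, using the identity Φ(z) = 0.5 × (1 + erf(z / √2)); the function names phi and z_score are made up for this illustration.

```python
import math


def phi(z):
    """Area to the left of z under the standard normal curve (table value)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_score(x, mu, sigma):
    """Distance between x and the mean, in std. deviations."""
    return (x - mu) / sigma


z = z_score(38.8, 36.0, 1.4)   # step 2: transform x to z
p_fit = phi(z)                 # step 3: look up p
p_too_tall = 1.0 - p_fit       # complement: men who do not fit

print(f"z = {z:.2f}")                          # z = 2.00
print(f"P(x < 38.8) = {p_fit:.4f}")            # 0.9772
print(f"P(x > 38.8) = {p_too_tall:.4f}")       # 0.0228
```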
Now, let's do it in Excel.

32
Open the file DataSetsForCh6 and click on the
worksheet tab "Sitting Heights"
33
1. Fill in the values for x, μ, and σ.
C3: 38.8, C4: 36, C5: 1.4
34
2. Calculate z. C6: =(C3-C4)/C5
35
3. Use Excel's built-in NORMSDIST( ) function to
look up the area under the curve that is to the
left of the z-line. C7: =NORMSDIST(C6)
36
4. Refer back to the question to see if we want
the area to the left or to the right of the
z-line. Since we want the area less than 38.8,
we want the area on the left side, so P(x < 38.8)
is our p-value. C8: =C7
37
5. Fill in the p-value on the curve and write a
conclusion statement. C9: 97.72% of men have
sitting heights of 38.8 inches or less.
38
Example 2

Air Force ACES-II ejection seats were designed
for men weighing between 140 and 211 lbs. A
person who is above or below those weight limits
risks injury if ejected.
Nowadays, women pilots may be sitting in the
ejection seat. Given that women's weights are
normally distributed with a mean of 143 lbs and a
standard deviation of 29 lbs, what percentage of
women would have weights within those limits (of
140 to 211)?

39
1. Draw a picture to visualize the
area/probability you're trying to find. Place μ
and x on the x-axis.
We want the area to the LEFT of the 211 line
and also to the RIGHT of the 140 line.
[Figure: normal curve with μ = 143 and the lines x = 140 and x = 211 marked; the area (p) between the two lines is the unknown]
40
2. Use x to calculate each z
For x = 211: z = (x - μ) / σ = (211 - 143) / 29 = 2.34
For x = 140: z = (x - μ) / σ = (140 - 143) / 29 = -0.10
41
3. Use z = -.10 to look up p
When z = -.10, p = .4588
(This value uses the unrounded z of -0.1034; a printed z-table entry for z = -.10 reads .4602.)
42
3. Use z = 2.34 to look up p
When z = 2.34, p = .9905
(This value uses the unrounded z of 2.3448; a printed z-table entry for z = 2.34 reads .9904.)
43
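4. Refer to the drawing and write in p
The total area up to the x = 211 line is 99.05%
The total area up to the x = 140 line is 45.88%
Area (p) = 99.05% - 45.88% = 53.17%
[Figure: normal curve with μ = 143 and the area between x = 140 and x = 211 shaded]
P(140 < x < 211) = .5317
44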
5. Make a conclusion statement

P(140 < x < 211) = .5317
53.17% of women have weights between 140 and 211
lbs. This means that 46.83% of women do not have
weights between the current limits, so far too
many women would risk injury if ejection became
necessary.
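The "area up to the upper line minus area up to the lower line" step of Example 2 can be sketched in Python. This assumes Python 3.8 or later for statistics.NormalDist; the helper name prob_between is made up for this illustration. The sketch works with unrounded z-values, which is why it lines up with .4588 and .9905.

```python
from statistics import NormalDist

# Women's weights from Example 2: mean 143 lbs, std. deviation 29 lbs.
weights = NormalDist(mu=143, sigma=29)


def prob_between(dist, low, high):
    """Area between two x-values: area left of high minus area left of low."""
    return dist.cdf(high) - dist.cdf(low)


area_up_to_140 = weights.cdf(140)   # about .4588
area_up_to_211 = weights.cdf(211)   # about .9905
p_inside = prob_between(weights, 140, 211)
p_outside = 1 - p_inside

print(f"P(140 < x < 211) = {p_inside:.4f}")   # about .5317
print(f"outside the limits = {p_outside:.4f}")  # about .4683
```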
Now let's do it in Excel.

45
Open the file DataSetsForCh6 and click on the
worksheet tab "Women's Weights"
46
1. Fill in the values for x, μ, and σ.
C3: 140, C4: 143, C5: 29
47
2. Calculate z. C6: =(C3-C4)/C5
48
3. Use Excel's built-in NORMSDIST( ) function to
look up the area under the curve that is to the
left of the z = -.10 line. C7: =NORMSDIST(C6)
45.88% of the area is up to this line.
49
4. Fill in the values for x, μ, and σ.
D3: 211, D4: 143, D5: 29
50
5. Calculate z. D6: =(D3-D4)/D5
51
6. Use Excel's built-in NORMSDIST( ) function to
look up the area under the curve that is to the
left of the z = 2.34 line. D7: =NORMSDIST(D6)
99.05% of the area is up to this line.
52
7. Subtract the areas to get the area between the
lines. D8: =D7-C7
53.17% is the area in between the two lines.
53
8. Write a conclusion statement.
54
Example 3: Grear Tire Co.

Suppose Grear Tire Company just developed a new
steel-belted radial tire. From actual road tests,
Grear estimates that the average number of miles a
tire should last (μ) is 36,500, that the distribution
is normal, and that the standard
deviation (σ) of that distribution is 5,000.
What percentage of the tires can be expected to
last more than 40,000 miles?

55
Open the file DataSetsForCh6 and click on the
worksheet tab "Grear Tire I"
56
1. Fill in the values for x, μ, and σ.
C3: 40000, C4: 36500, C5: 5000
57
2. Calculate z. C6: =(C3-C4)/C5
58
3. Use Excel's built-in NORMSDIST( ) function to
look up the area under the curve that is to the
left of the z = .7 line. C7: =NORMSDIST(C6)
59
4. Subtract the percentage area from 1 to get the
area to the right of the z = .7 line. C8: =1-C7
60
5. Write a conclusion statement.
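The right-tail calculation for the Grear Tire example can be sketched in Python in the same order as the spreadsheet steps. This assumes Python 3.8 or later for statistics.NormalDist; the function name prob_greater_than is made up for this illustration, and the resulting percentage is computed here rather than taken from the slides.

```python
from statistics import NormalDist


def prob_greater_than(x, mu, sigma):
    """Return (z, area to the RIGHT of x) for a normal distribution."""
    z = (x - mu) / sigma                  # step 2: calculate z
    area_left = NormalDist().cdf(z)       # step 3: area left of the z-line
    return z, 1.0 - area_left             # step 4: subtract from 1


z, p_over_40000 = prob_greater_than(40_000, 36_500, 5_000)
print(f"z = {z:.2f}")
print(f"P(x > 40,000) = {p_over_40000:.4f}")
```

Under the stated mean and standard deviation this works out to roughly 24% of tires lasting more than 40,000 miles, which is the number the conclusion statement would report.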
61
Homework 10