Section 7.3 ~ Best-Fit Lines and Prediction

Provided by: Sandi97

1
Section 7.3 Best-Fit Lines and Prediction

2
Objective
Sec. 7.3

3
Line of Best-Fit

The best-fit line (or regression line) on a
scatterplot is the line that lies closer to the
data points than any other possible line, as
measured by the sum of squared vertical distances.
It can be used to make predictions based on
existing data.
When estimated by eye, the best-fit line should
have roughly the same number of points above it
as below it. It does not have to pass through the
origin.
The precise best-fit line can be calculated by
hand, but doing so is tedious, so it is often
estimated by eye or found with a calculator.
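
The "tedious" hand calculation can be sketched in a few lines of Python. This is a minimal illustration of the standard least-squares formulas, not a method taken from the slides; the function name best_fit_line and the data values are made up for the example.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = best_fit_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")  # prints "y = 0.60x + 2.20"
```

Note that the intercept here is 2.20, not 0: the line is not forced through the origin.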

4
Cautions in Making Predictions from Best-Fit Lines

Don't expect a best-fit line to give a good
prediction unless the correlation is strong and
there are many data points.
If the sample points lie very close to the
best-fit line, the correlation is very strong and
the prediction is more likely to be accurate.
If the sample points lie away from the best-fit
line by substantial amounts, the correlation is
weak and predictions tend to be much less
accurate.

5
Cautions in Making Predictions from Best-Fit Lines

Don't use a best-fit line to make predictions
beyond the bounds of the data points to which the
line was fit.
Ex. The diagram below represents the
relationship between candle length and burning
time. The data collected covered only candles
between 2 in. and 4 in. long.
Using the best-fit line to make a prediction
far outside these lengths would most likely be
inappropriate.
According to the best-fit line, a candle with
a length of 0 in. burns for 2 minutes, which is
impossible.
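
One way to enforce this caution in code is to refuse any prediction outside the fitted range. The sketch below is only an illustration: the function name predict_within_range is hypothetical, and while the 2-minute intercept and the 2 in. to 4 in. range come from the candle example, the slope is an assumed value because the diagram's line is not given in the text.

```python
def predict_within_range(slope, intercept, x, x_min, x_max):
    """Predict y from a best-fit line, but only inside the fitted x-range."""
    if not x_min <= x <= x_max:
        raise ValueError(
            f"x = {x} is outside the fitted range [{x_min}, {x_max}]; "
            "the line should not be used to predict here"
        )
    return slope * x + intercept


# The intercept of 2 minutes comes from the slide; the slope of 10 minutes
# per inch is a hypothetical value chosen only for illustration.
SLOPE, INTERCEPT = 10.0, 2.0
MIN_LENGTH, MAX_LENGTH = 2.0, 4.0  # candle lengths (in.) covered by the data

# A 3 in. candle is inside the data range, so a prediction is returned.
print(predict_within_range(SLOPE, INTERCEPT, 3.0, MIN_LENGTH, MAX_LENGTH))

# A 0 in. candle is outside the range, so the function refuses to predict.
try:
    predict_within_range(SLOPE, INTERCEPT, 0.0, MIN_LENGTH, MAX_LENGTH)
except ValueError as err:
    print("Refused:", err)
```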

6
Cautions in Making Predictions from Best-Fit Lines

A best-fit line based on past data is not
necessarily valid now and might not result in
valid predictions of the future.
Ex. Economists studying historical data found
a strong correlation between unemployment and the
rate of inflation. According to this
correlation, inflation should have risen
dramatically in recent years when the
unemployment rate fell below 6%. But inflation
remained low, showing that the correlation from
old data did not continue to hold.
Don't make predictions about a population that is
different from the population from which the
sample data were drawn.
Ex. You cannot expect that the correlation
between aspirin consumption and heart attacks in
an experiment involving only men will also apply
to women.
Remember that a best-fit line is meaningless when
there is no significant correlation or when the
relationship is nonlinear.
Ex. There is no correlation between shoe size
and IQ, so even though you can draw a best-fit
line, it is useless for drawing any conclusions.
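
The nonlinear case can be illustrated with a small Python sketch. The data below are made up: y is exactly x squared, so the two variables are perfectly related, yet the linear correlation coefficient comes out as zero and a straight line would predict nothing useful. The helper name correlation is an assumption for this example.

```python
import math


def correlation(xs, ys):
    """Return the linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: a perfect but nonlinear (parabolic) relationship.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
print(f"r = {r:.3f}")  # r is zero even though y depends entirely on x
```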

7
Example 1

State whether the prediction (or implied
prediction) should be trusted in each of the
following cases, and explain why or why not.
You've found a best-fit line for a correlation
between the number of hours per day that people
exercise and the number of calories they consume
each day. You've used this correlation to predict
that a person who exercises 18 hours per day
would consume 15,000 calories per day.
This prediction would be beyond the bounds of the
data collected and should therefore not be
trusted.
There is a well-known but weak correlation
between SAT scores and college grades. You use
this correlation to predict the college grades of
your best friend from her SAT scores.
Because the correlation is weak, there is much
scatter in the data and you should not expect
great accuracy in the prediction.
Historical data have shown a strong negative
correlation between national birth rates and
affluence. That is, countries with greater
affluence tend to have lower birth rates. These
data predict a high birth rate in Russia.
We cannot automatically assume that the
historical data still apply today. In fact,
Russia currently has a very low birth rate,
despite also having a low level of affluence.

8
Example 1 (Cont'd)

A study in China has discovered correlations that
are useful in designing museum exhibits that
Chinese children enjoy. A curator suggests using
this information to design a new museum exhibit
for Atlanta-area school children.
The suggestion to use information from the
Chinese study for an Atlanta exhibit assumes that
predictions made from correlations in China also
apply to Atlanta. However, given the cultural
differences between China and Atlanta, the
curator's suggestion should not be accepted
without more information to back it up.
Scientific studies have shown a very strong
correlation between children's ingestion of lead
and mental retardation. Based on this
correlation, paints containing lead were banned.
Given the strength of the correlation and the
severity of the consequences, this prediction and
the ban that followed seem quite reasonable. In
fact, later studies established lead as an actual
cause of mental retardation, making the rationale
behind the ban even stronger.

9
The Correlation Coefficient and Best-Fit Lines

Recall that the correlation coefficient (r)
measures the strength of a correlation.
The correlation coefficient can also be used to
say something about the validity of predictions
made with best-fit lines.
The coefficient of determination, r², is the
proportion of the variation in a variable that is
accounted for by the best-fit line.
Ex. The correlation coefficient for the diamond
weight and price from the scatterplot on p. 307 is
r = 0.777, so r² = 0.604. This means that about
60% of the variation in the diamond prices is
accounted for by the best-fit line relating
weight and price, and the remaining 40% of the
variation in price must be due to other factors.
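
The link between r² and "variation accounted for" can be checked numerically. The following Python sketch uses made-up data (not the diamond data, which are not reproduced here) and hypothetical helper names. It computes r, then separately measures the fraction of the variation in y that the least-squares line explains, and the two agree.

```python
import math


def r_and_explained_fraction(xs, ys):
    """Return (r, fraction of y-variation explained by the best-fit line)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)  # total variation in y
    r = sxy / math.sqrt(sxx * syy)

    # Fit the least-squares line and measure the variation it leaves over.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    leftover = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    explained = 1 - leftover / syy
    return r, explained


# Made-up illustrative data, not taken from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r, explained = r_and_explained_fraction(xs, ys)
print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}, explained = {explained:.3f}")

# The arithmetic of the diamond example: r = 0.777 gives r^2 of about 0.604.
print(round(0.777 ** 2, 3))
```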

10
Example 2

You are the manager of a large department store.
Over the years, you've found a reasonably strong
positive correlation between your September sales
and the number of employees you'll need to hire
for peak efficiency during the holiday season.
The correlation coefficient is 0.950. This year
your September sales are fairly strong. Should
you start advertising for help based on the
best-fit line?
r² = 0.903, which means that 90% of the variation
in the number of peak employees can be accounted
for by a linear relationship with September
sales, leaving only 10% unaccounted for.
Because 90% is so high, it is a good idea to
predict the number of employees you'll need using
the best-fit line.
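
For reference, the arithmetic behind these percentages, written out from the stated correlation coefficient of 0.950:

```latex
r^2 = (0.950)^2 = 0.9025 \approx 0.903,
\qquad
1 - r^2 \approx 0.097 \approx 10\%
```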

11
Multiple Regression

Multiple regression is a technique that allows us
to find a best-fit equation relating one variable
to more than one other variable.
Ex. Price of diamonds in relation to carat,
cut, clarity, and color
The coefficient of determination (R²) is the most
common measure of fit in a multiple regression.
It tells us how much of the scatter in the data
is accounted for by the best-fit equation.
If R² is close to 1, the best-fit equation should
be very useful for making predictions within the
range of the data.
If R² is close to 0, the predictions are
essentially useless.
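
A minimal sketch of multiple regression and its R², in plain Python, is given below. It assumes ordinary least squares solved through the normal equations, which is one common approach and not necessarily the one the textbook has in mind. The function names and the two-predictor data are made up for illustration; the numbers are not real diamond prices.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def multiple_regression(rows, ys):
    """Fit y = b0 + b1*x1 + ... + bk*xk; return (coefficients, R squared)."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 is the intercept
    k = len(design[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    coeffs = solve_linear_system(xtx, xty)
    predicted = [sum(b * v for b, v in zip(coeffs, d)) for d in design]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                # total scatter
    return coeffs, 1 - ss_res / ss_tot


# Made-up data: each row is (x1, x2), e.g. two numeric diamond attributes.
rows = [(1, 1), (2, 1), (1, 2), (3, 2), (2, 3)]
ys = [6.2, 7.9, 9.1, 12.8, 14.1]
coeffs, r_squared = multiple_regression(rows, ys)
print("coefficients:", [round(b, 3) for b in coeffs])
print("R^2:", round(r_squared, 4))
```

An R² near 1 on data like these suggests the equation is useful for predictions within the range of the data; it says nothing about values outside that range.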