Title: Descriptive methods in regression and correlation
1 MATH 1530 Elements of Statistics, Dr. Kirsten Boyd
- Chapter 4
- Descriptive Methods in Regression and Correlation
Slides adapted from Ms. Smyth, Dr. Griffy, and the Weiss text
3 Sec. 4.1 Linear Equations with One Independent Variable
- A linear equation with one independent variable has the form y = b0 + b1x. Its graph is a straight line with y-intercept b0 and slope b1. This is the same as the y = mx + b slope-intercept form you probably saw in algebra class, but with different letters.
- Example: y = 5 + 2x
- The y-intercept, b0, is?
- The slope, b1, is?
- Graph this line.
4 Slope
Word-problem interpretation of slope: Whenever x increases by one unit, y increases by b1 units (or decreases if b1 is negative, or stays the same if b1 is zero).
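A minimal Python sketch (not from the slides) that tabulates the example line y = 5 + 2x from Sec. 4.1; notice that y changes by b1 = 2 units each time x increases by one unit.

# Example line from slide 3: y = 5 + 2x, so b0 = 5 and b1 = 2.
def y(x, b0=5, b1=2):
    return b0 + b1 * x

for x in range(4):
    print(x, y(x))   # each 1-unit increase in x raises y by b1 = 2 units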
5 Problem 4.6, page 160
- A repair shop charges $55 per hour plus a $30 service charge. Let x denote the number of hours required for the job and let y denote the total cost to the customer.
- Part a. Find the equation that expresses y in terms of x.
- Part b. Determine b0 and b1.
- Part c. Construct a table like Table 4.1 on p. 157 for the x-values 0.5, 1, and 2.25 hours.
- Part d. Draw the graph of the equation from Part a. by plotting the points from Part c. and connecting them with a line.
- Part e. Use the graph from Part d. to estimate visually the cost of a job that takes 1.75 hours. Then calculate the cost exactly, using the equation from Part a.
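A short Python sketch (not part of the assignment) for checking Parts a, c, and e; the equation y = 30 + 55x follows from the $30 service charge and $55 hourly rate stated above.

# Total cost: y = 30 + 55x, so b0 = 30 and b1 = 55.
def cost(hours):
    return 30 + 55 * hours

for x in [0.5, 1, 2.25]:       # Part c: table of (x, y) pairs
    print(x, cost(x))
print(cost(1.75))              # Part e: exact cost of a 1.75-hour job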
6 Sec. 4.2 The Regression Equation
- Regression equations explain a (linear) pattern in scatterplot data
- x is the explanatory or predictor variable
- y is the response variable
7 Scatterplot (Table 4.2 and Fig. 4.7, page 162)
8 Regression Equation
- The goal is to construct a line with the smallest possible distances from the data points to the corresponding points on the line.
- This line is the graph of the regression equation.
9 Example 4.3 (p. 163) Which Line Is Better?
10 Example 4.3 Comparing Lines
The sum of the squared errors is less for Line B than for Line A, so Line B is a better fit than Line A. Error: e = y - ŷ.
Notation: For any x-value, y is the actual (observed) value and ŷ is the value predicted by the line.
11 Best-Fitting Line Possible
12 Computing the Regression Equation
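The formulas from this slide are not in the transcript; the Python sketch below uses the standard least-squares formulas b1 = Sxy/Sxx and b0 = y-bar - b1*x-bar on a small made-up data set.

def regression_line(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # y-intercept
    return b0, b1

# Made-up data for illustration only.
b0, b1 = regression_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
print(b0, b1)                   # regression equation: y-hat = b0 + b1*x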
13 Use of Regression Equations
- A regression equation models the data approximately (not perfectly)
- x-values (explanatory variable) predict values of y (response variable)
WARNING: You can predict accurately only within the range (spread) of the observed x-values.
14 Extrapolation
- Using an x-value outside the range of the data is unacceptable because the trend could change. Making predictions for x-values outside the range is called extrapolation, and it should be avoided.
15 Extrapolation
16 Outliers and Influential Observations
- An outlier is a data point that lies vertically far from the regression line relative to the other points
- An influential observation is a data point that lies horizontally far from the rest of the data and whose removal would considerably change the slope of the regression line
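A small Python sketch (made-up data, assuming SciPy is available) showing how an influential observation works: removing the point that lies far to the right changes the fitted slope considerably.

from scipy.stats import linregress

x = [1, 2, 3, 4, 20]            # the point at x = 20 lies horizontally far from the rest
y = [2.0, 4.1, 5.9, 8.2, 5.0]
print(linregress(x, y).slope)            # slope with the influential point included
print(linregress(x[:-1], y[:-1]).slope)  # slope after removing it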
17 Outliers and Influential Observations (Fig. 4.12, p. 169)
18 Data Must Be Linear
19 Problem 4.53, page 174
20 4.53 a
21 4.53 b
22 4.53 c-g
- Emission increases as weight increases
- Δy/Δx = ΔEmissions/ΔWeight ≈ 0.16/1
- For each additional gram of potato plant weight, emissions increase by about 0.16 hundred nanograms, which is 16 nanograms.
- ŷ = 3.524 + 0.1628(75) = 15.73
- Predictor: x = weight of potato plant
- Response: y = emission quantity
- None
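A quick Python check of the prediction above, using the slide's fitted equation ŷ = 3.524 + 0.1628x.

b0, b1 = 3.524, 0.1628
print(round(b0 + b1 * 75, 2))   # 15.73 (hundred nanograms) for a 75-gram plant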
23 Finding the Regression Equation Using Your Calculator
- 1. Enter x-values in L1 and y-values in L2 (or, alternatively, use INS to insert new lists with whatever names you want)
- 2. Stat > Calc > 8: LinReg(a+bx)
- 3. Use LIST to enter L1, L2 (or whatever the names of your lists are), don't forget the comma between them, then press Enter
- 4. The calculator tells you a and b, which correspond to b0 and b1 in the book (the equation is y = a + bx = b0 + b1x)
Be sure to turn your diagnostic on: Catalog > DiagnosticOn > Enter (Catalog is the 2nd function above the 0 key)
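For anyone working outside the TI calculator, a rough Python equivalent (assuming SciPy is installed; the data here are made up, so substitute your own lists).

from scipy.stats import linregress

x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
result = linregress(x, y)
print(result.intercept, result.slope)   # a and b, i.e. b0 and b1
print(result.rvalue ** 2)               # r², the coefficient of determination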
24 Sec. 4.3 The Coefficient of Determination
- The coefficient of determination is denoted r²
- Always between 0 and 1
- Measures how well the regression equation describes the relationship between x and y
- Close to 0 (0 to 0.4) => regression is not useful
- Close to 1 (0.6 to 1) => regression is useful
25 Formulas for r²
26 SSE
The error sum of squares, SSE, is the variation in the observed values of y that is not explained by the regression. With SST the total sum of squares and SSR the regression sum of squares,
SSE = SST - SSR
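A sketch (made-up data, not from the slides) that computes SST, SSR, and SSE for the least-squares line, confirms SSE = SST - SSR, and uses the standard identity r² = SSR/SST.

xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)               # total variation in y
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # variation explained by the line
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # unexplained variation
print(round(sse, 6) == round(sst - ssr, 6))           # SSE = SST - SSR
print(ssr / sst)                                      # r²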
27 Coefficient of Determination, r²
- To calculate r², use your calculator; follow the same instructions as for obtaining the regression equation.
- We will not calculate r² by hand.
Be sure to turn your diagnostic on: Catalog > DiagnosticOn > Enter (Catalog is the 2nd function above the 0 key)
28 Percentage of Variation
- The percentage of variation in the y-values that is explained by the variation in the x-values is r² × 100%.
29 4.91 (p. 185, same data as 4.53)
- Ignore part (a) (computing SSR, SST, and SSE) in all of Sec. 4.3
- r² = 0.1096
- r² × 100% = 10.96%
- Not useful
30 Sec. 4.4 Linear Correlation
31 The Linear Correlation Coefficient, r
- The sign of r indicates the direction (slope) of the relationship
- A positive r means a positive relationship
- A negative r means a negative relationship
- The magnitude of r indicates the strength of the linear relationship (magnitude = how far r is from zero)
Weak => between -0.6 and 0.6; Strong => less than -0.75 or more than 0.75
32 The Linear Correlation Coefficient, r
- Always between -1 and 1, inclusive
- The sign of r is the same as the sign of b1 (the slope of the regression line)
- Squaring r always gives r²
- To find r, use your calculator; follow the same instructions as for finding the regression equation and r²
Be sure to turn your diagnostic on: Catalog > DiagnosticOn > Enter (Catalog is the 2nd function above the 0 key)
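A quick Python check (made-up data, assuming NumPy is available) of the relationship between r and r².

import numpy as np

x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
r = np.corrcoef(x, y)[0, 1]     # linear correlation coefficient
print(r)                        # sign gives direction, magnitude gives strength
print(r ** 2)                   # squaring r gives r², the coefficient of determination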
33 Linear Relationships
34 Interpreting r
- r only has meaning if the data are linear
- r can be computed for nonlinear data, but the data may not be linear even if r is strong
35 4.125 (p. 194, same data as 4.53 and 4.91)
- Ignore the "use the computing formula" part of (a) in all of Sec. 4.4 and use your calculator
- r = 0.3311
- Weak, positive, linear relationship
- Very scattered, not close to the line
- r² = 0.3311² = 0.1096, the coefficient of determination (the square of the correlation coefficient)
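A one-line Python check that the slide's r and r² values agree.

print(round(0.3311 ** 2, 4))   # 0.1096, matching the r² reported in 4.91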