Title: REGRESSION
1REGRESSION
2Simple Linear Regression
- Simple Linear Regression Model
- Least Squares Method
- Coefficient of Determination
- Model Assumptions
- Testing for Significance
- Using the Estimated Regression Equation for Estimation and Prediction
- Computer Solution
3Simple Linear Regression Model
- Regression analysis is a statistical technique that attempts to explain movements in one variable, the dependent variable, as a function of movements in a set of other variables, called independent (or explanatory) variables, through the quantification of a single equation.
- However, a regression result, no matter how statistically significant, cannot prove causality. All regression analysis can do is test whether a significant quantitative relationship exists.
- Model assumption: x and y are linearly related.
4Simple Linear Regression Model
- The equation that describes how y is related to x and an error term is called the regression model.
- The simple linear regression model is
  y = β0 + β1x + ε
- where
- β0 and β1 are called parameters of the model,
- ε is a random variable called the error term.
5Simple Linear Regression Equation
- The simple linear regression equation is
  E(y) = β0 + β1x
- The graph of the regression equation is a straight line.
- β0 is the y-intercept of the regression line.
- β1 is the slope of the regression line.
- E(y) is the expected value of y for a given x value.
6Simple Linear Regression Equation
- Positive Linear Relationship
(Figure: regression line with intercept β0 and positive slope β1.)
7Simple Linear Regression Equation
- Negative Linear Relationship
(Figure: regression line with intercept β0 and negative slope β1.)
8Simple Linear Regression Equation
- No Relationship
(Figure: regression line with intercept β0 and slope β1 equal to 0.)
9Estimated Simple Linear Regression Equation
- The estimated simple linear regression equation is
  ŷ = b0 + b1x
- The graph is called the estimated regression line.
- b0 is the y-intercept of the line.
- b1 is the slope of the line.
10Least Squares Method
- Least Squares Criterion: choose b0 and b1 to
  min Σ(yi − ŷi)²
- where
- yi = observed value of the dependent variable for the ith observation
- ŷi = estimated value of the dependent variable for the ith observation
- That is, the least squares technique calculates the estimates b0 and b1 so as to minimize the sum of the squared residuals.
11The Least Squares Method
- Slope for the Estimated Regression Equation
  b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
- y-Intercept for the Estimated Regression Equation
  b0 = ȳ − b1x̄
- where
- xi = value of the independent variable for the ith observation
- yi = value of the dependent variable for the ith observation
- x̄ = mean value of the independent variable
- ȳ = mean value of the dependent variable
- n = total number of observations
12Example XYZ Auto Sales
- Simple Linear Regression
- XYZ Auto periodically has a special week-long sale. As part of the advertising campaign, XYZ runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown below (a short worked sketch follows the table).

  Number of TV Ads   Number of Cars Sold
  2                  17
  2                  21
  2                  18
  1                  17
  3                  27
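The following is a minimal Python sketch (standard library only; the variable names are illustrative, not from the original slides) of the least squares calculation for these data:

    # Least squares estimates for the XYZ Auto Sales data
    x = [2, 2, 2, 1, 3]           # number of TV ads
    y = [17, 21, 18, 17, 27]      # number of cars sold

    n = len(x)
    x_bar = sum(x) / n            # 2.0
    y_bar = sum(y) / n            # 20.0

    # b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx                # slope: 5.0
    b0 = y_bar - b1 * x_bar       # intercept: 10.0
    print(f"y-hat = {b0} + {b1}x")

For these data the sketch gives b1 = 5 and b0 = 10, matching the estimated equation used on the later slides.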
13Estimated Regression Equation
- ŷ = 10 + 5x (intercept b0 = 10, slope b1 = 5)
14Excel Output
15Scatter Diagram and Trend Line
16Relationship Among SST, SSR, SSE
(Figure: for each observation, the deviation of the observed y from the mean ȳ is split into the deviation of the estimated ŷ from ȳ and the deviation of the observed y from ŷ, illustrating SSR, SSE, and SST.)
- where
- SST = total sum of squares
- SSR = sum of squares due to regression
- SSE = sum of squares due to error
17Relationship Among SST, SSR, SSE
- Relationship Among SST, SSR, SSE
  SST = SSR + SSE
- where
- SST = total sum of squares
- SSR = sum of squares due to regression
- SSE = sum of squares due to error
18Degrees of Freedom
- Relationship Among SST, SSR, SSE
  SST = SSR + SSE
- SST degrees of freedom = n − 1
- SSR degrees of freedom = number of independent variables (p)
- SSE degrees of freedom = n − p − 1 (see the sketch below for the XYZ data)
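Continuing the Python sketch started after the slide 12 data (it reuses x, y, n, y_bar, b0, and b1 from that block), the sums of squares and their degrees of freedom are:

    # Sums of squares for the XYZ data
    y_hat = [b0 + b1 * xi for xi in x]                       # fitted values
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total: 72
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # error: 22
    ssr = sst - sse                                          # regression: 50
    p = 1                                                    # one independent variable
    print(sst, ssr, sse)                                     # SST = SSR + SSE
    print(n - 1, p, n - p - 1)                               # d.f.: 4, 1, 3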
19Relationship Among SST, SSR, SSE
20The Coefficient of Determination
- The coefficient of determination is the proportion of the variability in the dependent variable y that is explained by x.
  r² = SSR/SST
- where
- SST = total sum of squares
- SSR = sum of squares due to regression
- SSE = sum of squares due to error
21Example XYZ Auto
- Coefficient of Determination
  r² = SSR/SST = 50/72 = .69
- The regression relationship is strong, since 69% of the variation in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
22The Correlation Coefficient
- Sample Correlation Coefficient
  r_xy = (sign of b1) √(r²) = (sign of b1) √(coefficient of determination)
- where
- b1 = the slope of the estimated regression equation ŷ = b0 + b1x
23Example XYZ Auto Sales
- Sample Correlation Coefficient
- The sign of b1 in the equation ŷ = 10 + 5x is positive, so r_xy takes the positive square root.
  r_xy = +√(50/72) = +.8333
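A short continuation of the same sketch (ssr, sst, and b1 from the earlier blocks):

    # Coefficient of determination and sample correlation coefficient
    import math
    r2 = ssr / sst                             # 50/72 = 0.694
    r_xy = math.copysign(math.sqrt(r2), b1)    # sign taken from b1: +0.8333
    print(round(r2, 4), round(r_xy, 4))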
24Testing for Significance
To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β1 is zero.
Two tests are commonly used: the t test and the F test.
Both the t test and the F test require an estimate of σ², the variance of ε in the regression model.
25Testing for Significance
- An Estimate of σ²
- The mean square error (MSE) provides the estimate of σ², and the notation s² is also used.
  s² = MSE = SSE/(n − p − 1)
- where SSE = Σ(yi − ŷi)², n = the number of observations, and p = the number of independent variables.
26Testing for Significance
- An Estimate of σ
- To estimate σ we take the square root of s².
- The resulting s is called the standard error of the estimate.
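In the running sketch (sse, n, and p as defined earlier), MSE and the standard error of the estimate are:

    # Mean square error and standard error of the estimate
    import math
    mse = sse / (n - p - 1)       # s^2 = 22/3 = 7.33
    s = math.sqrt(mse)            # s = 2.708
    print(round(mse, 2), round(s, 3))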
27Sampling Distribution of b1
- Sampling Distribution of b1
- Expected Value: E(b1) = β1
- Standard Deviation: σ_b1 = σ / √Σ(xi − x̄)²
- Estimated Standard Deviation of b1 (also referred to as the standard error of b1): s_b1 = s / √Σ(xi − x̄)²
28Testing for Significance t Test
- Hypotheses
  H0: β1 = 0
  Ha: β1 ≠ 0
- Test Statistic
  t = b1 / s_b1
- Rejection Rule
- Reject H0 if t < −t_α/2 or t > t_α/2
- where t_α/2 is based on a t distribution with n − p − 1 degrees of freedom.
29Example XYZ Auto Sales
- t Test
- Hypotheses: H0: β1 = 0
              Ha: β1 ≠ 0
- Rejection Rule
- For α = .05 and d.f. = 3, t_.025 = 3.182
- Reject H0 if t > 3.182 or t < −3.182
- Test Statistic
- t = b1 / s_b1 = 5/1.91 = 2.61
- Conclusion
- Since 2.61 < 3.182, do not reject H0.
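A sketch of the t test calculation, using scipy (assumed to be available) for the critical value; b1, s, sxx, n, and p come from the earlier blocks:

    # t test for H0: beta1 = 0 on the XYZ data
    import math
    from scipy import stats
    s_b1 = s / math.sqrt(sxx)                          # standard error of b1: 1.915
    t_stat = b1 / s_b1                                 # about 2.61
    t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - p - 1)   # 3.182
    print(round(t_stat, 2), round(t_crit, 3))
    print("reject H0" if abs(t_stat) > t_crit else "do not reject H0")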
30Confidence Interval for ?1
- We can use a 95% confidence interval for β1 to test the hypotheses just used in the t test.
- H0 is rejected if the hypothesized value of β1 is not included in the confidence interval for β1.
31Confidence Interval for ?1
- The form of a confidence interval for β1 is
  b1 ± t_α/2 s_b1
- where b1 is the point estimator and t_α/2 s_b1 is the margin of error.
32Example XYZ Auto Sales
- Rejection Rule
- Reject H0 if 0 is not included in the confidence interval for β1.
- 95% Confidence Interval for β1
  5 ± 3.182(1.91) = 5 ± 6.07, or −1.07 to 11.07
- Conclusion
- Since 0 is included in the interval, we cannot reject H0.
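Using the quantities from the t test sketch above, the 95% confidence interval for β1 is:

    # 95% confidence interval for beta1
    margin = t_crit * s_b1              # about 6.1
    print(b1 - margin, b1 + margin)     # roughly -1.1 to 11.1; 0 is inside, so do not reject H0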
33Testing for Significance F Test
Hypotheses: H0: β1 = 0
            Ha: β1 ≠ 0
Test Statistic: F = MSR/MSE
where MSR = SSR/(regression degrees of freedom) = SSR/(number of independent variables)
MSR = mean square due to regression
34F- Test
- With only one independent variable, the F test will provide the same conclusion as the t test.
- Rejection Rule
- Reject H0 if F > F_α
- where F_α is based on an F distribution with 1 d.f. in the numerator and n − 2 d.f. in the denominator.
35Example XYZ Auto Sales
- F Test
- Hypotheses: H0: β1 = 0
              Ha: β1 ≠ 0
- Rejection Rule
- For α = .05 and d.f. = (1, 3), F_.05 = 10.13
- Reject H0 if F > 10.13.
- Test Statistic
- F = MSR/MSE = 50/7.33 = 6.81
- Conclusion
- Since 6.81 < 10.13, we cannot reject H0.
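A sketch of the F test calculation with scipy (ssr, mse, n, and p from the earlier blocks):

    # F test for overall significance on the XYZ data
    from scipy import stats
    msr = ssr / p                                       # mean square regression: 50
    f_stat = msr / mse                                  # about 6.8
    f_crit = stats.f.ppf(0.95, dfn=p, dfd=n - p - 1)    # 10.13
    print(round(f_stat, 2), round(f_crit, 2))
    print("reject H0" if f_stat > f_crit else "do not reject H0")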
36Some Cautions about the Interpretation of Significance Tests
- Rejecting H0: β1 = 0 and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
- Just because we are able to reject H0: β1 = 0 and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.
37Using the Estimated Regression Equation for Estimation and Prediction
- Confidence Interval Estimate of E(yp): the mean or expected value of the dependent variable y corresponding to the given value x_p.
- Prediction Interval Estimate of yp:
  ŷ_p ± t_α/2 s_ind
- where the confidence coefficient is 1 − α and t_α/2 is based on a t distribution with n − 2 d.f.
38Using the Estimated Regression Equation for Estimation and Prediction
- Confidence Interval Estimate of E(yp): Standard Deviation
  s_ŷp = s √( 1/n + (x_p − x̄)² / Σ(xi − x̄)² )
- where s = √MSE = 2.708
- x_p = the particular or given value of the independent variable x
- ŷ_p = the point estimate of E(yp) when x = x_p
39CONFIDENCE INTERVAL
- Point Estimation
- If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be
  ŷ = 10 + 5(3) = 25 cars
- Confidence Interval for E(yp)
- The 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is
  25 ± (3.182)(2.265) = 17.79 to 32.20 cars
40Prediction Interval
- Prediction Interval Estimate of yp
  ŷ_p ± t_α/2 s_ind
- where the confidence coefficient is 1 − α and t_α/2 is based on a t distribution with n − 2 d.f.
41PREDICTION
- Prediction Interval for yp
- The 95% prediction interval estimate of the number of cars sold in one particular week (a new situation in the future, from the same population) when 3 TV ads are run is
  ŷ = 10 + 5(3) = 25 cars
  25 ± (3.182)(3.53) = 13.8 to 36.2 cars
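The same sketch extended to the prediction interval for an individual week (xp, y_hat_p, and the other names from the block above):

    # 95% prediction interval for yp when xp = 3 ads are run
    import math
    s_ind = s * math.sqrt(1 + 1 / n + (xp - x_bar) ** 2 / sxx)   # about 3.53
    margin = t_crit * s_ind                                      # about 11.2
    print(y_hat_p - margin, y_hat_p + margin)                    # about 13.8 to 36.2 cars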
42Some Cautions about the Interpretation of Significance Tests
- Rejecting H0: β1 = 0 and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
- Just because we are able to reject H0: β1 = 0 and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.
43Assumptions About the Error Term ε
1. The error ε is a random variable with a mean of zero.
2. The variance of ε, denoted by σ², is the same for all values of the independent variable.
3. The values of ε are independent.
4. The error ε is a normally distributed random variable.
44Residual
- The assumption of constant variance can be checked by looking at a plot of the residuals versus the fitted values, where the ith residual is yi − ŷi.
45Residual Plot Against x
- If the assumption that the variance of ε is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points (a sketch of such a plot follows).
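A sketch of such a residual plot with matplotlib (assumed to be installed); x and y_hat come from the earlier blocks:

    # Residual plot against x for the XYZ data
    import matplotlib.pyplot as plt
    residuals = [yi - yh for yi, yh in zip(y, y_hat)]
    plt.scatter(x, residuals)
    plt.axhline(0, linestyle="--")        # reference line at zero
    plt.xlabel("Number of TV ads (x)")
    plt.ylabel("Residual")
    plt.title("Residual plot against x")
    plt.show()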
46Residual Plot Against x
(Figure: good pattern. The residuals form a horizontal band around 0 when plotted against x.)
47CONSTANT VARIANCE
(Figure: residual plot showing constant variance, a band of roughly constant width around 0.)
48Non Constant Variance
(Figure: two residual plots showing nonconstant variance.)
49Residual Plot Against x
(Figure: nonconstant variance. The spread of the residuals changes with x.)
50Residual Plot Against x
(Figure: model form not adequate. The residuals show a curved pattern when plotted against x.)
51Example XYZ Auto Sales
52Standardized Residuals
- Standardized residuals provide a method to test the normal distribution assumption for the error term.
- Standardized Residual for Observation i
  (yi − ŷi) / s_(yi − ŷi)
- where s_(yi − ŷi) = s √(1 − h_i)
- and h_i = 1/n + (xi − x̄)² / Σ(xi − x̄)²
53Standardized Residuals
- If the assumption is satisfied, we should expect to see about 95% of the standardized residuals between −2 and +2 (see the sketch below).
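A sketch of the standardized residual calculation for the XYZ data, continuing the earlier blocks (residuals, s, sxx, x_bar, and n as defined there):

    # Standardized residual: residual / (s * sqrt(1 - h_i))
    import math
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]    # h_i for each observation
    std_resid = [ri / (s * math.sqrt(1 - hi))
                 for ri, hi in zip(residuals, leverage)]
    print([round(sr, 2) for sr in std_resid])    # expect about 95% of these within -2 to +2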
54Influential Observation
55Continued
56Continued
- An influential (high leverage) observation has a leverage value h that is greater than 6/n.
- In this case we do not have an influential observation.
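A short check of the 6/n rule for these data (leverage values from the sketch above):

    # Flag high leverage (influential) observations: h_i > 6/n
    cutoff = 6 / n                                           # 6/5 = 1.2 for the XYZ data
    flagged = [i for i, hi in enumerate(leverage) if hi > cutoff]
    print(flagged)                                           # [] -> no influential observation here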
57Standard Deviation of the ith Residual
- The standard error of the estimate is s = .77.
58Standardized Residual for Observations i
- Ex y1.02.45(x)
- If the assumption is satisfied, we should expect to see about 95% of the standardized residuals between −2 and +2.
59Example With Excel
- Page 587, Problem 45
- Go to Excel: select Tools, choose Data Analysis, choose Regression from the list of analysis tools, and click OK.
- Enter the Y input range and the X input range, select Labels, and select Confidence Level. Select Residuals, Residual Plots, and Standardized Residuals.
60(No Transcript)
61Output
62(No Transcript)
63(No Transcript)
64Checking for Outliers.
- We are going to use the scatter plot of x versus y and the plot of the standardized residuals versus the predicted values. An outlier will not fit the trend shown by the remaining data.
65Leverage Observation
- We will detect influential observations using the leverage value h_i = 1/n + (xi − x̄)² / Σ(xi − x̄)².
- An influential observation has a leverage h_i that is greater than 6/n.
66Problem 51 Using Excel
- Consider the following data.
- Go to Excel: select Tools, choose Data Analysis, choose Regression from the list of analysis tools, and click OK.
- Enter the Y input range and the X input range, select Labels, and select Confidence Level. Select Residuals and Residual Plots.
67Continued
68Continued
69Continued
- We identify an observation as having high leverage if h_i > 6/n; for these data, 6/n = 6/8 = .75. Since the leverage for the observation x = 22, y = 19 is .76, we would identify observation 8 as a high leverage point. Thus, we conclude that observation 8 is an influential observation.
70Continued (Excel)
The last two observations in the data set appear to be outliers, since the standardized residuals for these observations are 2.00 and −2.16, respectively.
71Continued
The scatter diagram indicates that the observation x = 22, y = 19 is an influential observation.