Title: REGRESSION
1REGRESSION
2Simple Linear Regression
- Simple Linear Regression Model
- Least Squares Method
- Coefficient of Determination
- Model Assumptions
- Testing for Significance
- Using the Estimated Regression Equation for Estimation and Prediction
- Computer Solution
3Simple Linear Regression Model
- Regression analysis is a statistical technique that attempts to explain movements in one variable, the dependent variable, as a function of movements in a set of other variables, called independent (or explanatory) variables, through the quantification of a single equation.
- However, a regression result, no matter how statistically significant, cannot prove causality. All regression analysis can do is test whether a significant quantitative relationship exists.
- Model assumption: x and y are linearly related.
4Simple Linear Regression Model
- The equation that describes how y is related to x and an error term is called the regression model.
- The simple linear regression model is
  y = β0 + β1x + ε
- where
- β0 and β1 are called parameters of the model,
- ε is a random variable called the error term.
5Simple Linear Regression Equation
- The simple linear regression equation is
  E(y) = β0 + β1x
- The graph of the regression equation is a straight line.
- β0 is the y-intercept of the regression line.
- β1 is the slope of the regression line.
- E(y) is the expected value of y for a given x value.
6Simple Linear Regression Equation
- Positive Linear Relationship
(Figure: regression line with intercept β0 and positive slope β1.)
7Simple Linear Regression Equation
- Negative Linear Relationship
(Figure: regression line with intercept β0 and negative slope β1.)
8Simple Linear Regression Equation
- No Relationship
(Figure: regression line with intercept β0 and slope β1 equal to 0.)
9Estimated Simple Linear Regression Equation
- The estimated simple linear regression equation is
  ŷ = b0 + b1x
- The graph is called the estimated regression line.
- b0 is the y-intercept of the line.
- b1 is the slope of the line.
10Least Squares Method
- Least Squares Criterion: choose b0 and b1 to
  min Σ(yi − ŷi)²
- where
- yi = observed value of the dependent variable for the ith observation
- ŷi = estimated value of the dependent variable for the ith observation
- That is, the least squares technique calculates the estimates b0 and b1 so as to minimize the sum of the squared residuals.
11The Least Squares Method
- Slope for the Estimated Regression Equation
  b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
- y-Intercept for the Estimated Regression Equation
  b0 = ȳ − b1x̄
- where
- xi = value of the independent variable for the ith observation
- yi = value of the dependent variable for the ith observation
- x̄ = mean value of the independent variable
- ȳ = mean value of the dependent variable
- n = total number of observations
12Example XYZ Auto Sales
- Simple Linear Regression
- XYZ Auto periodically has a special week-long sale. As part of the advertising campaign, XYZ runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown below (a short worked sketch follows the table).

  Number of TV Ads   Number of Cars Sold
  2                  17
  2                  21
  2                  18
  1                  17
  3                  27
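The following is a minimal Python sketch (standard library only; the variable names are illustrative, not from the original slides) of the least squares calculation for these data:

    # Least squares estimates for the XYZ Auto Sales data
    x = [2, 2, 2, 1, 3]           # number of TV ads
    y = [17, 21, 18, 17, 27]      # number of cars sold

    n = len(x)
    x_bar = sum(x) / n            # 2.0
    y_bar = sum(y) / n            # 20.0

    # b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx                # slope: 5.0
    b0 = y_bar - b1 * x_bar       # intercept: 10.0
    print(f"y-hat = {b0} + {b1}x")

For these data the sketch gives b1 = 5 and b0 = 10, matching the estimated equation used on the later slides.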
13Estimated Regression Equation
- ŷ = 10 + 5x (intercept b0 = 10, slope b1 = 5)
14Excel Output
15Scatter Diagram and Trend Line
16Relationship Among SST, SSR, SSE
(Figure: for each observation, the deviation of the observed y from the mean ȳ is split into the deviation of the estimated ŷ from ȳ and the deviation of the observed y from ŷ, illustrating SSR, SSE, and SST.)
- where
- SST = total sum of squares
- SSR = sum of squares due to regression
- SSE = sum of squares due to error
17Relationship Among SST, SSR, SSE
- Relationship Among SST, SSR, SSE
  SST = SSR + SSE
- where
- SST = total sum of squares
- SSR = sum of squares due to regression
- SSE = sum of squares due to error
18Degrees of Freedom
- Relationship Among SST, SSR, SSE
  SST = SSR + SSE
- SST degrees of freedom = n − 1
- SSR degrees of freedom = number of independent variables (p)
- SSE degrees of freedom = n − p − 1 (see the sketch below for the XYZ data)
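Continuing the Python sketch started after the slide 12 data (it reuses x, y, n, y_bar, b0, and b1 from that block), the sums of squares and their degrees of freedom are:

    # Sums of squares for the XYZ data
    y_hat = [b0 + b1 * xi for xi in x]                       # fitted values
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total: 72
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # error: 22
    ssr = sst - sse                                          # regression: 50
    p = 1                                                    # one independent variable
    print(sst, ssr, sse)                                     # SST = SSR + SSE
    print(n - 1, p, n - p - 1)                               # d.f.: 4, 1, 3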
19Relationship Among SST, SSR, SSE
20The Coefficient of Determination
- The coefficient of determination is the proportion of the variability in the dependent variable y that is explained by x.
  r² = SSR/SST
- where
- SST = total sum of squares
- SSR = sum of squares due to regression
- SSE = sum of squares due to error
21Example XYZ Auto
- Coefficient of Determination
  r² = SSR/SST = 50/72 = .69
- The regression relationship is strong, since 69% of the variation in the number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
22The Correlation Coefficient
- Sample Correlation Coefficient
  r_xy = (sign of b1) √(r²) = (sign of b1) √(coefficient of determination)
- where
- b1 = the slope of the estimated regression equation ŷ = b0 + b1x
23Example XYZ Auto Sales
- Sample Correlation Coefficient
- The sign of b1 in the equation ŷ = 10 + 5x is positive, so r_xy takes the positive square root.
  r_xy = +√(50/72) = +.8333
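A short continuation of the same sketch (ssr, sst, and b1 from the earlier blocks):

    # Coefficient of determination and sample correlation coefficient
    import math
    r2 = ssr / sst                             # 50/72 = 0.694
    r_xy = math.copysign(math.sqrt(r2), b1)    # sign taken from b1: +0.8333
    print(round(r2, 4), round(r_xy, 4))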
24Testing for Significance
To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of β1 is zero.
Two tests are commonly used: the t test and the F test.
Both the t test and the F test require an estimate of σ², the variance of ε in the regression model.
25Testing for Significance
- An Estimate of σ²
- The mean square error (MSE) provides the estimate of σ², and the notation s² is also used.
  s² = MSE = SSE/(n − p − 1)
- where SSE = Σ(yi − ŷi)², n = the number of observations, and p = the number of independent variables.
26Testing for Significance
- An Estimate of σ
- To estimate σ we take the square root of s².
- The resulting s is called the standard error of the estimate.
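In the running sketch (sse, n, and p as defined earlier), MSE and the standard error of the estimate are:

    # Mean square error and standard error of the estimate
    import math
    mse = sse / (n - p - 1)       # s^2 = 22/3 = 7.33
    s = math.sqrt(mse)            # s = 2.708
    print(round(mse, 2), round(s, 3))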
27Sampling Distribution of b1
- Sampling Distribution of b1
- Expected Value: E(b1) = β1
- Standard Deviation: σ_b1 = σ / √Σ(xi − x̄)²
- Estimated Standard Deviation of b1 (also referred to as the standard error of b1): s_b1 = s / √Σ(xi − x̄)²
28Testing for Significance t Test
- Hypotheses
  H0: β1 = 0
  Ha: β1 ≠ 0
- Test Statistic
  t = b1 / s_b1
- Rejection Rule
- Reject H0 if t < −t_α/2 or t > t_α/2
- where t_α/2 is based on a t distribution with n − p − 1 degrees of freedom.
29Example XYZ Auto Sales
- t Test
- Hypotheses: H0: β1 = 0
              Ha: β1 ≠ 0
- Rejection Rule
- For α = .05 and d.f. = 3, t_.025 = 3.182
- Reject H0 if t > 3.182 or t < −3.182
- Test Statistic
- t = b1 / s_b1 = 5/1.91 = 2.61
- Conclusion
- Since 2.61 < 3.182, do not reject H0.
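A sketch of the t test calculation, using scipy (assumed to be available) for the critical value; b1, s, sxx, n, and p come from the earlier blocks:

    # t test for H0: beta1 = 0 on the XYZ data
    import math
    from scipy import stats
    s_b1 = s / math.sqrt(sxx)                          # standard error of b1: 1.915
    t_stat = b1 / s_b1                                 # about 2.61
    t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - p - 1)   # 3.182
    print(round(t_stat, 2), round(t_crit, 3))
    print("reject H0" if abs(t_stat) > t_crit else "do not reject H0")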
30Confidence Interval for ?1
- We can use a 95% confidence interval for β1 to test the hypotheses just used in the t test.
- H0 is rejected if the hypothesized value of β1 is not included in the confidence interval for β1.
31Confidence Interval for ?1
- The form of a confidence interval for β1 is
  b1 ± t_α/2 s_b1
- where b1 is the point estimator and t_α/2 s_b1 is the margin of error.
32Example XYZ Auto Sales
- Rejection Rule
- Reject H0 if 0 is not included in the confidence interval for β1.
- 95% Confidence Interval for β1
  5 ± 3.182(1.91) = 5 ± 6.07, or −1.07 to 11.07
- Conclusion
- Since 0 is included in the interval, we cannot reject H0.
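Using the quantities from the t test sketch above, the 95% confidence interval for β1 is:

    # 95% confidence interval for beta1
    margin = t_crit * s_b1              # about 6.1
    print(b1 - margin, b1 + margin)     # roughly -1.1 to 11.1; 0 is inside, so do not reject H0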
33Testing for Significance F Test
Hypotheses: H0: β1 = 0
            Ha: β1 ≠ 0
Test Statistic: F = MSR/MSE
where MSR = SSR/(regression degrees of freedom) = SSR/(number of independent variables)
MSR = mean square due to regression
34F- Test
- With only one independent variable, the F test will provide the same conclusion as the t test.
- Rejection Rule
- Reject H0 if F > F_α
- where F_α is based on an F distribution with 1 d.f. in the numerator and n − 2 d.f. in the denominator.
35Example XYZ Auto Sales
- F Test
- Hypotheses: H0: β1 = 0
              Ha: β1 ≠ 0
- Rejection Rule
- For α = .05 and d.f. = (1, 3), F_.05 = 10.13
- Reject H0 if F > 10.13.
- Test Statistic
- F = MSR/MSE = 50/7.33 = 6.81
- Conclusion
- Since 6.81 < 10.13, we cannot reject H0.
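A sketch of the F test calculation with scipy (ssr, mse, n, and p from the earlier blocks):

    # F test for overall significance on the XYZ data
    from scipy import stats
    msr = ssr / p                                       # mean square regression: 50
    f_stat = msr / mse                                  # about 6.8
    f_crit = stats.f.ppf(0.95, dfn=p, dfd=n - p - 1)    # 10.13
    print(round(f_stat, 2), round(f_crit, 2))
    print("reject H0" if f_stat > f_crit else "do not reject H0")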
36Some Cautions about the Interpretation of Significance Tests
- Rejecting H0: β1 = 0 and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
- Just because we are able to reject H0: β1 = 0 and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.
37Using the Estimated Regression Equation for Estimation and Prediction
- Confidence Interval Estimate of E(yp): the mean or expected value of the dependent variable y corresponding to the given value x_p.
- Prediction Interval Estimate of yp:
  ŷ_p ± t_α/2 s_ind
- where the confidence coefficient is 1 − α and t_α/2 is based on a t distribution with n − 2 d.f.
38Using the Estimated Regression Equation for Estimation and Prediction
- Confidence Interval Estimate of E(yp): Standard Deviation
  s_ŷp = s √( 1/n + (x_p − x̄)² / Σ(xi − x̄)² )
- where s = √MSE = 2.708
- x_p = the particular or given value of the independent variable x
- ŷ_p = the point estimate of E(yp) when x = x_p
39CONFIDENCE INTERVAL
- Point Estimation
- If 3 TV ads are run prior to a sale, we expect the mean number of cars sold to be
  ŷ = 10 + 5(3) = 25 cars
- Confidence Interval for E(yp)
- The 95% confidence interval estimate of the mean number of cars sold when 3 TV ads are run is
  25 ± (3.182)(2.265) = 17.79 to 32.20 cars
40Prediction Interval
- Prediction Interval Estimate of yp
  ŷ_p ± t_α/2 s_ind
- where the confidence coefficient is 1 − α and t_α/2 is based on a t distribution with n − 2 d.f.
41PREDICTION
- Prediction Interval for yp
- The 95% prediction interval estimate of the number of cars sold in one particular week (a new situation in the future, from the same population) when 3 TV ads are run is
  ŷ = 10 + 5(3) = 25 cars
  25 ± (3.182)(3.53) = 13.8 to 36.2 cars
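The same sketch extended to the prediction interval for an individual week (xp, y_hat_p, and the other names from the block above):

    # 95% prediction interval for yp when xp = 3 ads are run
    import math
    s_ind = s * math.sqrt(1 + 1 / n + (xp - x_bar) ** 2 / sxx)   # about 3.53
    margin = t_crit * s_ind                                      # about 11.2
    print(y_hat_p - margin, y_hat_p + margin)                    # about 13.8 to 36.2 cars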
42Some Cautions about the Interpretation of Significance Tests
- Rejecting H0: β1 = 0 and concluding that the relationship between x and y is significant does not enable us to conclude that a cause-and-effect relationship is present between x and y.
- Just because we are able to reject H0: β1 = 0 and demonstrate statistical significance does not enable us to conclude that there is a linear relationship between x and y.
43Assumptions About the Error Term ε
1. The error ε is a random variable with a mean of zero.
2. The variance of ε, denoted by σ², is the same for all values of the independent variable.
3. The values of ε are independent.
4. The error ε is a normally distributed random variable.
44Residual
- The assumption of constant variance can be checked by looking at a plot of the residuals versus the fitted values, where the ith residual is yi − ŷi.
45Residual Plot Against x
- If the assumption that the variance of ε is the same for all values of x is valid, and the assumed regression model is an adequate representation of the relationship between the variables, then the residual plot should give an overall impression of a horizontal band of points (a sketch of such a plot follows).
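A sketch of such a residual plot with matplotlib (assumed to be installed); x and y_hat come from the earlier blocks:

    # Residual plot against x for the XYZ data
    import matplotlib.pyplot as plt
    residuals = [yi - yh for yi, yh in zip(y, y_hat)]
    plt.scatter(x, residuals)
    plt.axhline(0, linestyle="--")        # reference line at zero
    plt.xlabel("Number of TV ads (x)")
    plt.ylabel("Residual")
    plt.title("Residual plot against x")
    plt.show()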
46Residual Plot Against x
(Figure: good pattern. The residuals form a horizontal band around 0 when plotted against x.)
47CONSTANT VARIANCE
(Figure: residual plot showing constant variance, a band of roughly constant width around 0.)
48Non Constant Variance
(Figure: two residual plots showing nonconstant variance.)
49Residual Plot Against x
(Figure: nonconstant variance. The spread of the residuals changes with x.)
50Residual Plot Against x
(Figure: model form not adequate. The residuals show a curved pattern when plotted against x.)
51Example XYZ Auto Sales
52Standardized Residuals
- Standardized residuals provide a method to test the normal distribution assumption for the error term.
- Standardized Residual for Observation i
  (yi − ŷi) / s_(yi − ŷi)
- where s_(yi − ŷi) = s √(1 − h_i)
- and h_i = 1/n + (xi − x̄)² / Σ(xi − x̄)²
53Standardized Residuals
- If the assumption is satisfied, we should expect to see about 95% of the standardized residuals between −2 and +2 (see the sketch below).
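A sketch of the standardized residual calculation for the XYZ data, continuing the earlier blocks (residuals, s, sxx, x_bar, and n as defined there):

    # Standardized residual: residual / (s * sqrt(1 - h_i))
    import math
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]    # h_i for each observation
    std_resid = [ri / (s * math.sqrt(1 - hi))
                 for ri, hi in zip(residuals, leverage)]
    print([round(sr, 2) for sr in std_resid])    # expect about 95% of these within -2 to +2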
54Influential Observation
55Continued
56Continued
- An influential (high leverage) observation has a leverage value h that is greater than 6/n.
- In this case we do not have an influential observation.
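A short check of the 6/n rule for these data (leverage values from the sketch above):

    # Flag high leverage (influential) observations: h_i > 6/n
    cutoff = 6 / n                                           # 6/5 = 1.2 for the XYZ data
    flagged = [i for i, hi in enumerate(leverage) if hi > cutoff]
    print(flagged)                                           # [] -> no influential observation here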
57Standard Deviation of the ith Residual
- The standard error of the estimate is s = .77.
58Standardized Residual for Observations i
- Ex y1.02.45(x)
- If the assumption is satisfied, we should expect to see about 95% of the standardized residuals between −2 and +2.
59Example With Excel
- Page 587, Problem 45
- Go to Excel: select Tools, choose Data Analysis, choose Regression from the list of analysis tools, and click OK.
- Enter the Y input range and the X input range, select Labels, and select Confidence Level. Select Residuals, Residual Plots, and Standardized Residuals.
60(No Transcript)
61Output
62(No Transcript)
63(No Transcript)
64Checking for Outliers.
- We are going to use the scatter plot of x versus y and the plot of the standardized residuals versus the predicted values. An outlier will not fit the trend shown by the remaining data.
65Leverage Observation
- We will detect influential observations using the leverage value h_i = 1/n + (xi − x̄)² / Σ(xi − x̄)².
- An influential observation has a leverage h_i that is greater than 6/n.
66Problem 51 Using Excel
- Consider the following data.
- Go to Excel: select Tools, choose Data Analysis, choose Regression from the list of analysis tools, and click OK.
- Enter the Y input range and the X input range, select Labels, and select Confidence Level. Select Residuals and Residual Plots.
67Continued
68Continued
69Continued
- We identify an observation as having high leverage if h_i > 6/n; for these data, 6/n = 6/8 = .75. Since the leverage for the observation x = 22, y = 19 is .76, we would identify observation 8 as a high leverage point. Thus, we conclude that observation 8 is an influential observation.
70Continued (Excel)
The last two observations in the data set appear to be outliers, since the standardized residuals for these observations are 2.00 and −2.16, respectively.
71Continued
The scatter diagram indicates that the observation x = 22, y = 19 is an influential observation.