1. Outline
- Least Squares Methods
- Least Squares Estimation
- Interpretation of estimators
- Properties of OLS estimators
- Variance of Y, b, and a
- Hypothesis Test of b and a
- ANOVA table
- Goodness-of-Fit and R²
2. Linear regression model
3. Terminology
- Dependent variable (DV): response variable, left-hand side (LHS) variable
- Independent variables (IV): explanatory variables, right-hand side (RHS) variables, regressors (excluding a or b0)
- a (b0) is an estimator of parameter α (β0)
- b (b1) is an estimator of parameter β (β1)
- a and b are the intercept and slope
4. Least Squares Method
- How do we draw such a line based on the observed data points?
- Suppose an imaginary line y = a + bx
- Imagine a vertical distance (or error) between the line and a data point: e = Y - E(Y)
- This error (or gap) is the deviation of the data point from the imaginary (regression) line
- What are the best values of a and b?
- The a and b that minimize the sum of such errors (deviations of individual data points from the line)
5. Least Squares Method
6. Least Squares Method
- Deviation does not have good properties for computation
- Why do we use squared deviations? (e.g., variance)
- Let us get the a and b that minimize the sum of squared deviations rather than the sum of deviations
- This method is called least squares
7. Least Squares Method
- The least squares method minimizes the sum of squares of errors (deviations of individual data points from the regression line)
- Such a and b are called least squares estimators (estimators of parameters α and β)
- The process of getting parameter estimators (e.g., a and b) is called estimation
- Regress Y on X
- The least squares method is the estimation method of ordinary least squares (OLS)
8. Ordinary Least Squares
- Ordinary least squares (OLS)
- Linear regression model
- Classical linear regression model
- Linear relationship between Y and Xs
- Constant slopes (coefficients of Xs)
- Least squares method
- Xs are fixed; Y is conditional on Xs
- Errors are not related to Xs
- Constant variance of errors
9. Least Squares Method (1)
How do we get the a and b that minimize the sum of squares of errors? (See the sketch below.)
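A sketch of the quantity being minimized, in standard notation (not a verbatim transcription of the slide; Y_i, X_i, and N follow the slides' usage):

```latex
% Sum of squared errors (deviations of the data points from the line y = a + bx)
\mathrm{SSE}(a,b) = \sum_{i=1}^{N} e_i^{2} = \sum_{i=1}^{N} \left(Y_i - a - bX_i\right)^{2}
```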
10. Least Squares Method (2)
- Linear algebraic solution
- Compute a and b so that the partial derivatives with respect to a and b are equal to zero (see below)
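Setting those partial derivatives to zero gives the usual normal equations and their solution; this is the standard textbook derivation rather than a transcription of the slide:

```latex
% Normal equations: both partial derivatives of SSE set to zero
\frac{\partial \mathrm{SSE}}{\partial a} = -2\sum_{i}\left(Y_i - a - bX_i\right) = 0, \qquad
\frac{\partial \mathrm{SSE}}{\partial b} = -2\sum_{i} X_i\left(Y_i - a - bX_i\right) = 0

% Solving the two equations gives the familiar estimators
b = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^{2}}, \qquad
a = \bar{Y} - b\bar{X}
```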
11. Least Squares Method (3)
Take the partial derivative with respect to b and plug in the a you got, a = Ybar - b·Xbar
12. Least Squares Method (4)
The least squares method is an algebraic solution that minimizes the sum of squares of errors (the variance component of error).
Not recommended
13. OLS Example 10-5 (1)
No    x     y      x-xbar  y-ybar  (x-xbar)(y-ybar)  (x-xbar)²
1     43    128    -14.5   -8.5    123.25            210.25
2     48    120    -9.5    -16.5   156.75            90.25
3     56    135    -1.5    -1.5    2.25              2.25
4     61    143    3.5     6.5     22.75             12.25
5     67    141    9.5     4.5     42.75             90.25
6     70    152    12.5    15.5    193.75            156.25
Mean  57.5  136.5
Sum   345   819                    541.5             561.5
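A quick computational check of this table and the resulting estimates (a minimal sketch; the variable names are mine, not from the slides):

```python
# Reproduce the deviation-based OLS computation from Example 10-5.
x = [43, 48, 56, 61, 67, 70]
y = [128, 120, 135, 143, 141, 152]

n = len(x)
xbar = sum(x) / n                     # 57.5
ybar = sum(y) / n                     # 136.5

sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # 541.5
sxx = sum((xi - xbar) ** 2 for xi in x)                       # 561.5

b = sxy / sxx            # slope, about 0.9644
a = ybar - b * xbar      # intercept, about 81.05
print(a, b)
```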
14. OLS Example 10-5 (2), NO!
No    x     y      xy     x²
1     43    128    5504   1849
2     48    120    5760   2304
3     56    135    7560   3136
4     61    143    8723   3721
5     67    141    9447   4489
6     70    152    10640  4900
Mean  57.5  136.5
Sum   345   819    47634  20399
15. OLS Example 10-5 (3)
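The arithmetic this step works through, reconstructed from the deviation sums on slide 13 and the estimates used on later slides:

```latex
% Slope and intercept from the deviation sums
b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^{2}} = \frac{541.5}{561.5} \approx 0.9644, \qquad
a = \bar{y} - b\bar{x} = 136.5 - 0.9644 \times 57.5 \approx 81.05
```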
16. What Are a and b?
- a is an estimator of its parameter α
- a is the intercept, the point where the regression line meets the y axis
- b is an estimator of its parameter β
- b is the slope of the regression line
- b is constant regardless of the values of Xs
- b is more important than a since it is what researchers want to know.
17. How to Interpret b?
- For a unit increase in x, the expected change in y is b, holding other things (variables) constant.
- For a unit increase in x, we expect y to increase by b, holding other things (variables) constant.
- For a unit increase in x, we expect y to increase by .964, holding other variables constant.
18. Properties of OLS Estimators
- The outcome of the least squares method is the OLS parameter estimators a and b.
- OLS estimators are linear
- OLS estimators are unbiased (centered on the true parameters)
- OLS estimators are efficient (small variance)
- Gauss-Markov Theorem: among linear unbiased estimators, the least squares (OLS) estimator has minimum variance → BLUE (best linear unbiased estimator)
19. Hypothesis Test of a and b
- How reliable are the a and b we compute?
- A t-test (a Wald test in general) can answer this
- The test statistic is the standardized effect size (effect size / standard error); see the sketch below
- The effect sizes are a - 0 and b - 0, assuming 0 is the hypothesized value: H0: α = 0, H0: β = 0
- The degrees of freedom are N - K, where K is the number of regressors + 1
- How do we compute the standard error (standard deviation)?
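A sketch of the test statistics in standard notation (not a verbatim transcription of the slide):

```latex
% Wald/t statistics: standardized effect sizes, df = N - K, hypothesized values are 0
t_b = \frac{b - 0}{SE(b)} \sim t(N-K), \qquad
t_a = \frac{a - 0}{SE(a)} \sim t(N-K)
```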
20. Variance of b (1)
- b is a random variable that changes across samples.
- b is a weighted sum (a linear combination) of the random variable Y (written out below)
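Written out (a standard derivation; k_i is my label for the weights, not the slide's):

```latex
% b as a linear combination of the Y's
% (the Ybar term drops out because the deviations X_i - Xbar sum to zero)
b = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{SS_x}
  = \sum_i k_i Y_i, \qquad
k_i = \frac{X_i - \bar{X}}{SS_x}, \qquad
SS_x = \sum_i (X_i - \bar{X})^{2}
```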
21. Variance of b (2)
- The variance of Y (of the error) is s²
- Var(kY) = k²·Var(Y) = k²s²
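Applying that rule term by term to b = Σ k_i·Y_i gives the variance used on the next slide (a standard derivation, not copied from the slide):

```latex
% Var(kY) = k^2 Var(Y) applied to each term of b = sum_i k_i Y_i
\operatorname{Var}(b) = \sum_i k_i^{2}\operatorname{Var}(Y_i)
 = s^{2}\sum_i \frac{(X_i - \bar{X})^{2}}{SS_x^{2}}
 = \frac{s^{2}}{SS_x}
```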
22. Variance of a
- a = Ybar - b·Xbar
- Var(b) = s²/SSx, where SSx = Σ(X - Xbar)²
- Var(ΣY) = Var(Y1) + Var(Y2) + ... + Var(Yn) = ns²
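Combining a = Ybar - b·Xbar with the two variance results above gives the standard expression (a sketch; the slide's own algebra is not shown in this extract):

```latex
% Variance of the intercept estimator a = Ybar - b*Xbar
% (Cov(Ybar, b) = 0 because the weights k_i sum to zero)
\operatorname{Var}(a) = \operatorname{Var}(\bar{Y}) + \bar{X}^{2}\operatorname{Var}(b)
 = \frac{s^{2}}{n} + \bar{X}^{2}\,\frac{s^{2}}{SS_x}
 = s^{2}\left(\frac{1}{n} + \frac{\bar{X}^{2}}{SS_x}\right)
```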
Now, how do we compute the variance of Y, s²?
23. Variance of Y or Error
- The variance of Y is estimated from the residuals (errors), Y - Yhat (see the sketch below)
- A hat means an estimator of the parameter
- Yhat is the predicted value of Y (from a + bX): plug in x, given a and b, to get Yhat
- Since a regression model includes K parameters (a and b in simple regression), the degrees of freedom are N - K
- The numerator is the SSE in the ANOVA table
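In symbols, the usual mean squared error estimator, consistent with the SSE and degrees of freedom just described:

```latex
% Estimated error variance = mean squared error
s^{2} = \mathrm{MSE} = \frac{\mathrm{SSE}}{N-K} = \frac{\sum_i (Y_i - \hat{Y}_i)^{2}}{N-K}
```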
24. Illustration (1)
No    x     y      x-xbar  y-ybar  (x-xbar)(y-ybar)  (x-xbar)²  yhat    (y-yhat)²
1     43    128    -14.5   -8.5    123.25            210.25     122.52  30.07
2     48    120    -9.5    -16.5   156.75            90.25      127.34  53.85
3     56    135    -1.5    -1.5    2.25              2.25       135.05  0.00
4     61    143    3.5     6.5     22.75             12.25      139.88  9.76
5     67    141    9.5     4.5     42.75             90.25      145.66  21.73
6     70    152    12.5    15.5    193.75            156.25     148.55  11.87
Mean  57.5  136.5
Sum   345   819                    541.5             561.5              127.2876
SSE = 127.2876, MSE = 31.8219
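From the last column, the error variance and the standard errors used on the next slides follow directly (a worked check using the formulas sketched above):

```latex
% MSE and standard errors for Example 10-5 (n = 6, K = 2)
\mathrm{MSE} = \frac{127.2876}{6-2} = 31.8219, \qquad
SE(b) = \sqrt{\frac{31.8219}{561.5}} \approx 0.2381, \qquad
SE(a) = \sqrt{31.8219\left(\frac{1}{6} + \frac{57.5^{2}}{561.5}\right)} \approx 13.88
```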
25. Illustration (2): Test b
- How do we test whether beta is zero (no effect)?
- Like Y, the estimators a and b follow normal distributions; standardized by their estimated standard errors, they follow the t distribution
- b = .9644, SE(b) = .2381, df = N - K = 6 - 2 = 4
- Hypothesis testing:
- 1. H0: β = 0 (no effect), Ha: β ≠ 0 (two-tailed)
- 2. Significance level = .05, CV = 2.776, df = 6 - 2 = 4
- 3. TS = (.9644 - 0)/.2381 = 4.051 ~ t(N - K)
- 4. TS (4.051) > CV (2.776), so reject H0
- 5. Beta (not b) is not zero. There is a significant impact of X on Y
26. Illustration (3): Test a
- How do we test whether alpha is zero?
- Like Y, the estimators a and b follow normal distributions; standardized by their estimated standard errors, they follow the t distribution
- a = 81.0481, SE(a) = 13.8809, df = N - K = 6 - 2 = 4
- Hypothesis testing:
- 1. H0: α = 0, Ha: α ≠ 0 (two-tailed)
- 2. Significance level = .05, CV = 2.776
- 3. TS = (81.0481 - 0)/13.8809 = 5.8388 ~ t(N - K)
- 4. TS (5.839) > CV (2.776), so reject H0
- 5. Alpha (not a) is not zero. The intercept is discernible from zero (significant intercept).
27. Questions
- How do we test H0: β0 (α) = β1 = β2 = 0?
- Remember that a t-test compares only two group means, while ANOVA compares more than two group means simultaneously.
- The same idea applies in linear regression.
- Construct the ANOVA table by partitioning the variance of Y; the F test examines the above H0
- The ANOVA table provides key information about a regression model
28. Partitioning Variance of Y (1)
29. Partitioning Variance of Y (2)
30. Partitioning Variance of Y (3)
yhat = 81.05 + .96X
No    x     y      yhat    (y-ybar)²  (yhat-ybar)²  (y-yhat)²
1     43    128    122.52  72.25      195.54        30.07
2     48    120    127.34  272.25     83.94         53.85
3     56    135    135.05  2.25       2.09          0.00
4     61    143    139.88  42.25      11.39         9.76
5     67    141    145.66  20.25      83.94         21.73
6     70    152    148.55  240.25     145.32        11.87
Mean  57.5  136.5          SST        SSM           SSE
Sum   345   819            649.5000   522.2124      127.2876
- e.g., 122.52 ≈ 81.05 + .96(43) and 148.55 ≈ 81.05 + .96(70)
- SST = SSM + SSE: 649.5 = 522.2 + 127.3
31. ANOVA Table
- H0: all parameters are zero, β0 = β1 = 0
- Ha: at least one parameter is not zero
- CV is 12.22 with (1, 4) df; TS > CV, so reject H0

Sources    Sum of Squares  DF   Mean Squares      F
Model      SSM             K-1  MSM = SSM/(K-1)   MSM/MSE
Residual   SSE             N-K  MSE = SSE/(N-K)
Total      SST             N-1

Sources    Sum of Squares  DF   Mean Squares  F
Model      522.2124        1    522.2124      16.41047
Residual   127.2876        4    31.8219
Total      649.5000        5
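As a check on the numeric table, and tying it back to the t-test of b (in simple regression the F statistic equals the square of the t statistic for b):

```latex
% F from the ANOVA table, and the F = t^2 identity for simple regression
F = \frac{\mathrm{MSM}}{\mathrm{MSE}} = \frac{522.2124}{31.8219} \approx 16.41, \qquad
t_b^{2} = 4.051^{2} \approx 16.41
```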
32. R² and Goodness-of-Fit
- Goodness-of-fit measures evaluate how well a regression model fits the data
- The smaller the SSE, the better the model fits
- The F test examines whether all parameters are zero (a large F and a small p-value indicate good fit)
- R² (coefficient of determination) is SSM/SST; it measures how much of the overall variance of Y the model explains
- R² = SSM/SST = 522.2/649.5 = .80
- A large R² means the model fits the data well
33. Myths and Misunderstandings about R²
- R² is the Karl Pearson correlation coefficient squared: r² = .8967² = .80
- If a regression model includes many regressors, R² is less useful, if not useless.
- Adding any regressor always increases R², regardless of the relevance of the regressor
- Adjusted R² gives a penalty for adding regressors: Adj. R² = 1 - [(N - 1)/(N - K)](1 - R²) (worked below)
- R² is not a panacea, although its interpretation is intuitive; if the intercept is omitted, R² is incorrect.
- Check the specification, F, SSE, and individual parameter estimators to evaluate your model; a model with a smaller R² can be better in some cases.
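Applying that adjustment formula to this example, using the slide's rounded R² = .80 with N = 6 and K = 2 (my arithmetic, not shown on the slide):

```latex
% Adjusted R^2 for Example 10-5
\text{Adj. } R^{2} = 1 - \frac{N-1}{N-K}\,(1 - R^{2}) = 1 - \frac{5}{4}(1 - .80) = .75
```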
34. Interpolation and Extrapolation
- Confidence interval of E(Y|X), where x is within the range of the data x: interpolation
- Confidence interval of Y|X, where x is beyond the range of the data x: extrapolation
- Extrapolation involves a penalty and a danger: it widens the confidence interval and is less reliable (see the sketch below)
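A sketch of why the interval widens away from the data, using the standard confidence interval for the mean response (this formula is not shown in the extract):

```latex
% Confidence interval for the mean response E(Y|x0)
\hat{Y}_0 \pm t_{\alpha/2,\,N-K}\; s\,\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^{2}}{SS_x}}
```

The (x0 - xbar)² term grows as x0 moves away from xbar, so intervals for extrapolated x values are wider and the predictions are less reliable.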