Title: Inference
1 Chapter 5: Inference in the Simple Regression Model
October 4, 1997
2 Assumptions of the Simple Linear Regression Model
- 1. y_t = β1 + β2·x_t + e_t
- 2. E(e_t) = 0, so E(y_t) = β1 + β2·x_t
- 3. var(e_t) = σ² = var(y_t)
- 4. cov(e_t, e_s) = cov(y_t, y_s) = 0
- 5. x_t is not constant (it takes at least two different values)
- 6. e_t ~ N(0, σ²), so y_t ~ N(β1 + β2·x_t, σ²)
3 Probability Distribution of Least Squares Estimators
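The body of this slide is a graphic that was not transcribed; under assumptions 1-6, the standard sampling distributions of the least squares estimators, written in LaTeX, are:

  b_1 \sim N\left(\beta_1,\ \frac{\sigma^2 \sum_t x_t^2}{T \sum_t (x_t - \bar{x})^2}\right),
  \qquad
  b_2 \sim N\left(\beta_2,\ \frac{\sigma^2}{\sum_t (x_t - \bar{x})^2}\right)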
4 Error Variance Estimation
- The unbiased estimator of the error variance is built from the least squares residuals.
- Scaled by the true variance σ², it has a chi-square distribution (see the expressions below).
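The formulas on this slide are graphics that were not transcribed; the standard expressions for this model, with T observations and two estimated coefficients, are:

  \hat{\sigma}^2 = \frac{\sum_t \hat{e}_t^2}{T - 2},
  \qquad
  E[\hat{\sigma}^2] = \sigma^2,
  \qquad
  \frac{(T-2)\,\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{(T-2)}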
5 Simple Linear Regression
- y_t = β1 + β2·x_t + e_t, where E(e_t) = 0
- y_t ~ N(β1 + β2·x_t, σ²)
- Since E(y_t) = β1 + β2·x_t, the error is e_t = y_t − β1 − β2·x_t
- Therefore, e_t ~ N(0, σ²).
6 The Chi-Square Distribution
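The body of this slide is a graphic; the standard definition it refers to is that if Z_1, ..., Z_m are independent N(0, 1) random variables, then

  V = \sum_{i=1}^{m} Z_i^2 \sim \chi^2_{(m)},
  \qquad
  E[V] = m,
  \qquad
  \mathrm{var}(V) = 2m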
7 In the Regression Model
- e_t ~ N(0, σ²) implies e_t/σ ~ N(0, 1): Standard Normal.
- (e_t/σ)² ~ χ²_(1): Chi-Square.
8 Sum of Chi-Squares
- Σ_{t=1..T} (e_t/σ)² = (e_1/σ)² + (e_2/σ)² + ... + (e_T/σ)²
- Uncorrelated normals are independent.
- χ²_(1) + χ²_(1) + ... + χ²_(1) = χ²_(T)
- Therefore Σ_{t=1..T} (e_t/σ)² ~ χ²_(T)
9 Chi-Square Degrees of Freedom
- The true errors e_t = y_t − β1 − β2·x_t are not observable.
- Unlike the errors, the least squares residuals are not independent, since they use up two degrees of freedom by using b1 and b2 to estimate β1 and β2.
10 (No Transcript)
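The graphic on this slide is not transcribed; the standard result that completes the argument at this point is that the residual-based sum loses those two degrees of freedom:

  \sum_{t=1}^{T} \left(\frac{\hat{e}_t}{\sigma}\right)^2
  = \frac{(T-2)\,\hat{\sigma}^2}{\sigma^2}
  \sim \chi^2_{(T-2)}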
11 Create a standardized normal random variable, Z, by subtracting the mean of b2 and dividing by its standard deviation.
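The formula itself is a graphic; written out, using the variance of b2 given earlier, it is:

  Z = \frac{b_2 - \beta_2}{\sqrt{\mathrm{var}(b_2)}}
    = \frac{b_2 - \beta_2}{\sqrt{\sigma^2 / \sum_t (x_t - \bar{x})^2}}
    \sim N(0, 1)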
12 The t-Distribution (Ch. 2.7)
13 (No Transcript)
14 (No Transcript)
15 Notice the cancellations
16 (No Transcript)
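Slides 12 through 16 are graphics that were not transcribed. The derivation they sketch is the standard one: a Student-t random variable is a standard normal divided by the square root of an independent chi-square over its degrees of freedom, and when Z is combined with (T-2)\hat{\sigma}^2/\sigma^2 the unknown \sigma^2 cancels (the cancellations noted on slide 15):

  t = \frac{Z}{\sqrt{V/(T-2)}}
    = \frac{(b_2 - \beta_2)\big/\sqrt{\sigma^2/\sum_t (x_t - \bar{x})^2}}
           {\sqrt{\hat{\sigma}^2/\sigma^2}}
    = \frac{b_2 - \beta_2}{\sqrt{\hat{\sigma}^2/\sum_t (x_t - \bar{x})^2}}
    = \frac{b_2 - \beta_2}{\mathrm{se}(b_2)}
    \sim t_{(T-2)}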
17 Student's t-statistic
t = (b2 − β2)/se(b2) has a Student-t distribution with T − 2 degrees of freedom.
18 Figure 5.1: Student-t Distribution
[Figure: density f(t), symmetric about 0, with tail areas of α/2 beyond −tc and beyond tc. See Table 2 at the front of the book for critical values.]
19 Student-t vs. Normal Distribution
- 1. Both are symmetric, bell-shaped distributions.
- 2. The Student-t distribution has fatter tails than the normal.
- 3. The Student-t converges to the normal as the sample size goes to infinity.
- 4. The Student-t distribution depends on its degrees of freedom (df).
- 5. The normal is a good approximation of the Student-t, to the first few decimal places, when df is about 30 or more.
20 Probability Statements
P(t ≥ tc) = α/2 and P(t ≤ −tc) = α/2
P(−tc ≤ t ≤ tc) = 1 − α
21 Deriving a Confidence Interval
22 (No Transcript)
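The derivation on these slides is not transcribed; the standard steps, starting from the probability statement on slide 20 and substituting the t-statistic, are:

  P(-t_c \le t \le t_c) = 1 - \alpha
  \;\Longrightarrow\;
  P\!\left(-t_c \le \frac{b_2 - \beta_2}{\mathrm{se}(b_2)} \le t_c\right) = 1 - \alpha
  \;\Longrightarrow\;
  P\big(b_2 - t_c\,\mathrm{se}(b_2) \le \beta_2 \le b_2 + t_c\,\mathrm{se}(b_2)\big) = 1 - \alpha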
23 Confidence Intervals
A (1 − α)×100% C.I. for β1:
  [ b1 − t_(α/2)·se(b1),  b1 + t_(α/2)·se(b1) ]
A (1 − α)×100% C.I. for β2:
  [ b2 − t_(α/2)·se(b2),  b2 + t_(α/2)·se(b2) ]
24 Text Example
Where is β2? In or out? Who knows? BUT, the procedure we used works 95% of the time.
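For reference, the numbers behind this example are the ones in the SAS output on the next two slides; a quick check of the interval arithmetic:

  b_2 = 0.1283,\quad \mathrm{se}(b_2) = 0.0305,\quad t_c = t_{(.025,\,38)} = 2.024

  0.1283 \pm 2.024 \times 0.0305 \;\approx\; [\,0.0665,\ 0.1901\,]

which matches the LB and UB values printed by the program below.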
25 SAS program for the interval estimate

data food;                    * create dataset;
  infile 'table3.1';          * open data file;
  input y x;                  * read variables;
proc reg;                     * estimate regression model;
  model y = x / covb;         * specify the model;

* Section 5.1.2;
data interval;                * create dataset;
  b2   = .128289;             * b2;
  seb2 = .03053925;           * se(b2);
  df   = 38;                  * degrees of freedom;
  tc   = tinv(.975,df);       * t-critical value;
  lb   = b2 - tc*seb2;        * lower bound;
  ub   = b2 + tc*seb2;        * upper bound;
proc print;                   * print results;
run;
26 Model: MODEL1   Dependent Variable: Y

Analysis of Variance

                        Sum of          Mean
Source       DF        Squares         Square    F Value   Prob>F
Model         1    25221.22299    25221.22299     17.647   0.0002
Error        38    54311.33145     1429.24556
C Total      39    79532.55444

Root MSE    37.80536    R-square   0.3171
Dep Mean   130.31300    Adj R-sq   0.2991
C.V.        29.01120

Parameter Estimates

                     Parameter       Standard    T for H0:
Variable   DF         Estimate          Error  Parameter=0   Prob > |T|
INTERCEP    1        40.767556    22.13865442        1.841       0.0734
X           1         0.128289     0.03053925        4.201       0.0002

Covariance of Estimates

COVB           INTERCEP              X
INTERCEP       490.12001955   -0.650986935
X              -0.650986935    0.000932646

OBS      B2        SEB2      DF      TC         LB         UB
  1    0.12829   0.030539    38    2.02439   0.066466   0.19011

Interval Estimate for β2
27 What the heck is Confidence?
- We are confident in the procedure used to create the interval estimates.
- When the procedure is used in many samples of size T, a fraction (1 − α) of the constructed intervals will contain the true β2.
- Any one interval, based on one sample of size T, may or may not contain β2.
28 Experiment: Collect 10 samples of size T = 40 and estimate the food expenditure model. The estimates change from sample to sample.
29 Construct interval estimates (95% confidence intervals) for the ten samples. Note how their location and width change from sample to sample (lb = lower bound, ub = upper bound).
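Slide 29's table of ten intervals is not reproduced here. As a rough illustration of the experiment, here is a minimal SAS sketch that generates ten artificial samples of size T = 40 and computes a 95% interval for β2 in each; the parameter values and x values are made-up assumptions, not the text's data.

* Hypothetical illustration: the parameter values and x values below are assumptions, not the text's data;
data sim;
  beta1 = 40; beta2 = .13; sigma = 38;          * assumed "true" parameters;
  do sample = 1 to 10;                          * ten samples...;
    do t = 1 to 40;                             * ...each of size T = 40;
      x = 200 + 500*ranuni(1234);               * arbitrary regressor values;
      y = beta1 + beta2*x + sigma*rannor(1234); * generate y under the model;
      output;
    end;
  end;
run;
proc reg data=sim outest=est tableout noprint;  * one regression per sample;
  by sample;
  model y = x;
run;
proc print data=est;                            * PARMS, L95B, U95B rows give b2 and its 95% bounds;
  where _TYPE_ in ('PARMS','L95B','U95B');
run;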
30 Hypothesis Tests
- 1. A null hypothesis, H0.
- 2. An alternative hypothesis, H1.
- 3. A test statistic.
- 4. A rejection region.
31 The Null Hypothesis
This hypothesis is a belief we maintain until proven otherwise.
32 The Alternative Hypotheses
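The hypotheses themselves appear only as graphics on slides 31 and 32; for a coefficient \beta_2 and a hypothesized value c, the usual forms are:

  H_0:\ \beta_2 = c
  \qquad\text{versus}\qquad
  H_1:\ \beta_2 \ne c \ \text{(two-tail)},
  \quad
  H_1:\ \beta_2 > c \ \text{or}\ H_1:\ \beta_2 < c \ \text{(one-tail)}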
33 The Test Statistic
- If the null hypothesis is true, t has the Student-t distribution written out below.
- If the null hypothesis is not true, t has some other probability distribution.
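The formula on the slide is a graphic; written out for H_0: \beta_2 = c, it is:

  t = \frac{b_2 - c}{\mathrm{se}(b_2)} \sim t_{(T-2)} \quad \text{if } H_0 \text{ is true}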
34 Two-Tailed Test
[Figure: t-distribution density f(t) with rejection regions of area α/2 in each tail, beyond −tc and beyond tc; the region between −tc and tc is "do not reject".]
Probability of a Type I error = α
35 One-Tailed Test
[Figure: t-distribution density f(t) with a single rejection region of area α in the right tail, beyond tc; everything to the left is "do not reject". Note: ignore the left tail.]
Probability of a Type I error = α
36 Rejection Rules
- 1. Two-Sided Test: If the value of the test statistic falls in the critical region in either tail of the t-distribution, then we reject the null hypothesis in favor of the alternative. Otherwise, we do not reject the null hypothesis.
- 2. Left-Tail Test: If the value of the test statistic falls in the critical region, which lies in the left tail of the t-distribution, then we reject the null hypothesis in favor of the alternative. Otherwise, we do not reject the null hypothesis.
- 3. Right-Tail Test: If the value of the test statistic falls in the critical region, which lies in the right tail of the t-distribution, then we reject the null hypothesis in favor of the alternative. Otherwise, we do not reject the null hypothesis.
37 Failure to Reject Does Not Mean the Hypothesis Is True!
When we fail to reject the null hypothesis, that does not mean that we can conclude that the null hypothesis is true.
Failure to reject the null hypothesis is a rather weak conclusion, since it only means that the data are compatible with the null hypothesis.
If the null hypothesis β2 = 0 is not rejected, then null hypotheses such as β2 = 0.1, β2 = −0.2, etc. may well be compatible with the data too, so tests of these hypotheses would not reject them either.
38 p-value for a Two-Tail Test
[Figure: p-value = .000155, split as p/2 = .0000775 in each tail, at t = 4.20 and t = −4.20; ignore the scale.]
39 Using p-values
When the p-value is less than or equal to α, the level of significance of the test, we reject H0 and accept H1.
When the p-value is greater than α, we do not reject H0.
Logic: if p ≤ α, then the computed t-value lies at or beyond the critical value, that is, t ≥ tc or t ≤ −tc, so we reject H0.
40 Where is the α = .05 critical value?
[Figure: p-value = .000155, or about .0002, at t = 4.20 and t = −4.20; ignore the scale.]
41
- We make a correct decision if:
  - The null hypothesis is false and we decide to reject it.
  - The null hypothesis is true and we decide not to reject it.
- Our decision is incorrect if:
  - The null hypothesis is true and we decide to reject it. This is a Type I error.
  - The null hypothesis is false and we decide not to reject it. This is a Type II error.
42 Type I and Type II Errors
- Type I error: We make the mistake of rejecting the null hypothesis when it is true. α = P(rejecting H0 when it is true).
- Type II error: We make the mistake of failing to reject the null hypothesis when it is false. The probability of a Type II error is P(failing to reject H0 when it is false).
43 Format for Hypothesis Testing
- 1. Determine the null and alternative hypotheses.
- 2. Specify the test statistic and its distribution as if the null hypothesis were true.
- 3. Select α and determine the rejection region.
- 4. Calculate the sample value of the test statistic.
- 5. State your conclusion.
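As a concrete illustration of these five steps, here is the β2 = 0 significance test that the SAS code on slide 45 carries out for the food expenditure model, with the numbers taken from that output:

  1.\ H_0: \beta_2 = 0,\quad H_1: \beta_2 \ne 0
  2.\ t = b_2/\mathrm{se}(b_2) \sim t_{(38)} \ \text{if } H_0 \text{ is true}
  3.\ \alpha = .05 \Rightarrow t_c = 2.024;\ \text{reject if } |t| \ge 2.024
  4.\ t = 0.1283/0.0305 \approx 4.20
  5.\ |4.20| \ge 2.024,\ \text{so reject } H_0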
44 Text Example
Do not reject H0.
45 SAS program for the two-tail t-test

data ttest;                            * create dataset;
  b2    = .128289;                     * b2;
  seb2  = .03053925;                   * se(b2);
  df    = 38;                          * degrees of freedom;
  tc    = tinv(.975,df);               * t-critical value;
  tstat = b2/seb2;                     * t-stat for H0: beta2 = 0 vs. H1: beta2 not 0;
  * Section 5.2.7;
  pval  = 2*(1-probt(abs(tstat),df));  * p-value for two-tailed test;
proc print;                            * print results;
run;

OBS      B2        SEB2      DF       TC        TSTAT       PVAL
  1    0.12829   0.030539    38     2.02439    4.20079    .00015494
46 p-value for a One-Tail Test
[Figure: p-value = .0000775, all in the upper tail, at t = 4.20; ignore the scale.]
47 Text Example (One-Tail Test)
Reject H0, accept H1.
48 SAS program for the one-tail t-test

* Section 5.2.9;
data ttest;                           * create dataset;
  b2    = .128289;                    * b2;
  seb2  = .03053925;                  * se(b2);
  df    = 38;                         * degrees of freedom;
  tc    = tinv(.95,df);               * upper t-critical value for one-tail test;
  tstat = b2/seb2;                    * t-stat for H0: beta2 = 0 vs. H1: beta2 > 0;
  pval  = 1-probt(abs(tstat),df);     * p-value for one-tailed test (upper tail);
proc print;
run;

OBS      B2        SEB2      DF       TC        TSTAT       PVAL
  1    0.12829   0.030539    38     1.68595    4.20079    .000077472
49 Practical vs. Statistical Significance in Economics
- Practically but not statistically significant: When the sample size is very small, a large average gap between the salaries of men and women might not be statistically significant.
- Statistically but not practically significant: When the sample size is very large, a tiny correlation (say, ρ = 0.00000001) between the winning numbers in the PowerBall Lottery and the Dow-Jones Stock Market Index might be statistically significant.
50 Prediction Intervals
A (1 − α)×100% prediction interval for y0 is:
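The formula itself is a graphic; the standard expression for this model, with forecast \hat{y}_0 = b_1 + b_2 x_0 at a given x_0, is:

  \hat{y}_0 \pm t_c\,\mathrm{se}(f),
  \qquad
  \mathrm{se}(f) = \sqrt{\hat{\sigma}^2\left[1 + \frac{1}{T} + \frac{(x_0 - \bar{x})^2}{\sum_t (x_t - \bar{x})^2}\right]}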
51 An Example: Salary and Education
y = beginning salary (in dollars); x = education, in years; T = 93.
Data are from Harris Bank, Chicago, 1977.
52 SAS program for the salary example

data salary;                  * create dataset;
  infile 'salary.dat';        * read data file;
  input y x;                  * input variables;
data out;                     * create dataset;
  input y x;                  * input variables;
  * input data;
  cards;
. 13
;
data all;                     * create combined dataset;
  set salary out;             * combine datasets;
proc reg;                     * estimate regression model;
  model y = x / cli;          * specify model and cli option;
data;                         * create dataset;
  tc2 = tinv(.975,91);        * two-tailed critical value;
  tc1 = tinv(.95,91);         * one-tailed critical value;
proc print;                   * print critical values;
run;
53 Parameter Estimates

                     Parameter       Standard    T for H0:
Variable   DF         Estimate          Error  Parameter=0   Prob > |T|
INTERCEP    1      3818.559794   377.43765814       10.117       0.0001
X           1       128.085932    29.69671216        4.313       0.0001

OBS      TC2       TC1
  1    1.98638   1.66177
54 Is Education a Significant Factor in Explaining Starting Salary?
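The test itself is shown as a graphic; using the output on slide 53, the computation is:

  t = \frac{b_2}{\mathrm{se}(b_2)} = \frac{128.086}{29.697} \approx 4.313

which exceeds both the one-tail critical value 1.662 and the two-tail critical value 1.986 at α = .05, so H0: β2 = 0 is rejected: education is a significant factor.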
55 95% Interval Estimate for β2
[ b2 − t_(α/2)·se(b2),  b2 + t_(α/2)·se(b2) ]
[ 128.086 − 1.98638(29.6967),  128.086 + 1.98638(29.6967) ]  ≈  ( 69.10, 187.07 )
We estimate that an additional year of education will increase starting salary by between $69.10 and $187.07.