Title: Lecture 5, The Problem of Statistical Inference (Chapters 5 and 8)
1 Lecture 5: The Problem of Statistical Inference (Chapters 5 and 8)
- Hypothesis Testing in the Two-Variable Regression Model (continued)
- Testing Hypotheses about a Regression Coefficient
- Test of Significance Approach
- Analysis of Variance Approach
2 Hypothesis Testing in the Multiple Regression Model
- Introduction
- Testing Joint Hypotheses
- Testing Significance of a Group of Coefficients
- Testing Significance of the Overall Model
- Testing for Causality
- Testing Linear Restrictions on Coefficients
- Testing Equality of Two Regression Coefficients
- Testing Structural Stability of Regression Models
3 Quick Review
- Last time we saw that in the CNLR model we can test a null hypothesis such as H0: β2 = β2* against, say, a two-sided alternative such as H1: β2 ≠ β2*.
- We said one way to do this is to use the t test of significance, where
- t = (β̂2 - β2*)/SE(β̂2) ~ t(n-2)
4 Quick Review
- So, once we estimate the regression equation, we compute the above t ratio.
- Next, we choose a level of significance, α, and use it to look up the critical t value from the t table.
- Finally, we use an appropriate decision rule to decide whether or not we should reject the null in favor of the alternative (see the sketch below).
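A minimal sketch of this procedure in code (not part of the lecture); the estimate, standard error, hypothesized value, and sample size below are all hypothetical placeholders.

```python
# Minimal sketch of the t test of significance; all numbers are hypothetical.
from scipy import stats

beta2_hat = 0.50       # hypothetical point estimate of beta2
se_beta2 = 0.12        # hypothetical standard error
beta2_star = 0.0       # value of beta2 under H0
n, k = 32, 2           # sample size and number of estimated parameters

t_stat = (beta2_hat - beta2_star) / se_beta2
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - k)   # two-sided critical value at the 5% level

print(f"t = {t_stat:.2f}, critical t = {t_crit:.2f}")
print("Reject H0" if abs(t_stat) > t_crit else "Do not reject H0")
```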
5 Choosing the Level of Significance
- How should we choose the level of significance?
- There is no general rule to follow.
- It is customary to use 1%, 5%, or 10%.
- Sometimes the choice can be made based on the cost of committing a type I error relative to that of committing a type II error.
- You should choose a high level of significance if you suspect the test has low power.
6 The P-Value
- Instead of using an arbitrary level of significance, nowadays we use the p-value, which is also known as the exact level of significance or the marginal significance level.
- This is the lowest level of significance at which a given null hypothesis can be rejected.
- Note that for a given sample size, as the t ratio increases in absolute value, the p-value decreases (see the sketch below).
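As an illustration (not from the lecture), the exact level of significance for a given t ratio can be computed directly; the t value and degrees of freedom below are hypothetical.

```python
# Minimal sketch: two-sided p-value (exact level of significance) for a t ratio.
from scipy import stats

t_stat, df = 2.10, 30              # hypothetical t ratio and degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(f"p-value = {p_value:.4f}")  # H0 can be rejected at any significance level above this value
```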
7 P-Value: Two Examples
- Example 1:
  Variable   Coefficient   Std. Error   t-Statistic   Prob.
  C          0.01738       0.00287      6.052519      0.000
  X          0.21637       0.18839      1.148471      0.258
- Example 2:
  Variable   Coefficient   Std. Error   t-Statistic   Prob.
  C          -0.00020      7.16E-0      -2.866233     0.006
  X          0.49379       0.00906      9.49734       0.000
8 Testing Hypotheses in the Two-Variable Model: Analysis of Variance Approach
- As we said earlier, there are three alternative approaches for testing a null hypothesis:
- confidence interval approach
- test of significance approach
- analysis of variance approach
- Having studied the test of significance approach,
we now turn to the analysis of variance approach.
9 Analysis of Variance Approach
- In the context of regression analysis, analysis of variance (ANOVA) means examining the various sums of squares in the relation TSS = ESS + RSS.
- In this approach, the first step is to determine the degrees of freedom of the above sums of squares.
- In the two-variable model these are as follows:
- TSS has n - 1 degrees of freedom
- RSS has n - 2 degrees of freedom
- ESS has 1 degree of freedom
10 Analysis of Variance Approach
- Next, we define the mean sum of squares associated with a given sum of squares as the ratio of that sum of squares to its degrees of freedom:
- Mean total sum of squares = TSS/(n-1)
- Mean residual sum of squares = RSS/(n-2) = σ̂²
- Mean explained sum of squares = ESS/1 = ESS
- A table containing this information is called an ANOVA table.
11 Analysis of Variance Approach
- We use the information in an ANOVA table to construct the following statistic, which is used for testing H0: β2 = 0 in the two-variable model:
- F = ESS / [RSS/(n-2)] = ESS/σ̂²
- In the two-variable CNLR model this statistic has an F distribution with 1 degree of freedom in the numerator and n-2 degrees of freedom in the denominator.
- It can be used to test the statistical significance of the only slope coefficient in the bivariate model.
12 Analysis of Variance Approach
- Large values of F (i.e., large ESS relative to σ̂²) lead to the rejection of H0, while small values of F are consistent with H0.
- Of course, the question remains: how large is large and how small is small?
- As with the t test, the answer is: relative to the critical value of the test statistic (here, the F statistic).
- In fact, to apply this test, which is known as the F test, we follow the same procedure as with the t test.
13 Analysis of Variance Approach
- First, using sample data, we compute the F ratio.
- Next, we choose a level of significance and use the F table to find the critical F value with 1 and n-2 degrees of freedom.
- Finally, we use the usual decision rule for rejecting or not rejecting the null hypothesis, i.e., we reject the null if the calculated F exceeds the critical F; otherwise we don't reject the null.
14 Analysis of Variance Approach: An Example
- Let's use the U.S. consumption function we estimated earlier, where β̂2 = 0.76, ESS = 4,598,500.9, and RSS = 6,107.3, to test H0: β2 = 0 against H1: β2 ≠ 0 at the 5% level.
- Noting that this is a bivariate model (i.e., k = 2), we determine that ESS has k-1 = 1 degree of freedom and RSS has 32 - 2 = 30 degrees of freedom, so that the F ratio is
- F = 4,598,500.9/(6,107.3/30) = 22,588.55
15 Analysis of Variance Approach: An Example
- At the 5% level and with 1 and 30 degrees of freedom, the critical F value is 4.17.
- Since the computed F is greater than the critical F, we reject the null in favor of the alternative.
- Thus we conclude that at the 5% level our point estimate of β2, i.e., 0.76, is statistically significantly different from zero (a sketch reproducing this calculation appears below).
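The arithmetic in this example can be reproduced directly from the ESS, RSS, and degrees of freedom quoted above; only the code itself is an illustrative addition.

```python
# Reproducing the F test in the example: F = ESS / [RSS/(n-2)].
from scipy import stats

ess, rss, n = 4_598_500.9, 6_107.3, 32
F = ess / (rss / (n - 2))
F_crit = stats.f.ppf(0.95, 1, n - 2)           # critical F at the 5% level with 1 and 30 df
print(f"F = {F:,.2f}, critical F = {F_crit:.2f}")
print("Reject H0" if F > F_crit else "Do not reject H0")
```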
16 Analysis of Variance Approach: Some Remarks
- In the two-variable model, this F test is applicable only to the zero null hypothesis.
- But, as we will see later on, in multiple regression variants of the F statistic can be used to test a large variety of null hypotheses involving several regression coefficients.
- The F test is a two-tailed test.
17 Analysis of Variance Approach: Some Remarks
- In the two-variable model, regardless of whether we use the t test or the F test, the final decision (outcome) is the same.
- This is because F(1, n-2) = t²(n-2), as the sketch below illustrates.
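A quick numerical check of this equivalence in terms of critical values (illustrative only; the sample size and level are hypothetical):

```python
# The squared two-sided t critical value equals the F(1, n-2) critical value.
from scipy import stats

n, alpha = 32, 0.05                        # hypothetical sample size and significance level
t_crit = stats.t.ppf(1 - alpha / 2, n - 2)
f_crit = stats.f.ppf(1 - alpha, 1, n - 2)
print(t_crit**2, f_crit)                   # the two numbers coincide
```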
18 Analysis of Variance Approach: Some Remarks
- It can be shown that
- F = (n-2)R²/(1-R²)
- From this, it follows that F → 0 as R² → 0,
- and F → ∞ as R² → 1.
- You see, R² and F move together.
- Thus we can use the F statistic to test the statistical significance of R², that is, to test H0: R² = 0 against H1: R² ≠ 0 (see the sketch below).
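A small numerical illustration of this relation, using a hypothetical sample size and a few hypothetical values of R²:

```python
# F = (n-2) R^2 / (1 - R^2) in the two-variable model; all values are hypothetical.
n = 32
for r2 in (0.10, 0.50, 0.90, 0.99):
    F = (n - 2) * r2 / (1 - r2)
    print(f"R^2 = {r2:.2f}  ->  F = {F:.1f}")   # F grows without bound as R^2 approaches 1
```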
19 Introduction
- In multiple regression we are sometimes concerned with the joint effect of the explanatory variables, in addition to their partial or individual effects.
- This means that in multiple regression we can test not only hypotheses that involve a single regression coefficient, but also hypotheses that involve several regression coefficients.
- We begin with hypotheses that involve a single regression coefficient.
20 Testing Hypotheses Involving a Single Partial Regression Coefficient
- As in the two-variable regression model, we can use either the t test or the F test.
- However, the F test for testing hypotheses on a single regression coefficient is somewhat different in the multiple regression model relative to the two-variable model.
- In particular, F = ESS/σ̂², which we used in the two-variable model to test the statistical significance of the only slope coefficient, β2, can no longer be used in multiple regression to test the same hypothesis.
21 Testing Hypotheses Involving a Single Partial Regression Coefficient
- In multiple regression, the procedure for performing an F test of the statistical significance of a single regression coefficient is a special case of the general F-testing procedure used to test a host of different hypotheses.
- Let's see how this is so by studying the general F-testing procedure.
22 The ANOVA Approach in the Multiple Regression Model
- In the multiple regression model, the ANOVA approach, known as the Wald test, involves the same set of steps regardless of what form the null hypothesis takes.
- The idea is to assume once that the null hypothesis is true, and another time that the alternative is true, and then determine which model, the one corresponding to the null or the one corresponding to the alternative, fits the data better.
23 Steps in the Wald Test
1. Assume the null hypothesis is true, and find out what the model would look like in this case. Call this the restricted model.
2. Estimate the restricted model and save the RSS. Denote this RSSr.
3. This time assume the alternative hypothesis is true, in which case the original model, which we call the full or unrestricted model, applies. Estimate it, obtain the RSS, and call it RSSu.
24 Steps in the Wald Test
4. Construct the following statistic:
- F = [(RSSr - RSSu)/m] / [RSSu/(n-k)]
- Here k is the number of parameters in the original (full or unrestricted) model, including the intercept, and m is the difference in the number of coefficients in the full and restricted models.
- Note that because RSSr ≥ RSSu, the above F ratio is a nonnegative number.
- In the multiple CNLR model the above ratio has an F distribution with m and n-k degrees of freedom.
25 Steps in the Wald Test
5. Compute the above Wald F statistic and compare it with the critical F value at the chosen level of significance.
- The decision rule is as usual.
- We can express the above F in terms of the R² values from the unrestricted and restricted models:
- F = [(R²u - R²r)/m] / [(1-R²u)/(n - k)]
- (A sketch of the whole procedure appears below.)
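The five steps can be sketched in code. This is an illustration only: the data are simulated, and the joint null H0: β3 = β4 = 0 is a hypothetical example of a restriction with m = 2.

```python
# Minimal sketch of the Wald (restricted vs. unrestricted) F test on simulated data.
# Unrestricted model: Y = b1 + b2*X2 + b3*X3 + b4*X4 + u; hypothetical H0: b3 = b4 = 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k, m = 100, 4, 2                        # sample size, parameters in the full model, restrictions
X2, X3, X4 = rng.normal(size=(3, n))
Y = 1.0 + 0.5 * X2 + rng.normal(size=n)    # data generated with b3 = b4 = 0

def rss(y, X):
    """Residual sum of squares from an OLS regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

ones = np.ones(n)
rss_u = rss(Y, np.column_stack([ones, X2, X3, X4]))   # step 3: unrestricted model
rss_r = rss(Y, np.column_stack([ones, X2]))           # steps 1-2: restricted model under H0

F = ((rss_r - rss_u) / m) / (rss_u / (n - k))         # step 4
p = stats.f.sf(F, m, n - k)                           # step 5, reported as a p-value
print(f"F = {F:.3f}, p-value = {p:.3f}")
```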
26 Applications of the Wald Test
- In using the Wald test, the main task is to find the restricted model.
- Below I present the restricted model for testing a number of useful hypotheses in the context of the following four-variable model:
- Yt = β1 + β2X2t + β3X3t + β4X4t + ut
- Note that this will be the unrestricted model regardless of the null hypothesis considered.
27 Testing the Statistical Significance of an Individual Regression Coefficient
- H0: β2 = 0 vs. H1: β2 ≠ 0
- In this case the restricted model is as follows:
- Yt = β1 + β3X3t + β4X4t + ut
28 Testing a Non-Zero Joint Hypothesis
- H0: β2 = β2* and β3 = β3* vs. H1: β2 ≠ β2* or β3 ≠ β3*
- Here β2* and β3* are hypothesized (known) values of β2 and β3, respectively, e.g., 0 and 1.
- In this case the restricted model is
- Yt = β1 + β2*X2t + β3*X3t + β4X4t + ut
- or Yt - β2*X2t - β3*X3t = β1 + β4X4t + ut
29 Testing the Joint Significance of a Group of Coefficients
- H0: β2 = β3 = 0 vs. H1: β2 ≠ 0 and/or β3 ≠ 0
- This is a special case of the previous test, where the hypothesized values β2* and β3* are both zero.
- The restricted model is
- Yt = β1 + β4X4t + ut
30 Granger Non-Causality Test
- This is a useful application of the above test of the significance of a group of coefficients.
- I ask you to rely on your own notes and the text for this topic.
- Warning: You are expected (polite for "required") to study Section 17.14, "Causality in Economics: The Granger Test," pp. 620-23 of Gujarati.
31 Testing the Overall Significance of the Model
- H0: β2 = β3 = β4 = 0 vs. H1: not all of β2, β3, β4 are zero
- This amounts to testing H0: R² = 0 vs. H1: R² ≠ 0.
- In this case, the restricted model is
- Yt = β1 + ut
- If you estimate such a model, you'd find β̂1 = Ȳ, the sample mean of Y.
- In practice, we don't estimate the above restricted model to test the overall significance of the model.
32 Applications of the Wald Test: Testing the Overall Significance of the Model
- Instead, we use the F statistic we used for the same purpose in the two-variable model, namely
- F = [ESS/(k-1)] / [RSS/(n-k)]
- (A numerical sketch appears below.)
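For illustration only, with hypothetical sums of squares, sample size, and number of parameters:

```python
# Overall-significance F statistic; the ESS, RSS, n, and k below are hypothetical.
from scipy import stats

ess, rss, n, k = 250.0, 50.0, 40, 4
F = (ess / (k - 1)) / (rss / (n - k))
p = stats.f.sf(F, k - 1, n - k)
print(f"F = {F:.2f}, p-value = {p:.4f}")
```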
33 Testing Linear Restrictions
- H0: β2 + β3 = c versus H1: β2 + β3 ≠ c
- where c is a known constant, e.g., 0, 1, 1/2, etc.
- Find the restricted model by solving the null hypothesis for one of the parameters as a function of the other, e.g., β2 = c - β3.
- Substitute this into the original model:
- Yt = β1 + (c - β3)X2t + β3X3t + β4X4t + ut
- or Yt - cX2t = β1 + β3(X3t - X2t) + β4X4t + ut
34 Testing Linear Restrictions
- Thus, in order to find the RSS associated with the restricted model, you should generate two new variables, Yt - cX2t and X3t - X2t, and regress the former on the latter, a constant, and X4 (see the sketch below).
- The above procedure is known as Restricted Least Squares (RLS).
- Note that the restriction under H0 is linear in the parameters and holds as an equality.
- A restriction such as β2 + β3 < c, which holds only as an inequality, cannot be handled by the F test.
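A minimal sketch of this Restricted Least Squares step, using simulated data and a hypothetical value c = 1:

```python
# Restricted least squares for H0: beta2 + beta3 = c, on simulated data (c is hypothetical).
import numpy as np

rng = np.random.default_rng(1)
n, c = 100, 1.0
X2, X3, X4 = rng.normal(size=(3, n))
Y = 2.0 + 0.4 * X2 + 0.6 * X3 - 0.3 * X4 + rng.normal(scale=0.5, size=n)

# The restriction beta2 = c - beta3 implies the transformed regression
#   (Y - c*X2) = beta1 + beta3*(X3 - X2) + beta4*X4 + u
Y_star = Y - c * X2
Z = X3 - X2

X_r = np.column_stack([np.ones(n), Z, X4])
beta_r, *_ = np.linalg.lstsq(X_r, Y_star, rcond=None)
e_r = Y_star - X_r @ beta_r
print("restricted RSS:", e_r @ e_r)        # RSS_r, to be compared with the unrestricted RSS
```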
35 Testing the Equality of Two Regression Coefficients
- H0: β2 = β3 vs. H1: β2 ≠ β3
- The restricted model is
- Yt = β1 + β2X2t + β2X3t + β4X4t + ut
-    = β1 + β2(X2t + X3t) + β4X4t + ut
36 Testing the Stability of the Model
- When we estimate a regression model, we implicitly assume that the regression coefficients are constant over time, that is, that the model is stable.
- However, regime changes can cause structural changes in the model.
- Thus, it is important to test the assumption of constancy, or stability, of the parameters of the regression model.
37 Testing the Stability of the Model
- Let the model representing the period before the event in question (the first n1 observations) be
- Yt = α1 + α2X2t + α3X3t + u1t,  t = 1, 2, ..., n1
- Let the model representing the period following the change (the remaining n2 observations) be
- Yt = γ1 + γ2X2t + γ3X3t + u2t,  t = 1, 2, ..., n2
- The null hypothesis is NO structural change, i.e., the models representing the two sub-periods are one and the same: H0: α1 = γ1, α2 = γ2, α3 = γ3
38 Applications of the Wald Test
- If H0 turns out to be true (i.e., if it is not rejected), we can estimate a single regression over the entire period by pooling the two sub-samples (using the full sample of n = n1 + n2 observations).
- The null hypothesis is tested as follows:
1. Estimate the model using the first sub-sample of n1 observations, and save the RSS. Call this RSS1.
2. Estimate the model over the second sub-sample using n2 observations, find the RSS, and call it RSS2.
39 Applications of the Wald Test
3. The unrestricted RSS, which assumes H1 is true (i.e., assumes there is a break in the regression line), equals RSSu = RSS1 + RSS2.
4. Estimate the model using all of the available observations, that is, the full sample of n = n1 + n2 observations. Obtain the RSS and denote it RSSr. This is the restricted RSS because estimating the model over the entire sample period is valid only if H0 is true, that is, if there is no break in the model.
40 Applications of the Wald Test
5. Construct the following ratio:
- F = [(RSSr - RSSu)/k] / [RSSu/(n - 2k)]
- This has an F distribution with k and n-2k degrees of freedom.
- The decision rule is as usual.
- The above test is known as the Chow breakpoint test and is available in EViews (a sketch of the whole procedure appears below).
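Steps 1-5 can be sketched on simulated data; the break point, sub-sample sizes, and coefficients below are hypothetical.

```python
# Minimal sketch of the Chow breakpoint test on simulated data (break point hypothetical).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n1, n2, k = 60, 40, 3                      # sub-sample sizes; k parameters per regression
n = n1 + n2

X2, X3 = rng.normal(size=(2, n))
Y = np.empty(n)
Y[:n1] = 1.0 + 0.5 * X2[:n1] + 0.2 * X3[:n1] + rng.normal(scale=0.5, size=n1)
Y[n1:] = 2.0 + 0.1 * X2[n1:] + 0.8 * X3[n1:] + rng.normal(scale=0.5, size=n2)  # structural change

def rss(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

X_full = np.column_stack([np.ones(n), X2, X3])
rss_r = rss(Y, X_full)                                         # step 4: pooled (restricted) regression
rss_u = rss(Y[:n1], X_full[:n1]) + rss(Y[n1:], X_full[n1:])    # steps 1-3: RSS1 + RSS2

F = ((rss_r - rss_u) / k) / (rss_u / (n - 2 * k))              # step 5
p = stats.f.sf(F, k, n - 2 * k)
print(f"Chow F = {F:.2f}, p-value = {p:.4f}")
```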
41 Other Applications of the t Test
- As simple as it is, the t test has many applications, and when used properly it has high power.
- So far, we have studied its use for testing zero and non-zero hypotheses on regression coefficients.
- We will now see how it can be used for testing hypotheses involving more than one regression coefficient, which are typically tested using the F test.
- We will also see how the t test can be used to test hypotheses on the simple correlation coefficient.
42 Testing Linear Restrictions Using the t Test
- Consider the following trivariate model:
- Yt = β1 + β2X2t + β3X3t + ut
- Suppose you want to test H0: β2 + β3 = c versus H1: β2 + β3 ≠ c, where c is a known constant.
- Rewrite the null hypothesis as β2 + β3 - c = 0.
43 Testing Linear Restrictions Using the t Test
- Construct the following t ratio (see the sketch below):
- t = (β̂2 + β̂3 - c)/√[Var(β̂2) + Var(β̂3) + 2Cov(β̂2, β̂3)]
- This has a t distribution with n-3 degrees of freedom.
- The decision rule is as usual.
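A minimal sketch of this t ratio on simulated data, with a hypothetical c = 1; the variances and covariance come from the usual OLS coefficient covariance matrix.

```python
# t test of H0: beta2 + beta3 = c in a trivariate model, on simulated data (c hypothetical).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, c = 100, 1.0
X2, X3 = rng.normal(size=(2, n))
Y = 2.0 + 0.4 * X2 + 0.7 * X3 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), X2, X3])
beta = np.linalg.solve(X.T @ X, X.T @ Y)            # OLS estimates (beta1, beta2, beta3)
resid = Y - X @ beta
df = n - X.shape[1]                                 # n - 3 degrees of freedom
sigma2 = resid @ resid / df                         # estimate of the error variance
cov = sigma2 * np.linalg.inv(X.T @ X)               # estimated covariance matrix of the coefficients

t_stat = (beta[1] + beta[2] - c) / np.sqrt(cov[1, 1] + cov[2, 2] + 2 * cov[1, 2])
p = 2 * stats.t.sf(abs(t_stat), df)
print(f"t = {t_stat:.3f}, p-value = {p:.3f}")
```

The equality test on the next two slides uses the same construction, with β̂2 - β̂3 in the numerator and -2Cov(β̂2, β̂3) in the denominator.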
44 Testing the Equality of Two Regression Coefficients Using the t Test
- Consider the following trivariate model:
- Yt = β1 + β2X2t + β3X3t + ut
- Suppose you want to test H0: β2 = β3 versus H1: β2 ≠ β3.
- Write the null hypothesis as β2 - β3 = 0.
45 Testing the Equality of Two Regression Coefficients Using the t Test
- Construct the following t ratio:
- t = (β̂2 - β̂3)/√[Var(β̂2) + Var(β̂3) - 2Cov(β̂2, β̂3)]
- This has a t distribution with n-3 degrees of freedom.
- The decision rule is as usual.
46 Testing Hypotheses on the Correlation Coefficient Using the t Test
- Recall that the simple correlation coefficient between any two random variables is given by
- r12 = S12/√(S11 S22)
- In the CNLR model,
- t = r12/SE(r12) ~ t(n-2)
- follows the t distribution with df = n-2.
- Here, SE(r12) = √[(1 - r²)/(n - 2)]
47 Testing Hypotheses on the Correlation Coefficient Using the t Test
- The above t statistic can be used to test a number of hypotheses about the correlation coefficient (see the sketch below).
- Some hypotheses of interest are
- H0: r = 0 versus H1: r < 0 (one-tailed)
- H0: r = 0 versus H1: r > 0 (one-tailed)
- H0: r = 0 versus H1: r ≠ 0 (two-tailed)
- The decision rule is as with any t test, both one-tailed and two-tailed.
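A minimal sketch of these tests, with a hypothetical correlation coefficient and sample size:

```python
# t test on a simple correlation coefficient; r and n are hypothetical.
import numpy as np
from scipy import stats

r, n = 0.45, 30
t_stat = r / np.sqrt((1 - r**2) / (n - 2))       # t = r / SE(r)
p_two_sided = 2 * stats.t.sf(abs(t_stat), n - 2)
p_one_sided = stats.t.sf(t_stat, n - 2)          # for H1: r > 0
print(f"t = {t_stat:.3f}, two-sided p = {p_two_sided:.4f}, one-sided p = {p_one_sided:.4f}")
```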
48 Practical Aspects of Hypothesis Testing
- Please study Section 5.8, pp. 129-134, of Gujarati.
49 Reporting the Results of Regression Analysis
- If there is only one equation, report it as follows:
- Yi = 91.1 + 20.5Xi + ûi
-     (1.75)  (2.67)
- * Significant at the 10% level (two-tail)
- ** Significant at the 1% level (one-tail)
- Indicate whether the numbers in parentheses are estimated standard errors, t ratios, or p-values.
- In the first two cases, the asterisks (* and **) would be needed, but not if you choose to report the p-values, as long as you make it clear.
50 Reporting the Results of Regression Analysis
- If the data are time series, report the estimation period and the frequency of the data, e.g., 1969-1988 for annual data, 1969.1-1988.4 for quarterly data, or 1969.01-1988.12 if the data are monthly.
- It is also desirable to report the sample mean value of the dependent variable (and perhaps those of the independent variables).
51 Reporting the Results of Regression Analysis
- If there are several estimated equations, construct a table with the estimated parameters in rows or columns.
- Define all the variables of the model.
- Report data sources.
- See the example below.
52 Table 1
Ordinary Least Squares Estimates of Output Per Labor Hour in Selected Sectors of the U.S. Economy, 1955.1-1995.4
(t-values in parentheses)

              Mining      Farming     Services
Constant      0.12657     0.25672     1.11298
              (2.09)      (2.58)      (1.09)
L             0.11659     0.40048     0.99801
              (1.99)      (2.31)      (0.98)
K             0.16667     0.33437     1.28359
              (2.39)      (1.88)      (1.11)
Adjusted R²   0.54667     0.35347     0.58179
F             12.38       18.45       11.98
SEE           0.0096      0.0210      0.0061

* Significant at the 10% level.
** Significant at the 5% level.
53 Table 1 (continued)
- Glossary
- L: Natural log of hours of work of all persons
- K: Natural log of capital stock in the private non-farm business sector (1992 dollars)
- Source of Data
- The original source of all data is the U.S. Department of Labor, Bureau of Labor Statistics.
- The data used in this study are taken from the DRI Basic Economics data tape, Chapter 7 (Capacity and Productivity), Section 2 (Productivity and Unit Costs), pages 7-3.