Title: Lecture 5, The Problem of Statistical Inference (Chapters 5 and 8)
1 Lecture 5: The Problem of Statistical Inference (Chapters 5 and 8)
- Hypothesis Testing in the Two-Variable Regression Model (continued)
- Testing Hypotheses about a Regression Coefficient
- Test of Significance Approach
- Analysis of Variance Approach
2 Hypothesis Testing in the Multiple Regression Model
- Introduction
- Testing Joint Hypotheses
- Testing Significance of a Group of Coefficients
- Testing Significance of the Overall Model
- Testing for Causality
- Testing Linear Restrictions on Coefficients
- Testing Equality of Two Regression Coefficients
- Testing Structural Stability of Regression Models
3 Quick Review
- Last time we saw that in the CNLR model we can test a null hypothesis such as H0: β2 = β2* against, say, a two-sided alternative such as H1: β2 ≠ β2*.
- We said one way to do this is to use the t test of significance, where
- t = (β̂2 - β2*)/SE(β̂2) ~ t(n-2)
4 Quick Review
- So, once we estimate the regression equation, we compute the above t ratio.
- Next, we choose a level of significance, α, and use it to look up the critical t value from the t table.
- Finally, we use an appropriate decision rule to decide whether or not we should reject the null in favor of the alternative (see the sketch below).
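A minimal sketch of this procedure in code (not part of the lecture); the estimate, standard error, hypothesized value, and sample size below are all hypothetical placeholders.

```python
# Minimal sketch of the t test of significance; all numbers are hypothetical.
from scipy import stats

beta2_hat = 0.50       # hypothetical point estimate of beta2
se_beta2 = 0.12        # hypothetical standard error
beta2_star = 0.0       # value of beta2 under H0
n, k = 32, 2           # sample size and number of estimated parameters

t_stat = (beta2_hat - beta2_star) / se_beta2
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - k)   # two-sided critical value at the 5% level

print(f"t = {t_stat:.2f}, critical t = {t_crit:.2f}")
print("Reject H0" if abs(t_stat) > t_crit else "Do not reject H0")
```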
5 Choosing the Level of Significance
- How should we choose the level of significance?
- There is no general rule to follow.
- It is customary to use 1%, 5%, or 10%.
- Sometimes the choice can be made based on the cost of committing a type I error relative to that of committing a type II error.
- You should choose a high level of significance if you suspect the test has low power.
6 The P-Value
- Instead of using an arbitrary level of significance, nowadays we use the p-value, which is also known as the exact level of significance or the marginal significance level.
- This is the lowest level of significance at which a given null hypothesis can be rejected.
- Note that for a given sample size, as the t ratio increases in absolute value, the p-value decreases (see the sketch below).
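As an illustration (not from the lecture), the exact level of significance for a given t ratio can be computed directly; the t value and degrees of freedom below are hypothetical.

```python
# Minimal sketch: two-sided p-value (exact level of significance) for a t ratio.
from scipy import stats

t_stat, df = 2.10, 30              # hypothetical t ratio and degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(f"p-value = {p_value:.4f}")  # H0 can be rejected at any significance level above this value
```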
7 P-Value: Two Examples
- Example 1:
  Variable   Coefficient   Std. Error   t-Statistic   Prob.
  C          0.01738       0.00287      6.052519      0.000
  X          0.21637       0.18839      1.148471      0.258
- Example 2:
  Variable   Coefficient   Std. Error   t-Statistic   Prob.
  C          -0.00020      7.16E-0      -2.866233     0.006
  X          0.49379       0.00906      9.49734       0.000
8 Testing Hypotheses in the Two-Variable Model: Analysis of Variance Approach
- As we said earlier, there are three alternative approaches for testing a null hypothesis:
- confidence interval approach
- test of significance approach
- analysis of variance approach
- Having studied the test of significance approach,
we now turn to the analysis of variance approach.
9 Analysis of Variance Approach
- In the context of regression analysis, analysis of variance (ANOVA) means examining the various sums of squares in the relation TSS = ESS + RSS.
- In this approach, the first step is to determine the degrees of freedom of the above sums of squares.
- In the two-variable model these are as follows:
- TSS has n - 1 degrees of freedom
- RSS has n - 2 degrees of freedom
- ESS has 1 degree of freedom
10 Analysis of Variance Approach
- Next, we define the mean sum of squares associated with a given sum of squares as the ratio of that sum of squares to its degrees of freedom:
- Mean total sum of squares = TSS/(n-1)
- Mean residual sum of squares = RSS/(n-2) = σ̂²
- Mean explained sum of squares = ESS/1 = ESS
- A table containing this information is called an ANOVA table.
11 Analysis of Variance Approach
- We use the information in an ANOVA table to construct the following statistic, which is used for testing H0: β2 = 0 in the two-variable model:
- F = ESS / [RSS/(n-2)] = ESS/σ̂²
- In the two-variable CNLR model this statistic has an F distribution with 1 degree of freedom in the numerator and n-2 degrees of freedom in the denominator.
- It can be used to test the statistical significance of the only slope coefficient in the bivariate model.
12 Analysis of Variance Approach
- Large values of F (i.e., large ESS relative to σ̂²) lead to the rejection of H0, while small values of F are consistent with H0.
- Of course, the question remains: how large is large and how small is small?
- As with the t test, the answer is: relative to the critical value of the test statistic (here, the F statistic).
- In fact, to apply this test, which is known as the F test, we follow the same procedure as with the t test.
13 Analysis of Variance Approach
- First, using sample data, we compute the F ratio.
- Next, we choose a level of significance and use the F table to find the critical F value with 1 and n-2 degrees of freedom.
- Finally, we use the usual decision rule for rejecting or not rejecting the null hypothesis, i.e., we reject the null if the calculated F exceeds the critical F; otherwise we don't reject the null.
14 Analysis of Variance Approach: An Example
- Let's use the U.S. consumption function we estimated earlier, where β̂2 = 0.76, ESS = 4,598,500.9, and RSS = 6,107.3, to test H0: β2 = 0 against H1: β2 ≠ 0 at the 5% level.
- Noting that this is a bivariate model (i.e., k = 2), we determine that ESS has k-1 = 1 degree of freedom and RSS has 32 - 2 = 30 degrees of freedom, so that the F ratio is
- F = 4,598,500.9/(6,107.3/30) = 22,588.55
15 Analysis of Variance Approach: An Example
- At the 5% level and with 1 and 30 degrees of freedom, the critical F value is 4.17.
- Since the computed F is greater than the critical F, we reject the null in favor of the alternative.
- Thus we conclude that at the 5% level our point estimate of β2, i.e., 0.76, is statistically significantly different from zero (a sketch reproducing this calculation appears below).
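The arithmetic in this example can be reproduced directly from the ESS, RSS, and degrees of freedom quoted above; only the code itself is an illustrative addition.

```python
# Reproducing the F test in the example: F = ESS / [RSS/(n-2)].
from scipy import stats

ess, rss, n = 4_598_500.9, 6_107.3, 32
F = ess / (rss / (n - 2))
F_crit = stats.f.ppf(0.95, 1, n - 2)           # critical F at the 5% level with 1 and 30 df
print(f"F = {F:,.2f}, critical F = {F_crit:.2f}")
print("Reject H0" if F > F_crit else "Do not reject H0")
```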
16 Analysis of Variance Approach: Some Remarks
- In the two-variable model, this F test is applicable only to the zero null hypothesis.
- But, as we will see later on, in multiple regression variants of the F statistic can be used to test a large variety of null hypotheses involving several regression coefficients.
- The F test is a two-tailed test.
17 Analysis of Variance Approach: Some Remarks
- In the two-variable model, regardless of whether we use the t test or the F test, the final decision (outcome) is the same.
- This is because F(1, n-2) = t²(n-2), as the sketch below illustrates.
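A quick numerical check of this equivalence in terms of critical values (illustrative only; the sample size and level are hypothetical):

```python
# The squared two-sided t critical value equals the F(1, n-2) critical value.
from scipy import stats

n, alpha = 32, 0.05                        # hypothetical sample size and significance level
t_crit = stats.t.ppf(1 - alpha / 2, n - 2)
f_crit = stats.f.ppf(1 - alpha, 1, n - 2)
print(t_crit**2, f_crit)                   # the two numbers coincide
```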
18 Analysis of Variance Approach: Some Remarks
- It can be shown that
- F = (n-2)R²/(1-R²)
- From this, it follows that F → 0 as R² → 0,
- and F → ∞ as R² → 1.
- You see, R² and F move together.
- Thus we can use the F statistic to test the statistical significance of R², that is, to test H0: R² = 0 against H1: R² ≠ 0 (see the sketch below).
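A small numerical illustration of this relation, using a hypothetical sample size and a few hypothetical values of R²:

```python
# F = (n-2) R^2 / (1 - R^2) in the two-variable model; all values are hypothetical.
n = 32
for r2 in (0.10, 0.50, 0.90, 0.99):
    F = (n - 2) * r2 / (1 - r2)
    print(f"R^2 = {r2:.2f}  ->  F = {F:.1f}")   # F grows without bound as R^2 approaches 1
```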
19 Introduction
- In multiple regression we are sometimes concerned with the joint effect of the explanatory variables, in addition to their partial or individual effects.
- This means that in multiple regression we can test not only hypotheses that involve a single regression coefficient, but also hypotheses that involve several regression coefficients.
- We begin with hypotheses that involve a single regression coefficient.
20 Testing Hypotheses Involving a Single Partial Regression Coefficient
- As in the two-variable regression model, we can use either the t test or the F test.
- However, the F test for testing hypotheses on a single regression coefficient is somewhat different in the multiple regression model relative to the two-variable model.
- In particular, F = ESS/σ̂², which we used in the two-variable model to test the statistical significance of the only slope coefficient, β2, can no longer be used in multiple regression to test the same hypothesis.
21 Testing Hypotheses Involving a Single Partial Regression Coefficient
- In multiple regression, the procedure for performing an F test of the statistical significance of a single regression coefficient is a special case of the general F-testing procedure used to test a host of different hypotheses.
- Let's see how this is so by studying the general F-testing procedure.
22 The ANOVA Approach in the Multiple Regression Model
- In the multiple regression model, the ANOVA approach, known as the Wald test, involves the same set of steps regardless of what form the null hypothesis takes.
- The idea is to assume once that the null hypothesis is true, and another time that the alternative is true, and then determine which model, the one corresponding to the null or the one corresponding to the alternative, fits the data better.
23 Steps in the Wald Test
1. Assume the null hypothesis is true, and find out what the model would look like in this case. Call this the restricted model.
2. Estimate the restricted model and save the RSS. Denote this RSSr.
3. This time assume the alternative hypothesis is true, in which case the original model, which we call the full or unrestricted model, applies. Estimate it, obtain the RSS, and call it RSSu.
24 Steps in the Wald Test
4. Construct the following statistic:
- F = [(RSSr - RSSu)/m] / [RSSu/(n-k)]
- Here k is the number of parameters in the original (full or unrestricted) model, including the intercept, and m is the difference in the number of coefficients in the full and restricted models.
- Note that because RSSr ≥ RSSu, the above F ratio is a nonnegative number.
- In the multiple CNLR model the above ratio has an F distribution with m and n-k degrees of freedom.
25 Steps in the Wald Test
5. Compute the above Wald F statistic and compare it with the critical F value at the chosen level of significance.
- The decision rule is as usual.
- We can express the above F in terms of the R² values from the unrestricted and restricted models:
- F = [(R²u - R²r)/m] / [(1-R²u)/(n - k)]
- (A sketch of the whole procedure appears below.)
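The five steps can be sketched in code. This is an illustration only: the data are simulated, and the joint null H0: β3 = β4 = 0 is a hypothetical example of a restriction with m = 2.

```python
# Minimal sketch of the Wald (restricted vs. unrestricted) F test on simulated data.
# Unrestricted model: Y = b1 + b2*X2 + b3*X3 + b4*X4 + u; hypothetical H0: b3 = b4 = 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k, m = 100, 4, 2                        # sample size, parameters in the full model, restrictions
X2, X3, X4 = rng.normal(size=(3, n))
Y = 1.0 + 0.5 * X2 + rng.normal(size=n)    # data generated with b3 = b4 = 0

def rss(y, X):
    """Residual sum of squares from an OLS regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

ones = np.ones(n)
rss_u = rss(Y, np.column_stack([ones, X2, X3, X4]))   # step 3: unrestricted model
rss_r = rss(Y, np.column_stack([ones, X2]))           # steps 1-2: restricted model under H0

F = ((rss_r - rss_u) / m) / (rss_u / (n - k))         # step 4
p = stats.f.sf(F, m, n - k)                           # step 5, reported as a p-value
print(f"F = {F:.3f}, p-value = {p:.3f}")
```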
26 Applications of the Wald Test
- In using the Wald test, the main task is to find the restricted model.
- Below I present the restricted model for testing a number of useful hypotheses in the context of the following four-variable model:
- Yt = β1 + β2X2t + β3X3t + β4X4t + ut
- Note that this will be the unrestricted model regardless of the null hypothesis considered.
27 Testing the Statistical Significance of an Individual Regression Coefficient
- H0: β2 = 0 vs. H1: β2 ≠ 0
- In this case the restricted model is as follows:
- Yt = β1 + β3X3t + β4X4t + ut
28 Testing a Non-Zero Joint Hypothesis
- H0: β2 = β2* and β3 = β3* vs. H1: β2 ≠ β2* or β3 ≠ β3*
- Here β2* and β3* are hypothesized (known) values of β2 and β3, respectively, e.g., 0 and 1.
- In this case the restricted model is
- Yt = β1 + β2*X2t + β3*X3t + β4X4t + ut
- or Yt - β2*X2t - β3*X3t = β1 + β4X4t + ut
29 Testing the Joint Significance of a Group of Coefficients
- H0: β2 = β3 = 0 vs. H1: β2 ≠ 0 and/or β3 ≠ 0
- This is a special case of the previous test, where the hypothesized values β2* and β3* are both zero.
- The restricted model is
- Yt = β1 + β4X4t + ut
30 Granger Non-Causality Test
- This is a useful application of the above test of the significance of a group of coefficients.
- I ask you to rely on your own notes and the text for this topic.
- Warning: You are expected (polite for "required") to study Section 17.14, "Causality in Economics: The Granger Test," pp. 620-23 of Gujarati.
31 Testing the Overall Significance of the Model
- H0: β2 = β3 = β4 = 0 vs. H1: not all of β2, β3, β4 are zero
- This amounts to testing H0: R² = 0 vs. H1: R² ≠ 0.
- In this case, the restricted model is
- Yt = β1 + ut
- If you estimate such a model, you'd find β̂1 = Ȳ, the sample mean of Y.
- In practice, we don't estimate the above restricted model to test the overall significance of the model.
32 Applications of the Wald Test: Testing the Overall Significance of the Model
- Instead, we use the F statistic we used for the same purpose in the two-variable model, namely
- F = [ESS/(k-1)] / [RSS/(n-k)]
- (A numerical sketch appears below.)
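For illustration only, with hypothetical sums of squares, sample size, and number of parameters:

```python
# Overall-significance F statistic; the ESS, RSS, n, and k below are hypothetical.
from scipy import stats

ess, rss, n, k = 250.0, 50.0, 40, 4
F = (ess / (k - 1)) / (rss / (n - k))
p = stats.f.sf(F, k - 1, n - k)
print(f"F = {F:.2f}, p-value = {p:.4f}")
```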
33 Testing Linear Restrictions
- H0: β2 + β3 = c versus H1: β2 + β3 ≠ c
- where c is a known constant, e.g., 0, 1, 1/2, etc.
- Find the restricted model by solving the null hypothesis for one of the parameters as a function of the other, e.g., β2 = c - β3.
- Substitute this into the original model:
- Yt = β1 + (c - β3)X2t + β3X3t + β4X4t + ut
- or Yt - cX2t = β1 + β3(X3t - X2t) + β4X4t + ut
34 Testing Linear Restrictions
- Thus, in order to find the RSS associated with the restricted model, you should generate two new variables, Yt - cX2t and X3t - X2t, and regress the former on the latter, a constant, and X4 (see the sketch below).
- The above procedure is known as Restricted Least Squares (RLS).
- Note that the restriction under H0 is linear in the parameters and holds as an equality.
- A restriction such as β2 + β3 < c, which holds only as an inequality, cannot be handled by the F test.
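A minimal sketch of this Restricted Least Squares step, using simulated data and a hypothetical value c = 1:

```python
# Restricted least squares for H0: beta2 + beta3 = c, on simulated data (c is hypothetical).
import numpy as np

rng = np.random.default_rng(1)
n, c = 100, 1.0
X2, X3, X4 = rng.normal(size=(3, n))
Y = 2.0 + 0.4 * X2 + 0.6 * X3 - 0.3 * X4 + rng.normal(scale=0.5, size=n)

# The restriction beta2 = c - beta3 implies the transformed regression
#   (Y - c*X2) = beta1 + beta3*(X3 - X2) + beta4*X4 + u
Y_star = Y - c * X2
Z = X3 - X2

X_r = np.column_stack([np.ones(n), Z, X4])
beta_r, *_ = np.linalg.lstsq(X_r, Y_star, rcond=None)
e_r = Y_star - X_r @ beta_r
print("restricted RSS:", e_r @ e_r)        # RSS_r, to be compared with the unrestricted RSS
```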
35 Testing the Equality of Two Regression Coefficients
- H0: β2 = β3 vs. H1: β2 ≠ β3
- The restricted model is
- Yt = β1 + β2X2t + β2X3t + β4X4t + ut
-    = β1 + β2(X2t + X3t) + β4X4t + ut
36 Testing the Stability of the Model
- When we estimate a regression model, we implicitly assume that the regression coefficients are constant over time, that is, that the model is stable.
- However, regime changes can cause structural changes in the model.
- Thus, it is important to test the assumption of constancy, or stability, of the parameters of the regression model.
37 Testing the Stability of the Model
- Let the model representing the period before the event in question (the first n1 observations) be
- Yt = α1 + α2X2t + α3X3t + u1t,  t = 1, 2, ..., n1
- Let the model representing the period following the change (the remaining n2 observations) be
- Yt = γ1 + γ2X2t + γ3X3t + u2t,  t = 1, 2, ..., n2
- The null hypothesis is NO structural change, i.e., the models representing the two sub-periods are one and the same: H0: α1 = γ1, α2 = γ2, α3 = γ3
38 Applications of the Wald Test
- If H0 turns out to be true (i.e., if it is not rejected), we can estimate a single regression over the entire period by pooling the two sub-samples (using the full sample of n = n1 + n2 observations).
- The null hypothesis is tested as follows:
1. Estimate the model using the first sub-sample of n1 observations, and save the RSS. Call this RSS1.
2. Estimate the model over the second sub-sample using n2 observations, find the RSS, and call it RSS2.
39 Applications of the Wald Test
3. The unrestricted RSS, which assumes H1 is true (i.e., assumes there is a break in the regression line), equals RSSu = RSS1 + RSS2.
4. Estimate the model using all of the available observations, that is, the full sample of n = n1 + n2 observations. Obtain the RSS and denote it RSSr. This is the restricted RSS because estimating the model over the entire sample period is valid only if H0 is true, that is, if there is no break in the model.
40 Applications of the Wald Test
5. Construct the following ratio:
- F = [(RSSr - RSSu)/k] / [RSSu/(n - 2k)]
- This has an F distribution with k and n-2k degrees of freedom.
- The decision rule is as usual.
- The above test is known as the Chow breakpoint test and is available in EViews (a sketch of the whole procedure appears below).
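Steps 1-5 can be sketched on simulated data; the break point, sub-sample sizes, and coefficients below are hypothetical.

```python
# Minimal sketch of the Chow breakpoint test on simulated data (break point hypothetical).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n1, n2, k = 60, 40, 3                      # sub-sample sizes; k parameters per regression
n = n1 + n2

X2, X3 = rng.normal(size=(2, n))
Y = np.empty(n)
Y[:n1] = 1.0 + 0.5 * X2[:n1] + 0.2 * X3[:n1] + rng.normal(scale=0.5, size=n1)
Y[n1:] = 2.0 + 0.1 * X2[n1:] + 0.8 * X3[n1:] + rng.normal(scale=0.5, size=n2)  # structural change

def rss(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

X_full = np.column_stack([np.ones(n), X2, X3])
rss_r = rss(Y, X_full)                                         # step 4: pooled (restricted) regression
rss_u = rss(Y[:n1], X_full[:n1]) + rss(Y[n1:], X_full[n1:])    # steps 1-3: RSS1 + RSS2

F = ((rss_r - rss_u) / k) / (rss_u / (n - 2 * k))              # step 5
p = stats.f.sf(F, k, n - 2 * k)
print(f"Chow F = {F:.2f}, p-value = {p:.4f}")
```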
41 Other Applications of the t Test
- As simple as it is, the t test has many applications, and when used properly it has high power.
- So far, we have studied its use for testing zero and non-zero hypotheses on regression coefficients.
- We will now see how it can be used for testing hypotheses involving more than one regression coefficient, which are typically tested using the F test.
- We will also see how the t test can be used to test hypotheses on the simple correlation coefficient.
42 Testing Linear Restrictions Using the t Test
- Consider the following trivariate model:
- Yt = β1 + β2X2t + β3X3t + ut
- Suppose you want to test H0: β2 + β3 = c versus H1: β2 + β3 ≠ c, where c is a known constant.
- Rewrite the null hypothesis as β2 + β3 - c = 0.
43 Testing Linear Restrictions Using the t Test
- Construct the following t ratio (see the sketch below):
- t = (β̂2 + β̂3 - c)/√[Var(β̂2) + Var(β̂3) + 2Cov(β̂2, β̂3)]
- This has a t distribution with n-3 degrees of freedom.
- The decision rule is as usual.
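A minimal sketch of this t ratio on simulated data, with a hypothetical c = 1; the variances and covariance come from the usual OLS coefficient covariance matrix.

```python
# t test of H0: beta2 + beta3 = c in a trivariate model, on simulated data (c hypothetical).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, c = 100, 1.0
X2, X3 = rng.normal(size=(2, n))
Y = 2.0 + 0.4 * X2 + 0.7 * X3 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), X2, X3])
beta = np.linalg.solve(X.T @ X, X.T @ Y)            # OLS estimates (beta1, beta2, beta3)
resid = Y - X @ beta
df = n - X.shape[1]                                 # n - 3 degrees of freedom
sigma2 = resid @ resid / df                         # estimate of the error variance
cov = sigma2 * np.linalg.inv(X.T @ X)               # estimated covariance matrix of the coefficients

t_stat = (beta[1] + beta[2] - c) / np.sqrt(cov[1, 1] + cov[2, 2] + 2 * cov[1, 2])
p = 2 * stats.t.sf(abs(t_stat), df)
print(f"t = {t_stat:.3f}, p-value = {p:.3f}")
```

The equality test on the next two slides uses the same construction, with β̂2 - β̂3 in the numerator and -2Cov(β̂2, β̂3) in the denominator.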
44 Testing the Equality of Two Regression Coefficients Using the t Test
- Consider the following trivariate model:
- Yt = β1 + β2X2t + β3X3t + ut
- Suppose you want to test H0: β2 = β3 versus H1: β2 ≠ β3.
- Write the null hypothesis as β2 - β3 = 0.
45 Testing the Equality of Two Regression Coefficients Using the t Test
- Construct the following t ratio:
- t = (β̂2 - β̂3)/√[Var(β̂2) + Var(β̂3) - 2Cov(β̂2, β̂3)]
- This has a t distribution with n-3 degrees of freedom.
- The decision rule is as usual.
46 Testing Hypotheses on the Correlation Coefficient Using the t Test
- Recall that the simple correlation coefficient between any two random variables is given by
- r12 = S12/√(S11 S22)
- In the CNLR model,
- t = r12/SE(r12) ~ t(n-2)
- follows the t distribution with df = n-2.
- Here, SE(r12) = √[(1 - r²)/(n - 2)]
47 Testing Hypotheses on the Correlation Coefficient Using the t Test
- The above t statistic can be used to test a number of hypotheses about the correlation coefficient (see the sketch below).
- Some hypotheses of interest are
- H0: r = 0 versus H1: r < 0 (one-tailed)
- H0: r = 0 versus H1: r > 0 (one-tailed)
- H0: r = 0 versus H1: r ≠ 0 (two-tailed)
- The decision rule is as with any t test, both one-tailed and two-tailed.
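A minimal sketch of these tests, with a hypothetical correlation coefficient and sample size:

```python
# t test on a simple correlation coefficient; r and n are hypothetical.
import numpy as np
from scipy import stats

r, n = 0.45, 30
t_stat = r / np.sqrt((1 - r**2) / (n - 2))       # t = r / SE(r)
p_two_sided = 2 * stats.t.sf(abs(t_stat), n - 2)
p_one_sided = stats.t.sf(t_stat, n - 2)          # for H1: r > 0
print(f"t = {t_stat:.3f}, two-sided p = {p_two_sided:.4f}, one-sided p = {p_one_sided:.4f}")
```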
48 Practical Aspects of Hypothesis Testing
- Please study Section 5.8, pp. 129-134, of Gujarati.
49 Reporting the Results of Regression Analysis
- If there is only one equation, report it as follows:
- Yi = 91.1 + 20.5Xi + ûi
-     (1.75)  (2.67)
- * Significant at the 10% level (two-tail)
- ** Significant at the 1% level (one-tail)
- Indicate whether the numbers in parentheses are estimated standard errors, t ratios, or p-values.
- In the first two cases, the asterisks (* and **) would be needed, but not if you choose to report the p-values, as long as you make it clear.
50 Reporting the Results of Regression Analysis
- If the data are time series, report the estimation period and the frequency of the data, e.g., 1969-1988 for annual data, 1969.1-1988.4 for quarterly data, or 1969.01-1988.12 if the data are monthly.
- It is also desirable to report the sample mean value of the dependent variable (and perhaps those of the independent variables).
51 Reporting the Results of Regression Analysis
- If there are several estimated equations, construct a table with the estimated parameters in rows or columns.
- Define all the variables of the model.
- Report data sources.
- See the example below.
52 Table 1
Ordinary Least Squares Estimates of Output Per Labor Hour in Selected Sectors of the U.S. Economy, 1955.1-1995.4
(t-values in parentheses)

              Mining      Farming     Services
Constant      0.12657     0.25672     1.11298
              (2.09)      (2.58)      (1.09)
L             0.11659     0.40048     0.99801
              (1.99)      (2.31)      (0.98)
K             0.16667     0.33437     1.28359
              (2.39)      (1.88)      (1.11)
Adjusted R²   0.54667     0.35347     0.58179
F             12.38       18.45       11.98
SEE           0.0096      0.0210      0.0061

* Significant at the 10% level.
** Significant at the 5% level.
53 Table 1 (continued)
- Glossary
- L: Natural log of hours of work of all persons
- K: Natural log of capital stock in the private non-farm business sector (1992 dollars)
- Source of Data
- The original source of all data is the U.S. Department of Labor, Bureau of Labor Statistics.
- The data used in this study are taken from the DRI Basic Economics data tape, Chapter 7 (Capacity and Productivity), Section 2 (Productivity and Unit Costs), pages 7-3.