Title: Simple Regression
Chapter 3
What is in this Chapter?
- This chapter starts with a linear regression model with one explanatory variable and states the assumptions of this basic model.
- It then discusses two methods of estimation: the method of moments (MM) and the method of least squares (LS).
- The method of maximum likelihood (ML) is discussed in the appendix.
3.1 Introduction
- Example 1: Simple regression
- y = sales
- x = advertising expenditures
- Here we try to determine the relationship between sales and advertising expenditures.
- Example 2: Multiple regression
- y = consumption expenditures of a family
- x1 = family income
- x2 = financial assets of the family
- x3 = family size
- There are several objectives in studying these relationships. They can be used to:
- 1. Analyze the effects of policies that involve changing the individual x's. In Example 1 this involves analyzing the effect of changing advertising expenditures on sales.
- 2. Forecast the value of y for a given set of x's.
- 3. Examine whether any of the x's have a significant effect on y.
- Given the way we have set up the problem until now, the variable y and the x variables are not on the same footing.
- Implicitly we have assumed that the x's are variables that influence y, or variables that we can control or change, and that y is the effect variable.
- There are several alternative terms used in the literature for y and x1, x2, ..., xk. These are shown in Table 3.1.
Table 3.1 Classification of variables in regression analysis

| y | x1, x2, ..., xk |
| --- | --- |
| Predictand | Predictors |
| Regressand | Regressors |
| Explained variable | Explanatory variables |
| Dependent variable | Independent variables |
| Effect variable | Causal variables |
| Endogenous variable | Exogenous variables |
| Target variable | Control variables |
3.2 Specification of the Relationships
- As mentioned in Section 3.1, we will discuss the case of one explained (dependent) variable, which we denote by y, and one explanatory (independent) variable, which we denote by x.
- The relationship between y and x is written as
  $y = f(x)$  (3.1)
- where f(x) is a function of x.
- Going back to equation (3.1), we will assume that the function f(x) is linear in x, that is,
  $f(x) = \alpha + \beta x$
- And we will assume that this relationship is a stochastic relationship, that is,
  $y = \alpha + \beta x + u$  (3.2)
- where u, which is called an error or disturbance, has a known probability distribution (i.e., it is a random variable).
- In equation (3.2), $\alpha + \beta x$ is the deterministic component of y and u is the stochastic or random component.
- $\alpha$ and $\beta$ are called regression coefficients or regression parameters, which we estimate from the data on y and x.
- Why should we add an error term u?
- What are the sources of the error term u in equation (3.2)? There are three main sources.
- 1. Unpredictable element of randomness in human responses.
- For example, if y = consumption expenditure of a household and x = disposable income of the household, there is an unpredictable element of randomness in each household's consumption.
- The household does not behave like a machine. In one month the people in the household are on a spending spree; in another month they are tightfisted.
- 2. Effect of a large number of omitted variables.
- Again, in our example x is not the only variable influencing y. The family size, tastes of the family, spending habits, and so on, affect the variable y.
- The error u is a catchall for the effects of all these variables, some of which may not even be quantifiable, and some of which may not even be identifiable.
- To a certain extent some of these variables are those that we refer to in source 1.
- 3. Measurement error in y.
- In our example this refers to measurement error in the household consumption; that is, we cannot measure it accurately.
- This argument for u is somewhat difficult to justify, particularly if we say that there is no measurement error in x (household disposable income). The case where both y and x are measured with error is discussed in Chapter 11.
- Since we have to go step by step and not introduce all the complications initially, we will accept this argument; that is, there is a measurement error in y but not in x.
- If we have n observations on y and x, we can write equation (3.2) as
  $y_i = \alpha + \beta x_i + u_i, \quad i = 1, 2, \ldots, n$  (3.3)
- Our objective is to get estimates of the unknown parameters $\alpha$ and $\beta$ in equation (3.3), given the n observations on y and x.
- To do this we have to make some assumptions about the error terms $u_i$. The assumptions we make are:
- 1. Zero mean: $E(u_i) = 0$ for all i.
- 2. Common variance: $\operatorname{var}(u_i) = \sigma^2$ for all i.
- 3. Independence: $u_i$ and $u_j$ are independent for all $i \neq j$.
- 4. Independence of $x_j$: $u_i$ and $x_j$ are independent for all i and j. This assumption automatically follows if the $x_i$ are considered nonrandom variables. With reference to Figure 3.1, what this says is that the distribution of u does not depend on the value of x.
- 5. Normality: the $u_i$ are normally distributed for all i. In conjunction with assumptions 1, 2, and 3, this implies that the $u_i$ are independently and normally distributed with mean zero and a common variance $\sigma^2$. We write this as $u_i \sim IN(0, \sigma^2)$.
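A minimal simulation sketch of a data-generating process satisfying these assumptions (the parameter values are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values, for illustration only.
alpha, beta, sigma, n = 2.0, 0.5, 1.0, 50

x = np.linspace(1.0, 10.0, n)        # x treated as nonrandom (assumption 4)
u = rng.normal(0.0, sigma, size=n)   # u_i ~ IN(0, sigma^2): assumptions 1, 2, 3, 5
y = alpha + beta * x + u             # equation (3.3): y_i = alpha + beta*x_i + u_i
```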
- These are the assumptions with which we start. We will, however, relax some of these assumptions in later chapters.
- Assumption 2 is relaxed in Chapter 5.
- Assumption 3 is relaxed in Chapter 6.
- Assumption 4 is relaxed in Chapter 9.
- We will discuss three methods for estimating the parameters $\alpha$ and $\beta$:
- 1. The method of moments (MM).
- 2. The method of least squares (LS).
- 3. The method of maximum likelihood (ML).
3.3 The Method of Moments
- The assumptions we have made about the error term u imply that
  $E(u) = 0$ and $E(xu) = 0$
- In the method of moments, we replace these conditions by their sample counterparts.
- Let $\hat\alpha$ and $\hat\beta$ be the estimators for $\alpha$ and $\beta$, respectively. The sample counterpart of $u_i$ is the estimated error (which is also called the residual), defined as
  $\hat u_i = y_i - \hat\alpha - \hat\beta x_i$
- The two equations to determine $\hat\alpha$ and $\hat\beta$ are obtained by replacing the population assumptions by their sample counterparts:
  $\frac{1}{n} \sum \hat u_i = 0$ and $\frac{1}{n} \sum x_i \hat u_i = 0$
- In these and the following equations, $\sum$ denotes $\sum_{i=1}^{n}$. Thus we get the two equations
  $\sum (y_i - \hat\alpha - \hat\beta x_i) = 0$
  $\sum x_i (y_i - \hat\alpha - \hat\beta x_i) = 0$
- These equations can be written as (noting that $\sum \hat\alpha = n\hat\alpha$)
  $n\hat\alpha + \hat\beta \sum x_i = \sum y_i$
  $\hat\alpha \sum x_i + \hat\beta \sum x_i^2 = \sum x_i y_i$
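These two equations are linear in $\hat\alpha$ and $\hat\beta$, so they can be solved directly. A minimal sketch in code (not a production routine):

```python
import numpy as np

def mm_estimates(x, y):
    """Solve the sample moment conditions sum(u_hat) = 0 and
    sum(x * u_hat) = 0, i.e. the two equations above."""
    n = len(x)
    A = np.array([[n,       x.sum()],
                  [x.sum(), (x**2).sum()]])
    b = np.array([y.sum(), (x * y).sum()])
    alpha_hat, beta_hat = np.linalg.solve(A, b)
    return alpha_hat, beta_hat
```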
3.4 The Method of Least Squares
- The method of least squares requires that we choose $\hat\alpha$ and $\hat\beta$ as estimates of $\alpha$ and $\beta$, respectively, so that
  $Q = \sum (y_i - \hat\alpha - \hat\beta x_i)^2$
  is a minimum.
- Q is also the sum of squares of the (within-sample) prediction errors when we predict $y_i$ given $x_i$ and the estimated regression equation.
- We will show in the appendix to this chapter that the least squares estimators have desirable optimal properties.
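The minimization itself is a standard calculus step: setting $\partial Q / \partial \hat\alpha = 0$ and $\partial Q / \partial \hat\beta = 0$ reproduces the two normal equations of Section 3.3. Writing $S_{xx} = \sum (x_i - \bar x)^2$, $S_{yy} = \sum (y_i - \bar y)^2$, and $S_{xy} = \sum (x_i - \bar x)(y_i - \bar y)$, their solution is
  $\hat\beta = \frac{S_{xy}}{S_{xx}}, \quad \hat\alpha = \bar y - \hat\beta \bar x$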
- The residual sum of squares (to be denoted by RSS) is given by
  $\text{RSS} = \sum \hat u_i^2 = \sum (y_i - \hat\alpha - \hat\beta x_i)^2 = S_{yy} - \frac{S_{xy}^2}{S_{xx}}$
- But $\frac{S_{xy}^2}{S_{xx}} = \hat\beta^2 S_{xx} = \sum (\hat y_i - \bar y)^2$. Hence we have
  $\sum (y_i - \bar y)^2 = \sum (\hat y_i - \bar y)^2 + \text{RSS}$
- $\sum (y_i - \bar y)^2$ is usually denoted by TSS (total sum of squares) and $\sum (\hat y_i - \bar y)^2$ is usually denoted by ESS (explained sum of squares).
- Thus TSS = ESS + RSS
  (total) = (explained) + (residual)
- The proportion of the total sum of squares explained is denoted by $r_{xy}^2$, where $r_{xy}$ is called the correlation coefficient.
- Thus $r_{xy}^2 = \text{ESS}/\text{TSS}$ and $1 - r_{xy}^2 = \text{RSS}/\text{TSS}$. If $r_{xy}^2$ is high (close to 1), then x is a good explanatory variable for y.
- The term $r_{xy}^2$ is called the coefficient of determination and must fall between zero and 1 for any given regression.
- If $r_{xy}^2$ is close to zero, the variable x explains very little of the variation in y. If $r_{xy}^2$ is close to 1, the variable x explains most of the variation in y.
- The coefficient of determination is given by
  $r_{xy}^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}}$
Appendix to Chapter 3
- Proof that the OLS estimators are BLUE.
- The method of maximum likelihood.
3.9 Alternative Functional Forms for Regression Equations
- For instance, for the data points depicted in Figure 3.7(a), where y is increasing more slowly than x, a possible functional form is $y = \alpha + \beta \log x$.
- This is called a semi-log form, since it involves the logarithm of only one of the two variables x and y.
- In this case, if we redefine a variable $X = \log x$, the equation becomes $y = \alpha + \beta X$.
- Thus we have a linear regression model with the explained variable y and the explanatory variable $X = \log x$.
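A brief sketch of this fit via the transformation (the data here are hypothetical, for illustration only):

```python
import numpy as np

# Hypothetical data where y increases more slowly than x.
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
y = np.array([1.1, 1.8, 2.4, 3.1, 3.9, 4.5])

X = np.log(x)                              # redefine X = log x
beta_hat, alpha_hat = np.polyfit(X, y, 1)  # fit the linear model y = alpha + beta*X
```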
- For the data points depicted in Figure 3.7(b), where y is increasing faster than x, a possible functional form is $y = a e^{\beta x}$. In this case we take logs of both sides and get another kind of semi-log specification:
  $\log y = \log a + \beta x$
- If we define $Y = \log y$ and $A = \log a$, we have
  $Y = A + \beta x$
- which is in the form of a linear regression equation.
- An alternative model one can use is
  $y = a x^{\beta}$
- In this case, taking logs of both sides, we get
  $\log y = \log a + \beta \log x$
- Here $\beta$ can be interpreted as an elasticity. Hence this form is popular in econometric work. This is called a double-log specification, since it involves logarithms of both x and y.
- Now define $Y = \log y$, $X = \log x$, and $A = \log a$. We have
  $Y = A + \beta X$
- which is in the form of a linear regression equation. An illustrative example is given at the end of this section.
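A sketch of the double-log fit with hypothetical demand data (illustration only); the estimated slope is the elasticity:

```python
import numpy as np

# Hypothetical price-quantity data, for illustration only.
price    = np.array([1.0, 1.5, 2.0, 3.0, 4.0])
quantity = np.array([10.0, 7.8, 6.4, 4.9, 4.0])

Y = np.log(quantity)                   # Y = log y
X = np.log(price)                      # X = log x
beta_hat, A_hat = np.polyfit(X, Y, 1)  # Y = A + beta*X, with A = log a
# beta_hat estimates the elasticity: the approximate percentage
# change in y associated with a 1% change in x.
```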
- Some other functional forms that are useful when the data points are as shown in Figure 3.8 are
  $y = \alpha + \beta \frac{1}{x}$ or $\log y = \alpha + \beta \frac{1}{x}$
- In the first case we define $X = 1/x$ and in the second case we define $Y = \log y$ and $X = 1/x$. In both cases the equation is linear in the variables after the transformation.
- Some other nonlinearities can be handled by what are known as search procedures.
- For instance, suppose that we have the regression equation
  $y = \alpha + \frac{\beta}{x + \gamma} + u$
- The estimates of $\alpha$, $\beta$, and $\gamma$ are obtained by minimizing
  $\sum_i \left( y_i - \alpha - \frac{\beta}{x_i + \gamma} \right)^2$
- We can reduce this problem to one of simple least squares as follows: for each value of $\gamma$, we define the variable $z_i = 1/(x_i + \gamma)$ and estimate $\alpha$ and $\beta$ by minimizing
  $\sum_i (y_i - \alpha - \beta z_i)^2$
- We look at the residual sum of squares in each case and then choose that value of $\gamma$ for which the residual sum of squares is the minimum.
- The corresponding estimates of $\alpha$ and $\beta$ are the least squares estimates of these parameters. Here we are searching over different values of $\gamma$.
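A minimal grid-search sketch (the grid of trial values for $\gamma$ is an assumption; in practice one would pick it from prior knowledge of the plausible range, keeping $x_i + \gamma$ away from zero):

```python
import numpy as np

def search_gamma(x, y, gamma_grid):
    """For each trial gamma define z = 1/(x + gamma), fit
    y = alpha + beta*z by least squares, and keep the gamma
    giving the smallest residual sum of squares."""
    best = None
    for gamma in gamma_grid:
        z = 1.0 / (x + gamma)
        beta_hat, alpha_hat = np.polyfit(z, y, 1)
        rss = ((y - alpha_hat - beta_hat * z) ** 2).sum()
        if best is None or rss < best[0]:
            best = (rss, gamma, alpha_hat, beta_hat)
    return best  # (min RSS, gamma_hat, alpha_hat, beta_hat)
```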
- This search would be convenient only if we had some prior notion of the range of this parameter.
- In any case, there are convenient nonlinear regression programs available nowadays.
- Our purpose here is to show how some problems that at first sight do not appear to fall in the framework of simple regression can be transformed into that framework by a suitable redefinition of the variables.
- Nonlinear Model by Gauss
- Exercise 20(g) at the end of Chapter 3 (page 112)