Title: Dummy Variables and Truncated Variables
Chapter 8: Dummy Variables and Truncated Variables
What is in this Chapter?
- This chapter relaxes the assumption made in Chapter 4 that the variables in the regression are observed as continuous variables.
- Differences in intercepts and/or slope coefficients.
- The linear probability model and the logit and probit models.
- Truncated variables and tobit models.
8.1 Introduction
- The variables we will be considering are:
  1. Dummy variables.
  2. Truncated variables.
- They can be used to:
  1. Allow for differences in intercept terms.
  2. Allow for differences in slopes.
  3. Estimate equations with cross-equation restrictions.
  4. Test for stability of regression coefficients.
8.2 Dummy Variables for Changes in the Intercept Term
- Note that the slopes of the regression lines for both groups are roughly the same but the intercepts are different.
- Hence the regression equations we fit will be $y = \alpha_1 + \beta x + u$ for the first group and $y = \alpha_2 + \beta x + u$ for the second group.
- These equations can be combined into a single equation
  $y = \alpha_1 + (\alpha_2 - \alpha_1)D + \beta x + u$
- where $D = 0$ for observations in the first group and $D = 1$ for observations in the second group.
- The variable D is the dummy variable.
- The coefficient of the dummy variable measures the difference in the two intercept terms.
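As a minimal sketch of this setup (simulated data and illustrative variable names, not from the text), OLS on the pooled sample with an intercept dummy recovers $\alpha_1$, $\alpha_2 - \alpha_1$, and the common slope $\beta$:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
D = rng.integers(0, 2, n)              # dummy: 0 = first group, 1 = second group
x = rng.normal(10.0, 2.0, n)
# true intercepts 2.0 and 3.5, common slope 0.8
y = 2.0 + 1.5 * D + 0.8 * x + rng.normal(0.0, 1.0, n)

X = sm.add_constant(np.column_stack([D, x]))   # columns: const, D, x
res = sm.OLS(y, X).fit()
print(res.params)   # approximately [alpha1, alpha2 - alpha1, beta]
```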
- If there are more groups, we have to introduce more dummies.
- For three groups we have $y = \alpha_1 + \beta x + u$, $y = \alpha_2 + \beta x + u$, and $y = \alpha_3 + \beta x + u$.
- These can be written as
  $y = \alpha_1 + (\alpha_2 - \alpha_1)D_1 + (\alpha_3 - \alpha_1)D_2 + \beta x + u$
- where $D_1 = 1$ for observations in the second group and 0 otherwise, and $D_2 = 1$ for observations in the third group and 0 otherwise.
- As yet another example, suppose that we have data on consumption C and income Y for a number of households.
- In addition, we have data on:
  - S, the sex of the head of the household.
  - A, the age of the head of the household, which is given in three categories: under 25 years, 25 to 50 years, and over 50 years.
  - E, the education of the head of the household, also in three categories: less than high school, at least high school but less than a college degree, and at least a college degree.
- We include these qualitative variables in the form of dummy variables:
  $D_1 = 1$ for male, 0 for female
  $D_2 = 1$ if age is under 25 years, 0 otherwise
  $D_3 = 1$ if age is 25 to 50 years, 0 otherwise
  $D_4 = 1$ if education is less than high school, 0 otherwise
  $D_5 = 1$ if education is at least high school but less than a college degree, 0 otherwise
- For each category the number of dummy variables is one less than the number of classifications.
- Then we run the regression equation
  $C = \alpha + \gamma_1 D_1 + \gamma_2 D_2 + \gamma_3 D_3 + \gamma_4 D_4 + \gamma_5 D_5 + \beta Y + u$
- The assumption made in the dummy variable method is that it is the intercept that changes for each group but not the slope coefficients (i.e., the coefficients of Y).
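A hedged sketch of this regression, assuming simulated household data and illustrative category labels; treatment coding via `C()` in `statsmodels` drops one category per factor, which implements the one-fewer-dummies rule automatically:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "Y": rng.normal(50.0, 10.0, n),                              # income
    "S": rng.choice(["male", "female"], n),                      # sex of head
    "A": rng.choice(["under25", "25to50", "over50"], n),         # age category
    "E": rng.choice(["lt_hs", "hs_no_college", "college"], n),   # education
})
df["cons"] = 5.0 + 0.8 * df["Y"] + rng.normal(0.0, 2.0, n)       # consumption

# C() treatment coding drops one category per factor, so each
# qualitative variable contributes (number of categories - 1) dummies
res = smf.ols("cons ~ Y + C(S) + C(A) + C(E)", data=df).fit()
print(res.params)
```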
- The intercept term for each individual is obtained by substituting the appropriate values for D1 through D5.
- For instance, for a male, age under 25, with a college degree, we have $D_1 = 1$, $D_2 = 1$, $D_3 = D_4 = D_5 = 0$, and hence the intercept is $\alpha + \gamma_1 + \gamma_2$.
- For a female, age over 50, with a college degree, we have $D_1 = D_2 = D_3 = D_4 = D_5 = 0$, and hence the intercept is $\alpha$.
- The dummy variable method is also used if one has to take care of seasonal factors.
- For example, if we have quarterly data on C and Y, we fit the regression equation
  $C = \alpha + \lambda_1 D_1 + \lambda_2 D_2 + \lambda_3 D_3 + \beta Y + u$
- where $D_1$, $D_2$, $D_3$ are seasonal dummies for three of the four quarters.
- If we have monthly data, we use 11 seasonal dummies.
- If we feel that, say, December (because of Christmas shopping) is the only month with a strong seasonal effect, we use only one dummy variable.
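A small sketch with simulated quarterly data (the variable names are illustrative); `C(quarter)` generates the three seasonal dummies, and the same call would generate eleven for monthly data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n = 120                                      # 30 years of quarterly data
df = pd.DataFrame({
    "income": rng.normal(100.0, 10.0, n),
    "quarter": np.tile([1, 2, 3, 4], n // 4),
})
# simulate a fourth-quarter (Christmas) bump in consumption
df["cons"] = (10.0 + 0.7 * df["income"]
              + np.where(df["quarter"] == 4, 5.0, 0.0)
              + rng.normal(0.0, 2.0, n))

# one quarter serves as the base; three seasonal dummies are added
res = smf.ols("cons ~ income + C(quarter)", data=df).fit()
print(res.params)
```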
Two More Illustrative Examples
- We will discuss two more examples using dummy variables.
- They are meant to illustrate two points worth noting, which are as follows:
  1. In some studies with a large number of dummy variables it becomes somewhat difficult to interpret the signs of the coefficients because they seem to have the wrong signs. (The first example.)
  2. Sometimes the introduction of dummy variables produces a drastic change in the slope coefficient. (The second example.)
- The first example is a study of the determinants of automobile prices.
- Griliches regressed the logarithm of new passenger car prices on various specifications. The results are shown in Table 8.1.
- Since the dependent variable is the logarithm of price, the regression coefficients can be interpreted as the estimated percentage change in the price for a unit change in a particular quality, holding other qualities constant.
- For example, the coefficient of H indicates that an increase of 10 units of horsepower results in a 1.2% increase in price.
- However, some of the coefficients have to be interpreted with caution.
- For example, the coefficient of P in the equation for 1960 says that the presence of power steering as "standard equipment" led to a 22.5% higher price in 1960.
- In this case the variable P is obviously not measuring the effect of power steering alone but is measuring the effect of "luxuriousness" of the car.
- It is also picking up the effects of A and B. This explains why the coefficient of A is so low in 1960. In fact, A, P, and B together can perhaps be replaced by a single dummy that measures "luxuriousness." These variables appear to be highly intercorrelated.
- Another coefficient, at first sight puzzling, is the coefficient of V, which, though not significant, is consistently negative.
- Though a V-8 costs more than a six-cylinder engine on a "comparable" car, what this coefficient says is that, holding horsepower and other variables constant, a V-8 is cheaper by about 4%.
- Since the V-8's have higher horsepower, what this coefficient is saying is that higher horsepower can be achieved more cheaply by shifting to a V-8 than by using a six-cylinder engine.
- It measures the decline in price per horsepower
as one shifts to V-8's even though the total
expenditure on horsepower goes up - This example illustrates the use of dummy
variables and the interpretation of seemingly
wrong coefficients
- As another example, consider the estimates of liquid-asset demand by manufacturing corporations.
- Vogel and Maddala computed regressions of the form $\log C = \alpha + \beta \log S$, where C is cash and S is sales, on the basis of data from the Internal Revenue Service, "Statistics of Income," for the year 1960-1961.
- The data consisted of 16 industry subgroups and 14 size classes, size being measured by total assets.
- The equations were estimated separately for each industry; the estimates of β ranged from 0.929 to 1.077.
- The R²'s were uniformly high, ranging from 0.985 to 0.998.
- Thus one might conclude that the sales elasticity of demand for cash is close to 1.
- Also, when the data were pooled and a single equation estimated for the entire set of 224 observations, the estimate of β was 0.992 and R² = 0.897.
- When industry dummies were added, the estimate of β was 0.995 and R² = 0.992.
- From the high R²'s and the relatively constant estimate of β one might be reassured that the sales elasticity is very close to 1.
- However, when asset-size dummies were introduced, the estimate of β fell to 0.334 with an R² of 0.996.
- Also, all the asset-size dummies were highly significant.
- The situation is described in Figure 8.2.
- That the sales elasticity is significantly less than 1 is also confirmed by other evidence.
- This example illustrates how one can be very easily misled by high R²'s and apparent constancy of the coefficients.
8.3 Dummy Variables for Changes in Slope Coefficients
- Suppose the two groups have the regressions $y = \alpha_1 + \beta_1 x + u$ and $y = \alpha_2 + \beta_2 x + u$.
- We can write these equations together as
  $y = \alpha_1 + (\alpha_2 - \alpha_1)D_1 + \beta_1 x + (\beta_2 - \beta_1)D_2 + u \quad (8.4)$
- where $D_1 = 0$ for all observations in the first group and $D_1 = 1$ for all observations in the second group, and $D_2 = 0$ for all observations in the first group and $D_2 = x$ (i.e., the respective value of x) for observations in the second group.
- The coefficient of D1 measures the difference in the intercept terms and the coefficient of D2 measures the difference in the slopes.
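A minimal sketch with simulated data: the slope dummy is just the product of the group dummy and x, so a single pooled regression returns $\alpha_1$, $\alpha_2 - \alpha_1$, $\beta_1$, and $\beta_2 - \beta_1$ directly:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
group = rng.integers(0, 2, n)          # 0 = first group, 1 = second group
x = rng.normal(5.0, 1.5, n)
# true model: alpha1 = 1.0, alpha2 = 2.5, beta1 = 0.5, beta2 = 1.2
y = np.where(group == 0,
             1.0 + 0.5 * x,
             2.5 + 1.2 * x) + rng.normal(0.0, 0.5, n)

D1 = group                 # intercept-shift dummy
D2 = group * x             # slope-shift dummy: 0 for group 1, x for group 2
X = sm.add_constant(np.column_stack([D1, x, D2]))
res = sm.OLS(y, X).fit()
print(res.params)   # approximately [alpha1, alpha2 - alpha1, beta1, beta2 - beta1]
```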
- Suitable dummy variables can be defined when there are changes in slopes and intercepts at different times.
- Suppose that we have data for three periods and in the second period only the intercept changed (there was a parallel shift).
- In the third period both the intercept and the slope changed.
- Then we write
  $y = \alpha_1 + \beta_1 x + u$ (period 1), $y = \alpha_2 + \beta_1 x + u$ (period 2), $y = \alpha_3 + \beta_2 x + u$ (period 3) $\quad (8.5)$
- Then we can combine these equations and write the model as
  $y = \alpha_1 + (\alpha_2 - \alpha_1)D_1 + (\alpha_3 - \alpha_1)D_2 + \beta_1 x + (\beta_2 - \beta_1)D_3 + u \quad (8.6)$
- where $D_1 = 1$ for observations in period 2 and 0 otherwise, $D_2 = 1$ for observations in period 3 and 0 otherwise, and $D_3 = x$ for observations in period 3 and 0 otherwise.
- Note that in all these examples we are assuming that the error terms in the different groups all have the same distribution.
- That is why we combine the data from the different groups and write an error term u as in (8.4) or (8.6) and estimate the equation by least squares.
- An alternative way of writing the equations (8.5), which is very general, is to stack the y variables and the error terms in columns. Then write all the parameters $\alpha_1, \alpha_2, \alpha_3, \beta_1, \beta_2$ down with their multiplicative factors stacked in columns as follows:
  $y = \alpha_1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \alpha_2 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + \alpha_3 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} + \beta_1 \begin{pmatrix} x_1 \\ x_2 \\ 0 \end{pmatrix} + \beta_2 \begin{pmatrix} 0 \\ 0 \\ x_3 \end{pmatrix} + u \quad (8.7)$
- where the three rows refer to the blocks of observations in periods 1, 2, and 3, respectively.
- What this says is, for instance, that for an observation in period 3, $y = \alpha_1(0) + \alpha_2(0) + \alpha_3(1) + \beta_1(0) + \beta_2(x_3) + u$
- where ( ) is used for multiplication, e.g., $\alpha_3(0) = \alpha_3 \cdot 0 = 0$.
- This gives the regression
  $y = \alpha_1 D_1 + \alpha_2 D_2 + \alpha_3 D_3 + \beta_1 D_4 + \beta_2 D_5 + u \quad (8.8)$
- where the definitions of D1, D2, D3, D4, D5 are clear from equation (8.7).
- For instance, $D_1 = 1$ for observations in period 1 and 0 otherwise, and $D_5 = x$ for observations in period 3 and 0 otherwise.
- Note that equation (8.8) has to be estimated
without a constant term. - In this method we define as many dummy variables
as there are parameters to estimate and we
estimate the regression equation with no constant
term. - Note that equations (8.6) and (8.8) are
equivalent.
8.7 Dummy Dependent Variables
- Until now we have been considering models where the explanatory variables are dummy variables.
- We now discuss models where the explained variable is a dummy variable.
- This dummy variable can take on two or more values, but we consider here the case where it takes on only two values, zero or 1. Considering the other cases is beyond the scope of this book.
- Since the dummy variable takes on two values, it is called a dichotomous variable.
- There are numerous examples of dichotomous explained variables.
- There are several methods to analyze regression models where the dependent variable is a zero-or-one variable.
- The simplest procedure is to just use the usual least squares method.
- In this case the model is called the linear probability model.
- Another method, called the linear discriminant function, is related to the linear probability model.
- The other alternative is to say that there is an underlying or latent variable $y^*$ which we do not observe. What we observe is
  $y = 1$ if $y^* > 0$, and $y = 0$ otherwise.
- This is the idea behind the logit and probit models.
- First we discuss these methods and then give an illustrative example.
8.8 The Linear Probability Model and the Linear Discriminant Function
The Linear Probability Model
- For example, in an analysis of bankruptcy of firms, we define $y_i = 1$ if the firm is bankrupt and $y_i = 0$ otherwise.
- We write the model in the usual regression framework as
  $y_i = \beta x_i + u_i \quad (8.11)$
- with $E(u_i) = 0$.
- The conditional expectation $E(y_i \mid x_i)$ is equal to $\beta x_i$.
- This has to be interpreted in this case as the probability that the event will occur given the $x_i$.
- The calculated value of y from the regression equation (i.e., $\hat{y}_i = \hat{\beta} x_i$) will then give the estimated probability that the event will occur given the particular value of $x_i$.
- Since $y_i$ takes the value 1 or zero, the errors in equation (8.11) can take only two values, $(1 - \beta x_i)$ and $(-\beta x_i)$.
- Also, with the interpretation we have given equation (8.11), and the requirement that $E(u_i) = 0$, the respective probabilities of these events are $\beta x_i$ and $(1 - \beta x_i)$.
- Thus we have
  $\operatorname{var}(u_i) = \beta x_i (1 - \beta x_i)$
- so the error variance changes with $x_i$: the errors are heteroskedastic.
- Because of this heteroskedasticity problem the OLS estimates of β from equation (8.11) will not be efficient.
- We use the following two-step procedure:
  1. First estimate (8.11) by least squares.
  2. Next compute $\hat{w}_i = \hat{y}_i(1 - \hat{y}_i)$ and use weighted least squares; that is, we regress $y_i/\sqrt{\hat{w}_i}$ on $x_i/\sqrt{\hat{w}_i}$.
- The problems with this procedure are:
  1. In practice $\hat{y}_i(1 - \hat{y}_i)$ may be negative, although in large samples this will be so with a very small probability, since $\hat{y}_i$ is a consistent estimator for $E(y_i \mid x_i)$.
  2. Since the errors $u_i$ are obviously not normally distributed, there is a problem with the application of the usual tests of significance. As we will see in the next section, on the linear discriminant function, they can be justified only under the assumption that the explanatory variables have a multivariate normal distribution.
- The most important criticism is with the formulation itself: that the conditional expectation $E(y_i \mid x_i)$ be interpreted as the probability that the event will occur. In many cases $\hat{y}_i$ can lie outside the limits (0, 1).
- The limitations of the linear probability model are shown in Figure 8.3, which shows the bunching up of points along y = 0 and y = 1.
- The predicted values can easily lie outside the interval (0, 1) and the prediction errors can be very large.
The Linear Discriminant Function
- Suppose that we have n individuals for whom we have observations on k explanatory variables, and we observe that n1 of them belong to one group and n2 of them belong to a second group, where $n_1 + n_2 = n$.
- We want to construct a linear function of the k variables that we can use to predict that a new observation belongs to one of the two groups.
- This linear function is called the linear discriminant function.
- As an example, suppose that we have data on a number of loan applicants and we observe that n1 of them were granted loans and n2 of them were denied loans.
- We also have the socioeconomic characteristics of the applicants.
- Let us define a linear function
  $Z = \lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_k x_k$
- Then it is intuitively clear that to get the best discrimination between the two groups, we would want to choose the $\lambda_i$ so that the ratio of the between-group variance of Z to the within-group variance of Z is maximized.
- Fisher suggested an analogy between this problem and multiple regression analysis.
- He suggested that we define a dummy variable $y_i = n_2/(n_1 + n_2)$ if the observation belongs to the first group and $y_i = -n_1/(n_1 + n_2)$ if it belongs to the second group.
- Now estimate the multiple regression equation of y on $x_1, \ldots, x_k$.
- Get the residual sum of squares RSS.
- Then the discriminant function coefficients $\hat{\lambda}_i$ are proportional to the least squares coefficients $\hat{\beta}_i$, with a factor of proportionality that depends only on the RSS.
- Thus, once we have the regression coefficients and residual sum of squares from the dummy dependent variable regression, we can very easily obtain the discriminant function coefficients.
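A numerical sketch of this equivalence on simulated two-group data (numpy only): the slope coefficients from the dummy-dependent regression come out proportional to the classical discriminant direction $S_w^{-1}(\bar{x}_1 - \bar{x}_2)$; the exact factor of proportionality involves the RSS, as noted above:

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2, k = 60, 40, 3
X1 = rng.normal(0.0, 1.0, (n1, k)) + np.array([1.0, 0.5, 0.0])  # group 1
X2 = rng.normal(0.0, 1.0, (n2, k))                              # group 2
X = np.vstack([X1, X2])
n = n1 + n2

# Fisher's dummy coding: n2/n for group 1, -n1/n for group 2
y = np.concatenate([np.full(n1, n2 / n), np.full(n2, -n1 / n)])

# regress y on the x's (with a constant) and take the slope coefficients
A = np.column_stack([np.ones(n), X])
beta = np.linalg.lstsq(A, y, rcond=None)[0][1:]

# classical discriminant direction: pooled-covariance-weighted mean difference
S_w = (np.cov(X1, rowvar=False) * (n1 - 1) +
       np.cov(X2, rowvar=False) * (n2 - 1)) / (n - 2)
lam = np.linalg.solve(S_w, X1.mean(axis=0) - X2.mean(axis=0))

# the two coefficient vectors are proportional
print(beta / lam)   # roughly a constant vector
```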
Discriminant Analysis
- Discriminant analysis attempts to classify customers into two groups:
  - those that will default
  - those that will not
- It does this by assigning a score to each customer.
- The score is the weighted sum of the customer data:
  $Z_c = \sum_i w_i X_{i,c}$
- Here, $w_i$ is the weight on data type i, and $X_{i,c}$ is one piece of customer data.
- The values for the weights are chosen to maximize the difference between the average score of the customers that later defaulted and the average score of the customers who did not default.
- The actual optimization process to find the weights is quite complex.
- The most famous discriminant scorecard is Altman's Z Score.
- For publicly owned manufacturing firms, the Z Score was found to be as follows:
  $Z = 1.2 X_1 + 1.4 X_2 + 3.3 X_3 + 0.6 X_4 + 1.0 X_5$
- where $X_1$ = working capital/total assets, $X_2$ = retained earnings/total assets, $X_3$ = earnings before interest and taxes/total assets, $X_4$ = market value of equity/book value of total liabilities, and $X_5$ = sales/total assets.
- A company scoring less than 1.81 was "very
likely" to go bankrupt later - A company scoring more than 2.99 was "unlikely"
to go bankrupt. - The scores in between were considered inconclusive
- This approach has been adopted by many banks.
- Some banks use the equation exactly as it was created by Altman.
- But most use Altman's approach on their own customer data to get scoring models that are tailored to the bank.
- To obtain the probability of default from the scores, we group companies according to their scores at the beginning of a year, and then calculate the percentage of companies within each group who defaulted by the end of the year.
8.9 The Probit and Logit Models
- An alternative approach is to assume that we have a regression model
  $y_i^* = \beta x_i + u_i \quad (8.12)$
- where $y_i^*$ is not observed. It is commonly called a latent variable.
- What we observe is a dummy variable $y_i$ defined by
  $y_i = 1$ if $y_i^* > 0$, and $y_i = 0$ otherwise $\quad (8.13)$
- The probit and logit models differ in the specification of the distribution of the error term u in equation (8.12).
- The difference between the specification (8.12)
and the linear probability model is that in the
linear probability model we analyze the
dichotomous variables as they are, whereas in
(8.12) we assume the existence of an underlying
latent variable for which we observe a
dichotomous realization.
- For instance, if the observed dummy variable is whether or not the person is employed, $y_i^*$ would be defined as the "propensity or ability to find employment."
- Similarly, if the observed dummy variable is whether or not the person has bought a car, then $y_i^*$ would be defined as the "desire or ability to buy a car."
- Note that in both the examples we have given, there is both desire and ability involved.
- Thus the explanatory variables in (8.12) would contain variables that explain both these elements.
- Note from system (8.13) that multiplying $y_i^*$ by any positive constant does not change $y_i$.
- Hence if we observe $y_i$, we can estimate the β's in (8.12) only up to a positive multiple.
- Hence it is customary to assume $\operatorname{var}(u_i) = 1$. This fixes the scale of $y_i^*$.
- From the relationships (8.12) and (8.13) we get
- $P(y_i = 1) = P(u_i > -\beta x_i) = 1 - F(-\beta x_i)$
- where F is the cumulative distribution function of u.
- If the distribution of u is symmetric, since $F(Z) = 1 - F(-Z)$, we can write
  $P(y_i = 1) = F(\beta x_i) \quad (8.14)$
- Since the observed $y_i$ are just realizations of a binomial process with probabilities given by equation (8.14) and varying from trial to trial (depending on $x_i$), we can write the likelihood function as
  $L = \prod_{y_i = 1} F(\beta x_i) \prod_{y_i = 0} \left[1 - F(\beta x_i)\right] \quad (8.15)$
- If the cumulative distribution of $u_i$ is logistic, we have what is known as the logit model.
- In this case
  $F(\beta x_i) = \dfrac{e^{\beta x_i}}{1 + e^{\beta x_i}}$
- Hence
  $1 - F(\beta x_i) = \dfrac{1}{1 + e^{\beta x_i}}$
- Note that for the logit model
  $\log \dfrac{P_i}{1 - P_i} = \beta x_i$
- that is, the log-odds is a linear function of the explanatory variables.
- If the errors $u_i$ in (8.12) follow a normal distribution, we have the probit model (it should more appropriately be called the normit model, but the word probit was used in the biometrics literature).
- In this case
  $F(\beta x_i) = \Phi(\beta x_i) = \int_{-\infty}^{\beta x_i} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt$
- Maximization of the likelihood function (8.15)
for either the probit or the logit model is
accomplished by nonlinear estimation methods. - There are now several computer programs available
for probit and logit analysis, and these programs
are very inexpensive to run.
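A minimal sketch using simulated data and the `statsmodels` Logit and Probit classes; since the latent errors are drawn standard normal, the probit estimates should land near the true values, and the logit estimates differ by roughly the usual scale factor of about 1.6:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 1000
x = rng.normal(0.0, 1.0, n)
X = sm.add_constant(x)

# latent variable with standard normal errors, so probit is correctly specified
y_star = -0.5 + 1.0 * x + rng.normal(0.0, 1.0, n)
y = (y_star > 0).astype(int)

probit = sm.Probit(y, X).fit(disp=0)
logit = sm.Logit(y, X).fit(disp=0)
print(probit.params)            # close to [-0.5, 1.0]
print(logit.params)             # roughly 1.6 times the probit values
```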
Illustrative Example
- As an illustration, we consider data on a sample of 750 mortgage applications in the Columbia, SC, metropolitan area.
- There were 500 loan applications accepted and 250 loan applications rejected.
- We define $y_i = 1$ if the loan application was accepted, and $y_i = 0$ if it was rejected.
- Three models were estimated: the linear probability model, the logit model, and the probit model.
- The explanatory variables were:
  - AI = applicant's and coapplicant's income (10³ dollars)
  - XMD = debt minus mortgage payment (10³ dollars)
  - DF = dummy variable, 1 for female, 0 for male
  - DR = dummy variable, 1 for nonwhite, 0 for white
  - DS = dummy variable, 1 for single, 0 otherwise
  - DA = age of house (10² years)
  - NNWP = percent nonwhite in the neighborhood (×10³)
  - NMFI = neighborhood mean family income (10⁵ dollars)
  - NA = neighborhood average age of house (10² years)
- The results are presented in Table 8.3.
Measures of Goodness of Fit
- There is a problem with the use of conventional R²-type measures when the explained variable y takes on only two values.
- The predicted values $\hat{y}_i$ are probabilities and the actual values $y_i$ are either 0 or 1.
- We can also think of R² in terms of the proportion of correct predictions.
- Since the dependent variable is a zero-or-one variable, after we compute the $\hat{y}_i$ we classify the i-th observation as belonging to group 1 if $\hat{y}_i < 0.5$ and group 2 if $\hat{y}_i > 0.5$.
- We can then count the number of correct predictions.
- We can define a predicted value $\hat{y}_i^*$, which is also a zero-one variable, such that $\hat{y}_i^* = 1$ if $\hat{y}_i > 0.5$ and $\hat{y}_i^* = 0$ if $\hat{y}_i \le 0.5$.
- (Provided that we calculate $\hat{y}_i$ to enough decimals, ties will be very unlikely.)
- Now define
  count $R^2 = \dfrac{\text{number of correct predictions}}{\text{total number of observations}}$
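A small sketch of the count R² just defined, with made-up fitted probabilities:

```python
import numpy as np

def count_r2(y: np.ndarray, p_hat: np.ndarray, cutoff: float = 0.5) -> float:
    """Proportion of correct predictions when classifying at the cutoff."""
    y_pred = (p_hat > cutoff).astype(int)
    return float(np.mean(y_pred == y))

# hypothetical fitted probabilities and outcomes
y = np.array([1, 0, 1, 1, 0, 0, 1, 0])
p_hat = np.array([0.81, 0.32, 0.58, 0.44, 0.12, 0.55, 0.91, 0.27])
print(count_r2(y, p_hat))   # 6 of 8 correct -> 0.75
```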
Type I error vs. type II error
- It should be noted that the above count R² values obtained with the CAP and ROC curves treat the costs of a type I error (classifying a subsequently failing firm as non-failed) and a type II error (classifying a subsequently non-failed firm as failed) as the same.
- However, in the credit market, the costs of misclassifying a firm that subsequently fails are much more serious than the costs of misclassifying a firm that does not fail.
- In particular, in the first case, the lender can lose up to 100% of the loan amount while, in the latter case, the loss is just the opportunity cost of not lending to that firm.
- Accordingly, in assessing the practical utility
of failure prediction models, banks pay more
attention to the misclassification costs involved
in type I rather than type II errors.
- In particular, for every cutoff probability, the type I error is defined as the percentage of defaults that the model mistakenly classifies as non-defaults, and the type II error as the percentage of non-defaults that are mistakenly classified as defaults.
- We can consider nineteen cutoff probabilities: 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, and 0.95, and use the average value of the two errors as a criterion.
8.11 Truncated Variables: The Tobit Model
- In our discussion of the logit and probit models we talked about a latent variable $y_i^*$ which was not observed, for which we could specify the regression model
  $y_i^* = \beta x_i + u_i$
- For simplicity of exposition we assume that there is only one explanatory variable.
- In the logit and probit models, what we observe is a dummy variable $y_i = 1$ if $y_i^* > 0$ and $y_i = 0$ otherwise.
- Suppose, however, that $y_i^*$ is observed if $y_i^* > 0$ and is not observed if $y_i^* \le 0$.
- Then the observed $y_i$ will be defined as
  $y_i = \beta x_i + u_i$ if $y_i^* > 0$, and $y_i = 0$ if $y_i^* \le 0$ $\quad (8.19)$
- This is known as the tobit model (Tobin's probit) and was first analyzed in the econometrics literature by Tobin.
- It is also known as a censored normal regression model because some observations on $y^*$ (those for which $y^* \le 0$) are censored (we are not allowed to see them).
- Our objective is to estimate the parameters β and σ.
Some Examples
- The example that Tobin considered was that of automobile expenditures.
- Let y denote expenditures on automobiles and x denote income, and we postulate the regression equation $y = \beta x + u$.
- However, in the sample we would have a large number of observations for which the expenditures on automobiles are zero.
- Tobin argued that we should use the censored regression model.
- We can specify the model as
  $y_i = \beta x_i + u_i$ if the right-hand side is positive, and $y_i = 0$ otherwise.
- The structure of this model thus appears to be the same as that in (8.19).
- Another example: hours worked (H) or wages (W).
- If we have observations on a number of individuals, some of whom are employed and others not, we can specify the model for hours worked as
  $H_i = \beta x_i + u_i$ if the right-hand side is positive, and $H_i = 0$ otherwise.
- Similarly, for wages we can specify the model as
  $W_i = \beta x_i + u_i$ if the right-hand side is positive, and $W_i = 0$ otherwise.
- The structure of these models again appears to be the same as in (8.19).
Method of Estimation
- Let us consider the estimation of β and σ by the use of ordinary least squares.
- We cannot use OLS with only the positive observations $y_i$ because when we write the model
  $y_i = \beta x_i + u_i$
- the error term $u_i$ does not have a zero mean.
- Since observations with $y_i^* \le 0$ are omitted, it implies that only observations for which $u_i > -\beta x_i$ are included in the sample.
- Thus, the distribution of $u_i$ is a truncated normal distribution, shown in Figure 8.4, and its mean is not zero.
- In fact, it depends on β, σ, and $x_i$, and is thus different for each observation.
- A method of estimation commonly suggested is the maximum likelihood method, which is as follows.
- 1. The positive values of y, for which we can write down the normal density function as usual. We note that $(y_i - \beta x_i)/\sigma$ has a standard normal distribution.
- 2. The zero observations of y, for which all we know is that $y_i^* \le 0$, that is, $\beta x_i + u_i \le 0$. Since $u_i/\sigma$ has a standard normal distribution, we will write this as $u_i/\sigma \le -\beta x_i/\sigma$. The probability of this can be written as $F(-\beta x_i/\sigma) = 1 - F(\beta x_i/\sigma)$, where $F(z)$ is the cumulative distribution function of the standard normal.
- Let us denote the density function of the standard normal by $f(z)$ and the cumulative distribution function by $F(z)$.
- Thus
  $f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$
- and
  $F(z) = \int_{-\infty}^{z} f(t)\, dt$
- Using this notation we can write the likelihood function for the tobit model as
  $L = \prod_{y_i > 0} \frac{1}{\sigma}\, f\!\left(\frac{y_i - \beta x_i}{\sigma}\right) \prod_{y_i = 0} \left[1 - F\!\left(\frac{\beta x_i}{\sigma}\right)\right]$
- Maximizing this likelihood function with respect to β and σ, we get the ML estimates of these parameters.
- We will not go through the algebraic details of the ML method here.
- Instead, we discuss the situations under which the tobit model is applicable and its relationship to other models with truncated variables.