Title: Dummy-Variable Regression Model
1Dummy-Variable Regression Model
2Multiple Regression Models
3Dummy-Variable Regression Model
- Involves a categorical X variable with 2 or more levels
  - e.g., male-female, college-no college, etc.
  - or firms or states or cities
- Each level is coded 0 or 1
- Assumes only the intercept is different
- Slopes are constant across categories
- The number of dummy variables included is the number of levels minus 1
4Dummy-Variable Regression Model Example Coding
- Gender (2 levels): Male = 1, Female = 0 for variable MALE
- Marital Status (3 levels - requires 2 dummies)
  - MARRIED: Single = 0, Divorced = 0, Married = 1
  - DIVORCED: Single = 0, Divorced = 1, Married = 0
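The coding scheme above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the helper name `make_dummies` and the sample data are hypothetical, and the excluded level (here "single") is simply the one that gets no column of its own.

```python
def make_dummies(values, excluded):
    """Return one 0/1 column per level, skipping the excluded (reference) level."""
    levels = sorted(set(values))
    if excluded not in levels:
        raise ValueError(f"excluded level {excluded!r} not found in data")
    return {
        level.upper(): [1 if v == level else 0 for v in values]
        for level in levels
        if level != excluded
    }


# Hypothetical marital-status data: 3 levels, so 2 dummies are produced.
status = ["single", "married", "divorced", "married"]
dummies = make_dummies(status, excluded="single")
print(dummies)
```

A single person has 0 in both the MARRIED and DIVORCED columns, which matches the coding on this slide.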
5Interpreting Dummy-Variable Model Equation
Given: $\hat{Y}_i = b_0 + b_2 X_{2i}$
Y = Starting salary of college grads
$X_2$ = 0 if Male, 1 if Female
- $b_0$ = mean Y for men, since for each man $\hat{Y} = b_0 + b_2(0) = b_0$
- $b_2$ = difference of means between men and women, since for each woman $\hat{Y} = b_0 + b_2(1)$
- $b_0 + b_2$ = mean Y for women
6Comparison to other techniques
- This is identical to a (pooled-variance) t-test for the difference of means. We test $b_2 = 0$ to test if there is a significant difference of means.
- This is identical to a one-way ANOVA for a difference of means.
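The equivalence with the t-test can be checked numerically. The sketch below uses made-up salary figures (an assumption, not data from the slides) and NumPy's least-squares routine. It compares the regression t statistic for the dummy coefficient with the pooled-variance two-sample t statistic.

```python
import numpy as np

# Hypothetical starting salaries (in $1000s); illustrative data only.
salary = np.array([40.0, 42.0, 44.0, 46.0, 45.0, 47.0, 49.0, 51.0])
female = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])  # 0 if male, 1 if female

# Least-squares fit of salary on an intercept and the dummy.
X = np.column_stack([np.ones_like(salary), female])
b, *_ = np.linalg.lstsq(X, salary, rcond=None)
b0, b2 = b

men, women = salary[female == 0], salary[female == 1]

# t statistic for the dummy coefficient, from the regression.
resid = salary - X @ b
n, k = X.shape
s2 = resid @ resid / (n - k)
se_b2 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
t_reg = b2 / se_b2

# Pooled-variance two-sample t statistic, computed directly from the groups.
sp2 = ((len(men) - 1) * men.var(ddof=1) + (len(women) - 1) * women.var(ddof=1)) / (n - 2)
t_pooled = (women.mean() - men.mean()) / np.sqrt(sp2 * (1 / len(men) + 1 / len(women)))

print("intercept vs. mean for men:", b0, men.mean())
print("dummy coefficient vs. difference of means:", b2, women.mean() - men.mean())
print("regression t vs. pooled t:", t_reg, t_pooled)
```

The intercept reproduces the mean of the group coded 0, the dummy coefficient reproduces the difference of means, and the two t statistics agree.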
7Dummy-Variable Model Relationships
[Figure: means for males and females, with Y on the vertical axis and X1 on the horizontal axis. Males at $b_0$, females at $b_0 + b_2$.]
8Interpreting Dummy-Variable Model Equation
Given: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
Y = Starting salary of college grads
$X_1$ = GPA
$X_2$ = 0 if Male, 1 if Female
Males ($X_2 = 0$): $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2(0) = b_0 + b_1 X_{1i}$
9Interpreting Dummy-Variable Model Equation
Females ($X_2 = 1$): $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2(1) = (b_0 + b_2) + b_1 X_{1i}$
10Dummy-Variable Model Relationships
[Figure: Y against X1, two parallel lines with the same slope $b_1$. Males have intercept $b_0$, females have intercept $b_0 + b_2$.]
11Dummy-Variable Model Example
Same slopes
12Interpretation
- The difference in mean output between men and women is 7, holding GPA constant.
- When there are more than two groups, the interpretation of the coefficient is always the difference of means between that group and the EXCLUDED GROUP.
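To illustrate "holding GPA constant", the sketch below builds noiseless, hypothetical data in which the group gap is set to 7 and the GPA slope to 4 (both values are assumptions chosen for illustration). It then checks that the fitted gap is the same at every GPA level.

```python
import numpy as np

# Hypothetical data: the same GPA slope for both groups, intercept shifted by 7.
gpa = np.array([2.0, 2.5, 3.0, 3.5, 2.2, 2.8, 3.4, 3.9])
female = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
salary = 10.0 + 4.0 * gpa + 7.0 * female

# Regress salary on an intercept, GPA, and the dummy.
X = np.column_stack([np.ones_like(gpa), gpa, female])
b0, b1, b2 = np.linalg.lstsq(X, salary, rcond=None)[0]


def predict(gpa_value, is_female):
    """Fitted salary for a given GPA and group (0 = male, 1 = female)."""
    return b0 + b1 * gpa_value + b2 * is_female


# The fitted gap between groups does not depend on GPA: the lines are parallel.
gap_at_2 = predict(2.0, 1) - predict(2.0, 0)
gap_at_4 = predict(4.0, 1) - predict(4.0, 0)
print(b0, b1, b2, gap_at_2, gap_at_4)
```

Because the model has a single slope for GPA, the dummy coefficient is a pure intercept shift between the two groups.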
13How many dummy variables do you need?
- To compare union workers and nonunion workers?
- To compare whites, blacks, hispanics and asians?
- To compare months of the year?
14EXAMPLE
15EXAMPLE
16Interaction Regression Model
17Multiple Regression Models
18Interaction Regression Model
- Hypothesizes interaction between pairs of X variables
  - Response to one X variable varies at different levels of another X variable
- Contains two-way cross product terms
- Can be combined with other models
  - e.g., dummy variable model
19Effect of Interaction
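- Given: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
- Without interaction term, effect of X1 on Y is measured by $\beta_1$
- With interaction term, effect of X1 on Y is measured by $\beta_1 + \beta_3 X_{2i}$
- Effect changes as X2i increases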
20Interaction Example
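$Y = 1 + 2X_1 + 3X_2 + 4X_1 X_2$
- $X_2 = 1$: $Y = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$
- $X_2 = 0$: $Y = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$
[Figure: both lines plotted with Y (0 to 12) against X1 (0 to 1.5).]
Effect (slope) of X1 on Y does depend on X2 value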
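The arithmetic of this example can be verified directly. The short sketch below hard-codes the example equation; the function names `predict` and `slope_x1` are illustrative.

```python
def predict(x1, x2):
    """The example equation Y = 1 + 2*X1 + 3*X2 + 4*X1*X2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


def slope_x1(x2):
    """Effect of X1 on Y: the X1 coefficient plus the interaction coefficient times X2."""
    return 2 + 4 * x2


for x2 in (0, 1):
    # Change in Y when X1 goes from 0 to 1, with X2 held fixed.
    rise = predict(1.0, x2) - predict(0.0, x2)
    print(f"X2 = {x2}: intercept = {predict(0.0, x2)}, slope = {slope_x1(x2)}, rise = {rise}")
```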
21Interaction Regression Model Worksheet
Multiply X1 by X2 to get X1X2. Run the regression of Y on X1, X2, and X1X2.
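The worksheet step can be sketched with NumPy. The data below are hypothetical and noiseless, generated from the example equation Y = 1 + 2X1 + 3X2 + 4X1X2, so the fit should return those coefficients.

```python
import numpy as np

# Hypothetical design: four X1 values observed at X2 = 0 and again at X2 = 1.
x1 = np.tile(np.linspace(0.0, 1.5, 4), 2)
x2 = np.repeat([0.0, 1.0], 4)
y = 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2  # noiseless, for illustration only

# Step 1: multiply X1 by X2 to get the cross-product column.
x1x2 = x1 * x2

# Step 2: regress Y on an intercept, X1, X2, and X1X2.
X = np.column_stack([np.ones_like(x1), x1, x2, x1x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)
```

With real (noisy) data the estimates would differ from the true values, and the t-test on the X1X2 coefficient would indicate whether the interaction is significant.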
22Interpretation when there are 3 levels
- a = Mean Y for a single female (MALE, MARRIED, DIVORCED = 0)
- b1 = Difference in means between males and females (a + b1 = mean Y for single males)
- b2 = Difference in means between single and married (holding gender constant)
- b3 = Difference in means between divorced and single
- b2 - b3 = Difference in means between married and divorced
23Interpretation when there are 3 levels
- It is possible to interact the dummy variables. This can give the same result as a 2-way ANOVA.
- In this example, this would allow the effect of marital status to vary with gender.
24Interpretation when there are 3 levels
- MALE = 0 if female and 1 if male
25Interpretation when there are 3 levels
- MALE = 0 if female and 1 if male
- MARRIED = 1 if married, 0 if divorced or single
- DIVORCED = 1 if divorced, 0 if single or married
- MALEMARRIED = 1 if male and married, 0 otherwise (MALE times MARRIED)
- MALEDIVORCED = 1 if male and divorced, 0 otherwise (MALE times DIVORCED)
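With the interaction dummies included, each of the six gender-by-marital-status cells gets its own mean. The sketch below shows how the cell means are assembled from the coefficients. The coefficient values are invented for illustration and are not estimates from the slides.

```python
# Hypothetical coefficient values, for illustration only.
coef = {
    "const": 30.0,        # mean Y for single females (all dummies 0)
    "MALE": 4.0,
    "MARRIED": 3.0,
    "DIVORCED": -1.0,
    "MALEMARRIED": 2.0,   # MALE times MARRIED
    "MALEDIVORCED": 0.5,  # MALE times DIVORCED
}


def cell_mean(male, status):
    """Predicted mean Y for one gender/marital-status cell."""
    married = int(status == "married")
    divorced = int(status == "divorced")
    return (
        coef["const"]
        + coef["MALE"] * male
        + coef["MARRIED"] * married
        + coef["DIVORCED"] * divorced
        + coef["MALEMARRIED"] * male * married
        + coef["MALEDIVORCED"] * male * divorced
    )


for status in ("single", "married", "divorced"):
    gap = cell_mean(1, status) - cell_mean(0, status)
    print(f"{status}: female = {cell_mean(0, status)}, male = {cell_mean(1, status)}, gap = {gap}")
```

The male-female gap equals the MALE coefficient for singles, MALE plus MALEMARRIED for married people, and MALE plus MALEDIVORCED for divorced people. In this way the interaction terms let the effect of marital status vary with gender.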
27Interpreting Results
Difference
- FEMALE
  - Single
  - Married
  - Divorced
- MALE
  - Single
  - Married
  - Divorced
Main Effects: MALE, MARRIED and DIVORCED
Interaction Effects: MALEMARRIED and MALEDIVORCED
28Interpreting results
- Testing for interaction: must do an F-test of the joint hypothesis that the coefficients on MALEMARRIED and MALEDIVORCED are both zero
- EXAMPLE
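A joint F-test of this kind compares the error sum of squares of the model without the interaction terms (restricted) against the model with them (unrestricted). The sketch below uses simulated data with arbitrary, assumed coefficient values and computes only the F statistic. Obtaining a p-value would additionally require the F distribution, e.g. from a statistics package.

```python
import numpy as np


def sse(X, y):
    """Error sum of squares from a least-squares fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return float(resid @ resid)


# Simulated, hypothetical data (all coefficient values are made up).
rng = np.random.default_rng(0)
n = 120
male = rng.integers(0, 2, n).astype(float)
status = rng.integers(0, 3, n)  # 0 = single, 1 = married, 2 = divorced
married = (status == 1).astype(float)
divorced = (status == 2).astype(float)
y = 30 + 4 * male + 3 * married - 1 * divorced + 2 * male * married + rng.normal(0, 2, n)

# Restricted model: main effects only. Unrestricted: adds the two interactions.
X_r = np.column_stack([np.ones(n), male, married, divorced])
X_u = np.column_stack([X_r, male * married, male * divorced])

q = X_u.shape[1] - X_r.shape[1]   # number of restrictions being tested
df = n - X_u.shape[1]             # error degrees of freedom, unrestricted model
F = ((sse(X_r, y) - sse(X_u, y)) / q) / (sse(X_u, y) / df)
print(f"F({q}, {df}) = {F:.2f}")
```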
29Polynomial (Curvilinear) Regression Model
30Multiple Regression Models
31Curvilinear Regression Model
- Relationship between 1 response variable and 2 or more explanatory variables is a polynomial function
- Useful when scatter diagram indicates non-linear relationship
- Curvilinear model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$
- The second explanatory variable is the square of the 1st.
32Curvilinear Regression Model
Curvilinear models may be considered when the scatter diagram takes on one of the following shapes:
[Figure: four panels of Y against X1, two with $\beta_2 > 0$ and two with $\beta_2 < 0$.]
$\beta_2$ = the coefficient of the quadratic term
33Testing for Significance Curvilinear Model
- Testing for Overall Relationship
  - Similar to test for linear model
  - F test statistic
- Testing the Curvilinear Effect
  - Compare the curvilinear model with the linear model
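One common way to test the curvilinear effect is a t-test on the coefficient of the squared term. The sketch below simulates hypothetical data with a true quadratic coefficient of 0.5 (an assumed value) and computes that t statistic with NumPy.

```python
import numpy as np

# Simulated, hypothetical data with a genuine quadratic component.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 1.0 * x + 0.5 * x**2 + rng.normal(0.0, 1.0, size=x.size)

# The second explanatory variable is the square of the first.
X = np.column_stack([np.ones_like(x), x, x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Standard errors and the t statistic for the quadratic coefficient.
resid = y - X @ b
n, k = X.shape
s2 = resid @ resid / (n - k)
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
t_quad = b[2] / se[2]
print("quadratic coefficient:", b[2], "t statistic:", t_quad)
```

A large t statistic (in absolute value) on the squared term favors the curvilinear model over the linear one.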
34Testing for Significance Curvilinear Model
- May require testing a portion of the model (e.g. the linear and squared terms) when there are other variables in the model
- Here we must test both the linear and the squared term together to test for the significance of X1: an F-test for these two variables
35Inherently Linear Models
- Non-linear models that can be expressed in linear form
  - Can be estimated by LS in linear form
- Require data transformation
- Multiplicative model example
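As a sketch of an inherently linear model, assume a multiplicative form Y = b0 * X1^b1 * X2^b2. This specific form and the numbers below are assumptions for illustration, since the slide's own equation is not preserved. Taking logs gives ln Y = ln b0 + b1 ln X1 + b2 ln X2, which least squares can fit directly.

```python
import numpy as np

# Hypothetical noiseless data from Y = 3 * X1^0.5 * X2^1.5.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 3.0 * x1**0.5 * x2**1.5

# Transform the data, then fit the linear form by least squares.
X = np.column_stack([np.ones_like(x1), np.log(x1), np.log(x2)])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)

b0_hat = np.exp(coef[0])  # back-transform the intercept
b1_hat, b2_hat = coef[1], coef[2]
print(b0_hat, b1_hat, b2_hat)
```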
36Using Transformations
- Requires Data Transformation
- Either or Both Independent and Dependent Variables May be Transformed
- Can be based on theory, logic or scatter diagrams
37Square Root Transformation
[Figure: Y against X1 for $\beta_1 > 0$ and for $\beta_1 < 0$; similarly for X2.]
Transforms one of the above models into one that appears linear. Often used to overcome heteroscedasticity.
38Logarithmic Transformation
[Figure: Y against X1 for $\beta_1 > 0$ and for $\beta_1 < 0$; similarly for X2.]
39Exponential Transformation
Original Model
[Figure: Y against X1 for $\beta_1 > 0$ and for $\beta_1 < 0$; similarly for X2.]
Transformed into
40Interpretation of coefficients
- The dependent variable is logged.
  - The coefficient on the independent variable can be approximately interpreted as: a 1 unit change in X leads to a 100 × b percent change in Y.
- The independent variable is logged.
  - The coefficient on the independent variable can be approximately interpreted as: a 1 percent change in X leads to a b/100 unit change in Y.
41Interpretation of coefficients
- Both dependent and independent variables are logged.
  - The coefficient on the independent variable can be approximately interpreted as: a 1 percent change in X leads to a b percent change in Y. Therefore b is the elasticity of Y with respect to X.
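How close these approximations are can be checked with a little arithmetic. The sketch below compares the exact change implied by each logged specification with the rule-of-thumb reading. All coefficient values are hypothetical.

```python
import math

# Dependent variable logged: ln Y = a + b*X.
# A 1-unit increase in X multiplies Y by exp(b).
b_log_y = 0.05
exact_pct = (math.exp(b_log_y) - 1) * 100
approx_pct = 100 * b_log_y

# Independent variable logged: Y = a + b*ln X.
# A 1 percent increase in X changes Y by b*ln(1.01) units.
b_log_x = 200.0
exact_units = b_log_x * math.log(1.01)
approx_units = b_log_x / 100

# Both logged: ln Y = a + b*ln X.
# A 1 percent increase in X multiplies Y by 1.01**b.
elasticity = 0.8
exact_elastic_pct = (1.01**elasticity - 1) * 100
approx_elastic_pct = elasticity

print(exact_pct, approx_pct)
print(exact_units, approx_units)
print(exact_elastic_pct, approx_elastic_pct)
```

The approximations are close for small coefficients and small percentage changes. They become less accurate as the changes grow larger.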
42Income and Experience Scatter Plot
43Income and Experience Linear
44Income and Experience Log Independent Variable
45Income and Experience Income Logged
46Income and Experience Double Log
- Double Log - Elasticity Model (Note LFEXP is
already logged in this example)
47Income and Experience Quadratic
48Income and Experience Log plus Quadratic
49Income and Experience All Specifications
50Standardized and Unstandardized
- Many disciplines report ONLY standardized coefficients
- The usual coefficients are then referred to as unstandardized coefficients
- The standardized coefficients are often referred to as beta weights
- The t-tests for significance of the slopes are identical for either of these two.
51Interpretation of coefficients
- If both Y and X are measured in standardized form (each variable minus its mean, divided by its standard deviation), then the b's are called standardized coefficients. They indicate the number of standard deviations Y will change when X changes by one standard deviation.
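The link between the two kinds of coefficient can be sketched with NumPy. The data are made up for illustration. The code fits the regression twice, once on the raw variables and once on z-scores, and checks that each standardized coefficient equals the unstandardized one multiplied by sd(X)/sd(Y).

```python
import numpy as np


def zscore(v):
    """Standardize: subtract the mean, divide by the sample standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)


# Hypothetical data, for illustration only.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
y = np.array([5.0, 4.0, 9.0, 7.0, 12.0, 20.0, 11.0, 17.0])

# Unstandardized coefficients: regression on the raw variables with an intercept.
X = np.column_stack([np.ones_like(x1), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# Standardized coefficients: regression of z(Y) on z(X1), z(X2).
# No intercept is needed because every standardized variable has mean 0.
Z = np.column_stack([zscore(x1), zscore(x2)])
beta = np.linalg.lstsq(Z, zscore(y), rcond=None)[0]

# Conversion: beta_j = b_j * sd(X_j) / sd(Y).
converted = b[1:] * np.array([x1.std(ddof=1), x2.std(ddof=1)]) / y.std(ddof=1)
print(beta, converted)
```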
52BETA Coefficients Example
53Comparison of coefficients
- In general, we should NOT compare coefficients unless they are measured in the same units (e.g. dollars or inches)
- Two unit-free measures are sometimes used to compare coefficients
  - elasticities (percentage changes)
  - standardized coefficients (standard deviation changes)