Title: Ceteris Paribus
Slide 1: Ceteris Paribus
- Suppose economic growth was 2%, unemployment was low at 5%, inflation was under control (2%), and there were no strikes in the previous year
- What VOTE share would we predict for the government?
- $\text{Growth}_t = 2$, $\text{Unemp}_t = 5$, $\text{Inflation}_t = 2$, $\text{Strikes}_{t-1} = 0$
- What if the government increases its expenditure and the economy grows by 3% rather than 2%, other things being equal (ceteris paribus)?
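The ceteris paribus comparison can be written out as a worked equation. This is only a sketch assuming a linear vote-share model; the intercept $a$ and the coefficients $b_1, \dots, b_4$ are hypothetical labels, since the slide does not show the estimated equation.

```latex
\begin{align*}
\widehat{\text{VOTE}} &= a + b_1\,\text{Growth}_t + b_2\,\text{Unemp}_t
  + b_3\,\text{Inflation}_t + b_4\,\text{Strikes}_{t-1} \\
\widehat{\text{VOTE}}(\text{Growth}_t = 3) - \widehat{\text{VOTE}}(\text{Growth}_t = 2)
  &= b_1 (3 - 2) = b_1
\end{align*}
```

Because unemployment, inflation and strikes are held at the same values in both predictions, their terms cancel and only $b_1$ remains.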
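Slide 2: Role of the Intercept in Regressions
- The intercept is the y value that we would predict if all explanatory variables take a value equal to 0
- Example
  - $y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + e$ with $x_1 = x_2 = x_3 = x_4 = 0$
  - Predicted $y = a + b_1 \cdot 0 + b_2 \cdot 0 + b_3 \cdot 0 + b_4 \cdot 0 = a$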
[Figure: regression line $y = a + b_1 x_1$ with axes $y$ and $x_1$; the points $y = a$ and $x_1 = 0$ are marked.]
Slide 3: Role of the Intercept in Regressions
- If the range of x values does not include 0, the intercept should not be interpreted as the predicted y at $x_1 = 0$
[Figure: axes $y$ and $x_1$ with the labels $y = a$, $y = b_1 x_1$ and $x_1 = 0$.]
- The intercept is nevertheless crucial
- Without the intercept, we assume $a = 0$, i.e. we would force the regression line to go through the origin $(x, y) = (0, 0)$
  - → Consequences for the estimated slope coefficient $b$
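The consequence for the slope can be illustrated with a small numerical sketch. The data below are hypothetical and chosen so that $y = 20 + 1 \cdot x$ holds exactly and x stays far away from 0; the formulas are the standard OLS expressions for a simple regression with and without an intercept.

```python
# Illustrative data (hypothetical): x never comes close to 0,
# and y = 20 + 1 * x holds exactly.
x = [10, 11, 12, 13, 14]
y = [20 + xi for xi in x]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS with an intercept: b = Sxy / Sxx, a = y_bar - b * x_bar.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = s_xy / s_xx
a = y_bar - b * x_bar

# OLS forced through the origin (a = 0): b_origin = sum(x*y) / sum(x^2).
b_origin = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

print(f"with intercept:     a = {a:.2f}, b = {b:.2f}")
print(f"through the origin: b = {b_origin:.2f}")
```

With the intercept the true slope of 1 is recovered; forcing the line through the origin pushes the estimated slope up to about 2.64, because the slope has to absorb the level of y that the intercept would otherwise capture.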
Slide 4: QM1 Week 7: Dummy Variables
- Dr Alexander Moradi
- University of Oxford, Dept. of Economics, GPRG/CSAE
- Email: alexander.moradi_at_economics.ox.ac.uk
Slide 5: 9. Dummy Variables
- Qualitative variables do not have an ordinal scale
- Dummy variables make it possible to incorporate qualitative factors into regression models
- A dummy variable has only 2 values: 0 and 1
- We have to define which event is assigned the value one and which is assigned the value zero
- Examples
  - FEMALE = 1 if female; FEMALE = 0 if male
  - MALE = 1 if male; MALE = 0 if female
  - HISTORIAN = 1 if historian; HISTORIAN = 0 otherwise
  - UK = 1 if UK; UK = 0 if other country
  - SKILLED = 1 if skilled worker; SKILLED = 0 otherwise
- Dummy variables are also called binary variables or zero-one variables
Slide 6: 9.1 Dummy Variables
- Relationship between age and income in South India
[Figure: age-income relationship in South India. Annotations: wage differential; wages increase with the age of employees; independent of age, female employees earn less.]
Slide 7: 9.1 Incorporating Qualitative Factors
- 1. Running two separate regressions, one regression for each gender
  - $\text{WAGE}_M = a_M + b_M\,\text{AGE}_M + e_M$ for men
  - $\text{WAGE}_F = a_F + b_F\,\text{AGE}_F + e_F$ for women
- Advantages
  - Makes differences in the coefficients visible
  - By using confidence intervals we can test for significant differences, e.g. whether $a_M$ and $b_M$ are significantly different from $a_F$ and $b_F$ respectively
- Disadvantages
  - We need to look at $a$ and $b$ jointly: small differences in the slope coefficient $b$ can create large differences in the intercept $a$
  - Reduced number of observations in two separate regressions → higher SE → lower t-values → lower precision of regression coefficients
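The chain from fewer observations to lower t-values can be made explicit with the textbook OLS formulas for a simple regression of WAGE on AGE. These are standard results rather than formulas taken from the slides; $n$, $s$ and $\mathrm{SE}(b)$ are notation introduced here.

```latex
\begin{align*}
s^2 &= \frac{1}{n-2}\sum_{i=1}^{n} e_i^2, &
\mathrm{SE}(b) &= \frac{s}{\sqrt{\sum_{i=1}^{n} \left(\text{AGE}_i - \overline{\text{AGE}}\right)^2}}, &
t &= \frac{b}{\mathrm{SE}(b)}
\end{align*}
```

Splitting the sample by gender leaves fewer terms in the sum under the square root, so $\mathrm{SE}(b)$ tends to rise and $|t|$ tends to fall for a given $b$.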
Slide 8: 9.1 Using a Dummy Variable
- 2. Alternative: only the intercept is different ($a_M \neq a_F$), the slope coefficients are identical for both groups ($b_M = b_F$)
- Dummy variable
  - FEMALE = 1 if female
  - FEMALE = 0 if male
- Regression model: $\text{WAGE} = a + b_1\,\text{FEMALE} + b_2\,\text{AGE} + e$
- Interpretation
  - Wage of a female employee at age 30
    - $\text{WAGE} = a + b_1 \cdot 1 + b_2 \cdot 30 = a + b_1 + b_2 \cdot 30$
  - Wage of a male employee at age 30
    - $\text{WAGE} = a + b_1 \cdot 0 + b_2 \cdot 30 = a + b_2 \cdot 30$
  - $b_2$: effect of an increase in age by 1 year
  - $b_1$: wage differential, i.e. how much more or less a woman earns than a man of the same age
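The interpretation above can be checked numerically. The following is a minimal sketch on hypothetical, noise-free data generated from WAGE = 100 - 20·FEMALE + 2·AGE; the helper `ols` is an illustrative pure-Python solver for the normal equations, not a command used in the course.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                factor = A[r][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


# Hypothetical, noise-free data generated from WAGE = 100 - 20*FEMALE + 2*AGE.
female = [0, 0, 0, 1, 1, 1]
age = [20, 30, 40, 25, 35, 45]
wage = [100 - 20 * f + 2 * g for f, g in zip(female, age)]

X = [[1, f, g] for f, g in zip(female, age)]  # columns: constant, FEMALE, AGE
a, b1, b2 = ols(X, wage)

# Predicted wages at age 30, following the two formulas on the slide.
wage_female_30 = a + b1 * 1 + b2 * 30
wage_male_30 = a + b1 * 0 + b2 * 30
print(f"a = {a:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
print(f"female, age 30: {wage_female_30:.2f}; male, age 30: {wage_male_30:.2f}")
```

The difference between the two predictions equals $b_1$ at every age, which is why the dummy coefficient can be read as the wage differential holding age constant.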
Slide 9: 9.1 Using a Dummy Variable
- The dummy variable acts like a change in the intercept (for female employees): a parallel upward ($b_1 > 0$) or downward ($b_1 < 0$) shift of the regression line/plane
- Advantages
  - Larger sample size: the wage-age pattern of both men and women gives us more confidence
  - The t-value for the coefficient of the dummy variable $b_1$ indicates the significance of the wage differential. If it is significantly negative: holding other characteristics like age constant, female employees earn less
Slide 10: 9.2 Using Dummy Variables for Multiple Categories
- Qualitative characteristics are not limited to two dimensions
- Multiple categories: more than two categories
- Example
  - Society: Lower / Middle / Upper class
  - Industries: Manufacturing / Textiles / Food processing / Chemicals etc.
- If we use one variable and code it with certain values, we would assume an order that is not necessarily true
- Example
  - Variable CLASS with Lower class = 1, Middle class = 2, Upper class = 3
  - This coding implies that middle class members (CLASS = 2) are twice as good as lower class members (CLASS = 1), and that upper class members (CLASS = 3) are thrice as good as lower class members
- If we have g groups or categories, we need to include g-1 dummy variables (including g groups would result in the dummy variable trap, i.e. perfect collinearity with the intercept)
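The g-1 rule and the dummy variable trap can be sketched in a few lines. The function `make_dummies` and the sample values are hypothetical names and data introduced for illustration only.

```python
def make_dummies(values, reference):
    """Return one 0/1 list per category, leaving out the reference category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values]
            for c in categories if c != reference}


# Hypothetical sample of five people.
social_class = ["lower", "middle", "upper", "middle", "lower"]

# g = 3 categories -> g - 1 = 2 dummies; "upper" becomes the reference category.
dummies = make_dummies(social_class, reference="upper")
print(dummies)

# Dummy variable trap: with all g dummies, every row sums to 1,
# which duplicates the constant column that carries the intercept.
all_dummies = make_dummies(social_class, reference=None)
row_sums = [sum(col[i] for col in all_dummies.values())
            for i in range(len(social_class))]
print(row_sums)
```

Because the g dummies always add up to the constant, the regression cannot separate their effects from the intercept; dropping one category removes the redundancy.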
Slide 11: 9.3 Example
- Regression with two dummy variables
  - LOWCLASS = 1 if lower class, 0 otherwise
  - MIDCLASS = 1 if middle class, 0 otherwise
- What is the income of a member of the lower and middle class respectively?
- Regression with an ordinal variable CLASS = 1, 2, 3
Slide 12: 9.3 Reference Category
- The intercept reflects the predicted outcome of y if all dummy variables are equal to 0
  - → The intercept picks up the category for which no dummy variable was included. This category is the reference category
- The coefficients of the dummy variables express the effect compared to the reference category
  - → The interpretation does not change when we choose a different reference category
  - → Regression coefficients of the dummy variables and the intercept adjust accordingly
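How the coefficients adjust can be seen by writing the class example with two different reference categories. This is a sketch assuming a model that contains only the class dummies; UPCLASS and the primed coefficients are notation introduced here.

```latex
\begin{align*}
\text{Reference = upper class:}\quad \hat{y} &= a + b_L\,\text{LOWCLASS} + b_M\,\text{MIDCLASS} \\
\text{Reference = lower class:}\quad \hat{y} &= a' + b'_M\,\text{MIDCLASS} + b'_U\,\text{UPCLASS} \\
a' = a + b_L, \qquad b'_M &= b_M - b_L, \qquad b'_U = -b_L
\end{align*}
```

Both versions give the same predicted value for each class (lower: $a + b_L$, middle: $a + b_M$, upper: $a$); only the category that the coefficients are compared with changes.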
Slide 13: 9.4 Interaction Terms
- In order to test the effect of a combination of characteristics, we take the product of the two dummy variables
  - → This creates a new dummy variable
  - Value of the new dummy variable = 1 if the value was 1 in both of the original dummy variables
  - Value of the new dummy variable = 0 if the value was 0 in at least one of the original dummy variables
  - → The regression coefficient indicates the effect that the combination of characteristics has (as opposed to the isolated effect that is given by the coefficients of each of the two original dummy variables)
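The construction of an interaction term can be sketched as follows. The variable names follow the dummy examples used earlier (FEMALE, SKILLED); the four observations and the coefficient values are hypothetical and serve only to show the arithmetic.

```python
# Hypothetical 0/1 codes for four employees.
female = [0, 0, 1, 1]
skilled = [0, 1, 0, 1]

# The interaction term is the element-wise product of the two dummies:
# 1 only if both dummies are 1, 0 if at least one of them is 0.
female_skilled = [f * s for f, s in zip(female, skilled)]
print(female_skilled)

# Hypothetical coefficients for
# y = a + b1*FEMALE + b2*SKILLED + b3*FEMALE_SKILLED.
a, b1, b2, b3 = 10.0, -2.0, 3.0, 1.0
predicted = [a + b1 * f + b2 * s + b3 * fs
             for f, s, fs in zip(female, skilled, female_skilled)]
print(predicted)
```

Only the last employee (female and skilled) receives the extra term $b_3$ on top of the two isolated effects $b_1$ and $b_2$.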
Slide 14: 9 Exercise: Dummy Variables
- Dataset india.dta
  - Data on income and background characteristics of 261 employees in a South Indian city
  - Estimate the model $\ln(\text{WI}) = a + b_1\,\text{AGE} + b_2\,\text{EDU} + b_3\,\text{FEMALE} + e$. Interpret the results
  - Generate an interaction term: FEMALE_EDU = 1 if female and secondary, FEMALE_EDU = 0 otherwise
  - Add the interaction term to the model in (1). Interpret the regression coefficient of the interaction term
  - What is the expected income of an unskilled, female employee at age 30 years?
  - What is the expected income of a skilled, male employee at age 30 years?
  - What is the expected income of a skilled, female employee at age 30 years?
- Dataset weimar_election.dta
  - Run a regression of the Nazis' percentage of votes on the unemployment rate and the shares of workers, Catholics, and farmers. Add dummy variables for the time of the general election. Interpret the results
  - What could have caused unemployment to become insignificant? Hint: Calculate the mean unemployment and NAZI votes for each of the four elections
  - Is the model specification with dummy variables appropriate?
Slide 15: 9 STATA commands
Slide 16: 9 Homework Exercises Week 7
- Read chapter 10.1 of Feinstein and Thomas (p. 280-291)
- Do the following exercises from Feinstein and Thomas (p. 295-299): 2, 4
- Dataset 1699_RELIEF.DTA
- Commands to be used in 2
  - gen sussex=1 if county==2
  - replace sussex=0 if county!=2
  - Hint: You find Boyer's regression model on p. 472
  - regress relief cottind allotmnt london farmers wealth density childall subsidy grain workhse roundsmn labrate sussex
  - regress relief cottind allotmnt london farmers wealth density childall subsidy grain workhse roundsmn labrate
Slide 17: 9 Homework Exercises Week 7
- Commands to be used in 4
  - generate wage=income/2.6
  - (Alternatively, you can try the menu under Data/ Create or change variables/ create new variable)
  - generate graind=1 if grain>20
  - replace graind=0 if grain<20
  - Hint: You can follow this procedure for coding the LON dummy variables. Alternatively, you can
  - srecode LON=london, min(0) max(100) step(25)
  - (Hint: srecode was introduced in Week 1)
  - replace LON=100 if london>100
  - Hint: To get an idea about the coding of the new london_cat variable, try
  - tab london LON
  - xi i.LON, noomit
  - Hint: Use the data browser to take a look at the newly generated dummy variables
  - regress wage _ILON_0 _ILON_25 _ILON_50 _ILON_75 graind
Slide 18: 8 Homework Exercises Week 7
- The effect of including a dummy variable for a single observation is identical to excluding this observation from the regression. Explain!
- Use the dataset Depression.dta
  - Estimate a multiple linear regression. Use the exchange rate (EXCHANGE), the real wage (REALWAGE) and the discount rate (INTEREST) to explain industrial production in 1935 (1929 = 100). Interpret your results
  - Hint: reg prod exchange realwage interest if year==1935
  - Exclude insignificant variables from the regression model
  - Compare your results with those obtained from simple linear regressions. Report the results in one regression table with a column for each regression and interpret the differences
  - The regression coefficient of the real wage differs in 4b) and 4c). Is multicollinearity the reason for this? Explore the correlations between the explanatory variables. What would you conclude? Does REALWAGE possibly pick up the effect of EXCHANGE?
  - Hint: Check for correlations between the independent variables using the corr command. Do not forget to restrict the sample to year==1935
Slide 19: 8 Homework Exercises Week 7
- Run the following STATA commands. Explain what the commands do. Hint: Use STATA's Help/Stata Command
  - tsset country_id year
  - generate dip=((prod-L1.prod)/L1.prod)*100
  - label variable dip "Annual Growth of Industrial Production (in %)"
  - generate inflation=((prices-L1.prices)/L1.prices)*100
  - replace drealwage=((realwage-L1.realwage)/L1.realwage)*100
  - generate dexchange=((exchange-L1.exchange)/L1.exchange)*100
  - regress dip inflation dexchange drealwage
- Interpret the regression results in 3e) Step 7. Is the specification of the variable DEXCHANGE appropriate to model the effect of devaluation? Can the effect of inflation be interpreted as causal?
- How rigid were nominal wages? Test whether nominal wages adjusted to inflation, i.e. whether nominal wages decrease with deflation. Interpret the results
  - generate dwage=((wage-L1.wage)/L1.wage)*100
  - generate inflation_year_prior=L1.inflation
  - regress dwage inflation_year_prior
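The generate commands on this slide all apply the same growth-rate formula, ((x - L1.x)/L1.x)*100, where L1.x is the value of x one period earlier. A minimal Python sketch of that arithmetic for a single country follows; the function name `growth_rate` and the three index values are hypothetical, and the sketch ignores the panel structure that tsset handles in Stata.

```python
def growth_rate(series):
    """Percentage change on the previous period; the first period has no lag."""
    rates = [None]  # like L1.x in Stata, the first observation is missing
    for previous, current in zip(series, series[1:]):
        rates.append((current - previous) / previous * 100)
    return rates


# Hypothetical index of industrial production for one country, three years.
prod = [100.0, 90.0, 99.0]
dip = growth_rate(prod)
print(dip)
```

A fall from 100 to 90 gives a growth rate of -10 percent and the recovery from 90 to 99 gives +10 percent, so the rate is always measured relative to the previous year's level rather than the base year.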