Final Review Session

Provided by: Art98

1
Final Review Session
  • January 16th, 2009
  • Dainn Wie
  • Arturo Aguilar

2
I. Be sure to know
  • Important topics covered before the midterm to
    review:
  • Interpreting coefficients (multivariate regs)
  • Hypothesis tests (t-stats and F-stats)
  • Dummy variables
  • Interactions (when to use them and how to
    interpret them)
  • Non-linear regressions: polynomials and logs
  • Panel data analysis (fixed effects)
  • HAC SEs
  • Internal and external validity

3
II. Final 2006 Part I
  • Using regression (1) in Table 2
  • Compute the estimated effect on the child's years
    of education of an increase of four years in the
    mother's education.
  • Compute a 95% confidence interval for your
    estimated effect in (a).

4
II. Final 2006 Part I
  • Compute the estimated effect on the child's years
    of education of an increase of four years in the
    mother's education.

The coefficient implies that, ceteris paribus (all
else equal), the child's years of education are
expected to increase by 4 × 0.097 = 0.388 for a
4-year increase in the mother's education.

5
II. Final 2006 Part I
  • Compute a 95% confidence interval
  • The new coefficient: b' = 4 × b
  • Var(b') = Var(4 × b) = 16 × Var(b)
  • SE(b') = sqrt(Var(b')) = sqrt(16 × Var(b)) = 4 × SE(b)
  • SE(b') = 4 × 0.027 = 0.108
  • 95% CI: 0.388 ± (1.96)(0.108) = 0.388 ± 0.212,
    i.e. (0.176, 0.600)
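
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch that uses the coefficient (0.097) and SE (0.027) quoted on the slides; the variable names are illustrative.

```python
# Values quoted on the slides for the mother's education coefficient.
coef = 0.097
se = 0.027
change = 4        # four more years of mother's education
z_crit = 1.96     # two-sided 95% critical value of the standard normal

# Scaling a coefficient by a constant scales its SE by the same constant.
effect = change * coef
effect_se = abs(change) * se

ci_low = effect - z_crit * effect_se
ci_high = effect + z_crit * effect_se

print(f"effect = {effect:.3f}, SE = {effect_se:.3f}")
print(f"95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

The interval excludes zero, so the four-year effect is significant at the 5% level.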

6
II. Final 2006 Part I
  • Consider the relationship between the child's
    years of education and parental BMI, holding
    constant the regressors in Table 2, column (1)
    other than parental BMI.
  • Suggest a reason why this effect might be
    nonlinear.
  • Can you reject the null hypothesis that the effect
    on the child's years of education of parental BMI
    is linear? Explain.

7
II. Final 2006 Part I
  • Can you reject the null hypothesis that the effect
    on the child's years of education of parental BMI
    is linear? Explain.

H0: the coefficients on (Mom BMI)^2, (Dad BMI)^2 and
(Mom BMI) × (Dad BMI) are all equal to 0. We reject
at the 5% level if the p-value of the joint F-test
is less than 0.05. => Here we FAIL TO REJECT.

8
II. Final 2006 Part I
  • Consider the regressions in Table 1.
  • Explain why these regressions can be used to
    examine the proposition that the assignment
    process of adoptees to families was in effect
    random.

9
II. Final 2006 Part I
  • Using regressions (1) and (2), can you reject the
    hypothesis of random assignment? Explain.
  • Using regressions (3) and (4), can you reject the
    hypothesis of random assignment? Explain.

10
II. Final 2006 Part I
  • Explain what your answers to (b) and (c) imply
    about the program. Explain, in real-world,
    concrete terms, how you might reconcile any
    discrepancy between your answers to (b) and (c).
  • Once year-of-adoption dummy variables are
    included, the log parents' income variable
    loses its significance.
  • This might be the case if the log parents' income
    variable was picking up a year effect.

11
II. Final 2006 Part I
  • The standard errors reported in Tables 1 and 2
    are clustered standard errors, clustered at the
    level of the household.
  • Explain specifically what this means, that is,
    what are clustered standard errors, clustered at
    the level of the household? Be precise.
  • Provide a reason why the clustered standard
    errors could be larger than the conventional
    heteroskedasticity-robust standard errors for the
    regressions in Table 2.

12
II. Final 2006 Part I
  • Important facts to keep in mind about clustered
    SEs:
  • Coefficients do not change, since you are not
    adding or subtracting regressors
  • SEs of the coefficients change, and this has an
    impact on t-stats, F-stats, CIs, etc.
  • Clustered SEs allow the errors (ui in the reg)
    to be correlated within a cluster (in this case
    among adoptees in the same hhold), but assume
    they are uncorrelated across clusters

13
II. Final 2006 Part I
  • Using clustered SEs usually makes the SEs
    increase relative to the heteroskedasticity-robust
    SEs. This is the case when the correlation between
    the errors within the cluster is positive.
  • In (b) this is the case if adoptees living in the
    same household both get less attention because of
    the presence of non-adoptee children, or because
    the household received a negative shock (illness
    of a parent, for example)
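
To see why positive within-household correlation inflates the SEs, here is a minimal simulation sketch. The data are made up (not the exam's), the regression has a single household-level regressor, and no finite-sample corrections are applied, so the numbers only illustrate the direction of the effect.

```python
import random

random.seed(0)

# Simulated data (NOT from the exam): 300 households, 3 adoptees each.
# x varies only across households; the error contains a shared household shock.
n_households, n_kids = 300, 3
rows = []  # (household id, x, y)
for h in range(n_households):
    x = random.gauss(0, 1)
    shock = random.gauss(0, 1)            # common to everyone in the household
    for _ in range(n_kids):
        u = shock + random.gauss(0, 1)    # within-household error correlation 0.5
        rows.append((h, x, 1.0 + 0.5 * x + u))

# OLS of y on x with an intercept.
n = len(rows)
x_bar = sum(r[1] for r in rows) / n
y_bar = sum(r[2] for r in rows) / n
sxx = sum((r[1] - x_bar) ** 2 for r in rows)
slope = sum((r[1] - x_bar) * (r[2] - y_bar) for r in rows) / sxx
intercept = y_bar - slope * x_bar

# Scores (x_i - x_bar) * residual_i, stored with the household id.
scores = [(h, (x - x_bar) * (y - intercept - slope * x)) for h, x, y in rows]

# Heteroskedasticity-robust variance: square each score, then sum.
var_robust = sum(s ** 2 for _, s in scores) / sxx ** 2

# Clustered variance: sum the scores within each household, then square.
by_household = {}
for h, s in scores:
    by_household[h] = by_household.get(h, 0.0) + s
var_cluster = sum(t ** 2 for t in by_household.values()) / sxx ** 2

se_robust = var_robust ** 0.5
se_cluster = var_cluster ** 0.5
print(f"slope = {slope:.3f}")
print(f"robust SE = {se_robust:.4f}, clustered SE = {se_cluster:.4f}")
```

The slope is identical under both formulas; only the SE changes, and it grows because the cross-products of errors within a household are positive.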

14
III. Final 2006 Part II
  • Consider a female adoptee whose med is 14, fed is
    16, parents' income is 50,000, mBMI is 23, fBMI
    is 24, and whose mom and dad do not drink. Also,
    the child was adopted in the initial program year.
  • Using reg (2) in Tbl 3, compute the predicted
    probability that the adoptee grows up to be a
    drinker.
  • Using the coefficients:
  • E(z|X) = -1.3 + (0.013)(14) + (0.022)(16)
    + (0.079)(log 50,000) + (0)(23) + (0)(24)
    + (0.374)(0) + (0.211)(0) + (0.203)(0) = 0.089

15
III. Final 2006 Part II
  • Probit uses the standard normal cdf. What I am
    predicting is the value of z.
  • So, to find the predicted probability I need to
    find the cdf at z = 0.089 in the standard normal
    tables.

Pr(Z < 0.089) = 0.535

16
III. Final 2006 Part II
  • What is the difference in the predicted
    probabilities of drinking for the adoptee in (a),
    compared with an adoptee whose parents have the
    same characteristics as those in (a) except that
    the mother drinks?
  • The predicted z will be the same, except here we
    add the 0.374 coefficient of Mother Drinks.
  • z2 = 0.089 + 0.374 = 0.463
  • Pr(Z < 0.463) = 0.678
  • ΔProb = 0.678 − 0.535 = 0.143
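
The two probit predictions can be reproduced without normal tables. The sketch below uses only the coefficients quoted on the slides and assumes "log" means the natural log (which is what reproduces z = 0.089); the terms multiplied by zero are omitted.

```python
from math import erf, log, sqrt


def normal_cdf(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Probit index for the adoptee in (a).
z_base = -1.3 + 0.013 * 14 + 0.022 * 16 + 0.079 * log(50_000)

# Same adoptee, except that the mother drinks (coefficient 0.374).
z_mother_drinks = z_base + 0.374

p_base = normal_cdf(z_base)
p_mother_drinks = normal_cdf(z_mother_drinks)
delta_prob = p_mother_drinks - p_base

print(f"z = {z_base:.3f}, Pr = {p_base:.3f}")
print(f"z2 = {z_mother_drinks:.3f}, Pr = {p_mother_drinks:.3f}")
print(f"change in probability = {delta_prob:.3f}")
```

The change in probability depends on where the index starts, which is why the probit coefficient 0.374 cannot be read directly as a probability change.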

17
III. Final 2006 Part II
  • Remember you cannot directly interpret the
    coefficient in the probit regression, but you can
    interpret its sign.
  • In this case the Mother Drinks coefficient is
    positive, which tells you z will increase. Since z
    increases, the predicted probability will also
    increase

18
III. Final 2006 Part II
  • In the probit you also care whether the
    coefficient is significant, since this tells
    you whether the increase (or decrease) in
    probability is significant.
  • The logit works the same way; the only
    difference is that it uses another cdf: the
    cumulative standard logistic distribution fn.

19
III. Final 2006 Part II
  • Now use the LPM from Table 2 to estimate the
    change in predicted probabilities for the
    comparison in 1(b).
  • In the LPM case the coefficients can be directly
    interpreted as the change in the probability of Y
    for a one-unit increase in X.
  • Hence, the answer is simply the Mother Drinks
    coefficient: 0.135

20
III. Final 2006 Part II
  • Using the results in Tables 2 and 3, do you agree
    or disagree with the following statements?
    Explain.
  • Tables 2 and 3 show that high parental BMI and
    low parental education both are associated with
    worse outcomes for adoptees.
  • Since the assignment of children to parents
    seems to be as good as random, the statement
    seems to be true: parents' BMI and education
    are significant predictors of adoptees'
    education.

21
III. Final 2006 Part II
  • The results show that dieting by overweight
    mothers has benefits for children. Ceteris
    paribus, we would expect to see this weight loss
    lead to an increase in the child's education and
    probability of graduating from college.
  • Watch out with the wording!!!
  • In (a) we said yes to "high parental BMI is
    associated with worse outcomes for adoptees", but
    here we are saying lowering parent BMI will CAUSE
    an improvement in outcomes for adoptees.
  • It is not clear there is causality: think of
    OVB. For example, the driver might not be BMI but
    the family's intake of iron

22
III. Final 2006 Part II
  • The results in Table 3 show that paternal
    characteristics are transmitted primarily through
    a genetic path, whereas maternal characteristics
    seem to be transmitted primarily through a
    non-genetic (environmental) path.
  • Solution key says yes, since the father's
    coefficients turn out to be significant when using
    non-adoptees, while the mother's coefficients
    change very little from adoptees to non-adoptees.
  • However, you could argue that it is not the
    father's genetic inheritance, but rather a
    distinction in how fathers treat adoptees vs
    non-adoptees, while mothers don't make much of a
    distinction.

23
III. Final 2006 Part II
  • Aside question
  • How would you need to modify this regression to
    see if Dad and Mom Drinking differently affects
    Male and Female adoptees? What would you test?
  • Answer: You would need to add two interaction
    terms: Male × Dad Drinks and Male × Mom Drinks
  • Then, you would carry out a joint F-test on the
    coefficients of these two interactions

24
IV. Final 2006 Part III
  • Suggest a reason why TV exposure might be
    endogenous in regression (1).
  • One possibility: a kid who is lazy tends to do
    less sport and sit more to watch TV. Also, since
    he does less sport he has a higher BMI.
  • Here TV exposure is not the problem; it's laziness,
    which is in the error term (since we don't
    include it in the regression)

25
IV. Final 2006 Part III
  • Regression (3) uses three variables as
    instrumental variables for TV exposure. For each
    instrument, explain whether, in your judgment,
    the instrument plausibly is exogenous.
  • The price of TV advertising in the county

26
IV. Final 2006 Part III
  • To see if a variable is a good instrument we
    need to check two conditions:
  • Relevance: cov(Z, X) ≠ 0
  • Exogeneity: cov(Z, u) = 0
  • Remember you need to think about exogeneity in
    two ways:
  • The instrument cannot be correlated with an
    omitted variable in the error term that in turn
    affects Y, since you would only replicate OVB
  • The instrument should not directly affect the
    dependent variable (Y), only through X
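
A small simulation can make the two conditions concrete. This is a sketch on made-up data, not the exam's: "lazy" plays the role of the omitted variable sitting in the error term, and z is a hypothetical instrument that shifts X but is unrelated to laziness. With one instrument, the IV estimate is cov(Z, Y) / cov(Z, X).

```python
import random

random.seed(1)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


n = 20_000
true_effect = 0.5          # assumed causal effect of X on Y in this simulation
z, x, y = [], [], []
for _ in range(n):
    zi = random.gauss(0, 1)        # instrument: relevant and exogenous by design
    lazy = random.gauss(0, 1)      # unobserved, so it ends up in the error term
    xi = 1.0 * zi + 1.0 * lazy + random.gauss(0, 1)           # "TV exposure"
    yi = true_effect * xi + 1.0 * lazy + random.gauss(0, 1)   # "child BMI"
    z.append(zi)
    x.append(xi)
    y.append(yi)

ols = cov(x, y) / cov(x, x)    # biased: X is correlated with the error
iv = cov(z, y) / cov(z, x)     # uses only the variation in X driven by Z

print(f"true effect = {true_effect}, OLS = {ols:.3f}, IV = {iv:.3f}")
```

If z were also allowed to load on "lazy" (violating exogeneity), the IV estimate would inherit the same kind of bias as OLS.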

27
IV. Final 2006 Part III
  • Price of TV advertising in the county
  • Number of households with TV in the county
  • Average annual county temperature
  • Check for relevance and exogeneity, where
  • X = TV exposure (number of fast food ads seen by
    child)
  • Y = Child BMI

28
IV. Final 2006 Part III
  • Consider regression (3).
  • Suppose the instruments in regression (3) are
    weak. If so, what would the consequence be for
    interpreting the results in column (3),
    specifically the coefficient on TV exposure and
    its SE?
  • Having weak instruments causes the 2SLS estimate
    of the TV exposure coefficient to be biased, and
    makes its SE and confidence interval unreliable.
  • Weak instruments mean that the instruments are
    barely relevant, that is, in your first stage
    regression, the coefficients on the instruments
    are not significant enough.

29
IV. Final 2006 Part III
  • Based on the results in Table 4, are the
    instruments weak, strong, or do you need more
    information before you can decide? Explain.
  • The steps are
  • Look at the first stage regression
  • Carry out an F-test on the instruments (if only
    one instrument, the square of the t-stat is the
    F-stat)
  • If F-stat > 10 => strong instruments
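
The rule of thumb can be sketched for the one-instrument case, where F is just the squared t-stat of the first stage. The data below are simulated and the SE is the homoskedasticity-only one, which is an assumption made to keep the example short; the function name is illustrative.

```python
import random

random.seed(2)


def first_stage_f(z, x):
    """F-stat for a single instrument: squared t-stat from regressing x on z."""
    n = len(z)
    z_bar, x_bar = sum(z) / n, sum(x) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    slope = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x)) / szz
    intercept = x_bar - slope * z_bar
    ssr = sum((xi - intercept - slope * zi) ** 2 for zi, xi in zip(z, x))
    se = (ssr / (n - 2) / szz) ** 0.5    # homoskedasticity-only SE (assumption)
    return (slope / se) ** 2


n = 500
z = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]
x_strong = [0.5 * zi + e for zi, e in zip(z, noise)]    # z moves x a lot
x_weak = [0.02 * zi + e for zi, e in zip(z, noise)]     # z barely moves x

f_strong = first_stage_f(z, x_strong)
f_weak = first_stage_f(z, x_weak)
print(f"strong instrument: F = {f_strong:.1f}, passes F > 10: {f_strong > 10}")
print(f"weak instrument:   F = {f_weak:.1f}, passes F > 10: {f_weak > 10}")
```

With several instruments the same check uses the joint F-stat on all instrument coefficients in the first stage.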

30
IV. Final 2006 Part III
  • Consider the J-statistic in column (3).
  • Suppose you were to reject the null hypothesis
    using this J-statistic. What would you conclude?
  • H0: cov(Z, u) = 0
  • => Rejecting the null hypothesis means rejecting
    exogeneity (i.e. at least one of the instruments
    is not exogenous), which is one of the key
    assumptions
  • Note: To be able to test for exogeneity you need
    to have overidentification (i.e. more instruments
    than endogenous variables)

31
IV. Final 2006 Part III
  • To test exogeneity you need to (only possible if
    over-identified):
  • Estimate the residuals of the regression that
    uses the instruments (residuals from reg (3) in
    this case)
  • Regress the estimated residuals on the
    instruments (and the included exogenous regressors)
  • Calculate an F-stat on the coefficients of the
    instruments
  • J-stat = (# instruments) × (F-stat)
  • Look up the p-value of the obtained J-stat in
    the chi-square distribution with degrees of
    freedom equal to
    (# instruments − # endogenous variables).

32
IV. Final 2006 Part III
  • Note that in the previous test you are testing
    the exogeneity of m − k instruments (where m is
    the number of instruments and k the number of
    endogenous variables).

33
IV. Final 2006 Part III
  • Using the J-statistic actually reported in column
    (3), do you reject the null hypothesis at the 5%
    significance level? Explain how you reached this
    conclusion (be precise).
  • J-stat = 0.308
  • Look in the chi-squared table with (3 − 1) = 2 d.o.f.
  • The critical value at 5% is equal to 5.99
  • => 0.308 < 5.99, hence FAIL TO REJECT
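
The table lookup can be replaced by a short calculation. For exactly 2 degrees of freedom the chi-squared distribution has a closed form (its upper tail is exp(−x/2)), which this sketch relies on; the helper names are illustrative and the formulas do not apply to other degrees of freedom.

```python
from math import exp, log

j_stat = 0.308
n_instruments = 3
n_endogenous = 1
dof = n_instruments - n_endogenous    # degrees of overidentification


def chi2_upper_tail_2dof(x):
    """Pr(chi-squared with 2 d.o.f. > x). Valid ONLY for 2 d.o.f."""
    return exp(-x / 2.0)


def chi2_critical_value_2dof(alpha):
    """Critical value c with Pr(chi2_2 > c) = alpha. Valid ONLY for 2 d.o.f."""
    return -2.0 * log(alpha)


assert dof == 2, "the closed forms above only cover the 2 d.o.f. case"

critical_value = chi2_critical_value_2dof(0.05)
p_value = chi2_upper_tail_2dof(j_stat)
reject = j_stat > critical_value

print(f"critical value (5%) = {critical_value:.2f}")
print(f"p-value = {p_value:.3f}")
print("reject exogeneity" if reject else "fail to reject exogeneity")
```

The p-value is far above 0.05, which matches the comparison of 0.308 against 5.99.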

34
IV. Final 2006 Part III
  • A researcher suggests using as instruments a full
    set of county binary variables (county dummy
    variables). What would be the effect of adding a
    full set of county dummy variables to regression
    (2)?
  • Problem: temperature, price of TV advertising,
    and number of hhs with TV are all variables
    measured at the county level
  • Hence, adding county dummy variables would
    generate perfect multicollinearity

35
IV. Final 2006 Part III
  • Perfect multicollinearity is when one of the
    regressors can be expressed as a linear function
    of the other regressors
  • Simplified example:
  • Think you have 3 counties: A has an avg
    temperature of 50, B of 60, and C of 70
  • Temperature could be expressed as a linear
    function of two of the dummies:
  • Temperature = 50 + 10 × Dummy B + 20 × Dummy C
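
The three-county example can be verified directly. The sketch below builds the dummies for the counties on the slide and checks that temperature is reproduced exactly by the constant and two dummies, which is what makes its coefficient impossible to estimate separately.

```python
# County average temperatures from the slide's simplified example.
counties = {"A": 50, "B": 60, "C": 70}

rows = []  # (county, temperature, dummy_b, dummy_c)
for name, temp in counties.items():
    dummy_b = 1 if name == "B" else 0
    dummy_c = 1 if name == "C" else 0
    rows.append((name, temp, dummy_b, dummy_c))

# Temperature = 50 + 10 x Dummy B + 20 x Dummy C, with county A as the base.
temps = [temp for _, temp, _, _ in rows]
reconstructed = [50 + 10 * db + 20 * dc for _, _, db, dc in rows]

for (name, temp, db, dc), fitted in zip(rows, reconstructed):
    print(f"county {name}: temperature = {temp}, from dummies = {fitted}")

perfectly_collinear = temps == reconstructed
print(f"temperature is an exact linear function of the dummies: {perfectly_collinear}")
```

The same holds for any variable that takes a single value per county, such as the price of TV advertising or the number of households with TV.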

36
V. Final 2007 Part II
  • At the start of the semester half the students in
    Ec1010a are randomly selected and told they will
    receive a cash payment of 250 if they attend at
    least 90% of the lectures.
  • The objective is to estimate the causal effect of
    attendance on grades.
  • Regression of interest:
  • Grade = b0 + b1 Attendance + b2 Female
    + b3 Prior GPA + b4 SAT + b5 HS quality
    + Dummies for concentration

37
V. Final 2007 Part II
  • Provide a reason why the coefficient on
    Attendance could be biased
  • Need to make up an OVB story
  • Example: having a job
  • Having a job is likely to be negatively
    correlated with attendance
  • If we added "having a job" to the grades
    regression we would probably get a negative
    coefficient
  • => Omitting it biases the attendance coefficient
    upward (positive bias)
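
The sign of the bias can be checked with a simulation. Everything below is hypothetical (the effect sizes, the share of students with a job, the units); the only features that matter are the two signs from the slide: a job lowers attendance and lowers grades.

```python
import random

random.seed(3)

n = 20_000
true_effect = 1.0      # assumed causal effect of attendance on grade
attendance, grade = [], []
for _ in range(n):
    job = 1 if random.random() < 0.4 else 0
    # Having a job lowers attendance (negative correlation with X) ...
    att = 0.9 - 0.3 * job + random.gauss(0, 0.1)
    # ... and lowers grades directly (negative coefficient in the long regression).
    g = true_effect * att - 0.5 * job + random.gauss(0, 0.5)
    attendance.append(att)
    grade.append(g)

# Short regression: grade on attendance only, with "job" left in the error term.
a_bar = sum(attendance) / n
g_bar = sum(grade) / n
saa = sum((a - a_bar) ** 2 for a in attendance)
short_slope = sum((a - a_bar) * (g - g_bar) for a, g in zip(attendance, grade)) / saa

print(f"true effect = {true_effect}, short-regression estimate = {short_slope:.2f}")
```

Negative times negative gives a positive bias, so the short regression overstates the payoff to attendance.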

38
V. Final 2007 Part II
  • Let Cash denote a binary variable that equals 1
    if the student is chosen to be eligible for a
    cash enticement and equals 0 otherwise. Is Cash
    plausibly a valid instrument for Attendance in
    the 2SLS regression of Grade on Attendance and W?
    Explain.
  • Relevant? Yes, I will be more likely to attend if
    I get money for it
  • Exogenous? Since it was randomly assigned, it
    should not be correlated with the error term (or
    with any of the other regressors)
  • Watch out for Placebo effects here!! (John Henry)

39
V. Final 2007 Part II
  • Are the control variables W needed or useful in
    this regression? If not, should they be dropped
    from this regression? Explain.
  • The coefficient on attendance will be consistent
    either way (whether you include W or not)
  • However, including W will give you a better fit
    in the regression and reduce the SE of the
    attendance coefficient

40
VI. Final thoughts
  • Two more things:
  • A piece missing here is panel data and fixed
    effects
  • Advice: take a look at 2005 Final Part II
  • GOOD LUCK ON THE EXAM!