Introduction to Statistics: Political Science (Class 5) - PowerPoint PPT Presentation

Provided by: DavidD274

Transcript and Presenter's Notes

1
Introduction to Statistics: Political Science
(Class 5)
  • Non-Linear Relationships

2
Thus far
  • Focus on examining and controlling for linear
    relationships
  • Each one unit increase in an IV is associated
    with the same expected change in the DV
  • Ordinary-least-squares regression can only
    estimate linear relationships
  • But we can trick regression into estimating
    non-linear relationships by transforming our
    independent (and/or dependent) variables

3
When to transform an IV
  • Theoretical expectation
  • Look at the data (sometimes tricky in
    multivariate analysis or when you have thousands
    of cases)
  • Today: three types of transformations
  • Logarithm
  • Squared terms
  • Converting to indicator variables

4
Logarithm
  • The power to which a base must be raised to
    produce a given value
  • We'll focus on natural logarithms, where ln(x) is
    the power to which e (≈ 2.718281) must be raised to
    get x
  • ln(4) ≈ 1.386 because e^1.386 ≈ 4

5
1 → 5 in original measure = 1.609 change in logged value
5 → 10 in original measure = .693 change in logged value
10 → 15 in original measure = .405 change in logged value
15 → 20 in original measure = .288 change in logged value
So the effect of a 1-unit change in x depends on
whether the change is from 1 to 2 or 2 to 3
y = β0 + β1 ln(x) + u
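The changes in logged value listed above can be checked with a few lines of Python. This is only a sketch using the standard library; the function name log_change is made up for illustration.

```python
import math

# Steps in the original measure, taken from the slide.
steps = [(1, 5), (5, 10), (10, 15), (15, 20)]


def log_change(start, end):
    """Change in the natural log when x moves from start to end."""
    return math.log(end) - math.log(start)


changes = [log_change(a, b) for a, b in steps]

for (a, b), delta in zip(steps, changes):
    print(f"{a:>2} -> {b:>2}: change in ln(x) = {delta:.3f}")
```

Each step adds 4 or 5 units in the original measure, but the change in ln(x) keeps shrinking, which is the diminishing-returns pattern a logged IV captures.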
6
When to log an IV
  • Diminishing returns as X gets large
  • Data are skewed (e.g., income)

7
Income and home value
  • $60,000/year → $200,000 home
  • $120,000/year → $400,000 home
  • Bill Gates makes about $175 million/year
  • $175,000,000 ≈ 2,917 × $60,000
  • Should we expect him to have a 2,917 × $200,000
    ($583,400,000) home?
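
The arithmetic behind this example can be sketched in Python. Dollar units are an assumption (the transcript shows bare numbers), and counting income doublings is just one illustrative way to express the gap on a log scale.

```python
import math

# Figures from the slide; dollar units are assumed.
BASE_INCOME = 60_000
BASE_HOME = 200_000
GATES_INCOME = 175_000_000

# Linear logic: home value scales one-for-one with income.
income_ratio = GATES_INCOME / BASE_INCOME
linear_home_value = income_ratio * BASE_HOME

# Log logic: how many doublings of income separate the two households?
doublings = math.log(income_ratio) / math.log(2)

print(f"Income ratio: {income_ratio:,.0f}x")
print(f"Linear extrapolation of home value: ${linear_home_value:,.0f}")
print(f"Income doublings between the two: {doublings:.1f}")
```

The slide rounds the ratio to 2,917, which is why it shows 583,400,000 rather than about 583,333,333. On a log scale the gap is only about 11.5 doublings of income, a far less extreme distance than a ratio of roughly 2,917.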

10
TVs and Infant Mortality
  • TVs as proxy for resources or wealth
  • Biggest differences at the low end?
  • E.g., there are a couple of TVs in town and
    some people have TVs in their private homes

13
                               Coef.        SE         T         P
TVs per capita              -156.436    12.934   -12.100     0.000
Constant                      74.810     3.419    21.880     0.000
R-squared                      0.566

                               Coef.        SE         T         P
TVs per capita (logged)      -24.656     1.397   -17.640     0.000
Constant                     -11.151     3.346    -3.330     0.001
R-squared                      0.748
16
Getting Predicted Values
                               Coef.        SE         T         P
TVs per capita (logged)      -24.656     1.397   -17.640     0.000
Constant                     -11.151     3.346    -3.330     0.001

TVs per capita    Logged   Predicted value
           0.1    -2.303            45.621
           0.2    -1.609            28.531
           0.3    -1.204            18.534
           0.4    -0.916            11.441
           0.5    -0.693             5.939
           0.6    -0.511             1.444
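
The predicted values in this table follow from plugging ln(TVs per capita) into the logged model. A minimal sketch, assuming the rounded coefficients printed on the slide (the function name predicted_value is made up here):

```python
import math

# Coefficients from the logged model on the slide.
B_LOG_TVS = -24.656
CONSTANT = -11.151


def predicted_value(tvs_per_capita):
    """Predicted DV from the model: constant + b * ln(TVs per capita)."""
    if tvs_per_capita <= 0:
        raise ValueError("ln(x) is only defined for x > 0")
    return CONSTANT + B_LOG_TVS * math.log(tvs_per_capita)


for tvs in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6):
    print(f"{tvs:.1f}  ln = {math.log(tvs):7.3f}  predicted = {predicted_value(tvs):7.3f}")
```

Going from 0.1 to 0.2 TVs per capita moves the prediction by about 17 points, while going from 0.5 to 0.6 moves it by about 4.5: equal steps in x, shrinking steps in the prediction.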
18
Quadratic (squared) models
  • Curved like logarithm
  • Key difference: quadratics allow for a U-shaped
    relationship
  • Enter original variable and squared term
  • Allows for a direct test of whether allowing the
    line to curve significantly improves the
    predictive power of the model

20
Age and Political Ideology
                               Coef.        SE         T         P
Age                           -0.007     0.004    -1.740     0.082
Constant                       0.122     0.209     0.580     0.561

What would we conclude from this analysis?

                               Coef.        SE         T         P
Age                           -0.065     0.025    -2.630     0.009
Age-squared                    0.001     0.000     2.390     0.017
Constant                       1.554     0.635     2.450     0.015
21
Age and Political Ideology
                               Coef.        SE         T         P
Age                           -0.065     0.025    -2.630     0.009
Age-squared                    0.001     0.000     2.390     0.017
Constant                       1.554     0.635     2.450     0.015

  Age   Age²   -0.065 × Age   0.0005574 × Age²   Constant   Predicted value
   18    324         -1.178              0.181      1.554             0.557
   28    784         -1.832              0.437      1.554             0.159
   38   1444         -2.487              0.805      1.554            -0.128
   48   2304         -3.141              1.284      1.554            -0.303
   58   3364         -3.795              1.875      1.554            -0.366
   68   4624         -4.450              2.577      1.554            -0.319
   78   6084         -5.104              3.391      1.554            -0.159
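
The predicted values above, and the age at which the curve turns, can be reproduced with a short Python sketch. One assumption to flag: the slide prints the age coefficient rounded to -0.065, so the less-rounded value used here is recovered from the table's last row (-5.104 at age 78).

```python
# Coefficients from the slide. B_AGE is an assumption: it is backed out
# of the table's last row because the printed -0.065 is rounded.
B_AGE = -5.104 / 78
B_AGE_SQ = 0.0005574
CONSTANT = 1.554


def predicted_ideology(age):
    """Predicted DV from constant + b1*age + b2*age^2."""
    return CONSTANT + B_AGE * age + B_AGE_SQ * age ** 2


# A quadratic a + b*x + c*x^2 turns where its slope b + 2*c*x equals zero.
turning_point = -B_AGE / (2 * B_AGE_SQ)

for age in range(18, 79, 10):
    print(f"age {age}: predicted = {predicted_ideology(age):.3f}")
print(f"curve bottoms out near age {turning_point:.1f}")
```

The turning point lands a little under age 59, matching the table, where predicted values fall through age 58 and rise again afterwards.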
23
Age and Political Ideology
                               Coef.        SE         T         P
Age                           -0.065     0.025    -2.630     0.009
Age-squared                    0.001     0.000     2.390     0.017
Constant                       1.554     0.635     2.450     0.015
  • Note: We are using two variables to measure the
    relationship between age and ideology.
  • Interpretation:
  • Statistically significant relationship between
    age and ideology (can confirm with an F-test)
  • Squared term significantly contributes to the
    predictive power of the model.

24
If you add a linear and squared term (e.g., age
and age²) to a model and neither is independently
statistically significant:
  • This does not necessarily mean that age is not
    significantly related to the outcome. Why?
  • What we want to know is whether age and age²
    jointly improve the predictive power of the
    model. How can we test this?

25
Formula
F = [(SSRr - SSRur)/q] / [SSRur/(n - (k + 1))]
  • q = number of variables being tested
  • n = number of cases
  • k = number of IVs in the unrestricted model

Check whether the value is above the critical value in
the F-distribution, which depends on degrees of
freedom. Numerator: number of IVs being tested.
Denominator: N - (number of IVs) - 1
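
A minimal Python sketch of the formula, where SSRr and SSRur are the sums of squared residuals from the restricted and unrestricted models. The function name and the input numbers are hypothetical, chosen only to show how the statistic moves with the number of cases.

```python
def f_statistic(ssr_restricted, ssr_unrestricted, q, n, k):
    """F = [(SSRr - SSRur) / q] / [SSRur / (n - (k + 1))]."""
    if q <= 0:
        raise ValueError("q must be a positive number of tested variables")
    df_denominator = n - (k + 1)
    if df_denominator <= 0:
        raise ValueError("need more cases than estimated coefficients")
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / df_denominator
    return numerator / denominator


# Hypothetical inputs: adding 2 variables cuts SSR from 120 to 100.
many_cases = f_statistic(120.0, 100.0, q=2, n=103, k=2)
few_cases = f_statistic(120.0, 100.0, q=2, n=13, k=2)

print(f"F with n = 103: {many_cases:.1f}")
print(f"F with n = 13:  {few_cases:.1f}")
```

The same reduction in SSR gives a much larger F with 103 cases than with 13. The result still has to be compared against the critical value for (q, n - k - 1) degrees of freedom, which this sketch does not look up.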
26
Don't worry about the F-test formula
  • The point is:
  • F-tests are a way to test whether adding a set of
    variables reduces the sum of squared residuals
    enough to justify throwing these new variables
    into the model
  • Depends on:
  • How much the sum of squared residuals is reduced
  • How many variables we're adding
  • How many cases we have to work with
  • More acceptable to add variables if you have a
    lot of cases
  • Intuition: explaining 10 cases with 10 variables
    vs. explaining 1000 cases with 10 variables?

27
TVs and Infant Mortality
  • Squared term or logarithm?

                               Coef.        SE         T         P
TVs per capita              -380.088    29.949   -12.690     0.000
TVs per capita (squared)     410.957    51.629     7.960     0.000
Constant                      90.197     3.353    26.900     0.000
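
One way to see what the squared-term model implies is to compute where its curve turns. This sketch uses the rounded coefficients from the table above; the function name is made up for illustration.

```python
# Coefficients from the squared-term model on the slide.
CONSTANT = 90.197
B_TVS = -380.088
B_TVS_SQ = 410.957


def quadratic_prediction(tvs_per_capita):
    """Predicted DV from constant + b1*x + b2*x^2."""
    return CONSTANT + B_TVS * tvs_per_capita + B_TVS_SQ * tvs_per_capita ** 2


# The parabola turns where its slope b1 + 2*b2*x equals zero.
turning_point = -B_TVS / (2 * B_TVS_SQ)

for tvs in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6):
    print(f"{tvs:.1f} TVs per capita: predicted = {quadratic_prediction(tvs):.3f}")
print(f"turning point at about {turning_point:.3f} TVs per capita")
```

Past roughly 0.46 TVs per capita the quadratic predicts that the outcome rises again, whereas a logged term keeps flattening without turning. Whether that upturn is plausible is a theoretical question rather than something the fit can settle.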
28
Which is better?
  • Two basic ways to decide:
  • Theory
  • Which yields a better fit?

29
Run two models and compare R-squared, or
possibly:

                               Coef.        SE         T         P
TVs per capita               -30.288    74.056    -0.410     0.683
TVs per capita (squared)      63.413    81.652     0.780     0.439
TVs per capita (logged)      -24.635     5.155    -4.780     0.000
Constant                      -9.465    20.417    -0.460     0.644

What might we conclude from these model estimates?
Probably should also do an F-test of joint
significance of TVs per capita and TVs per
capita-squared. Why?
That F-test returned a significance level of
0.335. So we can conclude that ...
Ultimately you're best off relying on theory
about the shape of the relationship
30
Ordered IVs → Indicators
  • Sometimes we have reason to expect the
    relationship between an IV and outcome to be more
    complex
  • Can address this using more polynomials (e.g.,
    variable³, variable⁴, etc.)
  • We won't go there. Instead:
  • Example: party identification and evaluations of
    candidates and issues

31
Standard branching PID Items
  • "Generally speaking, do you usually think of
    yourself as a Republican, a Democrat, an
    Independent, or something else?"
  • If Republican or Democrat, ask: "Would you call
    yourself a strong (Republican/Democrat) or a not
    very strong (Republican/Democrat)?"
  • If Independent or something else, ask: "Do you
    think of yourself as closer to the Republican or
    Democratic party?"

32
Party Identification Measure
People who say Democrat or Republican in
response to first question
Strong Republican   -3
Weak Republican     -2
Lean Republican     -1
Independent          0
Lean Democrat        1
Weak Democrat        2
Strong Democrat      3
Question: Is the change from -2 to -1 (or 1 to 2)
the same as the change from 0 to 1 or 2 to 3?
33
Create Indicators
Party Identification (-3 to 3)
Seven variables:
Strong Republican (1 = yes)
Weak Republican (1 = yes)
Lean Republican (1 = yes)
Pure Independent (1 = yes)
Lean Democrat (1 = yes)
Weak Democrat (1 = yes)
Strong Democrat (1 = yes)
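
A rough sketch of how the seven indicators could be built from the -3 to 3 scale in plain Python. The names PID_LABELS and make_indicators are hypothetical, and real survey data would more likely be recoded in a statistics package.

```python
# Labels for the seven-point scale, keyed by the -3..3 code.
PID_LABELS = {
    -3: "Strong Republican",
    -2: "Weak Republican",
    -1: "Lean Republican",
    0: "Pure Independent",
    1: "Lean Democrat",
    2: "Weak Democrat",
    3: "Strong Democrat",
}


def make_indicators(pid, excluded=None):
    """Return {label: 0 or 1} for one respondent.

    Pass excluded="..." to drop the reference category, as is done
    before the indicators go into a regression.
    """
    if pid not in PID_LABELS:
        raise ValueError(f"party ID code must be in -3..3, got {pid!r}")
    return {
        label: int(code == pid)
        for code, label in PID_LABELS.items()
        if label != excluded
    }


print(make_indicators(-3))
print(make_indicators(0, excluded="Pure Independent"))
```

Only six of the seven indicators enter a regression at once. A respondent in the excluded category scores 0 on all six, so the constant carries that group's predicted value.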
34
Predict Obama Favorability (1-4)
                               Coef.        SE         T         P
Strong Republican             -1.632     0.161   -10.160     0.000
Weak Republican               -0.707     0.198    -3.580     0.000
Lean Republican               -1.235     0.181    -6.810     0.000
Lean Democrat                  0.674     0.197     3.430     0.001
Weak Democrat                  0.494     0.187     2.640     0.009
Strong Democrat                0.595     0.159     3.750     0.000
Constant                       2.940     0.134    21.870     0.000
Excluded category: Pure Independents
35
Obama Favorability
36
Predict Obama Favorability (1-4)
                               Coef.        SE         T         P
Strong Republican             -0.397     0.150    -2.650     0.008
Weak Republican                0.528     0.189     2.790     0.006
Pure Independent               1.235     0.181     6.810     0.000
Lean Democrat                  1.909     0.188    10.150     0.000
Weak Democrat                  1.729     0.179     9.680     0.000
Strong Democrat                1.831     0.148    12.360     0.000
Constant                       1.705     0.122    14.010     0.000
New excluded category: Leaning Republicans
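
Changing the excluded category shifts every coefficient by the same amount and leaves each group's predicted value unchanged. The sketch below checks this using the coefficients from the model that excludes Pure Independents; the function names are made up, and the logic assumes a model containing only the party indicators.

```python
# Coefficients from the model with Pure Independents excluded.
CONSTANT = 2.940
COEFS = {
    "Strong Republican": -1.632,
    "Weak Republican": -0.707,
    "Lean Republican": -1.235,
    "Lean Democrat": 0.674,
    "Weak Democrat": 0.494,
    "Strong Democrat": 0.595,
}


def group_predictions(constant, coefs, excluded):
    """Predicted value per group: constant + coefficient (0 if excluded)."""
    predictions = {group: constant + b for group, b in coefs.items()}
    predictions[excluded] = constant
    return predictions


def rebase(constant, coefs, old_excluded, new_excluded):
    """Re-express the same predictions relative to a new excluded group."""
    predictions = group_predictions(constant, coefs, old_excluded)
    new_constant = predictions[new_excluded]
    new_coefs = {
        group: value - new_constant
        for group, value in predictions.items()
        if group != new_excluded
    }
    return new_constant, new_coefs


new_constant, new_coefs = rebase(
    CONSTANT, COEFS, "Pure Independent", "Lean Republican"
)
print(f"Constant: {new_constant:.3f}")
for group, b in new_coefs.items():
    print(f"{group}: {b:.3f}")
```

The output agrees with the table above to within rounding (for example, Strong Democrat comes out as 1.830 rather than 1.831). Only the comparison group changes, so each coefficient now reads as a difference from Leaning Republicans.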
37
DV: Obama Favorability
                               Coef.        SE         T         P
Strong Republican             -1.652     0.161   -10.290     0.000
Weak Republican               -0.704     0.197    -3.580     0.000
Lean Republican               -1.229     0.181    -6.790     0.000
Lean Democrat                  0.654     0.195     3.340     0.001
Weak Democrat                  0.457     0.187     2.440     0.015
Strong Democrat                0.579     0.158     3.650     0.000
Gender (female = 1)            0.072     0.087     0.830     0.405
Age                           -0.041     0.019    -2.140     0.033
Age²                           0.044     0.018     2.430     0.015
Constant                       3.784     0.509     7.430     0.000
Predicted value for a Pure Independent male, age 20?
Remember! Always interpret these coefficients as
the estimated relationships holding other
variables in the model constant (or controlling
for the other variables)
38
Notes and Next Time
  • Homework due next Thursday (11/18)
  • Next homework handed out next Tuesday
  • Not due until Tuesday after Fall Break
  • Next time:
  • Dealing with situations where you expect the
    relationship between an IV and a DV to depend on
    the value of another IV