Title: Correlations
1Correlations
- Renan Levine
- POL 242
- July 12, 2006
2Association
3Today: Correlations
- Correlation is a measure of a relationship
between variables. It is measured with a coefficient,
Pearson's r, that ranges from -1 to 1.
- Measures the strength of the relationship between
interval or ratio variables.
- $r = \frac{\sum (Z_x Z_y)}{n - 1}$
- $Z_x$ = Z scores for the X variable and $Z_y$ = Z scores
for the Y variable. Sum the products and divide by the
number of paired cases minus one.
- How to calculate Z scores can be found online.
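The formula on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Webstats' own routine: the function names and the data are made up, and it assumes Z scores are computed with the sample standard deviation (dividing by n - 1), which is what makes the n - 1 in the formula give a value between -1 and 1.

```python
import math

def z_scores(values):
    """Z scores using the sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pearson_r(x, y):
    """Pearson's r: sum the products of paired Z scores, divide by n - 1."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of cases")
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

# Hypothetical paired cases, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```

With these made-up cases the coefficient is positive: higher X values tend to go with higher Y values.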
4Correlation r
- Absolute values closer to 0 indicate that there
is little or no linear relationship.
- Generally, 0.2-0.4 is weak, 0.4-0.6 is okay, and 0.6
or higher is strong.
- If the correlation is very high, then the variables are
probably closely related, and you might consider
indexing them or choosing just one variable.
- The closer the absolute value of the coefficient is
to 1, the stronger the relationship between
the variables being correlated.
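As a rough illustration, the rule of thumb above can be written as a small helper. The function name is hypothetical, and two details are assumptions rather than anything stated on the slide: values below 0.2 are labelled as little or no linear relationship, and each cutoff belongs to the higher category.

```python
def describe_strength(r):
    """Label a correlation using the slide's general rule of thumb."""
    size = abs(r)  # strength depends on the absolute value, not the sign
    if size >= 0.6:
        return "strong"
    if size >= 0.4:
        return "okay"
    if size >= 0.2:
        return "weak"
    # Assumption: anything under 0.2 is treated as close to 0
    return "little or no linear relationship"

for value in (0.84, -0.68, 0.29, 0.05):
    print(value, describe_strength(value))
```

The sign is ignored on purpose: r = -0.68 is as strong as r = 0.68, only in the opposite direction.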
5Positive Relationship
- If two variables are related positively or
directly, r > 0.
- Variables track together: high values on
Variable X are associated with high values on
Variable Y.
- Low values on X are associated with low values on Y.
6Example
Robert D. Putnam, Robert Leonardi, Raffaella Y.
Nanetti, and Franco Pavoncello. "Explaining
Institutional Success: The Case of Italian
Regional Government." The American Political
Science Review 77:1 (Mar. 1983), pp. 55-74.
More fun examples: http://www.nationmaster.com/correlations/eco_gdp-economy-gdp-nominal
7Example II
r = 0.84
8Negative or Inverse Relationship
- Variables can be inversely or negatively related
- High values of X are associated with low values
of Y.
9Example Negative / Inverse
r = -0.68
Time/SRBI, Oct 3-6, 2008
Red = Republicans, blue = Democrats, grey
diamonds = Independents
10Data
- You need interval-level data.
- You will find many interval-level variables in:
  - Countries / World
  - Provinces
  - Election studies (feeling thermometers, odds of
    party entering government, etc.)
- You can often use the index you created as an
interval-level variable.
11Compare
Lots more noise here. Typical of public opinion
data.
Most points close to a line.
12Differences between Public Opinion and Aggregate
Data
- Although it is not uncommon to have one or more
outliers in aggregate data, public opinion data
tends to be noisy.
- Feeling thermometer example:
  - Many respondents gave both candidates a 50
  - Quite a few respondents liked both candidates
  - Even though most who liked McCain disliked Obama
- A Pearson's r that is high for public opinion data
may be low for an association in aggregate data.
13Guidelines for Public Opinion Data
14Rough Guidelines for Aggregate Data
15Very Strong or Worrisome??
- Public Opinion: above 0.40
- Aggregate: above 0.80
- But these are just guidelines. It depends on how
good the data is:
  - Lots of variation in data
  - Large scale (10, 20, 100 pts, like prediction
    odds, physicians per 100,000 people, feeling
    thermometer scales)
  - Number of observations (N)
  - Provinces dataset is small
16Outstanding or the same?
- You either have an outstanding relationship OR
the variables may be measuring the same idea.
- Ex. unemployment and GDP both measure economic
health.
- Ex. Feeling thermometer for Barack Obama and feeling
thermometer for Joe Biden both measure attitudes
towards the Democratic ticket.
- This also applies to inverse relationships.
- Example above: Obama and McCain feeling
thermometers are different sides of the same coin,
as both seem to measure partisanship.
17Use Yo Brain
- The computer cannot tell you whether it's a good, strong
relationship or two measures looking at the same
thing.
- You need to understand what each variable is
measuring.
- Same thought process as for index creation.
- Use your knowledge of the world and theory to decide
whether two variables measure the same thing or
two different things.
- Example (above): Putnam's relationship between
civic culture and government performance.
- Failed states survey: it appears that the higher
an indicator value, the worse off the country is in
that particular field.
- http://www.fundforpeace.org/web/index.php?option=com_content&task=view&id=99&Itemid=140
18Flip side
- The relationship you expected to be strong is
surprisingly not ?!?!?
- Make certain both variables are interval.
- Double-check that you cleaned up the data.
- Missing values are missing.
- Next week there may be a need to qualify the
relationship, as some sub-group of the data may not be
like the others and will need to be identified.
- Think about the relationship: maybe it's not linear,
so that the relationship is only present for part of
the range.
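The last point can be seen with a small made-up example. Here Y depends entirely on X, but along a curve, so Pearson's r over the full range is 0. Restricting to part of the range brings out a strong linear association. The data and helper name are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from deviations around the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical curved relationship: Y is X squared
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]
r_full = pearson_r(x, y)

# Same data, but only the cases where X is zero or above
upper = [(a, b) for a, b in zip(x, y) if a >= 0]
r_upper = pearson_r([a for a, _ in upper], [b for _, b in upper])

print(round(r_full, 4), round(r_upper, 4))
```

A scatterplot would show the curve at a glance, which is why it is worth looking at the plot before concluding that a low r means no relationship.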
19Usefulness
- Quick, easy way to look at several variables to
see if they are related.
- With a strong association, you can begin to think
about predicting values of Y based on a value of
X.
- Ex. Positive correlation: you know a high value
of X is associated with a high value of Y!
20Webstats Output
- - Correlation Coefficients - -

            Q375A1     Q305       Q375A3     Q1005
Q375A1      1.0000      .2916      .5320     -.3163
           (  686)    (  666)    (  667)    (  672)
           P= .       P= .000    P= .000    P= .000

Q305         .2916     1.0000      .2679     -.1272
           (  666)    ( 2776)    (  660)    ( 2721)
           P= .000    P= .       P= .000    P= .000

Q375A3       .5320      .2679     1.0000     -.2020
           (  667)    (  660)    (  682)    (  666)
           P= .000    P= .000    P= .       P= .000

Q1005       -.3163     -.1272     -.2020     1.0000
           (  672)    ( 2721)    (  666)    ( 3181)
           P= .000    P= .000    P= .000    P= .

In each cell: the top value is the coefficient (Pearson's r), and the value in parentheses is N.
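The N in the output differs from pair to pair, which suggests that each correlation uses only the cases with valid answers on both variables. The sketch below shows that idea with made-up data. The variable names are hypothetical, missing values are coded as None, and treating missing data pair by pair is an assumption about Webstats, not something the output confirms.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson's r computed from deviations around the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_correlations(data):
    """Return {(var_a, var_b): (r, N)}, dropping cases missing on either variable."""
    results = {}
    for a, b in combinations(data, 2):
        pairs = [(u, v) for u, v in zip(data[a], data[b])
                 if u is not None and v is not None]
        xs = [u for u, _ in pairs]
        ys = [v for _, v in pairs]
        results[(a, b)] = (pearson_r(xs, ys), len(pairs))
    return results

# Hypothetical survey responses; None marks a missing value
survey = {
    "therm_a": [80, 50, 20, None, 65, 90],
    "therm_b": [15, 50, 85, 40, None, 10],
    "income":  [3, 2, 1, 2, 4, 5],
}
results = pairwise_correlations(survey)
for (a, b), (r, n) in results.items():
    print(f"{a} x {b}: r = {r:.4f} (N = {n})")
```

If missing values were left coded as real numbers (for example 98 or 99), they would enter the calculation as data, which is why the cleanup step matters.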
21Significance?
- Webstats will tell you whether or not the
correlation coefficient is significant.
- Remember that this is just telling you whether
the relationship may be due to chance.
- It does not tell you the strength of the relationship.
- It is almost unheard of to have a strong relationship
that is insignificant when using survey data.
- So, don't spend any time discussing significance.
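To see why significance says little about strength, here is a sketch using the standard t test for Pearson's r, $t = r\sqrt{(n - 2)/(1 - r^2)}$. Whether Webstats computes its P values this way is an assumption. The coefficient and the larger N are taken from the Q305 x Q1005 cell of the Webstats output slide. The smaller N of 30 is hypothetical.

```python
import math

def t_statistic(r, n):
    """t statistic for testing whether Pearson's r differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Same weak coefficient, two sample sizes
t_large = t_statistic(-0.1272, 2721)  # N from the Webstats output slide
t_small = t_statistic(-0.1272, 30)    # hypothetical small sample

print(round(t_large, 2), round(t_small, 2))
```

As a rough guide, an absolute t above about 2 is significant at the 0.05 level. The weak correlation clears that bar easily with thousands of respondents and fails it with 30, even though the strength of the relationship is the same in both cases.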
22What if non-interval/non-ratio?
- Usually more appropriate to use the other
measures of association.
- Webstats will perform a correlation. Be ready for
results to be less strong.
- Program may report (instead of Pearson's r):
  - Spearman: ordinal x ordinal
  - Point-biserial: one interval/ratio, one
    dichotomous
  - Phi: two dichotomous variables
- All interpreted the same way
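One reason these measures are read the same way is that each can be obtained by applying the Pearson formula to recoded data: ranks for Spearman, 0/1 codes for point-biserial and phi. The sketch below shows the Spearman case with made-up data. The function names are hypothetical, and this is not a description of how Webstats computes it.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from deviations around the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank from 1 (lowest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: Y always rises with X, but not along a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(round(rho, 4), round(r, 4))
```

Because only the ordering matters for ranks, rho is 1 here while Pearson's r falls slightly short of 1.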