Bivariate Data: Covariance, Correlation and Contingency Tables

Provided by: SocialSc2

1
Chapter 5
  • Bivariate Data: Covariance, Correlation and
    Contingency Tables

2
Covariance and Correlation
  • Notice that the two variables in our example are
    crucially related by individuals (aka
    observations)
  • Covariance and correlation apply only when there
    is a set of individuals each of whom has been
    measured twice.
  • It makes no sense to talk of the correlation
    between two different samples that are not
    linked together.
  • Causation versus correlation

3
  • The covariance is defined as the first
    cross-product moment $m_{11}$ for a bivariate data
    set $(x_1, y_1), \ldots, (x_n, y_n)$:
    $m_{11}(x,y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$
  • Just as $m_2(x)$ is called the variance of x,
    $m_{11}(x,y)$ is called the covariance of x and y.
  • Notice, incidentally, that $m_{11}(x,x) = m_2(x)$.

4
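  • The correlation coefficient is then defined as
    $r_{xy} = \frac{m_{11}(x,y)}{\sqrt{m_2(x)\,m_2(y)}} = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \, \sum_{i}(y_i - \bar{y})^2}}$
  • Notice that the factor 1/n has dropped out of
    $r_{xy}$.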

5
  • Importantly, notice that

6
  • Let's look at three simple examples in EViews:
  • Scatterplots
  • Covariance matrices
  • Correlation matrices

7
  • Here is how I want you to calculate the
    covariance, $m_{11}$, for a given data set
  • $(x_1, y_1), \ldots, (x_n, y_n)$
  • Determine the means $\bar{x}$ and $\bar{y}$ of x and y.
  • For each i, determine $(x_i - \bar{x})$ and
    $(y_i - \bar{y})$.
  • For each i, determine the product
    $p_i = (x_i - \bar{x})(y_i - \bar{y})$.
  • Add up all the $p_i$: $p_1 + p_2 + \cdots + p_n = \text{Total}$.
  • Divide by n: $m_{11} = \text{Total}/n$.
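
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `covariance` and the variable names are made up here, and the data are the four points used in the slides' worked example.

```python
def covariance(xs, ys):
    """Population covariance m11, following the slide's steps (divide by n)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    # Step 1: means of x and y.
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Steps 2-3: deviations from the means and their products p_i.
    products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
    # Step 4: add up all the p_i.
    total = sum(products)
    # Step 5: divide by n (not n - 1).
    return total / n


x = [2, 3, 6, 7]
y = [0, 0.2, 0.4, 0.4]
m11 = covariance(x, y)
print(round(m11, 3))  # 0.325
```

Calling `covariance(x, x)` returns the variance of x, which matches the earlier remark that the covariance of a variable with itself is its second moment.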

8
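  • $\bar{x} = \tfrac{1}{4}(2 + 3 + 6 + 7) = \tfrac{1}{4}(18) = 4.5$
  • $\bar{y} = \tfrac{1}{4}(0 + .2 + .4 + .4) = \tfrac{1}{4}(1) = .25$
  • $m_{11} = \tfrac{1}{4}[(2-4.5)(0-.25) + (3-4.5)(.2-.25) + (6-4.5)(.4-.25) + (7-4.5)(.4-.25)]$
    $= \tfrac{1}{4}[(-2.5)(-.25) + (-1.5)(-.05) + (1.5)(.15) + (2.5)(.15)]$
    $= \tfrac{1}{4}[.625 + .075 + .225 + .375]$
    $= \tfrac{1}{4}(1.3) = .325$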
9
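  • Here is how I want you to calculate the
    correlation, r, for a given data set
  • $(x_1, y_1), \ldots, (x_n, y_n)$
  • Determine $m_{11}$.
  • Determine both $\sqrt{m_2(x)}$ and $\sqrt{m_2(y)}$.
  • Multiply: $\sqrt{m_2(x)}\,\sqrt{m_2(y)}$.
  • Divide: $r = m_{11} / \left(\sqrt{m_2(x)}\,\sqrt{m_2(y)}\right)$.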
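
A minimal Python sketch of this recipe follows. The helper names `second_moments` and `correlation` are illustrative choices, not anything from the slides. The standard deviations are taken as square roots of the second moments, dividing by n throughout.

```python
import math


def second_moments(xs, ys):
    """Return (m11, m2(x), m2(y)), all computed with divisor n."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    m11 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    m2_x = sum((x - x_bar) ** 2 for x in xs) / n
    m2_y = sum((y - y_bar) ** 2 for y in ys) / n
    return m11, m2_x, m2_y


def correlation(xs, ys):
    """Correlation r = m11 / (sd_x * sd_y)."""
    m11, m2_x, m2_y = second_moments(xs, ys)
    sd_x = math.sqrt(m2_x)   # standard deviation of x
    sd_y = math.sqrt(m2_y)   # standard deviation of y
    denominator = sd_x * sd_y
    if denominator == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return m11 / denominator


x = [2, 3, 6, 7]
y = [0, 0.2, 0.4, 0.4]
r = correlation(x, y)
print(round(r, 2))  # 0.95
```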

10
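  • $\bar{x} = 4.5$, $\bar{y} = .25$, $m_{11} = .325$
  • $\sqrt{m_2(y)} = [\tfrac{1}{4}((0-.25)^2 + (.2-.25)^2 + (.4-.25)^2 + (.4-.25)^2)]^{1/2}$
    $= [\tfrac{1}{4}((-.25)^2 + (-.05)^2 + (.15)^2 + (.15)^2)]^{1/2}$
    $= [\tfrac{1}{4}(.0625 + .0025 + .0225 + .0225)]^{1/2}$
    $= [\tfrac{1}{4}(.11)]^{1/2} = (.0275)^{1/2} = .1658$
  • $\sqrt{m_2(x)} = 2.06$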
11
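  • $\bar{x} = 4.5$, $\bar{y} = .25$, $m_{11} = .325$,
    $\sqrt{m_2(y)} = .1658$, $\sqrt{m_2(x)} = 2.06$
  • $\sqrt{m_2(x)}\,\sqrt{m_2(y)} = .1658(2.06) = .3415$
  • $r = m_{11} / \left(\sqrt{m_2(x)}\,\sqrt{m_2(y)}\right) = .325/.3415 = .95$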
12
  • Then with these sorts of examples (or homework!)
    we plug the data into EViews and check our
    answers.
  • The correlations come out right.
  • Now check the standard deviations.
  • What's going on?
  • How do we fix this?
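
The slides leave these two questions open. One common cause of this kind of mismatch is that statistical software often reports the sample standard deviation, which divides by n − 1, while the hand calculation divides by n. That is an assumption here, not something the slides state. The sketch below uses Python's standard `statistics` module to show the two versions side by side, and to show that the correlation is the same under either divisor because the divisor cancels. The helper `corr` and its `ddof` parameter are illustrative names.

```python
import math
import statistics

x = [2, 3, 6, 7]
y = [0, 0.2, 0.4, 0.4]
n = len(x)

sd_x_n = statistics.pstdev(x)    # divides by n (the hand calculation)
sd_x_n1 = statistics.stdev(x)    # divides by n - 1 (sample standard deviation)
print(round(sd_x_n, 4), round(sd_x_n1, 4))

# Converting between the two: multiply the n - 1 version by sqrt((n - 1) / n).
print(round(sd_x_n1 * math.sqrt((n - 1) / n), 4))


def corr(xs, ys, ddof):
    """Correlation with divisor (n - ddof) used consistently in every moment."""
    size = len(xs)
    x_bar = sum(xs) / size
    y_bar = sum(ys) / size
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / (size - ddof)
    var_x = sum((a - x_bar) ** 2 for a in xs) / (size - ddof)
    var_y = sum((b - y_bar) ** 2 for b in ys) / (size - ddof)
    return cov / math.sqrt(var_x * var_y)


# The divisor cancels in r, so both conventions give the same correlation.
print(round(corr(x, y, ddof=0), 4), round(corr(x, y, ddof=1), 4))
```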

13
  • Let's take a brief peek at some real data.

14
  • What is the relationship between correlation and
    covariance of x and y?
  • Expressed in terms of moments, the correlation is
    the covariance divided by the product of the
    standard deviations of x and y
  • Expressed in terms of standardized scores, the
    correlation between x and y is the covariance of
    the standardizations of x and y
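
The two descriptions can be checked against each other numerically. The sketch below computes r both ways for the four-point example data. The helper names `cov` and `standardize` are illustrative, and every moment uses divisor n.

```python
import math


def cov(xs, ys):
    """Population covariance m11 (divisor n)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / n


def standardize(values):
    """Convert values to z-scores: subtract the mean, divide by the sd."""
    mean = sum(values) / len(values)
    sd = math.sqrt(cov(values, values))
    return [(v - mean) / sd for v in values]


x = [2, 3, 6, 7]
y = [0, 0.2, 0.4, 0.4]

# In terms of moments: covariance over the product of standard deviations.
r_moments = cov(x, y) / (math.sqrt(cov(x, x)) * math.sqrt(cov(y, y)))

# In terms of standardized scores: covariance of the z-scores.
r_zscores = cov(standardize(x), standardize(y))

print(round(r_moments, 4), round(r_zscores, 4))
```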

15
  • $r_{xy}$ is invariant over linear changes of
    measurement scales
  • I.e., if $x' = a + bx$ and $y' = c + dy$, then
    $r_{x'y'} = r_{xy}$.
  • This also requires $b, d > 0$; if we only know
    $b, d \neq 0$, then we only have
    $|r_{x'y'}| = |r_{xy}|$.
  • $r_{xy}$ is not always invariant over nonlinear
    changes of scale
  • E.g., if $x' = x^2$, then (in all normal
    circumstances) $r_{x'y} \neq r_{xy}$.
  • $m_{11}$ is not invariant over linear changes of
    scale
  • I.e., if $x' = a + bx$ and $y' = c + dy$, then
    $m_{11}(x', y') = bd \, m_{11}(x, y)$.
16
The Correlation Coefficient
  • the standardized variables are invariant to
    changes in origin and scale of the original
    variables. With standardized variables we can be
    sure that any measure of shape that we derive
    will represent the shape of the scatter plot and
    will not be an artifact of the choice of units of
    measurement. (Ramsey, p. 125; cf. p. 135)

17
  • Roughly and intuitively, r measures how closely
    changes in one variable are tracked by changes in
    another variable.
  • More precisely, r measures the degree to which
    the y-data is a linear function of the x-data,
    i.e., the degree to which $y_i = a + b x_i$ for
    some a and b.
  • And vice versa.
  • Moreover, $r^2$ gives the exact proportion of the
    variation in the y-data that can be "captured
    by," "explained by," or "projected onto" the
    variation in the x-data.
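
The last point can be illustrated by fitting the least-squares line and comparing the share of variation it captures with r squared. This is a sketch under the assumption that "best fitting" means ordinary least squares. The helper names `cov` and `fit_line` are made up for the example.

```python
def cov(xs, ys):
    """Population covariance m11 (divisor n)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((p - x_bar) * (q - y_bar) for p, q in zip(xs, ys)) / n


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b * x."""
    b = cov(xs, ys) / cov(xs, xs)
    a = sum(ys) / len(ys) - b * sum(xs) / len(xs)
    return a, b


x = [2, 3, 6, 7]
y = [0, 0.2, 0.4, 0.4]

a, b = fit_line(x, y)
fitted = [a + b * v for v in x]
y_bar = sum(y) / len(y)

# Total variation in y, and the part left over after the linear fit.
ss_total = sum((v - y_bar) ** 2 for v in y)
ss_residual = sum((v - f) ** 2 for v, f in zip(y, fitted))
explained_share = 1 - ss_residual / ss_total

r_squared = cov(x, y) ** 2 / (cov(x, x) * cov(y, y))
print(round(explained_share, 4), round(r_squared, 4))
```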

18
  • If y is a perfect linear function of x, i.e.,
  • $y_i = a + b x_i$
  • then $r_{xy} = \pm 1$.
  • If y is not a perfect linear function of x, i.e.,
    if the best-fitting values of a and b do not
    capture y exactly,
  • $y_i = a + b x_i + e_i$
  • then $|r_{xy}| < 1$.
  • So while the covariance can be any number at all,
    r always lies between −1 and 1.
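
A quick numerical illustration of the two cases follows. The exact lines use arbitrary made-up coefficients. The imperfect case reuses the four-point example data. `corr` is a hypothetical helper name.

```python
import math


def corr(xs, ys):
    """Correlation r computed from moments with divisor n."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    m11 = sum((p - x_bar) * (q - y_bar) for p, q in zip(xs, ys)) / n
    m2_x = sum((p - x_bar) ** 2 for p in xs) / n
    m2_y = sum((q - y_bar) ** 2 for q in ys) / n
    return m11 / math.sqrt(m2_x * m2_y)


x = [2, 3, 6, 7]

# Exact linear functions of x (illustrative coefficients).
y_rising = [1 + 0.5 * v for v in x]     # positive slope
y_falling = [3 - 2 * v for v in x]      # negative slope

# Not an exact linear function of x: the example data from the slides.
y_imperfect = [0, 0.2, 0.4, 0.4]

r_rising = corr(x, y_rising)
r_falling = corr(x, y_falling)
r_imperfect = corr(x, y_imperfect)
print(r_rising, r_falling, r_imperfect)  # about 1, about -1, strictly inside (-1, 1)
```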

19
Bivariate Categorical Data
20
Terminology
  • Joint probability: The probability of being in
    two specific categories (one for each variable)
  • e.g., being a Republican female
  • 2/12 ≈ .17

[Table: counts by sex (F, M) and party (R, D)]
21
Terminology
  • Marginal probability: The probability of being in
    one specific category (of one of the variables)
  • e.g., being a female
  • 5/12 ≈ .42

[Table: counts by sex (F, M) and party (R, D)]
22
Terminology
  • Conditional probability: The probability of being
    in one specific category (of one variable), given
    a particular value for the other variable
  • e.g., probability of a female's being Republican
  • 2/5 = .4

[Table: counts by sex (F, M) and party (R, D)]
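
The three kinds of probability can be computed from a table of counts as in the sketch below. The female counts (2 Republican, 3 Democrat) and the total of 12 follow from the fractions 2/12, 5/12 and 2/5 in the slides. How the 7 males split between parties is not recoverable from the transcript, so the male row here is a hypothetical placeholder. It does not affect the three results shown. The function names are illustrative.

```python
# Counts indexed by (sex, party).
counts = {
    ("F", "R"): 2,
    ("F", "D"): 3,
    ("M", "R"): 1,   # hypothetical split of the 7 males
    ("M", "D"): 6,   # hypothetical split of the 7 males
}
n = sum(counts.values())


def joint(sex, party):
    """P(sex and party): one cell divided by the grand total."""
    return counts[(sex, party)] / n


def marginal_sex(sex):
    """P(sex): a row total divided by the grand total."""
    return sum(c for (s, _), c in counts.items() if s == sex) / n


def conditional_party_given_sex(party, sex):
    """P(party | sex): one cell divided by its row total."""
    return joint(sex, party) / marginal_sex(sex)


print(round(joint("F", "R"), 4))                        # 2/12
print(round(marginal_sex("F"), 4))                      # 5/12
print(round(conditional_party_given_sex("R", "F"), 4))  # 2/5
```
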
23
Terminology
  • Correlation: For a 2 × 2 contingency table (where
    row/column values are either 0 or 1), r reduces to
    the $\phi$ coefficient
  • E.g., correlation between being female and being
    Republican

[Table: counts by sex (F, M) and party (R, D)]
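
The sketch below checks this reduction numerically. It computes the usual closed-form coefficient for a 2 × 2 table and compares it with the ordinary correlation of the underlying 0/1 variables. The female counts follow from the slides, but the male counts are a hypothetical placeholder, so the printed value is illustrative only and is not the slides' answer. The function names are made up for the example.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for the 2 x 2 table [[a, b], [c, d]]."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


def corr(xs, ys):
    """Ordinary correlation r computed from moments with divisor n."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    m11 = sum((p - x_bar) * (q - y_bar) for p, q in zip(xs, ys)) / n
    m2_x = sum((p - x_bar) ** 2 for p in xs) / n
    m2_y = sum((q - y_bar) ** 2 for q in ys) / n
    return m11 / math.sqrt(m2_x * m2_y)


# Rows: female, male. Columns: Republican, Democrat.
a, b = 2, 3   # female Republicans, female Democrats
c, d = 1, 6   # male Republicans, male Democrats (hypothetical split)

# Expand the table into one 0/1 observation pair per person.
female = [1] * a + [1] * b + [0] * c + [0] * d
republican = [1] * a + [0] * b + [1] * c + [0] * d

phi = phi_coefficient(a, b, c, d)
r = corr(female, republican)
print(round(phi, 4), round(r, 4))  # the two values agree
```
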
25
  • What proportion of the sample is made up of
    religious persons?
  • What proportion of the sample is made up of very
    happy persons?
  • What proportion of the sample is made up of very
    happy religious persons?
  • What proportion of very happy people are
    religious?
  • What proportion of religious people are very
    happy?
  • How strong is the association between being very
    happy and religious?

26
  • RELLIFE: Please tell me whether you strongly
    agree, agree, disagree, or strongly disagree with
    the following statement: I try hard to carry my
    religious beliefs over into all my other dealings
    in life.
  • RELEXP: Did you ever have a religious or
    spiritual experience that changed your life?
  • EVOLVED: Human beings, as we know them today,
    developed from earlier species of animals. (Is
    that true or false?)

27
[Table: Evolved (True, False) by Religious Experience (True, False)]
28
[Table: Evolved (True, False) by Religious Person (Very/Moderate, Slight/Not)]
29
r = .42