Bivariate Data Analysis I: Crosstabulation and Measures of Association - PowerPoint PPT Presentation

Slides: 46
Provided by: rfor6

1
Bivariate Data Analysis I: Crosstabulation and
Measures of Association
  • Chapter 12 (JRM)
  • Chapter 9 (LeRoy)
  • Exercises due Thursday, 4/9

2
Types of Bivariate Relationships and Associated
Statistics
  • Nominal/Ordinal and Nominal/Ordinal (including
    dichotomous): Crosstabulation (Lambda,
    Chi-Square, Gamma, etc.)
  • Interval and Dichotomous: Difference of means
    test
  • Interval and Nominal/Ordinal: Analysis of
    Variance
  • Interval and Interval: Regression and
    correlation

3
Assessing Relationships between Variables
  • 1. Calculate appropriate statistic to measure the
    magnitude of the relationship in the sample
  • 2. Calculate additional statistics to determine
    if the relationship holds for the population of
    interest (statistical significance)
  • Substantive significance vs. Statistical
    significance

4
What is a Crosstabulation?
  • Crosstabulations are appropriate for examining
    relationships between variables that are nominal,
    ordinal, or dichotomous.

5
What is a Crosstabulation?
  • Example: We would like to know if presidential
    vote choice in 2000 was related to race.
  • Vote choice: Gore or Bush
  • Race: White, Hispanic, Black

6
Are Race and Vote Choice Related? Why?
7
Are Race and Vote Choice Related? Why?
8
Measures of Association for Crosstabulations
  • Purpose: to determine if nominal/ordinal
    variables are related in a crosstabulation
  • At least one nominal variable: Lambda,
    Chi-Square, Cramér's V
  • Two ordinal variables: Tau, Gamma

9
Lambda
10
Lambda: Rule 1 (prediction based solely on
knowledge of marginal distribution of dependent
variable, partisanship)
11
Lambda: Rule 2 (prediction based on knowledge
provided by independent variable)
12
Lambda: Calculation of Errors
  • Errors w/Rule 1: 18 + 12 + 14 + 16 = 60
  • Errors w/Rule 2: 16 + 10 + 14 + 10 = 50
  • Lambda = (Errors R1 - Errors R2) / Errors R1
  • Lambda = (60 - 50)/60 = 10/60 = .17
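
The lambda arithmetic on this slide can be checked with a short sketch. The function name `lambda_pre` is an illustrative assumption; the error counts are the ones listed on the slide.

```python
def lambda_pre(errors_rule1, errors_rule2):
    """Proportional reduction in error: (E1 - E2) / E1."""
    e1 = sum(errors_rule1)  # errors when predicting from the DV marginals only
    e2 = sum(errors_rule2)  # errors when predicting with knowledge of the IV
    if e1 == 0:
        raise ValueError("Rule 1 makes no errors, so lambda is undefined")
    return (e1 - e2) / e1


# Error counts as listed on the slide.
errors_rule1 = [18, 12, 14, 16]
errors_rule2 = [16, 10, 14, 10]

lam = lambda_pre(errors_rule1, errors_rule2)
print(round(lam, 2))  # 0.17
```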

13
Lambda
  • PRE (proportional reduction in error) measure
  • Ranges from 0 to 1
  • Potential problems with Lambda:
  • Underestimates relationship when variables (one
    or both) are highly skewed
  • Always 0 when modal category of Y is the same
    across all categories of X
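
The "always 0" problem can be seen in a small sketch that computes lambda directly from a crosstab. Both tables below are hypothetical (they do not come from the slides), and the function name and the rows = X, columns = Y layout are assumptions made for illustration.

```python
def goodman_kruskal_lambda(table):
    """Lambda for predicting Y (columns) from X (rows)."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    # Rule 1: always guess the overall modal category of Y.
    errors_rule1 = n - max(col_totals)
    # Rule 2: within each category of X, guess that row's modal category of Y.
    errors_rule2 = sum(sum(row) - max(row) for row in table)
    if errors_rule1 == 0:
        return 0.0  # convention chosen here: nothing left to reduce
    return (errors_rule1 - errors_rule2) / errors_rule1


# Hypothetical tables: rows are categories of X, columns are categories of Y.
same_mode = [[60, 40], [70, 30]]  # modal Y is column 0 in every row
diff_mode = [[60, 40], [30, 70]]  # modal Y differs across rows

print(goodman_kruskal_lambda(same_mode))            # 0.0
print(round(goodman_kruskal_lambda(diff_mode), 2))  # 0.22
```

In the first table the row percentages differ (60% vs. 70%), yet lambda is 0 because knowing X never changes the best guess for Y.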

14
Chi-Square (χ²)
  • Also appropriate for any crosstabulation with at
    least one nominal variable (and another
    nominal/ordinal variable)
  • Based on the difference between the empirically
    observed crosstab and what we would expect to
    observe if the two variables are statistically
    independent

15
Chi-Square (χ²)
16
Calculating Expected Frequencies
  • To calculate the expected cell frequency for NE
    Republicans:
  • E/30 = 30/100, therefore E = (30 × 30)/100 = 9
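
As a minimal sketch of the same calculation, the expected count under independence is the row total times the column total, divided by the sample size. The function name is an assumption; the numbers (two marginal totals of 30 and N = 100) are the ones on the slide.

```python
def expected_count(row_total, col_total, n):
    """Expected cell frequency if the two variables are independent."""
    return row_total * col_total / n


# NE Republicans: marginal totals of 30 and 30, sample size 100.
e_ne_rep = expected_count(30, 30, 100)
print(e_ne_rep)  # 9.0
```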

17
Calculating the Chi-Square Statistic
  • The chi-square statistic is calculated as
  • χ² = Σ (Obs. Frequency_ik - Exp. Frequency_ik)² /
    Exp. Frequency_ik
  • χ² = (25/9) + (16/6) + (9/9) + (16/6) + (0) + (0) +
    (16/12) + (16/8) + (25/9) + (16/6) + (1/9) + (0) = 18
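
The sum can be verified exactly with rational arithmetic. The sketch below only adds up the twelve cell contributions listed on the slide; the underlying observed table is not shown in the transcript, so it is not reconstructed here.

```python
from fractions import Fraction

# Each entry is (observed - expected)^2 / expected for one cell,
# copied from the slide in the order given.
contributions = [
    Fraction(25, 9), Fraction(16, 6), Fraction(9, 9),
    Fraction(16, 6), Fraction(0), Fraction(0),
    Fraction(16, 12), Fraction(16, 8),
    Fraction(25, 9), Fraction(16, 6), Fraction(1, 9), Fraction(0),
]

chi_square = sum(contributions)
print(chi_square)  # 18
```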

18
Interpreting the Chi-Square Statistic
  • The Chi-Square statistic ranges from 0 to
    infinity
  • 0 = perfect statistical independence
  • Even though two variables may be statistically
    independent in the population, in a sample the
    Chi-Square statistic may be > 0
  • Therefore it is necessary to determine
    statistical significance for a Chi-Square
    statistic (given a certain level of confidence)

19
Cramér's V
  • Problem with Chi-Square: not comparable across
    different sample sizes (and their associated
    crosstabs)
  • Cramér's V is a standardization of the Chi-Square
    statistic

20
Calculating Cramér's V
  • V = sqrt( χ² / (N × min(R - 1, C - 1)) )
  • Where R = number of rows, C = number of columns,
    and N = sample size
  • V ranges from 0 to 1
  • Example (region and partisanship)
  • V = sqrt(.09) = .30
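
A hedged sketch of the calculation, using the standard definition V = sqrt(χ² / (N × min(R - 1, C - 1))). The inputs are assumptions pieced together from the earlier slides: χ² = 18, N = 100, and a 4 × 3 region-by-partisanship table (twelve cells). With those values the quantity under the square root is .09, matching the slide.

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V: chi-square rescaled to the 0-1 range."""
    return math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))


# Assumed inputs: chi-square = 18, N = 100, 4 rows x 3 columns.
v = cramers_v(18, 100, 4, 3)
print(round(v, 2))  # 0.3
```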

21
Relationships between Ordinal Variables
  • There are several measures of association
    appropriate for relationships between ordinal
    variables
  • Gamma, Tau-b, Tau-c, Somers' d
  • All are based on identifying concordant,
    discordant, and tied pairs of observations

22
Concordant Pairs: Ideology and Voting
  • Ideology - conserv (1), moderate (2), liberal (3)
  • Voting - never (1), sometimes (2), often (3)
  • Consider two hypothetical individuals in the
    sample with scores
  • Individual A: Ideology = 1, Voting = 1
  • Individual B: Ideology = 2, Voting = 2
  • Pair AB is considered a concordant pair because
    B's ideology score is greater than A's score, and
    B's voting score is greater than A's score

23
Concordant Pairs (cont'd)
  • All of the following are concordant pairs
  • A(1,1) B(2,2)
  • A(1,1) B(2,3)
  • A(1,1) B(3,2)
  • A(1,2) B(2,3)
  • A(2,2) B(3,3)
  • Concordant pairs are consistent with a positive
    relationship between the IV and the DV (ideology
    and voting)

24
Discordant Pairs
  • All of the following are discordant pairs
  • A(1,2) B(2,1)
  • A(1,3) B(2,2)
  • A(2,2) B(3,1)
  • A(1,2) B(3,1)
  • A(3,1) B(1,2)
  • Discordant pairs are consistent with a negative
    relationship between the IV and the DV (ideology
    and voting)
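
The classification rule can be sketched as a small function: a pair is concordant when both scores move in the same direction from A to B, discordant when they move in opposite directions, and tied when either score is equal. The function name and the (ideology, voting) tuple layout are assumptions for illustration.

```python
def classify_pair(a, b):
    """Classify two individuals given as (ideology, voting) score tuples."""
    d_ideology = b[0] - a[0]
    d_voting = b[1] - a[1]
    product = d_ideology * d_voting
    if product > 0:
        return "concordant"  # both scores move in the same direction
    if product < 0:
        return "discordant"  # scores move in opposite directions
    return "tied"            # equal on at least one variable


print(classify_pair((1, 1), (2, 2)))  # concordant
print(classify_pair((3, 1), (1, 2)))  # discordant
print(classify_pair((1, 1), (1, 2)))  # tied
```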

25
Identifying Concordant Pairs
  • Concordant Pairs for Never - Conserv (1,1)
  • Concordant: 80×70 + 80×10 + 80×20 + 80×80
    = 14,400

26
Identifying Concordant Pairs
  • Concordant Pairs for Never - Moderate (1,2)
  • Concordant: 10×10 + 10×80 = 900

27
Identifying Discordant Pairs
  • Discordant Pairs for Often - Conserv (1,3)
  • Discordant: 0×10 + 0×10 + 0×70 + 0×10 = 0

28
Identifying Discordant Pairs
  • Discordant Pairs for Often - Moderate (2,3)
  • Discordant: 20×10 + 20×10 = 400

29
Gamma
  • Gamma is calculated by identifying all possible
    pairs of individuals in the sample and
    determining if they are concordant or discordant
  • Gamma = (C - D) / (C + D)

30
Interpreting Gamma
  • Gamma = 21,400/24,400 = .88
  • Gamma ranges from -1 to 1
  • Gamma does not account for tied pairs
  • Tau (b and c) and Somers' d account for tied
    pairs in different ways
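
The pair counting behind gamma can be sketched in a few lines. The table below is an assumption: the crosstab itself is not in the transcript, so the cell counts are reconstructed from the products shown on the preceding slides, and the Sometimes/Conservative cell (20) is inferred so that the totals match 21,400/24,400. Treat it as illustrative rather than as the original data.

```python
def concordant_discordant(table):
    """Count concordant and discordant pairs in an ordered crosstab."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):       # only rows below row i
                for m in range(n_cols):
                    if m > j:                    # below and to the right
                        concordant += table[i][j] * table[k][m]
                    elif m < j:                  # below and to the left
                        discordant += table[i][j] * table[k][m]
    return concordant, discordant


def gamma(table):
    c, d = concordant_discordant(table)
    return (c - d) / (c + d)


# Reconstructed (assumed) table.
# Rows: voting never, sometimes, often.
# Columns: ideology conservative, moderate, liberal.
table = [
    [80, 10, 10],
    [20, 70, 10],
    [0, 20, 80],
]

c, d = concordant_discordant(table)
print(c, d)                    # 22900 1500
print(round(gamma(table), 2))  # 0.88
```

Each discordant pair is counted once from its upper cell here, which gives the same total as counting from the lower cell as the slides do.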

31
Square tables
Non-Square tables
32
Example
  • NES 2004: What explains variation in one's
    political ideology?
  • Income?
  • Education?
  • Religion?
  • Race?

33
Bivariate Relationships and Hypothesis Testing
(Significance Testing)
  • 1. Determine the null and alternative hypotheses
  • Null: There is no relationship between X and Y (X
    and Y are statistically independent and test
    statistic = 0).
  • Alternative: There IS a relationship between X
    and Y (test statistic does not equal 0).

34
Bivariate Relationships and Hypothesis Testing
  • 2. Determine Appropriate Test Statistic (based on
    measurement levels of X and Y)
  • 3. Identify the type of sampling distribution for
    test statistic, and what it would look like if
    the null hypothesis were true.

35
Bivariate Relationships and Hypothesis Testing
  • 4. Calculate the test statistic from the sample
    data and determine the probability of observing a
    test statistic this large (in absolute terms) if
    the null hypothesis is true.
  • P-value (significance level): probability of
    observing a test statistic at least as large as
    our observed test statistic, if in fact the null
    hypothesis is true

36
Bivariate Relationships and Hypothesis Testing
  • 5. Choose an alpha level: a decision rule to
    guide us in determining which values of the
    p-value lead us to reject/not reject the null
    hypothesis
  • When the p-value is extremely small, we reject
    the null hypothesis (why?). The relationship is
    deemed statistically significant.
  • When the p-value is not small, we do not reject
    the null hypothesis (why?). The relationship is
    deemed statistically insignificant.
  • Most common alpha level: .05

37
Bottom Line
  • Assuming we will always use an alpha level of
    .05:
  • Reject the null hypothesis if p-value < .05
  • Do not reject the null hypothesis if p-value > .05
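
The decision rule can be written as a one-line comparison. The function name is an assumption. The slide does not say what to do when the p-value equals alpha exactly; this sketch treats that case as "do not reject", which is a choice made here.

```python
def decide(p_value, alpha=0.05):
    """Apply the alpha-level decision rule to a p-value."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    # Assumption: p_value == alpha counts as "do not reject".
    return "reject null" if p_value < alpha else "do not reject null"


print(decide(0.003))  # reject null
print(decide(0.20))   # do not reject null
```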

38
An Example
  • Dependent variable: Vote Choice in 2000
  • (Gore, Bush, Nader)
  • Independent variable: Ideology
  • (liberal, moderate, conservative)

39
An Example
  • 1. Determine the null and alternative hypotheses.

40
An Example
  • Null Hypothesis: There is no relationship between
    ideology and vote choice in 2000.
  • Alternative (Research) Hypothesis: There is a
    relationship between ideology and vote choice
    (liberals were more likely to vote for Gore,
    while conservatives were more likely to vote for
    Bush).

41
An Example
  • 2. Determine Appropriate Test Statistic (based on
    measurement levels of X and Y)
  • 3. Identify the type of sampling distribution for
    test statistic, and what it would look like if
    the null hypothesis were true.

42
Sampling Distributions for the Chi-Squared
Statistic (under assumption of perfect
independence): df = (rows - 1)(columns - 1)
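
A minimal sketch of how degrees of freedom and a p-value fit together, using only the standard library. The function names are assumptions. The p-value helper relies on the closed form of the chi-square upper tail that holds when df is even; for odd df a statistics library would be needed instead. The value 9.488 is the usual tabled .05 critical value for df = 4, not a statistic from the slides.

```python
import math


def chi_square_df(n_rows, n_cols):
    """Degrees of freedom for a crosstab: (rows - 1)(columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def chi_square_p_value_even_df(statistic, df):
    """Upper-tail probability of a chi-square statistic (even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = statistic / 2
    terms = (half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)


# Vote choice (Gore, Bush, Nader) by ideology (liberal, moderate, conservative).
df = chi_square_df(3, 3)
print(df)  # 4

# The tabled .05 critical value for df = 4 is about 9.488.
p = chi_square_p_value_even_df(9.488, df)
print(round(p, 2))  # 0.05
```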
43
Bivariate Relationships and Hypothesis Testing
  • 4. Calculate the test statistic from the sample
    data and determine the probability of observing a
    test statistic this large (in absolute terms) if
    the null hypothesis is true.
  • P-value (significance level): probability of
    observing a test statistic at least as large as
    our observed test statistic, if in fact the null
    hypothesis is true

44
Bivariate Relationships and Hypothesis Testing
  • 5. Choose an alpha level: a decision rule to
    guide us in determining which values of the
    p-value lead us to reject/not reject the null
    hypothesis
  • When the p-value is extremely small, we reject
    the null hypothesis (why?). The relationship is
    deemed statistically significant.
  • When the p-value is not small, we do not reject
    the null hypothesis (why?). The relationship is
    deemed statistically insignificant.
  • Most common alpha level: .05

45
In-Class Exercise
  • For some years now, political commentators have
    cited the importance of a gender gap in
    explaining election outcomes. What is the source
    of the gender gap?
  • Develop a simple theory and corresponding
    hypothesis (where gender is the independent
    variable) which seeks to explain the source of
    the gender gap.
  • Specifically, determine:
  • Theory
  • Null and research hypothesis
  • Test statistic for a cross-tabulation to test
    your hypothesis