Title: Bivariate Data Analysis I: Crosstabulation and Measures of Association
1. Bivariate Data Analysis I: Crosstabulation and Measures of Association
- Chapter 12 (JRM)
- Chapter 9 (LeRoy)
- Exercises due Thursday, 4/9
2. Types of Bivariate Relationships and Associated Statistics
- Nominal/Ordinal and Nominal/Ordinal (including dichotomous): Crosstabulation (Lambda, Chi-Square, Gamma, etc.)
- Interval and Dichotomous: Difference of means test
- Interval and Nominal/Ordinal: Analysis of Variance
- Interval and Interval: Regression and correlation
3. Assessing Relationships between Variables
- 1. Calculate the appropriate statistic to measure the magnitude of the relationship in the sample
- 2. Calculate additional statistics to determine if the relationship holds for the population of interest (statistical significance)
- Substantive significance vs. statistical significance
4. What is a Crosstabulation?
- Crosstabulations are appropriate for examining
relationships between variables that are nominal,
ordinal, or dichotomous.
5. What is a Crosstabulation?
- Example: We would like to know if presidential vote choice in 2000 was related to race.
- Vote choice: Gore or Bush
- Race: White, Hispanic, Black
6. Are Race and Vote Choice Related? Why?
7. Are Race and Vote Choice Related? Why?
8. Measures of Association for Crosstabulations
- Purpose: to determine whether nominal/ordinal variables are related in a crosstabulation
- At least one nominal variable: Lambda, Chi-Square, Cramér's V
- Two ordinal variables: Tau, Gamma
9. Lambda
10. Lambda: Rule 1 (prediction based solely on knowledge of the marginal distribution of the dependent variable, partisanship)
11. Lambda: Rule 2 (prediction based on knowledge provided by the independent variable)
12. Lambda: Calculation of Errors
- Errors w/ Rule 1 = 18 + 12 + 14 + 16 = 60
- Errors w/ Rule 2 = 16 + 10 + 14 + 10 = 50
- Lambda = (Errors R1 - Errors R2) / Errors R1
- Lambda = (60 - 50)/60 = 10/60 = .17
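The two prediction rules can be sketched in a few lines of Python. The crosstab itself is not reproduced in these notes, so the counts below are an assumption: they were back-calculated from the error counts on this slide and the chi-square terms on the later slides. The party and region labels are placeholders, and `lambda_pre` is a hypothetical helper name.

```python
# Crosstab back-calculated from the slide arithmetic (an assumption, not the
# original table). Rows: categories of the dependent variable (party).
# Columns: categories of the independent variable (region).
table = [
    [4, 10, 6, 10],   # Party A (placeholder label)
    [12, 8, 16, 4],   # Party B (placeholder label)
    [14, 2, 8, 6],    # Party C (placeholder label)
]


def lambda_pre(table):
    """Return (errors under Rule 1, errors under Rule 2, Lambda)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    columns = list(zip(*table))
    # Rule 1: always guess the overall modal category of the dependent variable.
    errors_rule1 = n - max(row_totals)
    # Rule 2: within each category of the independent variable,
    # guess that category's own modal value of the dependent variable.
    errors_rule2 = sum(sum(col) - max(col) for col in columns)
    return errors_rule1, errors_rule2, (errors_rule1 - errors_rule2) / errors_rule1


e1, e2, lam = lambda_pre(table)
print(e1, e2, round(lam, 2))  # 60 50 0.17
```

With these counts the per-region errors are 18, 12, 14, 16 under Rule 1 and 16, 10, 14, 10 under Rule 2, matching the sums on the slide.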
13. Lambda
- PRE (proportional reduction in error) measure
- Ranges from 0 to 1
- Potential problems with Lambda
- Underestimates the relationship when one or both variables are highly skewed
- Always 0 when the modal category of Y is the same across all categories of X
14. Chi-Square (χ²)
- Also appropriate for any crosstabulation with at least one nominal variable (and another nominal/ordinal variable)
- Based on the difference between the empirically observed crosstab and what we would expect to observe if the two variables were statistically independent
15. Chi-Square (χ²)
16. Calculating Expected Frequencies
- To calculate the expected cell frequency for NE Republicans:
- E/30 = 30/100, therefore E = (30 × 30)/100 = 9
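The same calculation can be applied to every cell at once: expected count = (row total × column total) / N. The sketch below assumes a crosstab back-calculated from the arithmetic in these slides (the original table is not reproduced here). Labels are placeholders and `expected_frequencies` is a hypothetical helper name.

```python
# Crosstab back-calculated from the slide arithmetic (an assumption).
# Rows: party categories; columns: regions. Labels are placeholders.
table = [
    [4, 10, 6, 10],
    [12, 8, 16, 4],
    [14, 2, 8, 6],
]


def expected_frequencies(table):
    """Expected cell counts if the row and column variables were independent."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(table)
# The first cell has a row total of 30 and a column total of 30 (N = 100),
# the same marginals as the NE Republicans example above.
print(expected[0][0])  # 9.0
```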
17. Calculating the Chi-Square Statistic
- The chi-square statistic is calculated as
- χ² = Σ (Obs. Frequency_ik - Exp. Frequency_ik)² / Exp. Frequency_ik
- χ² = (25/9) + (16/6) + (9/9) + (16/6) + (0) + (0) + (16/12) + (16/8) + (25/9) + (16/6) + (1/9) + (0) = 18
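A minimal sketch of the summation, assuming a crosstab back-calculated from the twelve terms above (the original table is not reproduced in these notes; labels are placeholders and `chi_square` is a hypothetical helper name):

```python
# Crosstab back-calculated from the slide arithmetic (an assumption).
# Rows: party categories; columns: regions. Labels are placeholders.
table = [
    [4, 10, 6, 10],
    [12, 8, 16, 4],
    [14, 2, 8, 6],
]


def chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


chi2 = chi_square(table)
print(round(chi2, 2))  # 18.0
```

The first row contributes 25/9 + 16/6 + 9/9 + 16/6, the second 0 + 0 + 16/12 + 16/8, and the third 25/9 + 16/6 + 1/9 + 0, in the same order as the terms on the slide.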
18. Interpreting the Chi-Square Statistic
- The Chi-Square statistic ranges from 0 to infinity
- 0 = perfect statistical independence
- Even though two variables may be statistically independent in the population, in a sample the Chi-Square statistic may be > 0
- Therefore it is necessary to determine statistical significance for a Chi-Square statistic (given a certain level of confidence)
19. Cramér's V
- Problem with Chi-Square: not comparable across different sample sizes (and their associated crosstabs)
- Cramér's V is a standardization of the Chi-Square statistic
20. Calculating Cramér's V
- V = sqrt( χ² / (N × min(R - 1, C - 1)) )
- Where N = sample size, R = rows, and C = columns
- V ranges from 0 to 1
- Example (region and partisanship)
- V = sqrt(.09) = .30
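As a quick check of the example, the sketch below plugs in χ² = 18 and N = 100 from the earlier slides. The table dimensions (3 by 4) are an inference from the twelve terms in the chi-square sum, not something the slides state, and `cramers_v` is a hypothetical helper name.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V: chi-square standardized by sample size and table shape."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))


# chi2 = 18 and N = 100 come from the slides; the 3 x 4 shape is an assumption.
v = cramers_v(18, 100, 3, 4)
print(round(v, 2))  # 0.3
```

Here 18 / (100 × 2) = .09, and its square root is .30.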
21. Relationships between Ordinal Variables
- There are several measures of association appropriate for relationships between ordinal variables
- Gamma, Tau-b, Tau-c, Somers' d
- All are based on identifying concordant, discordant, and tied pairs of observations
22. Concordant Pairs: Ideology and Voting
- Ideology: conserv (1), moderate (2), liberal (3)
- Voting: never (1), sometimes (2), often (3)
- Consider two hypothetical individuals in the sample with scores
- Individual A: Ideology = 1, Voting = 1
- Individual B: Ideology = 2, Voting = 2
- Pair AB is considered a concordant pair because B's ideology score is greater than A's score, and B's voting score is greater than A's score
23. Concordant Pairs (cont'd)
- All of the following are concordant pairs
- A(1,1) B(2,2)
- A(1,1) B(2,3)
- A(1,1) B(3,2)
- A(1,2) B(2,3)
- A(2,2) B(3,3)
- Concordant pairs are consistent with a positive
relationship between the IV and the DV (ideology
and voting)
24. Discordant Pairs
- All of the following are discordant pairs
- A(1,2) B(2,1)
- A(1,3) B(2,2)
- A(2,2) B(3,1)
- A(1,2) B(3,1)
- A(3,1) B(1,2)
- Discordant pairs are consistent with a negative
relationship between the IV and the DV (ideology
and voting)
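The concordant/discordant rule can be written as a small function. This is a sketch only: each individual is an (ideology, voting) tuple using the 1-3 codes above, and `classify_pair` is a hypothetical helper name.

```python
def classify_pair(a, b):
    """Classify two (ideology, voting) observations as concordant, discordant, or tied."""
    ideology_diff = b[0] - a[0]
    voting_diff = b[1] - a[1]
    product = ideology_diff * voting_diff
    if product > 0:
        return "concordant"   # both scores move in the same direction
    if product < 0:
        return "discordant"   # the scores move in opposite directions
    return "tied"             # the pair shares a score on at least one variable


print(classify_pair((1, 1), (2, 2)))  # concordant
print(classify_pair((3, 1), (1, 2)))  # discordant
print(classify_pair((1, 2), (1, 3)))  # tied
```

The order of the two individuals does not matter, because swapping them flips the sign of both differences.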
25. Identifying Concordant Pairs
- Concordant Pairs for Never - Conserv (1,1)
- Concordant = 80×70 + 80×10 + 80×20 + 80×80 = 14,400
26. Identifying Concordant Pairs
- Concordant Pairs for Never - Moderate (2,1)
- Concordant = 10×10 + 10×80 = 900
27. Identifying Discordant Pairs
- Discordant Pairs for Often - Conserv (1,3)
- Discordant = 0×10 + 0×10 + 0×70 + 0×10 = 0
28. Identifying Discordant Pairs
- Discordant Pairs for Often - Moderate (2,3)
- Discordant = 20×10 + 20×10 = 400
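The cell-by-cell products above can be totaled over the whole table. The table is not reproduced in these notes, so the counts below are an assumption: eight cells follow from the products on these slides, and the Sometimes/Conserv cell (20) is inferred from the Gamma totals reported later. `count_pairs` is a hypothetical helper name.

```python
# Table back-calculated from the slide arithmetic (an assumption).
# Rows: voting (never, sometimes, often); columns: ideology (conserv, moderate, liberal).
table = [
    [80, 10, 10],  # never
    [20, 70, 10],  # sometimes
    [0, 20, 80],   # often
]


def count_pairs(table):
    """Count concordant and discordant pairs in an ordinal-by-ordinal crosstab."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Compare each cell only with cells in later rows, so every pair
            # of cells is visited exactly once.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:    # higher on both variables
                        concordant += table[i][j] * table[k][m]
                    elif m < j:  # higher on one variable, lower on the other
                        discordant += table[i][j] * table[k][m]
    return concordant, discordant


c, d = count_pairs(table)
print(c, d)  # 22900 1500
```

The Never/Conserv cell contributes 80 × (70 + 10 + 20 + 80) = 14,400 concordant pairs, as on the slide. The slides count discordant pairs starting from the Often row instead, which gives the same total.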
29. Gamma
- Gamma is calculated by identifying all possible pairs of individuals in the sample and determining if they are concordant or discordant
- Gamma = (C - D) / (C + D)
30. Interpreting Gamma
- Gamma = 21,400/24,400 = .88
- Gamma ranges from -1 to 1
- Gamma does not account for tied pairs
- Tau (b and c) and Somers' d account for tied pairs in different ways
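A short sketch of the Gamma arithmetic, and of how many pairs it leaves out. C = 22,900 and D = 1,500 are implied by the slide's C - D = 21,400 and C + D = 24,400. The sample size N = 300 is an assumption taken from a table back-calculated from the earlier pair counts, and `goodman_kruskal_gamma` is a hypothetical helper name.

```python
def goodman_kruskal_gamma(concordant, discordant):
    """Gamma = (C - D) / (C + D); tied pairs do not enter the formula."""
    return (concordant - discordant) / (concordant + discordant)


c, d = 22900, 1500          # implied by C - D = 21,400 and C + D = 24,400
g = goodman_kruskal_gamma(c, d)

n = 300                     # assumed sample size (not stated on the slides)
total_pairs = n * (n - 1) // 2
tied_pairs = total_pairs - (c + d)

print(round(g, 2), tied_pairs)  # 0.88 20450
```

Under these assumptions, almost half of all possible pairs are tied on at least one variable and are ignored by Gamma.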
31. Square tables
Non-Square tables
32. Example
- NES 2004: What explains variation in one's political ideology?
- Income?
- Education?
- Religion?
- Race?
33. Bivariate Relationships and Hypothesis Testing (Significance Testing)
- 1. Determine the null and alternative hypotheses
- Null: There is no relationship between X and Y (X and Y are statistically independent and the test statistic = 0).
- Alternative: There IS a relationship between X and Y (the test statistic does not equal 0).
34. Bivariate Relationships and Hypothesis Testing
- 2. Determine the appropriate test statistic (based on measurement levels of X and Y)
- 3. Identify the type of sampling distribution for the test statistic, and what it would look like if the null hypothesis were true.
35. Bivariate Relationships and Hypothesis Testing
- 4. Calculate the test statistic from the sample data and determine the probability of observing a test statistic this large (in absolute terms) if the null hypothesis is true.
- P-value (significance level): probability of observing a test statistic at least as large as our observed test statistic, if in fact the null hypothesis is true
36. Bivariate Relationships and Hypothesis Testing
- 5. Choose an alpha level: a decision rule to guide us in determining which values of the p-value lead us to reject/not reject the null hypothesis
- When the p-value is extremely small, we reject the null hypothesis (why?). The relationship is deemed statistically significant.
- When the p-value is not small, we do not reject the null hypothesis (why?). The relationship is deemed statistically insignificant.
- Most common alpha level: .05
37. Bottom Line
- Assuming we will always use an alpha level of .05
- Reject the null hypothesis if P-value < .05
- Do not reject the null hypothesis if P-value > .05
38. An Example
- Dependent variable: Vote Choice in 2000
- (Gore, Bush, Nader)
- Independent variable: Ideology
- (liberal, moderate, conservative)
39. An Example
- 1. Determine the null and alternative hypotheses.
40. An Example
- Null Hypothesis: There is no relationship between ideology and vote choice in 2000.
- Alternative (Research) Hypothesis: There is a relationship between ideology and vote choice (liberals were more likely to vote for Gore, while conservatives were more likely to vote for Bush).
41. An Example
- 2. Determine the appropriate test statistic (based on measurement levels of X and Y)
- 3. Identify the type of sampling distribution for the test statistic, and what it would look like if the null hypothesis were true.
42. Sampling Distributions for the Chi-Squared Statistic (under assumption of perfect independence)
- df = (rows - 1)(columns - 1)
43. Bivariate Relationships and Hypothesis Testing
- 4. Calculate the test statistic from the sample data and determine the probability of observing a test statistic this large (in absolute terms) if the null hypothesis is true.
- P-value (significance level): probability of observing a test statistic at least as large as our observed test statistic, if in fact the null hypothesis is true
44. Bivariate Relationships and Hypothesis Testing
- 5. Choose an alpha level: a decision rule to guide us in determining which values of the p-value lead us to reject/not reject the null hypothesis
- When the p-value is extremely small, we reject the null hypothesis (why?). The relationship is deemed statistically significant.
- When the p-value is not small, we do not reject the null hypothesis (why?). The relationship is deemed statistically insignificant.
- Most common alpha level: .05
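Steps 3 through 5 can be sketched for a chi-square test. No data are given for the ideology and vote choice example, so this illustration reuses the chi-square value of 18 from the earlier region and partisanship slides and assumes that table was 3 by 4. The helper `chi2_upper_tail_even_df` is hypothetical and uses a closed form that holds only for even degrees of freedom; in practice a library routine such as `scipy.stats.chi2.sf` would normally be used.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square >= x) for an even number of degrees of freedom.

    For df = 2k: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2.0
    term = 1.0      # the i = 0 term of the sum
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


chi2_stat = 18.0                  # from the earlier region/partisanship example
df = (3 - 1) * (4 - 1)            # (rows - 1)(columns - 1); 3 x 4 is an assumption
p_value = chi2_upper_tail_even_df(chi2_stat, df)

alpha = 0.05
reject_null = p_value < alpha
print(df, round(p_value, 4), reject_null)  # 6 0.0062 True
```

Under these assumptions the p-value is well below .05, so the null hypothesis of independence would be rejected.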
45. In-Class Exercise
- For some years now, political commentators have cited the importance of a gender gap in explaining election outcomes. What is the source of the gender gap?
- Develop a simple theory and corresponding hypothesis (where gender is the independent variable) which seeks to explain the source of the gender gap.
- Specifically, determine
- Theory
- Null and research hypothesis
- Test statistic for a cross-tabulation to test
your hypothesis