Cross Tabs and Chi-Squared - PowerPoint PPT Presentation

About This Presentation
Title:

Cross Tabs and Chi-Squared

Slides: 34
Provided by: JamesD172
Learn more at: https://www.sjsu.edu

Transcript and Presenter's Notes



1
Cross Tabs and Chi-Squared
  • Testing for a Relationship Between
    Nominal/Ordinal Variables

2
Cross Tabs and Chi-Squared
  • The test you choose depends on level of
    measurement:

    Independent                  Dependent                     Statistical Test
    Dichotomous                  Interval-ratio, Dichotomous   Independent Samples t-test
    Nominal, Dichotomous         Nominal, Dichotomous          Cross Tabs
    Nominal, Dichotomous         Interval-ratio, Dichotomous   ANOVA
    Interval-ratio, Dichotomous  Interval-ratio                Correlation and OLS Regression

3
Cross Tabs and Chi-Squared
  • We are asking whether there is a relationship
    between two nominal (or ordinal) variables; this
    includes dichotomous variables.
  • (Even though one may use cross tabs for ordinal
    variables, it is generally better to treat them
    as interval variables and use more powerful
    statistical techniques whenever you can.)

4
Cross Tabs and Chi-Squared
  • Cross tabs and Chi-Squared will tell you whether
    classification on one nominal or ordinal variable
    is related to classification on a second nominal
    or ordinal variable.
  • For example:
  • Are rural Americans more likely to vote
    Republican in presidential races than urban
    Americans?
  • Classification on region → party vote
  • Are white people more likely to drive SUVs than
    blacks or Hispanics?
  • Race → type of vehicle

5
Cross Tabs and Chi-Squared
  • The statistical focus will be on the number of
    people in a sample who are classified in
    patterned ways on two variables.
  • Why?
  • Means and standard deviations are meaningless for
    nominal variables.

6
Cross Tabs and Chi-Squared
  • The procedure starts with a cross
    classification of the cases in categories of
    each variable.
  • Example: data on male and female support for SJSU
    football from 650 students, put into a matrix

              Yes     No   Maybe   Total
    Female    185    200      65     450
    Male       80     65      55     200
    Total     265    265     120     650
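A cross classification like this can be produced from raw responses by counting each (row category, column category) pair. The sketch below assumes each student's response is stored as a `(sex, answer)` tuple; the function name `cross_classify` and the five sample records are hypothetical, used only for illustration.

```python
from collections import Counter

def cross_classify(records, row_levels, col_levels):
    """Build a cross classification table (a list of rows) from (row, column) pairs."""
    counts = Counter(records)  # maps each (row_level, col_level) pair to its frequency
    # Counter returns 0 for pairs that never occur, so empty cells show up as 0.
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]

# Hypothetical raw data: one (sex, answer) pair per student.
records = [("Female", "Yes"), ("Male", "Maybe"), ("Female", "No"),
           ("Female", "Yes"), ("Male", "No")]
sexes = ["Female", "Male"]
answers = ["Yes", "No", "Maybe"]

table = cross_classify(records, sexes, answers)
for sex, row in zip(sexes, table):
    print(sex, row, "total:", sum(row))
# Female [2, 1, 0] total: 3
# Male [0, 1, 1] total: 2
```

Expanding the SJSU counts back into 650 such records and passing them through the same function reproduces the table above.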

7
Cross Tabs and Chi-Squared
  • In the example, I can see that the campus is
    divided on the issue.
  • But are there associations between sex and
    attitudes?

8
Cross Tabs and Chi-Squared
  • But are there associations between sex and
    attitudes?
  • An easy way to get more information is to convert
    the frequencies to percentages.
  • Example: data on male and female support for SJSU
    football from 650 students, put into a matrix

              Yes        No         Maybe      Total
    Female    185 (41)   200 (44)    65 (14)   450 (99)
    Male       80 (40)    65 (33)    55 (28)   200 (101)
    Total     265 (41)   265 (41)   120 (18)   650 (100)

  • Row percentages are in parentheses; they do not always
    add to 100 because of rounding.
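Converting counts to row percentages takes one division per cell. A minimal sketch that hard-codes the SJSU counts; the helper name `row_percentages` is hypothetical. It prints one decimal place, so some values differ from the whole-number percentages in the table only by rounding.

```python
observed = [
    [185, 200, 65],  # Female: Yes, No, Maybe
    [80, 65, 55],    # Male:   Yes, No, Maybe
]

def row_percentages(table):
    """Express each cell as a percentage of its row total."""
    return [[100 * n / sum(row) for n in row] for row in table]

# Marginal row: column totals as a percentage of the grand total.
column_totals = [sum(col) for col in zip(*observed)]

for label, pcts in zip(["Female", "Male", "Total"],
                       row_percentages(observed + [column_totals])):
    print(label, " ".join(f"{p:.1f}" for p in pcts))
# Female 41.1 44.4 14.4
# Male 40.0 32.5 27.5
# Total 40.8 40.8 18.5
```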

9
Cross Tabs and Chi-Squared
  • We can see that in the sample men are less likely
    to oppose football, but no more likely to say
    yes than women; men are more likely to say
    maybe.

10
Cross Tabs and Chi-Squared
  • Using percentages to describe relationships is
    valid statistical analysis: these are
    descriptive statistics! However, they are not
    inferential statistics.
  • What can we say about the population?
  • Could we have gotten sample statistics like these
    from a population where there is no association
    between sex and attitudes about starting
    football?
  • This is where the Chi-Squared Test of
    Independence comes in handy.

11
Cross Tabs and Chi-Squared
  • The whole idea behind the Chi-Squared test of
    independence is to determine whether the patterns
    of frequencies in your cross classification table
    could have occurred by chance, or whether they
    represent systematic assignment to particular
    cells.
  • For example, were women more likely to answer
    "No" than men, or could the difference in responses
    by sex have occurred because of random sampling
    or chance alone?
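One way to see what "chance alone" means is a permutation test: shuffle the sex labels across the 650 answers, which destroys any real link between sex and attitude while keeping the row and column totals fixed, and count how often the shuffled tables produce a χ² at least as large as the observed one. This is a simulation alternative, not the table-based procedure these slides teach; the function names, the 1,000 shuffles, and the seed are arbitrary choices for illustration.

```python
import random

observed = [[185, 200, 65], [80, 65, 55]]  # rows: Female, Male; columns: Yes, No, Maybe

def chi_square(table):
    """Pearson chi-squared: sum over cells of (fo - fe)^2 / fe."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum((fo - r * c / n) ** 2 / (r * c / n)
               for row, r in zip(table, row_totals)
               for fo, c in zip(row, col_totals))

def permutation_p_value(table, n_perm=1000, seed=1):
    """Share of random relabelings whose chi-squared is >= the observed one."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    # Expand the table back into one (row, column) label pair per case.
    row_labels = [i for i in range(n_rows) for j in range(n_cols) for _ in range(table[i][j])]
    col_labels = [j for i in range(n_rows) for j in range(n_cols) for _ in range(table[i][j])]
    observed_stat = chi_square(table)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(row_labels)  # reassign sex at random; row and column totals stay the same
        shuffled = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            shuffled[i][j] += 1
        if chi_square(shuffled) >= observed_stat:
            hits += 1
    return hits / n_perm

print(permutation_p_value(observed))  # a tiny share means chance alone rarely produces this pattern
```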

12
Cross Tabs and Chi-Squared
  • A number called Chi-Squared, χ², tells us whether
    the numbers in our sample deviate from what would
    be expected by chance.
  • Its formula:
    $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
    where fo = observed frequency in each cell and fe = expected
    frequency in each cell, summed over all cells
  • A bigger χ² will result as our sample data
    deviate more and more from what would be
    expected by chance.
  • A big χ² will imply that there is a relationship
    between our two nominal variables.
13
Cross Tabs and Chi-Squared
  • Calculating χ² begins with the concept of a
    deviation of observed data from what is expected
    by chance alone.
  • Deviation in χ² = observed frequency − expected
    frequency
  • Observed frequency is just the number of cases in
    each cell of the cross classification table. For
    example, 185 women said yes, they support
    football at SJSU. 185 is the observed frequency.
  • Expected frequency is the number of cases that
    would be in a cell of the cross classification
    table if people in each group of one variable had
    the same propensity to give each answer on the
    second variable, that is, if the two variables
    were unrelated.

14
Cross Tabs and Chi-Squared
  • Expected frequency (if our variables were
    unrelated):
  • Since females make up 69.2% of the sample (450/650),
    we'd expect 69.2% of the Yes answers to come from
    females, 69.2% of the No answers to come from
    females, and 69.2% of the Maybe answers to come
    from females. On the other hand, 30.8% of the
    Yes, No, and Maybe answers should come from
    men.
  • Therefore, to calculate the expected frequency for
    each cell, you do this:
    $f_e = \frac{\text{cell's row total}}{\text{table total}} \times \text{cell's column total}$
    or, equivalently,
    $f_e = \frac{\text{cell's column total}}{\text{table total}} \times \text{cell's row total}$
  • The idea is that you find the percentage of persons
    in one category of the first variable, and
    expect to find that same percentage in each of the
    other variable's categories.
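The expected-frequency rule translates directly into code. A minimal sketch using the SJSU counts; `expected_frequencies` is a hypothetical helper name. It keeps full precision internally and rounds only for display, so it agrees with the one-decimal values used in the hand calculation.

```python
observed = [[185, 200, 65], [80, 65, 55]]  # rows: Female, Male; columns: Yes, No, Maybe

def expected_frequencies(table):
    """fe for each cell = (cell's row total / table total) * cell's column total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r / n * c for c in col_totals] for r in row_totals]

for label, row in zip(["Female", "Male"], expected_frequencies(observed)):
    print(label, [round(fe, 1) for fe in row])
# Female [183.5, 183.5, 83.1]
# Male [81.5, 81.5, 36.9]
```

Each row of expected frequencies adds up to the same total as the observed row (450 for females, 200 for males).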

15
Cross Tabs and Chi-Squared
  • Now you know how to calculate the expected
    frequency (and the observed frequency is
    obvious). Numbering the cells 1 to 3 for females
    (Yes, No, Maybe) and 4 to 6 for males:
  • fe1 = (450/650) × 265 = 183.5     fe4 = (200/650) × 265 = 81.5
  • fe2 = (450/650) × 265 = 183.5     fe5 = (200/650) × 265 = 81.5
  • fe3 = (450/650) × 120 = 83.1      fe6 = (200/650) × 120 = 36.9
  • You already saw how to calculate the deviations
    too: Dc = fo − fe
  • D1 = 185 − 183.5 = 1.5      D4 = 80 − 81.5 = −1.5
  • D2 = 200 − 183.5 = 16.5     D5 = 65 − 81.5 = −16.5
  • D3 = 65 − 83.1 = −18.1      D6 = 55 − 36.9 = 18.1

16
Cross Tabs and Chi-Squared
  • Now, we want to add up the deviations.
  • What would happen if we added these deviations
    together? They would sum to zero
    (1.5 + 16.5 − 18.1 − 1.5 − 16.5 + 18.1 = 0), because
    the positive and negative deviations cancel out.
  • To get rid of negative deviations, we square each
    one (like in computing standard deviations).
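Why squaring is needed can be checked numerically: within each row the expected frequencies add up to the same row total as the observed ones, so the deviations cancel and always sum to zero. A small sketch with the SJSU counts; the helper name `deviations` is hypothetical.

```python
observed = [[185, 200, 65], [80, 65, 55]]  # rows: Female, Male; columns: Yes, No, Maybe

def deviations(table):
    """D = fo - fe for every cell, with fe = row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[fo - r * c / n for fo, c in zip(row, col_totals)]
            for row, r in zip(table, row_totals)]

devs = deviations(observed)
for label, row in zip(["Female", "Male"], devs):
    print(label, [round(d, 1) for d in row])
# Female [1.5, 16.5, -18.1]
# Male [-1.5, -16.5, 18.1]

total = sum(sum(row) for row in devs)
print("deviations sum to zero:", abs(total) < 1e-9)  # True, up to floating-point error
```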

17
Cross Tabs and Chi-Squared
  • To get rid of negative deviations, we square each
    one (like in standard deviations).
  • (D1)² = (1.5)² = 2.25       (D4)² = (−1.5)² = 2.25
  • (D2)² = (16.5)² = 272.25    (D5)² = (−16.5)² = 272.25
  • (D3)² = (−18.1)² = 327.61   (D6)² = (18.1)² = 327.61

18
Cross Tabs and Chi-Squared
  • Just how large is each of these squared
    deviations?
  • The next step is to give the deviations a
    metric: each squared deviation is compared
    to what was expected. In other words, we
    divide it by the expected frequency for its cell.
  • You've already calculated what was expected in
    each cell (fe1 through fe6).
  • Relative squared deviations: small values indicate
    little deviation from what was expected, while
    larger values indicate much deviation from what
    was expected.
  • (D1)² / fe1 = 2.25 / 183.5 = 0.012      (D4)² / fe4 = 2.25 / 81.5 = 0.028
  • (D2)² / fe2 = 272.25 / 183.5 = 1.484    (D5)² / fe5 = 272.25 / 81.5 = 3.340
  • (D3)² / fe3 = 327.61 / 83.1 = 3.942     (D6)² / fe6 = 327.61 / 36.9 = 8.878
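Each cell's relative squared deviation, (fo − fe)² / fe, is its contribution to χ². The sketch below computes all six with no intermediate rounding; because the hand calculation rounds fe to one decimal first, these values differ slightly from the ones above in the second or third decimal place (about 0.013 rather than 0.012 for the first cell, for example). The helper name is hypothetical.

```python
observed = [[185, 200, 65], [80, 65, 55]]  # rows: Female, Male; columns: Yes, No, Maybe

def chi_square_contributions(table):
    """Return (fo - fe)^2 / fe for each cell, i.e. the squared deviation relative to fe."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for row, r in zip(table, row_totals):
        cells = []
        for fo, c in zip(row, col_totals):
            fe = r * c / n  # expected frequency for this cell
            cells.append((fo - fe) ** 2 / fe)
        result.append(cells)
    return result

contributions = chi_square_contributions(observed)
for label, row in zip(["Female", "Male"], contributions):
    print(label, [f"{x:.3f}" for x in row])
```

As in the hand calculation, the male "Maybe" cell contributes the most.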

19
Cross Tabs and Chi-Squared
  • The next step is to add up the relative squared
    deviations:
  • Sum of relative squared deviations = 0.012 + 1.484 +
    3.942 + 0.028 + 3.340 + 8.878 = 17.684
  • This number is what we call Chi-Squared, or
    χ².
  • So, of what good is knowing this number?
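Summing the contributions gives χ². In this sketch the optional `fe_digits` argument (a hypothetical parameter, added only for comparison) rounds each expected frequency before use, mimicking the hand calculation. Without it the statistic is about 17.67; with `fe_digits=1` it matches the hand result to two decimals, which suggests the small gap comes from rounding rather than from a different formula.

```python
observed = [[185, 200, 65], [80, 65, 55]]  # rows: Female, Male; columns: Yes, No, Maybe

def chi_square(table, fe_digits=None):
    """Pearson chi-squared, the sum of (fo - fe)^2 / fe over all cells.

    fe_digits (hypothetical option): if given, round each expected frequency
    to that many decimals first, as a hand calculation would.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for row, r in zip(table, row_totals):
        for fo, c in zip(row, col_totals):
            fe = r * c / n
            if fe_digits is not None:
                fe = round(fe, fe_digits)
            total += (fo - fe) ** 2 / fe
    return total

print(round(chi_square(observed), 2))               # 17.67  (no intermediate rounding)
print(round(chi_square(observed, fe_digits=1), 2))  # 17.68  (fe rounded to one decimal)
```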
20
Cross Tabs and Chi-Squared
  • If the two variables were unrelated, the values of
    χ² from repeated sampling would form an
    identifiable shape (a sampling distribution).
  • That shape depends only on the number of rows and
    columns, which we technically express as the
    degrees of freedom.
  • For χ², $df = (\text{rows} - 1)(\text{columns} - 1)$

21
Cross Tabs and Chi-Squared
  • χ² distributions

[Figure: χ² distribution curves for df = 1, 5, 10, and 20]
FYI: This should remind you of the normal
distribution, except that the χ² distribution changes
shape depending on the degrees of freedom, that is,
on the number of rows and columns in your table.
22
Cross Tabs and Chi-Squared
Think of the Power!!!!
  • We can use the known properties of the χ²
    distribution to identify the probability that we
    would get our sample's χ² if our variables were
    unrelated!
  • This is exciting!
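The probability described here is the upper-tail area of the χ² distribution beyond the sample's χ². Statistical software or a table normally supplies it; as a self-contained sketch, the code below computes it with only Python's standard library, using the power series for the regularized lower incomplete gamma function (a χ² variable with df degrees of freedom has CDF P(df/2, x/2)). The name `chi2_sf` is an assumption, and the series is only meant for moderate values of x.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with df degrees of freedom.

    Computes 1 - P(df/2, x/2), where P is the regularized lower incomplete gamma
    function, summed from its power series. Intended for moderate x (below a few
    hundred); very large x can overflow, and tiny p-values lose precision.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of the series sum z**n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

p = chi2_sf(17.67, 2)  # the football example: chi-squared of about 17.67 with df = 2
print(f"p = {p:.6f}")
```

For the football data the probability is about 0.00015: a χ² this large would be very unlikely if sex and attitude were unrelated.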

23
Cross Tabs and Chi-Squared
  • If our χ² in a particular analysis were in the
    shaded area or beyond, what could we say about
    the population given our sample?

[Figure: χ² distribution with the upper 5% of χ² values shaded]
24
Cross Tabs and Chi-Squared
  • Answer: We'd reject the null hypothesis, saying that
    it is highly unlikely that we could get such a large
    chi-squared value from a population where the two
    variables are unrelated.
25
Cross Tabs and Chi-Squared
  • So, what is the critical χ² value?
26
Cross Tabs and Chi-Squared
  • That depends on the particular problem because
    the distribution changes depending on the number
    of rows and columns.

[Figure: χ² distributions for df = 1, 5, 10, and 20, with the critical χ² value marked on each]
27
Cross Tabs and Chi-Squared
  • According to Table C, with α-level .05:
  • df = 1, critical χ² = 3.84
  • df = 5, critical χ² = 11.07
  • df = 10, critical χ² = 18.31
  • df = 20, critical χ² = 31.41
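Table C's critical values can be reproduced by searching for the χ² value whose upper-tail area equals α. This sketch uses a standard-library tail function (a hand-written helper, not a library API) and finds the cutoff by bisection; both function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """P(X >= x) for chi-squared with df degrees of freedom (incomplete-gamma power series)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return max(0.0, 1.0 - math.exp(a * math.log(z) - z - math.lgamma(a)) * total)

def critical_chi2(df, alpha=0.05):
    """Find x with P(X >= x) = alpha by bisection; the tail area falls as x grows."""
    lo, hi = 0.0, max(10.0, float(df))
    while chi2_sf(hi, df) > alpha:  # widen the bracket until hi is past the cutoff
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid  # too much area beyond mid: the cutoff is higher
        else:
            hi = mid
    return (lo + hi) / 2

for df in (1, 2, 5, 10, 20):
    print(df, round(critical_chi2(df), 2))
# 1 3.84
# 2 5.99
# 5 11.07
# 10 18.31
# 20 31.41
```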
28
Cross Tabs and Chi-Squared
  • In our football problem above, we had a
    chi-squared of 17.68 in a cross classification
    table with 2 rows and 3 columns.
  • Our chi-squared distribution for that table would
    have df = (2 − 1)(3 − 1) = 2.
  • According to Table C, with α-level .05, the
    critical chi-squared is 5.99.
  • Since 17.68 > 5.99, we reject the null hypothesis.
  • We reject the claim that our sample could have come
    from a population where sex was not related to
    attitudes toward football.
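The decision on this slide can be checked in a few lines. The sketch hard-codes the critical value 5.99 from Table C (α = .05, df = 2) rather than computing it, and uses the fact that for df = 2 the χ² upper-tail probability has the closed form e^(−x/2), which gives the p-value directly. Without rounding along the way, the statistic comes out at about 17.67 rather than 17.68; the decision is the same.

```python
import math

observed = [[185, 200, 65], [80, 65, 55]]  # rows: Female, Male; columns: Yes, No, Maybe

def chi_square(table):
    """Pearson chi-squared: sum over cells of (fo - fe)^2 / fe."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum((fo - r * c / n) ** 2 / (r * c / n)
               for row, r in zip(table, row_totals)
               for fo, c in zip(row, col_totals))

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2 - 1)(3 - 1) = 2
critical = 5.99                                     # Table C, alpha = .05, df = 2
stat = chi_square(observed)
decision = "reject H0" if stat > critical else "fail to reject H0"
p_value = math.exp(-stat / 2)  # exact upper-tail area, valid only when df == 2
print(df, round(stat, 2), decision)  # 2 17.67 reject H0
```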

29
Cross Tabs and Chi-Squared
  • Now let's get formal.
  • 7 steps to the Chi-squared test of independence:
  • Set the α-level (e.g., .05)
  • Find the critical χ² (depends on df and α-level)
  • State the null and alternative hypotheses
    • H0: The two nominal variables are independent
    • Ha: The two variables are dependent on each
      other
  • Collect data
  • Calculate χ²: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
  • Make a decision about the null hypothesis
  • Report the p-value
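Once the data are collected, the remaining steps can be bundled into one function. This is a sketch under stated assumptions: the function and key names are hypothetical, the p-value comes from a standard-library series for the χ² upper tail (adequate for moderate χ² values), and the decision uses the equivalent rule "reject H0 when p ≤ α" instead of looking up the critical value in Table C. It also reports the smallest expected frequency so the expected-count condition can be checked.

```python
import math

def chi2_sf(x, df):
    """P(X >= x) for chi-squared with df degrees of freedom (incomplete-gamma power series)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return max(0.0, 1.0 - math.exp(a * math.log(z) - z - math.lgamma(a)) * total)

def chi_square_test(table, alpha=0.05):
    """Test H0 (the two variables are independent) against Ha (they are dependent)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(table, expected)
               for fo, fe in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    p = chi2_sf(stat, df)
    return {
        "chi2": stat,
        "df": df,
        "p_value": p,
        "reject_H0": p <= alpha,  # same decision as comparing chi2 with the critical value
        "min_expected": min(min(row) for row in expected),
    }

result = chi_square_test([[185, 200, 65], [80, 65, 55]])
print(result)
```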

30
Cross Tabs and Chi-Squared
  • Afterwards, what have you found?
  • If Chi-Squared is not significant, you fail to reject
    the null: the sample gives no evidence that your
    variables are related.
  • If Chi-Squared is significant, your variables are
    related.
  • That's all!
  • Chi-Squared cannot tell you anything about the
    strength or direction of association. For purely
    nominal variables, there is no direction of
    association.
  • Chi-Squared is a large-sample test. If dealing
    with small samples, look up appropriate tests. (A
    condition of the test: no cell should have an
    expected frequency lower than 5.)
  • The larger the sample size, the easier it is for
    Chi-Squared to be significant.
  • For a 2 × 2 table, Chi-Squared gives essentially the
    same result as an Independent Samples t-test for
    proportions and as ANOVA.
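Two points on this slide can be checked directly: the expected-frequency condition and the effect of sample size. The sketch below (helper names hypothetical) checks that every fe is at least 5, then doubles every count, which leaves all the percentages unchanged but doubles χ², because each term (fo − fe)² / fe grows in proportion to the counts. The small 2 × 2 table is a made-up example of a sample too small for the test.

```python
observed = [[185, 200, 65], [80, 65, 55]]  # rows: Female, Male; columns: Yes, No, Maybe

def expected_frequencies(table):
    """fe = row total * column total / table total, for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    """Pearson chi-squared: sum over cells of (fo - fe)^2 / fe."""
    return sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(table, expected_frequencies(table))
               for fo, fe in zip(obs_row, exp_row))

def expected_counts_ok(table, minimum=5):
    """The slide's condition: no expected frequency lower than `minimum`."""
    return all(fe >= minimum for row in expected_frequencies(table) for fe in row)

print(expected_counts_ok(observed))          # True: the smallest fe is about 36.9
print(expected_counts_ok([[3, 2], [1, 4]]))  # False: hypothetical tiny table, fe of 2 and 3

doubled = [[2 * n for n in row] for row in observed]  # same percentages, twice the sample
print(round(chi_square(doubled) / chi_square(observed), 6))  # 2.0
```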

31
Cross Tabs and Chi-Squared
  • If you want to know how your data depart from
    independence, you may:
  • Check percentages (conditional distributions) in
    your cross classification table.
  • Do a residual analysis:
  • The difference between the observed and expected
    counts in a cell, divided by a standard error for
    that difference, behaves like a z test statistic.
  • That s.e. is
    $s.e. = \sqrt{f_e (1 - \text{cell's row proportion})(1 - \text{cell's column proportion})}$
  • $Z = \frac{f_o - f_e}{s.e.}$

32
Cross Tabs and Chi-Squared
  • Residual Analysis
  • Let's do cell 5 (males who said No): fo = 65, fe = 81.5,
    row proportion = 200/650 = .308, column proportion = 265/650 = .408
  • s.e. = √(81.5 × (1 − .308) × (1 − .408)) = √(81.5 × .692 × .592) = 5.78
  • Z = (65 − 81.5) / 5.78 = −2.85
  • Since |−2.85| > 1.96, there is a significant
    difference in cell 5.
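The residual analysis can be run for every cell at once. This sketch applies the standard-error formula from the previous slide to all six cells; the helper name is hypothetical. Because it does not round fe or the proportions, cell 5 (males, No) comes out at about −2.86 rather than the −2.85 computed by hand. Cells with |Z| > 1.96 depart significantly from independence at the .05 level.

```python
import math

observed = [[185, 200, 65], [80, 65, 55]]  # rows: Female, Male; columns: Yes, No, Maybe

def adjusted_residuals(table):
    """Z = (fo - fe) / sqrt(fe * (1 - row proportion) * (1 - column proportion)) per cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    z = []
    for row, r in zip(table, row_totals):
        z_row = []
        for fo, c in zip(row, col_totals):
            fe = r * c / n
            se = math.sqrt(fe * (1 - r / n) * (1 - c / n))
            z_row.append((fo - fe) / se)
        z.append(z_row)
    return z

z = adjusted_residuals(observed)
for label, row in zip(["Female", "Male"], z):
    # Pair each Z with a flag for |Z| > 1.96, the two-tailed .05 cutoff.
    print(label, [(round(v, 2), abs(v) > 1.96) for v in row])
```

With only two rows, the female and male residuals in each column are equal in size and opposite in sign.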

33
Cross Tabs and Chi-Squared
  • Further topics you could explore
  • Strength of Association
  • Discussing outcomes in terms of difference of
    proportions
  • Reporting Odds Ratios (likelihood of a group
    giving one answer versus other answers or the
    group giving an answer relative to other groups
    giving that answer)
  • Strength and Direction of Association for Ordinal
    Variables
  • Gamma (an inferential statistic, so check for
    significance)
  • Ranges from -1 to 1
  • Sign indicates direction of relationship
  • Magnitude indicates strength of relationship
  • Chi-squared and Gamma can disagree when there is
    a nonrandom pattern that has no direction.
    Chi-squared will catch it; gamma won't.
  • Kendall's tau-b
  • Somers' d
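Gamma compares concordant pairs (cases ordered the same way on both variables) with discordant pairs (ordered in opposite ways): γ = (C − D) / (C + D). The sketch below assumes both the rows and the columns are listed in ordinal order. The function names are hypothetical, and the 2 × 3 table is a made-up U-shaped pattern illustrating the last point above: χ² is 40 with df = 2, far beyond the critical 5.99, yet gamma is 0.

```python
def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D); rows and columns must both be in ordinal order.

    Raises ZeroDivisionError if the table has no concordant or discordant pairs.
    """
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):  # cases ranked higher on the row variable
                for m in range(n_cols):
                    if m > j:  # also higher on the column variable
                        concordant += table[i][j] * table[k][m]
                    elif m < j:  # lower on the column variable
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)

def chi_square(table):
    """Pearson chi-squared: sum over cells of (fo - fe)^2 / fe."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum((fo - r * c / n) ** 2 / (r * c / n)
               for row, r in zip(table, row_totals)
               for fo, c in zip(row, col_totals))

# Hypothetical U-shaped pattern: the low row group answers at both extremes,
# the high row group in the middle. Strong association, but no direction.
u_shaped = [[10, 0, 10],
            [0, 20, 0]]
print(goodman_kruskal_gamma(u_shaped), chi_square(u_shaped))  # 0.0 40.0
```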