Categorical Data and Chi Square - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Categorical Data and Chi Square

1
Chapter 6
  • Categorical Data and Chi Square

2
A Quick Look Back
  • Reminder about hypothesis testing
  • 1) Assume what you believe (H1) is wrong.
  • 2) Construct H0 and accept it as the default.

3
Z-Test
  • Use when we have acquired some data set, then
    want to ask questions concerning the probability
    of certain specific data values (e.g., do certain
    values seem extreme?).
  • In this case, the distribution associated with H0
    is described by the sample mean (X̄) and variance
    (s²), because the data points reflect a continuous
    variable that is normally distributed.

4
Chi-Square (χ²) Test
  • The chi-square test is a general-purpose test for
    use with discrete variables.
  • It has a number of uses, including detecting
    bizarre outcomes given some a priori probability,
    in both binomial and multinomial situations.

5
Chi-Square (χ²) Test
  • In addition, it allows us to go beyond questions
    of bizarreness and ask whether pairs of variables
    are related (for example, the gender and opinion
    variables considered later).
  • It does so by mapping the discrete variables onto
    a continuous distribution assumed under H0: the
    chi-square distribution.

6
The Chi-Square Distribution
  • Let's reconsider a simple binomial problem. Say we
    have a batter who hits .300 (i.e., P(Hit) = 0.30),
    and we want to know whether it is abnormal for him
    to go 6 for 10 (i.e., 6 hits in 10 at bats).
  • We could do this using the binomial stuff that I
    did not cover in Chapter 5 (and for which you are
    not responsible)
  • But we can also do it with a chi-square test

7
The Chi-Square Way
  • We can put our values into a contingency table as
    follows
  • Then consider the distribution of the following
    formula given H0
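
The formula itself is not reproduced in this transcript (it appeared as an
image on the slide); presumably it is the standard Pearson chi-square
statistic, which compares observed (O) and expected (E) frequencies:

    \chi^2 = \sum \frac{(O - E)^2}{E}

For the baseball example, the expected counts under H0 are Np = 10(0.30) = 3
hits and 7 non-hits, so with 6 hits and 4 non-hits observed,
χ² = (6 - 3)²/3 + (4 - 7)²/7 = 3 + 9/7 ≈ 4.29, the value quoted on a later
slide.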

8
The Chi-Square Distribution
  • In-Class Example

9
The Chi-Square Distribution
  • In-Class Example
  • Note that while the observed values are discrete,
    the derived score is continuous.
  • If we calculated enough of these derived scores,
    we could plot a frequency distribution, which
    would be a chi-square distribution with 1 degree
    of freedom, or χ²(1).
  • Given this distribution and appropriate tables,
    we can then find the probability associated with
    any particular χ² value.

10
The Chi-Square Distribution
  • Continuing the Baseball Example
  • So if the probability of obtaining a χ² of 4.29
    or greater is less than α, then the observed
    outcome can be considered bizarre (i.e., the
    result of something other than a .300 hitter
    getting lucky).

11
The χ² Table and Degrees of Freedom
  • There is one hitch to using the chi-square
    distribution when testing hypotheses: the
    chi-square distribution is different for
    different numbers of degrees of freedom (df).
  • This means that in order to provide the areas
    associated with all values of χ² for every number
    of df, we would need a complete table like the
    z-table for each level of df.

12
The χ² Table and Degrees of Freedom
  • Instead of doing that, the table only shows
    critical values, as Steve will now illustrate
    using the funky new overhead thingy.
  • Our example question has 1 df. Assuming we are
    using an α level of .05, the critical χ² value
    for rejecting the null is 3.84.
  • Thus, since our obtained χ² value of 4.29 is
    greater than 3.84, we can reject H0 and conclude
    that hitting 6 of 10 reflects more than just
    chance performance.
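
As a check on these numbers, here is a minimal SciPy sketch (not part of the
original slides); it reproduces the obtained χ² of roughly 4.29, the critical
value of 3.84, and a p-value just under .05:

    # Baseball example: observed 6 hits and 4 non-hits in 10 at bats,
    # expected 10 * 0.30 = 3 hits and 7 non-hits under H0.
    from scipy import stats

    chi2_obtained, p_value = stats.chisquare(f_obs=[6, 4], f_exp=[3, 7])
    chi2_critical = stats.chi2.ppf(1 - 0.05, df=1)

    print(chi2_obtained)   # ~4.29
    print(chi2_critical)   # ~3.84
    print(p_value)         # ~0.038, less than alpha = .05, so reject H0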

13
The χ² Table and Degrees of Freedom
  • Going a Step Further
  • Suppose we complicate the previous example by
    taking walks and hit by pitches into account.
    That is, suppose the average batter gets a hit
    with a probability of 0.28, gets walked with a
    probability of .08, gets hit by a pitch (HBP)
    with a probability of .02, and gets out the rest
    of the time.

14
The χ² Table and Degrees of Freedom
  • Now we ask, can you reject H0 (that this batter is
    typical of the average batter) given the following
    outcomes from 50 at bats? (A sketch of the four
    steps appears below.)
  • 1) Calculate the expected values (Np).
  • 2) Calculate χ² obtained.
  • 3) Figure out the appropriate df (C - 1).
  • 4) Find χ² critical and compare χ² obtained to it.
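
Since the table of outcomes from the 50 at bats is not reproduced in this
transcript, the sketch below uses hypothetical observed counts purely to
illustrate the four steps:

    from scipy import stats

    probs = [0.28, 0.08, 0.02, 0.62]           # hit, walk, HBP, out under H0
    n = 50
    expected = [n * p for p in probs]          # step 1: Np = [14, 4, 1, 31]

    observed = [20, 6, 1, 23]                  # hypothetical counts (sum to 50)

    chi2_obtained, p_value = stats.chisquare(f_obs=observed,
                                             f_exp=expected)   # step 2
    df = len(probs) - 1                        # step 3: C - 1 = 3
    chi2_critical = stats.chi2.ppf(0.95, df)   # step 4: ~7.81 at alpha = .05

    print(chi2_obtained > chi2_critical)       # reject H0 only if this is True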

15
Using Chi-Square to Test Independence
  • So far, all the tests have been to assess whether
    some observation or set of observations seems
    out-of-line with some expected distribution.
  • However, the logic of the chi-square test can be
    extended to examine the issue of whether two
    variables are independent (i.e., not
    systematically related) or dependent (i.e.,
    systematically related).

16
Using Chi-Square to Test Independence
  • Consider the following data set again
  • Are the variables of gender and opinion
    concerning the legalization of marijuana
    independent?

17
Using Chi-Square to Test Independence
18
Using Chi-Square to Test Independence
  • If these two variables are independent, then by
    the multiplicative law, we expect that

19
Using Chi-Square to Test Independence
  • If we do this for all four cells, we get

20
Using Chi-Square to Test Independence
  • Are the observed values different enough from the
    expected values to reject the notion that the
    differences are due to chance variation?

21
Degrees of Freedom for Two-Variable Contingency
Tables
  • The df associated with two-variable contingency
    tables can be calculated using the formula
    df = (C - 1)(R - 1), where C is the number of
    columns and R is the number of rows.
  • This gives the seemingly odd result that a 2x2
    table has 1 df, just like the simple binomial
    version of the chi-square test.

22
Degrees of Freedom for Two-Variable Contingency
Tables
  • However, as Steve will now show, this actually
    makes sense.
  • Thus, to finish our previous example, the χ²
    critical with α equal to .05 and 1 df is 3.84.
    Since our obtained χ² is bigger than that (i.e.,
    6.04), we can reject H0 and conclude that opinions
    concerning the legalization of marijuana appear to
    differ across the males and females of our sample.
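
The gender-by-opinion counts themselves are not reproduced in this transcript,
so the sketch below uses hypothetical counts; it simply shows how the expected
values, df, and obtained χ² for a contingency table can be computed with SciPy
(correction=False asks for the uncorrected Pearson χ², i.e., the plain
Σ(O - E)²/E statistic):

    from scipy import stats

    #        For  Against    (hypothetical counts, not the slide's data)
    table = [[35, 15],       # male
             [25, 25]]       # female

    chi2_obtained, p_value, df, expected = stats.chi2_contingency(
        table, correction=False)

    print(df)              # (C - 1)(R - 1) = 1 for a 2x2 table
    print(expected)        # row total * column total / N for each cell
    print(chi2_obtained)   # compare against the critical value of 3.84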

23
Assumptions of Chi-Square
  • Independence of observations
  • Chi-square analyses are only valid when the
    actual observations within the cells are
    independent.
  • This independence of observations is different
    from the question of whether the variables are
    independent; the latter is what the chi-square
    test assesses.

24
Assumptions of Chi-Square
  • Independence of observations
  • You know your observations are not independent
    when the grand total is larger than the number of
    subjects.
  • Example: The activity level of 5 rats was tested
    over 4 days, producing these values

25
Assumptions of Chi-Square
  • Normality
  • Use of the chi-square distribution for finding
    critical values assumes that the expected values
    (i.e., Np) are normally distributed.
  • This assumption breaks down when the expected
    values are small (specifically, the distribution
    of Np becomes more and more positively skewed as
    Np gets small).

26
Assumptions of Chi-Square
  • Normality
  • Thus, one should be cautious using the chi-square
    test when the expected values are small.
  • How small? This is debatable, but if expected
    values are as low as 5, you should be worried.
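
A minimal way to act on this rule of thumb, using the Np values from the
earlier 50-at-bat sketch:

    # Expected counts (Np) from the hit/walk/HBP/out example:
    # 50 * [.28, .08, .02, .62]
    expected = [14, 4, 1, 31]
    if min(expected) <= 5:
        print("Some expected counts are 5 or below; "
              "treat the chi-square approximation with caution.")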

27
Assumptions of Chi-Square
  • Inclusion of Non-Occurrences
  • The chi-square test assumes that all outcomes
    (occurrences and non-occurrences) are considered
    in the contingency table.
  • As an example of a failure to include a
    non-occurrence, see page 142 of the text.

28
A Tale of Tails
  • We only reject H0 when the obtained χ² is larger
    than the critical χ².
  • This suggests that the χ² test is always
    one-tailed and, in terms of the rejection region,
    it is.
  • In a different sense, however, the test is
    actually multiple-tailed.

29
A Tale of Tails
  • Reconsider the following marking scheme
    example
  • If we do not specify how we expect the results to
    fall out, then any outcome with a high enough
    obtained χ² can be used to reject H0.
  • However, if we specify our outcome in advance, we
    are allowed to increase our α; in the example, we
    could increase α to 0.30 if we had specified in
    advance the exact ordering that was observed.

30
Measures of Association
  • The chi-square test only tells us whether two
    variables are independent; it does not say
    anything about the magnitude of the dependency if
    one is found to exist.
  • Stealing from the book, consider the following
    two cases, both of which produce a significant
    obtained χ² but which imply different strengths
    of relation

31
Measures of Association
32
Cramér's Phi (φc) - A Measure of Association
  • There are a number of ways to quantify the
    strength of a relation (see the sections in the
    text on the contingency coefficient, Phi, and
    Odds Ratios), but the two most relevant to
    psychologists are Cramér's Phi and Kappa.

33
Cramér's Phi (φc) - A Measure of Association
  • Cramér's Phi can be used with any contingency
    table and is calculated as shown below.
  • Values of φc range from 0 to 1. The φc values for
    the tables on the previous page are 0.12 and 0.60,
    respectively, indicating a much stronger relation
    in the second example.
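
The formula appeared as an image on the slide; presumably it is the standard
definition of Cramér's phi (also known as Cramér's V):

    \phi_c = \sqrt{\frac{\chi^2}{N(k - 1)}}

where N is the total number of observations and k is the smaller of the
number of rows and the number of columns.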

34
Kappa (κ) - A Measure of Agreement
  • Often, in psychology, we will ask some judge to
    categorize things into specific categories.
  • For example, imagine a beer brewing competition
    where we asked a judge to categorize beers as
    Yucky, OK, or Yummy.
  • Obviously, we are eventually interested in
    knowing something about the beers after they are
    categorized.

35
Kappa (κ) - A Measure of Agreement
  • However, one issue that arises is the judge's
    ability to tell the difference between the beers.
  • One way around this is to get two judges and show
    that a given beer is reliably rated across the
    judges (i.e., that both judges tend to categorize
    things in a similar way).

36
Kappa (κ) - A Measure of Agreement
  • Such a finding would suggest that the judges are
    sensitive to some underlying quality of the beers
    as opposed to just guessing.

37
Kappa (κ) - A Measure of Agreement
  • Note that if you just looked at the proportion of
    decisions that Judge 2 and I agreed on, it looks
    like we are doing OK

38
Kappa (κ) - A Measure of Agreement
  • There is a problem here, however, because both
    judges are biased toward judging a beer as OK.
    Even if they were just guessing, the agreement
    would seem high, because both would guess OK on
    many trials and would therefore agree often.
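
To make the chance-agreement problem concrete, here is a small sketch using a
hypothetical 3x3 ratings table (not the slide's data). It computes the
observed agreement, the agreement expected by chance from the judges' marginal
tendencies, and Cohen's kappa, κ = (P_O - P_E) / (1 - P_E), which is
presumably the statistic the slides go on to define:

    import numpy as np

    # Rows = Judge 1, columns = Judge 2; categories = Yucky, OK, Yummy.
    # Both judges lean heavily toward "OK".
    table = np.array([[1,  2, 0],
                      [1, 14, 2],
                      [0,  2, 2]])
    n = table.sum()

    p_observed = np.trace(table) / n      # proportion of agreements
    row_p = table.sum(axis=1) / n         # Judge 1 marginal proportions
    col_p = table.sum(axis=0) / n         # Judge 2 marginal proportions
    p_chance = np.sum(row_p * col_p)      # agreement expected by guessing alone

    kappa = (p_observed - p_chance) / (1 - p_chance)

    print(round(p_observed, 2))  # ~0.71: raw agreement looks respectable
    print(round(kappa, 2))       # ~0.32: much weaker once chance is removed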