Chi-Square Tests

Provided by: MarkB9

1
Chapter 20
  • Chi-Square Tests

2
What is the chi-square test?
  • The chi-square test is like the binomial test
    except that events do not have to be dichotomous.
  • However, they do have to break down into distinct
    categories.
  • For example, possible events might be A, B, C or
  • A, B, C, D or
  • Any finite number of distinct outcomes.

3
Example
  • Recall that, in the chapter on the binomial
    distribution, we wanted to analyze dichotomous
    events.
  • However, many times there are more than two
    outcomes.
  • Suppose you are planning a large party and you
    want to know what soft drinks to get.
  • You scour the web for statistics on soft drink
    consumption and find the following data

4
Example
  • The data only covers up to 2003 but you figure
    things haven't changed that much since then.
  • You decide to use the old data.
  • You have another problem.
  • There are a large number of soda products and you
    can't buy them all.

5
Example
  • So you decide to buy the appropriate amounts of
    the top two.
  • Everyone else, you serve beer.
  • Your party purchases break down as follows

6
Example
  • 48 people are coming to your party.
  • You figure out how many guests are in each
    category as follows
  • Expected Coke Drinkers = expected X = 48 × .18 =
    8.64
  • Expected Pepsi Drinkers = expected Y = 48 × .12 =
    5.76
  • Expected To Drink Beer = expected Z = 48 × .70 =
    33.6
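
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the shares (.18, .12, .70) are the ones quoted on the slide.

```python
# Market shares as used on the slide (Coke .18, Pepsi .12, everyone else .70).
GUESTS = 48
shares = {"Coke": 0.18, "Pepsi": 0.12, "Beer": 0.70}

# Expected count for each category = number of guests * market share.
expected = {drink: GUESTS * share for drink, share in shares.items()}

for drink, count in expected.items():
    print(f"{drink}: {count:.2f}")

# The expected counts should add back up to the number of guests.
total_expected = sum(expected.values())
```

Expected counts do not need to be whole numbers; they are long-run averages, not head counts.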

7
Example
  • The day of your party finally comes and you get
    the requests for beverages as shown in the table
    below.
  • Being obsessed with statistics, you wonder if the
    market shares of the various soft drinks have
    changed since 2003 or if the differences between
    predicted and actual are simply due to sampling
    error.

8
A test statistic for the multinomial case
  • We need a test statistic that can handle any
    number of events.
  • But first some notation
  • χ² is our test statistic
  • f_ei will be the expected number of events of type
    i.
  • f_oi will be the observed number of events of type
    i.
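
With this notation, the statistic is presumably the standard Pearson form, summed over the k event types. The symbol k is introduced here for illustration and does not appear on the slide.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(f_{oi} - f_{ei})^2}{f_{ei}}
```

Each term measures how far an observed count falls from its expected count, scaled by the expected count.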

9
χ² in our example
  • .

10
χ² as an extension of the binomial test
  • Is the χ² test something completely new?
  • No, χ² is an extension of the binomial test.
  • To show this, we will prove that
  • χ² = z²
  • in the case where there are just 2 possible
    events.

11
χ² as an extension of the binomial test
  • Show
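
The derivation itself is not preserved in the transcript. One way to sketch it, assuming N trials, X observed events of the first type, null probability P for that type, and Q = 1 - P (these symbols are assumptions, not taken from the slide):

```latex
\begin{aligned}
\chi^2 &= \frac{(X - NP)^2}{NP} + \frac{\bigl((N - X) - NQ\bigr)^2}{NQ} \\
       &= \frac{(X - NP)^2}{NP} + \frac{(X - NP)^2}{NQ}
          \qquad \text{since } (N - X) - NQ = -(X - NP) \\
       &= (X - NP)^2 \, \frac{Q + P}{NPQ} \\
       &= \frac{(X - NP)^2}{NPQ}
        = \left( \frac{X - NP}{\sqrt{NPQ}} \right)^2 = z^2
\end{aligned}
```

The last step uses P + Q = 1 and the usual binomial z with mean NP and standard deviation √(NPQ).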

12
χ² as an extension of the binomial test
  • So, in the case of 2 events, χ² is just a
    different way of writing z².
  • But the new form has the advantage of being
    generalizable.
  • Furthermore, we can now see the advantage of
    using the squared form.
  • If we didn't square each term, the positives and
    negatives might cancel each other out.

13
Correcting χ² for bias
  • Like the binomial z, χ² must sometimes be
    corrected for bias.
  • Use the corrected formula when there are 2 events
    and N < 100.
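
The corrected formula did not survive in the transcript. A common choice is Yates' continuity correction, which subtracts 0.5 from each absolute deviation before squaring. The sketch below assumes that this is the correction meant; the function name and the sample counts are hypothetical.

```python
def chi_square_corrected(observed, expected):
    """Chi-square with Yates' continuity correction (assumed, not confirmed by the slides)."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 40 trials, 2 equally likely events, 26 vs. 14 observed.
observed = [26, 14]
expected = [20.0, 20.0]

corrected = chi_square_corrected(observed, expected)
uncorrected = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"corrected = {corrected:.3f}, uncorrected = {uncorrected:.3f}")
```

The corrected value is always the smaller of the two, which makes the test more conservative for small samples.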

14
What distribution does the χ² statistic follow?
  • It follows the χ² distribution of course.
  • But what does it look like?
  • Assuming the null hypothesis, it looks like this
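
The plot is not preserved in the transcript. As a rough stand-in, the density of the χ² distribution can be evaluated directly. This sketch uses the textbook density formula with only the standard library; the function name is illustrative.

```python
import math

def chi2_pdf(x, df):
    """Density of the chi-square distribution with df degrees of freedom, for x > 0."""
    if x <= 0:
        return 0.0
    half = df / 2.0
    return x ** (half - 1) * math.exp(-x / 2.0) / (2 ** half * math.gamma(half))

# Tabulate the curve for 2 degrees of freedom (3 event types, as in the party example).
for x in (0.5, 1, 2, 4, 6, 8):
    print(f"x = {x:>3}: density = {chi2_pdf(x, 2):.4f}")
```

The density is skewed to the right, with most of its area near zero and a long upper tail, which is why only large values of χ² count as evidence against the null hypothesis.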

15
Back to our example
  • We were wondering if the mismatch between
    beverages and our guests' requests was due to
    outdated market share statistics.
  • Let us now answer that question.

16
Back to our example
  • But what is χ²crit?
  • It is the value bounding the upper .05 area of
    the χ² distribution.
  • Degrees of freedom are the number of possible
    events - 1.
  • Use table A14.
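
For the party example there are 3 possible events, so df = 2. With 2 degrees of freedom the upper-tail area of the χ² distribution has the closed form exp(-x/2), so the table value can be checked by hand. This is a sketch for that special case only; other degrees of freedom need a table or a statistics library.

```python
import math

ALPHA = 0.05          # upper-tail area used on the slide
df = 3 - 1            # number of possible events minus 1

# For df = 2 only: P(chi-square > x) = exp(-x / 2), so solve exp(-x / 2) = ALPHA.
chi2_crit = -2.0 * math.log(ALPHA)
print(f"df = {df}, critical value = {chi2_crit:.2f}")
```

The result, about 5.99, should agree with the df = 2 row of a standard χ² table.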

17
Example
  • So, what can we conclude about our party
    planning?

18
No preference hypothesis testing
  • Now suppose we want to do an analysis like we did
    in the binomial chapter.
  • There, executives were trying to find out if
    their pilot study gave significant results.
  • Suppose that, since the pilot was inconclusive,
    they decided to run a more comprehensive study.

19
No preference hypothesis testing
  • Since the top 4 soft drinks have the greatest
    impact on their profits, they decide to focus on
    those.
  • This restricts the population of interest to
    Coke, Pepsi, and their diet counterparts.
  • They run a taste test on 100 tasters and find the
    following breakdown of events.

20
No preference hypothesis testing
  • The predicted numbers of tasters are all the same
    because we are testing a null hypothesis that
    there is no difference in taste.
  • Otherwise we use the same procedure.

21
No preference hypothesis testing
  • Calculate χ²
  • df = 4 - 1 = 3
  • χ²crit = 7.81 from table A14
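
The procedure can be sketched in Python as follows. The observed counts are hypothetical, since the slide's table is not in the transcript; only the sample size of 100, the 4 drinks, and the critical value 7.81 come from the slides.

```python
# Hypothetical taste-test counts for 100 tasters; the slide's actual table is not in the transcript.
observed = {"Coke": 30, "Pepsi": 28, "Diet Coke": 22, "Diet Pepsi": 20}

n = sum(observed.values())
expected_each = n / len(observed)   # no preference: every drink expects the same count

chi2 = sum((count - expected_each) ** 2 / expected_each for count in observed.values())
df = len(observed) - 1
CHI2_CRIT = 7.81                    # alpha = .05, df = 3, as quoted on the slide

print(f"chi-square = {chi2:.2f}, df = {df}")
print("reject no-preference hypothesis" if chi2 > CHI2_CRIT else "fail to reject")
```

With these made-up counts the statistic falls below the critical value, so the no-preference hypothesis would not be rejected.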

22
Exercises
  • Page 641
  • 1, 3, 4, 6, 7
  • Roll a die 30 times and test to see if it is
    loaded.

23
SPSS for Chi-squared
  • Using the grades.sav data.
  • Does the ethnicity of our class reflect the
    ethnicity of our (imaginary) University?
  • Analyze -> Nonparametric Tests -> Chi-Square
  • Add ethnic to the box.
  • Set Expected Frequencies to 20, 20, 20, 20, 25 as
    the ethnic makeup of Imaginary U.
  • OK
  • In the output, notice that χ² is 51.14 and p =
    .000.

24
SPSS Exercises
  • Using the grades.sav data
  • Are the section sizes significantly different
    from one another?
  • Are the freshman, sophomore, junior, and senior
    classes equally represented?
  • The US census says that the US population breaks
    down as follows (2004 data)
  • 236,057,761 Whites
  • 37,502,320 Blacks
  • 2,824,751 Native Americans
  • 12,326,216 Asians
  • 41,322,070 Hispanics
  • Is the class in grades.sav representative of the
    US population?
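
For the last exercise, the census counts first have to be turned into expected frequencies. A minimal sketch, assuming a class size of 105 (a placeholder; substitute the actual number of cases in grades.sav):

```python
# 2004 census counts exactly as listed on the slide.
census = {
    "White": 236_057_761,
    "Black": 37_502_320,
    "Native American": 2_824_751,
    "Asian": 12_326_216,
    "Hispanic": 41_322_070,
}

total = sum(census.values())
proportions = {group: count / total for group, count in census.items()}

CLASS_SIZE = 105  # hypothetical; use the actual number of cases in grades.sav
expected = {group: CLASS_SIZE * p for group, p in proportions.items()}

for group in census:
    print(f"{group}: proportion = {proportions[group]:.4f}, expected = {expected[group]:.1f}")
```

Only the ratios between the expected frequencies matter for the test, so either the proportions or the scaled counts can serve as expected values.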

25
Two variable contingency tables
  • Obviously, not all of the complexity of life can
    be captured by observing one type of event, even
    if it has many possible outcomes.
  • Often there are multiple types of events.
  • And, sometimes those types of events are related
    in interesting ways.

26
Two variable contingency table: Example
  • A study by Krupinski et al. (1998) investigated
    factors for suicides of inpatients with
    depressive psychoses.
  • They considered a number of factors that might be
    related to suicide
  • Gender
  • Age
  • Marital status
  • Whether they had children
  • Number of siblings
  • Time of last inpatient treatment
  • Previous suicide attempt
  • Abuse or addiction to alcohol or drugs
  • Stress
  • Voluntary treatment
  • Diagnoses

27
Two variable contingency table: Example
  • We will look at their contingency table for
    suicide vs. age group.

28
Two variable contingency table: Example
  • Our null hypothesis will be that the differences
    in the number of patients in each cell are due to
    sampling inhomogeneities.
  • Sampling inhomogeneities are not the same as
    sampling error.
  • We just happen to have more of some types than
    others available.

29
Two variable contingency table: Example
  • For example, we have more people not committing
    suicide than committing suicide in our available
    sample.
  • We also have more people > 60 than any other age
    group.
  • These 2 facts together mean that we should expect
    a lot of people in the (>60, No-Suicide) cell.
  • This is true, independent of the relationship
    between suicide and age.

30
Two variable contingency table: Example
  • Likewise we can find cells with small counts.

31
Two variable contingency table: Example
  • In general, the expected count in a cell depends
    on the row and column sums as follows
  • f_e = (row sum × column sum) / N
  • Where N is the total count over all cells.
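
The rule can be sketched as a small function. The table below is hypothetical; the counts from Krupinski et al. are missing from the transcript, and the function name is illustrative.

```python
def expected_counts(table):
    """Expected cell counts under independence: row sum * column sum / N."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    return [[r * c / n for c in col_sums] for r in row_sums]

# Hypothetical 3 x 2 table (rows: age groups, columns: suicide yes / no).
# These are NOT the Krupinski et al. counts, which are missing from the transcript.
observed = [
    [10, 90],
    [20, 180],
    [15, 185],
]

for row in expected_counts(observed):
    print([round(cell, 2) for cell in row])
```

The expected counts keep the same row and column sums as the observed table; only the way the counts are spread across cells changes.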

32
Two variable contingency table: Example
  • So for example the expected count in the (>60,
    No-Suicide) cell is

33
Two variable contingency table: Example
  • We can fill in (in parentheses) the expected
    count for all cells using this formula
  • Are we going to get statistical significance?

34
Two variable contingency table: Example
  • Our null hypothesis: suicide and age are
    independent.
  • Our alternative hypothesis: __________________

35
Two variable contingency table: Example
  • Calculating χ²
  • This is exactly as before except we will now sum
    over all cells.
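
Summing over all cells can be sketched like this. The counts are again hypothetical rather than the study's data, and the function name is illustrative.

```python
def chi_square_table(table):
    """Pearson chi-square for a two-way table, summed over every cell."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f_o in enumerate(row):
            f_e = row_sums[i] * col_sums[j] / n   # expected count under independence
            chi2 += (f_o - f_e) ** 2 / f_e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts, not the study's data.
observed = [
    [10, 90],
    [20, 180],
    [15, 185],
]
chi2, df = chi_square_table(observed)
print(f"chi-square = {chi2:.3f}, df = {df}")
```

The function also returns the degrees of freedom, (rows - 1)(columns - 1), since they are needed to look up the critical value.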

36
Two variable contingency table: Example
  • Now we need a χ²crit
  • For which we need a df.
  • df = (rows - 1)(columns - 1) = 5 × 1 = 5
  • χ²crit = 11.07 (α = .05) from table A14
  • p = .08

37
Two variable contingency table: Example
  • So what would you do if you were a hospital
    administrator?
  • χ²crit = 9.24 (α = .10) from table A14

38
Other predictors of suicide
  • .

39
Exercises
  • Page 650
  • 1, 2 a c, 3, 4

40
Cramér's Phi
  • What measure do we have of the strength of
    association between 2 variables?

41
Cramér's Phi
  • We could compute the correlation between the 2
    variables in our contingency table.
  • There is a shortcut to accomplish this based on
    χ², which we've already computed.
  • φ = √(χ² / (N(L - 1)))
  • Where N is the total sample size and L is the
    number of rows or columns, whichever is least.
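
A minimal sketch of the shortcut, using the standard definition of Cramér's phi in terms of χ², N, and L. The function name and the input values are hypothetical.

```python
import math

def cramers_phi(chi2, n, n_rows, n_cols):
    """Cramér's phi (Cramér's V): sqrt(chi2 / (N * (L - 1))), L = min(rows, columns)."""
    smaller = min(n_rows, n_cols)
    return math.sqrt(chi2 / (n * (smaller - 1)))

# Hypothetical inputs: chi-square of 10.0 from a 6 x 2 table with 500 cases.
phi = cramers_phi(10.0, 500, 6, 2)
print(f"Cramér's phi = {phi:.3f}")
```

Dividing by N removes the dependence on sample size, so the result can be compared across studies in a way that χ² itself cannot.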

42
Cramér's Phi
  • Let's see if our suicide prediction study has a
    large association.
  • As correlations go, this is very small.

43
SPSS for contingency tables
  • Using grades.sav
  • Let's see if ethnicity and sex are associated
    in our class.
  • In other words, can we predict the gender of a
    classmate from their ethnicity?
  • Analyze -> Descriptive Statistics -> Crosstabs.
  • Move sex into the upper box.
  • Move ethnic into the middle box.
  • In Cells
  • Select Observed and Expected.
  • In Statistics choose Phi and Cramér's V.
  • OK.
  • In the output note Pearson Chi-Square, its
    Significance (p), and Cramér's V, which is
    Cramér's Phi.

44
SPSS Exercises
  • Using divorce.sav
  • Test the following pairs of variables for
    significant association and for strength of
    association.
  • Current family income vs current marital status
  • Current family income vs employment status
  • Current marital status vs employment status