CHI SQUARE (χ²) - PowerPoint PPT Presentation

Provided by: LisaM174
1
CHI SQUARE (χ²)
Dangerous Curves Ahead!
2
Why Chi Square? (χ²)
  • We want to compare two variables, but
  • Not all variables are interval-level, so we
    cannot use regression.
  • Hypothesis Tests for Difference of Means and
    Difference of Proportions only allow us to
    compare two groups with one value.
  • We need something else...

3
Imagine a bag that contains 90 white marbles
and 10 black marbles. If you drew 10 marbles, how
many would you expect to come up white, and how
many black? We expect 9 white marbles and 1
black. But there is some probability that we
will get 8/2 and some probability that we will get 7/3.
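
A minimal sketch of those probabilities, assuming the 10 marbles are drawn without replacement (the slide does not say), so the counts follow a hypergeometric distribution. The function name `prob_black` is illustrative only.

```python
from math import comb

WHITE, BLACK, DRAWS = 90, 10, 10  # bag contents and sample size from the slide

def prob_black(k: int) -> float:
    """Probability of exactly k black marbles in DRAWS draws without replacement."""
    return comb(BLACK, k) * comb(WHITE, DRAWS - k) / comb(WHITE + BLACK, DRAWS)

# Print the chance of the 9/1, 8/2 and 7/3 splits mentioned on the slide.
for k in (1, 2, 3):
    print(f"{DRAWS - k} white / {k} black: {prob_black(k):.3f}")
```

The expected split (9/1) is the single most likely outcome here, but the other splits still carry real probability, which is why an observed result has to be judged against chance.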

4
What do we do?
  • We can compare what we would expect by chance to
    what we actually observed.
  • We can make a probabilistic statement about the
    chances of observing what we did based on our
    expectations.
  • Finally, we test the hypothesis that there is no
    real difference between what we observed and what
    we expected (using the 6 steps of hypothesis
    testing).

         Expected   Observed
White    9          ???
Black    1          ???
5
Basic Assumption of the Null Hypothesis
  • There is no difference in the population; the
    difference you observe is just the chance
    variation of your sample.
  • Expected score - Observed score = 0 ± SE
  • We are comparing observed values (frequency
    actually observed in our sample, written fo) to
    some set of expected by chance frequencies
    (written fe).

6
Chi Square (χ²)
  • The test statistic for testing hypotheses
    comparing 2 or more nominal categories
  • The Chi Square Statistic compares nominal values
    in a cross-tabulation table, making what are
    called row by column comparisons or r x c
    tables.

7
A Nominal variable
  • is a categorical variable with mutually
    exclusive categories. For example, gender, where
    male = 1 and female = 2.

8
Approval for President Obama by Race
             BLACKS   WHITES
APPROVE      69       156
DISAPPROVE   21       144
9
The formula for χ² is
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
OR, sometimes written
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
where fo is the observed frequency of
each category in each cell of a table.
10
O or fo is what we observe from our sample, the
observed frequency. NOTE that χ² works with
frequencies in each cell. E or fe is the
expected frequency, the number of people who
would show up in each cell IF the null hypothesis
were true, if there were no racial difference in
approval, if the frequencies were due solely to
chance.
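
The slides do not spell out how fe is obtained for a cross-tabulation. A common rule, assumed here, is row total times column total divided by the grand total. The sketch below applies it to the approval table; the helper name `expected_frequencies` is hypothetical.

```python
# Observed frequencies from the approval table
# (rows: approve, disapprove; columns: Blacks, Whites).
observed = [[69, 156],
            [21, 144]]

def expected_frequencies(table):
    """Expected count per cell under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)
for label, row in zip(("APPROVE", "DISAPPROVE"), expected):
    print(label, [round(cell, 2) for cell in row])
```

The expected counts keep the same row and column totals as the observed table; only the spread across cells changes.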
11
For each cell in the table we are to compare what
we observe to what we should expect by chance:
  • Subtract the value of the hypothetical expectancy
    (fe) from the observed frequency (fo) for each
    cell.
  • Square each of these deviations.
  • Divide each of the squared differences by the
    expected value of each cell.
  • Finally, sum these quotients over all cells
    to get χ².
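
The four steps above can be sketched as a short loop. This is an illustration, not the presenter's own calculation: it uses the approval table counts, and the expected counts are an assumption (row total times column total over N, which the slides do not state).

```python
# Observed counts from the approval table, read row by row.
observed = [69, 156, 21, 144]
# Expected counts under "no difference" (assumed: row total * column total / N).
expected = [225 * 90 / 390, 225 * 300 / 390, 165 * 90 / 390, 165 * 300 / 390]

def chi_square(fo, fe):
    """Sum of (fo - fe)^2 / fe over all cells."""
    total = 0.0
    for o, e in zip(fo, fe):
        deviation = o - e         # step 1: subtract fe from fo
        squared = deviation ** 2  # step 2: square the deviation
        total += squared / e      # step 3: divide by fe; step 4: add to the sum
    return total

chi2 = chi_square(observed, expected)
print(round(chi2, 2))
```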

12
The Chi Square statistic tests
  • Whether the difference between what you observe
    and what chance would predict is due to sampling
    error.
  • The greater the deviation of what we observe from
    what we would expect by chance, the less likely
    it is that the difference is due to chance
    alone.

13
DIFFERENCE BETWEEN EXPENSIVE AND CHEAP BEER
  • Consumer Reports routinely finds that many people
    who claim they can taste the difference can't;
    they are influenced by the label.
  • How would you test the idea that people cannot
    really tell the difference, and that they are
    really responding to the price label information?
    How do we disentangle the label effect from taste?

14
What is the null? -> No difference. We expect
rootbeer 1 = rootbeer 2 = rootbeer 3.
Study Design
Sample 150 rootbeer drinkers. Place before them 3
bottles, one labeled with the name of a well-known
high-priced rootbeer, another a medium-priced
rootbeer, and the third a low-priced rootbeer.
Bottles are counterbalanced to control for order
effects. All 150 subjects taste each rootbeer
and state a preference.
15
The Full Table
              High Priced RootBeer   Medium Priced RootBeer   Low Priced RootBeer
Observed fo   77                     41                       32
Expected fe   50                     50                       50
16
Step 1. Hypothesis
Null: the proportions preferring each rootbeer
should be equal IF indeed the rootbeers are equal
and if preferences are not influenced by the label.
Here, chance would predict 50 people in each group
if label did not matter. The ratios of O to E
values should be the same across all 3 comparisons
if label does not matter, that is, the O/E ratio in
each column should be the same.
Alternative: preferences will follow the status of
the label, rootbeer 1 > rootbeer 2 > rootbeer 3.
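
A small sketch of the null expectation and the O/E ratios, using the counts from the full table. The dictionary keys are illustrative labels, not names from the study.

```python
observed = {"high": 77, "medium": 41, "low": 32}  # counts from the full table

n = sum(observed.values())         # 150 tasters in total
expected_each = n / len(observed)  # equal preference under the null

# O/E ratio per label; all three would be close to 1 if the label did not matter.
ratios = {label: count / expected_each for label, count in observed.items()}
for label, ratio in ratios.items():
    print(f"{label}: O/E = {ratio:.2f}")
```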
17
Step 2. The Distribution. Since we are
interested in the effect of one nominal variable
on another nominal variable, the χ² distribution
is appropriate -- we are doing a row by column
(r x c) analysis.
Step 3. Level of Significance. Set alpha at .05
for 95% confidence.
18
Step 4. Determine Critical Value of χ². The chi
square distribution changes shape by degrees of
freedom, just as the t distribution does. Degrees
of freedom change as a function of the number of
comparisons made.
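
As a hedged illustration of how the critical value moves with df, the sketch below computes it at alpha = .05 for 1 and 2 degrees of freedom using only the standard library. It relies on two closed forms that hold for these df only; for other df a chi-square table or a statistics package would be needed. The function name `critical_value` is hypothetical.

```python
from math import log
from statistics import NormalDist

def critical_value(alpha: float, df: int) -> float:
    """Chi-square critical value for df = 1 or df = 2 only (closed forms)."""
    if df == 1:
        # A chi-square with 1 df is the square of a standard normal variable.
        z = NormalDist().inv_cdf(1 - alpha / 2)
        return z * z
    if df == 2:
        # A chi-square with 2 df has upper-tail probability exp(-x / 2).
        return -2 * log(alpha)
    raise ValueError("this sketch only covers df = 1 and df = 2")

print(round(critical_value(0.05, 1), 3))
print(round(critical_value(0.05, 2), 3))
```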
19
Formula for degrees of freedom of χ²:
df = (r - 1) x (c - 1)
where r = number of rows, c = number of columns.
We have a 3 by 2 table, so df = (3 - 1) x (2 - 1) = 2.
(Also, when doing a one-way chi-square, df is just
k - 1, where k is the number of categories.)
Step 5. Decision. Let's fill in the table:

20
RootBeer       Hi Priced   Med Priced   Lo Priced
Observed       77          41           32
Expected       50          50           50
O-E            27          -9           -18
(O-E)²         729         81           324
(O-E)² / E     14.58       1.62         6.48
χ² = Σ(O-E)² / E = 14.58 + 1.62 + 6.48 = 22.68
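
The table can be reproduced row by row with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
labels = ["Hi Priced", "Med Priced", "Lo Priced"]
observed = [77, 41, 32]
expected = [50, 50, 50]

contributions = []
for label, o, e in zip(labels, observed, expected):
    diff = o - e          # O-E
    cell = diff ** 2 / e  # (O-E)^2 / E for this column
    contributions.append(cell)
    print(f"{label:<10} O-E={diff:>4}  (O-E)^2={diff ** 2:>4}  (O-E)^2/E={cell:.2f}")

chi2 = sum(contributions)  # sum over the three columns
print(f"chi-square = {chi2:.2f}")
```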
21
Look up the p-value for our χ² of 22.68 in the Chi
Square table at 2 df. We find that 22.68 is even
beyond the .01 significance level. The probability is
p < .0005, that is, fewer than 5 chances in 10,000
would produce a difference this big just by
chance. Or better, fewer than 5 samples in 10,000 of
the same size would produce a difference this
big.
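
Instead of a printed table, the tail probability can be checked directly. The sketch below uses a closed form that is valid only for 2 degrees of freedom, where the upper-tail probability of the chi-square distribution is exp(-x / 2).

```python
from math import exp

chi2 = 22.68  # observed statistic from the rootbeer table
df = 2        # the closed form below applies to df = 2 only

# For df = 2: P(chi-square >= x) = exp(-x / 2).
p_value = exp(-chi2 / 2)
print(f"p = {p_value:.6f}")
print("below .0005:", p_value < 0.0005)
```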
22
Step 6. Interpret. The Chi Square value of
22.68 is beyond the critical value of
5.991. Therefore reject the null hypothesis of
equality. People do respond to price label
information.
23
Summing up the properties of the χ² Distribution
  • The χ² distribution ranges from zero to some positive
    value, i.e., no difference to some big
    difference.
  • The χ² distribution is not symmetrical, but skewed to
    the right, from zero to a large positive χ². Chi
    square looks at differences from zero. Its value
    depends on the number of comparisons made, that
    is, the number of df. Note that the critical
    value of chi square gets bigger as the df get
    bigger: the more comparisons made, the more
    likely you are to find differences, so df
    corrects for this.
  • There are many different χ² distributions. Like
    the t distribution, χ² varies with degrees of
    freedom.

24
Another Example
  • Levels of political activism by ideology
  • Are conservative college students more likely to
    participate in activism on campus?
  • If this is true, we should see a disproportionate
    number of conservative student activists. If
    not, the distribution of activists by ideology
    should be random.

25
Student Activists
               Observed   Expected
Conservative   33         20
Liberal        7          20
Total          40         40
Null hypothesis Alternative hypothesis
26
  • Critical value of χ² at α = .05 and 1 df: χ² = 3.84
  • Observed χ² = (33 - 20)² / 20 + (7 - 20)² / 20
    = 8.45 + 8.45 = 16.9
  • The observed value of χ² exceeds the critical
    value of χ² (16.9 > 3.84).
  • Therefore reject the null-hypothesis.
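
The same calculation as a short sketch, with the decision rule written out. The critical value is taken from the slide rather than computed, and the variable names are illustrative.

```python
observed = {"Conservative": 33, "Liberal": 7}
expected = {"Conservative": 20, "Liberal": 20}
CRITICAL_VALUE = 3.84  # alpha = .05, 1 df, as given on the slide

# Sum of (O - E)^2 / E over the two ideology groups.
chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)
reject_null = chi2 > CRITICAL_VALUE

print(f"chi-square = {chi2:.2f}, reject null: {reject_null}")
```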