Chi-Square Tests - PowerPoint PPT Presentation

Provided by: Lind51

1
Chi-Square Tests
  • Week 10

2
Objectives
  • On completion of this module you should be able
    to:
  • perform and interpret a χ² test for the
    difference between two or more proportions;
  • perform and interpret a χ² test of independence;
    and
  • perform and interpret a χ² goodness of fit test.

3
χ² test for the difference between two proportions
  • This week we will compare two (or more)
    proportions using a frequency of success
    approach.
  • Note that we do not cover the Z test for
    differences between two proportions in this
    course (feel free to look at Section 9.3 of the
    text if you are interested).
  • We will make use of a contingency table (a
    cross-classification table).
  • This is best explained by an example.

4
Example 10-1
  • A sample of 400 ex-students was taken and the
    students were asked "Did you enjoy your
    university experience?"
  • The results are given in the table below.
  • Is there evidence of a significant difference
    between the proportions of males and females who
    enjoyed their university experience?
  • (Use the 0.05 level of significance).

5
Example 10-1
                                  Gender
Enjoyed university experience?    Male    Female    Total
Yes                               214     107       321
No                                66      13        79
Total                             280     120       400
6
Example 10-1
  • We have used similar tables in our module on
    probability (Week 4). This is a 2 × 2 table.
  • The two variables (each with two outcomes) are
    gender (male or female) and enjoyed university
    experience (yes or no).
  • We will conduct a hypothesis test (using a
    procedure similar to last week's) to test whether
    there is a significant difference between the
    proportions of males and females who enjoyed their
    university experience.

7
Solution 10-1
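  • First we establish our hypotheses:
  • H0: p1 = p2
  • H1: p1 ≠ p2
  • Here p1 and p2 are the population proportions of
    males and females who enjoyed their university
    experience.
  • We test the claim that there is no difference
    between the two population proportions, using
    sample data to draw the conclusion.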

8
Finding the rejection region
  • The test statistic, which we will calculate
    shortly, approximately follows a chi-square
    distribution with one degree of freedom.
  • We will reject the null hypothesis if the test
    statistic is greater than the
    upper-tail critical value from the chi-square
    distribution with one degree of freedom.
  • The rejection region for a chi-square test is
    always in the upper tail: differences in either
    direction are squared, so any large difference
    between the proportions makes the test statistic
    large.

9
Finding the rejection region
  • The rejection rule is
  • Reject H0 if $\chi^2 > \chi^2_U$; otherwise do not
    reject H0.
  • The upper-tail critical value $\chi^2_U$ is found
    using Table E.4 in the text, based on the number of
    degrees of freedom and the level of significance.

10
Solution 10-1
  • We are told α = 0.05 and know that for a 2 × 2
    contingency table there is one degree of freedom.
  • Using Table E.4 we find the critical value is
    3.841.
  • The rejection rule is
  • Reject H0 if χ² > 3.841; otherwise do not reject
    H0.
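
The table lookup for one degree of freedom can be cross-checked numerically. The sketch below is not part of the lecture: it relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, and the function name is made up for illustration.

```python
import math

def chi2_upper_tail_df1(x):
    """Upper-tail probability P(chi-square > x) for 1 degree of freedom.

    Uses chi-square(1) = Z^2, so P(Z^2 > x) = P(|Z| > sqrt(x)).
    """
    return math.erfc(math.sqrt(x / 2))

# Area to the right of the tabulated critical value 3.841.
tail_area = chi2_upper_tail_df1(3.841)
print(tail_area)  # should be very close to 0.05
```

If the printed area is close to 0.05, the critical value 3.841 is consistent with the chosen level of significance.
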

11
The test statistic
  • The test statistic is given by
    $\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$
  • where fo is the observed frequency (taken from
    the table) and fe is the expected (theoretical)
    frequency if the null hypothesis is true.

12
Computing the expected frequencies
  • The average proportion (of success) is given by
    $\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{X}{n}$

                 Column variable
Row variable     1          2          Total
Successes        X1         X2         X
Failures         n1 - X1    n2 - X2    n - X
Total            n1         n2         n
  • Note: if the column variable corresponds to
    success/failure, the average proportion will be
    calculated using values from the success column
    rather than the success row as demonstrated here.

13
Example 10-1
                                  Gender
Enjoyed university experience?    Male    Female    Total
Yes                               214     107       321
No                                66      13        79
Total                             280     120       400
14
Computing the expected frequencies
  • To obtain the expected frequencies for the
    success cells, the sample size (column total)
    for each group is multiplied by the average
    proportion (here 321 / 400 = 0.8025).

fo     fe
214    0.8025 × 280 = 224.7
107    0.8025 × 120 = 96.3
66
13
15
Computing the expected frequencies
  • To obtain the expected frequencies for the
    failure cells, the sample size (column total)
    for each group is multiplied by one minus the
    average proportion.

fo     fe
214    224.7
107    96.3
66     (1 - 0.8025) × 280 = 55.3
13     (1 - 0.8025) × 120 = 23.7
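
The expected frequencies for Example 10-1 can be reproduced in a few lines. This is only a sketch of the arithmetic on these slides; the variable names are illustrative and not taken from the text.

```python
# Observed counts from Example 10-1 (order: male, female).
successes = [214, 107]      # answered "Yes"
sample_sizes = [280, 120]   # column totals

# Average (pooled) proportion of successes: X / n.
p_bar = sum(successes) / sum(sample_sizes)

# Under H0 every group shares the proportion p_bar.
expected_yes = [p_bar * n for n in sample_sizes]
expected_no = [(1 - p_bar) * n for n in sample_sizes]

print(p_bar)         # average proportion
print(expected_yes)  # expected "Yes" counts for males, females
print(expected_no)   # expected "No" counts for males, females
```
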
16
Solution 10-1
  • Now we can calculate the test statistic (8.5995)
    via a table as follows

fo     fe       fo - fe                  (fo - fe)²    (fo - fe)² / fe
214    224.7    214 - 224.7 = -10.7      114.49        114.49 / 224.7 = 0.5095
107    96.3     107 - 96.3 = 10.7        114.49        114.49 / 96.3 = 1.1889
66     55.3     10.7                     114.49        2.0703
13     23.7     -10.7                    114.49        4.8308
                                         Total         8.5995
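
The same table calculation can be written as a short function. This is a minimal sketch under the assumption that observed and expected frequencies are supplied cell by cell in the same order; the function name is hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum of (fo - fe)^2 / fe over all cells."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Cells of Example 10-1 in the order used in the table.
observed = [214, 107, 66, 13]
expected = [224.7, 96.3, 55.3, 23.7]

stat = chi_square_statistic(observed, expected)
critical_value = 3.841             # from Table E.4, 1 df, 0.05 level
reject_h0 = stat > critical_value

print(round(stat, 4), reject_h0)
```
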
17
Solution 10-1
  • We compare our test statistic to the critical
    value.
  • Since 8.5995 > 3.841, we reject the null
    hypothesis.
  • We conclude that, at the 0.05 level of
    significance, there is sufficient evidence of a
    difference between the proportions of males and
    females who enjoyed their university experience.

18
Assumptions
  • Whenever we use a χ² test we assume that each
    expected frequency is at least 5.
  • With tables larger than 2 × 2 (for example when
    comparing more than two proportions) some
    statisticians only require that each expected
    frequency be at least 1.
  • Often cells are combined to meet these
    requirements.

19
χ² test for the difference in more than two
proportions
  • Hypotheses
  • H0: p1 = p2 = ... = pc
  • H1: Not all pj are equal (j = 1, 2, ..., c)
  • Be careful with setting up H1. Even if only one
    proportion is different from the others we want
    to reject the null hypothesis.
  • Average proportion
    $\bar{p} = \frac{X_1 + X_2 + \cdots + X_c}{n_1 + n_2 + \cdots + n_c} = \frac{X}{n}$

20
χ² test for the difference in more than two
proportions
  • Degrees of freedom: $(r - 1)(c - 1)$
  • where r is the number of rows and c is the
    number of columns in the contingency table.
  • For the 2 × 2 table we have
  • (2 - 1)(2 - 1) = 1 degree of freedom (as we used
    earlier).
  • For a 2 × c contingency table we have
  • (2 - 1)(c - 1) = c - 1 degrees of freedom.
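
As a quick check of the degrees-of-freedom rule, the sketch below evaluates (r - 1)(c - 1) for the table sizes used in this module. The helper name is made up for illustration.

```python
def contingency_df(rows, cols):
    """Degrees of freedom for an r x c contingency table."""
    return (rows - 1) * (cols - 1)

print(contingency_df(2, 2))  # 2 x 2 table
print(contingency_df(2, 3))  # 2 x 3 table
print(contingency_df(5, 2))  # 5 x 2 table
```
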

21
Example 10-2
  • A city is supported by three major IT companies.
  • There have been rumours of price-fixing and
    collusion between the companies.
  • In order to investigate these accusations, a
    consumer watchdog organisation conducted a survey
    of 500 consumers of these IT services.
  • The results of this survey are summarised in the
    table below.

22
Example 10-2
  • At the α = 0.05 level of significance, determine
    whether there is evidence of a significant
    difference between the consumer satisfaction of
    the three IT companies.

                          IT Company
Satisfied with service    BMI    Unitses    Pear    Total
Yes                       115    99         108     322
No                        55     102        21      178
Total                     170    201        129     500
23
Solution 10-2
  • The hypotheses are
  • H0: p1 = p2 = p3
  • H1: Not all pj are equal
  • For a 2 × 3 contingency table there are
  • (2 - 1)(3 - 1) = 2 degrees of freedom.
  • Given α = 0.05, the critical value is 5.991.

24
Solution 10-2
  • The decision rule is
  • Reject H0 if χ² > 5.991; otherwise do not
    reject H0.
  • The average proportion (of successes) is
    $\bar{p} = \frac{322}{500} = 0.644$
  • We will use a table to calculate the test
    statistic.

25
Solution 10-2
fo     fe         fo - fe    (fo - fe)²    (fo - fe)² / fe
115    109.480    5.520      30.470400     0.2783
99     129.444    -30.444    926.837136    7.1601
108    83.076     24.924     621.205776    7.4776
55     60.520     -5.520     30.470400     0.5035
102    71.556     30.444     926.837136    12.9526
21     45.924     -24.924    621.205776    13.5268
                             Total         41.8989
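
The calculation for Example 10-2 generalises to any number of groups. The following is a minimal sketch, assuming the data are given as success counts and sample sizes per group; the function name is hypothetical.

```python
def chi_square_proportions(successes, sample_sizes):
    """Return (average proportion, chi-square statistic) for a 2 x c table."""
    p_bar = sum(successes) / sum(sample_sizes)
    stat = 0.0
    for x, n in zip(successes, sample_sizes):
        expected_success = p_bar * n
        expected_failure = (1 - p_bar) * n
        stat += (x - expected_success) ** 2 / expected_success
        stat += ((n - x) - expected_failure) ** 2 / expected_failure
    return p_bar, stat

# Example 10-2: satisfied customers and totals for BMI, Unitses, Pear.
p_bar, stat = chi_square_proportions([115, 99, 108], [170, 201, 129])
print(p_bar, round(stat, 4))
print(stat > 5.991)  # compare with the critical value for 2 df
```
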
26
Solution 10-2
  • Note that all expected frequencies were greater
    than five, so the chi-square test is appropriate.
  • Since 41.8989 > 5.991, we reject the null
    hypothesis.
  • We can conclude that there is sufficient evidence
    to believe that there is a difference in the
    proportion of satisfied clients for the three IT
    companies.

27
χ² test of independence
  • Hypotheses
  • H0: the two categorical variables are
    independent (i.e. there is no relationship
    between them).
  • H1: the two categorical variables are dependent
    (i.e. there is a relationship between them).
  • Computing the expected frequencies
    $f_e = \frac{\text{row total} \times \text{column total}}{n}$

28
Example 10-3
  • A group of researchers is interested in
    determining whether students who enrol in a
    university degree straight from school perform
    better than those who take a year off before
    beginning university (sometimes called a gap
    year).
  • The following information was gathered from a
    sample of 400 students.

29
Example 10-3
Lowest grade received       Enrolment group
in first year of study      School leaver    Gap year    Total
HD                          27               13          40
D                           42               18          60
C                           85               35          120
P                           121              19          140
F                           25               15          40
Total                       300              100         400
30
Example 10-3
  • At the 0.01 level of significance, determine
    whether there is evidence of a significant
    relationship between the lowest grade a student
    receives in their first year of study and whether
    they have come directly from school or had a gap
    year.
  • Interpret your result.

31
Solution 10-3
  • The hypotheses are
  • H0: there is no relationship between lowest
    grade received in first year of study and
    enrolment group.
  • H1: there is a relationship between lowest
    grade received in first year of study and
    enrolment group.
  • There are (r - 1)(c - 1) = (5 - 1)(2 - 1) = 4
    degrees of freedom.

32
Solution 10-3
  • The critical value at a significance level of
    0.01 is 13.277.
  • The rejection rule is
  • Reject H0 if χ² > 13.277; otherwise do not
    reject H0.
  • Calculate the test statistic via a table as
    follows (checking that all expected frequencies
    are at least 5).

33
Solution 10-3
fo     fe     fo - fe    (fo - fe)²    (fo - fe)² / fe
27     30     -3         9             0.3
13     10     3          9             0.9
42     45     -3         9             0.2
18     15     3          9             0.6
⋮      ⋮      ⋮          ⋮             ⋮
                         Total         16.1968
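
The full table for Example 10-3, including the rows abbreviated above, can be reproduced with a short script. This is a sketch of the independence-test arithmetic only, with expected frequencies computed as row total × column total / n; the variable names are illustrative.

```python
# Observed counts: rows are grades HD, D, C, P, F;
# columns are school leaver, gap year.
observed = [
    [27, 13],
    [42, 18],
    [85, 35],
    [121, 19],
    [25, 15],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency for each cell under independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

stat = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
smallest_expected = min(min(row) for row in expected)

print(expected)
print(round(stat, 4), df, smallest_expected)
```

The smallest expected frequency is printed so that the "at least 5" assumption can be checked alongside the statistic.
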
34
Solution 10-3
  • Since 16.1968 > 13.277 we reject the null
    hypothesis.
  • We conclude that there is sufficient evidence to
    indicate that there is a relationship between
    lowest grade received in first year of study and
    enrolment group.
  • To interpret a result such as this, it can be
    helpful to view the observed and expected cell
    values together.

35
Lowest grade received       Enrolment group
in first year of study      School leaver    Gap year    Total
HD    Obs                   27               13          40
      (Exp)                 (30)             (10)
D     Obs                   42               18          60
      (Exp)                 (45)             (15)
C     Obs                   85               35          120
      (Exp)                 (90)             (30)
P     Obs                   121              19          140
      (Exp)                 (105)            (35)
F     Obs                   25               15          40
      (Exp)                 (30)             (10)
Total                       300              100         400
36
Solution 10-3
  • We now look over this table for any unusual
    differences between observed and expected values.
  • Some comments:
  • School leavers have more P grades than expected
    (121 observed against 105 expected), and those who
    enrol after a gap year have fewer than expected
    (19 against 35).
  • School leavers have slightly fewer HD, D, C and F
    grades than expected, whilst gap year students
    have slightly more than expected in these same
    grades.

37
Solution 10-3
  • This is a mixed result, so it is difficult to
    determine whether one group is doing better than
    the other.
  • Perhaps lowest grade in the first year is not a
    good measure of a student's overall performance
    and so the experiment is flawed.
  • What would be a better way to measure
    performance? GPA? A student's sense of
    satisfaction with their performance? Something
    else?

38
χ² goodness of fit tests
  • Often we want to know whether some data that we
    have can be described by a certain distribution.
  • We can do this in part by checking the properties
    of the distribution (e.g. the normal distribution
    is symmetrical and bell-shaped, and gives a
    straight line in a normal probability plot).
  • We can test how close observed frequencies are to
    frequencies that would be theoretically expected
    (given that the data followed a particular
    distribution) using a chi-square test.

39
χ² goodness of fit tests
  • The following steps are required:
  • Decide which distribution is believed to be
    appropriate for the data set.
  • Estimate the parameters of the distribution (the
    mean, the proportion etc.).
  • Determine the expected frequencies for each
    category if the data followed the distribution.
  • Use the chi-square test to determine whether
    there is a significant difference between the
    actual and expected frequencies.

40
χ² goodness of fit tests
  • We will demonstrate the method for determining
    goodness of fit for a Poisson distribution here.
  • Section 11.7 from the text (on CD) demonstrates
    how to test goodness of fit for the normal
    distribution, so ensure you are familiar and
    confident with this as well.

41
Example 10-4
  • The number of people arriving at a particular
    Automatic Teller Machine (ATM) per minute is
    recorded during business hours for a 35 hour week
    (35 hours × 60 minutes = 2100 minutes).
  • The following table summarises the results.
  • Does the distribution of people arriving at the
    ATM follow a Poisson distribution?
  • Test at the 0.05 level of significance.

42
Example 10-4
Number of people arriving at ATM    Frequency
0                                   120
1                                   180
2                                   270
3                                   390
4                                   310
5                                   330
6                                   250
7                                   150
8                                   80
9                                   15
10                                  5
Total                               2100
43
Solution 10-4
  • The hypotheses are
  • H0: the number of people arriving at the ATM
    per minute follows a Poisson distribution.
  • H1: the number of people arriving at the ATM
    per minute does not follow a Poisson
    distribution.
  • The (one) parameter of the Poisson distribution
    is the mean (which we must estimate).

44
Solution 10-4
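  • This is frequency data, so we must use the
    frequency formula for the mean, where x is the
    number of arrivals and f is its frequency:
    $\bar{X} = \frac{\sum x f}{n} = \frac{8155}{2100} \approx 3.88$
  • Since the tables in the text give the Poisson
    distribution parameter, λ (the mean), to one
    decimal place, we will use λ = 3.9.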
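
The mean of the frequency data in Example 10-4 can be checked with a few lines. This is only a sketch of the weighted-mean arithmetic; the dictionary layout is an illustrative choice.

```python
# Number of arrivals per minute -> number of minutes observed (Example 10-4).
frequencies = {
    0: 120, 1: 180, 2: 270, 3: 390, 4: 310, 5: 330,
    6: 250, 7: 150, 8: 80, 9: 15, 10: 5,
}

n = sum(frequencies.values())
weighted_sum = sum(x * f for x, f in frequencies.items())
mean = weighted_sum / n

# Round to one decimal place to match the Poisson tables in the text.
lam = round(mean, 1)
print(n, weighted_sum, mean, lam)
```
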

45
Number of people    Actual          P(X) for Poisson         Theoretical frequency
arriving at ATM     frequency fo    distribution, λ = 3.9    fe = nP(X)
0                   120             0.0202                   2100 × 0.0202 = 42.42
1                   180             0.0789                   165.69
2                   270             0.1539                   323.19
3                   390             0.2001                   420.21
4                   310             0.1951                   409.71
5                   330             0.1522                   319.62
6                   250             0.0989                   207.69
7                   150             0.0551                   115.71
8                   80              0.0269                   56.49
9                   15              0.0116                   24.36
10                  5               0.0045                   9.45
11 or more          0               0.0023                   4.83
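
The Poisson probabilities in this table come from the tables in the text, but they can also be computed directly. The sketch below does so with the Poisson formula; because it does not round the probabilities to four decimal places, its expected frequencies differ slightly from the table (for example about 42.5 rather than 42.42 for zero arrivals). The function name is hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 3.9
n = 2100

# Categories 0, 1, ..., 10, then "11 or more" as the remaining probability.
probs = [poisson_pmf(k, lam) for k in range(11)]
probs.append(1 - sum(probs))

expected = [n * p for p in probs]
for label, p, fe in zip(list(range(11)) + ["11 or more"], probs, expected):
    print(label, round(p, 4), round(fe, 2))
```
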
46
Solution 10-4
  • Note that not all expected frequencies are
    greater than 5, but the smallest (4.83) is close
    to 5 and much greater than 1.
  • The degrees of freedom is given by k - p - 1
    where k is the number of categories and p is the
    number of parameters that were estimated.
  • Here k = 12 and p = 1, so there are k - p - 1 =
    12 - 1 - 1 = 10 degrees of freedom.

47
Solution 10-4
  • At the 0.05 level of significance, the critical
    value is 18.307.
  • The rejection rule is
  • Reject H0 if χ² > 18.307; otherwise do not
    reject H0.
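
Critical values for an even number of degrees of freedom can be cross-checked without tables, because the upper-tail area of the chi-square distribution then has a closed form. The sketch below is not from the lecture; it only handles even degrees of freedom, and the function name is made up for illustration.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x) when df is a positive even number.

    For df = 2m the tail area is exp(-x/2) * sum_{i<m} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms

# Tabulated critical values used in this module.
print(chi2_upper_tail_even_df(18.307, 10))  # should be close to 0.05
print(chi2_upper_tail_even_df(13.277, 4))   # should be close to 0.01
print(chi2_upper_tail_even_df(5.991, 2))    # should be close to 0.05
```
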

48
Number of people
arriving at ATM     fo     fe        fo - fe    (fo - fe)²    (fo - fe)² / fe
0                   120    42.42     77.58      6018.6564     141.88252
1                   180    165.69    14.31      204.7761      1.23590
2                   270    323.19    -53.19     2829.1761     8.75391
3                   390    420.21    -30.21     912.6441      2.17188
4                   310    409.71    -99.71     9942.0841     24.26615
5                   330    319.62    10.38      107.7444      0.33710
6                   250    207.69    42.31      1790.1361     8.61927
7                   150    115.71    34.29      1175.8041     10.16165
8                   80     56.49     23.51      552.7201      9.78439
9                   15     24.36     -9.36      87.6096       3.59645
10                  5      9.45      -4.45      19.8025       2.09550
11 or more          0      4.83      -4.83      23.3289       4.83000
                                                Total         217.73472
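
The goodness of fit statistic above can be reproduced from the observed frequencies and the tabulated Poisson probabilities. This sketch deliberately uses the four-decimal probabilities from the table (rather than exact Poisson values) so that it matches the slide; the variable names are illustrative.

```python
observed = [120, 180, 270, 390, 310, 330, 250, 150, 80, 15, 5, 0]

# Poisson probabilities for mean 3.9 as tabulated, last entry "11 or more".
table_probs = [0.0202, 0.0789, 0.1539, 0.2001, 0.1951, 0.1522,
               0.0989, 0.0551, 0.0269, 0.0116, 0.0045, 0.0023]

n = sum(observed)
expected = [n * p for p in table_probs]

stat = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

k = len(observed)          # number of categories
estimated_parameters = 1   # the Poisson mean was estimated from the data
df = k - estimated_parameters - 1

print(round(stat, 4), df)
print(stat > 18.307)  # compare with the critical value for 10 df
```
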
49
Solution 10-4
  • Since 217.73472 > 18.307, we reject H0.
  • We conclude that the number of people arriving
    per minute at this particular ATM does not follow
    a Poisson distribution.

50
After the lecture each week
  • Review the lecture material
  • Complete all readings
  • Complete all of the recommended problems (listed
    in SG) from the textbook
  • Complete at least some of the additional problems
  • Consider (briefly) the discussion points prior to
    tutorials